Multivariate Normal Distribution

Authored by: Jodi M. Casabianca, Brian W. Junker

Handbook of Item Response Theory, Volume Two: Statistical Tools

Print publication date: February 2016
Online publication date: March 2017

Print ISBN: 9781466514324
eBook ISBN: 9781315373645

DOI: 10.1201/b19166-5


3.1  Introduction

In this chapter, we review several basic features of the multivariate normal distribution. Section 3.2 considers general properties of the multivariate normal density and Section 3.3 considers the sampling distribution of the maximum likelihood estimators (MLEs) of the mean vector and variance–covariance matrix based on iid (independent, identically distributed) random sampling of a multivariate normal distribution. Section 3.4 reviews the standard conjugate distributions for Bayesian inference with the multivariate normal distribution and Section 3.5 considers various generalizations and robustifications of the normal model. The properties of the multivariate normal distribution are well known and available in many places; our primary sources are the texts by Johnson and Wichern (1998) and Morrison (2005). A classic and comprehensive treatment is given in Anderson (2003).

In item response theory (IRT), the multivariate normal and its generalizations are most often used as the underlying-variables distribution for a data-augmentation version of the normal-ogive model (Bartholomew and Knott, 1999; Fox, 2010), as a population distribution for the proficiency parameters $\theta_p$, for persons p = 1, ..., P, and as a prior distribution for other model parameters (e.g., difficulty parameters $b_i$, log-discrimination parameters $\log a_i$, etc.) whose domain is the entire real line or Euclidean space. Especially in the latter two cases, the multivariate normal distribution serves to link standard IRT modeling with hierarchical linear models (HLMs) and other structures that incorporate various group structures and other dependence on covariates, to better model item responses in terms of the contexts in which they are situated (e.g., Fox, 2003, 2005a,b, 2010).

Because of the wide variety of applications of the normal distribution in IRT, in this chapter we will depart somewhat from the notation used in the rest of the book. We use $\mathbf{X} = (X_1, X_2, \ldots, X_K)^T$ to represent a generic K-dimensional random vector and $X$ to represent a generic random variable. Their observed values will be denoted $\mathbf{x}$ and $x$, respectively. Also, because many densities will be discussed, we will use the generic notation f(·) to denote a density. The particular role or nature of f(·) will be clear from context.

3.2  Multivariate Normal Density

The univariate normal density with mean μ and variance σ2 for a random variable X is as follows:

3.1  $f(x; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty$

As is well known, E[X] = μ and Var(X) = E[(X − μ)²] = σ². We often write $X \sim N(\mu, \sigma^2)$ to convey that the random variable X has this density. The quantity in the exponent of the normal density thus measures the square of the distance between realizations of the random variable X and the mean μ, scaled in units of the standard deviation σ.

The multivariate normal density, for $X \in \Re^K$, is as follows:

3.2  $f(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{K/2}|\Sigma|^{1/2}}\, e^{-(1/2)(x-\mu)^T \Sigma^{-1} (x-\mu)}$

where again $E[X] = \mu = (\mu_1, \ldots, \mu_K)^T$ is the mean vector and $\mathrm{Var}(X) = E[(X - \mu)(X - \mu)^T] = \Sigma$ is the K × K symmetric nonnegative-definite variance–covariance matrix. We often write $X \sim N_K(\mu, \Sigma)$, or just $X \sim N(\mu, \Sigma)$ if the dimension K is clear from context, to indicate that X follows the multivariate normal density. The quantity $(x - \mu)^T \Sigma^{-1} (x - \mu)$ in the exponent is the squared distance from x to μ, again scaled by the variance–covariance matrix Σ. In other contexts, this is called the (squared) Mahalanobis distance between x and μ (Morrison, 2005).
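As a quick numerical illustration (a minimal sketch assuming NumPy and SciPy are available; the values of μ, Σ, and x below are arbitrary), the density in Equation 3.2 can be assembled directly from the squared Mahalanobis distance and checked against a library implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])                  # illustrative mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])             # illustrative covariance matrix
x = np.array([1.0, 0.0])                   # an arbitrary evaluation point

# Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
d2 = (x - mu) @ np.linalg.solve(Sigma, x - mu)

# Density assembled directly from Equation 3.2
K = len(mu)
f = np.exp(-0.5 * d2) / np.sqrt((2 * np.pi) ** K * np.linalg.det(Sigma))

# Should agree with SciPy's implementation
assert np.isclose(f, multivariate_normal(mu, Sigma).pdf(x))
```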

The following theorem gives some properties of multivariate normal random vectors.

Theorem 3.1

If $X \sim N_K(\mu, \Sigma)$, then

  1. E[X] = μ, and $\mathrm{Var}(X) = E[(X - \mu)(X - \mu)^T] = \Sigma$ (Johnson and Wichern, 1998).
  2. If W = AX + b for a constant matrix A and a constant vector b, then $W \sim N_K(A\mu + b, A\Sigma A^T)$ (Johnson and Wichern, 1998).
  3. Cholesky decomposition: There exists a lower-triangular matrix L with nonnegative diagonal entries such that $\Sigma = LL^T$ (Rencher, 2002). If $X \sim N_K(0, I_{K \times K})$, then $W = LX + \mu \sim N_K(\mu, \Sigma)$; a code sketch of this sampling construction follows the theorem.
  4. Σ is diagonal if and only if the components of X are mutually independent (Johnson and Wichern, 1998).
  5. If X is partitioned into disjoint subvectors $X_1$ and $X_2$, and we write the following equation:
    3.3  $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left[\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right]$
    then the conditional distribution of $X_1$, given $X_2 = x_2$, is also multivariate normal, with mean $\mu_{1|2}$ and variance–covariance matrix $\Sigma_{11|2}$:
    3.4  $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$
    3.5  $\Sigma_{11|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
    (Johnson and Wichern, 1998).
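Property 3 of Theorem 3.1 is the basis of the standard method for simulating multivariate normal draws. The following sketch (assuming NumPy; the particular μ and Σ are illustrative) generates approximate N(μ, Σ) draws from iid standard normals via the Cholesky factor:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])            # illustrative mean
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])        # illustrative covariance

L = np.linalg.cholesky(Sigma)              # lower triangular, Sigma = L L^T
Z = rng.standard_normal((3, 100_000))      # columns are iid N(0, I) draws
W = mu[:, None] + L @ Z                    # columns are N(mu, Sigma) draws

# Empirical moments should be close to mu and Sigma
print(W.mean(axis=1))                      # ~ mu
print(np.cov(W))                           # ~ Sigma
```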

3.2.1  Geometry of the Multivariate Normal Density

A useful geometric property of the multivariate normal distribution is that it is log quadratic: log f(x) is a simple linear function of the symmetric nonnegative definite quadratic form $(x - \mu)^T \Sigma^{-1} (x - \mu)$. This means that the level sets $\{x : f(x) = c\}$, or equivalently (after taking logs and omitting irrelevant additive constants) the level sets $\{x : (x - \mu)^T \Sigma^{-1} (x - \mu) = c^2\}$, will be ellipsoids. Figure 3.1 depicts bivariate normal density plots and the same densities as contour plots for two sets of variables with equal variance; the variables in subplot (a) are uncorrelated and the variables in subplot (b) are highly correlated. The contours are exactly the level sets $\{x : (x - \mu)^T \Sigma^{-1} (x - \mu) = c^2\}$ for various values of $c^2$.

Finding the principal axes of the ellipsoids is straightforward with Lagrange multipliers. For example, to find the first principal axis, we want to find the point x that is a maximum distance from μ (maximize the squared distance (xμ) T (xμ)) that is still on the contour (satisfies the constraint (xμ) T Σ −1(xμ) = c 2). Differentiating the Lagrange-multiplier objective function

3.6  $g_\lambda(x) = (x - \mu)^T (x - \mu) - \lambda\left[(x - \mu)^T \Sigma^{-1} (x - \mu) - c^2\right]$

with respect to x and setting these derivatives equal to zero leads to the eigenvalue/eigenvector problem

3.7  $(\Sigma - \lambda I)(x - \mu) = 0$

It is now a matter of calculation to observe that the eigenvalues can be ordered so that

3.8  $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_K$

with corresponding mutually orthogonal eigenvectors $y_k = (x_k - \mu)$ lying along the principal axes of the ellipsoid, with half-lengths $c\sqrt{\lambda_k}$.

Let A be the K × K matrix with columns $a_k = y_k/\|y_k\|$. Then $A^T A = I$, the identity matrix, and $A^T \Sigma A = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$, the K × K diagonal matrix with diagonal elements $\lambda_1, \ldots, \lambda_K$. Now, consider the random vector $W = A^T(X - \mu)$. It is easy to verify that $W \sim N(0, \mathrm{diag}(\lambda_1, \ldots, \lambda_K))$. The components $W_k$ of W are the (population) principal components of X.
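A brief numerical sketch of this construction (assuming NumPy; Σ is an arbitrary illustrative matrix): the eigendecomposition of Σ gives the $\lambda_k$ and the columns of A, and the transformed draws $W = A^T(X - \mu)$ should exhibit an approximately diagonal covariance matrix:

```python
import numpy as np

Sigma = np.array([[3.0, 1.2],
                  [1.2, 2.0]])             # illustrative covariance
mu = np.array([0.0, 0.0])

# Eigendecomposition of Sigma; columns of A are unit-length eigenvectors a_k
lam, A = np.linalg.eigh(Sigma)             # eigh returns ascending eigenvalues
lam, A = lam[::-1], A[:, ::-1]             # reorder so lambda_1 >= lambda_2

# Transform a sample: W = A^T (X - mu) has covariance ~ diag(lambda)
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mu, Sigma, size=200_000).T
W = A.T @ (X - mu[:, None])
print(np.cov(W))                           # ~ diag(lam); off-diagonals near 0
```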

Figure 3.1   Bivariate density plots. Plots in (a) are the bivariate density and contour plots for two uncorrelated variables ($\sigma_{12} = 0$) and plots in (b) are the bivariate density and contour plots for two perfectly correlated variables ($\sigma_{12} = 1$).

3.3  Sampling from a Multivariate Normal Distribution

3.3.1  Multivariate Normal Likelihood

Let $X_1, X_2, \ldots, X_N$ be a set of N vectors, each K × 1, representing an iid random sample from a multivariate normal population with mean vector μ and covariance matrix Σ. Since densities of independent random variables multiply, it is easy to observe that the joint density will be as follows:

3.9  $f(x_1, \ldots, x_N \,|\, \mu, \Sigma) = \dfrac{1}{(2\pi)^{NK/2}|\Sigma|^{N/2}} \exp\left[-\dfrac{1}{2}\sum_{n=1}^N (x_n - \mu)^T \Sigma^{-1} (x_n - \mu)\right]$

The log of this density can be written (apart from some additive constants) as follows:

3.10  $L(\mu, \Sigma) = -\dfrac{N}{2}\log|\Sigma| - \dfrac{1}{2}\sum_{n=1}^N (x_n - \mu)^T \Sigma^{-1} (x_n - \mu) = -\dfrac{N}{2}\log|\Sigma| - \dfrac{1}{2}\sum_{n=1}^N (x_n - \bar{x})^T \Sigma^{-1} (x_n - \bar{x}) - \dfrac{N}{2}(\bar{x} - \mu)^T \Sigma^{-1} (\bar{x} - \mu)$

The value of μ that maximizes L(μ, Σ) for any Σ is $\hat{\mu} = \bar{x}$, since that choice makes the third term in this loglikelihood equal to zero; hence $\hat{\mu} = \bar{x}$ is the MLE. Furthermore, the first two terms of L(μ, Σ) may be rewritten as $(N/2)\log|\Sigma^{-1}| - (1/2)\,\mathrm{tr}\,A\Sigma^{-1}$, where $A = \sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^T$, and from this and a little calculus the MLE for Σ may be deduced. Summarizing,

Theorem 3.2

If $X_1, \ldots, X_N \sim N_K(\mu, \Sigma)$, then the MLE for μ is as follows:

3.11  $\hat{\mu} = \bar{x} = \dfrac{1}{N}\sum_{n=1}^N x_n$

The MLE for Σ is

3.12  $\hat{\Sigma} = \dfrac{1}{N}\sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^T$

Note that $\bar{X}$ and $\hat{\Sigma}$ are sufficient statistics; they contain all of the information about μ and Σ in the data matrix X. Furthermore, note that $\hat{\Sigma}$ is a biased estimate of Σ; the unbiased estimator $S = \frac{1}{N-1}\sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^T$ is often used instead.
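As a small worked sketch of Theorem 3.2 (assuming NumPy; the data are simulated purely for illustration), note that the MLE divides the scatter matrix by N while the unbiased estimator S divides by N − 1:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0], [[2, 0.8], [0.8, 1]], size=50)  # N x K data

N = X.shape[0]
x_bar = X.mean(axis=0)                     # MLE of mu (Equation 3.11)
D = X - x_bar
Sigma_hat = (D.T @ D) / N                  # MLE of Sigma (Equation 3.12), biased
S = (D.T @ D) / (N - 1)                    # unbiased estimator of Sigma
assert np.allclose(S, np.cov(X.T))         # np.cov uses the N - 1 divisor
```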

3.3.2  Sampling Distribution of $\bar{X}$ and S

The sampling distributions of $\bar{X}$ and S are easily generalized from the univariate case. In the univariate case, $\bar{X}$ and S are independent, $\bar{X} \sim N(\mu, \sigma^2/N)$, and $(N-1)S/\sigma^2 \sim \chi^2_{N-1}$, a χ-squared distribution with N − 1 degrees of freedom. Recall that $\sum_{n=1}^N (X_n - \mu)^2/\sigma^2 = Z_1^2 + Z_2^2 + \cdots + Z_N^2 \sim \chi^2_N$ by definition, because it is a sum of squares of independent standard normals; intuitively, we lose one degree of freedom for $(N-1)\cdot S/\sigma^2 = \sum_{n=1}^N (X_n - \bar{X})^2/\sigma^2$ because we are estimating the mean μ with $\bar{X}$ in calculating S.

In the multivariate case where we have the random sample X 1, X 2, ..., X N , the following three theorems apply:

Theorem 3.3

If $X_1, \ldots, X_N \sim N_K(\mu, \Sigma)$ are iid, then

  1. $\bar{X}$ and S are independent.
  2. $\bar{X}$ is distributed as a K-variate normal with parameters μ and Σ/N: $\bar{X} \sim N(\mu, \Sigma/N)$.
  3. $(N-1)\cdot S$ is distributed as a Wishart random variable with parameter Σ and N − 1 degrees of freedom: $(N-1)\cdot S \sim W_{N-1}(\Sigma)$.

By definition, $\sum_{n=1}^N (X_n - \mu)(X_n - \mu)^T = Z_1 Z_1^T + Z_2 Z_2^T + \cdots + Z_N Z_N^T \sim W_N(\Sigma)$, since it is the sum of outer products of N independent N(0, Σ) random vectors. Once again, we lose one degree of freedom for $(N-1)\cdot S = \sum_{n=1}^N (X_n - \bar{X})(X_n - \bar{X})^T$ since we are estimating the mean μ with $\bar{X}$. The Wishart density for a positive-definite random K × K matrix A with D − 1 degrees of freedom (D > K) and parameter Σ is as follows:

3.13  $\omega_{D-1}(A \,|\, \Sigma) = \dfrac{|A|^{(D-K-2)/2}\, e^{-\mathrm{tr}(A\Sigma^{-1})/2}}{2^{K(D-1)/2}\, \pi^{K(K-1)/4}\, |\Sigma|^{(D-1)/2} \prod_{k=1}^{K} \Gamma\!\left(\frac{1}{2}(D-k)\right)}$
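A simulation sketch of item 3 of Theorem 3.3 (assuming NumPy and SciPy; the sample size and Σ are illustrative): the scatter matrix $(N-1)\cdot S$ averages, over many replications, to $(N-1)\Sigma$, the mean of a Wishart distribution with N − 1 degrees of freedom:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])   # illustrative covariance
N = 20                                       # illustrative sample size

def scatter_matrix(rng):
    X = rng.multivariate_normal([0, 0], Sigma, size=N)
    D = X - X.mean(axis=0)
    return D.T @ D                           # (N - 1) * S

A = np.mean([scatter_matrix(rng) for _ in range(5_000)], axis=0)
print(A / (N - 1))                           # ~ Sigma
print(wishart(df=N - 1, scale=Sigma).mean() / (N - 1))  # exactly Sigma
```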

3.4  Conjugate Families

Recall that if the likelihood for data X is (any function proportional to) the conditional density of X given the (possibly multidimensional) parameter η, f(x|η), then a conjugate family of prior distributions for f(x|η) is a parametric family of densities f(η; τ) for η, with (possibly multidimensional) hyperparameter τ, such that for any member of the conjugate family the posterior distribution $f(\eta \,|\, x; \tau) = f(x|\eta)\, f(\eta; \tau) \big/ \int f(x|h)\, f(h; \tau)\, dh$ can be rewritten as f(η; τ*), a member of the same parametric family as f(η; τ), where τ* = τ*(x, τ) is some function of x and τ. Since only the form of f(η; τ*) as a function of η matters, in verifying that f(η; τ*) and f(η; τ) belong to the same parametric family, it is usual to ignore multiplicative constants that do not depend on η.

For example, if $X_1, \ldots, X_N$ are iid N(μ, σ²), then the likelihood is

3.14  $f(x_1, \ldots, x_N \,|\, \mu, \sigma^2) = \prod_{n=1}^N \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(1/2\sigma^2)(x_n - \mu)^2} \propto \dfrac{1}{\sqrt{2\pi\sigma^2/N}}\, e^{-(1/(2\sigma^2/N))(\bar{x} - \mu)^2}$

as a function of μ, as would be expected, since $\bar{x}$ is sufficient for μ.

If we assume σ² is known and place a normal prior $f(\mu; \mu_0, \tau_0^2) = (1/\sqrt{2\pi}\,\tau_0)\, e^{-(1/2\tau_0^2)(\mu - \mu_0)^2}$ on μ, then the posterior density for μ will be

3.15  $f(\mu \,|\, x_1, \ldots, x_N; \mu_N, \tau_N^2) \propto f(x_1, \ldots, x_N \,|\, \mu, \sigma^2)\, f(\mu \,|\, \mu_0, \tau_0^2) = \dfrac{1}{\sqrt{2\pi\sigma^2/N}}\, e^{-(1/(2\sigma^2/N))(\bar{x} - \mu)^2} \cdot \dfrac{1}{\sqrt{2\pi}\,\tau_0}\, e^{-(1/2\tau_0^2)(\mu - \mu_0)^2} \propto \dfrac{1}{\sqrt{2\pi}\,\tau_N}\, e^{-(1/2\tau_N^2)(\mu - \mu_N)^2}$

after completing the square, collecting terms, and identifying the normalizing constant, where

3.16  $\tau_N^2 = \dfrac{1}{1/(\sigma^2/N) + 1/\tau_0^2}$
3.17  $\mu_N = \left(\dfrac{\tau_0^2}{\tau_0^2 + \sigma^2/N}\right)\bar{x} + \left(\dfrac{\sigma^2/N}{\tau_0^2 + \sigma^2/N}\right)\mu_0$

In this case, the posterior mean is $\mu_N = \rho_N \bar{x} + (1 - \rho_N)\mu_0$, where $\rho_N = \tau_0^2/(\tau_0^2 + \sigma^2/N)$ is the classical reliability coefficient. Thus, if the prior distribution is $\mu \sim N(\mu_0, \tau_0^2)$, then the posterior distribution will be $\mu \,|\, x_1, \ldots, x_N \sim N(\mu_N, \tau_N^2)$; this shows that the normal distribution is the conjugate prior for a normal mean μ when the variance σ² is known.
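The updates in Equations 3.16 and 3.17 are easy to compute directly. A minimal sketch (assuming NumPy; the data and hyperparameter values below are hypothetical, chosen only for illustration):

```python
import numpy as np

x = np.array([9.8, 11.2, 10.4, 10.9])     # hypothetical data
sigma2 = 4.0                               # known sampling variance
mu0, tau0_2 = 8.0, 1.0                     # prior mean and prior variance

N, x_bar = len(x), x.mean()
tau_N2 = 1.0 / (1.0 / (sigma2 / N) + 1.0 / tau0_2)   # Equation 3.16
rho_N = tau0_2 / (tau0_2 + sigma2 / N)               # reliability coefficient
mu_N = rho_N * x_bar + (1 - rho_N) * mu0             # Equation 3.17
print(mu_N, tau_N2)                        # posterior mean shrinks toward mu0
```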

Further calculation (as shown in Gelman et al., 2004) shows that when both the mean μ and the variance σ2 are unknown, and the data X 1, ..., XN are an iid sample from N(μ, σ2), the joint distribution

3.18  $\mu \,|\, \sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0)$
3.19  $\sigma^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2)$

is the conjugate prior, where the notation “$\sigma^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2)$” means that $\nu_0\sigma_0^2/\sigma^2 \sim \chi^2_{\nu_0}$. Here, κ₀ and ν₀ are hyperparameters that function as “prior sample sizes”: the larger κ₀ and ν₀, the greater the influence of the prior on the posterior. In this case, the joint posterior distribution for μ, σ² is

3.20  $\mu \,|\, \sigma^2, x_1, x_2, \ldots, x_N \sim N(\mu_N, \sigma^2/\kappa_N)$
3.21  $\sigma^2 \,|\, x_1, x_2, \ldots, x_N \sim \text{Inv-}\chi^2(\nu_N, \sigma_N^2)$

where

$\begin{aligned} \kappa_N &= \kappa_0 + N \\ \nu_N &= \nu_0 + N \\ \nu_N\sigma_N^2 &= \nu_0\sigma_0^2 + (N-1)S^2 + \frac{\kappa_0 N}{\kappa_0 + N}(\bar{x} - \mu_0)^2 \\ \mu_N &= \left(\frac{\sigma^2/\kappa_0}{\sigma^2/\kappa_0 + \sigma^2/N}\right)\bar{x} + \left(\frac{\sigma^2/N}{\sigma^2/\kappa_0 + \sigma^2/N}\right)\mu_0 = \left(\frac{N}{\kappa_0 + N}\right)\bar{x} + \left(\frac{\kappa_0}{\kappa_0 + N}\right)\mu_0 \end{aligned}$

with $S^2 = \frac{1}{N-1}\sum_{n=1}^N (x_n - \bar{x})^2$.
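These updates translate directly into code. A minimal sketch (assuming NumPy; the data vector and the hyperparameters μ₀, κ₀, ν₀, σ₀² are hypothetical illustrative choices):

```python
import numpy as np

x = np.array([9.8, 11.2, 10.4, 10.9, 9.5])         # hypothetical data
mu0, kappa0, nu0, sigma0_2 = 10.0, 2.0, 3.0, 1.5   # illustrative prior settings

N, x_bar = len(x), x.mean()
S2 = x.var(ddof=1)                    # (1/(N-1)) sum (x_n - x_bar)^2

kappa_N = kappa0 + N
nu_N = nu0 + N
mu_N = (N * x_bar + kappa0 * mu0) / (kappa0 + N)
sigma_N2 = (nu0 * sigma0_2 + (N - 1) * S2
            + kappa0 * N / (kappa0 + N) * (x_bar - mu0) ** 2) / nu_N
print(mu_N, kappa_N, nu_N, sigma_N2)  # posterior hyperparameters
```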

Although this is the conjugate family when μ and σ2 are both unknown, the forced dependence between μ and σ2 in the prior is often awkward in applications. Common alternatives are as follows:

  1. To replace σ²/κ₀ with an arbitrary τ² in the conditional prior for μ|σ², forcing independence. In this case, the conditional posterior for μ|σ² is as in the “σ² known” case above, but the marginal posterior for σ² is neither conjugate nor in closed form (it is not difficult to compute numerically, however); or
  2. To make the conditional prior for μ|σ² very flat/uninformative by letting κ₀ → 0. In this case, the posterior distributions for μ|σ² and for σ² mimic the sampling distributions of the MLEs.

If one wishes for a noninformative prior for σ² (analogous to the flat prior choice for μ in (2) above), a choice that preserves conjugacy would be to take the degrees of freedom ν₀ = 0 in the $\text{Inv-}\chi^2(\nu_0, \sigma_0^2)$ prior for σ². This leads to the prior f(σ²) ∝ 1/σ². Another noninformative choice is the Jeffreys prior (proportional to the square root of the Fisher information for the parameter), which in this case leads to the prior f(σ) ∝ 1/σ. The Jeffreys prior for log σ² (or equivalently log σ) is simply f(log σ²) ∝ 1. All of these choices are improper priors, and care must be taken that the posterior turns out to be proper. One common way to avoid this issue is to force the prior to be proper, say, by taking σ² ∼ Unif(0, M) for some suitably large number M.

The conjugate prior distribution for a multivariate normal distribution with parameters μ and Σ has a form similar to that of the univariate case, but with multivariate normal and inverse-Wishart densities replacing the univariate normal and inverse-χ-squared densities. In particular, assuming sampling $x_1, \ldots, x_N$ iid from N(μ, Σ), the joint prior for μ and Σ is of the following form:

3.22  $\mu \,|\, \Sigma \sim N(\mu_0, \Sigma/\kappa_0)$
3.23  $\Sigma \sim \text{Inv-Wishart}(\nu_0, \Sigma_0^{-1})$

where the notation “$\Sigma \sim \text{Inv-Wishart}(\nu_0, \Sigma_0^{-1})$” means that $\Sigma^{-1} \sim \omega_{\nu_0}(\Sigma_0^{-1})$, the usual Wishart distribution with ν₀ degrees of freedom and parameter matrix $\Sigma_0^{-1}$. Again, κ₀ and ν₀ function as prior sample sizes. Then, the joint posterior distribution will be as follows:

3.24  $\mu \,|\, \Sigma, x_1, x_2, \ldots, x_N \sim N(\mu_N, \Sigma/\kappa_N)$
3.25  $\Sigma \,|\, x_1, x_2, \ldots, x_N \sim \text{Inv-Wishart}(\nu_N, \Sigma_N^{-1})$

where

$\begin{aligned} \kappa_N &= \kappa_0 + N \\ \nu_N &= \nu_0 + N \\ \nu_N\Sigma_N &= \nu_0\Sigma_0 + (N-1)S + \frac{\kappa_0 N}{\kappa_0 + N}(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T \\ \mu_N &= \left(\frac{N}{\kappa_0 + N}\right)\bar{x} + \left(\frac{\kappa_0}{\kappa_0 + N}\right)\mu_0 \end{aligned}$

with $S = \frac{1}{N-1}\sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^T$ (Gelman et al., 2004). Once again, the joint conjugate prior forces some prior dependence between μ and Σ that may be awkward in practice. And once again, the common fixes are as follows:

  1. Replace Σ/κ₀ with an arbitrary Λ₀ in the conditional prior for μ|Σ, forcing independence; or
  2. Make the conditional prior for μ|Σ very flat/uninformative by letting κ₀ → 0.

The details, generally analogous to the univariate normal case, are worked out in many places; in particular, see Gelman et al. (2004), who also suggest another way to reduce the prior dependence between μ and Σ. Sun and Berger (2007) provide an extensive discussion of several “default” objective/noninformative prior choices for the multivariate normal.
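For concreteness, the multivariate updates above can be sketched as follows (assuming NumPy; the simulated data and the prior settings μ₀, κ₀, ν₀, Σ₀ are illustrative only, with ν₀ chosen large enough that the prior is proper):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([1.0, -1.0], [[2.0, 0.5], [0.5, 1.0]], size=30)

N, K = X.shape
x_bar = X.mean(axis=0)
S = np.cov(X.T)                       # unbiased sample covariance

mu0 = np.zeros(K)                     # illustrative prior settings
kappa0, nu0 = 1.0, K + 2.0            # weak, but proper, prior
Sigma0 = np.eye(K)

kappa_N = kappa0 + N
nu_N = nu0 + N
d = (x_bar - mu0)[:, None]
Sigma_N = (nu0 * Sigma0 + (N - 1) * S
           + kappa0 * N / (kappa0 + N) * (d @ d.T)) / nu_N
mu_N = (N * x_bar + kappa0 * mu0) / (kappa0 + N)
print(mu_N, Sigma_N)                  # posterior hyperparameters
```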

3.5  Generalizations of the Multivariate Normal Distribution

Assessing multivariate normality in higher dimensions is challenging. A well-known theorem (Johnson and Wichern, 1998; see also Anderson, 2003) states that X is a multivariate normal vector if and only if each linear combination $a^T X$ of its components is univariate normal, but this is seldom feasible to verify in practice. Instead, one often checks only the one- and two-dimensional margins of X; for example, examining a QQ plot for each of the K components of X and, in addition, examining bivariate scatterplots of each possible pair of variables to determine whether the data points have an elliptical appearance. (See Johnson and Wichern (1998) for more information on techniques for assessing multivariate normality.)

There is no doubt that true multivariate normality is a rare property for a multivariate dataset. Although the latent ability variable in IRT is often assumed to be normally distributed, the data may not conform well to this assumption at all (Casabianca, 2011; Casabianca et al., 2010; Moran and Dresher, 2007; Woods and Lin, 2009; Woods and Thissen, 2006). In these and other cases, robust alternatives to the multivariate normal distribution can be considered.

An important class of generalizations of the multivariate normal distribution is the family of multivariate elliptical distributions, so named because each of its level sets defines an ellipsoid (Fang and Zhang, 1990; Branco and Dey, 2002). The K-dimensional random variable X has an elliptical distribution if and only if its characteristic function (Billingsley, 1995; Lukacs, 1970) is of the form

3.26  $\Psi(t) = E[\exp(i t^T X)] = \exp(i t^T \mu)\, \psi(t^T \Sigma t)$

where as usual μ is a K-dimensional vector and Σ is a symmetric nonnegative definite K × K matrix, and t ∈ ℜ K . When a density f(x) exists for X, it has the form

3.27  $f_g(x; \mu, \Sigma) = \dfrac{1}{|\Sigma|^{1/2}}\, g\!\left[(x - \mu)^T \Sigma^{-1} (x - \mu)\right]$

where g(·) is itself a univariate density; g(·) is called the generator density for f(·). The density defines a location/scale family with location parameter μ and scale parameter Σ. The parameter μ is the median of f(·) in all cases, and if E[X] exists, E[X] = μ. If Var(X) exists, $\mathrm{Var}(X) = -2\psi'(0)\,\Sigma$. A little calculation shows that if X is elliptically distributed with location μ and scale Σ, then W = AX + b is again elliptically distributed, with location Aμ + b and scale $A\Sigma A^T$.

Elliptical distributions are used as a tool for generalizing normal-theory structural equations modeling (e.g., Shapiro and Browne, 1987; Schumacker and Cheevatanarak, 2000). The special case of the K-dimensional multivariate-t distribution on ν degrees of freedom with density

3.28  $f(x; \mu, \Sigma) = \dfrac{\Gamma((\nu + K)/2)}{\Gamma(\nu/2)\, \nu^{K/2}\, \pi^{K/2}\, |\Sigma|^{1/2}} \left(1 + \dfrac{1}{\nu}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)^{-(\nu + K)/2}$

has been used as the error distribution in robust Bayesian and non-Bayesian linear regression modeling since at least Zellner (1976); more recently, robust regression with general elliptical error distributions and the particular case of scale mixtures of normals (of which the univariate-t and multivariate-t are also examples) has also been studied (e.g., Fernandez and Steel, 2000).
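Equation 3.28 can be checked numerically against a library implementation. A minimal sketch (assuming SciPy ≥ 1.6, which provides scipy.stats.multivariate_t; the values of μ, Σ, ν, and x are arbitrary illustrative choices):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_t

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
nu, x = 5.0, np.array([0.5, -1.0])
K = len(mu)

# Log of Equation 3.28, assembled term by term
d2 = (x - mu) @ np.linalg.solve(Sigma, x - mu)
log_f = (gammaln((nu + K) / 2) - gammaln(nu / 2)
         - (K / 2) * np.log(nu * np.pi)
         - 0.5 * np.linalg.slogdet(Sigma)[1]
         - ((nu + K) / 2) * np.log1p(d2 / nu))
assert np.isclose(np.exp(log_f), multivariate_t(mu, Sigma, df=nu).pdf(x))
```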

A further generalization is the family of skew-elliptical distributions (e.g., Branco and Dey, 2001, 2002). When it exists, the density of a skew-elliptical distribution is of the form

3.29  $f_{g_1, g_2}(x \,|\, \mu, \Sigma, \lambda) = 2\, f_{g_1}(x \,|\, \mu, \Sigma)\, F_{g_2}\!\left(\lambda^T (x - \mu)\right)$

where $f_{g_1}(\cdot)$ is the density of a multivariate elliptical distribution, and $F_{g_2}(\cdot)$ is the cumulative distribution function of a (possibly different) univariate elliptical distribution with location parameter 0 and scale parameter 1. The vector parameter λ is a skewness parameter; when λ = 0, the skew-elliptical density reduces to a symmetric elliptical density.

When the generator densities $g_1(x) = g_2(x) = \phi(x)$, the standard normal density, we obtain the special case of the skew-normal distributions. It has been observed (Moran and Dresher, 2007) that the empirical distribution of the latent proficiency variable in IRT models applied to large-scale educational surveys sometimes exhibits some nontrivial skewing, which if unmodeled can cause bias in estimating item parameters and features of the proficiency distribution. Skew-normal and related distributions have been proposed (Xu and Jia, 2011) as a way of accounting for this skewing in the modeling of such data, with as few extra parameters as possible.
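In one dimension this construction is easy to verify numerically. A minimal sketch (assuming SciPy; λ is an arbitrary illustrative skewness value) checks that $2\phi(x)\Phi(\lambda x)$ matches SciPy's skew-normal density:

```python
import numpy as np
from scipy.stats import norm, skewnorm

lam = 3.0                                   # illustrative skewness parameter
x = np.linspace(-3, 3, 7)

# Skew-normal density built from Equation 3.29 with g1 = g2 = phi
f_manual = 2 * norm.pdf(x) * norm.cdf(lam * x)
assert np.allclose(f_manual, skewnorm.pdf(x, a=lam))
```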

Two additional classes of transformed normal distributions used in IRT are the lognormal and logit-normal distributions. A random variable X has a lognormal distribution when its logarithm is normally distributed. The density of the lognormal distribution is of the following form:

3.30  $f(x; \mu, \sigma^2) = \dfrac{1}{x\sqrt{2\pi\sigma^2}}\, e^{-(\ln x - \mu)^2/2\sigma^2}, \qquad x > 0$

where μ and σ2 are the mean and variance of the variable’s natural log. This distribution is an alternative to Gamma and Weibull distributions for nonnegative continuous random variables and is used in psychometrics to model examinee response times (e.g., van der Linden, 2006) and latent processes in decision making (e.g., Rouder et al., 2014). It is also used as a prior distribution in Bayesian estimation of nonnegative parameters, for example the discrimination parameter in two- and three-parameter logistic IRT models (van der Linden, 2006).
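One practical wrinkle is that software parametrizations differ from Equation 3.30. A minimal sketch (assuming SciPy, whose lognorm distribution uses a shape parameter s and a scale parameter): the (μ, σ²) of Equation 3.30 correspond to s = σ and scale = exp(μ):

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.8                        # illustrative values
x = 2.0

# Density assembled directly from Equation 3.30
f_manual = (np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2))
            / (x * np.sqrt(2 * np.pi) * sigma))
assert np.isclose(f_manual, lognorm.pdf(x, s=sigma, scale=np.exp(mu)))
```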

Similarly, a random variable X has a logit-normal distribution when its logit is normally distributed. The density of the logit-normal distribution is of the following form:

3.31  $f(x; \mu, \sigma^2) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(\mathrm{logit}(x) - \mu)^2/2\sigma^2}\, \dfrac{1}{x(1 - x)}, \qquad 0 < x < 1$

Here, μ and σ² are the mean and variance of the variable’s logit, and x is a proportion, bounded by 0 and 1. A multivariate generalization of the logit-normal distribution (Aitchison, 1985) has been used in latent Dirichlet allocation models for text classification (Blei and Lafferty, 2007) and in mixed membership models for strategy choice in cognitive diagnosis (Galyardt, 2012).
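Sampling from the logit-normal is immediate from the definition: draw from N(μ, σ²) and apply the inverse logit. A minimal sketch (assuming NumPy and SciPy; μ and σ are illustrative):

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(5)
mu, sigma = 0.0, 1.0                        # illustrative values

# logit(X) ~ N(mu, sigma^2), so X = expit(Z) with Z ~ N(mu, sigma^2)
x = expit(rng.normal(mu, sigma, size=100_000))
print(x.min() > 0, x.max() < 1, x.mean())   # all draws land in (0, 1)
```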

Acknowledgment

This work was supported by a postdoctoral fellowship at Carnegie Mellon University and Rand Corporation, through Grant #R305B1000012 from the Institute of Education Sciences, U.S. Department of Education.

References

Aitchison, J. 1985. A general class of distributions on the simplex. Journal of the Royal Statistical Society. Series B (Methodological), 47, 136–146.
Anderson, T. W. 2003. Introduction to Multivariate Statistical Analysis. New York: John Wiley and Sons.
Bartholomew, D. J. and Knott, M. 1999. Latent Variable Models and Factor Analysis. London: Arnold. (Kendall’s Library of Statistics 7).
Billingsley, P. 1995. Probability and Measure (3rd ed.). New York: John Wiley & Sons.
Blei, D. and Lafferty, J. 2007. A correlated topic model of science. Annals of Applied Statistics, 1, 17–35.
Branco, M. and Dey, D. K. 2001. A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis, 79, 99–113.
Branco, M. and Dey, D. K. 2002. Regression model under skew-elliptical error distribution. The Journal of Mathematical Sciences, Delhi, New Series, 1, 151–169.
Casabianca, J. M. 2011. Loglinear Smoothing for the Latent Trait Distribution: A Two-Tiered Evaluation. (Doctoral dissertation), ProQuest dissertations and theses. (Accession Order No. AAT 3474125.)
Casabianca, J. M. , Xu, X. , Jia, Y. , and Lewis, C. 2010. Estimation of item parameters when the underlying latent trait distribution of test takers is nonnormal. Paper Presented at the Meeting of the National Council for Measurement in Education, Denver, Colorado.
Fang, K. T. and Zhang, Y. T. 1990. Generalized Multivariate Analysis. New York: Springer.
Fernandez, C. and Steel, M. F. J. 2000. Bayesian regression analysis with scale mixtures of normals. Econometric Theory, 16, 80–101.
Fox, J. P. 2003. Stochastic EM for estimating the parameters of a multilevel IRT model. British Journal of Mathematical and Statistical Psychology, 56, 65–81.
Fox, J. P. 2005a. Multilevel IRT model assessment. In van der Ark, L. A. , Croon, M. A. , and Sijtsma, K. (Eds.), New Developments in Categorical Data Analysis for the Social and Behavioral Sciences (pp. 227–252). Mahwah, NJ: Lawrence Erlbaum.
Fox, J. P. 2005b. Multilevel IRT using dichotomous and polytomous response data. British Journal of Mathematical and Statistical Psychology, 58, 145–172.
Fox, J. P. 2010. Bayesian Item Response Modeling. New York: Springer.
Gelman, A. , Carlin, J. B. , Stern, H. A. , and Rubin, D. B. 2004. Bayesian Data Analysis. New York: John Wiley and Sons.
Galyardt, A. 2012. Mixed Membership Distributions with Applications to Modeling Multiple Strategy Usage. PhD dissertation, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA.
Johnson, R. A. and Wichern, D. W. 1998. Applied Multivariate Statistical Analysis. Upper Saddle River, NJ: Prentice-Hall.
Lukacs, E. 1970. Characteristic Functions. London: Griffin.
Moran, R. and Dresher, A. 2007. Results from NAEP marginal estimation research on multivariate scales. Paper Presented at the Meeting of the National Council for Measurement in Education, Chicago, IL.
Morrison, D. F. 2005. Multivariate Statistical Methods. Belmont, CA: Thomson Brooks Cole.
Rencher, A. C. 2002. Methods of Multivariate Analysis. New York: John Wiley and Sons.
Rouder, J. N. , Province, J. M. , Morey, R. D. , and Heathcote, A. 2014. The lognormal race: A cognitive-process model of choice and latency with desirable psychometric properties. Psychometrika, 1–23.
Schumacker, R. E. and Cheevatanarak, S. 2000. A comparison of normal and elliptical estimation methods in structural equations models. Paper Presented at the Meeting of the American Educational Research Association, New Orleans, LA (ERIC Document Reproduction Service No. ED441872). Retrieved April 21, 2012, from http://www.eric.ed.gov/PDFS/ED441872.pdf.
Shapiro, A. and Browne, M. W. 1987. Analysis of covariance structures under elliptical distributions. Journal of the American Statistical Association, 82, 1092–1097.
Sun, D. and Berger, J. O. 2007. Objective Bayesian analysis for the multivariate normal model. Bayesian Statistics, 8, 525–562.
van der Linden, W. J. 2006. A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31, 181–204.
Woods, C. M. and Lin, N. 2009. IRT with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33, 102–117.
Woods, C. M. and Thissen, D. 2006. IRT with estimation of the latent population distribution using spline-based densities. Psychometrika, 71, 281–301.
Xu, X. and Jia, Y. 2011. The Sensitivity of Parameter Estimates to the Latent Ability Distribution (Research Report 11–40). Princeton, NJ: Educational Testing Service.
Zellner, A. 1976. Bayesian and non-Bayesian analysis of the regression model with multivariate Student-t error terms. Journal of the American Statistical Association, 71, 400–405.
