9

# Handbook of Graphical Models

Print publication date:  November  2018
Online publication date:  November  2018

Print ISBN: 9781498788625
eBook ISBN: 9780429463976

10.1201/9780429463976-3

#### Abstract

Consider a finite set of random variables  X v , v ∈ V . Section 1.7 in Chapter 1 describes how to use a simple undirected graph G = ( V , E ) to encode conditional independence (CI) statements among the random variables. One can also naturally associate a parametrized family of joint probability distributions of the  X v to a graph. For undirected graphs, the Hammersley-Clifford theorem (Observation 1.7.2) shows that both the implicit method and the parametric method lead to the same families of probability distributions (called graphical models), as long as all distributions are assumed strictly positive.

#### 3.1  Introduction

Consider a finite set of random variables  $X v$ , $v ∈ V$ . Section 1.7 in Chapter 1 describes how to use a simple undirected graph $G = ( V , E )$ to encode conditional independence (CI) statements among the random variables. One can also naturally associate a parametrized family of joint probability distributions of the  $X v$ to a graph. For undirected graphs, the Hammersley-Clifford theorem (Observation 1.7.2) shows that both the implicit method and the parametric method lead to the same families of probability distributions (called graphical models), as long as all distributions are assumed strictly positive.

When probabilities are allowed to go to zero, the models defined by the collections of CI statements contain probability distributions that do not lie in the parametric graphical model, which, by definition, consists of strictly positive probability distributions. In fact, these additional distributions do not even lie in the closure of the parametric graphical model, so they cannot be approximated by distributions from the parametric graphical model. Moreover, models defined by collections of CI statements (pairwise Markov properties, local Markov properties, global Markov properties) differ from one another. As an example, consider the four-cycle  $C 4$ .

The binary random variables $X = ( X 1 , X 2 , X 3 , X 4 )$ satisfy the global Markov statements of  $C 4$ , $1 ⊥ 3 [ { 2 , 4 } ]$ and $2 ⊥ 4 [ { 1 , 3 } ]$ , if and only if one (or more) of the following statements is true:

1. The joint distribution lies in the closure of the graphical model.
2. There is a pair $( X i , X i + 1 )$ of neighboring nodes such that $X i = X i + 1$ a.s.
3. There is a pair $( X i , X i + 1 )$ of neighboring nodes such that $X i ≠ X i + 1$ a.s.

This chapter shows how to prove results such as Proposition 1.1.1 using algebraic tools. The algebraic method can also be used to study implications between conditional independence statements. Here is an example:

Suppose that XYZ are binary random variables or jointly normal random variables. If $X ⊥ Y$ and $X ⊥ Y [ Z ]$ then either $X ⊥ ( Y , Z )$ or $( X , Z ) ⊥ Y$ .

The CI implication in Proposition 1.1.2 is a special case of the gaussoid axiom . One may wonder what is special about jointly normal or binary random variables. For instance, is there a variant of this implication when XYZ are discrete but not binary? How can one systematically find and study implications like this?

CI implications can also be interpreted as intersections of graphical models. For example, the two CI statements $X ⊥ Y$ and $X ⊥ Y [ Z ]$ in Proposition 1.1.2 correspond to the two graphical models $X → Z ← Y$ and $X - Z - Y$ , respectively. Thus, Proposition 1.1.2 says that the intersection of these two graphical models equals the union of the two graphical models $X Z - Y$ and $X - Z Y$ , provided the random variables are either binary or jointly normal. As the example shows, the intersection of two graphical models need not be a graphical model. How can one compute this intersection?

The goal of this chapter is to explore these questions and introduce tools from computational algebra for studying them. Our perspective is that, for a fixed type of random variable, the set of distributions that satisfy a collection of independence constraints is the zero set of a collection of polynomial equations. Solutions of systems of polynomial equations are the objects of study of algebraic geometry, and so tools from algebra can be brought to bear on the problem. The next section contains an overview of basic ideas in algebraic geometry which are useful for the study of conditional independence structures and graphical models. In particular, it introduces algebraic varieties, polynomial ideals, and primary decomposition. Section 3.3 introduces the ideals associated to families of conditional independence statements, and explains how to apply the basic techniques to deduce conditional independence implications. Section 3.4 illustrates the main ideas with some deeper examples coming from the literature. Section 3.5 concerns the vanishing ideal of a graphical model, which is a complete set of implicit restrictions for that model. This set of restrictions is usually much larger than the set of conditional independence constraints that come from the graph, but it can illuminate the structure of the model especially with more complex families of models involving mixed graphs or hidden random variables. Section 3.6 highlights some key references in this area.

#### 3.2  Notions of Algebraic Geometry and Commutative Algebra

Commutative algebra is the study of systems of polynomial equations, and algebraic geometry is the study of geometric properties of their solutions. Both are rich fields with many deep results. This section only gives a very coarse introduction to the basic facts that hopefully makes it possible for the reader to understand the phenomena and algorithms discussed in later parts of this chapter. For a more detailed introduction, the reader is referred to the standard textbook .

#### 3.2.1  Polynomials, ideals and varieties

Let $k$ be a field, for example the rational numbers $Q$ , the real numbers  $R$ , or the complex numbers  $C$ . Let $N = { 0 , 1 , … }$ denote the natural numbers. Let $p 1 , p 2 , ⋯ , p r$ be a collection of indeterminates or variables. A monomial in the indeterminates $p 1 , p 2 , ⋯ , p r$ is an expression of the form $p 1 u 1 p 2 u 2 ⋯ p r u r$ where $u 1 , … , u r$ are nonnegative integers. Writing $u = ( u 1 , … , u r )$ , let

$p u : = p 1 u 1 p 2 u 2 ⋯ p r u r .$

A polynomial is a finite linear combination of monomials, i.e.

$f = ∑ u ∈ U c u p u$

where $U ⊂ N r$ is a finite set and $c u ∈ k$ . Of course, one is used to thinking of a polynomial as a function, $f : k r → k$ , which can be evaluated in a point $a ∈ k r$ for p. In the following, this function will usually have the role of a constraint; i.e., the object of interest is the zero set ${ a ∈ k r : f ( a ) = 0 }$ . In algebra, it is also useful to think of a polynomial as a formal object, i.e. the indeterminates are simply symbols that are used in manipulations, with no need for them to be evaluated.

The set of all polynomials in indeterminates $p 1 , … , p r$ with coefficients in $k$ is called the polynomial ring and denoted  $k [ p 1 , … , p r ]$ . The word ring means that $k [ p 1 , … , p r ]$ has two operations, namely addition of polynomials and multiplication of polynomials, and that these operations satisfy all the usual properties of addition and multiplication (associativity, commutativity, distributivity). However, multiplicative inverses need not exist: The result of dividing one polynomial by another non-constant polynomial is in general not a polynomial, but a rational function.

Let $F ⊆ k [ p 1 , … , p r ]$ . The variety defined by $F$ is the vanishing set of the polynomials in $F$ , that is,

$V ( F ) = { a ∈ k r : f ( a ) = 0 for all f ∈ F } .$

Let $r = 2$ and consider $F = { x 2 - y } ⊆ k [ x , y ]$ . The variety $V ( { x 2 - y } ) ⊆ k 2$ is the familiar parabola “ $y = x 2$ ” in the plane. For $r = 4$ and $F = { p 11 p 22 - p 12 p 21 } ⊆ k [ p 11 , p 12 , p 21 , p 22 ]$ , the variety $V ( { p 11 p 22 - p 12 p 21 } ) ⊆ k 4$ is the set of all singular $2 × 2$ matrices $p 11 p 12 p 21 p 22$ with entries in $k$ .

Let $F = { x 2 - y , x 2 + y 2 - 1 }$ . The variety $V ( F )$ is the set of points

${ ( x , y ) ∈ k 2 : y = x 2 and x 2 + y 2 = 1 } ,$

in other words, the intersection of a parabola and a circle. The number of points in this intersection varies depending on whether the underlying field is $Q$ , $R$ , or $C$ (or some other field). If $k = Q$ the variety is empty, if $k = R$ the variety has two points, and if $k = C$ , the variety has four points. In statistical applications one is usually interested in solutions over  $R$ . However, it is often easier to first perform computations in algebraically closed fields like  $C$ before restricting to the real numbers, at least in theory. On the other hand, when using a computer algebra system, it may be advantageous to work with $Q$ , if possible, because rational numbers can be represented exactly on a computer.

The examples so far have always used finite sets  $F$ . This is not necessary for the definition of a variety, and it is often worthwhile to consider the variety $V ( F )$ where $F$ is an infinite set of polynomials. In fact, it is often convenient to replace the original set of polynomials $F$ by an infinite set, the ideal generated by $F$ , which is equivalent to $F$ in some sense but has more structure.

One reason is that different families of polynomials may have the same varieties. For example, if $f , g ∈ F$ , then the variety of $F ∪ { f + g }$ equals $V ( F )$ . Similarly, for $f ∈ F$ and $λ ∈ k$ , the variety of $F ∪ { λ f }$ equals $V ( F )$ .

A set $I ⊆ k [ p 1 , … , p r ]$ is an ideal if for all $f , g ∈ I$ , $f + g ∈ I$ and for all $f ∈ I$ and $h ∈ k [ p 1 , … , p r ]$ , $h f ∈ I$ .

Let $F ⊆ k [ p 1 , … , p r ]$ be a set of polynomials. The ideal generated by $F$ is the smallest ideal in $k [ p 1 , … , p r ]$ that contains $F$ . Equivalently, the ideal generated by $F$ consists of all polynomials $h 1 f 1 + ⋯ + h k f k$ for $h 1 , … , h k ∈ k [ p 1 , … , p r ]$ , $f 1 , … , f k ∈ F$ , and any k. The ideal generated by $F$ is denoted $⟨ F ⟩$ .

Let $F ⊆ k [ p 1 , … , p r ]$ be a set of polynomials. Then $V ( F ) = V ( ⟨ F ⟩ )$ .

The ideal $⟨ x 2 - y , x 2 + y 2 - 1 ⟩$ generated by the set $F$ from Example 1.2.2 has many different possible generating sets. For example, an alternate generating set is ${ x 2 - y , x 4 + x 2 - 1 }$ . This allows to easily find all the solutions of the polynomial system because all roots of the univariate polynomial $x 4 + x 2 - 1 = 0$ can be plugged into the second polynomial $x 2 - y = 0$ , which can then be solved for y.

Hilbert’s basis theorem implies that for any ideal $I ⊆ k [ p 1 , … , p r ]$ there exists a finite set of polynomials $F ⊆ k [ p 1 , … , p r ]$ such that $I = ⟨ F ⟩$ .

Even though it is, for theoretical considerations, often easier to think about systems of polynomial equations in terms of ideals, in practice (i.e. when working with computer algebra systems), the ideal is almost always specified in terms of a finite set of generators (or such a finite set of generators has to be computed on the way). On the other hand, during a computation it is often necessary to replace this set of generators by a more convenient set of generators (e.g. a Gröbner basis), so the generators may change even though the ideal stays the same along a computation.

Let $S ⊆ k r$ . The vanishing ideal of S is the set

$I ( S ) : = { f ∈ k [ p 1 , … , p r ] : f ( a ) = 0 for all a ∈ S } .$

It is easy to check that a vanishing ideal is indeed an ideal. Clearly, any ideal J satisfies $J ⊆ I ( V ( J ) )$ . However, the converse inclusion does not hold in general. For instance, $I ( V ( ⟨ x 2 ⟩ ) ) = ⟨ x ⟩$ , and $I ( V ( ⟨ x 2 y , x y 2 ⟩ ) ) = ⟨ x y ⟩$ (over any field $k$ ).

The ideal I(V(J)) is the $k$ -radical of J. An ideal J such that $I ( V ( J ) ) = J$ is a $k$ -radical ideal. If $k$ is algebraically closed (e.g. if $k = C$ ), such an ideal J is simply called a radical ideal.

Radical ideals can also be characterized algebraically, and there are algorithms to compute radicals. The radical is usually a simpler ideal, and if the radical of an ideal can be computed, it is advantageous to do this in a first step in each calculation (as long as one is only interested in properties of V(J), and not in algebraic properties of J).

The following proposition illustrates the close relation between ideals and varieties.

Let $S 1 , S 2 ⊆ k r$ . Then $I ( S 1 ∪ S 2 ) = I ( S 1 ) ∩ I ( S 2 )$ and $I ( S 1 ∩ S 2 ) ⊇ I ( S 1 ) + I ( S 2 ) : = { f + g : f ∈ I ( S 1 ) , g ∈ I ( S 2 ) }$ .

Let $I , J ⊆ k [ p 1 , ⋯ , p r ]$ . Then $V ( I ∪ J ) = V ( I + J ) = V ( I ) ∩ V ( J )$ and $V ( I ∩ J ) = V ( I ) ∪ V ( J )$ .

#### 3.2.2  Irreducible and primary decomposition

Proposition 1.2.2 shows that the union of two varieties is again a variety. Interestingly, not every variety can be written as a non-trivial finite union.

A variety V is reducible if there are two varieties $V 1 , V 2 ≠ V$ such that $V 1 ∪ V 2 = V$ . Otherwise V is irreducible.

Any variety $V ⊆ k r$ has a unique decomposition into finitely many irreducible varieties $V = V 1 ∪ ⋯ ∪ V k$ (with $V i ⊈ V j$ for $i ≠ j$ ).

The varieties $V 1 , ⋯ , V k$ are called the irreducible components of V.

Let V be an irreducible variety, and let $ϕ : k r → k s$ be a rational map. Then $V ( I ( ϕ ( V ) ) )$ is irreducible.

Let V be a variety that has a rational parametrization $ϕ : k r → V$ such that the image of $ϕ$ is dense in V. Then V is irreducible.

According to Proposition 1.2.2, the corresponding decomposition operation for ideals is to write ideals as the intersection of other ideals. However, for general ideals, the situation is much more complicated than for varieties. The situation simplifies for radical ideals (which are in a one-to-one correspondence with varieties). This case is discussed next. The general case is summarized afterwards.

An ideal $I ⊆ k [ p 1 , … , p r ]$ is prime if for all $f , g ∈ k [ p 1 , … , p r ]$ with $f · g ∈ I$ , one of the factors fg belongs to I.

For example, $I : = ⟨ x y ⟩$ is not prime, because $x y ∈ I$ , but neither $x ∈ I$ nor $y ∈ I$ .

A variety V is irreducible if and only if I(V) is prime.

A prime ideal $P ⊆ k [ p 1 , … , p r ]$ is a minimal prime of an ideal $I ⊆ k [ p 1 , … , p r ]$ if and only if V(P) is an irreducible component of V(I).

There is also an algebraic definition of the minimal primes, and there are algorithms to compute the minimal primes. By definition, the minimal primes of an ideal encode the irreducible decomposition of the corresponding variety:

Any ideal $I ⊆ k [ p 1 , … , p r ]$ has finitely many minimal primes  $P 1 , ⋯ , P k$ .

The ideal $P 1 ∩ ⋯ ∩ P k$ equals the radical of I.

The irreducible components of V(I) are $V ( P 1 ) , ⋯ , V ( P k )$ .

If I is not radical, then $P 1 ∩ ⋯ ∩ P k ⊆ I$ . In this case, it is still possible to write I as an intersection of special ideals (called primary ideals) in a way that is algebraically and geometrically meaningful. This intersection is called a primary decomposition. The precise definitions are omitted, since a primary decomposition often adds little to the statistical understanding. However, some works in algebraic statistics written by algebraists who do care about the differences between ideals and their radicals use this notation. The following result explains how a primary decomposition is related to the minimal primes.

Let $I = I 1 ∩ ⋯ ∩ I l$ be a primary decomposition of $I ⊆ k [ p 1 , … , p r ]$ , and let $P i$ be the radical of  $I i$ .

1. $V ( I ) = V ( I 1 ) ∪ V ( I 2 ) ∪ ⋯ ∪ V ( I l ) = V ( P 1 ) ∪ V ( P 2 ) ∪ ⋯ ∪ V ( P l )$ .
2. Each $P i$ is prime.
3. Each minimal prime of I is among the  $P i$ .
4. If $P i$ is not a minimal prime of I, then there is a minimal prime $P j$ of I with $P j ⊂ P i$ (and so $V ( P i ) ⊂ V ( P j )$ ).

Let $I = ⟨ x y , x z ⟩ ∈ k [ x , y , z ]$ . The variety V(I) consists of the union of the plane where $x = 0$ , and the line where $y = 0 , z = 0$ . Hence $V ( ⟨ x y , x z ⟩ ) = V ( ⟨ x ⟩ ) ∪ V ( ⟨ y , z ⟩ )$ is a decomposition into irreducibles. This corresponds to the ideal decomposition $⟨ x y , x z ⟩ = ⟨ x ⟩ ∩ ⟨ y , z ⟩$ .

The primary decomposition need not be unique.

The ideal $⟨ x 2 , x y ⟩$ has several different primary decompositions, e.g.

$⟨ x 2 , x y ⟩ = ⟨ x ⟩ ∩ ⟨ x 2 , y ⟩ = ⟨ x ⟩ ∩ ⟨ x 2 , x + y ⟩ .$

The variety $V ( ⟨ x 2 , x y ⟩ )$ equals the line where $x = 0$ , corresponding to the unique minimal prime $⟨ x ⟩$ . The non-uniqueness of the primary decomposition is related to the fact that the variety of the “extra” component is a subset of one of the other components. This variety (which is superfluous in the irreducible decomposition) is called an embedded component.

This example can be analyzed as follows using the computer algebra system Macaulay2 . First set up a polynomial ring in the indeterminates xy with the rational numbers $Q$ as the coefficient field. In Macaulay2 it is advisable to work with $Q$ rather than $R$ or  $C$ since the arithmetic in $Q$ can be carried out exactly on a computer. i1 : R = QQ[x,y] o1 = R o1 : PolynomialRing

The system reports that it understands ,R, as a polynomial ring. The following input makes Macaulay2 decompose the ideal. The decomposition is computed over $Q$ , but in this case it happens to be valid over any field  $k$ . i2 : primaryDecomposition ideal (x^2, x*y) 2 o2 = {ideal x, ideal (x , y)}

If one is only interested in the irreducible decomposition, the command ,decompose, returns the minimal primes corresponding to the irreducible components, discarding all embedded components: i3 : decompose ideal (x^2, x*y) o3 = {ideal x}

#### 3.2.3  Binomial ideals

This section ends with a short discussion of binomial ideals and toric ideals, which make frequent appearance in applications.

A binomial is a polynomial $p u - λ p v$ , $λ ∈ k$ with at most two terms. An ideal I is a binomial ideal if it has a generating set of binomials. A binomial ideal that is prime and does not contain any variable is a toric ideal.

The main reason why it is important whether an ideal is binomial is that there are dedicated algorithms for binomial ideals that are much faster than the generic algorithms that work for any ideal [10,13,22,23]. Note that there are some instances of ideals that arise in algebraic statistics that are not binomial in their natural coordinate systems but become binomial ideals after a linear change of coordinates .

Let $A ∈ Z h × r$ be an integer matrix, and consider the ideal

$I A : = ⟨ p u + - p u - : u = u + - u - ∈ ker Z A ⟩$

in the polynomial ring $k [ p 1 , ⋯ , p r ]$ , where $u = u + - u -$ is the decomposition of u into its positive and negative part $u + , u - ∈ N r$ . Clearly, $I A$ is binomial and does not contain any of the  $p i$ . One can also show that $I A$ is prime, and thus it is an example of a toric ideal. In fact, any toric ideal is of this form up to a scaling of coordinates [13, Corollary 2.6]. The generating set above is infinite, but Theorem 3.1 in  shows that finite generating sets of toric ideals are related to Markov bases, which can be computed using the software 4ti2 .

#### 3.2.4  Real algebraic geometry

In addition to polynomial equations, in many situations in statistics it is useful to consider solutions to polynomial inequalities as well. This is the subject of the field real algebraic geometry. Inequalities only make sense over an ordered field like $R$ (but not over  $C$ ). For simplicity, the following definitions and results are formulated with  $R$ . Again, this text only contains the basic definitions. For more details the reader is referred to [4,5].

Let $F , G ⊆ R [ p 1 , … , p r ]$ be sets of polynomials with $G$ finite. The basic semialgebraic set defined by $F$ and $G$ is

${ a ∈ R r : f ( a ) = 0 for all f ∈ F and g ( a ) > 0 for all g ∈ G } .$

A semialgebraic set is a finite union of basic semialgebraic sets.

Here are some common examples of semialgebraic sets arising in statistics.

The open probability simplex

$int ( Δ r - 1 ) : = { p ∈ R r : ∑ i = 1 r p i = 1 , p i > 0 , i = 1 , … , r }$

consists of all probability distributions for a categorical random variable with r states. It is a basic semialgebraic set: In the above definition, one may take $F = { ∑ i = 1 r p i - 1 }$ and $G = { p 1 , … , p r }$ . The probability simplex

$Δ r - 1 : = { p ∈ R r : ∑ i = 1 r p i = 1 , p i ≥ 0 , i = 1 , … , r }$

is a semialgebraic set. It can be written as the union of $2 r - 1$ basic semialgebraic sets.

The cone $P D m$ of $m × m$ positive definite symmetric matrices is an example of a basic semialgebraic set in $R m + 1 2$ , where $F = ∅$ and where $G$ consists of the principal subdeterminants of an $m × m$ symmetric matrix of indeterminates. For instance, if $m = 3$ consider the polynomial ring $R [ σ 11 , σ 12 , σ 13 , σ 22 , σ 23 , σ 33 ]$ and the symmetric matrix of indeterminates

$Σ = σ 11 σ 12 σ 13 σ 12 σ 22 σ 23 σ 13 σ 23 σ 33 .$

The symmetry has been enforced by making certain entries in the matrix equal. The set of polynomials defining $P D 3$ can be chosen to be

$G = { σ 11 , σ 11 σ 22 - σ 12 2 , det Σ } ,$

the set of leading principal minors of $Σ$ . The cone of positive semidefinite symmetric matrices is a semialgebraic set, which can be realized by using non-strict inequalities with the much larger set of all principal minors of  $Σ$ .

#### 3.3  Conditional Independence Ideals

This section shows how the algebraic tools introduced in Section 3.2 can be used to analyze conditional independence structures. The tools can be applied in the settings of discrete random variables and jointly normal variables, but in different ways.

#### 3.3.1  Discrete random variables

Let $X 1 , X 2 , … , X m$ be finite discrete random variables. Suppose that the state space of $X i$ is $[ r i ] : = { 1 , 2 , … , r i }$ . There is an algebraic description of the set of all distributions that satisfy a given conditional independence statement. The first example comes from the simplest CI statement: $1 ⊥ 2$ .

Let $X 1 , X 2$ be discrete random variables where the state space of $X i$ is  $[ r i ]$ . Let $p i 1 i 2 = P ( X 1 = i 1 , X 2 = i 2 )$ and let $p = ( p i 1 i 2 ) i 1 ∈ [ r 1 ] , i 2 ∈ [ r 2 ]$ be the joint probability mass function of $X 1$ and  $X 2$ . Then $1 ⊥ 2$ if and only if p is a rank one matrix.

If $1 ⊥ 2$ then $P ( X 1 = i 1 , X 2 = i 2 ) = P ( X 1 = i 1 ) P ( X 2 = i 2 )$ . This expresses the joint probability mass function as an outer product of two nonzero vectors, hence p has rank one.

Conversely, if p has rank one, it is expressed as the outer product of two vectors $p = α T β$ . Since p is a matrix of nonnegative real numbers, one can assume that $α$ and $β$ are also nonnegative. Let $‖ . ‖ 1$ denote the $l 1$ -norm. Replacing $α$ by $α / ‖ α ‖ 1$ and $β$ by $β / ‖ β ‖ 1$ , yields a rank one factorization for p where the two factors are necessarily the marginal distributions of $X 1$ and $X 2$ respectively. Hence $1 ⊥ 2$ .

A nonzero matrix having rank one is characterized by the vanishing of all its $2 × 2$ subdeterminants. Hence, one can associate an ideal to the independence statement $1 ⊥ 2$ .

The conditional independence ideal for the statement $1 ⊥ 2$ is

$I 1 ⊥ 2 = ⟨ p i 1 i 2 p j 1 j 2 - p i 1 j 2 p j 1 i 2 : i 1 , j 1 ∈ [ r 1 ] , i 2 , j 2 ∈ [ r 2 ] ⟩ = ⟨ 2 × 2 subdeterminants of p ⟩ ⊆ R [ p i 1 , i 2 : i 1 ∈ [ r 1 ] , i 2 ∈ [ r 2 ] ] .$

Let $r 1 = 2$ and $r 2 = 3$ . Then

$I 1 ⊥ 2 = ⟨ p 11 p 22 - p 12 p 21 , p 11 p 23 - p 13 p 21 , p 12 p 23 - p 13 p 22 ⟩ .$

The conditional independence ideal $I 1 ⊥ 2$ captures the algebraic structure of the independence condition. Although all probability distributions would satisfy the additional constraint that $∑ i 1 ∈ [ r 1 ] , i 2 ∈ [ r 2 ] p i 1 i 2 - 1 = 0$ , this trivial constraint is not included in the conditional independence ideal because leaving it out tends to simplify certain algebraic calculations. For example, without this constraint $I 1 ⊥ 2$ is a binomial ideal.

More generally, any conditional independence condition for discrete random variables can be expressed by similar determinantal constraints. This requires a bit of notation. The determinantal constraints are written in terms of the entries of the joint distribution of $X 1 , ⋯ , X m$ . This is a tensor $p = ( p i 1 , ⋯ , i m ) i j ∈ [ r j ]$ .

Let $A , B , C ⊂ [ m ]$ be disjoint subsets of indices of the random variables $X 1 , ⋯ , X m$ , and $D = [ m ] \ ( A ∪ B ∪ C )$ the set of indices appearing in none of ABC. Any such assignment yields a grouping of indices and random variables. The random vector $X A = ( X j ) j ∈ A$ takes values in $R A = ∏ j ∈ A [ r j ]$ . Let $R B , R C$ and $R D$ be defined analogously. The grouping allows one to write $p = ( p i A , i B , i C , i D )$ where now $i A ∈ R A$ and similarly for $i B , i C$ , and  $i D$ . The final notational gadget is the marginalization of p over D. The entries of this marginal distribution are indexed by $R A , R B , R C$ and have entries

$p i A , i B , i C , + = ∑ i D ∈ R D p i A , i B , i C , i D .$

The $+$ indicates the summation.

The conditional independence ideal for the conditional independence statement $A ⊥ B [ C ]$ is

$I A ⊥ B [ C ] = ⟨ p i A , i B , i C , + · p j A , j B , i C , + - p i A , j B , i C , + · p j A , i B , i C , + , for all i A , j A ∈ R A , i B , j B ∈ R B , i C ∈ R C ⟩ .$

The notation simplifies for saturated conditional independence statements, for which $A ∪ B ∪ C = [ m ]$ . With this condition there is no marginalization, and the defining polynomials of $I A ⊥ B [ C ]$ are binomials.

Consider three binary random variables $X 1 , X 2 , X 3$ . Let $p 111 , ⋯ , p 222$ denote the indeterminates standing for the elementary probabilities in the joint distribution. The conditional independence ideal of the statement $1 ⊥ 3 [ 2 ]$ is

$I 1 ⊥ 3 [ 2 ] = ⟨ p 111 p 212 - p 211 p 112 , p 121 p 222 - p 221 p 122 ⟩ .$

The conditional independence ideal of the statement $1 ⊥ 3$ is

$I 1 ⊥ 3 = ⟨ ( p 111 + p 121 ) ( p 212 + p 222 ) - ( p 112 + p 122 ) ( p 211 + p 221 ) ⟩ .$

For any conditional independence statement $A ⊥ B [ C ]$ , the conditional independence ideal $I A ⊥ B [ C ]$ is a prime ideal and hence $V ( I A ⊥ B [ C ] )$ is an irreducible variety.

Proposition 1.3.2 is a consequence of the fact that general determinantal ideals are prime (see ). Irreducibility of the variety $V ( I A ⊥ B [ C ] )$ can also be deduced from the fact that this variety can be parametrized, for instance, the set of all probability distributions in $V ( I A ⊥ B [ C ] )$ can be realized as the set of probability distributions in a graphical model.

A strictly positive joint distribution p of binary random variables $X 1 , X 2 , X 3$ satisfies $1 ⊥ 3 [ 2 ]$ if and only if

1 $p i 1 , i 2 , i 3 = s i 1 , i 2 t i 2 , i 3$

for some vectors $( s i 1 , i 2 ) i 1 ∈ [ r 1 ] , i 2 ∈ [ r 2 ]$ , $( t i 2 , i 3 ) i 2 ∈ [ r 2 ] , i 3 ∈ [ r 3 ]$ ; see Section . That is, it lies in the undirected graphical model

$X 1 - X 2 - X 3 .$

Since $V ( I A ⊥ B [ C ] )$ is irreducible, any joint distribution (possibly with zeros) that satisfies $1 ⊥ 3 [ 2 ]$ lies in the closure of the undirected graphical model. In fact, any such joint distribution has a parametrization of the form (1), where s or t also may have zeros.

More interesting than just single statements are combinations of two or more conditional independence statements. To determine the classes of distributions satisfying a collection of independence statements leads to interesting problems in computational algebra. Such sets are typically not irreducible varieties and cannot be parametrized with a single parametrization. The first task is to break such a set into components, and to see if those components have natural interpretations in terms of conditional independence and can be parametrized.

Let $C = { A 1 ⊥ B 1 [ C 1 ] , A 2 ⊥ B 2 [ C 2 ] , … }$ be a set of conditional independence statements for the random variables $X 1 , X 2 , … , X m$ . The conditional independence ideal of $C$ is the sum of the conditional independence ideals of the elements of  $C$ :

$I C = I A 1 ⊥ B 1 [ C 1 ] + I A 2 ⊥ B 2 [ C 2 ] + ⋯ .$

Understanding the probability distributions that satisfy  $C$ can be accomplished by analyzing an irreducible decomposition of  $V ( I C )$ , which can be obtained from a primary decomposition of  $I C$ .

Let $X 1$ , $X 2$ , $X 3$ be binary random variables, and consider $C = { 1 ⊥ 3 [ 2 ] , 1 ⊥ 3 }$ . The conditional independence ideal $I C$ is generated by three polynomials of degree 2:

$I C = I 1 ⊥ 3 [ 2 ] + I 1 ⊥ 3 = ⟨ p 111 p 212 - p 112 p 211 , p 121 p 222 - p 122 p 221 , ( p 111 + p 121 ) ( p 212 + p 222 ) - ( p 112 + p 122 ) ( p 211 + p 221 ) ⟩ .$

The following Macaulay2 code asks for the primary decomposition of this ideal over $Q$ . It can be shown that the decomposition is the same over $R$ and  $C$ . loadPackage "GraphicalModels" S = markovRing (2,2,2) L = {{{1},{3},{2}}, {{1},{3},{}}} I = conditionalIndependenceIdeal(S,L) primaryDecomposition I

This code uses the GraphicalModels package of Macaulay2 which implements many convenient functions to work with graphical and other conditional independence models. In particular, it allows to easily set up the polynomial ring with eight variables $p 111 , ⋯ , p 222$ with ’markovRing’ and write out the equations for $I C$ with ’conditionalIndependenceIdeal’. The command ’primaryDecomposition’ is a generic Macaulay2 command. The output of this code consists of two ideals which upon inspection can be recognized as binomial conditional independence ideals themselves. The result is

$I C = I { 1 , 2 } ⊥ 3 ∩ I 1 ⊥ { 2 , 3 } .$

According to Section 1.2 this implies a decomposition of varieties

$V ( I C ) = V ( I { 1 , 2 } ⊥ 3 ) ∪ V ( I 1 ⊥ { 2 , 3 } ) .$

On the level of probability distributions, this proves the binary case of Proposition 1.1.2.

The general situation may be less favorable than that in Example 1.3.4. In particular, the components that appear need not have interpretations in terms of conditional independence. The appearing ideals also need not be prime ideals (in general they are only primary) and it is unclear what this algebraic extra information may reveal about conditional independence. For examples on how to extract information from primary decompositions see [20,24].

#### 3.3.2  Gaussian random variables

Algebraic approaches to conditional independence can also be applied to Gaussian random variables. Let $X ∈ R m$ be a nonsingular multivariate Gaussian random vector with mean $μ ∈ R m$ and covariance matrix $Σ ∈ P D m$ , the cone of $m × m$ symmetric positive definite matrices. One writes $X ∼ N ( μ , Σ )$ . For subsets $A , B ⊆ [ m ]$ let $Σ A , B$ be the submatrix of $Σ$ obtained by extracting rows indexed by A and columns indexed by B, that is $Σ A , B = ( σ a , b ) a ∈ A , b ∈ B$ .

Let $X ∼ N ( μ , Σ )$ with $Σ ∈ P D m$ . Let $A , B , C ⊆ [ m ]$ be disjoint subsets. Then the conditional independence statement $A ⊥ B [ C ]$ holds if and only if the matrix $Σ A ∪ C , B ∪ C$ has rank  $≤ # C$ .

A proof of this proposition recognizes $Σ A ∪ C , B ∪ C$ as a Schur complement of a submatrix of $Σ$ . The details can be found in [17, Proposition 3.1.13]; see also Section 9.1 in the present handbook.

Just as the rank one condition on a matrix was characterized by the vanishing of $2 × 2$ subdeterminants, higher rank conditions on matrices can also be characterized by the vanishing of subdeterminants. Indeed, a basic fact of linear algebra is that a matrix has rank $≤ r$ if and only if the determinant of every $( r + 1 ) × ( r + 1 )$ submatrix is zero. This leads to the conditional independence ideals for multivariate Gaussian random variables.

Let $R [ Σ ] : = R [ σ ij : 1 ≤ i ≤ j ≤ m ]$ be the polynomial ring with real coefficients in the entries of the symmetric matrix $Σ$ .

The Gaussian conditional independence ideal for the conditional independence statement $A ⊥ B [ C ]$ is the ideal

$J A ⊥ B [ C ] : = ⟨ ( # C + 1 ) -minors of Σ A ∪ C , B ∪ C ⟩ .$

If $C = { A 1 ⊥ B 1 [ C 1 ] , A 2 ⊥ B 2 [ C 2 ] , … , }$ is a collection of conditional independence statements, the Gaussian conditional independence ideal is

$J C = J A 1 ⊥ B 1 [ C 1 ] + J A 2 ⊥ B 2 [ C 2 ] + ⋯ .$

A common criterion in statistics says that, in fact, $A ⊥ B [ C ]$ holds if and only if $det ( Σ { a } ∪ C , { b } ∪ C )$ vanishes for all  $a ∈ A$ , $b ∈ B$ . Since, by assumption, C is non-singular, it is easy to see that this condition is, in fact, equivalent to the vanishing of all $( # C + 1 )$ -minors of  $Σ A ∪ C , B ∪ C$ .

Consider the conditional independence statement $2 ⊥ { 1 , 3 } [ 4 ]$ . The ideal $J 2 ⊥ { 1 , 3 } [ 4 ]$ is generated by the $2 × 2$ minors of the matrix

$Σ { 2 , 4 } , { 1 , 3 , 4 } = σ 12 σ 23 σ 24 σ 14 σ 34 σ 44 .$

Since $Σ$ is a symmetric matrix, $σ ij = σ ji$ and one always writes $σ ij$ with $i ≤ j$ . Then

$J 2 ⊥ { 1 , 3 } [ 4 ] = ⟨ σ 12 σ 34 - σ 14 σ 23 , σ 12 σ 44 - σ 14 σ 24 , σ 23 σ 44 - σ 34 σ 24 ⟩ .$

Let $X 1$ , $X 2$ , $X 3$ be jointly Gaussian random variables. The conditional independence ideal of $C = { 1 ⊥ 3 [ 2 ] , 1 ⊥ 3 }$ is

$J C = J 1 ⊥ 3 [ 2 ] + J 1 ⊥ 3 = ⟨ σ 13 σ 22 - σ 12 σ 23 , σ 13 ⟩ .$

Straightforward manipulations of these ideals show

$J C = ⟨ σ 13 σ 22 - σ 12 σ 23 , σ 13 ⟩ = ⟨ σ 12 σ 23 , σ 13 ⟩ = ⟨ σ 12 , σ 13 ⟩ ∩ ⟨ σ 23 , σ 13 ⟩$
$= J { 1 , 2 } ⊥ 3 ∩ J 1 ⊥ { 2 , 3 } .$

This last primary decomposition proves the Gaussian case of Proposition 1.1.2.

#### 3.3.3  The contraction axiom

When computing the decompositions of conditional independence ideals, there might be components that are “uninteresting" from the statistical standpoint. These components might not intersect the region of interest in probabilistic applications (e.g. they might miss the probability simplex or the cone of positive definite matrices) or they might have non-trivial intersections but that intersection is contained in some other component.

Let $X = ( X 1 , X 2 , X 3 )$ be a multivariate Gaussian random vector. The conditional independence ideal of $C = { 1 ⊥ 2 [ 3 ] , 2 ⊥ 3 }$ is

$J C = ⟨ σ 12 σ 33 - σ 13 σ 23 , σ 23 ⟩$

which has primary decomposition

$J C = ⟨ σ 12 σ 33 , σ 23 ⟩ = ⟨ σ 12 , σ 23 ⟩ ∩ ⟨ σ 33 , σ 23 ⟩ .$

This decomposition shows that

$V ( J C ) = V ( ⟨ σ 12 , σ 23 ⟩ ) ∪ V ( ⟨ σ 33 , σ 23 ⟩ ) .$

However, the second component does not intersect the positive definite cone, because $σ 33 > 0$ for all $Σ ∈ P D 3$ . The first component is the conditional independence ideal $J 1 , 3 ⊥ 2$ . From this decomposition it is visible that $1 ⊥ 2 [ 3 ]$ and $2 ⊥ 3$ imply that $1 , 3 ⊥ 2$ . This implication is called the contraction axiom. See Section  for other CI axioms.

The contraction axiom also holds for non-Gaussian random variables. For discrete random variables, it can again be checked algebraically. The primary decomposition associated to the discrete contraction axiom is worked out in detail in . The next example discusses the binary case as an illustration:

Let $X 1 , X 2 , X 3$ be binary random variables. The conditional independence ideal of $C = { 1 ⊥ 2 [ 3 ] , 2 ⊥ 3 }$ is

$I C = ⟨ p 111 p 221 - p 121 p 211 , p 112 p 222 - p 122 p 212 , ( p 111 + p 211 ) ( p 122 + p 222 ) - ( p 112 + p 212 ) ( p 121 + p 221 ) ⟩$

which has primary decomposition

$I C = I 1 , 3 ⊥ 2 ∩ ⟨ p 122 + p 222 , p 112 + p 212 , p 121 p 211 - p 111 p 221 ⟩ ∩ ⟨ p 121 + p 221 , p 111 + p 211 , p 122 p 212 - p 112 p 222 ⟩ .$

The intersection of the second component with the probability simplex forces that $p 122 = p 222 = p 112 = p 212 = 0$ . This in turn implies that

$V ( ⟨ p 122 + p 222 , p 112 + p 212 , p 121 p 211 - p 111 p 221 ⟩ ) ∩ Δ 7 ⊆ V ( I 1 , 3 ⊥ 2 )$

A similar argument holds for the third component. So although the variety $V ( I C )$ has three components, only one of them is statistically meaningful:

$V ( I C ) ∩ Δ 7 = V ( I 1 , 3 ⊥ 2 ) ∩ Δ 7 .$

#### 3.4  Examples of Decompositions of Conditional Independence Ideals

This section studies some examples of families of conditional independence statements and how algebraic tools can be used to understand them. The first example is a detailed study of the intersection axiom, and the second example concerns the conditional independence statements associated to the 4-cycle graph.

#### 3.4.1  The intersection axiom

The intersection axiom from Section  is the following implication of conditional independence statements:

2 $A ⊥ B [ C ∪ D ] and A ⊥ C [ B ∪ D ] ⇒ A ⊥ B ∪ C [ D ] .$

This implication is valid for strictly positive probability distributions. Algebraic techniques can be used to study how its validity extends beyond this.

The question about the primary decomposition of the ideal(s) $I { A ⊥ B [ C ∪ D ] , A ⊥ C [ B ∪ D ] }$ was first asked in [15, Chapter 6.6], and the answer is due to . Grouping variables if necessary, one can assume $A = { 1 }$ , $B = { 2 }$ , $C = { 3 }$ . Moreover, let  $D = ∅$ . From this one can always recover the general case by adding conditioning constraints.

[Proposition 1 in ] The ideal $I { X 1 ⊥ X 2 [ X 3 ] , X 1 ⊥ X 3 [ X 2 ] }$ is radical, that is, its irredundant primary decompositions consists only of prime ideals. These minimal primes correspond to pairs of partitions $[ r 2 ] = A 1 ∪ ⋯ ∪ A s$ , $[ r 3 ] = B 1 ∪ ⋯ ∪ B s$ of the same size. The minimal prime P corresponding to two partitions is

$P = ⟨ p i 1 i 2 i 3 : i 1 ∈ [ r 1 ] , i 2 ∈ A j , i 3 ∈ B k for some j ≠ k ⟩ + ⟨ p i 1 i 2 i 3 p i 1 ′ i 2 ′ i 3 ′ - p i 1 i 2 ′ i 3 ′ p i 1 ′ i 2 i 3 : i 1 , i 1 ′ ∈ [ r 1 ] , i 2 , i 2 ′ ∈ A j , i 3 , i 3 ′ ∈ B j for some j ⟩ .$

The paper  uses a different formulation in terms of complete bipartite graphs. It can be seen that our formulation is equivalent.

To give a statistical interpretation to Proposition 1.4.1, whenever the joint distribution of $X 1 , X 2 , X 3$ lies in the prime P corresponding to the two partitions $[ r 2 ] = A 1 ∪ ⋯ ∪ A s$ , $[ r 3 ] = B 1 ∪ ⋯ ∪ B s$ , construct a random variable B as follows: put $B : = j$ whenever $X 2 ∈ A j$ and  $X 3 ∈ B j$ . Thus, B is uniquely defined except on a set of measure zero, since $P ( X 2 ∈ A j , X 3 ∈ B k ) = 0$ for $j ≠ k$ , which follows from the containment of monomials in P. The variable B specifies in which blocks of the two partitions the random variables $X 2$ and  $X 3$ lie. Now the binomials in P imply that $X 1 ⊥ { X 2 , X 3 } [ B ]$ .

Suppose that $X 1 , X 2 , X 3$ satisfy $X 1 ⊥ X 2 [ X 3 ]$ and $X 1 ⊥ X 3 [ X 2 ]$ . Then there is a random variable B that satisfies:

1. B is a (deterministic) function of  $X 2$ ;
2. B is a (deterministic) function of  $X 3$ ;
3. $X 1 ⊥ { X 2 , X 3 } [ B ]$ .
Conversely, whenever there exists a random variable B with properties 1. to 3., the random variables $X 1 , X 2 , X 3$ satisfy $X 1 ⊥ X 2 [ X 3 ]$ and $X 1 ⊥ X 3 [ X 2 ]$ .

The case where B is a constant corresponds to the CI statement $X 1 ⊥ { X 2 , X 3 }$ . The intersection axiom can be recovered by noting that a function B that is a function of only $X 2$ as well as a function of only $X 3$ is necessarily constant, if the joint distribution of $X 2$ and $X 3$ is strictly positive.

Similar results hold for all families of CI statements of the form $A ⊥ B i [ C i ]$ , where $A ∪ B i ∪ C i = V$ and where A is fixed for all statements, see [20,20] (the case where $X A$ is binary was already described in ). The corresponding CI ideal is still radical, and the minimal primes have a similar interpretation. However, finding the minimal primes is more difficult and involves solving a combinatorial problem. Finally, Corollary 1.4.1 can be generalized to continuous random variables .

#### 3.4.2  The four-cycle

Consider four discrete random variables $X 1 , X 2 , X 3 , X 4$ and the (undirected) graphical model of the four cycle $C 4 = ( V , E )$ with edge set $E = { ( 1 , 2 ) , ( 2 , 3 ) , ( 3 , 4 ) , ( 1 , 4 ) }$ . The global Markov CI statements of this graph are

$global ( C 4 ) = { 1 ⊥ 3 [ { 2 , 4 } ] , 2 ⊥ 4 [ { 1 , 3 } ] } .$

The primary decomposition of the corresponding CI ideal was studied in  in the case where $X 1$ and $X 3$ are binary. The case that all variables are binary is as follows:

[Theorem 5.6 in ] The minimal primes of the CI ideal $I global ( C 4 )$ of the binary four cycle are the toric ideal $I C 4$ and the monomial ideals

$P i = ⟨ p x : x i = x i + 1 ⟩ , P i ′ = ⟨ p x : x i ≠ x i + 1 ⟩ for 1 ≤ i < 4 .$

The ideal $I C 4$ equals the vanishing ideal of the graphical model, to be discussed in Section 1.5. Interestingly, in this primary decomposition, all ideals except $I C 4$ are monomial ideals; that is, they only give support restrictions on the probability distribution.

The primary decomposition of the CI ideal gives an irreducible decomposition of the corresponding set of probability distributions. This leads to the statement of Proposition 1.1.1 in the introduction.

When $X 2$ and $X 4$ are not binary, the decomposition of $I global ( C 4 )$ involves prime ideals parametrized by $i ∈ { 2 , 4 }$ and two sets $∅ ≠ C , D ⊊ [ r i ]$ . For such a choice of iCD, let j denote the element of ${ 2 , 4 } \ { i }$ , and let

$P i , C , D = ⟨ p 1 x 2 1 x 4 : x i ∈ C , x j ∈ [ r j ] ⟩ + ⟨ p 1 x 2 2 x 4 : x i ∈ D , x j ∈ [ r j ] ⟩ + ⟨ p 2 x 2 1 x 4 : x i ∉ D , x j ∈ [ r j ] ⟩ + ⟨ p 2 x 2 2 x 4 : x i ∉ C , x j ∈ [ r j ] ⟩ + I global ( C 4 )$

the result is the following:

[Theorem 6.5 in ] Let $X 1$ and $X 3$ be binary random variables. The minimal primes of the CI ideal $I global ( C 4 )$ of the four cycle are the toric ideal $I C 4$ and the ideals $P i , C , D$ for $i ∈ { 2 , 4 }$ and $∅ ≠ C , D ⊊ [ r i ]$ . Furthermore the ideal is radical and thus equals the intersection of its minimal primes.

In this case, the non-toric primes are not monomial, but consist of monomials and binomials. This fact is independent of the field  $k$ . The following is one example of the kind of information that can be extracted from knowing the minimal primes of a conditional independence ideal.

Let $X 1 , X 2 , X 3 , X 4$ be finite random variables that satisfy  $global ( C 4 )$ , and suppose that $X 1$ and $X 3$ are binary. Then one (or more) of the following statements is true:

1. The joint distribution lies in the closure of the graphical model.
2. There is $i ∈ { 2 , 4 }$ and there are sets $E , F ⊆ [ r i ]$ such that the following holds:
$If ( X 1 , X 3 ) = ( 1 , 1 ) ( 1 , 2 ) ( 2 , 1 ) ( 2 , 2 ) , then X i ∈ E X i ∈ F X i ∉ F X i ∉ E .$
Conversely, any probability distribution that satisfies one of these statements and that satisfies $2 ⊥ 4 [ { 1 , 3 } ]$ also satisfies  $global ( C 4 )$ .

#### 3.5  The Vanishing Ideal of a Graphical Model

Graphical models can be represented via either parametric descriptions (e.g. factorizations of the density function) or implicit descriptions (e.g. Markov properties and conditional independence constraints). One use of the algebraic perspective on graphical models is to find the complete implicit description of the model, in particular, to find the vanishing ideal of the model. As described in Definition 1.2.4, the vanishing ideal of a set S is the set of all polynomial functions that evaluate to zero at every point in S. Although some graphical models have complete descriptions only in terms of conditional independence constraints, understanding the vanishing ideal can be useful for more complex models or hidden variable models where conditional independence is not sufficient to describe the model, for instance, the mixed graph models studied in Chapter 2.

Consider the four cycle $C 4$ and let $X 1 , X 2 , X 3 , X 4$ be binary random variables. The vanishing ideal $I C 4 ⊆ R [ p i 1 i 2 i 3 i 4 : i 1 , i 2 , i 3 , i 4 ∈ { 1 , 2 } ]$ is generated by 16 binomials, 8 of which have degree 2 and 8 of which have degree 4. The degree 2 binomials are all implied by the two conditional independence statements $1 ⊥ 3 [ { 2 , 4 } ]$ and $2 ⊥ 4 [ { 1 , 3 } ]$ . On the other hand, the degree 4 binomials are not implied by the conditional independence constraints, even when we restrict to probability distributions. One example degree four polynomial is

$p 1111 p 1222 p 2122 p 2211 - p 1122 p 1211 p 2111 p 2222$

and the others are obtained by applying the symmetry group of the four cycle and permuting levels of the random variables.

Even in the simple example of the four cycle, there are generators of the vanishing ideal that do not correspond to conditional independence statements. It seems an important problem to try to understand what other types of equations can arise. Theorem 3.2 in  shows that the vanishing ideal of a graphical model of discrete random variables is the toric ideal  $I A$ , where A is the design matrix of the graphical model, defined in the end of Section 3.2. A classification for discrete random variables and undirected graphs of when no further polynomials are needed beyond conditional independence constraints is obtained in the following theorem:

[Theorem 4.4 in ] Let G be an undirected graph and let $M G$ be its graphical model for discrete random variables. Then the vanishing ideal $I ( M G )$ is equal to the conditional independence ideal $I global ( G )$ if and only if G is a chordal graph.

It is unknown what the appropriate analogue of Theorem 1.5.1 is for other families of graphical models, either with different classes of graphs (e.g. DAGs) or with other types of random variables (e.g. Gaussian). Computational studies of the vanishing ideals appear in many different papers: for Bayesian networks with discrete random variables , for Bayesian networks with Gaussian random variables , for undirected graphical models with Gaussian random variables . A characterization for which graph families the vanishing ideal is equal to the conditional independence ideal of global Markov statements is lacking in all these cases.

One natural question is to determine the other generators of the vanishing ideal that do not come from conditional independence, and to give combinatorial structures in the underlying graphs that imply that these more general constraints hold. For instance, for mixed Gaussian graphical models, conditional independence constraints are determinantal, but not every determinantal constraint comes from a conditional independence statement, and there is a characterization of which determinantal constraints come from conditional independence:

A mixed graph $G = ( [ m ] , B , D )$ is a triple where $[ m ] = { 1 , 2 , … , m }$ is the vertex set, B is a set of unordered pairs of [m] representing bidirected edges in G, and D is a set of ordered pairs of [m] representing directed edges in G. There might also be both directed and bidirected edges between a pair of vertices. To the set of B of bidirected edges one associates the set of symmetric positive definite matrices

$P D ( B ) = { Ω ∈ P D m : ω ij = 0 if i ≠ j and i ↔ j ∉ B } .$

To the set of directed edges one associates the set of $m × m$ matrices

$R D = { Λ ∈ R m × m : λ ij = 0 if i → j ∉ D } .$

Let $ϵ ∼ N ( 0 , Ω )$ and let X be a jointly normal random vector satisfying the structural equation system

$X = Λ T X + ϵ .$
This is an example of a linear structural equation model, and contains as special cases various families of graphical models. Let $I d$ denote the $m × m$ identity matrix. With these assumptions, if $( I d - Λ )$ is invertible,
$X ∼ N ( 0 , Σ ) where Σ = ( I d - Λ ) - T Ω ( I d - Λ ) - 1 .$ Figure 1   The mixed graph from Section 3.5.

Consider the mixed graph G from Figure 1. In this case, PD(B) is the set of positive definite matrices of the form

$Ω = ω 11 0 0 0 0 ω 22 0 0 0 0 ω 33 ω 34 0 0 ω 34 ω 44$

and $R D$ is the set of real matrices of the form

$Λ = 0 λ 12 λ 13 0 0 0 λ 23 0 0 0 0 λ 34 0 0 0 0 .$

A positive definite matrix $Σ$ belongs to the graphical model associated to this mixed graph, if and only if there are $Ω ∈ P D ( B )$ and $Λ ∈ R D$ such that $Σ = ( I d - Λ ) - T Ω ( I d - Λ ) - 1$ .

Let $G = ( [ m ] , B , D )$ be a mixed graph. A trek between vertices i and j in G consists of either

• a pair $( P L , P R )$ where $P L$ is a directed path ending in i and $P R$ is a directed path ending in j where both $P L$ and $P R$ have the same source, or
• a pair $( P L , P R )$ where $P L$ is a directed path ending in i and $P R$ is a directed path ending in j such that the source of $P L$ and the source of $P R$ are connected by a bidirected edge.
Let $T ( i , j )$ denote the set of all treks in G between i and j.

To each trek $T = ( P L , P R )$ one associates the trek monomial $m T$ which is the product with multiplicities of all $λ st$ over all directed edges appearing in T times $ω st$ where s and t are the sources of $P L$ and $P R$ . One reason for the interest in treks is the trek rule, which says for the Gaussian graphical model associated to G

$σ ij = ∑ T ∈ T ( i , j ) m T .$

For instance, for the mixed graph in Figure 1, the pair $( { 1 → 2 , 2 → 3 } , { 1 → 3 } )$ is a trek from 3 to 3. The corresponding trek monomial $m T$ is $ω 11 λ 12 λ 23 λ 13$ .

Let $A , B , C A , C B$ be four sets of vertices of G, not necessarily disjoint. The pair of sets $( C A , C B )$ t-separates (short for trek separates) A from B if for every $a ∈ A$ and $b ∈ B$ and every trek $( P L , P R ) ∈ T ( a , b )$ , either $P L$ has a vertex in $C A$ or $P R$ has a vertex in $C B$ , or both.

 Let $G = ( [ m ] , B , D )$ be a mixed graph and A and B two subsets of [m] with $| A | = | B | = k$ . Then the minor $det Σ A , B$ belongs to the vanishing ideal $I G$ if and only if there is a pair of sets $C A , C B$ such that $( C A , C B )$ t-separates A and B and such that $| C A | + | C B | < k$ .

The t-separation criterion can produce implicit constraints for structural equation models in situations where there are no conditional independence constraints.

Consider the mixed graph G from Figure 1. The vanishing ideal of the model is $I G = ⟨ det Σ { 1 , 2 } , { 3 , 4 } ⟩$ . This determinantal constraint is not a conditional independence constraint. It is implied by the t-separation criterion because the pair $( ∅ , { 3 } )$ t-separates ${ 1 , 2 }$ and ${ 3 , 4 }$ .

In the case where there are hidden random variables, the vanishing ideal is typically not sufficient to completely describe the set of probability distributions that come from the graphical model. Usually one also needs to consider inequality constraints and other semialgebraic conditions. This problem is discussed in more detail in [2,3,37], among others.

Diaconis, Eisenbud and Sturmfels  were the first to consider primary decompositions for statistical applications, in particular the analysis of the connectivity of certain random walks. This perspective was also picked up in  using conditional independence ideals. Primary decomposition of conditional independence ideals also makes an appearance in the following papers not already mentioned [12,16,25,34,36].

The algebraic view on undirected graphical models was presented in , which began extensive study of the vanishing ideals of undirected graphical models for discrete random variables. Focus has been on developing techniques for constructing generating sets of the vanishing ideals with [14,21,30] being representative papers in this area. Vanishing ideals of undirected models with Gaussian random variables and models for DAGs have not been much studied. Some papers that initiated their study include [17,32,33].

#### References

1
4ti2team. 4ti2–a software package for algebraic, geometric and combinatorial problems on linear spaces. available at http://www.4ti2.de
2
Allman Elizabeth S , Rhodes John A , Sturmfels Bernd , Zwiernik Piotr . Tensors of nonnegative rank two. Linear Algebra Appl. 2015;473:37–53.
3
Allman Elizabeth S , Rhodes John A , Taylor Amelia . A semialgebraic description of the general Markov model on phylogenetic trees. SIAM J. Discrete Math. 2014;28(2):736–755.
4
Saugata Basu Richard Pollack Marie-Françoise Roy . Algorithms in Real Algebraic Geometry. vol. 10., Algorithms and Computation in Mathematics, second edition,Berlin: Springer-Verlag; 2016.
5
Bochnak Jacek , Coste Michel , Roy Marie-Françoise . Real Algebraic Geometry, volume 36 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Berlin: Springer-Verlag; 1998. Translated from the 1987 French original, Revised by the authors
6
Bruns Winfried , Vetter Udo . Determinantal rings, volume 45 of Monografías de Matemática [Mathematical Monographs]. Rio de Janeiro: Instituto de Matemática Pura e Aplicada (IMPA); 1988.
7
David A Cox , John Little , Donal O’Shea . {\em Ideals, Varieties, and Algorithms: An Introduction to Computational Agebraic Geometry and Commutative Algebra}, fourth edition. Springer; 2015.
8
Diaconis Persi , Eisenbud David , Sturmfels Bernd . Lattice walks and primary decomposition. In Mathematical essays in honor of Gian-Carlo Rota (Cambridge, MA, 1996), volume 161 of Progr. Math.. Boston, MA: Birkhäuser Boston; 1998. pp. 173 193--
9
Diaconis Persi , Sturmfels Bernd . Algebraic algorithms for sampling from conditional distributions. Ann. Statist. 1998;26:363–397.
10
Dickenstein Alicia . Laura Felicia Matusevich, and Ezra Miller. Combinatorics of binomial primary decomposition. Math. Z. 2010;264(4):745–763.
11
Drton Mathias , Sturmfels Bernd , Sullivant Seth . Lectures on Algebraic Statistics, volume 39 of Oberwolfach Seminars. Berlin: Springer; 2009. A Birkhäuser book
12
Drton Mathias , Xiao Han . Smoothness of Gaussian conditional independence models. In Algebraic methods in statistics and probability II, volume 516 of Contemp. Math.. Amer. Math. Soc., Providence, RI. 2010. Amer. Math. Soc., Providence, RI. pp. 155–177.
13
Eisenbud David , Sturmfels Bernd . Binomial ideals. Duke Math. J. 1996;84(1):1–45.
14
Engström Alexander , Kahle Thomas , Sullivant Seth . Multigraded commutative algebra of graph decompositions. J. Algebraic Combin. 2014;39(2):335–372.
15
Fink Alex . The binomial ideal of the intersection axiom for conditional probabilities. J. Algebraic Combin. 2011;33(3):455–463.
16
Fink Alex , Rajchgot Jenna , Sullivant Seth . Matrix Schubert varieties and Gaussian conditional independence models. srXiv:1510.04124, 2015
17
Luis David Garcia. Michael Stillman, and Bernd Sturmfels. Algebraic geometry of Bayesian networks. J. Symbolic Comput. 2005;39(3–4):331–355.
18
Geiger Dan , Meek Christopher , Sturmfels Bernd . On the toric algebra of graphical models. Ann. Statist. 2006;34(3):1463–1492.
19
. Macaulay2, a software system for research in algebraic geometry : Available at http://www.math.uiuc.edu/Macaulay2/.
20
Herzog Jürgen , Hibi Takayuki , Hreinsdóttir Freyja , Kahle Thomas , Rauh Johannes . Binomial edge ideals and conditional independence statements. Adv. in Appl. Math. 2010;45(3):317–333.
21
Hoşten Serkan , Sullivant Seth . Gröbner bases and polyhedral geometry of reducible and cyclic models. J. Combin. Theory Ser. A. 2002;100(2):277–301.
22
Kahle Thomas . Decompositions of binomial ideals. J. Softw. Algebra Geom. 2012;4:1–5.
23
Kahle Thomas , Miller Ezra , O’Neill Christopher . Irreducible decomposition of binomial ideals. Compos. Math. 2016;152(6):1319–1332.
24
Kahle Thomas , Rauh Johannes , Sullivant Seth . Positive margins and primary decomposition. J. Commut. Algebra. 2014;6(2):173–208.
25
Kirkup George A . Random variables with completely independent subcollections. J. Algebra. 2007;309(2):427–454.
26
Lněnička Radim , Matúš František . On Gaussian conditional independent structures. Kybernetika (Prague). 2007;43(3):327–342.
27
Peters Jonas . On the intersection property of conditional independence and its application to causal discovery. Journal of Causal Inference. 2015;3(1):97–108.
28
Rauh Johannes . Generalized binomial edge ideals. Adv. in Appl. Math. 2013;50(3):409–414.
29
Rauh Johannes , Ay Nihat . Robustness, canalyzing functions and systems design. Theory Biosci. 2014;133(2):63–78.
30
Rauh Johannes , Sullivant Seth . Lifting Markov bases and higher codimension toric fiber products. J. Symbolic Comput. 2016;74:276–307.
31
Sturmfels Bernd , Sullivant Seth . Toric ideals of phylogenetic invariants. J. Comp. Biol. 2005;12:204–228.
32
Sturmfels Bernd , Uhler Caroline . Multivariate Gaussian, semidefinite matrix completion, and convex algebraic geometry. Ann. Inst. Statist. Math. 2010;62(4):603–638.
33
Sullivant Seth . Algebraic geometry of Gaussian Bayesian networks. Adv. in Appl. Math. 2008;40(4):482–513.
34
Sullivant Seth . Gaussian conditional independence relations have no finite complete characterization. J. Pure Appl. Algebra. 2009;213(8):1502–1506.
35
Sullivant Seth , Talaska Kelli , Draisma Jan . Trek separation for Gaussian graphical models. Ann. Statist. 2010;38(3):1665–1685.
36
Swanson Irena , Taylor Amelia . Minimal primes of ideals arising from conditional independence statements. J. Algebra. 2013;392:299–314.
37
Zwiernik Piotr . Semialgebraic statistics and latent tree models. vol. 146., Monographs on Statistics and Applied Probability Boca Raton, FL: Chapman & Hall/CRC; 2016.

## Use of cookies on this website

We are using cookies to provide statistics that help us give you the best experience of our site. You can find out more in our Privacy Policy. By continuing to use the site you are agreeing to our use of cookies.