Revisions to How random are random spanning trees?

replaced the dead link with a Wayback Maxhine link

edited Dec 1, 2023 at 14:33

4.8k
4
38
42

Let $G$ be a graph with $n$ vertices. A spanning forest is any cycle-free subgraph $H$ with the same vertices. An ordered spanning forest is one with a linear order on the edges. If $H$ has $n-c$ edges then it has $c$ connected components. We mainly are concerned with the case $c=1$ when $G$ is connected and $H$ is a spanning tree. There are methods to enumerate the spanning trees methods to enumerate the spanning trees of a labelled graph $G.$ The complete graph $K_n$ has $n^{n-2}$ labelled spanning trees. It is not trivial to uniformly choose a member of $\mathcal{T}(G)$, the set of spanning trees of $G$ unless the set is small enough to list. It appears that a process involving loop erased random walks process involving loop erased random walks can be used to choose uniformly, I'm not sure what the expected stopping time is.

An article by David Aldous An article by David Aldous (A random tree model associated with random graphs, DOI: 10.1002/rsa.3240010402) discusses this process in some detail.

added 6570 characters in body

Source Link

edited Jan 28, 2013 at 1:34

Aaron Meyerowitz

30.3k
2
50
106

Each labelled treeThis is equally likely soan interesting question and more subtle than it first appears. One question is the frequencybehavior of an unlabeledminimal cost spanning tree for the complete graph. For the question at hand it is inversely proportional toworth first considering the size ofbehavior for $G(n,m)$ the automorphism grouprandom graph with $n$ vertices and $m$ edges. SoThen the path occursmodel $\frac{(n-1)!}{2}$ times as often as$G(n,p)$ gives a weighted average over the various $G(n,m)$ concentrated around $m=np$. Exact results seem difficult to give although as $m$ or $p$ increases a star is more likely than a path. Perhaps in trees with small diameter and more leaves are favored.

Let $G$ be a graph with $n$ vertices. A spanning forest is any cycle-free subgraph $H$ with the same vertices. An ordered spanning forest is one with a linear order on the edges. If $H$ has $n-c$ edges then it has $c$ connected components. We mainly are concerned with the case $c=1$ when $G$ is connected and $H$ is a spanning tree. There are methods to enumerate the spanning trees of a labelled graph $G.$ The probabilitycomplete graph $p$$K_n$ has $n^{n-2}$ labelled spanning trees. It is irrelevant except that you neednot trivial to account foruniformly choose a member of $\mathcal{T}(G)$, the possibility thatset of spanning trees of $G$ unless the set is small enough to list. It appears that a process involving loop erased random walks can be used to choose uniformly, I'm not connectedsure what the expected stopping time is.

I claim that we can equivalently makeA simple method for choosing a random permutation of the edgesspanning tree is to first randomly assign each edge of $K_n$$G$ a distinct weight weight and then takeconsider them one atin order of increasing weight accepting each edge if it creates a timespanning forest with those previously selected , rejecting any whichrejecting it if it would create a cycle with the ones already accepted, and stopping when a spanning tree has been created. This is the Kruskal algorithm for the minimal cost spanning tree (MCST). This is really just choosing a random permutation on the edges, but there are reasons to describe it this way. If we keep track of the order then we have aan ordered spanning tree which is the last member of a sequence of ordered spanning forests.

ThereWe pause to consider application of this method to the complete graph $K_n.$ It might be thought to give the uniform distribution on $\mathcal{F}(K_n)$ but it does not. Two ways to see that this can not be uniform are two random steps here: Get

The probability of attaining any given spanning tree $T$ will be a fraction $\frac{c(T)}{\binom{n}{2}!}.$ When $n \gt 3$ is an odd prime, the highest power of $n$ dividing the denominator is $n^{(n-1)/2} \lt n^{n-2}.$

By direct computation, of the $16$ spanning trees of $K_4$, each of the four stars occurs with probability $3/6 \cdot 2/5 \cdot 1/3=1/15=12/180$ and each of the $12$ paths with probability $11/180.$

To see why the distribution is not uniform it helps to consider ordered spanning forests. Given an (ordered) spanning forest $S$ with $c$ connected components having $n_1 \ge n_2 \ge \cdots \ge n_c$ vertices there are $q(S)=\sum_{i \lt j}n_i n_j$ ways to chose an edge giving a spanning forest $G(n,p)$ random graph, then assign$S'$ with $c-1$ components. $q(s) \ge (2n-c)(c-1)/2$ with equality exactly when $n_2=\cdots=n_c=1.$ The probability that the MCST method arrives at $S'$ is $p(S')=\frac{p(S)}{q(S)}$. The probability that the MCST arrives at an (unordered) labelled spanning tree $T$ of $K_n$ is the sum of $p(T^*)$ over the $(n-1)!$ orders of the edges distinct random weights and use your favorite algorithm to getof $T.$ Each of the unique minimal weight$(n-1)!$ terms is of the form $\prod_{j=0}^{n-2}\frac{1}{q(S_j)}$ where $S_j$ is an ordered spanning tree with $n-j$ components. The first step mightlargest this term can be imagined as: Assign theis $\frac{2^{n-1}}{(2n-2)!}$ for those orders which have all edges in the same component at each stage. So the spanning trees with the highest probability are the $n$ stars which occur with probability $\frac{2^{n-1}(n-1)!}{(2n-2)!}.$ To find the probability of other spanning trees would be more complicated because different edge orders give different terms.

An article by David Aldous discusses this process in some detail.

We might first somehow choose a "random" subgraph of $H$ of $K_n$ random weightswith less edges and then choose a "random" spanning tree of $H$ either uniformly or using the MCST method. We will assume that if the chosen fromgraph $(0,1)$$H$ is not connected then we just choose again by the same process and repeat until we get a connected $H.$

Consider first choosing a random (connected) member of $G(n,m)$ the set of $m$-edge subgraphs of $K_n$, then erasechoosing a spanning tree. In the extreme case $m=n-1$ we are directly (and uniformly) choosing a spanning tree although it will take a super exponential number of trials. For $m \gt n-1$ we need to specify how we choose a spanning tree. Let $G(n,m)(T)$ be the set of such graphs in $G(n,m)$ containing a given spanning tree $T$. When $m \gt \binom{n-1}{2}$ all edges with weight greaterthe graphs in $G(n,m)$ are connected, The number of connected graphs in $G(n,m)$ is fairly easy to compute for $m$ only slightly less that that or only slightly more than $p$$n-1.$ Otherwise it might be a bit hard. However, each set $G(n,m)(T)$ has the same size, the number of ways to getchoose $G$$m-(n-1)$ of the $\binom{n}{2}-(n-1)$ other edges. Since we already haveThe chance that first choosing a random edge weightsconnected member of $G(n,m)$ and then uniformly choosing a spanning subtree leads to $T$ is proportional to $$\sum_{H \in G(n,m)(T)}\frac{1}{|\mathcal{T}(H)|}.$$ At the lower end , just$m=n-1$, the chance of a specific star (or any other given spanning tree) is $\frac{1}{n^{n-2}}.$ For $m=n$ we see that (for uniform or MCST selection) the probability of a given star is proportional to $\binom{n-1}{2}/3=O(n^2)$ while a given path has probability proportional to $\sum_1^{n-2}\frac{j}{n+1-j}=O(n\ln{n}).$ I suspect that , uniformly selecting a spanning tree, the probability of a star increases and of a path decreases monotonically with $m$ . For larger $m$ I suspect less difference between $G(n,m)$ and $G(n,m+1)$. If we use those for the MCST method to choose a spanning tree then things will be different $T$(once $m \gt n$) although I still suspect the same thing. In fact,

Finally we could doget to the question: What about the model $G(n,p)$ wherewe choose each edge with probability $p$? Then the number of edges selected will be $m$ with probability $\binom{N}{m}p^m(1-p)^{N-m}$ where $N=\binom{n}{2}.$ However the requirement to choose a connected subgraph will bias against small $m.$ I suspect that as first$p$ goes to zero we will approach the uniform distribution on spanning trees, this is definitely true for (to$n=4.$ For $p \gt 2\ln{n}/n$ we are almost sure to have a connected subgraph. So we have some combination of the weighteddistributions from $G(n,m)$ concentrated around $m=np.$

We can reverse the order in this two step process of choosing a random subgraph and then the MCST of that subgraph. First randomly weight the edges of $K_n$ uniformly from $(0,1).$ In the G(n,m) to getmodel we $T$ and(may as well) retain only the then go through and erase$m$ lowest weight edges. In the $G(n,p)$ model we retain only those edges with weight over $p$ or less. Since we already have an edge weights we can utilize them for the MCST, provided we have a connected graph. The reverse order is to get $G.$ If this removes anyfirst find the MCST. Then we accept it or start over according as the edges of that tree are among the first $T$ then$m$ $G$ has(or have weights no larger than $p$.) For any spanning tree it is possible that it is not completely chosen until we have rejected $\binom{n-2}{2}$ edges. However we have seen that the expected number of rejected edges before completion is greater for some trees than for others. When (very unlikely if$m$ or $n$$p$ is not too small andlarge, a tree with a lower expected rejection rate is favored. When $m$ or $p$ is much above $\frac{1}{n}$small we are allowed fewer rejects before resetting and the distribution is more uniform.)

Each labelled tree is equally likely so the frequency of an unlabeled tree is inversely proportional to the size of the automorphism group. So the path occurs $\frac{(n-1)!}{2}$ times as often as the star. The probability $p$ is irrelevant except that you need to account for the possibility that $G$ is not connected.

I claim that we can equivalently make a random permutation of the edges of $K_n$ and then take them one at a time, rejecting any which create a cycle with the ones already accepted, stopping when we have a tree.

There are two random steps here: Get a $G(n,p)$ random graph, then assign the edges distinct random weights and use your favorite algorithm to get the unique minimal weight spanning tree. The first step might be imagined as: Assign the edges of $K_n$ random weights uniformly chosen from $(0,1)$ and then erase all edges with weight greater than $p$ to get $G$. Since we already have random edge weights, just use those for the tree $T$. In fact, we could do that first (to the weighted $K_n$) to get $T$ and only then go through and erase the edges with weight over $p$ to get $G.$ If this removes any edges of $T$ then $G$ has no spanning trees (very unlikely if $n$ is not too small and $p$ is much above $\frac{1}{n}$.)

This is an interesting question and more subtle than it first appears. One question is the behavior of minimal cost spanning tree for the complete graph. For the question at hand it is worth first considering the behavior for $G(n,m)$ the random graph with $n$ vertices and $m$ edges. Then the model $G(n,p)$ gives a weighted average over the various $G(n,m)$ concentrated around $m=np$. Exact results seem difficult to give although as $m$ or $p$ increases a star is more likely than a path. Perhaps in trees with small diameter and more leaves are favored.

Let $G$ be a graph with $n$ vertices. A spanning forest is any cycle-free subgraph $H$ with the same vertices. An ordered spanning forest is one with a linear order on the edges. If $H$ has $n-c$ edges then it has $c$ connected components. We mainly are concerned with the case $c=1$ when $G$ is connected and $H$ is a spanning tree. There are methods to enumerate the spanning trees of a labelled graph $G.$ The complete graph $K_n$ has $n^{n-2}$ labelled spanning trees. It is not trivial to uniformly choose a member of $\mathcal{T}(G)$, the set of spanning trees of $G$ unless the set is small enough to list. It appears that a process involving loop erased random walks can be used to choose uniformly, I'm not sure what the expected stopping time is.

A simple method for choosing a spanning tree is to first randomly assign each edge of $G$ a distinct weight weight and then consider them in order of increasing weight accepting each edge if it creates a spanning forest with those previously selected , rejecting it if it would create a cycle, and stopping when a spanning tree has been created. This is the Kruskal algorithm for the minimal cost spanning tree (MCST). This is really just choosing a random permutation on the edges, but there are reasons to describe it this way. If we keep track of the order then we have an ordered spanning tree which is the last member of a sequence of ordered spanning forests.

We pause to consider application of this method to the complete graph $K_n.$ It might be thought to give the uniform distribution on $\mathcal{F}(K_n)$ but it does not. Two ways to see that this can not be uniform are

The probability of attaining any given spanning tree $T$ will be a fraction $\frac{c(T)}{\binom{n}{2}!}.$ When $n \gt 3$ is an odd prime, the highest power of $n$ dividing the denominator is $n^{(n-1)/2} \lt n^{n-2}.$

By direct computation, of the $16$ spanning trees of $K_4$, each of the four stars occurs with probability $3/6 \cdot 2/5 \cdot 1/3=1/15=12/180$ and each of the $12$ paths with probability $11/180.$

To see why the distribution is not uniform it helps to consider ordered spanning forests. Given an (ordered) spanning forest $S$ with $c$ connected components having $n_1 \ge n_2 \ge \cdots \ge n_c$ vertices there are $q(S)=\sum_{i \lt j}n_i n_j$ ways to chose an edge giving a spanning forest $S'$ with $c-1$ components. $q(s) \ge (2n-c)(c-1)/2$ with equality exactly when $n_2=\cdots=n_c=1.$ The probability that the MCST method arrives at $S'$ is $p(S')=\frac{p(S)}{q(S)}$. The probability that the MCST arrives at an (unordered) labelled spanning tree $T$ of $K_n$ is the sum of $p(T^*)$ over the $(n-1)!$ orders of the edges of $T.$ Each of the $(n-1)!$ terms is of the form $\prod_{j=0}^{n-2}\frac{1}{q(S_j)}$ where $S_j$ is an ordered spanning tree with $n-j$ components. The largest this term can be is $\frac{2^{n-1}}{(2n-2)!}$ for those orders which have all edges in the same component at each stage. So the spanning trees with the highest probability are the $n$ stars which occur with probability $\frac{2^{n-1}(n-1)!}{(2n-2)!}.$ To find the probability of other spanning trees would be more complicated because different edge orders give different terms.

An article by David Aldous discusses this process in some detail.

We might first somehow choose a "random" subgraph of $H$ of $K_n$ with less edges and then choose a "random" spanning tree of $H$ either uniformly or using the MCST method. We will assume that if the chosen graph $H$ is not connected then we just choose again by the same process and repeat until we get a connected $H.$

Consider first choosing a random (connected) member of $G(n,m)$ the set of $m$-edge subgraphs of $K_n$, then choosing a spanning tree. In the extreme case $m=n-1$ we are directly (and uniformly) choosing a spanning tree although it will take a super exponential number of trials. For $m \gt n-1$ we need to specify how we choose a spanning tree. Let $G(n,m)(T)$ be the set of such graphs in $G(n,m)$ containing a given spanning tree $T$. When $m \gt \binom{n-1}{2}$ all the graphs in $G(n,m)$ are connected, The number of connected graphs in $G(n,m)$ is fairly easy to compute for $m$ only slightly less that that or only slightly more than $n-1.$ Otherwise it might be a bit hard. However, each set $G(n,m)(T)$ has the same size, the number of ways to choose $m-(n-1)$ of the $\binom{n}{2}-(n-1)$ other edges. The chance that first choosing a random connected member of $G(n,m)$ and then uniformly choosing a spanning subtree leads to $T$ is proportional to $$\sum_{H \in G(n,m)(T)}\frac{1}{|\mathcal{T}(H)|}.$$ At the lower end , $m=n-1$, the chance of a specific star (or any other given spanning tree) is $\frac{1}{n^{n-2}}.$ For $m=n$ we see that (for uniform or MCST selection) the probability of a given star is proportional to $\binom{n-1}{2}/3=O(n^2)$ while a given path has probability proportional to $\sum_1^{n-2}\frac{j}{n+1-j}=O(n\ln{n}).$ I suspect that , uniformly selecting a spanning tree, the probability of a star increases and of a path decreases monotonically with $m$ . For larger $m$ I suspect less difference between $G(n,m)$ and $G(n,m+1)$. If we use the MCST method to choose a spanning tree then things will be different (once $m \gt n$) although I still suspect the same thing.

Finally we get to the question: What about the model $G(n,p)$ wherewe choose each edge with probability $p$? Then the number of edges selected will be $m$ with probability $\binom{N}{m}p^m(1-p)^{N-m}$ where $N=\binom{n}{2}.$ However the requirement to choose a connected subgraph will bias against small $m.$ I suspect that as $p$ goes to zero we will approach the uniform distribution on spanning trees, this is definitely true for $n=4.$ For $p \gt 2\ln{n}/n$ we are almost sure to have a connected subgraph. So we have some combination of the distributions from $G(n,m)$ concentrated around $m=np.$

We can reverse the order in this two step process of choosing a random subgraph and then the MCST of that subgraph. First randomly weight the edges of $K_n$ uniformly from $(0,1).$ In the G(n,m) model we (may as well) retain only the $m$ lowest weight edges. In the $G(n,p)$ model we retain only those edges with weight $p$ or less. Since we already have an edge weights we can utilize them for the MCST, provided we have a connected graph. The reverse order is to first find the MCST. Then we accept it or start over according as the edges of that tree are among the first $m$ (or have weights no larger than $p$.) For any spanning tree it is possible that it is not completely chosen until we have rejected $\binom{n-2}{2}$ edges. However we have seen that the expected number of rejected edges before completion is greater for some trees than for others. When $m$ or $p$ is large, a tree with a lower expected rejection rate is favored. When $m$ or $p$ is small we are allowed fewer rejects before resetting and the distribution is more uniform.

Source Link

answered Jan 25, 2013 at 2:16

Aaron Meyerowitz

30.3k
2
50
106

Each labelled tree is equally likely so the frequency of an unlabeled tree is inversely proportional to the size of the automorphism group. So the path occurs $\frac{(n-1)!}{2}$ times as often as the star. The probability $p$ is irrelevant except that you need to account for the possibility that $G$ is not connected.

I claim that we can equivalently make a random permutation of the edges of $K_n$ and then take them one at a time, rejecting any which create a cycle with the ones already accepted, stopping when we have a tree.

There are two random steps here: Get a $G(n,p)$ random graph, then assign the edges distinct random weights and use your favorite algorithm to get the unique minimal weight spanning tree. The first step might be imagined as: Assign the edges of $K_n$ random weights uniformly chosen from $(0,1)$ and then erase all edges with weight greater than $p$ to get $G$. Since we already have random edge weights, just use those for the tree $T$. In fact, we could do that first (to the weighted $K_n$) to get $T$ and only then go through and erase the edges with weight over $p$ to get $G.$ If this removes any edges of $T$ then $G$ has no spanning trees (very unlikely if $n$ is not too small and $p$ is much above $\frac{1}{n}$.)

Stack Exchange Network

Return to Answer