Why is number of single cell clusters always greatest in a random matrix?

Question

Consider a large $N\times N$ square lattice, where each cell has a probability $p$ of being "occupied" (let's denote such cells as "black") and a probability $1-p$ of being empty (let's denote them as "white"). Cells that lie in the Moore neighbourhood of a central cell and have the same colour as it are considered to belong to the same ("black" or "white") cluster as the central cell.

To be more formal:

Define a cluster of "black" cells as a maximal connected component in the graph of cells with the colour "black", where edges connect cells whose rows and columns both differ by at most $1$ (so up to eight neighbours for each cell). Define a cluster of "white" cells in a similar fashion.

I wrote a program for this situation (for a $1000\times 1000$ matrix) and found the cluster size distributions, that is, like (say) at $p=0.40$, the number of "black" clusters of size $1$ is $a_1$, the number of "black" clusters of size $2$ is $a_2$, and so on (averaged over $100$ iterations).
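For readers who want to reproduce this kind of count, here is a minimal Python sketch. It is not the program used for the data in this question; the names `random_grid` and `black_cluster_sizes`, the $200\times 200$ grid, and the seed are all illustrative assumptions. It labels black clusters by breadth-first search over the Moore neighbourhood and tallies how many clusters there are of each size.

```python
import random
from collections import Counter, deque


def random_grid(n, p, rng):
    """n x n grid of booleans; True means "black" (occupied) with probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


def black_cluster_sizes(grid):
    """Return Counter {cluster size: number of black clusters of that size}."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the eight Moore neighbours.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                i, j = queue.popleft()
                size += 1
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        a, b = i + di, j + dj
                        if (0 <= a < rows and 0 <= b < cols
                                and grid[a][b] and not seen[a][b]):
                            seen[a][b] = True
                            queue.append((a, b))
            counts[size] += 1
    return counts


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
counts = black_cluster_sizes(random_grid(200, 0.40, rng))
for size in sorted(counts)[:5]:
    print(size, counts[size])  # a_1, a_2, ... for this single sample
```

Averaging the counters over many independent grids would give the averaged $a_1, a_2, \dots$ described above.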

Now, interestingly, I found that $\forall p\in (0,1)$, for a matrix of size $1000\times 1000$ the number of clusters of size $1$ is always the greatest (when averaged over $100$ iterations). Is this a fluke, or is there a mathematical proof of why it is true? Also, will the result that "the number of black clusters of size $1$ is always the greatest for any $p\in (0,1)$" hold even in the limit $N\to \infty$?

P.S: By "a cluster of size $1$" I mean a cluster having a single cell; by "a cluster of size $2$" I mean a cluster having two cells, and so on.

N.B: All the data files and plots can be found here.

Perhaps you could include a graph of cluster size vs. number? And/or an image of a matrix of cells?
– Joseph O'Rourke, Commented Jul 26, 2018 at 22:15
@JosephO'Rourke All the data files and plots can be found here. The DiagonalBSDxyz files correspond to the data for $p=x.yz$. For example, DiagonalBSD025.jpg is the plot for $p=0.25$ and DiagonalBSD025.txt is the data file for $p=0.25$.
– user125648, Commented Jul 26, 2018 at 22:33

Aaron Meyerowitz · Accepted Answer · 2018-07-28 05:13:20Z

Here is a revised answer that might be clearer:

You define white clusters but I'll just look at black clusters since that is what your data does, and it implies the other interpretation (counts of monochrome clusters.)

Strictly speaking your claim is not totally accurate. For $p=1-\frac1{10^6}$ and a $1000 \times 1000$ grid one would expect on average one white cell and $999,999$ black. The probabilities to see $0,1,2$ or $3$ white cells are about $36.8\%,36.8\%,18.4\%$ and $6\%.$ So the largest cluster of black is $1000000$ or $999999$ a little over $\frac23$ of the time, and in those cases there is no black cluster of size $1$ at all. However, if you make the grid $10^8 \times 10^8$ with that same $p$ I think you would see your effect.
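The percentages quoted above can be checked numerically. The following is a small sketch, assuming the number of white cells is modelled as a binomial count with $10^6$ trials and success probability $10^{-6}$, which is compared against its Poisson approximation with mean $1$. The function names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def binomial_pmf(k, n, prob):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * prob**k * (1 - prob) ** (n - k)


cells = 10**6          # cells in a 1000 x 1000 grid
p_white = 1 / 10**6    # probability that a given cell is white

for k in range(4):
    print(k, round(poisson_pmf(k, 1.0), 4), round(binomial_pmf(k, cells, p_white), 4))

# Chance of at most one white cell (largest black cluster 1000000 or 999999).
print(poisson_pmf(0, 1.0) + poisson_pmf(1, 1.0))
```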

The main effect I want to describe is clear already for a $1 \times n$ rectangle, so let me describe that first:

Suppose you flip a biased coin that comes up heads with probability $p$ and tails with probability $q=1-p.$ You ignore the tails, but when you get a head you record how long the cluster of heads is. Let $P_k$ be the probability that the next head you get will be the start of a cluster of length $k.$ It is easy to see that $P_k=p^{k-1}q$ so $P_{j+1}=pP_j \lt P_j.$ That is really just a $1 \times \infty$ rectangle. Note: If it is a finite $1 \times N$ rectangle then the chance that all the cells will end up black is $p^N$ so it is possible that $P_N \gt P_1 \gt P_2 \gt \cdots.$
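The geometric law for the one-dimensional case is easy to see in a simulation. This is a rough sketch under the assumptions stated in the comments: a long finite sequence of flips stands in for the $1 \times \infty$ rectangle, and the function names, the value $p=0.7$, and the seed are arbitrary illustrative choices.

```python
import random
from collections import Counter


def exact_run_probability(k, p):
    """P_k = p^(k-1) * q with q = 1 - p, as in the text."""
    return p ** (k - 1) * (1 - p)


def run_length_frequencies(p, flips, rng):
    """Empirical fraction of completed runs of heads having each length."""
    runs = Counter()
    current = 0
    for _ in range(flips):
        if rng.random() < p:      # head: extend the current run
            current += 1
        elif current:             # tail: close the run, if one is open
            runs[current] += 1
            current = 0
    # A run still open when the flips end is discarded (assumption: its
    # true length is unknown, so it is not counted).
    total = sum(runs.values())
    return {k: v / total for k, v in sorted(runs.items())}


p = 0.7
freq = run_length_frequencies(p, 200_000, random.Random(0))
for k in range(1, 6):
    print(k, round(freq.get(k, 0.0), 4), round(exact_run_probability(k, p), 4))
```

Even with heads much more likely than tails, runs of length $1$ are the most common, because each extra head multiplies the probability by $p \lt 1$.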

Turn now to an $n \times n$ board. I will assume $n$ is quite large and ignore effects at the corners and sides.

One comment is that for $p$ large enough (and I think $p \gt 0.5$ is enough) there is usually one huge cluster and an assortment of smaller ones.

Already at $p=0.6$ (on average $600,000$ black cells) your data seems to indicate that the largest cluster (over $100$ trials) was always at least $586630$ (so over $97 \%$) and that the second largest was at most $307.$

At $p=0.54$, if I read it correctly, out of an expected $540,000$ black cells you never saw fewer than $495,000$ (so over $90 \%$) in the largest cluster.

In a quick look at $p=0.5$ I did not see the phenomenon but for $p=0.51$ there is a jump from $77,311$ to $284,083.$

So I'll speculate that the larger a partial cluster is (say at least up to $250,000$), the more likely it is to grow a bit more. This tends to spread out the larger sizes leaving no one occurring too often.

Here is a small case: Consider a cell not too near the edges. The probability that it is black in a cluster of size $1$ is $P_1=pq^8$ (with $q=1-p$). There are $8$ ways it could be in a cluster of size $2.$ Half of them (shared side) require $10$ other squares to be white. The other four (shared corner) require $12$ white squares. So the probability to be in a cluster of size $2$ is $P_2=p^2(4q^{10}+4q^{12}).$
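The counts $8$, $10$ and $12$ can be confirmed by listing the cells that must be white around a given cluster. A short sketch, with illustrative helper names `moore` and `required_white`, and assuming the cluster is far enough from the edges that no neighbour falls off the board:

```python
def moore(cell):
    """The 3 x 3 block of cells centred on `cell` (the cell itself included)."""
    r, c = cell
    return {(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}


def required_white(cluster):
    """Number of cells that must be white for `cluster` to be a whole cluster."""
    cluster = set(cluster)
    boundary = set().union(*(moore(cell) for cell in cluster)) - cluster
    return len(boundary)


print(required_white([(0, 0)]))           # 8  (isolated cell)
print(required_white([(0, 0), (0, 1)]))   # 10 (pair sharing a side)
print(required_white([(0, 0), (1, 1)]))   # 12 (pair sharing only a corner)


def p1(p):
    """Probability that a given interior cell is a black cluster of size 1."""
    return p * (1 - p) ** 8


def p2(p):
    """Probability that a given interior cell lies in a black cluster of size 2."""
    q = 1 - p
    return p**2 * (4 * q**10 + 4 * q**12)
```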

So $P_2=4p(q^2+q^4)P_1.$ Maximizing the ratio over $p$ we get that $P_2<0.9P_1$ for every $p$, with the maximum ratio (about $0.88$) occurring at about $p=0.27.$
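The bound can be checked with a simple grid search over $p$. This is only a numerical sketch; the step size $10^{-4}$ and the variable names are arbitrary choices.

```python
def ratio(p):
    """P_2 / P_1 = 4 p (q^2 + q^4) with q = 1 - p."""
    q = 1.0 - p
    return 4.0 * p * (q**2 + q**4)


# Evaluate the ratio on a fine grid of p values strictly inside (0, 1).
candidates = [i / 10000 for i in range(1, 10000)]
best_p = max(candidates, key=ratio)
best_ratio = ratio(best_p)

print(best_p, best_ratio)
```

Note that $P_1$ and $P_2$ are probabilities for a cell. Each cluster of size $2$ contains two cells, so the expected number of size-$2$ clusters per cell is $P_2/2$, and the ratio of cluster counts is half the ratio computed here.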

Here is another point of view. Randomly assign the distinct weights $1,2,3,\cdots, 1000000$ to the squares and then turn them from white to black in that order. So we are gradually raising $p.$ Do this 100 or 1000 times. Usually there will gradually be a few isolated one-cell clusters far from each other. Eventually the first multi-cell cluster will occur, probably of size $2$ but maybe $3$ or even $4.$ But at that stage there are many single-cell clusters. Eventually there will be more cells in multi-cell clusters than in single-cell ones. But that might be something like $40\%$ in single-cell, $30\%$ in double-cell and $30\%$ in triple-cell clusters. That distribution would have the number of clusters of sizes $1,2,3$ in a ratio of 40:15:10. I will stop there.
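The step from shares of cells to numbers of clusters is just a division by cluster size. A tiny sketch using the hypothetical $40\%/30\%/30\%$ split from the paragraph above (the variable names are illustrative):

```python
from fractions import Fraction

# Hypothetical share of black cells lying in clusters of size 1, 2 and 3.
cell_share = {1: Fraction(40, 100), 2: Fraction(30, 100), 3: Fraction(30, 100)}

# A cluster of size s uses up s cells, so the number of clusters of size s
# is proportional to (share of cells) / s.
cluster_weight = {size: share / size for size, share in cell_share.items()}

# Rescale so the entries read as the ratio quoted in the text.
scaled = {size: weight * 100 for size, weight in cluster_weight.items()}
print(scaled[1], scaled[2], scaled[3])  # 40 15 10
```

So single-cell clusters can remain the most numerous even after most black cells sit in larger clusters.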

Your sequel seems to be mathoverflow.net/questions/306974/…
– Aaron Meyerowitz, Commented Jul 28, 2018 at 7:33
If an event has probability $1/M$ and you do $M$ trials then the average number of hits is $1$, while the probability to get exactly one is almost exactly $1/e=0.368$, and that is also the probability to get no hits. For two hits it is $1/(2e).$ In general it is $1/(k!e)$ for $k$ small relative to $M.$
– Aaron Meyerowitz, Commented Jul 28, 2018 at 21:00
