Suppose I want to investigate some complicated probabilistic phenomenon numerically, e.g. the eigenvalues of random matrices. One thing I might do is (ask some software to) generate a bunch of random matrices using numpy.random() or similar, then (ask some software to) compute their eigenvalues. Now, numpy.random() is not "truly random," it is a pseudorandom number generator, so:
> Mathematically, what am I implicitly assuming about the behavior of numpy.random() when I assume that my numerical explorations will accurately reflect the nature of the true distribution of the phenomenon of interest? Has anyone stated a formal conjecture making it precise?
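For concreteness, the kind of numerical exploration described above is something like the following sketch (the ensemble, matrix size, and seed are arbitrary illustrative choices):

```python
import numpy as np

# Every source of "randomness" here is a deterministic pseudorandom generator.
rng = np.random.default_rng(seed=12345)

A = rng.standard_normal((1000, 1000))   # a "random" 1000 x 1000 matrix
eigenvalues = np.linalg.eigvals(A)      # its (generally complex) eigenvalues
```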
In more detail, numpy.random() appears to (edit: just kidding! It used to) use the Mersenne twister algorithm MT19937, which, if I've understood correctly, boils down to repeatedly multiplying by an $n \times n$ matrix $M$ over $\mathbb{F}_2$, where $n = 32 \cdot 624 = 19968$. This matrix $M$ was designed to have period equal to the Mersenne prime $q = 2^{19937} - 1$ and to satisfy a certain equidistribution property with respect to the way its powers are used to generate $32$-bit numbers.
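To illustrate the $\mathbb{F}_2$-linear flavor, here is MT19937's output "tempering" step in isolation (the shift amounts and masks are the published constants; isolating this step is my own simplification, since the full generator also applies a linear "twist" to its $624$-word state), together with a check that it is additive over $\mathbb{F}_2$:

```python
def temper(y: int) -> int:
    """MT19937's output tempering map on 32-bit words.
    Each step (shift, mask, xor) is linear over F_2 in the bits of y."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF

# F_2-linearity: tempering the XOR of two words equals the XOR of their temperings.
a, b = 0x12345678, 0x9ABCDEF0
assert temper(a ^ b) == temper(a) ^ temper(b)
```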
So it passes an "obvious" randomness test, and it also passes less obvious tests such as the Diehard tests, one of which involves checking ranks of random matrices over $\mathbb{F}_2$. This is evidence that it's "pseudorandom enough for probability": e.g. I believe I will see the Tracy-Widom distribution (in the fluctuations of the largest eigenvalue) if I generate random $1000 \times 1000$ Hermitian matrices this way, as in the sketch below. But what exactly am I believing about this single big matrix $M$ when I believe statements of this form?
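A minimal sketch of that experiment, assuming my reading of the setup is right (the normalization, number of trials, and seed are arbitrary, and comparing against Tracy-Widom would require the appropriate centering and rescaling):

```python
import numpy as np

rng = np.random.default_rng(0)          # pseudorandom, not "truly random"
n, trials = 1000, 100
largest = np.empty(trials)

for t in range(trials):
    # A "random" Hermitian matrix built from pseudorandom Gaussian entries
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (X + X.conj().T) / 2
    largest[t] = np.linalg.eigvalsh(H)[-1]   # eigvalsh returns sorted eigenvalues

# After suitable centering and rescaling, the histogram of `largest` should
# resemble the Tracy-Widom distribution, assuming the pseudorandom input
# behaves like true randomness for this purpose.
```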
Here's a stab at illustrating the kind of thing I mean by making this implicit pseudorandomness conjecture precise: it seems it might factor into two steps. First, a step of the form
> There is some large $N$ such that, for almost all choices of initial seed, if MT19937 generates $n \le N$ samples from the set $X = \{ 0, 1, 2, \dots, 2^{32} - 1 \}$ of $32$-bit numbers, the resulting samples are sufficiently close to $n$ iid samples from the uniform distribution on $X$ such that...
which is just about generating sufficiently high-quality iid samples from a uniform distribution. Then, a second step of the form:
> ...if I then apply
>
> - some function $f : X^n \to Y$, which is "not too complicated," taking $n$-tuples of $32$-bit integers to elements of (a discretization of) some other sample space $Y$ modeling some probabilistic phenomenon,
> - such that if $f$ were fed true iid samples from the uniform distribution then it would output (a discretization of) true samples from some distribution of interest,
>
> then $f$ applied to my $n$ samples from MT19937 generates output sufficiently close to the true distribution of interest that I see approximately what I would see with true iid samples.
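One crude way to probe a statement of this shape numerically (purely a sketch: the function names, the toy choice of $f$, and the use of PCG64 as a stand-in for "true" uniform samples are my own assumptions) is to estimate the advantage of a test statistic $f$ at telling MT19937 output apart from a reference stream:

```python
import numpy as np

def advantage(f, n, trials=2000, seed=0):
    """Estimate |E[f(MT19937 samples)] - E[f(reference samples)]|.

    The reference stream is PCG64, itself only pseudorandom, so this
    illustrates the shape of the question rather than certifying anything."""
    mt = np.random.Generator(np.random.MT19937(seed))
    ref = np.random.default_rng(seed + 1)
    mt_avg = np.mean([f(mt.integers(0, 2**32, size=n, dtype=np.uint32))
                      for _ in range(trials)])
    ref_avg = np.mean([f(ref.integers(0, 2**32, size=n, dtype=np.uint32))
                       for _ in range(trials)])
    return abs(mt_avg - ref_avg)

# A toy "not too complicated" f: the fraction of samples whose top bit is set.
# A more interesting choice would be the F_2 matrix-rank statistic from the
# Diehard tests mentioned above.
def top_bit_frequency(samples):
    return np.mean(samples >> 31)

print(advantage(top_bit_frequency, n=1000))   # should be small
```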
The part of this I feel I understand least well, and that might be of most mathematical interest (which is why I'm asking this question here), is how to formalize the meaning of "not too complicated." To take an extreme example, I think $f$ can't be allowed to be the Kolmogorov complexity, since MT19937 samples should have Kolmogorov complexity bounded by the complexity of the seed plus a pretty small constant, which immediately distinguishes them from true random samples. At the very least we should require that $f$ be computable, but the resulting conjecture would still be too strong: MT19937 isn't cryptographically secure, so there are efficiently computable $f$ that distinguish its output from true randomness. I think in practice we can probably get away with pretty weak conditions; $f$ should just be "not intentionally designed to break the Mersenne twister". But how might we formalize what this means, along the lines of the Möbius pseudorandomness principle?
Edit: This paragraph from Timothy Chow's link, *Theory of Unconditional Pseudorandom Generators* by Hatami and Hoza, is relevant to the question I'm trying to ask (emphasis mine). It is specifically meant to clarify the sense in which one-way functions don't address my question: the standard those functions are held to is much too high:
> To a theoretician, this state of affairs is deeply unsatisfactory. Yes, modern practical PRGs seem to almost always work well in practice, but we don’t have a mathematically rigorous explanation for why these systems work. It’s not even clear what precisely the goal is. (Mathematically, how can we make a distinction between “adversarially-designed” programs and “naturally-occurring” programs?) By theoreticians’ standards, the success of practical PRGs is largely a mystery.