Suppose I want to investigate some complicated probabilistic phenomenon numerically, e.g. the eigenvalues of random matrices. One thing I might do is (ask some software to) generate a bunch of random matrices using numpy.random() or similar, then (ask some software to) compute their eigenvalues. Now, numpy.random() is not "truly random," it is a pseudorandom number generator, so:
> Mathematically, what am I implicitly assuming about the behavior of numpy.random() when I assume that my numerical explorations will accurately reflect the nature of the true distribution of the phenomenon of interest? Has anyone stated a formal conjecture making it precise?
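For concreteness, the kind of numerical exploration described above is something like the following sketch (the ensemble, matrix size, and seed are arbitrary illustrative choices):

```python
import numpy as np

# Every source of "randomness" here is a deterministic pseudorandom generator.
rng = np.random.default_rng(seed=12345)

A = rng.standard_normal((1000, 1000))   # a "random" 1000 x 1000 matrix
eigenvalues = np.linalg.eigvals(A)      # its (generally complex) eigenvalues
```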
In more detail, numpy.random() appears to (edit: just kidding! It used to) use the Mersenne twister algorithm MT19937, which, if I've understood correctly, boils down to repeatedly multiplying by an $n \times n$ matrix $M$ over $\mathbb{F}_2$, where $n = 32 \cdot 624 = 19968$. This matrix $M$ was designed to have period equal to the Mersenne prime $q = 2^{19937} - 1$ and to satisfy a certain equidistribution property with respect to the way its powers are used to generate $32$-bit numbers.
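To illustrate the $\mathbb{F}_2$-linear flavor, here is MT19937's output "tempering" step in isolation (the shift amounts and masks are the published constants; isolating this step is my own simplification, since the full generator also applies a linear "twist" to its $624$-word state), together with a check that it is additive over $\mathbb{F}_2$:

```python
def temper(y: int) -> int:
    """MT19937's output tempering map on 32-bit words.
    Each step (shift, mask, xor) is linear over F_2 in the bits of y."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF

# F_2-linearity: tempering the XOR of two words equals the XOR of their temperings.
a, b = 0x12345678, 0x9ABCDEF0
assert temper(a ^ b) == temper(a) ^ temper(b)
```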
So it passes an "obvious" randomness test, and it also passes less obvious tests such as the Diehard tests, one of which involves checking ranks of random matrices over $\mathbb{F}_2$. This is evidence that it's "pseudorandom enough for probability": e.g. I believe I will see the Tracy-Widom distribution (in the fluctuations of the largest eigenvalue) if I generate random $1000 \times 1000$ Hermitian matrices this way, as in the sketch below. But what exactly am I believing about this single big matrix $M$ when I believe statements of this form?
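A minimal sketch of that experiment, assuming my reading of the setup is right (the normalization, number of trials, and seed are arbitrary, and comparing against Tracy-Widom would require the appropriate centering and rescaling):

```python
import numpy as np

rng = np.random.default_rng(0)          # pseudorandom, not "truly random"
n, trials = 1000, 100
largest = np.empty(trials)

for t in range(trials):
    # A "random" Hermitian matrix built from pseudorandom Gaussian entries
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (X + X.conj().T) / 2
    largest[t] = np.linalg.eigvalsh(H)[-1]   # eigvalsh returns sorted eigenvalues

# After suitable centering and rescaling, the histogram of `largest` should
# resemble the Tracy-Widom distribution, assuming the pseudorandom input
# behaves like true randomness for this purpose.
```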
Here's a stab at illustrating the kind of thing I mean by making this implicit pseudorandomness conjecture precise: it seems it might factor into two steps. First, a step of the form
> There is some large $N$ such that, for almost all choices of initial seed, if MT19937 generates $n \le N$ samples from the set $X = \{ 0, 1, 2, \dots, 2^{32} - 1 \}$ of $32$-bit numbers, the resulting samples are sufficiently close to $n$ iid samples from the uniform distribution on $X$ such that...
which is just about generating sufficiently high-quality iid samples from a uniform distribution. Then, a second step of the form:
> ...if I then apply
>
> - some function $f : X^n \to Y$, which is "not too complicated," taking $n$-tuples of $32$-bit integers to elements of (a discretization of) some other sample space $Y$ modeling some probabilistic phenomenon,
> - such that if $f$ were fed true iid samples from the uniform distribution then it would output (a discretization of) true samples from some distribution of interest,
>
> then $f$ applied to my $n$ samples from MT19937 generates output sufficiently close to the true distribution of interest that I see approximately what I would see with true iid samples.
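One crude way to probe a statement of this shape numerically (purely a sketch: the function names, the toy choice of $f$, and the use of PCG64 as a stand-in for "true" uniform samples are my own assumptions) is to estimate the advantage of a test statistic $f$ at telling MT19937 output apart from a reference stream:

```python
import numpy as np

def advantage(f, n, trials=2000, seed=0):
    """Estimate |E[f(MT19937 samples)] - E[f(reference samples)]|.

    The reference stream is PCG64, itself only pseudorandom, so this
    illustrates the shape of the question rather than certifying anything."""
    mt = np.random.Generator(np.random.MT19937(seed))
    ref = np.random.default_rng(seed + 1)
    mt_avg = np.mean([f(mt.integers(0, 2**32, size=n, dtype=np.uint32))
                      for _ in range(trials)])
    ref_avg = np.mean([f(ref.integers(0, 2**32, size=n, dtype=np.uint32))
                       for _ in range(trials)])
    return abs(mt_avg - ref_avg)

# A toy "not too complicated" f: the fraction of samples whose top bit is set.
# A more interesting choice would be the F_2 matrix-rank statistic from the
# Diehard tests mentioned above.
def top_bit_frequency(samples):
    return np.mean(samples >> 31)

print(advantage(top_bit_frequency, n=1000))   # should be small
```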
The part of this I feel I understand least well, and that might be of most mathematical interest (which is why I'm asking this question here), is how to formalize the meaning of "not too complicated." To take an extreme example, I think $f$ can't be allowed to be the Kolmogorov complexity, since MT19937 samples should have Kolmogorov complexity bounded by the complexity of the seed plus a pretty small constant, which immediately distinguishes them from true random samples. At the very least we should require that $f$ be computable, but the resulting conjecture would still be too strong: MT19937 isn't cryptographically secure, so there are efficiently computable $f$ that distinguish its output from true randomness. I think in practice we can probably get away with pretty weak conditions; $f$ should just be "not intentionally designed to break the Mersenne twister". But how might we formalize what this means, along the lines of the Möbius pseudorandomness principle?
Edit: This paragraph from Timothy Chow's link, *Theory of Unconditional Pseudorandom Generators* by Hatami and Hoza, is relevant to the question I'm trying to ask (emphasis mine). It is specifically meant to clarify the sense in which one-way functions don't address my question: the standard those functions are held to is much too high:
> To a theoretician, this state of affairs is deeply unsatisfactory. Yes, modern practical PRGs seem to almost always work well in practice, but we don’t have a mathematically rigorous explanation for why these systems work. It’s not even clear what precisely the goal is. (Mathematically, how can we make a distinction between “adversarially-designed” programs and “naturally-occurring” programs?) By theoreticians’ standards, the success of practical PRGs is largely a mystery.