18
$\begingroup$

A common approach to construct fast multiplication algorithms is to make an ansatz for the matrix multiplication tensor of fixed dimension and rank (e.g. $2 \times 2 \times 2$ and rank $7$ if we want to rediscover Strassen's method). This leads to a system of nonlinear equations for the coefficients called the Brent equations. This system has far too many unknowns to attempt to solve completely (using e.g. Gröbner bases) but one can look for solutions using heuristics (e.g. SAT, numerical solvers, or staring very hard).

Most authors have only looked for solutions with coefficients in $\mathbb{Z}$, since this simplifies the search and leads to a matrix multiplication scheme that works in any characteristic, though some authors have looked for solutions in $\mathbb{Q}$. A priori, there seems to be no reason to exclude algebraic coefficients, i.e. in a fixed number field.

It has been widely reported that DeepMind's AlphaEvolve recently discovered a rank-48 tensor with coefficients in $\mathbb{Q}(i)$ for $4 \times 4 \times 4$ matrix multiplication, beating the rank-49 scheme achieved by iterating Strassen two times. (Actually, a 2024 paper by I. E. Kaporin (https://link.springer.com/article/10.1134/S0965542524701021) claims essentially the same result in the abstract, but I can't bypass the paywall to check if their result is equivalent.)

Question 1. What attempts have previously been made to search for solutions of the Brent equations in number fields other than $\mathbb{Q}$ and $\mathbb{Q}(i)$? It would seem natural to try other simple degree-2 number fields like $\mathbb{Q}(\sqrt{2})$ or $\mathbb{Q}(\tfrac{1}{2} (1+\sqrt{5}))$ (both of which would have the benefit of working in algebras over the real numbers without artificially adjoining $i$) or maybe $\mathbb{Q}(\sqrt{2 i})$ and $\mathbb{Q}(e^{2 \pi i / 3})$ for small sizes like $4 \times 4 \times 4$. What about number fields of higher degree?

Question 2. One can't help but think of the analogy with polynomial multiplication (which can be viewed as a very special structured matrix multiplication), where 1) considering $\mathbb{Q}$ instead of $\mathbb{Z}$ allows one to use Toom-Cook interpolation, 2) considering $\mathbb{Q}(e^{2 \pi i / N})$ with variable $N$ allows one to use FFT. Naively, one would expect higher roots of unity to make sense for larger matrix multiplications. Is there any existing literature exploring this idea? Are there any trivial reasons why higher roots of unity might not be useful?

$\endgroup$
2
  • 2
    $\begingroup$ I've emailed you pdf of Kaporin's paper. Please tell us more... $\endgroup$ Commented May 18 at 10:42
  • 1
    $\begingroup$ it would be fun if cyclotomic numbers pop up, and the construction is symmetric w.r.t. a finite group (all complex representations of finite groups can be written over cyclotomic numbers). $\endgroup$ Commented May 18 at 10:45

2 Answers 2

11
$\begingroup$

Q1: When people do numerical searches (rather than combinatorial ones) it is often over the real numbers, and then if one finds a solution one attempts to massage it into something exact (say, rational or in a low-degree extension like $\mathbb{Q}(\sqrt{2})$, as you suggest). I'm not aware of people who have explicitly done exact searches for solutions in low-degree extensions, (which isn't to say it hasn't been done, but it is an area I try to pay attention to, FWIW).

Many more combinatorially-minded searches actually work over a finite field (often just $\mathbb{F}_2$) and then attempt to do multivariate Hensel lifting. Technically those would produce p-adic (usually p=2) solutions, but when the multivariate Hensel lifting works out it often (empirically, not a theorem) quickly produces integer solutions (this is how Strassen discovered his algorithm, in fact - though in his case he found the $\mathbb{F}_2$ solution by brute force and then got the signs to work "by hand", I don't think he was using as systmatic algorithm like Hensel lifting).

Q2: asymptotically, the exponent of matrix multiplication at most depends on the characteristic of the field (as I mentioned w/ citation in my answer to another Q). So asymptotically there is no difference between $\mathbb{Q}$ and $\mathbb{Q}(\sqrt{2})$ and $\mathbb{Q}(i)$ and $\mathbb{C}$. This doesn't rule out getting a smaller decomposition for a fixed size, like 4x4, over field extensions as you suggest. But any improvement disappears asymptotically.

(The idea for finite-degree extensions is this: say you were working over ground field $\mathbb{F}$ and found a low-rank $r$ decomposition of the 4x4 MM tensor over a finite-degree extension field $\mathbb{K}$. Then multiplication in $\mathbb{K}$, as an $\mathbb{F}$-algebra, has some finite rank, say $r_0 \leq [\mathbb{K} : \mathbb{F}]^2$, a crude upper bound. So then you can get $4^t \times 4^t$ MM in rank $r^t$ over $\mathbb{K}$. But that gives you $4^t \times 4^t$ MM in rank $r_0 \cdot r^t$ over $\mathbb{F}$. Writing $N=4^t$, this gives a bound of $\omega_{\mathbb{F}} \leq \log_N r_0 + \log_4 r$. As $N \to \infty$ (equivalently, $t \to \infty$), that gives an bound on $\omega_{\mathbb{F}}$ of $\log_4 r$, the same as what you originally got over $\mathbb{K}$. This is an instance of what is sometimes called the "tensor power trick.")

Lastly, in response to Dima Pasechnik's comment - when you do the Cohn-Umans group theoretic approach, the constants that appear are indeed those in the representations of a finite group. Also, many known tensor rank decompositions for MM are indeed invariant under some interesting finite group, see e.g. G.-Moore, Burichenko, Chiantini-Ikenmeyer-Landbserg-Ottaviani and references therein).

$\endgroup$
4
$\begingroup$

Over any commutative ring allowing division by 2, a bound of 46 multiplications for $4 \times 4$ matrices was achieved by Abraham Waksman in 1970.

In fact, the total multiplication required is $\frac{n}{2}(n^2+2n-1)$.

Reference

Abraham Waksman, "On Winograd’s algorithm for inner products", IEEE Trans. Comput. 19, 360-361 (1970), MR455534, Zbl 0196.17104.

$\endgroup$
2
  • 4
    $\begingroup$ The requirement to allow division by 2 has subsequently been removed: doi.org/10.1016/j.jsc.2022.05.002 $\endgroup$ Commented Jun 4 at 14:52
  • 1
    $\begingroup$ Thanks, I guess all these algorithms don't allow decomposition of bigger matrices and hence the claim of the AlphaEvolve paper of 48 is still optimal?! $\endgroup$ Commented Jun 4 at 15:02

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.