A couple of months ago I wrote an explanation of ML-KEM in the discussion thread of an Ars Technica article. Folks seemed to like my explanation and the Ars staff marked it as a “Staff Pick”.
Below is the explanation that I wrote with some minor corrections…
A close friend of mine from college works as a cryptographer for some government agency (he jokes that he can’t tell me which one it actually is…he went straight there from his Stanford PhD). He tried to explain ML-KEM to me a couple of weeks ago in “simple” language and my brain gave up after about 2 minutes.
This might help: Kyber - How does it work?
One thing that is different about ML-KEM compared to finite-field Diffie-Hellman (DH) and Elliptic-Curve Diffie-Hellman (ECDH) is that the former is a Key Encapsulation Mechanism (KEM) and the latter are key-agreement protocols.
In DH and ECDH:
1. Alice generates a keypair which consists of a public key and a private key.
2. Alice keeps the private key to herself and sends the public key to Bob.
3. Bob generates a keypair which consists of a public key and a private key.
4. Bob keeps the private key to himself and sends the public key to Alice.
5. Alice combines her private key from step #1 with Bob’s public key from step #4 and produces a shared value.
6. Bob combines his private key from step #3 with Alice’s public key from step #2 and produces the same shared value that Alice produced in step #5.
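The DH exchange above can be sketched in a few lines of Python. The numbers here are toy-sized for readability; real finite-field DH uses primes of 2048 bits or more.

```python
# Toy finite-field Diffie-Hellman. Values are illustrative only --
# real DH uses 2048+ bit primes and randomly generated private keys.
p = 23  # public prime modulus (toy-sized)
g = 5   # public generator

alice_private = 6                        # Alice's secret
alice_public = pow(g, alice_private, p)  # sent to Bob

bob_private = 15                         # Bob's secret
bob_public = pow(g, bob_private, p)      # sent to Alice

# Each side combines its own private key with the other side's
# public key and arrives at the same shared value.
alice_shared = pow(bob_public, alice_private, p)
bob_shared = pow(alice_public, bob_private, p)
assert alice_shared == bob_shared
```

Both computations land on g^(alice_private * bob_private) mod p, which is why the two shared values agree.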
In ML-KEM things work a bit differently:
1. The first party (Alice) generates a keypair which consists of an encapsulation key (analogous to a public key in DH) and a decapsulation key (analogous to a private key in DH).
2. Alice sends the encapsulation key to the second party (Bob).
3. Bob generates a random value, which is hashed with a hash of Alice’s encapsulation key from step #2 to generate a shared value (32 bytes in ML-KEM).
4. Bob uses Alice’s encapsulation key from step #2 to encapsulate the random value, producing a ciphertext.
5. Bob sends the ciphertext from step #4 back to Alice.
6. Alice uses the decapsulation key from step #1 to decapsulate the random value generated by Bob in step #3 from the ciphertext sent by Bob in step #5.
7. Alice hashes the random value from step #6 with the hash of her encapsulation key to generate the same shared value that Bob produced in step #3.
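The keygen/encapsulate/decapsulate interface can be sketched with a toy DH-based KEM. This is the same shape as DHKEM from RFC 9180, not ML-KEM’s lattice construction, and the parameters are toy-sized; it is only meant to show how the three operations fit together.

```python
import hashlib
import secrets

# Toy DH-based KEM illustrating the KEM interface (the idea behind
# DHKEM in RFC 9180). NOT ML-KEM, and the parameters are toy-sized.
P, G = 23, 5

def keygen():
    dk = secrets.randbelow(P - 2) + 1   # decapsulation (private) key
    ek = pow(G, dk, P)                  # encapsulation (public) key
    return ek, dk

def encaps(ek):
    r = secrets.randbelow(P - 2) + 1    # Bob's random ephemeral secret
    ct = pow(G, r, P)                   # ciphertext sent back to Alice
    shared = hashlib.sha256(str(pow(ek, r, P)).encode()).digest()
    return ct, shared

def decaps(dk, ct):
    return hashlib.sha256(str(pow(ct, dk, P)).encode()).digest()

ek, dk = keygen()              # Alice
ct, bob_shared = encaps(ek)    # Bob: only the ciphertext crosses the wire
alice_shared = decaps(dk, ct)  # Alice
assert alice_shared == bob_shared
```

Note the asymmetry: only Bob generates randomness during encapsulation, and only a ciphertext travels back to Alice, mirroring the ML-KEM flow above.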
So in DH and ECDH, both parties contribute equally to the shared value. In ML-KEM, one party (Bob) generates a random value which is hashed with a hash of the other party’s (Alice) encapsulation key to produce the shared value.
Another difference between DH/ECDH and ML-KEM is this: the shared value produced by DH and ECDH cannot be safely used as the secret key for a symmetric cipher (e.g. AES) because the shared value is biased (some values are much more likely than others). To safely derive a secret key (or keys) for use with a symmetric cipher, the shared value produced by DH and ECDH needs to be passed through a key derivation function (KDF) like HKDF.
The shared value derived in ML-KEM is uniformly random and can be used directly as the key for a symmetric cipher (FIPS 203, section 3.1). In practice I expect the ML-KEM shared value to be passed to a KDF anyway, because many protocols (for example, TLS) need to derive several keys in order to establish a secure channel.
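For illustration, here is a minimal HKDF-SHA256 sketch (the extract-then-expand construction from RFC 5869) showing how one shared value can be stretched into several independent keys; the labels and lengths are made up for the example.

```python
import hashlib
import hmac

# Minimal HKDF-SHA256 (RFC 5869): "extract" concentrates the entropy
# of a possibly-biased shared value into a pseudorandom key, then
# "expand" derives as many labeled key bytes as the protocol needs.
def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    okm, t = b"", b""
    for i in range((length + 31) // 32):
        t = hmac.new(prk, t + info + bytes([i + 1]), hashlib.sha256).digest()
        okm += t
    return okm[:length]

shared_value = b"\x01" * 32  # stand-in for a DH/ECDH/ML-KEM shared value
prk = hkdf_extract(b"salt", shared_value)
client_key = hkdf_expand(prk, b"client key", 16)  # hypothetical labels
server_key = hkdf_expand(prk, b"server key", 16)
```

The `info` label is what lets one shared value yield multiple unrelated keys, which is exactly why protocols like TLS run the KEM output through a KDF anyway.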
DH, ECDH, and ML-KEM all rely on “hard” problems based on trapdoor functions.
“Hard” in this context means “believed to be computationally infeasible to solve without an implausible amount of computational resources or time”.
A trapdoor function is a function that is easy to compute in one direction but hard to compute in the other direction without some additional information. For example, it is easy to calculate 61 * 71 and hard to calculate the integer factors of 4331. However, if I tell you that one of the factors of 4331 is 61, then it is easy for you to calculate the other factor: 4331 / 61 = 71. This is the integer factorization problem, and it’s the basis for RSA.
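The asymmetry of the factoring example is easy to see in code: the forward direction is a single multiplication, while the reverse direction (brute-force trial division here) takes work that grows with the size of the factors.

```python
# Forward direction: one multiplication. Reverse direction: trial
# division, feasible for toy sizes but hopeless at the 2048-bit
# products of two primes used by RSA.
n = 61 * 71  # easy: 4331

def smallest_factor(n: int) -> int:
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n

p = smallest_factor(n)  # hard direction, brute-forced
assert (p, n // p) == (61, 71)
```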
The hard problem that Finite-Field Diffie-Hellman (FFDH) key exchange is based on is the discrete logarithm problem, which is this:
Given b = g^s mod p, it is hard to calculate s, where:
- s is a large randomly chosen positive integer that is secret
- g is a fixed positive integer that is publicly known and carefully chosen by cryptographers
- p is a fixed large prime number that is publicly known and carefully chosen by cryptographers
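The gap between the two directions can be demonstrated directly: computing b from s is one fast modular exponentiation, while recovering an exponent from b has no known shortcut and is brute-forced below. The prime and generator here are toy values I picked for the demo.

```python
# Forward: one square-and-multiply exponentiation (Python's pow).
# Reverse: brute force, up to ~p multiplications -- already slow for a
# 17-bit toy prime, and infeasible for the 2048-bit primes of real FFDH.
p, g = 104729, 5  # toy public prime and base (illustrative values)
s = 77777         # secret exponent
b = pow(g, s, p)  # easy direction

def brute_force_dlog(b, g, p):
    x, acc = 0, 1
    while acc != b:       # step through g^1, g^2, ... until we hit b
        acc = acc * g % p
        x += 1
    return x              # some exponent x with g^x = b (mod p)

x = brute_force_dlog(b, g, p)
assert pow(g, x, p) == b
```

Note the brute force finds *an* exponent congruent to s modulo the order of g, which is all an attacker needs; the point is the cost, which scales with the size of p.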
The hard problem that Elliptic-Curve Diffie-Hellman (ECDH) key exchange is based on is known as the elliptic curve discrete logarithm problem (ECDLP), which is this:
Given b = sG, it is hard to calculate s, where:
- s is a large randomly chosen positive integer that is secret, and
- G is a fixed, publicly known point on an elliptic curve over a finite field. The point, elliptic curve, and field are all carefully chosen by cryptographers.
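To make “point on a curve” and “sG” concrete, here is point addition on the classic textbook curve y² = x³ + 2x + 2 over F₁₇ with base point G = (5, 1). These are classroom parameters, not real ones; real ECDH uses curves over roughly 256-bit fields, and real code uses double-and-add rather than the naive repeated addition shown here.

```python
# Toy short-Weierstrass curve y^2 = x^3 + 2x + 2 over F_17 (a standard
# textbook example). Real ECDH curves live over ~256-bit fields.
P_MOD, A = 17, 2
INF = None  # point at infinity (the group identity)

def ec_add(p1, p2):
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # p1 + (-p1) = identity
    if p1 == p2:    # tangent-line slope for doubling
        m = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:           # chord slope for distinct points
        m = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (m * m - x1 - x2) % P_MOD
    return (x3, (m * (x1 - x3) - y1) % P_MOD)

def scalar_mult(s, pt):  # naive sG; real code uses double-and-add
    acc = INF
    for _ in range(s):
        acc = ec_add(acc, pt)
    return acc

G = (5, 1)
assert scalar_mult(2, G) == (6, 3)
assert scalar_mult(3, G) == (10, 6)
```

Computing sG is cheap even for huge s (double-and-add needs only ~log₂(s) doublings); recovering s from sG is the ECDLP.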
The hard problem that ML-KEM is based on is the Module Learning With Errors (MLWE) problem, which is derived from the Learning With Errors (LWE) problem. A simplified version of the LWE problem is this:
Given t = As + e, it is hard to calculate s, where:
- t is a public vector with integer elements
- A is a public square matrix with elements that are random integer values
- s is a secret vector with elements that are small integer values (the secret)
- e is a secret vector with elements that are small integer values (the error)
Note that if you remove e from the equation above, then solving for s becomes very easy:
1. Calculate A⁻¹ using Gaussian elimination.
2. Multiply by A⁻¹ from the left: A⁻¹t = A⁻¹As
3. Solution: s = A⁻¹t
The important bit here is that the error vector is critical to making the problem hard.
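A tiny 2×2 instance mod q = 17 shows both sides of this: without e the secret falls out of one matrix inversion, and with e the very same attack returns garbage. The matrix and vectors are values I made up; real MLWE instances use q = 3329 and 256-coefficient polynomials.

```python
# Tiny LWE demo mod q = 17. Without the error vector e, recovering s is
# one matrix inversion; with e, the same attack fails.
q = 17
A = [[3, 5], [7, 2]]  # public matrix (made-up toy values)
s = [1, 2]            # secret vector (small entries)
e = [1, -1]           # error vector (small entries)

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) % q for i in range(2)]

# Inverse of the 2x2 matrix A mod q via the adjugate formula
# (equivalent to Gaussian elimination at this size).
det_inv = pow((A[0][0] * A[1][1] - A[0][1] * A[1][0]) % q, -1, q)
A_inv = [[A[1][1] * det_inv % q, -A[0][1] * det_inv % q],
         [-A[1][0] * det_inv % q, A[0][0] * det_inv % q]]

t_no_error = matvec(A, s)                 # t = As
assert matvec(A_inv, t_no_error) == s     # no error: s recovered exactly

t = [(x + y) % q for x, y in zip(t_no_error, e)]  # t = As + e
assert matvec(A_inv, t) != s              # with error: attack fails
```

Even a tiny error ruins the linear-algebra attack because A⁻¹ smears the small entries of e across the whole result.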
In the Module Learning With Errors (MLWE) problem that is used by ML-KEM, the integers from the simplified LWE explanation above are replaced by polynomials with 256 coefficients.
(This is explained cryptically in FIPS 203, section 3.2)
Unfortunately the large polynomials make it difficult to visualize ML-KEM. There is a simplified implementation of Kyber called “Baby Kyber” in the Kyber - How does it work? article linked above that is easier to understand.
One problem with using 256-coefficient polynomials is multiplication. Adding polynomials is done coefficient-wise, so adding two polynomials with 256 integer coefficients requires 256 integer additions.
Polynomial multiplication, on the other hand, requires multiplying every coefficient of one polynomial by every coefficient of the other. This means that multiplying two polynomials with 256 integer coefficients requires 256 * 256 = 65536 integer multiplications.
To work around this, ML-KEM uses a trick called the Number-Theoretic Transform (NTT, FIPS 203, section 4.3). This allows polynomial multiplication to be done (almost) coefficient-wise and drastically reduces the number of integer multiplications needed.
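The transform-multiply-untransform idea can be shown at toy scale. Below is a 4-point cyclic NTT mod 17 (so w = 4 is a 4th root of unity); ML-KEM’s actual NTT works mod x²⁵⁶ + 1 with q = 3329 and uses a fast butterfly structure, but the payoff is the same: after transforming, the product needs only n pointwise multiplications instead of n².

```python
# Toy 4-point cyclic NTT over Z_17 (w = 4 has order 4 mod 17).
# Multiplies polynomials mod x^4 - 1; ML-KEM's NTT works mod x^256 + 1
# with q = 3329, but the transform/pointwise/untransform shape is the same.
Q, N, W = 17, 4, 4
W_INV, N_INV = pow(W, -1, Q), pow(N, -1, Q)

def ntt(a, w):
    # Naive O(n^2) transform for clarity; real NTTs use O(n log n)
    # butterflies.
    return [sum(a[j] * pow(w, j * k, Q) for j in range(N)) % Q
            for k in range(N)]

def ntt_mul(a, b):
    fa, fb = ntt(a, W), ntt(b, W)
    fc = [x * y % Q for x, y in zip(fa, fb)]   # pointwise: N multiplies
    return [c * N_INV % Q for c in ntt(fc, W_INV)]  # inverse transform

# (1 + 2x) * (3 + x) = 3 + 7x + 2x^2
assert ntt_mul([1, 2, 0, 0], [3, 1, 0, 0]) == [3, 7, 2, 0]
```

An added benefit exploited by ML-KEM: vectors and matrices can be kept in the NTT domain across many operations, so the transforms themselves are amortized.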
Using polynomials instead of large, multi-word integers like DH and ECDH might seem confusing at first, but it actually simplifies a lot of the implementation because it enables SIMD optimizations and you don’t have to deal with carry propagation.
Another neat trick used by ML-KEM is compressing the A matrix in the encapsulation key by including a 32-byte seed (rho) instead of the actual polynomial coefficients. This seed value is expanded with SHAKE128 to pseudorandomly generate the coefficients for the polynomial elements of the A matrix (FIPS 203, SampleNTT() in section 4.2.2 and section 5.1).
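A sketch of that seed expansion in Python: stretch the seed with SHAKE128 and keep 12-bit values that fall below q = 3329 (rejection sampling keeps the coefficients uniform). This is simplified relative to FIPS 203’s SampleNTT(), which also absorbs two index bytes per matrix entry and streams squeeze output on demand, but the 12-bit parsing below matches the standard.

```python
import hashlib

# Simplified seed expansion in the style of FIPS 203's SampleNTT():
# squeeze SHAKE128 output and rejection-sample coefficients mod q = 3329.
Q = 3329
rho = bytes(32)  # stand-in for the 32-byte seed in the encapsulation key

def sample_poly(seed: bytes, n: int = 256) -> list[int]:
    stream = hashlib.shake_128(seed).digest(3 * n * 2)  # ample output
    coeffs, i = [], 0
    while len(coeffs) < n:
        # Each 3 bytes yield two 12-bit candidates (as in FIPS 203);
        # reject candidates >= q so accepted values stay uniform mod q.
        b0, b1, b2 = stream[i], stream[i + 1], stream[i + 2]
        i += 3
        for d in (b0 + 256 * (b1 % 16), b1 // 16 + 16 * b2):
            if d < Q and len(coeffs) < n:
                coeffs.append(d)
    return coeffs

poly = sample_poly(rho)
assert len(poly) == 256 and all(0 <= c < Q for c in poly)
```

Because both sides expand the same public seed, they reconstruct the identical A matrix without ever transmitting its ~thousands of coefficients.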
There are a lot of other details but hopefully this gives you enough to start to get your head around ML-KEM.
If you want to see some source code, here is a self-contained, single-file, dependency-free C11 implementation of the FIPS 203 initial public draft (IPD) that I wrote last year. It includes test vectors, SIMD acceleration, and the necessary bits of FIPS 202 (SHA3-256, SHA3-512, SHAKE128, and SHAKE256), but it has not been updated to reflect the changes in the final version of FIPS 203. I mainly wrote it for fun, to learn, and to provide public comments to NIST during the standardization process.