DEV Community

Viktoras

Originally published at dizzy.zone

BLAKE2b performance on Apple Silicon

For work, I was going to store some hashed tokens in a database. I planned to keep it simple and go with HMAC-SHA256, but having recently read Jean-Philippe Aumasson’s book “Serious Cryptography”, I remembered that BLAKE2 should be quicker:

> BLAKE2 was designed with the following ideas in mind:
>
> - It should be faster than all previous hash standards

Cool, I thought, let’s consider BLAKE2 then. Performance is not important for my use case, as the hashing will almost certainly not be a bottleneck, but I was curious just how much faster BLAKE2 would be than HMAC-SHA256. So I write a simple benchmark:

```go
package hashes // hypothetical package name; the post doesn't show one (save as e.g. hashes_test.go)

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"log"
	"testing"

	"golang.org/x/crypto/blake2b"
)

func BenchmarkHashes(b *testing.B) {
	token := []byte("some-api-token")
	secretKey := generateSecretKey()
	b.ResetTimer()

	_ = b.Run("HMACSHA256", func(b *testing.B) {
		for b.Loop() { // b.Loop requires Go 1.24+
			_ = HMACSHA256(token, secretKey)
		}
	})
	_ = b.Run("BLAKE2b", func(b *testing.B) {
		for b.Loop() {
			_ = BLAKE2b(token, secretKey)
		}
	})
}

// generateSecretKey returns a random 32-byte key.
func generateSecretKey() []byte {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	return key
}

// HMACSHA256 computes HMAC-SHA256 of token under secretKey.
func HMACSHA256(token []byte, secretKey []byte) []byte {
	h := hmac.New(sha256.New, secretKey)
	h.Write(token)
	return h.Sum(nil)
}

// BLAKE2b computes a 32-byte keyed BLAKE2b digest (BLAKE2b's built-in MAC mode).
func BLAKE2b(token []byte, secretKey []byte) []byte {
	hasher, err := blake2b.New256(secretKey) // fails only if the key is longer than 64 bytes
	if err != nil {
		log.Fatal(err)
	}
	hasher.Write(token)
	return hasher.Sum(nil)
}
```
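The benchmark measures only the hashing step. For context on the use case, here is a minimal sketch of how a hashed token might be stored and checked. It assumes the database holds a hex-encoded HMAC-SHA256 digest of each token under a server-side secret key, and that a presented token is verified by recomputing the digest and comparing in constant time with hmac.Equal. The helper names hashToken and tokenMatches and the example key are hypothetical and do not come from the post. A keyed BLAKE2b version would only swap hmac.New(sha256.New, secretKey) for blake2b.New256(secretKey).

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashToken returns the hex-encoded HMAC-SHA256 of token under secretKey.
// This is the value that would be stored instead of the raw token (hypothetical helper).
func hashToken(token, secretKey []byte) string {
	h := hmac.New(sha256.New, secretKey)
	h.Write(token)
	return hex.EncodeToString(h.Sum(nil))
}

// tokenMatches recomputes the digest of a presented token and compares it with
// the stored hex digest in constant time. Invalid hex never matches.
func tokenMatches(presented []byte, storedHex string, secretKey []byte) bool {
	stored, err := hex.DecodeString(storedHex)
	if err != nil {
		return false
	}
	h := hmac.New(sha256.New, secretKey)
	h.Write(presented)
	return hmac.Equal(h.Sum(nil), stored)
}

func main() {
	secretKey := []byte("example-secret-key-32-bytes-long") // hypothetical; use a random key in practice
	stored := hashToken([]byte("some-api-token"), secretKey)
	fmt.Println(len(stored))                                         // 64 hex characters for a 32-byte digest
	fmt.Println(tokenMatches([]byte("some-api-token"), stored, secretKey)) // true
	fmt.Println(tokenMatches([]byte("wrong-token"), stored, secretKey))    // false
}
```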

Running the benchmark gives these results:

```
cpu: Apple M1 Max
BenchmarkHashes/HMACSHA256-10    3442680    341.0 ns/op    512 B/op    6 allocs/op
BenchmarkHashes/BLAKE2b-10       1966382    584.0 ns/op    416 B/op    2 allocs/op
```

OK… So BLAKE2b is slower than HMAC-SHA256: 584.0 vs 341.0 ns/op, roughly 70% more time per hash. Yes, it makes fewer allocations (2 vs 6), which is nice, but it still takes noticeably more CPU time. My first thought is that it might indeed be faster, but only if the input is way, way longer. So I switch to the following token:

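```go
// requires adding "strings" to the import list
token := []byte(strings.Repeat("some-api-token", 10000))
```

That is the 14-byte token repeated 10,000 times (140,000 bytes). The results:

```
cpu: Apple M1 Max
BenchmarkHashes/HMACSHA256-10      19536     59946 ns/op    512 B/op    6 allocs/op
BenchmarkHashes/BLAKE2b-10          6452    190926 ns/op    416 B/op    2 allocs/op
```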

Hmmmm… With the long input it’s even worse: BLAKE2b now takes over three times as long as HMAC-SHA256 (190926 vs 59946 ns/op). Now, it could be that we’re dealing with a non-optimal implementation. SHA-256 is implemented in Go’s standard library and is very likely well optimized, whereas the BLAKE2 implementation I’m using comes from the golang.org/x/crypto package. Perhaps we can find a better one. The first search result recommends github.com/minio/blake2b-simd, which has been archived since 2018. Not promising, but let’s give it a shot.

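```go
import (
	blakeMinio "github.com/minio/blake2b-simd"
)

// BLAKE2bMinio computes a 32-byte keyed BLAKE2b digest using minio's SIMD implementation.
func BLAKE2bMinio(token []byte, secretKey []byte) []byte {
	hasher := blakeMinio.NewMAC(32, secretKey)
	hasher.Write(token)
	return hasher.Sum(nil)
}
```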

Here are the benchmark results with the original short token, now with a third BLAKE2b-minio sub-benchmark added to BenchmarkHashes:

```
cpu: Apple M1 Max
BenchmarkHashes/HMACSHA256-10       3622322    316.7 ns/op    512 B/op    6 allocs/op
BenchmarkHashes/BLAKE2b-10          2881012    415.6 ns/op    416 B/op    2 allocs/op
BenchmarkHashes/BLAKE2b-minio-10    2138151    566.4 ns/op    480 B/op    2 allocs/op
```

So the minio version is even slower (566.4 ns/op)… I quickly glance over the github.com/minio/blake2b-simd codebase and notice a number of architecture-specific files, but none for the ARM architecture. Let’s try on a machine with an AMD64 processor.

```
cpu: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
BenchmarkHashes/HMACSHA256-4        761613    1488 ns/op     512 B/op    6 allocs/op
BenchmarkHashes/BLAKE2b-4          2287569     566.9 ns/op   416 B/op    2 allocs/op
BenchmarkHashes/BLAKE2b-minio-4    1541990     765.4 ns/op   480 B/op    2 allocs/op
```

Right, so it seems that the implementations I’m using are only optimized for AMD64. Or are they? For one final test, I spin up an ARM-based VPS on Hetzner. Note that the benchmark could not determine the exact CPU model there, so the output has no cpu line.

```
BenchmarkHashes/HMACSHA256-2       1000000    1007 ns/op     512 B/op    6 allocs/op
BenchmarkHashes/BLAKE2b-2          1367085     879.6 ns/op   416 B/op    2 allocs/op
BenchmarkHashes/BLAKE2b-minio-2    1123965    1080 ns/op     480 B/op    2 allocs/op
```

So with this, it seems that the issue only occurs on Apple Silicon processors. On the other processors, the golang.org/x/crypto BLAKE2b does win: 566.9 vs 1488 ns/op for HMAC-SHA256 on the Intel i5-7200U, and 879.6 vs 1007 ns/op on the Hetzner ARM VPS. I’m not entirely sure what causes this, as CPU architectures are not my strong suit. If you have an idea, please let me know in the comments.
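To put the runs above side by side, here is a small sketch that parses go test -bench result lines and reports each BLAKE2b variant’s ns/op relative to HMAC-SHA256. It is a hypothetical helper that assumes the standard output format shown in this post (name, iteration count, then a value followed by ns/op). For careful comparisons across repeated runs, the benchstat tool from golang.org/x/perf is the usual choice.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseNsPerOp maps benchmark names to ns/op from go test -bench output.
// A trailing "-N" GOMAXPROCS suffix (e.g. "-10") is stripped from each name.
func parseNsPerOp(output string) map[string]float64 {
	res := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") {
			continue // skip "cpu:" and other non-result lines
		}
		name := f[0]
		if i := strings.LastIndex(name, "-"); i > 0 {
			if _, err := strconv.Atoi(name[i+1:]); err == nil {
				name = name[:i]
			}
		}
		for j := 2; j+1 < len(f); j++ { // find "<value> ns/op"
			if f[j+1] == "ns/op" {
				if v, err := strconv.ParseFloat(f[j], 64); err == nil {
					res[name] = v
				}
				break
			}
		}
	}
	return res
}

func main() {
	intel := `cpu: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
BenchmarkHashes/HMACSHA256-4 761613 1488 ns/op 512 B/op 6 allocs/op
BenchmarkHashes/BLAKE2b-4 2287569 566.9 ns/op 416 B/op 2 allocs/op
BenchmarkHashes/BLAKE2b-minio-4 1541990 765.4 ns/op 480 B/op 2 allocs/op`
	ns := parseNsPerOp(intel)
	base := ns["BenchmarkHashes/HMACSHA256"]
	for _, name := range []string{"BenchmarkHashes/BLAKE2b", "BenchmarkHashes/BLAKE2b-minio"} {
		fmt.Printf("%s: %.2fx the time of HMAC-SHA256\n", name, ns[name]/base)
	}
}
```

For the Intel run this gives ratios of 0.38 and 0.51. Feeding in either Apple M1 Max run instead gives a BLAKE2b ratio above 1.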
