Do you know what false sharing is?
It’s a subtle issue that can ruin the performance of concurrent programs, especially when running on multi-core CPUs. You expect your goroutines to run in parallel and speed things up... but instead, you get almost the same performance as if you were using just a single core.
🧵 A Simple Example
Let’s take a basic case where two goroutines update different fields in the same struct:
```go
type Metrics struct {
	A int64
	B int64
}
```
And now we run them concurrently:
```go
go func() {
	for i := 0; i < N; i++ {
		m.A++
	}
}()

go func() {
	for i := 0; i < N; i++ {
		m.B++
	}
}()
```
Looks fine, right? They're different fields. But under the hood, `A` and `B` may sit side by side in memory, often within the same CPU cache line.
When that happens, each write operation from different cores triggers cache invalidation, even though they’re not touching the same variable. This causes the CPU to constantly sync memory between cores—losing the benefits of parallelism.
</> Code example
Below is code demonstrating the false sharing effect in action:
```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

const N = 1_000_000_000

type Metrics struct {
	A int64
	B int64
}

func main() {
	runtime.GOMAXPROCS(2)

	var wg sync.WaitGroup
	m := &Metrics{}

	start := time.Now()
	wg.Add(2)

	go func() {
		for i := 0; i < N; i++ {
			m.A++
		}
		wg.Done()
	}()

	go func() {
		for i := 0; i < N; i++ {
			m.B++
		}
		wg.Done()
	}()

	wg.Wait()
	fmt.Println("Duration:", time.Since(start))
}
```
👣 Step by step with values
Given the struct:
```go
type Data struct {
	A int64
	B int64
}
```
Step 1 — Core 1 loads the struct from memory
Core 1 says:
"Loading the struct from memory. Initial values: A = 100, B = 200."
// Data in memory: { A: 100, B: 200 }
Under the hood:
- Core 1 loads the cache line containing both `A` and `B` into its L1 cache.
Step 2 — Core 2 loads the struct from memory
Core 2 says:
"Loading the struct from memory. Current values: A = 100, B = 200."
// Data in memory (via cache line): { A: 100, B: 200 }
Under the hood:
- Core 2 loads the same cache line into its L1 cache.
Step 3 — Core 1 increments A
Core 1 says:
"Increment A by 10."
// A = 110, B = 200 // updated values after increment
Under the hood:
- Core 1 marks the cache line as modified.
- Core 2’s cached copy becomes invalid because the cache coherence protocol (e.g. MESI) detects the change.
Step 4 — Core 2 fetches the updated cache line
Core 2 says:
"My cached copy is stale. Fetching the latest cache line before updating B."
// Cache line updated: { A: 110, B: 200 }
Under the hood:
- Core 2 requests the latest cache line from Core 1.
- Core 1 shares its modified cache line (`A = 110, B = 200`) with Core 2.
- Core 2 loads this updated cache line into its cache.
Step 5 — Core 2 increments B
Core 2 says:
"Increment B by 15."
// A = 110, B = 215 // updated values after increment
Under the hood:
- Core 2 marks the cache line as modified.
- Core 1’s cached copy becomes invalid, again via the cache coherence protocol.
💡 Notice how even working on different fields, the cores are forced to update the same cache line — leading to unnecessary synchronization and lost performance.
✅ Fixing It with Padding
We can fix false sharing by padding the struct to separate the fields across cache lines:
```go
type PaddedMetrics struct {
	A int64
	_ [56]byte // padding to separate the fields (assuming a 64-byte cache line)
	B int64
}
```
Replace `Metrics` with `PaddedMetrics`, rerun the benchmark, and you’ll likely see a huge performance improvement.
🔬 Measuring the Problem with go test
Create a file named `metrics_test.go` with this content (keep it in its own directory, so its declarations don’t clash with the earlier `main.go`):
```go
package main

import (
	"sync"
	"testing"
)

const N = 1_000_000_000

type Metrics struct {
	A int64
	B int64
}

type MetricsWithPadding struct {
	A int64
	_ [56]byte
	B int64
}

func BenchmarkMetrics(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var wg sync.WaitGroup
		m := &Metrics{}
		wg.Add(2)
		go func() {
			for i := 0; i < N; i++ {
				m.A++
			}
			wg.Done()
		}()
		go func() {
			for i := 0; i < N; i++ {
				m.B++
			}
			wg.Done()
		}()
		wg.Wait()
	}
}

func BenchmarkMetricsWithPadding(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var wg sync.WaitGroup
		m := &MetricsWithPadding{}
		wg.Add(2)
		go func() {
			for i := 0; i < N; i++ {
				m.A++
			}
			wg.Done()
		}()
		go func() {
			for i := 0; i < N; i++ {
				m.B++
			}
			wg.Done()
		}()
		wg.Wait()
	}
}
```
Run `go test` in your terminal with the `-bench` flag:

```
go test -bench=.
```
🔬 Measuring the Problem with perf
You can use Linux's `perf` tool to confirm that false sharing is happening:
```
go build -o false-sharing
perf stat -e cache-references,cache-misses ./false-sharing
```
Compare the output before and after padding.
You’ll probably see a significant drop in `cache-misses` with the padded struct:
```
Before padding:
  1,230,000,000  cache-references
    980,000,000  cache-misses

After padding:
    800,000,000  cache-references
    150,000,000  cache-misses
```
💡 Lower cache misses = better core-level coordination and real parallelism.
🎮 Where I Learned This: The HP-MP Story
I ran into this issue while working on a side project—building a small MMORPG in Go (just for fun and to push the language's limits).
My `Player` struct looked like this:
```go
type Player struct {
	HP int64
	MP int64
}
```
One goroutine handled combat and decreased `HP`. Another handled mana usage and decreased `MP`.
I expected everything to run smoothly, but profiling showed almost no performance gain from using multiple cores. After digging deeper, I realized that `HP` and `MP` were sharing the same cache line—and that’s how I discovered false sharing in practice.
🧠 Takeaways
- False sharing happens when multiple goroutines write to fields in the same cache line.
- It leads to cache thrashing and performance comparable to single-core execution.
- You can fix it with padding, reordering fields, or splitting structs entirely.
- Profiling tools like `perf` help expose the hidden cost.
🧵 TL;DR
- ⚠️ Structs with multiple fields can secretly share memory in ways that destroy performance.
- 🧪 Test, measure, and don’t blindly trust "references are faster than copies."
- 🚀 Knowing how CPUs work helps you write better Go.
What’s the sneakiest performance bug you’ve ever hit?
Let me know in the comments 👇