
Kelvin Floresta de Andrade


⚠️ False Sharing in Go — The Hidden Enemy in Your Concurrency

Do you know what false sharing is?

It’s a subtle issue that can ruin the performance of concurrent programs, especially when running on multi-core CPUs. You expect your goroutines to run in parallel and speed things up... but instead, you get almost the same performance as if you were using just a single core.


🧵 A Simple Example

Let’s take a basic case where two goroutines update different fields in the same struct:

```go
type Metrics struct {
	A int64
	B int64
}
```

And now we run them concurrently:

```go
go func() {
	for i := 0; i < N; i++ {
		m.A++
	}
}()

go func() {
	for i := 0; i < N; i++ {
		m.B++
	}
}()
```

Looks fine—different fields, right? But under the hood, A and B might live side-by-side in memory, potentially inside the same CPU cache line.

When that happens, each write operation from different cores triggers cache invalidation, even though they’re not touching the same variable. This causes the CPU to constantly sync memory between cores—losing the benefits of parallelism.
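You can check the layout yourself with the unsafe package. The sketch below (illustrative, not part of the benchmark) prints the field offsets: on a typical 64-bit platform A sits at offset 0 and B at offset 8, so both land comfortably inside one 64-byte cache line.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Metrics struct {
	A int64
	B int64
}

func main() {
	var m Metrics
	// On amd64 this prints: size=16 offset(A)=0 offset(B)=8,
	// so both fields fit inside a single 64-byte cache line.
	fmt.Printf("size=%d offset(A)=%d offset(B)=%d\n",
		unsafe.Sizeof(m), unsafe.Offsetof(m.A), unsafe.Offsetof(m.B))
}
```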


</> Code example

Below is code demonstrating the false sharing effect in action:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

const N = 1_000_000_000

type Metrics struct {
	A int64
	B int64
}

func main() {
	runtime.GOMAXPROCS(2)

	var wg sync.WaitGroup
	m := &Metrics{}

	start := time.Now()
	wg.Add(2)

	go func() {
		for i := 0; i < N; i++ {
			m.A++
		}
		wg.Done()
	}()

	go func() {
		for i := 0; i < N; i++ {
			m.B++
		}
		wg.Done()
	}()

	wg.Wait()
	fmt.Println("Duration:", time.Since(start))
}
```

👣 Step by step with values

Given the struct:

```go
type Data struct {
	A int64
	B int64
}
```

Step 1 — Core 1 loads the struct from memory

Core 1 says:

"Loading the struct from memory. Initial values: A = 100, B = 200."

```go
// Data in memory: { A: 100, B: 200 }
```

Under the hood:

  • Core 1 loads the cache line containing both A and B into its L1 cache.

Step 2 — Core 2 loads the struct from memory

Core 2 says:

"Loading the struct from memory. Current values: A = 100, B = 200."

```go
// Data in memory (via cache line): { A: 100, B: 200 }
```

Under the hood:

  • Core 2 loads the same cache line into its L1 cache.

Step 3 — Core 1 increments A

Core 1 says:

"Increment A by 10."

```go
// A = 110, B = 200  (updated values after the increment)
```

Under the hood:

  • Core 1 marks the cache line as modified.
  • Core 2’s cached copy is invalidated by the cache coherence protocol (e.g. MESI) when it detects the change.

Step 4 — Core 2 fetches the updated cache line

Core 2 says:

"My cached copy is stale. Fetching the latest cache line before updating B."

```go
// Cache line updated: { A: 110, B: 200 }
```

Under the hood:

  • Core 2 requests the latest cache line from Core 1.
  • Core 1 shares its modified cache line (A=110, B=200) with Core 2.
  • Core 2 loads this updated cache line into its cache.

Step 5 — Core 2 increments B

Core 2 says:

"Increment B by 15."

```go
// A = 110, B = 215  (updated values after the increment)
```

Under the hood:

  • Core 2 marks the cache line as modified.
  • Core 1’s cached copy is invalidated by the cache coherence protocol.

💡 Notice how, even though the cores work on different fields, they are forced to keep re-syncing the same cache line, causing unnecessary coherence traffic and lost performance.


✅ Fixing It with Padding

We can fix false sharing by padding the struct to separate the fields across cache lines:

```go
type PaddedMetrics struct {
	A int64
	_ [56]byte // padding to push B onto a different cache line (assuming a 64-byte cache line)
	B int64
}
```

Replace Metrics with PaddedMetrics, rerun the benchmark, and you’ll likely see a huge performance improvement.
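If you would rather not hard-code the 56 bytes, golang.org/x/sys/cpu exposes a CacheLinePad type sized for the target architecture. A minimal sketch, assuming you are fine with pulling in that dependency:

```go
import "golang.org/x/sys/cpu"

// Same idea as PaddedMetrics, but the padding size is chosen by x/sys/cpu
// for the architecture you compile for.
type PaddedMetrics struct {
	A int64
	_ cpu.CacheLinePad
	B int64
}
```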


🔬 Measuring the Problem with go test

Create a file named metrics_test.go with the content below. Put it in its own directory (or remove the earlier main.go), since both files declare N and Metrics:

```go
package main

import (
	"sync"
	"testing"
)

const N = 1_000_000_000

type Metrics struct {
	A int64
	B int64
}

type MetricsWithPadding struct {
	A int64
	_ [56]byte
	B int64
}

func BenchmarkMetrics(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var wg sync.WaitGroup
		m := &Metrics{}
		wg.Add(2)

		go func() {
			for i := 0; i < N; i++ {
				m.A++
			}
			wg.Done()
		}()

		go func() {
			for i := 0; i < N; i++ {
				m.B++
			}
			wg.Done()
		}()

		wg.Wait()
	}
}

func BenchmarkMetricsWithPadding(b *testing.B) {
	for n := 0; n < b.N; n++ {
		var wg sync.WaitGroup
		m := &MetricsWithPadding{}
		wg.Add(2)

		go func() {
			for i := 0; i < N; i++ {
				m.A++
			}
			wg.Done()
		}()

		go func() {
			for i := 0; i < N; i++ {
				m.B++
			}
			wg.Done()
		}()

		wg.Wait()
	}
}
```

Run go test in your terminal with the -bench flag:

```bash
go test -bench=.
```
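With N set to one billion, a full run can take a while; go test's standard flags help keep the comparison quick. For example, -benchtime=1x runs each benchmark body exactly once and -cpu=2 matches the two-core setup from the earlier main program:

```bash
go test -bench=. -benchtime=1x -cpu=2
```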

🔬 Measuring the Problem with perf

You can use Linux's perf tool to confirm false sharing is happening:

```bash
go build -o false-sharing
perf stat -e cache-references,cache-misses ./false-sharing
```

Compare the output before and after padding.

You’ll probably see a significant drop in cache-misses with the padded struct:

```
Before padding:
  1,230,000,000  cache-references
    980,000,000  cache-misses

After padding:
    800,000,000  cache-references
    150,000,000  cache-misses
```

💡 Lower cache misses = better core-level coordination and real parallelism.
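If your perf build and CPU support it, perf c2c (cache-to-cache) goes a step further than perf stat: it samples loads and stores and reports which cache lines are bouncing between cores, offset by offset. A quick sketch with the same binary:

```bash
perf c2c record ./false-sharing
perf c2c report
```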


🎮 Where I Learned This: The HP-MP Story

I ran into this issue while working on a side project—building a small MMORPG in Go (just for fun and to push the language's limits).

My Player struct looked like this:

```go
type Player struct {
	HP int64
	MP int64
}
```

One goroutine handled combat and decreased HP.
Another handled mana usage and decreased MP.

I expected everything to run smoothly, but profiling showed almost no performance gain from using multiple cores. After digging deeper, I realized that HP and MP were sharing the same cache line—and that’s how I discovered false sharing in practice.
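Applying the same padding trick to the Player struct would look roughly like this (an illustrative sketch, not the actual project code):

```go
type Player struct {
	HP int64
	_  [56]byte // keep HP and MP on separate cache lines (assuming 64-byte lines)
	MP int64
}
```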


🧠 Takeaways

  • False sharing happens when multiple goroutines write to fields in the same cache line.
  • It leads to cache thrashing and performance comparable to single-core execution.
  • You can fix it with padding, reordering fields, or splitting structs entirely (see the sketch after this list).
  • Profiling tools like perf help expose the hidden cost.
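As a rough sketch of those last two options (hypothetical types, not from the article): reordering groups the fields written by the same goroutine, and splitting gives each writer its own separately allocated struct.

```go
// Reordering: keep fields written by the same goroutine together, padded
// so that the two groups do not share a cache line (64 bytes assumed).
type Stats struct {
	Hits  int64 // written by the combat goroutine
	Crits int64 // written by the combat goroutine
	_     [48]byte
	Drops int64 // written by the loot goroutine
	Gold  int64 // written by the loot goroutine
}

// Splitting: two independent structs, allocated separately, so each writer
// owns its own memory (tiny allocations can still end up adjacent, so
// padding may still be worth keeping).
type CombatStats struct{ Hits, Crits int64 }
type LootStats struct{ Drops, Gold int64 }
```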

🧵 TL;DR

  • ⚠️ Structs with multiple fields can secretly share memory in ways that destroy performance.
  • 🧪 Test, measure, and don’t blindly trust "references are faster than copies."
  • 🚀 Knowing how CPUs work helps you write better Go.

What’s the sneakiest performance bug you’ve ever hit?
Let me know in the comments 👇

Top comments (4)

Raphael Martin

Very cool man! I'm wondering if that would happen in Swift 🤔. Maybe, using actors in Swift instead of structs may solve this problem (I confess that with my limited golang knowledge I have no idea if something similar exists in this language). I'll probably test it out.

Kelvin Floresta de Andrade

Hey, thanks! That’s a great question. False sharing is a low-level CPU cache issue, so it can happen in any language that uses shared memory and concurrency — including Swift and Go.

In Go, we don’t have actors like Swift, but we do have goroutines and channels for concurrency. The problem arises when multiple goroutines access variables that live close together in memory (same cache line).

Using actors in Swift may help because they enforce isolation, reducing data races and possibly false sharing too.

Luciano Ayres

Nicely done! Keep it coming!

Kelvin Floresta de Andrade

Thanks!