Go makes it easy to get started with concurrency. You spin up a few goroutines, wait on a sync.WaitGroup, and you're off to the races. The go keyword lowers the barrier to entry so much that it's tempting to assume concurrency in Go is always this straightforward.
But is it?
At WunderGraph, we're building a GraphQL Router in Go. Its job is to take an incoming query, break it into smaller sub-requests, send them off to the appropriate backend services, then stitch the results together into a single response. For performance reasons, those sub-requests need to run in parallel whenever possible.
That pushed us deeper into Go’s concurrency model than we initially expected.
This post shares what we’ve learned along the way—because while starting goroutines and adding them to a WaitGroup might work for simple cases, it doesn’t take much to hit deadlocks, missed cancellations, or performance bottlenecks if you’re not careful.
Understanding sync.WaitGroup in Go
Here’s a quick example of how sync.WaitGroup works:
```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	wg := &sync.WaitGroup{}

	wg.Add(1)
	go func() {
		defer wg.Done()
		fmt.Println("Hello, World!")
	}()

	wg.Wait()
}
```
If you remove the wg.Wait() line, the program will likely exit before the goroutine prints anything. WaitGroup lets you block until a group of goroutines finishes—but in this example, we’ve only used one. The power of WaitGroup really shows when managing multiple concurrent tasks.
To see how this scales, and where things can go wrong, let's dig a little deeper.
Parallel Execution with WaitGroup
Let’s simulate a more realistic scenario: fetching data for a product from two different services. In a federated GraphQL system, this might mean one subgraph resolves availability, while another provides pricing. The router needs both before it can respond.
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Product struct {
	ID           int
	Availability string
	Price        float64
}

func main() {
	wg := &sync.WaitGroup{}
	product := &Product{ID: 1}

	wg.Add(1)
	go func() {
		// Always defer Done() immediately inside the goroutine.
		defer wg.Done()

		// Simulate network call to a "Products" subgraph
		fmt.Println("Fetching availability for product 1...")
		// Pretend GraphQL query: query { product(id:1) { availability } }
		time.Sleep(100 * time.Millisecond) // Simulate network latency
		product.Availability = "In Stock"
		fmt.Println("Availability fetched.")
	}()

	wg.Add(1)
	go func() {
		defer wg.Done()

		fmt.Println("Fetching price for product 1...")
		time.Sleep(150 * time.Millisecond)
		product.Price = 29.99
		fmt.Println("Price fetched.")
	}()

	fmt.Println("Router is waiting for subgraph responses...")
	wg.Wait()

	fmt.Printf("Successfully fetched data for Product ID %d: %+v\n", product.ID, *product)
}
```
Here's a link to the Go Playground if you want to run the code yourself.
Now we’re using WaitGroup to coordinate multiple concurrent tasks—a lot closer to how a GraphQL router behaves in production. But this version still hides a few potential pitfalls.
Before we explore those, let’s take a look at what’s actually going on inside sync.WaitGroup.
How Does sync.WaitGroup Actually Work?
The WaitGroup implementation in Go is only about 130 lines long—but it’s packed with insight into how the Go runtime and scheduler handle concurrency.
Here’s a simplified look at the core struct:
```go
type WaitGroup struct {
	noCopy noCopy        // Prevents accidental copying
	state  atomic.Uint64 // High 32 bits: counter; low 32 bits: waiter count
	sema   uint32        // Used internally to block/wake goroutines
}
```
noCopy isn’t data—it’s a signal for the Go vet tool. If you copy a WaitGroup after using it, bad things can happen: goroutines might end up referencing different internal counters. vet helps prevent this.
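Here's a minimal, deliberately broken sketch of what that copy bug looks like. It's a hypothetical example, and the exact vet message may vary by Go version:

```go
package main

import "sync"

// worker takes the WaitGroup by value, so it operates on a copy.
// Done() decrements the copy's counter; the original never reaches zero.
func worker(wg sync.WaitGroup) {
	defer wg.Done()
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go worker(wg) // the copy happens here

	// Blocks forever: this WaitGroup's counter is still 1.
	// `go vet` flags this with a message along the lines of:
	//   "passes lock by value: sync.WaitGroup contains sync.noCopy"
	wg.Wait()
}
```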
state holds two values in one atomic.Uint64: the top 32 bits track how many Done() calls are still needed, and the bottom 32 track how many goroutines are blocked in Wait(). By combining them, Go avoids mutexes in most cases, making updates fast and thread-safe.
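Just as an illustration, here's that bit layout in plain Go. This is not the actual runtime code, only the packing scheme described above:

```go
package main

import "fmt"

func main() {
	// Two 32-bit values packed into one uint64, mirroring the layout
	// WaitGroup uses for its state field (illustration only).
	counter := uint64(2) // two Done() calls still outstanding
	waiters := uint64(1) // one goroutine blocked in Wait()
	state := counter<<32 | waiters

	fmt.Println("counter:", state>>32)     // high 32 bits -> 2
	fmt.Println("waiters:", uint32(state)) // low 32 bits  -> 1
}
```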
sema is a semaphore used by the runtime. If Wait() is called while the counter isn’t zero, the goroutine sleeps on this semaphore. When the last Done() is called and the counter hits zero, the runtime wakes up everyone waiting.
It's a tight, efficient design, but one that makes assumptions. Violate those assumptions, and bugs follow.
Lifecycle Summary
- Call Add(n) before launching goroutines.
- Inside each goroutine, call defer wg.Done() immediately.
- Use Wait() to block until all Done() calls have been made.
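Put together, the lifecycle looks something like this (a minimal sketch; the task names are made up):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	tasks := []string{"availability", "pricing", "reviews"} // hypothetical subgraphs

	wg := &sync.WaitGroup{}
	wg.Add(len(tasks)) // 1. register all goroutines before launching any
	for _, task := range tasks {
		task := task // capture for Go versions before 1.22
		go func() {
			defer wg.Done() // 2. always runs, even on an early return
			fmt.Println("resolving", task)
		}()
	}

	wg.Wait() // 3. block until the counter is back at zero
}
```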
So far, this looks pretty solid. But once you start handling panics, timeouts, or cancellations, cracks can start to show.
Let’s look at where things can go wrong.
Common Pitfalls with sync.WaitGroup in Go
It’s easy to launch goroutines in Go. But it’s just as easy to forget: every goroutine must end properly. When you’re using sync.WaitGroup, that becomes critical.
Deadlocks from Missing Done()
One of the most common mistakes is forgetting to call Done()—especially in error-handling paths. Here’s an example:
```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	wg := &sync.WaitGroup{}

	wg.Add(1)
	go func() {
		// PROBLEM: No defer
		req, err := http.NewRequest("GET", "https://api.example.com/data", nil)
		if err != nil {
			fmt.Println("Error creating request:", err)
			// wg.Done() is never called
			return
		}

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			fmt.Println("Error sending request:", err)
			// Done() is still missing
			return
		}
		defer resp.Body.Close()

		// Only call Done() on the happy path
		wg.Done() // <<< If errors happen, this line is skipped!
	}()

	wg.Wait() // If any error occurs, this blocks forever
}
```
If NewRequest or Do fails, the goroutine exits early—but never calls Done(). The WaitGroup counter stays above zero, and the main goroutine blocks indefinitely.
Try this in the Go Playground, and you'll eventually get this:
```
fatal error: all goroutines are asleep - deadlock!
```
The fix? Always defer wg.Done() at the start of your goroutine:
```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	wg := &sync.WaitGroup{}

	wg.Add(1) // Add before go
	go func() {
		defer wg.Done() // Done is always called, even on early return

		req, err := http.NewRequest("GET", "https://api.example.com/data", nil)
		if err != nil {
			fmt.Println("Error creating request:", err)
			return // Done() will still run thanks to defer
		}

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			fmt.Println("Error sending request:", err)
			return // Done() will still run
		}
		defer resp.Body.Close()

		// No longer need wg.Done() here
		fmt.Println("Request successful (in theory)!")
	}()

	wg.Wait()
	fmt.Println("Main goroutine finished waiting.")
}
```
Okay, deadlock avoided. But wait, there's more! What about timeouts and cancellation?
WaitGroup Doesn’t Handle Cancellation
Even with defer wg.Done() in place, there’s another issue: what if your goroutine gets stuck?
In our HTTP example, http.DefaultClient.Do might block forever if the server hangs or the network is slow. And because there’s no context.Context, there’s no way to cancel the request. That means your goroutine—and your WaitGroup—could still hang indefinitely.
You might see this workaround:
```go
done := make(chan struct{})
go func() {
	wg.Wait()
	close(done)
}()

select {
case <-done:
	fmt.Println("WaitGroup finished normally.")
case <-time.After(5 * time.Second): // Timeout after 5 seconds
	fmt.Println("Timeout waiting for WaitGroup!")
}
```
It works in the sense that the main goroutine avoids blocking forever—but it doesn’t solve the core issue. You’ve sidestepped the WaitGroup deadlock, but the HTTP request still runs. If you're doing this repeatedly in a server or long-running process, you’ll leak resources over time.
Here’s the full example to try out:
```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	wg := &sync.WaitGroup{}

	wg.Add(1)
	go func() {
		defer wg.Done()

		req, err := http.NewRequest("GET", "http://httpbin.org/delay/10", nil)
		if err != nil {
			fmt.Println("Error creating request:", err)
			return
		}

		fmt.Println("Sending request...")
		resp, err := http.DefaultClient.Do(req) // This might block forever
		if err != nil {
			fmt.Println("Error sending request:", err)
			return
		}
		defer resp.Body.Close()
		fmt.Println("Request finished.")
	}()

	// Wait for WaitGroup in a separate goroutine
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	// Timeout handling
	select {
	case <-done:
		fmt.Println("WaitGroup finished normally.")
	case <-time.After(5 * time.Second):
		fmt.Println("Timeout waiting for WaitGroup!")
	}

	// Give the background goroutine time to finish (if it ever does)
	time.Sleep(1 * time.Second)
	fmt.Println("Main function exiting.")
}
```
What Comes Next
So far, we’ve looked at how sync.WaitGroup works, where it helps, and where it falls short. It is great for waiting on multiple goroutines, but it does not help when you need to cancel one that gets stuck.
In real-world code, especially with blocking operations like HTTP requests, this becomes a serious problem. If you do not use context.Context, a slow or unresponsive service can leave your goroutine hanging, even after your program has tried to move on.
That is where the real solution begins.
In the original post, we go deeper into using context.WithTimeout to safely cancel requests, and how to use errgroup.Group to manage multiple tasks with coordinated failure handling. This pattern has been essential in our work at WunderGraph, where performance and parallelism are core to the design.
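To give a taste, here's a minimal sketch of that pattern, assuming the golang.org/x/sync/errgroup package and placeholder httpbin URLs; the original post goes into much more depth:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	// Every request shares a 2-second budget via the context.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	g, ctx := errgroup.WithContext(ctx)

	urls := []string{
		"https://httpbin.org/delay/1",
		"https://httpbin.org/delay/10", // too slow: the context cancels it
	}
	for _, url := range urls {
		url := url // capture for Go versions before 1.22
		g.Go(func() error {
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
			if err != nil {
				return err
			}
			resp, err := http.DefaultClient.Do(req) // aborts when ctx expires
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			fmt.Println("fetched:", url, resp.Status)
			return nil
		})
	}

	// Wait returns the first error; the shared context cancels the rest.
	if err := g.Wait(); err != nil {
		fmt.Println("failed:", err)
	}
}
```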
If you are working on similar infrastructure, or just want to write safer concurrent code in Go, I hope this post gave you a useful starting point—and a clearer sense of when WaitGroup is enough, and when it is time to use something more.