TL;DR
4 Cache Disasters & Go Solutions:
- Thundering Herd: Random TTL jitter prevents mass expiration
- Cache Penetration: Cache "null" results for non-existent data
- Cache Breakdown: Never expire hot keys, use background refresh
- Cache Crash: Circuit breakers + rate limiting for graceful degradation
Golden Rule: Your cache strategy must work when caches fail, not just when they succeed.
Table of Contents
- Thundering Herd Problem: When Cache Misses Attack in Waves
- Cache Penetration: The Non-Existent Key Problem
- Cache Breakdown: When Hot Keys Expire
- Cache Crash: Building Resilient Systems
- Best Practices for Robust Caching
- Conclusion
Caching is one of the most powerful tools in a developer's arsenal for improving application performance. By storing frequently accessed data in fast, temporary storage, we can dramatically reduce response times and database load. However, caching isn't a silver bullet – when implemented incorrectly, it can create serious performance bottlenecks and system failures.
In this post, we'll explore four critical caching problems that can bring your system to its knees, along with practical solutions to prevent them.
1. Thundering Herd Problem: When Cache Misses Attack in Waves
The Problem
The thundering herd problem occurs when a large number of cache keys expire simultaneously, causing a massive wave of concurrent requests to hit your database all at once. Imagine your Redis cache contains thousands of user session keys that were all created during peak traffic hours and set with the same TTL (Time To Live).
When these keys expire simultaneously:
- Multiple application instances detect cache misses
- All instances simultaneously query the database for the same data
- Database gets overwhelmed with concurrent queries
- System performance degrades significantly
The Solution
Set Random Expiry Times: Instead of using fixed TTL values, add randomization to prevent synchronized expiration:
```go
package main

import (
	"context"
	"math/rand"
	"time"

	"github.com/go-redis/redis/v8"
)

type CacheService struct {
	client *redis.Client
}

func NewCacheService() *CacheService {
	rdb := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})
	return &CacheService{client: rdb}
}

func (c *CacheService) SetCacheWithJitter(ctx context.Context, key, value string, baseTTL time.Duration) error {
	// Add 0-20% jitter to the base TTL so keys written together expire apart
	jitterPercent := rand.Float64() * 0.2
	jitter := time.Duration(float64(baseTTL) * jitterPercent)
	actualTTL := baseTTL + jitter

	return c.client.Set(ctx, key, value, actualTTL).Err()
}
```
This simple technique spreads cache expiration over time, preventing the thundering herd effect.
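For instance, a session write using this helper might look like the following sketch, building on the CacheService above (the `session:` key scheme and one-hour base TTL are illustrative assumptions, and `fmt` would need to be added to that file's imports):

```go
// storeSession caches a session payload with a jittered TTL.
// With a 1h base TTL and 0-20% jitter, the effective TTL lands
// anywhere in [1h, 1h12m), so sessions created in the same burst
// expire at different times instead of all at once.
func storeSession(ctx context.Context, c *CacheService, sessionID, payload string) error {
	key := fmt.Sprintf("session:%s", sessionID) // hypothetical key scheme
	return c.SetCacheWithJitter(ctx, key, payload, time.Hour)
}
```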
2. Cache Penetration: The Non-Existent Key Problem
The Problem
Cache penetration happens when your application repeatedly requests data that doesn't exist in either the cache or the database. This creates a perfect storm:
- Application checks cache → miss
- Application queries database → no results
- No data gets cached (because it doesn't exist)
- Process repeats for every request
Malicious users can exploit this by repeatedly requesting non-existent resources, effectively bypassing your cache layer and hammering your database directly.
The Solution
Cache Empty Results and Use Bloom Filters:
First, cache null/empty results for a short period:
```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

type UserService struct {
	cache *redis.Client
	db    Database // Assume this interface exists
}

func (s *UserService) GetUserData(ctx context.Context, userID int) (*User, error) {
	key := fmt.Sprintf("user:%d", userID)

	// Check cache first
	cachedData, err := s.cache.Get(ctx, key).Result()
	if err == nil {
		if cachedData == "null" {
			return nil, nil // Cached miss
		}
		var user User
		if err := json.Unmarshal([]byte(cachedData), &user); err == nil {
			return &user, nil
		}
	}

	// Query database
	user, err := s.db.GetUser(ctx, userID)
	if err != nil {
		return nil, err
	}

	// Cache the result (even if nil)
	if user != nil {
		userData, _ := json.Marshal(user)
		s.cache.Set(ctx, key, string(userData), time.Hour)
	} else {
		// Cache the miss for 5 minutes so repeated lookups skip the database
		s.cache.Set(ctx, key, "null", 5*time.Minute)
	}

	return user, nil
}
```
Second, implement a Bloom filter to quickly identify keys that definitely don't exist, before they ever reach the database:
```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"
	"time"

	"github.com/go-redis/redis/v8"
)

// Simple Bloom filter implementation
type BloomFilter struct {
	bitArray      []bool
	size          uint
	hashFunctions int
}

func NewBloomFilter(size uint, hashFunctions int) *BloomFilter {
	return &BloomFilter{
		bitArray:      make([]bool, size),
		size:          size,
		hashFunctions: hashFunctions,
	}
}

func (bf *BloomFilter) Add(item string) {
	for i := 0; i < bf.hashFunctions; i++ {
		hash := bf.hash(item, uint(i)) % bf.size
		bf.bitArray[hash] = true
	}
}

func (bf *BloomFilter) MightContain(item string) bool {
	for i := 0; i < bf.hashFunctions; i++ {
		hash := bf.hash(item, uint(i)) % bf.size
		if !bf.bitArray[hash] {
			return false
		}
	}
	return true
}

func (bf *BloomFilter) hash(item string, seed uint) uint {
	h := fnv.New32a()
	h.Write([]byte(item))
	h.Write([]byte{byte(seed)})
	return uint(h.Sum32())
}

// Enhanced UserService with Bloom filter (User and Database as above)
type EnhancedUserService struct {
	cache       *redis.Client
	db          Database
	bloomFilter *BloomFilter
}

func (s *EnhancedUserService) GetUserData(ctx context.Context, userID int) (*User, error) {
	key := fmt.Sprintf("user:%d", userID)

	// Check the Bloom filter first
	if !s.bloomFilter.MightContain(key) {
		// Definitely doesn't exist, cache the miss
		s.cache.Set(ctx, key, "null", 5*time.Minute)
		return nil, nil
	}

	// Continue with the normal cache/database flow from the previous example
	return s.getUserDataNormal(ctx, userID)
}
```
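One open question is how the filter gets populated. A minimal sketch, assuming a hypothetical ListUserIDs method on the Database interface, is to load all existing user keys at startup:

```go
// NewEnhancedUserService warms the Bloom filter at startup so lookups
// for real users pass the filter. ListUserIDs is a hypothetical method
// assumed here for illustration.
func NewEnhancedUserService(ctx context.Context, cache *redis.Client, db Database) (*EnhancedUserService, error) {
	ids, err := db.ListUserIDs(ctx) // hypothetical: returns all known user IDs
	if err != nil {
		return nil, err
	}

	// Size the filter generously relative to the key count to keep the
	// false-positive rate low; ~10 bits per key with 3 hashes is a rough choice.
	bf := NewBloomFilter(uint(len(ids)*10+1), 3)
	for _, id := range ids {
		bf.Add(fmt.Sprintf("user:%d", id))
	}

	return &EnhancedUserService{cache: cache, db: db, bloomFilter: bf}, nil
}
```

Note that this simple filter has no deletion support, so it suits append-mostly key spaces; newly created users must be Add-ed to the filter as they appear.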
3. Cache Breakdown: When Hot Keys Expire
The Problem
Cache breakdown occurs when a highly accessed "hot key" expires, causing a sudden surge of requests to the database for that specific piece of data. Unlike the thundering herd problem (which affects many keys at once), cache breakdown concerns a single, critical piece of data.
Consider a popular product page on an e-commerce site. When its cache entry expires:
- Hundreds of concurrent users request the same product
- All requests miss the cache
- Database gets bombarded with identical queries
- System performance suffers until the cache is repopulated
The Solution
Never Set Expiry for Hot Keys: For critical, frequently accessed data, consider these strategies:
No expiration with manual invalidation:
```go
package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/go-redis/redis/v8"
)

type HotDataService struct {
	cache *redis.Client
	db    Database
}

func (s *HotDataService) UpdateHotData(ctx context.Context, key string, newValue interface{}) error {
	// Update the database first
	if err := s.db.Update(ctx, key, newValue); err != nil {
		return err
	}

	// Then update the cache without expiry
	data, err := json.Marshal(newValue)
	if err != nil {
		return err
	}

	hotKey := fmt.Sprintf("hot:%s", key)
	return s.cache.Set(ctx, hotKey, string(data), 0).Err() // 0 = no expiration
}
```
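The read path is then a plain Get with a database fallback. A minimal sketch, reusing HotDataService and the `hot:` key scheme above (and assuming the same Database.Get method used later in this section):

```go
// GetHotData reads a never-expiring hot key and returns its JSON payload.
// Even "no expiry" keys can vanish under memory pressure or a flush, so
// a miss falls back to the source of truth and re-primes the cache.
func (s *HotDataService) GetHotData(ctx context.Context, key string) (string, error) {
	hotKey := fmt.Sprintf("hot:%s", key)

	val, err := s.cache.Get(ctx, hotKey).Result()
	if err == nil {
		return val, nil
	}
	if err != redis.Nil {
		return "", err // real cache error, not a miss
	}

	// Miss: repopulate from the database
	data, err := s.db.Get(ctx, key)
	if err != nil {
		return "", err
	}
	jsonData, err := json.Marshal(data)
	if err != nil {
		return "", err
	}
	if err := s.cache.Set(ctx, hotKey, string(jsonData), 0).Err(); err != nil {
		return "", err
	}
	return string(jsonData), nil
}
```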
Background refresh before expiration:
```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

type BackgroundRefreshService struct {
	cache *redis.Client
	db    Database
}

func (s *BackgroundRefreshService) GetHotDataWithRefresh(ctx context.Context, key string) (interface{}, error) {
	hotKey := fmt.Sprintf("hot:%s", key)
	refreshKey := fmt.Sprintf("refresh:%s", key)

	// Check if data exists in cache
	cachedData, err := s.cache.Get(ctx, hotKey).Result()
	if err == nil {
		// Check if a refresh is needed (before the actual expiry)
		lastRefresh, err := s.cache.Get(ctx, refreshKey).Result()
		if err != nil || s.shouldRefresh(lastRefresh) {
			// Trigger a background refresh
			go s.refreshHotKey(context.Background(), key)
		}

		var result interface{}
		json.Unmarshal([]byte(cachedData), &result)
		return result, nil
	}

	// Fall back to the database if the cache entry is truly missing
	return s.getFromDatabaseAndCache(ctx, key)
}

func (s *BackgroundRefreshService) shouldRefresh(lastRefreshStr string) bool {
	if lastRefreshStr == "" {
		return true
	}
	lastRefresh, err := time.Parse(time.RFC3339, lastRefreshStr)
	if err != nil {
		return true
	}
	// Refresh 10 minutes before the 1-hour TTL runs out
	return time.Since(lastRefresh) > 50*time.Minute
}

func (s *BackgroundRefreshService) refreshHotKey(ctx context.Context, key string) {
	// Reload from the database and re-prime both the data and refresh markers
	data, err := s.db.Get(ctx, key)
	if err != nil {
		return
	}

	jsonData, _ := json.Marshal(data)
	hotKey := fmt.Sprintf("hot:%s", key)
	refreshKey := fmt.Sprintf("refresh:%s", key)

	s.cache.Set(ctx, hotKey, string(jsonData), time.Hour)
	s.cache.Set(ctx, refreshKey, time.Now().Format(time.RFC3339), time.Hour)
}

func (s *BackgroundRefreshService) getFromDatabaseAndCache(ctx context.Context, key string) (interface{}, error) {
	// Fallback: load from the database and cache the result
	data, err := s.db.Get(ctx, key)
	if err != nil {
		return nil, err
	}

	jsonData, _ := json.Marshal(data)
	hotKey := fmt.Sprintf("hot:%s", key)
	s.cache.Set(ctx, hotKey, string(jsonData), time.Hour)

	return data, nil
}
```
4. Cache Crash: Building Resilient Systems
The Problem
Cache crash is perhaps the most catastrophic scenario – your entire cache system (Redis cluster, Memcached, etc.) becomes unavailable. When this happens:
- All cache requests fail
- Traffic redirects entirely to your database
- Database becomes overwhelmed and may crash
- Cascading failures spread throughout your system
The Solution
Implement Circuit Breakers and Highly Available Cache Clusters:
Circuit Breaker Pattern:
```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"github.com/go-redis/redis/v8"
)

type CircuitState int

const (
	StateClosed CircuitState = iota
	StateOpen
	StateHalfOpen
)

type CacheCircuitBreaker struct {
	client           *redis.Client
	failureThreshold int
	timeout          time.Duration
	failureCount     int
	lastFailureTime  time.Time
	state            CircuitState
	mutex            sync.RWMutex
}

func NewCacheCircuitBreaker(client *redis.Client) *CacheCircuitBreaker {
	return &CacheCircuitBreaker{
		client:           client,
		failureThreshold: 5,
		timeout:          60 * time.Second,
		state:            StateClosed,
	}
}

func (cb *CacheCircuitBreaker) Get(ctx context.Context, key string) (string, error) {
	cb.mutex.RLock()
	state := cb.state
	cb.mutex.RUnlock()

	if state == StateOpen {
		cb.mutex.RLock()
		timeSinceLastFailure := time.Since(cb.lastFailureTime)
		cb.mutex.RUnlock()

		if timeSinceLastFailure > cb.timeout {
			// Allow a trial request through
			cb.mutex.Lock()
			cb.state = StateHalfOpen
			cb.mutex.Unlock()
			state = StateHalfOpen
		} else {
			return "", fmt.Errorf("circuit breaker is open")
		}
	}

	result, err := cb.client.Get(ctx, key).Result()
	if err != nil {
		// redis.Nil is a cache miss, not an infrastructure failure
		if err != redis.Nil {
			cb.recordFailure()
		}
		return "", err
	}

	// A successful trial request closes the circuit again
	if state == StateHalfOpen {
		cb.reset()
	}

	return result, nil
}

func (cb *CacheCircuitBreaker) recordFailure() {
	cb.mutex.Lock()
	defer cb.mutex.Unlock()

	cb.failureCount++
	cb.lastFailureTime = time.Now()

	if cb.failureCount >= cb.failureThreshold {
		cb.state = StateOpen
	}
}

func (cb *CacheCircuitBreaker) reset() {
	cb.mutex.Lock()
	defer cb.mutex.Unlock()

	cb.state = StateClosed
	cb.failureCount = 0
}
```
Graceful Degradation:
```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

type FallbackService struct {
	circuitBreaker *CacheCircuitBreaker
	rateLimiter    *rate.Limiter
	db             Database
}

func NewFallbackService(cb *CacheCircuitBreaker, db Database) *FallbackService {
	// Rate limiter: 100 requests per second with a burst of 10
	limiter := rate.NewLimiter(rate.Limit(100), 10)

	return &FallbackService{
		circuitBreaker: cb,
		rateLimiter:    limiter,
		db:             db,
	}
}

func (s *FallbackService) GetDataWithFallback(ctx context.Context, key string) (interface{}, error) {
	// Try the cache through the circuit breaker
	cachedData, err := s.circuitBreaker.Get(ctx, key)
	if err == nil {
		var result interface{}
		if err := json.Unmarshal([]byte(cachedData), &result); err == nil {
			return result, nil
		}
	}

	// Fall back to the database, rate-limited so it can't be overwhelmed
	if !s.rateLimiter.Allow() {
		return nil, fmt.Errorf("rate limit exceeded for database fallback")
	}

	return s.getFromDatabaseWithRateLimit(ctx, key)
}

func (s *FallbackService) getFromDatabaseWithRateLimit(ctx context.Context, key string) (interface{}, error) {
	// Bound database calls with a timeout
	dbCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	return s.db.Get(dbCtx, key)
}
```
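Wiring these pieces together might look like the following sketch (the Redis address, key name, and the injected Database implementation are assumptions for illustration):

```go
// demo shows the read path end-to-end: cache reads flow through the
// circuit breaker while Redis is healthy; once the breaker opens,
// reads fall back to the rate-limited database path.
func demo(ctx context.Context, db Database) {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	cb := NewCacheCircuitBreaker(rdb)
	svc := NewFallbackService(cb, db)

	data, err := svc.GetDataWithFallback(ctx, "user:42") // hypothetical key
	if err != nil {
		log.Printf("read failed (cache down and fallback rate-limited?): %v", err)
		return
	}
	log.Printf("got: %v", data)
}
```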
Best Practices for Robust Caching
- Monitor Cache Hit Rates: Maintain visibility into your cache performance (see the sketch after this list)
- Implement Cache Warming: Pre-populate cache with critical data
- Use Multi-Level Caching: Combine local and distributed caches
- Plan for Cache Invalidation: Design clear strategies for updating stale data
- Load Test Cache Failure Scenarios: Regularly test how your system behaves when caches fail
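As an example of the first point, Redis exposes cumulative hit and miss counters in its INFO stats, from which a hit rate can be derived. A minimal sketch using go-redis:

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"strings"

	"github.com/go-redis/redis/v8"
)

// hitRate derives the server-wide cache hit rate from Redis INFO stats.
// keyspace_hits and keyspace_misses are cumulative since server start,
// so production monitoring should sample deltas over time instead.
func hitRate(ctx context.Context, client *redis.Client) (float64, error) {
	info, err := client.Info(ctx, "stats").Result()
	if err != nil {
		return 0, err
	}

	// Parse "name:value" lines; non-numeric lines are skipped.
	stats := map[string]float64{}
	for _, line := range strings.Split(info, "\r\n") {
		if parts := strings.SplitN(line, ":", 2); len(parts) == 2 {
			if v, err := strconv.ParseFloat(parts[1], 64); err == nil {
				stats[parts[0]] = v
			}
		}
	}

	total := stats["keyspace_hits"] + stats["keyspace_misses"]
	if total == 0 {
		return 0, fmt.Errorf("no keyspace stats recorded yet")
	}
	return stats["keyspace_hits"] / total, nil
}
```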
Conclusion
Caching is incredibly powerful, but these four problems – Thundering Herd, Cache Penetration, Cache Breakdown, and Cache Crash – can turn your performance optimization into a performance nightmare. By understanding these issues and implementing the solutions we've discussed, you can build more resilient, performant systems that gracefully handle cache failures.
Note: The best cache strategy is one that works well both when the cache is healthy and when it's not. Plan for failure, monitor your systems, and always have fallback mechanisms in place.
Have you encountered any of these caching problems in your systems? Share your experiences and solutions in the comments below.