Author: Steve Tuohy - Director of Product Marketing
Discover how Criteo scaled to 290M queries-per-second, swapped Couchbase + Memcached for Aerospike, cut servers by 78%, and kept sub-ms latency.
Criteo serves more than 700 million users daily with personalized ads, processing billions of events in milliseconds. At its peak, Criteo’s infrastructure handles 290 million key-value queries per second (QPS).
Until recently, this real-time engine ran on a complex stack of Couchbase and Memcached, propped up by custom C clients and costly operational overhead. Performance during rebalancing was fragile, cache warming required manual traffic shaping, and RAM costs were high.
But when Criteo rebuilt its stack using Aerospike’s patented Hybrid Memory Architecture (HMA), the company consolidated two systems into one, simplified operations, and cut its server footprint by 78%, all while maintaining sub-millisecond latency at global scale.
Maintaining real-time access at AdTech scale
Criteo isn’t just serving banner ads. It runs real-time auctions on the open web, responding to 20 million bid requests per second. Each request requires dozens of micro-decisions: audience scoring, campaign pacing, frequency caps, and more.
"Our key-value storage system peaks at about 290 million queries per second, which is fairly large,” said Maxime Brugidou, vice president of engineering, Criteo.
All of that happens in under 100 ms. The stack is written in C, runs on premises across 40,000 servers and seven data centers, and is orchestrated with Kubernetes and Apache Mesos.
The demands on storage? Sub-millisecond latency, no downtime during rebalancing or upgrades, and a globally distributed footprint.
The legacy stack: Couchbase + Memcached + custom C logic
Before adopting Aerospike, Criteo’s real-time infrastructure relied on Couchbase as the persistent store, with Memcached as a caching layer in front. A custom-built C client orchestrated dual writes and replication between the two tiers and kept them consistent. “It was very difficult to make it perform well during rebalancing or maintenance,” Brugidou said. “It was quite unstable.”
Operationally, this design demanded careful tuning and manual rerouting of traffic. Any node failure or rebalance event degraded performance and required hands-on intervention.
The turning point: Aerospike’s Hybrid Memory Architecture (HMA)
Aerospike’s HMA changed the game. It decouples index storage from data storage:
- Indexes stay in RAM for fast access.
- Data lives on SSDs (NVMe drives, in Criteo's case).
This model gave Criteo Memcached-level low latency with the durability of disk and let the company collapse its architecture from two systems into one. “There was this nice design with the index being in memory and the data on disk,” Brugidou said. “It allowed for really good tradeoffs.”
By keeping indexes in memory and serving records from fast SSDs, Aerospike hit sub-millisecond reads without needing RAM to hold entire datasets. For Criteo, this not only simplified its system design but also saved money.
"Aerospike combined with NVMe disks… we had basically Memcached performance except that we were using persistent storage,” Brugidou said. “That was quite impressive…Aerospike was able to keep reading steadily at high throughput and very, very low latency despite all the mess we were putting on the servers."
Kubernetes-native deployment: From custom scripts to automated ops
Criteo replaced both Couchbase and Memcached with Aerospike. The company removed its custom client drivers, adopted Aerospike’s native C client, and rolled out a Kubernetes-native deployment on-prem with the Aerospike Kubernetes Operator.
Operational wins included:
- Automatic node recovery and rebalancing via Kubernetes
- Eliminating the custom C client, reducing complexity
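To give a feel for what replacing the custom client looks like, here is a minimal sketch of a point read using Aerospike's official C client. It is not Criteo's code; the host, namespace, set, and key are placeholders.

```c
#include <stdio.h>
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_config.h>
#include <aerospike/as_error.h>
#include <aerospike/as_key.h>
#include <aerospike/as_record.h>

int main(void) {
    // Point the client at a seed node (placeholder host and port).
    as_config config;
    as_config_init(&config);
    as_config_add_hosts(&config, "127.0.0.1", 3000);

    aerospike as;
    aerospike_init(&as, &config);

    as_error err;
    if (aerospike_connect(&as, &err) != AEROSPIKE_OK) {
        fprintf(stderr, "connect failed: %s\n", err.message);
        return 1;
    }

    // Read a single record by key. The client tracks the cluster's partition
    // map and handles node discovery and failover, so no custom routing or
    // dual-write logic is needed on top of it.
    as_key key;
    as_key_init_str(&key, "profiles", "users", "user-123");

    as_record* rec = NULL;
    if (aerospike_key_get(&as, &err, NULL, &key, &rec) == AEROSPIKE_OK) {
        printf("read %d bins\n", as_record_numbins(rec));
        as_record_destroy(rec);
    }

    aerospike_close(&as, &err);
    aerospike_destroy(&as);
    return 0;
}
```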
Multi-bin optimization: Collapsing data models for performance
Aerospike’s multi-bin optimization simplified things even more. Criteo merged multiple datasets into one namespace, which helped reduce index memory usage and improve access efficiency. “You can write to some bins very easily and read all the bins at once. You only pay for indexing once in memory. That was quite a significant win,” Brugidou said.
Multi-bin optimization also meant:
- Reduced index memory usage by avoiding duplication
- Faster lookups, because all bins could be read in one fetch
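As a rough sketch of the pattern Brugidou describes (write a few bins, then read every bin back in one request), here is what it could look like with the C client. The namespace, set, bin names, and helper function are hypothetical, and the example assumes an already-connected client handle.

```c
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_key.h>
#include <aerospike/as_record.h>

// Update two bins on a profile record, then read the whole record back.
// The record has a single index entry no matter how many bins it carries.
static as_status touch_profile(aerospike* as, as_error* err, const char* user_id) {
    as_key key;
    as_key_init_str(&key, "profiles", "users", user_id);

    // Write only the bins that changed; other bins on the record are untouched.
    as_record rec;
    as_record_inita(&rec, 2);
    as_record_set_int64(&rec, "freq_cap", 3);
    as_record_set_str(&rec, "segment", "sports");

    as_status status = aerospike_key_put(as, err, NULL, &key, &rec);
    as_record_destroy(&rec);
    if (status != AEROSPIKE_OK) {
        return status;
    }

    // One fetch returns every bin of the record (audience data, pacing
    // counters, frequency caps, and so on) without a second index lookup.
    as_record* out = NULL;
    status = aerospike_key_get(as, err, NULL, &key, &out);
    if (status == AEROSPIKE_OK) {
        as_record_destroy(out);
    }
    return status;
}
```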
Aerospike also made rebalancing and node failure recovery seamless for the team. “We do that transparently,” Brugidou said. “Aerospike is able to rebalance so easily… This is fully automated and very easy to do with the Kubernetes Operator.” This automation aligned with Criteo’s infrastructure-as-code strategy and reduced the need for human intervention during failure recovery.
Infrastructure impact: 78% server reduction and lower carbon footprint
By replacing Couchbase and Memcached with Aerospike, Criteo reduced server count from more than 3,000 to just 720. This meant:
- Lowered total RAM and disk footprint
- Reduced power consumption and cooling
Criteo also began running on 100% renewable energy in 2022, accomplished through the purchase of certificates and the relocation of data centers to more sustainable locations.
“We’re doing many more queries for less electricity in the end. Way less,” said Brugidou. This migration supports Criteo’s science-based climate target to reduce electricity consumption by 42% by 2030.
Lessons learned: Performance tuning, not over-engineering
Despite the demands on the new system, Criteo didn’t jump straight to exotic optimizations. Instead, it prioritized simplicity:
- Avoided pre-loading datasets in RAM
- Let Aerospike handle replication and rebalancing
- Tuned bin and namespace configs incrementally based on workload metrics
Brugidou emphasized that consistency and throughput remained stable even under stress tests, without resorting to low-level tuning.
Takeaway: Scaling real-time systems without scaling costs
By replacing a brittle, multi-tiered stack with Aerospike’s real-time engine, Criteo gained both technical efficiency and reduced costs. It consolidated two systems into one, improved performance, reduced latency, and slashed infrastructure overhead, without sacrificing resiliency or scale.
For teams building real-time systems with massive QPS and tight SLAs, Criteo’s experience shows what becomes possible with the right storage architecture.
With Aerospike, Criteo:
- Consolidated two systems into one
- Maintained sub-millisecond latency
- Reduced infrastructure costs and energy use
- Gained reliability through automation
Want to learn more? Check out the webinar replay or talk to our team about what real-time data infrastructure can do for you.