While finalizing the design and implementation of MeridianDB’s core innovation (a state-of-the-art, out-of-the-box AI retrieval system), I ran into a common backend challenge: write-on-read.
This is one of the most frequently encountered problems in software engineering. It touches on one of the most critical aspects of system design: trade-offs.
What is Write-on-Read?
Write-on-read is a pattern where a system must update the database whenever a read occurs.
Most engineers implement this as a simple read query followed by a write query. While this works in basic cases, it has a serious flaw:
It doesn’t follow any explicit consistency model.
And if you care about predictability, scalability, and correctness, you need to decide which consistency model your system will follow.
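For illustration, the naive version looks roughly like this. This is a minimal sketch; the `db` client, table, and column names are hypothetical stand-ins, not MeridianDB's actual storage layer:

```javascript
// Naive write-on-read: every read triggers a synchronous write.
// `db` is a hypothetical promise-based SQL client, used only for illustration.
async function getMemory(db, memoryId) {
  // 1. Read the record the caller asked for.
  const { rows } = await db.query("SELECT * FROM memories WHERE id = $1", [memoryId]);

  // 2. Immediately write back updated access metadata.
  //    This second round-trip sits on the critical path of every read.
  await db.query(
    "UPDATE memories SET access_count = access_count + 1, last_accessed = NOW() WHERE id = $1",
    [memoryId]
  );

  return rows[0];
}
```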
Choosing a Consistency Model
Consistency models define how up-to-date your reads will be relative to writes. There are several, but the two most common are:
- Strong Consistency – Every read reflects the most recent write. This is perfect for systems where correctness is critical (e.g., banking, inventory).
- Eventual Consistency – Reads may be slightly stale, but all replicas converge to the same state eventually. This is ideal for systems prioritizing low latency and high scalability (e.g., social feeds, analytics counters).
MeridianDB’s architecture is built on eventual consistency, optimizing for low-latency retrieval and horizontal scalability.
The Challenge: Temporal Relevance
One of the key features of our retrieval system is temporal relevance — how "fresh" or "important" a memory is.
This is determined by:
- Access frequency – How often a memory is accessed.
- Last accessed time – When it was last retrieved.
- Recency score – A computed score that decays over time.
We calculate temporal relevance using Exponential Decay with a Frequency Boost:
```javascript
function calculateRecencyScore(accessCount, lastAccessed, baseScore = 100) {
  const now = Date.now();
  const hoursSinceAccess = (now - lastAccessed) / (1000 * 60 * 60);

  // Exponential decay based on time (more recent = higher score).
  // 72-hour decay constant: the time component falls to ~37% after 72 hours.
  const timeDecay = Math.exp(-hoursSinceAccess / 72);

  // Frequency boost (logarithmic to avoid domination by very frequent accesses)
  const frequencyBoost = Math.log10(accessCount + 1);

  // Combine components
  const rawScore = baseScore * (0.7 * timeDecay + 0.3 * (frequencyBoost / 3));

  // Normalize to 0-100 scale
  return Math.min(100, Math.max(0, rawScore));
}
```
Example output, holding the access count constant:
```
1 day ago  -> 53.16
2 days ago -> 38.94
3 days ago -> 28.76
```
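For reference, numbers in this range can be reproduced by calling the function with timestamps one to three days in the past (assuming a constant access count of 1; the exact decimals depend on rounding):

```javascript
const HOUR = 1000 * 60 * 60;
const now = Date.now();

// Assumes accessCount = 1 for every call.
for (const days of [1, 2, 3]) {
  const lastAccessed = now - days * 24 * HOUR;
  console.log(`${days} day(s) ago -> ${calculateRecencyScore(1, lastAccessed).toFixed(2)}`);
}
```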
This ensures that memories decay over time if the AI agent doesn’t access them — allowing us to:
- Give administrators insight into why certain data is no longer being used.
- Filter out irrelevant memories from retrieval, improving output quality and reducing noise.
This also keeps our storage footprint efficient and ensures higher-quality data input, which leads to better results downstream.
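As a rough sketch of how that filtering could look in the retrieval path (the cutoff value and record shape are illustrative assumptions, not MeridianDB's actual API):

```javascript
// Drop memories whose recency score has decayed below a cutoff.
// The 20-point default and the { accessCount, lastAccessed } record shape
// are illustrative assumptions, not MeridianDB's actual schema.
function filterStaleMemories(candidates, minRecency = 20) {
  return candidates.filter(
    (m) => calculateRecencyScore(m.accessCount, m.lastAccessed) >= minRecency
  );
}
```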
The Write-on-Read Problem
Here’s the challenge: the three fields `recency`, `accessFrequency`, and `lastAccessed` must be updated every time the retrieval endpoint is hit.
But retrieval latency must remain extremely low. If we used strong consistency with a read-then-write transaction, we’d introduce significant latency overhead — unacceptable for a real-time AI retrieval system.
Our Solution: Eventual Consistency + Queues
The key insight was recognizing that temporal updates are not financial transactions — they don’t require strict guarantees.
We can safely update them eventually.
Here’s our approach:
- Enqueue Updates: Every retrieval request pushes temporal updates into a queue (one job per memory record in the `topK` results from semantic search).
- Batch Processing: A background scheduler periodically processes these queued jobs in batches. The scheduling interval is configurable by the user (see the sketch after this list).
- Eventual Convergence: Over time, all temporal fields (`recency`, `accessFrequency`, `lastAccessed`) converge to the correct state, satisfying our consistency requirements without impacting retrieval latency.
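A minimal sketch of this flow, assuming an in-memory queue and a simple interval-based scheduler (a real deployment would use a durable queue; the names and defaults here are illustrative):

```javascript
// In-memory stand-in for a durable job queue.
const updateQueue = [];

// Called from the retrieval endpoint: one job per memory in the topK results.
// The request returns immediately; no database write happens on the read path.
function enqueueTemporalUpdates(topKResults) {
  const accessedAt = Date.now();
  for (const memory of topKResults) {
    updateQueue.push({ memoryId: memory.id, accessedAt });
  }
}

// Background scheduler: drains the queue in batches at a configurable interval.
// `applyBatch` is assumed to persist the increments (accessFrequency,
// lastAccessed) and recompute recency; its implementation is storage-specific.
function startTemporalScheduler(applyBatch, intervalMs = 60_000, batchSize = 500) {
  return setInterval(async () => {
    while (updateQueue.length > 0) {
      const batch = updateQueue.splice(0, batchSize);
      await applyBatch(batch);
    }
  }, intervalMs);
}
```

Because each job carries its own access timestamp, a batch can be replayed later and still converge on the correct `accessFrequency`, `lastAccessed`, and `recency` values.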
The CAP Theorem Trade-off
This approach reflects a deliberate CAP theorem trade-off:
- Consistency: Eventual (but good enough for temporal relevance).
- Availability: Guaranteed, since reads never block on writes.
- Partition Tolerance: Preserved, since updates are queued and replayed.
By trading strict consistency for eventual consistency, we guarantee fast, low-latency retrievals while keeping temporal data accurate enough for decision-making.
Key Takeaways
- Write-on-read must follow a consistency model — don’t just read and write naively.
- Not all data needs strong consistency. Temporal metadata can be updated eventually.
- Queues and batching are powerful tools for preserving performance in read-heavy systems.
- Design trade-offs matter: in MeridianDB, we chose low latency and scalability over strict consistency, and it paid off.