Snapshot queries allow you to read data as it appeared at a single point in time in the recent past.
Starting in MongoDB 5.0, you can use read concern "snapshot" to query data on secondary nodes. This feature increases the versatility and resilience of your application's reads. You do not need to create a static copy of your data, move it into a separate system, and manually isolate long-running queries so they don't interfere with your operational workload. Instead, you can perform long-running queries against a live, transactional database while reading from a consistent state of the data.
Using read concern "snapshot" on secondary nodes does not impact your application's write workload. Isolating long-running queries to secondaries benefits only your application's reads.
Use snapshot queries when you want to:
Perform multiple related queries and ensure that each query reads data from the same point in time.
Ensure that you read from a consistent state of the data from some point in the past.
Comparing Local and Snapshot Read Concerns
When MongoDB performs long-running queries using the default "local" read concern, the query results may contain data from writes that occur at the same time as the query. As a result, the query may return unexpected or inconsistent results.
To avoid this scenario, create a session and specify read concern "snapshot". With read concern "snapshot", MongoDB runs your query with snapshot isolation, meaning that your query reads data as it appeared at a single point in time in the recent past.
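For example, here is a minimal PyMongo sketch of the difference (the default connection, the pets database, and its cats collection are assumptions for illustration, not part of the API):

from pymongo import MongoClient

client = MongoClient()  # assumes a MongoDB 5.0+ replica set
db = client.pets

# Default "local" read concern: two consecutive reads may observe
# different states of the data if writes commit between them.
before = db.cats.count_documents({"adoptable": True})
after = db.cats.count_documents({"adoptable": True})  # may differ from `before`

# Read concern "snapshot": every read in the session observes the same
# point-in-time state of the data.
with client.start_session(snapshot=True) as s:
    first = db.cats.count_documents({"adoptable": True}, session=s)
    second = db.cats.count_documents({"adoptable": True}, session=s)
    assert first == second  # both reads use the same snapshot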
Examples
The examples on this page show how you can use snapshot queries to:
Run related queries from the same point in time
Read from a consistent state of the data from some point in the past
Run Related Queries From the Same Point in Time
Read concern "snapshot" lets you run multiple related queries within a session and ensure that each query reads data from the same point in time.
An animal shelter has a pets database that contains collections for each type of pet. The pets database has these collections:
cats
dogs
Each document in each collection contains an adoptable field, indicating whether the pet is available for adoption. For example, a document in the cats collection looks like this:
{ "name": "Whiskers", "color": "white", "age": 10, "adoptable": true }
You want to run a query to see the total number of pets available for adoption across all collections. To provide a consistent view of the data, you want to ensure that the data returned from each collection is from a single point in time.
To accomplish this goal, use read concern "snapshot" within a session:
mongoc_client_session_t *cs = NULL;
mongoc_collection_t *cats_collection = NULL;
mongoc_collection_t *dogs_collection = NULL;
int64_t adoptable_pets_count = 0;
bson_error_t error;
mongoc_session_opt_t *session_opts;

cats_collection = mongoc_client_get_collection (client, "pets", "cats");
dogs_collection = mongoc_client_get_collection (client, "pets", "dogs");

/* Seed 'pets.cats' and 'pets.dogs' with example data */
if (!pet_setup (cats_collection, dogs_collection)) {
   goto cleanup;
}

/* start a snapshot session */
session_opts = mongoc_session_opts_new ();
mongoc_session_opts_set_snapshot (session_opts, true);

cs = mongoc_client_start_session (client, session_opts, &error);
mongoc_session_opts_destroy (session_opts);
if (!cs) {
   MONGOC_ERROR ("Could not start session: %s", error.message);
   goto cleanup;
}

/*
 * Perform the following aggregation pipeline, and accumulate the count in
 * `adoptable_pets_count`.
 *
 * adoptablePetsCount = db.cats.aggregate(
 *     [ { "$match": { "adoptable": true } },
 *       { "$count": "adoptableCatsCount" } ], session=s
 * ).next()["adoptableCatsCount"]
 *
 * adoptablePetsCount += db.dogs.aggregate(
 *     [ { "$match": { "adoptable": True} },
 *       { "$count": "adoptableDogsCount" } ], session=s
 * ).next()["adoptableDogsCount"]
 *
 * Remember in order to apply the client session to this operation, you
 * must append the client session to the options passed to
 * `mongoc_collection_aggregate`, i.e.,
 *
 * mongoc_client_session_append (cs, &opts, &error);
 * cursor = mongoc_collection_aggregate (
 *    collection, MONGOC_QUERY_NONE, pipeline, &opts, NULL);
 */
accumulate_adoptable_count (cs, cats_collection, &adoptable_pets_count);
accumulate_adoptable_count (cs, dogs_collection, &adoptable_pets_count);

printf ("there are %" PRId64 " adoptable pets\n", adoptable_pets_count);
using namespace mongocxx;
using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

auto db = client["pets"];
int64_t adoptable_pets_count = 0;

auto opts = mongocxx::options::client_session{};
opts.snapshot(true);
auto session = client.start_session(opts);

{
    pipeline p;
    p.match(make_document(kvp("adoptable", true))).count("adoptableCatsCount");
    auto cursor = db["cats"].aggregate(session, p);
    for (auto doc : cursor) {
        adoptable_pets_count += doc.find("adoptableCatsCount")->get_int32();
    }
}

{
    pipeline p;
    p.match(make_document(kvp("adoptable", true))).count("adoptableDogsCount");
    auto cursor = db["dogs"].aggregate(session, p);
    for (auto doc : cursor) {
        adoptable_pets_count += doc.find("adoptableDogsCount")->get_int32();
    }
}
ctx := context.TODO()

sess, err := client.StartSession(options.Session().SetSnapshot(true))
if err != nil {
    return err
}
defer sess.EndSession(ctx)

var adoptablePetsCount int32
err = mongo.WithSession(ctx, sess, func(ctx context.Context) error {
    // Count the adoptable cats
    const adoptableCatsOutput = "adoptableCatsCount"
    cursor, err := db.Collection("cats").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match", bson.D{{"adoptable", true}}}},
        bson.D{{"$count", adoptableCatsOutput}},
    })
    if err != nil {
        return err
    }
    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }
    resp := cursor.Current.Lookup(adoptableCatsOutput)
    adoptableCatsCount, ok := resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            adoptableCatsOutput, cursor.Current)
    }
    adoptablePetsCount += adoptableCatsCount

    // Count the adoptable dogs
    const adoptableDogsOutput = "adoptableDogsCount"
    cursor, err = db.Collection("dogs").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match", bson.D{{"adoptable", true}}}},
        bson.D{{"$count", adoptableDogsOutput}},
    })
    if err != nil {
        return err
    }
    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }
    resp = cursor.Current.Lookup(adoptableDogsOutput)
    adoptableDogsCount, ok := resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            adoptableDogsOutput, cursor.Current)
    }
    adoptablePetsCount += adoptableDogsCount

    return nil
})
if err != nil {
    return err
}
db = client.pets
async with await client.start_session(snapshot=True) as s:
    adoptablePetsCount = 0
    docs = await db.cats.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableCatsCount"}], session=s
    ).to_list(None)
    adoptablePetsCount = docs[0]["adoptableCatsCount"]

    docs = await db.dogs.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableDogsCount"}], session=s
    ).to_list(None)
    adoptablePetsCount += docs[0]["adoptableDogsCount"]

print(adoptablePetsCount)
$catsCollection = $client->selectCollection('pets', 'cats');
$dogsCollection = $client->selectCollection('pets', 'dogs');

$session = $client->startSession(['snapshot' => true]);

$adoptablePetsCount = $catsCollection->aggregate(
    [
        ['$match' => ['adoptable' => true]],
        ['$count' => 'adoptableCatsCount'],
    ],
    ['session' => $session],
)->toArray()[0]->adoptableCatsCount;

$adoptablePetsCount += $dogsCollection->aggregate(
    [
        ['$match' => ['adoptable' => true]],
        ['$count' => 'adoptableDogsCount'],
    ],
    ['session' => $session],
)->toArray()[0]->adoptableDogsCount;

var_dump($adoptablePetsCount);
db = client.pets
with client.start_session(snapshot=True) as s:
    adoptablePetsCount = db.cats.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableCatsCount"}],
        session=s,
    ).next()["adoptableCatsCount"]

    adoptablePetsCount += db.dogs.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableDogsCount"}],
        session=s,
    ).next()["adoptableDogsCount"]

print(adoptablePetsCount)
client = Mongo::Client.new(uri_string, database: "pets")

client.start_session(snapshot: true) do |session|
  adoptable_pets_count = client['cats'].aggregate([
    { "$match": { "adoptable": true } },
    { "$count": "adoptable_cats_count" }
  ], session: session).first["adoptable_cats_count"]

  adoptable_pets_count += client['dogs'].aggregate([
    { "$match": { "adoptable": true } },
    { "$count": "adoptable_dogs_count" }
  ], session: session).first["adoptable_dogs_count"]

  puts adoptable_pets_count
end
The preceding series of commands:
Uses MongoClient() to establish a connection to the MongoDB deployment.
Switches to the pets database.
Establishes a session. The command specifies snapshot=True, so the session uses read concern "snapshot".
Performs these actions for each collection in the pets database:
  Runs an aggregation that uses $match to filter for documents where adoptable is true and $count to count them.
  Adds the resulting count to the adoptablePetsCount variable.
Prints the adoptablePetsCount variable.
All queries within the session read data as it appeared at the same point in time. As a result, the final count reflects a consistent snapshot of the data.
Note
If the session lasts longer than the WiredTiger history retention period (300 seconds, by default), the query errors with a SnapshotTooOld error. To learn how to configure snapshot retention and enable longer-running queries, see Configure Snapshot Retention.
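In application code, this failure surfaces as a server error. The following is a hedged PyMongo sketch (it reuses the client and pets database from the example above; the error code 239 is assumed to correspond to SnapshotTooOld):

from pymongo.errors import OperationFailure

try:
    with client.start_session(snapshot=True) as s:
        # Long-running work; if the session outlives the WiredTiger history
        # window (300 seconds by default), subsequent reads fail.
        db.cats.aggregate([{"$count": "n"}], session=s).next()
except OperationFailure as exc:
    if exc.code == 239:  # SnapshotTooOld
        print("Snapshot expired; see Configure Snapshot Retention")
    else:
        raise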
Read from a Consistent State of the Data from Some Point in the Past
Read concern "snapshot" ensures that your query reads data as it appeared at a single point in time in the recent past.
An online shoe store has a sales collection that contains data for each item sold at the store. For example, a document in the sales collection looks like this:
{ "shoeType": "boot", "price": 30, "saleDate": ISODate("2022-02-02T06:01:17.171Z") }
Each day at midnight, a query runs to see how many pairs of shoes were sold that day. The daily sales query looks like this:
mongoc_client_session_t *cs = NULL;
mongoc_collection_t *sales_collection = NULL;
bson_error_t error;
mongoc_session_opt_t *session_opts;
bson_t *pipeline = NULL;
bson_t opts = BSON_INITIALIZER;
mongoc_cursor_t *cursor = NULL;
const bson_t *doc = NULL;
bool ok = true;
bson_iter_t iter;
int64_t total_sales = 0;

sales_collection = mongoc_client_get_collection (client, "retail", "sales");

/* seed 'retail.sales' with example data */
if (!retail_setup (sales_collection)) {
   goto cleanup;
}

/* start a snapshot session */
session_opts = mongoc_session_opts_new ();
mongoc_session_opts_set_snapshot (session_opts, true);

cs = mongoc_client_start_session (client, session_opts, &error);
mongoc_session_opts_destroy (session_opts);
if (!cs) {
   MONGOC_ERROR ("Could not start session: %s", error.message);
   goto cleanup;
}

if (!mongoc_client_session_append (cs, &opts, &error)) {
   MONGOC_ERROR ("could not apply session options: %s", error.message);
   goto cleanup;
}

pipeline = BCON_NEW ("pipeline",
                     "[",
                     "{", "$match", "{",
                        "$expr", "{",
                           "$gt", "[",
                              "$saleDate",
                              "{", "$dateSubtract", "{",
                                 "startDate", "$$NOW",
                                 "unit", BCON_UTF8 ("day"),
                                 "amount", BCON_INT64 (1),
                              "}", "}",
                           "]",
                        "}",
                     "}", "}",
                     "{", "$count", BCON_UTF8 ("totalDailySales"), "}",
                     "]");

cursor = mongoc_collection_aggregate (
   sales_collection, MONGOC_QUERY_NONE, pipeline, &opts, NULL);
bson_destroy (&opts);

ok = mongoc_cursor_next (cursor, &doc);
if (mongoc_cursor_error (cursor, &error)) {
   MONGOC_ERROR ("could not get totalDailySales: %s", error.message);
   goto cleanup;
}
if (!ok) {
   MONGOC_ERROR ("%s", "cursor has no results");
   goto cleanup;
}

ok = bson_iter_init_find (&iter, doc, "totalDailySales");
if (ok) {
   total_sales = bson_iter_as_int64 (&iter);
} else {
   MONGOC_ERROR ("%s", "missing key: 'totalDailySales'");
   goto cleanup;
}
ctx := context.TODO()

sess, err := client.StartSession(options.Session().SetSnapshot(true))
if err != nil {
    return err
}
defer sess.EndSession(ctx)

var totalDailySales int32
err = mongo.WithSession(ctx, sess, func(ctx context.Context) error {
    // Count the total daily sales
    const totalDailySalesOutput = "totalDailySales"
    cursor, err := db.Collection("sales").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match",
            bson.D{{"$expr",
                bson.D{{"$gt",
                    bson.A{"$saleDate",
                        bson.D{{"$dateSubtract",
                            bson.D{
                                {"startDate", "$$NOW"},
                                {"unit", "day"},
                                {"amount", 1},
                            },
                        }},
                    },
                }},
            }},
        }},
        bson.D{{"$count", totalDailySalesOutput}},
    })
    if err != nil {
        return err
    }
    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }
    resp := cursor.Current.Lookup(totalDailySalesOutput)

    var ok bool
    totalDailySales, ok = resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            totalDailySalesOutput, cursor.Current)
    }

    return nil
})
if err != nil {
    return err
}
db = client.retail
async with await client.start_session(snapshot=True) as s:
    docs = await db.sales.aggregate(
        [
            {
                "$match": {
                    "$expr": {
                        "$gt": [
                            "$saleDate",
                            {
                                "$dateSubtract": {
                                    "startDate": "$$NOW",
                                    "unit": "day",
                                    "amount": 1,
                                }
                            },
                        ]
                    }
                }
            },
            {"$count": "totalDailySales"},
        ],
        session=s,
    ).to_list(None)

total = docs[0]["totalDailySales"]
print(total)
$salesCollection = $client->selectCollection('retail', 'sales');

$session = $client->startSession(['snapshot' => true]);

$totalDailySales = $salesCollection->aggregate(
    [
        [
            '$match' => [
                '$expr' => [
                    '$gt' => ['$saleDate', [
                        '$dateSubtract' => [
                            'startDate' => '$$NOW',
                            'unit' => 'day',
                            'amount' => 1,
                        ],
                    ]],
                ],
            ],
        ],
        ['$count' => 'totalDailySales'],
    ],
    ['session' => $session],
)->toArray()[0]->totalDailySales;
db = client.retail
with client.start_session(snapshot=True) as s:
    _ = db.sales.aggregate(
        [
            {
                "$match": {
                    "$expr": {
                        "$gt": [
                            "$saleDate",
                            {
                                "$dateSubtract": {
                                    "startDate": "$$NOW",
                                    "unit": "day",
                                    "amount": 1,
                                }
                            },
                        ]
                    }
                }
            },
            {"$count": "totalDailySales"},
        ],
        session=s,
    ).next()["totalDailySales"]
The preceding query:
Uses $match with $expr to specify a filter on the saleDate field. $expr allows the use of aggregation expressions (such as $$NOW) in the $match stage.
Uses the $gt operator and the $dateSubtract expression to return documents where the saleDate is greater than one day before the time the query executes.
Uses $count to return a count of the matching documents. The count is stored in the totalDailySales variable.
Specifies read concern "snapshot" to ensure that the query reads from a single point in time.
The sales collection is quite large, and as a result this query may take a few minutes to run. Because the store is online, sales can occur at any time of day.
For example, consider this sequence of events:
The query begins executing at 12:00 AM.
A customer buys three pairs of shoes at 12:02 AM.
The query finishes executing at 12:04 AM.
If the query doesn't use read concern "snapshot", sales that occur between when the query starts and when it finishes can be included in the count, even though they did not occur on the day the report covers. This could result in inaccurate reports, with some sales counted twice: once in the current report and again in the next day's report.
By specifying read concern "snapshot", the query only returns data that was present in the database at a point in time shortly before the query started executing.
Note
If the query takes longer than the WiredTiger history retention period (300 seconds, by default), the query errors with a SnapshotTooOld error. To learn how to configure snapshot retention and enable longer-running queries, see Configure Snapshot Retention.
Configure Snapshot Retention
By default, the WiredTiger storage engine retains history for 300 seconds. You can use a session with snapshot=true for a total of 300 seconds from the time of the first operation in the session to the last. If you use the session for a longer period of time, the session fails with a SnapshotTooOld error. Similarly, if you query data using read concern "snapshot" and your query lasts longer than 300 seconds, the query fails.
If your query or session runs for longer than 300 seconds, consider increasing the snapshot retention period by modifying the minSnapshotHistoryWindowInSeconds parameter.
For example, this command sets the value of minSnapshotHistoryWindowInSeconds to 600 seconds:
db.adminCommand( { setParameter: 1, minSnapshotHistoryWindowInSeconds: 600 } )
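If you manage server parameters from a driver rather than the shell, the same change can be issued through PyMongo. This is a sketch that assumes a connected client whose user has privileges to run setParameter:

# Set the parameter, then read it back to confirm the change.
client.admin.command({"setParameter": 1, "minSnapshotHistoryWindowInSeconds": 600})

result = client.admin.command(
    {"getParameter": 1, "minSnapshotHistoryWindowInSeconds": 1}
)
print(result["minSnapshotHistoryWindowInSeconds"])  # 600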
Important
To modify minSnapshotHistoryWindowInSeconds for a MongoDB Atlas cluster, you must contact Atlas Support.
Disk Space and History
Increasing the value of minSnapshotHistoryWindowInSeconds increases disk usage because the server must maintain the history of older modified values within the specified time window. The amount of disk space used depends on your workload, with higher-volume workloads requiring more disk space.