Snapshot queries allow you to read data as it appeared at a single point in time in the recent past.
Starting in MongoDB 5.0, you can use read concern "snapshot" to query data on secondary nodes. This feature increases the versatility and resilience of your application's reads. You do not need to create a static copy of your data, move it into a separate system, and manually isolate long-running queries so they don't interfere with your operational workload. Instead, you can perform long-running queries against a live, transactional database while reading from a consistent state of the data.
Using read concern "snapshot" on secondary nodes does not impact your application's write workload. Isolating long-running queries to secondaries benefits only your application's reads.
Use snapshot queries when you want to:
Perform multiple related queries and ensure that each query reads data from the same point in time.
Ensure that you read from a consistent state of the data from some point in the past.
Comparing Local and Snapshot Read Concerns
When MongoDB performs long-running queries using the default "local" read concern, the query results may contain data from writes that occur at the same time as the query. As a result, the query may return unexpected or inconsistent results.
To avoid this scenario, create a session and specify read concern "snapshot". With read concern "snapshot", MongoDB runs your query with snapshot isolation, meaning that your query reads data as it appeared at a single point in time in the recent past.
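For example, here is a minimal PyMongo sketch of the difference (the default connection, the pets database, and its cats collection are assumptions for illustration, not part of the API):

from pymongo import MongoClient

client = MongoClient()  # assumes a MongoDB 5.0+ replica set
db = client.pets

# Default "local" read concern: two consecutive reads may observe
# different states of the data if writes commit between them.
before = db.cats.count_documents({"adoptable": True})
after = db.cats.count_documents({"adoptable": True})  # may differ from `before`

# Read concern "snapshot": every read in the session observes the same
# point-in-time state of the data.
with client.start_session(snapshot=True) as s:
    first = db.cats.count_documents({"adoptable": True}, session=s)
    second = db.cats.count_documents({"adoptable": True}, session=s)
    assert first == second  # both reads use the same snapshot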
Examples
The examples on this page show how you can use snapshot queries to:
Run related queries from the same point in time
Read from a consistent state of the data from some point in the past
Run Related Queries From the Same Point in Time
Read concern "snapshot" lets you run multiple related queries within a session and ensure that each query reads data from the same point in time.
An animal shelter has a pets database that contains collections for each type of pet. The pets database has these collections:
cats
dogs
Each document in each collection contains an adoptable field, indicating whether the pet is available for adoption. For example, a document in the cats collection looks like this:
{ "name": "Whiskers", "color": "white", "age": 10, "adoptable": true }
You want to run a query to see the total number of pets available for adoption across all collections. To provide a consistent view of the data, you want to ensure that the data returned from each collection is from a single point in time.
To accomplish this goal, use read concern "snapshot" within a session:
mongoc_client_session_t *cs = NULL;
mongoc_collection_t *cats_collection = NULL;
mongoc_collection_t *dogs_collection = NULL;
int64_t adoptable_pets_count = 0;
bson_error_t error;
mongoc_session_opt_t *session_opts;

cats_collection = mongoc_client_get_collection (client, "pets", "cats");
dogs_collection = mongoc_client_get_collection (client, "pets", "dogs");

/* Seed 'pets.cats' and 'pets.dogs' with example data */
if (!pet_setup (cats_collection, dogs_collection)) {
   goto cleanup;
}

/* start a snapshot session */
session_opts = mongoc_session_opts_new ();
mongoc_session_opts_set_snapshot (session_opts, true);

cs = mongoc_client_start_session (client, session_opts, &error);
mongoc_session_opts_destroy (session_opts);
if (!cs) {
   MONGOC_ERROR ("Could not start session: %s", error.message);
   goto cleanup;
}

/*
 * Perform the following aggregation pipeline, and accumulate the count in
 * `adoptable_pets_count`.
 *
 * adoptablePetsCount = db.cats.aggregate(
 *     [ { "$match": { "adoptable": true } },
 *       { "$count": "adoptableCatsCount" } ], session=s
 * ).next()["adoptableCatsCount"]
 *
 * adoptablePetsCount += db.dogs.aggregate(
 *     [ { "$match": { "adoptable": True} },
 *       { "$count": "adoptableDogsCount" } ], session=s
 * ).next()["adoptableDogsCount"]
 *
 * Remember in order to apply the client session to this operation, you
 * must append the client session to the options passed to
 * `mongoc_collection_aggregate`, i.e.,
 *
 * mongoc_client_session_append (cs, &opts, &error);
 * cursor = mongoc_collection_aggregate (
 *    collection, MONGOC_QUERY_NONE, pipeline, &opts, NULL);
 */
accumulate_adoptable_count (cs, cats_collection, &adoptable_pets_count);
accumulate_adoptable_count (cs, dogs_collection, &adoptable_pets_count);

printf ("there are %" PRId64 " adoptable pets\n", adoptable_pets_count);
using namespace mongocxx;
using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

auto db = client["pets"];
int64_t adoptable_pets_count = 0;

auto opts = mongocxx::options::client_session{};
opts.snapshot(true);
auto session = client.start_session(opts);

{
    pipeline p;
    p.match(make_document(kvp("adoptable", true))).count("adoptableCatsCount");
    auto cursor = db["cats"].aggregate(session, p);
    for (auto doc : cursor) {
        adoptable_pets_count += doc.find("adoptableCatsCount")->get_int32();
    }
}

{
    pipeline p;
    p.match(make_document(kvp("adoptable", true))).count("adoptableDogsCount");
    auto cursor = db["dogs"].aggregate(session, p);
    for (auto doc : cursor) {
        adoptable_pets_count += doc.find("adoptableDogsCount")->get_int32();
    }
}
ctx := context.TODO()

sess, err := client.StartSession(options.Session().SetSnapshot(true))
if err != nil {
    return err
}
defer sess.EndSession(ctx)

var adoptablePetsCount int32
err = mongo.WithSession(ctx, sess, func(ctx context.Context) error {
    // Count the adoptable cats
    const adoptableCatsOutput = "adoptableCatsCount"
    cursor, err := db.Collection("cats").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match", bson.D{{"adoptable", true}}}},
        bson.D{{"$count", adoptableCatsOutput}},
    })
    if err != nil {
        return err
    }
    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }
    resp := cursor.Current.Lookup(adoptableCatsOutput)
    adoptableCatsCount, ok := resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            adoptableCatsOutput, cursor.Current)
    }
    adoptablePetsCount += adoptableCatsCount

    // Count the adoptable dogs
    const adoptableDogsOutput = "adoptableDogsCount"
    cursor, err = db.Collection("dogs").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match", bson.D{{"adoptable", true}}}},
        bson.D{{"$count", adoptableDogsOutput}},
    })
    if err != nil {
        return err
    }
    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }
    resp = cursor.Current.Lookup(adoptableDogsOutput)
    adoptableDogsCount, ok := resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            adoptableDogsOutput, cursor.Current)
    }
    adoptablePetsCount += adoptableDogsCount

    return nil
})
if err != nil {
    return err
}
db = client.pets
async with await client.start_session(snapshot=True) as s:
    adoptablePetsCount = 0
    docs = await db.cats.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableCatsCount"}], session=s
    ).to_list(None)
    adoptablePetsCount = docs[0]["adoptableCatsCount"]

    docs = await db.dogs.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableDogsCount"}], session=s
    ).to_list(None)
    adoptablePetsCount += docs[0]["adoptableDogsCount"]

print(adoptablePetsCount)
$catsCollection = $client->selectCollection('pets', 'cats');
$dogsCollection = $client->selectCollection('pets', 'dogs');

$session = $client->startSession(['snapshot' => true]);

$adoptablePetsCount = $catsCollection->aggregate(
    [
        ['$match' => ['adoptable' => true]],
        ['$count' => 'adoptableCatsCount'],
    ],
    ['session' => $session],
)->toArray()[0]->adoptableCatsCount;

$adoptablePetsCount += $dogsCollection->aggregate(
    [
        ['$match' => ['adoptable' => true]],
        ['$count' => 'adoptableDogsCount'],
    ],
    ['session' => $session],
)->toArray()[0]->adoptableDogsCount;

var_dump($adoptablePetsCount);
db = client.pets
with client.start_session(snapshot=True) as s:
    adoptablePetsCount = db.cats.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableCatsCount"}],
        session=s,
    ).next()["adoptableCatsCount"]

    adoptablePetsCount += db.dogs.aggregate(
        [{"$match": {"adoptable": True}}, {"$count": "adoptableDogsCount"}],
        session=s,
    ).next()["adoptableDogsCount"]

print(adoptablePetsCount)
client = Mongo::Client.new(uri_string, database: "pets")

client.start_session(snapshot: true) do |session|
  adoptable_pets_count = client['cats'].aggregate([
    { "$match": { "adoptable": true } },
    { "$count": "adoptable_cats_count" }
  ], session: session).first["adoptable_cats_count"]

  adoptable_pets_count += client['dogs'].aggregate([
    { "$match": { "adoptable": true } },
    { "$count": "adoptable_dogs_count" }
  ], session: session).first["adoptable_dogs_count"]

  puts adoptable_pets_count
end
The preceding series of commands:
Uses MongoClient() to establish a connection to the MongoDB deployment.
Switches to the pets database.
Establishes a session. The command specifies snapshot=True, so the session uses read concern "snapshot".
Performs these actions for each collection in the pets database:
  Runs an aggregation that uses $match to filter for documents where adoptable is true and $count to count them.
  Adds the resulting count to the adoptablePetsCount variable.
Prints the adoptablePetsCount variable.
All queries within the session read data as it appeared at the same point in time. As a result, the final count reflects a consistent snapshot of the data.
Note
If the session lasts longer than the WiredTiger history retention period (300 seconds, by default), the query errors with a SnapshotTooOld error. To learn how to configure snapshot retention and enable longer-running queries, see Configure Snapshot Retention.
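In application code, this failure surfaces as a server error. The following is a hedged PyMongo sketch (it reuses the client and pets database from the example above; the error code 239 is assumed to correspond to SnapshotTooOld):

from pymongo.errors import OperationFailure

try:
    with client.start_session(snapshot=True) as s:
        # Long-running work; if the session outlives the WiredTiger history
        # window (300 seconds by default), subsequent reads fail.
        db.cats.aggregate([{"$count": "n"}], session=s).next()
except OperationFailure as exc:
    if exc.code == 239:  # SnapshotTooOld
        print("Snapshot expired; see Configure Snapshot Retention")
    else:
        raise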
Read from a Consistent State of the Data from Some Point in the Past
Read concern "snapshot" ensures that your query reads data as it appeared at a single point in time in the recent past.
An online shoe store has a sales collection that contains data for each item sold at the store. For example, a document in the sales collection looks like this:
{ "shoeType": "boot", "price": 30, "saleDate": ISODate("2022-02-02T06:01:17.171Z") }
Each day at midnight, a query runs to see how many pairs of shoes were sold that day. The daily sales query looks like this:
mongoc_client_session_t *cs = NULL;
mongoc_collection_t *sales_collection = NULL;
bson_error_t error;
mongoc_session_opt_t *session_opts;
bson_t *pipeline = NULL;
bson_t opts = BSON_INITIALIZER;
mongoc_cursor_t *cursor = NULL;
const bson_t *doc = NULL;
bool ok = true;
bson_iter_t iter;
int64_t total_sales = 0;

sales_collection = mongoc_client_get_collection (client, "retail", "sales");

/* seed 'retail.sales' with example data */
if (!retail_setup (sales_collection)) {
   goto cleanup;
}

/* start a snapshot session */
session_opts = mongoc_session_opts_new ();
mongoc_session_opts_set_snapshot (session_opts, true);

cs = mongoc_client_start_session (client, session_opts, &error);
mongoc_session_opts_destroy (session_opts);
if (!cs) {
   MONGOC_ERROR ("Could not start session: %s", error.message);
   goto cleanup;
}

if (!mongoc_client_session_append (cs, &opts, &error)) {
   MONGOC_ERROR ("could not apply session options: %s", error.message);
   goto cleanup;
}

pipeline = BCON_NEW ("pipeline",
                     "[",
                     "{", "$match", "{",
                        "$expr", "{",
                           "$gt", "[",
                              "$saleDate",
                              "{", "$dateSubtract", "{",
                                 "startDate", "$$NOW",
                                 "unit", BCON_UTF8 ("day"),
                                 "amount", BCON_INT64 (1),
                              "}", "}",
                           "]",
                        "}",
                     "}", "}",
                     "{", "$count", BCON_UTF8 ("totalDailySales"), "}",
                     "]");

cursor = mongoc_collection_aggregate (
   sales_collection, MONGOC_QUERY_NONE, pipeline, &opts, NULL);
bson_destroy (&opts);

ok = mongoc_cursor_next (cursor, &doc);
if (mongoc_cursor_error (cursor, &error)) {
   MONGOC_ERROR ("could not get totalDailySales: %s", error.message);
   goto cleanup;
}
if (!ok) {
   MONGOC_ERROR ("%s", "cursor has no results");
   goto cleanup;
}

ok = bson_iter_init_find (&iter, doc, "totalDailySales");
if (ok) {
   total_sales = bson_iter_as_int64 (&iter);
} else {
   MONGOC_ERROR ("%s", "missing key: 'totalDailySales'");
   goto cleanup;
}
ctx := context.TODO()

sess, err := client.StartSession(options.Session().SetSnapshot(true))
if err != nil {
    return err
}
defer sess.EndSession(ctx)

var totalDailySales int32
err = mongo.WithSession(ctx, sess, func(ctx context.Context) error {
    // Count the total daily sales
    const totalDailySalesOutput = "totalDailySales"
    cursor, err := db.Collection("sales").Aggregate(ctx, mongo.Pipeline{
        bson.D{{"$match",
            bson.D{{"$expr",
                bson.D{{"$gt",
                    bson.A{"$saleDate",
                        bson.D{{"$dateSubtract",
                            bson.D{
                                {"startDate", "$$NOW"},
                                {"unit", "day"},
                                {"amount", 1},
                            },
                        }},
                    },
                }},
            }},
        }},
        bson.D{{"$count", totalDailySalesOutput}},
    })
    if err != nil {
        return err
    }
    if !cursor.Next(ctx) {
        return fmt.Errorf("expected aggregate to return a document, but got none")
    }
    resp := cursor.Current.Lookup(totalDailySalesOutput)

    var ok bool
    totalDailySales, ok = resp.Int32OK()
    if !ok {
        return fmt.Errorf("failed to find int32 field %q in document %v",
            totalDailySalesOutput, cursor.Current)
    }

    return nil
})
if err != nil {
    return err
}
db = client.retail
async with await client.start_session(snapshot=True) as s:
    docs = await db.sales.aggregate(
        [
            {
                "$match": {
                    "$expr": {
                        "$gt": [
                            "$saleDate",
                            {
                                "$dateSubtract": {
                                    "startDate": "$$NOW",
                                    "unit": "day",
                                    "amount": 1,
                                }
                            },
                        ]
                    }
                }
            },
            {"$count": "totalDailySales"},
        ],
        session=s,
    ).to_list(None)

total = docs[0]["totalDailySales"]
print(total)
$salesCollection = $client->selectCollection('retail', 'sales');

$session = $client->startSession(['snapshot' => true]);

$totalDailySales = $salesCollection->aggregate(
    [
        [
            '$match' => [
                '$expr' => [
                    '$gt' => ['$saleDate', [
                        '$dateSubtract' => [
                            'startDate' => '$$NOW',
                            'unit' => 'day',
                            'amount' => 1,
                        ],
                    ]],
                ],
            ],
        ],
        ['$count' => 'totalDailySales'],
    ],
    ['session' => $session],
)->toArray()[0]->totalDailySales;
db = client.retail
with client.start_session(snapshot=True) as s:
    _ = db.sales.aggregate(
        [
            {
                "$match": {
                    "$expr": {
                        "$gt": [
                            "$saleDate",
                            {
                                "$dateSubtract": {
                                    "startDate": "$$NOW",
                                    "unit": "day",
                                    "amount": 1,
                                }
                            },
                        ]
                    }
                }
            },
            {"$count": "totalDailySales"},
        ],
        session=s,
    ).next()["totalDailySales"]
The preceding query:
Uses $match with $expr to specify a filter on the saleDate field. $expr allows the use of aggregation expressions (such as $$NOW) in the $match stage.
Uses the $gt operator and the $dateSubtract expression to return documents where the saleDate is greater than one day before the time the query executes.
Uses $count to return a count of the matching documents. The count is stored in the totalDailySales variable.
Specifies read concern "snapshot" to ensure that the query reads from a single point in time.
The sales collection is quite large, and as a result this query may take a few minutes to run. Because the store is online, sales can occur at any time of day.
For example, consider this sequence of events:
The query begins executing at 12:00 AM.
A customer buys three pairs of shoes at 12:02 AM.
The query finishes executing at 12:04 AM.
If the query doesn't use read concern "snapshot", sales that occur between when the query starts and when it finishes can be included in the count, even though they did not occur on the day the report covers. This could result in inaccurate reports, with some sales counted twice: once in the current report and again in the next day's report.
By specifying read concern "snapshot", the query only returns data that was present in the database at a point in time shortly before the query started executing.
Note
If the query takes longer than the WiredTiger history retention period (300 seconds, by default), the query errors with a SnapshotTooOld error. To learn how to configure snapshot retention and enable longer-running queries, see Configure Snapshot Retention.
Configure Snapshot Retention
By default, the WiredTiger storage engine retains history for 300 seconds. You can use a session with snapshot=true for a total of 300 seconds from the time of the first operation in the session to the last. If you use the session for a longer period of time, the session fails with a SnapshotTooOld error. Similarly, if you query data using read concern "snapshot" and your query lasts longer than 300 seconds, the query fails.
If your query or session runs for longer than 300 seconds, consider increasing the snapshot retention period by modifying the minSnapshotHistoryWindowInSeconds parameter.
For example, this command sets the value of minSnapshotHistoryWindowInSeconds to 600 seconds:
db.adminCommand( { setParameter: 1, minSnapshotHistoryWindowInSeconds: 600 } )
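If you manage server parameters from a driver rather than the shell, the same change can be issued through PyMongo. This is a sketch that assumes a connected client whose user has privileges to run setParameter:

# Set the parameter, then read it back to confirm the change.
client.admin.command({"setParameter": 1, "minSnapshotHistoryWindowInSeconds": 600})

result = client.admin.command(
    {"getParameter": 1, "minSnapshotHistoryWindowInSeconds": 1}
)
print(result["minSnapshotHistoryWindowInSeconds"])  # 600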
Important
To modify minSnapshotHistoryWindowInSeconds for a MongoDB Atlas cluster, you must contact Atlas Support.
Disk Space and History
Increasing the value of minSnapshotHistoryWindowInSeconds increases disk usage because the server must maintain the history of older modified values within the specified time window. The amount of disk space used depends on your workload, with higher-volume workloads requiring more disk space.