There are two ways to synchronize sharded clusters. You can use either one mongosync instance or several mongosync instances. For best performance with large or heavily loaded clusters, use one mongosync instance for each shard on the source cluster.
Important
You must always disable the balancer on a sharded destination cluster by using balancerStop. After stopping the balancer, wait 15 minutes before starting mongosync. This gives the cluster time to finish any in-progress chunk migrations.
If the source or destination cluster is a sharded cluster and you are not running mongosync with namespace filtering, you must disable the source cluster's balancer by running the balancerStop command and waiting 15 minutes for the command to complete.
If the source or destination cluster is a sharded cluster and you are running mongosync with namespace filtering, you can globally enable the source cluster's balancer but you must disable it for all collections within the namespace filter. See Disabling Balancer for Collections in Filtered Sync. You can also fully disable the source cluster's balancer.
During migration, do not run the moveChunk or moveRange commands. If you have enabled the source cluster's balancer, but disabled it for collections within the namespace filter, do not run shardCollection on collections within the namespace filter. If you run shardCollection on collections within the namespace filter during the migration, mongosync returns an error and stops, which requires you to start the migration from scratch.
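The two balancer cases above can be sketched as a small helper that prints the mongosh commands to review before running them. This is a minimal sketch: the function name, the placeholder URI, and the sample namespaces are illustrative, and it assumes mongosh is the shell used to reach the source cluster's mongos.

```shell
#!/usr/bin/env bash
# Print (do not run) the mongosh commands that prepare the source balancer.
# No namespaces given: stop the balancer for the whole cluster.
# Namespaces given: leave the balancer on, disable it per filtered collection.
# The URI and namespaces used below are placeholders, not a real deployment.

balancer_prep_commands() {
  local uri="$1"; shift
  if [ "$#" -eq 0 ]; then
    printf "mongosh '%s' --eval 'db.adminCommand({ balancerStop: 1 })'\n" "$uri"
  else
    local ns
    for ns in "$@"; do
      printf "mongosh '%s' --eval 'sh.disableBalancing(\"%s\")'\n" "$uri" "$ns"
    done
  fi
}

SOURCE_URI="mongodb://user:password@cluster0host:27500"
balancer_prep_commands "$SOURCE_URI"
balancer_prep_commands "$SOURCE_URI" "sales.orders" "sales.customers"
```

Printing the commands instead of executing them keeps the helper safe to rehearse. After running the balancerStop form, the 15 minute wait described above still applies.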
Configure a Single mongosync Instance
To configure a single mongosync instance, follow the connection instructions for your cluster architecture to connect to the mongos instance in your cluster.
When you connect a single mongosync instance to a sharded cluster, do not use the replicaSet option or the id option.
The rest of this page addresses cluster-to-cluster synchronization using multiple mongosync instances.
Configure Multiple mongosync Instances
The number of mongosync instances must match the number of shards on the source cluster. You must use the same version of mongosync across all instances. For a replica set source, you can only use one mongosync instance.
When you configure multiple mongosync instances to sync between sharded clusters, you must send identical API endpoint commands to each mongosync instance.
To configure multiple mongosync instances:
Determine the shard IDs
To get the shard IDs, connect to the source cluster mongos and run the listShards command.
db.adminCommand( { listShards: 1 } )
The shard IDs are the _id values in the shards array of the output.
shards: [
  {
    _id: 'shard01',
    host: 'shard01/localhost:27501,localhost:27502,localhost:27503',
    state: 1,
    topologyTime: Timestamp({ t: 1656612236, i: 2 })
  },
  {
    _id: 'shard02',
    host: 'shard02/localhost:27504,localhost:27505,localhost:27506',
    state: 1,
    topologyTime: Timestamp({ t: 1656612240, i: 4 })
  }
]
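When scripting the later steps, it can help to reduce that output to a plain list of shard IDs. The following is a minimal sketch that text-matches the _id fields in output shaped like the sample above. The function name shard_ids is hypothetical, and pattern matching is a convenience here, not a real parser.

```shell
#!/usr/bin/env bash
# Pull each shard's _id out of listShards output as printed by the shell.
# Accepts both  _id: 'name'  and  "_id": "name"  spellings.
# Text matching only: it assumes output shaped like the sample above.

shard_ids() {
  # First isolate every _id field, then keep only the quoted value.
  grep -Eo "\"?_id\"?: *['\"][^'\"]+['\"]" \
    | sed -E "s/.*: *['\"]([^'\"]+)['\"]/\1/"
}

sample="shards: [ { _id: 'shard01', host: 'shard01/localhost:27501', state: 1 }, { _id: 'shard02', host: 'shard02/localhost:27504', state: 1 } ]"
printf '%s\n' "$sample" | shard_ids
```

For the sample input, this prints shard01 and shard02 on separate lines, which is the form the --id option expects.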
Connect the mongosync instances
These instructions use a generic connection string. To modify the connection string for your cluster architecture, refer to the architecture-specific connection details.
Tip
A single host server can run multiple mongosync instances. To improve performance, run mongosync on multiple host servers.
Run the first mongosync instance:
mongosync \
  --cluster0 "mongodb://user:password@cluster0host:27500" \
  --cluster1 "mongodb://user:password@cluster1host:27500" \
  --id shard01 --port 27601
When running multiple mongosync instances, the number of instances must equal the number of shards. Each mongosync instance must be started with the --id option or id setting to specify the shard it replicates.
Run a new mongosync instance for each shard in the source cluster. Edit the --id and --port fields for each additional mongosync instance.
mongosync \
  --cluster0 "mongodb://user:password@cluster0host:27500" \
  --cluster1 "mongodb://user:password@cluster1host:27500" \
  --id shard02 --port 27602
The connection strings for the --cluster0 and --cluster1 options should point to mongos instances. In the example, both mongosync instances connect through the same mongos instance on each cluster.
Each mongosync instance:
- Connects to mongos instances in the source cluster.
- Connects to mongos instances in the destination cluster.
- Replicates a single shard from the source cluster, identified by the --id option.
- Specifies a unique port to use during synchronization. Consider designating a range of ports to simplify scripting mongosync operations.
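Because every instance differs only in its --id and --port values, the launch commands lend themselves to generation from the shard list. The sketch below prints one command per shard with sequential ports. The function name mongosync_commands and the 27601 base port are illustrative choices that mirror the examples above.

```shell
#!/usr/bin/env bash
# Print one mongosync launch command per shard ID, assigning ports sequentially.
# Arguments: source URI, destination URI, first port, then the shard IDs.
# The URIs and base port below mirror the examples above; adjust for a real deployment.

mongosync_commands() {
  local src="$1" dst="$2" port="$3"; shift 3
  local id
  for id in "$@"; do
    printf 'mongosync --cluster0 "%s" --cluster1 "%s" --id %s --port %d\n' \
      "$src" "$dst" "$id" "$port"
    port=$((port + 1))
  done
}

mongosync_commands \
  "mongodb://user:password@cluster0host:27500" \
  "mongodb://user:password@cluster1host:27500" \
  27601 shard01 shard02
```

The helper only prints the commands, so each line can be reviewed and then started on whichever host server should run that instance.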
Start Multiple mongosync Instances
Use curl or another HTTP client to issue the start command to each of the mongosync instances.
curl mongosync01Host:27601/api/v1/start -XPOST --data \
  '{ "source": "cluster0", "destination": "cluster1",
     "reversible": false, "enableUserWriteBlocking": "none" }'
curl mongosync02Host:27602/api/v1/start -XPOST --data \
  '{ "source": "cluster0", "destination": "cluster1",
     "reversible": false, "enableUserWriteBlocking": "none" }'
The start command options must be the same for all of the mongosync instances.
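One way to keep the start options identical is to define the payload once and loop over the instances. The following is a minimal sketch: the names start_all, START_PAYLOAD, and the CURL override are assumptions made for illustration, and the function stops at the first failed request so the problem can be investigated before continuing.

```shell
#!/usr/bin/env bash
# Send one identical start payload to every mongosync instance.
# CURL can be overridden (for example with a stub) to rehearse without a network call.

CURL="${CURL:-curl}"

# Single definition of the payload, so every instance receives the same options.
START_PAYLOAD='{ "source": "cluster0", "destination": "cluster1", "reversible": false, "enableUserWriteBlocking": "none" }'

start_all() {
  local target
  for target in "$@"; do
    # Stop at the first failure instead of starting a partial set silently.
    "$CURL" "$target/api/v1/start" -XPOST --data "$START_PAYLOAD" || return 1
  done
}
```

With the hosts from the examples above, the call would be `start_all mongosync01Host:27601 mongosync02Host:27602`.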
Check Progress
To review synchronization progress for a particular shard, use curl or another HTTP client to issue the progress command to the mongosync instance syncing that shard.
curl mongosync02Host:27602/api/v1/progress -XGET
This command checks the progress of the mongosync instance that is running on mongosync02Host and using port 27602 for synchronization. To check progress on other shards, update the host and port number, then repeat the API call for each mongosync instance.
Pause a mongosync Instance
The pause command temporarily halts the synchronization process on a single shard. It does not pause any other mongosync instances that may be running. Use curl or another HTTP client to issue the pause command to a mongosync instance.
curl mongosync01Host:27601/api/v1/pause -XPOST --data '{}'
This command pauses the mongosync instance that is running on mongosync01Host and using port 27601 for synchronization. To pause synchronization on other shards, update the host and port number, then repeat the API call for each mongosync instance.
Resume Synchronization
If one or more mongosync instances are paused, you can use the resume command to resume syncing. Run a separate resume command against each paused mongosync instance to continue syncing.
Use curl or another HTTP client to issue the resume command to each mongosync instance.
curl mongosync01Host:27601/api/v1/resume -XPOST --data '{}'
This command resumes synchronization on the mongosync instance that is running on mongosync01Host and using port 27601. To resume synchronization on other shards, update the host and port number, then repeat the API call for each mongosync instance.
Commit Synchronization From Multiple mongosync Instances
When you want to complete synchronization, issue the progress command and check the values for canCommit and lagTimeSeconds.
To minimize write blocking on the source cluster, only run the commit command when the lagTimeSeconds value is small enough for your application.
If the lagTimeSeconds value is small enough, and canCommit is true, issue the commit command to commit synchronization. Repeat the process on all of the mongosync instances.
The commit operation is blocking. The commit command will not return until commit has been called on every mongosync instance.
# Check progress
curl mongosync01Host:27601/api/v1/progress -XGET
# Commit
curl mongosync01Host:27601/api/v1/commit -XPOST --data '{}'
These commands only check progress and commit synchronization for the mongosync instance that is running on mongosync01Host and using port 27601. To commit synchronization for all of the shards, make additional calls to progress and commit on every other mongosync instance that is running.
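The commit decision described above (canCommit is true and lagTimeSeconds is acceptably small) can be expressed as a check on the progress response body. This is a minimal sketch: the function name ready_to_commit and the 10 second threshold are illustrative, the sample body is hand-written, not captured from a real instance, and grep-based matching is a stand-in for a proper JSON tool.

```shell
#!/usr/bin/env bash
# Decide whether a /progress response body indicates it is reasonable to commit.
# Arguments: the response body, then the maximum acceptable lag in seconds.
# Assumes the body contains "canCommit" and "lagTimeSeconds" fields, as described above.

ready_to_commit() {
  local body="$1" max_lag="$2" lag
  # canCommit must be literally true.
  printf '%s' "$body" | grep -Eq '"canCommit" *: *true' || return 1
  # Extract the integer lag value; a missing or non-numeric value yields an empty string.
  lag="$(printf '%s' "$body" | grep -Eo '"lagTimeSeconds" *: *[0-9]+' | grep -Eo '[0-9]+$' | head -n 1)"
  [ -n "$lag" ] && [ "$lag" -le "$max_lag" ]
}

# Illustrative body only, not captured from a real mongosync instance.
sample='{"progress":{"state":"RUNNING","canCommit":true,"lagTimeSeconds":3}}'
if ready_to_commit "$sample" 10; then
  echo "ready to commit"
else
  echo "not ready"
fi
```

In practice the body would come from the progress call for each instance, and the commit would be issued only once every instance passes the check.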
Data Verification
Before transferring your application load from the source cluster to the destination, check your data to ensure that the sync was successful.
Note
If mongosync stops during commit, before the /progress endpoint reports canWrite: true, you must restart the entire migration to ensure that it's verified.
For more information, see Verify Data Transfer.
Reverse the Synchronization Direction
Note
For an in-depth tutorial on reversing your synchronization direction, see Reverse Sync Direction.
To reverse synchronization so that the original destination cluster acts as the source cluster:
1. If you have not already done so, issue the commit command to each mongosync instance and wait for all of the commits to finish. To check whether the sync process has been committed, issue the progress command to all mongosync instances and confirm that each response's state field contains the value COMMITTED.
2. Issue the reverse command to each mongosync instance.
The reverse operation is blocking. The reverse command will not return until reverse has been called on every mongosync instance.
curl mongosync01Host:27601/api/v1/reverse -XPOST --data '{}'
This command reverses synchronization on the mongosync instance that is running on mongosync01Host and using port 27601. Make additional calls to reverse on every other mongosync instance that is running.
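Before issuing reverse, each instance should report the COMMITTED state. A minimal sketch of that gate follows, assuming the progress response bodies have already been collected. The function name all_committed and the sample bodies are illustrative.

```shell
#!/usr/bin/env bash
# Succeed only if every supplied /progress body shows state COMMITTED.
# Arguments: one response body per mongosync instance.

all_committed() {
  # With no bodies there is nothing to confirm, so treat that as a failure.
  [ "$#" -gt 0 ] || return 1
  local body
  for body in "$@"; do
    printf '%s' "$body" | grep -Eq '"state" *: *"COMMITTED"' || return 1
  done
}

# Illustrative bodies only, not captured from real mongosync instances.
if all_committed '{"progress":{"state":"COMMITTED"}}' '{"progress":{"state":"RUNNING"}}'; then
  echo "safe to reverse"
else
  echo "wait: not every instance is COMMITTED"
fi
```

Failing closed on an empty argument list avoids reporting success when no progress responses were gathered.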
Note
Reverse synchronization is only possible if reversible is set to true and enableUserWriteBlocking is set to "sourceAndDestination" when the start API initiates mongosync.