Docs Menu
Docs Home
/
Atlas
/

Atlas Data Federation Overview

Atlas Data Federation is a distributed query engine that allows you to natively query, transform, and move data across various sources inside & outside of MongoDB Atlas.

Data Federation is a strategy that separates compute from storage. When you use Data Federation, you associate data from multiple physical sources into a single virtual source of data for your applications. This enables you to query your data from a single endpoint without physically copying or moving it.

A federated database instance is a deployment of Atlas Data Federation. Each federated database instance contains virtual databases and collections that map to data in your data stores.

A virtual database is a grouping of collections that organizes your data into a single virtual location.

A virtual collection is a grouping of MongoDB documents (similar to a table in a relational database) that lives in a single virtual database. Collections do not require a fixed schema.

A data store refers to the location of your data. Atlas Data Federation supports the following data stores:

  • Atlas cluster

  • Atlas online archive

  • AWS S3 buckets

  • Azure Blob Storage

  • Google Cloud Storage

  • HTTP and HTTPS endpoints

Important

A federated database instance is limited to one instance of AWS S3, Azure Blob Storage, and Google Cloud Storage at the same time. Similarly, Data Federation across Atlas projects is unavailable. See Data Federation Limitations for more details on Data Federation limitations.

Storage configuration contains mappings between your virtual databases, collections, and data stores. You can define these mappings to access and run queries against your data.

When you use Atlas Data Federation, federated database instances exist within regional VPCs. All federated database instance data traffic, whether to Atlas clusters or cloud provider object storage, is TLS encrypted. TLS encryption is the primary form of network security available with Atlas Data Federation. If your federated database instance and data source (Atlas cluster, S3, Blob, Google Cloud Storage) use the same cloud provider, data traffic either traverses the cloud provider's network infrastructure, or the public internet, depending on the cloud provider's documented network service capabilities. If your federated database instance and data source are on different cloud providers, data traffic traverses the public internet.

Atlas Data Federation Architecture
click to enlarge

The Data Plane in the preceding diagram is where your data resides. You can configure Atlas Data Federation to access data in a variety of storage services. Specifically, you can configure Atlas Data Federation to access data in your AWS S3 buckets across AWS regions, Azure Blob Storage containers, Google Cloud Storage buckets, Atlas clusters, HTTP and HTTPS URLs, and Atlas Online Archives. To learn more about configuring Atlas Data Federation to access your data stores, see Define Data Stores for a Federated Database Instance.

You can then set up role-based access control for your federated database instances. You can control how your client connects to your federated database instance, either through a global connection option or by pinning it to a specific region. To learn more, see Configure Connection for Your Federated Database Instance.

Atlas Data Federation preserves data locality and maximizes local computation, where possible, to minimize data transfer and optimize performance. The Compute Plane in the preceding diagram shows where Atlas Data Federation processes all requests. Atlas Data Federation provides an elastic pool of agents in the region that is nearest to your data where Atlas Data Federation can process the data for your queries. To learn more about supported regions, see Atlas Data Federation Regions.

Atlas Data Federation doesn't persist data inside the system and once your query is processed, it only stores the metadata in your federated database instance. This allows you to comply with data sovereignty regulations and ensures that your data is stored and processed in compliance with legal requirements.

The Control Plane in the preceding diagram, which is the same as the Atlas Control Plane, is where Atlas Data Federation balances user requests and aggregates final results.

Atlas Data Federation executes certain parts of a query directly on the underlying storage service, rather than transfer all of the data to the compute nodes for processing. Additionally, when you execute a query, it is first processed by a Data Federation front-end component, which plans the query and then distributes it to the nodes in the backend. The backend nodes then access your data store directly to execute the query logic and return the results back to the front-end. This process reduces the amount of data to move around, thereby making the whole process faster and cheaper. To learn more, see Query a Federated Database Instance.

To optimize the performance of your queries, Atlas Data Federation does the following:

To learn more, see Optimize Query Performance.

You can connect to Atlas Data Federation using MongoDB language-specific drivers, mongosh, and Atlas SQL. To learn more, see Connect to Your Federated Database Instance.

You can use Atlas Data Federation to:

  • Copy Atlas cluster data into Parquet or CSV files written to AWS S3 buckets or Azure Blob Storage.

  • Query across multiple Atlas clusters and online archives to get a holistic view of your Atlas data.

  • Materialize data from aggregations across Atlas clusters, AWS S3 buckets, and Azure Blob Storage.

  • Read and import data from your AWS S3 buckets or Azure Blob Storage into an Atlas cluster.

Note

To prevent excessive charges on your bill, create your Atlas Data Federation in the same AWS or Azure region as your S3 or Azure Blob Storage data source. You can query AWS S3 only using federated database instances created in AWS and you can query Azure Blob Storage only using federated database instances created in Azure.

Atlas Data Federation routes your federated database requests through one of the following regions:

Data Federation Region
AWS Region
Atlas Region

Northern Virginia, USA

us-east-1

US_EAST_1

Oregon, USA

us-west-2

US_WEST_2

Sao Paulo, Brazil

sa-east-1

SA_EAST_1

Ireland

eu-west-1

EU_WEST_1

London, England, UK

eu-west-2

EU_WEST_2

Frankfurt, Germany

eu-central-1

EU_CENTRAL_1

Tokyo, Japan

ap-northeast-1

AP_NORTHEAST_1

Seoul, South Korea

ap-northeast-2

AP_NORTHEAST_2

Mumbai, India

ap-south-1

AP_SOUTH_1

Singapore

ap-southeast-1

AP_SOUTHEAST_1

Sydney, Australia

ap-southeast-2

AP_SOUTHEAST_2

Montreal, QC, Canada

ca-central-1

CA_CENTRAL_1

Data Federation Region
Azure Region
Atlas Region

Virginia, USA

eastus2

US_EAST_2

Sao Paulo, Brazil

brazilsouth

BRAZIL_SOUTH

Netherlands

westeurope

EUROPE_WEST

Data Federation Region
Google Cloud Region
Atlas Region

Iowa, USA

us-central1

CENTRAL_US

Belgium

europe-west1

WESTERN_EUROPE

Note

You will incur charges when running federated queries. For more information, see Data Federation Costs.

Back

Query Federated Data

On this page