
Analyze Your Data Schema

The Schema tab provides an overview of the data type and shape of the fields in a particular collection. Databases and collections are visible in the left-side navigation.

The overview is based on sampling the documents in the collection. The schema overview may include additional data about the contents of the fields, such as the minimum and maximum values of dates and integers, the frequency of occurrence of particular values, and the cardinality of the data.

MongoDB has a flexible schema model, which means that some fields may contain different types of data from one document to the next. For example, a field named address may contain strings and integers in some documents, objects in others, or some combination of all three.

For heterogeneous fields, the Schema tab breaks down the data types found in the field and shows the percentage of sampled documents that contain each type.
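As a rough illustration of what such a breakdown involves, the following sketch tallies the type of one field across a handful of in-memory documents. It is not how Compass is implemented: `type_breakdown` and the sample documents are hypothetical, and the labels are Python type names (`str`, `int`, `dict`) rather than the BSON type names Compass displays.

```python
from collections import Counter

def type_breakdown(documents, field):
    """Return {type_name: percentage} for `field` across `documents`."""
    if not documents:
        return {}
    counts = Counter()
    for doc in documents:
        if field not in doc:
            counts["undefined"] += 1  # the field is missing from this document
        else:
            counts[type(doc[field]).__name__] += 1
    total = len(documents)
    return {name: 100 * n / total for name, n in counts.items()}

# Hypothetical sample documents, for illustration only.
sample = [
    {"address": "12 Main St"},
    {"address": 42},
    {"address": {"street": "Main St", "zipcode": "10001"}},
    {"address": "7 Elm Ave"},
    {"name": "document without an address"},
]
breakdown = type_breakdown(sample, "address")
print(breakdown)
```
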

Example

The Schema tab shows size information about the test.restaurants collection at the top, including the total number of documents in the collection, the average document size, and the total disk space occupied by the collection.

The following fields are shown with details:

  • The _id field is an ObjectId. Each ObjectId contains a timestamp, so Compass displays the range of creation times for the sampled documents.

  • The address field contains four nested fields. You can expand the field panel to see analyses of each of the nested fields.

  • The borough field contains a string indicating the borough in which the restaurant is located. The cardinality is low enough that Compass can provide a graded bar of the field contents, with the most frequently occurring string on the left.

  • The grades field contains arrays of strings. The analysis shows the minimum, maximum, and average array lengths.

Example of a collection's schema
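The creation-time range shown for the _id field comes from the timestamp embedded in each ObjectId: the first 4 bytes of an ObjectId store the creation time as seconds since the Unix epoch. The sketch below decodes that value from a hex string using only the standard library. The function names and the ObjectId values are made up for illustration.

```python
from datetime import datetime, timezone

def objectid_created_at(oid_hex):
    """Decode the creation time stored in a 24-character ObjectId hex string."""
    # The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def creation_range(oid_hexes):
    """Return (earliest, latest) creation times for the sampled ObjectIds."""
    times = [objectid_created_at(oid) for oid in oid_hexes]
    return min(times), max(times)

# Made-up ObjectId values, for illustration only.
sampled_ids = [
    "60000000a1b2c3d4e5f60719",
    "5f000000a1b2c3d4e5f60718",
]
earliest, latest = creation_range(sampled_ids)
print(earliest.isoformat(), "to", latest.isoformat())
```
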

Using the query bar in the Schema tab, you can create a query filter to limit your result set. Click the Options button to specify query options, such as the particular fields to display and the number of results to return.

Note

For query result sets larger than 1000 documents, Compass shows a subset of the results. Otherwise, Compass shows the entire result set.

For details on sampling, see Sampling.
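The 1000-document rule in the note above can be pictured with a small sketch. It assumes the subset is a uniform random sample of exactly 1000 documents, which this page does not state; `documents_to_analyze` is a hypothetical helper, not part of Compass.

```python
import random

SAMPLE_LIMIT = 1000  # threshold stated in the note above

def documents_to_analyze(result_set, limit=SAMPLE_LIMIT, seed=None):
    """Return every document if the result set is small, else a random subset."""
    if len(result_set) <= limit:
        return list(result_set)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(result_set, limit)

small = [{"_id": i} for i in range(250)]
large = [{"_id": i} for i in range(5000)]

print(len(documents_to_analyze(small)))  # whole result set is kept
print(len(documents_to_analyze(large)))  # capped at SAMPLE_LIMIT
```
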

Query bar schema view

Tip

In the Schema tab, you can also use the Query Builder to enter a query into the query bar.

For each field, Compass displays summary information about the data type or types the field contains and the range of values. Depending on the data type and the level of cardinality, Compass displays histograms, graded bars, geographical maps, and sample data to provide a sense of the shape and scope of the data contained in each field.

Below is an example of the data type summary for a field called last_login which contains data of type date.

Example of a field with a single data type

For fields that contain multiple data types, Compass displays a percentage breakdown of the various data types across documents. In the example below, the chart shows the contents of a field called phone_no, which is of type int32 in 20% of documents and of type string in the remaining 80%.

Example of percentage breakdown for data types

If a field is missing from some documents in the collection, Compass displays the missing values as undefined. In the example below, the field age has no recorded value in 40% of the sampled documents.

Example of sparsely applied data type

Strings can appear in three different ways. If there are entirely unique strings in a field, Compass shows a random selection of string values from the specified field. Click the circular refresh icon to see a new set of randomly selected values from the field.

Example of string data types

If there are only a few different string values, Compass shows the strings in a single graded bar that indicates what percentage of the sampled values each string accounts for.

Example of few string data types

If there are multiple string values with some duplicates, Compass shows a histogram indicating the frequency of each string found within the field.

Example of string data types as a histogram

Note

Move the mouse over each bar to display a tooltip which shows the value of the string.
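The three string presentations described above amount to a decision on cardinality. The following sketch shows one way to express that decision. The cutoff `FEW_VALUES` is a hypothetical number chosen for the example; this page does not say where Compass draws the line between "a few" values and a histogram.

```python
from collections import Counter

FEW_VALUES = 5  # hypothetical cutoff; the real threshold is not documented here

def string_display_mode(values):
    """Pick a presentation for a list of sampled string values."""
    counts = Counter(values)
    if len(counts) == len(values):
        return "random sample"  # every string is unique
    if len(counts) <= FEW_VALUES:
        return "graded bar"     # only a few distinct strings
    return "histogram"          # many distinct strings, some repeated

print(string_display_mode(["Ann", "Bob", "Cy"]))
print(string_display_mode(["Bronx", "Queens", "Bronx", "Queens"]))
print(string_display_mode(list("abcdefg") + ["a", "b"]))
```
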

Numbers are similar to strings in their representation. Unique numbers are shown in the following manner:

Example of number data type

Duplicate numbers are shown in a histogram that indicates their frequency:

Example of duplicate number data types

Fields that represent dates (and fields that contain the ObjectId data type, which includes a timestamp) are shown across multiple bar charts. The two charts on the top row represent the day of the week and time of day of the timestamp value.

The single chart on the bottom shows the first and last timestamp value, and the vertical lines represent the distribution of the timestamp across the range of first to last.

Example of Date data types
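The values behind those three charts can be approximated with a few counters. This is a minimal sketch over hypothetical login timestamps, not the charting code Compass uses. It groups by weekday and by hour, and returns the first and last timestamp for the range chart.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def date_distributions(timestamps):
    """Return (weekday counts, hour counts, first timestamp, last timestamp)."""
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)  # Monday is 0
    by_hour = Counter(ts.hour for ts in timestamps)
    return by_weekday, by_hour, min(timestamps), max(timestamps)

# Hypothetical last_login values, for illustration only.
logins = [
    datetime(2024, 1, 1, 9, 30),   # a Monday
    datetime(2024, 1, 1, 17, 5),   # the same Monday
    datetime(2024, 1, 3, 9, 45),   # a Wednesday
]
by_weekday, by_hour, first, last = date_distributions(logins)
print(dict(by_weekday), dict(by_hour), first, last)
```
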

Fields that contain a sub-document or an array are displayed with a small triangle next to them and a visual representation of the data contained within the sub-document or array.

Example of fields with embedded documents or arrays

Click on the triangle to expand the field and view the embedded documents:

Expanding the embedded documents

Fields that contain GeoJSON data or [longitude,latitude] arrays are displayed with interactive maps. For more information on interacting with location data in Compass, see Analyze Location Data.

Example of GeoJSON data types

Note

Third party mapping services are not available in Compass Isolated Edition.

If a field has mixed types, you can view different charts of each type by clicking on the type field. In the example below, the age field shows the values that are strings:

Example of a field with mixed types

Clicking on the int32 type causes the chart to show its numeric data:

Example that shows numeric data for number type

In the Schema tab, you can type the filter manually into the query bar or generate the filter with the Compass query builder. The query builder allows you to select data elements from one or more fields in your schema and construct a query matching the selected elements.

Tip

You can compose the initial query filter by using the clickable query builder and then manually edit the generated filter to your exact requirements.

The following procedure describes the steps involved in building a complex query with the query bar.

  1. In the Schema view, click a chart value to build a query. For example, the following image shows the query filter built by clicking the Manhattan value for the borough field.

    Example of a created filter

  2. To select multiple values for a field, click and drag the cursor over a selection of values, or shift+click the desired values.

    Example of selecting multiple values for a field

  3. Select values in additional fields to build a compound query. For example, the following image shows the compound query built by selecting values in the cuisine field.

    Example of a compound query

  4. To deselect a previously selected value, shift+click the selected value:

    Example of removing a value from a filter

  5. To run the query, click Analyze. Click Reset to clear your query.
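The filter that results from this kind of point-and-click selection is an ordinary MongoDB query document. The sketch below builds one from a mapping of fields to selected values. It assumes that a single selection becomes an equality match and that several selections on one field become an `$in` clause; `$in` is a standard MongoDB query operator, but the exact filter text Compass generates is not shown on this page. The cuisine values are made up for the example.

```python
def build_filter(selections):
    """Build a MongoDB query filter from {field: [selected values]}."""
    query = {}
    for field, values in selections.items():
        if not values:
            continue                              # nothing selected for this field
        if len(values) == 1:
            query[field] = values[0]              # one value: equality match
        else:
            query[field] = {"$in": list(values)}  # several values: match any of them
    return query

# Hypothetical selections: one borough and two cuisines.
selections = {"borough": ["Manhattan"], "cuisine": ["American", "Bakery"]}
compound = build_filter(selections)
print(compound)

# Deselecting a value corresponds to removing it from the selection.
selections["cuisine"].remove("Bakery")
print(build_filter(selections))
```
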

In the Schema tab, you can use interactive maps to filter and analyze location data. If your field contains GeoJSON data or [longitude,latitude] arrays, the Schema tab displays a map containing the points from the field. The data type for location fields is coordinates.

Image showing example field with location data

You can apply a filter to the map to only analyze a specific range of points. To define a location filter:

  1. Click the Circle button at the top-right of the map.

  2. Click and drag on the map to draw a circle containing the area of the map you want to analyze.

  3. Repeat this process as desired to include additional areas of the map in the schema analysis.

Image showing map with filter circles drawn

The query bar updates as you draw location filters to show the exact coordinates used in the $geoWithin query applied to the schema analysis.

If you specify multiple location filters, the query becomes an $or query with multiple $geoWithin operators.
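To make the shape of such a query concrete, the sketch below builds the filter document for one or more circles. It assumes each circle is expressed with the `$centerSphere` shape operator, whose radius is given in radians (distance divided by the Earth's radius); this page only says that `$geoWithin` is used. The field name `location` and the coordinates are hypothetical.

```python
EARTH_RADIUS_KM = 6378.1  # equatorial radius, used to convert kilometers to radians

def circle_filter(field, lng, lat, radius_km):
    """Match documents whose `field` lies within a circle around (lng, lat)."""
    radius_radians = radius_km / EARTH_RADIUS_KM
    return {field: {"$geoWithin": {"$centerSphere": [[lng, lat], radius_radians]}}}

def location_filter(field, circles):
    """Combine circles given as (lng, lat, radius_km); several circles become an $or."""
    clauses = [circle_filter(field, lng, lat, km) for lng, lat, km in circles]
    if not clauses:
        return {}
    if len(clauses) == 1:
        return clauses[0]
    return {"$or": clauses}

# Hypothetical circles: (longitude, latitude, radius in km).
one = location_filter("location", [(-73.98, 40.75, 2.0)])
two = location_filter("location", [(-73.98, 40.75, 2.0), (-73.95, 40.65, 1.5)])
print(one)
print(two)
```
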

To move or resize a location filter, click the edit button on the right side of the map. The map enters filter editing mode, which looks like this:

Image showing map filter editing

  • To move a filter, click and drag the square in the center of the circle.

  • To resize a filter, click and drag the square at the edge of the circle.

After modifying your filters, click Save.

To delete a location filter from the map:

  1. Click the delete button on the right side of the map.

  2. Click either:

    • A location filter to delete that filter.

    • Clear All to delete all location filters.

  3. Click Save.

If the analysis of your schema times out, the collection may be so large that MongoDB stops the operation before the analysis is complete. Increase the value of MAX TIME MS to give the operation enough time to complete.

To increase the value of MAX TIME MS:

  1. In the query bar, expand Options.

    The Options button is on the right side of the query bar, next to the Analyze button.

  2. Increase the value of MAX TIME MS to accommodate your collection. MAX TIME MS defaults to 60000 milliseconds, or 60 seconds, but large collections might take longer than that to analyze.

Once you have increased the value of MAX TIME MS, retry your schema analysis by clicking Analyze.
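For readers who want to see where a time limit like this sits in a server command, the sketch below assembles an aggregate command document that filters, samples, and sets `maxTimeMS`. The `$match` and `$sample` stages and the `maxTimeMS` option are standard MongoDB features, but treating this as equivalent to what Compass sends is an assumption. `sampled_aggregate_command` is a hypothetical helper, and the collection name and values are examples only.

```python
def sampled_aggregate_command(collection, query_filter,
                              sample_size=1000, max_time_ms=60000):
    """Build an aggregate command that samples matching documents under a time limit."""
    return {
        "aggregate": collection,                 # the command name must be the first key
        "pipeline": [
            {"$match": query_filter},            # apply the query filter first
            {"$sample": {"size": sample_size}},  # then pick random documents
        ],
        "cursor": {},
        "maxTimeMS": max_time_ms,                # server-side time limit in milliseconds
    }

# Raise the limit from 60 seconds to 2 minutes for a large collection.
command = sampled_aggregate_command(
    "restaurants", {"borough": "Manhattan"}, max_time_ms=120000
)
print(command)
```

A driver could send a document like this as a database command; with no driver installed, the block only builds and prints the dictionary.
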
