DataWorks: Metadata acquisition

Last Updated: Aug 29, 2025

DataWorks Data Map provides the Metadata Acquisition feature to help you consolidate and manage metadata from various systems. You can view the collected metadata from various data sources in Data Map. This topic describes how to create a Database Collector to collect metadata from your data sources into DataWorks.

Prerequisites

You must create a data source in your workspace before you can perform metadata acquisition. For more information about how to create a data source, see Resource Management.

Overview of metadata acquisition

After you create a data source in a workspace, DataWorks can acquire its metadata. When you enable metadata acquisition in Data Map, the system performs a one-time full acquisition of existing metadata, followed by daily incremental acquisitions. The collected metadata is then available in Data Map. This lets you view a data overview, manage tables using classification and grouping, and view data lineage.

Note
  • If the default execution plan does not meet your needs, you can modify it. For more information, see Manage a Database Collector.

  • After you attach a MaxCompute or E-MapReduce (DLF) data source to the Data Development module, the system automatically manages the Database Collector. No manual management is required.

  • If you create a physical table in a data source but cannot find it in the Data Development module, you can manually run a metadata acquisition task for that data source to resolve the issue.

Supported data sources and acquisition methods

| Data source type | Metadata acquisition method | Is the Database Collector visible in Data Map? | Update timeliness: Table/Field | Update timeliness: Partition | Update timeliness: Data lineage |
| --- | --- | --- | --- | --- | --- |
| AnalyticDB for PostgreSQL | Data Development - Attach data source; Manual acquisition | Yes | Depends on the custom execution plan | Not supported | Real-time |
| AnalyticDB for MySQL | Data Development - Attach data source; Manual acquisition | Yes | Depends on the custom execution plan | Not supported | Real-time. Note: You must submit a ticket to enable the data lineage feature for your AnalyticDB for MySQL instance. |
| AnalyticDB for Spark | Data Development - Attach computing resource (Note: Currently, only the new version of Data Development supports attaching AnalyticDB for Spark computing resources.); Manual acquisition (Note: AnalyticDB for Spark and AnalyticDB for MySQL share the same entry point for metadata acquisition.) | Yes | Real-time | Not supported | Real-time |
| CDH Hive | Management Center - Register open source cluster; Automatic acquisition | Yes | Depends on the custom execution plan | Real-time | Real-time |
| Data Lake Formation (DLF) | Automatic acquisition | No | Real-time | Real-time | N/A |
| E-MapReduce (DLF). Note: You must enable EMR_HOOK for the cluster. | Management Center - Register open source cluster; Automatic acquisition | No | Real-time | Real-time | Real-time |
| E-MapReduce (HMS / RDS). Note: You must enable EMR_HOOK for the cluster. | Management Center - Register open source cluster; Automatic acquisition | Yes | Real-time | Real-time | Real-time |
| Hologres | Data Development - Attach data source; Manual acquisition | Yes | Depends on the custom execution plan | Not supported | Real-time |
| Lindorm | Data Development - Attach data source; Manual acquisition | Yes | Depends on the custom execution plan | Not supported | Real-time |
| MaxCompute | Data Development - Attach data source; Automatic acquisition | No | Regular project: Real-time. External project: T+1 | Regions in China: Real-time. Regions outside China: T+1 | T+1 |
| StarRocks | Management Center - Create data source; Manual acquisition | Yes | Instance mode: Real-time. Connection string mode: Depends on the custom execution plan | Not supported | Real-time. Note: Only instance mode supports data lineage acquisition. Connection string mode does not. |
| Other data source types (MySQL, PostgreSQL, SQL Server, Oracle, Table Store (OTS), ClickHouse, etc.) | Management Center - Create data source; Manual acquisition | Yes | Depends on the custom execution plan | Not supported | Not supported |

Limits

  • You can perform metadata acquisition only for data sources that are configured in the workspace you are currently logged in to. To acquire metadata from a data source in another workspace, ask the workspace administrator to add you as a member. For more information, see Add a workspace member.

  • When you acquire metadata from a data source that uses a whitelist for access control, you must configure the database whitelist in advance. For more information, see Whitelists to configure when a data source for metadata acquisition has access control enabled.

  • Cross-region metadata acquisition is not recommended. The DataWorks region should be the same as the data source region. To perform cross-region metadata acquisition, you must use a public endpoint when you create the data source. For more information, see Data Source Management.

  • Using a MySQL Database Collector to acquire metadata from an OceanBase data source is not supported.

Feature entry point

  1. Go to Data Map.

  2. In the navigation pane on the left, click Metadata Acquisition.

    On the Data Source tab, you can manage the Database Collectors for your data sources. If no data sources exist, you can click Create Data Source to go to the data source configuration page and create one.

View a Database Collector

  • Overall statistics

    On the Metadata Acquisition page, the Data Source tab provides an overview of metadata acquisition, including the number of data sources for which a Database Collector has been created.

  • Details

    You can also click the Manage button in the upper-right corner of a data source to open its details page. On this page, you can view the Status, Execution Plan, Last Run Time, Last Duration, and Average Duration of the corresponding Database Collector in a specific workspace, along with the number of tables that were updated and added during the last run.

Manage a Database Collector

Click the Manage button in the upper-right corner of the target data source. You are taken to the Collected tab by default, where you can perform the following operations on existing Database Collectors.

Run a Database Collector

You can manually run a Database Collector to execute a metadata acquisition task. On the Collected tab, find the target data source and click Run in the Actions column.

Modify the execution plan of a Database Collector

On the Collected tab, find the target Database Collector and click Edit in the Actions column to modify its execution plan. The supported execution plans are Manual and Periodic.

  • Manual: You must manually trigger metadata acquisition and updates after configuring the Database Collector for the target data source.

  • Periodic: After you configure the Database Collector for the target data source, the system periodically collects and updates metadata based on the configured execution plan. No manual trigger is required.

Remove a Database Collector

On the Collected tab, find the target data source and click Remove in the Actions column to remove its Database Collector. The data source is then moved to the Uncollected tab, and metadata acquisition stops.

Create a Database Collector

After you create a data source or register a cluster, you can enable metadata acquisition in Data Map and then view the acquisition status on the Collected tab.

If you remove a Database Collector and later need to restart data collection, you can create a new one from the Uncollected tab. The following steps describe this procedure.

  1. At the top of the list, click the Uncollected tab.

  2. Find the target data source and click Metadata Acquisition in the Actions column. In the Configure Execution Plan dialog box, configure the parameters.

    Note

    The configuration interface for the execution plan may vary depending on the data source. Refer to the actual interface in the product.

    Parameter

    Description

    Resource Group Name

    Select a resource group that is connected to the data source network. Data Map supports the following two types of resource groups. Select one as needed:

    • Your exclusive resource group for scheduling.

    • Your serverless resource group (general-purpose resource group).

    Connectivity Test

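    After you select a resource group, click Test Connectivity to verify the connection between the resource group and the data source. If the message Connectivity Test Failed appears, the resource group cannot reach the data source. Check the network connectivity between them and, if the data source uses a whitelist for access control, the database whitelist configuration.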

    Execution Plan

    Options include Manual, Monthly, Weekly, Daily, and Hourly. The system generates an execution plan based on the selected cycle and performs metadata acquisition for the target data source at the scheduled time.

    • Manual: Manually trigger metadata acquisition and updates based on your business needs.

    • Monthly: Automatically acquire metadata once at a specified time on a specified day of each month.

      Important

      Some months do not have a 29th, 30th, or 31st day. Select end-of-month dates with caution.

    • Weekly: Automatically acquire metadata once at a specified time on a specified day of each week.

      If you do not enter a Time, the acquisition runs at 00:00:00 on the specified days of the week by default.

    • Daily: Automatically acquire metadata once at a specific time each day.

    • Hourly: Automatically acquire metadata once at the Nth minute of each hour.

  3. After you confirm the configuration, click Confirm.

    The system performs metadata acquisition based on the configured execution plan. If you selected manual acquisition, go to the Collected tab, find the target data source, and click Run in the Actions column to run the acquisition task.
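The periodic execution plan options in step 2 can be illustrated with a small scheduling sketch. The Python below is not DataWorks code: the function names are hypothetical, and the handling of months that lack the selected day (skipping them) is an assumption made for illustration, since this topic only warns that such days do not exist in every month. It shows how a next run time could be derived for each cycle, including the 00:00:00 default for a weekly plan with no Time entered.

```python
import calendar
from datetime import datetime, timedelta, time


def next_monthly_run(now: datetime, day: int, at: time) -> datetime:
    """Next run for a Monthly plan. ASSUMPTION: months without `day` are skipped."""
    year, month = now.year, now.month
    for _ in range(13):  # 13 consecutive months always include a month with 31 days
        if day <= calendar.monthrange(year, month)[1]:
            candidate = datetime(year, month, day, at.hour, at.minute, at.second)
            if candidate >= now:
                return candidate
        month += 1
        if month > 12:
            year, month = year + 1, 1
    raise ValueError("day must be between 1 and 31")


def next_weekly_run(now: datetime, weekdays: set, at: time = time(0, 0, 0)) -> datetime:
    """Next run for a Weekly plan; `weekdays` uses Monday=0. Time defaults to 00:00:00."""
    for offset in range(8):  # today through the same weekday next week
        candidate = datetime.combine((now + timedelta(days=offset)).date(), at)
        if candidate.weekday() in weekdays and candidate >= now:
            return candidate
    raise ValueError("weekdays must contain values between 0 and 6")


def next_daily_run(now: datetime, at: time) -> datetime:
    """Next run for a Daily plan at a specific time each day."""
    candidate = datetime.combine(now.date(), at)
    return candidate if candidate >= now else candidate + timedelta(days=1)


def next_hourly_run(now: datetime, minute: int) -> datetime:
    """Next run for an Hourly plan at the Nth minute of each hour."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    return candidate if candidate >= now else candidate + timedelta(hours=1)


if __name__ == "__main__":
    # Jan 31 02:00 has already passed and February has no 31st,
    # so under the skip assumption this prints 2025-03-31 02:00:00.
    print(next_monthly_run(datetime(2025, 1, 31, 12, 0), 31, time(2, 0)))
```

Under the skip assumption, a plan set to the 31st would run in only seven months of the year, which is why end-of-month dates deserve caution.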

Notes on configuring whitelists for cloud products

This section uses ApsaraDB RDS as an example. For metadata acquisition, you must add the required IP address CIDR blocks to the database whitelist. Before you configure the whitelist, note the following:

Cloud products support standard and enhanced IP whitelist modes. The whitelist group that you configure can affect network connectivity during metadata acquisition:

  • If your database uses the standard IP whitelist mode: This mode does not distinguish between classic network and VPC whitelist groups.

  • If your database uses the enhanced IP whitelist mode:

    • The enhanced whitelist mode uses separate whitelist groups for classic networks and VPCs.

      Note

      In enhanced whitelist mode, you must specify a whitelist group for network isolation. For example, an IP address in a classic network whitelist cannot be used to access the RDS instance from a VPC, and vice versa.

    • If you use an exclusive resource group for scheduling to connect to the database through a VPC, use the VPC whitelist group.

    • If you use a public endpoint or a classic network address to access the database, use the classic network whitelist group.

  • If you switch the database from standard IP whitelist mode to enhanced IP whitelist mode:

    RDS copies the standard IP whitelist into two separate groups: one for the classic network and one for the VPC.
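The whitelist mode rules above can be summarized as a small decision function. The following Python is only an illustrative sketch: the function names, the string labels ("standard", "enhanced", "vpc", "public", "classic", "shared"), and the sample IP address are hypothetical and do not correspond to any RDS or DataWorks API.

```python
def whitelist_group_for(mode: str, access_path: str) -> str:
    """Return which whitelist group must contain the DataWorks CIDR blocks.

    mode: "standard" or "enhanced" (the IP whitelist mode of the database).
    access_path: "vpc", "public", or "classic" (how the resource group reaches it).
    All labels are hypothetical names used for this sketch only.
    """
    if access_path not in ("vpc", "public", "classic"):
        raise ValueError(f"unknown access path: {access_path}")
    if mode == "standard":
        # Standard mode does not distinguish classic network and VPC groups.
        return "shared"
    if mode == "enhanced":
        # Enhanced mode isolates the two networks: VPC access needs the VPC group,
        # public endpoints and classic network addresses need the classic group.
        return "vpc" if access_path == "vpc" else "classic"
    raise ValueError(f"unknown whitelist mode: {mode}")


def switch_to_enhanced(standard_ips: list) -> dict:
    """Model the mode switch: the standard whitelist is copied into two groups."""
    return {"classic": list(standard_ips), "vpc": list(standard_ips)}


if __name__ == "__main__":
    print(whitelist_group_for("enhanced", "vpc"))
    print(switch_to_enhanced(["192.0.2.10"]))  # placeholder documentation address
```

Because enhanced mode isolates the two networks, adding the CIDR blocks to the wrong group leaves the connectivity test failing even though the addresses look whitelisted.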

Additional notes on whitelist configuration:

  • Configuring a whitelist does not affect the normal operation of the RDS instance.

  • The default IP whitelist group (default) cannot be deleted. It can only be cleared.

  • Do not modify or delete system-generated groups, such as ali_dms_group (the IP address whitelist group for DMS) and hdm_security_ips (the IP address whitelist group for DAS). Otherwise, the related products may not work as expected.

    Note

    When you configure a database whitelist, create a separate whitelist group for DataWorks.

  • The default IP whitelist contains only 127.0.0.1. This means that by default, no external IP address can access the RDS instance.
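To see why a default whitelist that contains only 127.0.0.1 blocks external access, the check can be sketched with Python's standard ipaddress module. This is a simplified model and not how RDS evaluates whitelists internally. The function name is hypothetical, and the addresses are placeholders from a documentation range; take the real CIDR blocks to add for DataWorks from the whitelist topic referenced in the Limits section.

```python
import ipaddress


def is_allowed(source_ip: str, whitelist: list) -> bool:
    """Return True if source_ip matches any IP address or CIDR block in whitelist."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        # strict=False accepts entries with host bits set, such as "192.0.2.10/24".
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in whitelist
    )


if __name__ == "__main__":
    default_whitelist = ["127.0.0.1"]
    # Hypothetical separate group for DataWorks, using a placeholder CIDR block.
    dataworks_group = ["192.0.2.0/24"]

    resource_group_ip = "192.0.2.10"  # placeholder address of a resource group
    print(is_allowed(resource_group_ip, default_whitelist))
    print(is_allowed(resource_group_ip, default_whitelist + dataworks_group))
```

With only the default entry the placeholder address is rejected. It is accepted once a separate group containing its CIDR block is added.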

For more information about configuring an RDS whitelist, see Connect to an ApsaraDB RDS for MySQL instance. The process is similar for other data source types. Refer to the specific configuration steps for your data source.

What to do next

After acquiring metadata, you can perform operations in Data Map, such as viewing a data overview, managing tables by classification and grouping, and viewing data lineage. For more information, see Data overview, Look up a table, and Business-centric management: Data collections.