DataWorks provides readers and writers for batch synchronization nodes to simplify the process of data synchronization between data sources. You can add data sources to a workspace in the visualized user interface (UI), and synchronize full or incremental data between these data sources by using the scheduling capability of DataWorks. This topic describes how to use a batch synchronization node to perform data synchronization. In this example, a MaxCompute data source is used as the source, and a Hologres data source is used as the destination.
Prerequisites
(Required if you use a RAM user to develop tasks) The RAM user is added to your DataWorks workspace as a member and is assigned the Development or Workspace Administrator role. The Workspace Administrator role grants more permissions than this task requires, so assign it with caution. For more information about how to add a member, see Add workspace members and assign roles to them.
Note: If you use an Alibaba Cloud account, you can skip this step.
A MaxCompute data source and a Hologres data source are added to the workspace, and the data sources have passed the network connectivity test. For more information, see Add and manage data sources.
Note: Batch synchronization nodes support multiple types of data sources. For more information, see Supported data source types and synchronization operations.
Limits
The batch synchronization feature provided by DataWorks does not support data synchronization across time zones. If the data sources of a batch synchronization node reside in a different time zone from the resource group that is used to run the task, errors may occur during data synchronization.
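The time zone limit above can be checked before you run a task. The following is a minimal sketch, not a DataWorks feature: it assumes that you already know the time zones of your data sources and of the resource group, and the helper name `same_utc_offset` and all sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def same_utc_offset(a: tzinfo, b: tzinfo, at: datetime) -> bool:
    """Return True if both time zones have the same UTC offset at the instant `at`."""
    return at.astimezone(a).utcoffset() == at.astimezone(b).utcoffset()


# Hypothetical values: replace them with the time zones of your own
# data sources and resource group.
data_source_tz = timezone(timedelta(hours=8))
resource_group_tz = timezone(timedelta(hours=8))
other_region_tz = timezone(timedelta(hours=1))

# A fixed, timezone-aware instant keeps the comparison reproducible.
check_time = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

offsets_match = same_utc_offset(data_source_tz, resource_group_tz, check_time)
offsets_differ = not same_utc_offset(data_source_tz, other_region_tz, check_time)
```

If the offsets differ, the configuration falls under the limit described above and the synchronization may fail.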
1. Create a batch synchronization node
Create a batch synchronization node. For more information, see Create an auto triggered node.
2. Configure network connectivity and a resource group
On the configuration tab of the batch synchronization node, select the source type and its Data Source Name, select the destination type and its Data Source Name, select a resource group from the drop-down list in the middle, and then click Next. Make sure that the resource group can connect to both data sources.
3. Configure the batch synchronization node
On the configuration tab of the batch synchronization node, you can use different methods to configure the node.
In most cases, we recommend that you configure a batch synchronization node by using the codeless UI, which is intuitive and convenient. If the data sources of the batch synchronization node do not support the codeless UI, you can click Code Editor in the top toolbar of the configuration tab and configure the batch synchronization node by using the code editor.
If you switch from the codeless UI to the code editor when you configure a batch synchronization node, you cannot switch back to the codeless UI. To use the codeless UI again, you must create another batch synchronization node.
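To give a rough idea of what a code editor configuration looks like, the following sketch builds a script for the MaxCompute-to-Hologres example as a Python dictionary and serializes it to JSON. Everything in it is an assumption for illustration: the field names follow the general reader, writer, and setting layout of Data Integration scripts, and the `stepType` values, parameter keys, data source names, table names, and columns are hypothetical. Check the reader and writer documentation for your data source types before you use any of these fields.

```python
import json


def build_sync_script(source_ds, source_table, dest_ds, dest_table, columns, partition):
    """Assemble an illustrative batch synchronization script (assumed layout)."""
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": "odps",  # assumed identifier for the MaxCompute reader
                "parameter": {
                    "datasource": source_ds,
                    "table": source_table,
                    "partition": [partition],
                    "column": columns,
                },
                "name": "Reader",
                "category": "reader",
            },
            {
                "stepType": "holo",  # assumed identifier for the Hologres writer
                "parameter": {
                    "datasource": dest_ds,
                    "table": dest_table,
                    "column": columns,
                },
                "name": "Writer",
                "category": "writer",
            },
        ],
        # Assumed runtime settings: no dirty records tolerated, two parallel threads.
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"throttle": False, "concurrent": 2},
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


# Hypothetical data source, table, and column names.
script = build_sync_script(
    source_ds="my_maxcompute_source",
    source_table="orders",
    dest_ds="my_hologres_source",
    dest_table="orders",
    columns=["order_id", "amount"],
    partition="pt=${bizdate}",
)
script_json = json.dumps(script, indent=2)
```

Generating the script from a function keeps the reader and writer column lists identical, which avoids one common source of mapping mistakes when the script is edited by hand.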
4. Configure debugging parameters
In the right-side navigation pane of the configuration tab of the batch synchronization node, click Debugging Configurations. On the Debugging Configurations tab, configure the following parameters. These parameters are used to debug and run the batch synchronization node.
Parameter | Description |
Resource Group | Select the serverless resource group that you specified in step 2, Configure network connectivity and a resource group. |
Script Parameters | If you configured scheduling parameters for the batch synchronization node, assign values to them here so that the node can resolve them when you debug and run it. Note: When you synchronize a partitioned table, for which partition filtering is enabled by default, and the partition parameter is set to ${bizdate}, assign bizdate a partition value that exists in the source table. |
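When you pick a debugging value for bizdate, it helps to know what a scheduled run would substitute. The sketch below assumes the usual meaning of the bizdate scheduling parameter, which is the day before the scheduled run date in yyyymmdd format. The helper name `bizdate_for` and the partition column `pt` are hypothetical.

```python
from datetime import date, timedelta


def bizdate_for(run_date: date) -> str:
    """Return the day before `run_date`, formatted as yyyymmdd."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


# Hypothetical run date; `pt` is an assumed partition column name.
debug_bizdate = bizdate_for(date(2024, 3, 1))
partition_filter = f"pt={debug_bizdate}"
```

Before you debug the node, confirm that the resulting partition exists in the source table. If it does not, enter a bizdate value that matches an existing partition.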
What to do next
If you want the system to periodically schedule the batch synchronization node, configure scheduling properties for the node based on your business requirements. For more information, see Node scheduling.
After the configuration of the batch synchronization node is complete, deploy the node. For more information, see Node or workflow deployment.
After the batch synchronization node is deployed, view the running information of the node in Operation Center. For information about Operation Center, see Getting started with Operation Center.