This topic uses a MySQL database that is deployed on an Alibaba Cloud Elastic Compute Service (ECS) instance as an example to show you how to connect your data source to DataWorks.
Scenarios
If your data source meets the following condition, we recommend that you use this solution:
The data source is hosted on an Elastic Compute Service (ECS) instance.
Solutions
Same Alibaba Cloud account and same region
If the ECS instance that hosts the data source and the DataWorks workspace are in the same Alibaba Cloud account and region, you can use a virtual private cloud (VPC) connection. To establish a network connection, deploy the resource group for the DataWorks workspace and the ECS instance in the same VPC.
Different Alibaba Cloud accounts, or same account but different regions
If the ECS instance that hosts the data source and the DataWorks workspace are in different Alibaba Cloud accounts or in the same Alibaba Cloud account but different regions, we recommend that you use a VPC network (intranet) connection. You can use a network connectivity tool, such as Cloud Enterprise Network or VPC Peering Connection, to establish network connectivity between the resource group for the DataWorks workspace and the VPC of the ECS instance.
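The choice between the two solutions above depends only on whether the account and the region match. The following minimal Python sketch restates that decision; the function name `choose_connectivity` and the returned strings are illustrative and are not part of any DataWorks API.

```python
def choose_connectivity(same_account: bool, same_region: bool) -> str:
    """Return the connection approach for a given account/region layout.

    Hypothetical helper that only mirrors the two cases described above.
    """
    if same_account and same_region:
        # Same account and same region: share one VPC.
        return "Deploy the resource group and the ECS instance in the same VPC"
    # Different accounts, or same account but different regions.
    return "Connect the two VPCs with Cloud Enterprise Network or a VPC peering connection"


print(choose_connectivity(same_account=True, same_region=False))
```

Any layout other than "same account and same region" falls through to the second branch, which matches the two headings in this section.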
Prerequisites
You have deployed a data source that is supported by DataWorks on an ECS instance.
A workspace is created. For more information, see Create a workspace.
A serverless resource group is created and associated with your workspace. For more information, see 1. Create a serverless resource group and 2. Associate the resource group with a workspace.
Billing
Billing varies based on the network connectivity tool that you use. For more information, see Billing of CEN or Billing of VPC peering connections.
When you use a VPC peering connection, if the ECS instance and the DataWorks resource group belong to different accounts but are in the same region, no fees are charged.
Configure network connectivity
This section describes the general flow for establishing network connectivity between a data source and DataWorks so that you can quickly understand the core logic. For detailed steps, see the Configuration example section.
Step 1: Obtain basic information
Same Alibaba Cloud account and same region
Data source side
VPC and vSwitch information of the ECS instance:
Go to the ECS console. In the top navigation bar, select the region where the ECS instance is located.
In the navigation pane on the left, open the Instances page. Find the ECS instance on which the MySQL database is deployed and click the instance name to go to the Instance Details page.
In the Configuration Information section, obtain the VPC (named VPC 1 in this example) and vSwitch information.
DataWorks side
VPC and vSwitch information for the resource group:
Go to the Resource Groups page in the DataWorks console. Find the target resource group and click Network Settings in the Actions column.
In the corresponding feature module, view the attached VPC and vSwitch information.
For example, if you want to connect a MySQL database that is deployed on an ECS instance to DataWorks for data synchronization, view the corresponding VPC (VPC 2 in this example) and vSwitch information in the Data Scheduling & Data Integration section.
Same Alibaba Cloud account, different regions
Data source side
Region information: This example uses an ECS instance in the China (Hangzhou) region.
VPC information of the ECS instance:
Go to the ECS console. In the top navigation bar, select the region where the ECS instance is located.
In the navigation pane on the left, open the Instances page. Find the ECS instance on which the MySQL database is deployed and click the instance name to go to the Instance Details page.
In the Configuration Information section, obtain the VPC and vSwitch information.
DataWorks side
Region information: This example uses a DataWorks workspace and resource group in the China (Shanghai) region.
Information about the VPC and vSwitch that are bound to the resource group:
Go to the DataWorks resource group list page. Find the target resource group and click Network Settings in the Actions column.
In the corresponding feature module, view the attached VPC and vSwitch information.
For example, if you want to connect a MySQL database that is deployed on an ECS instance to DataWorks for data synchronization, view the corresponding VPC and vSwitch information in the Data Scheduling & Data Integration section.
Different Alibaba Cloud accounts
Data source side
Account information: This example uses Account A.
Region information: This example uses an ECS instance in the China (Hangzhou) region.
Information about the VPC and vSwitch of the ECS instance:
Go to the ECS console. In the top navigation bar, select the region where the ECS instance is located.
In the navigation pane on the left, open the Instances page. Find the ECS instance on which the MySQL database is deployed and click the instance name to go to the Instance Details page.
In the Configuration Information section, obtain the VPC and vSwitch information.
DataWorks side
Account information: This example uses Account B.
Region information: This example uses a DataWorks workspace and resource group in the China (Shanghai) region.
VPC and vSwitch CIDR block information for the resource group:
Go to the Resource Groups page in the DataWorks console. Find the target resource group and click Network Settings in the Actions column.
In the corresponding feature module, view the attached VPC and vSwitch information.
For example, if you want to connect a MySQL database that is deployed on an ECS instance to DataWorks for data synchronization, view the corresponding VPC and vSwitch information in the Data Scheduling & Data Integration section.
Step 2: Establish a network connection
Same Alibaba Cloud account and same region
If VPC 1 and VPC 2 are the same, the ECS instance and the DataWorks resource group are deployed in the same VPC, and a network connection is established between them by default.
If VPC 1 and VPC 2 are different, you must click Add Binding on the network settings page of the DataWorks resource group to attach VPC 1 to the resource group. This allows the DataWorks resource group and the ECS instance to be deployed in the same VPC.
Same Alibaba Cloud account, different regions
Cloud Enterprise Network (CEN): CEN is suitable for establishing network connections among multiple VPCs in complex network environments. For more information, see Connect VPCs in different regions.
VPC peering connection: A VPC peering connection is used for network connectivity between two VPCs. For more information about the configuration, see Use a VPC peering connection to enable private connectivity between VPCs.
If errors occur when you configure network connectivity, submit a ticket to contact technical support of the related Alibaba Cloud service.
Different Alibaba Cloud accounts
Cloud Enterprise Network (CEN): CEN establishes network connectivity between multiple VPCs in complex enterprise network environments. For more information about the configuration, see cross-account VPC-to-VPC connection.
VPC peering connection: A VPC peering connection is used for establishing network connectivity between two VPCs. For more information, see Use a VPC peering connection to establish private connectivity between VPCs.
If errors occur when you configure network connectivity, submit a ticket to contact technical support of the related Alibaba Cloud service.
Step 3: Add a route for the DataWorks resource group
If the ECS instance and the DataWorks resource group are in the same account but different regions, or in different accounts, you must add a route for the DataWorks resource group. The route must point to the CIDR block of the vSwitch where the ECS instance is located.
Go to the Resource Groups page in the DataWorks console. Find the target resource group and click Network Settings in the Actions column.
In the corresponding feature module, find the attached VPC and click Custom Route in the Actions column.
Click Add Route. For Connection Method, select CIDR Block. Set Destination CIDR Block to the vSwitch CIDR block of the ECS instance.
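Before you save the route, it can help to confirm that the destination CIDR block actually contains the private IP address of the ECS instance. The following Python sketch does this with the standard `ipaddress` module. The helper name `route_covers` is hypothetical, and the sample values are taken from the configuration example later in this topic.

```python
import ipaddress


def route_covers(destination_cidr: str, host_ip: str) -> bool:
    """Return True if host_ip falls inside destination_cidr.

    strict=True rejects CIDR blocks that have host bits set,
    such as 192.168.6.1/24, by raising ValueError.
    """
    network = ipaddress.ip_network(destination_cidr, strict=True)
    return ipaddress.ip_address(host_ip) in network


# vSwitch CIDR block and private IP address from the configuration example.
print(route_covers("192.168.6.0/24", "192.168.6.172"))  # True
```

If the function returns False, the route would not carry traffic to the instance, which usually means the wrong vSwitch CIDR block was copied.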
Step 4: (Optional) Enable remote access for the database
For some databases, you must enable remote access in their configuration files to allow specified users to access the databases from an external source using an IP address and port. The configuration methods vary for different databases. For more information, see the official documentation for your database.
For more information about how to enable remote access for a MySQL database, see 4. Enable remote access for the MySQL database.
Step 5: Configure the ECS security group
Alibaba Cloud ECS uses security groups to provide firewall capabilities. You must open the database port in the security group of the ECS instance to the VPC of the DataWorks resource group. This allows the DataWorks resource group to access the services that are deployed on the ECS instance.
Go to the ECS console. In the top navigation bar, select the region where the ECS instance is located.
In the navigation pane on the left, open the Instances page. Find the ECS instance on which the MySQL database is deployed and click the instance name to go to the Instance Details page.
Click the Security Groups tab and then click the name of a security group to go to the Security Group Details page.
In the Access Rule section, click Add Rule and configure the following key parameters. You can use the default values for the other parameters.
Source: Enter the CIDR block of the vSwitch that is bound to the DataWorks resource group.
Port: Select the port of the database that is deployed on the ECS instance. For example, port 3306 must be open for MySQL.
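The rule above admits traffic only when both the source address and the port match. The following Python sketch is a simplified model of that check, not the ECS security group API. The `InboundRule` class, the `allows` function, and the client addresses are hypothetical; the CIDR block and port come from the configuration example later in this topic.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    """Simplified inbound rule: one source CIDR block and one port."""
    source_cidr: str
    port: int


def allows(rule: InboundRule, client_ip: str, port: int) -> bool:
    """Return True if the rule admits a connection from client_ip to port."""
    source = ipaddress.ip_network(rule.source_cidr)
    return port == rule.port and ipaddress.ip_address(client_ip) in source


rule = InboundRule(source_cidr="172.16.66.0/24", port=3306)
print(allows(rule, "172.16.66.10", 3306))  # True: inside the vSwitch CIDR block
print(allows(rule, "172.16.67.10", 3306))  # False: outside the vSwitch CIDR block
```

Real security groups also evaluate priority, protocol, and allow or deny policies, which this sketch leaves out.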
Test the network connection
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, go to the Data Integration page. Select the desired workspace from the drop-down list and click Go to Data Integration.
In the navigation pane on the left, click Data Source. On the Data Sources page, click Add Data Source. Select a data source based on your requirements and configure the related connection parameters.
In the list of resource groups at the bottom of the page, select the resource group that is connected to the data source and click Test Connectivity.
Note: If the connectivity test result is Failed, you can use the Connectivity Diagnosis Tool to resolve the issue. If you still cannot connect the resource group to the data source, submit a ticket.
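When the connectivity test fails, a plain TCP check from another host in the same VPC can help separate network problems from database problems. The following Python sketch uses only the standard `socket` module. The function name `can_reach` is hypothetical, and the check reflects reachability from the machine that runs the script, not from the DataWorks resource group itself.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example call with the values from the configuration example:
# can_reach("192.168.6.172", 3306)
```

A False result points to routing, the security group, or the database bind address. A True result with a failing data source test points to credentials or database permissions instead.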
Configuration example
This example shows how to configure a network connection. A MySQL database is deployed on an ECS instance in the China (Hangzhou) region within Account A. DataWorks is activated in the China (Shanghai) region within Account B.
1. Basic information
Parameter | Data source (ECS instance with MySQL) | DataWorks resource group |
Account | Account A | Account B |
Region | China (Hangzhou) | China (Shanghai) |
VPC | Shown on the basic information page of the ECS instance | Shown on the network settings page of the resource group |
2. Establish a network connection
This section describes how to use a VPC peering connection to configure network connectivity between an ECS instance and DataWorks.
If errors occur when you configure network connectivity, submit a ticket to contact technical support of the related Alibaba Cloud service.
Log on to Alibaba Cloud Account A. Go to the VPC Peering Connection console, switch the region to China (Hangzhou) at the top of the page, and then click Create Peering Connection and configure the relevant parameters.
The following table describes the key parameters. You can use the default values for the other parameters.
Parameter | Configuration and example |
Peering Connection Name | The custom name of the peering connection. In this example, Account_A to Account_B is used. |
Requester VPC Instance | The VPC with which the ECS instance within Alibaba Cloud Account A is associated. In this example, select Account_A_hangzhou_VPC. |
Accepter Account Type | In this example, select Cross-account. |
Accepter Main Account UID | Enter the UID of Account B. |
Accepter Region Type | In this example, select Cross-region. |
Accepter Region | The region in which the DataWorks workspace and resource group reside within Alibaba Cloud Account B. In this example, select China (Shanghai). |
Accepter VPC | Enter the ID of the VPC with which the DataWorks resource group is associated within Alibaba Cloud Account B. In this example, the VPC is Account_B_shanghai_VPC. |
Click OK to complete the peering connection configuration. You are automatically redirected to the basic information page of the peering connection. On this page, the Status is Peering Accepting.
Log on to Alibaba Cloud Account B and go to the VPC Peering Connection console. At the top of the page, switch the region to China (Shanghai). Find the peering connection record from Alibaba Cloud Account A and click Accept in the Actions column. After the request is accepted, the Status of the peering connection changes to Activated.
Click Configure Route Entry in the Accepter VPC column. In the Configure Route Entry dialog box, enter a custom Name and set Destination CIDR Block to the CIDR block of the requester VPC that contains the ECS instance. In this example, the CIDR block is set to 192.168.0.0/16.
Log on to Alibaba Cloud Account A and go to the VPC Peering Connection console. At the top of the page, switch the region to China (Hangzhou) and find the created peering connection.
In the Requester VPC Instance section, click Configure Route Entry. In the Configure Route Entry dialog box, enter a custom Name and set Destination CIDR Block to the VPC CIDR block of the accepter (DataWorks resource group). In this example, the CIDR block is 172.16.0.0/12.
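The two route entries above only work as intended if the address plan is consistent. The following Python sketch checks the CIDR blocks used in this example, assuming that the two VPC CIDR blocks should not overlap (the usual requirement for routing between peered networks) and that each vSwitch CIDR block lies inside its VPC. The function name `peering_plan_is_consistent` is hypothetical.

```python
import ipaddress

# CIDR blocks taken from this configuration example.
requester_vpc = ipaddress.ip_network("192.168.0.0/16")         # Account A, ECS side
accepter_vpc = ipaddress.ip_network("172.16.0.0/12")           # Account B, DataWorks side
ecs_vswitch = ipaddress.ip_network("192.168.6.0/24")           # vSwitch of the ECS instance
resource_group_vswitch = ipaddress.ip_network("172.16.66.0/24")  # vSwitch of the resource group


def peering_plan_is_consistent(vpc_a, vpc_b, vswitch_a, vswitch_b) -> bool:
    """Return True if the VPCs do not overlap and each vSwitch sits in its VPC."""
    return (
        not vpc_a.overlaps(vpc_b)
        and vswitch_a.subnet_of(vpc_a)
        and vswitch_b.subnet_of(vpc_b)
    )


print(peering_plan_is_consistent(
    requester_vpc, accepter_vpc, ecs_vswitch, resource_group_vswitch
))  # True
```

`subnet_of` requires Python 3.7 or later.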
3. Add a route for the DataWorks resource group
Log on to Alibaba Cloud Account B. Go to the Resource Groups page of the DataWorks console, find the target resource group, and click Network Settings in the Actions column.
In the corresponding feature module, find the attached VPC and click Custom Route in the Actions column.
Click Add Route. For Connection Method, select Specified CIDR Block. Set Destination CIDR Block to the vSwitch CIDR block of the ECS instance, for example, 192.168.6.0/24.
4. Enable remote access for the MySQL database
Connect to the ECS instance where the MySQL database is deployed to enable remote access for the database.
The following commands apply only to a MySQL 8.0 database that runs in a Linux environment. You must adapt the commands for other operating systems and MySQL versions.
Find the my.cnf configuration file. For a default installation, the file is typically located at /etc/my.cnf.
find / -name my.cnf
Run the vim /etc/my.cnf command to edit the configuration file. Replace /etc/my.cnf with the actual path that you found in the previous step.
Press i to enter insert mode, and then add the following configuration to the [mysqld] section at the end of the configuration file:
bind-address=0.0.0.0
Press Esc and then enter :wq! to save and exit.
Run the systemctl restart mysqld command to restart the service.
Create a user that can be used to remotely connect to the MySQL database when you configure the data source in DataWorks.
Run the mysql -u root -p command to log on to the database as an administrator.
Create a user and set a password.
-- "dataworks_user" is the username. You can customize it.
-- "%" indicates that the user can access from any IP address. You can also specify an IP address for fine-grained control.
-- "StrongPassword123!" is the password. You can customize it.
CREATE USER 'dataworks_user'@'%' IDENTIFIED BY 'StrongPassword123!';
Grant database permissions to the user.
-- Run one of the following commands.
-- Grant all privileges to the user. Use with caution.
GRANT ALL PRIVILEGES ON *.* TO 'dataworks_user'@'%' WITH GRANT OPTION;
-- Grant all privileges on a specific database, such as mydatabase, to the user.
GRANT ALL PRIVILEGES ON mydatabase.* TO 'dataworks_user'@'%' WITH GRANT OPTION;
Run the FLUSH PRIVILEGES; command to refresh permissions and then run the exit command to exit the database.
Verify the remote connection.
mysql -u dataworks_user -h <Primary private IP address of the ECS instance> -p
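If the remote connection fails, one thing to confirm is that the bind-address setting really ended up in the [mysqld] section. The following Python sketch reads a my.cnf-style text with the standard `configparser` module. The function name `mysqld_bind_address` and the sample file content are hypothetical. The sketch assumes a simple file and does not follow include directives or recognize the bind_address spelling with an underscore.

```python
import configparser


def mysqld_bind_address(cnf_text: str):
    """Return the bind-address value of the [mysqld] section, or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as "skip-name-resolve"
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat "%" literally
    )
    parser.read_string(cnf_text)
    return parser.get("mysqld", "bind-address", fallback=None)


sample = """
[mysqld]
datadir=/var/lib/mysql
skip-name-resolve
bind-address=0.0.0.0
"""
print(mysqld_bind_address(sample))  # 0.0.0.0
```

To check a real file, read its text first, for example with `open("/etc/my.cnf").read()`, using the path found in the first step.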
5. Configure the ECS security group
Log on to Alibaba Cloud Account A. Go to the ECS console and select China (Hangzhou) in the top navigation bar.
In the navigation pane on the left, open the Instances page. Find the ECS instance on which the MySQL database is deployed and click the instance name to go to the Instance Details page.
Click the Security Group tab and then click the security group name to go to the Security Group Details page.
In the Access Rules section, click Add Rule and configure the following key parameters. You can use the default values for parameters that are not specified.
Source: Enter the vSwitch CIDR block associated with the DataWorks resource group. In this example, use 172.16.66.0/24.
Port: Select the port for the database that is deployed on the ECS instance. In this example, the port is 3306.
6. Test the network connection
Log on with Account B.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, go to the Data Integration page. Select the desired workspace from the drop-down list and click Go to Data Integration.
In the navigation pane on the left, click Data Source. On the Data Sources page, click Add Data Source.
Select the MySQL data source type and configure its parameters.
Configuration Mode: Select Connection String Mode.
Host IP Address: Enter the private IP address of the ECS instance. In this example, 192.168.6.172 is used.
Port Number: Set to 3306.
Database Name: Enter the name of an existing database.
Username and Password: Enter the username and password of the dataworks_user account that you created in the 4. Enable remote access for the MySQL database step.
In the Connection Configuration section, click Test Connectivity for the resource group that is bound to the workspace and check whether the result is Connected.
Note: If the test fails, you can click Self-service Troubleshoot to resolve the issue. If the test still fails after troubleshooting, submit a ticket.
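The host, port, and database name entered above correspond to the parts of a standard MySQL JDBC URL. The following Python sketch shows how the three values combine in the MySQL Connector/J URL format. Whether DataWorks assembles the connection string in exactly this way is an assumption. The function name `mysql_jdbc_url` is hypothetical, and `mydatabase` is the sample database name from the GRANT statement in step 4.

```python
def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a JDBC URL in the MySQL Connector/J format."""
    return f"jdbc:mysql://{host}:{port}/{database}"


# Private IP address and port from this example.
print(mysql_jdbc_url("192.168.6.172", 3306, "mydatabase"))
# jdbc:mysql://192.168.6.172:3306/mydatabase
```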
References
For more information about frequently asked questions related to network connectivity, see Resource group operations and network connectivity.