Database nodes - DataWorks - Alibaba Cloud Documentation Center

DataWorks allows you to create multiple types of database nodes for SQL task development, periodic scheduling, and integration with other jobs.

Prerequisites

The RAM user that you want to use is added to your workspace.
If you want to use a RAM user to develop tasks, you must add the RAM user to your workspace as a member and assign the Develop or Workspace Administrator role to the RAM user. The Workspace Administrator role has more permissions than necessary. Exercise caution when you assign the Workspace Administrator role. For more information about how to add a member and assign roles to the member, see Add workspace members and assign roles to them.
A data source is added to your workspace.
- The data source is connected to the serverless resource group. For more information, see Network connectivity solutions.
- The data source is added to the workspace in connection string mode. For more information, see Add and manage data sources.
- The data source can be used to create a database node. For more information, see Supported data sources.
Nodes of the data source that corresponds to a database node are created before you use the database node to develop a task. For more information, see Create task nodes.

Step 1: Use a database node to develop a task

After you create a database node, you can develop a task based on the database node.
1. Select a data source.
  Select the data source that you want to use to develop a task from the Select DataSource drop-down list. If you cannot find the data source that you want to use from the drop-down list, click Add Data Source to add a data source on the Data Sources page.
  Note
  - In a workspace in standard mode, only data sources that are configured for both production and development environments are displayed in the Select DataSource drop-down list.
  - Database nodes can be used to develop tasks only for data sources that are added in connection string mode in the production environment.
2. Write SQL statements.
  Write SQL statements in the SQL editor to create a task. In this example, a simple SQL query statement is used.
```
SELECT * FROM you_table_name; -- Query a table. SELECT '${var}'; -- Configure scheduling parameters.
```
  Note
  You can write SQL statements that you want to execute based on your business requirements and the syntax that is supported by the data source.
3. Select a resource group for debugging.
  Click Debugging Configurations in the right-side navigation pane of the configuration tab of the node. On the Debugging Configurations tab, click DataWorks Configurations. Then, select an existing serverless resource group that passes the connectivity test from the Resource Group drop-down list.
  Note
  If you want to access a data source over the Internet or a virtual private cloud (VPC), you must use the resource group for scheduling that is connected to the data source. For more information, see Network connectivity solutions.
4. Configure debugging parameters.
  Click Debugging Configurations in the right-side navigation pane of the configuration tab of the node. On the Debugging Configurations tab, click Script Parameters. Then, define variables in the node code and assign values to the variables.
5. After the debugging configurations are complete, click the icon to save the SQL node. Then, click the icon to run the SQL script and check whether the SQL script meets expectations.
After the debugging of the SQL script is complete, click Properties on the right side of the SQL Editor to configure task scheduling properties.

Step 2: Deploy the database node and perform O&M operations

After the node code and scheduling settings are configured, you can deploy the node to the production environment.
After the deployment is complete, go to the Auto Triggered Nodes page in Operation Center to view the auto triggered task that is deployed and perform O&M operations on the task. The node periodically runs as expected based on the scheduling settings that you configure. For more information, see Getting started with Operation Center.

Supported data sources

DataWorks allows you to create database nodes of various data source types. The following table describes the types of data sources that support database nodes.

Note

The data source that is used to create a database node must be added to the workspace in connection string mode.
Specific databases support the stored procedure feature. However, the stored procedure feature cannot be used in DataWorks Data Studio.

Types of data sources that support database nodes

Data source	Description
MySQL	MySQL is a popular relational database management system (RDBMS) for storing and processing data. It has the following benefits: small size, fast speed, and low cost. For more information, see MySQL.
SQL Server	SQL Server is a RDBMS for storing and processing data. It provides you with reliable, efficient, and secure data management and analysis services. For more information, see SQL Server.
Oracle	Oracle is a RDBMS for storing and processing data. It provides you with reliable, efficient, and secure data management and analysis services. For more information, see Oracle.
PostgreSQL	PostgreSQL is a powerful, flexible open source RDBMS. It is highly scalable and stable and provides you with powerful data models and various core features. For more information, see PostgreSQL.
StarRocks	StarRocks is a next-generation, high-speed data analytics engine that is built based on the Massively Parallel Processing (MPP) framework. It allows you to analyze data in an easier and more efficient manner. For more information, see StarRocks.
DRDS	PolarDB for Xscale (PolarDB-X), formerly called DRDS, is a distributed database service. It allows you to perform a scale-out operation to upgrade a relational database to a distributed system that supports storage and access of large amounts of data. PolarDB-X maintains the original features of relational databases, such as MySQL. For more information, see Overview.
PolarDB for MySQL	PolarDB for MySQL is a new-generation, cloud-native database service that is developed by Alibaba Cloud. This service decouples computing from storage and uses integrated software and hardware. PolarDB for MySQL is a secure and reliable database service that features high scalability, high performance, and mass storage. PolarDB for MySQL is 100% compatible with ecosystems of MySQL and PostgreSQL and highly compatible with Oracle. For more information, see What is PolarDB for MySQL Enterprise Edition?
PolarDB for PostgreSQL	PolarDB for PostgreSQL is a cloud-native relational database service that is developed by Alibaba Cloud. PolarDB for PostgreSQL is 100% compatible with PostgreSQL and is highly compatible with Oracle. PolarDB for PostgreSQL features rapid scalability, high performance, mass storage, and high security and reliability. It also supports the multi-dimensional, multi-model, spatio-temporal information engine Ganos that is developed by Alibaba Cloud and the open source geographic information engine PostGIS. For more information, see What is PolarDB for PostgreSQL Enterprise Edition?
Apache Doris	Apache Doris is a high-performance, real-time analytical database that can be used in scenarios such as report analysis, ad hoc queries, and federated queries across data lakes. For more information, see Introduction to Apache Doris.
MariaDB	MariaDB is an open source relational database management system (RDBMS) that is highly compatible with MySQL. It can seamlessly replace MySQL. After you uninstall MySQL, you can install MariaDB in the same location as MySQL, without modifying the application code. For more information, see MariaDB Server Documentation.
SelectDB	SelectDB is a next-generation real-time data warehouse service based on Apache Doris. It provides cost-effective and easy-to-use data analysis services to meet the real-time big data analysis requirements of enterprises. For more information, see SelectDB.
Amazon Redshift	Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift Serverless allows you to access and analyze data without performing configuration operations on a preset data warehouse. For more information, see What is Amazon Redshift?
SAP HANA	SAP HANA is a high-performance in-memory database and application platform that combines features of databases, data processing, and application platforms to provide enterprise-level in-memory computing capabilities. For more information, see SAP HANA Platform.
Vertica	Vertica is a high-performance, column-oriented database management system (DBMS) that can process and query large-scale data sets at a high speed. Vertica is mainly used for big data analysis and real-time queries. For more information, visit the Vertica official website.
DM	DM is an online transaction processing (OLTP) database that is integrated in a business system. It combines the advantages of distributed, elastic computing and cloud computing, and features high flexibility, reliability, and security, and ease of use. For more information, visit the DM official website.
KingbaseES	KingbaseES is a large relational database management system (RDBMS). It supports SQL standards and is suitable for enterprise-level scenarios in which a large amount of data is processed, and high concurrency and high availability are required. For more information, visit the KingbaseES official website.
OceanBase	OceanBase is a distributed relational database that is developed by Ant Group and Alibaba Cloud. OceanBase features strong data consistency, high availability, high performance, online scalability, and low costs, and is highly compatible with SQL standards and mainstream relational databases. For more information, see What is OceanBase Database?
Db2	Db2 is a relational database management system (RDBMS) for storing, retrieving, and managing data. Db2 allows you to process complex queries and transactions of large-sized datasets and data warehouses with high throughput. For more information, visit the Db2 official website.
GBase8a	GBase 8a is a relational database management system (RDBMS) that supports storage of large amounts of data and high-concurrent data read and write. GBase 8a is commonly used in fields such as public service, finance, telecommunication, and energy. GBase 8a supports SQL standards and provides a range of enterprise-level features, such as data partitioning, load balancing, and disaster recovery and backup. For more information, visit the GBase 8a official website.