Before you can develop and manage Lindorm tasks in DataWorks, you must add your Lindorm instance as a computing resource to a workspace. After the instance is added, you can use the computing resource for data synchronization and development.
Background information
Lindorm is a distributed computing service that is based on a cloud-native architecture. It supports mainstream open source computing models, is compatible with Spark interfaces, and is deeply integrated with the Lindorm storage engine. Lindorm uses its underlying data storage and indexing capabilities to run distributed jobs efficiently. It is suitable for scenarios such as massive data processing, interactive analysis, machine learning, and graph computing.
Prerequisites
Important: Only workspaces that are in the Data Studio (New) public preview are supported.
A Lindorm instance is created. The instance must meet the following conditions:
The compute engine is activated for the Lindorm instance.
The Lindorm instance and the DataWorks workspace are in the same region.
A serverless resource group is added and attached to the DataWorks workspace.
Add a Lindorm computing resource
Limits
Region limits: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), and Indonesia (Jakarta).
Permission limits:
You can run Lindorm tasks in DataWorks only on a DataWorks serverless resource group.
Only workspace members with the O&M or Workspace Administrator role, or members with the AliyunDataWorksFullAccess permission can add computing resources. For more information about how to view member roles or grant permissions, see Add workspace members and manage their roles and permissions.
Go to the Computing Resource page
Go to the DataWorks Workspaces page. In the top navigation bar, switch to the destination region and find the workspace where you want to add the computing resource. Click the workspace name or Details in the Operation column to open the workspace details page.
In the navigation pane on the left, click Computing Resource and then select a computing resource type to open the computing resource list page.
Add a Lindorm computing resource
On the Computing Resource page, configure the parameters to add a Lindorm computing resource.
Select a computing resource type.
You can click Add Computing Resource or Create Computing Resource to open the Add Computing Resource page.
On the Add Computing Resource page, set the computing resource type to Lindorm to navigate to the Add Lindorm Computing Resource configuration page.
Configure the Lindorm computing resource.
On the Add Lindorm Computing Resource page, configure the parameters as described in the following table.
Configuration area
Parameter
Description
Basic Information
Configuration Mode
Only Alibaba Cloud Instance Mode is supported.
Instance
Select the Lindorm instance in the current region that you want to add to DataWorks.
Compute Engine Type
The default value is the Spark engine.
Lindorm Resource Group
Select the default Lindorm computing resource group to run Lindorm tasks in DataWorks. The default value is default.
Database Name
Select the default database to connect to when you use this Lindorm computing resource in DataWorks. The default database is default.
Username
Enter the username for identity authentication. To obtain the username, go to the Lindorm console, find your instance, and click the Instance Name. Then, find the username on the Database Connection page in the navigation pane on the left.
Password
Enter the password for identity authentication. To obtain the password, go to the Lindorm console, find your instance, and click the Instance Name. Then, find the password on the Database Connection page in the navigation pane on the left.
Connection Configuration
Connection Status
In the Connection Configuration section, select the serverless resource group for DataWorks to run Lindorm tasks. Click Test Connectivity to make sure the resource group can access your Lindorm instance. For more information, see Network connectivity solutions.
Click Confirm to complete the configuration.
(Optional) Configure global Spark parameters
In DataWorks, you can specify Spark parameters at the workspace level. Modules such as Data Development, DataAnalysis, and Operation Center then use these parameters to run tasks by default. You can customize global Spark parameters and specify whether they take precedence over the local parameters that are configured in specific modules. For more information, see Set global Spark parameters.
Background information
Apache Spark is an engine for large-scale data analytics. In DataWorks, you can configure the SPARK parameters used by scheduling nodes at runtime in the following ways:
Method 1: Configure global Spark parameters at the workspace level. These parameters are used by DataWorks modules to run Lindorm tasks. You can also specify whether these global parameters take precedence over the Spark parameters that are configured in a specific module. For more information, see Configure global SPARK parameters.
Method 2: In the Data Development module, you can set specific SPARK properties for a single node on the node editing page. Other product modules do not support setting SPARK properties for individual tasks.
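The precedence rule between the two methods can be illustrated with a short sketch. This is only an illustration of the merge behavior, not DataWorks internals; the function and variable names are hypothetical:

```python
def effective_spark_conf(global_conf, node_conf, global_takes_priority):
    """Illustrative merge of workspace-level and node-level Spark properties.

    global_conf: properties from the SPARK Parameters tab (Method 1).
    node_conf: properties set on a single node's edit page (Method 2).
    global_takes_priority: models the Global Configuration Priority option.
    """
    if global_takes_priority:
        # Global properties override node-level properties.
        return {**node_conf, **global_conf}
    # Otherwise, node-level properties override the global defaults.
    return {**global_conf, **node_conf}

# Example: the node-level value wins unless global priority is enabled.
global_conf = {"spark.executor.memory": "4g"}
node_conf = {"spark.executor.memory": "8g"}
print(effective_spark_conf(global_conf, node_conf, False))  # node wins: 8g
print(effective_spark_conf(global_conf, node_conf, True))   # global wins: 4g
```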
Access control
Only the following roles can configure global SPARK parameters:
An Alibaba Cloud account.
A Resource Access Management (RAM) user or RAM role that has the AliyunDataWorksFullAccess permission.
A RAM user that has the Workspace Administrator role.
View global SPARK parameters
Go to the Computing Resource page and find the Lindorm computing resource that you added.
Click the SPARK Parameters tab to view the global parameter configurations for SPARK.
Configure global SPARK parameters
To configure the SPARK parameters for the Lindorm computing resource, see Job configuration.
Go to the Computing Resource page and find the Lindorm computing resource that you added.
Click the SPARK Parameters tab to view the global parameter configurations.
Set global SPARK parameters.
On the SPARK Parameters page, click Edit SPARK Parameters in the upper-right corner to configure global SPARK parameters and set the priority for each module.
Note: These settings are global configurations for the workspace. Make sure that you are in the correct workspace before you proceed.
Parameter
Steps
Spark Property
Configure the Spark properties used when modules run Lindorm tasks. For more information, see Job configuration.
Click Add below, and enter the Spark Property Name and the corresponding Spark Property Value.
Note: To enable the collection of data lineage and output information, configure the following settings:
Set Spark Property Name to spark.sql.queryExecutionListeners.
Set Spark Property Value to com.aliyun.dataworks.meta.lineage.LineageListener.
For more information about Spark property settings, see Job configuration.
Global Configuration Priority
If you select this option, the global configurations take precedence over the configurations within product modules. Tasks are then run based on the global SPARK properties.
Global configuration: Refers to the Spark properties that are configured for the corresponding Lindorm computing resource on the SPARK Parameters page.
Currently, you can set global SPARK parameters only for the Data Development (Data Studio) and Operation Center modules.
Configurations within product modules:
Data Development (Data Studio): For Lindorm Spark and Lindorm Spark SQL nodes, you can configure the Spark properties of a single node task on the Configuration Item tab or directly on the node edit page.
Other product modules: Setting SPARK properties separately within the module is not supported.
Click Confirm to save the global Spark parameters.
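As entered on the SPARK Parameters tab, the lineage settings described above amount to the following property pair, shown here in spark-defaults style only for readability; the listener class is the DataWorks-provided class named in the note above:

```
spark.sql.queryExecutionListeners   com.aliyun.dataworks.meta.lineage.LineageListener
```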
What to do next
After you configure the Lindorm computing resource, you can use it to develop node tasks in the Data Development module. For more information, see Lindorm Spark node and Lindorm Spark SQL node.
You can enable the collection of Lindorm data lineage and output information when you configure global SPARK parameters. After you create and run a metadata collector, you can view and manage the metadata of the Lindorm instance in Data Map. For more information, see View and manage Lindorm metadata in Data Map.