Before you can develop and manage Lindorm tasks in DataWorks, you must add your Lindorm instance as a computing resource to a workspace. After the instance is added, you can use the computing resource for data synchronization and development.
Background information
Lindorm is a distributed computing service that is based on a cloud-native architecture. It supports mainstream open source computing models, is compatible with Spark interfaces, and is deeply integrated with the Lindorm storage engine. Lindorm uses its underlying data storage and indexing capabilities to run distributed jobs efficiently. It is suitable for scenarios such as massive data processing, interactive analysis, machine learning, and graph computing.
Prerequisites
Important: Only workspaces that are in the Data Studio (New) public preview are supported.
A Lindorm instance is created. The instance must meet the following conditions:
The compute engine is activated for the Lindorm instance.
The Lindorm instance and the DataWorks workspace are in the same region.
A serverless resource group is added and attached to the DataWorks workspace.
Add a Lindorm computing resource
Limits
Region limits: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), and Indonesia (Jakarta).
Permission limits:
You can run Lindorm tasks in DataWorks only on a DataWorks serverless resource group.
Only workspace members with the O&M or Workspace Administrator role, or members with the AliyunDataWorksFullAccess permission can add computing resources. For more information about how to view member roles or grant permissions, see Add workspace members and manage their roles and permissions.
Go to the Computing Resource page
Go to the DataWorks Workspaces page. In the top navigation bar, switch to the destination region and find the workspace where you want to add the computing resource. Click the workspace name or Details in the Operation column to open the workspace details page.
In the navigation pane on the left, click Computing Resource and then select a computing resource type to open the computing resource list page.
Add a Lindorm computing resource
On the Computing Resource page, configure the parameters to add a Lindorm computing resource.
Select a computing resource type.
You can click Add Computing Resource or Create Computing Resource to open the Add Computing Resource page.
On the Add Computing Resource page, set the computing resource type to Lindorm to navigate to the Add Lindorm Computing Resource configuration page.
Configure the Lindorm computing resource.
On the Add Lindorm Computing Resource page, configure the parameters as described in the following table.
Configuration area
Parameter
Description
Basic Information
Configuration Mode
Only Alibaba Cloud Instance Mode is supported.
Instance
Select the Lindorm instance in the current region that you want to add to DataWorks.
Compute Engine Type
The default value is the Spark engine.
Lindorm Resource Group
Select the default Lindorm computing resource group to run Lindorm tasks in DataWorks. The default value is default.
Database Name
Select the default database to connect to when you use this Lindorm computing resource in DataWorks. The default database is default.
Username
Enter the username for identity authentication. To obtain the username, go to the Lindorm console, find your instance, and click the Instance Name. Then, find the username on the Database Connection page in the navigation pane on the left.
Password
Enter the password for identity authentication. To obtain the password, go to the Lindorm console, find your instance, and click the Instance Name. Then, find the password on the Database Connection page in the navigation pane on the left.
Connection Configuration
Connection Status
In the Connection Configuration section, select the serverless resource group for DataWorks to run Lindorm tasks. Click Test Connectivity to make sure the resource group can access your Lindorm instance. For more information, see Network connectivity solutions.
Click Confirm to complete the configuration.
(Optional) Configure global Spark parameters
In DataWorks, you can specify Spark parameters at the workspace level. Modules such as Data Development, DataAnalysis, and Operation Center then use these parameters to run tasks by default. You can customize global Spark parameters and specify whether they take precedence over the local parameters that are configured in specific modules. For more information, see Set global Spark parameters.
Background information
Apache Spark is an engine for large-scale data analytics. In DataWorks, you can configure the SPARK parameters used by scheduling nodes at runtime in the following ways:
Method 1: Configure global Spark parameters at the workspace level. These parameters are used by DataWorks modules to run Lindorm tasks. You can also specify whether these global parameters take precedence over the Spark parameters that are configured in a specific module. For more information, see Configure global SPARK parameters.
Method 2: In the Data Development module, you can set specific SPARK properties for a single node on the node editing page. Other product modules do not support setting SPARK properties for individual tasks.
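The precedence rule between the two methods can be illustrated with a short sketch. This is only an illustration of the merge behavior, not DataWorks internals; the function and variable names are hypothetical:

```python
def effective_spark_conf(global_conf, node_conf, global_takes_priority):
    """Illustrative merge of workspace-level and node-level Spark properties.

    global_conf: properties from the SPARK Parameters tab (Method 1).
    node_conf: properties set on a single node's edit page (Method 2).
    global_takes_priority: models the Global Configuration Priority option.
    """
    if global_takes_priority:
        # Global properties override node-level properties.
        return {**node_conf, **global_conf}
    # Otherwise, node-level properties override the global defaults.
    return {**global_conf, **node_conf}

# Example: the node-level value wins unless global priority is enabled.
global_conf = {"spark.executor.memory": "4g"}
node_conf = {"spark.executor.memory": "8g"}
print(effective_spark_conf(global_conf, node_conf, False))  # node wins: 8g
print(effective_spark_conf(global_conf, node_conf, True))   # global wins: 4g
```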
Access control
Only the following roles can configure global SPARK parameters:
An Alibaba Cloud account.
A Resource Access Management (RAM) user or RAM role that has the AliyunDataWorksFullAccess permission.
A RAM user that has the Workspace Administrator role.
View global SPARK parameters
Go to the Computing Resource page and find the Lindorm computing resource that you added.
Click the SPARK Parameters tab to view the global parameter configurations for SPARK.
Configure global SPARK parameters
To configure the SPARK parameters for the Lindorm computing resource, see Job configuration.
Go to the Computing Resource page and find the Lindorm computing resource that you added.
Click the SPARK Parameters tab to view the global parameter configurations.
Set global SPARK parameters.
On the SPARK Parameters page, click Edit SPARK Parameters in the upper-right corner to configure global SPARK parameters and set the priority for each module.
Note: These settings are global configurations for the workspace. Make sure that you are in the correct workspace before you proceed.
Parameter
Steps
Spark Property
Configure the Spark properties used when modules run Lindorm tasks. For more information, see Job configuration.
Click Add below, and enter the Spark Property Name and the corresponding Spark Property Value.
Note: To enable the collection of data lineage and output information, configure the following settings:
Set Spark Property Name to spark.sql.queryExecutionListeners.
Set Spark Property Value to com.aliyun.dataworks.meta.lineage.LineageListener.
For more information about Spark property settings, see Job configuration.
Global Configuration Priority
If you select this option, the global configurations take precedence over the configurations within product modules. Tasks are then run based on the global SPARK properties.
Global configuration: Refers to the Spark properties that are configured for the corresponding Lindorm computing resource on the SPARK Parameters page.
Currently, you can set global SPARK parameters only for the Data Development (Data Studio) and Operation Center modules.
Configurations within product modules:
Data Development (Data Studio): For Lindorm Spark and Lindorm Spark SQL nodes, you can configure the Spark properties of a single node task on the Configuration Item tab or directly on the node edit page.
Other product modules: Setting SPARK properties separately within the module is not supported.
Click Confirm to save the global Spark parameters.
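As entered on the SPARK Parameters tab, the lineage settings described above amount to the following property pair, shown here in spark-defaults style only for readability; the listener class is the DataWorks-provided class named in the note above:

```
spark.sql.queryExecutionListeners   com.aliyun.dataworks.meta.lineage.LineageListener
```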
What to do next
After you configure the Lindorm computing resource, you can use it to develop node tasks in the Data Development module. For more information, see Lindorm Spark node and Lindorm Spark SQL node.
You can enable the collection of Lindorm data lineage and output information when you configure global SPARK parameters. After you create and run a metadata collector, you can view and manage the metadata of the Lindorm instance in Data Map. For more information, see View and manage Lindorm metadata in Data Map.