Use LDPS to read and write data by executing SQL statements - Lindorm

This topic describes how to execute SQL statements to read and write data through the OLAP or ETL resource group of Lindorm Distributed Processing System (LDPS).

Prerequisites

LindormTable is activated for your Lindorm instance. For more information, see Create an instance.
LDPS is activated for your Lindorm instance. For more information, see Activate LDPS and modify the configurations.
The IP address of your client is added to the whitelist of your Lindorm instance. For more information, see Configure whitelists.

Use an OLAP resource group

The OLAP resource group of LDPS allows you to use MySQL protocol for query and analysis scenarios, delivering high-concurrency performance with low-latency responses. After the OLAP resource group is activated, permanent computing resources are allocated to ensure quick query responses.

Step 1: Create an OLAP resource group

Log on to the Lindorm console. In the upper-left corner of the page, select the region where the instance is deployed. On the Instances page, click the ID of the instance that you want to manage or click Manage in the Actions column corresponding to the instance.
In the Configurations section of the Instance Details page, click Resource Groups in the Actions column that corresponds to Compute Engine.

In the Resource Group Details dialog box that appears, click Create Resource Group and configure the following parameters.

Parameter	Description
Resource Group Type	Select OLAP.
Resource Group Name	The name of the resource group that you want to create. The name can contain only lowercase letters and digits and cannot exceed 63 characters in length. Example: `cg0`.
Node Specifications	Select a node specification.
Working Nodes in Job	Valid values: 4 to 1024. Default value: `4`.

Click OK to create the resource group.
Note
The creation process takes about 20 minutes.
In the Resource Group Details dialog box, when Status and Description is set to Running for the created resource group, you can move your pointer over the OLAP resource group name to obtain the Virtual Private Cloud (VPC) endpoint of the OLAP resource group, such as jdbc:mysql://ld-bp1dv48fk0yg0****-olap-proxy-ldps.lindorm.aliyuncs.com:9030.
After you configure a MySQL client, establish a JDBC connection to the specified OLAP resource group through the VPC internal endpoint, and log in with the username and password of LindormTable. You can connect to the OLAP resource group through the MySQL protocol and then execute SQL queries.
```
mysql -hld-bp1dv48fk0yg0****-olap-proxy-ldps.lindorm.aliyuncs.com -P9030 -uroot -p
```

Step 2: Access data

Access column-oriented data

Column-oriented data is a columnar storage data lake compatible with the Iceberg ecosystem. The data is stored in the file engine of the Lindorm instance and can be written and queried through the OLAP resource group.

Column-oriented data is stored in a catalog (used to identify different data sources) named lindorm_columnar. After the OLAP resource group is connected through the MySQL protocol, the column-oriented data catalog is accessed by default. You can also execute the SET CATALOG lindorm_columnar; statement to explicitly switch to the column-oriented data catalog.

Create and use a database.

-- Create a database. CREATE DATABASE olapdemo; -- Use the database. USE olapdemo;

Create a data table and write data.

-- Create a table. CREATE TABLE test (id INT, name STRING) ENGINE = iceberg; -- Insert data into the table. INSERT INTO test VALUES (0, 'Jay'), (1, 'Edison');

Query data.

Example 1:

SELECT id, name FROM test WHERE id != 0;

Output:

+------+--------+ | id | name | +------+--------+ | 1 | Edison | +------+--------+

Example 2:

SELECT count(distinct name) FROM test;

Output:

+----------------------+ | count(DISTINCT name) | +----------------------+ | 2 | +----------------------+

Example 3:

SELECT * FROM (SELECT id, name FROM test WHERE id != 0) t0 JOIN (SELECT id, name FROM test WHERE id != 2) t1 ON t0.id=t1.id;

+------+--------+------+--------+ | id | name | id | name | +------+--------+------+--------+ | 1 | Edison | 1 | Edison | +------+--------+------+--------+

Delete the table.
```
DROP TABLE test;
```
Delete the database.
```
DROP DATABASE olapdemo;
```

Access wide table data

The OLAP resource group allows you to directly query data in LindormTable, leveraging its computing power to efficiently perform complex query calculations. The OLAP resource group does not support creating a wide table or writing data to the wide table. It only supports data query operations.

LindormTable data is stored in a catalog named lindorm_table. To access LindormTable data, you need to explicitly execute the SET CATALOG lindorm_table; statement to switch to the lindorm_table catalog.

If you already have a usable wide table, proceed to the next step. Otherwise, connect to LindormTable and execute the following statements to create a wide table named tb:

-- Create a database. CREATE DATABASE test; -- Use the database. USE test; -- Create a wide table and insert two data records. CREATE TABLE tb (id varchar, name varchar, address varchar, primary key(id, name)) ; UPSERT INTO tb (id, name, address) values ('001', 'Jack', 'hz'); UPSERT INTO tb (id, name, address) values ('002', 'Edison', 'bj');

In the MySQL command-line tool connected to the OLAP resource group, execute the following statements to access wide table data:

Explicitly switch to the data source and use the database.

-- Explicitly switch to the data source. SET CATALOG lindorm_table; -- Use the test database. USE test;

Query wide table data.

Example 1:

SELECT * FROM tb LIMIT 5;

Output:

+------+--------+---------+ | id | name | address | +------+--------+---------+ | 001 | Jack | hz | | 002 | Edison | bj | +------+--------+---------+

Example 2:

SELECT count(*) FROM tb;

Output:

+----------+ | count(*) | +----------+ | 2 | +----------+

Use an ETL resource group

The ETL resource group of LDPS allows you to use serverless Spark SQL to query and write data. Resources are requested on demand and automatically released, making the ETL resource group suitable for low-frequency queries or offline report queries.

Step 1: Activate a resource group

Log on to the Lindorm console. In the upper-left corner of the page, select the region where the instance is deployed. On the Instances page, click the ID of the instance that you want to manage or click Manage in the Actions column corresponding to the instance.
In the Configurations section of the Instance Details page, click Resource Groups in the Actions column that corresponds to Compute Engine.

In the Resource Group Details dialog box, click Create Resource Group and configure the following parameters.

Parameter	Description
Resource Group Type	Select ETL.
Resource Group Name	The name of the resource group that you want to create. The name can contain only lowercase letters and digits and cannot exceed 63 characters in length. Example: `cg0`.
Daily Resource Quota	The upper limit of Capacity Unit (CU) resources consumed by the resource group per day. Unit: `CUHour`. Default value: `100000`. Important* If the limit is exceeded, jobs are forcibly deleted immediately. For resource groups with high stability requirements, you can set this parameter to `0`, which indicates no limit.
CPU Upper Limit (Cores)	The maximum number of CPU cores in the resource group. Valid values: 100 to 100000.
Memory Upper Limit (GB)	The maximum memory available for the resource group. Valid values: 400 to 1000000. No default value is configured.
Authorized Users	The default value is `*`, which indicates that all users are allowed to access the resource group.

Click OK to create the resource group.

Step 2: Prepare the environment

Note

The following environments are deployed on an Elastic Compute Service (ECS) instance in the same VPC where the Lindorm instance is deployed.

Install Java Development Kit (JDK) V1.8 or later.
Download the Spark release package.
Decompress the Spark release package.
Set the environment variable SPARK_HOME to the path to which the package is decompressed.

export SPARK_HOME=/path/to/spark/;

Configure parameters in the following configuration file: $SPARK_HOME/conf/beeline.conf.
- endpoint: the LDPS endpoint for JDBC. For information about how to obtain the endpoint, see View endpoints.
- user: the username that is used to connect to LindormTable.
- password: the password that is used to connect to LindormTable.
- shareResource: specifies whether computing resources are shared between multiple sessions of the same user. Default value: true.
- compute-group: the name of the ETL resource group that you want to use. Default value: default.

Go to the $SPARK_HOME/bin path and run the ./beeline command. The following information is returned:

Welcome to Lindorm Distributed Processing System (LDPS) !!! Initializing environment. It might take minutes ... Environemnt prepared. You may visit your jdbc cluster by below url: http://alb-boqak6zfns5gzx****.cn-hangzhou.alb.aliyuncsslb.com/proxy/75ce76086b61470da7046bd4c2b7**** Please note -- you are sharing this JDBC cluster between SQL sessions from the same user. The cluster will be released by auto if idle for 4 hours. You may also kill it manually by visiting above web url and clicking 'kill' in tab of 'Query Engine' lindorm-beeline>

In the interactive session, execute SQL statements to perform write or query operations.

Note

You can access the SparkUI of LDPS by using the URL http://alb-boqak6zfns5gzx****.cn-hangzhou.alb.aliyuncsslb.com/proxy/75ce76086b61470da7046bd4c2b7**** in the returned information.

Step 3: Access data

Access column-oriented data

Create and use a database.

-- Create a database. CREATE DATABASE etldemo; -- Use the database. USE etldemo;

Create a data table and write data to the table.

-- Create a table. CREATE TABLE test (id INT, name STRING); -- Insert data into the table. INSERT INTO test VALUES (0, 'Jay'), (1, 'Edison');

Query data.

Example 1:

SELECT id, name FROM test WHERE id != 0;

Output:

+------+--------+ | id | name | +------+--------+ | 1 | Edison | +------+--------+

Example 2:

SELECT count(distinct name) FROM test;

Output:

+----------------------+ | count(DISTINCT name) | +----------------------+ | 2 | +----------------------+

Example 3:

SELECT * FROM (SELECT id, name FROM test WHERE id != 0) t0 JOIN (SELECT id, name FROM test WHERE id != 2) t1 ON t0.id=t1.id;

+------+--------+------+--------+ | id | name | id | name | +------+--------+------+--------+ | 1 | Edison | 1 | Edison | +------+--------+------+--------+

Delete the table.
```
DROP TABLE test;
```
Delete the database.
```
DROP DATABASE etldemo;
```

Access wide table data

The ETL resource group allows you to use Spark SQL to query data in LindormTable, leveraging its elastic computing power to perform complex query calculations on wide table data. Spark SQL does not support DDL operations in LindormTable, such as creating or dropping tables, but you can query data in LindormTable.

If you already have a usable wide table, proceed to the next step. Otherwise, connect to LindormTable and execute the following statements to create a wide table named tb:

-- Create a database. CREATE DATABASE test; -- Use the database. USE test; -- Create a data table and insert two data records. CREATE TABLE tb (id varchar, name varchar, address varchar, primary key(id, name)) ; UPSERT INTO tb (id, name, address) values ('001', 'Jack', 'hz'); UPSERT INTO tb (id, name, address) values ('002', 'Edison', 'bj');

In the lindorm-beeline interactive session, execute the following query statements to access wide table data:

Explicitly switch to the data source and use the database.

-- Explicitly switch to the data source. SET CATALOG lindorm_table; -- Use the test database. USE test;

Query wide table data.

Example 1:

SELECT * FROM tb LIMIT 5;

Output:

+------+--------+---------+ | id | name | address | +------+--------+---------+ | 001 | Jack | hz | | 002 | Edison | bj | +------+--------+---------+

Example 2:

SELECT count(*) FROM tb;

Output:

+-----------+ | count(1) | +-----------+ | 2 | +-----------+