This topic describes how to use Structured Process Language (SPL) in different scenarios.
SPL statement optimization
You can often write SPL statements in several ways to achieve a specific data processing result. Writing concise and efficient SPL statements simplifies maintenance and improves performance. The following table provides some suggestions; the before-and-after statements are illustrative sketches with assumed field names.
| Suggestion | Before optimization | After optimization |
| --- | --- | --- |
| Combine consecutive `where` clauses into one. | `* \| where a='1' \| where b='2'` | `* \| where a='1' and b='2'` |
| Combine consecutive `extend` clauses into one. | `* \| extend x=a \| extend y=b` | `* \| extend x=a, y=b` |
| Replace `extend` and `project-away` with `project-rename`. | `* \| extend y=x \| project-away x` | `* \| project-rename y=x` |
| Modify field values in place instead of using `extend` to create new fields, unless necessary. | `* \| extend status_int=cast(status as BIGINT)` | `* \| extend status=cast(status as BIGINT)` |
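Applied together, these rewrites shorten a pipeline without changing its result. The following before-and-after sketch is illustrative only; the field names (`level`, `status`, `msg`) are assumptions:

```
-- Before optimization: four pipelines
* | where level='ERROR' | where status='500' | extend m=msg | project-away msg

-- After optimization: two pipelines
* | where level='ERROR' and status='500' | project-rename m=msg
```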
Handle special fields
Time fields
During SPL execution, the data type of an SLS log time field is always INTEGER or BIGINT. The SLS time fields are the data timestamp field `__time__` and its nanosecond part, `__time_ns_part__`.
To update the data timestamp, use the `extend` instruction and ensure the new value is an INTEGER or BIGINT. Other instructions cannot operate on time fields. Their behavior is as follows:
- `project`, `project-away`, and `project-rename`: These instructions retain time fields by default. You cannot rename or overwrite them.
- `parse-regexp` and `parse-json`: If the extraction result includes time fields, they are ignored.
Example
Extract time field values from an existing time string.
SPL statement
```
* | parse-regexp time, '([\d\-\s:]+)\.(\d+)' as ts, ms
| extend ts=date_parse(ts, '%Y-%m-%d %H:%i:%S')
| extend __time__=cast(to_unixtime(ts) as INTEGER)
| extend __time_ns_part__=cast(ms as INTEGER) * 1000000
| project-away ts, ms
```

Input data

```
time: '2023-11-11 01:23:45.678'
```

Output result

```
__time__: 1699637025
__time_ns_part__: 678000000
time: '2023-11-11 01:23:45.678'
```
Field names with special characters
If a field name contains spaces or other special characters, enclose it in double quotation marks (") when you reference it. For example, if a field is named A B, which contains a space, you can reference it as "A B" in an SPL statement. The following example shows how to do this:
* | where "A B" like '%error%'Case-insensitive field names
In SLS scan queries, field names referenced in SPL instructions are case-insensitive. For example, if a log contains a field named `Method`, you can reference it as `method` or `METHOD` in an SPL instruction.
This applies to the scan query feature of Simple Log Service. For more information, see Scan queries.
Example
Use a case-insensitive field name in a `where` clause.
SPL statement
```
* | where METHOD like 'Post%'
```

Input data

```
Method: 'PostLogstoreLogs'
```

Output result

```
Method: 'PostLogstoreLogs'
```
Handle field name conflicts
During log uploads or SPL execution, the case-insensitive handling of field names can lead to conflicts. For example, a conflict occurs if a raw log contains both `Method` and `method` fields. SPL resolves these conflicts differently depending on the scenario.
To avoid these situations, standardize the field names in your raw logs.
Conflicts in input data
If a raw log contains fields with case-insensitive name conflicts, such as `Status` and `status`, SPL randomly selects one field for input and discards the other. For example:
SPL statement
```
* | extend status_cast = cast(status as bigint)
```

Input data

```
Status: '200'
status: '404'
```

Processing result

Possibility 1: The value of the Status field is retained.

```
Status: '200'  -- The Status field is retained, and the status field is discarded.
status_cast: '200'
```

Possibility 2: The value of the status field is retained.

```
status: '404'  -- The status field is retained, and the Status field is discarded.
status_cast: '404'
```
Conflicts in output results
Scenario 1: Raw data field conflicts
During SPL execution, fields with case-insensitive name conflicts may be generated. In this case, SPL randomly selects one of them for the output. For example, if a log field contains a JSON string, the `parse-json` instruction might create fields with conflicting names:
SPL statement
```
* | parse-json content
```

Input data

```
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
```

Output result

Possibility 1: The Method field is retained.

```
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
Method: 'PostLogs'  -- The Method field is retained.
status: '200'
```

Possibility 2: The method field is retained.

```
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
method: 'GetLogs'  -- The method field is retained.
status: '200'
```
Scenario 2: Conflicts with newly generated data fields
To avoid ambiguity, SPL preserves the case of new field names that are explicitly generated by instructions. These instructions include `extend` and instructions that use `as` to specify field names, such as `parse-regexp` and `parse-csv`.
For example, if you use `extend` to create a new field named `Method`, the resulting field name remains `Method`.
SPL statement
```
* | extend Method = 'Post'
```

Input data

```
Status: '200'
```

Output result

```
Status: '200'
Method: 'Post'
```
Handle SLS reserved field conflicts
This applies to the real-time consumption and scan query features of Simple Log Service.
For a complete list of reserved fields, see Reserved fields. SPL reads data from the LogGroup structure (see Data encoding for the LogGroup definition). If raw data does not conform to the standard LogGroup encoding, some reserved fields may be in LogContent instead of their standard location. SPL handles these conflicts as follows:
- For the `__source__`, `__topic__`, `__time__`, and `__time_ns_part__` fields, SPL reads values from the standard LogGroup encoding and ignores any LogContent fields with the same names.
- For tag fields with the `__tag__:` prefix, SPL first attempts to read values from the standard LogGroup encoding. If a value is not found, SPL reads it from LogContent. For example, for the `__tag__:ip` field, SPL first attempts to read the field with the key `ip` from the LogTag list. If the field does not exist, SPL then reads the log field with the key `__tag__:ip` from the custom log fields in LogContent.
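For example, the following minimal sketch filters logs by the `__tag__:ip` field; the IP address is an illustrative assumption. Because the field name contains a colon, it is enclosed in double quotation marks. The reference works the same whether SPL reads the tag from the LogTag list or from LogContent:

```
* | where "__tag__:ip" = '192.168.0.1'
```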
The `__line__` field for full-text search
This applies to the SLS scan query feature.
To filter raw logs in the console or when using the GetLogstoreLogs API operation, you can use the `__line__` field.
Example
Search for the keyword `error` in logs.
```
* | where __line__ like '%error%'
```

If a log has a field named `__line__`, enclose the name in backticks (`` ` ``) to reference it, such as `` `__line__` ``:

```
* | where `__line__` = '20'
```
Value retention and overwrite policies
When an SPL instruction is executed, if an output field has the same name as an existing field in the input data, the policy for determining the field's value is as follows:
The policies for retaining and overwriting field values do not apply to the `extend` instruction. For the `extend` instruction, if a field name conflicts, the new value is always used.
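For example, in the following sketch (the literal value 'unknown' is an illustrative assumption), the second `extend` replaces the BIGINT value even though the new value is a VARCHAR:

```
* | extend status=cast(status as BIGINT)  -- Convert the status field to BIGINT.
| extend status='unknown'  -- extend always uses the new value, even though the data types differ.
```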
Inconsistent data types between old and new values
The original value of the input field is retained.
Example
Example 1: A renamed field from a `project` instruction has a conflicting name.
SPL statement
```
* | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| project code=status  -- The new value's type (BIGINT) differs from the old value's type (VARCHAR), so the old value is kept.
```

Input data

```
status: '200'
code: 'Success'
```

Output result

```
code: 'Success'
```
Example 2: An extracted field from a `parse-json` instruction has a conflicting name.
SPL statement
```
* | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| parse-json content  -- The old type of status is BIGINT and the new type is VARCHAR. The old value is retained.
```

Input data

```
status: '200'
content: '{"status": "Success", "body": "this is test"}'
```

Output result

```
content: '{"status": "Success", "body": "this is test"}'
status: 200
body: 'this is test'
```
Consistent data types between old and new values
If the input value is null, the new value is used. Otherwise, the behavior is determined by the mode parameter in the instruction, as defined in the following table.
If the `mode` parameter is not defined in the instruction, its default value is `overwrite`.

| Mode | Description |
| --- | --- |
| overwrite | Overwrites the old value with the new value. |
| preserve | Retains the old value and discards the new value. |
Example
Example 1: A renamed field from a `project` instruction has a conflicting name and the same type. The default mode is `overwrite`.
SPL statement
```
* | project code=status  -- The old and new types of code are both VARCHAR. The new value is used based on the overwrite mode.
```

Input data

```
status: '200'
code: 'Success'
```

Output result

```
code: '200'
```
Example 2: An extracted field from a `parse-json` instruction has a conflicting name and the same type. The default mode is `overwrite`.
SPL statement
```
* | parse-json content  -- The old and new types of status are both VARCHAR. The new value is used based on the overwrite mode.
```

Input data

```
status: '200'
content: '{"status": "Success", "body": "this is test"}'
```

Output result

```
content: '{"status": "Success", "body": "this is test"}'
status: 'Success'
body: 'this is test'
```
Example 3: An extracted field from a `parse-json` instruction has a conflicting name and the same type. The mode is set to `preserve`.
SPL statement
```
* | parse-json -mode='preserve' content  -- The old and new types of status are both VARCHAR. The old value is retained based on the preserve mode.
```

Input data

```
status: '200'
content: '{"status": "Success", "body": "this is test"}'
```

Output result

```
content: '{"status": "Success", "body": "this is test"}'
status: '200'
body: 'this is test'
```
Data type conversion
Initial type
For data processing with SPL, the initial data type of all input fields is VARCHAR, except for log time fields. If subsequent processing logic involves different data types, you must perform data type conversion.
Example
To filter access logs with a status code of 5xx, you must convert the `status` field to the BIGINT type before the comparison.
```
*  -- The initial type of the status field is VARCHAR.
| where cast(status as BIGINT) >= 500  -- Convert the type of the status field to BIGINT, and then perform the comparison.
```

Type retention
During SPL data processing, after you use the `extend` instruction to convert the data type of a field, subsequent processing logic uses the converted data type.
Example
```
*  -- A Logstore is used as input data. Except for time fields, all fields are initially of the VARCHAR type.
| where __source__='127.0.0.1'  -- Filter on the __source__ field.
| extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| project status, content
| where status>=500  -- The type of the status field remains BIGINT, so it can be directly compared with the number 500.
```

Handle null values in SPL expressions
Generating null values
During SPL data processing, null values are generated in the following two scenarios:
- If a field used in an SPL expression does not exist in the input data, its value is treated as null during calculations.
- If an exception occurs during the calculation of an SPL expression, the result is null. For example, a `cast` type conversion fails or an array index is out of bounds.
Example
If a field does not exist, its value is treated as null in calculations.
SPL statement
```
* | extend withoutStatus=(status is null)
```

Input data

```
# Entry 1
status: '200'
code: 'Success'

# Entry 2
code: 'Success'
```

Output result

```
# Entry 1
status: '200'
code: 'Success'
withoutStatus: false

# Entry 2
code: 'Success'
withoutStatus: true
```
If an exception occurs during calculation, the result is null.
SPL statement
```
* | extend code=cast(code as BIGINT)  -- Converting the code field to BIGINT fails.
| extend values=json_parse(values)
| extend values=cast(values as ARRAY(BIGINT))
| extend last=values[10]  -- The array index is out of bounds.
```

Input data

```
status: '200'
code: 'Success'
values: '[1,2,3]'
```

Output result

```
status: '200'
code: null
values: [1, 2, 3]
last: null
```
Eliminating null values
To eliminate null values during calculations, you can use the `COALESCE` expression to combine multiple values by priority and retrieve the first non-null value as the final result. You can also set a default value to use if all expression results are null.
Example
Read the last element of an array. If the array is empty, the default value is 0.
SPL statement
* | extend values=json_parse(values) | extend values=cast(values as ARRAY(BIGINT)) | extend last=COALESCE(values[3], values[2], values[1], 0)Input data
# Entry 1 values: '[1, 2, 3]' # Entry 2 values: '[]'Output result
# Entry 1 values: [1, 2, 3] last: 3 # Entry 2 values: [] last: 0
Character escaping
Single quotation marks
Single quotation marks are used to enclose a field's value. If the field value contains a single quotation mark, you must use an additional single quotation mark to escape it.
Example
SPL statement
```
* | extend user = 'Alice' | extend phone = 'Alice''s Phone'
```

Output result

```
user: Alice
phone: Alice's Phone
```
Double quotation marks
Double quotation marks are used to enclose a field name. If the field name contains a double quotation mark, you must use an additional double quotation mark to escape it.
Example
SPL statement
```
* | extend user_name = 'Alice' | extend "user name" = 'Alice' | extend "user""name" = 'Alice'
```

Output result

```
user_name: Alice
user name: Alice
user"name: Alice
```
Other special characters
Example 1
In SPL, the backslash (\) is not an escape character and is therefore preserved as is.
SPL statement
```
* | extend a = 'foo\tbar' | extend b = 'foo\nbar'
```

Output result

```
a: foo\tbar
b: foo\nbar
```
Example 2
If a string must include special characters, such as a tab character or a line feed, you can use the `chr` function to concatenate strings.
SPL statement
```
* | extend a = concat('foo', chr(9), 'bar') | extend b = concat('foo', chr(10), 'bar')
```

Output result

```
a: foo	bar
b: foo
bar
```
Error handling
Syntax error
Syntax errors occur when an SPL statement is malformed, such as an incorrect instruction name, a keyword reference error, or a type setting error. When a syntax error occurs, SPL does not process any data. You must modify the statement based on the error message.
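For example, the following sketch misspells the `extend` instruction name (an illustrative mistake); SPL returns a syntax error and processes no data:

```
* | extned status_int=cast(status as BIGINT)  -- 'extned' is not a valid instruction name, so this statement fails before any data is processed.
```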
Data error
A data error occurs when a function call or type conversion fails during SPL execution. Data errors do not stop the execution process: the SPL statement still returns results, but SPL sets the field where the error occurred to null. Because data errors can occur on any row, SPL randomly samples and returns only some of the error messages. You can ignore these errors or modify the SPL statement based on the data content.
Execution timeout
An SPL statement contains various instructions, and their running time varies depending on the data scenario. If the total running time of an SPL statement exceeds the default timeout period, the execution stops and a timeout error is returned. In this case, the execution result is empty. The default timeout period may differ for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, for example, by simplifying complex regular expressions or reducing the number of pipelines.
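For example, the following sketch (with assumed field names) merges two consecutive `extend` pipelines into one, which reduces the pipeline count as suggested in the optimization table at the beginning of this topic:

```
-- Before: three pipelines
* | extend a=cast(a as BIGINT) | extend b=cast(b as BIGINT) | extend c=a+b

-- After: two pipelines
* | extend a=cast(a as BIGINT), b=cast(b as BIGINT) | extend c=a+b
```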
Memory limit exceeded
An SPL statement contains various instructions, and their memory consumption varies depending on the data scenario. SPL execution is limited to a specific memory quota. If this quota is exceeded, the execution fails and a memory limit exceeded error is returned. In this case, the execution result is empty. The default memory quota may differ for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, reduce the number of pipelines, or check whether the raw data size is too large.