This topic describes how to use Structured Process Language (SPL) in different scenarios.
SPL statement optimization
You can often write SPL statements in several ways to achieve a specific data processing result. Writing concise and efficient SPL statements simplifies maintenance and improves performance. The following table provides some suggestions; the before-and-after statements are illustrative sketches with assumed field names.
| Suggestion | Before optimization | After optimization |
| --- | --- | --- |
| Combine consecutive `where` clauses into one. | `* \| where a='1' \| where b='2'` | `* \| where a='1' and b='2'` |
| Combine consecutive `extend` clauses into one. | `* \| extend x=a \| extend y=b` | `* \| extend x=a, y=b` |
| Replace `extend` and `project-away` with `project-rename`. | `* \| extend y=x \| project-away x` | `* \| project-rename y=x` |
| Modify field values in place instead of using `extend` to create new fields, unless necessary. | `* \| extend status_int=cast(status as BIGINT)` | `* \| extend status=cast(status as BIGINT)` |
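Applied together, these rewrites shorten a pipeline without changing its result. The following before-and-after sketch is illustrative only; the field names (`level`, `status`, `msg`) are assumptions:

```
-- Before optimization: four pipelines
* | where level='ERROR' | where status='500' | extend m=msg | project-away msg

-- After optimization: two pipelines
* | where level='ERROR' and status='500' | project-rename m=msg
```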
Handle special fields
Time fields
During SPL execution, the data type of an SLS log time field is always INTEGER or BIGINT. The SLS time fields are the data timestamp field `__time__` and its nanosecond part, `__time_ns_part__`.
To update the data timestamp, use the `extend` instruction and ensure the new value is an INTEGER or BIGINT. Other instructions cannot operate on time fields. Their behavior is as follows:
- `project`, `project-away`, and `project-rename`: These instructions retain time fields by default. You cannot rename or overwrite them.
- `parse-regexp` and `parse-json`: If the extraction result includes time fields, they are ignored.
Example
Extract time field values from an existing time string.
SPL statement
```
* | parse-regexp time, '([\d\-\s:]+)\.(\d+)' as ts, ms
| extend ts=date_parse(ts, '%Y-%m-%d %H:%i:%S')
| extend __time__=cast(to_unixtime(ts) as INTEGER)
| extend __time_ns_part__=cast(ms as INTEGER) * 1000000
| project-away ts, ms
```

Input data

```
time: '2023-11-11 01:23:45.678'
```

Output result

```
__time__: 1699637025
__time_ns_part__: 678000000
time: '2023-11-11 01:23:45.678'
```
Field names with special characters
If a field name contains spaces or other special characters, enclose it in double quotation marks (") when you reference it. For example, if a field is named A B, which contains a space, you can reference it as "A B" in an SPL statement. The following example shows how to do this:
* | where "A B" like '%error%'Case-insensitive field names
In SLS scan queries, field names referenced in SPL instructions are case-insensitive. For example, if a log contains a field named `Method`, you can reference it as `method` or `METHOD` in an SPL instruction.
This applies to the scan query feature of Simple Log Service. For more information, see Scan queries.
Example
Use a case-insensitive field name in a `where` clause.
SPL statement
```
* | where METHOD like 'Post%'
```

Input data

```
Method: 'PostLogstoreLogs'
```

Output result

```
Method: 'PostLogstoreLogs'
```
Handle field name conflicts
During log uploads or SPL execution, the case-insensitive handling of field names can lead to conflicts. For example, a conflict occurs if a raw log contains both `Method` and `method` fields. SPL resolves these conflicts differently depending on the scenario.
To avoid these situations, standardize the field names in your raw logs.
Conflicts in input data
If a raw log contains fields with case-insensitive name conflicts, such as `Status` and `status`, SPL randomly selects one field for input and discards the other. For example:
SPL statement
```
* | extend status_cast = cast(status as bigint)
```

Input data

```
Status: '200'
status: '404'
```

Processing result

Possibility 1: The value of the Status field is retained.

```
Status: '200'  -- The Status field is retained, and the status field is discarded.
status_cast: '200'
```

Possibility 2: The value of the status field is retained.

```
status: '404'  -- The status field is retained, and the Status field is discarded.
status_cast: '404'
```
Conflicts in output results
Scenario 1: Raw data field conflicts
During SPL execution, fields with case-insensitive name conflicts may be generated. In this case, SPL randomly selects one of them for the output. For example, if a log field contains a JSON string, the `parse-json` instruction might create fields with conflicting names:
SPL statement
```
* | parse-json content
```

Input data

```
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
```

Output result

Possibility 1: The Method field is retained.

```
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
Method: 'PostLogs'  -- The Method field is retained.
status: '200'
```

Possibility 2: The method field is retained.

```
content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
method: 'GetLogs'  -- The method field is retained.
status: '200'
```
Scenario 2: Conflicts with newly generated data fields
To avoid ambiguity, SPL preserves the case of new field names that are explicitly generated by instructions. These instructions include `extend` and instructions that use `as` to specify field names, such as `parse-regexp` and `parse-csv`.
For example, if you use `extend` to create a new field named `Method`, the resulting field name remains `Method`.
SPL statement
```
* | extend Method = 'Post'
```

Input data

```
Status: '200'
```

Output result

```
Status: '200'
Method: 'Post'
```
Handle SLS reserved field conflicts
This applies to the real-time consumption and scan query features of Simple Log Service.
For a complete list of reserved fields, see Reserved fields. SPL reads data from the LogGroup structure (see Data encoding for the LogGroup definition). If raw data does not conform to the standard LogGroup encoding, some reserved fields may be in LogContent instead of their standard location. SPL handles these conflicts as follows:
- For the `__source__`, `__topic__`, `__time__`, and `__time_ns_part__` fields, SPL reads values from the standard LogGroup encoding and ignores any LogContent fields with the same names.
- For tag fields with the `__tag__:` prefix, SPL first attempts to read values from the standard LogGroup encoding. If a value is not found, SPL reads it from LogContent. For example, for the `__tag__:ip` field, SPL first attempts to read the field with the key `ip` from the LogTag list. If the field does not exist, SPL then reads the log field with the key `__tag__:ip` from the custom log fields in LogContent.
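For example, the following minimal sketch filters logs by the `__tag__:ip` field; the IP address is an illustrative assumption. Because the field name contains a colon, it is enclosed in double quotation marks. The reference works the same whether SPL reads the tag from the LogTag list or from LogContent:

```
* | where "__tag__:ip" = '192.168.0.1'
```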
The `__line__` field for full-text search
This applies to the SLS scan query feature.
To filter raw logs in the console or when using the GetLogstoreLogs API operation, you can use the `__line__` field.
Example
Search for the keyword `error` in logs.
```
* | where __line__ like '%error%'
```

If a log has a field named `__line__`, enclose the name in backticks (`` ` ``) to reference it, such as `` `__line__` ``:

```
* | where `__line__` = '20'
```
Value retention and overwrite policies
When an SPL instruction is executed, if an output field has the same name as an existing field in the input data, the policy for determining the field's value is as follows:
The policies for retaining and overwriting field values do not apply to the `extend` instruction. For the `extend` instruction, if a field name conflicts, the new value is always used.
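For example, in the following sketch (the literal value 'unknown' is an illustrative assumption), the second `extend` replaces the BIGINT value even though the new value is a VARCHAR:

```
* | extend status=cast(status as BIGINT)  -- Convert the status field to BIGINT.
| extend status='unknown'  -- extend always uses the new value, even though the data types differ.
```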
Inconsistent data types between old and new values
The original value of the input field is retained.
Example
Example 1: A renamed field from a `project` instruction has a conflicting name.
SPL statement
```
* | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| project code=status  -- The new value's type (BIGINT) differs from the old value's type (VARCHAR), so the old value is kept.
```

Input data

```
status: '200'
code: 'Success'
```

Output result

```
code: 'Success'
```
Example 2: An extracted field from a `parse-json` instruction has a conflicting name.
SPL statement
```
* | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| parse-json content  -- The old type of status is BIGINT and the new type is VARCHAR. The old value is retained.
```

Input data

```
status: '200'
content: '{"status": "Success", "body": "this is test"}'
```

Output result

```
content: '{"status": "Success", "body": "this is test"}'
status: 200
body: 'this is test'
```
Consistent data types between old and new values
If the input value is null, the new value is used. Otherwise, the behavior is determined by the mode parameter in the instruction, as defined in the following table.
If the `mode` parameter is not defined in the instruction, its default value is `overwrite`.

| Mode | Description |
| --- | --- |
| overwrite | Overwrites the old value with the new value. |
| preserve | Retains the old value and discards the new value. |
Example
Example 1: A renamed field from a `project` instruction has a conflicting name and the same type. The default mode is `overwrite`.
SPL statement
```
* | project code=status  -- The old and new types of code are both VARCHAR. The new value is used based on the overwrite mode.
```

Input data

```
status: '200'
code: 'Success'
```

Output result

```
code: '200'
```
Example 2: An extracted field from a `parse-json` instruction has a conflicting name and the same type. The default mode is `overwrite`.
SPL statement
```
* | parse-json content  -- The old and new types of status are both VARCHAR. The new value is used based on the overwrite mode.
```

Input data

```
status: '200'
content: '{"status": "Success", "body": "this is test"}'
```

Output result

```
content: '{"status": "Success", "body": "this is test"}'
status: 'Success'
body: 'this is test'
```
Example 3: An extracted field from a `parse-json` instruction has a conflicting name and the same type. The mode is set to `preserve`.
SPL statement
```
* | parse-json -mode='preserve' content  -- The old and new types of status are both VARCHAR. The old value is retained based on the preserve mode.
```

Input data

```
status: '200'
content: '{"status": "Success", "body": "this is test"}'
```

Output result

```
content: '{"status": "Success", "body": "this is test"}'
status: '200'
body: 'this is test'
```
Data type conversion
Initial type
For data processing with SPL, the initial data type of all input fields is VARCHAR, except for log time fields. If subsequent processing logic involves different data types, you must perform data type conversion.
Example
To filter access logs with a status code of 5xx, you must convert the `status` field to the BIGINT type before the comparison.
```
*  -- The initial type of the status field is VARCHAR.
| where cast(status as BIGINT) >= 500  -- Convert the type of the status field to BIGINT, and then perform the comparison.
```

Type retention
During SPL data processing, after you use the `extend` instruction to convert the data type of a field, subsequent processing logic uses the converted data type.
Example
```
*  -- A Logstore is used as input data. Except for time fields, all fields are initially of the VARCHAR type.
| where __source__='127.0.0.1'  -- Filter on the __source__ field.
| extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| project status, content
| where status>=500  -- The type of the status field remains BIGINT, so it can be directly compared with the number 500.
```

Handle null values in SPL expressions
Generating null values
During SPL data processing, null values are generated in the following two scenarios:
- If a field used in an SPL expression does not exist in the input data, its value is treated as null during calculations.
- If an exception occurs during the calculation of an SPL expression, the result is null. For example, a `cast` type conversion fails or an array index is out of bounds.
Example
If a field does not exist, its value is treated as null in calculations.
SPL statement
```
* | extend withoutStatus=(status is null)
```

Input data

```
# Entry 1
status: '200'
code: 'Success'

# Entry 2
code: 'Success'
```

Output result

```
# Entry 1
status: '200'
code: 'Success'
withoutStatus: false

# Entry 2
code: 'Success'
withoutStatus: true
```
If an exception occurs during calculation, the result is null.
SPL statement
```
* | extend code=cast(code as BIGINT)  -- Converting the code field to BIGINT fails.
| extend values=json_parse(values)
| extend values=cast(values as ARRAY(BIGINT))
| extend last=values[10]  -- The array index is out of bounds.
```

Input data

```
status: '200'
code: 'Success'
values: '[1,2,3]'
```

Output result

```
status: '200'
code: null
values: [1, 2, 3]
last: null
```
Eliminating null values
To eliminate null values during calculations, you can use the `COALESCE` expression to combine multiple values by priority and retrieve the first non-null value as the final result. You can also set a default value to use if all expression results are null.
Example
Read the last element of an array. If the array is empty, the default value is 0.
SPL statement
* | extend values=json_parse(values) | extend values=cast(values as ARRAY(BIGINT)) | extend last=COALESCE(values[3], values[2], values[1], 0)Input data
# Entry 1 values: '[1, 2, 3]' # Entry 2 values: '[]'Output result
# Entry 1 values: [1, 2, 3] last: 3 # Entry 2 values: [] last: 0
Character escaping
Single quotation marks
Single quotation marks are used to enclose a field's value. If the field value contains a single quotation mark, you must use an additional single quotation mark to escape it.
Example
SPL statement
```
* | extend user = 'Alice' | extend phone = 'Alice''s Phone'
```

Output result

```
user: Alice
phone: Alice's Phone
```
Double quotation marks
Double quotation marks are used to enclose a field name. If the field name contains a double quotation mark, you must use an additional double quotation mark to escape it.
Example
SPL statement
```
* | extend user_name = 'Alice' | extend "user name" = 'Alice' | extend "user""name" = 'Alice'
```

Output result

```
user_name: Alice
user name: Alice
user"name: Alice
```
Other special characters
Example 1
In SPL, the backslash (\) is not an escape character and is therefore preserved as is.
SPL statement
```
* | extend a = 'foo\tbar' | extend b = 'foo\nbar'
```

Output result

```
a: foo\tbar
b: foo\nbar
```
Example 2
If a string must include special characters, such as a tab character or a line feed, you can use the `chr` function to concatenate strings.
SPL statement
```
* | extend a = concat('foo', chr(9), 'bar') | extend b = concat('foo', chr(10), 'bar')
```

Output result

```
a: foo	bar
b: foo
bar
```
Error handling
Syntax error
Syntax errors occur when an SPL statement is malformed, such as an incorrect instruction name, a keyword reference error, or a type setting error. When a syntax error occurs, SPL does not process any data. You must modify the statement based on the error message.
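For example, the following sketch misspells the `extend` instruction name (an illustrative mistake); SPL returns a syntax error and processes no data:

```
* | extned status_int=cast(status as BIGINT)  -- 'extned' is not a valid instruction name, so this statement fails before any data is processed.
```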
Data error
A data error occurs when a function call or type conversion fails during SPL execution. Data errors do not stop the execution process: the SPL statement still returns results, but SPL sets the field where the error occurred to null. Because data errors can occur on any row, SPL randomly samples and returns only some of the error messages. You can ignore these errors or modify the SPL statement based on the data content.
Execution timeout
An SPL statement contains various instructions, and their running time varies depending on the data scenario. If the total running time of an SPL statement exceeds the default timeout period, the execution stops and a timeout error is returned. In this case, the execution result is empty. The default timeout period may differ for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, for example, by simplifying complex regular expressions or reducing the number of pipelines.
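For example, the following sketch (with assumed field names) merges two consecutive `extend` pipelines into one, which reduces the pipeline count as suggested in the optimization table at the beginning of this topic:

```
-- Before: three pipelines
* | extend a=cast(a as BIGINT) | extend b=cast(b as BIGINT) | extend c=a+b

-- After: two pipelines
* | extend a=cast(a as BIGINT), b=cast(b as BIGINT) | extend c=a+b
```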
Memory limit exceeded
An SPL statement contains various instructions, and their memory consumption varies depending on the data scenario. SPL execution is limited to a specific memory quota. If this quota is exceeded, the execution fails and a memory limit exceeded error is returned. In this case, the execution result is empty. The default memory quota may differ for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, reduce the number of pipelines, or check whether the raw data size is too large.