Pattern text field type
Serverless Stack
This feature requires a subscription.
The pattern_text field type is a variant of text with improved space efficiency for log data. Internally, it decomposes values into static parts that are likely to be shared among many values, and dynamic parts that tend to vary. The static parts usually come from the explanatory text of a log message, while the dynamic parts are the variables that were interpolated into the logs. This decomposition allows for improved compression on log-like data.
We call the static portion of the value the template. Although the template cannot be accessed directly, a separate field called <field_name>.template_id is accessible. This field is a hash of the template and can be used to group similar values.
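To make the idea of a template and its hash concrete, the following Python sketch mimics the decomposition on a toy scale. It is not the algorithm Elasticsearch uses. The rule that any token containing a digit is dynamic, the <*> placeholder, the SHA-1 hash, and the function names split_template, template_id, and group_by_template are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

PLACEHOLDER = "<*>"  # hypothetical marker standing in for a dynamic part


def split_template(message):
    """Split a log line into (template, dynamic_values).

    Assumption for illustration only: a whitespace-separated token is
    treated as dynamic if it contains at least one digit.
    """
    static_parts, dynamic_values = [], []
    for token in message.split():
        if any(ch.isdigit() for ch in token):
            static_parts.append(PLACEHOLDER)
            dynamic_values.append(token)
        else:
            static_parts.append(token)
    return " ".join(static_parts), dynamic_values


def template_id(template):
    """Stand-in hash of the template (SHA-1 here; the real hash may differ)."""
    return hashlib.sha1(template.encode("utf-8")).hexdigest()[:16]


def group_by_template(messages):
    """Group messages whose templates hash to the same value."""
    groups = defaultdict(list)
    for message in messages:
        template, _ = split_template(message)
        groups[template_id(template)].append(message)
    return dict(groups)


logs = [
    "Connected to 10.0.0.1 in 12ms",
    "Connected to 10.0.0.7 in 48ms",
    "Disk usage at 91%",
]
groups = group_by_template(logs)
```

In this sketch the two "Connected to" lines share one template and therefore one hash, which is the property that makes a template hash useful for grouping similar values.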
Analysis is configurable but defaults to a delimiter-based analyzer. This analyzer applies a lowercase filter and then splits on whitespace and the following delimiters: =, ?, :, [, ], {, }, ", \, '.
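The behavior described above can be approximated in a few lines of Python. This is a minimal sketch based only on the description in this section, not the analyzer's actual implementation. The function name delimiter_tokenize and the sample input are hypothetical.

```python
import re

# Delimiters listed above, in addition to whitespace.
DELIMITERS = "=?:[]{}\"\\'"

# Character class matching a run of whitespace and/or delimiter characters.
_SPLIT_RE = re.compile("[\\s" + re.escape(DELIMITERS) + "]+")


def delimiter_tokenize(text):
    """Lowercase the text, then split on whitespace and the delimiters."""
    return [token for token in _SPLIT_RE.split(text.lower()) if token]


tokens = delimiter_tokenize('User=Alice path:"/Home" [WARN] {id=42}')
print(tokens)
```

Note that characters outside the delimiter list, such as / and ., stay inside their tokens in this sketch, so a path or an IP address is kept as a single token.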
Unlike most mapping types, pattern_text does not support multiple values for a given field per document. If a document is created with multiple values for a pattern_text field, an error will be returned.
Span queries are not supported with this field. Use interval queries instead, or use the text field type if you absolutely need span queries.
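As a rough illustration of the interval-query alternative, the Python sketch below builds the body of an intervals query that matches the terms of a phrase in order. The match rule and its query, max_gaps, and ordered options belong to the Elasticsearch intervals query. The helper name ordered_phrase_intervals, the field name message, and the sample phrase are assumptions for this example.

```python
import json


def ordered_phrase_intervals(field, phrase, max_gaps=0):
    """Build an intervals query body matching the terms of `phrase` in order.

    max_gaps=0 requires the terms to be adjacent; a larger value allows
    that many unmatched positions between them.
    """
    return {
        "query": {
            "intervals": {
                field: {
                    "match": {
                        "query": phrase,
                        "max_gaps": max_gaps,
                        "ordered": True,
                    }
                }
            }
        }
    }


body = ordered_phrase_intervals("message", "connection timed out")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against an index that maps message as pattern_text.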
Like text, pattern_text does not support sorting and has only limited support for aggregations.
Pattern text supports an index_options parameter with valid values of docs and positions. The default value is docs, which makes pattern_text behave similarly to match_only_text for phrase queries. Specifically, positions are not stored, which reduces the index size at the cost of slowing down phrase queries. If index_options is set to positions, positions are stored and pattern_text will support fast phrase queries. In both cases, all queries return a constant score of 1.0.
The compression provided by pattern_text can be significantly improved if the index is sorted by the template_id field. For example, a typical approach would be to sort first by message.template_id, then by @timestamp, as shown in the following example.
PUT logs
{
  "settings": {
    "index": {
      "sort.field": [ "message.template_id", "@timestamp" ],
      "sort.order": [ "asc", "desc" ]
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "message": { "type": "pattern_text" }
    }
  }
}
The following mapping parameters are accepted:
analyzer
- The analyzer which should be used for the pattern_text field, both at index-time and at search-time (unless overridden by the search_analyzer). Supports a delimiter-based analyzer and the standard analyzer, as is used in match_only_text mappings. Defaults to the delimiter-based analyzer, which applies a lowercase filter and then splits on whitespace and the following delimiters: =, ?, :, [, ], {, }, ", \, '.
index_options
- What information should be stored in the index, for search and highlighting purposes. Valid values are docs and positions. Defaults to docs.
meta
- Metadata about the field.
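The parameters above can be combined in a single field mapping. The Python sketch below assembles such a mapping as a plain dictionary and rejects index_options values other than the two listed. The helper name pattern_text_mapping, the field names, and the meta content are hypothetical. The analyzer is left unset by default because this section does not give the name of the delimiter-based analyzer.

```python
import json

VALID_INDEX_OPTIONS = ("docs", "positions")


def pattern_text_mapping(analyzer=None, index_options="docs", meta=None):
    """Return a pattern_text field mapping built from the listed parameters."""
    if index_options not in VALID_INDEX_OPTIONS:
        raise ValueError(f"index_options must be one of {VALID_INDEX_OPTIONS}")
    mapping = {"type": "pattern_text", "index_options": index_options}
    if analyzer is not None:
        # Omitting the analyzer keeps the delimiter-based default.
        mapping["analyzer"] = analyzer
    if meta is not None:
        mapping["meta"] = dict(meta)
    return mapping


index_body = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "message": pattern_text_mapping(
                analyzer="standard",
                index_options="positions",
                meta={"source": "application logs"},
            ),
        }
    }
}
print(json.dumps(index_body, indent=2))
```

The printed JSON has the same shape as the body of the index creation request shown earlier, with positions stored so that phrase queries on message are fast.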