
Mapping character filter

The mapping character filter accepts a map of keys and values. Whenever it encounters a string of characters that is the same as a key, it replaces them with the value associated with that key.

Matching is greedy; the longest pattern matching at a given point wins. Replacements are allowed to be the empty string.
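For illustration, a mapping filter with the overlapping keys a and aa (an invented example, not one of the documented requests below) shows the greedy behavior: the longest key wins at each position, so the text aaa becomes 21 rather than 111.

GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "a => 1",
        "aa => 2"
      ]
    }
  ],
  "text": "aaa"
}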

The mapping filter uses Lucene’s MappingCharFilter.

The following analyze API request uses the mapping filter to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Latin equivalents (0123456789), changing the text My license plate is ٢٥٠١٥ to My license plate is 25015.

GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "٠ => 0",
        "١ => 1",
        "٢ => 2",
        "٣ => 3",
        "٤ => 4",
        "٥ => 5",
        "٦ => 6",
        "٧ => 7",
        "٨ => 8",
        "٩ => 9"
      ]
    }
  ],
  "text": "My license plate is ٢٥٠١٥"
}

The filter produces the following text:

 [ My license plate is 25015 ] 
Configurable parameters

mappings

(Required*, array of strings) Array of mappings, with each element having the form key => value.

Either this or the mappings_path parameter must be specified.

mappings_path

(Required*, string) Path to a file containing key => value mappings.

This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each mapping in the file must be separated by a line break.

Either this or the mappings parameter must be specified.
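For example, a character filter that reads its mappings from a file might be configured as follows. This is an illustrative sketch: the index name my-index-000002 and the file analysis/example_mappings.txt are assumptions, not names defined elsewhere in this reference.

PUT /my-index-000002
{
  "settings": {
    "analysis": {
      "char_filter": {
        "file_mappings_char_filter": {
          "type": "mapping",
          "mappings_path": "analysis/example_mappings.txt"
        }
      }
    }
  }
}

The referenced file, resolved relative to the config directory, would contain one key => value mapping per line, for example:

٠ => 0
١ => 1
٢ => 2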

Customize and add to an analyzer

To customize the mappings filter, duplicate it to create the basis for a new custom character filter. You can modify the filter using its configurable parameters.

The following create index API request configures a new custom analyzer using a custom mappings filter, my_mappings_char_filter.

The my_mappings_char_filter filter replaces the :) and :( emoticons with a text equivalent.

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mappings_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      }
    }
  }
}

The following analyze API request uses the custom my_mappings_char_filter to replace :( with _sad_ in the text I'm delighted about it :(.

GET /my-index-000001/_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    "my_mappings_char_filter"
  ],
  "text": "I'm delighted about it :("
}

The filter produces the following text:

 [ I'm delighted about it _sad_ ]
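To put the custom analyzer to use, you would typically assign it to a text field. The following update mapping API request is a sketch along those lines; the field name message is an assumption used only for illustration.

PUT /my-index-000001/_mapping
{
  "properties": {
    "message": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}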