Get tokens from text analysis (Generally available)

POST /{index}/_analyze

All methods and paths for this operation:

GET /_analyze
POST /_analyze
GET /{index}/_analyze
POST /{index}/_analyze

The analyze API performs analysis on a text string and returns the resulting tokens.

Generating an excessive number of tokens can cause a node to run out of memory. The `index.analyze.max_token_count` setting enables you to limit the number of tokens that can be produced. If more tokens than this limit are generated, an error occurs. The `_analyze` endpoint without a specified index always uses `10000` as its limit.
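
For example, you can raise the limit when creating an index. The index name `my-index` and the value `20000` below are only illustrative:

PUT /my-index
{
  "settings": {
    "index.analyze.max_token_count": 20000
  }
}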

Required authorization

  • Index privileges: index

Path parameters

  • index string Required

    Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.

Query parameters

  • index string

    Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.

Body (application/json)

  • analyzer string

    The name of the analyzer that should be applied to the provided text. This could be a built-in analyzer, or an analyzer that’s been configured in the index.

  • attributes array[string]

    Array of token attributes used to filter the output of the explain parameter.

  • char_filter array

    Array of character filters used to preprocess characters before the tokenizer.

  • explain boolean

    If true, the response includes token attributes and additional details.

    Default value is false.

  • field string

    Path to a field or an array of paths. Some APIs support wildcards in the path to select multiple fields.

  • filter array

    Array of token filters to apply after the tokenizer.

  • normalizer string

    Normalizer to use to convert text into a single token.

  • text string | array[string]

    Text to analyze. If an array of strings is provided, it is analyzed as a multi-value field.

Responses

  • 200 application/json
    • detail object
      • analyzer object
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • charfilters array[object]
        • filtered_text array[string] Required
        • name string Required
      • custom_analyzer boolean Required
      • tokenfilters array[object]
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • tokenizer object
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
    • tokens array[object]
      • end_offset number Required
      • position number Required
      • positionLength number
      • start_offset number Required
      • token string Required
      • type string Required
GET /_analyze
{
  "analyzer": "standard",
  "text": "this is a test"
}

Python:

resp = client.indices.analyze(
    analyzer="standard",
    text="this is a test",
)

JavaScript:

const response = await client.indices.analyze({
  analyzer: "standard",
  text: "this is a test",
});

Ruby:

response = client.indices.analyze(
  body: {
    "analyzer": "standard",
    "text": "this is a test"
  }
)

PHP:

$resp = $client->indices()->analyze([
    "body" => [
        "analyzer" => "standard",
        "text" => "this is a test",
    ],
]);

curl:

curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"analyzer":"standard","text":"this is a test"}' "$ELASTICSEARCH_URL/_analyze"

Java:

client.indices().analyze(a -> a
    .analyzer("standard")
    .text("this is a test")
);
You can apply any of the built-in analyzers to the text string without specifying an index.
{ "analyzer": "standard", "text": "this is a test" }
If the `text` parameter is provided as an array of strings, it is analyzed as a multi-value field.
{ "analyzer": "standard", "text": [ "this is a test", "the second text" ] }
You can test a transient custom analyzer built from tokenizers, token filters, and character filters. Token filters are specified with the `filter` parameter.
{ "tokenizer": "keyword", "filter": [ "lowercase" ], "char_filter": [ "html_strip" ], "text": "this is a <b>test</b>" }
Custom tokenizers, token filters, and character filters can be specified in the request body.
{ "tokenizer": "whitespace", "filter": [ "lowercase", { "type": "stop", "stopwords": [ "a", "is", "this" ] } ], "text": "this is a test" }
Run `GET /analyze_sample/_analyze` to analyze the text using the default index analyzer associated with the `analyze_sample` index. Alternatively, the analyzer can be derived from a field mapping.
{ "field": "obj1.field1", "text": "this is a test" }
Run `GET /analyze_sample/_analyze` and supply a normalizer for a keyword field if there is a normalizer associated with the specified index.
{ "normalizer": "my_normalizer", "text": "BaR" }
If you want more advanced details, set `explain` to `true`. The response then includes all token attributes for each token. You can filter the token attributes you want in the output by setting the `attributes` option. NOTE: The format of the additional detail information is labelled as experimental in Lucene and may change in the future.
{ "tokenizer": "standard", "filter": [ "snowball" ], "text": "detailed output", "explain": true, "attributes": [ "keyword" ] }
Response examples (200)
A successful response for an analysis with `explain` set to `true`.
{ "detail": { "custom_analyzer": true, "charfilters": [], "tokenizer": { "name": "standard", "tokens": [ { "token": "detailed", "start_offset": 0, "end_offset": 8, "type": "<ALPHANUM>", "position": 0 }, { "token": "output", "start_offset": 9, "end_offset": 15, "type": "<ALPHANUM>", "position": 1 } ] }, "tokenfilters": [ { "name": "snowball", "tokens": [ { "token": "detail", "start_offset": 0, "end_offset": 8, "type": "<ALPHANUM>", "position": 0, "keyword": false }, { "token": "output", "start_offset": 9, "end_offset": 15, "type": "<ALPHANUM>", "position": 1, "keyword": false } ] } ] } }