Get tokens from text analysis
Generally available
All methods and paths for this operation:
GET /_analyze
POST /_analyze
GET /{index}/_analyze
POST /{index}/_analyze
The analyze API performs analysis on a text string and returns the resulting tokens.
Generating excessive amounts of tokens may cause a node to run out of memory. The `index.analyze.max_token_count` setting enables you to limit the number of tokens that can be produced. If more tokens than this limit are generated, an error occurs. The `_analyze` endpoint without a specified index always uses `10000` as its limit.
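One way to raise the limit is per index at creation time. The sketch below uses the Python client; the index name `my-index` and the limit value `20000` are illustrative assumptions, not part of this API's reference.

Python:
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # connection details are an assumption

# Hypothetical index: _analyze requests against it may produce up to
# 20000 tokens before an error is returned.
client.indices.create(
    index="my-index",
    settings={"index.analyze.max_token_count": 20000},
)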
Required authorization
- Index privileges: `index`
Path parameters
- `index` (Optional, string): Index used to derive the analyzer. If specified, the `analyzer` or `field` parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
Query parameters
- `index` (Optional, string): Index used to derive the analyzer. If specified, the `analyzer` or `field` parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer. An index-scoped request is sketched below.
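As a minimal sketch of an index-scoped request, the Python call below derives the analyzer from a hypothetical index named `my-index`; with no `analyzer` or `field` given, the index's default analyzer (or the standard analyzer) is used. Connection details are assumptions.

Python:
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # connection details are an assumption

# "my-index" is a hypothetical index; its default analyzer is applied.
resp = client.indices.analyze(
    index="my-index",
    text="this is a test",
)
print([t["token"] for t in resp["tokens"]])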
Body
- `analyzer` (Optional, string): The name of the analyzer that should be applied to the provided `text`. This could be a built-in analyzer, or an analyzer that's been configured in the index.
- `attributes` (Optional, array of strings): Array of token attributes used to filter the output of the `explain` parameter.
- `char_filter` (Optional, array): Array of character filters used to preprocess characters before the tokenizer.
- `explain` (Optional, Boolean): If `true`, the response includes token attributes and additional details. Default value is `false`.
- `field` (Optional, string): Path to a field or array of paths. Some APIs support wildcards in the path to select multiple fields.
- `filter` (Optional, array): Array of token filters to apply after the tokenizer (see the sketch after this list).
- `normalizer` (Optional, string): Normalizer to use to convert text into a single token.
- `text` (Required, string or array of strings): Text to analyze. If an array of strings is provided, it is analyzed as a multi-value field.
- `tokenizer` (Optional, string or object): Tokenizer to use to convert text into tokens.
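As a sketch of how these body parameters combine in a client call, the following uses the Python client to define a transient analysis chain inline; the connection details are assumptions, and the request mirrors the console examples further below.

Python:
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # connection details are an assumption

# Transient analysis chain: strip HTML tags, split on whitespace,
# lowercase, then drop the stopwords defined inline.
resp = client.indices.analyze(
    char_filter=["html_strip"],
    tokenizer="whitespace",
    filter=[
        "lowercase",
        {"type": "stop", "stopwords": ["a", "is", "this"]},
    ],
    text="this is a <b>test</b>",
)
print([t["token"] for t in resp["tokens"]])  # expected: ["test"]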
GET /_analyze
{
  "analyzer": "standard",
  "text": "this is a test"
}

Python:
resp = client.indices.analyze(
    analyzer="standard",
    text="this is a test",
)

JavaScript:
const response = await client.indices.analyze({
  analyzer: "standard",
  text: "this is a test",
});

Ruby:
response = client.indices.analyze(
  body: {
    "analyzer": "standard",
    "text": "this is a test"
  }
)

PHP:
$resp = $client->indices()->analyze([
    "body" => [
        "analyzer" => "standard",
        "text" => "this is a test",
    ],
]);

curl:
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"analyzer":"standard","text":"this is a test"}' "$ELASTICSEARCH_URL/_analyze"

Java:
client.indices().analyze(a -> a
    .analyzer("standard")
    .text("this is a test")
);
{ "analyzer": "standard", "text": "this is a test" }
{ "analyzer": "standard", "text": [ "this is a test", "the second text" ] }
{ "tokenizer": "keyword", "filter": [ "lowercase" ], "char_filter": [ "html_strip" ], "text": "this is a <b>test</b>" }
{ "tokenizer": "whitespace", "filter": [ "lowercase", { "type": "stop", "stopwords": [ "a", "is", "this" ] } ], "text": "this is a test" }
{ "field": "obj1.field1", "text": "this is a test" }
{ "normalizer": "my_normalizer", "text": "BaR" }
{ "tokenizer": "standard", "filter": [ "snowball" ], "text": "detailed output", "explain": true, "attributes": [ "keyword" ] }
{ "detail": { "custom_analyzer": true, "charfilters": [], "tokenizer": { "name": "standard", "tokens": [ { "token": "detailed", "start_offset": 0, "end_offset": 8, "type": "<ALPHANUM>", "position": 0 }, { "token": "output", "start_offset": 9, "end_offset": 15, "type": "<ALPHANUM>", "position": 1 } ] }, "tokenfilters": [ { "name": "snowball", "tokens": [ { "token": "detail", "start_offset": 0, "end_offset": 8, "type": "<ALPHANUM>", "position": 0, "keyword": false }, { "token": "output", "start_offset": 9, "end_offset": 15, "type": "<ALPHANUM>", "position": 1, "keyword": false } ] } ] } }