Elasticsearch Field Data Types

FIELD DATA TYPES by Bo Andersen - codingexplained.com

OUTLINE ➤ Core data types ➤ String, numeric, date, boolean, binary ➤ Complex data types ➤ Object, array, nested ➤ Geo data types ➤ Geo-point, Geo-shape ➤ Specialized data types ➤ IPv4, completion, token count, attachment

STRING ➤ String field types accept string values ➤ Can be sub-divided into full text and keywords ➤ We will take a look at these next

STRING - FULL TEXT ➤ Typically used for text-based relevance searches (e.g. search for products by name) ➤ Full text fields are analyzed ➤ Data is passed through an analyzer to convert the string into a list of individual terms before being indexed ➤ This allows Elasticsearch to search for individual words within a full text field ➤ Full text fields are not used for sorting and are rarely used for aggregations

STRING - KEYWORDS ➤ Exact values such as tags, status, e-mail addresses, etc. ➤ Keyword fields are not analyzed ➤ The exact string value is added to the index as a single term ➤ Typically used for filtering ➤ E.g. find all products where status is "On Discount" ➤ Also often used for sorting and aggregations
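The difference between the two string variants can be sketched in Python. This is a toy illustration only: `toy_analyze` is a hypothetical stand-in that lowercases and splits on whitespace, which is far simpler than the analyzers Elasticsearch actually provides.

```python
def toy_analyze(text):
    """Hypothetical, simplified analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def index_full_text(value):
    # Full text: the string is broken into individual terms before indexing.
    return toy_analyze(value)


def index_keyword(value):
    # Keyword: the exact string is indexed as a single, unmodified term.
    return [value]


full_text_terms = index_full_text("Red Running Shoes")
keyword_terms = index_keyword("On Discount")
```

With these toy functions, a search for the single word "running" can find the first value, while the second value is only found by the exact string "On Discount".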

NUMERIC ➤ Supports the following numeric types ➤ long (signed 64-bit integer) ➤ integer (signed 32-bit integer) ➤ short (signed 16-bit integer) ➤ byte (signed 8-bit integer) ➤ double (double-precision 64-bit floating point) ➤ float (single-precision 32-bit floating point)
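The value range of each integer type follows directly from its bit width. A minimal Python sketch, where `signed_range` and `RANGES` are hypothetical helper names used only for illustration:

```python
def signed_range(bits):
    """Return (min, max) for a signed two's complement integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Bit widths as listed on the slide.
INTEGER_TYPES = {"byte": 8, "short": 16, "integer": 32, "long": 64}

RANGES = {name: signed_range(bits) for name, bits in INTEGER_TYPES.items()}
```

For example, `RANGES["byte"]` is `(-128, 127)`, so a value such as 200 would need at least a `short`.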

DATE ➤ Dates in Elasticsearch can be either ➤ Strings containing formatted dates ➤ E.g. 2016-01-01 or 2016/01/01 12:00:00 ➤ A long number representing milliseconds since the epoch ➤ An integer representing seconds since the epoch ➤ Internally stored as a long number representing milliseconds since the epoch

DATE - FORMATS ➤ Defaults to strict_date_optional_time||epoch_millis ➤ Dates with optional timestamps, which conform to the formats supported by strict_date_optional_time - or milliseconds since the epoch ➤ Examples ➤ 2016-01-01 (date only) ➤ 2016-01-01T12:00:00Z (date including time) ➤ 1410020500000 (milliseconds since the epoch) ➤ Multiple formats can be specified by separating them with the || separator ➤ E.g. yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis
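How the three example values relate to the internal representation can be sketched in Python. The function `to_epoch_millis` is a hypothetical helper, not part of Elasticsearch; it only handles the two string layouts shown above, and it assumes that a date without a time means midnight UTC.

```python
from datetime import datetime, timezone

# Only the two string layouts from the examples are covered (an assumption).
SUPPORTED_LAYOUTS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")


def to_epoch_millis(value):
    """Convert a date string or an epoch-millis integer to epoch milliseconds."""
    if isinstance(value, int):
        # Already milliseconds since the epoch.
        return value
    for layout in SUPPORTED_LAYOUTS:
        try:
            parsed = datetime.strptime(value, layout)
        except ValueError:
            continue
        # Treat the parsed value as UTC and convert seconds to milliseconds.
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"unsupported date value: {value!r}")
```

Under these assumptions, `to_epoch_millis("2016-01-01T12:00:00Z")` gives `1451649600000`, and an integer such as `1410020500000` is passed through unchanged.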

BOOLEAN ➤ Boolean fields accept true and false values as in JSON ➤ Can also accept strings and numbers which are interpreted as either true or false ➤ False values ➤ false, "false", "off", "no", "0", "" (empty string), 0, 0.0 ➤ True values ➤ Anything that is not false
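The rule above can be expressed as a small Python sketch. `interpret_boolean` is a hypothetical helper that mirrors the list of false values on this slide; it is not Elasticsearch code.

```python
# String values that the slide lists as false.
FALSE_STRINGS = {"false", "off", "no", "0", ""}


def interpret_boolean(value):
    """Return False for the listed false values, True for anything else."""
    if isinstance(value, str):
        return value not in FALSE_STRINGS
    # Covers JSON false and the numbers 0 and 0.0.
    return value not in (False, 0, 0.0)
```

For example, `interpret_boolean("off")` is `False`, while `interpret_boolean("yes")` and `interpret_boolean(1)` are both `True`.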

BINARY ➤ A binary value as a Base64 encoded string ➤ E.g. aHR0cDovL2NvZGluZ2V4cGxhaW5lZC5jb20= ➤ Not searchable
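Producing such a value is straightforward with Python's standard `base64` module. In this sketch the helper name `to_binary_field` is made up for illustration; the example string on the slide is the Base64 encoding of `http://codingexplained.com`.

```python
import base64


def to_binary_field(raw):
    """Encode raw bytes as the Base64 string a binary field expects."""
    return base64.b64encode(raw).decode("ascii")


encoded = to_binary_field(b"http://codingexplained.com")
decoded = base64.b64decode(encoded)
```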

OBJECT ➤ JSON documents are hierarchical ➤ A document may contain inner objects, which in turn may contain inner objects ➤ In Elasticsearch, documents are indexed as flat lists of key-value pairs { "message": "Some text...", "customer.age": 26, "customer.address.city": "Copenhagen", "customer.address.country": "Denmark" }
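The flattening can be sketched in a few lines of Python. `flatten` is a hypothetical helper, and the nested input document is an assumption reconstructed from the flat key-value pairs shown above.

```python
def flatten(obj, prefix=""):
    """Flatten nested dictionaries into dotted key-value pairs."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Inner object: recurse and keep extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Assumed source document for the flat list on the slide.
document = {
    "message": "Some text...",
    "customer": {
        "age": 26,
        "address": {"city": "Copenhagen", "country": "Denmark"},
    },
}
flat_document = flatten(document)
```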

ARRAY ➤ Elasticsearch does not have a dedicated array type ➤ Any field can contain zero or more values by default ➤ All values in an array must be of the same data type ➤ When adding a field dynamically, the first value in the array determines the field type ➤ Examples ➤ Array of strings: ["Elasticsearch", "rocks"] ➤ Array of integers: [1, 2] ➤ Array of arrays: [1, [2, 3]] - equivalent of [1, 2, 3] ➤ Array of objects: [{ "name": "Andy", "age": 26 }, { "name": "Brenda", "age": 32 }]

ARRAY - OBJECTS ➤ Arrays of objects do not work as you would expect ➤ You cannot query each object independently of the other objects in the array ➤ Lucene has no concept of inner objects ➤ Elasticsearch flattens object hierarchies into a list of field names and values, so a document like { "users": [{ "name": "Andy", "age": 26 }, { "name": "Brenda", "age": 32 }] } is stored similar to this: { "users.name": ["Andy", "Brenda"], "users.age": [32, 26] } ➤ The association between "Andy" and 26 is lost ➤ A search for a user named "Andy" who is 26 years old would return incorrect results, because it can also match documents where the name and the age belong to different users! ➤ If you need to be able to do this, then you must use the nested data type
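The lost association can be reproduced with a small Python sketch. Both `flatten_object_array` and `flat_match` are hypothetical helpers that only imitate the flattened storage described above; they are not how Lucene is implemented.

```python
def flatten_object_array(field, objects):
    """Collect the values of each key into one multi-valued field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def flat_match(flat, conditions):
    """Match if every condition is satisfied by ANY value of its field."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


users = [{"name": "Andy", "age": 26}, {"name": "Brenda", "age": 32}]
flat = flatten_object_array("users", users)

# No user is named Andy AND aged 32, yet the flattened document matches.
wrong_hit = flat_match(flat, {"users.name": "Andy", "users.age": 32})
```

Here `wrong_hit` is `True`, because "Andy" and 32 each appear somewhere in the document, just not in the same object.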

NESTED ➤ If you need to index arrays of objects and to maintain the independence of each object in the array, you should use the nested data type ➤ Internally, nested objects index each object in the array as a separate hidden document ➤ Each nested object can be queried independently of the others, with a nested query ➤ A nested query is executed against the nested objects as if they were indexed as separate documents (internally, this is actually the case)
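Conceptually, a nested query checks all conditions against one object at a time. The following Python sketch imitates that behaviour; `nested_match` is a hypothetical helper and says nothing about how Elasticsearch implements hidden documents internally.

```python
def nested_match(objects, conditions):
    """Match only if a SINGLE object satisfies every condition."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


users = [{"name": "Andy", "age": 26}, {"name": "Brenda", "age": 32}]

correct_hit = nested_match(users, {"name": "Andy", "age": 26})
no_hit = nested_match(users, {"name": "Andy", "age": 32})
```

Because each object is evaluated on its own, the combination of "Andy" and 32 no longer matches.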

GEO-POINT ➤ Latitude-longitude pairs ➤ Used for geographical operations on documents (searching, sorting, ...) ➤ Can be specified in four ways ➤ As an object: { "location": { "lat": 33.5206608, "lon": -86.8024900 } } ➤ As a "lat,lon" string: { "location": "33.5206608,-86.8024900" } ➤ As a geohash: { "location": "drm3btev3e86" } ➤ As a [lon, lat] array: { "location": [-86.8024900,33.5206608] }
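The coordinate order differs between these notations, which is easy to get wrong. The Python sketch below normalizes three of them to a `(lat, lon)` tuple; `to_lat_lon` is a hypothetical helper, and geohash strings are deliberately left out to keep the example short.

```python
def to_lat_lon(location):
    """Normalize a geo-point value to a (lat, lon) tuple of floats."""
    if isinstance(location, dict):
        # Object notation with explicit keys.
        return float(location["lat"]), float(location["lon"])
    if isinstance(location, str):
        if "," not in location:
            # Probably a geohash, which this sketch does not decode.
            raise ValueError("geohash strings are not supported here")
        lat, lon = location.split(",")
        return float(lat), float(lon)
    # Array notation uses the order [lon, lat].
    lon, lat = location
    return float(lat), float(lon)
```

All three supported notations from the slide resolve to the same point, `(33.5206608, -86.80249)`.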

GEO-SHAPE ➤ Geo shapes such as rectangles and polygons ➤ Should be used when either the data being indexed or the queries being executed contain shapes other than just points ➤ LineString ➤ Array of two or more positions (array of arrays). Straight line in the case of two points ➤ Polygon ➤ An array of arrays, where each array contains points ➤ The first and last points in the outer array must be the same (to close the polygon) ➤ ...
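The closing rule for polygons can be checked before indexing. In this Python sketch `is_closed_ring` is a hypothetical helper and the coordinates are made-up sample values in [lon, lat] order; the minimum of four positions is taken from the GeoJSON definition of a linear ring.

```python
def is_closed_ring(ring):
    """Check that a ring has at least four positions and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# A polygon is an array of rings; the first ring is the outer boundary.
polygon = [
    [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]
]
open_ring = [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0]]

outer_is_closed = is_closed_ring(polygon[0])
```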

IPV4 ➤ Used to map IPv4 addresses ➤ Internally, values are indexed as long values
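An IPv4 address is a 32-bit number, which is why it fits into a long value. A minimal Python sketch using the standard `ipaddress` module, with `ipv4_to_long` as a hypothetical helper name:

```python
import ipaddress


def ipv4_to_long(address):
    """Convert a dotted IPv4 address to its integer representation."""
    return int(ipaddress.IPv4Address(address))


as_long = ipv4_to_long("192.168.0.1")
```

Storing addresses as numbers is what makes comparisons and range filters on IP fields possible; here `as_long` is `3232235521`.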

COMPLETION ➤ The completion suggester is a so-called prefix suggester ➤ It does not do spell correction, but enables basic auto-complete functionality ➤ Useful for providing the user with suggestions while searching, e.g. like on Google ➤ Stores an FST (Finite State Transducer) as part of the index ➤ Allows for very fast loads and executions ➤ You don't have to worry about this - just know when to use this type
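What "prefix suggester" means can be illustrated with a toy Python example. Note that this sketch swaps in a sorted list with binary search instead of an FST, so it only demonstrates the behaviour, not the data structure Elasticsearch uses; `build_suggester` and `suggest` are hypothetical names.

```python
import bisect


def build_suggester(phrases):
    """Build a sorted, lowercased list that supports prefix lookups."""
    return sorted(phrase.lower() for phrase in phrases)


def suggest(index, prefix, limit=5):
    """Return up to `limit` indexed phrases that start with the prefix."""
    prefix = prefix.lower()
    # All phrases with this prefix sit next to each other in sorted order.
    start = bisect.bisect_left(index, prefix)
    results = []
    for phrase in index[start:]:
        if not phrase.startswith(prefix):
            break
        results.append(phrase)
        if len(results) == limit:
            break
    return results


index = build_suggester(["Elasticsearch", "Elastic Stack", "Kibana"])
```

Typing "elas" would then suggest both phrases that start with it, while a misspelled prefix returns nothing, since no spell correction takes place.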

TOKEN COUNT ➤ An integer field which accepts string values ➤ The string values are analyzed, and the number of tokens is indexed ➤ Example ➤ A name property could have a length field of the type token_count ➤ Then, a search query could be executed to find persons whose name contains X tokens (split by space, for instance)
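The idea can be sketched in Python, assuming the simplest possible analysis, namely splitting on whitespace. `token_count` here is a hypothetical function that only mimics what the field would store.

```python
def token_count(value):
    """Count the tokens of a string, splitting on whitespace."""
    return len(value.split())


names = ["John Smith", "Rachel Alice Williams", "Madonna"]

# Find names that consist of exactly two tokens.
two_token_names = [name for name in names if token_count(name) == 2]
```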

ATTACHMENT ➤ Lets Elasticsearch index attachments in common formats ➤ E.g. PDF, XLS, PPT, ... ➤ Attachment content is stored as a Base64 encoded string ➤ This functionality is available as a plugin that must be installed ➤ sudo /path/to/elasticsearch/bin/plugin install mapper-attachments ➤ Must be installed on every node of a cluster ➤ Nodes must be restarted after the installation
