Introduction to Elasticsearch 27th May 2014 - BigData Meetup Eric Rodriguez @wavyx
About Me Eric Rodriguez Founder of data.be • Web entrepreneur • Data addict • Multi-Language: PHP, Java/Groovy/Grails, .Net, … • be.linkedin.com/in/erodriguez • github.com/wavyx • @wavyx
Elasticsearch - Company • Founded in 2012 => http://www.elasticsearch.com • Professional services • Training • Consultancy / Development support • Production support subscription (3 levels of SLAs)
Enterprises using Elasticsearch
(M)ELK Stack • Elasticsearch - Search server based on Lucene • Logstash - Tool for managing events and logs • Kibana - Visualize logs and time-stamped data • Marvel - Monitor your cluster’s heartbeat • You Know, for Search…
Logstash • Collect, parse, index, and search logs
Kibana • A versatile dashboard to see and interact with your data
Marvel • Monitor the health of your cluster: cluster-wide metrics, an overview of all nodes and indices, and events (master election, new nodes)
real time • search and analytics engine • open-source • Lucene • JSON • schema free • document store • RESTful API • documentation • scalability • high availability • distributed • multi tenancy • per-operation persistence
Use Cases • Full-Text Search • Data Store • Analytics • Alerts • Ads • …
Copyright 2014 Elasticsearch Inc / Elasticsearch BV. All rights reserved. Content used with permission from Elasticsearch.
Elasticsearch core • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java • Elasticsearch added value: “Simple is best” • Simple API (with documentation) • JSON & RESTful • Sharding & Replication • Extensibility: plugins and scripts • Interoperability: clients and integrations
Terms for DBAs
Elasticsearch    RDBMS
Index            Database
Type             Table
Document         Row
Fields           Column
Mapping          Schema
Plug & Play • Zero configuration • 4 LoC to get started ;)
Alive! => http://localhost:9200/?pretty
REST • Check your cluster, node, and index health, status, and statistics • Administer your cluster, node, and index data and metadata • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes • Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
Basic Operations 1/3 • Add a document • Create index
Basic Operations 2/3 • Modify/Replace a document • Delete a document • Delete index
Basic Operations 3/3 • Update a document
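The basic operations above map onto plain HTTP calls. The following is a minimal sketch that only builds the requests (it does not send them), assuming the ES 1.x REST paths and a node on the default local port. The index name `blog`, the type `post`, the document fields, and the helper `es_request` are hypothetical names chosen for illustration.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # assumed: default local node


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE + path, data=data, method=method, headers=headers)


# Hypothetical index "blog", type "post", document id 1.
operations = [
    es_request("PUT", "/blog"),                                   # create index
    es_request("PUT", "/blog/post/1", {"title": "Hello", "views": 0}),        # add a document
    es_request("PUT", "/blog/post/1", {"title": "Hello again", "views": 0}),  # replace it
    es_request("POST", "/blog/post/1/_update", {"doc": {"views": 1}}),        # partial update
    es_request("DELETE", "/blog/post/1"),                         # delete the document
    es_request("DELETE", "/blog"),                                # delete the index
]

for req in operations:
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would execute it against a running node. Note the difference between replacing a document (PUT with the full body) and updating it (POST to `_update` with only the changed fields under `doc`).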
Mapping 1/2 • Define how a document should be mapped (similar to schema): searchable fields, tokenization, storage, … • Explicit mapping is defined on an index/type level • A default mapping is automatically created
Mapping 2/2 • Core types: string, integer/long, float/double, boolean, and null • Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment • Example
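The example from the original slide is not preserved in this transcript. As a stand-in, here is a hedged sketch of what an explicit ES 1.x mapping body could look like, built as a Python dictionary. The index `blog`, the type `post`, and every field name are assumptions made up for illustration.

```python
import json

# Hypothetical explicit mapping for type "post" (ES 1.x style).
mapping = {
    "post": {
        "properties": {
            "title": {"type": "string"},                          # analyzed full text
            "slug": {"type": "string", "index": "not_analyzed"},  # exact value only
            "views": {"type": "integer"},
            "published": {"type": "boolean"},
            "client_ip": {"type": "ip"},
            "location": {"type": "geo_point"},
        }
    }
}

body = json.dumps(mapping, indent=2)
print("PUT /blog/_mapping/post")
print(body)
```

Fields left out of an explicit mapping would still be indexed, since a default mapping is created automatically from the first document that contains them.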
Search API 1/2 • Multi-index, Multi-type • URI search - Google-like: operators (AND/OR), fields, sort, paging, wildcards, …
Search API 2/2 • Paging & Sort • Fields: selection, scripts • Post filter • Highlighting • Rescoring • Explain • …
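A URI search across several indices and types can be sketched as a URL builder. This is an illustrative helper, not part of any client library: `uri_search`, the index names `blog` and `news`, the type `post`, and the field names are all hypothetical. The query-string parameters `q`, `sort`, `from`, and `size` are the standard URI search parameters.

```python
from urllib.parse import urlencode


def uri_search(indices, types, params):
    """Build a multi-index, multi-type URI search path with a query string."""
    path = "/" + ",".join(indices) + "/" + ",".join(types) + "/_search"
    return path + "?" + urlencode(params)


url = uri_search(
    ["blog", "news"],
    ["post"],
    {
        "q": "title:elasticsearch AND tags:search",  # fields and operators
        "sort": "created_at:desc",                   # sorting
        "from": 0,                                   # paging: offset
        "size": 10,                                  # paging: page size
    },
)
print(url)
```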
Query DSL • “SQL” for Elasticsearch • Queries should be used for full-text search and wherever the result depends on a relevance score • Filters should be used for binary yes/no searches and for queries on exact values
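To make the query/filter split concrete, here is a sketch of an ES 1.x search body that combines both in a `filtered` query: the `match` query is scored for relevance, while the `term` and `range` filters only answer yes or no. The field names and values are hypothetical.

```python
import json

search_body = {
    "query": {
        "filtered": {
            # Query part: full-text, contributes to the relevance score.
            "query": {"match": {"title": "distributed search"}},
            # Filter part: exact values, binary yes/no, no scoring.
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"status": "published"}},
                        {"range": {"views": {"gte": 100}}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(search_body, indent=2))
```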
Basic Queries
Basic Filters
Analysis 1/2 • Analysis is extracting “terms” from a given text • Processing natural language to make it computer searchable • Configurable registry of Analyzers that can be used to break indexed (analyzed) fields into terms when a document is indexed, and to process query strings
Analysis 2/2 • Analyzers are composed of a single Tokenizer (may be preceded by one or more CharFilters) and zero or more TokenFilters • Default Analyzers: standard, pattern, whitespace, language, snowball
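The analyzer structure described in the Analysis slides (char filters, then one tokenizer, then token filters) can be imitated in a few lines of plain Python. This is a toy model for intuition only, not how Lucene implements analysis; every function name and the stopword list are made up for this sketch.

```python
import re


def html_strip(text):
    """Char filter: replace HTML tags with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


STOPWORDS = {"a", "an", "and", "the", "of", "is", "to"}


def stop(tokens):
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOPWORDS]


def make_analyzer(tokenizer, char_filters=(), token_filters=()):
    """Compose zero or more char filters, one tokenizer, zero or more token filters."""
    def analyze(text):
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyzer = make_analyzer(word_tokenizer, [html_strip], [lowercase, stop])
terms = analyzer("<p>The Quick Brown Fox</p> jumps to the moon")
print(terms)  # ['quick', 'brown', 'fox', 'jumps', 'moon']
```

The same analyzer would be applied to query strings, which is why a search for "Fox" can match a document indexed as "fox".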
Analytics • Aggregation of information: similar to “group by” • Facets: aggregated data based on a search query; one-dimensional results; e.g. “term facets” return facet counts for the various values of a specific field (think color, tag, category, …) • Aggregations (ES 1.0+): nested facets; basic stats (mean, min, max, std dev, term counts); Significant Terms, Percentiles, Cardinality estimations
Facets • not yet deprecated, but use aggregations! • Various Facets: terms, range, histogram, date, statistical, geo distance, …
Aggregations • A generic, powerful framework that can be divided into 2 main families • Bucketing: each bucket is associated with a key and a document criterion; the aggregation process provides a list of buckets, each one with a set of documents that "belong" to it • Metric: aggregations that keep track of and compute metrics over a set of documents • Aggregations can be nested!
Bucket Aggregators • global • filter • missing • terms • range • date range • ip range • histogram • date histogram • geo distance • geohash grid • nested • reverse nested • top hits (version 1.3)
Metrics Aggregators • count • stats • extended stats • cardinality • percentiles • min • max • sum • avg
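To show how a bucket aggregator and a metrics aggregator nest, the sketch below first writes the ES 1.x request body for a `terms` aggregation with an `avg` sub-aggregation, then imitates the result in plain Python over an in-memory list. The documents, the field names `color` and `price`, and the aggregation names are invented for illustration, and the in-memory version is only a conceptual model of what the cluster computes.

```python
from collections import defaultdict

# Request body one might send to _search (ES 1.x aggregations syntax).
request_body = {
    "aggs": {
        "by_color": {
            "terms": {"field": "color"},                            # bucketing
            "aggs": {"avg_price": {"avg": {"field": "price"}}},     # nested metric
        }
    }
}

# Toy in-memory equivalent of the same aggregation.
docs = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 20.0},
    {"color": "red", "price": 30.0},
    {"color": "green", "price": 5.0},
    {"color": "blue", "price": 40.0},
]


def terms_buckets(documents, field):
    """Bucketing: one bucket per distinct value, holding the matching documents."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return buckets


def avg_metric(documents, field):
    """Metric: average of a numeric field over a set of documents."""
    values = [doc[field] for doc in documents]
    return sum(values) / len(values)


result = [
    {"key": key, "doc_count": len(members), "avg_price": avg_metric(members, "price")}
    for key, members in terms_buckets(docs, "color").items()
]
# Largest buckets first; ties broken by key so the order is deterministic.
result.sort(key=lambda bucket: (-bucket["doc_count"], bucket["key"]))

for bucket in result:
    print(bucket)
```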
Search for end users • Suggesters - “Did you mean”: Terms, Phrases, Completion, Context • “More like this”: find documents that are "like" provided text by running it against one or more fields
Percolator • Classic ES: 1. Add & index documents 2. Search with queries 3. Retrieve matching documents • Percolator: 1. Add & index queries 2. Percolate documents 3. Retrieve matching queries
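The inversion that the percolator performs (store queries, then match incoming documents against them) can be sketched with a toy in-memory class. This is a conceptual model only: `ToyPercolator`, its methods, and the single-term matching rule are assumptions made for illustration and do not reflect the Elasticsearch percolator API.

```python
class ToyPercolator:
    """Stores simple term queries and reports which ones match a document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, field, term):
        """Step 1: add and index a query (here: 'field contains term')."""
        self.queries[query_id] = (field, term.lower())

    def percolate(self, document):
        """Steps 2 and 3: run a document against all stored queries, return matches."""
        matches = []
        for query_id, (field, term) in self.queries.items():
            words = str(document.get(field, "")).lower().split()
            if term in words:
                matches.append(query_id)
        return sorted(matches)


percolator = ToyPercolator()
percolator.register("alert-es", "message", "Elasticsearch")
percolator.register("alert-storm", "message", "storm")

matched = percolator.percolate({"message": "Elasticsearch 1.2 released"})
print(matched)  # ['alert-es']
```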
Why Percolate?! • Alerts: social media mentions, weather forecast, news alerts • Automatic Monitoring: price monitoring, stock alerts, logs • Ads: display targeted ads based on user’s search queries • Enrich: percolate new documents, then add query matches as document tags
High Availability 1/2 • Sharding - Write Scalability: split logical data over multiple machines & control data flows; each index has a fixed number of shards; improves indexing performance • Replication - Read Scalability: each shard can have 0 to many replicas (dynamic setup); removes the SPOF (Single Point Of Failure); improves search performance
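Why the number of shards is fixed per index becomes clearer from the routing rule, which picks a shard as hash(routing value) modulo the number of primary shards. The sketch below illustrates that rule with `zlib.crc32` as a stand-in hash; Elasticsearch uses its own hash function, so the actual shard numbers would differ. The helper `pick_shard` and the document ids are hypothetical.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Toy routing: hash(routing) % number_of_primary_shards (crc32 is a stand-in)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = ["doc-%d" % i for i in range(1000)]

# With 5 primary shards, every document id maps to a stable shard.
placement_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

# Changing the shard count changes where existing documents would be looked up.
placement_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}
moved = sum(1 for doc_id in doc_ids if placement_5[doc_id] != placement_6[doc_id])

print("documents that would map to a different shard:", moved)
```

Replicas are not part of this formula, which is consistent with the replica count being adjustable at any time while the shard count is not.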
High Availability 2/2 • Zen Discovery: automatic discovery of nodes within a cluster and election of a master node; useful for failover and replication; specific modules for Amazon EC2, Microsoft Azure, Google Compute Engine • Snapshot & Restore module
Cluster Management • Marvel - http://www.elasticsearch.org/overview/marvel/ • BigDesk - http://bigdesk.org/ • Paramedic - https://github.com/karmi/elasticsearch-paramedic • KOPF - https://github.com/lmenezes/elasticsearch-kopf/ • Elastic HQ - http://www.elastichq.org/
Clients & Integration • Ecosystem: Kibana, Logstash, Marvel, Hadoop integration • API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net, Ruby, Scala, Clojure, Go, Erlang, … • Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, Wordpress, … • Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …
Fast & Furious Evolution
Version 1.0 (Feb 12, 2014) • Aggregations • Snapshot & Restore • Distributed Percolator • Cat API • Federated search • Doc values • Circuit breaker
Version 1.1 (March 25, 2014) • Cardinality Agg • Percentiles Agg • Significant Terms Agg • Search Templates • Cross fields search • Alias for indices & templates
Version 1.2 (May 22, 2014) • Java 7 • Indexing & Merging performance • Aggregations performance • Context suggester • Deep scrolling • Field value factor
Benchmark API coming in 1.3
Resources • http://www.elasticsearch.org/guide/ • http://www.elasticsearch.org/videos/ • http://www.elasticsearchtutorial.com/ • http://exploringelasticsearch.com/ • http://joelabrahamsson.com/elasticsearch-101/ • http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/ • http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html
Books • Elasticsearch Server: http://www.packtpub.com/elasticsearch-server-2e/book • Elasticsearch in Action: http://www.manning.com/hinman/
Books • Elasticsearch Cookbook: http://www.packtpub.com/elasticsearch-cookbook/book • Mastering Elasticsearch: http://www.packtpub.com/mastering-elasticsearch-querying-and-data-handling/book
Books • Elasticsearch - The Definitive Guide: http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/
Thank you! eric@data.be - @wavyx be.linkedin.com/in/erodriguez - github.com/wavyx http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/
