Elasticsearch Introduction at BigData meetup

Introduction to Elasticsearch 27th May 2014 - BigData Meetup Eric Rodriguez @wavyx

About Me • Eric Rodriguez • Founder of data.be • Web entrepreneur • Data addict • Multi-language: PHP, Java/Groovy/Grails, .Net, … • be.linkedin.com/in/erodriguez • github.com/wavyx • @wavyx

Elasticsearch - Company • Founded in 2012 => http://www.elasticsearch.com • Professional services • Training • Consultancy / Development support • Production support subscription (3 levels of SLAs)

Enterprises using Elasticsearch

(M)ELK Stack • Elasticsearch - Search server based on Lucene • Logstash - Tool for managing events and logs • Kibana - Visualize logs and time-stamped data • Marvel - Monitor your cluster’s heartbeat • You Know, for Search…

Logstash • Collect, parse, index, and search logs

Kibana • A versatile dashboard to see and interact with your data

Marvel • Monitor the health of your cluster: cluster-wide metrics, an overview of all nodes and indices, and events (master election, new nodes)

Real-time search and analytics engine • Open source • Lucene • JSON • Schema-free document store • RESTful API • Documentation • Scalability • High availability • Distributed • Multi-tenancy • Per-operation persistence

Use Cases • Full-Text Search • Data Store • Analytics • Alerts • Ads • …

Elasticsearch core • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java • Elasticsearch added value: “Simple is best” • Simple API (with documentation) • JSON & RESTful • Sharding & Replication • Extensibility: plugins and scripts • Interoperability: clients and integrations

Terms for DBAs (Elasticsearch => RDBMS) • Index => Database • Type => Table • Document => Row • Field => Column • Mapping => Schema

Plug & Play • Zero configuration • 4 LoC to get started ;)

Alive ! => http://localhost:9200/?pretty

REST • Check your cluster, node, and index health, status, and statistics • Administer your cluster, node, and index data and metadata • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes • Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others

Basic Operations 1/3 • Add a document • Create index

Basic Operations 2/3 • Modify/Replace a document • Delete a document • Delete index

Basic Operations 3/3 • Update a document
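The basic operations map onto HTTP verbs and `/{index}/{type}/{id}` paths (the Elasticsearch 1.x layout). The Python sketch below only assembles each call as a (method, path, JSON body) tuple and sends nothing; the helper `es_request`, the index `meetup`, the type `talk` and the document fields are hypothetical.

```python
import json


def es_request(method, path, body=None):
    """Hypothetical helper: describe one REST call without sending it."""
    payload = json.dumps(body) if body is not None else None
    return method, path, payload


# Elasticsearch 1.x paths follow /{index}/{type}/{id}.
create_index = es_request("PUT", "/meetup")
add_doc = es_request("PUT", "/meetup/talk/1",
                     {"title": "Introduction to Elasticsearch", "speaker": "Eric"})
# PUT on an existing id replaces the whole document.
replace_doc = es_request("PUT", "/meetup/talk/1",
                         {"title": "Intro to Elasticsearch", "speaker": "Eric"})
# Partial update: only the fields under "doc" are merged into the stored document.
update_doc = es_request("POST", "/meetup/talk/1/_update",
                        {"doc": {"speaker": "Eric Rodriguez"}})
get_doc = es_request("GET", "/meetup/talk/1")
delete_doc = es_request("DELETE", "/meetup/talk/1")
delete_index = es_request("DELETE", "/meetup")

for method, path, payload in (create_index, add_doc, replace_doc, update_doc,
                              get_doc, delete_doc, delete_index):
    print(method, path, payload or "")
```

Against a local node, the add-document call corresponds to something like `curl -XPUT 'http://localhost:9200/meetup/talk/1' -d '{"title": "Introduction to Elasticsearch"}'`.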

Mapping 1/2 • Defines how a document is mapped (similar to a schema): searchable fields, tokenization, storage, … • Explicit mappings are defined at the index/type level • A default mapping is created automatically

Mapping 2/2 • Core types: string, integer/long, float/double, boolean, and null • Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment • Example
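As a stand-in for the slide's example, here is a minimal sketch of an explicit mapping sent in the body of `PUT /meetup` when creating an index, in Elasticsearch 1.x syntax (mappings keyed by type name, text fields typed `string`). The index, type and field names are hypothetical.

```python
import json

# Body for creating a hypothetical index with an explicit mapping (ES 1.x syntax).
create_index_body = {
    "mappings": {
        "talk": {  # type name
            "properties": {
                # Analyzed full-text field, using a built-in language analyzer.
                "title": {"type": "string", "analyzer": "english"},
                # Exact-value field: stored as a single term, not tokenized.
                "speaker": {"type": "string", "index": "not_analyzed"},
                "date": {"type": "date"},
                "attendees": {"type": "integer"},
                "venue_ip": {"type": "ip"},
                "location": {"type": "geo_point"},
                # Arrays need no special type: a list of strings maps as "string".
                "tags": {"type": "string"},
            }
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```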

Search API 1/2 • Multi-index, multi-type • URI search - Google-like: operators (AND/OR), fields, sort, paging, wildcards, …
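A URI search puts the whole query in the query string, and several indices or types can be listed comma-separated in the path. This Python sketch only builds the path and decodes it again to show what the server would receive; the helper `uri_search` and the index, type and field names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_search(indices, types, q, sort=None, from_=0, size=10):
    """Hypothetical helper: build an Elasticsearch URI search path.

    Multiple indices/types are joined with commas (multi-index, multi-type).
    """
    params = {"q": q, "from": from_, "size": size}
    if sort:
        params["sort"] = sort
    return "/{}/{}/_search?{}".format(",".join(indices), ",".join(types), urlencode(params))


url = uri_search(["meetup", "archive"], ["talk"],
                 "title:elastic* AND speaker:eric", sort="date:desc", size=5)
print(url)

# Decoding the query string shows the parameters exactly as typed.
params = parse_qs(urlsplit(url).query)
print(params)
```

From a shell, a simpler variant would be `curl 'http://localhost:9200/meetup/talk/_search?q=speaker:eric&pretty'`.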

Search API 2/2 • Paging & Sort • Fields: selection, scripts • Post filter • Highlighting • Rescoring • Explain • …

Query DSL • “SQL” for Elasticsearch • Queries should be used • for full-text search • where the result depends on a relevance score • Filters should be used • for binary yes/no searches • for queries on exact values
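A minimal sketch of the query/filter split in Elasticsearch 1.x syntax: a `filtered` query wraps a scored full-text `match` query and an unscored filter on exact values. The field names are hypothetical; the body would be sent to `/{index}/_search`.

```python
import json

search_body = {
    "query": {
        "filtered": {
            # Full-text part: analyzed and scored by relevance.
            "query": {"match": {"title": "elasticsearch introduction"}},
            # Exact-value part: yes/no, no scoring.
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"tags": "search"}},
                        {"range": {"date": {"gte": "2014-01-01"}}},
                    ]
                }
            },
        }
    },
    "from": 0,  # paging
    "size": 10,
}

print(json.dumps(search_body, indent=2))
```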

Analysis 1/2 • Analysis is extracting “terms” from a given text • Processing natural language to make it computer-searchable • Configurable registry of Analyzers that can be used • to break analyzed fields down into terms when a document is indexed • to process query strings

Analysis 2/2 • Analyzers are composed of • a single Tokenizer (may be preceded by one or more CharFilters) • zero or more TokenFilters • Default Analyzers: standard, pattern, whitespace, language, snowball
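To make the CharFilters, Tokenizer, TokenFilters chain concrete, here is a simplified Python re-implementation. It is not Elasticsearch code: the function names and the tiny stop-word list are assumptions, and the tokenizer simply splits on non-word characters (similar in spirit to the pattern tokenizer's default `\W+`). Running the same chain at index time and at query time is what lets the query "Search!" match the indexed term `search`.

```python
import re

STOP_WORDS = {"a", "an", "the", "for", "of", "to"}  # hypothetical, tiny list


def html_strip_char_filter(text):
    """CharFilter: replace HTML tags with spaces before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text):
    """Tokenizer: keep runs of word characters, drop punctuation."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, char_filters=(html_strip_char_filter,), tokenizer=word_tokenizer,
            token_filters=(lowercase_filter, stop_filter)):
    """Run char filters, then exactly one tokenizer, then token filters in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze("<b>You Know,</b> for Search")  # index time
print(terms)  # ['you', 'know', 'search']

query_terms = analyze("Search!")  # query time, same analyzer
print(query_terms)  # ['search']
print(set(query_terms) <= set(terms))  # True
```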

Analytics • Aggregation of information: similar to “group by” • Facets • Aggregated data based on a search query • One-dimensional results • Ex: “terms facets” return facet counts for the various values of a specific field: think color, tag, category, … • Aggregations (ES 1.0+) • Nested facets • Basic stats: mean, min, max, std dev, term counts • Significant Terms, Percentiles, Cardinality estimations

Facets • Not yet deprecated, but use aggregations! • Various facets: terms, range, histogram, date, statistical, geo distance, …

Aggregations • A generic, powerful framework that can be divided into 2 main families: • Bucketing: each bucket is associated with a key and a document criterion; the aggregation process provides a list of buckets, each one with a set of documents that "belong" to it • Metric: aggregations that keep track of and compute metrics over a set of documents • Aggregations can be nested!
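A local Python sketch of bucketing with a nested metric, similar in spirit to a `terms` aggregation on `color` containing an `avg` on `price`. The documents are made up, and the request body is included only for comparison (1.x `aggs` syntax); the grouping itself is done in plain Python, not by Elasticsearch.

```python
from collections import defaultdict
from statistics import mean

# For comparison: the equivalent request body (hypothetical fields).
aggs_body = {
    "size": 0,
    "aggs": {
        "colors": {
            "terms": {"field": "color"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},  # nested metric
        }
    },
}

# Hypothetical documents; in Elasticsearch these would be the hits of a query.
docs = [
    {"color": "red", "price": 10},
    {"color": "blue", "price": 25},
    {"color": "red", "price": 30},
    {"color": "green", "price": 5},
    {"color": "red", "price": 20},
]


def terms_buckets(documents, field):
    """Bucketing: one bucket per distinct value (the key) of `field`."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    # Order buckets by document count, largest first (stable for ties).
    return sorted(buckets.items(), key=lambda kv: len(kv[1]), reverse=True)


def avg_metric(documents, field):
    """Metric: computed over the documents of a single bucket."""
    return mean(doc[field] for doc in documents)


result = [
    {"key": key, "doc_count": len(bucket), "avg_price": avg_metric(bucket, "price")}
    for key, bucket in terms_buckets(docs, "color")
]
for row in result:
    print(row)
```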

Bucket Aggregators • global • filter • missing • terms • range • date range • ip range • histogram • date histogram • geo distance • geohash grid • nested • reverse nested • top hits (version 1.3)

Metrics Aggregators • count • stats • extended stats • cardinality • percentiles • min • max • sum • avg

Search for end users • Suggesters - “Did you mean”: Terms, Phrases, Completion, Context • “More like this”: find documents that are "like" the provided text by running it against one or more fields
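Both end-user features are driven by request bodies. Below is a hedged sketch in Elasticsearch 1.x syntax (where the `more_like_this` input text parameter is `like_text`); the field names and the misspelled input are hypothetical.

```python
import json

# "Did you mean": a term suggester proposes indexed terms close to each input term.
# Can be sent within a _search request body.
suggest_body = {
    "suggest": {
        "text": "elasticsaerch tutorail",  # hypothetical misspelled user input
        "title-suggestions": {"term": {"field": "title"}},
    }
}

# "More like this": find documents similar to the given text in the listed fields.
mlt_body = {
    "query": {
        "more_like_this": {
            "fields": ["title", "body"],
            "like_text": "Introduction to Elasticsearch aggregations",
            "min_term_freq": 1,  # consider terms that occur even once in the text
        }
    }
}

print(json.dumps(suggest_body, indent=2))
print(json.dumps(mlt_body, indent=2))
```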

Percolator • Classic ES 1. Add & Index documents 2. Search with queries 3. Retrieve matching documents • Percolator 1. Add & Index queries 2. Percolate documents 3. Retrieve matching queries
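The percolator inverts the usual flow. This plain-Python sketch mimics it with stored single-term queries; it is not how Elasticsearch evaluates queries, and the query ids, fields and the `register_query`/`percolate` helpers are hypothetical. In Elasticsearch 1.x, queries are registered under the index's `.percolator` type and documents are sent to the `_percolate` endpoint.

```python
# Stored queries: query id -> (field, required term). Hypothetical structure.
registered_queries = {}


def register_query(query_id, field, term):
    """Step 1: add & index a query (instead of a document)."""
    registered_queries[query_id] = (field, term.lower())


def percolate(doc):
    """Steps 2-3: run every stored query against one document, return matching ids."""
    matches = []
    for query_id, (field, term) in registered_queries.items():
        words = str(doc.get(field, "")).lower().split()
        if term in words:
            matches.append(query_id)
    return matches


register_query("brand-alert", "text", "elasticsearch")
register_query("weather-alert", "text", "storm")
register_query("city-alert", "city", "brussels")

new_doc = {"text": "Elasticsearch meetup tonight", "city": "Brussels"}
print(percolate(new_doc))  # ['brand-alert', 'city-alert']

# "Enrich" use case: store the matching query ids as tags on the document.
new_doc["tags"] = percolate(new_doc)
```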

Why Percolate ?! • Alerts: social media mentions, weather forecast, news alerts • Automatic Monitoring: price monitoring, stock alerts, logs • Ads: display targeted ads based on user’s search queries • Enrich: percolate new documents, then add query matches as document tags

High Availability 1/2 • Sharding - Write Scalability • Split logical data over multiple machines & control data flows • Each index has a fixed number of shards • Improves indexing performance • Replication - Read Scalability • Each shard can have zero or more replicas (dynamic setup) • Removes SPOF (Single Point Of Failure) • Improves search performance
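Why the number of shards is fixed: a document is routed to a primary shard with `shard = hash(routing) % number_of_primary_shards`, where `routing` defaults to the document id. The sketch below uses `zlib.crc32` only as a stand-in hash (Elasticsearch uses its own internal hash function), with 5 primaries as in the 1.x default, and counts how many documents would land on a different shard if that number changed.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 5  # fixed when the index is created


def shard_for(routing, num_primary_shards=NUMBER_OF_PRIMARY_SHARDS):
    """Route a document to a primary shard.

    crc32 is a stand-in chosen because it is stable across runs.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


ids = [f"doc-{i}" for i in range(1000)]

# Documents spread across the primaries (replicas hold copies of each primary).
counts = [0] * NUMBER_OF_PRIMARY_SHARDS
for doc_id in ids:
    counts[shard_for(doc_id)] += 1
print(counts)

# With a different shard count, most ids map elsewhere, so existing
# documents could no longer be found where the formula points.
moved = sum(shard_for(d) != shard_for(d, 6) for d in ids)
print(f"{moved} of {len(ids)} documents would map to a different shard with 6 primaries")
```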

High Availability 2/2 • Zen Discovery • Automatic discovery of nodes within a cluster and election of a master node • Useful for failover and replication • Specific modules: Amazon EC2, Microsoft Azure, Google Compute Engine • Snapshot & Restore module

Cluster Management • Marvel - http://www.elasticsearch.org/overview/marvel/ • BigDesk - http://bigdesk.org/ • Paramedic - https://github.com/karmi/elasticsearch-paramedic • KOPF - https://github.com/lmenezes/elasticsearch-kopf/ • Elastic HQ - http://www.elastichq.org/

Clients & Integration • Ecosystem: Kibana, Logstash, Marvel, Hadoop integration • API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net, Ruby, Scala, Clojure, Go, Erlang, … • Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, WordPress, … • Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …

Fast & Furious Evolution • Version 1.0 - Feb 12, 2014: Aggregations, Snapshot & Restore, Distributed Percolator, Cat API, Federated search, Doc values, Circuit breaker • Version 1.1 - March 25, 2014: Cardinality Agg, Percentiles Agg, Significant Terms Agg, Search Templates, Cross fields search, Alias for indices & templates • Version 1.2 - May 22, 2014: Java 7, Indexing & Merging performance, Aggregations performance, Context suggester, Deep scrolling, Field value factor • Benchmark API coming in 1.3

Resources • http://www.elasticsearch.org/guide/ • http://www.elasticsearch.org/videos/ • http://www.elasticsearchtutorial.com/ • http://exploringelasticsearch.com/ • http://joelabrahamsson.com/elasticsearch-101/ • http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/ • http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html

Books • Elasticsearch Server: http://www.packtpub.com/elasticsearch-server-2e/book • Elasticsearch in Action: http://www.manning.com/hinman/

Books • Elasticsearch Cookbook: http://www.packtpub.com/elasticsearch-cookbook/book • Mastering Elasticsearch: http://www.packtpub.com/mastering-elasticsearch-querying-and-data-handling/book

Books • Elasticsearch - The Definitive Guide: http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/

Thank you! eric@data.be - @wavyx be.linkedin.com/in/erodriguez - github.com/wavyx http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/
