Nov 21, 2015, Sofia
var title = "Scalability and Real-time Queries with Elasticsearch";
var info = { name: "Ivelin Andreev", otherOptional: "…" };

About me
• Project Manager @
  o 13 years professional experience
  o .NET Web Development MCPD
  o SQL Server 2012 (MCSA)
• External Expert Horizon 2020
• Business Interests
  o Web Development, SOA, Integration
  o Security & Performance Optimization
• Contact
  o ivelin.andreev@icb.bg
  o www.linkedin.com/in/ivelin
  o www.slideshare.net/ivoandreev

agenda();
• What?
• Why?
• First steps
• Analyzers in depth
• From RDBMS to Elasticsearch
• Demo

What is ES
• Powerful real-time search and analytics engine
  “…It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through JSON DSL…”
  (Shay Banon, Creator, Founder, CTO)
• What else…
  o Document-oriented
  o Sophisticated RESTful API
  o Entirely open source
  o Based on Apache Lucene
  o Requires Java

Popularity (All DB Engines)
[Chart: All DB Engines Ranking]

Popularity (Search Engines)

Who Uses ES

First Steps in Elasticsearch
“You don’t learn to walk by following rules. You learn by doing” (Richard Branson)

Terms

Elasticsearch   RDBMS
Index           Database
Type            Table
Document        Row
Field           Column

• Scaling
  o Cluster; Node; Shard (Primary/Replica)

RESTful APIs
• Document APIs
  o Index, Get, Update, Delete
  o Bulk API available
    POST   /[index]/[type]      { "…", "…" }
    GET    /[index]/[type]/[ID] { }
    PUT    /[index]/[type]/[ID] { "…", "…" }
    DELETE /[index]/[type]/[ID]
• Search APIs
  o Send/Receive JSON
  o Basic queries via query string
    http://localhost:9200/{indexName}/{type}/_search?q=searchstr&size=100
    http://localhost:9200/{index1,index2}/{type}/_search?q=createdby:ivo
    http://localhost:9200/_search?q=tag:spam

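The URL patterns above can be assembled programmatically. The following is only a sketch using the Python standard library; the helper names `search_url` and `document_path`, and the index/type names `blog` and `post`, are hypothetical and not part of the talk. Note that `urlencode` percent-encodes the colon in `field:value`, which is equivalent to the literal form shown on the slide.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # default local node address, as on the slide


def search_url(q, indices=(), doc_type=None, size=None):
    """Build a query-string search URL: /{indices}/{type}/_search?q=..."""
    if doc_type and not indices:
        raise ValueError("this sketch needs at least one index when a type is given")
    path = []
    if indices:
        path.append(",".join(indices))  # several indices are comma-separated
    if doc_type:
        path.append(doc_type)
    path.append("_search")
    params = {"q": q}
    if size is not None:
        params["size"] = size
    return BASE_URL + "/" + "/".join(path) + "?" + urlencode(params)


def document_path(index, doc_type, doc_id=None):
    """Path used by the document APIs: /[index]/[type] or /[index]/[type]/[ID]."""
    return "/" + "/".join(p for p in (index, doc_type, doc_id) if p)


if __name__ == "__main__":
    print(search_url("searchstr", indices=["blog"], doc_type="post", size=100))
    print(search_url("tag:spam"))
    print("DELETE", document_path("blog", "post", "1"))
```
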
Query DSL
• Entire JSON object is the Query DSL
• Query
  o Full text queries
  o Results ordered by relevance
  o Every field is searchable
• Filter
  o Binary: either a field matches or it does not
• Filters and queries can be nested
  o Nesting passes relevance to parents

• Query: for full-text search or for any condition that should affect the relevance score
• Filter: for everything else

How To (Filters)
• ES provides 27 filters (Sep 2015)
• Term/Terms filter
  { "term": { "date": "2015-10-10" }}
• Range filter
  { "range": { "age": { "gte": 20, "lt": 30 }}}
• Exists/Missing filter
  { "exists": { "field": "title" }}
• Bool filter
  { "bool": {
      "must":     { "term": { "folder": "inbox" }},
      "must_not": { "term": { "tag": "spam" }},
      "should": [
        { "term": { "starred": true }},
        { "term": { "unread": true }}
      ]
  }}

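A filter on its own is only a fragment; to send it to the search API it has to be wrapped in a request body. The sketch below assumes the Elasticsearch 1.x `filtered` query (current when these slides were written; later versions moved filters into the `bool` query), and the helper name `filtered_search` is hypothetical. It only builds and prints the JSON body and does not contact a server.

```python
import json

# The bool filter from the slide, written as a Python dict.
bool_filter = {
    "bool": {
        "must": {"term": {"folder": "inbox"}},
        "must_not": {"term": {"tag": "spam"}},
        "should": [
            {"term": {"starred": True}},
            {"term": {"unread": True}},
        ],
    }
}


def filtered_search(query, filter_):
    """Wrap a query and a filter in a 1.x-style `filtered` search body."""
    return {"query": {"filtered": {"query": query, "filter": filter_}}}


# match_all scores every document equally; the filter does the selection.
body = filtered_search({"match_all": {}}, bool_filter)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```
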
How To (Queries)
• ES provides 38 queries (Sep 2015)
• match query
  { "match": { "tweet": "About Search" }}
• multi_match query
  { "multi_match": {
      "query":  "full text search",
      "fields": [ "title", "body" ]
  }}
• bool query
  { "bool": {
      "must":     { "match": { "title": "how to make millions" }},
      "must_not": { "match": { "tag": "spam" }},
      "should": [
        { "match": { "tag": "starred" }},
        { "range": { "date": { "gte": "2014-01-01" }}}
      ]
  }}
• fuzzy query

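Queries like these are plain JSON, so they are easy to compose in code. The following sketch builds the slide's bool query with a small hypothetical helper, `bool_query`, and adds a `fuzzy` query for illustration; the field name `title` and the misspelled value are made-up examples, and nothing is sent to a server.

```python
import json


def bool_query(must=None, must_not=None, should=None):
    """Assemble a bool query, leaving out clauses that were not supplied."""
    clauses = {"must": must, "must_not": must_not, "should": should}
    return {"bool": {name: value for name, value in clauses.items() if value}}


query = bool_query(
    must={"match": {"title": "how to make millions"}},
    must_not={"match": {"tag": "spam"}},
    should=[
        {"match": {"tag": "starred"}},
        {"range": {"date": {"gte": "2014-01-01"}}},
    ],
)

# A fuzzy query tolerates small spelling differences (here up to 2 edits).
fuzzy = {"fuzzy": {"title": {"value": "elastcsearch", "fuzziness": 2}}}

# Search request bodies place the query under the top-level "query" key.
body = {"query": query}

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
    print(json.dumps({"query": fuzzy}, indent=2))
```
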
Queries vs. Filters

Filters           Queries
Boolean (Y/N)     Relevance
Exact values      Full text
No analysis       Analysis
Cached            Not cached
Faster            Slower

Any index-based search solution is far better than SQL “LIKE”

How Does SQL Full-text Index Work
• Column-level language
  o Used by stemmers and tokenizers
  o Different columns for different languages
  o Language tags are respected (XML, binary)
• Stop words
  ALTER FULLTEXT STOPLIST ProductSL ADD 'blah' LANGUAGE 1033;
• Thesaurus files
  o (e.g. “song” -> “tune”)

Inverted Index

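As a minimal sketch of the idea: an inverted index maps each term to the set of documents that contain it, so a search becomes a lookup plus a set intersection instead of a scan over every row. The example below is a toy in-memory version with made-up documents; it only lowercases and splits on whitespace, and is not how Lucene stores its index.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every term of the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "The quick brown fox",
    2: "Quick brown dogs",
    3: "The lazy dog",
}
index = build_inverted_index(docs)

if __name__ == "__main__":
    print(search(index, "quick brown"))
    print(search(index, "dog"))
```

Without stemming, "dog" does not find the document containing "dogs"; closing that gap is the job of the analysis process.
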
ES Analysis Process
• Character filters
  o Simplify data (“&” -> “and”, “ü” -> “u”)
• Tokenizers
  o Split data into words (terms, tokens)
• Token filters
  o Lowercase
  o Remove words w/o relevance impact (“a”, “the”)
  o Synonyms added
• Stemming
  o Reduce to root form (“dogs” -> “dog”)

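The stages above can be imitated in a few lines of Python. This is a toy sketch, not Elasticsearch's implementation: the stop-word list holds only a few words, the "stemmer" merely strips a trailing "s", synonyms are left out, and all function names are hypothetical.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "the"}  # deliberately tiny list for the example


def char_filter(text):
    """Character filters: '&' -> 'and', strip diacritics ('ü' -> 'u')."""
    text = text.replace("&", " and ")
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def stem(token):
    """Naive stemmer: drop a plural 's' ('dogs' -> 'dog')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Run char filters, tokenizer, token filters and stemming in order."""
    tokens = tokenize(char_filter(text))
    tokens = [t.lower() for t in tokens]                 # lowercase filter
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word filter
    return [stem(t) for t in tokens]


if __name__ == "__main__":
    print(analyze("The Dogs & a Müller"))
```
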
Analyzers
• FT fields are analyzed into terms to create the inverted index
• Configured when the index is created

Example text: "Set the shape to semi-transparent by calling set_trans(5)"

Analyzer Type     Example
Whitespace        Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Standard (Def.)   set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple            set, the, shape, to, semi, transparent, by, calling, set, trans
Stop              set, shape, semi, transparent, calling, set, trans
Language (EN)     set, shape, semi, transpar, call, set_tran, 5
Pattern           "nonword": { "type": "pattern", "pattern": "[^\\w]+" }
Custom            Allows combination of Tokenizer[1:1] and TokenFilters[0:N]

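The Whitespace and Simple rows of the table can be reproduced with a rough Python approximation. This is a sketch of the behaviour only (split on whitespace; split on non-letters and lowercase), restricted to ASCII letters for brevity, and the function names are hypothetical.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


if __name__ == "__main__":
    print(", ".join(whitespace_analyzer(TEXT)))
    print(", ".join(simple_analyzer(TEXT)))
```
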
Security Remarks
• RAM is important
  o Data structures reside in memory
  o Performance and reliability depend on it
• Be aware
  o No built-in authentication!
  o Protect private data yourself
  o Prevent expensive requests (DoS)
  o Protect http://localhost:9200

Side by Side

                      Elasticsearch    SQL Full-text Search
Performance           RAM mainly       Disk I/O mainly
Licensing             Open Source      Commercial
Platform              Any (Java)       Windows Only
Wildcards             Yes              Partly
FTS Syntax            Rich             Basic
Extensibility         Plugins          CLR or custom code
Scale Out             Yes              No
Relational Integrity  No               Yes
Security              No               Yes
FT Search Setup       Manual           Wizard
Index Update          Manual           Auto

From SQL to Elasticsearch
• Rivers (deprecated)
• Logstash
  o Open source log management tool
• Client libraries
  o .NET
    • Elasticsearch.Net
    • Nest
  o Also Java, JS, Perl, Python, Ruby, PHP

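Whichever tool moves the data, the core step is turning relational rows into JSON documents for the Bulk API, which expects newline-delimited JSON: one action line followed by one document line per row, ending with a newline. The sketch below is an illustration under stated assumptions: an in-memory SQLite table stands in for SQL Server, the table `posts`, index `blog` and type `post` are made up, and the helper `rows_to_bulk` is hypothetical. It only builds the payload and does not send it.

```python
import json
import sqlite3


def rows_to_bulk(rows, index, doc_type, id_field="id"):
    """Convert database rows into a Bulk API body (newline-delimited JSON)."""
    lines = []
    for row in rows:
        doc = dict(row)
        doc_id = doc.pop(id_field)  # primary key becomes the document _id
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n" if lines else ""


# In-memory SQLite database used as a stand-in relational source.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, "Hello", "starred"), (2, "Buy now", "spam")],
)
rows = conn.execute("SELECT id, title, tag FROM posts ORDER BY id").fetchall()
payload = rows_to_bulk(rows, "blog", "post")

if __name__ == "__main__":
    print(payload, end="")
```

The resulting text would be POSTed to the `_bulk` endpoint of the node.
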
Summary
• Not a replacement of RDBMS
• Real-time search applications
• Built for scalability
• Easy to install
• RESTful API and JSON

Deployment (Windows)
• Install Java
• Download ES zip
• Install
  o [ESHome]/bin> service install
• Set ES service to start automatically
  o [ESHome]/bin> service manager
• Open in browser http://localhost:9200/
• Plugin Install
  o [ESHome]/bin> plugin -i elasticsearch/marvel/latest
• Restart ES

Takeaways
• Tools
  o Kopf: https://github.com/lmenezes/elasticsearch-kopf
  o Marvel: https://www.elastic.co/products/marvel
  o Curl: http://curl.haxx.se/download.html
  o JDBC Driver: http://www.java2s.com/Code/Jar/s/Downloadsqljdbc430jar.htm
• Community
  o https://discuss.elastic.co (yes, “.co”, not “.com”)
• Getting Started
  o http://joelabrahamsson.com/elasticsearch-101/

Scalability and Real-time Queries with Elasticsearch
