FASTER AND BETTER SEARCH RESULTS WITH ELASTICSEARCH
TAKE YOUR SITE-WIDE SEARCHES TO THE NEXT LEVEL
WEB SITE SEARCH
A good site search should: search across different fields (title, content, ...); show the most relevant results first; categorize results; filter by various attributes; withstand user typos; treat synonyms as the same word; be scalable; be fault tolerant; be easy to deploy.
PLONE SITE SEARCH
ZCatalog: fully integrated in Plone; no advanced features (such as synonym support); not very scalable.
Apache Solr (collective.solr, alm.solrindex): based on the Java search library Apache Lucene; better result ranking; advanced features; more configurable; some clustering support (using ZooKeeper).
Elasticsearch (collective.elasticsearch): also based on Lucene; search features similar to Solr's; great scalability; less XML, more JSON.
ELASTIC STACK
Also known as ELK: Elasticsearch, Logstash, Kibana (plus Beats).
Two main classes of use cases: almost static data (search engines); time series data (logs and metrics).
ELASTICSEARCH
INDEX A DOCUMENT
POST plone/_doc
{
  "title": "Getting started with plone and Elasticsearch",
  "author": "Enrico Polesel",
  "content": "We want to index the entire content of our Plone website into elasticsearch...",
  "tags": ["plone", "search", "elasticsearch", "cluster", "performance", "high availability"],
  "date": "2019-10-25T11:50:00+0200"
}
Response:
{
  "_index" : "plone",
  "_type" : "_doc",
  "_id" : "Y0MZ7W0B3-sU3YTrncfM",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 0,
  "_primary_term" : 1
}
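Since Plone is written in Python, here is a minimal sketch of the same `POST plone/_doc` call built with only the standard library. It assumes a node on the default `http://localhost:9200` (as in the homework); `build_index_request` is a hypothetical helper, not part of collective.elasticsearch, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a local node on the default port.
ES_URL = "http://localhost:9200"


def build_index_request(index: str, doc: dict) -> urllib.request.Request:
    """Build (without sending) the equivalent of `POST <index>/_doc` with a JSON body."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = {
    "title": "Getting started with plone and Elasticsearch",
    "author": "Enrico Polesel",
    "content": "We want to index the entire content of our Plone website into elasticsearch...",
    "tags": ["plone", "search", "elasticsearch", "cluster", "performance", "high availability"],
    "date": "2019-10-25T11:50:00+0200",
}
req = build_index_request("plone", doc)
```

Against a running node, `json.load(urllib.request.urlopen(req))` would return a response like the one above, with `"result": "created"` and a generated `_id`.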
DATA TYPES
short, long, float, double; ip; geo_point; ranges (integer_range, date_range, ...); keyword (not analyzed strings), text (analyzed strings); object, array, nested object, ...
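Types can be declared up front with an explicit mapping, sent as the body of `PUT plone` before any document is indexed. The sketch below is a hypothetical mapping for the example document: the field names come from the indexed document, while the type choices are assumptions (text for full-text fields, keyword for author and tags so they can be aggregated exactly). Without a mapping, Elasticsearch 7.x maps each string dynamically as `text` with a `keyword` sub-field (e.g. `author.keyword`).

```python
# Hypothetical explicit mapping for the "plone" index (7.x format: no type name).
PLONE_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},      # analyzed: full-text search
            "content": {"type": "text"},    # analyzed: full-text search
            "author": {"type": "keyword"},  # not analyzed: exact match, aggregations
            "tags": {"type": "keyword"},    # arrays need no special type
            "date": {"type": "date"},
        }
    }
}


def field_types(mapping: dict) -> dict:
    """Return {field: type} for a flat mapping body."""
    return {name: spec["type"] for name, spec in mapping["mappings"]["properties"].items()}
```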
TEXT ANALYSIS
ANALYZERS
1. Char filters: convert HTML escape codes; normalize Unicode symbols; replace patterns.
2. Tokenizer: split on whitespace; split on punctuation; may be grammar based; may generate partial words (n-grams); special tokenizers for special strings (like paths).
3. Token filters: normalize tokens (e.g. lowercase); stemming; remove stopwords; translate synonyms.
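To make the three stages concrete, here is a toy analyzer in plain Python. It only mimics the pipeline shape (char filters, then a tokenizer, then token filters); it is not how Lucene implements analysis, it skips stemming, and the stopword and synonym lists are hypothetical. To see what a real analyzer produces, use the `_analyze` API (e.g. `GET _analyze` with `"analyzer": "standard"` and a `"text"`).

```python
import html
import re
import unicodedata
from typing import List

# Hypothetical, deliberately tiny resources for the illustration.
STOPWORDS = {"the", "and", "of", "a", "to"}
SYNONYMS = {"cms": "plone"}


def char_filters(text: str) -> str:
    text = html.unescape(text)                  # "&amp;" -> "&"
    return unicodedata.normalize("NFKC", text)  # e.g. the ligature U+FB01 -> "fi"


def tokenize(text: str) -> List[str]:
    # Split on whitespace and punctuation: keep runs of word characters.
    return re.findall(r"\w+", text)


def token_filters(tokens: List[str]) -> List[str]:
    out = []
    for token in tokens:
        token = token.lower()                   # normalize tokens
        if token in STOPWORDS:                  # remove stopwords
            continue
        out.append(SYNONYMS.get(token, token))  # translate synonyms
    return out


def analyze(text: str) -> List[str]:
    return token_filters(tokenize(char_filters(text)))
```

For example, `analyze("Plone &amp; the CMS")` yields two `plone` tokens: the escape code is decoded, `&` is dropped by the tokenizer, `the` is a stopword and `cms` is mapped to its synonym.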
QUERY - MATCH
GET plone/_search
{ "query": { "match": { "content": "elasticsearch" } } }
Response:
{
  ...
  "hits" : {
    "total" : { ... },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "plone",
        "_id" : "Y0MZ7W0B3-sU3YTrncfM",
        "_score" : 0.2876821,
        "_source" : { "title" : "Getting started with plone and Elasticsearch", ... }
      }
    ]
  }
}
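On the Python side, the query body is just a dict and the interesting part of the response is `hits.hits`. A minimal sketch, assuming the response has already been decoded from JSON; `match_query` and `titles_with_scores` are hypothetical helpers, and the sample is a trimmed copy of the response above.

```python
def match_query(field: str, text: str) -> dict:
    """Body of a match query, as sent to GET <index>/_search."""
    return {"query": {"match": {field: text}}}


def titles_with_scores(response: dict) -> list:
    """Extract (title, score) pairs in the order Elasticsearch returned them (best first)."""
    return [(hit["_source"].get("title"), hit["_score"]) for hit in response["hits"]["hits"]]


sample_response = {  # trimmed copy of the response shown above
    "hits": {
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "plone",
                "_id": "Y0MZ7W0B3-sU3YTrncfM",
                "_score": 0.2876821,
                "_source": {"title": "Getting started with plone and Elasticsearch"},
            }
        ],
    }
}
```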
QUERY - FUZZY MATCH
With an edit distance (fuzziness) of 1, a term still matches after one of these edits:
1. Changing a character (box → fox)
2. Removing a character (black → lack)
3. Inserting a character (sic → sick)
4. Transposing two adjacent characters (act → cat)
The analyzer lowercases "ploMe" to "plome", which is one change away from "plone":
GET plone/_search
{
  "query": {
    "match": {
      "content": { "query": "ploMe", "fuzziness": 1 }
    }
  }
}
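The four edits above define an edit distance. The sketch below computes it with the classic dynamic-programming table in its "optimal string alignment" variant (Levenshtein plus adjacent transpositions). Lucene matches fuzzy terms with automata rather than this table, so treat it as an illustration of the distance, not of Lucene's implementation; `fuzzy_terms` is a hypothetical helper that only lowercases instead of running a real analyzer.

```python
from typing import List


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting changes, removals, insertions and
    transpositions of two adjacent characters (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # remove a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # change a character
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose adjacent characters
    return d[len(a)][len(b)]


def fuzzy_terms(query: str, index_terms: List[str], fuzziness: int) -> List[str]:
    term = query.lower()  # stand-in for the analyzer (lowercasing only)
    return [t for t in index_terms if osa_distance(term, t) <= fuzziness]
```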
QUERY - MULTI MATCH
Matches in the title field are boosted ("^2" doubles their weight)!
GET plone/_search
{
  "query": {
    "multi_match": {
      "query": "plome",
      "fields": [ "title^2", "content" ],
      "fuzziness": 1
    }
  }
}
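The default `multi_match` type is `best_fields`, which scores a document by its best-matching field (with the default `tie_breaker` of 0), and a `field^N` boost multiplies that field's score. The toy model below shows why a title match can outrank a stronger content match; the per-field scores are made up, and real scoring (BM25) is far more involved. `boosted_fields` and `best_fields_score` are hypothetical helpers.

```python
def boosted_fields(boosts: dict) -> list:
    """Render {"title": 2, "content": 1} as ["title^2", "content"]."""
    return [field if boost == 1 else f"{field}^{boost}" for field, boost in boosts.items()]


def best_fields_score(field_scores: dict, boosts: dict) -> float:
    """Toy best_fields: keep the highest boosted per-field score (tie_breaker 0)."""
    return max(score * boosts.get(field, 1) for field, score in field_scores.items())


query = {
    "query": {
        "multi_match": {
            "query": "plome",
            "fields": boosted_fields({"title": 2, "content": 1}),
            "fuzziness": 1,
        }
    }
}
```

With made-up scores of 0.5 on title and 0.8 on content, the boosted title (1.0) wins.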
QUERY
And much more! Suggestions, search as you type, geo queries, external ranking, more like this, ...
AGGREGATIONS
GET plone/_search
{
  "query": { "match": { "content": "elasticsearch" } },
  "aggs": {
    "Authors": {
      "terms": { "field": "author", "size": 10 }
    }
  }
}
AGGREGATIONS
GET plone/_search
{
  "query": { "match": { "content": "elasticsearch" } },
  "aggs": {
    "Authors": {
      "terms": { "field": "author", "size": 10 },
      "aggs": {
        "Tags": {
          "terms": { "field": "tags", "size": 100 }
        }
      }
    }
  }
}
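What a `terms` aggregation computes can be mimicked in memory: one bucket per distinct value with a document count, largest buckets first, truncated to `size`; a sub-aggregation runs again inside each bucket. This is a toy sketch over hypothetical sample documents (the second author is made up); Elasticsearch computes buckets per shard and merges them, which can make counts approximate.

```python
from collections import Counter


def _values(doc: dict, field: str) -> list:
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def terms_agg(docs: list, field: str, size: int) -> list:
    """Toy terms aggregation: doc count per value, highest count first, ties by key."""
    counts = Counter()
    for doc in docs:
        for value in set(_values(doc, field)):  # a document counts once per value
            counts[value] += 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


def terms_with_sub_terms(docs, field, size, sub_name, sub_field, sub_size):
    """Authors -> Tags style nesting: a terms aggregation inside each bucket."""
    buckets = terms_agg(docs, field, size)
    for bucket in buckets:
        members = [d for d in docs if bucket["key"] in _values(d, field)]
        bucket[sub_name] = {"buckets": terms_agg(members, sub_field, sub_size)}
    return buckets


docs = [  # hypothetical sample data
    {"author": "Enrico Polesel", "tags": ["plone", "search"]},
    {"author": "Enrico Polesel", "tags": ["plone", "cluster"]},
    {"author": "Jane Doe", "tags": ["search"]},
]
```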
AGGREGATIONS
GET plone/_search
{
  "query": { ... },
  "aggs": {
    "Authors": {
      "terms": { "field": "author", "size": 10 },
      "aggs": {
        "Avg-length": { "avg": { "field": "length" } },
        "Last-published": { "max": { "field": "date" } }
      }
    }
  }
}
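Metric sub-aggregations reduce each bucket to a single number. This toy sketch computes the per-author average of a `length` field and the latest `date`, over hypothetical sample documents (the lengths and the second date are made up). Documents missing a field are skipped and an empty bucket yields `None`, loosely mirroring the `null` Elasticsearch returns when a metric has no values.

```python
from datetime import datetime
from statistics import mean


def author_stats(docs: list) -> dict:
    """Toy Authors -> {Avg-length, Last-published} over in-memory documents."""
    by_author = {}
    for doc in docs:
        by_author.setdefault(doc["author"], []).append(doc)
    result = {}
    for author, items in by_author.items():
        lengths = [d["length"] for d in items if "length" in d]
        dates = [datetime.strptime(d["date"], "%Y-%m-%dT%H:%M:%S%z") for d in items if "date" in d]
        result[author] = {
            "Avg-length": mean(lengths) if lengths else None,
            "Last-published": max(dates) if dates else None,
        }
    return result


docs = [  # hypothetical sample data
    {"author": "Enrico Polesel", "length": 1200, "date": "2019-10-25T11:50:00+0200"},
    {"author": "Enrico Polesel", "length": 800, "date": "2019-09-01T09:00:00+0200"},
]
```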
AGGREGATIONS
And much more! Advanced stats, geo centroid, cardinality, significant terms, ...
RUNNING ELASTICSEARCH
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.0-linux-x86_64.tar.gz
$ tar -xf elasticsearch-7.4.0-linux-x86_64.tar.gz
$ cd elasticsearch-7.4.0
$ bin/elasticsearch
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-7.4.0-linux-x86_64.tar.gz
$ tar -xf kibana-7.4.0-linux-x86_64.tar.gz
$ cd kibana-7.4.0-linux-x86_64
$ bin/kibana
Configuration files: config/elasticsearch.yml, config/jvm.options
Docker, yum/apt, Windows and macOS are also supported! See https://www.elastic.co/downloads/
CLUSTERING
Need high availability? Install two data nodes! (one replica per shard is enabled by default)
Need more space? Increase the number of nodes! (and of indices/shards)
Need more search performance? Increase the number of replicas!
Have disks of different types (fast/slow)? Use a hot-cold architecture!
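The "two data nodes" advice follows from one allocation rule: Elasticsearch never places two copies of the same shard on one node. In 7.x a new index defaults to 1 primary shard and 1 replica, so on a single node the replica stays unassigned, which is also why the indexing response above reports `"total" : 2, "successful" : 1`. The toy model below captures only that rule and ignores disk watermarks, allocation filtering and other constraints; `shard_allocation` is a hypothetical helper.

```python
def shard_allocation(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Toy model: each copy of a shard (primary or replica) needs its own node."""
    copies_per_shard = 1 + replicas
    placed_per_shard = min(copies_per_shard, data_nodes)
    return {
        "total_copies": primaries * copies_per_shard,
        "assigned": primaries * placed_per_shard,
        "unassigned": primaries * (copies_per_shard - placed_per_shard),
    }
```

With the 7.x defaults, `shard_allocation(1, 1, 1)` leaves one copy unassigned, while a second data node brings that to zero.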
WHAT'S NEXT? ELASTIC APP SEARCH
HOMEWORK
1. Download Elasticsearch from https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.4.tar.gz
2. Untar, cd, and run Elasticsearch (bin/elasticsearch)
3. Test it: curl http://localhost:9200/
4. Add collective.elasticsearch to your project eggs and re-run buildout
5. Restart Plone
6. Go to the Control Panel
7. Add "Elastic Search" in Add-on Products
8. Click "Elastic Search" in "Add-on Configuration"
9. Enable it
10. Click "Convert Catalog"
11. Click "Rebuild Catalog"
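Step 3 can also be done from Python, for instance in a buildout or deployment check. The root endpoint returns a small JSON document that includes `cluster_name` and `version.number`. `fetch_root` and `describe` are hypothetical helpers: the first needs a running node, while the second only reads an already-decoded dict, so it can be tried on a hand-written sample like the one in the usage note.

```python
import json
import urllib.request


def fetch_root(url: str = "http://localhost:9200/", timeout: float = 5.0) -> dict:
    """Python equivalent of `curl http://localhost:9200/` (requires a running node)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def describe(info: dict) -> str:
    """One-line summary of the root endpoint's response."""
    return f'{info["cluster_name"]} running Elasticsearch {info["version"]["number"]}'
```

Usage: `describe({"cluster_name": "elasticsearch", "version": {"number": "6.8.4"}})` (a hand-written sample, not real output) gives `"elasticsearch running Elasticsearch 6.8.4"`.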
