Finding the right stuff: an intro to Elasticsearch with Ruby/Rails
Michael Reinsch, at Ruby User Group Berlin, Feb 2016
How does it fit into my app?
Elasticsearch: a black box with a REST API
• Update API: your app pushes updates (updates are fast, but asynchronous)
• Search API: returns search results
For Ruby / Rails
• https://github.com/elastic/elasticsearch-rails
• gems for Rails: elasticsearch-model & elasticsearch-rails
• without Rails / AR: elasticsearch-persistence
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  # The document that is sent to Elasticsearch for this record
  def as_indexed_json(options={})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  # Explicit mapping: only the fields listed here are indexed
  settings do
    mapping dynamic: 'false' do
      indexes :title, type: 'string'
      indexes :description, type: 'string'
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
Event.import
[Diagram: Elasticsearch cluster]
• Index: events
  • Type: event (doc 1)
• Index: creations
  • Type: creation (doc 1)
  • Type: activity (doc 1, doc 2)
Documents, not relationships
• compose documents with all relevant data
➜ "denormalize" your data
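To make "denormalize" concrete, here is a minimal plain-Ruby sketch. The `GroupRecord`, `LocationRecord` and `EventRecord` structs are hypothetical stand-ins for ActiveRecord models, and the field names are assumptions for illustration. The point is that data from related records is copied into one self-contained document.

```ruby
# Hypothetical stand-ins for ActiveRecord models (not from the slides).
GroupRecord    = Struct.new(:name, :featured)
LocationRecord = Struct.new(:name, :address)

EventRecord = Struct.new(:title, :group, :locations) do
  # Compose one document containing everything a search needs,
  # instead of references to other records.
  def as_indexed_json
    {
      title: title,
      group_name: group.name,
      featured: group.featured,
      locations: locations.map { |loc| { name: loc.name, address: loc.address } }
    }
  end
end

group = GroupRecord.new("Tokyo Rubyist Meetup", true)
event = EventRecord.new("Drop in Ruby", group,
                        [LocationRecord.new("EC Navi", "Shibuya, Tokyo")])
puts event.as_indexed_json.inspect
```

The trade-off is that a change to the group (for example its name) means re-indexing every event document that embeds it.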
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      titles: [ title1, title2 ],
      # embed the related locations directly in the event document
      locations: locs.map(&:as_indexed_json)
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :titles, type: 'string'
      indexes :locations, type: 'nested' do
        indexes :name, type: 'string'
        indexes :address, type: 'string'
        indexes :location, type: 'geo_point'
      end
    end
  end
end
Event.search 'tokyo rubyist'
response = Event.search 'tokyo rubyist'
response.took                         # => 28
response.results.total                # => 2075
response.results.first._score         # => 0.921177
response.results.first._source.title  # => "Drop in Ruby"
response.page(2).results              # => second page of results

supports kaminari / will_paginate
response = Event.search 'tokyo rubyist'
response.records.to_a     # => [#<Event id: 12409, ...>, ...]
response.page(2).records  # => second page of result records

response.records.each_with_hit do |rec,hit|
  puts "* #{rec.title}: #{hit._score}"
end
# * Drop in Ruby: 0.9205564
# * Javascript meets Ruby in Kamakura: 0.8947
# * Meetup at EC Navi: 0.8766844
# * Pair Programming Session #3: 0.8603562
# * Kickoff Party: 0.8265461
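Behind a call like `page(2)`, Elasticsearch itself paginates with the `from` and `size` search parameters. The sketch below shows that arithmetic only. `page_to_from_size` is a hypothetical helper name, and the default of 25 per page is an assumption that mirrors kaminari's default page size.

```ruby
# Translate a 1-based page number into Elasticsearch's from/size parameters.
# Pages below 1 are treated as page 1.
def page_to_from_size(page, per_page: 25)
  page = [page.to_i, 1].max
  { from: (page - 1) * per_page, size: per_page }
end

puts page_to_from_size(2).inspect
```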
Event.search 'tokyo rubyist'
• only upcoming events?
• sorted by start date?
Event.search query: {
  filtered: {
    # our query
    query: {
      simple_query_string: {
        query: 'tokyo rubyist',
        default_operator: 'and'
      }
    },
    # filtered by conditions
    filter: {
      and: [
        { range: { starts_at: { gte: 'now' } } },
        { term: { featured: true } }
      ]
    }
  }
},
# sorted by start time
sort: { starts_at: { order: 'asc' } }
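One caveat for readers on newer versions: the `filtered` query and the `and` filter were deprecated in Elasticsearch 2.0 in favour of a `bool` query with a `filter` clause. The following is a sketch of the same search in that form, built as a plain Ruby hash. `upcoming_featured_query` is a hypothetical helper name, not part of the talk or the gems.

```ruby
# Build the search body for "upcoming, featured events matching the text",
# using bool/must/filter instead of filtered/and.
def upcoming_featured_query(text)
  {
    query: {
      bool: {
        # scored part: the full-text query
        must: { simple_query_string: { query: text, default_operator: 'and' } },
        # unscored part: conditions that only include or exclude documents
        filter: [
          { range: { starts_at: { gte: 'now' } } },
          { term: { featured: true } }
        ]
      }
    },
    sort: { starts_at: { order: 'asc' } }
  }
end

puts upcoming_featured_query('tokyo rubyist').inspect
```

The resulting hash can be passed to the model the same way as the hash above, for example `Event.search(upcoming_featured_query('tokyo rubyist'))`.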
Query DSL
  query:  { <query_type>: <arguments> }    ➜ scores matching documents
  filter: { <filter_type>: <arguments> }   ➜ filters documents
valid arguments depend on query / filter type
Query types: Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query
Filter types: And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter
Event.search query: {
  bool: {
    should: [
      {
        simple_query_string: {
          query: "tokyo rubyist",
          default_operator: "and"
        }
      },
      {
        function_score: {
          filter: {
            and: [
              { range: { starts_at: { lte: 'now' } } },
              { term: { featured: true } }
            ]
          },
          gauss: {
            starts_at: { origin: 'now', scale: '10d', decay: 0.5 }
          },
          boost_mode: "sum"
        }
      }
    ],
    minimum_should_match: 2
  }
}
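To get a feel for what `scale: '10d', decay: 0.5` means, here is a small worked sketch of the Gaussian decay formula as described in the Elasticsearch function_score documentation. `gauss_decay` is a hypothetical helper, and distances are expressed in days for simplicity. The score is 1.0 at the origin and drops to `decay` at a distance of `scale`.

```ruby
# Gaussian decay: exp(-d^2 / (2 * sigma^2)),
# with sigma^2 = -scale^2 / (2 * ln(decay)) and d = max(0, |distance| - offset).
# distance, scale and offset must use the same unit (days here).
def gauss_decay(distance, scale:, decay: 0.5, offset: 0.0)
  sigma_sq = -(scale.to_f**2) / (2 * Math.log(decay))
  d = [0.0, distance.abs - offset].max
  Math.exp(-(d**2) / (2 * sigma_sq))
end

[0, 5, 10, 20].each do |days|
  puts "#{days} days from origin: #{gauss_decay(days, scale: 10).round(4)}"
end
```

With these settings an event 10 days away from `now` contributes half of what an event starting right now does, and the contribution falls off quickly beyond that.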
Create service objects

class EventSearch
  def initialize
    @filters = []
  end

  # each method adds one filter and returns self, so calls can be chained
  def starting_after(time)
    tap { @filters << { range: { starts_at: { gte: time } } } }
  end

  def featured
    tap { @filters << { term: { featured: true } } }
  end

  def in_group(group_id)
    tap { @filters << { term: { group_id: group_id } } }
  end
end
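The slide shows only the filter-collecting part of the service object. As a sketch of how it might be finished, the version below adds a search text argument and a `to_query` method that assembles the request body. Both additions are assumptions for illustration and are not taken from the talk.

```ruby
class EventSearch
  # `text` is an assumed addition; the slide's initializer takes no arguments.
  def initialize(text)
    @text = text
    @filters = []
  end

  def starting_after(time)
    tap { @filters << { range: { starts_at: { gte: time } } } }
  end

  def featured
    tap { @filters << { term: { featured: true } } }
  end

  def in_group(group_id)
    tap { @filters << { term: { group_id: group_id } } }
  end

  # Hypothetical method: build the search body from the collected filters.
  def to_query
    text_query = { simple_query_string: { query: @text, default_operator: 'and' } }
    return { query: text_query } if @filters.empty?

    { query: { filtered: { query: text_query, filter: { and: @filters } } } }
  end
end

body = EventSearch.new('tokyo rubyist').starting_after('now').featured.to_query
puts body.inspect
```

Keeping the query construction in one object means controllers only chain intent-revealing methods, and the hash can then be handed to `Event.search`.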
Event.search '東京rubyist'
Dealing with different languages
built-in analysers for arabic, armenian, basque, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai.
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: { en: title_en, de: title_de, ja: title_ja },
      description: { en: desc_en, de: desc_de, ja: desc_ja },
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      # one sub-field per language, each with its own analyser
      indexes :title do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :de, type: 'string', analyzer: 'german'
        indexes :ja, type: 'string', analyzer: 'cjk'
      end
      indexes :description do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :de, type: 'string', analyzer: 'german'
        indexes :ja, type: 'string', analyzer: 'cjk'
      end
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
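With one sub-field per language, a search has to name the fields it wants to look at. One way to do that is a `multi_match` query over all language variants. The sketch below builds such a request body. `multilingual_query` and the `LANGUAGES` constant are hypothetical names, and the field paths assume the mapping shown above.

```ruby
LANGUAGES = %w[en de ja].freeze

# Build a multi_match query across title.<lang> and description.<lang>.
def multilingual_query(text, languages: LANGUAGES)
  fields = languages.flat_map { |lang| ["title.#{lang}", "description.#{lang}"] }
  { query: { multi_match: { query: text, fields: fields } } }
end

puts multilingual_query('東京rubyist').inspect
```

Each field is analysed with its own analyser at search time, so the same input can match through the English, German or CJK variant.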
Changes to mappings?
⚠ can't change field types / analysers
⚠ but: we can add new field mappings
class AddCreatedAtToES < ActiveRecord::Migration
  def up
    client = Elasticsearch::Client.new
    # add the new field to the existing mapping
    client.indices.put_mapping(
      index: Event.index_name,
      type: Event.document_type,
      body: {
        properties: {
          created_at: { type: 'date' }
        }
      }
    )
    # re-import so existing documents get the new field
    Event.__elasticsearch__.import
  end

  def down
  end
end
Automated tests
Index names with environment

class Event < ActiveRecord::Base
  include Elasticsearch::Model
  index_name "drkpr_#{Rails.env}_events"
  # ...
end
Test helpers
• everything is asynchronous!
• helpers:
  • wait_for_elasticsearch
  • wait_for_elasticsearch_removal
  • clear_elasticsearch!
  ➜ https://gist.github.com/mreinsch/094dc9cf63362314cef4
• specs: tag tests which require Elasticsearch
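Because indexing is asynchronous, a test usually has to wait until a document becomes searchable (or disappears) before asserting on results. The helpers in the gist may work differently. The following is only a generic polling sketch, and `wait_until` is a hypothetical name.

```ruby
# Poll until the block returns a truthy value, or raise after `timeout` seconds.
def wait_until(timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Stand-in for "has the document shown up yet?": true on the third check.
attempts = 0
wait_until(interval: 0.01) { (attempts += 1) >= 3 }
puts "condition met after #{attempts} checks"
```

In a spec the block would run a real search, for example checking that `Event.search('ruby').results.total` is greater than zero.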
Production ready?
• use elastic.co/found or AWS ES
• use two clustered instances for redundancy
• Elasticsearch could go away
• keep impact at a minimum!
• update Elasticsearch from background worker
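A rough sketch of what "update from a background worker" and "keep impact at a minimum" could look like together: the job below indexes one document and reports failures instead of letting them propagate into the request that triggered the update. `IndexerJob` and its arguments are hypothetical. The linked indexer job gist shows the author's actual approach, which may differ.

```ruby
# `client` is anything that responds to #index(index:, id:, body:), for
# example an Elasticsearch::Client (older versions also expect type:).
class IndexerJob
  def initialize(client, on_error: ->(message) { warn message })
    @client = client
    @on_error = on_error
  end

  # Returns true on success, false if Elasticsearch could not be updated.
  def perform(index:, id:, document:)
    @client.index(index: index, id: id, body: document)
    true
  rescue StandardError => e
    @on_error.call("indexing #{index}/#{id} failed: #{e.class}: #{e.message}")
    false
  end
end

# Simulate an unavailable cluster with a client that always fails.
class UnavailableClient
  def index(**_args)
    raise IOError, "connection refused"
  end
end

ok = IndexerJob.new(UnavailableClient.new).perform(
  index: "events", id: 1, document: { title: "Drop in Ruby" }
)
puts "indexed: #{ok}"
```

A real job queue would typically retry on failure rather than only log. The sketch is limited to showing that a search outage does not have to break the write path of the app.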
Questions?

Resources:
• Elastic Docs: https://www.elastic.co/guide/index.html
• Ruby Gem Docs: https://github.com/elastic/elasticsearch-rails
• Elasticsearch rspec helpers: https://gist.github.com/mreinsch/094dc9cf63362314cef4
• Elasticsearch indexer job example: https://gist.github.com/mreinsch/acb2f6c58891e5cd4f13

or ask me later: michael@movingfast.io @mreinsch
