Elasticsearch & Docker Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com Running High Performance Fault Tolerant Elasticsearch Clusters On Docker
About me… Sematext consultant & engineer Solr.pl co-founder Father and husband :)
Next 30 minutes
You Are Probably Familiar With This
  • Development
  • Test
  • QA
  • Production environment
And The Problems That Come With It
  • Resources not utilized
  • Overprovisioned servers
  • Development, test, QA and production environments differ from one another (≠)
The solution Development Test QA Production
Container Technologies
What is Docker? Lightweight Based on Open Standards Secure
Containers vs Virtual Machines
  • Traditional virtual machine (stack from the bottom up): Hardware, Host Operating System, Hypervisor, Guest OS (one per application), Libraries, Application
  • Container (stack from the bottom up): Hardware, Host Operating System, Docker Engine, Libraries, Application
What is Elasticsearch? Reasonable defaults { JSON } Distributed by design http://www.dailypets.co.uk/2007/06/17/kittens-rest-at-half-time/
Running Official Elasticsearch Container
$ docker run -d elasticsearch
  (equivalent to: docker run -d elasticsearch:latest)
$ docker run -d elasticsearch:1.7
$ docker run --name es_1 -h es_master_1 elasticsearch
Container Constraints
$ docker run -d -m 2G elasticsearch
$ docker run -d -m 2G --memory-swappiness=0 elasticsearch
$ docker run -d --cpuset-cpus="1,3" elasticsearch
$ docker run -d --cpu-period=50000 --cpu-quota=25000 elasticsearch
http://docs.docker.com/engine/reference/run/
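The --cpu-period and --cpu-quota pair caps CPU time: the container may use at most the quota (in microseconds) out of every period. A minimal sketch of that arithmetic follows; the function name cpu_limit_percent is hypothetical and only illustrates the ratio.

```shell
#!/bin/sh
# Hypothetical helper: percentage of one CPU core that a container gets
# from a given --cpu-period / --cpu-quota pair (both in microseconds).
cpu_limit_percent() {
    period=$1
    quota=$2
    echo $(( quota * 100 / period ))
}

# The values from the slide: 25000 us of CPU time in every 50000 us period.
cpu_limit_percent 50000 25000
```

With the values from the slide the container is limited to half of one core.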
Creating Optimized Image
Dockerfile:
FROM elasticsearch
ADD ./elasticsearch.yml /usr/share/elasticsearch/config/

$ docker build -t devops/example .
Sending build context to Docker daemon 3.072 kB
Step 1 : FROM elasticsearch
 ---> 8112755253f1
Step 2 : ADD ./elasticsearch.yml /usr/share/elasticsearch/config/
 ---> Using cache
 ---> c9ca48a22e58
Successfully built c9ca48a22e58
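The slides do not show the elasticsearch.yml that gets added to the image. As a rough sketch, the file could be generated next to the Dockerfile like this; the helper name write_es_config and every value in the file (cluster name, quorum, host names) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal elasticsearch.yml into the build
# context directory given as $1. All values are illustrative assumptions.
write_es_config() {
    cat > "$1/elasticsearch.yml" <<'EOF'
# Illustrative values, not taken from the talk
cluster.name: devops-example
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["es_master_1", "es_master_2", "es_master_3"]
EOF
}
```

Running write_es_config . before docker build -t devops/example . would put the file where the ADD instruction expects it.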
Dealing With Network
$ docker run -d -p 9200:9200 -p 9300:9300 elasticsearch
$ docker run -d elasticsearch -Dnetwork.publish_host=192.168.1.1
$ docker run -d -p 9200:9200 -p 9300:9300 elasticsearch -Dnetwork.publish_host=192.168.1.1
$ docker run -d -p 9200:9200 -p 9300:9300 elasticsearch -Dnetwork.publish_host=0.0.0.0
Network - Good Practices
  • Separate network for the Elasticsearch cluster
  • Common host names for containers
$ docker run -d -h es_node_1 elasticsearch
  • Expose ports 9200 & 9300 only for client nodes
  • Elasticsearch data & client nodes point to masters only
Dealing With Storage
  • Data is stored in /usr/share/elasticsearch/data by default
  • By default it is not persisted
$ docker run -d -v /opt/elasticsearch/data:/usr/share/elasticsearch/data elasticsearch
  • Use data-only containers
  • Watch the permissions on the mounted directory
Data-Only Docker Volumes
  • Bypass the Union File System
  • Can be shared between containers
  • Data volumes persist if the container itself is deleted
$ docker create -v /mnt/es/data:/usr/share/elasticsearch/data --name esdata elasticsearch
$ docker run --volumes-from esdata elasticsearch
  • Permissions
Highly Available Cluster
  • 3 master-only nodes, 6 data-only nodes, 2 client-only nodes
  • minimum_master_nodes = N/2 + 1 (N = number of master-eligible nodes, integer division)
  • gateway.recover_after_nodes
  • gateway.expected_nodes
  • cluster.routing.allocation.node_concurrent_recoveries
  • index.unassigned.node_left.delayed_timeout
  • index.priority
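The quorum formula can be checked with plain shell arithmetic. This is a small sketch; the function name is hypothetical and takes the number of master-eligible nodes as its only argument.

```shell
#!/bin/sh
# Quorum of master-eligible nodes: N/2 + 1, using integer division.
# $1 = number of master-eligible nodes.
minimum_master_nodes() {
    echo $(( $1 / 2 + 1 ))
}

# Three master-only nodes, as in the cluster layout above.
minimum_master_nodes 3
```

For the three masters in this layout the result is 2, so the cluster keeps a master when one of them is lost and cannot split into two halves that each elect their own.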
Master Nodes & Docker $ docker run -d elasticsearch -Dnode.master=true -Dnode.data=false -Dnode.client=false
Client Nodes & Docker $ docker run -d elasticsearch -Dnode.master=false -Dnode.data=false -Dnode.client=true
Data Nodes & Docker $ docker run -d elasticsearch -Dnode.master=false -Dnode.data=true -Dnode.client=false
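The three role commands and the earlier networking advice can be combined in one place. The sketch below only prints the docker command for a node so it can be reviewed before running; the helper name es_node_cmd, the master host names and the use of discovery.zen.ping.unicast.hosts to point nodes at the masters are assumptions, and the host names must be resolvable between the containers.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the docker command for one node.
# $1 = role (master|data|client), $2 = container name and host name.
MASTERS="es_master_1,es_master_2,es_master_3"   # assumed master host names

es_node_cmd() {
    role=$1
    name=$2
    ports=""
    case "$role" in
        master) flags="-Dnode.master=true -Dnode.data=false -Dnode.client=false" ;;
        data)   flags="-Dnode.master=false -Dnode.data=true -Dnode.client=false" ;;
        client) flags="-Dnode.master=false -Dnode.data=false -Dnode.client=true"
                # Only client nodes publish the HTTP and transport ports.
                ports="-p 9200:9200 -p 9300:9300 " ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    # Every node is pointed at the master nodes only.
    echo "docker run -d ${ports}--name $name -h $name elasticsearch $flags -Ddiscovery.zen.ping.unicast.hosts=$MASTERS"
}
```

For example, es_node_cmd client es_client_1 prints a command that publishes 9200 and 9300, while the master and data variants publish nothing.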
Scaling Elasticsearch Node Elasticsearch Node Elasticsearch Node Elasticsearch Node
Scaling curl -XPUT 'http://localhost:9200/devops/' -d '{ "settings" : { "index" : { "number_of_shards" : 4, "number_of_replicas" : 0 } } }'
Scaling P P P P
Scaling curl -XPUT 'http://localhost:9200/devops/_settings' -d '{ "index.number_of_replicas" : 1 }'
Scaling P P P P R R R R
Scaling curl -XPUT 'http://localhost:9200/devops/_settings' -d '{ "index.number_of_replicas" : 2 }'
Scaling P P P P R R R R R R R R
Scaling curl -XPUT 'http://localhost:9200/devops/_settings' -d '{ "index.number_of_replicas" : 1 }'
Scaling P P P P R R R R
Scaling P P P P Unassigned R R R R
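The diagrams above follow from simple arithmetic: every primary shard gets number_of_replicas copies. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Total shard copies in an index: each primary plus its replicas.
# $1 = number of primary shards, $2 = number_of_replicas.
total_shards() {
    primaries=$1
    replicas=$2
    echo $(( primaries * (1 + replicas) ))
}

# The replica settings walked through above, for 4 primaries.
for r in 0 1 2; do
    echo "number_of_replicas=$r -> $(total_shards 4 "$r") shards"
done
```

Elasticsearch never places a replica on the same node as its primary, so copies without an eligible node remain unassigned, as in the last diagram.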
RAM Buffer
  • indices.memory.index_buffer_size: 10%
  • indices.memory.min_index_buffer_size: 48mb
  • indices.memory.max_index_buffer_size (unbounded)
  • indices.memory.min_shard_index_buffer_size: 4mb
  • Values above the defaults: higher indexing throughput
  • Values below the defaults: lower indexing throughput
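To see how the percentage and the minimum interact, here is a small sketch of the default calculation. It assumes the heap size is given in megabytes, ignores the (unbounded) maximum, and uses a hypothetical function name.

```shell
#!/bin/sh
# Sketch of the defaults: 10% of the heap, but never less than 48 MB.
# $1 = heap size in MB; prints the node-wide indexing buffer in MB.
index_buffer_mb() {
    heap_mb=$1
    buffer=$(( heap_mb * 10 / 100 ))
    if [ "$buffer" -lt 48 ]; then
        buffer=48
    fi
    echo "$buffer"
}

index_buffer_mb 1024
index_buffer_mb 256
```

On small heaps the 48mb floor wins over the 10% rule, which matters for containers started with a tight memory limit.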
Time-Based Data? 2015-11-23 TODAY WEEK
Time-Based Data? curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions" : [ { "add" : {"index":"2015-11-23","alias":"today"} }, { "add" : {"index":"2015-11-23","alias":"week"} } ]}'
Time-Based Data? 2015-11-23 2015-11-24 TODAY WEEK
Time-Based Data? 2015-11-23 2015-11-24 2015-11-25 TODAY WEEK
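When a new daily index appears, the aliases have to move with it. The sketch below builds one possible _aliases request body for that daily step; the helper name and the choice of which index leaves the week are assumptions, and the request fails if an index named in a remove action does not currently carry that alias.

```shell
#!/bin/sh
# Hypothetical helper: print the _aliases request body that rolls the
# aliases forward by one day.
# $1 = yesterday's index, $2 = today's index, $3 = index leaving the week.
rollover_actions() {
    printf '{ "actions" : [\n'
    printf '  { "remove" : {"index":"%s","alias":"today"} },\n' "$1"
    printf '  { "add" : {"index":"%s","alias":"today"} },\n' "$2"
    printf '  { "add" : {"index":"%s","alias":"week"} },\n' "$2"
    printf '  { "remove" : {"index":"%s","alias":"week"} }\n' "$3"
    printf ']}\n'
}
```

The output can be piped straight into the request, for example: rollover_actions 2015-11-23 2015-11-24 2015-11-17 | curl -XPOST 'http://localhost:9200/_aliases' -d @-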
Multiple Tiers node.tag=hot node.tag=cold node.tag=cold
Multiple Tiers curl -XPUT 'localhost:9200/data_2015-11-23' -d '{ "settings": { "index.routing.allocation.include.tag" : "hot" } }'
Multiple Tiers node.tag=hot node.tag=cold node.tag=cold data_2015-11-23 data_2015-11-23
Multiple Tiers curl -XPUT 'localhost:9200/data_2015-11-23/_settings' -d '{ "settings": { "index.routing.allocation.exclude.tag" : "hot", "index.routing.allocation.include.tag" : "cold" } }'
Multiple Tiers node.tag=hot node.tag=cold node.tag=cold data_2015-11-23 data_2015-11-23
Multiple Tiers node.tag=hot node.tag=cold node.tag=cold data_2015-11-23 data_2015-11-23 data_2015-11-24 data_2015-11-24
Multiple Tiers node.tag=hot node.tag=cold node.tag=cold data_2015-11-23 data_2015-11-23 data_2015-11-25 data_2015-11-25 data_2015-11-24 data_2015-11-24
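Moving indices between tiers is usually driven by age. A minimal sketch of that decision, assuming daily indices named data_YYYY-MM-DD as above; the function name and the cutoff rule (anything dated before the cutoff goes cold) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: decide the tier for a daily index named
# data_YYYY-MM-DD. Indices dated before the cutoff day go to "cold".
# $1 = index name, $2 = cutoff date (YYYY-MM-DD).
tier_for() {
    # Strip the prefix and the dashes so dates compare as plain integers.
    day=$(echo "${1#data_}" | tr -d '-')
    cutoff=$(echo "$2" | tr -d '-')
    if [ "$day" -lt "$cutoff" ]; then
        echo cold
    else
        echo hot
    fi
}

tier_for data_2015-11-23 2015-11-25
tier_for data_2015-11-25 2015-11-25
```

The result can then be used as the value of index.routing.allocation.include.tag in the settings update shown earlier.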
Multiple Tenants
  • Nodes: Hot, Hot, Cold, Cold, Cold, Cold
  • Routing
Indexing Without Routing Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 Shard 7 Shard 8 Elasticsearch Application userA userA userA userA userA userA userA userA
Indexing With Routing Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 Shard 7 Shard 8 Elasticsearch Application
Querying Without Routing Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 Shard 7 Shard 8 Elasticsearch Application
Querying With Routing Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 Shard 7 Shard 8 Elasticsearch Application
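Routing works because the routing value is hashed and the hash, modulo the number of primary shards, selects one shard. The sketch below shows only that idea: cksum is a stand-in and is not the hash function Elasticsearch uses, and the function name is hypothetical.

```shell
#!/bin/sh
# Illustration only: cksum stands in for Elasticsearch's internal hash
# to show the "hash modulo number of shards" idea.
# $1 = routing value, $2 = number of primary shards; prints a shard index.
shard_for() {
    hash=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $(( hash % $2 ))
}

# The same routing value always lands on the same shard.
shard_for userA 8
shard_for userA 8
```

In real requests the value is passed as the routing URL parameter on both sides, for example ?routing=userA on the index request and on the _search request, so that a query touches one shard instead of all eight.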
Routing vs No Routing

Queries without routing (200 shards, 1 replica)
#threads  Avg response time  Throughput  90% line  Median   CPU utilization
1         3169 ms            19.0/min    5214 ms   2692 ms  95-99%

Queries with routing (200 shards, 1 replica)
#threads  Avg response time  Throughput  90% line  Median   CPU utilization
10        196 ms             50.6/sec    642 ms    29 ms    25-40%
20        218 ms             91.2/sec    718 ms    11 ms    10-15%
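The two throughput columns use different units (per minute without routing, per second with routing), which hides the size of the gap. A rough conversion, using awk for the floating-point math and hypothetical function names; note that the runs also used different thread counts, so the ratio is indicative only.

```shell
#!/bin/sh
# Convert a per-minute rate to a per-second rate, two decimals.
per_second() {
    LC_ALL=C awk -v r="$1" 'BEGIN { printf "%.2f\n", r / 60 }'
}

# Ratio between a per-second rate ($1) and a per-minute rate ($2),
# rounded to a whole number.
speedup() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a * 60 / b }'
}

per_second 19.0      # throughput without routing, in queries per second
speedup 50.6 19.0    # 10 threads with routing vs 1 thread without
```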
Monitoring https://sematext.com/spm/integrations/docker-monitoring.html https://github.com/sematext/spm-agent-docker
Short summary http://www.soothetube.com/2013/12/29/thats-all-folks/
We Are Hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with, and in, open-source? We’re hiring worldwide! http://sematext.com/about/jobs.html
Rafał Kuć @kucrafal rafal.kuc@sematext.com Sematext @sematext http://sematext.com http://blog.sematext.com Thank You !


Editor's Notes

  • #9-#11 Problems with standard deployment: resources not utilized; machines must be provisioned before deployment; development, test, QA and production environments differ; hard to scale automatically
  • #13 Amazon EC2 Container Service, Spoonium, Kubernetes, rkt
  • #60, #62, #64, #66, #68, #69, #70 Shards and Replicas
  • #84-#86 No time-based indices; no tiers; possible solution: routing