InfluxDB Internals Platform Engineering Team
 @ryanbetts / ryan@influxdata.com
How great are databases? • I like making things with smart, clever, kind people. • I’ve been working on high-throughput, realtime data for the last 10 years.
• What’s so special about time series • Time series database designs • InfluxDB internals
RDBMS NoSQL TSDB Correctness ACID BASE BASE Schema DDL DDL / documents on-write Writing data DML POST/PUT line protocol Reading data SQL GET + filter filter, window, group, join
TSDB unique combination • Ingest: thousands to millions of points per second • Store: fast accumulating, append-mostly data, lots of repetition, often with time-to-live • Query: analytic queries with fast filtering, windowing • Scale: availability, storage, query
Facebook Gorilla • TTL eviction • Columnar compression • Write availability > query correctness • Metric-based schema • Separate query processing from access-path
Druid • Roll-up at ingest • Columnar storage & time-based segments • Indexes on dimension for fast filtering • Separation of real time and historical data nodes
Bullet Journals • Fast event recording • Ordered by time • Indexed by dimensions • Weekly / Monthly roll-up
InfluxDB 1.Write Path 2.Storage 3.Query Path 4.Clustering
InfluxDB: Adding data (1) POST ’http://localhost:8086/write?db=mydb' --data- binary 'cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000’
InfluxDB: Adding data (2) fsync( ) batch to WAL Add to in- memory cache Snapshot cache to TSM Add to index
InfluxDB: on-disk (filesystem) CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT] Database directory /db Retention Policy directory /db/rp Shard Group (time bounded) (Logical) Shard directory (db/rp/Id#) TSM0001.tsm (data file) TSM0002.tsm (data file)
TSM Blocks Block TSM Index
InfluxDB: Adding data (DB) fsync( ) batch to WAL Add to in- memory cache Snapshot to TSM Add to index
InfluxDB: Adding data (index) • Measurement name -> field keys • Measurement name -> series • Measurement name -> tag keys -> tag value -> series • Series -> shards • (Also sketches of series and measurements for fast cardinality estimation)
InfluxDB: TSI • Roaring-bitmaps to short- cut series creation on insert • Iterators for index mappings • Index is per-shard; series id file is per-database • Partitioned for lock-splitting
TSI TSI
InfluxDB: InfluxQL Queries 1. Parses time range and expressions for filtering data 2. Look-up shards to access using the list of measurements and the time frame 3. Create the iterators for each shard 4. Merge the shard iterator outputs select user, system from cpu where time > now() - 1h and host = 'serverA
InfluxQL: Query with IFQL 1. Stand-alone `ifqld` coordinator nodes 2. Streaming storage iterators that support rate-limits 3. Separation of query planning and query distribution 4. Extensible, functional language 5. Unification of InfluxQL and TICKScript
A brief sidebar on append-mostly databases No one tells you about: * Wrong data * Old (back-filled) data
InfluxDB Clustering • Strongly consistent meta-cluster (based on RAFT) • User configured replication factor • Replication and shard aware query planner • Hinted-Handoff queues on each data node • (WIP) Anti-entropy consistency repair
Conclusions • Time series data has unique storage and query requirements that impact database design.
 • Evolution of InfluxDB: 1. TSI: remove the in-memory size limit on cardinality 2. IFQL: faster feature velocity; safer execution. 3. Anti-entropy repair: easier, more robust scale-out.

InfluxDB Internals

  • 1.
    InfluxDB Internals Platform EngineeringTeam
 @ryanbetts / ryan@influxdata.com
  • 2.
    How great aredatabases? • I like making things with smart, clever, kind people. • I’ve been working on high-throughput, realtime data for the last 10 years.
  • 3.
    • What’s sospecial about time series • Time series database designs • InfluxDB internals
  • 5.
    RDBMS NoSQL TSDB CorrectnessACID BASE BASE Schema DDL DDL / documents on-write Writing data DML POST/PUT line protocol Reading data SQL GET + filter filter, window, group, join
  • 6.
    TSDB unique combination •Ingest: thousands to millions of points per second • Store: fast accumulating, append-mostly data, lots of repetition, often with time-to-live • Query: analytic queries with fast filtering, windowing • Scale: availability, storage, query
  • 7.
    Facebook Gorilla • TTLeviction • Columnar compression • Write availability > query correctness • Metric-based schema • Separate query processing from access-path
  • 8.
    Druid • Roll-up atingest • Columnar storage & time-based segments • Indexes on dimension for fast filtering • Separation of real time and historical data nodes
  • 9.
    Bullet Journals • Fastevent recording • Ordered by time • Indexed by dimensions • Weekly / Monthly roll-up
  • 10.
  • 11.
    InfluxDB: Adding data(1) POST ’http://localhost:8086/write?db=mydb' --data- binary 'cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000’
  • 12.
    InfluxDB: Adding data(2) fsync( ) batch to WAL Add to in- memory cache Snapshot cache to TSM Add to index
  • 13.
    InfluxDB: on-disk (filesystem) CREATERETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT] Database directory /db Retention Policy directory /db/rp Shard Group (time bounded) (Logical) Shard directory (db/rp/Id#) TSM0001.tsm (data file) TSM0002.tsm (data file)
  • 14.
  • 15.
    InfluxDB: Adding data(DB) fsync( ) batch to WAL Add to in- memory cache Snapshot to TSM Add to index
  • 16.
    InfluxDB: Adding data(index) • Measurement name -> field keys • Measurement name -> series • Measurement name -> tag keys -> tag value -> series • Series -> shards • (Also sketches of series and measurements for fast cardinality estimation)
  • 17.
    InfluxDB: TSI • Roaring-bitmapsto short- cut series creation on insert • Iterators for index mappings • Index is per-shard; series id file is per-database • Partitioned for lock-splitting
  • 18.
  • 19.
    InfluxDB: InfluxQL Queries 1.Parses time range and expressions for filtering data 2. Look-up shards to access using the list of measurements and the time frame 3. Create the iterators for each shard 4. Merge the shard iterator outputs select user, system from cpu where time > now() - 1h and host = 'serverA
  • 20.
    InfluxQL: Query withIFQL 1. Stand-alone `ifqld` coordinator nodes 2. Streaming storage iterators that support rate-limits 3. Separation of query planning and query distribution 4. Extensible, functional language 5. Unification of InfluxQL and TICKScript
  • 21.
    A brief sidebaron append-mostly databases No one tells you about: * Wrong data * Old (back-filled) data
  • 22.
    InfluxDB Clustering • Stronglyconsistent meta-cluster (based on RAFT) • User configured replication factor • Replication and shard aware query planner • Hinted-Handoff queues on each data node • (WIP) Anti-entropy consistency repair
  • 23.
    Conclusions • Time seriesdata has unique storage and query requirements that impact database design.
 • Evolution of InfluxDB: 1. TSI: remove the in-memory size limit on cardinality 2. IFQL: faster feature velocity; safer execution. 3. Anti-entropy repair: easier, more robust scale-out.