Norikra: SQL Stream Processing In Ruby 2014/11/19 RubyConf 2014 DAY 3 Satoshi Tagomori (@tagomoris)
Topics Why I wrote Norikra Norikra overview Norikra queries Use cases in production JRuby for me
Satoshi Tagomori (@tagomoris) Tokyo, Japan LINE Corporation
Monitoring/Data Analytics Overview collect parse clean up process visualize Access logs, store process Application logs, ...
collect parse clean up process visualize store process
collect parse clean up process visualize store process
Fluentd stream aggregation: Good for simple data/calculation collect parse clean up process visualize store process
Our services: More and more different services Many changes in a day (including logging) Many kind of logs for each services Many different metrics for each services
Fluentd stream aggregation: Not good for processing about complex/fragile environment... collect parse clean up process visualize store process
We want to: add/remove queries anytime we want write many queries for a service log stream ignore events without data we want make our service directors / growth hackers to write their own queries!
collect parse clean up process visualize store process
break.
Norikra: Schema-less Stream Processing with SQL Server software, written in JRuby, runs on JVM Open source software (GPLv2) http://norikra.github.io/ https://github.com/norikra/norikra
How To Setup Norikra: Install JRuby download jruby.tar.gz, extract it and export $PATH use rbenv rbenv install jruby-1.7.xx rbenv shell jruby-.. Install Norikra gem install norikra Execute Norikra server norikra start
Norikra Interface: CLI client/Client library: norikra-client norikra-client target open ... norikra-client query add ... tail -f ... | norikra-client event send ... WebUI show status show/add/remove queries HTTP API JSON, MessagePack
Norikra: Schema-less event stream: Add/Remove data fields whenever you want SQL: No more restarts to add/remove queries w/ JOINs, w/ SubQueries w/ UDF (in Java/Ruby as rubygems) Truly Complex events: Nested Hash/Array, accessible directly from SQL
Norikra Queries: (1) SELECT name, age FROM events target
Norikra Queries: (1) {“name”:”tagomoris”, “age”:35, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events {“name”:”tagomoris”,”age”:35}
Norikra Queries: (1) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} without “age” SELECT name, age FROM events nothing
Norikra Queries: (2) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events WHERE current=”San Diego” {“name”:”tagomoris”,”age”:35}
Norikra Queries: (2) {“name”:”nobu”, “age”:0, “address”:”Somewhere”, “corp”:”Heroku”, “current”:”SAN”} current is not “San Diego” SELECT name, age FROM events WHERE current=”San Diego” nothing
Norikra Queries: (3) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
Norikra Queries: (3) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age every 5 mins {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ...
Norikra Queries: (4) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ... SELECT max(age) as max FROM events.win:time_batch(5 mins) {“max”:51} every 5 mins
Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY user.age
Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current=”San Diego” AND attend.$0 AND attend.$1 GROUP BY user.age
break. next: use cases
Use case 1: External API call reports for partners (LINE) External API call for LINE Business Connect LINE backend sends requests to partner’s API endpoint using users’ messages http://developers.linecorp.com/blog/?p=3386
Use case 1: External API call reports for partners (LINE) channel gateway partner’s server logs query results MySQL Mail SELECT channelId AS channel_id, reason, detail, count(*) AS error_count, min(timestamp) AS first_timestamp, max(timestamp) AS last_timestamp FROM api_error_log.win:time_batch(60 sec) GROUP BY channelId,reason,detail HAVING count(*) > 0 http://developers.linecorp.com/blog/?p=3386
Use case 1: External API call reports for partners (LINE) API error response summaries http://developers.linecorp.com/blog/?p=3386
Use case 2: Lambda architecture Prompt reports for Ad service console Prompt reports with Norikra + Fixed reports with Hive app serverapp serverapp server app serverapp serverapp server Fluentd HDFS console service execute hive query (daily) fetch query results (frequently) impression logs
Use case 2: Prompt reports for Ad service console SELECT yyyymmdd, hh, campaign_id, region, lang, COUNT(*) AS click, COUNT(DISTINCT member_id) AS uu FROM ( SELECT yyyymmdd, hh, get_json_object(log, '$.campaign.id') AS campaign_id, get_json_object(log, '$.member.region') AS region, get_json_object(log, '$.member.lang') AS lang, get_json_object(log, '$.member.id') AS member_id FROM applog WHERE service='myservice' AND yyyymmdd='20140913' AND get_json_object(log, '$.type')='click' ) x GROUP BY yyyymmdd, hh, campaign_id, region, lang Hive query for fixed reports
Use case 2: Prompt reports for Ad service console Norikra query for prompt reports SELECT campaign.id AS campaign_id, member.region AS region, member.lang AS lang, COUNT(*) AS click, COUNT(DISTINCT member.id) AS uu FROM myservice.win:time_batch(1 hours) WHERE type="click" GROUP BY campaign.id, member.region, member.lang
Use case 3: Realtime access dashboard on Google Platform Access log visualization Count using Norikra (2-step), Store on Google BigQuery Dashboard on Google Spreadsheet + Apps Script http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw
Use case 3: Realtime access dashboard on Google Platform Server Fluentd http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix access log access logs to BigQuery norikra query results norikra query to aggregate node to aggregate locally
Use case 3: Realtime access dashboard on Google Platform 70 servers, 120,000 requests/sec (or more!) Fluentd logs to store http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngnix Google BigQuery Google Spreadsheet + Apps script ... counts per host total count
Why Norikra is written in JRuby Esper CEP(Complex Event Processing) library, written in Java Rubygems.org Open repository, for public UDF plugins of Norikra provided as gem
JRuby for me Ruby! (by great JRuby developer team!) makes developing Norikra dramatically faster with rubygems and rubygems.org for easy deployment/installation with Java libraries, ex: Jetty, Esper, ... There are not so many users in Tokyo :(
More queries, more simplicity and less latency in data processing Thanks! photo: by my co-workers http://norikra.github.io/ https://github.com/norikra/norikra
See also: http://norikra.github.io/ “Lambda Architecture Platform Using SQL” http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon- 2014-taiwan “Stream processing and Norikra” http://www.slideshare.net/tagomoris/stream-processing-and-norikra “Batch processing and Stream processing by SQL” http://www.slideshare.net/tagomoris/hcj2014-sql “Norikra in Action” http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring http://www.slideshare.net/tagomoris/presentations
Storm or Norikra? Simple and fixed workload for huge traffic Use Storm! Complex and fragile workload for non-huge traffic Use Norikra!
Scalability? 10,000 - 100,000 events/sec on 2CPU 8Core server
HA? Distributed? NO! I have some idea, but I have no time to implement it There are no needs for HA/Distributed processing
Data flow & API? Use Fluentd!

Norikra: SQL Stream Processing In Ruby

  • 1.
    Norikra: SQL StreamProcessing In Ruby 2014/11/19 RubyConf 2014 DAY 3 Satoshi Tagomori (@tagomoris)
  • 2.
    Topics Why Iwrote Norikra Norikra overview Norikra queries Use cases in production JRuby for me
  • 3.
    Satoshi Tagomori (@tagomoris) Tokyo, Japan LINE Corporation
  • 7.
    Monitoring/Data Analytics Overview collect parse clean up process visualize Access logs, store process Application logs, ...
  • 9.
    collect parse cleanup process visualize store process
  • 11.
    collect parse cleanup process visualize store process
  • 13.
    Fluentd stream aggregation: Good for simple data/calculation collect parse clean up process visualize store process
  • 14.
    Our services: Moreand more different services Many changes in a day (including logging) Many kind of logs for each services Many different metrics for each services
  • 15.
    Fluentd stream aggregation: Not good for processing about complex/fragile environment... collect parse clean up process visualize store process
  • 16.
    We want to: add/remove queries anytime we want write many queries for a service log stream ignore events without data we want make our service directors / growth hackers to write their own queries!
  • 17.
    collect parse cleanup process visualize store process
  • 18.
  • 20.
    Norikra: Schema-less StreamProcessing with SQL Server software, written in JRuby, runs on JVM Open source software (GPLv2) http://norikra.github.io/ https://github.com/norikra/norikra
  • 21.
    How To SetupNorikra: Install JRuby download jruby.tar.gz, extract it and export $PATH use rbenv rbenv install jruby-1.7.xx rbenv shell jruby-.. Install Norikra gem install norikra Execute Norikra server norikra start
  • 22.
    Norikra Interface: CLIclient/Client library: norikra-client norikra-client target open ... norikra-client query add ... tail -f ... | norikra-client event send ... WebUI show status show/add/remove queries HTTP API JSON, MessagePack
  • 23.
    Norikra: Schema-less eventstream: Add/Remove data fields whenever you want SQL: No more restarts to add/remove queries w/ JOINs, w/ SubQueries w/ UDF (in Java/Ruby as rubygems) Truly Complex events: Nested Hash/Array, accessible directly from SQL
  • 24.
    Norikra Queries: (1) SELECT name, age FROM events target
  • 25.
    Norikra Queries: (1) {“name”:”tagomoris”, “age”:35, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events {“name”:”tagomoris”,”age”:35}
  • 26.
    Norikra Queries: (1) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} without “age” SELECT name, age FROM events nothing
  • 27.
    Norikra Queries: (2) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events WHERE current=”San Diego” {“name”:”tagomoris”,”age”:35}
  • 28.
    Norikra Queries: (2) {“name”:”nobu”, “age”:0, “address”:”Somewhere”, “corp”:”Heroku”, “current”:”SAN”} current is not “San Diego” SELECT name, age FROM events WHERE current=”San Diego” nothing
  • 29.
    Norikra Queries: (3) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  • 30.
    Norikra Queries: (3) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age every 5 mins {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ...
  • 31.
    Norikra Queries: (4) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ... SELECT max(age) as max FROM events.win:time_batch(5 mins) {“max”:51} every 5 mins
  • 32.
    Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  • 33.
    Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY user.age
  • 34.
    Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current=”San Diego” AND attend.$0 AND attend.$1 GROUP BY user.age
  • 35.
  • 36.
    Use case 1: External API call reports for partners (LINE) External API call for LINE Business Connect LINE backend sends requests to partner’s API endpoint using users’ messages http://developers.linecorp.com/blog/?p=3386
  • 37.
    Use case 1: External API call reports for partners (LINE) channel gateway partner’s server logs query results MySQL Mail SELECT channelId AS channel_id, reason, detail, count(*) AS error_count, min(timestamp) AS first_timestamp, max(timestamp) AS last_timestamp FROM api_error_log.win:time_batch(60 sec) GROUP BY channelId,reason,detail HAVING count(*) > 0 http://developers.linecorp.com/blog/?p=3386
  • 38.
    Use case 1: External API call reports for partners (LINE) API error response summaries http://developers.linecorp.com/blog/?p=3386
  • 39.
    Use case 2:Lambda architecture Prompt reports for Ad service console Prompt reports with Norikra + Fixed reports with Hive app serverapp serverapp server app serverapp serverapp server Fluentd HDFS console service execute hive query (daily) fetch query results (frequently) impression logs
  • 40.
    Use case 2: Prompt reports for Ad service console SELECT yyyymmdd, hh, campaign_id, region, lang, COUNT(*) AS click, COUNT(DISTINCT member_id) AS uu FROM ( SELECT yyyymmdd, hh, get_json_object(log, '$.campaign.id') AS campaign_id, get_json_object(log, '$.member.region') AS region, get_json_object(log, '$.member.lang') AS lang, get_json_object(log, '$.member.id') AS member_id FROM applog WHERE service='myservice' AND yyyymmdd='20140913' AND get_json_object(log, '$.type')='click' ) x GROUP BY yyyymmdd, hh, campaign_id, region, lang Hive query for fixed reports
  • 41.
    Use case 2: Prompt reports for Ad service console Norikra query for prompt reports SELECT campaign.id AS campaign_id, member.region AS region, member.lang AS lang, COUNT(*) AS click, COUNT(DISTINCT member.id) AS uu FROM myservice.win:time_batch(1 hours) WHERE type="click" GROUP BY campaign.id, member.region, member.lang
  • 42.
    Use case 3: Realtime access dashboard on Google Platform Access log visualization Count using Norikra (2-step), Store on Google BigQuery Dashboard on Google Spreadsheet + Apps Script http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw
  • 43.
    Use case 3: Realtime access dashboard on Google Platform Server Fluentd http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix access log access logs to BigQuery norikra query results norikra query to aggregate node to aggregate locally
  • 44.
    Use case 3: Realtime access dashboard on Google Platform 70 servers, 120,000 requests/sec (or more!) Fluentd logs to store http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngnix Google BigQuery Google Spreadsheet + Apps script ... counts per host total count
  • 45.
    Why Norikra iswritten in JRuby Esper CEP(Complex Event Processing) library, written in Java Rubygems.org Open repository, for public UDF plugins of Norikra provided as gem
  • 46.
    JRuby for me Ruby! (by great JRuby developer team!) makes developing Norikra dramatically faster with rubygems and rubygems.org for easy deployment/installation with Java libraries, ex: Jetty, Esper, ... There are not so many users in Tokyo :(
  • 47.
    More queries, moresimplicity and less latency in data processing Thanks! photo: by my co-workers http://norikra.github.io/ https://github.com/norikra/norikra
  • 48.
    See also: http://norikra.github.io/ “Lambda Architecture Platform Using SQL” http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon- 2014-taiwan “Stream processing and Norikra” http://www.slideshare.net/tagomoris/stream-processing-and-norikra “Batch processing and Stream processing by SQL” http://www.slideshare.net/tagomoris/hcj2014-sql “Norikra in Action” http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring http://www.slideshare.net/tagomoris/presentations
  • 49.
    Storm or Norikra? Simple and fixed workload for huge traffic Use Storm! Complex and fragile workload for non-huge traffic Use Norikra!
  • 50.
    Scalability? 10,000 -100,000 events/sec on 2CPU 8Core server
  • 51.
    HA? Distributed? NO! I have some idea, but I have no time to implement it There are no needs for HA/Distributed processing
  • 52.
    Data flow &API? Use Fluentd!