DotNetLombardia Milano Fiori, Italy
www.slideshare.net/marco.parenzan
www.github.com/marcoparenzan
marco [dot] parenzan [at] 1nn0va [dot] it
www.1nnova.it
@marco_parenzan
Training, outreach and consulting with 1nn0va
Microsoft MVP 2014 for Microsoft Azure
Cloud Architect, .NET developer
Loves Functional Programming, Html5 Game Programming and Internet of Things
AZURE COMMUNITY BOOTCAMP 2015
IoT Day - 08/05/2015
@1nn0va #microservicesconf2015
9 May 2015
Real-time Analytics
• Ingest millions of events per second (up to 1 GB/s)
• Low processing latency, auto-adaptive (sub-second to seconds)
• Correlate between different streams, or with reference data
• Find patterns, or the lack of patterns, in data in real time

Fully Managed Cloud Service
• No hardware acquisition and maintenance
• No platform/infrastructure deployment and maintenance
• Easily expand your business globally by leveraging Azure regions
Mission-Critical Reliability
• Guaranteed event delivery
• Guaranteed business continuity: automatic and fast recovery

Effective Audits
• Privacy and security properties of solutions are evident
• Azure integration for monitoring and ops alerting

Easy To Scale
• Scale from small to large on demand
Rapid Development with a SQL-like language
• High-level: focus on the stream analytics solution
• Concise: less code to maintain
• Fast test: rapid development and debugging
• First-class support for event streams and reference data

Built-in temporal semantics
• Built-in temporal windowing and joining
• Simple policy configuration to manage out-of-order events and late arrivals
• SELECT • FROM • WHERE • GROUP BY • HAVING • CASE WHEN THEN ELSE • INNER/LEFT OUTER JOIN • UNION • CROSS/OUTER APPLY • CAST • INTO • ORDER BY ASC, DESC • WITH • PARTITION BY • OVER
• DateName • DatePart • Day • Month • Year • DateTimeFromParts • DateDiff • DateAdd
• TumblingWindow • HoppingWindow • SlidingWindow
• Sum • Count • Avg • Min • Max • StDev • StDevP • Var • VarP
• Len • Concat • CharIndex • Substring • PatIndex
• Lag • IsFirst • CollectTop
Pipeline

    SELECT UserName, TimeZone
    INTO OutputTable
    FROM InputStream

Put the data in a static data container
Filters

    SELECT UserName, TimeZone
    FROM InputStream
    WHERE Topic = 'XBox'

Show me the user name and time zone of tweets on the topic XBox

Input events:
    "Zach Dotseth", "London", "Football", (…)
    "Haroon", "Eastern Time (US & Canada)", "XBox", (…)
    "XO", "London", "XBox", (…)

Output events:
    "Haroon", "Eastern Time (US & Canada)"
    "XO", "London"
Windowing Concepts
• Windows can be tumbling, hopping, or sliding
• Windows are fixed length
• Must be used in a GROUP BY clause
• Output event will have the timestamp of the end of the window

[Figure: events on a time axis (t1 to t6) grouped into Window 1, Window 2 and Window 3; an aggregate function (Sum) produces one output event per window]
    SELECT Topic, Count(*) AS TotalTweets
    FROM TwitterStream TIMESTAMP BY CreatedAt
    GROUP BY Topic, TumblingWindow(second, 10)

“Give me the count of tweets every 10 seconds”

[Figure: a 10-second Tumbling Window]
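The tumbling-window query above can be imitated offline to see which events end up in which window. The following is only a minimal Python sketch, not the service's implementation: the function name `tumbling_counts`, the use of integer seconds as timestamps, and the boundary convention (window start exclusive, window end inclusive) are assumptions made for illustration.

```python
def tumbling_counts(events, size):
    """Count events per (window_end, topic) in back-to-back windows.

    `events` is an iterable of (timestamp_seconds, topic) pairs with
    integer timestamps. Assumption of this sketch: a window covers
    (end - size, end], so an event exactly on a boundary belongs to
    the window that ends there.
    """
    counts = {}
    for ts, topic in events:
        # Round the timestamp up to the next multiple of `size`
        # (integer ceiling division) to get the end of its window.
        end = -(-ts // size) * size
        key = (end, topic)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [(1, "XBox"), (4, "XBox"), (10, "XBox"),
              (11, "Football"), (19, "XBox"), (20, "XBox")]
    for (end, topic), total in sorted(tumbling_counts(sample, 10).items()):
        print(end, topic, total)
```

Each event lands in exactly one window, and the key carries the window end, matching the slide's statement that the output event has the timestamp of the end of the window.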
    SELECT Topic, Count(*) AS TotalTweets
    FROM TwitterStream TIMESTAMP BY CreatedAt
    GROUP BY Topic, HoppingWindow(second, 10, 5)

“Every 5 seconds give me the count of tweets over the last 10 seconds”

[Figure: a 10-second Hopping Window with a 5-second “Hop”]
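Because hopping windows overlap, one event can be counted in more than one output. The Python sketch below illustrates that under stated assumptions: `hopping_counts` is a hypothetical helper, timestamps are integer seconds, windows end on multiples of the hop, and a window covers (end - size, end].

```python
def hopping_counts(events, size, hop):
    """Count events per (window_end, topic) in overlapping windows.

    Assumptions of this sketch: windows end at multiples of `hop`,
    each covers (end - size, end], and timestamps are integers.
    """
    counts = {}
    for ts, topic in events:
        # First window end at or after the event timestamp.
        end = -(-ts // hop) * hop
        # Walk forward while the window (end - size, end] still contains ts.
        while end - size < ts:
            key = (end, topic)
            counts[key] = counts.get(key, 0) + 1
            end += hop
    return counts


if __name__ == "__main__":
    sample = [(1, "XBox"), (4, "XBox"), (7, "XBox"), (12, "XBox")]
    for (end, topic), total in sorted(hopping_counts(sample, 10, 5).items()):
        print(end, topic, total)
```

With a 10-second size and a 5-second hop, every event falls into two windows, so the counts add up to twice the number of events.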
    SELECT Topic, Count(*) AS TotalTweets
    FROM TwitterStream TIMESTAMP BY CreatedAt
    GROUP BY Topic, SlidingWindow(second, 10)

“Give me the count of tweets in every distinct 10-second window”

[Figure: 10-second Sliding Windows, evaluated each time the window contents change]
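A sliding window only produces output when its contents change. The Python sketch below is a simplified model of that idea, not a confirmed description of the service's exact behavior: it assumes the contents change when an event enters (at its timestamp) or leaves (at timestamp + size), that a window covers (end - size, end], and that empty windows produce no output. `sliding_counts` is a hypothetical name.

```python
def sliding_counts(events, size):
    """Count events per (window_end, topic) at every change point.

    Simplified model (assumption): for each topic, a window is
    evaluated whenever an event enters (t) or leaves (t + size);
    the window covers (end - size, end]; empty windows are skipped.
    """
    by_topic = {}
    for ts, topic in events:
        by_topic.setdefault(topic, []).append(ts)

    out = {}
    for topic, stamps in by_topic.items():
        change_points = sorted(set(stamps) | {t + size for t in stamps})
        for end in change_points:
            n = sum(1 for t in stamps if end - size < t <= end)
            if n > 0:
                out[(end, topic)] = n
    return out


if __name__ == "__main__":
    sample = [(1, "XBox"), (4, "XBox"), (12, "XBox")]
    for (end, topic), total in sorted(sliding_counts(sample, 10).items()):
        print(end, topic, total)
```

Unlike the tumbling and hopping cases, the output timestamps here depend on the data itself, which is why a quiet stream produces no output at all.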
Reference Data
• Seamless correlation of event streams with reference data
• Static or slowly-changing data stored in blobs
• CSV and JSON files in Azure Blobs; scanned for new snapshots on a settable cadence
• JOIN (INNER or LEFT OUTER) between streams and reference data sources
• Reference data appears like another input:

    SELECT myRefData.Name, myStream.Value
    FROM myStream
    JOIN myRefData ON myStream.myKey = myRefData.myKey
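The reference-data join above behaves like a lookup of each streaming event against a small, mostly static table. The Python sketch below shows the difference between INNER and LEFT OUTER under that reading; the function `join_with_reference`, the dictionary-based snapshot, and the sample values are assumptions for illustration only.

```python
def join_with_reference(stream, ref_by_key, left_outer=False):
    """Join streaming events with a reference snapshot on `myKey`.

    `stream` is an iterable of dicts with "myKey" and "Value";
    `ref_by_key` maps a key to a dict with "Name" (a hypothetical
    in-memory stand-in for a CSV or JSON blob snapshot).
    """
    rows = []
    for event in stream:
        ref = ref_by_key.get(event["myKey"])
        if ref is not None:
            rows.append({"Name": ref["Name"], "Value": event["Value"]})
        elif left_outer:
            # LEFT OUTER keeps the event and leaves the reference column empty.
            rows.append({"Name": None, "Value": event["Value"]})
    return rows


if __name__ == "__main__":
    reference = {"d1": {"Name": "Boiler"}, "d2": {"Name": "Freezer"}}
    events = [{"myKey": "d1", "Value": 71.5},
              {"myKey": "d9", "Value": 3.0}]
    print(join_with_reference(events, reference))
    print(join_with_reference(events, reference, left_outer=True))
```

An INNER join silently drops events whose key is missing from the snapshot, which matters when the reference blob lags behind newly registered devices.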
    WITH Step1 AS (
        SELECT Count(*) AS CountTweets, Topic
        FROM TwitterStream PARTITION BY PartitionId
        GROUP BY TumblingWindow(second, 3), Topic, PartitionId
    ),
    Step2 AS (
        SELECT Avg(CountTweets)
        FROM Step1
        GROUP BY TumblingWindow(minute, 3)
    )
    SELECT * INTO Output1 FROM Step1
    SELECT * INTO Output2 FROM Step2
    SELECT * INTO Output3 FROM Step2

• A query can have multiple steps to enable pipeline execution
• A step is a sub-query defined using WITH (“common table expression”)
• Can be used to develop complex queries more elegantly by creating an intermediate named result
• Creates a unit of execution for scaling out when PARTITION BY is used
• Each step’s output can be sent to multiple output targets using INTO
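The two steps above form a small pipeline: the second step consumes the windowed output of the first. The Python sketch below mimics that data flow under simplifying assumptions: `PartitionId` is left out, timestamps are integer seconds, windows cover (end - size, end], and the names `window_end`, `step1` and `step2` are hypothetical.

```python
def window_end(ts, size):
    """End of the tumbling window (end - size, end] that contains ts."""
    return -(-ts // size) * size


def step1(events, size=3):
    """Count tweets per (window_end, topic) in 3-second windows."""
    counts = {}
    for ts, topic in events:
        key = (window_end(ts, size), topic)
        counts[key] = counts.get(key, 0) + 1
    return counts


def step2(step1_counts, size=180):
    """Average the Step1 counts over 3-minute windows.

    Each Step1 row is timestamped with the end of its own window,
    and Step2 groups by time only, so all topics are averaged together.
    """
    buckets = {}
    for (end, _topic), n in step1_counts.items():
        buckets.setdefault(window_end(end, size), []).append(n)
    return {end: sum(ns) / len(ns) for end, ns in buckets.items()}


if __name__ == "__main__":
    sample = [(1, "XBox"), (2, "XBox"), (3, "Football"),
              (5, "XBox"), (181, "XBox")]
    print(step1(sample))
    print(step2(step1(sample)))
```

Keeping the first step as a named intermediate result is what lets its rows go to one output while the second step reuses them for another.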
Partitioning allows for parallel execution over scaled-out resources

    SELECT Count(*) AS Count, Topic
    FROM TwitterStream PARTITION BY PartitionId
    GROUP BY TumblingWindow(minute, 3), Topic, PartitionId

[Figure: an Event Hub feeding parallel query instances that produce Query Result 1, Query Result 2 and Query Result 3]
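The reason partitioning scales out is that each partition can be aggregated without looking at any other. The Python sketch below illustrates that independence with a thread pool standing in for scaled-out resources; it is only an analogy under stated assumptions, and the names `count_partition` and `partitioned_counts` as well as the tuple layout (timestamp, topic, partition id) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_partition(events, size=180):
    """Count events per (window_end, topic, partition_id) for one partition."""
    counts = {}
    for ts, topic, partition_id in events:
        end = -(-ts // size) * size  # tumbling window (end - size, end]
        key = (end, topic, partition_id)
        counts[key] = counts.get(key, 0) + 1
    return counts


def partitioned_counts(events, size=180):
    """Split events by partition id and aggregate each partition separately."""
    partitions = {}
    for event in events:
        partitions.setdefault(event[2], []).append(event)

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda part: count_partition(part, size),
                                partitions.values()))

    merged = {}
    for result in results:
        # Keys include the partition id, so results never collide.
        merged.update(result)
    return merged


if __name__ == "__main__":
    sample = [(10, "XBox", 0), (20, "XBox", 0),
              (30, "XBox", 1), (200, "Football", 1)]
    for key, total in sorted(partitioned_counts(sample).items()):
        print(key, total)
```

Because `PartitionId` is part of the grouping key, the per-partition results can simply be merged; a total per topic across partitions would need a further, non-partitioned step.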
http://www.slideshare.net/davidemauri/azureml-creating-and-using-machine-learning-solutions-italian

Implementing a canonical IoT backend in Azure with Azure Stream Analytics
