Operationalizing Machine Learning: Serving ML Models
ML is simple: Data → Magic → Happiness
Maybe not.
Even if there are instructions.
The reality is a cycle:
Set business goals → Understand your data → Create hypothesis → Define experiments → Prepare data → Train/tune models → Verify/test models → Export models → Score models → Measure/evaluate results
Our topic is the serving (scoring) part of this cycle.
What is the model? A model is a function transforming inputs to outputs, y = f(x). For example:
• Linear regression: y = a0 + a1*x1 + … + an*xn
• Neural network: f(x) = K(∑i wi gi(x))
Such a definition of the model allows for an easy implementation of model composition: from the implementation point of view, it is just function composition.
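To make the "model as function" view concrete, here is a minimal Scala sketch (illustrative code, not from the deck): a preprocessing step and a linear model are both plain functions, and composing them is just andThen.

  // A model is just a function; composing models is function composition.
  val preprocess: Double => Double = x => x / 100.0     // scale the raw input
  val linear: Double => Double = x => 0.5 + 2.0 * x     // y = a0 + a1*x
  val pipeline: Double => Double = preprocess andThen linear
  println(pipeline(50.0)) // prints 1.5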
Model learning pipeline: UC Berkeley AMPLab introduced machine learning pipelines as a graph defining the complete chain of data transformation:
Input data stream → Data preprocessing → (model inputs) Predictive model (model outputs) → Data postprocessing → Results
Traditional approach to model serving:
• Model is code
• This code has to be saved and then somehow imported into model serving
Why is this problematic?
Impedance mismatch: a continually expanding data scientist toolbox versus a defined software engineer toolbox.
Alternative: the model as data. The model is exported as a model document, and a model evaluator combines the document with the data to produce results. Standards for such documents include the Portable Format for Analytics (PFA).
Exporting the model as data with PMML. There are already a lot of export options:
• https://github.com/jpmml/jpmml-sparkml
• https://github.com/jpmml/jpmml-sklearn
• https://github.com/jpmml/jpmml-r
• https://github.com/jpmml/jpmml-tensorflow
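As a hedged sketch of one of these options, jpmml-sparkml exposes a PMMLBuilder that turns a fitted Spark ML pipeline into a PMML document (the training data, column names, and file name below are illustrative):

  import java.io.File
  import org.apache.spark.ml.Pipeline
  import org.apache.spark.ml.feature.VectorAssembler
  import org.apache.spark.ml.regression.LinearRegression
  import org.apache.spark.sql.SparkSession
  import org.jpmml.sparkml.PMMLBuilder

  val spark = SparkSession.builder.master("local[*]").appName("pmml-export").getOrCreate()
  val df = spark.createDataFrame(Seq((1.0, 2.5), (2.0, 4.4), (3.0, 6.6))).toDF("x", "y")

  val pipeline = new Pipeline().setStages(Array(
    new VectorAssembler().setInputCols(Array("x")).setOutputCol("features"),
    new LinearRegression().setLabelCol("y")))
  val model = pipeline.fit(df)

  // The PMML file is the "model as data" artifact handed to a serving system
  new PMMLBuilder(df.schema, model).buildFile(new File("model.pmml"))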
Evaluating the PMML model. There are also a couple of PMML evaluators:
• https://github.com/jpmml/jpmml-evaluator
• https://github.com/opendatagroup/augustus
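A hedged sketch of scoring with jpmml-evaluator (class and method names per the project's README, 1.5.x-era API; the file name and input record are illustrative):

  import java.io.File
  import scala.collection.JavaConverters._
  import org.jpmml.evaluator.{EvaluatorUtil, LoadingModelEvaluatorBuilder}

  val evaluator = new LoadingModelEvaluatorBuilder()
    .load(new File("model.pmml"))
    .build()
  evaluator.verify() // self-check against verification data embedded in the PMML, if any

  val rawInput = Map("x" -> 2.5) // one raw value per declared input field
  val arguments = evaluator.getInputFields.asScala.map { field =>
    field.getName -> field.prepare(rawInput(field.getName.getValue))
  }.toMap.asJava

  // decodeAll unwraps evaluator-internal values into plain Java objects
  println(EvaluatorUtil.decodeAll(evaluator.evaluate(arguments)))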
Exporting the model as data with TensorFlow:
• TensorFlow execution is based on tensors and graphs
• Tensors are defined as multilinear functions of various vector variables
• A computational graph is a series of TensorFlow operations arranged into a graph of nodes
• TensorFlow supports exporting such a graph in the form of binary protocol buffers
• There are two different export formats: the optimized graph and a newer format, the saved model
Evaluating the TensorFlow model:
• TensorFlow is implemented in C++ with a Python interface.
• To simplify TensorFlow usage from Java, Google introduced the TensorFlow Java APIs in 2017.
• The TensorFlow Java APIs support importing exported models and using them for scoring.
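A hedged sketch of scoring a saved model from Scala through the TensorFlow Java API (1.x era); the model path, the "serve" tag, and the tensor names "input" and "output" are illustrative and depend on the exported graph:

  import org.tensorflow.{SavedModelBundle, Tensor}

  val bundle = SavedModelBundle.load("/path/to/saved_model", "serve")
  val input = Tensor.create(Array(Array(1.0f, 2.0f))) // one row of two features
  try {
    val output = bundle.session().runner()
      .feed("input", input)  // tensor name in the imported graph
      .fetch("output")       // tensor holding the scoring result
      .run()
      .get(0)
    val scores = Array.ofDim[Float](1, 1)
    output.copyTo(scores) // shape must match the fetched tensor
    println(scores(0)(0))
    output.close()
  } finally {
    input.close()
    bundle.close()
  }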
Additional considerations: the model lifecycle
• Models tend to change
• Update frequencies vary greatly, from hourly to quarterly/yearly
• Model version tracking
• Model release practices
• Model update process
The solution: a streaming system that allows updating models without interrupting execution (a dynamically controlled stream). The streaming engine consumes a data stream (from the data source) and a model stream (from machine learning). Model stream elements update the current model, optionally backed by external model storage; data stream elements are scored against the current model, and the results flow to additional processing.
Model representation. On the wire:

  syntax = "proto3";

  // Description of the trained model.
  message ModelDescriptor {
      // Model name
      string name = 1;
      // Human-readable description.
      string description = 2;
      // Data type for which this model is applied.
      string dataType = 3;
      // Model type
      enum ModelType {
          TENSORFLOW = 0;
          TENSORFLOWSAVED = 1;
          PMML = 2;
      };
      ModelType modeltype = 4;
      oneof MessageContent {
          // Byte array containing the model
          bytes data = 5;
          // Location of the model
          string location = 6;
      }
  }

Internally:

  trait Model {
    def score(input: AnyVal): AnyVal
    def cleanup(): Unit
    def toBytes(): Array[Byte]
    def getType: Long
  }

  trait ModelFactory {
    def create(input: ModelDescriptor): Model
    def restore(bytes: Array[Byte]): Model
  }
Implementation options: modern stream-processing engines (SPEs) take advantage of cluster architectures. They organize computations into a set of operators, which enables execution parallelism; different operators can run on different threads or different machines. A stream-processing library (SPL), on the other hand, is a library, often with a domain-specific language (DSL), of constructs that simplify building streaming applications.
Decision criteria:
• Using an SPE is a good fit for applications that require features provided out of the box by such engines, but you need to adhere to its programming and deployment models.
• An SPL provides a programming model that allows developers to build applications or microservices in a way that fits their precise needs and deploy them as simple standalone Java applications.
Apache Flink is an open-source SPE that:
• Scales well, running on thousands of nodes
• Provides powerful checkpointing and savepointing facilities that enable fault tolerance and restartability
• Provides state support for streaming applications, which minimizes the use of external databases
• Provides powerful window semantics, allowing you to produce accurate results even in the case of out-of-order or late-arriving data
Flink low-level join:
• Create a state object for one input (or both)
• Update the state upon receiving elements from its input
• Upon receiving elements from the other input, probe the state and produce the joined result
The join component consumes stream 1 and stream 2, maintains the current state, and emits a results stream.
Key-based join: Flink's CoProcessFunction allows a key-based merge of two streams. When using this API, data is key-partitioned across multiple Flink executors; records from both streams are routed, based on the key, to the appropriate executor, which is responsible for the actual processing. Each key group is served by its own model server instance, which consumes the model stream and the data stream for its keys and emits serving results. A minimal sketch follows.
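This hedged sketch assumes a toy model (a multiplicative factor) and illustrative record types; it is not the deck's actual implementation:

  import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
  import org.apache.flink.configuration.Configuration
  import org.apache.flink.streaming.api.functions.co.CoProcessFunction
  import org.apache.flink.util.Collector

  case class DataRecord(dataType: String, value: Double)
  case class ModelUpdate(dataType: String, factor: Double) // stands in for a real ModelDescriptor
  case class Result(dataType: String, score: Double)

  class ModelServer extends CoProcessFunction[DataRecord, ModelUpdate, Result] {
    // Keyed state: each key (data type) has its own current model
    private var currentModel: ValueState[java.lang.Double] = _

    override def open(parameters: Configuration): Unit =
      currentModel = getRuntimeContext.getState(
        new ValueStateDescriptor[java.lang.Double]("currentModel", classOf[java.lang.Double]))

    // Data stream element: probe the state and score, if a model is present
    override def processElement1(record: DataRecord,
        ctx: CoProcessFunction[DataRecord, ModelUpdate, Result]#Context,
        out: Collector[Result]): Unit =
      Option(currentModel.value()).foreach(f => out.collect(Result(record.dataType, f * record.value)))

    // Model stream element: update the state without stopping the data stream
    override def processElement2(update: ModelUpdate,
        ctx: CoProcessFunction[DataRecord, ModelUpdate, Result]#Context,
        out: Collector[Result]): Unit =
      currentModel.update(update.factor)
  }

  // Wiring: connect the two streams, key both by data type, then process:
  // dataStream.connect(modelStream).keyBy(_.dataType, _.dataType).process(new ModelServer)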
Partition-based join: Flink's RichCoFlatMapFunction allows merging two streams in parallel (based on the parallelism parameter). When using this API, data from different partitions of the partitioned stream is processed by a dedicated Flink executor: each model server instance handles one partition of the data stream while also consuming the model stream, and emits serving results. A sketch follows.
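A hedged sketch, reusing the illustrative DataRecord, ModelUpdate, and Result types from the previous sketch; here the model lives in a plain instance field rather than keyed state:

  import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
  import org.apache.flink.util.Collector

  // Each parallel instance keeps its own copy of the current model
  class PartitionedModelServer extends RichCoFlatMapFunction[DataRecord, ModelUpdate, Result] {
    private var factor: Option[Double] = None

    override def flatMap1(record: DataRecord, out: Collector[Result]): Unit =
      factor.foreach(f => out.collect(Result(record.dataType, f * record.value)))

    override def flatMap2(update: ModelUpdate, out: Collector[Result]): Unit =
      factor = Some(update.factor) // every parallel instance must see the update
  }

  // Wiring: broadcast the model stream so all parallel instances receive each update:
  // dataStream.connect(modelStream.broadcast).flatMap(new PartitionedModelServer)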
• Akka Streams, part of the Akka toolkit, is a library focused on in-process streaming with backpressure.
• It provides a broad ecosystem of connectors to various technologies (data stores, message queues, file stores, streaming services, etc.): Alpakka.
• In Akka Streams, computations are written in a graph-resembling domain-specific language (DSL), which aims to make translating graph drawings to and from code simpler.
Using custom stages: create a custom stage, which is a fully type-safe way to encapsulate the required functionality. The stage provides functionality somewhat similar to a Flink low-level join: two sources, each read through an Alpakka flow, feed stream 1 (data) and stream 2 (models) into the custom model-serving stage, which emits the results stream. A lighter sketch of the same behavior follows.
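The deck's implementation is a custom GraphStage; as a lighter hedged sketch of the same behavior (keep the current model as in-stream state and score data elements against it), the two streams can be merged and run through a single stateful stage. All types and names here are illustrative:

  import akka.NotUsed
  import akka.stream.scaladsl.{Flow, Source}

  case class DataRecord(value: Double)
  case class ModelUpdate(factor: Double)
  case class Result(score: Double)

  // One stateful stage holds the current model and serves the merged stream
  val serving: Flow[Either[ModelUpdate, DataRecord], Result, NotUsed] =
    Flow[Either[ModelUpdate, DataRecord]].statefulMapConcat { () =>
      var current: Option[Double] = None // the current model
      {
        case Left(ModelUpdate(f)) => current = Some(f); Nil          // update, emit nothing
        case Right(DataRecord(v)) => current.map(f => Result(f * v)).toList
      }
    }

  val models: Source[Either[ModelUpdate, DataRecord], NotUsed] =
    Source(List(ModelUpdate(2.0))).map(Left(_))
  val data: Source[Either[ModelUpdate, DataRecord], NotUsed] =
    Source(List(DataRecord(21.0))).map(Right(_))

  val results: Source[Result, NotUsed] = models.merge(data).via(serving)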
Improve scalability: use a router actor to forward each request to an individual actor responsible for processing requests for a specific model type. The two streams (read through Alpakka flows) feed the model-serving router, which dispatches to a set of model-serving actors. A sketch follows.
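A hedged sketch of the router pattern with classic Akka actors; the message types and the per-model-type behavior are illustrative, not the deck's exact code:

  import akka.actor.{Actor, ActorRef, ActorSystem, Props}

  case class ScoreData(modelType: String, value: Double)
  case class UpdateModel(modelType: String, factor: Double)

  // One actor per model type: holds the current model and scores requests
  class ModelServingActor extends Actor {
    private var factor: Option[Double] = None
    def receive: Receive = {
      case UpdateModel(_, f)   => factor = Some(f)
      case ScoreData(_, value) => factor.foreach(f => sender() ! f * value)
    }
  }

  // Router: forwards each message to the actor owning that model type
  class ModelServingRouter extends Actor {
    private var servers = Map.empty[String, ActorRef]
    private def serverFor(modelType: String): ActorRef =
      servers.getOrElse(modelType, {
        val ref = context.actorOf(Props[ModelServingActor], modelType)
        servers += modelType -> ref
        ref
      })
    def receive: Receive = {
      case m: UpdateModel => serverFor(m.modelType) forward m
      case d: ScoreData   => serverFor(d.modelType) forward d
    }
  }

  val system = ActorSystem("model-serving")
  val router = system.actorOf(Props[ModelServingRouter], "router")
  router ! UpdateModel("wine", 2.0)
  router ! ScoreData("wine", 21.0) // the reply (42.0) goes to the original sender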
Using Akka Cluster gives two levels of scalability:
• A Kafka partitioned topic allows scaling listeners to the number of partitions.
• Akka Cluster sharding allows splitting model-serving actors across the cluster.
Each JVM in the Akka cluster runs the same topology (sources, Alpakka flows, model-serving router, and model-serving actors), all consuming from the Kafka cluster.
Additional considerations on monitoring. Model monitoring should provide information about the usage, behavior, performance, and lifecycle of the deployed models:

  case class ModelToServeStats(
    name: String,                          // Model name
    description: String,                   // Model description
    modelType: ModelDescriptor.ModelType,  // Model type
    since: Long,                           // Start time of model usage
    var usage: Long = 0,                   // Number of servings
    var duration: Double = 0.0,            // Time spent on serving
    var min: Long = Long.MaxValue,         // Min serving time
    var max: Long = Long.MinValue          // Max serving time
  )
Queryable state (interactive queries) is an approach that lets you get more from streaming than just the processing of data. It allows you to treat the stream-processing layer as a lightweight embedded database and, more concretely, to directly query the current state of a stream-processing application without first materializing that state to an external database or external storage. The stream processor exposes its state to interactive queries, so a monitoring tool or any other application can read it directly from the streaming engine.
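A hedged sketch of what this looks like in Flink, where a keyed stream can expose its latest value per key as queryable state (the state name "serving-stats" and the statistics tuple are illustrative):

  import org.apache.flink.streaming.api.scala._

  val env = StreamExecutionEnvironment.getExecutionEnvironment

  // (model name, serving time in ms): illustrative serving statistics
  val stats: DataStream[(String, Long)] =
    env.fromElements(("wine-model", 12L), ("wine-model", 9L))

  // External clients can now query the latest value per key
  stats
    .keyBy(_._1)
    .asQueryableState("serving-stats")

  env.execute("queryable-state-sketch")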
Sign up for our tutorial, Building Streaming Applications with Kafka, at the O'Reilly Software Architecture Conference New York.
Serving Machine Learning Models: A Guide to Architecture, Stream Processing Engines, and Frameworks, by Boris Lublinsky. GET YOUR FREE COPY
If you're serious about bringing Machine Learning into your Fast Data and streaming applications, let's chat! SET UP A 20-MIN DEMO
