Operationalizing Machine Learning: Serving ML Models
ML is simple: Data → Magic → Happiness
Maybe not.
Even if there are instructions.
The reality is a cycle:
Set business goals → Understand your data → Create hypothesis → Define experiments → Prepare data → Train/tune models → Verify/test models → Export models → Score models → Measure/evaluate results
Our topic is the serving (scoring) part of this cycle.
What is the model? A model is a function transforming inputs to outputs, y = f(x). For example:
• Linear regression: y = a0 + a1*x1 + … + an*xn
• Neural network: f(x) = K(∑i wi gi(x))
Such a definition of the model allows for an easy implementation of model composition: from the implementation point of view, it is just function composition.
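To make the "model as function" view concrete, here is a minimal Scala sketch (illustrative code, not from the deck): a preprocessing step and a linear model are both plain functions, and composing them is just andThen.

  // A model is just a function; composing models is function composition.
  val preprocess: Double => Double = x => x / 100.0     // scale the raw input
  val linear: Double => Double = x => 0.5 + 2.0 * x     // y = a0 + a1*x
  val pipeline: Double => Double = preprocess andThen linear
  println(pipeline(50.0)) // prints 1.5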
Model learning pipeline: UC Berkeley AMPLab introduced machine learning pipelines as a graph defining the complete chain of data transformation:
Input data stream → Data preprocessing → (model inputs) Predictive model (model outputs) → Data postprocessing → Results
Traditional approach to model serving:
• Model is code
• This code has to be saved and then somehow imported into model serving
Why is this problematic?
Impedance mismatch: a continually expanding data scientist toolbox versus a defined software engineer toolbox.
Alternative: the model as data. The model is exported as a model document, and a model evaluator combines the document with the data to produce results. Standards for such documents include the Portable Format for Analytics (PFA).
Exporting the model as data with PMML. There are already a lot of export options:
• https://github.com/jpmml/jpmml-sparkml
• https://github.com/jpmml/jpmml-sklearn
• https://github.com/jpmml/jpmml-r
• https://github.com/jpmml/jpmml-tensorflow
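As a hedged sketch of one of these options, jpmml-sparkml exposes a PMMLBuilder that turns a fitted Spark ML pipeline into a PMML document (the training data, column names, and file name below are illustrative):

  import java.io.File
  import org.apache.spark.ml.Pipeline
  import org.apache.spark.ml.feature.VectorAssembler
  import org.apache.spark.ml.regression.LinearRegression
  import org.apache.spark.sql.SparkSession
  import org.jpmml.sparkml.PMMLBuilder

  val spark = SparkSession.builder.master("local[*]").appName("pmml-export").getOrCreate()
  val df = spark.createDataFrame(Seq((1.0, 2.5), (2.0, 4.4), (3.0, 6.6))).toDF("x", "y")

  val pipeline = new Pipeline().setStages(Array(
    new VectorAssembler().setInputCols(Array("x")).setOutputCol("features"),
    new LinearRegression().setLabelCol("y")))
  val model = pipeline.fit(df)

  // The PMML file is the "model as data" artifact handed to a serving system
  new PMMLBuilder(df.schema, model).buildFile(new File("model.pmml"))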
Evaluating the PMML model. There are also a couple of PMML evaluators:
• https://github.com/jpmml/jpmml-evaluator
• https://github.com/opendatagroup/augustus
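A hedged sketch of scoring with jpmml-evaluator (class and method names per the project's README, 1.5.x-era API; the file name and input record are illustrative):

  import java.io.File
  import scala.collection.JavaConverters._
  import org.jpmml.evaluator.{EvaluatorUtil, LoadingModelEvaluatorBuilder}

  val evaluator = new LoadingModelEvaluatorBuilder()
    .load(new File("model.pmml"))
    .build()
  evaluator.verify() // self-check against verification data embedded in the PMML, if any

  val rawInput = Map("x" -> 2.5) // one raw value per declared input field
  val arguments = evaluator.getInputFields.asScala.map { field =>
    field.getName -> field.prepare(rawInput(field.getName.getValue))
  }.toMap.asJava

  // decodeAll unwraps evaluator-internal values into plain Java objects
  println(EvaluatorUtil.decodeAll(evaluator.evaluate(arguments)))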
Exporting the model as data with TensorFlow:
• TensorFlow execution is based on tensors and graphs
• Tensors are defined as multilinear functions of various vector variables
• A computational graph is a series of TensorFlow operations arranged into a graph of nodes
• TensorFlow supports exporting such a graph in the form of binary protocol buffers
• There are two different export formats: the optimized graph and a newer format, the saved model
Evaluating the TensorFlow model:
• TensorFlow is implemented in C++ with a Python interface.
• To simplify TensorFlow usage from Java, Google introduced the TensorFlow Java APIs in 2017.
• The TensorFlow Java APIs support importing exported models and using them for scoring.
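A hedged sketch of scoring a saved model from Scala through the TensorFlow Java API (1.x era); the model path, the "serve" tag, and the tensor names "input" and "output" are illustrative and depend on the exported graph:

  import org.tensorflow.{SavedModelBundle, Tensor}

  val bundle = SavedModelBundle.load("/path/to/saved_model", "serve")
  val input = Tensor.create(Array(Array(1.0f, 2.0f))) // one row of two features
  try {
    val output = bundle.session().runner()
      .feed("input", input)  // tensor name in the imported graph
      .fetch("output")       // tensor holding the scoring result
      .run()
      .get(0)
    val scores = Array.ofDim[Float](1, 1)
    output.copyTo(scores) // shape must match the fetched tensor
    println(scores(0)(0))
    output.close()
  } finally {
    input.close()
    bundle.close()
  }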
Additional considerations: the model lifecycle
• Models tend to change
• Update frequencies vary greatly, from hourly to quarterly/yearly
• Model version tracking
• Model release practices
• Model update process
The solution: a streaming system that allows updating models without interrupting execution (a dynamically controlled stream). The streaming engine consumes a data stream (from the data source) and a model stream (from machine learning). Model stream elements update the current model, optionally backed by external model storage; data stream elements are scored against the current model, and the results flow to additional processing.
Model representation. On the wire:

  syntax = "proto3";

  // Description of the trained model.
  message ModelDescriptor {
      // Model name
      string name = 1;
      // Human-readable description.
      string description = 2;
      // Data type for which this model is applied.
      string dataType = 3;
      // Model type
      enum ModelType {
          TENSORFLOW = 0;
          TENSORFLOWSAVED = 1;
          PMML = 2;
      };
      ModelType modeltype = 4;
      oneof MessageContent {
          // Byte array containing the model
          bytes data = 5;
          // Location of the model
          string location = 6;
      }
  }

Internally:

  trait Model {
    def score(input: AnyVal): AnyVal
    def cleanup(): Unit
    def toBytes(): Array[Byte]
    def getType: Long
  }

  trait ModelFactory {
    def create(input: ModelDescriptor): Model
    def restore(bytes: Array[Byte]): Model
  }
Implementation options: modern stream-processing engines (SPEs) take advantage of cluster architectures. They organize computations into a set of operators, which enables execution parallelism; different operators can run on different threads or different machines. A stream-processing library (SPL), on the other hand, is a library, often with a domain-specific language (DSL), of constructs that simplify building streaming applications.
Decision criteria:
• Using an SPE is a good fit for applications that require features provided out of the box by such engines, but you need to adhere to its programming and deployment models.
• An SPL provides a programming model that allows developers to build applications or microservices in a way that fits their precise needs and deploy them as simple standalone Java applications.
Apache Flink is an open-source SPE that:
• Scales well, running on thousands of nodes
• Provides powerful checkpointing and savepointing facilities that enable fault tolerance and restartability
• Provides state support for streaming applications, which minimizes the use of external databases
• Provides powerful window semantics, allowing you to produce accurate results even in the case of out-of-order or late-arriving data
Flink low-level join:
• Create a state object for one input (or both)
• Update the state upon receiving elements from its input
• Upon receiving elements from the other input, probe the state and produce the joined result
The join component consumes stream 1 and stream 2, maintains the current state, and emits a results stream.
Key-based join: Flink's CoProcessFunction allows a key-based merge of two streams. When using this API, data is key-partitioned across multiple Flink executors; records from both streams are routed, based on the key, to the appropriate executor, which is responsible for the actual processing. Each key group is served by its own model server instance, which consumes the model stream and the data stream for its keys and emits serving results. A minimal sketch follows.
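This hedged sketch assumes a toy model (a multiplicative factor) and illustrative record types; it is not the deck's actual implementation:

  import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
  import org.apache.flink.configuration.Configuration
  import org.apache.flink.streaming.api.functions.co.CoProcessFunction
  import org.apache.flink.util.Collector

  case class DataRecord(dataType: String, value: Double)
  case class ModelUpdate(dataType: String, factor: Double) // stands in for a real ModelDescriptor
  case class Result(dataType: String, score: Double)

  class ModelServer extends CoProcessFunction[DataRecord, ModelUpdate, Result] {
    // Keyed state: each key (data type) has its own current model
    private var currentModel: ValueState[java.lang.Double] = _

    override def open(parameters: Configuration): Unit =
      currentModel = getRuntimeContext.getState(
        new ValueStateDescriptor[java.lang.Double]("currentModel", classOf[java.lang.Double]))

    // Data stream element: probe the state and score, if a model is present
    override def processElement1(record: DataRecord,
        ctx: CoProcessFunction[DataRecord, ModelUpdate, Result]#Context,
        out: Collector[Result]): Unit =
      Option(currentModel.value()).foreach(f => out.collect(Result(record.dataType, f * record.value)))

    // Model stream element: update the state without stopping the data stream
    override def processElement2(update: ModelUpdate,
        ctx: CoProcessFunction[DataRecord, ModelUpdate, Result]#Context,
        out: Collector[Result]): Unit =
      currentModel.update(update.factor)
  }

  // Wiring: connect the two streams, key both by data type, then process:
  // dataStream.connect(modelStream).keyBy(_.dataType, _.dataType).process(new ModelServer)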
Partition-based join: Flink's RichCoFlatMapFunction allows merging two streams in parallel (based on the parallelism parameter). When using this API, data from different partitions of the partitioned stream is processed by a dedicated Flink executor: each model server instance handles one partition of the data stream while also consuming the model stream, and emits serving results. A sketch follows.
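A hedged sketch, reusing the illustrative DataRecord, ModelUpdate, and Result types from the previous sketch; here the model lives in a plain instance field rather than keyed state:

  import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
  import org.apache.flink.util.Collector

  // Each parallel instance keeps its own copy of the current model
  class PartitionedModelServer extends RichCoFlatMapFunction[DataRecord, ModelUpdate, Result] {
    private var factor: Option[Double] = None

    override def flatMap1(record: DataRecord, out: Collector[Result]): Unit =
      factor.foreach(f => out.collect(Result(record.dataType, f * record.value)))

    override def flatMap2(update: ModelUpdate, out: Collector[Result]): Unit =
      factor = Some(update.factor) // every parallel instance must see the update
  }

  // Wiring: broadcast the model stream so all parallel instances receive each update:
  // dataStream.connect(modelStream.broadcast).flatMap(new PartitionedModelServer)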
• Akka Streams, part of the Akka toolkit, is a library focused on in-process streaming with backpressure.
• It provides a broad ecosystem of connectors to various technologies (data stores, message queues, file stores, streaming services, etc.): Alpakka.
• In Akka Streams, computations are written in a graph-resembling domain-specific language (DSL), which aims to make translating graph drawings to and from code simpler.
Using custom stages: create a custom stage, which is a fully type-safe way to encapsulate the required functionality. The stage provides functionality somewhat similar to a Flink low-level join: two sources, each read through an Alpakka flow, feed stream 1 (data) and stream 2 (models) into the custom model-serving stage, which emits the results stream. A lighter sketch of the same behavior follows.
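The deck's implementation is a custom GraphStage; as a lighter hedged sketch of the same behavior (keep the current model as in-stream state and score data elements against it), the two streams can be merged and run through a single stateful stage. All types and names here are illustrative:

  import akka.NotUsed
  import akka.stream.scaladsl.{Flow, Source}

  case class DataRecord(value: Double)
  case class ModelUpdate(factor: Double)
  case class Result(score: Double)

  // One stateful stage holds the current model and serves the merged stream
  val serving: Flow[Either[ModelUpdate, DataRecord], Result, NotUsed] =
    Flow[Either[ModelUpdate, DataRecord]].statefulMapConcat { () =>
      var current: Option[Double] = None // the current model
      {
        case Left(ModelUpdate(f)) => current = Some(f); Nil          // update, emit nothing
        case Right(DataRecord(v)) => current.map(f => Result(f * v)).toList
      }
    }

  val models: Source[Either[ModelUpdate, DataRecord], NotUsed] =
    Source(List(ModelUpdate(2.0))).map(Left(_))
  val data: Source[Either[ModelUpdate, DataRecord], NotUsed] =
    Source(List(DataRecord(21.0))).map(Right(_))

  val results: Source[Result, NotUsed] = models.merge(data).via(serving)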
Improve scalability: use a router actor to forward each request to an individual actor responsible for processing requests for a specific model type. The two streams (read through Alpakka flows) feed the model-serving router, which dispatches to a set of model-serving actors. A sketch follows.
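A hedged sketch of the router pattern with classic Akka actors; the message types and the per-model-type behavior are illustrative, not the deck's exact code:

  import akka.actor.{Actor, ActorRef, ActorSystem, Props}

  case class ScoreData(modelType: String, value: Double)
  case class UpdateModel(modelType: String, factor: Double)

  // One actor per model type: holds the current model and scores requests
  class ModelServingActor extends Actor {
    private var factor: Option[Double] = None
    def receive: Receive = {
      case UpdateModel(_, f)   => factor = Some(f)
      case ScoreData(_, value) => factor.foreach(f => sender() ! f * value)
    }
  }

  // Router: forwards each message to the actor owning that model type
  class ModelServingRouter extends Actor {
    private var servers = Map.empty[String, ActorRef]
    private def serverFor(modelType: String): ActorRef =
      servers.getOrElse(modelType, {
        val ref = context.actorOf(Props[ModelServingActor], modelType)
        servers += modelType -> ref
        ref
      })
    def receive: Receive = {
      case m: UpdateModel => serverFor(m.modelType) forward m
      case d: ScoreData   => serverFor(d.modelType) forward d
    }
  }

  val system = ActorSystem("model-serving")
  val router = system.actorOf(Props[ModelServingRouter], "router")
  router ! UpdateModel("wine", 2.0)
  router ! ScoreData("wine", 21.0) // the reply (42.0) goes to the original sender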
Using Akka Cluster gives two levels of scalability:
• A Kafka partitioned topic allows scaling listeners to the number of partitions.
• Akka Cluster sharding allows splitting model-serving actors across the cluster.
Each JVM in the Akka cluster runs the same topology (sources, Alpakka flows, model-serving router, and model-serving actors), all consuming from the Kafka cluster.
Additional considerations on monitoring. Model monitoring should provide information about the usage, behavior, performance, and lifecycle of the deployed models:

  case class ModelToServeStats(
    name: String,                          // Model name
    description: String,                   // Model description
    modelType: ModelDescriptor.ModelType,  // Model type
    since: Long,                           // Start time of model usage
    var usage: Long = 0,                   // Number of servings
    var duration: Double = 0.0,            // Time spent on serving
    var min: Long = Long.MaxValue,         // Min serving time
    var max: Long = Long.MinValue          // Max serving time
  )
Queryable state (interactive queries) is an approach that lets you get more from streaming than just the processing of data. It allows you to treat the stream-processing layer as a lightweight embedded database and, more concretely, to directly query the current state of a stream-processing application without first materializing that state to an external database or external storage. The stream processor exposes its state to interactive queries, so a monitoring tool or any other application can read it directly from the streaming engine.
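A hedged sketch of what this looks like in Flink, where a keyed stream can expose its latest value per key as queryable state (the state name "serving-stats" and the statistics tuple are illustrative):

  import org.apache.flink.streaming.api.scala._

  val env = StreamExecutionEnvironment.getExecutionEnvironment

  // (model name, serving time in ms): illustrative serving statistics
  val stats: DataStream[(String, Long)] =
    env.fromElements(("wine-model", 12L), ("wine-model", 9L))

  // External clients can now query the latest value per key
  stats
    .keyBy(_._1)
    .asQueryableState("serving-stats")

  env.execute("queryable-state-sketch")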
Sign up for our tutorial, Building Streaming Applications with Kafka, at the O'Reilly Software Architecture Conference New York.
Serving Machine Learning Models: A Guide to Architecture, Stream Processing Engines, and Frameworks, by Boris Lublinsky. GET YOUR FREE COPY
If you're serious about bringing Machine Learning into your Fast Data and streaming applications, let's chat! SET UP A 20-MIN DEMO
