This project contains a demo of model inference with Apache Kafka, Kafka Streams, and a TensorFlow model deployed via TensorFlow Serving (leveraging Google Cloud ML Engine in this example). The concepts are very similar for other ML frameworks and cloud providers; e.g. you could also use Apache MXNet and AWS Model Server.
Machine Learning / Deep Learning models can be used in different ways to do predictions. The preferred way is to deploy an analytic model directly into a Kafka Streams application, e.g. using the TensorFlow for Java API. Examples here: Model Inference within Kafka Streams Microservices.
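For illustration, a minimal sketch of this embedded approach with the TensorFlow for Java API, assuming a SavedModel on local disk (the model path and tensor names are hypothetical):

```java
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class EmbeddedInference {

    public static void main(String[] args) {
        // Load the exported SavedModel once at startup (path is hypothetical)
        try (SavedModelBundle model = SavedModelBundle.load("/models/census", "serve")) {
            Session session = model.session();
            float[][] features = { { 1.0f, 2.0f, 3.0f } }; // dummy feature vector
            // Feed the input tensor and fetch the prediction; the tensor names
            // ("input", "output") depend on how the model was exported
            try (Tensor<?> input = Tensor.create(features);
                 Tensor<?> output = session.runner()
                         .feed("input", input)
                         .fetch("output")
                         .run()
                         .get(0)) {
                float[][] prediction = new float[1][(int) output.shape()[1]];
                output.copyTo(prediction);
                System.out.println("Prediction: " + prediction[0][0]);
            }
        }
    }
}
```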
However, this is not always a feasible approach. Sometimes it makes sense, or is even required, to deploy a model in a dedicated serving infrastructure like TensorFlow Serving. This project shows how to access such an infrastructure via Apache Kafka and Kafka Streams.
Pros of an external model serving infrastructure like TensorFlow Serving:
- Simple integration with existing systems and technologies
- Easier to understand if you come from the non-streaming world
- Later migration to real streaming is also possible
Cons:
- Framework-specific deployment (e.g. only TensorFlow models)
- Coupling the availability, scalability, and latency/throughput of your Kafka Streams application to the SLAs of the RPC interface
- Side effects (e.g. in case of failure) are not covered by Kafka's processing guarantees (e.g. exactly-once semantics)
- Worse latency because remote communication (potentially over the internet) is required
- No local inference (offline, devices, edge processing, etc.)
The blog post "How to deploy TensorFlow models to production using TF Serving" is a great explanation of how to export and deploy trained TensorFlow models to a TensorFlow Serving infrastructure. You can either deploy your own infrastructure anywhere or leverage a cloud service like Google Cloud ML Engine. A SavedModel is TensorFlow's recommended format for saving models, and it is the required format for deploying trained TensorFlow models with TensorFlow Serving or on Google Cloud ML Engine.
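For example, deploying an exported SavedModel to Cloud ML Engine looked roughly like this at the time of writing (bucket, model, and version names are placeholders; check the gcloud documentation for the exact flags):

```
# Upload the exported SavedModel directory to Cloud Storage (paths are placeholders)
gsutil cp -r ./export/census gs://my-bucket/census

# Create the model and a version that points at the SavedModel
gcloud ml-engine models create census --regions us-central1
gcloud ml-engine versions create v1 --model census \
  --origin gs://my-bucket/census --runtime-version 1.4
```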
Things to do:
- Create Cloud ML Engine
- Deploy prebuilt TensorFlow Model
- Create Kafka Cluster
- Implement Kafka Streams application
- Deploy Kafka Streams application (e.g. to a Kubernetes cluster)
- Generate streaming data to test the combination of Kafka Streams and TensorFlow Serving
I simply added an existing pretrained Image Recognition model built with TensorFlow (Inception V1).
I also created a new model for census predictions using the "ML Engine getting started guide". The training data is in the 'data' folder.
Getting Started with Google ML Engine
Confluent Cloud - Apache Kafka as a Service
This example shows how to use TensorFlow Serving to deploy a model. The Kafka Streams app accesses it via HTTP or gRPC to do the inference. You could also deploy the TensorFlow model the same way in a public cloud, e.g. with Google Cloud ML Engine. The sketch below shows the shape of such a Streams application.
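The topology itself is deliberately simple; the remote call to the model server happens per record inside a transformation step. A minimal sketch, where `TFServingClient` is a hypothetical wrapper around the gRPC or HTTP call (the real classes in this project differ):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsServingExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "tf-serving-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Each record value is a path to an image; the RPC to the model
        // server happens per record inside mapValues
        KStream<String, String> images = builder.stream("ImageInputTopic");
        images
            .mapValues(imagePath -> TFServingClient.predict("localhost:9000", imagePath))
            .to("ImageOutputTopic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Hypothetical wrapper around the gRPC (or HTTP) call to TensorFlow Serving
    static class TFServingClient {
        static String predict(String host, String imagePath) {
            // ... build the request, call the Predict RPC, map the response to a String
            return "prediction for " + imagePath;
        }
    }
}
```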
TODO: More details are discussed in another GitHub project.
Steps:

- Install and run TensorFlow Serving locally (e.g. in a Docker container):

```
docker build --pull -t tensorflow-serving-devel -f Dockerfile.devel .
docker run -it tensorflow-serving-devel
git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving/tensorflow
./configure
cd ..
bazel test tensorflow_serving/...
```

  => Takes a long time... Better to use a prebuilt container as described below.

- Build the project:

```
mvn clean package
```

- Start Kafka and create the topics:

```
confluent start kafka
kafka-topics --zookeeper localhost:2181 --create --topic ImageInputTopic --partitions 3 --replication-factor 1
kafka-topics --zookeeper localhost:2181 --create --topic ImageOutputTopic --partitions 3 --replication-factor 1
```

- Start the Kafka Streams application:

```
java -cp target/kafka-streams-machine-learning-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_TensorFlow_Serving_gRPC_Image_Recognition_Example
java -cp target/kafka-streams-machine-learning-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Main
```

- Send test messages, e.g. with kafkacat:

```
echo -e "src/main/resources/TensorFlow_Images/dog.jpg" | kafkacat -b localhost:9092 -P -t ImageInputTopic
```

- Consume the predictions:

```
kafka-console-consumer --bootstrap-server localhost:9092 --topic ImageOutputTopic --from-beginning
```

- Find more details in the unit test...
Following https://github.com/gameofdimension/inception-java-client, pull and start the prebuilt container and forward port 9000:

```
docker run -it -p 9000:9000 tgowda/inception_serving_tika
root@8311ea4e8074:/# /serving/server.sh
```

This container hosts the model. The client just uses gRPC and Protobuf; it does not include any TensorFlow APIs.
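For illustration, such a client looks roughly like this, assuming Java stubs generated from TensorFlow Serving's prediction protos (class and method names should be verified against the stubs you actually generate):

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import tensorflow.serving.Model;
import tensorflow.serving.Predict;
import tensorflow.serving.PredictionServiceGrpc;

public class GrpcPredictClient {

    public static void main(String[] args) {
        // Plaintext channel to the TensorFlow Serving container on port 9000
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 9000)
                .usePlaintext()
                .build();

        PredictionServiceGrpc.PredictionServiceBlockingStub stub =
                PredictionServiceGrpc.newBlockingStub(channel);

        // Select the deployed model by name; the input tensors (omitted here)
        // are filled in as TensorProto entries keyed by signature input name
        Predict.PredictRequest request = Predict.PredictRequest.newBuilder()
                .setModelSpec(Model.ModelSpec.newBuilder().setName("inception"))
                .build();

        Predict.PredictResponse response = stub.predict(request);
        System.out.println(response.getOutputsMap());

        channel.shutdown();
    }
}
```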
Run the standalone client, the plain Java example, or the Kafka Streams example against the container:

```
mvn clean compile exec:java -Dexec.args="localhost:9000 example.jpg"
java -cp target/kafka-streams-machine-learning-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Main localhost:9000 src/main/resources/TensorFlow_Images/dog.jpg
java -cp target/kafka-streams-machine-learning-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_TensorFlow_Serving_gRPC_Image_Recognition_Example localhost:9000 src/main/resources/TensorFlow_Images/dog.jpg
```