Big Things Conference 2019 - Distributed Deep Learning with Keras/TensorFlow on Apache Spark

DISTRIBUTED DEEP LEARNING WITH KERAS AND TENSORFLOW ON APACHE SPARK: YES, YOU CAN!
GUGLIELMO IOZZIA, MSD
MADRID, NOVEMBER 21ST 2019 #guglielmoiozzia

ABOUT ME • Currently at MSD • Previously at • I got some awards lately • Author • I love cooking • DataOps Champion

MSD IRELAND • 50+ years • Approx. 2,000 employees • $2.5 billion investment to date • Approx. 50% of MSD's top 20 products manufactured here • Exports to 60+ countries • €6.1 billion turnover in 2017 • 2017: 300+ jobs and €280m investment for MSD Biotech, Dublin, coming in 2021 • https://www.msd-ireland.com/

CORE TOPICS • Deep Learning: what is it? • Keras and TensorFlow: two of the most popular frameworks for DL • Distributed Deep Learning on Spark: why, and why is it so difficult? • DL in Python on the JVM: why and how?

DEEP LEARNING It is a subset of Machine Learning based on Multilayer Neural Networks (MNNs).
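To make "multilayer" concrete, here is a minimal sketch of a forward pass through a tiny fully connected network in plain Python. The helper names (`relu`, `dense`, `mlp_forward`) and the weights are hypothetical, chosen only for illustration; frameworks such as Keras or DL4J perform conceptually the same computation on optimized tensor libraries.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# One layer = (weights, bias, activation); a hypothetical representation for this sketch.
Layer = Tuple[Matrix, Vector, Callable[[float], float]]


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def dense(x: Vector, weights: Matrix, bias: Vector, activation) -> Vector:
    """Fully connected layer: weights[j][i] connects input i to output unit j."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def mlp_forward(x: Vector, layers: List[Layer]) -> Vector:
    """Feed the input through each layer in turn (the 'deep' part is just stacking)."""
    for weights, bias, activation in layers:
        x = dense(x, weights, bias, activation)
    return x


# Illustrative network: 2 inputs -> 2 hidden ReLU units -> 1 linear output.
layers: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[2.0, 1.0]], [0.5], lambda v: v),
]
output = mlp_forward([3.0, 1.0], layers)
```

With these weights both hidden units output 2.0, so the single output is 2·2 + 1·2 + 0.5 = 6.5.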

DEEP LEARNING The Neural Network Zoo (Asimov Institute): http://www.asimovinstitute.org/wp-content/uploads/2019/04/NeuralNetworkZoo20042019.png

TENSORFLOW It is an end-to-end open source platform for ML. It has a comprehensive, flexible ecosystem of tools, libraries and community resources for researchers and developers. https://www.tensorflow.org/

KERAS Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It allows for easy prototyping and runs seamlessly on CPUs and GPUs. https://keras.io/

KERAS & TENSORFLOW Starting from TensorFlow r1.14

APACHE SPARK • Speed: it achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. • Ease of Use: it offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, R, and SQL shells. • Generality: combine SQL, streaming, and complex analytics. • Runs Everywhere: it runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and it can access diverse data sources.

WHEN WOULD YOU NEED TO TRAIN MNNS IN SPARK? • A cluster of machines is available for training • GPUs are scarce • The networks are very large • The data sets are huge. By the way, DL4J isn't for Spark only: you can also use it on a single machine with multiple GPUs or multiple physical processors.

CHALLENGES OF TRAINING MNNS IN SPARK • Different execution models between Spark and the DL frameworks • GPU configuration and management • Performance • Accuracy

WHY DISTRIBUTED DL ON THE JVM?

DEEPLEARNING4J It is an Open Source, distributed, Deep Learning framework written for JVM languages. It is integrated with Hadoop and Apache Spark. It can be used on distributed GPUs and CPUs.

WHY DISTRIBUTED DL ON THE JVM? TensorFlow

DL4J MODULES • DataVec • Arbiter • NN • Datasets • RL4J • DL4J-Spark • Model Import • ND4J: an open source linear algebra and matrix manipulation library that supports n-dimensional arrays and is integrated with Apache Hadoop and Spark.

DL4J + APACHE SPARK • DL4J provides a high-level API to design, configure, train and evaluate MNNs. • Spark performance is excellent, in particular for ETL and streaming, but when training an MNN some of the computation (data transformation and aggregation) needs to be done in a low-level language. • DL4J uses ND4J, whose native backend is written in C++ and which exposes a high-level Java/Scala API to developers.

MODEL IMPORT IN DL4J • Keras: Train the Model → Save it as .h5 → Load Model and Weights (KerasModelImport) → Load New Data → Predict • TensorFlow: Train the Model → Save it as .pb → Load Model and Weights (TFGraphMapper) → Load New Data → Predict • Transfer Learning


KERAS MODEL IMPORT: SUPPORTED FEATURES • Layers • Losses • Activations • Initializers • Regularizers • Constraints • Metrics • Optimizers

MODEL IMPORT IN DL4J: EXAMPLE • Keras: Train the Model → Save it as .h5 → Load Model and Weights → Load New Data → Predict • Import the VGG16 model. Test it.


DATA PARALLELISM AND MODEL PARALLELISM
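As a hedged illustration of data parallelism (the approach used when training on Spark), the sketch below gives every hypothetical "worker" a full copy of a toy linear model but only a shard of the data. Each worker computes the mean-squared-error gradient on its shard and a "driver" averages the results. The model, the round-robin sharding and all function names are assumptions made for illustration only; this is not DL4J code.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def mse_gradient(w: float, b: float, data: List[Sample]) -> Tuple[float, float]:
    """Gradient of the mean squared error for the toy model y_hat = w * x + b."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Round-robin split: each worker sees only part of the data set."""
    return [data[i::num_workers] for i in range(num_workers)]


def data_parallel_gradient(w: float, b: float, data: List[Sample],
                           num_workers: int) -> Tuple[float, float]:
    """Each worker computes a local gradient; the driver averages them."""
    grads = [mse_gradient(w, b, part) for part in shard(data, num_workers)]
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    return gw, gb


# Toy data on the line y = 3x + 1; 8 samples split across 4 workers.
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
full = mse_gradient(0.0, 0.0, data)
parallel = data_parallel_gradient(0.0, 0.0, data, num_workers=4)
```

Because all shards have the same size here, the averaged gradient matches the full-batch gradient. Model parallelism, by contrast, would split the layers or parameters of one model across devices; that variant is not shown.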

HOW TRAINING HAPPENS IN SPARK WITH DL4J • Parameter Averaging (DL4J 1.0.0-alpha) • Asynchronous SGD (DL4J 1.0.0-beta+)
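The parameter averaging scheme can be sketched as a loop of three steps: broadcast the current parameters, let each worker run a few SGD steps on its own shard, then average the resulting parameters. This is a minimal sketch on a toy linear model, assuming synchronous rounds, equal-sized shards and plain SGD. The names `local_sgd` and `parameter_averaging` and the hyperparameter values are hypothetical, and a real implementation also has to deal with optimizer (updater) state, which this sketch ignores.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (x, y)
Params = Tuple[float, float]   # (w, b) of the toy model y_hat = w * x + b


def mse_gradient(params: Params, data: List[Sample]) -> Params:
    """Gradient of the mean squared error on one worker's shard."""
    w, b = params
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb


def local_sgd(params: Params, data: List[Sample], lr: float, steps: int) -> Params:
    """Worker side: a few plain gradient steps using only the local shard."""
    w, b = params
    for _ in range(steps):
        gw, gb = mse_gradient((w, b), data)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def parameter_averaging(params: Params, data: List[Sample], num_workers: int,
                        lr: float = 0.1, local_steps: int = 5,
                        rounds: int = 200) -> Params:
    """Driver side: broadcast, train locally, then average the parameters."""
    shards = [data[i::num_workers] for i in range(num_workers)]
    for _ in range(rounds):
        results = [local_sgd(params, s, lr, local_steps) for s in shards]
        params = (sum(r[0] for r in results) / num_workers,
                  sum(r[1] for r in results) / num_workers)
    return params


# Noise-free toy data on y = 3x + 1, split across 4 workers.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(20)]
w, b = parameter_averaging((0.0, 0.0), data, num_workers=4)
```

On this noise-free data every shard shares the same optimum, so the averaged parameters converge towards w = 3, b = 1. With real, heterogeneous shards, the averaged model generally differs from what single-machine SGD would produce, which is one reason accuracy appears among the challenges listed earlier.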

HOW TRAINING HAPPENS IN SPARK WITH DL4J The key classes to be familiar with to get started with distributed training in DL4J are: • TrainingMaster: specifies how distributed training will be conducted in practice. Implementations include Gradient Sharing (SharedTrainingMaster) and Parameter Averaging (ParameterAveragingTrainingMaster). • SparkDl4jMultiLayer and SparkComputationGraph: wrappers around DL4J's MultiLayerNetwork and ComputationGraph classes that enable distributed training. • RDD<DataSet> and RDD<MultiDataSet>: Spark RDDs of DL4J DataSet or MultiDataSet objects, which are the source of the training or evaluation data.
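Gradient sharing keeps network traffic low by sending sparse, quantized updates instead of full parameter vectors. To our understanding, DL4J does this with a threshold encoding: only update entries whose accumulated magnitude reaches a threshold are transmitted, each as plus or minus the threshold, while the remainder stays on the worker as a residual that is added to the next update. The sketch below illustrates that idea only. The function names, the dictionary-based message format and the threshold value are hypothetical and do not reflect DL4J's actual wire format or API.

```python
from typing import Dict, List, Tuple


def threshold_encode(update: List[float], residual: List[float],
                     threshold: float) -> Tuple[Dict[int, float], List[float]]:
    """Hypothetical threshold encoding of one update vector.

    Returns a sparse message {index: +threshold or -threshold} and the new
    residual, so that message + new residual == update + old residual.
    """
    accumulated = [u + r for u, r in zip(update, residual)]
    message: Dict[int, float] = {}
    new_residual: List[float] = []
    for i, value in enumerate(accumulated):
        if value >= threshold:
            message[i] = threshold
            new_residual.append(value - threshold)
        elif value <= -threshold:
            message[i] = -threshold
            new_residual.append(value + threshold)
        else:
            # Too small to send yet: keep it locally for the next round.
            new_residual.append(value)
    return message, new_residual


def apply_message(params: List[float], message: Dict[int, float]) -> List[float]:
    """Receiver side: subtract the decoded sparse update from the parameters."""
    return [p - message.get(i, 0.0) for i, p in enumerate(params)]


update = [0.05, -0.3, 0.12, 0.0]   # e.g. learning rate * gradient on one worker
residual = [0.0] * len(update)
message, residual = threshold_encode(update, residual, threshold=0.1)
params = apply_message([1.0, 1.0, 1.0, 1.0], message)
```

Here only indices 1 and 2 are sent; the small entry at index 0 and the excess at index 1 stay in the residual and will be sent in a later round, so no part of the update is lost, only delayed.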

RE-TRAIN AN IMPORTED MODEL • Define the Spark context • Choose the TrainingMaster implementation • Create the Spark network • Start the training • Get the model configuration

MEMORY UTILIZATION: SOMETHING TO TAKE CARE OF Take Care of the Off-Heap Memory!

More on DL with DL4J on Spark in my book: http://tinyurl.com/y9jkvtuy

Thanks! Any questions? You can find me at @GuglielmoIozzia https://ie.linkedin.com/in/giozzia googlielmo.blogspot.com
