DISTRIBUTED DEEP LEARNING WITH KERAS AND TENSORFLOW ON APACHE SPARK: YES, YOU CAN! GUGLIELMO IOZZIA MSD MADRID, NOVEMBER 21ST 2019 #guglielmoiozzia
ABOUT ME Currently at Previously at I got some awards lately Author I love cooking DataOps Champion #guglielmoiozzia
MSD IRELAND • 50+ years • Approx. 2,000 employees • $2.5 billion investment to date • Approx. 50% of MSD’s top 20 products manufactured here • Export to 60+ countries • €6.1 billion turnover in 2017 • 2017: 300+ jobs & €280m investment for MSD Biotech, Dublin, coming in 2021 https://www.msd-ireland.com/
THE DUBLIN TECH HUB
CORE TOPICS • Deep Learning: what is it? • Keras and TensorFlow: 2 of the most popular frameworks for DL • Why Distributed Deep Learning on Spark? Why is it so difficult? • DL in Python on the JVM: why and how?
DEEP LEARNING It is a subset of Machine Learning based on multilayer neural networks (MNNs).
DEEP LEARNING (figure: The Neural Network Zoo, Asimov Institute) http://www.asimovinstitute.org/wp-content/uploads/2019/04/NeuralNetworkZoo20042019.png
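To make "multilayer" concrete, here is a minimal, dependency-free Python sketch of a forward pass through a two-layer network. The layer sizes, the weights and the choice of ReLU and sigmoid activations are illustrative assumptions, not taken from the talk; Keras, TensorFlow and DL4J perform the same computation with optimized tensor libraries.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per output unit


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: y[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def forward(x: Vector) -> Vector:
    # Hidden layer: 3 inputs -> 2 units (illustrative, hand-picked weights)
    w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    b1 = [0.0, 0.1]
    # Output layer: 2 hidden units -> 1 output
    w2 = [[1.0, -1.0]]
    b2 = [0.0]
    hidden = relu(dense(x, w1, b1))
    return sigmoid(dense(hidden, w2, b2))


if __name__ == "__main__":
    print(forward([1.0, 2.0, 3.0]))
```

Stacking more `dense` + activation pairs is what makes a network "deep"; training then adjusts the weights, which this sketch keeps fixed.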
DL FRAMEWORKS POPULARITY
TENSORFLOW It is an end-to-end open source platform for ML. It has a comprehensive, flexible ecosystem of tools, libraries and community resources for researchers and developers. https://www.tensorflow.org/
KERAS Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It allows for easy prototyping and runs seamlessly on CPUs and GPUs. https://keras.io/
KERAS & TENSORFLOW Starting from TensorFlow r1.14
APACHE SPARK • Speed: It achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. • Ease of Use: It offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, R, and SQL shells. • Generality: Combine SQL, streaming, and complex analytics. • Runs Everywhere: It runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and it can access diverse data sources.
WHEN WOULD YOU NEED TO TRAIN MNNS IN SPARK? • A cluster of machines is available for training • GPUs are scarce • Networks are very large • Data sets are huge. By the way, DL4J isn’t for Spark only: you can also use it on a single machine with multiple GPUs or multiple physical processors.
CHALLENGES OF TRAINING MNNS IN SPARK • Different execution models between Spark and the DL frameworks • GPU configuration and management • Performance • Accuracy
WHY DISTRIBUTED DL ON THE JVM?
DEEPLEARNING4J It is an Open Source, distributed, Deep Learning framework written for JVM languages. It is integrated with Hadoop and Apache Spark. It can be used on distributed GPUs and CPUs.
WHY DISTRIBUTED DL ON THE JVM? TensorFlow
DL4J MODULES • DataVec • Arbiter • NN • Datasets • RL4J • DL4J-Spark • Model Import • ND4J: an open-source linear algebra and matrix manipulation library that supports n-dimensional arrays and is integrated with Apache Hadoop and Spark.
DL4J + APACHE SPARK • DL4J provides a high-level API to design, configure, train and evaluate MNNs. • Spark performs excellently for ETL and streaming, but when training MNNs some of the heavy numerical transformations and aggregations are better done in a low-level language. • DL4J uses ND4J, whose computational core (libnd4j) is written in C++ and which exposes a high-level API to JVM developers (Java, and Scala through ND4S).
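ND4J represents each n-dimensional array as a flat buffer described by a shape and strides and, much like NumPy, supports both 'c' (row-major) and 'f' (column-major) ordering. The plain-Python sketch below is not ND4J code and its function names are hypothetical; it only shows how the strides and the flat offset of an element follow from the shape and the ordering.

```python
from typing import List, Sequence


def strides(shape: Sequence[int], order: str = "c") -> List[int]:
    """Element strides of a dense array: 'c' = row-major, 'f' = column-major."""
    if order not in ("c", "f"):
        raise ValueError("order must be 'c' or 'f'")
    # Row-major: the last axis varies fastest; column-major: the first axis does
    dims = list(shape) if order == "f" else list(reversed(shape))
    result, step = [], 1
    for d in dims:
        result.append(step)
        step *= d
    return result if order == "f" else list(reversed(result))


def offset(index: Sequence[int], shape: Sequence[int], order: str = "c") -> int:
    """Position of a multi-dimensional index inside the flat buffer."""
    return sum(i * s for i, s in zip(index, strides(shape, order)))


if __name__ == "__main__":
    print(strides((2, 3, 4), "c"), strides((2, 3, 4), "f"))
```

The same element lands at different buffer positions depending on the ordering, which is why libraries track the ordering alongside the shape.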
MODEL IMPORT IN DL4J • Keras: Train the Model → Save it as .h5 → Load Model and Weights (KerasModelImport) → Load New Data → Predict • TensorFlow: Train the Model → Save it as .pb → Load Model and Weights (TFGraphMapper) → Load New Data → Predict • Transfer Learning
MODEL IMPORT IN DL4J • Keras: Train the Model → Save it as .h5 → Load Model and Weights → Load New Data → Predict • TensorFlow: Train the Model → Save it as .pb → Load Model and Weights → Load New Data → Predict
KERAS MODEL IMPORT: SUPPORTED FEATURES • Layers • Losses • Activations • Initializers • Regularizers • Constraints • Metrics • Optimizers
MODEL IMPORT IN DL4J: EXAMPLE • Keras: Train the Model → Save it as .h5 → Load Model and Weights → Load New Data → Predict • Import the VGG16 model and test it.
MODEL IMPORT IN DL4J: EXAMPLE • Keras: Train the Model → Save it as .h5 → Load Model and Weights → Load New Data → Predict
MODEL IMPORT IN DL4J: EXAMPLE • Keras: Train the Model → Save it as .h5 → Load Model and Weights → Load New Data → Predict
MODEL IMPORT IN DL4J: EXAMPLE • Keras: Train the Model → Save it as .h5 → Load Model and Weights → Load New Data → Predict
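Testing an imported VGG16 model only makes sense if new images are preprocessed the way the original network expects. For the Keras VGG16 weights this is the "caffe" convention of `keras.applications.vgg16.preprocess_input`: swap RGB to BGR and subtract the ImageNet per-channel means, with no rescaling. The plain-Python sketch below reproduces that arithmetic on a list of pixels and picks the top-scoring classes; the function names and toy inputs are hypothetical, and in DL4J this step is normally handled by the built-in `VGG16ImagePreProcessor` rather than written by hand.

```python
from typing import List, Sequence, Tuple

# ImageNet channel means used by the Keras VGG16 "caffe" preprocessing, in BGR order
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)

Pixel = Tuple[float, float, float]


def preprocess_vgg16(pixels_rgb: Sequence[Pixel]) -> List[Pixel]:
    """Swap RGB -> BGR, then zero-center each channel (no rescaling to [0, 1])."""
    out = []
    for r, g, b in pixels_rgb:
        out.append((b - IMAGENET_MEAN_BGR[0],
                    g - IMAGENET_MEAN_BGR[1],
                    r - IMAGENET_MEAN_BGR[2]))
    return out


def top_k(scores: Sequence[float], k: int = 5) -> List[int]:
    """Indices of the k highest class scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    print(preprocess_vgg16([(255.0, 0.0, 0.0)]))
    print(top_k([0.1, 0.7, 0.2], k=2))
```

Feeding unnormalized 0-255 RGB values to the imported network usually still produces predictions, just wrong ones, so this is worth checking first when test results look off.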
DL4J MODEL IMPORT IN ACTION
DATA PARALLELISM AND MODEL PARALLELISM
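In data parallelism every worker holds a full copy of the model and processes its own slice of the data; in model parallelism the model itself is split across devices. DL4J's Spark training is data parallel. The plain-Python sketch below assumes a one-parameter linear model and a round-robin split standing in for RDD partitions (both illustrative choices) and shows why combining per-shard gradients, weighted by shard size, reproduces the full-batch gradient.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pairs for the model y ~= w * x


def grad_mse(w: float, batch: Sequence[Example]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(data: Sequence[Example], n_workers: int) -> List[List[Example]]:
    """Round-robin split, a stand-in for RDD partitions."""
    return [list(data[i::n_workers]) for i in range(n_workers)]


def data_parallel_grad(w: float, data: Sequence[Example], n_workers: int) -> float:
    """Every 'worker' sees the full model (here just w) but only its own shard."""
    shards = [s for s in shard(data, n_workers) if s]
    local = [grad_mse(w, s) for s in shards]  # would run in parallel on a cluster
    # Weight by shard size so the result equals the full-batch gradient
    total = sum(len(s) for s in shards)
    return sum(g * len(s) for g, s in zip(local, shards)) / total


def train(data: Sequence[Example], n_workers: int, lr: float = 0.01, steps: int = 100) -> float:
    w = 0.0
    for _ in range(steps):
        w -= lr * data_parallel_grad(w, data, n_workers)
    return w


if __name__ == "__main__":
    points = [(float(i), 3.0 * i) for i in range(1, 11)]
    print(train(points, n_workers=4))  # approaches the true slope 3.0
```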
HOW TRAINING HAPPENS IN SPARK WITH DL4J • Parameter Averaging (DL4J 1.0.0-alpha) • Asynchronous SGD (DL4J 1.0.0-beta+)
HOW TRAINING HAPPENS IN SPARK WITH DL4J The key classes to be familiar with to get started with distributed training in DL4J are: • TrainingMaster: it specifies how distributed training is conducted in practice. Implementations include gradient sharing (SharedTrainingMaster) and parameter averaging (ParameterAveragingTrainingMaster). • SparkDl4jMultiLayer and SparkComputationGraph: wrappers around the DL4J MultiLayerNetwork and ComputationGraph classes that add the functionality needed for distributed training. • RDD<DataSet> and RDD<MultiDataSet>: Spark RDDs of DL4J DataSet or MultiDataSet objects, which provide the training or evaluation data.
RE-TRAIN AN IMPORTED MODEL • Define the Spark Context • Choose the TrainingMaster implementation • Create the Spark network • Start the training • Get the model configuration
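Parameter averaging works in rounds: the master broadcasts the current parameters, each worker fits a few minibatches on its own partition, and the master replaces its parameters with the average of the workers' results. The plain-Python sketch below assumes the same toy one-parameter linear model on every worker; its `averaging_frequency` argument plays the role of the `averagingFrequency` setting of DL4J's `ParameterAveragingTrainingMaster`, but the code is only an illustration of the technique, not DL4J's implementation.

```python
from typing import Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pairs for the model y ~= w * x


def grad_mse(w: float, batch: Sequence[Example]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def local_sgd(w: float, shard: Sequence[Example], lr: float, steps: int) -> float:
    """Worker side: plain SGD on its own shard, starting from the broadcast w."""
    for _ in range(steps):
        w -= lr * grad_mse(w, shard)
    return w


def parameter_averaging(data: Sequence[Example], n_workers: int, rounds: int,
                        averaging_frequency: int, lr: float = 0.01) -> float:
    shards = [list(data[i::n_workers]) for i in range(n_workers)]
    w = 0.0  # master copy of the parameters
    for _ in range(rounds):
        # 1. broadcast w; 2. each worker runs 'averaging_frequency' local steps
        local = [local_sgd(w, s, lr, averaging_frequency) for s in shards if s]
        # 3. the master replaces its parameters with the plain average
        w = sum(local) / len(local)
    return w


if __name__ == "__main__":
    points = [(float(i), 3.0 * i) for i in range(1, 11)]
    print(parameter_averaging(points, n_workers=4, rounds=20, averaging_frequency=5))
```

A larger averaging frequency means less network traffic per minibatch but lets workers drift further apart between synchronizations.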
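Gradient sharing in DL4J builds on Nikko Strom's 2015 threshold-encoding idea: instead of shipping dense parameter vectors, each worker sends only the update entries whose accumulated magnitude reaches a threshold τ, as ±τ, and keeps the remainder in a local residual for later steps. The plain-Python sketch below is a simplified, single-step illustration with hypothetical function names; DL4J's real implementation also adapts the threshold over time and uses a compact binary encoding, both omitted here.

```python
from typing import Dict, List, Tuple


def threshold_encode(update: List[float], residual: List[float],
                     tau: float) -> Tuple[Dict[int, float], List[float]]:
    """Add the update to the residual, emit +/-tau where |value| >= tau,
    and carry everything that was not sent over to the next step."""
    acc = [u + r for u, r in zip(update, residual)]
    message: Dict[int, float] = {}
    new_residual: List[float] = []
    for i, a in enumerate(acc):
        if a >= tau:
            message[i] = tau
            new_residual.append(a - tau)
        elif a <= -tau:
            message[i] = -tau
            new_residual.append(a + tau)
        else:
            new_residual.append(a)  # too small to send yet
    return message, new_residual


def apply_message(params: List[float], message: Dict[int, float]) -> List[float]:
    """Every worker applies the same sparse message to its parameter copy.
    Updates are treated as gradient-like quantities, so they are subtracted."""
    out = list(params)
    for i, v in message.items():
        out[i] -= v
    return out


if __name__ == "__main__":
    msg, res = threshold_encode([0.5, -0.05, -0.3], [0.0, 0.0, 0.0], tau=0.1)
    print(msg, res)
    print(apply_message([1.0, 1.0, 1.0], msg))
```

Because nothing is discarded (what is not sent stays in the residual), small updates are delayed rather than lost, which is what keeps the sparse messages from hurting accuracy too much.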
DL4J VISUAL FACILITIES
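MEMORY UTILIZATION: SOMETHING TO TAKE CARE OF ND4J allocates its arrays off-heap, outside the garbage-collected JVM heap, so a Spark executor needs memory beyond its heap (-Xmx) setting. Take Care of the Off-Heap Memory!
More on DL with DL4J on Spark in my book http://tinyurl.com/y9jkvtuy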
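A hedged sizing sketch, assuming Spark 2.3+ on YARN or Kubernetes (where the setting is `spark.executor.memoryOverhead`; older YARN deployments use `spark.yarn.executor.memoryOverhead`). It limits ND4J's off-heap arrays through the JavaCPP system property `org.bytedeco.javacpp.maxbytes`, caps the whole process with `org.bytedeco.javacpp.maxphysicalbytes` (which therefore has to cover heap plus off-heap), and reserves the off-heap amount as executor overhead. The helper function and the 1 GB safety margin are hypothetical choices, not recommendations from the talk.

```python
from typing import Dict


def executor_memory_settings(heap_gb: int, offheap_gb: int, extra_gb: int = 1) -> Dict[str, str]:
    """Hypothetical helper: size a Spark executor for DL4J/ND4J training.

    heap_gb    -> JVM heap, set through spark.executor.memory
    offheap_gb -> ND4J off-heap arrays, capped by JavaCPP's maxbytes property
    extra_gb   -> assumed safety margin for native libraries, threads, etc.
    """
    # maxphysicalbytes limits the whole process, so it must cover heap + off-heap
    max_physical_gb = heap_gb + offheap_gb + extra_gb
    java_opts = (
        f"-Dorg.bytedeco.javacpp.maxbytes={offheap_gb}G "
        f"-Dorg.bytedeco.javacpp.maxphysicalbytes={max_physical_gb}G"
    )
    return {
        "spark.executor.memory": f"{heap_gb}g",
        # Non-heap memory the cluster manager must reserve on top of the heap
        "spark.executor.memoryOverhead": f"{offheap_gb + extra_gb}g",
        "spark.executor.extraJavaOptions": java_opts,
    }


if __name__ == "__main__":
    for key, value in executor_memory_settings(heap_gb=4, offheap_gb=6).items():
        print(f"--conf {key}={value}")
```

The heap goes through `spark.executor.memory` because Spark does not allow `-Xmx` inside `spark.executor.extraJavaOptions`; if the off-heap limit is not reserved as overhead, the cluster manager may kill executors that are well within their heap limit.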
Thanks! Any questions? You can find me at @GuglielmoIozzia https://ie.linkedin.com/in/giozzia googlielmo.blogspot.com

Big Things Conference 2019 - Distributed Deep Learning with Keras/TensorFlow on Apache Spark
