Java and Deep Learning (Introduction) Java Meetup SF Pivotal Labs January 10, 2018 Oswald Campesato ocampesato@yahoo.com
Session Overview partial intro/overview of AI/ML/DL hyper-parameters a simple neural network linear regression cost/activation functions gradient descent/back propagation CNNs and RNNs Java code (CNN and TensorFlow)
The Data/AI Landscape
Gartner 2017: Deep Learning (YES!)
Neural Network with 3 Hidden Layers
The Official Start of AI (1956)
AI/ML/DL: How They Differ Traditional AI (20th century): based on collections of rules Led to expert systems in the 1980s The era of LISP and Prolog
AI/ML/DL: How They Differ Machine Learning: Started in the 1950s (approximate) Alan Turing and “learning machines” Data-driven (not rule-based) Many types of algorithms Involves optimization
AI/ML/DL: How They Differ Deep Learning: Started in the 1950s (approximate) The “perceptron” (basis of NNs) Data-driven (not rule-based) large (even massive) data sets Involves neural networks (CNNs: ~1970s) Lots of heuristics Heavily based on empirical results
The Rise of Deep Learning Massive and inexpensive computing power Huge volumes of data/Powerful algorithms The “big bang” in 2009: “deep-learning neural networks and NVidia GPUs” Google Brain used NVidia GPUs (2009)
AI/ML/DL: Commonality All of them involve a model A model represents a system Goal: a good predictive model The model is based on: Many rules (for AI) data and algorithms (for ML) large sets of data (for DL)
Clustering Example #1 Given some red dots and blue dots Red dots are in the upper half plane Blue dots in the lower half plane How to detect if a point is red or blue?
Clustering Example #1
Clustering Example #1
Clustering Example #2 Given some red dots and blue dots Red dots are inside a unit square Blue dots are outside the unit square How to detect if a point is red or blue?
Clustering Example #2  Two input nodes X and Y  One hidden layer with 4 nodes (one per side)  X & Y weights are the (x,y) values of the inward-pointing perpendicular (normal) vector of each side  The threshold values are the negative of the y-intercept (or the x-intercept)  The outbound weights are all equal to 1  The threshold for the output node is 4
Clustering Example #2
Clustering Example #2
Clustering Exercises #1 Describe an NN for a triangle Describe an NN for a pentagon Describe an NN for an n-gon (convex) Describe an NN for an n-gon (non-convex)
Clustering Exercises #2 Create an NN for an OR gate Create an NN for a NOR gate Create an NN for an AND gate Create an NN for a NAND gate Create an NN for an XOR gate => requires a hidden layer (XOR is not linearly separable)
Clustering Exercises #3 Convert example #2 to a 3D cube
Clustering Example #2 A few points to keep in mind: A “step” activation function (0 or 1) No back propagation No cost function => no learning involved
A 2D Linear Regression Model Perform the following steps: 1) Start with a simple model (2 variables) 2) Generalize that model (n variables) 3) See how it might apply to a NN
Linear Regression Details One of the simplest models in ML Fits a line (y = m*x + b) to data in 2D Finds the best line by minimizing the MSE Both the slope m and the intercept b have closed-form solutions: m = cov(x,y)/var(x) and b = mean(y) - m*mean(x)
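A quick NumPy sketch of the closed-form fit (the toy data is invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 * x + 5 + rng.normal(0, 1, size=x.shape)   # toy data: y ~ 3x + 5 plus noise

m = np.cov(x, y, bias=True)[0, 1] / np.var(x)    # slope:     cov(x,y)/var(x)
b = y.mean() - m * x.mean()                      # intercept: mean(y) - m*mean(x)
print(m, b)                                      # close to 3 and 5
print(np.mean((y - (m * x + b)) ** 2))           # the minimized MSE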
Linear Regression in 2D: graph
Sample Cost Function #1 (MSE)
Linear Regression: example #1 One feature (independent variable): X = number of square feet Predicted value (dependent variable): Y = cost of a house A very “coarse grained” model We can devise a much better model
Linear Regression: example #2 Multiple features: X1 = # of square feet X2 = # of bedrooms X3 = # of bathrooms (dependency?) X4 = age of house X5 = cost of nearby houses X6 = corner lot (or not): Boolean a much better model (6 features)
Linear Multivariate Analysis General form of multivariate equation: Y = w1*x1 + w2*x2 + . . . + wn*xn + b w1, w2, . . . , wn are numeric values x1, x2, . . . , xn are variables (features) Properties of variables: Can be independent (Naïve Bayes) weak/strong dependencies can exist
Neural Network with 3 Hidden Layers
Neural Networks: equations Node “values” in first hidden layer:
N1 = w11*x1 + w21*x2 + … + wn1*xn
N2 = w12*x1 + w22*x2 + … + wn2*xn
N3 = w13*x1 + w23*x2 + … + wn3*xn
. . .
Nn = w1n*x1 + w2n*x2 + … + wnn*xn
Similar equations for other pairs of layers
Neural Networks: Matrices From inputs to first hidden layer: Y1 = W1*X + B1 (X/Y1/B1: vectors; W1: matrix) From first to second hidden layers: Y2 = W2*X + B2 (X/Y2/B2: vectors; W2: matrix) From second to third hidden layers: Y3 = W3*X + B3 (X/Y3/B3: vectors; W3: matrix)  Apply an “activation function” to y values
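A minimal NumPy sketch of these matrix equations (the layer sizes and the sigmoid activation are illustrative assumptions, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
sizes = [4, 5, 5, 5, 3]   # input, three hidden layers, output

# small random initial weights (see the next slide)
Ws = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
Bs = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    for W, B in zip(Ws, Bs):
        x = sigmoid(W @ x + B)   # Y = W*X + B, then the activation function
    return x

print(forward(rng.normal(size=4)))   # output vector of length 3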
Neural Networks (general) Multiple hidden layers: Layer composition is your decision Activation functions: sigmoid, tanh, ReLU https://en.wikipedia.org/wiki/Activation_function Back propagation (1980s) https://en.wikipedia.org/wiki/Backpropagation => Initial weights: small random numbers
Euler’s Function
The sigmoid Activation Function
The tanh Activation Function
The ReLU Activation Function
The softmax Activation Function
Activation Functions in Python
import numpy as np
...
# Python sigmoid example:
z = 1/(1 + np.exp(-np.dot(W, x)))
...
# Python tanh example:
z = np.tanh(np.dot(W, x))
# Python ReLU example:
z = np.maximum(0, np.dot(W, x))
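The slide omits softmax; a minimal NumPy version might look like this (subtracting max(z) is the usual trick for numerical stability):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by max(z) for numerical stability
    return e / e.sum()          # outputs are positive and sum to 1

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.659 0.242 0.099]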
What’s the “Best” Activation Function? Initially: sigmoid was popular Then: tanh became popular Now: ReLU is preferred (better results) Softmax: for FC (fully connected) layers NB: sigmoid and tanh are used in LSTMs
Even More Activation Functions! https://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons https://medium.com/towards-data-science/activation-functions-and-its-types-which-is-better-a9a5310cc8f https://medium.com/towards-data-science/multi-layer-neural-networks-with-sigmoid-function-deep-learning-for-rookies-2-bf464f09eb7f
Sample Cost Function #1 (MSE)
Sample Cost Function #2
Sample Cost Function #3
How to Select a Cost Function 1) Depends on the learning type: => supervised/unsupervised/RL 2) Depends on the activation function 3) Other factors Example: cross-entropy cost function for supervised learning on multiclass classification
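For instance, a minimal NumPy sketch of the multiclass cross-entropy cost (the one-hot label and predicted probabilities are made up):

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot label; y_pred: predicted class probabilities (e.g. softmax output)
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0.0, 1.0, 0.0])      # the true class is #1
y_pred = np.array([0.2, 0.7, 0.1])
print(cross_entropy(y_true, y_pred))    # -log(0.7) ~ 0.357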
GD versus SGD SGD (Stochastic Gradient Descent): + involves a SUBSET of the dataset + aka Minibatch Stochastic Gradient Descent GD (Gradient Descent): + involves the ENTIRE dataset More details: http://cs229.stanford.edu/notes/cs229-notes1.pdf
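A sketch of the difference, reusing the 2D linear-regression model from earlier (toy data; the only change between GD and minibatch SGD is how much of the dataset feeds each weight update):

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 100)
y = 3 * x + 5 + rng.normal(0, 1, size=x.shape)

m, b, lr, batch = 0.0, 0.0, 0.01, 16
for epoch in range(200):
    idx = rng.permutation(len(x))
    for s in range(0, len(x), batch):
        i = idx[s:s + batch]                # a SUBSET (SGD); use all indices for GD
        err = (m * x[i] + b) - y[i]
        m -= lr * 2 * np.mean(err * x[i])   # gradient of MSE w.r.t. m
        b -= lr * 2 * np.mean(err)          # gradient of MSE w.r.t. b
print(m, b)   # approaches 3 and 5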
Setting up Data & the Model Normalize the data: Subtract the ‘mean’ and divide by stddev [Central Limit Theorem] Initial weight values for NNs: Random numbers in N(0,1) More details: http://cs231n.github.io/neural-networks-2/#losses
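Both steps in NumPy (a sketch with a made-up feature matrix and weight shape):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(5.0, 2.0, size=(100, 6))         # toy data: 100 rows, 6 features
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit stddev per feature
W = rng.normal(0.0, 1.0, size=(10, 6))          # initial weights drawn from N(0,1)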
What are Hyper Parameters? higher level concepts about the model such as complexity, or capacity to learn Cannot be learned directly from the data in the standard model training process must be predefined
Hyper Parameters (examples) # of hidden layers in a neural network the learning rate (in many models) the dropout rate # of leaves or depth of a tree # of latent factors in a matrix factorization # of clusters in a k-means clustering
Hyper Parameter: dropout rate "dropout" refers to dropping out units (both hidden and visible) in a neural network a regularization technique for reducing overfitting in neural networks prevents complex co-adaptations on training data a very efficient way of performing model averaging with neural networks
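A minimal sketch of (inverted) dropout in NumPy; rescaling the survivors by 1/(1 - rate) keeps the expected activation unchanged, so nothing special is needed at test time:

import numpy as np

def dropout(a, rate, training=True, rng=np.random.default_rng()):
    if not training:
        return a                                       # no-op at test time
    mask = (rng.random(a.shape) >= rate) / (1.0 - rate)
    return a * mask                                    # drop ~rate of the units, rescale the rest

print(dropout(np.ones(10), rate=0.25))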
How Many Layers in a DNN? Algorithm #1 (from Geoffrey Hinton): 1) add layers until you start overfitting your training set 2) now add dropout or another regularization method Algorithm #2 (Yoshua Bengio): "Add layers until the test error does not improve anymore."
How Many Hidden Nodes in a DNN? Based on a relationship between: # of input and # of output nodes Amount of training data available Complexity of the cost function The training algorithm
CNNs versus RNNs CNNs (Convolutional NNs): Good for image processing 2000: CNNs processed 10-20% of all checks => CNNs: approximately 60% of all NNs in use RNNs (Recurrent NNs): Good for NLP and audio
CNNs: convolution-pooling (1)
CNNs: Convolution Calculations https://docs.gimp.org/en/plug-in-convmatrix.html
CNNs: Convolution Matrices (examples) Sharpen: Blur:
CNNs: Convolution Matrices (examples) Edge detect: Emboss:
CNNs: Sample Convolutions/Filters
CNNs: Max Pooling Example
CNNs: convolution and pooling (2)
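A minimal NumPy sketch of both operations, using a standard edge-detect kernel (the matrices on the earlier slides were images, so these values are illustrative; note that what DL libraries call “convolution” is technically cross-correlation):

import numpy as np

def conv2d(img, kernel):
    # 'valid' 2-D convolution (cross-correlation, as in DL libraries)
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(img, size=2):
    # keep the max of each size-by-size block
    h, w = (img.shape[0] // size) * size, (img.shape[1] // size) * size
    return img[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

edge = np.array([[-1., -1., -1.],
                 [-1.,  8., -1.],
                 [-1., -1., -1.]])   # a simple edge-detect kernel

img = np.random.default_rng(5).random((8, 8))
print(max_pool(conv2d(img, edge)).shape)   # (3, 3)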
Sample CNN in Keras (fragment)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import Adadelta
input_shape = (3, 32, 32)
nb_classes = 10
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
GANs: Generative Adversarial Networks
GANs: Generative Adversarial Networks Make imperceptible changes to images Can consistently defeat all NNs Can have extremely high error rate Some images create optical illusions https://www.quora.com/What-are-the-pros-and-cons- of-using-generative-adversarial-networks-a-type-of- neural-network
GANs: Generative Adversarial Networks Create your own GANs: https://www.oreilly.com/learning/generative-adversarial-networks-for- beginners https://github.com/jonbruner/generative-adversarial-networks GANs from MNIST: http://edwardlib.org/tutorials/gan
GANs: Generative Adversarial Networks GANs, Graffiti, and Art: https://thenewstack.io/camouflaged-graffiti-road-signs-can-fool- machine-learning-models/ GANs and audio: https://www.technologyreview.com/s/608381/ai-shouldnt-believe- everything-it-hears Houdini algorithm: https://arxiv.org/abs/1707.05373
Deep Learning Playground TF playground home page: http://playground.tensorflow.org Demo #1: https://github.com/tadashi-aikawa/typescript-playground Converts playground to TypeScript
Java and DL/ML Frameworks Deeplearning4j: Pure Java framework for DL SMILE: “Statistical Machine Intelligence and Learning Engine” “outperforms R, Python, Spark, H2O significantly” https://haifengl.github.io/smile/ Weka (“WAY-kuh”): https://github.com/Waikato/wekaDeeplearning4j Neuroph (a Java NN framework): http://neuroph.sourceforge.net/download.html
Deeplearning4j Library Open source, distributed library for the JVM https://deeplearning4j.org/ https://github.com/deeplearning4j/deeplearning4j Written in Java and Scala (GPU support) Integrates with Hadoop and Spark https://deeplearning4j.org/gettingstarted.html
Deeplearning4j Library Basic set-up steps (command line): mkdir dl4j-examples cd dl4j-examples git clone https://github.com/deeplearning4j/dl4j- examples
Deeplearning4j Library Set-up steps for IntelliJ  File > Import Project (or New Project from Existing Sources)  Select the directory with the DL4J examples.  Select Maven build tool in the next window  Check the following two boxes:  1) "Search for projects recursively"  2) "Import Maven projects automatically” (Next)  click on "+" sign (bottom of window) to JDK/SDK  Click through until you reach "Finish"
Smile Framework Support for many algorithms classification, regression, clustering association rule mining, feature selection manifold learning, multidimensional scaling genetic algorithm, missing value imputation efficient nearest neighbor search
Smile Framework Natural Language Processing: Tokenizers, stemming, phrase detection part-of-speech tagging, keyword extraction named entity recognition, sentiment analysis relevance ranking, taxomony
Smile Framework Mathematics and Statistics linear algebra (LU decomposition) Cholesk decomposition, QR decomposition eigenvalue decomposition singular value decomposition band matrix, and sparse matrix tests: t-test, F-test, chi-square test correlation test (Pearson, Spearman, Kendall) Kolmogorov-Smirnov test distributions/random number generators interpolation, sorting, wavelet, plot
Smile Framework Random Forest (SMILE Scala API): val data = read.arff("iris.arff", 4) val (x, y) = data.unzipInt val rf = randomForest(x, y) println(s"OOB error = ${rf.error}") rf.predict(x(0))
What is TensorFlow? An open source framework for ML and DL A “computation” graph Created by Google (released 11/2015) Evolved from Google Brain Linux and Mac OS X support (VM for Windows) TF home page: https://www.tensorflow.org/
What is TensorFlow? Support for Python, Java, C++ TPUs available for faster processing Can be embedded in Python scripts Installation: pip install tensorflow TensorFlow cluster: https://www.tensorflow.org/deploy/distributed
What is a Tensor? TF tensors are n-dimensional arrays TF tensors are very similar to numpy ndarrays scalar number: a zeroth-order tensor vector: a first-order tensor matrix: a second-order tensor 3-dimensional array: a 3rd order tensor https://dzone.com/articles/tensorflow-simplified- examples
TensorFlow: constants (immutable)
import tensorflow as tf   # tf-const.py
aconst = tf.constant(3.0)
print(aconst)
# output: Tensor("Const:0", shape=(), dtype=float32)
sess = tf.Session()
print(sess.run(aconst))   # output: 3.0
sess.close()
# => there's a better way…
TensorFlow: constants import tensorflow as tf # tf-const2.py aconst = tf.constant(3.0) print(aconst) Automatically close “sess” with tf.Session() as sess:  print(sess.run(aconst))
TensorFlow Arithmetic
import tensorflow as tf   # basic1.py
a = tf.add(4, 2)
b = tf.subtract(8, 6)
c = tf.multiply(a, 3)
d = tf.div(a, 6)
with tf.Session() as sess:
    print(sess.run(a))   # 6
    print(sess.run(b))   # 2
    print(sess.run(c))   # 18
    print(sess.run(d))   # 1
TensorFlow Arithmetic Methods
import tensorflow as tf   # tf-math-ops.py
PI = 3.141592
sess = tf.Session()
print(sess.run(tf.div(12, 8)))
print(sess.run(tf.floordiv(20.0, 8.0)))
print(sess.run(tf.sin(PI)))
print(sess.run(tf.cos(PI)))
print(sess.run(tf.div(tf.sin(PI/4.), tf.cos(PI/4.))))
TensorFlow Arithmetic Methods Output from tf-math-ops.py: 1 2.0 6.27833e-07 -1.0 1.0
TensorFlow: placeholders example
import tensorflow as tf   # tf-var-multiply.py
a = tf.placeholder("float")
b = tf.placeholder("float")
c = tf.multiply(a, b)
# initialize a and b:
feed_dict = {a: 2, b: 3}
# multiply a and b:
with tf.Session() as sess:
    print(sess.run(c, feed_dict))
TensorFlow fetch/feed_dict
import tensorflow as tf   # fetch-feeddict.py
# y = W*x + b: W and x are 1d arrays
W = tf.constant([10, 20], name='W')
x = tf.placeholder(tf.int32, name='x')
b = tf.placeholder(tf.int32, name='b')
Wx = tf.multiply(W, x, name='Wx')
y = tf.add(Wx, b, name='y')
TensorFlow fetch/feed_dict
with tf.Session() as sess:
    print("Result 1: Wx = ", sess.run(Wx, feed_dict={x: [5, 10]}))
    print("Result 2: y = ", sess.run(y, feed_dict={x: [5, 10], b: [15, 25]}))
# Result 1: Wx = [50 200]
# Result 2: y = [65 225]
TensorFlow Arithmetic Expressions
import tensorflow as tf   # tf-save-data.py
x = tf.constant(5, name="x")
y = tf.constant(8, name="y")
z = tf.Variable(2*x + 3*y, name="z")
model = tf.global_variables_initializer()
with tf.Session() as session:
    writer = tf.summary.FileWriter("./tf_logs", session.graph)
    session.run(model)
    print('z = ', session.run(z))   # => z = 34
# tensorboard --logdir=./tf_logs
TensorFlow Eager Execution An imperative interface to TF (experimental) Fast debugging & immediate run-time errors Eager execution is not included in v1.4 of TF build TF from source or install the nightly build pip install tf-nightly # CPU pip install tf-nightly-gpu #GPU
TensorFlow Eager Execution integration with Python tools Supports dynamic models + Python control flow support for custom and higher-order gradients Supports most TensorFlow operations https://research.googleblog.com/2017/10/eager- execution-imperative-define-by.html
TensorFlow Eager Execution import tensorflow as tf # tf-eager1.py import tensorflow.contrib.eager as tfe tfe.enable_eager_execution() x = [[2.]] m = tf.matmul(x, x) print(m)
Android and Deep Learning TensorFlow Lite (announced 2017 Google I/O) A subset of the TensorFlow APIs (which ones?) Provides “regular” TensorFlow APIs for apps Does not require Python scripts (?)
Deep Learning and Art “Convolutional Blending” images: => 19-layer Convolutional Neural Network www.deepart.io Prisma: Android app with CNN https://www.fastcodesign.com/90124942/this-google-engineer-taught-an-algorithm-to-make-train-footage-and-its-hypnotic
What Do I Learn Next?  PGMs (Probabilistic Graphical Models)  MC (Markov Chains)  MCMC (Markov Chains Monte Carlo)  HMMs (Hidden Markov Models)  RL (Reinforcement Learning)  Hopfield Nets  Neural Turing Machines  Autoencoders  Hypernetworks  Pixel Recurrent Neural Networks  Bayesian Neural Networks  SVMs
About Me: Recent Books 1) HTML5 Canvas and CSS3 Graphics (2013) 2) jQuery, CSS3, and HTML5 for Mobile (2013) 3) HTML5 Pocket Primer (2013) 4) jQuery Pocket Primer (2013) 5) HTML5 Mobile Pocket Primer (2014) 6) D3 Pocket Primer (2015) 7) Python Pocket Primer (2015) 8) SVG Pocket Primer (2016) 9) CSS3 Pocket Primer (2016) 10) Android Pocket Primer (2017) 11) Angular Pocket Primer (2017) 12) Data Cleaning Pocket Primer (2018) 13) RegEx Pocket Primer (2018)
About Me: Training => Deep Learning, Keras, and TensorFlow: http://codeavision.io/training/deep-learning-workshop => Mobile and TensorFlow Lite => R and Deep Learning (Keras and TensorFlow) => Android for Beginners