Machine Learning: Make Your Ruby Code Smarter
Boris Nadion, boris@astrails.com, @borisnadion
astrails http://astrails.com
awesome web and mobile apps since 2005
terms
AI (artificial intelligence) - the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages
ML (machine learning) - a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
without being explicitly programmed
FF NN cost function
FF NN cost function (I’m kidding)
cost function with regularization
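For reference, the formula that flashes by on these slides is presumably the standard regularized cost function of a feed-forward neural net (Ng-style notation, K output units, L layers):

\[ J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y^{(i)}_k \log\bigl(h_\Theta(x^{(i)})\bigr)_k + \bigl(1 - y^{(i)}_k\bigr)\log\bigl(1 - (h_\Theta(x^{(i)}))_k\bigr) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\bigl(\Theta^{(l)}_{j,i}\bigr)^2 \]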
2 types of ML: supervised learning and unsupervised learning
supervised: the training data is labeled, i.e. we know the correct answers
unsupervised: the training data is not labeled, i.e. we have to figure out hidden correlations ourselves
linear regression supervised learning
(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)): m training examples (x^(i), y^(i)), where x^(i) is a feature and y^(i) is a label
training set → learning algorithm → hθ(x); x (new data) → hθ(x) → y (prediction)
hθ(x) = hypothesis
(x, y): y = hθ(x) = θ0 + θ1x; the goal is to find θ0 and θ1
hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn: many features, n is the number of features
size, sq.m (x1) | # rooms (x2) | age (x3) | price (y)
             80 |            3 |       22 |      2.9M
             90 |            4 |       24 |      3.1M
             75 |            3 |       28 |      2.5M
            110 |            5 |       20 |      3.3M
prices are in NIS: 1 USD = 3.85 NIS
hθ(x) = θ0 + θ1x1; sum the prediction errors over the training set
Linear Regression Cost Function
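The slide presumably shows the standard squared-error cost:

\[ J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \]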
minimize J(θ): finding a minimum of the cost function = “learning”
gradient descent (batch, stochastic, etc.) or advanced optimization algorithms find a global (sometimes local) minimum of the cost function J; α is the learning rate, a parameter of gradient descent
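Each step nudges every parameter against the gradient of the cost; the standard update rule (not spelled out on the slide) is:

\[ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \quad \text{(simultaneously for all } j\text{)} \]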
(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)) → gradient descent → θ0, θ1, θ2, …, θn (magic inside)
hθ(x) = θ0 + θ1x1 + θ2x2 + …+ θnxn we’re ready to predict
feature scaling: 0 ≤ x ≤ 1
size, sq.m | size, sq.m / 110 (x1)
        80 | 0.72
        90 | 0.81
        75 | 0.68
       110 | 1
mean normalization: average value of the feature is ~0, -0.5 ≤ x ≤ 0.5
size, sq.m | (size, sq.m / 110) - 0.8025 (x1)
        80 | -0.0825
        90 | 0.0075
        75 | -0.1225
       110 | 0.1975
matrix manipulations: with x0 = 1, X and ϴ are (n+1) × 1 vectors, and hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn becomes hθ(x) = ϴᵀX
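A quick Ruby illustration of the vectorized form using the standard library’s Matrix/Vector classes (the numbers here are made up):

```ruby
require 'matrix'

# x0 = 1 pairs with the bias term θ0
theta = Vector[0.5, 2.1, -0.3]
x     = Vector[1.0, 0.72, 0.81]  # scaled features, x0 = 1

h = theta.inner_product(x)       # hθ(x) = ϴᵀX
```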
GPU: matrix math like this is exactly what GPUs are fast at
logistic regression supervised learning
classifier
y = 1: true; y = 0: false
hθ(x) = g(ϴᵀX); hθ(x) is the estimated probability that y = 1 on input x; g(z) is a non-linear logistic function
logistic function g(z): there are a few, e.g. sigmoid, tanh, ReLU. image source: Wikipedia
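The classic choice is the sigmoid, which squashes any real input into (0, 1) so the output reads as a probability:

\[ g(z) = \frac{1}{1 + e^{-z}} \]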
(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)), y ∈ {0, 1}: minimize the cost function to find the vector θ
training set → learning algorithm → hθ(x); x (new data) → hθ(x) = g(ϴᵀX) → y (prediction): hθ(x) ≥ 0.5 is true, hθ(x) < 0.5 is false
one-vs-all supervised learning
train one binary classifier per class: y = 1 for the target class, y = 0 for all the rest
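A minimal sketch of the prediction side, assuming we already have one trained binary hypothesis per class (the names here are illustrative, not from the talk):

```ruby
# `hypotheses` holds one trained binary hypothesis per class; each is a
# callable returning the estimated probability that x belongs to its class.
def one_vs_all_predict(hypotheses, x)
  scores = hypotheses.map { |h| h.call(x) }
  scores.index(scores.max) # the class with the highest probability wins
end
```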
don’t implement it at home: use libsvm, liblinear, and others
neural networks supervised learning
neuron: inputs a0, a1, a2 → computation → hθ(a)
feed-forward neural network: input layer → hidden layer → output layer
inputs size, sq.m, # rooms, age → intermediate estimates e0, e1, e2, e3 → final estimate
multiclass classifiers
logistic unit: inputs x0, x1, x2 with weights θ0, θ1, θ2; hθ = g(x0θ0 + x1θ1 + x2θ2); θ are the weights, g is the activation function
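Here is a tiny Ruby sketch of that single unit with a sigmoid activation (the helper names are mine, not from the talk):

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# One logistic unit: weighted sum of the inputs, passed through the activation.
def logistic_unit(xs, thetas)
  z = xs.zip(thetas).sum { |x, theta| x * theta } # x0θ0 + x1θ1 + x2θ2
  sigmoid(z)
end

logistic_unit([1.0, 0.72, 0.3], [-1.2, 2.0, 0.5]) # x0 = 1 is the bias input
```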
output: probabilities, e.g. 0.7123 that y = 1, 0.4765 that y = 2
a net with no hidden layers = one-vs-all logistic regression
cost function (sometimes called the loss function) of a NN: a representation of the error between the real and the predicted value
training set → learning algorithm → θ; x (new data) → y (prediction)
backprop: backward propagation of errors
gradient descent + backprop. “deep learning” is training a neural net; “deep” because we have many layers
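For the curious, the error terms that backprop pushes backwards through the layers look like this in the common Ng-style notation (not spelled out in the talk):

\[ \delta^{(L)} = a^{(L)} - y, \qquad \delta^{(l)} = \bigl((\Theta^{(l)})^T \delta^{(l+1)}\bigr) \odot g'(z^{(l)}) \]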
convolutional neural nets widely used for image processing and object recognition
recurrent neural nets widely used for natural language processing
CPU/GPU expensive to train
image source: https://xkcd.com/303/
image source: http://www.falsepositives.com/index.php/2008/01/31/the-real-reason-for-no-increased-productivity-behind-scripting-languages-reveled/ 2008
2016
destination suggestion
tangledpath/ruby-fann Ruby library for interfacing with FANN (Fast Artificial Neural Network)
```ruby
require './neural_network'

LOCATIONS = [:home, :work, :tennis, :parents]
LOCATIONS_INDEXED = LOCATIONS.map.with_index { |x, i| [x, i] }.to_h

XX = [
  # week 1
  [:work, 1, 8], # 1st day of week, 8am
  [:tennis, 1, 17], [:home, 1, 20],
  [:work, 2, 8], [:home, 2, 18],
  [:work, 3, 8], [:tennis, 3, 17], [:home, 3, 20],
  [:work, 4, 8], [:home, 4, 18],
  [:work, 5, 8], [:home, 5, 18],
  [:parents, 7, 13], [:home, 7, 18],
  # week 2
  [:work, 1, 8], [:home, 1, 18],
  [:work, 2, 8], [:home, 2, 18],
  [:work, 3, 8], [:tennis, 3, 17], [:home, 3, 20],
  [:work, 4, 8], [:home, 4, 18],
  [:work, 5, 8], [:home, 5, 18],
]
```
```ruby
xx = [] # scaled features
yy = [] # labels
XX.each do |destination, day, time|
  yy << LOCATIONS_INDEXED[destination]
  xx << [day.to_f / 7, time.to_f / 24] # feature scaling: both inputs into 0..1
end
```
2 ➞ 25 ➞ 4: one hidden layer with 25 units
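The demo wraps ruby-fann in its own NeuralNetwork class; a rough sketch of what the 2 ➞ 25 ➞ 4 setup looks like against ruby-fann directly (the one-hot encoding and training parameters are my assumption, not the demo’s actual code):

```ruby
require 'ruby-fann'

# One-hot encode the labels: one output unit per location (assumed encoding)
outputs = yy.map { |label| Array.new(LOCATIONS.size, 0).tap { |o| o[label] = 1 } }

train = RubyFann::TrainData.new(inputs: xx, desired_outputs: outputs)
nn = RubyFann::Standard.new(num_inputs: 2, hidden_neurons: [25], num_outputs: 4)
nn.train_on_data(train, 1000, 100, 0.001) # max epochs, report interval, target error
```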
100% accuracy on training set
```ruby
[
  [1, 16.5], [1, 17], [1, 17.5], [1, 17.8],
  [2, 17], [2, 18.1],
  [4, 18],
  [6, 23],
  [7, 13],
].each do |day, time|
  res = nn.predict_with_probabilities([[day.to_f / 7, time.to_f / 24]]).
    first.
    select { |v| v[0] > 0 } # filter out zero probabilities
  puts "#{day} #{time}\t#{res.map { |v| [LOCATIONS[v[1]], v[0]] }.inspect}"
end
```
1 16.5  [[:tennis, 0.97]]
1 17    [[:tennis, 0.86], [:home, 0.06]]
1 17.5  [[:home, 0.52], [:tennis, 0.49]]
1 17.8  [[:home, 0.82], [:tennis, 0.22]]
2 17    [[:tennis, 0.85], [:home, 0.06]]
2 18.1  [[:home, 0.95], [:tennis, 0.07]]
4 18    [[:home, 0.96], [:tennis, 0.08]]
6 23    [[:home, 1.00]]
borisnadion/suggested-destination-demo ruby code of the demo
TensorFlow (but you will need to learn Python)
clustering unsupervised learning
{x^(i)}: no labels
anomaly detection unsupervised learning
collaborative filtering unsupervised learning
              Jane | Arthur | John
Star Wars VII    5 |      5 |    1
Dr. Strange      5 |      5 |    ?
Arrival          5 |      ? |    1
automatic detection of features and their weights, based on the users’ votes
similarity between users and between items
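One common way to measure that similarity is cosine similarity between rating vectors; a minimal sketch, treating missing ratings as 0 (a simplification, not necessarily what a real recommender does):

```ruby
# Cosine similarity between two users' rating vectors
def cosine_similarity(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

jane   = [5.0, 5.0, 5.0]
arthur = [5.0, 5.0, 0.0] # the "?" treated as 0
cosine_similarity(jane, arthur) # ≈ 0.82: similar tastes, so Arthur would likely rate Dr. Strange highly
```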
what to google
http://astrails.com
thanks! Boris Nadion http://astrails.com
