Machine Learning: Make Your Ruby Code Smarter
Boris Nadion, boris@astrails.com, @borisnadion
astrails http://astrails.com
awesome web and mobile apps since 2005
terms
AI (artificial intelligence) - the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages
ML (machine learning) - a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
without being explicitly programmed
FF NN cost function
FF NN cost function (I’m kidding)
cost function with regularization
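For reference, the formula that flashes by on these slides is presumably the standard regularized cost function of a feed-forward neural net (Ng-style notation, K output units, L layers):

\[ J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y^{(i)}_k \log\bigl(h_\Theta(x^{(i)})\bigr)_k + \bigl(1 - y^{(i)}_k\bigr)\log\bigl(1 - (h_\Theta(x^{(i)}))_k\bigr) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\bigl(\Theta^{(l)}_{j,i}\bigr)^2 \]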
2 types of ML: supervised learning and unsupervised learning
supervised: the training data is labeled, i.e. we know the correct answers
unsupervised: the training data is not labeled, i.e. we have to figure out hidden correlations ourselves
linear regression supervised learning
(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)): m training examples (x^(i), y^(i)), where x^(i) is a feature and y^(i) is a label
training set → learning algorithm → hθ(x); x (new data) → hθ(x) → y (prediction)
hθ(x) = hypothesis
(x, y): y = hθ(x) = θ0 + θ1x; the goal is to find θ0 and θ1
hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn: many features, n is the number of features
size, sq.m (x1) | # rooms (x2) | age (x3) | price (y)
             80 |            3 |       22 |      2.9M
             90 |            4 |       24 |      3.1M
             75 |            3 |       28 |      2.5M
            110 |            5 |       20 |      3.3M
prices are in NIS: 1 USD = 3.85 NIS
hθ(x) = θ0 + θ1x1; sum the prediction errors over the training set
Linear Regression Cost Function
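The slide presumably shows the standard squared-error cost:

\[ J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \]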
minimize J(θ): finding a minimum of the cost function = “learning”
gradient descent (batch, stochastic, etc.) or advanced optimization algorithms find a global (sometimes local) minimum of the cost function J; α is the learning rate, a parameter of gradient descent
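Each step nudges every parameter against the gradient of the cost; the standard update rule (not spelled out on the slide) is:

\[ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \quad \text{(simultaneously for all } j\text{)} \]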
(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)) → gradient descent → θ0, θ1, θ2, …, θn (magic inside)
hθ(x) = θ0 + θ1x1 + θ2x2 + …+ θnxn we’re ready to predict
feature scaling: 0 ≤ x ≤ 1
size, sq.m | size, sq.m / 110 (x1)
        80 | 0.72
        90 | 0.81
        75 | 0.68
       110 | 1
mean normalization: average value of the feature is ~0, -0.5 ≤ x ≤ 0.5
size, sq.m | (size, sq.m / 110) - 0.8025 (x1)
        80 | -0.0825
        90 | 0.0075
        75 | -0.1225
       110 | 0.1975
matrix manipulations: with x0 = 1, X and ϴ are (n+1) × 1 vectors, and hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn becomes hθ(x) = ϴᵀX
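A quick Ruby illustration of the vectorized form using the standard library’s Matrix/Vector classes (the numbers here are made up):

```ruby
require 'matrix'

# x0 = 1 pairs with the bias term θ0
theta = Vector[0.5, 2.1, -0.3]
x     = Vector[1.0, 0.72, 0.81]  # scaled features, x0 = 1

h = theta.inner_product(x)       # hθ(x) = ϴᵀX
```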
GPU: matrix math like this is exactly what GPUs are fast at
logistic regression supervised learning
classifier
y = 1: true; y = 0: false
hθ(x) = g(ϴᵀX); hθ(x) is the estimated probability that y = 1 on input x; g(z) is a non-linear logistic function
logistic function g(z): there are a few, e.g. sigmoid, tanh, ReLU. image source: Wikipedia
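The classic choice is the sigmoid, which squashes any real input into (0, 1) so the output reads as a probability:

\[ g(z) = \frac{1}{1 + e^{-z}} \]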
(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)), y ∈ {0, 1}: minimize the cost function to find the vector θ
training set → learning algorithm → hθ(x); x (new data) → hθ(x) = g(ϴᵀX) → y (prediction): hθ(x) ≥ 0.5 is true, hθ(x) < 0.5 is false
one-vs-all supervised learning
train one binary classifier per class: y = 1 for the target class, y = 0 for all the rest
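A minimal sketch of the prediction side, assuming we already have one trained binary hypothesis per class (the names here are illustrative, not from the talk):

```ruby
# `hypotheses` holds one trained binary hypothesis per class; each is a
# callable returning the estimated probability that x belongs to its class.
def one_vs_all_predict(hypotheses, x)
  scores = hypotheses.map { |h| h.call(x) }
  scores.index(scores.max) # the class with the highest probability wins
end
```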
don’t implement it at home: use libsvm, liblinear, and others
neural networks supervised learning
neuron: inputs a0, a1, a2 → computation → hθ(a)
feed-forward neural network: input layer → hidden layer → output layer
inputs size, sq.m, # rooms, age → intermediate estimates e0, e1, e2, e3 → final estimate
multiclass classifiers
logistic unit: inputs x0, x1, x2 with weights θ0, θ1, θ2; hθ = g(x0θ0 + x1θ1 + x2θ2); θ are the weights, g is the activation function
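Here is a tiny Ruby sketch of that single unit with a sigmoid activation (the helper names are mine, not from the talk):

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# One logistic unit: weighted sum of the inputs, passed through the activation.
def logistic_unit(xs, thetas)
  z = xs.zip(thetas).sum { |x, theta| x * theta } # x0θ0 + x1θ1 + x2θ2
  sigmoid(z)
end

logistic_unit([1.0, 0.72, 0.3], [-1.2, 2.0, 0.5]) # x0 = 1 is the bias input
```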
output: probabilities, e.g. 0.7123 that y = 1, 0.4765 that y = 2
a net with no hidden layers = one-vs-all logistic regression
cost function (sometimes called the loss function) of a NN: a representation of the error between the real and the predicted value
training set → learning algorithm → θ; x (new data) → y (prediction)
backprop: backward propagation of errors
gradient descent + backprop. “deep learning” is training a neural net; “deep” because we have many layers
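For the curious, the error terms that backprop pushes backwards through the layers look like this in the common Ng-style notation (not spelled out in the talk):

\[ \delta^{(L)} = a^{(L)} - y, \qquad \delta^{(l)} = \bigl((\Theta^{(l)})^T \delta^{(l+1)}\bigr) \odot g'(z^{(l)}) \]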
convolutional neural nets widely used for image processing and object recognition
recurrent neural nets widely used for natural language processing
CPU/GPU expensive to train
image source: https://xkcd.com/303/
image source: http://www.falsepositives.com/index.php/2008/01/31/the-real-reason-for-no-increased-productivity-behind-scripting-languages-reveled/ 2008
2016
destination suggestion
tangledpath/ruby-fann Ruby library for interfacing with FANN (Fast Artificial Neural Network)
```ruby
require './neural_network'

LOCATIONS = [:home, :work, :tennis, :parents]
LOCATIONS_INDEXED = LOCATIONS.map.with_index { |x, i| [x, i] }.to_h

XX = [
  # week 1
  [:work, 1, 8], # 1st day of week, 8am
  [:tennis, 1, 17], [:home, 1, 20],
  [:work, 2, 8], [:home, 2, 18],
  [:work, 3, 8], [:tennis, 3, 17], [:home, 3, 20],
  [:work, 4, 8], [:home, 4, 18],
  [:work, 5, 8], [:home, 5, 18],
  [:parents, 7, 13], [:home, 7, 18],
  # week 2
  [:work, 1, 8], [:home, 1, 18],
  [:work, 2, 8], [:home, 2, 18],
  [:work, 3, 8], [:tennis, 3, 17], [:home, 3, 20],
  [:work, 4, 8], [:home, 4, 18],
  [:work, 5, 8], [:home, 5, 18],
]
```
```ruby
xx = [] # scaled features
yy = [] # labels
XX.each do |destination, day, time|
  yy << LOCATIONS_INDEXED[destination]
  xx << [day.to_f / 7, time.to_f / 24] # feature scaling: both inputs into 0..1
end
```
2 ➞ 25 ➞ 4: one hidden layer with 25 units
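The demo wraps ruby-fann in its own NeuralNetwork class; a rough sketch of what the 2 ➞ 25 ➞ 4 setup looks like against ruby-fann directly (the one-hot encoding and training parameters are my assumption, not the demo’s actual code):

```ruby
require 'ruby-fann'

# One-hot encode the labels: one output unit per location (assumed encoding)
outputs = yy.map { |label| Array.new(LOCATIONS.size, 0).tap { |o| o[label] = 1 } }

train = RubyFann::TrainData.new(inputs: xx, desired_outputs: outputs)
nn = RubyFann::Standard.new(num_inputs: 2, hidden_neurons: [25], num_outputs: 4)
nn.train_on_data(train, 1000, 100, 0.001) # max epochs, report interval, target error
```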
100% accuracy on training set
```ruby
[
  [1, 16.5], [1, 17], [1, 17.5], [1, 17.8],
  [2, 17], [2, 18.1],
  [4, 18],
  [6, 23],
  [7, 13],
].each do |day, time|
  res = nn.predict_with_probabilities([[day.to_f / 7, time.to_f / 24]]).
    first.
    select { |v| v[0] > 0 } # filter out zero probabilities
  puts "#{day} #{time}\t#{res.map { |v| [LOCATIONS[v[1]], v[0]] }.inspect}"
end
```
1 16.5  [[:tennis, 0.97]]
1 17    [[:tennis, 0.86], [:home, 0.06]]
1 17.5  [[:home, 0.52], [:tennis, 0.49]]
1 17.8  [[:home, 0.82], [:tennis, 0.22]]
2 17    [[:tennis, 0.85], [:home, 0.06]]
2 18.1  [[:home, 0.95], [:tennis, 0.07]]
4 18    [[:home, 0.96], [:tennis, 0.08]]
6 23    [[:home, 1.00]]
borisnadion/suggested-destination-demo ruby code of the demo
TensorFlow (but you will need to learn Python)
clustering unsupervised learning
{x^(i)}: no labels
anomaly detection unsupervised learning
collaborative filtering unsupervised learning
              Jane | Arthur | John
Star Wars VII    5 |      5 |    1
Dr. Strange      5 |      5 |    ?
Arrival          5 |      ? |    1
automatic detection of features and their weights, based on the users’ votes
similarity between users and between items
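One common way to measure that similarity is cosine similarity between rating vectors; a minimal sketch, treating missing ratings as 0 (a simplification, not necessarily what a real recommender does):

```ruby
# Cosine similarity between two users' rating vectors
def cosine_similarity(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

jane   = [5.0, 5.0, 5.0]
arthur = [5.0, 5.0, 0.0] # the "?" treated as 0
cosine_similarity(jane, arthur) # ≈ 0.82: similar tastes, so Arthur would likely rate Dr. Strange highly
```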
what to google
http://astrails.com
thanks! Boris Nadion http://astrails.com
