Unit 5: Learning LH 6
02/01/20 1
25
 Contd…
• Outline:
 – Why
 learning?
 – Supervised (Error based) learning
 • Gradient descent learning: Least Mean Square, Back
 Propagation algorithm
 • Stochastic learning
 – Unsupervised learning
 • Hebbian learning algorithm
 • Competitive learning
 Reinforced learning (output based)
 Genetic algorithms: operators
02/01/20 2
25
 What is learning
• Learning is the process of acquiring new or
 modifying existing knowledge, behaviors, skills,
 values, or preferences.
• Evidence that learning has occurred may be seen in changes in
 behavior from simple to complex.
02/01/20 3
25
 What is Machine learning?
• The machine Learning denotes changes in the system that are adaptive in
 the sense that they enable the system to do the same task more
 effectively
 the next time.
• Like human learning from past experiences, computer system learns
 from
 data, which represent some “past experiences” of an application domain.
• Therefore, Machine learning gives computers the ability to learn
 without
 being explicitly programmed.
02/01/20 4
25
 Why machine learning?
• To learn a target function (relation between input and output)that can be
 used to predict the values of a discrete class attribute,
 – e.g., male or female, and high-risk or low risk, etc.
• To model the underlying structure or distribution in the data in order to
 learn more about the data.
• To learns behavior through trial-and-error interactions with a
 dynamic environment.
02/01/20 5
25
 Types of Machine Learning
• Based on training set machine learning algorithms are classified into the
 following three categories:
02/01/20 6
25
 Supervised learning
• Supervised learning is where you have input variables (x) and an
 output
 variable (Y) and you use an algorithm to learn the mapping function from
 input to the output.
 the Y = f(X)
• Learning stops the algorithm achieves an acceptable level of
 when
 performance.
• when you have new input data (x) that you can predict the output variables
 (Y)
 for that data.
 f
 02/01/20 7
 25
 Supervised learning
• It is called supervised learning because the process of an algorithm learning
 from the training dataset can be thought of as a teacher supervising the learning
 process.
• We know the correct answers, the algorithm iteratively makes predictions
 on the training data and is corrected by the teacher.
 02/01/20 8
 25
 Supervised Learning Example 1
02/01/20 9
25
 Application Example of Supervised learning
• Suppose we have a dataset giving the living areas and prices of house from
 KTM:
 02/01/20 10
 25
 Application Example of Supervised learning
• We can plot this data:
 02/01/20 11
 25
 Application Example of Supervised learning
• Given a data like this , we can learn to predict the price of other houses in KTM
 as a function of the size of their area, such types of learning is known as
 supervised learning.
• Price of house = f(area of house)
 i.e., y= f(x)
 02/01/20 12
 25
 Unsupervised Learning
• Unsupervised learning is where you only have input data (X) and no
 corresponding output variables(targets/ labels).
• The goal for unsupervised learning is to model the underlying structure
 or distribution in the data in order to learn more about the data.
• These are called unsupervised learning because unlike
 supervised learning above there is no correct answers and there is no
 teacher.
• Algorithms are left to their own devises to discover and present the
 interesting structure in the data.
• E.g., Clustering
 02/01/20 13
 25
 Unsupervised learning Example1
02/01/20 14
25
 Unsupervised learning example 2
02/01/20 15
25
 Reinforcement learning
• Is learning behavior through trial-and-error interactions with a environment.
• Is learning how to act in order to maximize a reward (Encouragements).
• Reinforcement learning emphasizes learning feedback that evaluates the
 learner's performance without providing standards of correctness in the form of
 behavioral targets.
• Example: Bicycle learning, game playing, etc.
 02/01/20 16
 25
 Contd…
02/01/20 17
25
 Supervised learning
• Algorithm
 Classification:
 – To predict the outcome of a given sample where the output variable is
 in the form of categories(discrete). Examples include labels such as
 male and female, sick and healthy.
• Regression:
 – To predict the outcome of a given sample where the output variable is
 in the form of real values(continuous). Examples include real-valued
 labels denoting the amount of rainfall, the height of a person.
02/01/20 18
25
 Contd…
02/01/20 19
25
 Least Mean square Regression algorithm
• For a given data build the predicted output O close to the standard output
 y i.e., t (target) as:
• f(x)= O =w0 + w1 x1 + … + wn xn
• Train (adjust or choose) the wi’s such that they minimize the squared error
 – E[w1,…,wn] = ½ i=1 to m (ti-oi)2
• Determine the output for new input based on the lien which has
 minimum
 value of initial seared error.
02/01/20 20
25
 Gradient Descent Learning Rule
• Gradient descent algorithm starts with some initial guess for
 weights (coefficients) Wi and repeatedly performs the update until
 convergence :
• Wi= Wi - l* E[w]
• Where,
 – l is learning rate or step size or rate of weight adjustment.
 – Gradient:
 E[w]=[E/w0,… E/wn]
 E[wi]=/wi 1/2i=1 to m(ti-oi)2
 = /wi 1/2i=1 to m(ti-i wi xi)2
 = i=1 to m(ti- oi)(-xi)
02/01/20 21
25
 Contd…
• For a single training example up date rule is:
• Wi= Wi +l*(ti- oi)(xi)
• This is called LMS update rule
02/01/20 22
25
 Contd…
Gradient-Descent(training_examples, l)
 Each training example is a pair of the form <(x1,…xn),t> where (x1,…,xn) is
 input values, and t is the target output value, l is the learning rate (e.g. 0.1)
• Initialize each wi to some small random value
• Until the termination condition is met, Do
 – Initialize each wi to zero
 – For each <(x1,…xn),t> in training_examples Do
 • Input the instance (x1,…,xn) to the linear function and compute the
 output o
 • Calculate change in weight
 – wi= l* (t-o) xi
 – For each weight wi Do
 • wi=wi+wi
02/01/20 23
25
 Contd…
• If we have more than one training examples:
• Two way to modify this gradient descent algorithm
 fora training set of more than one example.
 – Batch gradient descent
 – Stochastic(Incremental ) gradient descent
02/01/20 24
25
 Contd…
• Batch mode : gradient descent
 – This method looks at every example in the entire training
 set on every step. So it is static.
 – Algorithm
 Repeat until convergene
 wi=wi + l* i=1 to m (ti- oi)xi over the entire data D and for each weight
 02/01/20 25
 25
 Contd…
• Stochastic (Incremental mode: )gradient descent
 – This algorithm repeatedly run through the training set and each time
 encounter a training example. So it is dynamic.
 – Algorithm:
 for(i=1 to m)
 wi=wi +l* (t-o) xi for each weight
02/01/20 26
25
 Contd…
• Comparison between batch gradient descent and
 stochastic gradient descent:
 BGD SGD
 For a single step take entire For a single step take only single
 training set training example
 So a costly operation It does not matter how large m is
 Slower Faster
 Not preferred Used when m is large
02/01/20 27
25
 Artificial Neural Network
• A neural network is composed of number of nodes or units , connected
 by
 links. Each link has a numeric weight associated with it.
 Input
 Hidden
 Layer Output
 Layer
 1 Layer
 4
 6
 2
 5
• Artificial neural networks are programs design to solve any
 problem by trying to mimic the structure and the function of our nervous
 system.
02/01/20 28
25
 Artificial neural network model:
• Input to the network are represented by mathematical symbol xn.
• Each of these inputs are multiplied by a connection weight, wn
 sum  w1 x1  w2 x2 ......  wnxn
• These products are simply summed, fed through the transfer function f()
 to generate result and output
02/01/20 29
25
 Back propagation algorithm
• Back propagation is a neural network learning algorithm. Learns by
 adjusting the weight so as to be able to predict the correct class label of
 the input.
Input: D training data set and their associated class label
 l= learning rate(normally 0.0-1.0)
Output: a trained neural network.
 Method:
 Step1: initialize all weights and bias in
network. Step2: while termination condition is not
satisfied .
 For each training tuple x in D
02/01/20 30
25
 Contd…
2.1 calculate output:
 For input layer
 Oj  Ij
 For hidden layer and output layer
 Ij  
 k O k *W kj   j
 1
 Oj  Ij
  1 e
2/2/2019
02/01/20 31 31
25
 Contd…
2. Calculate error:
 For output layer:
 err j  O j (1  O j )(j  O j )
 For hidden layer:
 err j  O j (1  O j )  (err k *W j k )
 k
3. Update weight
 W i j (new)  W i j (old )  l * O i
 * err j
4. Update bias
  j  j  l * err j
2/2/2019
02/01/20 32
25
 Contd…
• Example: Sample calculations for learning by the back-propagation algorithm.
• Figure above shows a multilayer feed-forward neural network. Let the
 learning rate be 0.9. The initial weight and bias values of the network are
 given in Table below, along with the first training tuple, X = (1, 0, 1), whose
 class label is 1.
02/01/20
2/2/2019 33 33
25
 Contd…
• Step1: Initialization
 – Let initial input weight and bias values are:
 – Initial input:
 X1 X2 X3
 1 0 1
 – Bias values:
  4  5  6
 -0.4 0.2 0.1
 – Initial
 weight:
 W14 W15 W24 W25 W34 W35 W46 W56
 0.2 -0.3 0.4 0.1 -0.5 0.2 -0.3 -0.2
02/01/20
2/2/2019 34 34
25
 Contd…
• 2. Termination condition: Weight of two successive iteration are nearly
 equal or user defined number of iterations are reached.
• 2.1For each training tuple X in D, calculate the output for each input layer:
 – For input Oj 
 layer: Ij
 • O1=I1=1
 • O2=I2=0
 • O3=I3=1
02/01/20
2/2/2019 35 35
25
 Contd…
• For hidden layer and output layer:
1. I 4  O1* W14  O2 * W24  O3* W34
 4
  1* 0.2  0 * 0.4  1* (-0.5)
  - 0.7
 1 1 1
 O4
    0.332
  I 4 1  ( 0.7 )
  ( 0.7 )
 1 e 1
2. (2.7182)
 I 5  O1* W15  O2 * W25  O3 * W35 
 5 e
  1* (-0.3)  0 * 0.1  1* 0.2
  0.1
 1 1 1
 O5  
   ( 0.1)  ( 0.1)
  I 5 1 0.525
02/01/20
25 1 e 1 e (2.7182)
 36 36
 Contd…
3. I 6  O4 * W46  O5* W56   5
  0.332*(0.3)  0.525*(-0.2) 1*
 0.1
  - 0.105
 1 1
 O6   
 1 I 6  ( 0.105)  ( 0.105)
  1 1 0.474
 e 1
 (2.7182)
02/01/20
2/2/2019 37 37
25
 Contd…
• Calculation of error at each node:
 err j  O j (1  O j )(j  O j )
• For output layer:
 err 6  O 6 (1  O 6 )( 6 
 O
 6 )0.474(1  0.474)(1  0.474) 
• For Hidden layer: 0.1311
 e r r  O (1  O )  ( e r r * W )
 j j j k jk
 j  O j (1  O j ) ( e r r k * W j k )
 er r O 5 (1  O 5 ) ( e r r 6 * W 5 6)
  0 . 5 2 5 (1  0 . 5 2 5 ) ( 0 . 1 3 1 1 * (  0 . 2 ) )
  0.0065
 e r r j  O j (1  O j ) ( e r r k * W j k )
  O 4 (1  O 4 ) ( e r r 6 * W 4 6 )
  0 . 3 3 2 (1  0 . 3 3 2 ) ( 0 . 1 3 1 1 * (  0 . 3 ) )
  0.0087
02/01/20
2/2/2019 38 38
25
 Contd…
• Update the weight: W i j (new)  W i j (old )  l * O i
1. * err j
 W 4 6(new)  W 4 6(old )  l * O 4 * err 6
  0.3  0.9 * 0.1311* 0.332
2. W 50.26
 6 (new)  W 5 6 (old )  l * O 5 *
 err 6
  0.2  0.9 * 0.1311* 0.525
  0.138
3. W 1 4(new)  W 1 4(old )  l * O 1
 * err 4
  0.2  0.9 * 0.0087*1
  0.192
02/01/20
2/2/2019 39 39
25
 Contd…
4. W 2 4(new)  W 2 4(old )  l * O 2 *
 err 4
  0.4  0.9 * (0.0087* 0
  0.4
5.
 W 3 4(new)  W 3 4(old )  l * O 3 *
 err 4
  0.5  0.9 * (0.0087*1
  0.508
6.
 W 15(new)  W 15(old)  l *O1
 *err
  0.3
 5
  0.9*(0.0065)
 *1
  0.306
02/01/20
2/2/2019 40 40
25
 Contd…
7. W 2 5(new)  W 2 5(old )  l * O 2 *
 err 5
  0.1  0.9 * (0.0065) * 0
  0.1
8.
 W 3 5(new)  W 3 5(old )  l * O 3 *
 err 5
  0.2  0.9 * (0.0065) *1
  0.194
02/01/20
2/2/2019 41 41
25
 Contd…
• Update the Bias value:   l
 j
1.  * err j
  
 j
 6 6 ( old ) l*
  0.1 err
 0.96 * 0.1311
  0.218
2.   5 5 ( o ld )  l * err 5
  0.2  0.9
 * (0.0065)
  0.195 And so on until convergence!
3.   6 4 ( o ld )  l * err 4
  (0.4)  0.9 * (0.0087)
02/01/20
2/2/2019
25
 42 42
 Hebbian Learning
1. Take numberAlgorithm
 of steps = Number of inputs(X).
2. In each step
 I. Calculate Net input = weight(W) * input(X)
 II. Calculate Net Output (O)= f(net input)
 III. Calculate change in weight = learning rate * O* X
 IV. New weight = old weight + change in weight
3. Repeat step2 until terminating condition is
 satisfied
 2/2/2019
 02/01/20 43
 25
 Competitive Learning Rule (Winner-takes-all)
Basic Concept of Competitive Network:
•This network is just like a single layer feed-forward network with
 feedback
connection between outputs.
•The feed- forward connections are excitatory types.
•While the feed-back connections between outputs are inhibitory type, shown
by dotted lines, which means the competitors never support themselves.
 2/2/2019
 02/01/20 44 44
 25
 Basic Concept of Competitive Learning Rule:
• It is unsupervised learning algorithm in which the output
 nodes try to compete with each other to represent the input
 pattern.
• Hence, the main concept is that during training, the output unit
 with the highest Net-input to a given input, will be
 declared the winner.
• Then only the winning neuron’ s weights are updated and the
 rest of the neurons are left unchanged.
• Therefore , This rule is also called Winner-takes-all
02/01/20 45
25
 Competitive learning algorithm
• For n number of input and m number of output
• Repeat Until convergence
• Calculate the Net input
• The outputs of the network are determined by a ``winner-take-all'' competition such
 that only the output node receiving the maximal net input will output 1 while all
 others output 0:
• Calculate the change in weight:
• Update the weight:
 Where,
 02/01/20 46
 25
 Genetic Algorithm
02/01/20 47
25
 Introduction to Genetic
•
 Algorithms
 Nature has always been a great source of inspiration to all mankind.
• A genetic algorithm is a search heuristic that is inspired by Charles Darwin’s
 theory of natural evolution.
 “Select The Best, Discard The Rest”
• i.e., the fittest individuals are selected for reproduction in order to produce
 offspring of the next generation.
• GAs were developed by John Holland and his students and colleagues at
 the University of Michigan
 02/01/20 48
 25
 Notion of Natural Selection
• The process of natural selection starts with the selection of fittest
 individuals from a population. They produce offspring which inherit the
 characteristics of the parents and will be added to the next generation.
• If parents have better fitness, their offspring will be better than parents
 and have a better chance at surviving.
• This process keeps on iterating and at the end, a generation with the
 fittest individuals will be found.
• This notion can be applied for a search problem. We consider a set of
 solutions for a problem and select the set of best ones out of them.
 02/01/20 49
 25
 Basic Terminology
• Population: set of individuals each representing a possible solution to a
 given problem.
• Gene: a solution to problem represented as a set of parameters
 ,these
 parameters known as genes.
• Chromosome: genes joined together to form a string of values
 called chromosome.
• Fitness Function: measures the quality of solution. It is problem dependent.
02/01/20 50
25
 Nature to Computer Mapping
 Nature Computer
 Population Set of solutions.
 Individual Solution to a
 Fitness problem.
 Chromosom Quality of a solution.
 e Gene Encoding for a
 Reproductio Solution.
 n Part of the encoding
 of a solution.
 Crossover
02/01/20 51
25
Five phases are considered in a genetic algorithm.
 02/01/20 52
 25
 Simple Genetic
 Algorithm
Simple_Genetic_Algorithm()
 {
 Initialize the Population;
 Calculate Fitness Function;
 While(Fitness Value != Optimal Value)
 {
 Selection;//Natural Selection, Survival Of
 Fittest
 Crossover;//Reproduction, Propagate favorable
 characteristics
 Mutation;//Mutation
 Calculate Fitness Function;
 }
 2/2/2019
 02/01/20 53
 25 }
 Initialization of Population
• There are two primary methods to initialize a population in a GA.
 They are −
• Random Initialization
• Heuristic initialization
• There are several things to be kept in mind when dealing with GA
 population −
 – The diversity of the population should be maintained otherwise it might
 lead to premature convergence.
 – The population size should not be kept very large as it can cause a GA
 to slow down, while a smaller population might not be enough for a
 good mating pool. Therefore, an optimal population size needs to be
 decided by trial and error.
02/01/20 54
25
 Fitness Function
• The fitness function which takes a candidate solution to the
 problem as input and produces as output how fit an individual is
 (the ability of an individual to compete with other individuals).
• A fitness function should possess the following characteristics −
 – The fitness function should be sufficiently fast to compute.
 – It must quantitatively measure how fit a given solution is or
 how fit individuals can be produced from the given solution.
 02/01/20 55
 25
 Selection
• The idea of selection phase is to select the fittest individuals and let
 them pass their genes to the next generation.
• Two pairs of individuals (parents) are selected based on their fitness
 scores. Individuals with high fitness have more chance to be selected for
 reproduction.
• Selection of the parents can be done by using any one of the following
 strategy:
 – Fitness Proportionate Selection(Roulette Wheel Selection)
 – Tournament Selection
 – Rank Selection
 02/01/20 56
 25
 Contd..
• Fitness Proportionate Selection(Roulette Wheel Selection):
 – Chance of selection is directly propositional to the fitness value. This is
 chance not
 fixed so, any one can become parent.
 02/01/20 57
 25
 Contd..
• Tournament Selection: Take randomly k individuals and select only
 one among k to make the parent of the next generation.
 02/01/20 58
 25
 Contd..
• Rank Selection: mostly used when the individuals in the population have very
 close fitness values . This leads to each individual having an almost equal share
 of the pie chart. and hence each individual no matter how fit relative to each
 other has an approximately same probability of getting selected as a parent. This
 in turn leads to a loss in the selection pressure towards fitter individuals,
 making the GA to make poor parent selections in such situations.
 02/01/20 59
 25
 Contd..
• Rank every individual in the population is
 Selection: ranked
 according to their fitness. The higher ranked individuals
 are preferred more than the lower ranked ones.
 02/01/20 60
 25
 Crossover
• Crossover is the most significant phase in a genetic algorithm. For
 each pair of parents to be mated, a crossover point is chosen at
 random from within the genes.
• For example, consider the crossover point to be 3 as shown below.
 02/01/20 61
 25
 Contd..
• Offspring are created by exchanging the genes of parents among
 themselves until the crossover point is reached.
• The new offspring are added to the population
 02/01/20 62
 25
 Mutation
• Now if you think in the biological sense, are the children produced have the
 same traits as their parents? The answer is NO. During their growth, there
 is some change in the genes of children which makes them different from
 its parents.
• This process is known as mutation, which may be defined as a random
 tweak in
 the chromosome.
• This implies that some of the bits in the bit string can be flipped.
• Mutation occurs to maintain diversity within the population and prevent from
 the premature convergence.
 02/01/20 63
 25
 Terminatio
 n
• The algorithm terminates:
 – If the population has converged (does not produce offspring which
 are significantly different from the previous generation). Then it is said that
 the genetic algorithm has provided a set of solutions to our problem.
 – OR If the predefined number of generation is reached.
 02/01/20 64
 25
 Issues for GA
•
 Practitioners
 Choosing basic implementation issues:
 – representation
 – population size, mutation rate, ...
 – selection, deletion policies
 – crossover, mutation operators
• Termination Criteria
02/01/20 65
25
 Benefits of Genetic
• Algorithms
 Concept is easy to understand
• Modular, separate from application
• Supports multi-objective optimization
• Good for “noisy” environments
• Always an answer; answer gets better with time
• Easy to exploit previous or alternate solutions
• Flexible building blocks for hybrid applications
• Substantial history and range of use
02/01/20 66
25
 When to Use a GA
• Alternate solutions are too slow or overly complicated
• Need an exploratory tool to examine new approaches
• Problem is similar to one that has already been successfully solved by
 using
 a GA
• Want to hybridize with an existing solution
• Benefits of the GA technology meet key problem requirements
02/01/20 67
25
 Thank You !
02/01/20 68
25