CHAPTER 03: THE LEAST-MEAN SQUARE ALGORITHM
CSC445: Neural Networks
Prof. Dr. Mostafa Gadal-Haqq M. Mostafa
Computer Science Department, Faculty of Computer & Information Sciences, AIN SHAMS UNIVERSITY
(Most of the figures in this presentation are copyright of Pearson Education, Inc.)
• Introduction
• Filtering Structure of the LMS Algorithm
• Unconstrained Optimization: A Review
  - Method of Steepest Descent
  - Newton's Method
  - Gauss-Newton Method
• The Least-Mean Square Algorithm
• Computer Experiment
Introduction
• The Least-Mean Square (LMS) algorithm, developed by Widrow and Hoff (1960), was the first linear adaptive-filtering algorithm (inspired by the perceptron) for solving problems such as prediction.
• Some features of the LMS algorithm:
  - Linear computational complexity with respect to the adjustable parameters.
  - Simple to code.
  - Robust with respect to external disturbances.
Filtering Structure of the LMS Algorithm
• The unknown dynamic system is described by the training sample
  $\mathcal{T}: \{\mathbf{x}(i), d(i);\ i = 1, 2, \ldots, n, \ldots\}$,
  with stimulus vector $\mathbf{x}(i) = [x_1(i), x_2(i), \ldots, x_M(i)]^T$, where $M$ is the input dimensionality.
• The stimulus vector x(i) can arise in either of the following ways:
  - Snapshot data: the M input elements of x originate at different points in space.
  - Temporal data: the M input elements of x represent the present and (M - 1) past values of some excitation (see the sketch below).
Figure 3.1 (a) Unknown dynamic system. (b) Signal-flow graph of adaptive model for the system; the graph embodies a feedback loop set in color.
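A minimal sketch of how such a temporal stimulus vector might be assembled from a scalar excitation signal; the zero-padding convention for samples before the start of the record is an assumption, not something fixed by the slide.

```python
import numpy as np

def temporal_input(signal, i, M):
    """Form x(i) = [signal(i), signal(i-1), ..., signal(i-M+1)]^T from the
    present and (M - 1) past values of a scalar excitation signal."""
    # Assumption: samples before the start of the record are treated as zero.
    taps = [signal[i - k] if i - k >= 0 else 0.0 for k in range(M)]
    return np.array(taps)

# Example: a 3-tap stimulus vector at time step i = 5.
x5 = temporal_input(np.arange(10, dtype=float), i=5, M=3)  # -> [5., 4., 3.]
```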
Filtering Structure of the LMS Algorithm
• The problem addressed is how to design a multi-input, single-output model of the unknown dynamic system by building it around a single linear neuron.
• Adaptive filtering (system identification):
  - Start from an arbitrary setting of the adjustable weights.
  - Adjustments to the weights are made on a continuous basis.
  - Computation of the adjustments to the weights is completed inside one interval that is one sampling period long.
• The adaptive filter consists of two continuous processes:
  - Filtering process: computation of the output signal y(i) and the error signal e(i).
  - Adaptive process: the automatic adjustment of the weights according to the error signal.
Filtering Structure of the LMS Algorithm
• These two processes together constitute a feedback loop acting around the neuron.
• Since the neuron is linear,
  $y(i) = v(i) = \sum_{k=1}^{M} w_k(i)\, x_k(i) = \mathbf{w}^T(i)\,\mathbf{x}(i)$,
  where $\mathbf{w}(i) = [w_1(i), w_2(i), \ldots, w_M(i)]^T$, and the error signal is
  $e(i) = d(i) - y(i)$.
• The error signal is determined by the cost function used to derive the adaptive-filtering algorithm, which is closely related to the optimization process.
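A minimal sketch of the filtering process at one time step, written directly from the two equations above; the function name is illustrative.

```python
import numpy as np

def filter_step(w, x, d):
    """Filtering process of the linear neuron at one time step:
    output y(i) = w^T(i) x(i) and error e(i) = d(i) - y(i)."""
    y = float(np.dot(w, x))
    e = d - y
    return y, e
```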
Unconstrained Optimization: A Review
• Consider a cost function E(w) that is a continuously differentiable function of the unknown weight vector w. We need to find the optimal solution w* that satisfies
  $\mathcal{E}(\mathbf{w}^*) \le \mathcal{E}(\mathbf{w})$ for all $\mathbf{w}$.
• That is, we need to solve the unconstrained-optimization problem stated as: "Minimize the cost function E(w) with respect to the weight vector w."
• The necessary condition for optimality is
  $\nabla \mathcal{E}(\mathbf{w}^*) = \mathbf{0}$.
• Usually the solution is based on the idea of local iterative descent: starting with a guess w(0), generate a sequence of weight vectors w(1), w(2), ..., such that
  $\mathcal{E}(\mathbf{w}(n+1)) < \mathcal{E}(\mathbf{w}(n))$.
Unconstrained Optimization: A Review
Method of Steepest Descent
• The successive adjustments applied to the weight vector w are in the direction of steepest descent, that is, in a direction opposite to the gradient of E(w).
• If $\mathbf{g} = \nabla \mathcal{E}(\mathbf{w})$, then the steepest-descent algorithm is
  $\mathbf{w}(n+1) = \mathbf{w}(n) - \eta\, \mathbf{g}(n)$,
  where $\eta$ is a positive constant called the step-size, or learning-rate, parameter. In each step, the algorithm applies the correction
  $\Delta\mathbf{w}(n) = \mathbf{w}(n+1) - \mathbf{w}(n) = -\eta\, \mathbf{g}(n)$.
• HW: Prove the convergence of this algorithm and show how it is influenced by the learning rate η. (Sec. 3.3 of Haykin)
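A minimal sketch of the iteration under the assumption that a gradient routine is supplied; the quadratic example cost and the chosen step size are illustrative, not taken from the slide.

```python
import numpy as np

def steepest_descent(grad, w0, eta, n_steps):
    """Steepest-descent iteration: w(n+1) = w(n) - eta * g(n)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad(w)
    return w

# Illustrative quadratic cost E(w) = 0.5 w^T A w - b^T w, whose gradient is A w - b;
# its minimizer is A^{-1} b = [0.5, 0.1].
A = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([1.0, 1.0])
w_min = steepest_descent(lambda w: A @ w - b, w0=np.zeros(2), eta=0.1, n_steps=500)
```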
Unconstrained Optimization: A Review
• When η is small, the transient response of the algorithm is overdamped, i.e., w(n) follows a smooth path.
• When η is large, the transient response of the algorithm is underdamped, i.e., w(n) follows a zigzagging (oscillatory) path.
• When η exceeds a critical value, the algorithm becomes unstable.
Figure 3.2 Trajectory of the method of steepest descent in a two-dimensional space for two different values of the learning-rate parameter: (a) small η, (b) large η. The coordinates w1 and w2 are elements of the weight vector w; they both lie in the W-plane.
Unconstrained Optimization: A Review
Newton's Method
• The idea is to minimize the quadratic approximation of the cost function E(w) around the current point w(n):
  $\Delta\mathcal{E}(\mathbf{w}(n)) = \mathcal{E}(\mathbf{w}(n+1)) - \mathcal{E}(\mathbf{w}(n)) \approx \mathbf{g}^T(n)\,\Delta\mathbf{w}(n) + \tfrac{1}{2}\,\Delta\mathbf{w}^T(n)\,\mathbf{H}(n)\,\Delta\mathbf{w}(n)$,
  where H(n) is the Hessian matrix of E(w):
  $\mathbf{H}(n) = \nabla^2 \mathcal{E}(\mathbf{w}(n)) = \begin{bmatrix} \frac{\partial^2 \mathcal{E}}{\partial w_1^2} & \frac{\partial^2 \mathcal{E}}{\partial w_1 \partial w_2} & \cdots & \frac{\partial^2 \mathcal{E}}{\partial w_1 \partial w_M} \\ \frac{\partial^2 \mathcal{E}}{\partial w_2 \partial w_1} & \frac{\partial^2 \mathcal{E}}{\partial w_2^2} & \cdots & \frac{\partial^2 \mathcal{E}}{\partial w_2 \partial w_M} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 \mathcal{E}}{\partial w_M \partial w_1} & \frac{\partial^2 \mathcal{E}}{\partial w_M \partial w_2} & \cdots & \frac{\partial^2 \mathcal{E}}{\partial w_M^2} \end{bmatrix}$
• The weights are updated by minimizing this approximation with respect to Δw(n):
  $\mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n) = \mathbf{w}(n) - \mathbf{H}^{-1}(n)\,\mathbf{g}(n)$.
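A minimal sketch of one Newton update, assuming routines for the gradient and Hessian are supplied; solving the linear system rather than forming H^{-1} explicitly is a standard numerical choice, not something stated on the slide.

```python
import numpy as np

def newton_step(w, grad, hessian):
    """One Newton update: w(n+1) = w(n) - H^{-1}(n) g(n).
    The linear system H d = g is solved instead of inverting H explicitly."""
    g = grad(w)
    H = hessian(w)
    return w - np.linalg.solve(H, g)

# For a quadratic cost E(w) = 0.5 w^T A w - b^T w the Hessian is A, so a single
# Newton step from any starting point lands on the minimizer A^{-1} b.
A = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([1.0, 1.0])
w1 = newton_step(np.zeros(2), lambda w: A @ w - b, lambda w: A)
```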
Unconstrained Optimization: A Review
Newton's Method
• Generally, Newton's method converges quickly and does not exhibit the zigzagging behavior of the method of steepest descent.
• However, Newton's method has two main disadvantages:
  - The Hessian matrix H(n) has to be a positive-definite matrix for all n, which is not guaranteed by the algorithm. This is solved by the modified Gauss-Newton method.
  - It has high computational complexity.
The Least-Mean Square Algorithm
• The aim of the LMS algorithm is to minimize the instantaneous value of the cost function:
  $\mathcal{E}(\hat{\mathbf{w}}) = \tfrac{1}{2}\, e^2(n)$.
• Differentiation of E(ŵ) with respect to ŵ yields
  $\frac{\partial \mathcal{E}(\hat{\mathbf{w}})}{\partial \hat{\mathbf{w}}} = e(n)\,\frac{\partial e(n)}{\partial \hat{\mathbf{w}}}$.
• As with the least-squares filters, the LMS operates on a linear neuron, so we can write
  $e(n) = d(n) - \mathbf{x}^T(n)\,\hat{\mathbf{w}}(n)$, and hence
  $\frac{\partial e(n)}{\partial \hat{\mathbf{w}}(n)} = -\mathbf{x}(n)$ and
  $\hat{\mathbf{g}}(n) = \frac{\partial \mathcal{E}(\hat{\mathbf{w}})}{\partial \hat{\mathbf{w}}(n)} = -\mathbf{x}(n)\, e(n)$.
• Finally,
  $\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) - \eta\,\hat{\mathbf{g}}(n) = \hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\, e(n)$.
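A minimal sketch of a single LMS adaptation step built directly from the last two equations; the function name and the choice to return both the updated weights and the error are illustrative.

```python
import numpy as np

def lms_step(w_hat, x, d, eta):
    """One LMS adaptation step:
    e(n) = d(n) - x^T(n) w_hat(n);  w_hat(n+1) = w_hat(n) + eta * x(n) * e(n)."""
    e = d - float(np.dot(x, w_hat))
    return w_hat + eta * x * e, e
```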
The Least-Mean Square Algorithm
• The inverse of the learning rate acts as a memory of the LMS algorithm. The smaller the learning rate η, the longer the memory span over the past data, which leads to more accurate results but a slower convergence rate.
• In the steepest-descent algorithm, the weight vector w(n) follows a well-defined trajectory in the weight space for a prescribed η. In contrast, in the LMS algorithm the weight vector w(n) traces a random trajectory. For this reason, the LMS algorithm is sometimes referred to as a "stochastic gradient algorithm."
• Unlike steepest descent, the LMS algorithm does not require knowledge of the statistics of the environment; it produces an instantaneous estimate of the weight vector.
The Least-Mean Square Algorithm
• Summary of the LMS Algorithm (a full-pass sketch is given below).
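A minimal sketch of the complete algorithm as it is usually summarized: initialize ŵ(0), then alternate the filtering and adaptive processes over the training sample. The array layout and the zero initialization are assumptions.

```python
import numpy as np

def lms(X, d, eta, w0=None):
    """LMS algorithm over a record of n stimulus vectors.
    X: (n, M) array whose rows are x(i); d: (n,) desired responses.
    Returns the final weight estimate and the error sequence."""
    n, M = X.shape
    w_hat = np.zeros(M) if w0 is None else np.asarray(w0, dtype=float)
    errors = np.empty(n)
    for i in range(n):
        y = float(np.dot(w_hat, X[i]))     # filtering process: y(i) = w_hat^T x(i)
        e = d[i] - y
        w_hat = w_hat + eta * X[i] * e     # adaptive process: LMS weight update
        errors[i] = e
    return w_hat, errors
```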
Virtues and Limitations of the LMS Algorithm
Computational Simplicity and Efficiency:
• The algorithm is very simple to code; the update itself takes only two or three lines of code.
• The computational complexity of the algorithm is linear in the number of adjustable parameters.
A COMPUTATIONAL EXAMPLE
Cost Function
• Linear models with a squared-error cost usually produce convex (bowl-shaped) cost functions.
Figure copyright of Andrew Ng.
Cost Function
• Surface plot of the cost J(θ0, θ1) over the parameters θ0 and θ1.
• A different starting point can lead to a different solution.
Figure copyright of Andrew Ng.
Finding a Solution
• The corresponding solution and the contour of the cost function.
Figure copyright of Andrew Ng.
Virtues and Limitations of the LMS Algorithm
Robustness
• Since the LMS is model independent, it is robust with respect to disturbances.
• That is, the LMS is optimal in accordance with the H∞ norm (applied to the transfer function T of the estimator).
• The philosophy of this optimality is to accommodate the worst-case scenario:
  - "If you don't know what you are up against, plan for the worst scenario and optimize."
Figure 3.8 Formulation of the optimal H∞ estimation problem. The generic estimation error at the transfer operator's output could be the weight-error vector, the explanational error, etc.
Virtues and Limitations of the LMS Algorithm
Factors Limiting the LMS Performance
• The primary limitations of the LMS algorithm are:
  - Its slow rate of convergence, which becomes serious when the dimensionality of the input space becomes high; it typically requires a number of iterations equal to about 10 times the dimensionality of the input data space to converge to a stable solution.
  - Its sensitivity to variation in the eigenstructure of the input.
    - The sensitivity to changes in the environment becomes particularly acute when the condition number of the LMS algorithm is high.
    - The condition number is $\chi(\mathbf{R}) = \lambda_{\max} / \lambda_{\min}$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum eigenvalues of the correlation matrix $\mathbf{R}_{xx}$.
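A minimal sketch of estimating χ(R) from a batch of input vectors; using the sample average for R_xx is an assumption about how the correlation matrix would be estimated in practice.

```python
import numpy as np

def condition_number(X):
    """Estimate chi(R) = lambda_max / lambda_min from the sample correlation
    matrix R_xx ~= (1/n) sum_i x(i) x(i)^T, where the rows of X are the x(i)."""
    R = X.T @ X / X.shape[0]
    eigvals = np.linalg.eigvalsh(R)   # eigenvalues of the symmetric R, ascending
    return eigvals[-1] / eigvals[0]
```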
Learning-Rate Annealing Schedules
• The slow rate of convergence may be attributed to keeping the learning-rate parameter η constant.
• In stochastic approximation, the learning-rate parameter is a time-varying parameter. The most common form used is
  $\eta(n) = \dfrac{c}{n}$.
• An alternative form is the search-then-converge schedule
  $\eta(n) = \dfrac{\eta_0}{1 + n/\tau}$.
Figure 3.9 Learning-rate annealing schedules: the horizontal axis, printed in color, pertains to the standard LMS algorithm.
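A minimal sketch of the two schedules; the parameter names (c, eta0, tau) follow the formulas above, and the time index is assumed to start at n = 1.

```python
def eta_stochastic(n, c):
    """Stochastic-approximation schedule: eta(n) = c / n, for n >= 1."""
    return c / n

def eta_search_then_converge(n, eta0, tau):
    """Search-then-converge schedule: eta(n) = eta0 / (1 + n / tau).
    Roughly constant (~eta0) for n << tau, and decays like c/n for n >> tau."""
    return eta0 / (1.0 + n / tau)
```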
Computer Experiment
• d = 1
Figure 3.6 LMS classification with distance 1, based on the double-moon configuration of Fig. 1.8.
Computer Experiment
• d = -4
Figure 3.7 LMS classification with distance -4, based on the double-moon configuration of Fig. 1.8.
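A minimal sketch of how such an experiment might be reproduced. The double-moon sampler below is a hypothetical approximation of Haykin's Fig. 1.8 (radius, width, and sampling details are assumptions), and the ±1 target coding with an appended bias input is a conventional choice rather than something stated on the slides.

```python
import numpy as np

def double_moon(n_per_moon, radius=10.0, width=6.0, d=1.0, seed=0):
    """Hypothetical double-moon sampler in the spirit of Haykin's Fig. 1.8:
    an upper half-ring centered at the origin and a lower half-ring shifted
    right by `radius` and down by `d`."""
    rng = np.random.default_rng(seed)
    r = radius + width * (rng.random(2 * n_per_moon) - 0.5)
    theta = np.pi * rng.random(2 * n_per_moon)
    upper = np.column_stack((r[:n_per_moon] * np.cos(theta[:n_per_moon]),
                             r[:n_per_moon] * np.sin(theta[:n_per_moon])))
    lower = np.column_stack((r[n_per_moon:] * np.cos(theta[n_per_moon:]) + radius,
                             -r[n_per_moon:] * np.sin(theta[n_per_moon:]) - d))
    X = np.column_stack((np.vstack((upper, lower)), np.ones(2 * n_per_moon)))  # bias input
    targets = np.hstack((np.ones(n_per_moon), -np.ones(n_per_moon)))
    return X, targets

# One LMS pass over the data for each of the two settings of d.
for dist in (1.0, -4.0):
    X, targets = double_moon(1000, d=dist)
    eta, w_hat = 1e-4, np.zeros(X.shape[1])
    for x_i, t_i in zip(X, targets):
        e = t_i - float(np.dot(w_hat, x_i))   # error signal
        w_hat = w_hat + eta * x_i * e         # LMS weight update
    accuracy = np.mean(np.sign(X @ w_hat) == targets)
```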
Homework 3
• Problems: 3.3
• Computer Experiment: 3.10, 3.13
Next Time: Multilayer Perceptrons
