Module 3
Artificial Neural Network
 Topics
1. Biological Motivation
2. Neural Network Representations
3. Appropriate Problems for Neural Network Learning
4. Perceptrons (Representation, Training Rule,
 Gradient Descent and Delta Rule, Remarks)
5. Multilayer Networks and the Backpropagation Algorithm
6. Remarks on the Backpropagation Algorithm
 Introduction
•Neural network learning methods provide a robust approach to approximating real-valued, discrete-valued, and vector-valued target functions.
•For certain types of problems, such as learning to interpret complex real-world sensor data, artificial neural networks are among the most effective learning methods currently known.
•For example, the Backpropagation algorithm described in this chapter has proven surprisingly successful in many practical problems, such as learning to recognize
 •handwritten characters (LeCun et al. 1989),
 •spoken words (Lang et al. 1990), and
 •faces (Cottrell 1990).
 1. Biological Motivation
•When we say "Neural Networks", we mean artificial
 Neural Networks (ANN). The idea of ANN is based on
 biological neural networks like the brain.
•The brain is a complex web of interconnected neurons. It contains about $10^{11}$ neurons, and each neuron is connected to about $10^{4}$ other neurons.
•The basic building block of a neural network is the neuron. A neuron in biology consists of three major parts: the soma (cell body), the dendrites, and the axon.
•The dendrites branch off from the soma in a tree-like way, getting thinner with every branch.
•They receive signals (impulses) from other neurons at
 synapses.
 Artificial Neural Network
•A neural network is made up of simple processing units called neurons. A neural net can be viewed as a massively parallel distributed processor that acquires experiential knowledge from its environment through a learning process.
•The acquired knowledge is stored in the form of interneuron connection strengths, known as synaptic weights.
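To make the idea of synaptic weights concrete, here is a minimal Python sketch of a single artificial neuron, assuming a weighted-sum-plus-activation model: each input is multiplied by its connection weight, the products are summed together with a bias term, and the result is passed through a logistic sigmoid. The function name `neuron_output`, the `bias` parameter, and the choice of sigmoid are illustrative assumptions, not details given in the slides.

```python
import math

def neuron_output(inputs, weights, bias):
    """Output of one artificial neuron (hypothetical helper).

    Each weight plays the role of a synaptic weight; `bias` is an assumed
    extra term, equivalent to a weight on a constant input of 1.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Net input: each input scaled by its connection strength, then summed.
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    # Logistic sigmoid squashes the net input into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-net))

print(neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0))  # net input is 0, so → 0.5
```

Learning, in this picture, means adjusting `weights` (and `bias`) so the neuron's outputs match the desired ones.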
Types of ANN
Some Activation Functions of a Neuron
Biological Neural Network and Artificial Neural Network
 2. Neural Network Representation
•A prototypical example of ANN learning is provided by
 Pomerleau's (1993) system ALVINN, which uses a learned
 ANN to steer an autonomous vehicle driving at normal
 speeds on public highways.
•The input to the neural network is a 30 x 32 grid of pixel
 intensities obtained from a forward-pointed camera mounted
 on the vehicle.
•The network output is the direction in which the vehicle is
 steered. The ANN is trained to mimic the observed steering
 commands of a human driving the vehicle for approximately 5
 minutes. ALVINN has used its learned networks to
 successfully drive at speeds up to 70 miles per hour and for
 distances of 90 miles on public highways (driving in the left
 lane of a divided public highway, with other vehicles present).
 Example
•The figure illustrates the neural network representation used in one version of the ALVINN system.
•There are four units that receive inputs directly from all of the 30 x 32 pixels in the image. These are called "hidden" units because their output is available only within the network and is not available as part of the global network output.
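The layered structure described above can be sketched as a forward pass in Python. This is a hypothetical illustration, not Pomerleau's implementation: the 30 x 32 input grid and the four hidden units come from the slides, while the number of output units (`N_OUTPUTS = 30`, one per candidate steering direction), the sigmoid units, the random untrained weights, and reading the steering direction off the strongest output are all assumptions.

```python
import math
import random

N_INPUTS = 30 * 32   # one input per pixel intensity (from the slides)
N_HIDDEN = 4         # four hidden units (from the slides)
N_OUTPUTS = 30       # assumption: one output unit per candidate steering direction

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    # Each row holds a bias weight followed by n_in input weights (small random values).
    return [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_forward(layer, inputs):
    # Every unit computes a sigmoid of its weighted input sum plus bias.
    return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs))) for row in layer]

def forward(image, hidden_layer, output_layer):
    # Flatten the 30 x 32 grid row by row into a 960-element input vector.
    x = [pixel for row in image for pixel in row]
    h = layer_forward(hidden_layer, x)      # hidden activations, used only inside the network
    return layer_forward(output_layer, h)   # one activation per steering direction

rng = random.Random(0)
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)
image = [[rng.random() for _ in range(32)] for _ in range(30)]  # made-up pixel intensities
scores = forward(image, hidden, output)
# Hypothetical decoding: steer toward the direction whose output unit is most active.
steer = max(range(N_OUTPUTS), key=lambda i: scores[i])
print(len(scores), steer)
```

The connections run strictly from inputs to hidden units to outputs, so the graph is directed and acyclic, matching the layered structure described in the slides.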
•The network structure of ALVINN is typical of many ANNs. Here the individual units are interconnected in layers that form a directed acyclic graph. In general, ANNs can be graphs with many types of structures: acyclic or cyclic, directed or undirected.
 3. Appropriate Problems for Neural Network Learning
ANN learning is well-suited to problems in the following
 cases:
•Instances are represented by many attribute-value pairs
•The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes
•The training examples may contain errors. ANN learning
 methods are quite robust to noise in the training data.
•Long training times are acceptable.
•Fast evaluation of the learned target function may be
 required.
•The ability of humans to understand the learned target
 function is not important.
4. Perceptron
•A perceptron is a feedforward network with one output neuron that learns a separating hyperplane in a pattern space. The perceptron is used when the data is linearly separable.
•One type of ANN system is based on a unit called a perceptron, illustrated in the figure.
•A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and -1 otherwise.
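The perceptron's decision can be written as $o(x_1,\dots,x_n) = 1$ if $w_0 + w_1x_1 + \dots + w_nx_n > 0$ and $-1$ otherwise, where the threshold is folded into the weight $w_0$ (equal to minus the threshold) paired with a constant input $x_0 = 1$. A minimal Python sketch under that convention; the function name `perceptron_output` is illustrative.

```python
def perceptron_output(weights, x):
    """Return +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.

    weights[0] is w0 (minus the threshold); x holds x1..xn,
    so len(weights) must equal len(x) + 1.
    """
    if len(weights) != len(x) + 1:
        raise ValueError("expected one more weight than inputs")
    # Linear combination of the inputs, including the constant-input weight w0.
    total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total > 0 else -1

# Hypothetical example: threshold 0.5 on the sum of two inputs (w0 = -0.5).
print(perceptron_output([-0.5, 1.0, 1.0], [1.0, 0.0]))  # → 1
print(perceptron_output([-0.5, 1.0, 1.0], [0.0, 0.0]))  # → -1
```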
 4.1 Representational Power of Perceptrons
•We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances (i.e., points). The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs a -1 for instances lying on the other side, as illustrated in the figure.
•The equation for this decision hyperplane is $\vec{w} \cdot \vec{x} = 0$.
•Some sets of positive and negative examples cannot be separated by any
 hyperplane. Those that can be separated are called linearly separable
 sets of examples.
•Perceptrons can represent all of the primitive boolean functions AND, OR, NAND (NOT AND), and NOR (NOT OR).
•Unfortunately, however, some boolean functions cannot be represented by a single perceptron, such as the XOR function, whose value is 1 if and only if $x_1 \neq x_2$.
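The claims about boolean functions can be checked with a small Python sketch. It assumes inputs encoded as 0 (false) and 1 (true) and outputs as +1/-1, and uses hand-picked weights (an illustrative choice; many other weights work too). The final grid search over weights in [-2, 2] finds no perceptron that computes XOR; this is consistent with, but not a proof of, the fact that XOR is not linearly separable.

```python
import itertools

def perceptron_output(weights, x):
    # +1 if w0 + w1*x1 + w2*x2 > 0, else -1 (w0 pairs with a constant input of 1).
    total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total > 0 else -1

# Hand-picked (illustrative) weights [w0, w1, w2] for each gate.
GATES = {
    "AND":  [-0.8, 0.5, 0.5],
    "OR":   [-0.3, 0.5, 0.5],
    "NAND": [0.8, -0.5, -0.5],
    "NOR":  [0.3, -0.5, -0.5],
}
TRUTH = {
    "AND":  lambda a, b: a and b,
    "OR":   lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR":  lambda a, b: not (a or b),
    "XOR":  lambda a, b: a != b,
}
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def represents(weights, name):
    # True if the perceptron reproduces the gate's truth table (+1 for true, -1 for false).
    return all(perceptron_output(weights, x) == (1 if TRUTH[name](*x) else -1) for x in INPUTS)

for name, w in GATES.items():
    print(name, represents(w, name))

# Coarse grid search over w0, w1, w2 in [-2, 2] with step 0.1: no setting computes XOR.
grid = [i / 10 for i in range(-20, 21)]
xor_found = any(represents(list(w), "XOR") for w in itertools.product(grid, repeat=3))
print("XOR found:", xor_found)
```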
 4.2 The Perceptron Training Rule
•Perceptron Training Rule: The learning problem is to determine a weight vector that causes the perceptron to produce the correct +1 or -1 output for each of the given training examples. At each step, the weights are modified to reduce the error.
•The Perceptron Learning Theorem (Rosenblatt, 1960): Given
 enough training examples, there is an algorithm that will
 learn any linearly separable function.
•Theorem 1 (Minsky and Papert, 1969): The perceptron rule converges to weights that correctly classify all training examples, provided the given data set represents a function that is linearly separable.
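The slides do not spell out the update itself; the standard perceptron training rule adjusts each weight by $\Delta w_i = \eta\,(t - o)\,x_i$, where $t$ is the target output, $o$ the perceptron's output, and $\eta$ a small positive learning rate. The Python sketch below applies it to the AND examples; the learning rate `eta=0.1`, the zero initial weights, the 0/1 input encoding, and the `max_epochs` cutoff are illustrative assumptions.

```python
def perceptron_output(weights, x):
    # +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1 (w0 pairs with x0 = 1).
    total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total > 0 else -1

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    `examples` is a list of (x, t) pairs with t in {+1, -1}.
    Returns (weights, epochs_used); raises RuntimeError if no epoch is mistake-free.
    """
    n = len(examples[0][0])
    weights = [0.0] * (n + 1)          # weights[0] is w0, paired with x0 = 1
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, t in examples:
            o = perceptron_output(weights, x)
            if o != t:                 # when o == t, (t - o) is 0 and nothing changes
                mistakes += 1
                weights[0] += eta * (t - o)
                for i, xi in enumerate(x, start=1):
                    weights[i] += eta * (t - o) * xi
        if mistakes == 0:
            return weights, epoch
    raise RuntimeError("did not converge within max_epochs")

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, epochs = train_perceptron(AND)
print(weights, epochs)
```

On linearly separable data such as AND, the loop stops once an epoch makes no mistakes. On XOR, which no perceptron can represent, every epoch has at least one mistake, so the sketch raises `RuntimeError` after `max_epochs`.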
 4.3 Gradient Descent and the Delta Rule
•The perceptron rule can fail to converge if the examples are not linearly separable.
•A second rule, called the delta rule, is designed to overcome this difficulty.
•If the training examples are not linearly separable,
 the delta rule converges toward a best-fit
 approximation to the target concept.
•The key idea behind the delta rule is to use gradient
 descent to search the hypothesis space of possible
 weight vectors to find the weights that best fit the
 training examples
•The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output $o$ is given by $o(\vec{x}) = \vec{w} \cdot \vec{x}$
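•The measure for the training error of a hypothesis (weight vector) is $E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$, where $D$ is the set of training examples, $t_d$ is the target output for training example $d$, and $o_d$ is the output of the linear unit for $d$.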
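A Python sketch of the linear unit and its training error, assuming the usual squared-error measure $E(\vec{w}) = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$ over the training set $D$ and the convention that $w_0$ multiplies a constant input $x_0 = 1$. The function names and the three training examples are made up for illustration.

```python
def linear_output(weights, x):
    """Unthresholded linear unit: o = w0 + w1*x1 + ... + wn*xn (w0 pairs with x0 = 1)."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def training_error(weights, examples):
    """E(w) = 1/2 * sum over training examples d of (t_d - o_d)^2."""
    return 0.5 * sum((t - linear_output(weights, x)) ** 2 for x, t in examples)

# Made-up examples (x, t) where the target is exactly t = 2 * x.
examples = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
print(training_error([0.0, 2.0], examples))  # perfect fit → 0.0
print(training_error([0.0, 1.0], examples))  # errors 1, 2, 3 → 0.5 * 14 = 7.0
```

Because $E$ depends only on the weights (the examples are fixed), each weight vector is assigned a single error value, which is what makes it possible to picture the hypothesis space as an error surface.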
 Visualizing the hypothesis space
To understand the gradient descent algorithm, it is helpful to visualize the
entire hypothesis space of possible weight vectors and their associated E
values, as illustrated in the figure.
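Gradient descent searches this error surface by repeatedly stepping the weight vector downhill, in the direction of $-\nabla E$. For a linear unit with squared error $E$, the step for weight $w_i$ works out to $\Delta w_i = \eta \sum_{d \in D}(t_d - o_d)\,x_{id}$. The Python sketch below assumes that batch update, a fixed learning rate `eta=0.05`, and 500 epochs; the four training points are made up and do not lie on a single line, so the sketch illustrates the delta rule settling on a best-fit approximation rather than an exact fit.

```python
def linear_output(weights, x):
    """Linear unit: o = w0 + w1*x1 + ... + wn*xn (w0 pairs with x0 = 1)."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def training_error(weights, examples):
    """Squared error E(w) = 1/2 * sum_d (t_d - o_d)^2."""
    return 0.5 * sum((t - linear_output(weights, x)) ** 2 for x, t in examples)

def gradient_descent(examples, eta=0.05, epochs=500):
    """Batch gradient descent on E for a linear unit."""
    n = len(examples[0][0])
    weights = [0.0] * (n + 1)
    for _ in range(epochs):
        # Accumulate delta_w_i = eta * sum_d (t_d - o_d) * x_id with the current
        # weights, then apply all the updates at once (one downhill step on E).
        deltas = [0.0] * (n + 1)
        for x, t in examples:
            err = t - linear_output(weights, x)
            deltas[0] += eta * err              # x0 = 1
            for i, xi in enumerate(x, start=1):
                deltas[i] += eta * err * xi
        weights = [w + d for w, d in zip(weights, deltas)]
    return weights

# Made-up points that no straight line passes through exactly.
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 4.0), ((3.0,), 7.0)]
w = gradient_descent(data)
print(w, training_error(w, data))
```

For these points the least-squares line is $t = 0.9 + 1.9x$ (checkable by hand from the normal equations), so the returned weights should be very close to `[0.9, 1.9]`, with remaining error $E \approx 0.35$. Too large a learning rate would make the steps overshoot the bottom of the bowl-shaped error surface and diverge.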
Derivation of the Gradient Descent Rule