Artificial Neural Networks for Data Mining

Dr. Kamal Gulati

Data Mining: Classification and Prediction • 1. Classification with decision trees • 2. Artificial Neural Networks

1. CLASSIFICATION WITH DECISION TREES • Classification is the process of learning a model that describes different classes of data. The classes are predetermined, so this type of activity is also called supervised learning. • Example: In a banking application, customers who apply for a credit card may be classified as a “good risk”, a “fair risk” or a “poor risk”. • Once the model is built, it can be used to classify new data.

• The first step, learning the model, is accomplished by using a training set of data that has already been classified. Each record in the training data contains an attribute, called the class label, that indicates which class the record belongs to. • The model that is produced is usually in the form of a decision tree or a set of rules. • Some of the important issues with regard to the model and the algorithm that produces the model include: – the model’s ability to predict the correct class of new data, – the computational cost associated with the algorithm, – the scalability of the algorithm. • Let us examine the approach where the model is in the form of a decision tree. • A decision tree is simply a graphical representation of the description of each class or, in other words, a representation of the classification rules.

• Example: Suppose that we have a database of customers on the AllElectronics mailing list. The database describes attributes of the customers, such as their name, age, income, occupation, and credit rating. The customers can be classified as to whether or not they have purchased a computer at AllElectronics. • Suppose that new customers are added to the database and that you would like to notify these customers of an upcoming computer sale. Sending promotional literature to every new customer in the database can be quite costly. A more cost-efficient method is to target only those new customers who are likely to purchase a new computer. A classification model can be constructed and used for this purpose. • Figure 2 shows a decision tree for the concept buys_computer, indicating whether or not a customer at AllElectronics is likely to purchase a computer.

Each internal node represents a test on an attribute. Each leaf node represents a class. A decision tree for the concept buys_computer, indicating whether or not a customer at AllElectronics is likely to purchase a computer.

Training data tuples from the AllElectronics customer database

age     income  student  credit_rating  class
<=30    high    no       fair           No
<=30    high    no       excellent      No
31…40   high    no       fair           Yes
>40     medium  no       fair           Yes
>40     low     yes      fair           Yes
>40     low     yes      excellent      No
31…40   low     yes      excellent      Yes
<=30    medium  no       fair           No
<=30    low     yes      fair           Yes
>40     medium  yes      fair           Yes
<=30    medium  yes      excellent      Yes
31…40   medium  no       excellent      Yes
31…40   high    yes      fair           Yes
>40     medium  no       excellent      No

age? (branches: <=30, 31…40, >40)

age <=30:
income  student  credit_rating  class
high    no       fair           no
high    no       excellent      no
medium  no       fair           no
low     yes      fair           yes
medium  yes      excellent      yes

age 31…40:
income  student  credit_rating  class
high    no       fair           yes
low     yes      excellent      yes
medium  no       excellent      yes
high    yes      fair           yes

age >40:
income  student  credit_rating  class
medium  no       fair           yes
low     yes      fair           yes
low     yes      excellent      no
medium  yes      fair           yes
medium  no       excellent      no

Extracting Classification Rules from Trees • Represent the knowledge in the form of IF-THEN rules • One rule is created for each path from the root to a leaf • Each attribute-value pair along a path forms a conjunction • The leaf node holds the class prediction • Rules are easier for humans to understand.

Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
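As a rough illustration, the five IF-THEN rules can be written directly as a small Python function. The function name `buys_computer` and the string encoding of the attribute values are assumptions made for this sketch; the rules themselves are the ones listed on the slide.

```python
def buys_computer(age: str, student: str, credit_rating: str) -> str:
    """Classify one customer with the five rules read off the decision tree.

    `age` is expected to be one of "<=30", "31…40", ">40" (the bands used
    in the training table); the other arguments use the table's values.
    """
    if age == "<=30":
        # Rules 1 and 2: for the youngest band, only `student` matters.
        return "yes" if student == "yes" else "no"
    if age == "31…40":
        # Rule 3: the middle band is a leaf; every tuple is "yes".
        return "yes"
    if age == ">40":
        # Rules 4 and 5: for the oldest band, only `credit_rating` matters.
        return "no" if credit_rating == "excellent" else "yes"
    raise ValueError(f"unknown age band: {age!r}")


# Each root-to-leaf path is one conjunction of attribute tests.
example = buys_computer(age=">40", student="no", credit_rating="fair")
```

Note that `income` never appears: no path in this tree tests it, so no rule mentions it.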

1. NEURAL NETWORK REPRESENTATION • An ANN is composed of processing elements called perceptrons, organized in different ways to form the network’s structure. Processing Elements • An ANN consists of perceptrons. Each perceptron receives inputs, processes them, and delivers a single output. The input can be raw input data or the output of other perceptrons. The output can be the final result (e.g. 1 means yes, 0 means no) or it can be input to other perceptrons.

The network • Each ANN is composed of a collection of perceptrons grouped in layers. A typical structure is shown in Fig. 2. Note the three layers: input, intermediate (called the hidden layer) and output. Several hidden layers can be placed between the input and output layers. [Figure 2: a network with input, hidden, and output layers]
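The layered structure can be sketched as a forward pass in which each layer's outputs become the next layer's inputs. This is a minimal illustration: the sigmoid transfer function, the helper names, and all weight values below are assumptions chosen for the example, not taken from the slides.

```python
import math


def sigmoid(s: float) -> float:
    """A common nonlinear transfer function (assumed here)."""
    return 1.0 / (1.0 + math.exp(-s))


def layer_forward(inputs, weights):
    """Compute one layer; each row of `weights` holds one unit's input weights."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def network_forward(inputs, layers):
    """Feed `inputs` through the layers in order, input side first."""
    activations = list(inputs)
    for weights in layers:
        activations = layer_forward(activations, weights)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output unit.
hidden_weights = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
output_weights = [[0.6, -0.3]]
result = network_forward([1.0, 0.0, 1.0], [hidden_weights, output_weights])
```

Adding more hidden layers only means appending more weight matrices to the `layers` list.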

Appropriate Problems for Neural Network • ANN learning is well-suited to problems in which the training data corresponds to noisy, complex sensor data. It is also applicable to problems for which more symbolic representations are used. • The backpropagation (BP) algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the following characteristics: – Input is high-dimensional discrete or real-valued (e.g. raw sensor input) – Output is discrete or real-valued – Output is a vector of values – Possibly noisy data – Long training times accepted – Fast evaluation of the learned function required – Not important for humans to understand the weights • Examples: – Speech phoneme recognition – Image classification – Financial prediction

NEURAL NETWORK APPLICATION DEVELOPMENT The development process for an ANN application has eight steps. • Step 1: (Data collection) The data to be used for the training and testing of the ANN are collected. Important considerations are that the particular problem is amenable to an ANN solution and that adequate data exist and can be obtained. • Step 2: (Training and testing data separation) Training data must be identified, and a plan must be made for testing the performance of the ANN. The available data are divided into training and testing data sets. For a moderately sized data set, 80% of the data are randomly selected for training, 10% for testing, and 10% for secondary testing. • Step 3: (Network architecture) A network architecture and a learning method are selected. Important considerations are the exact number of nodes and the number of layers.
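The 80% / 10% / 10% split in Step 2 could be done along these lines. This is a sketch using only the standard library; the function name `split_dataset`, the fixed seed, and the use of integers as stand-in records are assumptions for illustration.

```python
import random


def split_dataset(records, train_frac=0.8, test_frac=0.1, seed=0):
    """Randomly split records into training, testing, and secondary-testing sets.

    Whatever is left after the training and testing portions becomes the
    secondary-testing set, so no record is lost to rounding.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    secondary = shuffled[n_train + n_test:]
    return train, test, secondary


# Stand-in data: 100 record identifiers.
train, test, secondary = split_dataset(range(100))
```

Shuffling before slicing matters: if the records are stored in some meaningful order, taking the first 80% without shuffling would give a biased training set.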

• Step 4: (Parameter tuning and weight initialization) There are parameters for tuning the ANN to the desired learning performance level. Part of this step is initialization of the network weights and parameters, followed by modification of the parameters as training performance feedback is received. – Often, the initial values are important in determining the effectiveness and length of training. • Step 5: (Data transformation) The application data are transformed into the type and format required by the ANN. • Step 6: (Training) Training is conducted iteratively by presenting input and known output data to the ANN. The ANN computes the outputs and adjusts the weights until the computed outputs are within an acceptable tolerance of the known outputs for the input cases.

• Step 7: (Testing) Once the training has been completed, it is necessary to test the network. – The testing examines the performance of the ANN using the derived weights by measuring the ability of the network to classify the testing data correctly. – Black-box testing (comparing test results to historical results) is the primary approach for verifying that inputs produce the appropriate outputs. • Step 8: (Implementation) A stable set of weights has now been obtained. – The ANN can now reproduce the desired output given inputs like those in the training set. – The ANN is ready to use as a stand-alone system or as part of another software system where new input data will be presented to it and its output will be a recommended decision.

BENEFITS AND LIMITATIONS OF NEURAL NETWORKS 6.1 Benefits of ANNs • Usefulness for pattern recognition, classification, generalization, abstraction and interpretation of incomplete and noisy inputs (e.g. handwriting recognition, image recognition, voice and speech recognition, weather forecasting). • Providing some human characteristics to problem solving that are difficult to simulate using the logical, analytical techniques of expert systems and standard software technologies (e.g. financial applications). • Ability to solve new kinds of problems. ANNs are particularly effective at solving problems whose solutions are difficult to define. This opened up a new range of decision support applications formerly either difficult or impossible to computerize.

[Artificial] Neural Networks • A class of powerful, general-purpose tools readily applied to: – Prediction – Classification – Clustering • Biological Neural Net (human brain) is the most powerful – we can generalize from experience • Computers are best at following pre-determined instructions • Computerized Neural Nets attempt to bridge the gap – Predicting time-series in financial world – Diagnosing medical conditions – Identifying clusters of valuable customers – Fraud detection – Etc…

Neural Networks • When applied in well-defined domains, their ability to generalize and learn from data “mimics” a human’s ability to learn from experience. • Very useful in Data Mining…better results are the hope • Drawback – training a neural network results in internal weights distributed throughout the network making it difficult to understand why a solution is valid

Neural Networks • What is a Neural Network? • Similarity with biological network • The fundamental processing element of a neural network is a neuron, which: 1. Receives inputs from other sources 2. Combines them in some way 3. Performs a generally nonlinear operation on the result 4. Outputs the final result • Biologically motivated approach to machine learning

Neural Network History • 1930s thru 1970s • 1980s: – Back propagation – better way of training a neural net – Computing power became available – Researchers became more comfortable with n-nets – Relevant operational data more accessible – Useful applications (expert systems) emerged • Check out Fair Isaac (www.fairisaac.com) which has a division here in San Diego (formerly HNC)

Neural Network • A neural network learns by adjusting the weights so as to be able to correctly classify the training data and hence, after the testing phase, to classify unknown data. • A neural network needs a long time for training. • A neural network has a high tolerance to noisy and incomplete data.

Neural Network Classifier • Input: classification data, which contains the classification attribute • Data is divided, as in any classification problem, into training data and testing data • All data must be normalized (i.e. all values of attributes in the database are rescaled to lie in the interval [0,1] or [-1,1]). A neural network can work with data in the range of (0,1) or (-1,1) • Two basic normalization techniques: [1] Max-Min normalization [2] Decimal Scaling normalization
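The two normalization techniques named above can be sketched as follows, using their usual textbook definitions: max-min normalization maps the observed minimum and maximum linearly onto a target interval, and decimal scaling divides by the smallest power of ten that brings every absolute value below 1. The function names and sample values are assumptions for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def decimal_scaling_normalize(values):
    """Divide by 10**j, with j the smallest integer making every |v| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


unit_range = min_max_normalize([10, 20, 30])               # target interval [0, 1]
signed_range = min_max_normalize([10, 20, 30], -1.0, 1.0)  # target interval [-1, 1]
scaled = decimal_scaling_normalize([-986, 917])            # divides by 10**3
```

In practice the minimum, maximum, and scaling exponent would be computed on the training data and then reused unchanged for the testing data, so both sets share one scale.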

Loan Prospector – HNC/Fair Isaac • A Neural Network (Expert System) is like a black box that knows how to process inputs to create a useful output. • The calculation(s) are quite complex and difficult to understand

Neural Net Limitations • Neural nets are good for prediction and estimation when: – Inputs are well understood – Output is well understood – Experience is available as examples to use to “train” the neural net application (expert system) • A neural net is only as good as the training set used to generate it. The resulting model is static and must be updated with more recent examples and retrained for it to stay relevant

Neural Network Training • Training is the process of setting the best weights on the edges connecting all the units in the network • The goal is to use the training set to calculate weights where the output of the network is as close to the desired output as possible for as many of the examples in the training set as possible • Back propagation has been used since the 1980s to adjust the weights (other methods are now available): – Calculates the error by taking the difference between the calculated result and the actual result – The error is fed back through the network and the weights are adjusted to minimize the error

Introduction • Data Mining Definitions: – Building compact and understandable models incorporating the relationships between the description of a situation and a result concerning the situation. – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.

Kinds of Data Mining Problems • Classification / Segmentation • Forecasting/Prediction (how much) • Association rule extraction (market basket analysis) • Sequence detection

Data Mining Techniques: • Neural Networks • Decision Trees • Multivariate Adaptive Regression Splines (MARS) • Rule Induction • Nearest Neighbor Method and discriminant analysis • Genetic Algorithms • Boosting

Neural Networks • What are they? – Based on early research aimed at representing the way the human brain works – Neural networks are composed of many processing units called neurons • Types (Supervised versus Unsupervised) • Training

Neural Networks are great, but... • Problem 1: The black box model! – Solution 1: Do we really need to know? – Solution 2: Rule extraction techniques • Problem 2: Long training times – Solution 1: Get a faster PC with lots of RAM – Solution 2: Use faster algorithms (for example, Quickprop) • Problems 3-: Back propagation – Solution: Evolutionary Neural Networks!

Rule Extraction Techniques • Representation Methods • Extraction Strategy • Network Requirement

Neural Network Concepts • Neural networks (NN): a brain metaphor for information processing • Neural computing • Artificial neural network (ANN) • Many uses for ANN – pattern recognition, forecasting, prediction, and classification • Many application areas – finance, marketing, manufacturing, operations, information systems, and so on

Biological Neural Networks • Two interconnected brain cells (neurons) [Figure: two neurons with labeled soma, axon, synapse, and dendrites]

Processing Information in ANN • A single neuron (processing element – PE) with inputs and outputs • Inputs: $x_1, x_2, \dots, x_n$ • Weights: $w_1, w_2, \dots, w_n$ • Summation: $S = \sum_{i=1}^{n} X_i W_i$ • Transfer function: $f(S)$ • Outputs: $Y$ ($Y_1, Y_2, \dots, Y_n$)
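A single processing element can be sketched in a few lines: form the weighted sum S of the inputs, then apply a transfer function f to get the output Y. The slide does not say which transfer function is used, so the sigmoid default here is an assumption, as are the function names and sample values.

```python
import math


def sigmoid(s: float) -> float:
    """One common choice of transfer function (assumed, not from the slide)."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(x, w, transfer=sigmoid):
    """Compute Y = f(S) where S is the sum of X_i * W_i over all inputs."""
    if len(x) != len(w):
        raise ValueError("each input needs exactly one weight")
    s = sum(xi * wi for xi, wi in zip(x, w))  # summation function
    return transfer(s)                        # transfer function


# Hypothetical inputs and weights.
y_sigmoid = neuron_output([1.0, 2.0], [0.5, 0.25])
# Passing an identity transfer exposes the raw weighted sum S.
y_linear = neuron_output([1.0, 2.0], [0.5, 0.25], transfer=lambda s: s)
```

Keeping the transfer function as a parameter makes it easy to compare a hard threshold, a sigmoid, or a linear unit on the same weighted sum.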

Elements of ANN • Processing element (PE) • Network architecture – Hidden layers – Parallel processing • Network information processing – Inputs – Outputs – Connection weights – Summation function

Neural Network Architectures Recurrent Neural Networks

Learning in ANN • A process by which a neural network learns the underlying relationship between input and outputs, or just among the inputs • Supervised learning – For prediction type problems – E.g., backpropagation • Unsupervised learning – For clustering type problems – Self-organizing – E.g., adaptive resonance theory

A Taxonomy of ANN Learning Algorithms [Figure: taxonomy diagram. Learning algorithms are divided by input type (discrete/binary input, continuous input) and into supervised and unsupervised; algorithms listed: Delta rule, Gradient descent, Competitive learning, Neocognitron, Perceptron, Simple Hopfield, Outerproduct AM, Hamming Net, ART-1, Carpenter/Grossberg, ART-3, SOFM (or SOM), other clustering algorithms. Architectures are divided into supervised and unsupervised, with the subtypes recurrent, feedforward, estimator, and extractor; entries listed: Hopfield, SOFM (or SOM), Nonlinear vs. linear, Backpropagation, ML perceptron, Boltzmann, ART-1, ART-2.]

A Supervised Learning Process • Three-step process: 1. Compute temporary outputs 2. Compare outputs with desired targets 3. Adjust the weights and repeat the process [Figure: flowchart of the ANN model: compute output; is the desired output achieved? If no, adjust weights and compute again; if yes, stop learning]

How a Network Learns • Example: single neuron that learns the inclusive OR operation * See your book for step-by-step progression of the learning process • Learning parameters: – Learning rate – Momentum
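The slide defers the step-by-step progression to the book, so the following is only a sketch of how a single neuron might learn inclusive OR, not a reproduction of the book's worked example. It assumes a step transfer function with a fixed threshold of 0.5, zero initial weights, and a learning rate of 0.2; momentum is left out to keep the update rule to one line.

```python
OR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def step(s, threshold=0.5):
    """Fire (output 1) only when the weighted sum exceeds the threshold."""
    return 1 if s > threshold else 0


def predict(x, weights):
    return step(sum(xi * wi for xi, wi in zip(x, weights)))


def train_or(rate=0.2, max_epochs=100):
    """Adjust the two weights until every OR case is classified correctly."""
    weights = [0.0, 0.0]  # assumed starting point
    for _ in range(max_epochs):
        errors = 0
        for x, desired in OR_CASES:
            delta = desired - predict(x, weights)  # compare with the target
            if delta != 0:
                errors += 1
                # Move each weight in proportion to its input and the error.
                weights = [wi + rate * delta * xi for wi, xi in zip(weights, x)]
        if errors == 0:  # a full pass with no mistakes: stop learning
            break
    return weights


weights = train_or()
predictions = [predict(x, weights) for x, _ in OR_CASES]
```

A larger learning rate takes bigger steps per error and usually needs fewer passes on a problem this simple, at the risk of overshooting on harder ones.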

Backpropagation Learning • Backpropagation of Error for a Single Neuron • Inputs: $x_1, x_2, \dots, x_n$ • Weights: $w_1, w_2, \dots, w_n$ • Summation: $S = \sum_{i=1}^{n} X_i W_i$ • Transfer function: $Y = f(S)$ • Error fed back to the neuron (PE): $a(Z_i - Y_i)$

Backpropagation Learning • The learning algorithm procedure: 1. Initialize weights with random values and set other network parameters 2. Read in the inputs and the desired outputs 3. Compute the actual output (by working forward through the layers) 4. Compute the error (difference between the actual and desired output) 5. Change the weights by working backward through the hidden layers 6. Repeat steps 2-5 until weights stabilize
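The six steps above can be sketched in plain Python for a network with one hidden layer. Several choices here are assumptions made for the illustration: sigmoid units, a bias weight on every unit, a learning rate of 0.5, a fixed number of passes standing in for "until weights stabilize", and the inclusive OR cases as training data.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train_backprop(samples, n_hidden=2, rate=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: random initial weights; the last weight of each unit is its bias.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]  # append constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
        hb = h + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
        return xb, hb, y

    def mean_squared_error():
        return sum((z - forward(x)[2]) ** 2 for x, z in samples) / len(samples)

    initial_error = mean_squared_error()
    for _ in range(epochs):                  # Step 6: repeat steps 2-5
        for x, z in samples:                 # Step 2: read input and desired output
            xb, hb, y = forward(x)           # Step 3: work forward through the layers
            err = z - y                      # Step 4: desired minus actual
            # Step 5: work backward, output unit first, then hidden units.
            delta_out = err * y * (1.0 - y)
            delta_hid = [delta_out * w_out[j] * hb[j] * (1.0 - hb[j])
                         for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                w_out[j] += rate * delta_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hid[j][i] += rate * delta_hid[j] * xb[i]
    return (lambda x: forward(x)[2]), initial_error, mean_squared_error()


or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
predict, initial_error, final_error = train_backprop(or_samples)
```

The hidden-layer deltas are computed before the output weights are changed, so each backward pass uses the same weights that produced the forward pass.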

Neural Network Architectures • The architecture of a neural network is driven by the task it is intended to address – Classification, regression, clustering, general optimization, association, … • Most popular architecture: feedforward, multi-layered perceptron with backpropagation learning algorithm – Used for both classification and regression type problems

Other Popular ANN Paradigms Self Organizing Maps (SOM) • Applications of SOM – Customer segmentation – Bibliographic classification – Image-browsing systems – Medical diagnosis – Interpretation of seismic activity – Speech recognition – Data compression – Environmental modeling, many more …

Application Types of ANN • Classification – Feedforward networks (MLP), radial basis function, and probabilistic NN • Regression – Feedforward networks (MLP), radial basis function • Clustering – Adaptive Resonance Theory (ART) and SOM • Association – Hopfield networks • Provide examples for each type?

Advantages of ANN • Able to deal with (identify/model) highly nonlinear relationships • Not subject to restrictive normality and/or independence assumptions • Can handle a variety of problem types • Usually provides better results (prediction and/or clustering) compared to its statistical counterparts • Handles both numerical and categorical variables (transformation needed!)

Disadvantages of ANN • They are deemed to be black-box solutions, lacking expandability • It is hard to find optimal values for a large number of network parameters – Optimal design is still an art: it requires expertise and extensive experimentation • It is hard to handle a large number of variables (especially rich nominal attributes) • Training may take a long time for large datasets, which may require case sampling

ANN Software • Standalone ANN software tools – NeuroSolutions – BrainMaker – NeuralWare – NeuroShell, … for more (see pcai.com) … • Part of a data mining software suite – PASW (formerly SPSS Clementine) – SAS Enterprise Miner – Statistica Data Miner, … many more …

Applications-I • Handwritten Digit Recognition • Face recognition • Time series prediction • Process identification • Process control • Optical character recognition

Application-II • Forecasting/Market Prediction: finance and banking • Manufacturing: quality control, fault diagnosis • Medicine: analysis of electrocardiogram data, RNA & DNA sequencing, drug development without animal testing • Control: process, robotics
