1 Big Data and Machine Learning Using MATLAB
Seth DeLand & Amit Doshi, MathWorks
© 2015 The MathWorks, Inc.
2 Data Analytics
Turn large volumes of complex data into actionable information (source: Gartner)
3 Customer Example: Gas Natural Fenosa, Energy Production Optimization (User Story)
Opportunity
• Allocate demand among power plants to minimize generation costs
Analytics Use
• Data: Central database for historical power consumption and price data, weather forecasts, and parameters for each power plant
• Machine Learning: Develop price simulation scenarios
• Optimization: Minimize production cost
Benefit
• Reduced generation costs
• White-box solution for optimizing power generation
4 Predictive and Prescriptive Analytics
[Diagram labels. Predictive Analytics: Historical Weather Data, Historical Load Data, Load Forecast. Prescriptive Analytics: Load Forecast, Generator Parameters, Unit Commitment, Schedule.]
5 Big Data Analytics Workflow
▪ Access and Explore Data: Files, Databases, Sensors
▪ Preprocess Data: Working with Messy Data, Data Reduction/Transformation, Feature Extraction
▪ Develop Predictive Models: Model Creation (e.g. Machine Learning), Parameter Optimization, Model Validation
▪ Integrate Analytics with Systems: Desktop Apps, Enterprise Scale Systems, Embedded Devices and Hardware
6 Example: Working with Big Data in MATLAB
▪ Objective: Create a model to predict the cost of a taxi ride in New York City
▪ Inputs:
  - Monthly taxi ride log files
  - The local data set is small (~20 MB)
  - The full data set is big (~21 GB)
▪ Approach:
  - Access data
  - Preprocess and explore data
  - Develop and validate predictive model (linear fit)
    ▪ Work with a subset of the data for prototyping, then run on Spark-enabled Hadoop with the full data
  - Integrate analytics into a web app
7 Example: Working with Big Data in MATLAB
8 Demo: Taxi Fare Predictor Web App
9 Big Data Analytics Workflow: Data Access and Pre-process
10 Data Access and Pre-processing: Challenges
▪ Data aggregation
  - Different sources (files, web, etc.)
  - Different types (images, text, audio, etc.)
▪ Data clean up
  - Poorly formatted files
  - Irregularly sampled data
  - Redundant data, outliers, missing data, etc.
▪ Data specific processing
  - Signals: Smoothing, resampling, denoising, Wavelet transforms, etc.
  - Images: Image registration, morphological filtering, deblurring, etc.
▪ Dealing with out of memory data (big data)
"Data preparation accounts for about 80% of the work of data scientists" (Forbes)
11 Data Analytics Workflow: Big Data Access and Pre-processing
12 Next: Access Big Data from MATLAB
▪ datastore
  - Tabular text files
  - Images
  - Excel spreadsheets
  - (SQL) Databases
  - HDFS (Hadoop)
  - Amazon S3
13 Get data in MATLAB
14 What if the data is saved in HDFS?
15 Or Data is stored in a Database
16 Data Access: Summary
Business and Transactional Data
▪ Repositories: SQL, NoSQL, etc.
▪ File I/O: Text, Spreadsheet, etc.
▪ Web Sources: RESTful, JSON, etc.
Engineering, Scientific and Field Data
▪ Real-Time Sources: Sensors, GPS, etc.
▪ File I/O: Image, Audio, etc.
▪ Communication Protocols: OPC (OLE for Process Control), CAN (Controller Area Network), etc.
[Diagram labels: C, Java, Fortran, Python; Hardware; Software; Servers and Databases]
17 Process data which doesn't fit into memory
18 Pre-processing Big Data: tall arrays
▪ New data type designed for data that doesn't fit into memory
▪ Lots of observations (hence "tall")
▪ Looks like a normal MATLAB array
  - Supports numeric types, tables, datetimes, strings, etc.
  - Supports several hundred functions for basic math, stats, indexing, etc.
  - Statistics and Machine Learning Toolbox support (clustering, classification, etc.)
19 tall arrays on a single machine
▪ Automatically breaks data up into small "chunks" that fit in memory
▪ Tall arrays scan through the dataset one "chunk" at a time
▪ Processing code for tall arrays is the same as for ordinary arrays
[Diagram: a tall array processed chunk by chunk within single machine memory]
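The chunk-at-a-time idea can be illustrated outside MATLAB. The following is a minimal Python sketch, not MATLAB code and not how tall arrays are implemented: it computes a mean while holding only one chunk in memory. The names `read_in_chunks` and `chunked_mean` and the sample fares are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks so that only one chunk is held in memory at a time."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def chunked_mean(values: Iterable[float], chunk_size: int = 4) -> float:
    """Compute the mean by scanning the data one chunk at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)   # per-chunk partial result
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


# Made-up fares standing in for a file that would not fit in memory.
fares = (float(f) for f in [10, 12, 8, 30, 5, 7, 9, 15, 21])
print(chunked_mean(fares, chunk_size=4))
```

The caller asks for a mean in the same way regardless of how many chunks are scanned, which mirrors the slide's point that processing code stays the same for tall and ordinary arrays.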
20 tall arrays on a cluster of machines
▪ With Parallel Computing Toolbox, process several "chunks" at once
▪ Can scale up to clusters with MATLAB Distributed Computing Server
[Diagram: a tall array split across the memory of several machines, each processing its own chunks]
21 Demo: Working with Tall Arrays
22 Data Access and Pre-processing: Challenges and Solutions
Challenges
▪ Data aggregation
  - Different sources (files, web, etc.)
  - Different types (images, text, audio, etc.)
▪ Data clean up
  - Poorly formatted files
  - Irregularly sampled data
  - Redundant data, outliers, missing data, etc.
▪ Data specific processing
  - Signals: Smoothing, resampling, denoising, Wavelet transforms, etc.
  - Images: Image registration, morphological filtering, deblurring, etc.
▪ Dealing with out of memory data (big data)
Solutions (Files, Signals, Databases, Images)
▪ Point and click tools to access a variety of data sources
▪ High-performance environment for big data
▪ Built-in algorithms for data preprocessing, including sensor, image, audio, video and other real-time data
Key takeaway 1: MATLAB makes it easy to work with business and engineering data
23 Data Analytics Workflow: Develop Predictive Models using Big Data
24 Machine Learning
Machine learning uses data and produces a program to perform a task.
Task: Human Activity Detection
Standard Approach: Computer Program
▪ Hand Written Program:
  If X_acc > 0.5 then "SITTING"
  If Y_acc < 4 and Z_acc > 5 then "STANDING"
  …
▪ Formula or Equation: $Y_{activity} = \beta_1 X_{acc} + \beta_2 Y_{acc} + \beta_3 Z_{acc} + \dots$
Machine Learning Approach: Machine Learning
▪ $model = \langle \text{Machine Learning Algorithm} \rangle (sensor\_data, activity)$
▪ $model$: Inputs → Outputs
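To make the contrast concrete, here is a small Python sketch, under stated assumptions, of a hand-written rule next to a rule learned from labeled data. The learner is a one-feature threshold fit (a decision stump), chosen only because it is short; it is not the algorithm used in the talk. The function names, the "NOT SITTING" label, and the accelerometer readings are made up for illustration; only the 0.5 threshold in the hand-written rule comes from the slide.

```python
from typing import Callable, List, Sequence, Tuple


def hand_written_rule(x_acc: float) -> str:
    """Standard approach: a person chooses the threshold (0.5, as on the slide)."""
    return "SITTING" if x_acc > 0.5 else "NOT SITTING"


def fit_threshold(
    samples: Sequence[float], labels: Sequence[str], positive: str, negative: str
) -> Tuple[Callable[[float], str], float]:
    """Machine learning approach: choose the threshold with the fewest training errors."""
    values = sorted(set(samples))
    # Candidate thresholds are midpoints between neighbouring distinct values.
    candidates: List[float] = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct sample values")

    def errors(threshold: float) -> int:
        return sum(
            (positive if x > threshold else negative) != y
            for x, y in zip(samples, labels)
        )

    best = min(candidates, key=errors)

    def model(x: float) -> str:
        return positive if x > best else negative

    return model, best


# Hypothetical labeled sensor data: model = algorithm(sensor_data, activity)
sensor_data = [0.125, 0.25, 0.375, 0.625, 0.75, 0.875]
activity = ["NOT SITTING"] * 3 + ["SITTING"] * 3
model, learned_threshold = fit_threshold(sensor_data, activity, "SITTING", "NOT SITTING")
print(learned_threshold, model(0.875), hand_written_rule(0.875))
```

With new labeled data the learned threshold changes automatically, whereas the hand-written rule has to be edited by hand.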
25 Consider Machine/Deep Learning When
▪ Problem is too complex for hand written rules or equations (Speech Recognition, Object Recognition, Engine Health Monitoring), because algorithms can learn complex non-linear relationships
▪ Program needs to adapt with changing data (Weather Forecasting, Energy Load Forecasting, Stock Market Prediction), because algorithms can update as more data becomes available
▪ Program needs to scale (IoT Analytics, Taxi Availability, Airline Flight Delays), because algorithms can learn efficiently from very large data sets
26 Different Types of Learning
Machine Learning (Type of Learning, then Categories of Algorithms)
▪ Supervised Learning: Develop predictive model based on both input and output data
  - Classification
    • Output is a choice between classes (True, False) (Red, Blue, Green)
  - Regression
    • Output is a real number (temperature, stock prices)
▪ Unsupervised Learning: Discover an internal representation from input data only
  - Clustering
    • No output: find natural groups and patterns from input data only
27 Different Types of Learning: Algorithms
▪ Supervised Learning: Develop predictive model based on both input and output data
  - Regression: Linear Regression, GLM, SVR, GPR
  - Classification: Nearest Neighbor, Discriminant Analysis, Naive Bayes, Support Vector Machines
  - Classification and regression: Decision Trees, Ensemble Methods, Neural Networks
▪ Unsupervised Learning: Discover an internal representation from input data only
  - Clustering: kMeans, kMedoids, Fuzzy C-Means, Hierarchical, Neural Networks, Gaussian Mixture, Hidden Markov Model
28 Machine Learning with Big Data
• Descriptive statistics (skewness, tabulate, crosstab, cov, grpstats, …)
• K-means clustering (kmeans)
• Visualization (ksdensity, binScatterPlot; histogram, histogram2)
• Dimensionality reduction (pca, pcacov, factoran)
• Linear and generalized linear regression (fitlm, fitglm)
• Discriminant analysis (fitcdiscr)
• Linear classification methods for SVM and logistic regression (fitclinear)
• Random forest ensembles of classification trees (TreeBagger)
• Naïve Bayes classification (fitcnb)
• Regularized regression (lasso)
• Prediction applied to tall arrays
29 Demo: Training a Machine Learning Model
30 Demo: Training a Machine Learning Model
31 Regression Learner
32 Regression Learner
App to apply advanced regression methods to your data
▪ Added to Statistics and Machine Learning Toolbox in R2017a
▪ Point and click interface: no coding required
▪ Quickly evaluate, compare and select regression models
▪ Export and share MATLAB code or trained models
33 Classification Learner
App to apply advanced classification methods to your data
▪ Added to Statistics and Machine Learning Toolbox in R2015a
▪ Point and click interface: no coding required
▪ Quickly evaluate, compare and select classification models
▪ Export and share MATLAB code or trained models
34 and Many More MATLAB Apps for Data Analytics
▪ Distribution Fitting
▪ System Identification
▪ Signal Analysis
▪ Wavelet Design and Analysis
▪ Neural Net Fitting
▪ Neural Net Pattern Recognition
▪ Training Image Labeler
▪ and many more…
35 Tuning Machine Learning Models
Get more accurate models in less time
▪ NCA (Neighborhood Component Analysis): automatically select the best machine learning "features"
  - Select the best "features" to keep in the model from over 400 candidates
▪ Hyperparameter Tuning: automatically fine-tune machine learning parameters
36 Machine Learning Hyperparameters
▪ Tune a typical set of hyperparameters for this model
▪ Tune all hyperparameters for this model
37 Bayesian Optimization in Action
38 Big Data Analytics Workflow: Developing Predictive Models
Challenges
▪ Lack of data science expertise
▪ Feature Extraction
  - How to transform data to best represent the system?
  - Requires subject matter expertise
  - No right way of designing features
▪ Feature Selection
  - What attributes or subset of data to use?
  - Entails a lot of iteration: trial and error
  - Difficult to evaluate features
▪ Model Development
  - Many different models
  - Model Validation and Tuning
▪ Time required to conduct the analysis
Solutions (Apps, Language)
▪ Easy to use apps
▪ Wide breadth of tools to facilitate domain specific analysis
▪ Examples/videos to get started
▪ Automatic MATLAB code generation
▪ High speed processing of large data sets
Key takeaway 2: MATLAB enables domain experts to do Data Science
39 Back to our example: Working with Big Data in MATLAB
▪ Objective: Create a model to predict the cost of a taxi ride in New York City
▪ Inputs:
  - Monthly taxi ride log files
  - The local data set is small (~20 MB)
  - The full data set is big (~25 GB)
▪ Approach:
  - Access data
  - Preprocess and explore data
  - Develop and validate predictive model (linear fit)
    ▪ Work with a subset of the data for prototyping
    ▪ Scale to the full data set on a cluster
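One reason a linear fit suits the "prototype on a subset, then scale" approach is that a least-squares line can be computed in a single pass from running sums, so the same code works on one small chunk or on many. The Python sketch below illustrates this for fare against trip distance. It is an illustration only: it is not MATLAB code, it makes no claim about how `fitlm` handles tall arrays, and the function name, the single predictor, and the ride data are assumptions made up for the example.

```python
from typing import Iterable, List, Tuple

Ride = Tuple[float, float]  # (trip distance, fare); hypothetical record layout


def fit_line_in_chunks(chunks: Iterable[List[Ride]]) -> Tuple[float, float]:
    """Least-squares fit of fare = intercept + slope * distance, one chunk at a time."""
    n = 0
    sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            # Only these running sums are kept; each chunk can be discarded after use.
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if n < 2 or denom == 0:
        raise ValueError("need at least two rides with different distances")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Made-up rides that lie exactly on fare = 2.5 + 2.0 * distance.
rides = [(1.0, 4.5), (2.0, 6.5), (3.0, 8.5), (4.0, 10.5), (5.0, 12.5), (6.0, 14.5)]

prototype_fit = fit_line_in_chunks([rides[:3]])                          # small subset
full_fit = fit_line_in_chunks([rides[0:2], rides[2:4], rides[4:6]])      # all chunks
print(prototype_fit, full_fit)
```

Because the running sums from separate chunks simply add together, the chunks could also be processed on different machines and combined afterwards.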
40 Data Analytics Workflow: Develop Predictive Models using Big Data
41 Demo: Taxi Fare Predictor Web App
42 MATLAB Production Server
▪ Server software
  - Manages packaged MATLAB programs and worker pool
▪ MATLAB Runtime libraries
  - Single server can use runtimes from different releases
▪ RESTful JSON interface
▪ Lightweight client libraries
  - C/C++, .NET, Python, and Java
[Diagram labels: Enterprise Application, MPS Client Library, RESTful JSON, Applications/Database Servers, MATLAB Production Server, Request Broker & Program Manager, MATLAB Runtime]
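As a rough sketch of what a client call over the RESTful JSON interface might look like, the Python snippet below builds (but does not send) an HTTP POST request. The host, port, archive name `taxiFare`, function name `predictFare`, and input values are all hypothetical. The URL layout and the `nargout`/`rhs` request fields reflect my understanding of the MATLAB Production Server REST API and should be checked against the MathWorks documentation for the release in use.

```python
import json
from typing import Any, Sequence
from urllib.request import Request


def build_mps_request(
    base_url: str, archive: str, function: str, inputs: Sequence[Any], nargout: int = 1
) -> Request:
    """Build a POST request for a deployed function (assumed URL and body layout)."""
    payload = {
        "nargout": nargout,   # number of outputs requested from the function
        "rhs": list(inputs),  # right-hand-side (input) arguments
    }
    return Request(
        url=f"{base_url}/{archive}/{function}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address, archive, function, and inputs.
req = build_mps_request("http://localhost:9910", "taxiFare", "predictFare", [3.2, 1])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request (for example with `urllib.request.urlopen(req)`) requires a running server, so the sketch stops at constructing it.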
43 Integrate analytics with systems
▪ Embedded Hardware: C, C++, HDL, PLC
▪ Enterprise Systems (MATLAB Runtime): C/C++, Excel Add-in, Java, Hadoop/Spark, .NET, Python, Standalone Application, MATLAB Production Server
Key takeaway 3: MATLAB Analytics run anywhere
44 Product Support for Spark
From MATLAB desktop:
• Access data from HDFS
• Run "tall" functions on Spark/Hadoop using MDCS
Integrate with applications:
• Deploy MATLAB programs using "tall"
• Develop deployable applications for Spark using MATLAB API for Spark
[Diagram labels: Web & Mobile Applications; Enterprise Applications; Development Tools; MATLAB Compiler; MATLAB Runtime; MATLAB Distributed Computing Server; Spark; YARN]
45 Spark Deployment Offerings
▪ Deploy "tall" programs
  - Create Standalone Applications: MATLAB Compiler
▪ MATLAB API for Spark
  - Create Standalone Applications: MATLAB Compiler
  - Functionality beyond tall arrays
  - For advanced programmers familiar with Spark
  - Local install of Spark to run code in MATLAB
    ▪ Installed on same machine as MATLAB: single node, Linux
Since the standalone application must run on a Linux Edge Node, you must compile on Linux.
[Diagram labels: Program using tall; Program using MATLAB API for Spark; MATLAB Compiler; Standalone Application; MATLAB Runtime; Edge Node; YARN: Data Operating System]
46 Data Analytics Workflow: Summary
1. MATLAB Analytics work with business and engineering data
2. MATLAB enables domain experts to do Data Science
3. MATLAB Analytics run anywhere
47 Resources to learn and get started mathworks.com/machine-learning eBook mathworks.com/big-data
48 MathWorks Services
▪ Consulting (www.mathworks.com/services/consulting/)
  - Integration
  - Data analysis/visualization
  - Unify workflows, models, data
▪ Training (www.mathworks.com/services/training/)
  - Classroom, online, on-site
  - Data Processing, Visualization, Deployment, Parallel Computing
49 MathWorks Training Offerings http://www.mathworks.com/services/training/
50 Speaker Details Email: seth.deland@mathworks.com amit.doshi@mathworks.in LinkedIn: https://in.linkedin.com/in/amit-doshi https://www.linkedin.com/in/seth-deland Contact MathWorks India Products/Training Enquiry Booth Call: 080-6632-6000 Email: info@mathworks.in Your feedback is valued. Please complete the feedback form provided to you.
