www.prmia.org © PRMIA 2020
Thought Leadership Webinar
Synthetic VIX Data Generation Using ML Techniques
Sri Krishnamurthy, CFA, CAP, Founder & CEO, www.QuantUniversity.com

Presenter: Sri Krishnamurthy, CFA, CAP, Founder & CEO, QuantUniversity
• Advisory and consultancy for financial analytics
• Prior experience at MathWorks, Citigroup, and Endeca; 25+ years in financial services and energy
• Columnist for the Wilmott Magazine
• Author of the forthcoming book “Financial Modeling: A case study approach”, published by Wiley
• Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
• Reviewer: Journal of Asset Management

About www.QuantUniversity.com
• Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory
• Trained more than 1,000 students in quantitative methods, data science and big data technologies using MATLAB, Python and R
• Building a platform for AI and Machine Learning enablement in the enterprise

Agenda
• Machine Learning in 20 minutes
• Case Study
• Key Trends in AI and Machine Learning

Part 1: AI and Machine Learning in Finance

The world as we know it has changed!
Source: https://finance.yahoo.com

Winners & Losers
Source: https://finance.yahoo.com

The world as we know it has changed!
• Digital transformation is the use of new, fast and frequently changing digital technology to solve problems.
Source: https://en.wikipedia.org/wiki/Digital_transformation

The business model for delivery matters today!
Digital delivery models that leveraged the cloud from the ground up as a means of delivery, along with online learning, paperless services and communication platforms, are all having their "I told you so" moment!

Interest in Machine Learning Has Grown Significantly
Source: https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf

Machine Learning and AI Have Revolutionized Finance

Machine Learning & AI in Finance: A Paradigm Shift
• Quant: stochastic models, factor models, optimization, risk factors, P/Q quants, derivative pricing, trading strategies, simulations, distribution fitting
• Data Scientist/ML Engineer: real-time analytics, predictive analytics, machine learning, RPA, NLP, deep learning, computer vision, graph analytics, chatbots, sentiment analysis, alternative data

Up Next: An Intuitive Introduction to AI and ML

Definitions: Machine Learning and AI
• Artificial intelligence is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. [1]
• Machine learning is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. [1]
[1] https://en.wikipedia.org/wiki/Machine_learning
[2] Figure source: http://www.fsb.org/wp-content/uploads/P011117.pdf

The Machine Learning and AI Workflow
• Stages: Data Scraping/Ingestion, Data Exploration, Data Cleansing and Processing, Feature Engineering, Modeling, Model Evaluation & Tuning, Model Selection, Model Deployment/Inference
• Modeling: Supervised (Regression, KNN, Decision Trees, Naive Bayes, Neural Networks, Ensembles); Unsupervised (Clustering, PCA, Autoencoder)
• Model evaluation and tuning: RMS, MAPE, MAE, Confusion Matrix, Precision/Recall, ROC, hyper-parameter tuning, parameter grids
• Model selection: Auto ML, Model Validation, Interpretability
• Model deployment/inference: Robotic Process Automation (RPA) (microservices, pipelines); SW: Web/REST API; HW: GPU, Cloud; Monitoring
• Roles: Data Engineer, DevOps Engineer; Data Scientist/Quants; Software/Web Engineer; Analysts & Decision Makers; Risk Management/Compliance (all stages)

Key Steps Involved
• Data
• Goals
• Machine learning algorithms
• Process
• Performance evaluation

Up Next: Data

Dataset, Variable and Observations
• Dataset: a rectangular array with rows as observations and columns as variables
• Variable: a characteristic of members of a population (age, state, etc.)
• Observation: the list of variable values for one member of the population

Variables
• Categorical: yes/no flags; AAA, BB ratings for bonds
• Numerical: 35 mpg; $170K salary

Datasets
• Longitudinal: observations are dependent; temporal continuity is required
• Cross-sectional: observations are independent

Smart Algorithms: Data
• Cross-sectional: numerical, categorical
• Longitudinal: numerical

Up Next: Goals

Goal
• Descriptive statistics: the goal is to describe the data at hand; backward-looking; statistical techniques are employed here
• Predictive analytics: the goal is to use historical data to build a model for prediction; forward-looking; machine learning and AI techniques are employed here

Predictive Analytics: Cross-Sectional Datasets
• Clustering: given a dataset, build a model that captures the similarities between observations and assigns them to different buckets
• Prediction: given a set of variables, predict the value of another variable in the dataset
Examples:
  - Predict salaries given work experience, education, etc.
  - Predict whether a loan would be approved given FICO score, current loans, employment status, etc.

Summary
Goal
• Descriptive statistics
  - Cross-sectional: numerical; categorical; numerical vs. categorical; categorical vs. categorical; numerical vs. numerical
  - Time series
• Predictive analytics
  - Cross-sectional: segmentation; prediction (predict a number, predict a category)
  - Time series

Up Next: Machine Learning Algorithms

Machine Learning
• Supervised
• Unsupervised
• Semi-supervised
• Reinforcement

Machine Learning: Supervised Algorithms
• Given a set of variables $x_i$, predict the value of another variable $y_i$ in a given dataset such that
  - if y is numeric => Prediction
  - if y is categorical => Classification
Example: Given that a customer’s debt-to-income ratio increased 20%, what are the chances he/she would default in 3 months?
Diagram: x1, x2, x3, … => Model F(X) => y

Machine Learning: Unsupervised Algorithms
• Given a dataset with variables $x_i$, build a model that captures the similarities between observations and assigns them to different buckets => Clustering
Example: Given a list of emerging market stocks, can we segment them into three buckets?
Diagram: Obs1, Obs2, Obs3, … => Model => Obs1-Class1, Obs2-Class2, Obs3-Class1

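The clustering idea on this slide can be illustrated with a minimal sketch of k-means in one dimension. The volatility figures, the starting centers and the function name `kmeans_1d` are hypothetical and chosen only for illustration; the slides do not say which feature or algorithm settings were used.

```python
def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm in one dimension; `centers` are the starting guesses."""
    centers = list(centers)

    def nearest(v):
        # Index of the center closest to the observation v.
        return min(range(len(centers)), key=lambda k: abs(v - centers[k]))

    for _ in range(iterations):
        # Assignment step: put every observation in its nearest bucket.
        labels = [nearest(v) for v in values]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # keep the old center if a bucket is empty
                centers[k] = sum(members) / len(members)
    return [nearest(v) for v in values], centers


# Hypothetical annualized volatilities (%) for nine emerging-market stocks.
vols = [12.0, 13.5, 11.8, 25.0, 27.2, 24.1, 48.0, 51.5, 46.9]
labels, centers = kmeans_1d(vols, centers=[10.0, 30.0, 50.0])
print(labels)   # bucket index per stock
print(centers)  # final bucket centers
```

With real data one would typically cluster on several standardized features rather than a single column, and the result can depend on the starting centers.
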
Supervised Learning Models - Prediction: Parametric Models
• Assume some functional form
• Fit coefficients
Examples: Linear Regression, Neural Networks
Linear regression model: $Y = \beta_0 + \beta_1 X_1$
Figures: linear regression model; neural network model

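As a rough illustration of "assume a functional form, then fit coefficients", the sketch below estimates $\beta_0$ and $\beta_1$ for the one-variable linear model by ordinary least squares. The salary data and the function name are made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for Y = beta0 + beta1 * X1 (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Hypothetical data: salary (in $K) that is exactly 40 + 5 * years of experience.
years = [1, 2, 3, 4, 5]
salary = [45.0, 50.0, 55.0, 60.0, 65.0]
beta0, beta1 = fit_simple_linear_regression(years, salary)
print(beta0, beta1)
```
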
Supervised Learning Models: Non-Parametric Models
• No functional form assumed
Examples: K-nearest neighbors, Decision Trees
Figures: K-nearest neighbor model; decision tree model

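A K-nearest-neighbors classifier shows what "no functional form" means in practice: the prediction comes directly from the closest training observations. This is a minimal sketch; the two features (imagined as already rescaled to comparable ranges) and the labels are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled features: (credit score / 100, debt-to-income / 10).
train_points = [(7.2, 1.5), (7.5, 2.0), (6.9, 1.8),
                (5.8, 4.5), (6.0, 4.0), (5.5, 5.0)]
train_labels = ["approved"] * 3 + ["denied"] * 3

print(knn_predict(train_points, train_labels, (7.0, 2.0)))
print(knn_predict(train_points, train_labels, (5.9, 4.2)))
```
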
Machine Learning Algorithms
• Supervised
  - Prediction: parametric (Linear Regression, Neural Networks); non-parametric (KNN, Decision Trees)
  - Classification: parametric (Logistic Regression, Neural Networks); non-parametric (Decision Trees, KNN)
• Unsupervised: K-means, Association rule mining

Machine Learning Movers and Shakers
• Deep Learning
• Automatic Machine Learning
• Ensemble Learning
• Natural Language Processing

Up Next: Process

Process
• Data ingestion
• Data cleansing
• Feature engineering
• Training and testing
• Model building
• Model selection

Feature Engineering
• What transformations do I need for the x and y variables?
• Which are the best features to use?
  - Dimension reduction: PCA
  - Best subset selection: forward selection, backward elimination, stepwise regression

Training
The data is split 80% for training the model and 20% for testing.

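The 80/20 split can be sketched in a few lines. This version shuffles before splitting, which is an assumption suited to cross-sectional data; the function name and seed are illustrative.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


observations = list(range(100))  # stand-in for 100 observations
train, test = train_test_split(observations)
print(len(train), len(test))
```

For longitudinal data, where temporal continuity is required, a chronological cut (the first 80% of the period for training, the rest for testing) is usually the safer choice, since shuffling would mix future observations into the training set.
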
Up Next: Performance Evaluation

Evaluation Framework
Evaluating machine learning algorithms:
• Supervised prediction: R-square, RMS, MAE, MAPE
• Supervised classification: ROC curves, confusion matrix

Prediction Accuracy Measures
Fit measures in classical regression modeling:
• Adjusted $R^2$ has been adjusted for the number of predictors. It increases only when the improvement in the model is more than one would expect to see by chance ($p$ is the total number of explanatory variables, $n$ the number of observations):
$$\text{Adjusted } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$$
• MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error, with $e_i = y_i - \hat{y}_i$:
$$\text{MAE} = \frac{\sum_{i=1}^{n} |e_i|}{n}$$

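The two measures above translate directly into code. The sketch below uses small made-up vectors of actual and predicted values and assumes a model with a single predictor (`p=1`).

```python
def adjusted_r2(actual, predicted, p):
    """Adjusted R^2 for a model with p explanatory variables."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical observations and model predictions.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 14.0, 17.0, 17.0]

print(adjusted_r2(actual, predicted, p=1))
print(mae(actual, predicted))
```
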
Prediction Accuracy Measures
• MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average:
$$\text{MAPE} = \frac{\sum_{i=1}^{n} |e_i / y_i|}{n} \times 100\%$$
• RMSE (root-mean-squared error) is computed on the training and validation data:
$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$$

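MAPE and RMSE can be computed the same way. The data below is invented for illustration; note that MAPE is undefined when any actual value is zero, which the sketch does not guard against.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted)) / n * 100


def rmse(actual, predicted):
    """Root-mean-squared error."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n)


# Hypothetical observations and model predictions.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 14.0, 17.0, 17.0]

print(mape(actual, predicted))
print(rmse(actual, predicted))
```

Computing both on the training set and on held-out data, as the slide suggests, gives a quick check for overfitting: a much larger error on the held-out data is a warning sign.
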
Recap
• Data
• Goals
• Machine learning algorithms
• Process
• Performance evaluation

Case Study: Synthetic VIX Data Generation Using ML Techniques

Agenda
1. Challenges with real datasets
2. Synthetic dataset generation tools
  - Proprietary
  - Open source: Faker, Data Synthesizer, SDV, Synthpop, GANs
3. Demos
  - Data Synthesizer
  - Sales Data Generator
  - VIX Data Generator

Synthetic Data
• Synthetic data is "any production data applicable to a given situation that are not obtained by direct measurement." [1]
• In finance, synthetic data has been used in stress and scenario analysis for many years.
• Example: Monte Carlo simulations have been used to generate future scenarios.
• In machine learning, synthetic data plays an important role in preventing overfitting, handling class-imbalance problems, and accommodating plausible scenarios.
[1] https://en.wikipedia.org/wiki/Synthetic_data

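To make the Monte Carlo example concrete, here is a minimal sketch that generates synthetic one-year price scenarios. It assumes geometric Brownian motion with hypothetical drift and volatility parameters; the slides do not specify a model, so this is only one common choice.

```python
import math
import random


def simulate_gbm_paths(start, mu, sigma, days, n_paths, seed=0):
    """Simulate daily price paths under geometric Brownian motion (assumed model)."""
    rng = random.Random(seed)
    dt = 1 / 252  # one trading day, in years
    paths = []
    for _ in range(n_paths):
        level = start
        path = [level]
        for _ in range(days):
            shock = rng.gauss(0.0, 1.0)
            # Exact log-normal step, so simulated levels stay positive.
            level *= math.exp((mu - 0.5 * sigma ** 2) * dt
                              + sigma * math.sqrt(dt) * shock)
            path.append(level)
        paths.append(path)
    return paths


# Hypothetical parameters: 5% drift, 20% volatility, 1,000 one-year scenarios.
scenarios = simulate_gbm_paths(start=100.0, mu=0.05, sigma=0.2,
                               days=252, n_paths=1000)
final_levels = sorted(path[-1] for path in scenarios)
worst_5pct = final_levels[len(final_levels) // 20]  # rough 5th-percentile outcome
print(worst_5pct)
```

The sorted terminal values give an empirical distribution from which stress-style quantiles can be read off, including outcomes that never occurred in the historical record.
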
Challenges with Real Datasets: Not All Scenarios Have Played Out
• Stress scenarios
• What-if scenarios
Figure ref: http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf

Challenges with Real Datasets: Missing Values
• Missing at random
• Missing sequences
• Need data to fill frames

Challenges with Real Datasets: Access
• Hard to find
• Rare class problems
• Privacy concerns make data difficult to share
Picture source: www.pixabay.com

Challenges with Real Datasets: Imbalanced
• Need more samples of the rare class
• Need proxies for data points that were not observed or recorded

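The simplest way to obtain "more samples of the rare class" is random oversampling, sketched below with hypothetical loan labels. It only duplicates existing rare-class rows; it is shown as a baseline, not as the approach used in the demos.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class is equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # resample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical dataset: 8 non-defaulting loans and 2 defaults.
rows = [[i] for i in range(10)]
labels = ["no_default"] * 8 + ["default"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because duplicated rows add no new information, synthetic data generators aim to go further by producing new, plausible rare-class observations instead of copies.
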
Challenges with Real Datasets: Labels
• Human labeling is hard
• Synthetic label generators

Open-Source Tools

SDV
https://www.computer.org/csdl/proceedings-article/dsaa/2016/07796926/12OmNwx3Q7S

Data Synthesizer
https://faculty.washington.edu/billhowe/publications/pdfs/ping17datasynthesizer.pdf

Synthpop
Ref: https://cran.r-project.org/web/packages/synthpop/index.html

VAE
https://arxiv.org/pdf/1808.06444.pdf

Generative Adversarial Networks (GAN)
https://developers.google.com/machine-learning/gan/gan_structure

Synthetic Data in Finance
Ref: Machine Learning for Asset Managers, Marcos M. López de Prado, Cambridge University Press, 2020

Demo 1: Loan Data Synthesizer

Demo 2: Synthetic Sales Data Generation

VIX Characteristics
Ref: https://www.investopedia.com/terms/v/vix.asp

Demo 3: Synthetic VIX Generation

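The demo itself is not reproduced in these slides. As a stand-in, the sketch below generates a VIX-like series from a simple mean-reverting process on the log level, relying on two properties commonly attributed to the VIX: it stays positive and tends to revert toward a long-run level. The model choice and every parameter value are assumptions for illustration, not calibrated estimates and not the method used in the demo.

```python
import math
import random


def simulate_vix_like(start=20.0, long_run=19.0, speed=5.0,
                      vol_of_vol=1.0, days=252, seed=1):
    """Mean-reverting (Ornstein-Uhlenbeck) process on log(level); all parameters hypothetical."""
    rng = random.Random(seed)
    dt = 1 / 252  # one trading day, in years
    log_level = math.log(start)
    log_mean = math.log(long_run)
    series = [start]
    for _ in range(days):
        pull = speed * (log_mean - log_level) * dt          # drift back toward the mean
        shock = vol_of_vol * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        log_level += pull + shock
        series.append(math.exp(log_level))                  # exponentiate: always positive
    return series


synthetic_vix = simulate_vix_like()
print(min(synthetic_vix), max(synthetic_vix))
```

A parametric simulator like this cannot reproduce features it was not designed for, such as sudden volatility spikes, which is one motivation for the learned generators (VAEs, GANs) listed among the open-source tools.
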
Up Next: Demo
If you would like access to the demo and the QuSandbox, please contact us at info@qusandbox.com

Foundations of ML and AI for Financial Professionals

Module 1: Machine Learning and AI: An Intuitive Introduction
• Machine Learning vs. Statistics: how has the world changed?
• A tour of Machine Learning and AI methods
  - Supervised learning vs. unsupervised learning
  - Deep learning
  - Reinforcement learning
• Key drivers influencing the adoption of Machine Learning and AI
  - Big data, hardware, fintech, AI, alternative data
• Key applications
  - Credit risk, personalization, predicting risk, portfolio optimization and selection
• Key players
  - Technology companies, data vendors, banks, fintech startups

Module 2: Exploratory Data Analysis + Case Study
• Exploring and visualizing large datasets
  - The visualization zoo
  - A framework to decide how to chart
  - Examples of how to build powerful dashboards
• Case study 1: Visualizing categorical, numerical, cross-sectional and time-series financial datasets

Module 3: Core Methods and Applications + Demo
• Dimension reduction and visualizing datasets using PCA and t-SNE
• Demo: Visualizing high-dimensional datasets
• The power of understanding similar products
  - Unsupervised machine learning: how does clustering work?

Module 4: Case Study + Lab
• Unsupervised learning
  - Segmentation of equities using clustering techniques
  - Case study 2: Using K-means for automatic clustering of stocks

Module 5: Supervised Learning + Case Study
• Learn from the past: how does supervised machine learning work?
  - Cross-sectional data
  - Time series analysis
  - Regression, Random Forests and Neural Networks
• Evaluating machine learning algorithms
• Case study 3: Predicting interest rates and credit risk using alternative datasets

Module 6: Case Study + Lab
• Introduction to Neural Networks and Deep Neural Networks
• Case study 4: Synthetic data generation for VIX scenarios

Module 7: Working with Text
• Making sense of text and Natural Language Processing
• Sentiment analysis: how to interpret sentiment and use it in stock selection
• Case study 5: Analyzing earnings calls using text analytics

Module 8: Frontier Topics
• Key issues in adopting AI and machine learning in investment workflows
• How will machine learning and AI change the investment industry?
• Frontier topics
  - Anomaly detection
  - Automatic Machine Learning (AutoML)
  - Reinforcement learning
  - Risk in machine learning and AI
  - Model governance, interpretability and model management

Optional class: Data Science Basics with Python
• Dates: May 2nd, May 9th
• Session 1: Introduction to Python; working with datasets in Python; visualizing datasets
• Session 2: Quantitative and statistical methods; summarizing and analyzing datasets; case study

Q&A
Sri Krishnamurthy, CFA, CAP, Founder and CEO
Information, data and drawings embodied in this presentation are strictly the property of QuantUniversity LLC and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.

Synthetic VIX Data Generation Using ML Techniques

  • 1.
    www.prmia.org© PRMIA 2020 SyntheticVIX Data Generation Using ML Techniques Sri Krishnamurthy, CFA, CAP Founder & CEO www.QuantUniversity.com www.prmia.org© PRMIA 2020 Thought Leadership Webinar
  • 2.
    www.prmia.org© PRMIA 2020 BeforeWe Begin Submit your questions anytime using the Questions pane. Session is being recorded Show/Hide panel arrow Download Handout
  • 3.
    www.prmia.org© PRMIA 2020 Presenter SriKrishnamurthy, CFA, CAP Founder & CEO, QuantUniversity Our Presenter • Advisory and Consultancy for Financial Analytics • Prior experience at MathWorks, Citigroup, and Endeca and 25+ years in financial services and energy • Columnist for the Wilmott Magazine • Author of forthcoming book “Financial Modeling: A case study approach” published by Wiley • Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston • Reviewer: Journal of Asset Management
  • 4.
    www.prmia.org© PRMIA 2020 SyntheticVIX Data Generation Using ML Techniques Sri Krishnamurthy, CFA, CAP Founder & CEO www.QuantUniversity.com www.prmia.org© PRMIA 2020 Thought Leadership Webinar
  • 5.
    www.prmia.org© PRMIA 2020 Aboutwww.QuantUniversity.com • Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory • Trained more than 1,000 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R • Building a platform for AI and Machine Learning Enablement in the Enterprise
  • 6.
    www.prmia.org© PRMIA 2020 Agenda MachineLearning in 20 minutes Case Study Key Trends in AI and Machine Learning
  • 7.
    www.prmia.org© PRMIA 2020 AIand Machine Learning in FinancePart 1
  • 8.
  • 9.
    www.prmia.org© PRMIA 2020 Theworld as we know has changed! Source: https://finance.yahoo.com
  • 10.
    www.prmia.org© PRMIA 2020 Winners& Losers Source: https://finance.yahoo.com
  • 11.
    www.prmia.org© PRMIA 2020 Theworld as we know has changed! • Digital Transformation is the use of new, fast and frequently changing digital technology to solve problems. Source: https://en.wikipedia.org/wiki/Digital_transformation
  • 12.
    www.prmia.org© PRMIA 2020 Businessmodel for delivery matters today! Digital delivery models that leveraged cloud from the ground up as a means of delivery, online learning, paperless services, communication platforms are all seeing the - "I told you so" moment!
  • 13.
    www.prmia.org© PRMIA 2020 Interestin Machine Learning Has Grown Significantly Source: https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf
  • 14.
    www.prmia.org© PRMIA 2020 MachineLearning and AI Have Revolutionized Finance
  • 15.
    www.prmia.org© PRMIA 2020 MachineLearning & AI in Finance: A Paradigm Shift Stochastic Models Factor Models Optimization Risk Factors P/Q Quants Derivative pricing Trading Strategies Simulations Distribution fitting Real-time analytics Predictive analytics Machine Learning RPA NLP Deep Learning Computer Vision Graph Analytics Chatbots Sentiment Analysis Alternative Data Quant Data Scientist/ML Engineer
  • 16.
    www.prmia.org© PRMIA 2020 UpNext An Intuitive Introduction to AI and ML
  • 17.
    www.prmia.org© PRMIA 2020 MachineLearning 1. https://en.wikipedia.org/wiki/Machine_learning Figure Source: http://www.fsb.org/wp-content/uploads/P011117.pdf AI • Artificial intelligence is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals1. Definitions: Machine Learning and AI • Machine learning is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead1. 1. https://en.wikipedia.org/wiki/Machine_learning 2. Figure Source: http://www.fsb.org/wp-content/uploads/P011117.pdf
  • 18.
    www.prmia.org© PRMIA 2020 TheMachine Learning and AI Workflow Data Scraping/ Ingestion Data Exploration Data Cleansing and Processing Feature Engineering Model Evaluation & Tuning Model Selection Model Deployment/ Inference Supervised Unsupervised Modeling Data Engineer, Dev Ops Engineer • Auto ML • Model Validation • Interpretability Robotic Process Automation (RPA) (Microservices, Pipelines ) • SW: Web/ Rest API • HW: GPU, Cloud • Monitoring • Regression • KNN • Decision Trees • Naive Bayes • Neural Networks • Ensembles • Clustering • PCA • Autoencoder • RMS • MAPS • MAE • Confusion Matrix • Precision/Recall • ROC • Hyper-parameter tuning • Parameter Grids Risk Management/ Compliance(All stages) Software / Web Engineer Data Scientist/Quants Analysts& DecisionMakers
  • 19.
    www.prmia.org© PRMIA 2020 KeySteps Involved • Data • Goals • Machine learning algorithms • Process • Performance evaluation
  • 20.
  • 21.
    www.prmia.org© PRMIA 2020 Dataset,Variable and Observations Dataset: A rectangular array with Rows as observations and columns as variables Variable: A characteristic of members of a population (Age, State etc.) Observation: List of variable values for a member of the population
  • 22.
    www.prmia.org© PRMIA 2020 Variables Categorical •Yes/No flags • AAA, BB ratings for bonds Numerical • 35mpg • $170K salary
  • 23.
    www.prmia.org© PRMIA 2020 Datasets Longitudinal •Observations are dependent • Temporal-continuity is required Cross-sectional • Observations are independent
  • 24.
    www.prmia.org© PRMIA 2020 SmartAlgorithms Data Cross sectional Numerical Categorical Longitudinal Numerical Data Cross sectional Longitudinal CategoricalNumerical Numerical
  • 25.
  • 26.
    www.prmia.org© PRMIA 2020 Goal DescriptiveStatistics • Goal is to describe the data at hand • Backward-looking • Statistical techniques employed here Predictive Analytics • Goal is to use historical data to build a model for prediction • Forward-looking • Machine learning & AI techniques employed here
  • 27.
    www.prmia.org© PRMIA 2020 PredictiveAnalytics: Cross Sectional Datasets • Given a dataset, build a model that captures the similarities in different observations and assigns them to different buckets- Clustering • Given a set of variables, predict the value of another variable in a given data set- Prediction Examples: — Predict salaries given work experience, education etc. — Predict whether a loan would be approved given fico score, current loans, employment status etc.
  • 28.
    www.prmia.org© PRMIA 2020 Summary Goal Descriptive Statistics Cross sectional NumericalCategorical Numerical vs Categorical Categorical vs Categorical Numerical vs Numerical Time series Predictive Analytics Cross- sectional Segmentation Prediction Predict a number Predict a category Time-serie Goal Descriptive Statistics Cross sectional Time series Numerical Categorical Numerical vs Categorical Categorical vs Categorical Numerical vs Numerical Predictive Analytics Cross sectional Time series Segmentation Prediction Predict a Number Predict a category
  • 29.
    www.prmia.org© PRMIA 2020 UpNext Machine Learning Algorithms
  • 30.
    www.prmia.org© PRMIA 2020 MachineLearning Unsupervised Supervised Reinforcement Semi-Supervised Machine Learning
  • 31.
    www.prmia.org© PRMIA 2020 Summary Goal Descriptive Statistics Cross sectional NumericalCategorical Numerical vs Categorical Categorical vs Categorical Numerical vs Numerical Time series Predictive Analytics Cross- sectional Segmentation Prediction Predict a number Predict a category Time-serie Goal Descriptive Statistics Cross sectional Time series Numerical Categorical Numerical vs Categorical Categorical vs Categorical Numerical vs Numerical Predictive Analytics Cross sectional Time series Segmentation Prediction Predict a Number Predict a category
  • 32.
    www.prmia.org© PRMIA 2020 MachineLearning Supervised Algorithms • Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦i in a given data set such that • If y is numeric => Prediction • If y is categorical => Classification Example: Given that a customer’s Debt-to-Income ratio increased 20%, what are the chances he/she would default in 3 months? x1,x2,x3… Model F(X) y
  • 33.
    www.prmia.org© PRMIA 2020 MachineLearning Unsupervised Algorithms • Given a dataset with variables 𝑥𝑖, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering Example: Given a list of emerging market stocks, can we segment them into three buckets? Obs1, Obs 2, Obs 3… Model Obs1-Class1 Obs2-Class2 Obs3-Class1
  • 34.
    www.prmia.org© PRMIA 2020 SupervisedLearning Models - Prediction Parametric models • Assume some functional form • Fit coefficients Examples : Linear Regression, Neural Networks 𝑌 = 𝛽! + 𝛽" 𝑋" Linear regression model Neural network model
  • 35.
    www.prmia.org© PRMIA 2020 SupervisedLearning Models Non-parametric models • No functional form assumed Examples : K-nearest neighbors, Decision Trees K-nearest neighbor model Decision tree model
  • 36.
    www.prmia.org© PRMIA 2020 MachineLearning Algorithms Machine Learning Supervised Prediction Classification Parametric Logistic regression Neural Networks K-means Associative rule mining Parametric Linear Regression KNN Decision Trees Classification Logistic Regression Neural Networks Decision Trees KNN Unsupervised Non- parametric Linear regression Neural networks KNN Decision Trees Non- parametric KNN Decision Trees
  • 37.
    www.prmia.org© PRMIA 2020 MachineLearning Movers and Shakers Deep Learning Automatic Machine Learning Ensemble Learning Natural Language Processing
  • 38.
  • 39.
  • 40.
    www.prmia.org© PRMIA 2020 FeatureEngineering What transformations do I need for the x and y variables ? Which are the best features to use? • Dimension Reduction – PCA • Best subset selection — Forward selection — Backward elimination — Stepwise regression
  • 41.
    www.prmia.org© PRMIA 2020 TrainingModel Data Training 80% Testing 20%
  • 42.
    www.prmia.org© PRMIA 2020 UpNext Performance Evaluation
  • 43.
    www.prmia.org© PRMIA 2020 EvaluationFramework Evaluating machine learning algorithms ROC curvesMAPE Supervised classification Supervised - Prediction R-square RMS MAE MAPE ROC Curves Confusion Matrix MAERMSR-square Supervised prediction
  • 44.
    www.prmia.org© PRMIA 2020 PredictionAccuracy Measures Fit measures in classical regression modeling: • Adjusted 𝑅^2 has been adjusted for the number of predictors. It increases only when the improve of model is more than one would expect to see by chance (p is the total number of explanatory variables) 𝐴𝑑𝑗𝑢𝑠𝑡𝑒𝑑 𝑅! = 1 − ⁄∑"#$ % (𝑦" − 1𝑦")! (𝑛 − 𝑝 − 1) ∑"#$ % 𝑦" − 5𝑦" ! /(𝑛 − 1) • MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error 𝑀𝐴𝐸 = ∑"#$ % 𝑒" 𝑛
  • 45.
    www.prmia.org© PRMIA 2020 PredictionAccuracy Measures • MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average 𝑀𝐴𝑃𝐸 = ∑"#$ % 𝑒"/𝑦" 𝑛 ×100% • RMSE (root-mean-squared error) is computed on the training and validation data 𝑅𝑀𝑆𝐸 = 1/𝑛 > "#$ % 𝑒" !
  • 46.
    www.prmia.org© PRMIA 2020 Recap •Data • Goals • Machine learning algorithms • Process • Performance evaluation
  • 47.
    www.prmia.org© PRMIA 2020 MachineLearning Workflow Data Scraping/ Ingestion Data Exploration Data Cleansing and Processing Feature Engineering Model Evaluation & Tuning Model Selection Model Deployment/ Inference Supervised Unsupervised Modeling Data Engineer, Dev Ops Engineer • Auto ML • Model Validation • Interpretability Robotic Process Automation (RPA) (Microservices, Pipelines ) • SW: Web/ Rest API • HW: GPU, Cloud • Monitoring • Regression • KNN • Decision Trees • Naive Bayes • Neural Networks • Ensembles • Clustering • PCA • Autoencoder • RMS • MAPS • MAE • Confusion Matrix • Precision/Recall • ROC • Hyper-parameter tuning • Parameter Grids Risk Management/ Compliance(All stages) Software / Web Engineer Data Scientist/Quants Analysts& DecisionMakers
  • 48.
    www.prmia.org© PRMIA 2020 Casestudy: Synthetic VIX Data Generation Using ML Techniques
  • 49.
    www.prmia.org© PRMIA 2020 49 1.Challenges with Real Datasets 2. Synthetic Dataset generation tools — Proprietary — Open Source ▪ Faker ▪ Data Synthesizer ▪ SDV ▪ Synthpop ▪ GANs 3. Demos — Data Synthesizer — Sales Data Generator — VIX Data Generator Agenda
  • 50.
    www.prmia.org© PRMIA 2020 SYNTHETICDATA • Synthetic data is "any production data applicable to a given situation that are not obtained by direct measurement.”1 • In finance, Synthetic data has been used in stress and scenario analysis for many years now. • Example: Monte-carlo simulations have been used to generate future scenarios. • In Machine Learning, Synthetic Data plays an important role to prevent overfitting, handle imbalance class problems, and to accommodate plausible scenarios. 1 https://en.wikipedia.org/wiki/Synthetic_data
  • 51.
    www.prmia.org© PRMIA 2020 Challengeswith Real Datasets All scenarios haven’t played out • Stress scenarios • What-if scenarios 51 Figureref:http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf
  • 52.
    www.prmia.org© PRMIA 2020 52 Missingvalues • Missing at random • Missing sequences • Need data to fill frames Challenges with Real Datasets
  • 53.
    www.prmia.org© PRMIA 2020 53 Access •Hard to find • Rare class problems • Privacy concerns making it difficult to share Challenges with Real Datasets Picture source: www.pixabay.com
  • 54.
    Challenges with Real Datasets Imbalanced • Need more samples of rare class • Need proxies for data points that were not observed or recorded
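A minimal sketch of generating extra rare-class samples follows. It interpolates between random pairs of real minority rows, in the spirit of SMOTE but without its nearest-neighbor step; the function name `oversample_minority` and the sample rows are hypothetical and are not from the webinar.

```python
import random


def oversample_minority(minority, n_new, seed=0):
    """Create synthetic minority rows by interpolating between random pairs of real rows."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct real rows
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Hypothetical rare-class rows with two features each.
rare = [[1.0, 10.0], [2.0, 12.0], [1.5, 11.0]]
new_rows = oversample_minority(rare, n_new=5)
```

Because every synthetic row lies on a segment between two observed rows, the new samples stay inside the range of the real rare class rather than extrapolating beyond it.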
  • 55.
    Challenges with Real Datasets Labels • Human labeling is hard • Synthetic label generators
  • 56.
  • 57.
  • 58.
    DataSynthesizer https://faculty.washington.edu/billhowe/publications/pdfs/ping17datasynthesizer.pdf
  • 59.
    Synthpop Ref: https://cran.r-project.org/web/packages/synthpop/index.html
  • 60.
  • 61.
    Generative Adversarial Networks (GAN) https://developers.google.com/machine-learning/gan/gan_structure
  • 62.
    Synthetic Data in Finance Ref: Machine Learning for Asset Managers, Marcos M. López de Prado, Cambridge University Press, 2020
  • 63.
  • 64.
    Demo 1: Loan Data Synthesizer
  • 65.
    Demo 2: Synthetic Sales Data Generation
  • 66.
    VIX Characteristics Ref: https://www.investopedia.com/terms/v/vix.asp
  • 67.
    Demo 3: Synthetic VIX Generation
  • 68.
    Up Next Demo If you would like access to the demo and the QuSandbox, please contact us at info@qusandbox.com
  • 69.
    Foundations of ML and AI for Financial Professionals Module 1 Machine Learning and AI: An Intuitive Introduction Machine Learning vs Statistics: How has the world changed? A tour of Machine Learning and AI methods • Supervised Learning vs Unsupervised Learning • Deep Learning • Reinforcement Learning Key drivers influencing the adoption of Machine Learning and AI • Big Data, Hardware, Fintech, AI, Alternative Data Key applications • Credit risk, Personalization, Predicting risk, Portfolio optimization and selection Key players • Technology companies, Data vendors, Banks, Fintech startups
  • 70.
    Foundations of ML and AI for Financial Professionals Module 2 Exploratory data analysis + Case study Exploring and Visualizing large datasets • The Visualization zoo • A framework to decide how to chart • Examples of how to build powerful dashboards Case study 1: Visualizing Categorical, Numerical, Cross-sectional and Time series Financial datasets
  • 71.
    Foundations of ML and AI for Financial Professionals Module 3 Core Methods and Applications + Demo Dimension reduction and visualizing datasets using PCA, t-SNE Demo: Visualizing high-dimensional Datasets The power of understanding similar products • Unsupervised Machine Learning: How does Clustering work?
  • 72.
    Foundations of ML and AI for Financial Professionals Module 4 Case study + Lab Unsupervised Learning • Segmentation of Equities using Clustering Techniques • Case study 2: Using K-means for automatic clustering of stocks
  • 73.
    Foundations of ML and AI for Financial Professionals Module 5 Supervised Learning + Case study Learn from the past: How does Supervised machine learning work? • Cross-sectional data • Time series analysis • Regression, Random Forests and Neural Networks Evaluating machine learning algorithms Case study 3: Predicting interest rates and credit risk using Alternative data sets
  • 74.
    Foundations of ML and AI for Financial Professionals Module 6 Case study + Lab • Introduction to Neural Networks and Deep Neural Networks • Case study 4: Synthetic Data Generation for VIX Scenarios
  • 75.
    Foundations of ML and AI for Financial Professionals Module 7 Working with Text • Making sense of Text and Natural Language Processing • Sentiment Analysis: How to interpret sentiment and use it in stock selection • Case study 5: Analyzing Earnings calls using text analytics
  • 76.
    Foundations of ML and AI for Financial Professionals Module 8 Frontier Topics Key issues in adopting AI and Machine learning into investment workflows How will Machine Learning and AI change the investment industry? Frontier topics • Anomaly detection • Automatic Machine Learning (AutoML) • Reinforcement learning • Risk in Machine Learning and AI • Model governance, Interpretability and Model Management
  • 77.
    Foundations of ML and AI for Financial Professionals Optional Data Science Basics With Python class • May 2nd • May 9th Topics Session 1 • Introduction to Python • Working with Datasets in Python • Visualizing Datasets Session 2 • Quantitative & Statistical Methods • Summarizing and Analyzing datasets • Case study
  • 78.
    Use Code PRMIADISCOUNT100 for $100 off! Register here
  • 79.
    Q&A Sri Krishnamurthy, CFA, CAP Founder and CEO Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
  • 80.
    Thank You! Take our survey Recording available prmia.org > Resources > Webinar Library Certificate of Completion Visit prmia.org for upcoming webinars and training!