Data Scientist Introduction: Brief Overview of Concepts

Agenda
• Data science and its applications
• Stages of a data science project and project roles
• Classification, decision trees, random forest
• Demo using R

Data Science Introduction - Concepts
• Data science is the practice of managing the process that transforms hypotheses and data into actionable predictions.
• Process flow: Acquire data → Manage data → Choose modelling method → Write code → Verify result

Data Science Introduction - Applications
• Amazon’s product recommendation systems
• LinkedIn’s contact recommendation system
• Retail business: buying patterns and customer segmentation
• Twitter’s trending topics
• Google’s advertisement valuation systems
• Walmart’s consumer demand projection systems

Data Science Domains
• Math and theory: statistics, linear algebra, optimization, time series, etc.
• Applied algorithms: machine learning, data structures, parallel algorithms, etc.
• Technologies: storage and computing platforms, statistical tools, etc.
• Domain expertise: finance, banking, health industry, agriculture
• Art: visualization, infographics

Data Science Introduction - Project Roles
• Project sponsor: represents the business interests
• Client: represents end users’ interests
• Data scientist: sets and executes analytic strategy
• Data architect: manages data and data storage
• Operations: manages infrastructure

Data Science Introduction - Processes in a Data Science Project
• Define the goal
• Collect and manage data
• Build the model
• Evaluate the model
• Present results
• Deploy the model

Data Science – Modelling Methods
• Classification and Regression Trees (CART)
• k-Nearest Neighbors (kNN)
• Random Forest (RF)
• Support Vector Machines (SVM) with a linear kernel
• Linear Discriminant Analysis (LDA)
• Training, test and validation

Data Science – Modelling Method: Classification and Regression Trees (CART)
• Example: finding bad loan applications
• Input variables: loan amount, duration, age, salary, any other loan, address, income, education, background data, location, etc.
• 1000 applications exist, of which 300 have defaulted
• A decision tree is used to identify potential defaulters

Classification and Regression Trees
[Figure: decision tree for loan applications. Split conditions: Duration > 50, Amount > 4 million, Amount > 1 million, Amount < 5 million, Duration > 120. Leaf nodes: Bad (0.68), Good (0.75), Good (0.56), Bad (0.25), Good (0.61), Bad (0.88).]
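A classification tree chooses each split so that the resulting groups are purer than the parent node. The Python sketch below uses Gini impurity, a common splitting criterion for classification trees, on the loan example (1000 applications, 300 defaulted). The slides do not say which criterion the demo uses, and the per-branch counts for the "Duration > 50" split are hypothetical, chosen only to illustrate the arithmetic.

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Impurity after a split: child impurities weighted by child size."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


# Root node from the slide: 1000 applications, 300 "bad" and 700 "good".
root = [300, 700]
root_impurity = gini(root)

# HYPOTHETICAL split on "Duration > 50": these branch counts are made up
# for illustration; the slides do not give them.
long_duration = [200, 200]   # 200 bad, 200 good
short_duration = [100, 500]  # 100 bad, 500 good
split_impurity = weighted_gini([long_duration, short_duration])

print(f"root impurity: {root_impurity:.3f}")
print(f"after split:   {split_impurity:.3f}")
print(f"impurity gain: {root_impurity - split_impurity:.3f}")
```

For the root node this gives 1 - 0.3² - 0.7² = 0.42. A tree-building algorithm would evaluate many candidate splits in this way and keep the one with the largest gain.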

Data Science – Modelling Method: k-Nearest Neighbors (kNN)
• Example: male/female distribution
[Figure: scatter plot of hair length (cm, 0 to 60) against height (cm, 140 to 200)]
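The idea behind kNN can be sketched in a few lines of Python: a new point takes the majority label of its k closest training points. The data points below are hypothetical (height, hair length) pairs invented for illustration, and k = 3 is an arbitrary choice; neither comes from the slides.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((height_cm, hair_length_cm), label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# HYPOTHETICAL data points (height in cm, hair length in cm); not from the slides.
train = [
    ((182, 5), "male"),
    ((176, 8), "male"),
    ((190, 3), "male"),
    ((171, 12), "male"),
    ((160, 40), "female"),
    ((165, 35), "female"),
    ((158, 50), "female"),
    ((170, 30), "female"),
]

print(knn_predict(train, (180, 6)))
print(knn_predict(train, (162, 45)))
```

In practice the features are usually rescaled first, so that one axis does not dominate the distance simply because of its units.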

Data Science – Modelling Method: Random Forest (RF)
[Figure: three decision trees, labelled Tree 1, Tree 2 and Tree 3]

Data Science – Modelling Method: Random Forest (RF)
[Figure: an input is passed to all trees; Tree 1, Tree 2 and Tree 3 each produce a prediction, and the random forest outputs the combined prediction]
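For classification, a random forest combines its trees by majority vote. The Python sketch below shows only that voting step. The three "trees" are hypothetical hand-written rules that reuse thresholds from the loan example; in a real random forest each tree is learned from a bootstrap sample of the training data.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the majority vote together with the individual tree votes."""
    votes = [tree(x) for tree in trees]
    winner = Counter(votes).most_common(1)[0][0]
    return winner, votes


# HYPOTHETICAL stand-in trees: single hand-written rules, not trained models.
def tree1(app):
    return "bad" if app["duration"] > 50 else "good"


def tree2(app):
    return "bad" if app["amount"] > 4_000_000 else "good"


def tree3(app):
    return "bad" if app["duration"] > 120 else "good"


# HYPOTHETICAL loan application used as input.
application = {"duration": 60, "amount": 5_000_000}
prediction, votes = forest_predict([tree1, tree2, tree3], application)

for name, vote in zip(["Tree1", "Tree2", "Tree3"], votes):
    print(f"{name}: {vote}")
print(f"Random forest predicts: {prediction}")
```

Using an odd number of trees avoids ties when there are only two classes.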

Data Science – Modelling Method: Random Forest (RF)
Applications where the random forest algorithm is widely used:
• Banking: identifying loyal customers and fraudulent customers
• Medicine: identifying diseases from patients’ medical records
• Stock market: stock behavior, loss and profit
• E-commerce: finding similar customers, segmentation

Data Science – Modelling Method: Support Vector Machines (SVM)
• Example: male/female distribution
[Figure: scatter plot of hair length (cm, 0 to 60) against height (cm, 140 to 200)]

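Data Science – Model Evaluation Process
• Training, test and validation
[Figure: the data goes through a test/train split into training data and test data; the training data feeds the training process, which produces the model; the model is applied to the test data to produce predictions]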
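The evaluation flow above can be sketched in Python as follows. The data, the 20% test fraction, the seed and the one-threshold "model" are all hypothetical choices made for illustration; the point is that the model is fitted on the training rows only and scored on rows it has not seen.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(1 for x, label in rows if model(x) == label) / len(rows)


# HYPOTHETICAL data: one numeric feature, label "high" when it is 50 or more.
rows = [(x, "high" if x >= 50 else "low") for x in range(100)]
train, test = train_test_split(rows)

# "Training process": place a threshold midway between the two class means,
# using the training rows only.
low_mean = mean(x for x, label in train if label == "low")
high_mean = mean(x for x, label in train if label == "high")
threshold = (low_mean + high_mean) / 2


def model(x):
    return "high" if x >= threshold else "low"


print(f"train rows: {len(train)}, test rows: {len(test)}")
print(f"test accuracy: {accuracy(model, test):.2f}")
```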

Demo Explanation
• Data: 3 species (Setosa, Versicolor, Virginica)

Demo Explanation
• Load the caret package and load the data
• Split the data into 2 parts: 80% is kept in the dataset (for training) and 20% is set aside for validation
• Feed the dataset to 4 algorithms (CART, kNN, SVM, RF)
• Select the best algorithm
• Feed the validation data to the best algorithm
• Check the output
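The demo itself runs in R with the caret package. The Python sketch below only illustrates the shape of the workflow described above, using the standard library: hold back 20% for validation, compare candidate algorithms on the remaining 80%, pick the best one, then check it on the validation part. The data, the two stand-in learners (a majority-class baseline and 1-nearest-neighbour) and the use of 5-fold cross-validation for the comparison are assumptions, not details taken from the slides.

```python
import random
from collections import Counter


def fit_majority(train):
    """Baseline learner: always predict the most common training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda x: label


def fit_1nn(train):
    """1-nearest-neighbour learner on a single numeric feature."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0] - x))[1]
    return predict


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


def cross_val_accuracy(fit, rows, folds=5):
    """Mean accuracy over `folds` rotations of training and held-out rows."""
    scores = []
    for i in range(folds):
        held_out = rows[i::folds]
        training = [r for j, r in enumerate(rows) if j % folds != i]
        scores.append(accuracy(fit(training), held_out))
    return sum(scores) / folds


# HYPOTHETICAL data: one numeric feature and three classes.
rows = [(x, "A" if x < 30 else "B" if x < 60 else "C") for x in range(90)]
random.Random(0).shuffle(rows)

# 80% is kept in `dataset`, 20% goes to `validation`.
cut = int(len(rows) * 0.8)
dataset, validation = rows[:cut], rows[cut:]

# Feed the dataset to each candidate and select the best by mean accuracy.
candidates = {"majority": fit_majority, "1-NN": fit_1nn}
cv_scores = {name: cross_val_accuracy(fit, dataset) for name, fit in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the best candidate on the whole dataset, then check it on validation.
best_model = candidates[best_name](dataset)
validation_accuracy = accuracy(best_model, validation)

print(cv_scores)
print(f"best: {best_name}, validation accuracy: {validation_accuracy:.2f}")
```

Keeping the validation rows out of the comparison step means the final accuracy figure is not biased by the choice of algorithm.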

Data Science Demo
• Installing the R platform
• Loading the dataset
• Summarizing the dataset
• Visualizing the dataset
• Evaluating some algorithms
• Making some predictions

Other Practical Applications of Data Science
• https://towardsdatascience.com/examples-of-data-science-with-r-789c6996435
• Customer analysis and predictive analysis
• Association rules (medical diagnosis, biomedical, census data, fraud detection, CRM)
• HR analytics: finding valuable employees and retaining them

Data Science Resources
• Practical Data Science with R
• Demo commands
• R and RStudio installation files
• Resources are kept at the location below:
• gb-pb-dbm-v01Data_Science_Resources


