Unit-1 AI ETC MS

The document provides an introduction to Artificial Intelligence (AI), detailing its definitions, types (Weak AI and Strong AI), and applications. It covers key concepts such as machine learning, data analysis, and various AI techniques, along with examples of AI systems in practice. Additionally, it discusses the differences between Weak and Strong AI, including their capabilities, autonomy, and ethical considerations.

T7473

Artificial Intelligence

Introduction to Artificial Intelligence

Associate Professor (E&TC)


Symbiosis Institute of Technology, Pune, India

mangal.singh@sitpune.edu.in
https://www.linkedin.com/in/singhmangal
Outline
Introduction to AI:
❖ Strong AI, Weak AI, Applications
❖ Machine Learning
❖ Supervised Learning, Unsupervised Learning, Applications
❖ Training, Testing and Validation of data
❖ Data Wrangling, Data Visualization
❖ Exploratory Data Analysis
❖ Univariate, Bivariate and Multivariate Data Analysis.
What is Artificial Intelligence (AI)?

"AI is the science and engineering of making intelligent machines, especially intelligent computer programs" (1956).

John McCarthy (the father of Artificial Intelligence)

 AI is a branch of computer science dealing with the simulation of intelligent behavior in computers.
 AI is the study of how to make computers do things which, at the moment, people do better.
 AI is the study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions.

Dr. Mangal Singh #T7473 Unit 1 – Introduction to Artificial Intelligence Symbiosis Institute of Technology, Pune 3
What is Weak AI?

 Weak AI, also known as Narrow AI, refers to AI systems that are designed and trained for a specific task or a narrow range of
tasks. These systems do not possess general intelligence or consciousness. Instead, they excel at performing particular
functions within predefined parameters.

 Characteristics of Weak AI

▪Task-Specific: Weak AI systems are developed to handle specific tasks, such as language translation, facial recognition, or
playing chess.

▪Lack of Generalization: These AI systems cannot generalize their knowledge or skills to perform tasks outside their
designated domain.

▪No Consciousness: Weak AI lacks self-awareness, consciousness, or understanding. It operates based on programmed
algorithms and learned patterns.

▪Human Assistance: Often requires human intervention for maintenance, updates, and handling unexpected situations.

Examples of Weak AI

 Siri and Alexa: Voice-activated assistants that perform tasks like setting reminders, providing weather updates, and answering questions.

 Chatbots: Automated systems that interact with users to provide customer service or support.

 Recommendation Systems: Algorithms used by platforms like Netflix and Amazon to suggest content based on user preferences.

 Self-Driving Cars: Autonomous vehicles that navigate and make decisions based on a set of rules and sensor inputs.
What is Strong AI?

 Strong AI, also known as Artificial General Intelligence (AGI), refers to AI systems that possess general cognitive abilities.
These systems are capable of understanding, learning, and applying knowledge across a wide range of tasks, much like a
human being. Strong AI remains a theoretical concept and has not yet been achieved.

 Characteristics of Strong AI

▪General Intelligence: Strong AI can understand, learn, and apply knowledge in different contexts, much like human
intelligence.

▪Consciousness and Self-Awareness: It possesses self-awareness and consciousness, allowing it to understand and reflect
on its existence.

▪Autonomy: Strong AI can operate independently, make decisions, and solve problems without human intervention.

▪Adaptability: Capable of adapting to new situations and learning from experiences in a way that mimics human cognitive
processes.

Theoretical Implications of Strong AI

 Human-Like Understanding: Strong AI would understand natural language, emotions, and complex concepts at a
level comparable to humans.

 Versatility: It could perform a wide range of tasks across different domains without being limited to specific
functions.

 Ethical and Moral Reasoning: It would possess the ability to make ethical and moral decisions, taking into account the implications of its actions.
Key Differences between Strong AI and Weak AI

Aspect | Weak AI | Strong AI
Scope and Functionality | Task-specific, narrow focus | General intelligence, wide range of tasks
Cognitive Abilities | Operates on predefined algorithms and learned patterns | Possesses general cognitive abilities, self-awareness
Autonomy | Requires human oversight and intervention | Functions autonomously, makes independent decisions
Consciousness | No consciousness or self-awareness | Self-aware and conscious
Adaptability | Limited to specific functions, not easily adaptable | Highly adaptable, learns from experiences
Current Examples | Siri, Alexa, Chatbots, Recommendation Systems | Currently theoretical, not yet achieved
Ethical Considerations | Less complex ethical concerns | Significant ethical challenges regarding safety, control, and fairness
Development Status | Widely used in various applications | Subject of ongoing research and development
AI Techniques

 There are three important AI techniques:

1. Search –

 Provides a way of solving problems for which no direct approach is available.

 It also provides a framework into which any direct techniques that are available can be embedded.

2. Use of knowledge –

 Provides a way of solving complex problems by exploiting the structure of the objects that are involved.

3. Abstraction –

 Provides a way of separating important features and variations from many unimportant ones that would
otherwise overwhelm any process.
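
As a hedged illustration of the Search technique listed above, the sketch below uses breadth-first search to find a route when no direct formula for the answer is available. The road map, the node names and the function name `bfs_shortest_path` are hypothetical and do not come from the slides.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return a path with the fewest steps from start to goal, or None."""
    frontier = deque([[start]])   # queue of partial paths still to be extended
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

# Hypothetical road map: each key lists the places directly reachable from it.
ROADS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

print(bfs_shortest_path(ROADS, "A", "F"))
```

Breadth-first search is only one of many search strategies; it is used here because it is short and always finds a path with the fewest steps when one exists.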

Task Domains of AI

Ordinary tasks:
Perception
− Computer Vision
− Speech, Voice
Natural Language Processing
− Understanding
− Language Generation
− Language Translation
Common Sense Reasoning
Planning
Robot Control

Formal tasks:
Games
− Go
− Chess (Deep Blue)
− Checkers
Mathematics
− Geometry
− Logic
− Integration and Differentiation
Theorem Proving

Expert tasks:
Engineering
− Design
− Fault Finding
− Manufacturing
− Monitoring
Scientific Analysis
Financial Analysis
Medical Diagnosis
History of AI

Application Domains of AI
 Natural Language Processing: Email Spam Filter in Gmail; Neural Network
 Image Processing: Face Detection in Camera; Deep Learning
 Speech Recognition: Voice Technology in Virtual Agents; Deep Learning
 Data Mining: Market Basket Analysis; Product recommendation
 Expert System: IBM Watson; Reinforcement Learning
 Robotics: Home Automation; Deep Learning
 Scheduling: Resource Scheduling; Aurora - Advanced Intelligent Planning and Scheduling Solution
 Optimization: Shortest Path; Google Maps path planner
 Game Playing: AlphaGo; Deep Neural Network
 Virtual Agents: Chatbots; Conversational AI
 Personalized Recommender Systems: Online Shopping; Machine Learning
 Automated Control Systems: Washing Machine; Fuzzy Logic
 Security: NVIDIA Metropolis; Machine Learning
AI – ML – DL and Data Science
AI: Technique that enables machines to mimic human behavior.
Machine Learning: Subset of AI which uses statistical methods to enable machines to learn and improve with time.
Deep Learning: Subset of ML that includes algorithms and enables a system to train itself.
[Figure labels: AI, Machine Learning, Deep Learning, Data Science]
What is Machine Learning?

Humans can learn from past experience and make decisions on their own.

What is this object?
[Figure: example images labelled CAR, CAR, BIKE, BIKE]
It is a CAR.

Let us ask the same question to him: What is this object?
?

[ But he is a human being. He can observe and learn. ]

Let us make him learn: show him labelled examples (CAR, CAR, BIKE, BIKE).

Let us ask the same question now: What is this object?
Using past experience (CAR, CAR, BIKE, BIKE), he answers: CAR.
What about a Machine?

Machines follow instructions.

[ It cannot take decisions on its own. ]

We can ask a machine to perform:

• Arithmetic operations such as addition, multiplication and division
• Comparison
• Print
• Plotting a chart
What is Machine Learning?

[ We want a machine to act like a human: ]

• to identify this object
• to predict the price in future, e.g. "Price in 2025?"
• to understand natural language and correct grammar, e.g. "I made met him yesterday"
• to recognize faces
[ What do we do? Just like what we did with the human, we need to provide experience to the machine. ]

[ This is what we call Data, or a Training dataset. So, we first need to provide a training dataset to the machine. ]

[ Then, devise algorithms and execute programs on the data with respect to the underlying target tasks. ]

[ Then, using the programs, identify the required rules, extract the required patterns and identify relations, so that the machine can derive inferences from the data. ]
In summary, what is machine learning?

Given a machine learning problem:

• Identify and create the appropriate dataset
• Perform computation to learn the required rules, patterns and relations
• Output the decision
Machine Learning Paradigms

[ As human beings, we solve various types of problems in our day-to-day lives, and various decisions need to be taken. Depending on the nature of the problem, machine learning tasks can be broadly divided into: ]

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
What is Supervised Learning?

[Figure: Samples + Labels (CAR, CAR, BIKE, BIKE) = Training Dataset]

[ In supervised learning, we need something called a Labelled Training Dataset. ]

[ Given a labelled dataset, the task is to devise a function which takes the dataset and a new sample, and produces an output value. ]

f(Training Dataset, new sample) = CAR

Classification

[ If the possible output values of the function are predefined and discrete/categorical, it is called Classification. ]

[ Predefined classes means that it will produce output only from the labels defined in the dataset. For example, even if we input a bus, it will produce either CAR or BIKE. ]
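
The function f described above can be sketched with a 1-nearest-neighbour classifier, shown here only as one possible choice. The features (number of wheels, weight in kg), the sample values and the name `classify` are assumptions made for illustration; the slides work with images and do not name an algorithm.

```python
import math

# Hypothetical labelled training dataset: (wheels, weight in kg) -> label.
TRAINING_DATA = [
    ((4, 1200.0), "CAR"),
    ((4, 1500.0), "CAR"),
    ((2, 15.0), "BIKE"),
    ((2, 12.0), "BIKE"),
]

def classify(training_data, sample):
    """f(dataset, sample): return the label of the closest training sample."""
    _nearest_features, nearest_label = min(
        training_data, key=lambda item: math.dist(item[0], sample)
    )
    return nearest_label

print(classify(TRAINING_DATA, (4, 1300.0)))    # a car-like sample
print(classify(TRAINING_DATA, (6, 11000.0)))   # a bus-like sample
```

Because the output can only be a label that occurs in the training data, the bus-like sample is still answered with one of the two known classes.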
Classifier

[Figure: a Dataset of labelled animal images (Elephant, Elephant, Tiger) is given to a Classifier, which is asked to identify the animal in a new image.]
Regression

f(Dataset, new sample) = 20500.50

[ If the possible output values of the function are continuous real values, then it is called Regression. ]
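
A minimal regression sketch, assuming a straight-line model fitted by ordinary least squares. The yearly prices and the names `fit_line` and `predict` are invented for illustration and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical dataset: price of an item in four consecutive years.
YEARS = [2020, 2021, 2022, 2023]
PRICES = [100.0, 110.0, 120.0, 130.0]

model = fit_line(YEARS, PRICES)
print(predict(model, 2025))   # a continuous value, not a class label
```

Unlike the classifier, the output here is not restricted to values seen in the dataset; any real number can be produced.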
Classification and Regression problems

[ The Classification and Regression problems are supervised, because the decision depends on the characteristics of the ground truth labels or values present in the dataset, which we define as experience. ]
What is Unsupervised Learning?

[Figure: Dataset of samples (CAR, CAR, BIKE, BIKE)]

[ In unsupervised learning, we do not need to know the labels or ground truth values. ]
What is Unsupervised Learning?

Clustering

[ The task is to identify patterns, such as grouping similar objects together. ]
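
Grouping similar objects without labels can be sketched with k-means (Lloyd's algorithm), used here as one common clustering method rather than the method of the slides. The feature values, the starting centroids and the name `k_means` are hypothetical.

```python
import math

def k_means(points, centroids, iterations=10):
    """Group points around the given starting centroids (Lloyd's algorithm)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical unlabelled samples: (wheels, weight in kg). No CAR/BIKE labels are given.
POINTS = [(4, 1200.0), (2, 15.0), (4, 1500.0), (2, 12.0)]

clusters, centroids = k_means(POINTS, centroids=[POINTS[0], POINTS[1]])
for cluster in clusters:
    print(cluster)
```

The algorithm only returns groups; deciding that one group means "cars" and the other "bikes" is left to a human, which is the key difference from supervised learning.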
What is Unsupervised Learning?

Association Rules Mining


Dataset

[ Association rules like ]
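
As a hedged sketch of what association rule mining measures, the code below computes the two standard quantities, support and confidence, for a candidate rule. The shopping baskets and the rule "bread -> butter" are invented for illustration and are not the example from the slide.

```python
# Hypothetical shopping baskets (one set of items per transaction).
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

print(support(BASKETS, {"bread", "butter"}))
print(confidence(BASKETS, {"bread"}, {"butter"}))
```

A rule is usually kept only if both values exceed thresholds chosen by the analyst. Note that `confidence` divides by the antecedent's support, so it assumes the antecedent occurs in at least one basket.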


More Examples: Unsupervised Learning

What is Reinforcement Learning?

[ It is also known as learning from trial and error. ]

Another Example

[Figure labels: Agent, Task, Environment]

Reinforcement Learning

[Figure labels: Punishment, Reward]

A baby learns from trial and error.
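
Learning from rewards and punishments can be sketched with an epsilon-greedy agent that keeps a running value estimate per action. The two actions, their fixed rewards and all parameter values are hypothetical; real reinforcement learning problems usually also involve states and delayed rewards, which this sketch leaves out.

```python
import random

# Hypothetical environment: each action always yields a reward (+1) or a punishment (-1).
REWARDS = {"touch_hot_stove": -1.0, "eat_food": 1.0}

def learn_by_trial_and_error(rewards, trials=100, epsilon=0.2, step_size=0.5, seed=0):
    """Estimate the value of each action from repeated trials."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in rewards}
    for _ in range(trials):
        if rng.random() < epsilon:
            action = rng.choice(list(values))      # explore: try something at random
        else:
            action = max(values, key=values.get)   # exploit: use what was learned so far
        reward = rewards[action]
        # Move the estimate a step towards the reward that was just received.
        values[action] += step_size * (reward - values[action])
    return values

values = learn_by_trial_and_error(REWARDS)
best_action = max(values, key=values.get)
print(best_action)
```

The agent is never told which action is correct. It only sees the consequence of what it tried, which is what separates this paradigm from supervised learning.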

Train vs. Validation vs. Test set

The Training Set

 It is the set of data that is used to train and make the model learn the hidden features/patterns in the data.

 In each epoch, the same training data is fed to the neural network architecture repeatedly, and the model continues
to learn the features of the data.

 The training set should have a diversified set of inputs so that the model is trained in all scenarios and can predict
any unseen data sample that may appear in the future.

The Test Set

 The test set is a separate set of data used to test the model after completing the training.

 It provides an unbiased final model performance metric in terms of accuracy, precision, etc. To put it simply, it
answers the question of "How well does the model perform?"

The Validation Set

 The validation set is a set of data, separate from the training set, that is used to validate our model performance
during training.

 This validation process gives information that helps us tune the model’s hyperparameters and configurations
accordingly. It is like a critic telling us whether the training is moving in the right direction or not.

 The model is trained on the training set, and, simultaneously, the model evaluation is performed on the validation
set after every epoch.

 The main purpose of holding out a validation set is to detect and help prevent overfitting, i.e., the model becoming really good at classifying the samples in the training set but being unable to generalize and make accurate classifications on data it has not seen before.
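
The three sets described above can be produced with a simple shuffled split, sketched below. The 70/15/15 proportions, the fixed seed and the name `train_val_test_split` are assumptions chosen for illustration only.

```python
import random

def train_val_test_split(samples, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle a copy of the samples and cut it into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Shuffling before cutting keeps the three sets comparable. The test list should be set aside and used only once, after training and hyperparameter tuning are finished.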

How to split your Machine Learning data?

 If there are several hyperparameters to tune, the machine learning model requires a larger validation set to
optimize the model performance. Similarly, if the model has fewer or no hyperparameters, it would be easy to
validate the model using a small set of data.

 If the use case is one where a false prediction is very costly, such as falsely predicting cancer, it is better to validate the model after each epoch so that it is checked against varied scenarios.

 With an increase in the dimensions/features of the data, the number of hyperparameters of the neural network also increases, making the model more complex. In these scenarios, a large share of the data should be kept in the training set, alongside a validation set.

How to split your Machine Learning data?

The truth is:

 There is no optimal split percentage.

 One has to come to a split percentage that suits the requirements and meets the model’s needs.

 However, there are two major concerns while deciding on the optimum split:

▪If there is less training data, the machine learning model will show high variance in training.

▪With less testing data/validation data, your model evaluation/model performance statistic will have greater
variance.

 Essentially, you need to come up with an optimum split that suits the need of the dataset/model.
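
The second concern above, that a small test or validation set gives a noisy performance figure, can be made concrete with the binomial standard error of an accuracy estimate. This is a sketch that assumes independent test samples; the accuracy of 0.9 and the set sizes are invented for illustration.

```python
import math

def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy measured on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)

# The same measured accuracy is far less certain on a small test set.
for n in (100, 1000, 10000):
    print(n, round(accuracy_standard_error(0.9, n), 4))
```

The error shrinks with the square root of the number of samples, so a test set that is 100 times larger makes the estimate about 10 times more precise.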

What is Data Wrangling?

 Data wrangling is sometimes referred to as data munging.

 It is the process of transforming and mapping data from one "raw" form into another format to make it more appropriate and valuable for various downstream purposes such as analytics.

 The goal of data wrangling is to ensure that the data is of good quality and useful.

 Data analysts typically spend more of their time on data wrangling than on the actual analysis of the data.
What is Data Wrangling?

 The process of data wrangling may include further munging, data visualization, data aggregation, training a
statistical model, and many other potential uses.

 Data wrangling typically follows a set of general steps, which begin with extracting the raw data from the data
source, "munging" the raw data (e.g., sorting) or parsing the data into predefined data structures, and finally
depositing the resulting content into a data sink for storage and future use.

 Wrangling the data is usually accompanied by Mapping. The term "Data Mapping" refers to the element of the wrangling process that involves matching source data fields to their respective target data fields.

 While Wrangling is dedicated to transforming data, Mapping is about connecting the dots between different
elements.
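
The general steps described above (extract, munge, deposit) and the idea of mapping source fields to target fields can be sketched as follows. The CSV text, the field names in `FIELD_MAP` and the use of a JSON string as the "data sink" are all hypothetical stand-ins.

```python
import csv
import io
import json

# Hypothetical raw source data and field mapping (source field -> target field).
RAW_CSV = "cust_nm,amt\nbob,20\nalice,35\n"
FIELD_MAP = {"cust_nm": "customer_name", "amt": "amount"}

def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def munge(rows, field_map):
    """Munge: rename fields via the mapping, convert types, and sort."""
    mapped = [{field_map[key]: value for key, value in row.items()} for row in rows]
    for row in mapped:
        row["amount"] = float(row["amount"])
    return sorted(mapped, key=lambda row: row["customer_name"])

def deposit(rows):
    """Deposit: serialise to JSON, standing in for a real data sink."""
    return json.dumps(rows)

print(deposit(munge(extract(RAW_CSV), FIELD_MAP)))
```

Keeping the mapping in a separate table such as `FIELD_MAP` means a renamed source column only requires a change in one place.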

Importance of Data Wrangling

 Making raw data usable. Accurately wrangled data guarantees that quality data is entered into the downstream analysis.

 Getting all data from various sources into a centralized location so it can be used.

 Piecing together raw data according to the required format and understanding the business context of data.

 Automated data integration tools are used as data wrangling techniques that clean and convert source data into a standard
format that can be used repeatedly according to end requirements. Businesses use this standardized data to perform crucial,
cross-data set analytics.

 Cleansing the data of noise and of flawed or missing elements.

 Data wrangling acts as a preparation stage for the data mining process, which involves gathering data and making sense of it.

 Helping business users make concrete, timely decisions.

Data Wrangling Process

Data Wrangling Process-I

 Discovery: Before starting the wrangling process, it is critical to think about what may lie beneath your data. Think critically about what results you anticipate from your data and what you will use it for once the wrangling process is complete. Once you've determined your objectives, you can gather your data.

 Organization: After you've gathered your raw data within a particular dataset, you must structure your data. Due to the variety
and complexity of data types and sources, raw data is often overwhelming at first glance.

 Cleaning: When your data is organized, you can begin cleaning your data. Data cleaning involves removing outliers, formatting nulls, and eliminating duplicate data. It is important to note that cleaning data collected from web scraping methods might be more tedious than cleaning data collected from a database. Essentially, web data can be highly unstructured and require more time than structured data from a database.
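
The cleaning step can be sketched on a small list of records. The records, the choice to drop nulls rather than reformat them, and the outlier rule (distance from the median measured in median absolute deviations) are assumptions made for illustration, not a prescribed procedure.

```python
import statistics

# Hypothetical raw records containing a duplicate, a null and an outlier.
RAW = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": 12.0},
    {"id": 2, "price": 12.0},    # duplicate record
    {"id": 3, "price": None},    # null value
    {"id": 4, "price": 11.0},
    {"id": 5, "price": 13.0},
    {"id": 6, "price": 500.0},   # outlier
]

def clean(records, max_deviations=3.0):
    """Drop null and duplicate records, then drop prices far from the median."""
    seen, unique = set(), []
    for record in records:
        if record["price"] is None or record["id"] in seen:
            continue
        seen.add(record["id"])
        unique.append(record)
    prices = [record["price"] for record in unique]
    median = statistics.median(prices)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median(abs(price - median) for price in prices)
    return [r for r in unique if abs(r["price"] - median) <= max_deviations * mad]

print([record["id"] for record in clean(RAW)])
```

Whether an extreme value is an error or a genuine observation depends on the domain, so the threshold `max_deviations` should be treated as a tunable assumption.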

Data Wrangling Process-II

 Data enrichment: This step requires that you take a step back from your data to determine if you have enough data to proceed. Finishing the wrangling process without enough data may compromise insights gathered from further analysis. For example, investors looking to analyze product review data will want a significant amount of data to portray the market and increase investment intelligence.

 Validation: After determining you gathered enough data, you will need to apply validation rules to your data. Validation rules,
performed in repetitive sequences, confirm that your data is consistent throughout your dataset. Validation rules will also
ensure quality as well as security. This step follows similar logic utilized in data normalization, a data standardization process
involving validation rules.

 Publishing: The final step of the data munging process is data publishing. Data publishing involves preparing the data for future
use. This may include providing notes and documentation of your wrangling process and creating access for other users and
applications.
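
The validation step described above can be sketched as a set of named rules applied to every record. The rule names, the record fields and the function `validate` are hypothetical examples, not rules given in the slides.

```python
# Hypothetical validation rules: each rule name maps to a check on one record.
RULES = {
    "amount is non-negative": lambda r: r["amount"] >= 0,
    "currency is a 3-letter code": lambda r: isinstance(r["currency"], str)
    and len(r["currency"]) == 3,
    "customer is present": lambda r: bool(r["customer"]),
}

def validate(records, rules):
    """Return (record index, rule name) for every rule that a record violates."""
    return [
        (index, name)
        for index, record in enumerate(records)
        for name, check in rules.items()
        if not check(record)
    ]

RECORDS = [
    {"customer": "alice", "amount": 35.0, "currency": "INR"},
    {"customer": "", "amount": -5.0, "currency": "rupees"},
]

for index, rule in validate(RECORDS, RULES):
    print(f"record {index} fails: {rule}")
```

An empty result means every record passed every rule. Running the same rules after each wrangling pass is one way to apply them in repetitive sequences.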

Benefits of Data Wrangling

Data Wrangling Formats

 Transactional data: Transactional data refers to business operation transactions. This data type involves detailed subjective
information about particular transactions, including client documentation, client interactions, receipts, and notes regarding any
external transactions.

 Analytical Base Table (ABT): Analytical Base Table data involves data within a table with unique entries for each attribute
column. ABT data is the most common business data type as it involves various data types that contribute to the most common
data sources. Even more notable is that ABT data is primarily used for AI and ML, which we will examine later.

 Time-series: Time series data involves data that has been divided by a particular amount of time or data that has a relation with
time, particularly sequential time. For example, tracking data regarding an application's downloads over a year or tracking traffic
data over a month would be considered time series data.

 Document library: Lastly, document library data is information that involves a large amount of textual data, particularly text
within a document. While document libraries contain rather massive amounts of data, automated data mining tools specifically
designed for text mining can help extract entire texts from documents for further analysis.

Data Wrangling Examples

 Merging several data sources into one data set for analysis

 Identifying gaps or empty cells in data and either filling or removing them

 Deleting irrelevant or unnecessary data

 Identifying severe outliers in data and either explaining the inconsistencies or deleting them to facilitate analysis.
 Businesses also use data wrangling tools to:
• Detect corporate fraud
• Support data security
• Ensure accurate and recurring data modeling results
• Ensure business compliance with industry standards
• Perform Customer Behavior Analysis
• Reduce time spent on preparing data for analysis
• Promptly recognize the business value of your data
• Find out data trends
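
The first three examples above (merging sources, filling gaps, and deleting irrelevant data) can be sketched in a few lines. This is an illustrative sketch only: the two data sources, the column names, and the choice of the median as the fill value are assumptions made for the example.

```python
from statistics import median

# Two hypothetical sources keyed by customer_id.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": None},   # gap: missing amount
    {"customer_id": 3, "amount": 80.0},
]
profiles = {
    1: {"region": "West", "internal_note": "n/a"},
    2: {"region": "East", "internal_note": "n/a"},
    3: {"region": "West", "internal_note": "n/a"},
}

# 1. Merge the two sources into one data set.
merged = [{**row, **profiles.get(row["customer_id"], {})} for row in orders]

# 2. Fill gaps: replace missing amounts with the median of the known amounts.
known = [row["amount"] for row in merged if row["amount"] is not None]
fill_value = median(known)
for row in merged:
    if row["amount"] is None:
        row["amount"] = fill_value

# 3. Delete a column that is irrelevant to the analysis.
for row in merged:
    row.pop("internal_note", None)

for row in merged:
    print(row)
```

Whether to fill a gap or remove the row depends on the analysis; the median is used here only because it is less sensitive to outliers than the mean.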

What is Data Visualization?

 Data visualization is a graphical representation of quantitative information and data by using visual elements like
graphs, charts, and maps.

 Data visualization converts large and small data sets into visuals, which are easy for humans to understand and
process.

 Data visualization tools provide accessible ways to understand outliers, patterns, and trends in the data.

 In the world of Big Data, data visualization tools and technologies are required to analyze vast amounts of
information.

 Data visualizations are common in everyday life, where they most often appear in the form of graphs and charts. A
combination of multiple visualizations and bits of information is referred to as an infographic.

What makes Data Visualization Effective?

• American statistician and Yale professor Edward Tufte
believes useful data visualizations consist of complex ideas
communicated with clarity, precision, and
efficiency.
• To craft an effective data visualization, you need
to start with clean data that is well-sourced and
complete. After the data is ready to visualize,
you need to pick the right chart.


 Data visualization is important because of the way the human brain processes information.

 Using graphs and charts to visualize large amounts of complex data is easier than studying spreadsheets
and reports.

 Data visualization can identify areas that need improvement or modifications.

 Data visualization can clarify which factors influence customer behavior.

 Data visualization helps you to understand which products to place where.

 Data visualization can help predict sales volumes.

Why Use Data Visualization?

 To make data easier to understand and remember.

 To discover unknown facts, outliers, and trends.

 To visualize relationships and patterns quickly.

 To ask better questions and make better decisions.

 To perform competitive analysis.

 To improve insights.

Exploratory Data Analysis

 Exploratory Data Analysis (EDA) is a crucial initial step in data science projects.

 It involves analyzing and visualizing data sets to understand their key characteristics, uncover patterns, locate outliers, and
identify relationships between variables.

 EDA is normally carried out as a preliminary step before undertaking more formal statistical analyses or modeling.

Exploratory Data Analysis: Key Aspects

 Distribution of Data: Examining the distribution of data points to understand their range, central tendencies (mean, median), and
dispersion (variance, standard deviation).

 Graphical Representations: Utilizing charts such as histograms, box plots, scatter plots, and bar charts to visualize relationships within
the data and distributions of variables.

 Outlier Detection: Identifying unusual values that deviate from other data points. Outliers can influence statistical analyses and might
indicate data entry errors or unique cases.

 Correlation Analysis: Checking the relationships between variables to understand how they might affect each other. This includes
computing correlation coefficients and creating correlation matrices.

 Handling Missing Values: Detecting and deciding how to address missing data points, whether by imputation or removal, depending on
their impact and the amount of missing data.

 Summary Statistics: Calculating key statistics that provide insight into data trends and nuances.

 Testing Assumptions: Many statistical tests and models assume the data meet certain conditions (like normality or homoscedasticity).
EDA helps verify these assumptions.
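
Two of the aspects above, handling missing values and outlier detection, can be sketched with the standard library alone. The readings are made-up data, and the 1.5 × IQR fence is one common convention rather than the only way to define an outlier.

```python
from statistics import quantiles

# Hypothetical sensor readings; None marks a missing value.
readings = [12.1, 11.8, None, 12.4, 12.0, 35.0, 11.9, None, 12.3, 12.2]

# Handling missing values: count them, then drop them for this sketch.
missing = sum(1 for r in readings if r is None)
observed = [r for r in readings if r is not None]

# Outlier detection with the 1.5 * IQR rule (one common convention).
q1, _, q3 = quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in observed if r < low or r > high]

print(f"missing: {missing} of {len(readings)}")
print(f"IQR fence: [{low:.2f}, {high:.2f}]")
print("outliers:", outliers)
```

A flagged value is only a candidate for review: as noted above, it may be a data entry error or a genuine unique case, and that decision needs domain knowledge.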

Why is Exploratory Data Analysis Important?

 Understanding Data Structures: EDA helps in getting familiar with the dataset, understanding the number of features, the type of
data in each feature, and the distribution of data points. This understanding is crucial for selecting appropriate analysis or
prediction techniques.

 Identifying Patterns and Relationships: Through visualizations and statistical summaries, EDA can reveal hidden patterns and
intrinsic relationships between variables. These insights can guide further analysis and enable more effective feature
engineering and model building.

 Detecting Anomalies and Outliers: EDA is essential for identifying errors or unusual data points that may adversely affect the
results of your analysis. Detecting these early can prevent costly mistakes in predictive modeling and analysis.

 Testing Assumptions: Many statistical models assume that data follow a certain distribution or that variables are independent.
EDA involves checking these assumptions. If the assumptions do not hold, the conclusions drawn from the model could be
invalid.


 Informing Feature Selection and Engineering: Insights gained from EDA can inform which features are most relevant to include
in a model and how to transform them (scaling, encoding) to improve model performance.

 Optimizing Model Design: By understanding the data’s characteristics, analysts can choose appropriate modeling techniques,
decide on the complexity of the model, and better tune model parameters.

 Facilitating Data Cleaning: EDA helps in spotting missing values and errors in the data, which are critical to address before
further analysis to improve data quality and integrity.

 Enhancing Communication: Visual and statistical summaries from EDA can make it easier to communicate findings and
convince others of the validity of your conclusions, particularly when explaining data-driven insights to stakeholders without
technical backgrounds.

Types of Exploratory Data Analysis

 Exploratory Data Analysis (EDA) refers to the process of examining and analyzing data sets to uncover patterns, identify
relationships, and gain insights.

 Various EDA techniques can be employed, depending on the nature of the data and the goals of the analysis.

 Depending on the number of columns being analyzed, EDA can be divided into three types:

• Univariate

• Bivariate

• Multivariate.

Univariate Data Analysis

 Univariate analysis focuses on a single variable to understand its internal structure.

 It is primarily concerned with describing the data and finding patterns existing in a single feature.

 This type of analysis focuses on examining individual variables in the data set.

 It involves summarizing and visualizing a single variable at a time to understand its distribution, central tendency, spread,
and other relevant statistics.

 Common techniques include:

▪ Histograms: Used to visualize the distribution of a variable.

▪ Box plots: Useful for detecting outliers and understanding the spread and skewness of the data.

▪ Bar charts: Employed for categorical data to show the frequency of each category.

▪ Summary statistics: Calculations like mean, median, mode, variance, and standard deviation that describe the central
tendency and dispersion of the data.
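
The techniques above can be sketched for one numeric and one categorical variable. The scores and grades are made-up data, the bin width of 10 is an arbitrary choice, and a text histogram stands in for a plotted one so that the sketch needs no plotting library.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical exam scores (numeric) and grades (categorical).
scores = [55, 62, 62, 70, 71, 75, 78, 84, 90, 93]
grades = ["B", "A", "C", "B", "B", "A", "C", "B"]

# Summary statistics: central tendency and dispersion of one variable.
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
    "variance": pvariance(scores),   # population variance
    "std_dev": pstdev(scores),       # population standard deviation
}
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")

# Text histogram: bucket scores into bins of width 10.
bins = Counter((s // 10) * 10 for s in scores)
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * bins[start]}")

# Frequency table for a categorical variable (what a bar chart would plot).
grade_counts = Counter(grades)
print(grade_counts.most_common())
```

Population variance and standard deviation are used here; for a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice.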

Bivariate Data Analysis

 Bivariate analysis involves exploring the relationship between two variables.

 It helps find associations, correlations, and dependencies between pairs of variables.

 Bivariate analysis is a crucial form of exploratory data analysis that examines the relationship between two variables.

 Some key techniques used in bivariate analysis:

▪ Scatter Plots: These are one of the most common tools used in bivariate analysis. A scatter plot helps visualize the relationship
between two continuous variables.

▪ Correlation Coefficient: This statistical measure (often Pearson’s correlation coefficient for linear relationships) quantifies the
degree to which two variables are related.

▪ Line Graphs: In the context of time series data, line graphs can be used to compare two variables over time. This helps in
identifying trends, cycles, or patterns that emerge in the interaction of the variables over the specified period.

▪ Covariance: Covariance is a measure used to determine how much two random variables change together. However, it is sensitive
to the scale of the variables, so it’s often supplemented by the correlation coefficient for a more standardized assessment of the
relationship.
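
The point about covariance being sensitive to scale can be checked numerically. The sketch below computes sample covariance and Pearson's correlation coefficient by hand; the hours and scores are made-up data, and the helper names `covariance` and `pearson` are chosen for this example only.

```python
from math import sqrt
from statistics import mean


def covariance(xs, ys):
    """Sample covariance: summed co-deviation from the means, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson's r: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]

# Rescaling hours to minutes changes the covariance but not the correlation.
minutes = [h * 60 for h in hours]

print(f"covariance (hours, score):   {covariance(hours, score):.2f}")
print(f"covariance (minutes, score): {covariance(minutes, score):.2f}")
print(f"pearson r (hours, score):    {pearson(hours, score):.4f}")
print(f"pearson r (minutes, score):  {pearson(minutes, score):.4f}")
```

Changing the unit of one variable multiplies the covariance by the same factor, while the correlation coefficient stays the same, which is why the two measures are usually reported together.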

Multivariate Data Analysis

 Multivariate analysis examines the relationships among more than two variables in the dataset.

 It aims to understand how variables interact with one another, which is crucial for most statistical modeling techniques.

 Some key techniques used in multivariate analysis:

▪ Pair plots: Visualize relationships across several variables simultaneously to capture a comprehensive view of potential
interactions.

▪ Principal Component Analysis (PCA): A technique that reduces the dimensionality of large
datasets while preserving as much variance as possible.
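
The idea behind PCA can be sketched without any external library: compute the covariance matrix, then find the direction of greatest variance. The sketch below uses power iteration to approximate only the first principal component; this is a simplification chosen for illustration (library implementations typically use a full eigendecomposition or SVD), and the data set and helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance matrix of rows (each row is one observation)."""
    n, d = len(rows), len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def first_principal_component(cov, iterations=200):
    """Power iteration: approximate the top eigenvector and its eigenvalue."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v . (cov v) gives the variance along v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


# Hypothetical 3-feature data set where the features move together.
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 1.4],
    [4.0, 7.9, 2.1],
    [5.0, 10.1, 2.4],
    [6.0, 12.0, 3.1],
]
cov = covariance_matrix(data)
pc1, variance_along_pc1 = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance_along_pc1 / total_variance

print("first principal component:", [round(x, 3) for x in pc1])
print(f"share of variance explained: {explained:.1%}")
```

Because the three features here are almost perfectly correlated, a single component captures nearly all of the variance, which is the situation in which reducing dimensionality loses little information.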

Steps for Performing Exploratory Data Analysis
