
Powerpoint Workshop Introduction To Deep Learning - Statistics and Data Analysis

The document provides an outline for a livestream workshop on statistics and data analysis. It covers topics like central tendency, measures of variability, visualizing data, and the normal distribution. The outline also distinguishes between descriptive and inferential statistics, and parametric versus nonparametric methods. Hands-on sessions are planned to complement the theoretical content.

Uploaded by

habifian sultan

STATISTICS AND DATA ANALYSIS
Livestream Workshop
@ Purwadhika

Sunday, 10 May 2020


Andre Nurrohman
andrenurrohman@gmail.com

Outline

- Statistics and Data Analysis
- Central Tendency
- Measures of Variability
- Visualizing Data
- Distribution (Normal Distribution)
- Hands-On Session

Statistics and Data Analysis

Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data.

Statistics and Data Analysis
Types of Data

Statistics and Data Analysis

Statistics
- Descriptive: presenting, organizing, and summarizing data.
- Inferential: drawing conclusions about a population based on data observed in a sample.

Statistics and Data Analysis

Descriptive Statistics
- Central Tendency: mean, median, and mode.
- Measures of Variability: range, variance, standard deviation, and quartiles.

Statistics and Data Analysis

Population (described by a PARAMETER) --sampling--> Sample (described by a STATISTIC)
Sample --statistical inference--> Population

Statistical inference uses either:
- Parametric Methods
- Nonparametric Methods
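To make the parameter/statistic distinction concrete, here is a minimal sketch using only Python's standard library. The population of 1,000 household sizes is synthetic (generated with a fixed seed purely for illustration), and the sample size of 30 is an arbitrary choice, not something taken from the slides.

```python
import random
import statistics

# Hypothetical population: 1,000 household sizes generated with a fixed seed.
rng = random.Random(42)
population = [rng.randint(0, 5) for _ in range(1000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a random sample drawn without replacement.
sample = rng.sample(population, k=30)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.3f}")
print(f"Sample mean (statistic):     {sample_mean:.3f}")
```

Drawing a different sample changes the statistic while the parameter stays fixed; inferential methods quantify how far the statistic is likely to be from the parameter.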

Statistics and Data Analysis

Criterion                Parametric Method     Nonparametric Method
Data type                Interval or ratio     Nominal or ordinal
Assumed distribution     Normal                Any
Assumed variance         Homogeneous           Homogeneous or heterogeneous
Number of data points    >= 30                 Flexible
Data set relationship    Independent           Any
Usual central measure    Mean                  Median
Statistical power        Stronger              Weaker

Analysis type                      Parametric Method             Nonparametric Method
Independent measures, 2 groups     Independent-measures t-test   Mann-Whitney test
Independent measures, >2 groups    ANOVA                         Kruskal-Wallis test
Repeated measures, 2 conditions    Matched-pair t-test           Wilcoxon signed-rank test
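Each analysis-type row maps onto a function in SciPy's `scipy.stats` module. The sketch below assumes SciPy is installed; all measurements are invented solely to show the calls, so the resulting p-values carry no meaning.

```python
from scipy import stats

# Hypothetical independent groups (made-up numbers for illustration only).
group_a = [12.1, 14.3, 13.8, 15.0, 12.9, 14.7]
group_b = [15.2, 16.8, 14.9, 17.3, 16.1, 15.8]
group_c = [13.0, 13.5, 12.2, 14.1, 12.8, 13.9]

# Hypothetical repeated measures: the same subjects before and after a treatment.
before = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0]
after = [11.0, 14.0, 12.0, 15.0, 18.0, 16.0, 11.5]

results = {
    # Independent measures, 2 groups
    "Independent-measures t-test": stats.ttest_ind(group_a, group_b),
    "Mann-Whitney test": stats.mannwhitneyu(group_a, group_b, alternative="two-sided"),
    # Independent measures, > 2 groups
    "ANOVA": stats.f_oneway(group_a, group_b, group_c),
    "Kruskal-Wallis test": stats.kruskal(group_a, group_b, group_c),
    # Repeated measures, 2 conditions
    "Matched-pair t-test": stats.ttest_rel(before, after),
    "Wilcoxon signed-rank test": stats.wilcoxon(before, after),
}

for name, result in results.items():
    print(f"{name:28s} statistic={result.statistic:8.3f}  p={result.pvalue:.4f}")
```

The parametric functions compare means, while the nonparametric ones work on ranks, which is why they need fewer distributional assumptions.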

Statistics and Data Analysis

Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data.

Epicycle of Data Analysis
Source: The Art of Data Analysis


Central Tendency
- Mean
- Median
- Mode


Central Tendency: Mean
The mean is the sum of all values divided by the number of values.

Example:
Number of children in each house on my street:
0, 2, 3, 2, 1, 0, 0, 2, 0
Hence, the mean is:
(0+2+3+2+1+0+0+2+0) / 9 = 10 / 9 ≈ 1.11
A new neighbor with 11 children moves into my street, so the new mean is:
(0+2+3+2+1+0+0+2+0+11) / 10 = 21 / 10 = 2.1
A single extreme value nearly doubles the mean.
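The same calculation in Python, once by hand and once with the standard library's `statistics.mean`, as a minimal sketch:

```python
import statistics

children = [0, 2, 3, 2, 1, 0, 0, 2, 0]

# Mean = sum of values / number of values
mean_before = sum(children) / len(children)
print(round(mean_before, 2))  # 1.11

# A new neighbor with 11 children moves in.
children_after = children + [11]
mean_after = statistics.mean(children_after)
print(round(mean_after, 2))  # 2.1
```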
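
Central Tendency: Median
The median is the middle data point of the sorted data:
1. If the number of data points is odd, the median is the value exactly in the middle.
2. If the number of data points is even, the median is the average of the two middle values.

Example:
0, 2, 3, 2, 1, 0, 0, 2, 0
Sorted: 0, 0, 0, 0, 1, 2, 2, 2, 3
Median: 1

0, 2, 3, 2, 1, 0, 0, 2, 0, 11
Sorted: 0, 0, 0, 0, 1, 2, 2, 2, 3, 11
Median: (1 + 2) / 2 = 1.5

Adding the extreme value 11 moves the median only from 1 to 1.5, while the mean moves from 1.11 to 2.1.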
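The two rules above (odd vs even count) translate directly into a small function. This is a sketch; the helper name `median` is our own, and the result is cross-checked against the standard library's `statistics.median`:

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

children = [0, 2, 3, 2, 1, 0, 0, 2, 0]
children_after = children + [11]

print(median(children), statistics.median(children))              # 1 1
print(median(children_after), statistics.median(children_after))  # 1.5 1.5
```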

Central Tendency: Mode
The mode is the value that occurs most frequently.

Example:
0, 2, 3, 2, 1, 0, 0, 2, 0

Count each value:
0 => 4
1 => 1
2 => 3
3 => 1

So the mode is 0.

What is the mode if we add 11?
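Counting values and picking the most frequent one can be sketched with `collections.Counter` and `statistics.mode`. `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, which matters when a data set has more than one mode. To check the exercise, append 11 to the list and rerun.

```python
import statistics
from collections import Counter

children = [0, 2, 3, 2, 1, 0, 0, 2, 0]

# Count each value, as on the slide.
counts = Counter(children)
for value in sorted(counts):
    print(f"{value} => {counts[value]}")

print(statistics.mode(children))  # 0

# With a tie, multimode returns all most-frequent values.
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```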


z
Measures of Variability

Measures of Boxplot
Variability

Variance and
Range Standard Quartile Outliers
Deviation

Measures of Variability: Range

The range is the distance between the largest value and the smallest value of the data.

What is the range of the data (1, 4, 5, 2, 8)?
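As a sketch, the range is just `max - min`. It uses the children-per-house data from the Central Tendency slides (so the exercise above is left for you), and shows how a single extreme value stretches it:

```python
children = [0, 2, 3, 2, 1, 0, 0, 2, 0]

# Range = largest value - smallest value
data_range = max(children) - min(children)
print(data_range)  # 3

# One extreme value stretches the range from 3 to 11.
range_after = max(children + [11]) - min(children + [11])
print(range_after)  # 11
```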


Measures of Variability: Variance and Standard Deviation
The variance is the average of the squared differences from the mean.

The standard deviation is the square root of the variance. It is expressed in the same units as the data and measures how far data points typically lie from the mean.
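A minimal sketch of the definitions above, applied to the data set (1, 4, 5, 2, 8) used elsewhere in these slides. Dividing by the number of values, as the slide's "average" does, gives the population variance (`statistics.pvariance`). When the data are a sample used to estimate a population, the usual convention divides by n - 1 instead, which is what `statistics.variance` computes.

```python
import math
import statistics

data = [1, 4, 5, 2, 8]
mean = sum(data) / len(data)  # 4.0

# Variance: average of the squared differences from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)

print(variance, round(std_dev, 3))  # 6.0 2.449

# Population variance (divide by n) vs sample variance (divide by n - 1)
print(statistics.pvariance(data), statistics.variance(data))
```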
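
Measures of Variability: Quartile and Outlier
The quartiles Q1, Q2 (the median), and Q3 split the sorted data into four parts.

IQR = Q3 - Q1

Upper outliers > Q3 + 1.5 * IQR
Lower outliers < Q1 - 1.5 * IQR

Example:
(1, 4, 5, 2, 8)
Sorted: (1, 2, 4, 5, 8)

So, Q1 = 1.5; Q2 = 4; Q3 = 6.5

IQR = 6.5 - 1.5 = 5
Upper outliers > 6.5 + 1.5 * 5 = 14
Lower outliers < 1.5 - 1.5 * 5 = -6
No value lies below -6 or above 14, so this data set has no outliers.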
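The quartile and outlier rules can be sketched with the standard library's `statistics.quantiles`. Quartile conventions differ between tools. The `"exclusive"` method (the default) happens to reproduce the slide's Q1 = 1.5 and Q3 = 6.5 for this data, whereas NumPy's default linear interpolation would give Q1 = 2 and Q3 = 5. The helper name `iqr_outliers` is our own.

```python
import statistics

def iqr_outliers(values):
    """Return quartiles, outlier fences, and the values outside the fences (1.5 * IQR rule)."""
    # method="exclusive" reproduces the slide's quartiles for (1, 4, 5, 2, 8)
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return (q1, q2, q3), (lower_fence, upper_fence), outliers

print(iqr_outliers([1, 4, 5, 2, 8]))
# ((1.5, 4.0, 6.5), (-6.0, 14.0), [])

# The street example with the new neighbor: 11 is flagged as an outlier.
children_after = [0, 2, 3, 2, 1, 0, 0, 2, 0, 11]
print(iqr_outliers(children_after)[2])  # [11]
```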

Measures of Variability: Boxplot

Visualizing Data
- Frequency Table
- Proportion Table
- Chart
- Histograms

Visualizing Data: Tables

Frequency Table and Proportion Table
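A frequency table counts how often each value occurs; a proportion table divides each count by the total. The tables on this slide are images, so this sketch assumes the children-per-house data from the Central Tendency slides rather than the slide's own data.

```python
from collections import Counter

children = [0, 2, 3, 2, 1, 0, 0, 2, 0]
counts = Counter(children)
total = len(children)

# Frequency = count of each value; proportion = count / total
print("Value  Frequency  Proportion")
for value in sorted(counts):
    proportion = counts[value] / total
    print(f"{value:>5}  {counts[value]:>9}  {proportion:>10.2f}")
```

The proportions always sum to 1, which makes groups of different sizes comparable.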


Visualizing Data: Charts

Visualizing Data: Histogram

Skewness and Kurtosis
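This slide's content is an image, so the definitions below are an assumption: they use one common convention, the population moment-based skewness m3 / m2^1.5 and excess kurtosis m4 / m2^2 - 3 (a normal distribution scores 0 on both). Other tools may apply small-sample corrections and report slightly different numbers. The helper names are our own.

```python
import statistics

def skewness(values):
    """Population (moment-based) skewness: m3 / m2**1.5."""
    mu = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Population (moment-based) kurtosis minus 3, so a normal distribution scores 0."""
    mu = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

children = [0, 2, 3, 2, 1, 0, 0, 2, 0]
children_after = children + [11]

# A long right tail (the value 11) pushes skewness and kurtosis up.
print(round(skewness(children), 2), round(skewness(children_after), 2))
print(round(excess_kurtosis(children), 2), round(excess_kurtosis(children_after), 2))
```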

Normal Distribution
- A probability distribution lists all possible outcomes of a random variable along with their corresponding probabilities.

Example:

A probability distribution of a fair 6-sided die: each face from 1 to 6 has probability 1/6.

- A probability distribution function is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range.

- The most common example of a probability distribution is the normal distribution.
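The fair-die example can be written out as a table of outcomes and probabilities. This sketch uses `fractions.Fraction` so the probabilities stay exact:

```python
from fractions import Fraction

# Probability distribution of a fair 6-sided die: each outcome is equally likely.
die = {face: Fraction(1, 6) for face in range(1, 7)}

for face, p in die.items():
    print(face, p)

# The probabilities of all possible outcomes sum to exactly 1.
print(sum(die.values()))  # 1
```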


Normal Distribution
Properties of the normal distribution:

1. The mean, mode, and median are all equal.
2. The curve is symmetric about the center (i.e., around the mean).
3. Exactly half of the values are to the left of the center and exactly half are to the right.
4. The total area under the curve is 1.
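The four properties can be checked numerically with Python's `statistics.NormalDist` (Python 3.8+). The mean of 170 and standard deviation of 10 are hypothetical values chosen for illustration, and the area is approximated with a simple trapezoidal rule over the mean plus or minus 10 standard deviations.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 170, standard deviation 10.
dist = NormalDist(mu=170, sigma=10)

# 1. Mean, median, and mode coincide.
print(dist.mean, dist.median, dist.mode)

# 2. Symmetry: equal density at equal distances on either side of the mean.
print(dist.pdf(160) == dist.pdf(180))  # True

# 3. Half of the probability lies to the left of the mean.
print(dist.cdf(170))  # 0.5

# 4. Total area under the curve is 1 (trapezoidal rule over mean +/- 10 sigma).
lower, upper, steps = 170 - 100, 170 + 100, 10_000
width = (upper - lower) / steps
area = sum(
    (dist.pdf(lower + i * width) + dist.pdf(lower + (i + 1) * width)) / 2 * width
    for i in range(steps)
)
print(round(area, 6))
```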


THANK YOU