Commit 91f26cd

Create readme.md
1 parent 54ecffb commit 91f26cd

4 files changed (+69, -0 lines changed)

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
## Common data types

Manipulating and analyzing data with incorrect data types can compromise your analysis as you move further along the data science workflow.

When working with new data, you should always check the data types of your columns using the `.dtypes` attribute or the `.info()` method, which you'll see in the next exercise. Often, you'll run into columns that should be converted to different data types before starting any analysis.
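
As a rough illustration of that check, the sketch below assumes a small hypothetical `sales` DataFrame whose `revenue` column was read in as text with a leading `$` sign (both names are made up for this example). It inspects the types with `.dtypes` and `.info()`, then converts the column to integers.

```python
import pandas as pd

# Hypothetical data: revenue arrived as text, so pandas does not treat it as numeric
sales = pd.DataFrame({
    "order_id": [101, 102, 103],
    "revenue": ["$1200", "$850", "$430"],
})

# Inspect column types: .dtypes returns a Series of dtypes, .info() prints a summary
print(sales.dtypes)
sales.info()

# Strip the currency symbol from both ends, then convert the text to integers
sales["revenue"] = sales["revenue"].str.strip("$").astype("int")

# Once the column is numeric, aggregations add values instead of concatenating text
print(sales["revenue"].sum())
```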

In this exercise, you'll first identify different types of data and correctly map them to their respective types.

<hr>

**Instructions**
* Assign each card to the type of data you think it represents.

**Answer**
> **Numeric data types**
> * Salary earned monthly
> * Number of points on customer loyalty card
> * Number of items bought in a basket
>
> **Text**
> * City of residence
> * Shipping address of a customer
> * First name
>
> **Dates**
> * Order date of a product
> * Birthdates of clients
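
The sketch below, assuming hypothetical column names modeled on the cards above, shows how each category typically ends up in pandas: numeric values as integer or float dtypes, text as string columns, and dates as `datetime64` once parsed with `pd.to_datetime()`. Every value starts out as text here to mimic a messy import.

```python
import pandas as pd

# Hypothetical customer records; every value below starts out as text
customers = pd.DataFrame({
    "first_name": ["Ana", "Ben"],
    "city": ["Lisbon", "Austin"],
    "monthly_salary": ["3200.50", "4100.00"],
    "loyalty_points": ["120", "87"],
    "birthdate": ["1990-04-12", "1985-11-03"],
})

# Numeric data: convert strings to float and integer dtypes
customers["monthly_salary"] = pd.to_numeric(customers["monthly_salary"])
customers["loyalty_points"] = customers["loyalty_points"].astype("int")

# Dates: parse ISO-formatted strings into datetime64 values
customers["birthdate"] = pd.to_datetime(customers["birthdate"])

# Text columns (first_name, city) are left as they are
print(customers.dtypes)
```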

Data Cleaning with Python/01 Common data problems/01 Common data types/script.py

Whitespace-only changes.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
## Members only

Throughout the course so far, you've been exposed to some common problems that you may encounter with your data: data type constraints, data range constraints, uniqueness constraints, and now membership constraints for categorical values.
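
Membership constraints can be checked by comparing a column against the set of values it is allowed to hold. A minimal sketch, assuming a hypothetical `study` DataFrame with a `day_of_week` column that contains the invalid value `Suntermonday`; dropping the offending rows is only one possible remedy.

```python
import pandas as pd

# Hypothetical data with a categorical day_of_week column
study = pd.DataFrame({
    "student": ["s1", "s2", "s3", "s4"],
    "day_of_week": ["Monday", "Suntermonday", "Friday", "Sunday"],
})

# The set of values the column is allowed to take
valid_days = {"Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"}

# Values present in the data that are not valid categories
invalid_values = set(study["day_of_week"]) - valid_days
print(invalid_values)

# Boolean mask flagging rows that violate the membership constraint
is_invalid = ~study["day_of_week"].isin(valid_days)
print(study[is_invalid])

# One possible remedy: keep only rows with valid categories
study_clean = study[~is_invalid]
```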

In this exercise, you will map hypothetical problems to their respective categories.

<hr>

**Instructions**
* Map each observed data problem to the correct type of data problem.

**Answer**
> **Membership Constraint**
> * A `GPA` column containing a `Z-` grade.
> * A `month` column with the value `14`.
> * A `day_of_week` column with the value `Suntermonday`.
> * A `has_loan` column with the value `12`.
>
> **Other Constraint**
> * An `age` column with values above `130`.
> * A `birthdate` column with values in the future.
> * A `revenue` column represented as a string.
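
The "Other Constraint" cases are data range and data type problems rather than membership ones. A hedged sketch of how they might be flagged, using a hypothetical `clients` DataFrame with one problem planted per column (the 130-year cutoff comes from the example above; the data is made up):

```python
import pandas as pd

# Hypothetical client records: an implausible age, a future birthdate,
# and revenue stored as text
clients = pd.DataFrame({
    "age": [34, 142, 58],
    "birthdate": pd.to_datetime(["1990-01-15", "2045-06-01", "1966-03-20"]),
    "revenue": ["1500", "2300", "980"],
})

# Data range constraint: ages above 130 are implausible
too_old = clients["age"] > 130

# Data range constraint: birthdates cannot lie in the future
in_future = clients["birthdate"] > pd.Timestamp.today()

# Data type constraint: revenue stored as text is converted to integers
clients["revenue"] = clients["revenue"].astype("int")

# Keep only rows that satisfy both range constraints
clients_valid = clients[~(too_old | in_future)]
print(clients[too_old | in_future])
```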
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
## Categories of errors

In the video exercise, you saw how to address common problems affecting categorical variables in your data, including whitespace and inconsistencies in your categories, and the problem of creating new categories and mapping existing ones to new ones.
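
A minimal sketch of the whitespace and capitalization fix, assuming a hypothetical `survey` DataFrame with a `marriage_status` column whose labels differ only in stray spaces and letter case:

```python
import pandas as pd

# Hypothetical survey answers with stray spaces and mixed capitalization
survey = pd.DataFrame({
    "marriage_status": [" married", "Married", "UNMARRIED ", "unmarried"],
})

print(survey["marriage_status"].unique())  # four apparent categories

# Remove leading/trailing whitespace, then normalize the case
survey["marriage_status"] = survey["marriage_status"].str.strip().str.lower()

print(survey["marriage_status"].unique())  # two real categories remain
```

`.str.upper()` would work just as well as `.str.lower()`; what matters is applying one convention consistently.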

To get a better idea of the toolkit at your disposal, you will map functions and methods from pandas and Python to the type of problem each one addresses.

<hr>

**Instructions**
* Map each function/method to the categorical data problem it solves.

**Answer**
> **White spaces and inconsistency**
> * `.str.strip()`
> * `.str.upper()`
> * `.str.lower()`
>
> **Creating or remapping categories**
> * `pandas.qcut()`
> * `pandas.cut()`
> * `.replace()`
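
The three remapping tools can be compared side by side. This sketch assumes a hypothetical `households` DataFrame; the bin edges, labels, and operating-system mapping are illustrative choices, not values taken from the course.

```python
import pandas as pd

# Hypothetical household data
households = pd.DataFrame({
    "income": [12_000, 45_000, 80_000, 150_000, 230_000, 30_000],
    "device": ["iOS", "Android", "Linux", "macOS", "Android", "Windows"],
})

# pd.qcut: boundaries come from the data's quantiles, so groups have similar sizes
households["income_q"] = pd.qcut(households["income"], q=3,
                                 labels=["low", "medium", "high"])

# pd.cut: boundaries are given explicitly (bins are right-inclusive by default)
households["income_group"] = pd.cut(households["income"],
                                    bins=[0, 50_000, 150_000, float("inf")],
                                    labels=["0-50K", "50K-150K", "150K+"])

# .replace(): collapse existing categories into fewer new ones
households["os_type"] = households["device"].replace({
    "iOS": "MobileOS", "Android": "MobileOS",
    "Linux": "DesktopOS", "macOS": "DesktopOS", "Windows": "DesktopOS",
})

print(households)
```

`pd.qcut()` suits cases where balanced group sizes matter, while `pd.cut()` suits cases where the boundaries carry meaning of their own; `.replace()` works on existing labels rather than numeric ranges.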
