Commit b0a4faf

committed
Create readme.md
1 parent 68c31f7 commit b0a4faf

5 files changed: +111 -8 lines changed
  • Exploratory Data Analysis in Python
    • 01 Read, clean, and validate
    • 02 Distributions/06 Distribution of education
    • 03 Relationships/08 Interpreting correlations
    • 04 Multivariate Thinking/01 Regression and causation

Lines changed: 21 additions & 8 deletions
@@ -1,11 +1,24 @@
-# Display the number of rows and columns
-nsfg.shape
+## Read the codebook
 
-# Display the names of the columns
-nsfg.columns
+When you work with datasets like the NSFG, it is important to read the documentation carefully. If you interpret a variable incorrectly, you can generate nonsense results and never realize it. So, before we start coding, I want to make sure you are familiar with the NSFG codebook, which describes every variable.
 
-# Select column birthwgt_oz1: ounces
-ounces = nsfg['birthwgt_oz1']
+* Follow [this link](https://www.icpsr.umich.edu/icpsradmin/nsfg/index?studyNumber=9999) to get to the interactive codebook.
+* Type "birthweight" in the search field, UNSELECT the checkbox that says "Search variable name only", and press "Search". You should see a list of variables related to birthweight.
+* Click on "BIRTHWGT_OZ1" and read the documentation of this variable. For your convenience, it is also displayed here:
 
-# Print the first 5 elements of ounces
-print(ounces.head())
+![birthwgt_oz1 codebook](https://assets.datacamp.com/production/repositories/4025/datasets/0d2a0c18b63f3ddf056858c145a6bdc022d8656c/Screenshot%202019-03-31%2019.16.14.png)
+
+<hr>
+
+How many respondents refused to answer this question?
+
+**Possible Answers**
+
+* 1
+* 35
+* 48-49
+* 2967
+
+**Answer**
+
+> 1
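
A count like this can also be cross-checked against the data itself. The following is a minimal sketch, not part of the commit: the small `nsfg` frame is a made-up stand-in for the real dataset, and the codes 98 (refused) and 99 (don't know) are assumptions for illustration that should be confirmed in the codebook.

```python
import pandas as pd

# Hypothetical stand-in for the real nsfg DataFrame; the values are made up.
nsfg = pd.DataFrame({"birthwgt_oz1": [4, 0, 15, 98, 7, 99, 99, 8]})

# Assumed special codes; confirm them against the codebook before relying on them.
REFUSED = 98
DONT_KNOW = 99

ounces = nsfg["birthwgt_oz1"]

# Tabulate every distinct value in sorted order so special codes stand out
# at the end of the listing.
counts = ounces.value_counts().sort_index()
print(counts)

# Count the rows that carry the assumed "refused" code.
n_refused = int((ounces == REFUSED).sum())
print("Refused:", n_refused)
```

Tabulating with `value_counts()` before any arithmetic makes it harder to mistake a special code for a real measurement in ounces.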
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+## Validate a variable
+
+In the NSFG dataset, the variable `'outcome'` encodes the outcome of each pregnancy as shown below:
+
+| Value | Label             |
+|-------|-------------------|
+| 1     | Live birth        |
+| 2     | Induced abortion  |
+| 3     | Stillbirth        |
+| 4     | Miscarriage       |
+| 5     | Ectopic pregnancy |
+| 6     | Current pregnancy |
+
+The `nsfg` DataFrame has been pre-loaded for you. Explore it in the IPython Shell and use the methods Allen showed you in the video to answer the following question: How many pregnancies in this dataset ended with a live birth?
+
+**Possible Answers**
+
+* 6489
+* 9538
+* 1469
+* 6
+
+**Answer**
+
+> 6489
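
One way to validate a coded variable such as `outcome` is to tabulate it and compare the counts with the codebook. This is a sketch under stated assumptions: the `nsfg` frame below is a made-up stand-in with ten rows, so its counts do not match the real dataset, and `labels` is a helper mapping introduced here for readability.

```python
import pandas as pd

# Hypothetical stand-in for the pre-loaded nsfg DataFrame (made-up rows).
nsfg = pd.DataFrame({"outcome": [1, 1, 4, 2, 1, 6, 4, 1, 3, 5]})

# Value labels for the outcome codes (helper mapping, not part of the dataset).
labels = {
    1: "Live birth",
    2: "Induced abortion",
    3: "Stillbirth",
    4: "Miscarriage",
    5: "Ectopic pregnancy",
    6: "Current pregnancy",
}

# Count how often each code occurs, ordered by code.
counts = nsfg["outcome"].value_counts().sort_index()

# Swap the numeric codes for their labels to make the table readable.
labeled = counts.rename(index=labels)
print(labeled)

# Number of pregnancies coded as a live birth in this toy data.
live_births = int(counts.loc[1])
print("Live births:", live_births)
```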
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+## Distribution of education
+
+Let's begin comparing incomes for different levels of education in the GSS dataset, which has been pre-loaded for you into a DataFrame called `gss`. The variable `educ` represents the respondent's years of education.
+
+<hr>
+
+What fraction of respondents report that they have 12 years of education or fewer?
+
+**Possible Answers**
+
+* Approximately 22%
+* Approximately 31%
+* Approximately 47%
+* Approximately 53%
+
+**Answer**
+
+> Approximately 53%
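
The fraction asked for is the empirical CDF of `educ` evaluated at 12. A minimal sketch with plain pandas, assuming a tiny made-up `gss` frame rather than the real survey data, so the resulting number is illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for the pre-loaded gss DataFrame (made-up values).
gss = pd.DataFrame({"educ": [8, 10, 12, 12, 12, 14, 16, 16, 18, None]})

# Drop missing responses so they do not count as "more than 12 years".
educ = gss["educ"].dropna()

# The mean of a boolean Series is the fraction of True values,
# here the share of respondents with 12 or fewer years of education.
fraction = (educ <= 12).mean()
print(f"Fraction with educ <= 12: {fraction:.2f}")
```

Dropping missing values first is a design choice: comparing `NaN <= 12` yields `False`, which would otherwise understate the fraction.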
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+## Interpreting correlations
+
+In the previous exercise, the correlation between income and vegetable consumption is about `0.12`. The correlation between age and vegetable consumption is about `-0.01`.
+
+<hr>
+
+Which of the following are correct interpretations of these results?
+
+* *A*: People with higher incomes eat more vegetables.
+* *B*: The relationship between income and vegetable consumption is linear.
+* *C*: Older people eat more vegetables.
+* *D*: There could be a strong nonlinear relationship between age and vegetable consumption.
+
+**Possible Answers**
+
+* A and C only.
+* B and D only.
+* B and C only.
+* A and D only.
+
+**Answer**
+
+> A and D only.
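
Statement D holds because Pearson's correlation measures only linear association. The sketch below uses synthetic values, not the BRFSS data: `x` and `y` are made-up arrays in which `y` is completely determined by `x`, yet the correlation is essentially zero.

```python
import numpy as np

# Synthetic data: x is symmetric around zero, y depends on x exactly.
x = np.linspace(-1, 1, 201)
y = x ** 2

# Pearson correlation between x and y (off-diagonal entry of the 2x2 matrix).
r = np.corrcoef(x, y)[0, 1]
print(f"correlation: {r:.6f}")
```

A correlation near zero therefore rules out a strong linear relationship, not a strong relationship of any kind.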
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+## Regression and causation
+
+In the BRFSS dataset, there is a strong relationship between vegetable consumption and income. The income of people who eat 8 servings of vegetables per day is double the income of people who eat none, on average.
+
+<hr>
+
+Which of the following conclusions can we draw from this data?
+
+A. Eating a good diet leads to better health and higher income.
+
+B. People with higher income can afford a better diet.
+
+C. People with high income are more likely to be vegetarians.
+
+**Possible Answers**
+
+* A only.
+* B only.
+* B and C.
+* None of them.
+
+**Answer**
+
+> None of them.
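
One way to see why an association alone supports none of these conclusions is a small simulation. This is a sketch with synthetic numbers, not the BRFSS data: `confounder` is a hypothetical third factor that drives both variables, neither variable affects the other, and the two still come out clearly correlated.

```python
import numpy as np

# Fixed seed so the simulation is reproducible.
rng = np.random.default_rng(0)
n = 10_000

# Hypothetical third factor that influences both variables.
confounder = rng.normal(size=n)

# Neither variable is computed from the other; each is the shared
# factor plus its own independent noise.
vegetables = confounder + rng.normal(size=n)
income = confounder + rng.normal(size=n)

# The two variables are correlated anyway, purely through the shared factor.
r = np.corrcoef(vegetables, income)[0, 1]
print(f"correlation: {r:.2f}")
```

The observed data would look the same whether diet affected income, income affected diet, or a shared factor drove both, which is why the relationship by itself cannot settle the direction of causation.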
