Data science with Data Structures

Python is more than a general-purpose programming language. Data scientists use it to analyze and visualize data and to extract the information they need. Its many specialized libraries make these tasks relatively efficient and straightforward.

Note: Our platform offers many courses that teach data science with Python. If you later wish to start learning data science with Python, Data Science for Non-Programmers is one of the go-to courses on Educative. For now, let’s explore a dataset without any libraries specific to data science, using only the built-in data structures and the functional techniques taught previously.

For this project, we have taken a dataset from Kaggle. It contains college students’ responses about food choices, nutrition, preferences, childhood favorites, and other information. There are 125 responses. The data is raw and uncleaned, and it is stored as a CSV file.

The task is to read the dataset and process the information effectively using data structures. We won’t look into every piece of information; we are concerned only with the following columns:

  • GPA: The student’s actual GPA, as a number.

  • Gender:

    • 1 - Female
    • 2 - Male
  • drink: Which picture do you associate with the word drink?

    • 1 - Orange juice
    • 2 - Soda
  • exercise: How often do you exercise in a regular week?

    • 1 - Everyday
    • 2 - Twice or three times per week
    • 3 - Once a week
    • 4 - Sometimes
    • 5 - Never
  • fries: Which picture do you associate with the word fries?

    • 1 - McDonald’s fries
    • 2 - Homefries
  • income:

    • 1 - Less than $15,000
    • 2 - $15,001 to $30,000
    • 3 - $30,001 to $50,000
    • 4 - $50,001 to $70,000
    • 5 - $70,001 to $100,000
    • 6 - Higher than $100,000
  • sports: Do you do any sports activity?

    • 1 - Yes
    • 2 - No
  • weight: An open-ended question - What is your weight in pounds?
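The coded columns above map naturally onto dictionaries. The following is a minimal sketch of that idea, not the project's required solution: the names GENDER, EXERCISE, INCOME, and decode are hypothetical, and it assumes each response has already been read as a string with its surrounding quotes removed.

```python
# Lookup tables copied from the column descriptions above.
# The names GENDER, EXERCISE, INCOME, and decode are illustrative only.
GENDER = {1: "Female", 2: "Male"}

EXERCISE = {
    1: "Everyday",
    2: "Twice or three times per week",
    3: "Once a week",
    4: "Sometimes",
    5: "Never",
}

INCOME = {
    1: "Less than $15,000",
    2: "$15,001 to $30,000",
    3: "$30,001 to $50,000",
    4: "$50,001 to $70,000",
    5: "$70,001 to $100,000",
    6: "Higher than $100,000",
}


def decode(codes, raw_value):
    """Return the label for a coded response, or None if it is not a known code."""
    try:
        return codes[int(raw_value)]
    except (TypeError, ValueError, KeyError):
        # Covers non-numeric responses such as 'nan' and codes outside the table.
        return None


print(decode(GENDER, "1"))
print(decode(EXERCISE, "nan"))
```

Returning None for anything that is not a listed code keeps the decision about discarding a response separate from the lookup itself.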

❗️ Note: In the columns above, you may find responses such as nan, Personal, and Unknown. You’ll be told explicitly where to discard them.

❗️ Note: All values, whether numeric or string, are enclosed in single quotes ('').
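As a rough sketch of how the two notes above might be handled, the example below reads single-quoted values with the standard-library csv module and skips the three placeholder responses when collecting numbers. The sample rows are made up for illustration and are not taken from the dataset. The names SAMPLE, INVALID, read_rows, and valid_floats are hypothetical, and the assumption that the first row holds the column names should be checked against the real file. Splitting each line manually would work just as well.

```python
import csv
import io

# Made-up sample in the format described above (NOT real rows from the dataset).
SAMPLE = """'GPA','Gender','weight'
'3.5','1','155'
'nan','2','Unknown'
'Personal','1','nan'
"""

# Placeholder responses mentioned in the note above.
INVALID = {"nan", "Personal", "Unknown"}


def read_rows(handle):
    """Return every response as a dict keyed by column name."""
    # quotechar="'" tells the reader that values are wrapped in single quotes.
    return list(csv.DictReader(handle, quotechar="'"))


def valid_floats(rows, column):
    """Collect the numeric values of one column, skipping unusable responses."""
    values = []
    for row in rows:
        raw = row[column].strip()
        # Check explicitly: float("nan") succeeds and would slip through otherwise.
        if raw in INVALID:
            continue
        try:
            values.append(float(raw))
        except ValueError:
            continue  # any other free-text response
    return values


rows = read_rows(io.StringIO(SAMPLE))
print(valid_floats(rows, "GPA"))
print(valid_floats(rows, "weight"))
```

With the real file, open(...) would replace io.StringIO(SAMPLE). Whether a given task should discard these responses at all depends on its instructions.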