Day 2 - Importing data and practicing queries

Today we will learn how to find some structured data and import it into our database.

Generating some dummy data

There are some online tools for generating dummy data for testing purposes. Mockaroo is my favorite.

Let's create some user data and download it as a CSV, one of the most popular ways to store data in a plain text file.

Make sure your CSV headers match your table column names, or be ready to map them manually. Handle data types carefully: dates, numbers, and text all need to be in the right format. Mockaroo has options to generate your data in the correct format.

Mockaroo screenshot

Preparing the Docker container

We need to run a SQL statement to import the data.

That's great, but how am I going to pass the data to Docker?

Volume mapping in a nutshell:

It's like creating a shared folder between your computer and the Docker container.

The Problem: Docker containers are isolated - they can't see files on your computer by default.

The Solution: Volume mapping creates a "bridge" so both your computer and the container can access the same files.

We need to re-create our Docker container. Luckily we don't have any crucial data to save, so we can delete the container and create it again with a volume mapping.

```bash
# stop container
docker stop travel_mania_container

# delete container
docker rm travel_mania_container

# create a folder for the project and
# for the data to share between the host and container
cd ~ && mkdir travel_mania_project \
  && cd ./travel_mania_project \
  && mkdir container_data

# re-create container with a volume mapping
docker run --name travel_mania_container \
  -e POSTGRES_USER=ugur \
  -e POSTGRES_PASSWORD=ugur1234 \
  -e POSTGRES_DB=travel_mania \
  -p 5432:5432 \
  -v ./container_data:/data \
  -d postgres
```

After this operation, don't forget to re-create the users table.
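In case you need a refresher, here is a minimal sketch of a users table that lines up with the columns we use in the COPY statements below. The exact definition from Day 1 may differ, so treat the types and constraints here as assumptions:

```sql
-- minimal sketch; adjust types and constraints to match your Day 1 definition
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  email VARCHAR(255),
  username VARCHAR(50),
  first_name VARCHAR(100),
  last_name VARCHAR(100),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
```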

Importing CSV

You can import your data using SQL statements or the GUI of your SQL client. But what happens if your column names do not match 1:1 with the table definition?

My sample CSV does not match the actual table definition

In that case, we need to map our column names somehow.
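For comparison, if the CSV headers did match the table columns 1:1, no mapping would be needed at all. A bare COPY like this sketch would be enough (assuming the file sits in the /data folder we mounted earlier):

```sql
-- only works when the CSV header matches the table columns exactly
COPY users
FROM '/data/MOCK_DATA.csv'
DELIMITER ','
CSV HEADER;
```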

Using column order

The first option is to rely on column order. The column list after the table name maps each CSV column, in order, onto a table column, and the HEADER option tells PostgreSQL to skip the first row.

```sql
-- Your CSV has:   id, email_addr, username, fname, lname, creation_time, update_time
-- Your table has: id, email, username, first_name, last_name, created_at, updated_at
COPY users(id, email, username, first_name, last_name, created_at, updated_at)
FROM '/data/MOCK_DATA.csv'
DELIMITER ','
CSV HEADER;
```

Now you can run the COPY statement we wrote above. Here is the result:

psql select result

Notice how the column names are mapped correctly.
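If you want to double-check from psql yourself, a couple of quick queries like these (just a sketch using the same table and columns) confirm that the rows landed:

```sql
-- how many rows made it in?
SELECT COUNT(*) FROM users;

-- peek at a few rows to confirm the mapping
SELECT id, email, username, first_name, last_name
FROM users
LIMIT 5;
```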

Using a temp table

You can create a temporary table and load the CSV into it, then insert into the actual table from that temp table. With this method you don't import into your main table directly, and you get the chance to map columns with more granularity.

```sql
-- Create temp table matching your CSV exactly
CREATE TEMP TABLE temp_users (
  id INTEGER,
  user_email VARCHAR(255),
  username VARCHAR(50),
  fname VARCHAR(100),
  lname VARCHAR(100),
  creation_time TIMESTAMP,
  update_time TIMESTAMP
);

-- Import to temp table
COPY temp_users
FROM '/data/MOCK_DATA.csv'
DELIMITER ','
CSV HEADER;

-- Insert into your real table with proper mapping
INSERT INTO users (id, email, username, first_name, last_name, created_at, updated_at)
SELECT id, user_email, username, fname, lname, creation_time, update_time
FROM temp_users;

-- Clean up
DROP TABLE temp_users;
```

Practicing queries

Question: Find all users whose first name starts with 'J' and last name starts with 'F'. Order by first name ascending.

Solution:

```sql
select username, first_name, last_name, email
from users
where first_name like 'J%'
  and last_name like 'F%'
order by first_name;
```
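One thing to keep in mind: LIKE is case-sensitive in PostgreSQL. If the names in your dataset are not capitalized consistently, you could swap in ILIKE for a case-insensitive match, as in this small variation:

```sql
-- case-insensitive variant using ILIKE (PostgreSQL-specific)
select username, first_name, last_name, email
from users
where first_name ilike 'j%'
  and last_name ilike 'f%'
order by first_name;
```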

Question: Find users whose username contains numbers and group them by email domain (gmail.com, yahoo.com, etc.).

Solution:

```sql
select
  SPLIT_PART(email, '@', 2) as email_domain,
  COUNT(*) as users_with_numbers_in_username
from users
where username ~ '[0-9]' -- Contains at least one number
group by SPLIT_PART(email, '@', 2)
having COUNT(*) > 0
order by users_with_numbers_in_username desc;
```

End of Day 2

Today we learned how to import a large dataset into the database using the COPY statement.

Tomorrow we will learn about string functions like SPLIT_PART, LENGTH, and TRIM, and date functions like AGE, DATE_PART, and TO_CHAR.
