Cleaning data with SQL is something I am new to, so I decided to try developing my SQL skills.
I tried using the data I generated from ChatGPT. It is a simple data with 10,000 rows for a simple data cleaning exercise, but I choose to write about it and post my journey into reviving my career back. Every milestone is important for me; it is my personal come back to be better and keep learning.
Code I used for my data cleaning exercise
-- Previewing dataset SELECT * FROM instagram_ads;
-- 1. Viewing column data types SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'instagram_ads';
-- 2. View number of rows and columns (shape) SELECT COUNT(*) AS row_count FROM instagram_ads;
-- Columns count: SELECT COUNT(*) AS column_count FROM information_schema.columns WHERE table_name = 'instagram_ads';
--- 3. View missing values per column SELECT COUNT(*) - COUNT('instagram_ads.date') AS missing_values_date, COUNT(*) - COUNT('instagram_ads.impression') AS missing_values_impression, COUNT(*) - COUNT(clicks) AS missing_values_clicks, COUNT(*) - COUNT(spend_usd) AS missing_values_spend, COUNT(*) - COUNT(campaign_name) AS missing_values_campaign FROM instagram_ads;
-- 4. Converting to lowercase and remove leading/trailing spaces UPDATE instagram_ads SET campaign_name = LOWER(TRIM(campaign_name));
-- 5. Creating a new table ALTER TABLE instagram_ads ADD COLUMN spend_usd_filled MONEY; -- 6. Updating the values with the filled mean values of null values in spend_usd UPDATE instagram_ads SET spend_usd_filled = COALESCE(spend_usd, ( SELECT AVG(spend_usd::numeric)::money FROM instagram_ads WHERE spend_usd IS NOT NULL ));
Top comments (0)