DEV Community

Cover image for 🔍 Data Cleaning Techniques Using SQL
Mohammed jobair Hossain
Mohammed jobair Hossain

Posted on

🔍 Data Cleaning Techniques Using SQL

Data cleaning is a critical step in the data preparation process. Whether it is analytics, business intelligence, or data engineering, clean data ensures more precise and reliable insights.

1️⃣ Convert Text to Lower/Upper Case
Ensure consistency in categorical fields like names or categories.

-- Convert to lowercase SELECT LOWER(column_name) AS cleaned_column FROM table_name; -- Convert to uppercase SELECT UPPER(column_name) AS cleaned_column FROM table_name; 
Enter fullscreen mode Exit fullscreen mode

2️⃣ Remove Extra Spaces from Text Fields Trim leading/trailing spaces using TRIM()

SELECT TRIM(column_name) AS cleaned_column FROM table_name; 
Enter fullscreen mode Exit fullscreen mode

3️⃣ Convert Date Strings to a Consistent Format
Transform text-based dates into a usable date format:

SELECT STR_TO_DATE(column_name, '%m/%d/%Y') AS formatted_date FROM table_name; 
Enter fullscreen mode Exit fullscreen mode

4️⃣Identify & Manage Outliers

Filter numeric values within a defined range:

SELECT * FROM table_name WHERE column_name BETWEEN lower_limit AND upper_limit; 
Enter fullscreen mode Exit fullscreen mode

5️⃣ Remove Special Characters

Strip out unwanted symbols using regular expressions:

SELECT REGEXP_REPLACE(column_name, '[^a-zA-Z0-9 ]', '') AS cleaned_column FROM table_name; 
Enter fullscreen mode Exit fullscreen mode

6️⃣ Standardize Categorical Values

Unify inconsistent text representations:

UPDATE table_name SET column_name = 'Male' WHERE column_name IN ('M', 'male'); 
Enter fullscreen mode Exit fullscreen mode

7️⃣ Replace NULLs with Default Values

SELECT COALESCE(column_name, 'DefaultValue') AS column_name FROM table_name; 
Enter fullscreen mode Exit fullscreen mode

8️⃣ Delete Duplicate Rows

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num FROM table_name ) DELETE FROM table_name WHERE id IN ( SELECT id FROM CTE WHERE row_num > 1 ); 
Enter fullscreen mode Exit fullscreen mode

💡 Clean data = better insights.
SQL makes it easy to standardize, validate, and transform your data—right where it lives.

If you found this helpful, feel free to share or drop a comment with your favorite data cleaning tip! 💬

Top comments (0)