1. Introduction
Data manipulation is a fundamental step in data analysis. A dataframe often contains redundant or unnecessary columns that are worth removing for clarity. In R, columns can be dropped from a dataframe in several ways. This guide focuses on the select function from the dplyr package and also shows a base R alternative.
2. Program Overview
1. Create a sample dataframe.
2. Drop columns with dplyr's select function using negative selection.
3. Drop a column by name with base R negative indexing.
3. Code Program
# Load necessary library
library(dplyr)

# Create a sample dataframe
df <- data.frame(
  Name = c('John', 'Jane', 'Doe'),
  Age = c(25, 28, 22),
  Gender = c('Male', 'Female', 'Male'),
  Score = c(85, 90, 78)
)

# Display the original dataframe
print("Original Dataframe:")
print(df)

# Drop the 'Gender' and 'Score' columns using negative selection
df1 <- df %>% select(-c(Gender, Score))

# Display the dataframe after dropping columns
print("Dataframe after Dropping 'Gender' and 'Score' Columns:")
print(df1)

# Another method: Drop the 'Age' column by name
df2 <- df[, -which(names(df) %in% c("Age"))]

# Display the dataframe after dropping the 'Age' column
print("Dataframe after Dropping 'Age' Column:")
print(df2)
Output:
[1] "Original Dataframe:"
  Name Age Gender Score
1 John  25   Male    85
2 Jane  28 Female    90
3  Doe  22   Male    78
[1] "Dataframe after Dropping 'Gender' and 'Score' Columns:"
  Name Age
1 John  25
2 Jane  28
3  Doe  22
[1] "Dataframe after Dropping 'Age' Column:"
  Name Gender Score
1 John   Male    85
2 Jane Female    90
3  Doe   Male    78
4. Step By Step Explanation
- We start by creating a sample dataframe df with four columns: Name, Age, Gender, and Score.
- To drop columns, we use the select function from the dplyr package. Placing a - in front of the column name, or in front of c(...) wrapping several names, tells select to keep every column except those listed.
- To exclude columns without the dplyr package, you can use base R's negative indexing. names(df) %in% c("Age") marks the matching column names, which converts those matches to column positions, and the leading - removes those positions. Be aware that if no name matches, which returns an empty vector and the negative index selects no columns at all.
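The two approaches above also have variants that behave more predictably when the column names are stored in a character vector or might not exist in the dataframe. The following is a minimal sketch, not part of the original program: the names cols_to_drop and "NotAColumn" are made up for illustration. It assumes a dplyr version that exports the all_of and any_of selection helpers (1.0.0 or later), and it swaps the which-based indexing for logical indexing in the base R case.

```r
library(dplyr)

# Same sample dataframe as in the program above
df <- data.frame(
  Name = c("John", "Jane", "Doe"),
  Age = c(25, 28, 22),
  Gender = c("Male", "Female", "Male"),
  Score = c(85, 90, 78)
)

# Hypothetical character vector holding the names to drop
cols_to_drop <- c("Gender", "Score")

# all_of() requires every listed name to exist and raises an error otherwise
df_a <- df %>% select(-all_of(cols_to_drop))

# any_of() ignores names that are not present in the dataframe
df_b <- df %>% select(-any_of(c("Score", "NotAColumn")))

# Base R: logical indexing instead of -which(...).
# drop = FALSE keeps the result a dataframe even if one column remains.
df_c <- df[, !(names(df) %in% c("Age")), drop = FALSE]

# With no matching name, logical indexing leaves all columns in place
df_d <- df[, !(names(df) %in% c("NotAColumn")), drop = FALSE]

print(names(df_a))
print(names(df_b))
print(names(df_c))
print(ncol(df_d))
```

Logical indexing is used here instead of -which(...) because a logical vector of all TRUE keeps every column when nothing matches, whereas an empty negative index would return a dataframe with no columns.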