How to remove rows from an R data frame based on the frequency of values in a grouping column?



To remove rows from an R data frame based on how often each value occurs in a grouping column, follow these steps:

  • First of all, create a data frame.
  • Then, group the data frame by the grouping column with group_by() and keep only the groups that are frequent enough with filter(), both from the dplyr package.

Create the data frame

Let's create a data frame as shown below:


> Group<-sample(c("I","II","III","IV"),20,replace=TRUE)
> Rank<-sample(1:10,20,replace=TRUE)
> df<-data.frame(Group,Rank)
> df

Running the above script produces the following output (your values will differ because the data is sampled randomly):

   Group Rank
1     IV    7
2      I    8
3     IV    2
4      I    9
5    III    9
6     IV    5
7     II    8
8    III    2
9    III    3
10     I    6
11    II    3
12    II    1
13    IV    7
14   III    4
15   III    5
16    IV    3
17    II    2
18   III    8
19     I    5
20   III    4
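
Before filtering, it can help to check how often each group actually occurs. The following is a minimal sketch in base R. It assumes the data frame is rebuilt by hand from the sample output shown above (so the counts are reproducible), and the name group_counts is only illustrative.

```r
# Rebuild the sample data from the output above instead of calling sample(),
# so the frequencies are the same on every run (an assumption made for this sketch)
Group <- c("IV", "I", "IV", "I", "III", "IV", "II", "III", "III", "I",
           "II", "II", "IV", "III", "III", "IV", "II", "III", "I", "III")
Rank <- c(7, 8, 2, 9, 9, 5, 8, 2, 3, 6, 3, 1, 7, 4, 5, 3, 2, 8, 5, 4)
df <- data.frame(Group, Rank)

# table() counts how many rows carry each value of the grouping column
group_counts <- table(df$Group)
print(group_counts)

# Names of the groups that occur more than four times
frequent_groups <- names(group_counts[group_counts > 4])
print(frequent_groups)
```

For this sample, I and II occur four times each, III occurs seven times, and IV occurs five times, so only III and IV exceed a threshold of four.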

Removing rows from data frame based on frequencies in grouping column

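Load the dplyr package, group df by the Group column, and keep only the rows whose group has more than four rows. Inside filter(), n() returns the number of rows in the current group, so rows belonging to less frequent groups are removed: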


> library(dplyr)
> df %>% group_by(Group) %>% filter(n()>4)

# A tibble: 12 x 2
# Groups:   Group [2]
   Group  Rank
   <chr> <int>
 1 IV        7
 2 IV        2
 3 III       9
 4 IV        5
 5 III       2
 6 III       3
 7 IV        7
 8 III       4
 9 III       5
10 IV        3
11 III       8
12 III       4
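
If dplyr is not available, the same filtering can be approximated in base R. This is a sketch under the assumption that the data frame is rebuilt from the sample output shown earlier. The names group_size and kept are illustrative, and ave() is used here as one possible alternative to group_by() and filter(), not as the method used above.

```r
# Rebuild the sample data from the earlier output (an assumption made for this sketch)
Group <- c("IV", "I", "IV", "I", "III", "IV", "II", "III", "III", "I",
           "II", "II", "IV", "III", "III", "IV", "II", "III", "I", "III")
Rank <- c(7, 8, 2, 9, 9, 5, 8, 2, 3, 6, 3, 1, 7, 4, 5, 3, 2, 8, 5, 4)
df <- data.frame(Group, Rank)

# ave() applies length() within each group and returns, for every row,
# the number of rows in that row's group
group_size <- ave(seq_along(df$Group), df$Group, FUN = length)

# Keep only rows whose group occurs more than four times
kept <- df[group_size > 4, ]
print(kept)
```

Unlike the dplyr version, the result is a plain data frame that keeps the original row names, and the threshold (4 here) can be changed to suit the data.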
Updated on: 2021-08-13T10:02:53+05:30
