How to find the count of duplicate rows if it is greater than n in an R data frame?



To count duplicate rows in an R data frame and keep only the row combinations that occur more than n times, we can follow the steps below:

  • First, create a data frame.
  • Then, count the identical rows and keep the counts greater than a chosen number, using the group_by_all, count, and filter functions of the dplyr package.

Create the data frame

Let's create a data frame as shown below:

# Draw 30 Poisson-distributed values (mean 1) for each column
x <- rpois(30, 1)
y <- rpois(30, 1)
df <- data.frame(x, y)
# Print the data frame
df

On executing, the above script generates the output below (this output will vary on your system because the values are drawn at random):

   x y
1  1 3
2  0 2
3  0 2
4  0 2
5  2 1
6  1 0
7  0 0
8  1 2
9  1 2
10 2 1
11 0 3
12 1 1
13 1 1
14 0 0
15 0 0
16 0 1
17 0 0
18 0 1
19 0 1
20 2 0
21 1 2
22 3 1
23 1 0
24 1 0
25 1 3
26 1 0
27 1 1
28 2 1
29 1 2
30 0 4
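
Because the values come from a random number generator, a repeat run normally produces a different data frame. If repeatable output is wanted, one option is to fix the seed before drawing. The following is a minimal sketch; the seed value 42 and the names df_first and df_second are arbitrary choices made for this illustration and are not part of the original example.

```r
# The seed value 42 is an arbitrary choice made for this illustration.
set.seed(42)
x <- rpois(30, 1)
y <- rpois(30, 1)
df_first <- data.frame(x, y)

# Resetting the same seed and repeating the draws reproduces the same values.
set.seed(42)
x <- rpois(30, 1)
y <- rpois(30, 1)
df_second <- data.frame(x, y)

# Both data frames hold exactly the same values.
identical(df_first, df_second)
```

The final call should return TRUE, since both data frames are built from the same seed and the same sequence of draws.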

Count the duplicate rows if they are greater than a certain number

Load the dplyr package, then use the group_by_all, count, and filter functions to find the row combinations that occur more than 2 times:

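library(dplyr)
# Reuse df from the previous step; drawing x and y again would produce
# new random values, and the counts would no longer match the data above.
df %>%
  group_by_all() %>%  # group on every column, so each group is one distinct row
  count() %>%         # number of rows in each group, stored in column n
  filter(n > 2)       # keep the combinations that occur more than 2 times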

Output

# A tibble: 7 x 3
# Groups:   x, y [7]
      x     y     n
  <int> <int> <int>
1     0     0     4
2     0     1     3
3     0     2     3
4     1     0     4
5     1     1     3
6     1     2     4
7     2     1     3
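
To make the threshold easier to change and the result easier to check by hand, the same idea can be sketched on a small fixed data frame. This sketch assumes dplyr 1.0.0 or later, where the scoped verb group_by_all is superseded in favour of across(); the helper name count_rows_over, its arguments, and the toy data are hypothetical and introduced only for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): count identical rows and keep
# only the combinations that occur more than `threshold` times.
count_rows_over <- function(data, threshold) {
  data %>%
    count(across(everything())) %>%  # one row per distinct combination, count in n
    filter(n > threshold)            # keep combinations above the threshold
}

# Small fixed data frame so the counts can be verified by hand:
# (0, 1) occurs 3 times, (1, 0) occurs 2 times, (2, 3) occurs 4 times.
toy <- data.frame(
  x = c(0, 0, 0, 1, 1, 2, 2, 2, 2),
  y = c(1, 1, 1, 0, 0, 3, 3, 3, 3)
)

result <- count_rows_over(toy, 2)
print(result)
```

With a threshold of 2, only the combinations (0, 1) with n = 3 and (2, 3) with n = 4 should remain, because (1, 0) occurs just twice.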
Updated on: 2021-08-14T07:51:33+05:30
