How to check if a data frame column contains duplicate values in R?



To check whether a data frame column contains duplicate values, combine the duplicated function with any. duplicated returns a logical vector that is TRUE for every element that repeats an earlier one, and any returns TRUE if at least one of those flags is TRUE. For example, if a data frame called df has a column ID, the following command checks whether ID contains duplicate values:

any(duplicated(df$ID))
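As a minimal sketch of how the two functions fit together, the snippet below builds a small hypothetical data frame (its column names and values are illustrative and do not come from the examples that follow) and checks two of its columns. It also uses base R's anyDuplicated, which returns the index of the first duplicate, or 0 if there is none.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(ID  = c(101, 102, 103, 104),
                 grp = c("a", "b", "a", "c"))

# duplicated() flags each element that repeats an earlier one
dup_flags <- duplicated(df$grp)   # only the second "a" is flagged

# any() collapses the flags into a single TRUE/FALSE answer
id_has_dups  <- any(duplicated(df$ID))
grp_has_dups <- any(dup_flags)

# anyDuplicated() gives the position of the first duplicate, or 0 if none
first_dup_idx <- anyDuplicated(df$grp)

print(id_has_dups)     # FALSE
print(grp_has_dups)    # TRUE
print(first_dup_idx)   # 3
```

Since anyDuplicated returns 0 when nothing repeats, anyDuplicated(df$ID) > 0 can serve as an alternative to any(duplicated(df$ID)).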

Example 1

Consider the following data frame:


ID<-1:20
x<-rpois(20,1)
df1<-data.frame(ID,x)
df1

Output

   ID x
1   1 4
2   2 1
3   3 2
4   4 2
5   5 1
6   6 0
7   7 1
8   8 1
9   9 0
10 10 1
11 11 1
12 12 2
13 13 1
14 14 3
15 15 1
16 16 0
17 17 0
18 18 3
19 19 2
20 20 2

Check whether x contains any duplicate values:

any(duplicated(df1$x))

[1] TRUE

Example 2


S.No<-1:20
y<-round(rnorm(20,5,3),1)
df2<-data.frame(S.No,y)
df2

Output

   S.No    y
1     1  5.1
2     2  5.8
3     3  4.4
4     4 10.1
5     5  3.3
6     6  6.1
7     7  4.8
8     8 12.6
9     9  6.4
10   10  8.7
11   11  1.5
12   12  2.5
13   13  2.1
14   14  8.7
15   15  5.5
16   16  2.0
17   17  2.1
18   18  5.5
19   19  5.4
20   20  3.4

Check whether y contains any duplicate values:

any(duplicated(df2$y))

[1] TRUE
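A TRUE result only says that some value repeats. As a possible follow-up, the sketch below lists which values are duplicated and which rows carry them. It hard-codes the y values from the Example 2 output so the result is reproducible. The names dup_values and dup_rows are introduced here for illustration only.

```r
# y values copied from the Example 2 output (hard-coded for reproducibility)
y <- c(5.1, 5.8, 4.4, 10.1, 3.3, 6.1, 4.8, 12.6, 6.4, 8.7,
       1.5, 2.5, 2.1, 8.7, 5.5, 2.0, 2.1, 5.5, 5.4, 3.4)
df2 <- data.frame(S.No = 1:20, y = y)

# Distinct values that occur more than once
dup_values <- unique(df2$y[duplicated(df2$y)])

# Every row holding one of those values, first occurrences included
dup_rows <- df2[df2$y %in% dup_values, ]

print(dup_values)      # 8.7 2.1 5.5
print(dup_rows$S.No)   # 10 13 14 15 17 18
```

duplicated compares values exactly, so for numeric columns it may help to round first, as Example 2 does with round(..., 1).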
Updated on: 2021-03-16T07:25:26+05:30
