How to subset an R data frame based on string match in two columns with OR condition?



To subset an R data frame to the rows where a string matches in either of two columns, apply the grepl function to each column, extracted with double square brackets, and combine the two logical vectors with the OR operator |. Note that grepl treats the string as a regular expression and matches it anywhere in the value, so partial matches count. For example, if a data frame called df contains two string columns x and y, the rows where a particular string appears in either column can be selected with the syntax below.

Syntax

df[grepl("text",df[["x"]])|grepl("text",df[["y"]]),]
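As a minimal sketch of this syntax, assume a small hand-built data frame; the values below are made up for illustration and only the names df, x and y come from the text above. The intermediate vector keep shows the logical index that the one-line form builds inside the brackets.

```r
# Hypothetical data for illustration only
df <- data.frame(
  x = c("some text", "other", "plain", "TEXT"),
  y = c("none", "text here", "nothing", "none"),
  stringsAsFactors = FALSE
)

# One logical vector per column, combined element-wise with |
keep <- grepl("text", df[["x"]]) | grepl("text", df[["y"]])

# Rows 1 and 2 are kept; row 4 is dropped because grepl is case-sensitive by default
result <- df[keep, ]
result
```

The trailing comma in df[keep, ] matters: it selects rows and keeps all columns.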

Check out the below examples to understand how it works.

Example 1

Consider the following data frame:


f1<-sample(c("India","China","Egypt","UK"),20,replace=TRUE)
f2<-sample(c("India","China","Egypt","UK"),20,replace=TRUE)
v1<-rnorm(20)
df1<-data.frame(f1,f2,v1)
df1

Output

   f1     f2     v1
1  India  India   0.58383357
2  UK     Egypt  -0.71045054
3  India  China  -0.07848666
4  Egypt  India   1.21017481
5  Egypt  UK     -0.81991817
6  Egypt  China   1.98979283
7  India  India   0.36160374
8  Egypt  China  -1.77619986
9  China  UK     -0.05397712
10 India  Egypt  -0.30372078
11 Egypt  India  -1.68623489
12 India  India  -0.41997104
13 India  China  -0.97064798
14 UK     Egypt   2.02704796
15 UK     Egypt  -0.47732133
16 China  China   0.53153059
17 Egypt  UK     -1.71608164
18 Egypt  India  -0.73298689
19 UK     UK      1.83674440
20 China  China  -1.12186527

Subsetting df1 to the rows where India appears in either of the first two columns:

df1<-df1[grepl("India",df1[["f1"]])|grepl("India",df1[["f2"]]),]
df1

Output

   f1     f2     v1
1  India  India   0.58383357
3  India  China  -0.07848666
4  Egypt  India   1.21017481
7  India  India   0.36160374
10 India  Egypt  -0.30372078
11 Egypt  India  -1.68623489
12 India  India  -0.41997104
13 India  China  -0.97064798
18 Egypt  India  -0.73298689
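The same OR condition can be written so that it extends to more than two columns. The sketch below is one possible way to do that, not the method used in this article: it rebuilds the first five rows of df1 by hand so that it runs on its own, applies grepl to each listed column with lapply, and combines the results with Reduce. The names df1_head, cols, matches and subset_head are illustrative.

```r
# First five rows of df1, typed in by hand so the sketch is reproducible
df1_head <- data.frame(
  f1 = c("India", "UK", "India", "Egypt", "Egypt"),
  f2 = c("India", "Egypt", "China", "India", "UK"),
  v1 = c(0.58383357, -0.71045054, -0.07848666, 1.21017481, -0.81991817),
  stringsAsFactors = FALSE
)

# Columns to search; add more names here to widen the OR condition
cols <- c("f1", "f2")

# One logical vector per column, then an element-wise OR across all of them
matches <- lapply(df1_head[cols], grepl, pattern = "India")
keep <- Reduce(`|`, matches)

# Keeps rows 1, 3 and 4, the same rows the two-column form selects
subset_head <- df1_head[keep, ]
subset_head
```

With only two columns the explicit form with | is easier to read; the Reduce form pays off mainly when the column list is long or built at run time.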

Example 2


g1<-sample(c("Male","Female"),20,replace=TRUE)
g2<-sample(c("Male","Female"),20,replace=TRUE)
df2<-data.frame(g1,g2)
df2

Output

   g1     g2
1  Female Male
2  Female Male
3  Female Female
4  Male   Male
5  Male   Female
6  Female Female
7  Female Male
8  Male   Male
9  Male   Female
10 Male   Female
11 Female Female
12 Male   Male
13 Male   Male
14 Male   Female
15 Female Male
16 Female Male
17 Female Male
18 Male   Female
19 Female Female
20 Male   Female

Subsetting df2 to the rows where Female appears in either of the two columns:

df2<-df2[grepl("Female",df2[["g1"]])|grepl("Female",df2[["g2"]]),]
df2

Output

   g1     g2
1  Female Male
2  Female Male
3  Female Female
5  Male   Female
6  Female Female
7  Female Male
9  Male   Female
10 Male   Female
11 Female Female
14 Male   Female
15 Female Male
16 Female Male
17 Female Male
18 Male   Female
19 Female Female
20 Male   Female
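Because grepl matches a pattern anywhere in the string, values such as Male and Female need some care. The sketch below, which uses a made-up vector g, compares a plain substring match, a case-insensitive match, an anchored pattern and an exact comparison with ==, so the differences are visible side by side.

```r
# Hypothetical values for illustration only
g <- c("Male", "Female", "male", "FEMALE")

# Case-sensitive substring match: only "Male" matches,
# because "Female" contains "male" in lower case
substring_cs <- grepl("Male", g)

# Case-insensitive substring match: every value matches,
# since "female" contains "male"
substring_ci <- grepl("male", g, ignore.case = TRUE)

# Anchoring with ^ and $ requires the whole value to match
exact_ci <- grepl("^male$", g, ignore.case = TRUE)

# Plain equality is the simplest exact, case-sensitive test
exact_cs <- g == "Male"
```

If whole-value matching is what is needed, == or an anchored pattern may be a safer choice than an unanchored grepl call.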
Updated on: 2021-03-06T11:42:33+05:30
