How to find the last non-missing value in each column of a data.table object?



To find the last non-missing value in each column of a data.table object, we can use lapply function along with tail function for NA values.

For example, if we have a data.table object called DT then we can find the last non-missing value in each column of DT by using the below given command −

DT[,lapply(.SD,function(x) tail(x[!is.na(x)],1))]

Example 1

Following snippet creates a data.table object −

library(data.table) x1<-sample(c(NA,1,2),20,replace=TRUE) x2<-sample(c(NA,5,10),20,replace=TRUE) x3<-sample(c(NA,100,200),20,replace=TRUE) DT1<-data.table(x1,x2,x3) DT1

The following data.table object is created −

     x1 x2 x3 1:   2  5 200 2:  NA  5 200 3:   1  5 100 4:  NA NA 100 5:   1 10 200 6:  NA 10 100 7:  NA NA 200 8:   2 10 100 9:   1 10 100 10:  1  5 200 11:  1  5  NA 12: NA  5  NA 13:  2  5 100 14: NA  5 100 15: NA 10 100 16:  2  5 200 17: NA  5  NA 18: NA NA  NA 19:  2 10 200 20:  1 NA 100

To find the last non-missing value in each column of DT1, add the following code to the above snippet −

DT1[,lapply(.SD,function(x) tail(x[!is.na(x)],1))] 

Output

If you execute all the above given snippets as a single program, it generates the following output −

   x1 x2 x3 1: 1 10 100

Example 2

Following snippet creates a data.table object −

y1<-sample(c(NA,rnorm(2)),20,replace=TRUE) y2<-sample(c(NA,rnorm(2)),20,replace=TRUE) y3<-sample(c(NA,rnorm(2)),20,replace=TRUE) DT2<-data.table(y1,y2,y3) DT2

The following data.table object is created −

        y1         y2          y3 1:  -0.7819176  0.5451545 -1.7242281 2:  -0.4305358  NA         NA 3:  -0.7819176  0.5451545  NA 4:  -0.4305358  NA         0.3926004 5:   NA         0.2671401 -1.7242281 6:  -0.7819176  0.2671401  0.3926004 7:  -0.7819176  NA         NA 8:  -0.4305358  0.5451545 -1.7242281 9:  -0.4305358  NA        -1.7242281 10: -0.7819176  0.5451545  NA 11: -0.4305358  NA         NA 12:  NA         0.5451545  0.3926004 13: -0.4305358  NA         0.3926004 14: -0.4305358  0.5451545 -1.7242281 15:  NA         0.2671401  0.3926004 16:  NA         0.5451545 -1.7242281 17:  NA         NA         NA 18: -0.4305358  NA         NA 19: -0.7819176  0.2671401  NA 20: -0.7819176  NA        -1.7242281

To find the last non-missing value in each column of DT2, add the following code to the above snippet −

DT2[,lapply(.SD,function(x) tail(x[!is.na(x)],1))] 

Output

If you execute all the above given snippets as a single program, it generates the following output −

y1 y2 y3 1: -0.7819176 0.2671401 -1.724228

Example 3

Following snippet creates a data.table object −

z1<-sample(c(NA,round(runif(2,2,10),2)),20,replace=TRUE) z2<-sample(c(NA,round(runif(2,1,2),2)),20,replace=TRUE) z3<-sample(c(NA,round(runif(2,10,115),2)),20,replace=TRUE) DT3<-data.table(z1,z2,z3) DT3

The following data.table object is created −

     z1    z2    z3 1:  8.94  NA    71.40 2:  NA    NA    NA 3:  8.94  NA    NA 4:  8.94  NA    71.40 5:  4.09  1.48  NA 6:  4.09  NA    NA 7:  NA    1.38  71.40 8:  8.94  NA    NA 9:  4.09  NA    NA 10: 8.94  NA    NA 11: 4.09  NA    71.40 12: 8.94  1.48  107.44 13: 8.94  1.38  NA 14: 4.09  1.38  NA 15: 8.94  1.38  107.44 16: NA    1.48  71.40 17: NA    NA    107.44 18: NA    1.38  71.40 19: 4.09  NA    NA 20: 4.09  1.48  71.40

To find the last non-missing value in each column of DT3, add the following code to the above snippet −

DT3[,lapply(.SD,function(x) tail(x[!is.na(x)],1))] 

Output

If you execute all the above given snippets as a single program, it generates the following output −

    z1    z2   z3 1: 4.09 1.48 71.4
Updated on: 2021-11-12T08:22:30+05:30

688 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements