How to deal with missing values when calculating a correlation matrix in R?



Data frames and matrices in R often contain missing values, and calling cor on them directly returns NA for every pair of columns that involves a missing value. This is a common problem in data analysis, and one way to solve it is to wrap the data in na.omit before passing it to cor. na.omit drops every row that contains at least one NA, so the correlation matrix is computed from the complete rows only. The examples below show how this works.

Example

Consider the below data frame −

> x1<-sample(c(1:5,NA),500,replace=TRUE)
> x2<-sample(c(rnorm(50,2,5),NA),500,replace=TRUE)
> x3<-sample(c(rpois(50,2),NA),500,replace=TRUE)
> x4<-sample(c(runif(50,2,10),NA),500,replace=TRUE)
> df<-data.frame(x1,x2,x3,x4)
> head(df,20)

Output

   x1         x2 x3       x4
1   2  2.6347839  4 2.577690
2   3  0.3082031  1 6.250998
3   1  0.3082031  3 7.786711
4   1  2.6347839  0 3.449600
5  NA  2.5107175  1 7.269619
6   4  2.4450443  4 6.250998
7  NA  1.1747742  2 3.053929
8  NA  2.4450443  3 5.860071
9   5  6.6736496  4 7.979433
10 NA  2.4450443  2 6.250998
11 NA  1.1747742  5       NA
12  2 11.1483587  1 9.498951
13  4  2.1400502 NA 9.299100
14  2 -0.8043954  3 2.883222
15  1  1.5054120  0 2.765324
16  1  0.1283554  2 7.918015
17  3  3.0337960  3 5.588130
18  1  4.5603861  2 7.979433
19  3  4.4976830  4 8.434829
20  1  9.4147186  2 3.053929
> tail(df,20)

Output

    x1         x2 x3       x4
481  2 -1.9780830  4 9.299100
482  3  2.0495769  1 9.639262
483  3 -4.5421502  2 3.374645
484 NA  2.1400502  3       NA
485  2 -4.0551622  2 5.999863
486  4  5.8547691  2 3.593138
487 NA         NA  2 9.549274
488  3  3.9160824  1 3.053929
489  1 11.1483587  5 7.786711
490  3 -2.7581511  2 9.433952
491 NA  4.8002434  1 5.824331
492  2  4.8002434  2 8.434829
493  2  1.9706702  2 3.053929
494 NA  2.5099287  2 7.979433
495  4  1.9706702  1 7.929130
496  2  4.5919890  2 9.973436
497  4  2.5099287  4 7.269619
498  4  0.3082031  3 3.053929
499  1  5.4593713  2 9.973436
500 NA -1.9780830  4 3.219703
> cor(na.omit(df))

Output

             x1          x2          x3         x4
x1  1.000000000 0.009571313 -0.06363564 0.03276244
x2  0.009571313 1.000000000  0.08123065 0.03330818
x3 -0.063635640 0.081230649  1.00000000 0.03503841
x4  0.032762439 0.033308181  0.03503841 1.00000000
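
As a cross-check on the na.omit approach, the sketch below builds a smaller simulated data frame and compares cor(na.omit(df)) with the use argument of cor. The seed, the sample size of 200 and the names n_complete, c_default, c_omit and c_use are assumptions made for this illustration and do not come from the example above.

```r
# Minimal sketch: listwise deletion via na.omit() versus cor()'s own `use` argument.
# The seed and the sample size are arbitrary choices for illustration.
set.seed(42)
n <- 200
df <- data.frame(
  x1 = sample(c(1:5, NA), n, replace = TRUE),
  x2 = sample(c(rnorm(50, 2, 5), NA), n, replace = TRUE),
  x3 = sample(c(rpois(50, 2), NA), n, replace = TRUE)
)

# Number of rows without any NA; these are the rows na.omit() keeps.
n_complete <- sum(complete.cases(df))

# With the default use = "everything", missing values propagate into the result.
c_default <- cor(df)

# Two ways of correlating only the complete rows.
c_omit <- cor(na.omit(df))
c_use  <- cor(df, use = "complete.obs")

print(n_complete)
print(all.equal(c_omit, c_use))
```

Checking n_complete before trusting the result is a reasonable habit, because na.omit discards a whole row even when only one of its columns is missing.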

Let’s have a look at an example with matrix data −

Example

> M<-matrix(sample(c(rpois(10,2),NA),36,replace=TRUE),nrow=6)
> M

Output

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    2    2    2   NA    3
[2,]    3    2    4    1    4    3
[3,]    3   NA    1    1    1   NA
[4,]    3   NA    3    2    2    1
[5,]    1    4    3    2    2    2
[6,]    1    2    1    3    1    1
> cor(na.omit(M))

Output

           [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,]  1.0000000 -0.5000000  0.7559289 -0.8660254  0.9449112  0.8660254
[2,] -0.5000000  1.0000000  0.1889822  0.0000000 -0.1889822  0.0000000
[3,]  0.7559289  0.1889822  1.0000000 -0.9819805  0.9285714  0.9819805
[4,] -0.8660254  0.0000000 -0.9819805  1.0000000 -0.9819805 -1.0000000
[5,]  0.9449112 -0.1889822  0.9285714 -0.9819805  1.0000000  0.9819805
[6,]  0.8660254  0.0000000  0.9819805 -1.0000000  0.9819805  1.0000000
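
In the matrix M printed above, only three of the six rows are free of NA, so cor(na.omit(M)) is estimated from very few observations. The sketch below re-enters that matrix by hand, counts the complete rows, and, as one possible alternative, uses use = "pairwise.complete.obs", which keeps every row where both columns of a given pair are present. The names n_complete_rows, c_listwise and c_pairwise are made up for this illustration.

```r
# Matrix copied from the printed example output, entered row by row.
M <- matrix(c(2,  2, 2, 2, NA,  3,
              3,  2, 4, 1,  4,  3,
              3, NA, 1, 1,  1, NA,
              3, NA, 3, 2,  2,  1,
              1,  4, 3, 2,  2,  2,
              1,  2, 1, 3,  1,  1),
            nrow = 6, byrow = TRUE)

# Rows that na.omit() would keep.
n_complete_rows <- sum(complete.cases(M))

# Listwise deletion: only fully observed rows are used for every pair.
c_listwise <- cor(na.omit(M))

# Pairwise deletion: each pair of columns uses all rows where both are observed.
c_pairwise <- cor(M, use = "pairwise.complete.obs")

print(n_complete_rows)
print(round(c_pairwise, 3))
```

Pairwise deletion uses more of the data, but each entry may then be based on a different set of rows, and the resulting matrix is not guaranteed to be positive semi-definite. Which option is appropriate depends on the analysis.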
Updated on: 2020-09-08T10:39:12+05:30
