How to find the column standard deviation when some columns are categorical in an R data frame?



To find the standard deviation of the numerical columns in an R data frame that also contains categorical columns, we can follow the steps below:

  • First of all, create a data frame.

  • Then, use the numcolwise function from the plyr package to compute the standard deviation of the numerical columns only; the categorical columns are skipped.
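
The two steps can be sketched on a small hand-written data frame before moving to the randomized examples. This is only an illustrative sketch: the names df_demo, Label and Value are hypothetical, and it assumes the plyr package is already installed.

```r
# Assumes plyr is installed, e.g. via install.packages("plyr").
library(plyr)

# Step 1: create a data frame with one categorical and one numerical column.
# df_demo, Label and Value are illustrative names, not part of the examples.
df_demo <- data.frame(
  Label = c("a", "b", "a", "b"),
  Value = c(2, 4, 6, 8)
)

# Step 2: numcolwise(sd) builds a function that applies sd() to the
# numerical columns only and returns a one-row data frame.
sd_numeric <- numcolwise(sd)
result <- sd_numeric(df_demo)
result
```

The result contains a single Value column; the Label column is dropped because it is not numerical.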

Example 1

Create the data frame

Let’s create a data frame as shown below −

Group <- sample(c("I", "II", "III", "IV"), 25, replace = TRUE)
Num1 <- sample(1:50, 25)
Num2 <- sample(1:50, 25)
df1 <- data.frame(Group, Num1, Num2)
df1

Output

On execution, the above script generates the output below (the values will vary on your system because of random sampling):

   Group Num1 Num2
1  IV      30   18
2  III     26    1
3  II      38    6
4  II      37    7
5  II      49   22
6  I        7   47
7  II      34   23
8  III     23   44
9  IV      24   11
10 II      36   28
11 II      31   13
12 IV      27    8
13 I       22   20
14 IV      25   38
15 IV      44   15
16 III     43    5
17 I       21   29
18 III     40   48
19 I       46   41
20 IV       8   36
21 IV      20   27
22 III     16   24
23 II      15    9
24 I       48   30
25 IV       3   39
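
Since sample() is random, the rows differ from run to run. One way to make the data frame reproducible is to set a seed first. The sketch below assumes an arbitrary seed value of 123 and a hypothetical helper named make_df; neither is part of the original example.

```r
# make_df is a hypothetical helper; the seed value passed to it is arbitrary.
make_df <- function(seed) {
  # Fixing the seed makes the following sample() calls repeatable.
  set.seed(seed)
  data.frame(
    Group = sample(c("I", "II", "III", "IV"), 25, replace = TRUE),
    Num1 = sample(1:50, 25),
    Num2 = sample(1:50, 25)
  )
}

df_a <- make_df(123)
df_b <- make_df(123)

# The same seed yields the same data frame on every call.
identical(df_a, df_b)
```

With a fixed seed, any statistic computed from the data frame, including the column standard deviations, stays the same across runs.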

Find the column standard deviation if some columns are categorical

Use the numcolwise function from the plyr package to find the standard deviation of the numerical columns in the data frame df1:

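# Note: re-running sample() draws new random values, so the result
# need not match the data frame printed above.
Group <- sample(c("I", "II", "III", "IV"), 25, replace = TRUE)
Num1 <- sample(1:50, 25)
Num2 <- sample(1:50, 25)
df1 <- data.frame(Group, Num1, Num2)
library(plyr)
numcolwise(sd)(df1)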

Output

      Num1    Num2
1 13.57424 14.9295

Example 2

Create the data frame

Let’s create a data frame as shown below −

Categories <- sample(c("First", "Second", "Third"), 25, replace = TRUE)
Score <- sample(1:10, 25, replace = TRUE)
Price <- sample(1:5, 25, replace = TRUE)
df2 <- data.frame(Categories, Score, Price)
df2

Output

On execution, the above script generates the output below (the values will vary on your system because of random sampling):

   Categories Score Price
1  Second        10     5
2  Third          5     3
3  Second        10     4
4  Third          2     3
5  Third          1     1
6  First          4     2
7  First          6     3
8  Second         3     4
9  Second         2     4
10 Third          8     3
11 First          8     1
12 Second         8     3
13 First          2     2
14 First          5     1
15 First         10     3
16 Third          7     3
17 Second         4     2
18 Second         7     2
19 Second         2     5
20 First          5     1
21 Third         10     3
22 Second        10     4
23 Third          2     5
24 Third          4     3
25 First          3     4

Find the column standard deviation if some columns are categorical

Use the numcolwise function from the plyr package to find the standard deviation of the numerical columns in the data frame df2:

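# Note: re-running sample() draws new random values, so the result
# need not match the data frame printed above.
Categories <- sample(c("First", "Second", "Third"), 25, replace = TRUE)
Score <- sample(1:10, 25, replace = TRUE)
Price <- sample(1:5, 25, replace = TRUE)
df2 <- data.frame(Categories, Score, Price)
library(plyr)
numcolwise(sd)(df2)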

Output

     Score    Price
1 3.006659 1.670329
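
If plyr is not available, a similar result can be obtained with base R alone. This is a swapped-in alternative rather than the method used in the examples: it selects the numerical columns with is.numeric and applies sd to each. The data frame df_small and its column names are hypothetical.

```r
# df_small is an illustrative data frame with a factor column and two
# numerical columns; it is not one of the data frames from the examples.
df_small <- data.frame(
  Group = c("I", "II", "I", "II"),
  Num1 = c(2, 4, 6, 8),
  Num2 = c(1, 1, 3, 3),
  stringsAsFactors = TRUE
)

# Logical vector marking which columns are numerical (factors are not).
is_num <- sapply(df_small, is.numeric)

# Apply sd() to the numerical columns only; returns a named numeric vector.
col_sd <- sapply(df_small[is_num], sd)
col_sd
```

Unlike numcolwise(sd), which returns a one-row data frame, this returns a named numeric vector with one entry per numerical column.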
Updated on: 2021-11-08T11:38:52+05:30
