Differentiate between categorical and numerical independent variables in R.



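R treats an independent variable as categorical when it is stored as a factor, which the factor function creates; numeric codes such as 0, 1, 2 are treated as numerical unless they are converted with factor(). In a regression model, a factor with k levels is represented by k − 1 indicator (dummy) variables, one for each level except the first, which serves as the baseline, so each non-baseline level gets its own coefficient. A numerical independent variable, on the other hand, is either continuous or discrete in nature and enters the model as a single variable with one coefficient (the slope).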
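To see how R encodes each kind of variable, here is a minimal sketch using base R's `model.matrix()` on a small hypothetical vector `x` (the values are made up for illustration). The numerical version contributes a single column for `x`, while `factor(x)` expands into one 0/1 indicator column per non-baseline level.

```r
# Hypothetical predictor values, for illustration only
x <- c(0, 1, 2, 1, 3)

# Numerical predictor: an intercept column plus one column for x itself
X_num <- model.matrix(~ x)
print(colnames(X_num))   # columns: (Intercept), x

# Categorical predictor: an intercept column plus one 0/1 indicator column
# for every level except the first (baseline) level, here "0"
X_fac <- model.matrix(~ factor(x))
print(colnames(X_fac))   # columns: (Intercept), factor(x)1, factor(x)2, factor(x)3
print(X_fac)
```

With four levels (0, 1, 2, 3), the factor version needs three indicator columns, whereas the numerical version always needs just one.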

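The example below fits two linear regression models on the same data, one treating the independent variable as numerical and one treating it as categorical, and compares their model summaries to show the difference between the two kinds of independent variables.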

Example

The following snippet creates a sample data frame −

```r
# Draw 20 random Poisson counts for each variable
x <- rpois(20, 2)
y <- rpois(20, 5)
df <- data.frame(x, y)
df
```

The following data frame is created −

```
   x  y
1  1  1
2  4  5
3  3 10
4  3  4
5  1  6
6  3  4
7  1  2
8  1 10
9  1  6
10 2  5
11 1  2
12 3  4
13 0  5
14 1  5
15 4  5
16 4  7
17 3  5
18 2  4
19 1  3
20 2  6
```
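The column `x` holds numbers, so `lm()` treats it as numerical unless it is wrapped in `factor()`. The short sketch below hard-codes the values printed above, checks the storage type, and lists the levels that `factor(x)` creates. The first level, `0`, becomes the baseline, and in this sample it has only one observation.

```r
# Values copied from the data frame printed above
df <- data.frame(
  x = c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2),
  y = c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
)

# x is numeric, not a factor, so lm(y ~ x) fits a single slope for it
print(is.numeric(df$x))
print(is.factor(df$x))

# factor() turns each distinct value into a level; "0" sorts first and
# therefore becomes the baseline level in lm(y ~ factor(x))
x_fac <- factor(df$x)
print(levels(x_fac))
print(table(x_fac))  # number of observations at each level
```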

To fit a linear model of y on x, treating x as a numerical variable, and display the model summary, add the following code to the above snippet −

```r
# Fit a linear model that treats x as a numerical predictor
Model_1 <- lm(y ~ x, data = df)
summary(Model_1)
```

Output

If you execute all the above snippets as a single program, it generates the following output (rpois() draws random values, so your numbers will differ) −

```
Call:
lm(formula = y ~ x, data = df)

Residuals:
   Min     1Q Median     3Q    Max
-3.549 -1.313 -0.503  1.128  5.451

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.168      1.013    4.11  0.00065 ***
x              0.382      0.426    0.90  0.38249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.29 on 18 degrees of freedom
Multiple R-squared:  0.0426,  Adjusted R-squared:  -0.0106
F-statistic: 0.801 on 1 and 18 DF,  p-value: 0.382
```
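With `x` numerical, the model has a single coefficient for it, so the predicted `y` changes by the same amount (the slope, about 0.382 here) for every one-unit increase in `x`. The sketch below reuses the values from the data frame printed above and checks this with `predict()`.

```r
# Values copied from the data frame printed above
df <- data.frame(
  x = c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2),
  y = c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
)
Model_1 <- lm(y ~ x, data = df)

# Only two coefficients: the intercept and one slope for x
print(coef(Model_1))

# Predicted y at x = 0, 1, 2, 3, 4
pred <- predict(Model_1, newdata = data.frame(x = 0:4))

# Successive differences are all equal to the slope
print(diff(pred))
```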

To fit the same model with x treated as a factor (categorical) variable and display the model summary, add the following code to the above snippets −

```r
# Fit a linear model that treats x as a categorical predictor (factor)
Model_2 <- lm(y ~ factor(x), data = df)
summary(Model_2)
```

Output

If you execute all the above snippets as a single program, it generates the following output −

```
Call:
lm(formula = y ~ factor(x), data = df)

Residuals:
   Min     1Q Median     3Q    Max
-3.375 -1.400 -0.533  1.083  5.625

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.00e+00   2.50e+00    2.00    0.064 .
factor(x)1  -6.25e-01   2.65e+00   -0.24    0.817
factor(x)2  -3.92e-15   2.89e+00    0.00    1.000
factor(x)3   4.00e-01   2.74e+00    0.15    0.886
factor(x)4   6.67e-01   2.89e+00    0.23    0.820
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.5 on 15 degrees of freedom
Multiple R-squared:  0.0526,  Adjusted R-squared:  -0.2
F-statistic: 0.208 on 4 and 15 DF,  p-value: 0.93
```
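With `factor(x)`, the intercept is the mean of `y` at the baseline level `x = 0`, and each `factor(x)k` coefficient is the difference between the mean of `y` at level `k` and that baseline mean. The model therefore spends four coefficients on `x` instead of one, which is why the residual degrees of freedom drop from 18 to 15. It also explains why `factor(x)2` appears as `-3.92e-15`: levels 2 and 0 both have a mean of 5, so the difference is zero up to floating-point error. The sketch below checks this against group means computed with `tapply()`, reusing the values from the data frame printed above.

```r
# Values copied from the data frame printed above
df <- data.frame(
  x = c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2),
  y = c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
)
Model_2 <- lm(y ~ factor(x), data = df)
b <- coef(Model_2)

# Mean of y at each level of x (levels 0, 1, 2, 3, 4)
group_means <- tapply(df$y, df$x, mean)

# Intercept = baseline mean; intercept + factor(x)k = mean at level k
implied_means <- b[["(Intercept)"]] + c(0, b[-1])

print(data.frame(level = names(group_means),
                 group_mean = as.vector(group_means),
                 implied_mean = unname(implied_means)))

# 20 observations minus 5 estimated coefficients
print(df.residual(Model_2))
```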
Updated on: 2021-11-03T08:02:54+05:30
