Commit 24727c9

committed
expand mcode docs
1 parent e98c8d4 commit 24727c9

3 files changed

+141
-27
lines changed

README.Rmd

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
# Merge and recode across multiple variables
2+
3+
Recoding data into a form suitable for analysis is one of the first, most important, and most time-consuming parts of data analysis. Data rarely arrive in precisely the form needed for analysis, and this is especially true for social science data such as those generated by surveys, experiments, and registries. This package provides tools that simplify some aspects of recoding. They expand upon the tools provided by base R (such as `cut`, `interaction`, `[<-`, and `ifelse`) and by add-on packages, such as `recode` from [**car**](http://cran.r-project.org/web/packages/car/) and `mapvalues` and `revalue` from [**plyr**](http://cran.r-project.org/web/packages/plyr/).
4+
5+
6+
## Installation ##
7+
8+
[![travis-ci](https://travis-ci.org/leeper/mcode.svg)](https://travis-ci.org/leeper/mcode)
9+
10+
To install the latest development version of **mcode** from GitHub:
11+
12+
```R
13+
if(!require("devtools")){
14+
install.packages("devtools")
15+
library("devtools")
16+
}
17+
install_github("leeper/mcode")
18+
```
19+
20+
## Current Functionality ##
21+
22+
So far the package includes four functions: `branch`, `unbranch`, `mergeNA`, and `mcode`.
23+
24+
### `branch` and `unbranch` ###
25+
26+
The `branch` function generalizes a common expression: `model.matrix(~0 + x, data = dataset)`. That expression creates a matrix of indicator (dummy) variables from a categorical vector. `branch` extends this idea so that a vector of values can be split into multiple vectors according to an arbitrary branching rule.
27+
28+
```R
29+
a <- sample(1:5, 20, TRUE)
30+
b1 <- sample(1:2, 20, TRUE)
31+
b2 <- sample(1:2, 20, TRUE)
32+
branch(a, b1) # 2-column matrix
33+
branch(a, list(b1, b2)) # 4-column matrix
34+
```
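
For reference, the base R expression that `branch` generalizes can be sketched as follows. This uses only `model.matrix` from base R; the vector `x` and the object name `dummies` are hypothetical and chosen for illustration.

```R
# A small categorical vector (hypothetical example data)
x <- factor(c("a", "b", "a", "c"))

# `~ 0 + x` drops the intercept, so every level gets its own indicator column
dummies <- model.matrix(~ 0 + x)

# One row per observation, one column per level; each row contains a single 1
dummies
```

Each column answers "is this observation in level k?", which is the special case of branching on the values of the vector itself.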
35+
36+
`unbranch` reverses this process: it takes a multi-column matrix, a data.frame, a list of vectors, or some combination thereof, and collapses the input into a single vector. It thereby inverts the behavior of `branch`:
37+
38+
```R
39+
a <- sample(1:5, 20, TRUE)
40+
b1 <- sample(1:2, 20, TRUE)
41+
b2 <- sample(1:2, 20, TRUE)
42+
43+
b <- branch(a, list(b1, b2))
44+
u <- unbranch(b)
45+
all.equal(a, u)
46+
```
47+
48+
`unbranch` is especially useful for extracting relevant categories from two or more vectors to create a single vector that ignores all irrelevant categories (replacing them with a specified value):
49+
50+
```R
51+
m1 <- c(1,3,4,5,2)
52+
m2 <- c(0,2,2,2,4)
53+
54+
# replace irrelevant categories with 0
55+
unbranch(m1, m2, .ignore = c(1,2,3))
56+
57+
# replace irrelevant categories with NA
58+
unbranch(m1, m2, .ignore = c(1,2,3), .fill = NA)
59+
```
60+
61+
### `mergeNA` ###
62+
63+
`mergeNA` is a wrapper for `unbranch` that sets different default values. This makes it useful for merging multiple vectors (or vectors from data.frames and lists, or columns from matrices) into a single vector. It works on vectors that have "mutually exclusive missingness" (i.e., the vectors all have the same length and enough missing values that, at each position, at most one vector has a valid, non-`NA` value).
64+
65+
This is especially useful for recoding complex survey-questionnaire and experimental data, which often store values of the same variable in different vectors within a data.frame (e.g., because responses from survey respondents in each treatment condition were stored in their own data.frame column).
66+
67+
For example:
68+
69+
```R
70+
mergeNA(c(1,2,NA,NA,NA), c(NA,NA,NA,4,5))
71+
# [1] 1 2 NA 4 5
72+
```
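
The idea of mutually exclusive missingness can also be sketched in base R. The helper below is hypothetical (the name `merge_na_sketch` is not part of the package, and this is not how `mergeNA` is implemented); it only illustrates the behavior described above, assuming all inputs have the same length.

```R
# Hypothetical helper: fill NA positions of the running result
# with the values found at the same positions in the next vector
merge_na_sketch <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

v1 <- c(1, 2, NA, NA, NA)
v2 <- c(NA, NA, NA, 4, 5)

# Position 3 is missing in both inputs, so it stays NA in the result
merged <- merge_na_sketch(v1, v2)
merged
```

Unlike a checked merge, this sketch silently keeps the first non-missing value if two vectors overlap, so it is only appropriate when the missingness really is mutually exclusive.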
73+
74+
It is particularly helpful when a potentially large number of variables (e.g., from branched survey questions) must be merged into a single variable.
75+
76+
### `mcode` ###
77+
78+
`mcode` is still experimental. Building on `car::recode`, the function aims to streamline recoding of variables in potentially complex ways. Where `car::recode` converts one input vector into an output vector following a set of recoding rules, `mcode` aims to recode an arbitrary number of vectors into a single output vector. For example, one may want to create a categorical variable representing combinations of age and gender categories. Normally this would require two calls to `recode` and/or a long sequence of additional operations (e.g., `cut`, `[<-`, `ifelse`, `*`, `+`, and/or `interaction`). `mcode` aims to consolidate these steps into a single operation.
79+
80+
```R
81+
a <- c(1,2,1,2,1,NA,2,NA)
82+
b <- c(1,1,2,2,NA,1,NA,2)
83+
84+
# recode using `mcode`
85+
m1 <- mcode(a, b, recodes = "c(1,1)=1;c(1,2)=2;c(2,1)=3;c(2,2)=4")
86+
87+
# compare to `ifelse`:
88+
m2 <- ifelse(a == 1 & b == 1, 1,
89+
ifelse(a == 1 & b == 2, 2,
90+
ifelse(a == 2 & b == 1, 3,
91+
ifelse(a == 2 & b == 2, 4, NA))))
92+
identical(m1, m2)
93+
94+
# compare to a sequence of extraction statements
95+
m3 <- rep(NA, length(a))
96+
m3[a == 1 & b == 1] <- 1
97+
m3[a == 1 & b == 2] <- 2
98+
m3[a == 2 & b == 1] <- 3
99+
m3[a == 2 & b == 2] <- 4
100+
identical(m1, m3)
101+
102+
# compare to interaction
103+
m4 <- interaction(a, b)
104+
levels(m4) <- c("1.1" = 1, "1.2" = 2, "2.1" = 3, "2.2" = 4)[levels(m4)]
105+
m4 <- as.numeric(as.character(m4))
106+
identical(m1, m4)
107+
```
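
For comparison, the age-and-gender case mentioned above can be handled in base R with `cut` and a named lookup vector. This is only a sketch under stated assumptions: the data, the cut point of 40, and the names `age_group`, `lookup`, and `combined` are hypothetical, and this is not how `mcode` is implemented.

```R
# Hypothetical example data
age    <- c(23, 67, 45, 31, NA, 70)
gender <- c("f", "m", "f", "m", "f", NA)

# Step 1: collapse the continuous variable into categories
age_group <- cut(age, breaks = c(0, 40, Inf), labels = c("young", "old"))

# Step 2: build one key per observation from both variables
key <- paste(age_group, gender, sep = ".")

# Step 3: map each valid combination to an output code;
# keys that are not listed (including those involving NA) become NA
lookup <- c("young.f" = 1, "young.m" = 2, "old.f" = 3, "old.m" = 4)
combined <- unname(lookup[key])
combined
```

The three separate steps here are the kind of bookkeeping that a single multi-vector recoding call is meant to replace.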
108+

README.md

Lines changed: 0 additions & 22 deletions
This file was deleted.

man/mcode.Rd

Lines changed: 33 additions & 5 deletions
@@ -23,13 +23,41 @@ This really only works for categorical variables, but a continuous variable coul
2323
\author{Thomas J. Leeper}
2424
%\seealso{}
2525
\examples{
26-
recodes <- "c(1,1)=1;c(1,2)=2;c(1,3)=3;c(2,1)=4;c(2,2)=5;c(2,3)=6;c(3,1)=7;c(3,2)=8;c(3,3)=9"
27-
mcode(c(1,2,1,2),c(1,1,2,2), recodes=recodes)
26+
a <- c(1,2,1,2,1,NA,2,NA)
27+
b <- c(1,1,2,2,NA,1,NA,2)
2828
29-
recodes <- "c(1,1,1,1)=1;c(1,1,1,0)=2;c(1,1,0,1)=3;c(1,0,1,1)=4;c(0,1,1,1)=5"
30-
mcode(c(rep(1,9),0),c(rep(0,5),rep(1,5)),c(rep(1,8),0,1),c(rep(1,5),rep(0,2),rep(1,3)), recodes=recodes)
29+
# recode using `mcode`
30+
m1 <- mcode(a, b, recodes = "c(1,1)=1;c(1,2)=2;c(2,1)=3;c(2,2)=4")
31+
32+
# compare to `ifelse`:
33+
m2 <- ifelse(a == 1 & b == 1, 1,
34+
ifelse(a == 1 & b == 2, 2,
35+
ifelse(a == 2 & b == 1, 3,
36+
ifelse(a == 2 & b == 2, 4, NA))))
37+
identical(m1, m2)
38+
39+
# compare to a sequence of extraction statements
40+
m3 <- rep(NA, length(a))
41+
m3[a == 1 & b == 1] <- 1
42+
m3[a == 1 & b == 2] <- 2
43+
m3[a == 2 & b == 1] <- 3
44+
m3[a == 2 & b == 2] <- 4
45+
identical(m1, m3)
46+
47+
# compare to interaction
48+
m4 <- interaction(a, b)
49+
levels(m4) <- c("1.1" = 1, "1.2" = 2, "2.1" = 3, "2.2" = 4)[levels(m4)]
50+
m4 <- as.numeric(as.character(m4))
51+
identical(m1, m4)
52+
53+
r <- "c(1,1,1,1)=1;c(1,1,1,0)=2;c(1,1,0,1)=3;c(1,0,1,1)=4;c(0,1,1,1)=5"
54+
mcode(c(rep(1,9),0),
55+
c(rep(0,5),rep(1,5)),
56+
c(rep(1,8),0,1),
57+
c(rep(1,5),rep(0,2),rep(1,3)),
58+
recodes = r)
3159
3260
# WORK WITH MISSING VALUES:
33-
mcode(c(1,1,1,1,1,NA),c(1,1,2,2,NA,1), recodes="c(1,1)=1;c(1,2)=2;c(1,NA)=3")
61+
mcode(c(1,1,1,1,1,NA), c(1,1,2,2,NA,1), recodes="c(1,1)=1;c(1,2)=2;c(1,NA)=3")
3462
}
3563
%\keyword{}
