Learning Notes on R for Python Programmers
R Basic Scalar Types • R basic scalar data types – integer ( 1L,2L,3L,…) – numeric ( 1,2,3,…) – character – complex – logical (TRUE, FALSE) • and(&) , or(|), not(!)
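R Basic Scalar Types Constructors • RScalarType(0) returns an empty vector of that type (it is not NULL) – length(xxx(0)) == 0 • RScalarType(1) returns a length-one vector holding the type's default value – integer(1): 0L – numeric(1): 0 – character(1): "" – complex(1): 0+0i – logical(1): FALSE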
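A minimal sketch of the constructor behaviour described above, runnable at the R prompt. The variable name empty_int is illustrative only.

```r
# Zero-length constructors give empty vectors, which are not NULL
empty_int <- integer(0)
length(empty_int)    # 0
is.null(empty_int)   # FALSE

# Length-one constructors hold each type's default value
integer(1)     # 0L
numeric(1)     # 0
character(1)   # ""
complex(1)     # 0+0i
logical(1)     # FALSE

# Logical operators: and, or, not
TRUE & FALSE   # FALSE
TRUE | FALSE   # TRUE
!TRUE          # FALSE
```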
R Basic Object Types • R basic data structure types – (row) vector (In R, everything is a vector) – matrix – list – data.frame – factor – environment • In R the "base" type is a vector, not a scalar.
R Object
Find R Object’s Properties • length(object) • mode(object) / class(object)/ typeof(obj) • attributes(object) • attr(object, name) • str(object)
Python type(obj) • R> class(obj) • R> mode(obj) • R> typeof(obj)
object   class        mode         typeof
1        "numeric"    "numeric"    "double"
1:10     "integer"    "numeric"    "integer"
"1"      "character"  "character"  "character"
class    "function"   "function"   "builtin"
Python dir(obj) • attributes(obj) • str(object) • ls() (Python> dir() ) • The function attributes(object) returns a list of all the non-intrinsic attributes currently defined for that object.
R attr(object, name) • The function attr(object, name) can be used to select a specific attribute. • When it is used on the left hand side of an assignment it can be used either to associate a new attribute with object or to change an existing one. • For example • > attr(z, "dim") <- c(10,10) – allows R to treat z as if it were a 10-by-10 matrix.
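As a small illustration of the attr example above (a sketch; z is filled with 1:100 here purely for demonstration):

```r
z <- 1:100
attr(z, "dim") <- c(10, 10)   # z is now treated as a 10-by-10 matrix
is.matrix(z)                  # TRUE
attributes(z)                 # a list with a single element, $dim
attr(z, "dim")                # 10 10
z[2, 3]                       # 22, since matrices are stored column by column
```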
R character
Python "a,b,c,d,e".split(",") (R strsplit) • strsplit("a,b,c,d,e", ",") • (Output is an R list) • unlist(strsplit("a,b,c,d,e", ","))[vector_index]
R paste • paste("a", "b", sep="") – Python> "a"+"b" → "ab"
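The two string slides can be tried together as follows. This is a sketch; the variable names are illustrative, and the collapse argument is shown as a rough counterpart of Python's ",".join(...), which the notes themselves do not cover.

```r
parts <- strsplit("a,b,c,d,e", ",")   # a list holding one character vector
pieces <- unlist(parts)               # "a" "b" "c" "d" "e"
pieces[2]                             # "b" (R indices start at 1)

paste("a", "b", sep = "")             # "ab"
paste(pieces, collapse = ",")         # "a,b,c,d,e"
```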
R-List Python-Dictionary
Python Dictionary (R List) • Constructor – Rlist <- list(key1=value1, … , key_n = value_n) • Evaluate – Rlist$key1 (Python> D[key1]) – Rlist[[1]] • Sublist – Rlist[key_i] (output list(key_i=value_i))
Python D["new_key"]=new_value • Rlist$new_key = new_value or • Rlist$new_key <- new_value
Python> del D[key] • New_Rlist <- Rlist[-key_index] or • New_Rlist <- Rlist[-vector_of_key_index]
Python Dict.keys() • vector_of_Rlist_keys <- names(Rlist) • ( output “vector_of_Rlist_keys” is a R-vector)
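A short sketch of the dictionary-style operations above on a named list. The keys and values are made up for illustration; assigning NULL to remove a key in place is an addition not mentioned in the slides.

```r
d <- list(apple = 3, banana = 5)
d$apple            # 3
d[["banana"]]      # 5
d["apple"]         # a sub-list: list(apple = 3)

d$cherry <- 7      # add a new key
names(d)           # "apple" "banana" "cherry"

d2 <- d[-2]        # copy without the second element
names(d2)          # "apple" "cherry"

d$cherry <- NULL   # remove a key in place
names(d)           # "apple" "banana"
```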
R-Vector Python-List
Python List (R vector) • [Constructor] vector(mode, length) – vector(mode = "character", length = 10) • 0:10 – 0:10 == c(0,1,2,3,4,5,6,7,8,9,10) – Python> range(0,11) • seq(0,1,0.1) – seq(0,1,0.1) == 0:10*0.1 – Matlab> linspace(0,1,11) • rep(0:10, times = 2)
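Python List.methods • vector <- c(vector, other_vector) – Python> List.extend (List.append for a single element) • vector[-J] or vector[-(I:J)] – Python> List.pop (R returns a copy without those elements; the original vector is unchanged) • subvector <- vector[vector_of_index] • which( vector == value ) – Python> List.index(value)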
R which • which( vector == value ) – Python> List.index(value) (which returns every matching index, not only the first) • which( vector < v) or which( vector > v) • which(arg, arr.ind=TRUE) • http://fortheloveof.co.uk/2010/04/11/r-select-specific-elements-or-find-their-index-in-a-vector-or-a-matrix/
R vector • length(vector) – Python> len(List) • names(vector) • rownames(vector)
Python> element in List • R> element %in% R-Vector • R> !(element %in% R-Vector) (not in)
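The vector operations from the last few slides, collected into one runnable sketch (values chosen arbitrarily for illustration):

```r
v <- c(10, 20, 30)
v <- c(v, 40, 50)    # append: 10 20 30 40 50
length(v)            # 5

v[-1]                # drop the first element: 20 30 40 50
v[-(2:3)]            # drop elements 2 to 3: 10 40 50
v[c(1, 5)]           # select by index: 10 50

which(v == 30)       # 3
which(v > 25)        # 3 4 5

30 %in% v            # TRUE
!(99 %in% v)         # TRUE

# For a matrix, arr.ind = TRUE reports row and column positions
m <- matrix(1:6, nrow = 2)
hits <- which(m > 4, arr.ind = TRUE)
hits                 # two rows: (row 1, col 3) and (row 2, col 3)
```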
R matrix R-Vector with Dimension
R-Matrix • Constructor: – matrix( ?? , nrow = ?? , ncol = ?? ) – as.matrix( ?? )
R-Matrix=R-Vector with Dimension
> x <- 1:15
> class(x)
[1] "integer"
> dim(x) <- c(3, 5)
> class(x)
[1] "matrix"
• Note: since R 4.0.0 the last call prints [1] "matrix" "array".
Names on Matrix • Just as you can name indices in a vector, you can (and should!) name columns and rows in a matrix with colnames(X) and rownames(X). • E.g. – colnames(R-matrix) <- c(name_1,name_2,…) – colnames(R-matrix)[i] <- name_i
Functions on Matrix • If X is a matrix apply(X, 1, f) is the result of applying f to each row of X; apply(X, 2, f) to the columns. – Python> map(func,py-List)
Add Columns and Rows • cbind E.g. > cbind(c(1,2,3),c(4,5,6)) • rbind E.g. > rbind(c(1,2,3),c(4,5,6))
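A brief sketch combining the matrix slides above: naming rows and columns, applying a function along each margin, and binding vectors. The row and column names are placeholders.

```r
X <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
rownames(X) <- c("r1", "r2")
colnames(X) <- c("a", "b", "c")

row_sums <- apply(X, 1, sum)   # r1 = 9, r2 = 12
col_sums <- apply(X, 2, sum)   # a = 3, b = 7, c = 11

by_col <- cbind(c(1, 2, 3), c(4, 5, 6))   # 3 rows, 2 columns
by_row <- rbind(c(1, 2, 3), c(4, 5, 6))   # 2 rows, 3 columns
dim(by_col)   # 3 2
dim(by_row)   # 2 3
```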
Data Frame in R Explicitly like a list
Explicitly like a list • When can a list be made into a data.frame? – Components must be vectors (numeric, character, logical) or factors. – All vectors and factors must have the same lengths.
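The two conditions above can be checked with a small sketch. The column names and values are invented for illustration, and tryCatch is used only to capture the failure.

```r
# Equal-length components convert cleanly
good <- list(id = 1:3, label = c("a", "b", "c"), flag = c(TRUE, FALSE, TRUE))
df <- as.data.frame(good)
nrow(df)      # 3
names(df)     # "id" "label" "flag"
df$id         # 1 2 3, list-style access still works
df[["flag"]]  # TRUE FALSE TRUE

# Components of length 3 and 2 cannot be combined
bad <- list(id = 1:3, label = c("a", "b"))
outcome <- tryCatch(as.data.frame(bad), error = function(e) "error")
outcome       # "error"
```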
Python os and R
Python os.method • getwd() (Python> os.getcwd() ) • setwd(Path) (Python> os.chdir(Path))
Control Structures and Looping
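if • if ( statement1 ) { statement2 } else if ( statement3 ) { statement4 } else if ( statement5 ) { statement6 } else { statement7 } • At top level, keep each else on the same line as the preceding closing brace; otherwise R treats the if as complete and reports a syntax error at else.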
switch • switch(statement, list) • Example:
> y <- "fruit"
> switch(y, fruit = "banana", vegetable = "broccoli", meat = "beef")
[1] "banana"
for • for ( name in vector ) statement1 • E.g. > for ( ind in 1:10) { print(ind) }
while • while ( statement1 ) statement2
repeat • repeat statement • The repeat statement causes repeated evaluation of the body until a break is specifically requested. • When using repeat, statement must be a block statement. You need to both perform some computation and test whether or not to break from the loop and usually this requires two statements.
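The control structures in this section can be exercised with a short sketch (variable names and values are arbitrary). The repeat loop shows the two-statement pattern described above: do some work, then test whether to break.

```r
total <- 0
for (i in 1:10) total <- total + i   # total is 55

n <- 0
while (n < 3) n <- n + 1             # n is 3

k <- 1
repeat {
  k <- k * 2                # the computation
  if (k > 100) break        # the exit test
}
k                           # 128

size <- if (k > 100) "large" else "small"               # if/else yields a value
kind <- switch("fruit", fruit = "banana", meat = "beef")  # "banana"
```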
Functions in R
Create Function in R • name <- function(arg_1, arg_2, ...) expression • E.g. – ADD <- function(a,b) a+b – ADD <- function(a,b) {c<-a+b} – ADD <- function(a,b) {c<-a+b;c} – ADD <- function(a,b) {c<-a+b; return(c)} – (All four return a+b; the second returns it invisibly, so calling it at the prompt prints nothing)
Function Return R-List • To return more than one item, create a list using list() • E.g. – MyFnTest1 <- function(a,b) {c<-a+b;d<-a-b; list(r1=c,r2=d)} – MyFnTest1 <- function(a,b) {c<-a+b;d<-a-b; return(list(r1=c,r2=d))} – (These two functions are the same, too)
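A small sketch of returning several values through a list, plus a check of how a function whose body ends in an assignment returns its value. The function names here are illustrative, not from the slides.

```r
sum_and_diff <- function(a, b) {
  list(sum = a + b, diff = a - b)   # the last expression is the return value
}
res <- sum_and_diff(5, 3)
res$sum    # 8
res$diff   # 2

# A body that ends in an assignment still returns the value, but invisibly
add_quiet <- function(a, b) { s <- a + b }
add_quiet(1, 2)                          # prints nothing at the prompt
out <- add_quiet(1, 2)                   # out is 3
withVisible(add_quiet(1, 2))$visible     # FALSE
```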
Python map(func,Py-List) • the apply family of functions (to be continued)
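Since this slide is left unfinished, here is a hedged sketch of three base R functions that commonly play the role of Python's map. It is not taken from the slides; the inputs are arbitrary.

```r
# sapply simplifies the result to a vector when it can
squares <- sapply(1:5, function(x) x^2)      # 1 4 9 16 25

# lapply always returns a list
as_list <- lapply(1:3, function(x) x * 10)   # list(10, 20, 30)

# Map walks several vectors in parallel, like map(func, xs, ys) in Python
sums <- Map(function(x, y) x + y, 1:3, 4:6)  # list(5, 7, 9)
unlist(sums)                                 # 5 7 9
```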
R Time Objects
R Basic Time Objects • Basic Types – Date – POSIXct – POSIXlt • Constructors: – as.Date – as.POSIXct – as.POSIXlt
as.POSIXct / as.POSIXlt • as.POSIXct( timestamp , origin , tz , …) • E.g. – as.POSIXct( timestamp , origin="1970-01-01", tz="CST", …)
strftime / strptime • "POSIXlt"/"POSIXct" to Character – strftime(x, format="", tz = "", usetz = FALSE, ...) • Character to "POSIXlt" – strptime(x, format, tz = "") • E.g. – strptime(… ,"%Y-%m-%d %H:%M:%S", tz="CST")
Time to Timestamp [Python> time.mktime(…)] • as.numeric(POSIXct or POSIXlt object) • E.g. – as.numeric(Sys.time()) (Sys.time() returns a POSIXct object)
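A round-trip sketch of the time conversions above. It uses tz = "UTC" rather than the "CST" shown in the slides, purely so the printed values do not depend on the local machine; the timestamps are arbitrary.

```r
# Numeric timestamp -> POSIXct
t1 <- as.POSIXct(86400, origin = "1970-01-01", tz = "UTC")
format(t1, "%Y-%m-%d %H:%M:%S")   # "1970-01-02 00:00:00"

# POSIXct -> numeric timestamp
as.numeric(t1)                    # 86400

# Character -> POSIXlt, then back to character in another layout
lt <- strptime("2012-03-04 05:06:07", "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(lt)                         # "POSIXlt" "POSIXt"
strftime(lt, "%Y/%m/%d")          # "2012/03/04"

# Date objects carry no time of day
d <- as.Date("2012-03-04")
class(d)                          # "Date"
```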
R Graph
Types of Graphics • Base • Lattice
Base Graphics • Use function such as – plot – barplot – contour – boxplot – pie – pairs – persp – image
Plot Arguments • type = ??? • axes = FALSE : suppresses axes • xlab = "str" : label of the x-axis • ylab = "str" : label of the y-axis • sub = "str" : subtitle, shown under the x-axis • main = "str" : title, shown at the top of the plot • xlim = c(lo,hi) • ylim = c(lo,hi)
Plot's type arg • type = – "p" : plots points – "l" : plots a line – "n" : plots nothing, just creates the axes for later use – "b" : plots both lines and points – "o" : plots overlaid lines and points – "h" : plots histogram-like vertical lines – "s" : plots step-like lines
Plot Example • R> plot(x=(1:20), y=(11:30), pch=1:20, col=1:20, main="plot", xlab="x-axis", ylab="y-axis", ylim=c(0,30)) • R> example(points)
pch • 0:18: S-compatible vector symbols. • 19:25: further R vector symbols. • 26:31: unused (and ignored). • 32:127: ASCII characters. • 128:255 native characters only in a single-byte locale and for the symbol font. (128:159 are only used on Windows.) • Ref: http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/points.html http://rgraphics.limnology.wisc.edu/
cex • a numerical vector giving the amount by which plotting characters and symbols should be scaled relative to the default. This works as a multiple of par("cex"). NULL and NA are equivalent to 1.0. Note that this does not affect annotation: see below. • E.g. – points(c(6,2), c(2,1), pch = 5, cex = 3, col = "red") – points(c(6,2), c(2,1), pch = 5, cex = 10, col = "red")
points, lines, text, abline
arrows
par/layout (Matlab> subplot) • par(mfrow=c(m,n)) – Matlab> subplot(m,n,?)
pairs • E.g. – R> pairs(iris[,c(1,3,5)]) – R> example(pairs)
MISC. Code1 (Saving Graph) • postscript("myfile.ps") • plot(1:10) • dev.off()
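The same open-device, plot, close-device pattern works for other file devices. The sketch below assumes the pdf device and a temporary file (both choices are illustrative, not from the slides) and also exercises par(mfrow) and two of the type values listed earlier.

```r
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out_file)                            # open the file device

par(mfrow = c(1, 2))                     # one row, two panels
plot(1:10, type = "b", main = "lines and points", xlab = "x", ylab = "y")
plot(1:10, type = "s", main = "steps", xlab = "x", ylab = "y")

dev.off()                                # close the device and write the file
file.exists(out_file)                    # TRUE
```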
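MISC. Code2 (Saving Graph) • windows(record=TRUE, width=7, height=7) • Last_30_TXF <- last(TXF, 30) • chartSeries(Last_30_TXF) • savePlot(paste("Last30_", unlist(strsplit(filename, ".", fixed = TRUE))[1], sep=""), type = "jpeg", device = dev.cur(), restoreConsole = TRUE) • Note: fixed = TRUE is needed because strsplit treats "." as a regular expression that matches any character.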
Available Colors • R> colors() lists all available colors • Combine it with grep to find a color family, e.g. • R> grep("red",colors()) • Reference: • http://research.stowers-institute.org/efg/R/Color/Chart/
R xts
Tools for xts • diff • lag
My XTS’ Tools • Integration_of_XTS • Indexing_of_XTS • XTS_Push_Events_Back • Get_XTS_Local_Max • Get_XTS_Local_Min
Basic Statistics Tools
R Statistical Models
Model Formulae • formula(x, ...) • as.formula(object, env = parent.frame()) • E.g. – R> example(formula)
MISC. 1 Updating fitted models • http://cran.r-project.org/doc/manuals/R-intro.html#Updating-fitted-models
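A minimal sketch of building a formula and updating a fitted model. It uses the cars data set that ships with R; the choice of lm and of the added quadratic term is illustrative only.

```r
f <- as.formula("dist ~ speed")
class(f)        # "formula"
all.vars(f)     # "dist" "speed"

fit <- lm(f, data = cars)                      # straight-line fit
fit2 <- update(fit, . ~ . + I(speed^2))        # same model plus a quadratic term
length(coef(fit))    # 2
length(coef(fit2))   # 3
```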
R Packages
• library() • search() • loadedNamespaces() • getAnywhere(Package_Name) • http://cran.r-project.org/doc/manuals/R-intro.html#Namespaces
Random Number Generators
• rnorm • runif •
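A short sketch of the two generators listed above. The seed value and sample sizes are arbitrary; set.seed is included only to make a run reproducible.

```r
set.seed(42)                        # fix the seed for reproducibility
x <- rnorm(5, mean = 0, sd = 1)     # five draws from a standard normal
u <- runif(5, min = 0, max = 1)     # five draws from the uniform on [0, 1]
length(x)                           # 5
all(u >= 0 & u <= 1)                # TRUE
```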
Regular Expression Python Re Module
grep • Pattern_Index <- grep(Pattern, Search_Vector) • E.g. (the Cl function in quantmod) return(x[, grep("Close", colnames(x))])
• hits <- grep( pattern, x ) • Ref: Lecture5v1
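A sketch of grep and two close relatives on a character vector. The column names below are hypothetical stand-ins for what colnames(x) might return.

```r
cols <- c("TXF.Open", "TXF.High", "TXF.Low", "TXF.Close")   # made-up names

grep("Close", cols)                 # 4, the index of the match
grep("Close", cols, value = TRUE)   # "TXF.Close", the matching element
grepl("^TXF", cols)                 # TRUE TRUE TRUE TRUE, one flag per element
sub("^TXF\\.", "", cols)            # "Open" "High" "Low" "Close"
```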
R LibSVM (e1071) http://www.csie.ntu.edu.tw/~cjlin/libsvm/R_example
R CR Tree Method (rpart) Classification and Regression Tree
• http://www.statsoft.com/textbook/classification-and-regression-trees/ • http://www.stat.cmu.edu/~cshalizi/350/lectures/22/lecture-22.pdf • http://www.stat.wisc.edu/~loh/treeprogs/guide/eqr.pdf
R Adaboost Package (adabag)
adaboost.M1 • This function implements Freund and Schapire's Adaboost.M1 algorithm • The weak learner is a CR Tree, i.e. the rpart package in R
adaboost.M1's Training Data Form • Label Column must be a factor object (in source code)
fit <- rpart(formula = formula, weights = data$pesos, data = data[, -1], maxdepth = maxdepth)
flearn <- predict(fit, data = data[, -1], type = "class")
R IDE Tools
Reference • http://en.wikipedia.org/wiki/R_(programming_language) • http://jekyll.math.byuh.edu/other/howto/R/RE.shtml (Emacs) • http://stat.ethz.ch/ESS/
Reference
Graph • http://addictedtor.free.fr/graphiques/
• http://www.nd.edu/~steve/Rcourse/Lecture2v1.pdf • http://addictedtor.free.fr/graphiques/ • http://www.evc-cit.info/psych018/r_intro/r_intro4.html • http://www.r-tutor.com/r-introduction/data-frame • http://msenux.redwoods.edu/math/R/dataframe.php
