INTRODUCTION TO R
AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
HISTORY AND EVOLUTION OF R: Origins at Bell Labs in the 1970s
HISTORY AND EVOLUTION OF R: R developed from the S language • S Version 1, S Version 2, S Version 3, S Version 4 • Developed 30 years ago for research applied to the high-tech industry
HISTORY AND EVOLUTION OF R: The regular development of R • 1990s: R developed concurrently with S • 1993: R made public • Acceleration of R development: R-help and R-devel mailing lists; creation of the R Core Group • Source: R Journal Vol 1/2
HISTORY AND EVOLUTION OF R: Growing number of packages • 2000: R version 1.0.1 • 2001: ~100 packages • 2009: Over 2000 packages • Today: R version 2.14 • Source: R Journal Vol 1/2
HISTORY AND EVOLUTION OF R: Explosion of R popularity in the last decade • Object-oriented, growing user base, scripting features • Free and open-source • Irrational reasons: R seen as “cool”
HISTORY AND EVOLUTION OF R: Comparison of mailing lists • Evolution of the traffic on the main mailing list of each software package. Source: R.A. Muenchen, r4stats.com
HISTORY AND EVOLUTION OF R: Popularity among programming languages • Source: KDnuggets 2012 survey
HISTORY AND EVOLUTION OF R: Number of blogs (data as of March 2012)
Software    Number of blogs
R           365
SAS         40
Stata       8
Others      0-3
PRINCIPLE AND SOFTWARE PARADIGM: R is not really a (statistical) software package • R is more of a programming language • Limited user-friendly interfaces for data analysis • Object-oriented and almost non-declarative • Similar to programming languages like Fortran, C, Java, Python
PRINCIPLE AND SOFTWARE PARADIGM: R has limited Graphical User Interface (GUI) options • Recent endeavours to enhance R user-friendliness • Several GUIs in development: R Commander, RKWard, Rattle
PRINCIPLE AND SOFTWARE PARADIGM: R Commander (Rcmdr)
PRINCIPLE AND SOFTWARE PARADIGM: RKWard
PRINCIPLE AND SOFTWARE PARADIGM: Rattle
PRINCIPLE AND SOFTWARE PARADIGM: Inherent limitations of pervasive Excel-like spreadsheets (vs. R)
PRINCIPLE AND SOFTWARE PARADIGM: Sophisticated but costly SAS (vs. R) • Screenshot of SAS Enterprise Miner 7.1. Source: sas.com
DESCRIPTION OF R INTERFACE: R console • R desktop shortcut • RGui: R's basic interface • R command line (space to write instructions)
DESCRIPTION OF R INTERFACE: Using the command line in the R console • A first, invalid sentence followed by R’s error message • A second, correct sentence • Declaration and printing of the sentence as an R object • Simple math computations • Basic information about the R object containing the sentence
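The console session described on this slide might look like the following minimal sketch. The sentence text and the object names (`sentence`, `total`, `ratio`) are illustrative assumptions, since the original screenshot is not reproduced here.

```r
# Typing bare words at the prompt (for example: Hello R world) fails with an
# "unexpected symbol" error, because R tries to parse the words as code.

# Quoting the text makes it a valid character string.
print("Hello R world")

# Store the sentence in an R object, then print it.
sentence <- "Hello R world"
print(sentence)

# Simple math computations.
total <- 2 + 3 * 4   # multiplication binds tighter than addition
ratio <- 10 / 4
print(total)
print(ratio)

# Basic information about the object holding the sentence.
print(class(sentence))    # type of the object
print(length(sentence))   # number of elements in the vector
print(nchar(sentence))    # number of characters in the string
```

Assignment with `<-` creates the object silently; nothing is shown until the object is printed or its name is typed at the prompt.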
DESCRIPTION OF R INTERFACE: RGui menu, File tab • Usual basic and general operations
DESCRIPTION OF R INTERFACE: RGui menu, Edit tab • Basic and general editing • Data editor: entering the object’s name • Results of the data editor
DESCRIPTION OF R INTERFACE: RGui menu, View tab • Showing or hiding the toolbar and/or status bar
DESCRIPTION OF R INTERFACE: RGui menu, Misc tab • Miscellaneous operations
DESCRIPTION OF R INTERFACE: RGui menu, Packages tab • Adding functions to the base R installation
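The Packages tab has console equivalents. The sketch below is a hedged illustration rather than a transcript of the slide: it uses "mgcv" only because later slides mention that package, and the install step is commented out because it needs network access.

```r
# Install once from CRAN (needs network access, so left commented out here).
# install.packages("mgcv")

# Load a package for the current session. requireNamespace() lets the script
# check availability first instead of stopping with an error.
pkg <- "mgcv"   # example package name; any installed package works
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
  message(pkg, " version ", as.character(packageVersion(pkg)), " loaded")
} else {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") first")
}

# Packages currently attached to the session.
attached <- search()
print(attached)
```

A package is installed once but must be loaded with `library()` in every new session that uses it.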
DESCRIPTION OF R INTERFACE: RGui menu, Windows tab • Usual options to arrange the windows
DESCRIPTION OF R INTERFACE: RGui menu, Help tab • Very important links to help resources
ADVANTAGES OF R: R “philosophy” • Open source code: you can access the code of the software, gain an in-depth understanding of what R does, and modify the code • Example: “mgcv” package webpage (screenshot of the CRAN webpage of the “mgcv” package; source: CRAN), showing the address of the “mgcv” package and the link to the package sources (.tar.gz file)
ADVANTAGES OF R: R access to source code • Example of source code of the “mgcv” package (screenshot of unzipping the “mgcv” package and browsing through the package’s files) • 1. Unzipping the mgcv_1.7-13.tar.gz file (with 7-Zip) • 2. List of directories in the “mgcv” package • 3. List of functions (i.e. open code) in the “src” (i.e. source code) directory of the “mgcv” package
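Besides unpacking the source tarball, much of the code can be read from inside R itself. The sketch below is an added illustration, not part of the original slides; it uses the base functions `sd` and `sum` as examples, and `sd_source` is a hypothetical variable name.

```r
# Printing a function object (or typing its name without parentheses at the
# prompt) shows its R source code.
print(sd)

# The same source can be captured as text for closer inspection.
sd_source <- deparse(sd)
cat(sd_source, sep = "\n")

# Functions implemented in compiled code show only a .Primitive() call or a
# thin R wrapper; their C sources are found in the source distribution.
print(sum)
print(is.primitive(sum))
```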
ADVANTAGES OF R: R is free
Software           Academics      Demo           Commercial (basic)  Commercial (full)
R                  Free           Free           Free                Free
SAS                Free to $100s  Not available  $1 000s             $10 000s
Statistica         $100s          30-day limit   ~$1 000             $10 000
Excel (Microsoft)  Free to $10s   Limited        ~$100               $100s
SPSS (IBM)         $100s          14-day limit   ~$2 000             $1 000s
ADVANTAGES OF R: Interface with other languages and scripting capabilities • Interfaces with virtually any other programming language: Fortran, C, C++, Python…; tailor or rewrite your old code in R • R as a scripting language: R scripts can launch or be launched by other languages • Screenshot of the file “mgcv.c” of the “mgcv” package open in WordPad: the file is written in typical C
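Both scripting directions can be sketched in a few lines. This is a hedged illustration, not taken from the slides: it launches a second R process through `Rscript` simply because that program is always installed alongside R, and the script name `my_analysis.R` is hypothetical.

```r
# R launching another program: here, a second R process via Rscript.
rscript <- file.path(R.home("bin"), "Rscript")
output <- system2(rscript, args = c("-e", shQuote("cat(6 * 7)")), stdout = TRUE)
print(output)

# The reverse direction: any language that can start a process can run
#   Rscript my_analysis.R arg1 arg2
# and the script reads its arguments as a character vector like this.
args <- commandArgs(trailingOnly = TRUE)
print(args)
```

`system2()` works the same way for any executable on the system, so R can drive compiled programs or scripts written in other languages.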
ADVANTAGES OF R: R visualization capabilities
ADVANTAGES OF R: R’s role in academia • R is a tool used by the finest researchers • Top-notch analytics capabilities • Screenshot of a user’s Facebook map. Source: Paul Butler/Facebook, DG Rossiter, spatialanalysis.co.uk
ADVANTAGES OF R: To summarize • Free open source philosophy: R websites with many examples, free books, free online open courses, Twitter accounts • Online help and discussion: mailing lists, very active and diverse forums, communities of developers and helpers
DRAWBACKS OF R: Average memory performance • Poor management of large datasets: avoid nested loops; prefer R's advanced language features for data structures • Complicated structure of packages in R: dozens of packages, to be loaded into memory every time • R packages to better manage memory: RHadoop (inspired by Google), ff, bigmemory
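The advice to avoid nested loops can be illustrated with a small comparison. This is a minimal sketch on made-up random data; the matrix size and the variable names are assumptions chosen for the example.

```r
set.seed(1)
m <- matrix(runif(200 * 50), nrow = 200, ncol = 50)

# Nested loops: accumulate each row total element by element.
loop_sums <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    loop_sums[i] <- loop_sums[i] + m[i, j]
  }
}

# Vectorized alternative: a single call that runs in compiled code.
vec_sums <- rowSums(m)

# Both approaches give the same totals, up to floating-point tolerance.
print(all.equal(loop_sums, vec_sums))
```

The vectorized form is shorter and, on large matrices, typically much faster, because the looping happens inside compiled code rather than in the R interpreter.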
DRAWBACKS OF R: Average computing performance • No default parallel execution: R packages exist to use several cores; top skills needed for high-performance computing • A high-level programming language: abstract and modern (like Python…); more productive coding; but further from “machine language”, which can mean code up to 100 times slower than C
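As one example of a package that uses several cores, the `parallel` package (distributed with R since version 2.14) can spread independent tasks over worker processes. The sketch below is an added illustration; the worker count of 2 and the function `slow_square` are assumptions made for the example.

```r
library(parallel)

slow_square <- function(x) {
  Sys.sleep(0.1)   # stand-in for an expensive computation
  x^2
}

# Start two worker processes (an assumed count; detectCores() reports how
# many cores the machine actually has).
cl <- makeCluster(2)
par_result <- parLapply(cl, 1:8, slow_square)
stopCluster(cl)

# The same work done sequentially, for comparison.
seq_result <- lapply(1:8, slow_square)

print(identical(par_result, seq_result))
```

Parallel execution has to be requested explicitly like this; plain `lapply()` and `for` loops run on a single core.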
DRAWBACKS OF R: Difficult data visualization and management • Difficult to inspect data sets • Screenshot of the R data editor and the “Viewtable” tab in SAS 9.3
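In practice, data sets in R are usually inspected from the console rather than in a spreadsheet-style viewer. The following minimal sketch uses `mtcars`, an example data set that ships with R; the filter threshold and the name `efficient` are arbitrary choices for illustration.

```r
# mtcars is an example data set included with R.
print(head(mtcars, 5))       # first five rows
str(mtcars)                  # column types and a preview of the values
print(summary(mtcars$mpg))   # quartiles, minimum, maximum and mean
print(dim(mtcars))           # number of rows and columns

# Filter rows and pick columns without opening a data editor.
efficient <- mtcars[mtcars$mpg > 30, c("mpg", "cyl", "wt")]
print(efficient)
```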
DRAWBACKS OF R: Difficult architecture management • Problems for large organizations: R is made of several thousand independent packages; no deployment plan for complex organizations; no installation support • Lack of code accountability: thousands of individual, independent R developers; nobody responsible for the quality of the code • Potentially high hidden costs with R: total cost may favour commercial solutions for complex computations made in large corporations
DRAWBACKS OF R: Relatively difficult to learn • Steep learning curve: R code is far from undergraduate computer science courses; very complex data structures (useful if mastered); is R’s syntax illogical? • Still, not more difficult to learn than SAS: both SAS and R are more abstract than basic programming languages (Fortran, C…); difficult to learn = more rewarding professionally!
SO WHY LEARN R? More positive than negative points • No language is perfect: contradictory objectives to meet; strengths and weaknesses of each language • Different needs imply different tools: large corporations + defined procedures → SAS-like; fewer financial resources + quick proof of concept → R • Effect of legacy and the culture of the organization: use existing solutions (system architecture, BA tools…); habits in business analytics
SO WHY LEARN R? Very appealing solution • Figure: popularity of business analytics software (green = very popular, red = unpopular). Source: Rexer Analytics • Columns: Overall, Corporate, Consultants, Academics, NGO/Gov't • Rows: R, SAS, IBM SPSS, STATISTICA, Own code
REFERENCES FOR LEARNING R: Books • Many books available: choose the one that fits you! Consider style, pedagogy, theory vs. practice; browse several books at a local library or store • Springer’s Use R! series (http://www.springer.com/series/6991): recent, concise, good quality, affordable, diverse • Pure rookies: “A Beginner’s Guide to R”, “R by Example” • One step forward: “Business Analytics for Managers” • Intensive Excel users: “R Through Excel” • O’Reilly R series (for programmers): “R Cookbook”, “R in a Nutshell”
REFERENCES FOR LEARNING R: Websites • R official websites: the R Project for Statistical Computing (www.r-project.org); mailing lists (“R-help”, Special Interest Groups) and The R Journal; official (austere) manuals (“An Introduction to R”) • Other websites: UCLA online R resources (http://www.ats.ucla.edu/stat/r/); R blogs aggregator (www.r-bloggers.com); social networks: LinkedIn groups (The R Project for Statistical Computing), Twitter accounts (@RevolutionR, @inside_R), job boards (Analytical Bridge…)
REFERENCES FOR LEARNING R: Conferences • Growing number of conferences about R • Official international useR! conference: held annually over a few days in a new venue (Google it!); lots of materials about many topics • Other conferences or venues: conferences about business analytics (data mining, specialized topics…) with sessions involving R; find (or even start!) an R user group close to your location (R Wiki geographical list, map of groups on “meetup.com”); events and news from the R-bloggers blog
