Statistics and Computing
Series Editors: J. Chambers, D. Hand, W. Härdle
Statistics and Computing

Brusco/Stahl: Branch and Bound Applications in Combinatorial Data Analysis
Chambers: Software for Data Analysis: Programming with R
Dalgaard: Introductory Statistics with R
Gentle: Elements of Computational Statistics
Gentle: Numerical Linear Algebra for Applications in Statistics
Gentle: Random Number Generation and Monte Carlo Methods, 2nd ed.
Härdle/Klinke/Turlach: XploRe: An Interactive Statistical Computing Environment
Hörmann/Leydold/Derflinger: Automatic Nonuniform Random Variate Generation
Krause/Olson: The Basics of S-PLUS, 4th ed.
Lange: Numerical Analysis for Statisticians
Lemmon/Schafer: Developing Statistical Software in Fortran 95
Loader: Local Regression and Likelihood
Ó Ruanaidh/Fitzgerald: Numerical Bayesian Methods Applied to Signal Processing
Pannatier: VARIOWIN: Software for Spatial Data Analysis in 2D
Pinheiro/Bates: Mixed-Effects Models in S and S-PLUS
Unwin/Theus/Hofmann: Graphics of Large Datasets: Visualizing a Million
Venables/Ripley: Modern Applied Statistics with S, 4th ed.
Venables/Ripley: S Programming
Wilkinson: The Grammar of Graphics, 2nd ed.
John M. Chambers

Software for Data Analysis
Programming with R
John Chambers
Department of Statistics–Sequoia Hall
390 Serra Mall
Stanford University
Stanford, CA 94305-4065
USA
jmc@r-project.org

Series Editors:

John Chambers
Department of Statistics–Sequoia Hall
390 Serra Mall
Stanford University
Stanford, CA 94305-4065
USA

W. Härdle
Institut für Statistik und Ökonometrie
Humboldt-Universität zu Berlin
Spandauer Str. 1
D-10178 Berlin
Germany

David Hand
Department of Mathematics
South Kensington Campus
Imperial College London
London, SW7 2AZ
United Kingdom

Java™ is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries. Mac OS® X - Operating System software - is a registered trademark of Apple Computer, Inc. MATLAB® is a trademark of The MathWorks, Inc. MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. S-PLUS® is a registered trademark of Insightful Corporation. UNIX® is a registered trademark of The Open Group. Windows® and/or other Microsoft products referenced herein are either registered trademarks or trademarks of Microsoft Corporation in the U.S. and/or other countries. Star Trek and related marks are trademarks of CBS Studios, Inc.

ISBN: 978-0-387-75935-7
e-ISBN: 978-0-387-75936-4
DOI: 10.1007/978-0-387-75936-4
Library of Congress Control Number: 2008922937

©2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.

9 8 7 6 5 4 3 2 1

springer.com
Preface

This is a book about Software for Data Analysis: using computer software to extract information from some source of data by organizing, visualizing, modeling, or performing any other relevant computation on the data. We all seem to be swimming in oceans of data in the modern world, and tasks ranging from scientific research to managing a business require us to extract meaningful information from the data using computer software.

This book is aimed at those who need to select, modify, and create software to explore data. In a word, programming. Our programming will center on the R system. R is an open-source software project widely used for computing with data and giving users a huge base of techniques. Hence, Programming with R.

R provides a general language for interactive computations, supported by techniques for data organization, graphics, numerical computations, model-fitting, simulation, and many other tasks. The core system itself is greatly supplemented and enriched by a huge and rapidly growing collection of software packages built on R and, like R, largely implemented as open-source software. Furthermore, R is designed to encourage learning and developing, with easy starting mechanisms for programming and also techniques to help you move on to more serious applications. The complete picture (the R system, the language, the available packages, and the programming environment) constitutes an unmatched resource for computing with data.

At the same time, the “with” word in Programming with R is important. No software system is sufficient for exploring data, and we emphasize interfaces between systems to take advantage of their respective strengths.

Is it worth taking time to develop or extend your skills in such programming? Yes, because the investment can pay off both in the ability to ask questions and in the trust you can have in the answers. Exploring data with the right questions and providing trustworthy answers to them are the key to analyzing data, and the twin principles that will guide us.
What’s in the book?

A sequence of chapters in the book takes the reader on successive steps from user to programmer to contributor, in the gradual progress that R encourages. Specifically: using R; simple programming; packages; classes and methods; inter-system interfaces (Chapters 2; 3; 4; 9 and 10; 11 and 12). The order reflects a natural progression, but the chapters are largely independent, with many cross references to encourage browsing.

Other chapters explore computational techniques needed at all stages: basic computations; graphics; computing with text (Chapters 6; 7; 8). Lastly, a chapter (13) discusses how R works and the appendix covers some topics in the history of the language.

Woven throughout are a number of reasonably serious examples, ranging from a few paragraphs to several pages, some of them continued elsewhere as they illustrate different techniques. See “Examples” in the index. I encourage you to explore these as leisurely as time permits, thinking about how the computations evolve, and how you would approach these or similar examples.

The book has a companion R package, SoDA, obtainable from the main CRAN repository, as described in Chapter 4. A number of the functions and classes developed in the book are included in the package. The package also contains code for most of the examples; see the documentation for "Examples" in the package.

Even at five hundred pages, the book can only cover a fraction of the relevant topics, and some of those receive a pretty condensed treatment. Spending time alternately on reading, thinking, and interactive computation will help clarify much of the discussion, I hope. Also, the final word is with the online documentation and especially with the software; a substantial benefit of open-source software is the ability to drill down and see what’s really happening.

Who should read this book?

I’ve written this book with three overlapping groups of readers generally in mind.
First, “data analysts”; that is, anyone with an interest in exploring data, especially in serious scientific studies. This includes statisticians, certainly, but increasingly others in a wide range of disciplines where data-rich studies now require such exploration. Helping to enable exploration is our mission here. I hope and expect that you will find that working with R and related software enhances your ability to learn from the data relevant to your interests.

If you have not used R or S-Plus® before, you should precede this book (or at least supplement it) with a more basic presentation. There are a number of books and an even larger number of Web sites. Try searching with a combination of “introduction” or “introductory” along with “R”. Books by W. John Braun and Duncan J. Murdoch [2], Michael Crawley [11], Peter Dalgaard [12], and John Verzani [24], among others, are general introductions (both to R and to statistics). Other books and Web sites are beginning to appear that introduce R or S-Plus with a particular area of application in mind; again, some Web searching with suitable terms may find a presentation attuned to your interests.

A second group of intended readers are people involved in research or teaching related to statistical techniques and theory. R and other modern software systems have become essential in the research itself and in communicating its results to the community at large. Most graduate-level programs in statistics now provide some introduction to R. This book is intended to guide you on the followup, in which your software becomes more important to your research, and often a way to share results and techniques with the community. I encourage you to push forward and organize your software to be reusable and extendible, including the prospect of creating an R package to communicate your work to others. Many of the R packages now available derive from such efforts.

The third target group are those more directly interested in software and programming, particularly software for data analysis. The efforts of the R community have made it an excellent medium for “packaging” software and providing it to a large community of users.
R is maintained on all the widely used operating systems for computing with data and is easy for users to install. Its package mechanism is similarly well maintained, both in the central CRAN repository and in other repositories. Chapter 4 covers both using packages and creating your own. R can also incorporate work done in other systems, through a wide range of inter-system interfaces (discussed in Chapters 11 and 12).

Many potential readers in the first and second groups will have some experience with R or other software for statistics, but will view their involvement as doing only what’s absolutely necessary to “get the answers”. This book will encourage moving on to think of the interaction with the software as an important and valuable part of your activity. You may feel inhibited by not having done much programming before. Don’t be. Programming with R can be approached gradually, moving from easy and informal to more ambitious projects. As you use R, you will find that one of its strengths is its flexibility. By making simple changes to the commands you are using, you can customize interactive graphics or analysis to suit your needs. This is the takeoff point for programming: As Chapters 3 and 4 show, you can move from this first personalizing of your computations through increasingly ambitious steps to create your own software. The end result may well be your own contribution to the world of R-based software.

How should you read this book?

Any way that you find helpful or enjoyable, of course. But an author often imagines a conversation with a reader, and it may be useful to share my version of that. In many of the discussions, I imagine a reader pausing to decide how to proceed, whether with a specific technical point or to choose a direction for a new stage in a growing involvement with software for data analysis. Various chapters chart such stages in a voyage that many R users have taken from initial, casual computing to a full role as a contributor to the community. Most topics will also be clearer if you can combine reading with hands-on interaction with R and other software, in particular using the Examples in the SoDA package.

This pausing for reflection and computing admittedly takes a little time. Often, you will just want a “recipe” for a specific task, what is often called the “cookbook” approach. By “cookbook” in software we usually imply that one looks a topic up in the index and finds a corresponding explicit recipe. That should work sometimes with this book, but we concentrate more on general techniques and extended examples, with the hope that these will equip readers to deal with a wider range of tasks. For the reader in a hurry, I try to insert pointers to online documentation and other resources.
As an enthusiastic cook, though, I would point out that the great cookbooks offer a range of approaches, similar to the distinction here. Some, such as the essential Joy of Cooking, do indeed emphasize brief, explicit recipes. The best of these books are among the cook’s most valuable resources. Other books, such as Jacques Pépin’s masterful La Technique, teach you just that: techniques to be applied. Still others, such as the classic Mastering the Art of French Cooking by Julia Child and friends, are about learning and about underlying concepts as much as about specific techniques. It’s the latter two approaches that most resemble the goals of the present book. The book presents a number of explicit recipes, but the deeper emphasis is on concepts and techniques. And behind those in turn, there will be two general principles of good software for data analysis.
Acknowledgments

The ideas discussed in the book, as well as the software itself, are the results of projects involving many people and stretching back more than thirty years (see the appendix for a little history).

Such a scope of participants and time makes identifying all the individuals a hopeless task, so I will take refuge in identifying groups, for the most part. The most recent group, and the largest, consists of the “contributors to R”, not easy to delimit but certainly comprising hundreds of people at the least. Centrally, my colleagues in R-core, responsible for the survival, dissemination, and evolution of R itself. These are supplemented by other volunteers providing additional essential support for package management and distribution, both generally and specifically for repositories such as CRAN, BioConductor, omegahat, RForge and others, as well as the maintainers of essential information resources: archives of mailing lists, search engines, and many tutorial documents. Then the authors of the thousands of packages and other software forming an unprecedented base of techniques; finally, the interested users who question and prod through the mailing lists and other communication channels, seeking improvements. This community as a whole is responsible for realizing something we could only hazily articulate thirty-plus years ago, and in a form and at a scale far beyond our imaginings.

More narrowly from the viewpoint of this book, discussions within R-core have been invaluable in teaching me about R, and about the many techniques and facilities described throughout the book. I am only too aware of the many remaining gaps in my knowledge, and of course am responsible for all inaccuracies in the descriptions herein.
Looking back to the earlier evolution of the S language and software, time has brought an increasing appreciation of the contribution of colleagues and management in Bell Labs research in that era, providing a nourishing environment for our efforts, perhaps indeed a unique environment. Rick Becker, Allan Wilks, Trevor Hastie, Daryl Pregibon, Diane Lambert, and W. S. Cleveland, along with many others, made essential contributions.

Since retiring from Bell Labs in 2005, I have had the opportunity to interact with a number of groups, including students and faculty at several universities. Teaching and discussions at Stanford over the last two academic years have been very helpful, as were previous interactions at UCLA and at Auckland University. My thanks to all involved, with special thanks to Trevor Hastie, Mark Hansen, Ross Ihaka and Chris Wild.

A number of the ideas and opinions in the book benefited from collaborations and discussions with Duncan Temple Lang, and from discussions with Robert Gentleman, Luke Tierney, and other experts on R, not that any of them should be considered at all responsible for defects therein.

The late Gene Roddenberry provided us all with some handy terms, and much else to be enjoyed and learned from.

Each of our books since the beginning of S has had the benefit of the editorial guidance of John Kimmel; it has been a true and valuable collaboration; long may it continue.

John Chambers
Palo Alto, California
January, 2008
Contents

1 Introduction: Principles and Concepts
    1.1 Exploration: The Mission
    1.2 Trustworthy Software: The Prime Directive
    1.3 Concepts for Programming with R
    1.4 The R System and the S Language

2 Using R
    2.1 Starting R
    2.2 An Interactive Session
    2.3 The Language
    2.4 Objects and Names
    2.5 Functions and Packages
    2.6 Getting R
    2.7 Online Information About R
    2.8 What’s Hard About Using R?

3 Programming with R: The Basics
    3.1 From Commands to Functions
    3.2 Functions and Functional Programming
    3.3 Function Objects and Function Calls
    3.4 The Language
    3.5 Debugging
    3.6 Interactive Tracing and Editing
    3.7 Conditions: Errors and Warnings
    3.8 Testing R Software

4 R Packages
    4.1 Introduction: Why Write a Package?
    4.2 The Package Concept and Tools
    4.3 Creating a Package
    4.4 Documentation for Packages
    4.5 Testing Packages
    4.6 Package Namespaces
    4.7 Including C Software in Packages
    4.8 Interfaces to Other Software

5 Objects
    5.1 Objects, Names, and References
    5.2 Replacement Expressions
    5.3 Environments
    5.4 Non-local Assignments; Closures
    5.5 Connections
    5.6 Reading and Writing Objects and Data

6 Basic Data and Computations
    6.1 The Evolution of Data in the S Language
    6.2 Object Types
    6.3 Vectors and Vector Structures
    6.4 Vectorizing Computations
    6.5 Statistical Data: Data Frames
    6.6 Operators: Arithmetic, Comparison, Logic
    6.7 Computations on Numeric Data
    6.8 Matrices and Matrix Computations
    6.9 Fitting Statistical models
    6.10 Programming Random Simulations

7 Data Visualization and Graphics
    7.1 Using Graphics in R
    7.2 The x-y Plot
    7.3 The Common Graphics Model
    7.4 The graphics Package
    7.5 The grid Package
    7.6 Trellis Graphics and the lattice Package

8 Computing with Text
    8.1 Text Computations for Data Analysis
    8.2 Importing Text Data
    8.3 Regular Expressions
    8.4 Text Computations in R
    8.5 Using and Writing Perl
    8.6 Examples of Text Computations

9 New Classes
    9.1 Introduction: Why Classes?
    9.2 Programming with New Classes
    9.3 Inheritance and Inter-class Relations
    9.4 Virtual Classes
    9.5 Creating and Validating Objects
    9.6 Programming with S3 Classes
    9.7 Example: Binary Trees
    9.8 Example: Data Frames

10 Methods and Generic Functions
    10.1 Introduction: Why Methods?
    10.2 Method Definitions
    10.3 New Methods for Old Functions
    10.4 Programming Techniques for Methods
    10.5 Generic Functions
    10.6 How Method Selection Works

11 Interfaces I: C and Fortran
    11.1 Interfaces to C and Fortran
    11.2 Calling R-Independent Subroutines
    11.3 Calling R-Dependent Subroutines
    11.4 Computations in C++
    11.5 Loading and Registering Compiled Routines

12 Interfaces II: Other Systems
    12.1 Choosing an Interface
    12.2 Text- and File-Based Interfaces
    12.3 Functional Interfaces
    12.4 Object-Based Interfaces
    12.5 Interfaces to OOP Languages
    12.6 Interfaces to C++
    12.7 Interfaces to Databases and Spreadsheets
    12.8 Interfaces without R

13 How R Works
    13.1 The R Program
    13.2 The R Evaluator
    13.3 Calls to R Functions
    13.4 Calls to Primitive Functions
    13.5 Assignments and Replacements
    13.6 The Language
    13.7 Memory Management for R Objects

A Some Notes on the History of S

Bibliography

Index

Index of R Functions and Documentation

Index of R Classes and Types
Chapter 1

Introduction: Principles and Concepts

This chapter presents some of the concepts and principles that recur throughout the book. We begin with the two guiding principles: the mission to explore and the responsibility to be trustworthy (Sections 1.1 and 1.2). With these as guidelines, we then introduce some concepts for programming with R (Section 1.3, page 4) and add some justification for our emphasis on that system (Section 1.4, page 9).

1.1 Exploration: The Mission

The first principle I propose is that our Mission, as users and creators of software for data analysis, is to enable the best and most thorough exploration of data possible. That means that users of the software must be able to ask the meaningful questions about their applications, quickly and flexibly.

Notice that speed here is human speed, measured in clock time. It’s the time that the actual computations take, but usually more importantly, it’s also the time required to formulate the question and to organize the data in a way to answer it. This is the exploration, and software for data analysis makes it possible. A wide range of techniques is needed to access and transform data, to make predictions or summaries, to communicate results to others, and to deal with ongoing processes.

Whenever we consider techniques for these and other requirements in the chapters that follow, the first principle we will try to apply is the Mission:

How can these techniques help people to carry out this specific kind of exploration?

Ensuring that software for data analysis exists for such purposes is an important, exciting, and challenging activity. Later chapters examine how we can select and develop software using R and other systems.

The importance, excitement, and challenge all come from the central role that data and computing have come to play in modern society. Science, business and many other areas of society continually rely on understanding data, and that understanding frequently involves large and complicated data processes.

A few examples current as the book is written can suggest the flavor:

• Many ambitious projects are underway or proposed to deploy sensor networks, that is, coordinated networks of devices to record a variety of measurements in an ongoing program. The resulting data is essential to understand environmental quality, the mechanisms of weather and climate, and the future of biodiversity in the earth’s ecosystems. In both scale and diversity, the challenge is unprecedented, and will require merging techniques from many disciplines.

• Astronomy and cosmology are undergoing profound changes as a result of large-scale digital mappings enabled by both satellite and ground recording of huge quantities of data. The scale of data collected allows questions to be addressed in an overall sense that before could only be examined in a few, local regions.

• Much business activity is now carried out largely through distributed, computerized processes that both generate large and complex streams of data and also offer through such data an unprecedented opportunity to understand one’s business quantitatively. Telecommunications in North America, for example, generates databases with conceptually billions of records. To explore and understand such data has great attraction for the business (and for society), but is enormously challenging.
These and many other possible examples illustrate the importance of what John Tukey long ago characterized as “the peaceful collision of computing and data analysis”. Progress on any of these examples will require the ability to explore the data, flexibly and in a reasonable time frame.
1.2. TRUSTWORTHY SOFTWARE: THE PRIME DIRECTIVE 3 1.2 Trustworthy Software: The Prime Directive Exploration is our mission; we and those who use our software want to find new paths to understand the data and the underlying processes. The mission is, indeed, to boldly go where no one has gone before. But, we need boldness to be balanced by our responsibility. We have a responsibility for the results of data analysis that provides a key compensating principle. The complexity of the data processes and of the computations applied to them mean that those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis, and by extension the software that produced it. Both the data analyst and the software provider therefore have a strong responsibility to produce a result that is trustworthy, and, if possible, one that can be shown to be trustworthy. This is the second principle: the computations and the software for data analysis should be trustworthy: they should do what they claim, and be seen to do so. Neither those who view the results of data analysis nor, in many cases, the statisticians performing the analysis can directly validate exten- sive computations on large and complicated data processes. Ironically, the steadily increasing computer power applied to data analysis often distances the results further from direct checking by the recipient. The many com- putational steps between original data source and displayed results must all be truthful, or the effect of the analysis may be worthless, if not pernicious. This places an obligation on all creators of software to program in such a way that the computations can be understood and trusted. This obligation I label the Prime Directive. Note that the directive in no sense discourages exploratory or approx- imate methods. 
As John Tukey often remarked, better an approximate answer to the right question than an exact answer to the wrong question. We should seek answers boldly, but always explaining the nature of the method applied, in an open and understandable format, supported by as much evidence of its quality as can be produced. As we will see, a number of more technically specific choices can help us satisfy this obligation.

Readers who have seen the Star Trek® television series¹ may recognize the term “prime directive”. Captains Kirk, Picard, and Janeway and their crews were bound by a directive which (slightly paraphrased) was: Do nothing to interfere with the natural course of a new civilization. Do not distort the development. Our directive is not to distort the message of the data, and to provide computations whose content can be trusted and understood.

¹ Actually, at least five series, from “The Original” in 1966 through “Enterprise”, not counting the animated version, plus many films. See startrek.com and the many reruns if this is a gap in your cultural background.

The prime directive of the space explorers, notice, was not their mission but rather an important safeguard to apply in pursuing that mission. Their mission was to explore, to “boldly go where no one has gone before”, and all that. That’s really our mission too: to explore how software can add new abilities for data analysis. And our own prime directive, likewise, is an important caution and guiding principle as we create the software to support our mission.

Here, then, are two motivating principles: the mission, which is bold exploration; and the prime directive, trustworthy software. We will examine in the rest of the book how to select and program software for data analysis, with these principles as guides. A few aspects of R will prove to be especially relevant; let’s examine those next.

1.3 Concepts for Programming with R

The software and the programming techniques to be discussed in later chapters tend to share some concepts that make them helpful for data analysis. Exploiting these concepts will often benefit both the effectiveness of programming and the quality of the results. Each of the concepts arises naturally in later chapters, but it’s worth outlining them together here for an overall picture of our strategy in programming for data analysis.

Functional Programming

Software in R is written in a functional style that helps both to understand the intent and to ensure that the implementation corresponds to that intent. Computations are organized around functions, which can encapsulate specific, meaningful computational results, with implementations that can be examined for their correctness. The style derives from a more formal theory of functional programming that restricts the computations to obtain well-defined or even formally verifiable results.
Clearly, programming in a fully functional manner would contribute to trustworthy software. The S language does not enforce a strict functional programming approach, but does carry over some of the flavor, particularly when you make some effort to emphasize simple functional definitions with minimal use of non-functional computations.

As the scope of the software expands, much of the benefit from functional style can be retained by using functional methods to deal with varied types
of data, within the general goal defined by the generic function.

Classes and Methods

The natural complement to functional style in programming is the definition of classes of objects. Where functions should clearly encapsulate the actions in our analysis, classes should encapsulate the nature of the objects used and returned by calls to functions. The duality between function calls and objects is a recurrent theme of programming with R. In the design of new classes, we seek to capture an underlying concept of what the objects mean. The relevant techniques combine directly specifying the contents (the slots), relating the new class to existing classes (the inheritance), and expressing how objects should be created and validated (methods for initializing and validating).

Method definitions knit together functions and classes. Well-designed methods extend the generic definition of what a function does to provide a specific computational method when the argument or arguments come from specified classes, or inherit from those classes. In contrast to methods that are solely class-based, as in common object-oriented programming languages such as C++ or Java, methods in R are part of a rich but complex network of functional and object-based computation.

The ability to define classes and methods in fact is itself a major advantage in adhering to the Prime Directive. It gives us a way to isolate and define formally what information certain objects should contain and how those objects should behave when functions are applied to them.

Data Frames

Trustworthy data analysis depends first on trust in the data being analyzed.
Not so much that the data must be perfect, which is impossible in nearly any application and in any case beyond our control, but rather that trust in the analysis depends on trust in the relation between the data as we use it and the data as it has entered the process and then has been recorded, organized and transformed.

In serious modern applications, the data usually comes from a process external to the analysis, whether generated by scientific observations, commercial transactions or any of many other human activities. To access the data for analysis by well-defined and trustworthy computations, we will benefit from having a description, or model, for the data that corresponds to its natural home (often in DBMS or spreadsheet software), but can also be
a meaningful basis for data as used in the analysis. Transformations and restructuring will often be needed, but these should be understandable and defensible.

The model we will emphasize is the data frame, essentially a formulation of the traditional view of observations and variables. The data frame has a long history in the S language but modern techniques for classes and methods allow us to extend the use of the concept. Particularly useful techniques arise from using the data frame concept both within R, for model-fitting, data visualization, and other computations, and also for effective communication with other systems. Spreadsheets and relational database software both relate naturally to this model; by using it along with unambiguous mechanisms for interfacing with such software, the meaning and structure of the data can be preserved. Not all applications suit this approach by any means, but the general data frame model provides a valuable basis for trustworthy organization and treatment of many sources of data.

Open Source Software

Turning to the general characteristics of the languages and systems available, note that many of those discussed in this book are open-source software systems; for example, R, Perl, Python, many of the database systems, and the Linux operating system. These systems all provide access to source code sufficient to generate a working version of the software. The arrangement is not equivalent to “public-domain” software, by which people usually mean essentially unrestricted use and copying. Instead, most open-source systems come with a copyright, usually held by a related group or foundation, and with a license restricting the use and modification of the software. There are several versions of license, the best known being the GNU General Public License and its variants (see gnu.org/copyleft/gpl.html), the famous GPL.
R is distributed under a version of this license (see the "COPYING" file in the home directory of R). A variety of other licenses exists; those accepted by the Open Source Initiative are described at opensource.org/licenses. Distinctions among open-source licenses generate a good deal of heat in some discussions, often centered on what effect the license has on the usability of the software for commercial purposes. For our focus, particularly for the concern with trustworthy software for data analysis, these issues are not directly relevant. The popularity of open-source systems certainly owes a lot to their being thought of as “free”, but for our goal of trustworthy software, this is also not the essential property. Two other characteristics contribute more. First, the simple openness itself allows any sufficiently
competent observer to enquire fully about what is actually being computed. There are no intrinsic limitations to the validation of the software, in the sense that it is all there. Admittedly, only a minority of users are likely to delve very far into the details of the software, but some do. The ability to examine and critique every part of the software makes for an open-ended scope for verifying the results.

Second, open-source systems demonstrably generate a spirit of community among contributors and active users. User groups, e-mail lists, chat rooms and other socializing mechanisms abound, with vigorous discussion and controversy, but also with a great deal of effort devoted to testing and extension of the systems. The active and demanding community is a key to trustworthy software, as well as to making useful tools readily available.

Algorithms and Interfaces

R is explicitly seen as built on a set of routines accessed by an interface, in particular by making use of computations in C or Fortran. User-written extensions can make use of such interfaces, but the core of R is itself built on them as well. Aside from routines that implement R-dependent techniques, there are many basic computations for numerical results, data manipulation, simulation, and other specific computational tasks. These implementations we can term algorithms. Many of the core computations on which the R software depends are now implemented by collections of such software that are widely used and tested. The algorithm collections have a long history, often predating the larger-scale open-source systems. It’s an important concept in programming with R to seek out such algorithms and make them part of a new computation. You should be able to import the trust built up in the non-R implementation to make your own software more trustworthy.
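As a rough illustration of relying on compiled algorithms from within R, the sketch below solves a small linear system and then checks the answer rather than taking it on faith. It assumes a standard R installation, in which the documentation for solve() states that real systems are handed to LAPACK routines; the matrix A and vector b are invented purely for the example.

```r
# Hypothetical 2 x 2 system, chosen only for illustration.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

# solve() passes the work to compiled numerical code rather than R-level loops.
x <- solve(A, b)

# Verify the result: A %*% x should reproduce b up to rounding error.
residual <- drop(A %*% x) - b
stopifnot(max(abs(residual)) < 1e-12)

# Report which LAPACK version this R session is linked against.
lapack_version <- La_version()
print(lapack_version)

# Primitive functions such as sum() are implemented directly in C.
print(is.primitive(sum))
```

The explicit residual check is the point of the sketch: trust in the underlying routine is reasonable, but a cheap verification in the calling code makes that trust visible.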
Major collections on a large scale and many smaller, specialized algorithms have been written, generally in the form of subroutines in Fortran, C, and a few other general programming languages. Thirty-plus years ago, when I was writing Computational Methods for Data Analysis, those who wanted to do innovative data analysis often had to work directly from such routines for numerical computations or simulation, among other topics. That book expected readers to search out the routines and install them in the readers’ own computing environment, with many details left unspecified.

An important and perhaps under-appreciated contribution of R and other systems has been to embed high-quality algorithms for many computations in the system itself, automatically available to users. For example, key parts of the LAPACK collection of computations for numerical linear algebra
are included in R, providing a basis for fitting linear models and for other matrix computations. Other routines in the collection may not be included, perhaps because they apply to special datatypes or computations not often encountered. These routines can still be used with R in nearly all cases, by writing an interface to the routine (see Chapter 11).

Similarly, the internal code for pseudo-random number generation includes most of the well-regarded and thoroughly tested algorithms for this purpose. Other tasks, such as sorting and searching, also use quality algorithms. Open-source systems provide an advantage when incorporating such algorithms, because alert users can examine in detail the support for computations. In the case of R, users do indeed question and debate the behavior of the system, sometimes at great length, but overall to the benefit of our trust in programming with R.

The best of the algorithm collections offer another important boost for trustworthy software in that the software may have been used in a wide variety of applications, including some where quality of results is critically important. Collections such as LAPACK are among the best-tested substantial software projects in existence, and not only by users of higher-level systems. Their adaptability to a wide range of situations is also a frequent benefit.

The process of incorporating quality algorithms in a user-oriented system such as R is ongoing. Users can and should seek out the best computations for their needs, and endeavor to make these available for their own use and, through packages, for others as well.

Incorporating algorithms in the sense of subroutines in C or Fortran is a special case of what we call inter-system interfaces in this book. The general concept is similar to that for algorithms.
Many excellent software systems exist for a variety of purposes, including text-manipulation, spreadsheets, database management, and many others. Our approach to software for data analysis emphasizes R as the central system, for reasons outlined in the next section. In any case, most users will prefer to have a single home system for their data analysis.

That does not mean that we should or can absorb all computations directly into R. This book emphasizes the value of expressing computations in a natural way while making use of high-quality implementations in whatever system is suitable. A variety of techniques, explored in Chapter 12, allows us to retain a consistent approach in programming with R at the same time.
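The data frame model and the idea of an unambiguous interface to other software, both outlined in this section, can be sketched together in a few lines. The example below is only one simple possibility: it uses CSV text, which spreadsheet programs can read and write, as the exchange format. The data frame obs and its column names are invented for illustration.

```r
# A small, invented data frame: rows are observations, columns are variables.
obs <- data.frame(
  id     = 1:3,
  group  = c("a", "b", "a"),
  height = c(1.52, 1.67, 1.80),
  stringsAsFactors = FALSE
)

# Write to CSV, a plain-text format that spreadsheet software can open.
path <- tempfile(fileext = ".csv")
write.csv(obs, path, row.names = FALSE)

# Read it back, stating the intended column types rather than guessing them.
back <- read.csv(path,
                 colClasses = c("integer", "character", "numeric"),
                 stringsAsFactors = FALSE)

# Confirm that structure and values survived the round trip.
stopifnot(identical(names(back), names(obs)))
stopifnot(isTRUE(all.equal(back, obs)))
str(back)
```

Declaring colClasses explicitly is a deliberate choice here: it records what each variable is meant to be, so a silent change of type during transfer shows up as an error instead of passing unnoticed.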
1.4 The R System and the S Language

This book includes computations in a variety of languages and systems, for tasks ranging from database management to text processing. Not all systems receive equal treatment, however. The central activity is data analysis, and the discussion is from the perspective that our data analysis is mainly expressed in R; when we examine computations, the results are seen from an interactive session with R. This view does not preclude computations done partly or entirely in other systems, and these computations may be complete in themselves. The data analysis that the software serves, however, is nearly always considered to be in R.

Chapter 2 covers the use of R broadly but briefly (if you have no experience with it, you might want to consult one of the introductory books or other sources mentioned on page vii in the preface). The present section gives a brief summary of the system and relates it to the philosophy of the book.

R is an open-source software system, supported by a group of volunteers from many countries. The central control is in the hands of a group called R-core, with the active collaboration of a much larger group of contributors. The base system provides an interactive language for numerical computations, data management, graphics and a variety of related calculations. It can be installed on Windows, Mac OS X, and Linux operating systems, with a variety of graphical user interfaces. Most importantly, the base system is supported by well over a thousand packages on the central repository cran.r-project.org and in other collections.

R began as a research project of Ross Ihaka and Robert Gentleman in the 1990s, described in a paper in 1996 [17]. It has since expanded into software used to implement and communicate most new statistical techniques.
The software in R implements a version of the S language, which was designed much earlier by a group of us at Bell Laboratories, described in a series of books ([1], [6], and [5] in the bibliography).

The S-Plus system also implements the S language. Many of the computations discussed in the book work in S-Plus as well, although there are important differences in the evaluation model, noted in later chapters. For more on the history of S, see Appendix A, page 475.

The majority of the software in R is itself written in the same language used for interacting with the system, a dialect of the S language. The language evolved in essentially its present form during the 1980s, with a generally functional style, in the sense used on page 4: The basic unit of programming is a function. Function calls usually compute an object that is a
function of the objects passed in as arguments, without side effects to those arguments. Subsequent evolution of the language introduced formal classes and methods, again in the sense discussed in the previous section. Methods are specializations of functions according to the class of one or more of the arguments. Classes define the content of objects, both directly and through inheritance. R has added a number of features to the language, while remaining largely compatible with S. All these topics are discussed in the present book, particularly in Chapters 3 for functions and basic programming, 9 for classes, and 10 for methods.

So why concentrate on R? Clearly, and not at all coincidentally, R reflects the same philosophy that evolved through the S language and the approach to data analysis at Bell Labs, and which largely led me to the concepts I’m proposing in this book. It is relevant that S began as a medium for statistics researchers to express their own computations, in support of research into data analysis and its applications. A direct connection leads from there to the large community that now uses R similarly to implement new ideas in statistics, resulting in the huge resource of R packages.

Added to the characteristics of the language is R’s open-source nature, exposing the system to continual scrutiny by users. It includes some algorithms for numerical computations and simulation that likewise reflect modern, open-source computational standards in these fields. The LAPACK software for numerical linear algebra is an example, providing trustworthy computations to support statistical methods that depend on linear algebra.

Although there is plenty of room for improvement and for new ideas, I believe R currently represents the best medium for quality software in support of data analysis, and for the implementation of the principles espoused in the present book.
From the perspective of our first development of S some thirty-plus years ago, it’s a cause for much gratitude and not a little amazement.
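The notions of slots, inheritance, validation, generic functions and methods introduced in this chapter can be made concrete with a brief sketch. It is only an illustration of the general mechanism using the methods package; the class names "Track" and "LabelledTrack" and the generic extent() are hypothetical, invented for this example.

```r
library(methods)  # formal classes and methods; attached by default in most sessions

# A hypothetical class: the slots state what every object must contain,
# and the validity function states what makes an object acceptable.
setClass("Track",
         representation(x = "numeric", y = "numeric"),
         validity = function(object) {
           if (length(object@x) != length(object@y))
             "slots x and y must have the same length"
           else TRUE
         })

# A subclass inherits the slots of "Track" and adds one of its own.
setClass("LabelledTrack",
         representation(label = "character"),
         contains = "Track")

# A generic function states the general goal ...
setGeneric("extent", function(object) standardGeneric("extent"))

# ... and a method supplies the computation for a specific class.
setMethod("extent", "Track", function(object) {
  c(x = diff(range(object@x)), y = diff(range(object@y)))
})

t1 <- new("LabelledTrack", x = c(1, 4, 2), y = c(10, 30, 20), label = "demo")

# The method defined for "Track" is selected through inheritance.
ext <- extent(t1)
print(ext)

# Objects that fail validation are rejected when they are created.
bad <- tryCatch(new("Track", x = c(1, 2, 3), y = 1),
                error = function(e) "rejected")
print(bad)
```

Note that extent() returns a new object and leaves its argument untouched, in keeping with the functional style described above; the validity function is where a class can state, and enforce, what its objects are supposed to mean.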
Chapter 2

Using R

This chapter covers the essentials for using R to explore data interactively. Section 2.1 covers basic access to an R session. Users interact with R through a single language for both data analysis and programming (Section 2.3, page 19). The key concepts are function calls in the language and the objects created and used by those calls (2.4, 24), two concepts that recur throughout the book. The huge body of available software is organized around packages that can be attached to the session, once they are installed (2.5, 25). The system itself can be downloaded and installed from repositories on the Web (2.6, 29); there are also a number of resources on the Web for information about R (2.7, 31). Lastly, we examine aspects of R that may raise difficulties for some new users (2.8, 34).

2.1 Starting R

R runs on the commonly used platforms for personal computing: Windows®, Mac OS X®, Linux, and some versions of UNIX®. In the usual desktop environments for these platforms, users will typically start R as they would most applications, by clicking on the R icon or on the R file in a folder of applications.

An application will then appear looking much like other applications on the platform: for example, a window and associated toolbar. In the
standard version, at least on most platforms, the application is called the "R Console". In Windows recently it looked like this:

[Figure: the R Console window on Windows]

The application has a number of drop-down menus; some are typical of most applications ("File", "Edit", and "Help"). Others such as "Packages" are special to R. The real action in running R, however, is not with the menus but in the console window itself. Here the user is expected to type input to R in the form of expressions; the program underlying the application responds by doing some computation and if appropriate by displaying a version of the results for the user to look at (printed results normally in the same console window, graphics typically in another window).

This interaction between user and system continues, and constitutes an R session. The session is the fundamental user interface to R. The following section describes the logic behind it. A session has a simple model for user interaction, but one that is fundamentally different from users’ most common experience with personal computers (in applications such as word processors, Web browsers, or audio/video systems). First-time users may feel abandoned, left to flounder on their own with little guidance about what to do and even less help when they do something wrong. More guidance is available than may be obvious, but such users are not entirely wrong in their
reaction. After intervening sections present the essential concepts involved in using R, Section 2.8, page 34 revisits this question.

2.2 An Interactive Session

Everything that you do interactively with R happens in a session. A session starts when you start up R, typically as described above. A session can also be started from other special interfaces or from a command shell (the original design), without changing the fundamental concept and with the basic appearance remaining as shown in this section and in the rest of the book. Some other interfaces arise in customizing the session, on page 17.

During an R session, you (the user) provide expressions for evaluation by R, for the purpose of doing any sort of computation, displaying results, and creating objects for further use. The session ends when you decide to quit from R.

All the expressions evaluated in the session are just that: general expressions in R’s version of the S language. Documentation may mention “commands” in R, but the term just refers to a complete expression that you type interactively or otherwise hand to R for evaluation. There’s only one language, used for either interactive data analysis or for programming, and described in Section 2.3. Later sections in the book come back to examine it in more detail, especially in Chapter 3.

The R evaluator displays a prompt, and the user responds by typing a line of text. Printed output from the evaluation and other messages appear following the input line. Examples in the book will be displayed in this form, with the default prompts preceding the user’s input:

> quantile(Declination)
    0%    25%    50%    75%   100%
-27.98 -11.25   8.56  17.46  27.30

The "> " at the beginning of the example is the (default) prompt string.
In this example the user responded with

quantile(Declination)

The evaluator will keep prompting until the input can be interpreted as a complete expression; if the user had left off the closing ")", the evaluator would have prompted for more input. Since the input here is a complete expression, the system evaluated it. To be pedantic, it parsed the input text
and evaluated the resulting object. The evaluation in this case amounts to calling a function named quantile.

The printed output may suggest a table, and that’s intentional. But in fact nothing special happened; the standard action by the evaluator is to print the object that is the value of the expression. All evaluated expressions are objects; the printed output corresponds to the object; specifically, the form of printed output is determined by the kind of object, by its class (technically, through a method selected for that class). The call to quantile() returned a numeric vector, that is, an object of class "numeric". A method was selected based on this class, and the method was called to print the result shown. The quantile() function expects a vector of numbers as its argument; with just this one argument it returns a numeric vector containing the minimum, maximum, median and quartiles.

The method for printing numeric vectors prints the values in the vector, five of them in this case. Numeric objects can optionally have a names attribute; if they do, the method prints the names as labels above the numbers. So the "0%" and so on are part of the object. The designer of the quantile() function helpfully chose a names attribute for the result that makes it easier to interpret when printed.

All these details are unimportant if you’re just calling quantile() to summarize some data, but the important general concept is this:

Objects are the center of computations in R, along with the function calls that create and use those objects.

The duality of objects and function calls will recur in many of our discussions. Computing with existing software hinges largely on using and creating objects, via the large number of available functions. Programming, that is, creating new software, starts with the simple creation of function objects.
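Both halves of this idea can be tried directly in a session. The sketch below inspects the object returned by quantile() and then creates a small function object. The vector values is invented data standing in for the book's Declination, and iqr_width() is a hypothetical helper written for this illustration, not part of R.

```r
# Invented data standing in for the book's Declination vector.
values <- c(-27.98, -11.25, 8.56, 17.46, 27.30, 3.10, -5.40)

q <- quantile(values)

# The result is an ordinary numeric vector ...
print(class(q))
# ... whose labels are stored in its names attribute.
print(names(q))

# Removing the names changes what is printed, not the numbers themselves.
print(unname(q))

# Creating new software starts with a function object,
# assigned to a name like any other object.
iqr_width <- function(x) {
  qs <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  qs[2] - qs[1]
}
print(class(iqr_width))
print(iqr_width(values))
```

Comparing the printed forms of q and unname(q) shows that the table-like appearance comes entirely from the names attribute and the print method, not from any special treatment of quantile() by the evaluator.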
More ambitious projects often use a paradigm of creating new classes of objects, along with new or modified functions and methods that link the functions and classes. In all the details of programming, the fundamental duality of objects and functions remains an underlying concept.

Essentially all expressions are evaluated as function calls, but the language includes some forms that don’t look like function calls. Included are the usual operators, such as arithmetic, discussed on page 21. Another useful operator is `?`, which looks up R help for the topic that follows the question mark. To learn about the function quantile():

> ?quantile

In standard GUI interfaces, the documentation will appear in a separate window, and can be generated from a pull-down menu as well as from the
`?` operator.

Graphical displays provide some of the most powerful techniques in data analysis, and functions for data visualization and other graphics are an essential part of R:

> plot(Date, Declination)

Here the user typed another expression, plot(Date, Declination); in this case producing a scatter plot as a side effect, but no printed output. The graphics during an interactive session typically appear in one or more separate windows created by the GUI, in this example a window using the native quartz() graphics device for Mac OS X. Graphic output can also be produced in a form suitable for inclusion in a document, such as output in a general file format (PDF or postscript, for example). Computations for graphics are discussed in more detail in Chapter 7.

The sequence of expression and evaluation shown in the examples is essentially all there is to an interactive session. The user supplies expressions and the system evaluates them, one after another. Expressions that produce simple summaries or plots are usually done to see something, either graphics or printed output. Aside from such immediate gratification, most expressions are there in order to assign objects, which can then be used in later computations:

> fitK <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial)

Evaluating this expression calls the function gam() and assigns the value of the call, associating that object with the name fitK. For the rest of the
session, unless some other assignment to this name is carried out, fitK can be used in any expression to refer to that object; for example, coef(fitK) would call a function to extract some coefficients from fitK (which is in this example a fitted model).

Assignments are a powerful and interesting part of the language. The basic idea is all we need for now, and is in any case the key concept: Assignment associates an object with a name. The term “associates” has a specific meaning here. Whenever any expression is evaluated, the context of the evaluation includes a local environment, and it is into this environment that the object is assigned, under the corresponding name. The object and name are associated in the environment, by the assignment operation. From then on, the name can be used as a reference to the object in the environment. When the assignment takes place at the “top level” (in an input expression in the session), the environment involved is the global environment. The global environment is part of the current session, and all objects assigned there remain available for further computations in the session.

Environments are an important part of programming with R. They are also tricky to deal with, because they behave differently from other objects. Discussion of environments continues in Section 2.4, page 24.

A session ends when the user quits from R, either by evaluating the expression q() or by some other mechanism provided by the user interface. Before ending the session, the system offers the user a chance to save all the objects in the global environment at the end of the session:

> q()
Save workspace image? [y/n/c]: y

If the user answers yes, then when a new session is started in the same working directory, the global environment will be restored. Technically, the environment is restored, not the session.
Some actions you took in the session, such as attaching packages or using options(), may not be restored, if they don’t correspond to objects in the global environment.

Unfortunately, your session may end involuntarily: the evaluator may be forced to terminate the session or some outside event may kill the process. R tries to save the workspace even when fatal errors occur in low-level C or Fortran computations, and such disasters should be rare in the core R computations and in well-tested packages. But to be truly safe, you should explicitly back up important results to a file if they will be difficult to recreate. See documentation for functions save() and dump() for suitable techniques.
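A minimal sketch of such a backup follows, using save() for a binary image and dump() for R source text. The objects fit_summary and notes are invented stand-ins for results that would be costly to recreate, and temporary files are used only so that the example leaves nothing behind; in practice you would presumably choose a file in your project directory.

```r
# Invented objects standing in for results that would be costly to recreate.
fit_summary <- list(coef = c(a = 1.5, b = -0.25), n = 81L)
notes <- "illustrative object only"

# save() writes a binary image of the named objects.
rdata_path <- tempfile(fileext = ".RData")
save(fit_summary, notes, file = rdata_path)

# dump() writes R source text that recreates the objects when sourced.
dump_path <- tempfile(fileext = ".R")
dump(c("fit_summary", "notes"), file = dump_path)

# Restore into a fresh environment so nothing in the session is overwritten.
restored <- new.env()
load(rdata_path, envir = restored)
stopifnot(identical(restored$fit_summary, fit_summary))

# The dumped source can be evaluated the same way.
resourced <- new.env()
source(dump_path, local = resourced)
stopifnot(isTRUE(all.equal(resourced$fit_summary, fit_summary)))
```

Loading into a separate environment, rather than the global environment, is a cautious habit: it lets you compare the restored objects with whatever is currently in the session before deciding which to keep.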
Customizing the R session

As you become a more involved user of R, you may want to customize your interaction with it to suit your personal preferences or the goals motivating your applications. The nature of the system lends itself to a great variety of options from the most general to trivial details.

At the most general is the choice of user interface. So far, we have assumed you will start R as you would start other applications on your computer, say by clicking on the R icon.

A second approach, available on any system providing both R and a command shell, is to invoke R as a shell command. In its early history, S in all its forms was typically started as a program from an interactive shell. Before multi-window user interfaces, the shell would be running on an interactive terminal of some sort, or even on the machine’s main console. Nowadays, shells or terminal applications run in their own windows, either supported directly by the platform or indirectly through a client window system, such as those based on X11. Invoking R from a shell allows some flexibility that may not be provided directly by the application (such as running with a C-level debugger). Online documentation from a shell command is printed text by default, which is not as convenient as a browser interface. To initiate a browser interface to the help facility, see the documentation for help.start().

A third approach, somewhat in between the first two, is to use a GUI based on another application or language, potentially one that runs on multiple platforms. The most actively supported example of this approach is ESS, a general set of interface tools in the emacs editor. ESS stands for Emacs Speaks Statistics, and the project supports other statistical systems as well as R; see ess.r-project.org.
For those who love emacs as a general computational environment, ESS provides a variety of GUI-like features, plus a user-interface programmability characteristic of emacs. The use of a GUI based on a platform-independent user interface has advantages for those who need to work regularly on more than one operating system.

Finally, an R session can be run in a non-interactive form, usually invoked in a batch mode from a command shell, with its input taken from a file or other source. R can also be invoked from within another application, as part of an inter-system interface.

In all these situations, the logic of the R session remains essentially the same as shown earlier (the major exception being a few computations in R that behave differently in a non-interactive session).
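Code that needs to behave differently in the two kinds of session can test for the difference with the function interactive(). The sketch below is only an illustration; the function name describeSession is made up.

```r
# interactive() is TRUE in an ordinary session and FALSE when R runs
# in batch mode, for example from the shell commands
#   R CMD BATCH script.R
#   Rscript script.R
describeSession <- function() {
    if (interactive())
        "interactive: a user is available to respond"
    else
        "non-interactive: input comes from a file or other source"
}
cat(describeSession(), "\n")
```

Running the same lines from a file in batch mode and by typing them into a session should show the two different messages.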
18 CHAPTER 2. USING R

Encoding of text

A major advance in R’s world view came with the adoption of multiple locales, using information available to the R session that defines the user’s preferred encoding of text and other options related to the human language and geographic location. R follows some evolving standards in this area. Many of those standards apply to C software, and therefore they fit fairly smoothly into R.

Normally, default locales will have been set when R was installed that reflect local language and other conventions in your area. See Section 8.1, page 293, and ?locales for some concepts and techniques related to locales. The specifications use standard but somewhat unintuitive terminology; unless you have a particular need to alter behavior for parsing text, sorting character data, or other specialized computations, caution suggests sticking with the default behavior.

Options during evaluation

R offers mechanisms to control aspects of evaluation in the session. The function options() is used to share general-purpose values among functions. Typical options include the width of printed output, the prompt string shown by the parser, and the default device for graphics. The options() mechanism maintains a named list of values that persist through the session; functions use those values, by extracting the relevant option via getOption():

    > getOption("digits")
    [1] 7

In this case, the value is meant to be used to control the number of digits in printing numerical data. A user, or in fact any function, can change this value, by using the same name as an argument to options():

    > 1.234567890
    [1] 1.234568
    > options(digits = 4)
    > 1.234567890
    [1] 1.235

For the standard options, see ?options; however, a call to options() can be used by any computation to set values that are then used by any other computation. Any argument name is legal and will cause the corresponding option to be communicated among functions.
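The following sketch illustrates the mechanism just described, under a few assumptions: the option name myPkg.verbose and the function formatShort are invented for the example, and restoring the previous values is one common convention, not something the mechanism requires.

```r
# options() returns the previous values of the options it changes,
# so they can be put back afterwards.
old <- options(digits = 4)
format(pi)                  # formatted with the new setting
options(old)                # restore the previous setting

# Any argument name is legal; "myPkg.verbose" is a made-up option.
options(myPkg.verbose = TRUE)
getOption("myPkg.verbose")

# getOption() takes a default, used when the option has not been set.
getOption("myPkg.noSuchOption", default = "fallback")

# A function that changes an option only for the duration of its call.
formatShort <- function(x) {
    old <- options(digits = 3)
    on.exit(options(old))   # restored on exit, even after an error
    format(x)
}
formatShort(pi)
```

Restoring the old values through on.exit() keeps the change local to the call, which limits the effect one function can have on another through shared options.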
Options can be set from the beginning of the session; see ?Startup. However, saving a workspace image does not cause the options in effect to be saved and restored. Although the options() mechanism does use an R object, .Options, the internal C code implementing options() takes the object from the base package, not from the usual way of finding objects. The code also enforces some constraints on what’s legal for particular options; for example, "digits" is interpreted as a single integer, which is not allowed to be too small or too large, according to values compiled into R.

The use of options() is convenient and even necessary for the evaluator to behave intelligently and to allow user customization of a session. Writing functions that depend on options, however, reduces our ability to understand these functions’ behavior, because they now depend on external, changeable values. The behavior of code that depends on an option may be altered by any other function called at any earlier time during the session, if the other function calls options(). Most R programming should be functional programming, in the sense that each function call performs a well-defined computation depending only on the arguments to that call. The options() mechanism, and other dependencies on external data that can change during the session, compromise functional programming. It may be worth the danger, but think carefully about it. See page 47 for more on the programming implications, and for an example of the dangers.

2.3 The Language

This section and the next describe the interactive language as you need to use it during a session. But as noted on page 13, there is no interactive language, only the one language used for interaction and for programming. To use R interactively, you basically need to understand two things: functions and objects.
That same duality, functions and objects, runs through everything in R from an interactive session to designing large-scale software. For interaction, the key concepts are function calls and assignments of objects, dealt with in this section and in section 2.4 respectively. The language also has facilities for iteration and testing (page 22), but you can often avoid interactive use of these, largely because R function calls operate on, and return, whole objects.

Function Calls

As noted in Section 2.2, the essential computation in R is the evaluation of a call to a function. Function calls in their ordinary form consist of
the function’s name followed by a parenthesized argument list; that is, a sequence of arguments separated by commas.

    plot(Date, Declination)
    glm(Survived ~ .)

Arguments in function calls can be any expression. Each function has a set of formal arguments, to which the actual arguments in the call are matched. As far as the language itself is concerned, a call can supply any subset of the complete argument list. For this purpose, argument expressions can optionally be named, to associate them with a particular argument of the function:

    jitter(y, amount = .1 * rse)

The second argument in the call above is explicitly matched to the formal argument named amount. To find the argument names and other information about the function, request the online documentation. A user interface to R or a Web browser gives the most convenient access to documentation, with documentation listed by package and within package by topic, including individual functions by name. Documentation can also be requested in the language, for example:

    > ?jitter

This will produce some display of documentation for the topic "jitter", including in the case of a function an outline of the calling sequence and a discussion of individual arguments. If there is no documentation, or you don’t quite believe it, you can find the formal argument names from the function object itself:

    > formalArgs(jitter)
    [1] "x" "factor" "amount"

Behind this, and behind most techniques involving functions, is the simple fact that jitter and all functions are objects in R. The function name is a reference to the corresponding object. So to see what a function does, just type its name with no argument list following.

    > jitter
    function (x, factor = 1, amount = NULL)
    {
        if (length(x) == 0)
            return(x)
        if (!is.numeric(x))
            stop("'x' must be numeric")
    etc.
The printed version is another R expression, meaning that you can input such an expression to define a function. At which point, you are programming in R. See Chapter 3. The first section of that chapter should get you started.

In principle, the function preceding the parenthesized arguments can be specified by any expression that returns a function object, but in practice functions are nearly always specified by name.

Operators

Function calls can also appear as operator expressions in the usual scientific notation.

    y - mean(y)
    weight > 0
    x < 100 | is.na(date)

The usual operators are defined for arithmetic, comparisons, and logical operations (see Chapter 6). But operators in R are not built-in; in fact, they are just special syntax for certain function calls. The first line in the example above computes the same result as:

    `-`(y, mean(y))

The notation `-` is an example of what are called backtick quotes in R. These quotes make the evaluator treat an arbitrary string of characters as if it was a name in the language. The evaluator responds to the names "y" or "mean" by looking for an object of that name in the current environment. Similarly `-` causes the evaluator to look for an object named "-". Whenever we refer to operators in the book we use backtick quotes to emphasize that this is the name of a function object, not treated as intrinsically different from the name mean.

Functions to extract components or slots from objects are also provided in operator form:

    mars$Date
    classDef@package

And the expressions for extracting subsets or elements from objects are also actually just specialized function calls. The expression y[i] is recognized in the language and evaluated as a call to the function `[`, which extracts a subset of the object in its first argument, with the subset defined by the remaining arguments. The expression y[i] is equivalent to:
    `[`(y, i)

You could enter the second form perfectly legally. Similarly, the function `[[` extracts a single element from an object, and is normally presented as an operator expression:

    mars[["Date"]]

You will encounter a few other operators in the language. Frequently useful for elementary data manipulation is the `:` operator, which produces a sequence of integers between its two arguments:

    1:length(x)

Other operators include `~`, used in specifying models, `%%` for modulus, `%*%` for matrix multiplication, and a number of others.

New operators can be created and recognized as infix operators by the parser. The last two operators mentioned above are examples of the general convention in the language that interprets %text% as the name of an operator, for any text string. If it suits the style of computation, you can define any function of two arguments and give it, say, the name `%d%`. Then an expression such as

    x %d% y

will be evaluated as the call:

    `%d%`(x, y)

Iteration: A quick introduction

The language used by R has the iteration and conditional expressions typical of a C-style language, but for the most part you can avoid typing all but the simplest versions interactively. The following is a brief guide to using and avoiding iterative expressions.

The workhorse of iteration is the for loop. It has the form:

    for( var in seq ) expr
where var is a name and seq is a vector of values. The loop assigns each element of seq to var in sequence and then evaluates the arbitrary expression expr each time. When you use the loop interactively, you need to either show something each time (printed or graphics) or else assign the result somewhere; otherwise, you won’t get any benefit from the computation. For example, the function plot() has several “types” of x-y plots (points, lines, both, etc.). To repeat a plot with different types, one can use a for() loop over the codes for the types:

    > par(ask=TRUE)
    > for(what in c("p","l","b")) plot(Date, Declination, type = what)

The call to par() caused the graphics to pause between plots, so we get to see each plot, rather than having the first two flash by. The variables Date and Declination come from some data on the planet Mars, in a data frame object, mars (see Section 6.5, page 176). If we wanted to see the class of each of the 17 variables in that data frame, another for() loop would do it:

    for(j in names(mars)) print(class(mars[,j]))

But this will just print 17 lines of output, which we’ll need to relate to the variable names. Not much use.

Here’s where an alternative to iteration is usually better. The workhorse of these is the function sapply(). It applies a function to each element of the object it gets as its first argument, so:

    > sapply(mars,class)
         Year         X    Year.1     Month
    "integer" "logical" "integer" "integer"
          Day Day..adj.      Hour       Min
    etc.

The function tries to simplify the result, and is intelligent enough to include the names as an attribute. See ?sapply for more details, and the “See Also” section of that documentation for other similar functions.

The language has other iteration operators (while() and repeat), and the usual conditional operators (if ... else). These are all useful in programming and discussed in Chapter 3. By the time you need to use them in a non-trivial way interactively, in fact, you should consider turning your computation into a function, so Chapter 3 is indeed the place to look; see Section 3.4, page 58, in particular, for more detail about the language.
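The points made in this section about operators and iteration can be tried on a small scale. The sketch below does not use the mars data; the vector y, the data frame df and the definition chosen for `%d%` (an absolute difference) are all invented for illustration.

```r
y <- c(2, 4, 6, 8)
i <- 2:3

# Operator expressions and the equivalent explicit function calls.
identical(y - mean(y), `-`(y, mean(y)))
identical(y[i], `[`(y, i))

# A user-defined infix operator. The name %d% follows the text; the
# definition is an arbitrary choice made for this example.
`%d%` <- function(x, y) abs(x - y)
3 %d% 10

# A small made-up data frame standing in for the mars data.
df <- data.frame(n = 1:3, x = c(0.5, 1.5, 2.5), s = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# The loop has to print (or assign) to be of any use ...
for (j in names(df)) print(class(df[, j]))

# ... whereas sapply() collects the results and keeps the names.
classes <- sapply(df, class)
classes
```

The result of sapply() is an ordinary named character vector, so it can be assigned, subset, or passed to other functions, which the printed output of the loop cannot.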
2.4 Objects and Names

A motto in discussion of the S language has for many years been: everything is an object. You will have a potentially very large number of objects available in your R session, including functions, datasets, and many other classes of objects. In ordinary computations you will create new objects or modify existing ones.

As in any computing language, the ability to construct and modify objects relies on a way to refer to the objects. In R, the fundamental reference to an object is a name. This is an essential concept for programming with R that arises throughout the book and in nearly any serious programming project.

The basic concept is once again the key thing to keep in mind: references to objects are a way for different computations in the language to refer to the same object; in particular, to make changes to that object. In the S language, references to ordinary objects are only through names. And not just names in an abstract, global sense. An object reference must be a name in a particular R environment. Typically, the reference is established initially either by an assignment or as an argument in a function call. Assignment is the obvious case, as in the example on page 15:

    > fitK <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial)

Assignment creates a reference, the name "fitK", to some object. That reference is in some environment. For now, just think of environments as tables that R maintains, in which objects can be assigned names. When an assignment takes place in the top-level of the R session, the current environment is what’s called the global environment. That environment is maintained throughout the current session, and optionally can be saved and restored between sessions.

Assignments appear inside function definitions as well. These assignments take place during a call to the function. They do not use the global environment, fortunately.
If they did, every assignment to the name "x" would overwrite the same reference. Instead, assignments during function calls use an environment specially created for that call. So another reason that functions are so central to programming with R is that they protect users from accidentally overwriting objects in the middle of a computation.

The objects available during an interactive R session depend on what packages are attached; technically, they depend on the nested environments through which the evaluator searches, when given a name, to find a corresponding object. See Section 5.3, page 121, for the details of the search.
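A short sketch may make the protection described above concrete. The function names setLocalX and callEnv are invented for the example.

```r
x <- "outer value"

# The assignment inside the function is made in the environment
# created for that call, so the x assigned above is left alone.
setLocalX <- function() {
    x <- "local value"
    x
}
setLocalX()
x

# Each call gets a new environment of its own: two calls to the same
# function return two different environment objects.
callEnv <- function() environment()
identical(callEnv(), callEnv())
```

Typed at the top level of a session, the first assignment is made in the global environment, and x still has its original value after the call to setLocalX().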
2.5 Functions and Packages

In addition to the software that comes with any copy of R, there are many thousands of functions available to be used in an R session, along with a correspondingly large amount of other related software. Nearly all of the important R software comes in the form of packages that make the software easily available and usable. This section discusses the implications of using different packages in your R session. For much more detail, see Chapter 4, but that is written more from the view of writing or extending a package. You will get there, I hope, as your own programming efforts take shape. The topic here, though, is how best to use other people’s efforts that have been incorporated in packages.

The process leading from needing some computational tool to having it available in your R session has three stages: finding the software, typically in a package; installing the package; and attaching the package to the session.

The last step is the one you will do most often, so let’s begin by assuming that you know which package you need and that the required package has been installed with your local copy of R. See Section 2.5, page 26, for finding and installing the relevant package.

You can tell whether the package is attached by looking for it in the printed result of search(); alternatively, you can look for a particular object with the function find(), which returns the names of all the attached packages that contain the object. Suppose we want to call the function dotplot(), for example.

    > find("dotplot")
    character(0)

No attached package has an object of this name. If we happen to know that the function is in the package named lattice, we can make that package available for the current session. A call to the function library() requests this:

    library(lattice)

The function is library() rather than package() only because the original S software called them libraries.
Notice also that the package name was given without quotes. The library() function, and a similar function require(), do some nonstandard evaluation that takes unquoted names. That’s another historical quirk that saves users from typing a couple of quote characters. If a package of the name "lattice" has been installed for this version of R, the call will attach the package to the session, making its functions and other objects available:
    > library(lattice)
    > find("dotplot")
    [1] "package:lattice"

By “available”, we mean that the evaluator will find an object belonging to the package when an expression uses the corresponding name. If the user types dotplot(Declination) now, the evaluator will normally find the appropriate function. To see why the quibbling “normally” was added, we need to say more precisely what happens to find a function object.

The evaluator looks first in the global environment for a function of this name, then in each of the attached packages, in the order shown by search(). The evaluator will generally stop searching when it finds an object of the desired name, dotplot, Declination, or whatever. If two attached packages have functions of the same name, one of them will “mask” the object in the other (the evaluator will warn of such conflicts, usually, when a package is attached with conflicting names). In this case, the result returned by find() would show two or more packages.

For example, the function gam() exists in two packages, gam and mgcv. If both were attached:

    > find("gam")
    [1] "package:gam" "package:mgcv"

A simple call to gam() will get the version in package gam; the version in package mgcv is now masked.

R has some mechanisms designed to get around such conflicts, at least as far as possible. The language has an operator, `::`, to specify that an object should come from a particular package. So mgcv::gam and gam::gam refer unambiguously to the versions in the two packages. The masked version of gam() could be called by:

    > fitK <- mgcv::gam(Kyphosis ~ s(Age, 4) + etc.

Clearly one doesn’t want to type such expressions very often, and they only help if one is aware of the ambiguity. For the details and for other approaches, particularly when you’re programming your own packages, see Section 5.3, page 121.

Finding and installing packages

Finding the right software is usually the hardest part.
There are thousands of packages and smaller collections of R software in the world. Section 2.7, page 31, discusses ways to search for information; as a start, CRAN, the
central repository for R software, has a large collection of packages itself, plus further links to other sources for R software. Extended browsing is recommended, to develop a general feel for what’s available. CRAN supports searching with the Google search engine, as do some of the other major collections. Use the search engine on the Web site to look for relevant terms. This may take some iteration, particularly if you don’t have a good guess for the actual name of the function. Browse through the search output, looking for a relevant entry, and figure out the name of the package that contains the relevant function or other software.

Finding something which is not in these collections may take more ingenuity. General Web search techniques often help: combine the term "R" with whatever words describe your needs in a search query. The e-mail lists associated with R will usually show up in such a search, but you can also browse or search explicitly in the archives of the lists. Start from the R home page, r-project.org, and follow the link for "Mailing Lists".

On page 15, we showed a computation using the function gam(), which fits a generalized additive model to data. This function is not part of the basic R software. Before being able to do this computation, we need to find and install some software. The search engine at the CRAN site will help out, if given either the function name "gam" or the term "generalized additive models". The search engine on the site tends to give either many hits or no relevant hits; in this case, it turns out there are many hits and in fact two packages with a gam() function. As an example, suppose we decide to install the gam package.

There are two choices at this point, in order to get and install the package(s) in question: a binary or a source copy of the package. Usually, installing from binary is the easy approach, assuming a binary version is available from the repository.
Binary versions are currently available from CRAN only for Windows and Mac OS X platforms, and may or may not be available from other sources. Otherwise, or if you prefer to install from source, the procedure is to download a copy of the source archive for the package and apply the "INSTALL" command. From an R session, the function install.packages() can do part or all of the process, again depending on the package, the repository, and your particular platform. The R GUI may also have a menu-driven equivalent for these procedures: Look for an item in the tool bar about installing packages. First, here is the function install.packages(), as applied on a Mac OS X platform. To obtain the gam package, for example:
    install.packages("gam")

The function will then invoke software to access a CRAN site, download the packages requested, and attempt to install them on the same R system you are currently using. The actual download is an archive file whose name concatenates the name of the package and its current version; in our example, "gam_0.98.tgz".

Installing from inside a session has the advantage of implicitly specifying some of the information that you might otherwise need to provide, such as the version of R and the platform. Optional arguments control where to put the installed packages, whether to use source or binary and other details.

As another alternative, you can obtain the download file from a Web browser, and run the installation process from the command shell. If you aren’t already at the CRAN Web site, select that item in the navigation frame, choose a mirror site near you, and go there.

Select "Packages" from the CRAN Web page, and scroll or search in the list of packages to reach a package you want (it’s a very long list, so searching for the exact name of the package may be required). Selecting the relevant package takes you to a page with a brief description of the package. For the package gam at the time this is written:

At this stage, you can access the documentation or download one of the proffered versions of the package. Or, after studying the information, you could revert to the previous approach and use install.packages(). If you do work from one of the source or binary archives, you need to apply the shell-style command to install the package. Having downloaded the source archive for package gam, the command would be:
    R CMD INSTALL gam_0.98.tar.gz

The INSTALL utility is used to install packages that we write ourselves as well, so detailed discussion appears in Chapter 4.

The package for this book

In order to follow the examples and suggested computations in the book, you should install the SoDA package. It is available from CRAN by any of the mechanisms shown above. In addition to the many references to this package in the book itself, it will be a likely source for new ideas, enhancements, and corrections related to the book.

2.6 Getting R

R is an open-source system, in particular a system licensed under the GNU General Public License. That license requires that the source code for the system be freely available. The current source implementing R can be obtained over the Web. This open definition of the system is a key support when we are concerned with trustworthy software, as is the case with all similar open-source systems.

Relatively simple use of R, and first steps in programming with R, on the other hand, don’t require all the resources that would be needed to create your local version of the system starting from the source. You may already have a version of R on your computer or network. If not, or if you want a more recent version, binary copies of R can be obtained for the commonly used platforms, from the same repository. It’s easier to start with binary, although as your own programming becomes more advanced you may need more of the source-related resources anyway.

The starting point for obtaining the software is the central R Web site, r-project.org. You can go there to get the essential information about R. Treat that as the up-to-date authority, not only for the software itself but also for detailed information about R (more on that on page 31).

The main Web site points you to a variety of pages and other sites for various purposes. To obtain R, one goes to the CRAN repository, and from there to either "R Binaries" or "R Sources".
Downloading software may involve large transfers over the Web, so you are encouraged to spread the load. In particular, you should select from a list of mirror sites, preferably picking one geographically near your own location. When we talk about the
CRAN site from now on, we mean whichever one of the mirror sites you have chosen.

R is actively maintained for three platforms: Windows, Mac OS X, and Linux. For these platforms, current versions of the system can be obtained from CRAN in a form that can be directly installed, usually by a standard installation process for that platform. For Windows, one obtains an executable setup program (a ".exe" file); for Mac OS X, a disk image (a ".dmg" file) containing the installer for the application. The Linux situation is a little less straightforward, because the different flavors of Linux differ in details when installing R. The Linux branch of "R Binaries" branches again according to the flavors of Linux supported, and sometimes again within these branches according to the version of this flavor. The strategy is to keep drilling down through the directories, selecting at each stage the directory that corresponds to your setup, until you finally arrive at a directory that contains appropriate files (usually ".rpm" files) for the supported versions of R.

Note that for at least one flavor of Linux (Debian), R has been made a part of the platform. You can obtain R directly from the Debian Web site. Look for Debian packages named "r-base", and other names starting with "r-". If you’re adept at loading packages into Debian, working from this direction may be the simplest approach. However, if the version of Debian is older than the latest stable version of R, you may miss out on some later improvements and bug fixes unless you get R from CRAN.

For any platform, you will eventually download a file (".exe", ".dmg", ".rpm", or other), and then install that file according to the suitable ritual for this platform. Installation may require you to have some administration privileges on the machine, as would be true for most software installations.
(If installing software at all is a new experience for you, it may be time to seek out a more experienced friend.) Depending on the platform, you may have a choice of versions of R, but it’s unlikely you want anything other than the most recent stable version, the one with the highest version number. The platform’s operating system will also have versions, and you generally need to download a file asserted to work with the version of the operating system you are running. (There may not be any such file if you have an old version of the operating system, or else you may have to settle for a comparably ancient version of R.) And just to add further choices, on some platforms you need to choose from different hardware (for example, 32-bit versus 64-bit architecture). If you don’t know which choice applies, that may be another indication that you should seek expert advice.

Once the binary distribution has been downloaded and installed, you should have direct access to R in the appropriate mechanism for your platform.

Installing from source

Should you? For most users of R, not if they can avoid it, because they will likely learn more about programming than they need to or want to. For readers of this book, on the other hand, many of these details will be relevant when you start to seriously create or modify software. Getting the source, even if you choose not to install it, may help you to study and understand key computations.

The instructions for getting and for installing R from source are contained in the online manual, R Installation and Administration, available from the Documentation link at the r-project.org Web site.

2.7 Online Information About R

Information for users is in various ways both a strength and a problem with open-source, cooperative enterprises like R. At the bottom, there is always the source, the software itself. By definition, no software that is not open to study of all the source code can be as available for deep study. In this sense, only open-source software can hope to fully satisfy the Prime Directive by offering unlimited examination of what is actually being computed.

But on a more mundane level, some open-source systems have a reputation for favoring technical discussions aimed at the insider over user-oriented documentation. Fortunately, as the R community has grown, an increasing effort has gone into producing and organizing information. Users who have puzzled out answers to practical questions have increasingly fed back the results into publicly available information sources.

Most of the important information sources can be tracked down starting at the main R Web page, r-project.org. Go there for the latest pointers. Here is a list of some of the key resources, followed by some comments about them.

Manuals: The R distribution comes with a set of manuals, also available at the Web site.
There are currently six manuals: An Introduction to R, Writing R Extensions, R Data Import/Export, The R Language Definition, R Installation and Administration, and R Internals. Each is available in several formats, notably as Web-browsable HTML documents.
Help files: R itself comes with files that document all the functions and other objects intended for public use, as well as documentation files on other topics (for example, ?Startup, discussing how an R session starts). All contributed packages should likewise come with files documenting their publicly usable functions. The quality control tools in R largely enforce this for packages on CRAN.

Help files form the database used to respond to the help requests from an R session, either in response to the Help menu item or through the `?` operator or help() function typed by the user. The direct requests in these forms only access terms explicitly labeling the help files; typically, the names of the functions and a few other general terms for documentation (these are called aliases in discussions of R documentation). For example, to get help on a function in this way, you must know the name of the function exactly. See the next item for alternatives.

Searching: R has a search mechanism for its help files that generalizes the terms available beyond the aliases somewhat and introduces some additional searching flexibility. See ?help.search for details.

The r-project.org site has a pointer to a general search of the files on the central site, currently using the Google search engine. This produces much more general searches. Documentation files are typically displayed in their raw, LaTeX-like form, but once you learn a bit about this, you can usually figure out which topic in which package you need to look at.

And, beyond the official site itself, you can always apply your favorite Web search to files generally. Using "R" as a term in the search pattern will usually generate appropriate entries, but it may be difficult to avoid plenty of inappropriate ones as well.

The Wiki: Another potentially useful source of information about R is the site wiki.r-project.org, where users can contribute documentation.
As with other open Wiki sites, this comes with no guarantee of accuracy and is only as good as the contributions the community provides. But it has the key advantage of openness, meaning that in some “statistical” sense it reflects what R users understand, or at least that subset of the users sufficiently vocal and opinionated to submit to the Wiki.
The strength of this information source is that it may include material that users find relevant but that developers ignore for whatever reason (too trivial, something users would never do, etc.). Some Wiki sites have sufficient support from their user community that they can function as the main information source on their topic. As of this writing, the R Wiki has not reached that stage, so it should be used as a supplement to other information sources, and not the primary source, but it’s a valuable resource nevertheless.
The mailing lists: There are a number of e-mail lists associated officially with the R project (officially in the sense of having a pointer from the R Web page, r-project.org, and being monitored by members of R core). The two most frequently relevant lists for programming with R are r-help, which deals with general user questions, and r-devel, which deals generally with more “advanced” questions, including future directions for R and programming issues.
As well as a way to ask specific questions, the mailing lists are valuable archives for past discussions. See the various search mechanisms pointed to from the mailing list Web page, itself accessible as the Mailing lists pointer on the r-project.org site. As usual with technical mailing lists, you may need patience to wade through some long tirades and you should also be careful not to believe all the assertions made by contributors, but often the lists will provide a variety of views and possible approaches.
Journals: The electronic journal R News is the newsletter of the R Foundation, and a good source for specific tutorial help on topics related to R, among other R-related information. See the Newsletter pointer on the cran.r-project.org Web site. The Journal of Statistical Software is also an electronic journal; its coverage is more general as its name suggests, but many of the articles are relevant to programming with R.
See the Web site jstatsoft.org. A number of print journals also have occasional articles of direct or indirect relevance, for example, Journal of Computational and Graphical Statistics and Computational Statistics and Data Analysis.
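As a minimal sketch of the help and search requests described in the items above, the following R expressions could be typed in a session. The topics and search strings ("Startup", "linear model", "regression") are only illustrative choices, and apropos() is an additional base function not mentioned in the text.

```r
# Exact-name requests go through the aliases of the installed help files.
# At the console, ?Startup or help("Startup") displays the page; assigning
# the result instead only records which documentation file was found.
h <- help("Startup")

# When the exact name is not known, search the help files by keyword.
# At the console, ??"linear model" is shorthand for the same request.
res <- help.search("linear model")

# The search can be narrowed to particular fields and packages.
res_stats <- help.search("regression",
                         fields = c("alias", "title"),
                         package = "stats")

# apropos() is a different tool: it matches the names of objects on the
# search path rather than the text of the help files.
read_fns <- apropos("^read\\.")
```

Printing res lists the matching topics by package; which entries appear depends on the packages installed in the local library.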
2.8 What’s Hard About Using R?
This chapter has outlined the computations involved in using R. An R session consists of expressions provided by the user, typically typed into an R console window. The system evaluates these expressions, usually either showing the user results (printed or graphic output) or assigning the result as an object. Most expressions take the form of calls to functions, of which there are many thousands available, most of them in R packages available on the Web.
This style of computing combines features found in various other languages and systems, including command shells and programming languages. The combination of a functional style with user-level interaction—expecting the user to supply functional expressions interactively—is less common. Beginning users react in many ways, influenced by their previous experience, their expectations, and the tasks they need to carry out. Most readers of this book have selected themselves for more than a first encounter with the software, and so will mostly not have had an extremely negative reaction. Examining some of the complaints may be useful, however, to understand how the software we create might respond (and the extent to which we can respond). Our mission of supporting effective exploration of data obliges us to try.
The computational style of an R session is extremely general, and other aspects of the system reinforce that generality, as illustrated by many of the topics in this book (the general treatment of objects and the facilities for interacting with other systems, for example). In response to this generality, thousands of functions have been written for many techniques. This diversity has been cited as a strength of the system, as indeed it is. But for some users exactly this computational style and diversity present barriers to using the system.
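The two outcomes of evaluation described above, showing a result or assigning it as an object, can be sketched with a few expressions. The data values and the names x and m are invented for illustration.

```r
# An expression typed at the console is evaluated and its value is shown.
sqrt(2) * 3

# Assigning the value stores it as an object; nothing is printed.
x <- c(4.1, 5.6, 3.9, 6.2)
m <- mean(x)

# Typing the object's name is itself an expression, which shows the value.
m

# Most expressions are function calls; even an operator is a function,
# so the next two lines request the same computation.
1 + 2
`+`(1, 2)
```

Every step here has to be composed by the user as a functional expression, which is the style of interaction the rest of this section contrasts with selection from menus.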
Requiring the user to compose expressions is very different from the mode of interaction users have with typical applications in current computing. Applications such as searching the Web, viewing documents, or playing audio and video files all present interfaces emphasizing selection-and-response rather than composing by the user. The user selects each step in the computation, usually from a menu, and then responds to the options presented by the software as a result. When the user does have to compose (that is, to type) it is typically to fill in specific information such as a Web site, file or optional feature desired. The eventual action taken, which might be operationally equivalent to evaluating an expression in R, is effectively defined by the user’s interactive path through menus, forms and other specialized tools in the interface. Based on the principles espoused
in this book, particularly the need for trustworthy software, we might object to a selection-and-response approach to serious analysis, because the ability to justify or reproduce the analysis is much reduced. However, most non-technical computing is done by selection and response. Even for more technical applications, such as producing documents or using a database system, the user’s input tends to be relatively free form. Modern document-generating systems typically format text according to selected styles chosen by the user, rather than requiring the user to express controls explicitly. These differences are accentuated when the expressions required of the R user take the form of a functional, algebraic language rather than free-form input.
This mismatch between requirements for using R and the user’s experience with other systems contributes to some common complaints. How does one start, with only a general feeling of the statistical goals or the “results” wanted? The system itself seems quite unhelpful at this stage. Failures are likely, and the response to them also seems unhelpful (being told of a syntax error or some detailed error in a specific function doesn’t suggest what to do next). Worse yet, computations that don’t fail may not produce any directly useful results, and how can one decide whether this was the “right” computation?
Such disjunctions between user expectations and the way R works become more likely as the use of R spreads. From the most general view, there is no “solution”. Computing is being viewed differently by two groups of people, prospective users on one hand, and the people who created the S language, R and the statistical software extending R on the other hand. The S language was designed by research statisticians, initially to be used primarily by themselves and their colleagues for statistical research and data analysis. (See the Appendix, page 475.)
A language suited for this group to communicate their ideas (that is, to “program”) is certain to be pitched at a level of abstraction and generality that omits much detail necessary for users with less mathematical backgrounds. The increased use of R and the growth in software written using it bring it to the notice of such potential users far more than was the case in the early history of S. In addition to questions of expressing the analysis, simply choosing an analysis is often part of the difficulty. Statistical data analysis is far from a routine exercise, and software still does not encapsulate all the expertise needed to choose an appropriate analysis. Creating such expert software has been a recurring goal, pursued most actively perhaps in the 1980s, but it must be said that the goal remains far off. So to a considerable extent the response to such user difficulties must
the Mountains, and hover over the Coast Region generally, literally deluging Western Oregon and Washington, at certain seasons of the year, with rains and fogs. The year before, at Fort Vancouver, they had had one hundred and twenty consecutive days of rain, in one year, without counting the intervening showers; and they said, it wasn't "much of a year for rain" either! Another year, they didn't see the sun there for eighty days together, without reckoning the occasional fogs. No wonder the Oregonians are called "Web-Feet." They do say, the children there are all born web-footed, like ducks and geese, so as to paddle about, and thus get along well in that amphibious region. Perhaps this is rather strong, even for Darwinism; but I can safely vouch for Oregon's all-sufficing rains and fogs, whatever their effects on the species. Our fellow-passengers down the Columbia were chiefly returning miners, going below to winter and recruit; but rough as they were and merry at times, they were, as a rule, self-respecting and orderly. Our Fenian friends, who had raced with us down Powder River and Grande Ronde Valleys and across the Blue Mountains, turned up here again—"Shanks," "Fatty," and all—and subsequently embarked on the same steamer with us at Portland for San Francisco. A few Chinamen also were on board; but they behaved civilly, and were treated kindly.
CHAPTER XVI. FORT VANCOUVER TO SAN FRANCISCO. Fort Vancouver is an old Government Post, established in 1849, when Washington Territory was still a part of Oregon, and all the great region there was yet a wilderness. The village of Vancouver, a parasite on its outskirts, had grown up gradually; but had long since been distanced by Portland, across the Columbia in Oregon. A fine plateau, with a bold shore, made the Post everything desirable; but back of the post-grounds, the unbroken forest was still everywhere around it. It was now Headquarters of the Department of the Columbia, and the base for all military operations in that section. Here troops and supplies were gathered, for all the posts up the Columbia and its tributaries; though Portland, rather, seemed to be the natural brain of all that region. So, too, it controlled and supplied the forts at the mouth of the Columbia and the posts on Puget Sound; and, indeed, was of prime importance to the Government in many ways. Gen. Steele, in command of the Department, was an old Regular officer, who during the war commanded first in Missouri, afterwards around Vicksburg, then in Arkansas, and always with ability. He is now no more (dying in 1868), but some things he related in speaking of the war seem worth preserving. He said, Gen. Sherman was undoubtedly a great soldier; but he owed much to the rough schooling of his first campaigns, and improved from year to year. He said, Sherman in '62 was "scary" about Price's movements in Missouri and cited as an instance, that he once ordered the depot at Rolla broken up and the troops withdrawn, for fear Price would "gobble up" everybody and everything. He (Steele) then a Colonel, but in command at Rolla, appealed to Gen. Halleck, and was allowed
to remain; and subsequently Sherman, with his customary frankness, admitted his mistake. So, he said, Sherman in '63, when campaigning around Vicksburg, had little confidence in Grant's famous movement to the rear, via Grand Gulf and the Big Black, though the results were so magnificent. He said Sherman was somewhere up the Yazoo, with Porter and the gun-boats, and from there wrote him (Steele), in command of the Corps during Sherman's absence, that the proposed movement was perilous, and would probably fail, ruining them all; but, "nevertheless," he added, right loyally, "We must support Grant cordially and thoroughly, dear Steele, whatever happens." Subsequently, after they had landed at Grand Gulf—repulsed Pemberton and hurled him back on Vicksburg—cleaned Joe Johnston out of Jackson and chased him out of the country—and were crossing the Big Black in triumph, the movement now apparently a sure thing, Sherman and he were lying down to rest a little, at a house near the bridge, while the troops were filing over. Presently, an orderly announced Gen. Grant and staff riding by, when Sherman instantly sprang up, and rushing out of the house bareheaded seized Grant by the hand, and shaking it very warmly exclaimed, "I congratulate you, General, with all my heart, on the success of your movement. And, by heaven, sir, the movement is yours, too; for nobody else would endorse it!" He added, he never heard of Sherman's "protesting" against the movement, as reported afterwards in the newspapers, and didn't believe he ever had—"was too soldierly, by far, for that"—but he (Steele), knew all the facts at the time, and the above was about the Truth of History. Poor Steele! He was a true Army bachelor, fond of horses and dogs, and a connoisseur in both. He was besides a man of fine intelligence, and after dinner told a camp-story capitally. I remember several he told, with great gusto, while we shared his cosy quarters at Vancouver; but have not space for them here. 
Afterwards, we met him again in San Francisco, on leave of absence, the beloved of all army circles, and the favorite of society. May he rest in peace!
But to return to Fort Vancouver. We spent several days there very pleasantly, getting the bearings of things from there as a centre, and were loath to leave its hospitable quarters. It was now the first week in December; but the grapes were still hanging on the vines at Maj. N.'s quarters, and all about the post the grass was springing fresh and green, as in April in the East. We had fog or rain, or both together, about every day; no heavy down-pours, however, but gentle drizzles, as if the Oregon-Washington sky was only a great sieve, with perpetual water on 'tother side. They said, this was their usual weather from fall to spring, and then they had a delightful summer; though sometimes occasional snow-storms, sweeping down from the Mountains in January or February, gave them a taste of winter. Such snows, however, were light, and never lasted long. It seems, the Gulf Stream of the Pacific, sweeping up from the tropics, bears the isothermal lines so far north on this coast, that here at Fort Vancouver in the latitude of Montreal, they have the climate of the Carolinas in winter, with little of their excessive heats in summer. Walla-Walla, in latitude 46°, boasts the range of Washington, D. C. in 39°; and San Francisco, on the line of New York, claims the climate of Savannah. One evening while there, after a day of weary rain, the clouds suddenly broke away, and just at sunset we caught another noble view of Mount Hood again. A thin, veil-like cloud enrobed his feet, extending much of the way up; but above, his heaven-kissing head rose right regally, and his snowy crown became transfigured through all the changes—from pink to purple, and into night—as the day faded out. He looked still loftier and grander, than we had yet seen him, as if piercing the very sky, and was really superb. Aye, superbus. Haughty, imperial, supremely proud—which is about what the Romans meant, if I mistake not. 
A ride of six miles down the Columbia, on the little steamer Fanny Troup, and then twelve miles up the Willamette, landed us at Portland, Oregon, the metropolis of all that region. The distance from Fort Vancouver, as the crow flies, is only about six miles, but by water it is fully eighteen, as above stated. Here we found a thrifty busy town, of eight or ten thousand people, with all the eastern
evidences of substantial wealth and prosperity. Much of the town was well built, and the rest was rapidly changing for the better. Long rows of noble warehouses lined the wharves, many of the stores were large and even elegant, and off in the suburbs handsome residences were already springing up, notwithstanding the abounding stumps nearly everywhere. The town seemed unfortunately located, the river-plateau was so narrow there; but just across the Willamette was East Portland, a growing suburb, with room plenty and to spare. A ferry-boat, plying constantly, connected the two places, and made them substantially one. Portland already boasted water, gas, and Nicholson pavements; and had more of a solid air and tone, than any city we had seen since leaving the Missouri. The rich black soil, on which she stands, makes her streets in the rainy season, as then, sloughs or quagmires, unless macadamised or Nicholsoned; but she was at work on these, and they promised soon to be in good condition. Several daily papers, two weekly religious ones, and a fine Mercantile Library, all spoke well for her intelligence and culture, while her Public School buildings and her Court-House would have been creditable anywhere. The New England element was noticeable in many of her citizens, and Sunday came here once a week, as regularly as in Boston or Bangor. The Methodists and Presbyterians both worshipped in goodly edifices, and the attendance at each the Sunday we were there was large and respectable. Being the first city of importance north of San Francisco, and the brain of our northwest coast, Portland was full of energy and vigor, and believed thoroughly in her future. The great Oregon Steam Navigation Company had their headquarters here, and poured into her lap all the rich trade of the Columbia and its far-reaching tributaries, that tap Idaho, Montana, and even British America itself. 
So, also, the coastwise steamers, from San Francisco up, all made Portland their terminus, and added largely to her commerce. Back of her lay the valley of the Willamette, and the rich heart of Oregon; and her wharves, indeed, were the gateways to thousands of miles of territory and trade, in all directions. Nearer to the Sandwich
Islands and China, by several hundred miles, than California, she had already opened a brisk trade with both, and boasted that she could sell sugars, teas, silks, rice, etc., cheaper than San Francisco. Victoria, the British city up on Puget Sound, had once been a dangerous rival; but Portland had managed to beat her out of sight, and claimed now she would keep her beaten. It was Yankee Doodle against John Bull; and, of course, in such a contest, Victoria went to the wall! It seemed singular, however, that the chief city of the northwest coast should be located there—a hundred miles from the sea, and even then twelve miles up the little Willamette. Your first thought is, Portland has no right to be at all, where she now is. But, it appears, she originally got a start, from absorbing and controlling the large trade of the Willamette, and when the Columbia was opened up to navigation rapidly grew into importance, by her heavy dealings in flour, wool, cattle, lumber, etc. The discovery of mines in Idaho and Montana greatly invigorated her, and now she had got so much ahead, and so much capital and brains were concentrated here, that it seemed hard for any new place to compete with her successfully. [14] Moreover, we were told, there are no good locations for a town along the Columbia from the ocean up to the Willamette, nor on the Willamette up to Portland. Along the Columbia, from the ocean up, wooded hills and bluffs come quite down to the water, and the whole back country, as a rule, is still a wilderness of pines and firs; while the Willamette up to Portland, they said, was apt to overflow its banks in high water. Hence, Portland seemed secure in her supremacy, at least for years to come, though no doubt at no distant day a great city will rise on Puget Sound, that will dominate all that coast, up to Sitka and down to San Francisco. 
From want of time, we failed to reach the Posts on Puget's Sound; but all accounts agreed, that—land-locked by Vancouver's and San Juan islands—we there have one of the largest and most magnificent harbors in the world. With the Northern Pacific Railroad linking it to Duluth and the great lakes, commerce will yet seek its great advantages; and the Boston, if not the New York, of the Pacific will yet flourish where now are
only the wilds of Washington. The Sound already abounded in saw-mills, and the ship-timber and lumber of Washington we subsequently found famed in San Francisco, and throughout California. She was then putting lumber down in San Francisco, cheaper than the Californians could bring it from their own foot-hills, and her magnificent forests of fir and pine promised yet to be a rare blessing to all the Pacific Coast. The Portlanders, of course, were energetic, go-ahead men, from all parts of the North, with a good sprinkling from the South. Outside of Portland, however, the Oregonians appeared to be largely from Missouri, and to have retained many of their old Missouri and so-called "conservative" ideas still. All through our Territories, indeed, Missouri seemed to have been fruitful of emigrants. Kentucky, Indiana, Illinois, were everywhere well represented; but Missouri led, especially in Idaho and Oregon. This fact struck us repeatedly, and was well accounted for by friend Meacham's remark (top of the Blue Mountains), "the left wing of Price's army is still encamped in this region." The tone of society, in too many places, seemed to be of the Nasby order, if not worse. No doubt hundreds of deserters and draft-sneaks, from both armies, had made their way into those distant regions; and then, besides, the influence of our old officials, both civil and military, had long been pro-slavery, and this still lingered among communities, whom the war had not touched, and among whom school-houses and churches were still far too few. Of course, we met some right noble and devoted Union men everywhere, especially in Colorado; but elsewhere, and as a rule, they did not strike us as numerous, nor as very potential. In saying this, I hope I am not doing the Territories injustice; but this is how their average public opinion impressed a passing traveller, and other tourists we met en route remarked the same thing. 
Here at Portland, John Chinaman turned up again, and seemed to be behaving thoroughly well. At Boisè, we found these heathen paying their stage-fare, and riding down to the Columbia, while many Caucasians were walking, and here at Portland they appeared alike
thrifty and prosperous. Their advent here had been comparatively recent, and there was still much prejudice against them, especially among the lower classes; but they were steadily winning their way to public favor by their sobriety, their intelligence and thrift, and good conduct generally. Washing and ironing, and household service generally, seemed to be their chief occupations, and nearly everybody gave them credit for industry and integrity. Mr. Arrigoni, the proprietor of our hotel (and he was one of the rare men, who know how to "keep a hotel"), spoke highly of their capacity and honesty, and said he wanted no better servants anywhere. One of them, not over twenty-one, had a contract to do the washing and ironing for the Arrigoni House, at a hundred dollars per month, and was executing it with marked fidelity. He certainly did his work well, judging by what we saw of the hotel linen. In walking about the town, we occasionally came upon their signs, over the door of some humble dwelling, as for example, "Ling & Ching, Laundry;" "Hop Kee, washing and ironing;" "Ching Wing, shoemaker;" "Chow Pooch, doctor;" etc. As far as we could see, they appeared to be intent only on minding their own business, and as a class were doing more hearty honest work by far, than most of their bigoted defamers. We could not refrain from wishing them well, they were so sober, industrious, and orderly; for, after all, are not these the first qualities of good citizenship the world over? We left Portland, Dec. 11th, on the good steamer Oriflamme, for San Francisco. For a wonder, it was a calm clear day, with the bracing air of our Octobers in the east, and as we glided out of the Willamette into the noble Columbia, we had a last superb view of Mts. Jefferson, Hood, Adams and St. Helens all at the same time. Sometimes Rainier also is visible from here, but ordinarily only Hood and St. Helens appear. 
We thought this the finest view of these splendid snow-peaks that we had had yet, and it seemed strange no artist had yet attempted to group them all in one grand landscape, from the mouth of the Willamette as a stand-point. Or, if he could not get them all in, he might at least combine Hood and St. Helens. The breadth and scope, the grandeur and sublimity of such a
picture, with the Columbia in the foreground, and the great range of the Cascade Mountains in the perspective, would make a painting, that would live forever. We watched them all, with the naked eye and through the glass, until we were far down the Columbia, and to the last, Hood was the same "Dread ambassador from earth to heaven!" How he soared and towered, beyond and above everything, as if communing with the Almighty! Lofty as were the rest, they seemed small by his majestic side. St. Helens, however, though not so imperial, was perhaps more simply and chastely beautiful. An unbroken forest of fir, deep green verging into black, girt her feet, while above she "swelled vast to heaven," a perfect snow sphere rather than cone, whose celestial whiteness dazzled the eye. She looked like a virgin's or a nun's white breast, unsullied by sin, and standing sharply out against the glorious azure of that December sky, seemed indeed a perfect emblem of purity and beauty. Farther down the river, we detected a light smoke or vapor, drifting dreamily away from her summit, and Capt. Conner of the Oriflamme said this was not unusual, though St. Helens was not rated as a volcano. He thought it steam or vapor, caused by internal heat melting the snow, rather than smoke; but the effect was about the same. We reached the mouth of the Columbia, the same evening; but Capt. Conner thought it risky to venture over the bar, until morning. The next morning early, we lifted anchor, and steamed down to Astoria—a higgledy-piggledy village, of only four or five hundred inhabitants still, though begun long before prosperous Portland. Her anchorage seemed fair; but ashore the land abounded in a congeries of wooded bluffs and ridges, that evidently made a town or farms there difficult, if not impossible. 
A short street or two of straggling houses, propped along the hillsides, was about all there was of Astoria; and yet she was a port of entry, with a custom-house and full corps of officials, while Portland with all her enterprise and commerce was not, and could not get to be. What her custom-officials would have to do, were it not for the business of Portland, it seemed pretty hard to say. A venture of John Jacob Astor's a half century before, as a trading post with the Indians, she had never become of much importance, because lacking a good back country; and it appeared, had no future now, because wanting a good town-site. This was unfortunate perhaps for Oregon, and the whole Columbia region; but over it Portland rejoiced, and continued to wax fat. Of course, it had begun to rain again, and by the time we had passed the ordeal of the custom-house at Astoria, the weather had thickened up into a drizzly fog, that caused Capt. C. much anxiety—especially, when he observed the barometer steadily going down. The bar of the Columbia, always bad, is peculiarly rough in winter, and only the voyage before the Oriflamme had to lay to here, nearly a week, unable to venture out. Her provisions became exhausted, and she had to "clean out" Astoria, and all the farm-houses up and down the river for miles, before she finally got away. Our company of four hundred passengers had no fancy for an experience of this sort, and "dirty" as the weather promised to be, Capt. C. at last decided to try the bar, even if we had to return, hoping to find better skies when fairly afloat in blue water. Our engines once in motion, we soon ran down past Forts Stevens and Cape Disappointment, at the mouth of the Columbia, on the Oregon and Washington sides respectively, with the black throats of their heavy cannon gaping threateningly at us. Both forts seem necessary there, as they completely command the mouth of the Columbia, and so hold the key to all that region. But life in them must be an almost uninterrupted series of rains and fogs, with the surf forever thundering at your feet, and one can but pity the officers and men really exiled there. 
Gathered about the flag-staff or lounging along the ramparts, they gazed wistfully at us as we steamed past; and already in the distance we could see the white-caps, racing in over the dreaded bar. Heading for the north channel, we put all steam on, and once out of the jaws of the Columbia were soon fairly a-dancing on the bar. The wind and tide both strong, were both dead ahead,
which made our exit about as bad, as could well be. The sea went hissing by, or broke into huge white-caps all about us. The engines creaked and groaned, and at times seemed to stand still, as if exhausted with the struggle. The good ship Oriflamme pitched and tossed, battling with the waves like a practiced pugilist, yet ever advanced, though sometimes apparently drifting shoreward. At one period, indeed, Capt. C. feared we would have to about ship and run for the Columbia—we progressed so slowly; but something of a lull in the wind just then helped us on, and at last we saw by the receding head-lands, that we were fairly over the bar and out into the broad Pacific. We congratulated ourselves in thus getting speedily to sea; but our tussle on the bar had been too much for the majority of our passengers, and soon our bulwarks were thronged with scores "casting up their accounts" with Father Neptune. Sea-sickness, that deathliest of all human ailments, had set in, and our "rough and tumble" with the waves had been so sharp, that many began to suffer from it, who declared they had never been attacked before. A notable New Yorker, a brawny son of Æsculapius at that, bravely protested, that sea-sickness was "Only a matter of the imagination. Anyone can overcome it. It only requires a vigorous exercise of the will." But, unfortunately for his theory, soon afterwards he himself became the sickest person on board, not excepting the ladies. My own experience ended with a qualm or two; but the majority of our passengers suffered very much, for several days. Our steamer really had accommodations for only about one hundred passengers; but some four hundred had crowded aboard of her at Portland, mostly miners eager to get "below" to winter, and those who had no state-rooms now "roughed it" pitiably. They lay around loose—on deck, in the cabin, in the gang-way, everywhere—the most disconsolate-looking fellows I ever saw, outside of a yellow-fever hospital. 
The few ladies aboard were even sicker; but these all had state-rooms, and kept them mostly for the voyage. The weather continued raw and the sea rough, most of the way down the coast, and our voyage of eight hundred miles from Portland to San Francisco, as a whole, could hardly be called
agreeable. We had fog, and rain, and head-winds all the way down, and with the exception of a day or two, it was really cold and uncomfortable. The steam-heating apparatus of the vessel was out of order, and the only place for us all to warm was at a register in the Social Hall—a narrow little cabin on deck, that would not accommodate over thirty persons at the farthest. There was a similar place for the ladies, but they usually filled this themselves. Groups huddled here all day, smoking and talking, and when the weather permitted also swarmed about the smoke-stacks. And then, besides, as already stated, our ship was badly overcrowded. Of our 400 passengers, less than a quarter had state-rooms, and the rest were left to shift for themselves. After the sea-sickness began to abate, we filled two or three tables every meal; and when bed-time came, mattrasses thronged the cabin from end to end. How it was down in the steerage, where the miners and Chinamen mostly congregated, one need not care to imagine. Fortunately great-coats and blankets abounded, or many would have suffered much. We found many choice spirits aboard, and in spite of wind and weather enjoyed ourselves, after all, very fairly. When it did not rain too hard, we walked the deck and talked for hours; and when everything else failed, we always found something of interest in the gulls that followed us by hundreds, and the great frigate-birds with their outstretched pinions, and the ever-rolling boundless sea. Our table-fare was always profuse and generally excellent, especially the Oregon apples and pears they gave us for dessert; and had it not been for our broken heating apparatus, no doubt we would have got along very satisfactorily after all, all things considered. We arrived off the Golden Gate, late at night, Dec. 14th, only four days out from Portland; but the sea was still so rough, that we feared to venture in. 
Next morning, however, when the mist broke away a little, we up steam and headed again for San Francisco. We had a tough time getting in, nearly as bad as getting out of the Columbia. We had to combat a strong wind dead-ahead, and to wrestle with a heavy sea. But, nevertheless, our good ship held on her course bravely; and at last, weathering Point Reyes, and
rounding Fort Point, we steamed up past frowning Alcatraz, and with booming cannon dropped anchor at the Company's wharf. The storm we had encountered was reported as one of the worst known on the coast for years, and we were glad once more to touch terra firma, and strike hands with a live civilization. In a half hour we were ashore and at the Occidental, a hostelry worthy of San Francisco or any other city. And so, we had reached California at last. All hail, the Golden Gate! And 'Frisco, plucky, vain young metropolis, hail! Bragging, boasting, giddy as you are, there is much excuse for you. Surely, with your marvellous growth, and far-reaching schemes, you have a right to call yourself the New York of the Pacific Coast, if that contents you.
CHAPTER XVII.

SAN FRANCISCO.

Geography demonstrates the matchless position of San Francisco, as metropolis of the Pacific coast, and assures her supremacy perhaps forever. The Golden Gate, a strait six miles long by one wide, with an average depth of twenty-four fathoms—seven fathoms at the shallowest point—is her pathway to the Pacific. At her feet stretches her sheltered and peerless bay, fifty miles long by five wide, with Oakland as her Brooklyn just across it. Beyond, the Sacramento and the San Joaquin empty their floods, the drainage of the Sierra Nevadas, and afford channels for trade with much of the interior. Her system of bays—San Pablo, Suisun, and San Francisco proper—contain a superficial area of four hundred square miles, of which it is estimated, eight feet in depth pour in and out of the Golden Gate every twenty-four hours. On all that coast, for thousands of miles, she seems to be the only really great harbor; and then, besides, all enterprise and commerce have so centred here, that hereafter it will be difficult, if not impossible, to wrest supremacy from her. Until we reached Salt-Lake, New York everywhere ruled the country, and all business ideas turned that way; but from there on, the influence of Gotham ceased, and everything tended to "'Frisco," as many lovingly called her. This was her general name, indeed, for short, all over the Pacific coast; though the Nevadans spoke of her, as "the Bay" still. The city itself stands on a peninsula of shifting dunes or sand-hills, at the mouth of the harbor, much the same as if New York were built at Sandy Hook. It was a great mistake, that its founders did not locate it at Benicia, or Vallejo, or somewhere up that way, where it would have been out of the draft of the Golden Gate, had better wharfage, and been more easily defended. But, it seems, when the gold fever first broke out, in 1849, the early vessels all came
consigned to Yerba Buena, as the little hamlet was then called; and as their charter-parties would not allow them to ascend the Bay farther, their cargoes were deposited on the nearest shore, and hence came San Francisco. It took a year or more then to hear from New York or London, and before further advices were received, so great was the rush of immigrants, the town was born and the city named. Benicia tried to change things afterwards; but 'Frisco had got the start, and kept it, in spite of her false location. Her military defences are Fort Point at the mouth of the Golden Gate, Fort San Josè farther up the harbor, and Alcatraz on an island square in the entrance, which with other works yet to be constructed would cross-fire and command all the approaches by water, thus rendering the city fairly impregnable. From the first, she seems to have had a fight with the sand-hills, and she was still pluckily maintaining it. She had cut many of them down, and hurled them into the sea, to give her a better frontage. Her "made" land already extended out several blocks, and the work was still going on. With a great penchant for right-angles, as if Philadelphia was her model city, she was pushing her streets straight out, in all directions, no matter what obstacles intervened. One would have thought, that with an eye to economy, as well as the picturesque, she would have flanked some of her sand-hills by leading her streets around them; but no! she marched straight at and over them, with marvellous audacity and courage, like the Old Guard at Waterloo, or the Boys in Blue at Chattanooga. Some were inaccessible to carriages; still she pushed straight on, and left the inhabitants to clamber up to their eyrie-like residences, as best they could. Many of these hills were still shifting sand, and in places lofty fences had been erected as a protection against sand-drifts; just as our railroads East sometimes build fences, as a protection against snow-drifts.
The sand seemed of the lightest and loosest character, and when the breeze rose filled the atmosphere at all exposed points. And yet, when properly irrigated, it really seemed to produce about everything abundantly. While inspecting one of the harbor forts, I saw a naked drift on one side of a sand-fence, and on the
other a flower-garden of the most exquisite character, while just beyond was a vegetable and fruit-garden, that would have astonished people East. A little water had worked the miracle, and this a faithful wind-mill continued to pump up, from time to time as needed. Towards the south, the sand-hills seemed less of an obstruction, and thither the city was now drifting very rapidly. Real-estate there was constantly on the rise, and houses were springing up as if by magic in a night. The city-front, heretofore much confined, was now extending southward accordingly. It was about decided to build a sea-wall of solid granite, all along the front, two miles or more in length, at a cost of from two to three millions of dollars. This expenditure seemed large; but, it was maintained, was not too great for the vast and growing commerce of the city. But a few years before, it was a common thing for ships to go East empty or in ballast, for want of a return cargo; but in 1867 San Francisco shipped grain alone to the amount of thirteen millions of dollars, and of manufactures about as much more. Here are some other statistics that are worth one's considering. In 1849, then called Yerba Buena, she numbered perhaps 1,000 souls, all told; in 1869, nearly 200,000. In 1868, 59,000 passengers arrived by sea, and only 25,000 departed, leaving a net gain of 34,000. The vessels which entered the bay that year, numbered 3,300, and measured over 1,000,000 tons. She exported 4,000,000 sacks of wheat that year, and half a million barrels of flour. Her total exports of all kinds were estimated at not less than $70,000,000, and her imports about the same. Her sales of real-estate aggregated $27,000,000, and of mining and other stocks $115,000,000, on which she paid over $5,000,000 of dividends. The cash value of her real and personal property was estimated at $200,000,000.
She sent away six tons of gold, and forty tons of silver every month, and in all since 1849 had poured into the coffers of the world not less than $1,030,000,000. [15] Her net-work of far-reaching and gigantic enterprises already embraced the whole Pacific Coast, northward to Alaska and southward to Panama, while beyond she stretched out her invisible arms to Japan and China, and shook hands with the Orient.
One cloudless morning, after days of dismal drizzle, an enthusiastic Forty-Niner took me up Telegraph Hill, and bade me "view the landscape o'er!" I remembered when a school-boy reading Dana's "Two Years before the Mast," in which he speaks so contemptuously of Yerba Buena, and its Mexican Rip Van Winkles. What a change here since then! Off to the west rolled the blue Pacific, sea and sky meeting everywhere. Then came Fort Point, with its formidable batteries, commanding the Golden Gate; and then the old Presidio, with the stars and stripes waving over it. Farther inland were the stunted live-oaks and gleaming marbles of Lone Mountain Cemetery, with the Broderick Monument rising over all. Then came the live, busy, bustling, pushing city, with its quarter of a million of inhabitants nearly, soon to be a million, its wharves thronged with the ships of all nations, but with harbor-room to spare sufficient to float the navies of the world. Beyond, lay Oakland, loveliest of suburbs, smiling in verdure and beauty, with Mount Diabolo towering in the distance—his snow-crowned summit flashing in the sunlight. The Sacramento and Stockton boats, from the heart of California were already in. Past the Golden Gate, and up the noble bay, with boom of welcoming cannon, came the Hong Kong steamer fresh from Japan. The Panama steamer, with her fires banked and flag flying, was just ready to cast off. While off to the south, a long train of cars, from down the bay and San Josè, came thundering in. A hundred church spires pierced the sky; the smoke from numberless mills and factories, machine-shops and foundries, drifted over the harbor; the horse-car bells tinkled on every side—the last proofs of American progress—and all around us were the din and boom of Yankee energy, and thrift, and go-ahead-ative-ness, in place of the old Rip Van Winkleism. I don't wonder, that all good Pacific Coasters believe in San Francisco, and expect to go there when they die! 
Her hotels, her school-houses, her churches, her Bank of California, her Wells-Fargo Express, her Mission Woollen Mills, her lines of ocean steam-ships, and a hundred other things, all suggest great wealth and brains; and yet they are only the first fruits of nobler fortune yet to come. She is what Carlyle might call an undeniable fact, a substantial verity; and, in spite of her "heavy job of work," moves
onward to empire with giant strides. She contained already fully a third of the population of the whole state of California, and was "lifting herself up like a young lion" in all enterprises—at all times and everywhere—on the Pacific slope. Her faulty location, however, gives her a climate, that can scarcely be called inviting, notwithstanding all that Californians claim for their climate generally. It is true, the range of the thermometer there indicates but a moderate variation of temperature, with neither snow nor frost, usually. But her continual rains in winter, and cold winds and fogs in summer, must be very trying to average nerves and lungs. We found it raining on our arrival there in December, with the hills surrounding the bay already turning green; and it continued to rain and drizzle right along, pretty much all the time, until we departed for Arizona in February. Sometimes it would break away for an hour or two, and the sun would come out resplendently, as if meaning to shine forever; and then, suddenly, it would cloud over, and begin to drizzle and rain again, as if the whole heavens were only a gigantic sieve. Really, it did rain there sometimes the easiest of any place I ever saw—not excepting Fort Vancouver. Going out to drive, or on business, we got caught thus several times, and learned the wisdom of carrying stout umbrellas, or else wearing bang-up hats and water-proof coats, like true Californians. Once, for a fortnight nearly, it rained in torrents, with but little intermission, and then the whole interior became flooded—bridges were washed away, roads submerged, etc. In the midst of this, one night, we had a sharp passage of thunder and lightning—a phenomenon of rare occurrence on that coast—followed by a slight earthquake, and then it rained harder than ever. But at last, the winter rains came to an end, as all things must, and then we had indeed some superb weather, worthy of Italy or Paradise. 
Californians vowed their winter had been an unusual one; that their January was usually good, and their February very fine; but, of course, things must be reported as we found them. As a rule, nobody seemed to mind the perpetual drizzle, so to speak; but with slouched hats and light overcoats, or infrequent umbrellas, everybody tramped the streets, as business or
pleasure called, and the general health of the city continued good. The few fair days we had in January and early February were as soft and balmy, as our May or June, and all 'Frisco made the most of them. The ladies literally swarmed along Montgomery street, resplendent in silks and jewelry, and all the drives about the city—especially the favorite one to the Cliff-House and sea-lions—were thronged with coaches and buggies. Meanwhile, the islands in the harbor and the surrounding hills and country, so dead and barren but a few weeks before, had now become superbly green, and the whole bay and city lay embosomed in emerald. We left there the middle of February for Arizona, and did not get back until late in May. Then, when we returned we found the rains long gone, the vegetation fast turning to yellow—grain ripening in the fields—strawberries and peas on the table—and the summer winds and fogs in full vogue. At sunrise, it would be hot, even sultry, and you would see persons dressed in white linen. By nine or ten a. m., the wind would rise—a raw damp wind, sometimes with fog, sweeping in from the Pacific—and in the evening, you would see ladies going to the Opera with full winter furs on. How long this lasted, I cannot say; but this was the weather we experienced, as a rule, late in May and early in June. Heavy great-coats, doubtless, are never necessary there. And so, on the other hand, thin clothing is seldom wanted. Many indeed said, they wore the same clothing all seasons of the year, and seldom found it uncomfortable either way. The truth seemed to be, that for hardy persons the climate was excellent—the air bracing and stimulating—but invalids were better off in the interior. Consumptives could not stand the winds and fogs at all; and it was a mooted question, as to whether the large percentage of suicides just then, was not due in part to climatic influences.
The really healthy, however, appeared plump and rosy, and the growing children promised well for the future. Had 'Frisco been built at Benicia, or about there, she would have escaped much of her climatic misery. Even across the bay, at Oakland, they have a much smoother climate. But she would "squat" on a sandspit, at the mouth of the Golden Gate, where there is a perpetual suck of wind
and fog—from the ocean, into the bay, and up the valley of the Sacramento—and now must make the most of her situation. Montgomery Street is the Broadway or Chestnut Street of San Francisco, and California her Wall Street. Her hotels, shops, and banking-houses are chiefly here, and many of them are very handsome edifices. The Occidental, Cosmopolitan, and Lick-House hotels, the new Mercantile Library, and Bank of California, are stately structures, that would do credit to any city. Their height, four and five stories, seemed a little reckless, considering the liability of the Coast to earthquakes; but the people made light of this, notwithstanding some of their best buildings showed ominous cracks "from turret to foundation stone." So long as they stood, everything was believed secure; and commerce surged and roared along the streets, as in New York and London. Brick, well strengthened by iron, seemed to be the chief building material in the business parts of the city, though stone was coming into use, obtained from an excellent quarry on Angel Island. The Bank of California had been constructed of this, and was much admired by everybody. The private residences, however, seemed chiefly frame, and were seldom more than two and a half stories high. Doubtless more heed is given to earthquakes here, though your true Californian would be slow to acknowledge this. Nevertheless, deep down in his heart—at "bed-rock," as he would say—his household gods are esteemed of more importance, than his commercial commodities. In the suburbs, Mansard roofs were fast coming into vogue, and everywhere there was a general breaking out of Bay-Window. Brown seemed to be the favorite color, doubtless to offset the summer sand-storms, and the general prevalence of bay-windows may also be due partly to these. Convenience and comfort—often elegance and luxury—appeared everywhere, and to an extent that was surprising, for a city so young and raw.
Shade-trees were still rare, because only the native scrubby live-oaks, with deep penetrating roots, can survive the long and dry summers there. But shrubbery and flowers, prompted by plentiful irrigation, appeared on every side, and the air was always redolent of perfume. The most unpretending homes had their gems of flower-gardens, with evergreens, fuchsias, geraniums, pansies, and the variety and richness of their roses were a perpetual delight. A rill of water, with trickling side streams, made the barren sand-hills laugh with verdure and beauty, and gaunt wind-mills in every back-yard kept up the supply. The wind-mill in California rises to the dignity of an institution, and is a godsend to the whole coast. In winter, of course, they are not needed. But throughout the long and rainless summer, when vegetation withers up and blows away, the steady sea-breeze keeps the wind-mills going, and these pump up water for a thousand irrigating purposes. The vegetable gardens about the city, and California farmers generally, all patronize them, more or less, and thus grow fruits and vegetables of exquisite character, and almost every variety, the year round. The markets and fruit-stands of San Francisco, groaning with apples, pears, peaches, plums, pomegranates, oranges, grapes, strawberries, etc., have already become world-renowned, and the Pacific Railroad now places them at our very doors. Montgomery street repeats Broadway in all but its vista, but with something more perhaps of energy and dash. The representative New Yorker always has a trace of conservatism somewhere; but your true Californian laughs at precedent, and is embodied go-ahead-ativeness. In costume, he is careless, not to say reckless, insisting on comfort at all hazards, and running greatly to pockets. Stove-pipe hats are an abomination to him, and tight trowsers nowhere; but beneath his slouch-hat are a keen eye and nose, and his powers of locomotion are something prodigious. Cleaner-cut, more wide-awake, and energetic faces are nowhere to be seen. Few aged men appear, but most average from twenty-five to forty years.
Resolute, alert, jaunty, bankrupt perhaps to-day, but to-morrow picking their flints and trying it again, such men mean business in all they undertake, and carry enterprise and empire in the palms of their hands. The proportion of ladies on Montgomery street, however, usually seemed small, and the quality inferior to that of the sterner sex. Given to jewelry and loud colors, and still louder manners, there was a fastness about them, that jarred upon one's Eastern sense,
though some noble specimens of womanhood now and then appeared. Doubtless, the hotel and apartment-life of so many San Franciscans had something to do with this, as it is fatal to the more modest and domestic virtues; but it must be doubted, whether this will account for it entirely. Evidently, California is still "short" of women, at least of the worthier kind, and until she completes her supply will continue to over-estimate and spoil what she has. At least, this is the impression her Montgomery street dames make upon a stranger, and unfortunately there is much elsewhere to confirm it. Respect for the Sabbath seemed to be a growing virtue, but there was still room for much improvement. Many of the stores and shops on Montgomery and Kearney streets were open on Sunday, the same as other days; and it seemed to be the favorite day for pic-nics and excursions, to Oakland and San Mateo. Processions, with bands of music, were not infrequent, and at Hayes' Park in the Southern suburbs the whole Teuton element seemed to concentrate on that day, for a general saturnalia. On the other hand, there was a goodly array of well-filled churches, and their pastors preached with much fervency and power. The Jewish Synagogue is a magnificent structure, one of the finest in America, and deserves more than a passing notice. It is on Sutter street, in a fine location overlooking the city, and cost nearly half a million of dollars. The gilding and decoration generally inside, viewed from the organ-loft, are superb. But few of the large choir were Jews, and scarcely any could read the old Hebrew songs and chants in the original; so these were printed in English, as the Hebrew sounds, and thus they maintained the ancient custom of singing and chanting only in Hebrew! Their music, nevertheless, was grand and inspiring, and it would be well, for our Gentile churches, to emulate it. This was called the Progressive Synagogue. 
The congregation had recently shortened the ancient service from three hours to an hour and a half, by leaving out some of the long prayers—"vain repetitions," it is presumed—and the consequence was, a split in this most conservative of churches. The good old conservative brethren, of
course, could not stand the abbreviation. They were fully persuaded, they could never get to Paradise, with only an hour and a half's service. So, they seceded, and set up for themselves. Very prosperous and wealthy are the Jews of San Francisco; and, indeed, all over the Pacific Coast, our Hebrew friends enjoy a degree of respectability, that few attain East. They number in their ranks many of the leading bankers, merchants, lawyers, etc., of San Francisco; and more than one of them sits upon the Bench, gracing his seat. Poor Thomas Starr King's church is a model in its way, and the congregation that assembles there one of the most cultivated and refined on the Pacific Coast. Their pastor, Dr. Stebbins, though not equal to his great predecessor, in some respects, is a man of marked thought and eloquence; and, by his broad Christian charity, was doing a noble work in San Francisco. So, Dr. Stone, formerly of Boston, was preaching to large audiences, and declaring "the whole counsel of God," without fear or favor. His church is plain but large and commodious, and was always thronged with attentive worshippers. Dr. Wadsworth, lately of Philadelphia, was not attracting the attention he did East; but his church was usually well-filled, and he was exerting an influence and power for good much needed. The Methodists, our modern ecclesiastical sharp-shooters, did not seem as live and aggressive, as they usually do elsewhere; but we were told they were a great and growing power on the Coast, for all that, and everybody bade them God speed. The Episcopalians, as a rule, I regret to say, appeared to make but little impression, and were perhaps unfortunate in their chief official. The Catholics, embracing most of the old Spanish population and much of the foreign element, were vigorous and aggressive, and made no concealment of the fact, that they were aiming at supremacy.
In this cosmopolitan city, the Chinese, too, have their Temples, or Josh-Houses; but they were much neglected, and John Chinaman, indeed, religiously considered, seemed well on the road to philosophic indifference. During the past decade, however, things on the whole had greatly improved, morally and religiously, as the population had become
more fixed and settled; and all were hoping for a still greater improvement, with the completion of the Railroad, and the resumption of old family ties East. The drinking-saloons were being more carefully regulated. The gambling-hells, no longer permitted openly, were being more and more driven into obscurity and secrecy. Law and order were more rigidly enforced. The vigilance committees of former years still exerted their beneficent example. The Alta, Bulletin, and Times, then the three great papers of the city and Coast, all noble journals, were all open and pronounced in behalf of good morals and wholesome government; and it is not too much to say, that the prospect for the future was certainly very gratifying, not to say cheering. "Forty-Niners," (Bret Harte's Argonauts) and other early comers, declared themselves amazed, that they were getting on, as well as they did. "Yes," said one of the best of them, a man of great shrewdness and ability, "I grant, we Californians have been pretty rough customers, and have not as many religious people among us yet, as we ought to have; but then, what we have are iron-clad, you bet!" I suspect that is about so. A man, who is really religious in California, will likely be so anywhere. The severity of his temptations, if he resist them, will make him invulnerable; and all the "fiery darts of the wicked one," elsewhere, will fall harmless at his feet. Faithful Monitors are they, battling for Jesus; and in the end, we know, will come off more than conquerors. With all our hearts, let us bid them God speed!
CHAPTER XVIII.

SAN FRANCISCO (continued).

Here in San Francisco, our National greenbacks were no longer a legal tender, but everything was on a coin basis. Just as in New York, you sell gold and buy greenbacks, if you want a convenient medium of exchange, so here we had to sell greenbacks and buy gold. A dime was the smallest coin, and "two bits" (twenty-five cents) the usual gratuity. A newspaper cost a dime, or two for twenty-five cents—the change never being returned. Fruits and vegetables were cheap, but dry-goods, groceries, clothing, books, etc., about the same in gold, as East in greenbacks. The general cost of living, therefore, seemed to be about the same as in New York, plus the premium on gold. California and the Pacific slope generally had refused to adopt the National currency, and it was still a mooted question whether they had lost or gained by this. At first, they thought it a great gain to be rid of our paper dollars; but public opinion had changed greatly, and many were getting to think they had made a huge mistake, in not originally acquiescing in the national necessity. The prosperity of the East during the war, and the pending sluggishness of trade on the Coast (still continuing), were much commented on, as connected with this question of Coin vs. Greenbacks; but it was thought too late to remedy the matter now. This hostility to our Greenbacks did not seem to arise from a want of patriotism, so much as from a difference of opinion, as to the necessity or propriety of their using a paper currency, when they had all the gold and silver they wanted, and were exporting a surplus by every steamer. If there was a speck of Secession there at first, California afterwards behaved very nobly, especially when she came with her bullion by the many thousands to the rescue of the Sanitary Commission; and Starr King's memory was still treasured
everywhere, as that of a martyr for the Union. The oncoming Pacific Railroad was constantly spoken of, as a new "bond of union," to link the Coast to the Atlantic States as with "hooks of steel;" and, evidently, nothing (unless it may be the Chinese Question) can disturb the repose of the Republic there, for long years to come. The people almost universally spoke lovingly and tenderly of the East, as their old "home," and thousands were awaiting the completion of the Railroad to go thither once again. Their great passion, however, just then, was for territorial aggrandizement. Mr. Seward had just announced his purchase of Alaska, and of course, everybody was delighted, as they would have been if he had bought the North Pole, or even the tip end of it. Next they wanted British Columbia and the Sandwich Islands, and hoped before long also to possess Mexico and down to the Isthmus. The Sitka Ice Company, which for some years had supplied San Francisco and the Coast with their only good ice, was proof positive, that there was cold weather sometimes in Alaska; nevertheless, they claimed, the Sage of Auburn had certainly shown himself to be a great statesman, by going into this Real Estate business, however hyperborean the climate. It was soon alleged to be a region of fair fields and dimpled meadows, of luscious fruits and smiling flowers, of magnificent forests and inexhaustible mines, as well as of icebergs and walruses; and straightway a steamer cleared for Sitka, with a full complement of passengers, expecting to locate a "city" there and sell "corner lots," start a Mining Company and "water" stock, or initiate some other California enterprise. Christmas and New Year in San Francisco were observed very generally, and with even more spirit than in the East. The shops and stores had been groaning with gifts and good things for some time, and on Christmas Eve the whole city seemed to pour itself into Montgomery street.
Early in the evening, there was a scattering tooting of trumpets, chiefly by boys; but along toward midnight, a great procession of men and boys drifted together, and traversing Montgomery, Kearney, and adjacent streets, made the night hideous
with every kind of horn, from a dime trumpet to a trombone. New Year was ushered in much the same way, though not quite so elaborately. On both of these winter holidays there happened to be superb weather, much like what we have East in May, with the sky clear, and the air crisp, and the whole city—with his wife and child—seemed to be abroad. The good old Knickerbocker custom of New Year calls was apparently everywhere accepted, and thoroughly enjoyed. Every kind of vehicle was in demand, and "stag" parties of four or five gentlemen—out calling on their lady friends—were constantly met, walking hilariously along, or driving like mad. Quite a number of army officers happened to be in San Francisco just then, and their uniforms of blue and brass made many a parlor gay. Of names known East, there were Generals Halleck, McDowell, Allen, Steele, Irvin Gregg, French, King, Fry, etc., and these with their brother officers were everywhere heartily welcomed. Indeed, army officers are nowhere more esteemed or better treated, than on the Pacific Coast, and all are usually delighted with their tour of duty there. In former years, many of them married magnificent ranches—encumbered, however, with native señoritas—and here and there we afterwards met them, living like grand seignors on their broad and baronial acres. Ranches leagues in extent, and maintaining thousands of cattle and sheep, are still common in California, and some of the best of these belong to ex-army officers. Their owners, however, do but little in the way of pure farming, and are always ready to give a quarter section or so to any stray emigrant, who will settle down and cultivate it—especially to old comrades. The great feature of San Francisco, of course, is her peerless bay. Yet noble as it is for purposes of commerce, it avails little for pleasure excursions; and 'Frisco, indeed, might be better off in this respect.
A trip to Oakland is sometimes quite enjoyable, and the ride by railroad down the peninsula, skirting the bay, to San Josè, is always a delight. But the bay itself is fickle and morose in winter, and in summer must be raw and gusty. The suck of wind, from the Pacific into the interior, through the Golden Gate, as through a funnel, always keeps the bay more or less in a turmoil; and during
the time we were there, it seemed quite neglected, except for business purposes. One day, in the middle of January, however, we had duties that took us to Alcatraz and Angel Island, and essayed the trip thither in a little sloop. On leaving the Occidental, the sky was overcast, and we had the usual drizzle of that winter; but before we reached Meigg's Wharf, it had thickened into a pouring rain, and as we crossed to Alcatraz squalls were churning the outer bay into foam in all directions. After an hour or two there, on that rocky fortress, the key of San Francisco, with the wind and rain dashing fitfully about us, we took advantage of a temporary lull to re-embark for Angel Island. We had hardly got off, however, before squall after squall came charging down upon us; and as we beat up the little strait between Angel Island and Socelito, the sloop careening and the waves breaking over us, it seemed at times as if we were in a fair way of going to the bottom. Just as we rounded the rocky point of the Island, before reaching the landing, a squall of unusual force struck us athwart the bows, wave after wave leaped aboard, and for awhile our gallant little craft quivered in the blast like a spent race-horse, as she struggled onward. An abrupt lee shore was on one side, the squall howling on the other; but we faced it out, and in a lull, that soon followed, shot by the landing (it being too rough to halt there), and weathering the next point dropped anchor in a little cove behind it, just in time to escape another squall even fiercer than the former. Had we been off either point, or out in the bay, when this last one struck us, no doubt we would have gone ashore or to Davy Jones' locker; and altogether, as our Captain said, it was a "nasty, dirty day," even for San Francisco. Returning, we had skies less treacherous and a smoother run; but were glad to reach the grateful welcome and spacious halls of the Occidental, best of hotels, again.
It may be, that the bay was a little ruder that day, than usual; but it bears a bad name for sudden gusts and squalls, and San Franciscans give it a wide berth generally. Sometimes, in summer, it is afflicted by calms as well as squalls; we heard some amusing stories of parties becalmed there until late at night, unable to reach either shore; so that, altogether, however useful otherwise,
it can hardly be regarded as adding much per se to the pleasures of a life in 'Frisco. As an offset to this, however, all orthodox San Franciscans swear by the Cliff-House and the sea-lions. To "go to the Cliff," is the right thing to do in San Francisco, and not to go to the Cliff-House is not to see or know California. In the summer, people drive there in the early morning, to breakfast and return before the sea-breeze rises, and then hundreds of gay equipages throng the well-kept road. Even in winter, at the right hour, you are always sure to meet many driving out or in. Of course, we went to the "Cliff"—wouldn't have missed going there for anything. Past Lone Mountain Cemetery, that picturesque city of the dead, the fine graveled road strikes straight through the sand-hills, for five or six miles, to the Pacific; and when you reach the overhanging bluff, on which the hotel perches like an eagle's nest, you have a grand view of the Golden Gate and the far-stretching sea beyond. On the very verge of the horizon hang the Farallones, pointing the way to Japan and China, and the white sails of vessels beating in or out the harbor dot the ocean far and near. Just in front of the hotel are several groups of high shelving rocks, among which the ocean moans and dashes ceaselessly, and here the seals or "sea-lions," as 'Frisco lovingly calls them, have a favorite rendezvous and home. The day we were there, there appeared to be a hundred or more of them, large and small, swimming about the rocks or clambering over them, while pelicans and gulls kept them company. Some were small, not larger than a half-grown sturgeon, while others again were huge unwieldy monsters, not unlike legless oxen, weighing perhaps a thousand pounds or more. "Ben Butler" was an immense, overgrown creature, as selfish and saucy, apparently, as he could well be; and another, called "Gen. Grant," was not much better.
They kicked and cuffed the rest overboard quite indiscriminately, though now and then they were compelled to take a plunge themselves. Many contented themselves with merely gamboling around the water's edge; but others had somehow managed slimily to roll and climb forty or fifty feet up the rocks, and there lay sunning themselves in supreme felicity, like veteran
politicians snug in office. Sometimes two or three would get to wrangling about the same position, as if one part of the rocks were softer than another, and then they would bark and howl at each other, and presently essay to fight in the most clumsy and ludicrous way. "Ben Butler," or "Gen. Grant," would usually settle the squabble, by a harsh bark, or by flopping the malcontents overboard, and then would resume his nap with becoming satisfaction. Uncouth, and yet half-human in their way, with a cry that sometimes startled you like a distant wail, we watched their movements from the piazza of the hotel with much interest, and must congratulate 'Frisco on having such a first-class "sensation." May her "sea-lions" long remain to her as a "lion" of the first water, and their numbers and renown never grow less! In former years, they were much shot at and annoyed, by thoughtless visitors. But subsequently the State took them under her protection, and now it was a penal offence to injure or disturb them. This is right, and California should be complimented, for thus trying to preserve and perpetuate this interesting colony of her original settlers. Returning, we had a superb drive down the beach, with the surf thundering at our wheels; and thence, by a winding road over and through the hills, reached the city again. It was a glorious day in February, after a fortnight of perpetual drizzle—a June day for beauty, but toned by an October breeze—the sun flashing overhead like a shield of gold; the road, over and between the hills, gave us from time to time exquisite glimpses of the sea or bay and city; every sense seemed keyed to a new life and power of enjoyment; and the memory of that "drive to the Cliff," is something wonderfully clear and charming still. It would be surprising, if Californians did not brag considerably about it. They are not famed for modesty, and would be heathens, if they kept silence. 
Californians are proverbial for their ups and downs, and we heard much of their varying fortunes. You will scarcely meet a leading citizen, who has not been down to "hard-pan" once or twice in his career, and everybody seems to enjoy telling about it. In former
years, many had been rich in "feet" or "corner-lots," who yet had not enough "dust" to buy a "square-meal;" and men with Great Expectations, but small cash in hand, were still not infrequent. I ran foul of an old school-mate one day, who arrived in California originally as captain of an ox-team, which he had driven across the Plains. But now he was deep in mining-stocks, and twenty-vara lots, and was rated as a millionaire. I met another who for years lost all he invested in "feet." But luckily, at last, he went into Savage and Yellow Jacket, and now he owned handsome blocks on Montgomery and California streets, and lived like a prince at the Occidental. Another still, named O., an eccentric genius, came out to California early, and his uncle (already there) secured him a place in a dry-goods house. In a few months, the house failed, and O. fell back on his uncle's hands again. Then he was given a place in a silk-house, but in a short time this also failed. A fatality seemed to accompany the poor fellow. Wherever he went, the houses either failed, closed up, or burned out; and thus, time after time, he came back to his uncle, like a bad penny. Once he was reduced so low, he went to driving a dray, glad to get even that; and again, turned chiffonier, and eked out a precarious living by collecting the old bones, scraps of tin, sheet-iron, etc., that lay scattered about the suburbs. Finally, he wisely concluded he had "touched bottom," and that California was no place for him. So, his kind-hearted uncle bought him a ticket home by the "Golden City," and supposed when he bade him good-bye on her gang-way, that that would be the last he would see of O. in California. But a week or so afterwards, early one Sunday morning, he was roused up by some one rapping lustily at the door, and opening it lo! there was his hopeful nephew again—"large as life and twice as natural!"
It seems, the ill-fated steamer, when two or three hundred miles down the Coast, had caught fire and been beached, with the loss of many lives; but O., strange to say, had escaped scot-free, and now was on hand again. He now tried two or three more situations, thinking his "luck" perhaps had turned, but failed in all of them or they soon failed; and finally set out for the East again, but this time across the Plains, driving a "bull-team." He got safely back to New York, and taking hold of his father's business

Software For Data Analysis Programming With R 1st Edition John Chambers Auth

  • 6.
    Statistics and Computing
    Series Editors: J. Chambers, D. Hand, W. Härdle
  • 7.
    Statistics and Computing
    Brusco/Stahl: Branch and Bound Applications in Combinatorial Data Analysis
    Chambers: Software for Data Analysis: Programming with R
    Dalgaard: Introductory Statistics with R
    Gentle: Elements of Computational Statistics
    Gentle: Numerical Linear Algebra for Applications in Statistics
    Gentle: Random Number Generation and Monte Carlo Methods, 2nd ed.
    Härdle/Klinke/Turlach: XploRe: An Interactive Statistical Computing Environment
    Hörmann/Leydold/Derflinger: Automatic Nonuniform Random Variate Generation
    Krause/Olson: The Basics of S-PLUS, 4th ed.
    Lange: Numerical Analysis for Statisticians
    Lemmon/Schafer: Developing Statistical Software in Fortran 95
    Loader: Local Regression and Likelihood
    Ó Ruanaidh/Fitzgerald: Numerical Bayesian Methods Applied to Signal Processing
    Pannatier: VARIOWIN: Software for Spatial Data Analysis in 2D
    Pinheiro/Bates: Mixed-Effects Models in S and S-PLUS
    Unwin/Theus/Hofmann: Graphics of Large Datasets: Visualizing a Million
    Venables/Ripley: Modern Applied Statistics with S, 4th ed.
    Venables/Ripley: S Programming
    Wilkinson: The Grammar of Graphics, 2nd ed.
  • 8.
    John M. Chambers
    Software for Data Analysis: Programming with R
  • 9.
    John Chambers, Department of Statistics–Sequoia Hall, 390 Serra Mall, Stanford University, Stanford, CA 94305-4065, USA
    jmc@r-project.org
    Series Editors:
    John Chambers, Department of Statistics–Sequoia Hall, 390 Serra Mall, Stanford University, Stanford, CA 94305-4065, USA
    David Hand, Department of Mathematics, South Kensington Campus, Imperial College London, London, SW7 2AZ, United Kingdom
    W. Härdle, Institut für Statistik und Ökonometrie, Humboldt-Universität zu Berlin, Spandauer Str. 1, D-10178 Berlin, Germany
    Java™ is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries. Mac OS® X - Operating System software - is a registered trademark of Apple Computer, Inc. MATLAB® is a trademark of The MathWorks, Inc. MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. S-PLUS® is a registered trademark of Insightful Corporation. UNIX® is a registered trademark of The Open Group. Windows® and/or other Microsoft products referenced herein are either registered trademarks or trademarks of Microsoft Corporation in the U.S. and/or other countries. Star Trek and related marks are trademarks of CBS Studios, Inc.
    ISBN: 978-0-387-75935-7
    e-ISBN: 978-0-387-75936-4
    DOI: 10.1007/978-0-387-75936-4
    Library of Congress Control Number: 2008922937
    ©2008 Springer Science+Business Media, LLC
    All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
    Printed on acid-free paper.
    9 8 7 6 5 4 3 2 1
    springer.com
  • 10.
    Preface
    This is a book about Software for Data Analysis: using computer software to extract information from some source of data by organizing, visualizing, modeling, or performing any other relevant computation on the data. We all seem to be swimming in oceans of data in the modern world, and tasks ranging from scientific research to managing a business require us to extract meaningful information from the data using computer software.
    This book is aimed at those who need to select, modify, and create software to explore data. In a word, programming. Our programming will center on the R system. R is an open-source software project widely used for computing with data and giving users a huge base of techniques. Hence, Programming with R.
    R provides a general language for interactive computations, supported by techniques for data organization, graphics, numerical computations, model-fitting, simulation, and many other tasks. The core system itself is greatly supplemented and enriched by a huge and rapidly growing collection of software packages built on R and, like R, largely implemented as open-source software. Furthermore, R is designed to encourage learning and developing, with easy starting mechanisms for programming and also techniques to help you move on to more serious applications. The complete picture—the R system, the language, the available packages, and the programming environment—constitutes an unmatched resource for computing with data.
    At the same time, the “with” word in Programming with R is important. No software system is sufficient for exploring data, and we emphasize interfaces between systems to take advantage of their respective strengths.
    Is it worth taking time to develop or extend your skills in such programming? Yes, because the investment can pay off both in the ability to ask questions and in the trust you can have in the answers. Exploring data with the right questions and providing trustworthy answers to them are the key to analyzing data, and the twin principles that will guide us.
  • 11.
    What’s in the book?
    A sequence of chapters in the book takes the reader on successive steps from user to programmer to contributor, in the gradual progress that R encourages. Specifically: using R; simple programming; packages; classes and methods; inter-system interfaces (Chapters 2; 3; 4; 9 and 10; 11 and 12). The order reflects a natural progression, but the chapters are largely independent, with many cross references to encourage browsing.
    Other chapters explore computational techniques needed at all stages: basic computations; graphics; computing with text (Chapters 6; 7; 8). Lastly, a chapter (13) discusses how R works and the appendix covers some topics in the history of the language.
    Woven throughout are a number of reasonably serious examples, ranging from a few paragraphs to several pages, some of them continued elsewhere as they illustrate different techniques. See “Examples” in the index. I encourage you to explore these as leisurely as time permits, thinking about how the computations evolve, and how you would approach these or similar examples.
    The book has a companion R package, SoDA, obtainable from the main CRAN repository, as described in Chapter 4. A number of the functions and classes developed in the book are included in the package. The package also contains code for most of the examples; see the documentation for "Examples" in the package.
    Even at five hundred pages, the book can only cover a fraction of the relevant topics, and some of those receive a pretty condensed treatment. Spending time alternately on reading, thinking, and interactive computation will help clarify much of the discussion, I hope. Also, the final word is with the online documentation and especially with the software; a substantial benefit of open-source software is the ability to drill down and see what’s really happening.
    Who should read this book?
    I’ve written this book with three overlapping groups of readers generally in mind.
    First, “data analysts”; that is, anyone with an interest in exploring data, especially in serious scientific studies. This includes statisticians, certainly, but increasingly others in a wide range of disciplines where data-rich studies now require such exploration. Helping to enable exploration is our mission
  • 12.
    here. I hope and expect that you will find that working with R and related software enhances your ability to learn from the data relevant to your interests.
    If you have not used R or S-Plus before, you should precede this book (or at least supplement it) with a more basic presentation. There are a number of books and an even larger number of Web sites. Try searching with a combination of “introduction” or “introductory” along with “R”. Books by W. John Braun and Duncan J. Murdoch [2], Michael Crawley [11], Peter Dalgaard [12], and John Verzani [24], among others, are general introductions (both to R and to statistics). Other books and Web sites are beginning to appear that introduce R or S-Plus with a particular area of application in mind; again, some Web searching with suitable terms may find a presentation attuned to your interests.
    A second group of intended readers are people involved in research or teaching related to statistical techniques and theory. R and other modern software systems have become essential in the research itself and in communicating its results to the community at large. Most graduate-level programs in statistics now provide some introduction to R. This book is intended to guide you on the followup, in which your software becomes more important to your research, and often a way to share results and techniques with the community. I encourage you to push forward and organize your software to be reusable and extendible, including the prospect of creating an R package to communicate your work to others. Many of the R packages now available derive from such efforts.
    The third target group are those more directly interested in software and programming, particularly software for data analysis. The efforts of the R community have made it an excellent medium for “packaging” software and providing it to a large community of users. R is maintained on all the widely used operating systems for computing with data and is easy for users to install. Its package mechanism is similarly well maintained, both in the central CRAN repository and in other repositories. Chapter 4 covers both using packages and creating your own. R can also incorporate work done in other systems, through a wide range of inter-system interfaces (discussed in Chapters 11 and 12).
    Many potential readers in the first and second groups will have some experience with R or other software for statistics, but will view their involvement as doing only what’s absolutely necessary to “get the answers”. This book will encourage moving on to think of the interaction with the software as an important and valuable part of your activity. You may feel inhibited by not having done much programming before. Don’t be. Programming with
  • 13.
    R can be approached gradually, moving from easy and informal to more ambitious projects. As you use R, one of its strengths is its flexibility. By making simple changes to the commands you are using, you can customize interactive graphics or analysis to suit your needs. This is the takeoff point for programming: As Chapters 3 and 4 show, you can move from this first personalizing of your computations through increasingly ambitious steps to create your own software. The end result may well be your own contribution to the world of R-based software.
    How should you read this book?
    Any way that you find helpful or enjoyable, of course. But an author often imagines a conversation with a reader, and it may be useful to share my version of that. In many of the discussions, I imagine a reader pausing to decide how to proceed, whether with a specific technical point or to choose a direction for a new stage in a growing involvement with software for data analysis. Various chapters chart such stages in a voyage that many R users have taken from initial, casual computing to a full role as a contributor to the community. Most topics will also be clearer if you can combine reading with hands-on interaction with R and other software, in particular using the Examples in the SoDA package.
    This pausing for reflection and computing admittedly takes a little time. Often, you will just want a “recipe” for a specific task—what is often called the “cookbook” approach. By “cookbook” in software we usually imply that one looks a topic up in the index and finds a corresponding explicit recipe. That should work sometimes with this book, but we concentrate more on general techniques and extended examples, with the hope that these will equip readers to deal with a wider range of tasks. For the reader in a hurry, I try to insert pointers to online documentation and other resources.
    As an enthusiastic cook, though, I would point out that the great cookbooks offer a range of approaches, similar to the distinction here. Some, such as the essential Joy of Cooking, do indeed emphasize brief, explicit recipes. The best of these books are among the cook’s most valuable resources. Other books, such as Jacques Pépin’s masterful La Technique, teach you just that: techniques to be applied. Still others, such as the classic Mastering the Art of French Cooking by Julia Child and friends, are about learning and about underlying concepts as much as about specific techniques. It’s the latter two approaches that most resemble the goals of the present book. The book presents a number of explicit recipes, but the deeper emphasis is on concepts and techniques. And behind those in turn, there will be two general principles of good software for data analysis.
  • 14.
    Acknowledgments
    The ideas discussed in the book, as well as the software itself, are the results of projects involving many people and stretching back more than thirty years (see the appendix for a little history).
    Such a scope of participants and time makes identifying all the individuals a hopeless task, so I will take refuge in identifying groups, for the most part. The most recent group, and the largest, consists of the “contributors to R”, not easy to delimit but certainly comprising hundreds of people at the least. Centrally, my colleagues in R-core, responsible for the survival, dissemination, and evolution of R itself. These are supplemented by other volunteers providing additional essential support for package management and distribution, both generally and specifically for repositories such as CRAN, BioConductor, omegahat, RForge and others, as well as the maintainers of essential information resources—archives of mailing lists, search engines, and many tutorial documents. Then the authors of the thousands of packages and other software forming an unprecedented base of techniques; finally, the interested users who question and prod through the mailing lists and other communication channels, seeking improvements. This community as a whole is responsible for realizing something we could only hazily articulate thirty-plus years ago, and in a form and at a scale far beyond our imaginings.
    More narrowly from the viewpoint of this book, discussions within R-core have been invaluable in teaching me about R, and about the many techniques and facilities described throughout the book. I am only too aware of the many remaining gaps in my knowledge, and of course am responsible for all inaccuracies in the descriptions herein.
    Looking back to the earlier evolution of the S language and software, time has brought an increasing appreciation of the contribution of colleagues and management in Bell Labs research in that era, providing a nourishing environment for our efforts, perhaps indeed a unique environment. Rick Becker, Allan Wilks, Trevor Hastie, Daryl Pregibon, Diane Lambert, and W. S. Cleveland, along with many others, made essential contributions.
    Since retiring from Bell Labs in 2005, I have had the opportunity to interact with a number of groups, including students and faculty at several universities. Teaching and discussions at Stanford over the last two academic years have been very helpful, as were previous interactions at UCLA and at Auckland University. My thanks to all involved, with special thanks to Trevor Hastie, Mark Hansen, Ross Ihaka and Chris Wild.
    A number of the ideas and opinions in the book benefited from collaborations
  • 15.
    and discussions with Duncan Temple Lang, and from discussions with Robert Gentleman, Luke Tierney, and other experts on R, not that any of them should be considered at all responsible for defects therein.
    The late Gene Roddenberry provided us all with some handy terms, and much else to be enjoyed and learned from.
    Each of our books since the beginning of S has had the benefit of the editorial guidance of John Kimmel; it has been a true and valuable collaboration, long may it continue.
    John Chambers
    Palo Alto, California
    January, 2008
  • 16.
    Contents
    1 Introduction: Principles and Concepts
      1.1 Exploration: The Mission
      1.2 Trustworthy Software: The Prime Directive
      1.3 Concepts for Programming with R
      1.4 The R System and the S Language
    2 Using R
      2.1 Starting R
      2.2 An Interactive Session
      2.3 The Language
      2.4 Objects and Names
      2.5 Functions and Packages
      2.6 Getting R
      2.7 Online Information About R
      2.8 What’s Hard About Using R?
    3 Programming with R: The Basics
      3.1 From Commands to Functions
      3.2 Functions and Functional Programming
      3.3 Function Objects and Function Calls
      3.4 The Language
      3.5 Debugging
      3.6 Interactive Tracing and Editing
      3.7 Conditions: Errors and Warnings
      3.8 Testing R Software
    4 R Packages
      4.1 Introduction: Why Write a Package?
      4.2 The Package Concept and Tools
  • 17.
      4.3 Creating a Package
      4.4 Documentation for Packages
      4.5 Testing Packages
      4.6 Package Namespaces
      4.7 Including C Software in Packages
      4.8 Interfaces to Other Software
    5 Objects
      5.1 Objects, Names, and References
      5.2 Replacement Expressions
      5.3 Environments
      5.4 Non-local Assignments; Closures
      5.5 Connections
      5.6 Reading and Writing Objects and Data
    6 Basic Data and Computations
      6.1 The Evolution of Data in the S Language
      6.2 Object Types
      6.3 Vectors and Vector Structures
      6.4 Vectorizing Computations
      6.5 Statistical Data: Data Frames
      6.6 Operators: Arithmetic, Comparison, Logic
      6.7 Computations on Numeric Data
      6.8 Matrices and Matrix Computations
      6.9 Fitting Statistical models
      6.10 Programming Random Simulations
    7 Data Visualization and Graphics
      7.1 Using Graphics in R
      7.2 The x-y Plot
      7.3 The Common Graphics Model
      7.4 The graphics Package
      7.5 The grid Package
      7.6 Trellis Graphics and the lattice Package
    8 Computing with Text
      8.1 Text Computations for Data Analysis
      8.2 Importing Text Data
      8.3 Regular Expressions
      8.4 Text Computations in R
  • 18.
      8.5 Using and Writing Perl ... 309
      8.6 Examples of Text Computations ... 318
    9 New Classes ... 331
      9.1 Introduction: Why Classes? ... 331
      9.2 Programming with New Classes ... 334
      9.3 Inheritance and Inter-class Relations ... 344
      9.4 Virtual Classes ... 351
      9.5 Creating and Validating Objects ... 359
      9.6 Programming with S3 Classes ... 362
      9.7 Example: Binary Trees ... 369
      9.8 Example: Data Frames ... 375
    10 Methods and Generic Functions ... 381
      10.1 Introduction: Why Methods? ... 381
      10.2 Method Definitions ... 384
      10.3 New Methods for Old Functions ... 387
      10.4 Programming Techniques for Methods ... 389
      10.5 Generic Functions ... 396
      10.6 How Method Selection Works ... 405
    11 Interfaces I: C and Fortran ... 411
      11.1 Interfaces to C and Fortran ... 411
      11.2 Calling R-Independent Subroutines ... 415
      11.3 Calling R-Dependent Subroutines ... 420
      11.4 Computations in C++ ... 425
      11.5 Loading and Registering Compiled Routines ... 426
    12 Interfaces II: Other Systems ... 429
      12.1 Choosing an Interface ... 430
      12.2 Text- and File-Based Interfaces ... 432
      12.3 Functional Interfaces ... 433
      12.4 Object-Based Interfaces ... 435
      12.5 Interfaces to OOP Languages ... 437
      12.6 Interfaces to C++ ... 440
      12.7 Interfaces to Databases and Spreadsheets ... 446
      12.8 Interfaces without R ... 450
  • 19.
    13 How R Works ... 453
      13.1 The R Program ... 453
      13.2 The R Evaluator ... 454
      13.3 Calls to R Functions ... 460
      13.4 Calls to Primitive Functions ... 463
      13.5 Assignments and Replacements ... 465
      13.6 The Language ... 468
      13.7 Memory Management for R Objects ... 471
    A Some Notes on the History of S ... 475
    Bibliography ... 479
    Index ... 481
    Index of R Functions and Documentation ... 489
    Index of R Classes and Types ... 497
  • 20.
    Chapter 1
    Introduction: Principles and Concepts

    This chapter presents some of the concepts and principles that recur throughout the book. We begin with the two guiding principles: the mission to explore and the responsibility to be trustworthy (Sections 1.1 and 1.2). With these as guidelines, we then introduce some concepts for programming with R (Section 1.3, page 4) and add some justification for our emphasis on that system (Section 1.4, page 9).

    1.1 Exploration: The Mission

    The first principle I propose is that our Mission, as users and creators of software for data analysis, is to enable the best and most thorough exploration of data possible. That means that users of the software must be able to ask the meaningful questions about their applications, quickly and flexibly.

    Notice that speed here is human speed, measured in clock time. It’s the time that the actual computations take, but usually more importantly, it’s also the time required to formulate the question and to organize the data in a way to answer it. This is the exploration, and software for data analysis makes it possible. A wide range of techniques is needed to access and transform data, to make predictions or summaries, to communicate results to others, and to deal with ongoing processes. Whenever we consider techniques for these and other requirements in the chapters that follow, the first principle we will try to apply is the Mission:
  • 21.
    How can these techniques help people to carry out this specific kind of exploration?

    Ensuring that software for data analysis exists for such purposes is an important, exciting, and challenging activity. Later chapters examine how we can select and develop software using R and other systems.

    The importance, excitement, and challenge all come from the central role that data and computing have come to play in modern society. Science, business and many other areas of society continually rely on understanding data, and that understanding frequently involves large and complicated data processes. A few examples current as the book is written can suggest the flavor:

    • Many ambitious projects are underway or proposed to deploy sensor networks, that is, coordinated networks of devices to record a variety of measurements in an ongoing program. The resulting data is essential to understanding environmental quality, the mechanisms of weather and climate, and the future of biodiversity in the earth’s ecosystems. In both scale and diversity, the challenge is unprecedented, and will require merging techniques from many disciplines.

    • Astronomy and cosmology are undergoing profound changes as a result of large-scale digital mappings enabled by both satellite and ground recording of huge quantities of data. The scale of data collected allows questions to be addressed in an overall sense that before could only be examined in a few, local regions.

    • Much business activity is now carried out largely through distributed, computerized processes that both generate large and complex streams of data and also offer through such data an unprecedented opportunity to understand one’s business quantitatively. Telecommunications in North America, for example, generates databases with conceptually billions of records. To explore and understand such data has great attraction for the business (and for society), but is enormously challenging.

    These and many other possible examples illustrate the importance of what John Tukey long ago characterized as “the peaceful collision of computing and data analysis”. Progress on any of these examples will require the ability to explore the data, flexibly and in a reasonable time frame.
  • 22.
    1.2 Trustworthy Software: The Prime Directive

    Exploration is our mission; we and those who use our software want to find new paths to understand the data and the underlying processes. The mission is, indeed, to boldly go where no one has gone before. But, we need boldness to be balanced by our responsibility. We have a responsibility for the results of data analysis that provides a key compensating principle.

    The complexity of the data processes and of the computations applied to them means that those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis, and by extension the software that produced it. Both the data analyst and the software provider therefore have a strong responsibility to produce a result that is trustworthy, and, if possible, one that can be shown to be trustworthy.

    This is the second principle: the computations and the software for data analysis should be trustworthy: they should do what they claim, and be seen to do so. Neither those who view the results of data analysis nor, in many cases, the statisticians performing the analysis can directly validate extensive computations on large and complicated data processes. Ironically, the steadily increasing computer power applied to data analysis often distances the results further from direct checking by the recipient. The many computational steps between original data source and displayed results must all be truthful, or the effect of the analysis may be worthless, if not pernicious. This places an obligation on all creators of software to program in such a way that the computations can be understood and trusted. This obligation I label the Prime Directive.

    Note that the directive in no sense discourages exploratory or approximate methods. As John Tukey often remarked, better an approximate answer to the right question than an exact answer to the wrong question. We should seek answers boldly, but always explaining the nature of the method applied, in an open and understandable format, supported by as much evidence of its quality as can be produced. As we will see, a number of more technically specific choices can help us satisfy this obligation.

    Readers who have seen the Star Trek® television series¹ may recognize the term “prime directive”. Captains Kirk, Picard, and Janeway and their crews were bound by a directive which (slightly paraphrased) was: Do nothing to interfere with the natural course of a new civilization. Do not distort

    ¹ Actually, at least five series, from “The Original” in 1966 through “Enterprise”, not counting the animated version, plus many films. See startrek.com and the many reruns if this is a gap in your cultural background.
  • 23.
    the development. Our directive is not to distort the message of the data, and to provide computations whose content can be trusted and understood.

    The prime directive of the space explorers, notice, was not their mission but rather an important safeguard to apply in pursuing that mission. Their mission was to explore, to “boldly go where no one has gone before”, and all that. That’s really our mission too: to explore how software can add new abilities for data analysis. And our own prime directive, likewise, is an important caution and guiding principle as we create the software to support our mission.

    Here, then, are two motivating principles: the mission, which is bold exploration; and the prime directive, trustworthy software. We will examine in the rest of the book how to select and program software for data analysis, with these principles as guides. A few aspects of R will prove to be especially relevant; let’s examine those next.

    1.3 Concepts for Programming with R

    The software and the programming techniques to be discussed in later chapters tend to share some concepts that make them helpful for data analysis. Exploiting these concepts will often benefit both the effectiveness of programming and the quality of the results. Each of the concepts arises naturally in later chapters, but it’s worth outlining them together here for an overall picture of our strategy in programming for data analysis.

    Functional Programming

    Software in R is written in a functional style that helps both to understand the intent and to ensure that the implementation corresponds to that intent. Computations are organized around functions, which can encapsulate specific, meaningful computational results, with implementations that can be examined for their correctness. The style derives from a more formal theory of functional programming that restricts the computations to obtain well-defined or even formally verifiable results. Clearly, programming in a fully functional manner would contribute to trustworthy software. The S language does not enforce a strict functional programming approach, but does carry over some of the flavor, particularly when you make some effort to emphasize simple functional definitions with minimal use of non-functional computations.

    As the scope of the software expands, much of the benefit from functional style can be retained by using functional methods to deal with varied types
  • 24.
    of data, within the general goal defined by the generic function.

    Classes and Methods

    The natural complement to functional style in programming is the definition of classes of objects. Where functions should clearly encapsulate the actions in our analysis, classes should encapsulate the nature of the objects used and returned by calls to functions. The duality between function calls and objects is a recurrent theme of programming with R. In the design of new classes, we seek to capture an underlying concept of what the objects mean. The relevant techniques combine directly specifying the contents (the slots), relating the new class to existing classes (the inheritance), and expressing how objects should be created and validated (methods for initializing and validating).

    Method definitions knit together functions and classes. Well-designed methods extend the generic definition of what a function does to provide a specific computational method when the argument or arguments come from specified classes, or inherit from those classes. In contrast to methods that are solely class-based, as in common object-oriented programming languages such as C++ or Java, methods in R are part of a rich but complex network of functional and object-based computation.

    The ability to define classes and methods in fact is itself a major advantage in adhering to the Prime Directive. It gives us a way to isolate and define formally what information certain objects should contain and how those objects should behave when functions are applied to them.

    Data Frames

    Trustworthy data analysis depends first on trust in the data being analyzed. Not so much that the data must be perfect, which is impossible in nearly any application and in any case beyond our control, but rather that trust in the analysis depends on trust in the relation between the data as we use it and the data as it has entered the process and then has been recorded, organized and transformed.

    In serious modern applications, the data usually comes from a process external to the analysis, whether generated by scientific observations, commercial transactions or any of many other human activities. To access the data for analysis by well-defined and trustworthy computations, we will benefit from having a description, or model, for the data that corresponds to its natural home (often in DBMS or spreadsheet software), but can also be
  • 25.
    a meaningful basis for data as used in the analysis. Transformations and restructuring will often be needed, but these should be understandable and defensible.

    The model we will emphasize is the data frame, essentially a formulation of the traditional view of observations and variables. The data frame has a long history in the S language but modern techniques for classes and methods allow us to extend the use of the concept. Particularly useful techniques arise from using the data frame concept both within R, for model-fitting, data visualization, and other computations, and also for effective communication with other systems. Spreadsheets and relational database software both relate naturally to this model; by using it along with unambiguous mechanisms for interfacing with such software, the meaning and structure of the data can be preserved. Not all applications suit this approach by any means, but the general data frame model provides a valuable basis for trustworthy organization and treatment of many sources of data.

    Open Source Software

    Turning to the general characteristics of the languages and systems available, note that many of those discussed in this book are open-source software systems; for example, R, Perl, Python, many of the database systems, and the Linux operating system. These systems all provide access to source code sufficient to generate a working version of the software. The arrangement is not equivalent to “public-domain” software, by which people usually mean essentially unrestricted use and copying. Instead, most open-source systems come with a copyright, usually held by a related group or foundation, and with a license restricting the use and modification of the software. There are several versions of license, the best known being the GNU General Public License and its variants (see gnu.org/copyleft/gpl.html), the famous GPL. R is distributed under a version of this license (see the "COPYING" file in the home directory of R). A variety of other licenses exists; those accepted by the Open Source Initiative are described at opensource.org/licenses.

    Distinctions among open-source licenses generate a good deal of heat in some discussions, often centered on what effect the license has on the usability of the software for commercial purposes. For our focus, particularly for the concern with trustworthy software for data analysis, these issues are not directly relevant. The popularity of open-source systems certainly owes a lot to their being thought of as “free”, but for our goal of trustworthy software, this is also not the essential property. Two other characteristics contribute more. First, the simple openness itself allows any sufficiently
  • 26.
    competent observer to enquire fully about what is actually being computed. There are no intrinsic limitations to the validation of the software, in the sense that it is all there. Admittedly, only a minority of users are likely to delve very far into the details of the software, but some do. The ability to examine and critique every part of the software makes for an open-ended scope for verifying the results.

    Second, open-source systems demonstrably generate a spirit of community among contributors and active users. User groups, e-mail lists, chat rooms and other socializing mechanisms abound, with vigorous discussion and controversy, but also with a great deal of effort devoted to testing and extension of the systems. The active and demanding community is a key to trustworthy software, as well as to making useful tools readily available.

    Algorithms and Interfaces

    R is explicitly seen as built on a set of routines accessed by an interface, in particular by making use of computations in C or Fortran. User-written extensions can make use of such interfaces, but the core of R is itself built on them as well. Aside from routines that implement R-dependent techniques, there are many basic computations for numerical results, data manipulation, simulation, and other specific computational tasks. These implementations we can term algorithms. Many of the core computations on which the R software depends are now implemented by collections of such software that are widely used and tested. The algorithm collections have a long history, often predating the larger-scale open-source systems. It’s an important concept in programming with R to seek out such algorithms and make them part of a new computation. You should be able to import the trust built up in the non-R implementation to make your own software more trustworthy.

    Major collections on a large scale and many smaller, specialized algorithms have been written, generally in the form of subroutines in Fortran, C, and a few other general programming languages. Thirty-plus years ago, when I was writing Computational Methods for Data Analysis, those who wanted to do innovative data analysis often had to work directly from such routines for numerical computations or simulation, among other topics. That book expected readers to search out the routines and install them in the readers’ own computing environment, with many details left unspecified.

    An important and perhaps under-appreciated contribution of R and other systems has been to embed high-quality algorithms for many computations in the system itself, automatically available to users. For example, key parts of the LAPACK collection of computations for numerical linear algebra
  • 27.
    are included in R, providing a basis for fitting linear models and for other matrix computations. Other routines in the collection may not be included, perhaps because they apply to special datatypes or computations not often encountered. These routines can still be used with R in nearly all cases, by writing an interface to the routine (see Chapter 11).

    Similarly, the internal code for pseudo-random number generation includes most of the well-regarded and thoroughly tested algorithms for this purpose. Other tasks, such as sorting and searching, also use quality algorithms. Open-source systems provide an advantage when incorporating such algorithms, because alert users can examine in detail the support for computations. In the case of R, users do indeed question and debate the behavior of the system, sometimes at great length, but overall to the benefit of our trust in programming with R.

    The best of the algorithm collections offer another important boost for trustworthy software in that the software may have been used in a wide variety of applications, including some where quality of results is critically important. Collections such as LAPACK are among the best-tested substantial software projects in existence, and not only by users of higher-level systems. Their adaptability to a wide range of situations is also a frequent benefit.

    The process of incorporating quality algorithms in a user-oriented system such as R is ongoing. Users can and should seek out the best computations for their needs, and endeavor to make these available for their own use and, through packages, for others as well.

    Incorporating algorithms in the sense of subroutines in C or Fortran is a special case of what we call inter-system interfaces in this book. The general concept is similar to that for algorithms. Many excellent software systems exist for a variety of purposes, including text-manipulation, spreadsheets, database management, and many others. Our approach to software for data analysis emphasizes R as the central system, for reasons outlined in the next section. In any case, most users will prefer to have a single home system for their data analysis.

    That does not mean that we should or can absorb all computations directly into R. This book emphasizes the value of expressing computations in a natural way while making use of high-quality implementations in whatever system is suitable. A variety of techniques, explored in Chapter 12, allows us to retain a consistent approach in programming with R at the same time.
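The point about embedded algorithms can be checked from an ordinary session. The following is a minimal sketch, not code from the book: the data are simulated, and the names `X`, `beta`, `b_solve` and `b_lm` are made up for the illustration. It solves a small least-squares problem once through `solve()`, which is documented as calling LAPACK for real matrices, and once through the QR-based `lm.fit()`, then checks that the two routes agree.

```r
# Hypothetical data: a small regression problem with known coefficients.
set.seed(1)                                 # fixed seed, so the run is repeatable
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))           # design matrix with an intercept column
beta <- c(2, -1, 0.5)                       # assumed "true" coefficients
y <- drop(X %*% beta + rnorm(n, sd = 0.1))  # response with a little noise

# Route 1: solve the normal equations; solve() hands the work to LAPACK.
b_solve <- drop(solve(crossprod(X), crossprod(X, y)))

# Route 2: the QR-based fitting routine that underlies lm().
b_lm <- unname(lm.fit(X, y)$coefficients)

# Two independent numerical routes should agree to rounding error.
agree <- isTRUE(all.equal(b_solve, b_lm))
print(agree)

# The session can report which LAPACK version and which generators it uses.
print(La_version())
print(RNGkind())
```

Agreement between independent routes is only weak evidence of correctness, but it is the kind of check that openness makes easy: both routines, and the Fortran beneath them, can be read by anyone who cares to.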
  • 28.
    1.4 The R System and the S Language

    This book includes computations in a variety of languages and systems, for tasks ranging from database management to text processing. Not all systems receive equal treatment, however. The central activity is data analysis, and the discussion is from the perspective that our data analysis is mainly expressed in R; when we examine computations, the results are seen from an interactive session with R. This view does not preclude computations done partly or entirely in other systems, and these computations may be complete in themselves. The data analysis that the software serves, however, is nearly always considered to be in R.

    Chapter 2 covers the use of R broadly but briefly (if you have no experience with it, you might want to consult one of the introductory books or other sources mentioned on page vii in the preface). The present section gives a brief summary of the system and relates it to the philosophy of the book.

    R is an open-source software system, supported by a group of volunteers from many countries. The central control is in the hands of a group called R-core, with the active collaboration of a much larger group of contributors. The base system provides an interactive language for numerical computations, data management, graphics and a variety of related calculations. It can be installed on Windows, Mac OS X, and Linux operating systems, with a variety of graphical user interfaces. Most importantly, the base system is supported by well over a thousand packages on the central repository cran.r-project.org and in other collections.

    R began as a research project of Ross Ihaka and Robert Gentleman in the 1990s, described in a paper in 1996 [17]. It has since expanded into software used to implement and communicate most new statistical techniques. The software in R implements a version of the S language, which was designed much earlier by a group of us at Bell Laboratories, described in a series of books ([1], [6], and [5] in the bibliography).

    The S-Plus system also implements the S language. Many of the computations discussed in the book work in S-Plus as well, although there are important differences in the evaluation model, noted in later chapters. For more on the history of S, see Appendix A, page 475.

    The majority of the software in R is itself written in the same language used for interacting with the system, a dialect of the S language. The language evolved in essentially its present form during the 1980s, with a generally functional style, in the sense used on page 4: The basic unit of programming is a function. Function calls usually compute an object that is a
  • 29.
    function of the objects passed in as arguments, without side effects to those arguments. Subsequent evolution of the language introduced formal classes and methods, again in the sense discussed in the previous section. Methods are specializations of functions according to the class of one or more of the arguments. Classes define the content of objects, both directly and through inheritance. R has added a number of features to the language, while remaining largely compatible with S. All these topics are discussed in the present book, particularly in Chapters 3 for functions and basic programming, 9 for classes, and 10 for methods.

    So why concentrate on R? Clearly, and not at all coincidentally, R reflects the same philosophy that evolved through the S language and the approach to data analysis at Bell Labs, and which largely led me to the concepts I’m proposing in this book. It is relevant that S began as a medium for statistics researchers to express their own computations, in support of research into data analysis and its applications. A direct connection leads from there to the large community that now uses R similarly to implement new ideas in statistics, resulting in the huge resource of R packages.

    Added to the characteristics of the language is R’s open-source nature, exposing the system to continual scrutiny by users. It includes some algorithms for numerical computations and simulation that likewise reflect modern, open-source computational standards in these fields. The LAPACK software for numerical linear algebra is an example, providing trustworthy computations to support statistical methods that depend on linear algebra.

    Although there is plenty of room for improvement and for new ideas, I believe R currently represents the best medium for quality software in support of data analysis, and for the implementation of the principles espoused in the present book. From the perspective of our first development of S some thirty-plus years ago, it’s a cause for much gratitude and not a little amazement.
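The three language concepts summarized in this section (functions without side effects on their arguments, classes that define the content of objects, and methods that specialize a function by class) can be seen together in a few lines. What follows is a minimal sketch under stated assumptions, not an example from the book: the class names `Track` and `TimedTrack`, the generic `span()` and the helper `centered()` are all hypothetical, chosen only to make the ideas concrete.

```r
library(methods)   # formal classes and methods; usually attached already

# A function computes a new object from its arguments and leaves them alone.
centered <- function(x) x - mean(x)
x <- c(1, 2, 6)
y <- centered(x)   # y is a new object; x still holds 1, 2, 6

# A class states what its objects contain (the slots) and when they are valid.
setClass("Track",
         representation(x = "numeric", y = "numeric"),
         validity = function(object) {
             if (length(object@x) != length(object@y))
                 "slots 'x' and 'y' must have the same length"
             else TRUE
         })

# A second class inherits the slots and the validity check of the first.
setClass("TimedTrack", contains = "Track",
         representation(time = "numeric"))

# A generic function, with methods selected by the class of the argument.
setGeneric("span", function(object) standardGeneric("span"))
setMethod("span", "numeric", function(object) diff(range(object)))
setMethod("span", "Track",
          function(object) c(x = span(object@x), y = span(object@y)))

tr <- new("TimedTrack", x = c(0, 3, 5), y = c(1, 1, 4), time = c(0, 1, 2))
print(span(tr))    # no TimedTrack method exists, so the Track method is inherited

# An invalid object is rejected at creation; we capture the message.
bad <- tryCatch(new("Track", x = c(1, 2, 3), y = 1),
                error = function(e) conditionMessage(e))
print(bad)
```

The validity function is where the class design meets the Prime Directive: an object that violates the stated concept is never created, so later computations need not guard against it. Chapters 9 and 10 treat these mechanisms properly.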
  • 30.
    Chapter 2
    Using R

    This chapter covers the essentials for using R to explore data interactively. Section 2.1 covers basic access to an R session. Users interact with R through a single language for both data analysis and programming (Section 2.3, page 19). The key concepts are function calls in the language and the objects created and used by those calls (2.4, 24), two concepts that recur throughout the book. The huge body of available software is organized around packages that can be attached to the session, once they are installed (2.5, 25). The system itself can be downloaded and installed from repositories on the Web (2.6, 29); there are also a number of resources on the Web for information about R (2.7, 31). Lastly, we examine aspects of R that may raise difficulties for some new users (2.8, 34).

    2.1 Starting R

    R runs on the commonly used platforms for personal computing: Windows®, Mac OS X®, Linux, and some versions of UNIX®. In the usual desktop environments for these platforms, users will typically start R as they would most applications, by clicking on the R icon or on the R file in a folder of applications.

    An application will then appear looking much like other applications on the platform: for example, a window and associated toolbar. In the
  • 31.
    standard version, at least on most platforms, the application is called the "R Console". In Windows recently it looked like this:

    [Figure: screenshot of the R Console window on Windows.]

    The application has a number of drop-down menus; some are typical of most applications ("File", "Edit", and "Help"). Others such as "Packages" are special to R. The real action in running R, however, is not with the menus but in the console window itself. Here the user is expected to type input to R in the form of expressions; the program underlying the application responds by doing some computation and if appropriate by displaying a version of the results for the user to look at (printed results normally in the same console window, graphics typically in another window).

    This interaction between user and system continues, and constitutes an R session. The session is the fundamental user interface to R. The following section describes the logic behind it. A session has a simple model for user interaction, but one that is fundamentally different from users’ most common experience with personal computers (in applications such as word processors, Web browsers, or audio/video systems). First-time users may feel abandoned, left to flounder on their own with little guidance about what to do and even less help when they do something wrong. More guidance is available than may be obvious, but such users are not entirely wrong in their
  • 32.
    2.2. AN INTERACTIVESESSION 13 reaction. After intervening sections present the essential concepts involved in using R, Section 2.8, page 34 revisits this question. 2.2 An Interactive Session Everything that you do interactively with R happens in a session. A session starts when you start up R, typically as described above. A session can also be started from other special interfaces or from a command shell (the original design), without changing the fundamental concept and with the basic appearance remaining as shown in this section and in the rest of the book. Some other interfaces arise in customizing the session, on page 17. During an R session, you (the user) provide expressions for evaluation by R, for the purpose of doing any sort of computation, displaying results, and creating objects for further use. The session ends when you decide to quit from R. All the expressions evaluated in the session are just that: general ex- pressions in R’s version of the S language. Documentation may mention “commands” in R, but the term just refers to a complete expression that you type interactively or otherwise hand to R for evaluation. There’s only one language, used for either interactive data analysis or for programming, and described in section 2.3. Later sections in the book come back to ex- amine it in more detail, especially in Chapter 3. The R evaluator displays a prompt, and the user responds by typing a line of text. Printed output from the evaluation and other messages appear following the input line. Examples in the book will be displayed in this form, with the default prompts preceding the user’s input: > quantile(Declination) 0% 25% 50% 75% 100% -27.98 -11.25 8.56 17.46 27.30 The "> " at the beginning of the example is the (default) prompt string. 
In this example the user responded with

    quantile(Declination)

The evaluator will keep prompting until the input can be interpreted as a complete expression; if the user had left off the closing ")", the evaluator would have prompted for more input. Since the input here is a complete expression, the system evaluated it. To be pedantic, it parsed the input text
and evaluated the resulting object. The evaluation in this case amounts to calling a function named quantile.

The printed output may suggest a table, and that’s intentional. But in fact nothing special happened; the standard action by the evaluator is to print the object that is the value of the expression. All evaluated expressions are objects; the printed output corresponds to the object; specifically, the form of printed output is determined by the kind of object, by its class (technically, through a method selected for that class). The call to quantile() returned a numeric vector, that is, an object of class "numeric". A method was selected based on this class, and the method was called to print the result shown. The quantile() function expects a vector of numbers as its argument; with just this one argument it returns a numeric vector containing the minimum, maximum, median and quartiles.

The method for printing numeric vectors prints the values in the vector, five of them in this case. Numeric objects can optionally have a names attribute; if they do, the method prints the names as labels above the numbers. So the "0%" and so on are part of the object. The designer of the quantile() function helpfully chose a names attribute for the result that makes it easier to interpret when printed.

All these details are unimportant if you’re just calling quantile() to summarize some data, but the important general concept is this:

Objects are the center of computations in R, along with the function calls that create and use those objects.

The duality of objects and function calls will recur in many of our discussions. Computing with existing software hinges largely on using and creating objects, via the large number of available functions. Programming, that is, creating new software, starts with the simple creation of function objects.
More ambitious projects often use a paradigm of creating new classes of objects, along with new or modified functions and methods that link the functions and classes. In all the details of programming, the fundamental duality of objects and functions remains an underlying concept.

Essentially all expressions are evaluated as function calls, but the language includes some forms that don’t look like function calls. Included are the usual operators, such as arithmetic, discussed on page 21. Another useful operator is `?`, which looks up R help for the topic that follows the question mark. To learn about the function quantile():

    > ?quantile

In standard GUI interfaces, the documentation will appear in a separate window, and can be generated from a pull-down menu as well as from the
`?` operator.

Graphical displays provide some of the most powerful techniques in data analysis, and functions for data visualization and other graphics are an essential part of R:

    > plot(Date, Declination)

Here the user typed another expression, plot(Date, Declination); in this case producing a scatter plot as a side effect, but no printed output. The graphics during an interactive session typically appear in one or more separate windows created by the GUI, in this example a window using the native quartz() graphics device for Mac OS X. Graphic output can also be produced in a form suitable for inclusion in a document, such as output in a general file format (PDF or postscript, for example). Computations for graphics are discussed in more detail in Chapter 7.

The sequence of expression and evaluation shown in the examples is essentially all there is to an interactive session. The user supplies expressions and the system evaluates them, one after another. Expressions that produce simple summaries or plots are usually done to see something, either graphics or printed output. Aside from such immediate gratification, most expressions are there in order to assign objects, which can then be used in later computations:

    > fitK <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial)

Evaluating this expression calls the function gam() and assigns the value of the call, associating that object with the name fitK. For the rest of the
session, unless some other assignment to this name is carried out, fitK can be used in any expression to refer to that object; for example, coef(fitK) would call a function to extract some coefficients from fitK (which is in this example a fitted model).

Assignments are a powerful and interesting part of the language. The basic idea is all we need for now, and is in any case the key concept: Assignment associates an object with a name. The term “associates” has a specific meaning here. Whenever any expression is evaluated, the context of the evaluation includes a local environment, and it is into this environment that the object is assigned, under the corresponding name. The object and name are associated in the environment, by the assignment operation. From then on, the name can be used as a reference to the object in the environment. When the assignment takes place at the “top level” (in an input expression in the session), the environment involved is the global environment. The global environment is part of the current session, and all objects assigned there remain available for further computations in the session.

Environments are an important part of programming with R. They are also tricky to deal with, because they behave differently from other objects. Discussion of environments continues in Section 2.4, page 24.

A session ends when the user quits from R, either by evaluating the expression q() or by some other mechanism provided by the user interface. Before ending the session, the system offers the user a chance to save all the objects in the global environment at the end of the session:

    > q()
    Save workspace image? [y/n/c]: y

If the user answers yes, then when a new session is started in the same working directory, the global environment will be restored. Technically, the environment is restored, not the session.
Some actions you took in the session, such as attaching packages or using options(), may not be restored, if they don’t correspond to objects in the global environment.

Unfortunately, your session may end involuntarily: the evaluator may be forced to terminate the session or some outside event may kill the process. R tries to save the workspace even when fatal errors occur in low-level C or Fortran computations, and such disasters should be rare in the core R computations and in well-tested packages. But to be truly safe, you should explicitly back up important results to a file if they will be difficult to recreate. See documentation for functions save() and dump() for suitable techniques.
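The explicit backup suggested above can be sketched as follows. This is only a minimal illustration, not a prescribed procedure: the object fit_summary and the temporary file names are made up for the example, and in practice you would write to a permanent file of your choosing. save() writes a binary image that load() restores, while dump() writes R source text that source() re-evaluates.

```r
# A hypothetical result that would be tedious to recompute.
fit_summary <- c(slope = 1.5, intercept = -0.2)

# Temporary files are used here only to keep the example self-contained.
binary_file <- tempfile(fileext = ".RData")
text_file   <- tempfile(fileext = ".R")

save(fit_summary, file = binary_file)   # binary image of the object
dump("fit_summary", file = text_file)   # R source text that recreates it

original <- fit_summary

# Restore from the binary image.
rm(fit_summary)
load(binary_file)
restored_from_save <- fit_summary

# Restore by re-evaluating the dumped source text.
rm(fit_summary)
source(text_file, local = TRUE)
restored_from_dump <- fit_summary
```

The dumped file is plain text, so it can be inspected or edited, at the cost of possibly losing some precision or detail for complicated objects; the binary image is the safer choice when an exact copy matters.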
Customizing the R session

As you become a more involved user of R, you may want to customize your interaction with it to suit your personal preferences or the goals motivating your applications. The nature of the system lends itself to a great variety of options from the most general to trivial details.

At the most general is the choice of user interface. So far, we have assumed you will start R as you would start other applications on your computer, say by clicking on the R icon.

A second approach, available on any system providing both R and a command shell, is to invoke R as a shell command. In its early history, S in all its forms was typically started as a program from an interactive shell. Before multi-window user interfaces, the shell would be running on an interactive terminal of some sort, or even on the machine’s main console. Nowadays, shells or terminal applications run in their own windows, either supported directly by the platform or indirectly through a client window system, such as those based on X11. Invoking R from a shell allows some flexibility that may not be provided directly by the application (such as running with a C-level debugger). Online documentation from a shell command is printed text by default, which is not as convenient as a browser interface. To initiate a browser interface to the help facility, see the documentation for help.start().

A third approach, somewhat in between the first two, is to use a GUI based on another application or language, potentially one that runs on multiple platforms. The most actively supported example of this approach is ESS, a general set of interface tools in the emacs editor. ESS stands for Emacs Speaks Statistics, and the project supports other statistical systems as well as R; see ess.r-project.org.
For those who love emacs as a general computational environment, ESS provides a variety of GUI-like features, plus a user-interface programmability characteristic of emacs. The use of a GUI based on a platform-independent user interface has advantages for those who need to work regularly on more than one operating system.

Finally, an R session can be run in a non-interactive form, usually invoked in a batch mode from a command shell, with its input taken from a file or other source. R can also be invoked from within another application, as part of an inter-system interface.

In all these situations, the logic of the R session remains essentially the same as shown earlier (the major exception being a few computations in R that behave differently in a non-interactive session).
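Code that must behave sensibly in both kinds of session can ask which kind it is running in. The sketch below uses the base function interactive(); the helper name session_mode() is invented for this illustration. The same file could be run in batch mode from a shell with, for example, "R CMD BATCH" or "Rscript".

```r
# Report how the current session is being run.
# session_mode() is a hypothetical helper, not part of R.
session_mode <- function() {
  if (interactive()) "interactive" else "non-interactive"
}

mode <- session_mode()
cat("This R session is", mode, "\n")

# A typical use: only pause for the user when someone is there to respond.
if (interactive()) {
  message("Plots can safely wait for user input here.")
}
```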
Encoding of text

A major advance in R’s world view came with the adoption of multiple locales, using information available to the R session that defines the user’s preferred encoding of text and other options related to the human language and geographic location. R follows some evolving standards in this area. Many of those standards apply to C software, and therefore they fit fairly smoothly into R.

Normally, default locales will have been set when R was installed that reflect local language and other conventions in your area. See Section 8.1, page 293, and ?locales for some concepts and techniques related to locales. The specifications use standard but somewhat unintuitive terminology; unless you have a particular need to alter behavior for parsing text, sorting character data, or other specialized computations, caution suggests sticking with the default behavior.

Options during evaluation

R offers mechanisms to control aspects of evaluation in the session. The function options() is used to share general-purpose values among functions. Typical options include the width of printed output, the prompt string shown by the parser, and the default device for graphics. The options() mechanism maintains a named list of values that persist through the session; functions use those values, by extracting the relevant option via getOption():

    > getOption("digits")
    [1] 7

In this case, the value is meant to be used to control the number of digits in printing numerical data. A user, or in fact any function, can change this value, by using the same name as an argument to options():

    > 1.234567890
    [1] 1.234568
    > options(digits = 4)
    > 1.234567890
    [1] 1.235

For the standard options, see ?options; however, a call to options() can be used by any computation to set values that are then used by any other computation. Any argument name is legal and will cause the corresponding option to be communicated among functions.
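As a small sketch of both uses described above, the following defines one function that consults a user-defined option and another that changes a standard option only for the duration of a call. The option name "myproject.verbose" and both function names are invented for the illustration; options(), getOption() and on.exit() are standard R.

```r
# A function whose behavior depends on an option with a made-up name.
report_mean <- function(x) {
  if (isTRUE(getOption("myproject.verbose", default = FALSE)))
    message("summarizing ", length(x), " values")
  mean(x)
}

# Change "digits" temporarily; options() returns the previous values,
# which on.exit() restores when the call finishes, even after an error.
format_short <- function(x, digits = 3) {
  old <- options(digits = digits)
  on.exit(options(old))
  format(x)
}

digits_before <- getOption("digits")
short_pi <- format_short(pi)            # formatted with 3 significant digits
digits_after <- getOption("digits")     # same as digits_before

old_verbose <- options(myproject.verbose = TRUE)
m <- report_mean(c(1, 2, 3))            # also emits a message
options(old_verbose)                    # put the option back
```

Restoring the previous values in this way limits how far a change to an option can leak into other computations in the session.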
Options can be set from the beginning of the session; see ?Startup. However, saving a workspace image does not cause the options in effect to be saved and restored. Although the options() mechanism does use an R object, .Options, the internal C code implementing options() takes the object from the base package, not from the usual way of finding objects. The code also enforces some constraints on what’s legal for particular options; for example, "digits" is interpreted as a single integer, which is not allowed to be too small or too large, according to values compiled into R.

The use of options() is convenient and even necessary for the evaluator to behave intelligently and to allow user customization of a session. Writing functions that depend on options, however, reduces our ability to understand these functions’ behavior, because they now depend on external, changeable values. The behavior of code that depends on an option may be altered by any other function called at any earlier time during the session, if the other function calls options(). Most R programming should be functional programming, in the sense that each function call performs a well-defined computation depending only on the arguments to that call. The options() mechanism, and other dependencies on external data that can change during the session, compromise functional programming. It may be worth the danger, but think carefully about it. See page 47 for more on the programming implications, and for an example of the dangers.

2.3 The Language

This section and the next describe the interactive language as you need to use it during a session. But as noted on page 13, there is no interactive language, only the one language used for interaction and for programming. To use R interactively, you basically need to understand two things: functions and objects.
That same duality, functions and objects, runs through everything in R from an interactive session to designing large-scale software. For interaction, the key concepts are function calls and assignments of objects, dealt with in this section and in section 2.4 respectively. The language also has facilities for iteration and testing (page 22), but you can often avoid interactive use of these, largely because R function calls operate on, and return, whole objects.

Function Calls

As noted in Section 2.2, the essential computation in R is the evaluation of a call to a function. Function calls in their ordinary form consist of
the function’s name followed by a parenthesized argument list; that is, a sequence of arguments separated by commas.

    plot(Date, Declination)
    glm(Survived ~ .)

Arguments in function calls can be any expression. Each function has a set of formal arguments, to which the actual arguments in the call are matched. As far as the language itself is concerned, a call can supply any subset of the complete argument list. For this purpose, argument expressions can optionally be named, to associate them with a particular argument of the function:

    jitter(y, amount = .1 * rse)

The second argument in the call above is explicitly matched to the formal argument named amount. To find the argument names and other information about the function, request the online documentation. A user interface to R or a Web browser gives the most convenient access to documentation, with documentation listed by package and within package by topic, including individual functions by name. Documentation can also be requested in the language, for example:

    > ?jitter

This will produce some display of documentation for the topic "jitter", including in the case of a function an outline of the calling sequence and a discussion of individual arguments. If there is no documentation, or you don’t quite believe it, you can find the formal argument names from the function object itself:

    > formalArgs(jitter)
    [1] "x"      "factor" "amount"

Behind this, and behind most techniques involving functions, is the simple fact that jitter and all functions are objects in R. The function name is a reference to the corresponding object. So to see what a function does, just type its name with no argument list following.

    > jitter
    function (x, factor = 1, amount = NULL)
    {
        if (length(x) == 0)
            return(x)
        if (!is.numeric(x))
            stop("'x' must be numeric")
    etc.
The printed version is another R expression, meaning that you can input such an expression to define a function. At which point, you are programming in R. See Chapter 3. The first section of that chapter should get you started.

In principle, the function preceding the parenthesized arguments can be specified by any expression that returns a function object, but in practice functions are nearly always specified by name.

Operators

Function calls can also appear as operator expressions in the usual scientific notation.

    y - mean(y)
    weight > 0
    x < 100 | is.na(date)

The usual operators are defined for arithmetic, comparisons, and logical operations (see Chapter 6). But operators in R are not built-in; in fact, they are just special syntax for certain function calls. The first line in the example above computes the same result as:

    `-`(y, mean(y))

The notation `-` is an example of what are called backtick quotes in R. These quotes make the evaluator treat an arbitrary string of characters as if it was a name in the language. The evaluator responds to the names "y" or "mean" by looking for an object of that name in the current environment. Similarly `-` causes the evaluator to look for an object named "-". Whenever we refer to operators in the book we use backtick quotes to emphasize that this is the name of a function object, not treated as intrinsically different from the name mean.

Functions to extract components or slots from objects are also provided in operator form:

    mars$Date
    classDef@package

And the expressions for extracting subsets or elements from objects are also actually just specialized function calls. The expression y[i] is recognized in the language and evaluated as a call to the function `[`, which extracts a subset of the object in its first argument, with the subset defined by the remaining arguments. The expression y[i] is equivalent to:
    `[`(y, i)

You could enter the second form perfectly legally. Similarly, the function `[[` extracts a single element from an object, and is normally presented as an operator expression:

    mars[["Date"]]

You will encounter a few other operators in the language. Frequently useful for elementary data manipulation is the `:` operator, which produces a sequence of integers between its two arguments:

    1:length(x)

Other operators include `~`, used in specifying models, `%%` for modulus, `%*%` for matrix multiplication, and a number of others.

New operators can be created and recognized as infix operators by the parser. The last two operators mentioned above are examples of the general convention in the language that interprets %text% as the name of an operator, for any text string. If it suits the style of computation, you can define any function of two arguments and give it, say, the name `%d%`. Then an expression such as

    x %d% y

will be evaluated as the call:

    `%d%`(x, y)

Iteration: A quick introduction

The language used by R has the iteration and conditional expressions typical of a C-style language, but for the most part you can avoid typing all but the simplest versions interactively. The following is a brief guide to using and avoiding iterative expressions.

The workhorse of iteration is the for loop. It has the form:

    for( var in seq ) expr
where var is a name and seq is a vector of values. The loop assigns each element of seq to var in sequence and then evaluates the arbitrary expression expr each time. When you use the loop interactively, you need to either show something each time (printed or graphics) or else assign the result somewhere; otherwise, you won’t get any benefit from the computation. For example, the function plot() has several “types” of x-y plots (points, lines, both, etc.). To repeat a plot with different types, one can use a for() loop over the codes for the types:

    > par(ask=TRUE)
    > for(what in c("p","l","b")) plot(Date, Declination, type = what)

The call to par() caused the graphics to pause between plots, so we get to see each plot, rather than having the first two flash by.

The variables Date and Declination come from some data on the planet Mars, in a data frame object, mars (see Section 6.5, page 176). If we wanted to see the class of each of the 17 variables in that data frame, another for() loop would do it:

    for(j in names(mars)) print(class(mars[,j]))

But this will just print 17 lines of output, which we’ll need to relate to the variable names. Not much use.

Here’s where an alternative to iteration is usually better. The workhorse of these is the function sapply(). It applies a function to each element of the object it gets as its first argument, so:

    > sapply(mars,class)
         Year         X    Year.1     Month
    "integer" "logical" "integer" "integer"
          Day Day..adj.      Hour       Min
    etc.

The function tries to simplify the result, and is intelligent enough to include the names as an attribute. See ?sapply for more details, and the “See Also” section of that documentation for other similar functions.

The language has other iteration operators (while() and repeat), and the usual conditional operators (if ... else). These are all useful in programming and discussed in Chapter 3. By the time you need to use them in a non-trivial way interactively, in fact, you should consider turning your computation into a function, so Chapter 3 is indeed the place to look; see Section 3.4, page 58, in particular, for more detail about the language.
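The contrast between the for() loop and sapply() can be tried without the book's data. The sketch below substitutes the iris data frame, which is distributed with R, for mars; that substitution is an assumption made only so the example runs anywhere.

```r
# 'iris' ships with R and stands in here for the book's 'mars' data frame.

# The loop must print as it goes, and we have to label each line ourselves.
for (j in names(iris))
  cat(j, ":", class(iris[[j]]), "\n")

# sapply() returns a single object, with the variable names attached.
classes <- sapply(iris, class)
classes

# Because the result is an ordinary named vector, it can be used further.
numeric_vars <- names(classes)[classes == "numeric"]
numeric_vars
```

The result of sapply() is an object like any other, so it can be assigned, subset, or passed to another function, which the printed output of the loop cannot.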
2.4 Objects and Names

A motto in discussion of the S language has for many years been: everything is an object. You will have a potentially very large number of objects available in your R session, including functions, datasets, and many other classes of objects. In ordinary computations you will create new objects or modify existing ones.

As in any computing language, the ability to construct and modify objects relies on a way to refer to the objects. In R, the fundamental reference to an object is a name. This is an essential concept for programming with R that arises throughout the book and in nearly any serious programming project.

The basic concept is once again the key thing to keep in mind: references to objects are a way for different computations in the language to refer to the same object; in particular, to make changes to that object. In the S language, references to ordinary objects are only through names. And not just names in an abstract, global sense. An object reference must be a name in a particular R environment. Typically, the reference is established initially either by an assignment or as an argument in a function call. Assignment is the obvious case, as in the example on page 15:

    > fitK <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial)

Assignment creates a reference, the name "fitK", to some object. That reference is in some environment. For now, just think of environments as tables that R maintains, in which objects can be assigned names. When an assignment takes place in the top-level of the R session, the current environment is what’s called the global environment. That environment is maintained throughout the current session, and optionally can be saved and restored between sessions.

Assignments appear inside function definitions as well. These assignments take place during a call to the function. They do not use the global environment, fortunately.
If they did, every assignment to the name "x" would overwrite the same reference. Instead, assignments during function calls use an environment specially created for that call. So another reason that functions are so central to programming with R is that they protect users from accidentally overwriting objects in the middle of a computation.

The objects available during an interactive R session depend on what packages are attached; technically, they depend on the nested environments through which the evaluator searches, when given a name, to find a corresponding object. See Section 5.3, page 121, for the details of the search.
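The protection that function calls provide can be seen in a few lines. In this sketch the function name reassign() and the object names are invented for the illustration; new.env(), assign(), get() and exists() are standard R functions, used here to show an environment acting as a table of names.

```r
x <- "top-level value"

# The assignment inside the function uses the environment created
# for this particular call, not the environment where x was first assigned.
reassign <- function() {
  x <- "value inside the call"
  x
}

inside <- reassign()
inside   # the object assigned during the call
x        # the top-level object is untouched

# An environment can be treated as a table associating names with objects.
tbl <- new.env()
assign("x", 42, envir = tbl)
in_table <- get("x", envir = tbl)
has_y <- exists("y", envir = tbl, inherits = FALSE)
```

Three different objects are associated with the name "x" here, one in each environment, and none of the assignments disturbs the others.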
2.5 Functions and Packages

In addition to the software that comes with any copy of R, there are many thousands of functions available to be used in an R session, along with a correspondingly large amount of other related software. Nearly all of the important R software comes in the form of packages that make the software easily available and usable. This section discusses the implications of using different packages in your R session. For much more detail, see Chapter 4, but that is written more from the view of writing or extending a package. You will get there, I hope, as your own programming efforts take shape. The topic here, though, is how best to use other people’s efforts that have been incorporated in packages.

The process leading from needing some computational tool to having it available in your R session has three stages: finding the software, typically in a package; installing the package; and attaching the package to the session.

The last step is the one you will do most often, so let’s begin by assuming that you know which package you need and that the required package has been installed with your local copy of R. See Section 2.5, page 26, for finding and installing the relevant package.

You can tell whether the package is attached by looking for it in the printed result of search(); alternatively, you can look for a particular object with the function find(), which returns the names of all the attached packages that contain the object. Suppose we want to call the function dotplot(), for example.

    > find("dotplot")
    character(0)

No attached package has an object of this name. If we happen to know that the function is in the package named lattice, we can make that package available for the current session. A call to the function library() requests this:

    library(lattice)

The function is library() rather than package() only because the original S software called them libraries.
Notice also that the package name was given without quotes. The library() function, and a similar function require(), do some nonstandard evaluation that takes unquoted names. That’s another historical quirk that saves users from typing a couple of quote characters. If a package of the name "lattice" has been installed for this version of R, the call will attach the package to the session, making its functions and other objects available:
    > library(lattice)
    > find("dotplot")
    [1] "package:lattice"

By “available”, we mean that the evaluator will find an object belonging to the package when an expression uses the corresponding name. If the user types dotplot(Declination) now, the evaluator will normally find the appropriate function. To see why the quibbling “normally” was added, we need to say more precisely what happens to find a function object.

The evaluator looks first in the global environment for a function of this name, then in each of the attached packages, in the order shown by search(). The evaluator will generally stop searching when it finds an object of the desired name, dotplot, Declination, or whatever. If two attached packages have functions of the same name, one of them will “mask” the object in the other (the evaluator will warn of such conflicts, usually, when a package is attached with conflicting names). In this case, the result returned by find() would show two or more packages.

For example, the function gam() exists in two packages, gam and mgcv. If both were attached:

    > find("gam")
    [1] "package:gam"  "package:mgcv"

A simple call to gam() will get the version in package gam; the version in package mgcv is now masked.

R has some mechanisms designed to get around such conflicts, at least as far as possible. The language has an operator, `::`, to specify that an object should come from a particular package. So mgcv::gam and gam::gam refer unambiguously to the versions in the two packages. The masked version of gam() could be called by:

    > fitK <- mgcv::gam(Kyphosis ~ s(Age, 4) + etc.

Clearly one doesn’t want to type such expressions very often, and they only help if one is aware of the ambiguity. For the details and for other approaches, particularly when you’re programming your own packages, see Section 5.3, page 121.

Finding and installing packages

Finding the right software is usually the hardest part.
There are thousands of packages and smaller collections of R software in the world. Section 2.7, page 31, discusses ways to search for information; as a start, CRAN, the
central repository for R software, has a large collection of packages itself, plus further links to other sources for R software. Extended browsing is recommended, to develop a general feel for what’s available. CRAN supports searching with the Google search engine, as do some of the other major collections. Use the search engine on the Web site to look for relevant terms. This may take some iteration, particularly if you don’t have a good guess for the actual name of the function. Browse through the search output, looking for a relevant entry, and figure out the name of the package that contains the relevant function or other software.

Finding something which is not in these collections may take more ingenuity. General Web search techniques often help: combine the term "R" with whatever words describe your needs in a search query. The e-mail lists associated with R will usually show up in such a search, but you can also browse or search explicitly in the archives of the lists. Start from the R home page, r-project.org, and follow the link for "Mailing Lists".

On page 15, we showed a computation using the function gam(), which fits a generalized additive model to data. This function is not part of the basic R software. Before being able to do this computation, we need to find and install some software. The search engine at the CRAN site will help out, if given either the function name "gam" or the term "generalized additive models". The search engine on the site tends to give either many hits or no relevant hits; in this case, it turns out there are many hits and in fact two packages with a gam() function. As an example, suppose we decide to install the gam package.

There are two choices at this point, in order to get and install the package(s) in question: a binary or a source copy of the package. Usually, installing from binary is the easy approach, assuming a binary version is available from the repository.
Binary versions are currently available from CRAN only for Windows and Mac OS X platforms, and may or may not be available from other sources. Otherwise, or if you prefer to install from source, the procedure is to download a copy of the source archive for the package and apply the "INSTALL" command. From an R session, the function install.packages() can do part or all of the process, again depending on the package, the repository, and your particular platform. The R GUI may also have a menu-driven equivalent for these procedures: Look for an item in the tool bar about installing packages. First, here is the function install.packages(), as applied on a Mac OS X platform. To obtain the gam package, for example:
    install.packages("gam")

The function will then invoke software to access a CRAN site, download the packages requested, and attempt to install them on the same R system you are currently using. The actual download is an archive file whose name concatenates the name of the package and its current version; in our example, "gam_0.98.tgz".

Installing from inside a session has the advantage of implicitly specifying some of the information that you might otherwise need to provide, such as the version of R and the platform. Optional arguments control where to put the installed packages, whether to use source or binary and other details.

As another alternative, you can obtain the download file from a Web browser, and run the installation process from the command shell. If you aren’t already at the CRAN Web site, select that item in the navigation frame, choose a mirror site near you, and go there.

Select "Packages" from the CRAN Web page, and scroll or search in the list of packages to reach a package you want (it’s a very long list, so searching for the exact name of the package may be required). Selecting the relevant package takes you to a page with a brief description of the package. For the package gam at the time this is written:

[Screenshot: the CRAN Web page describing the gam package.]

At this stage, you can access the documentation or download one of the proffered versions of the package. Or, after studying the information, you could revert to the previous approach and use install.packages(). If you do work from one of the source or binary archives, you need to apply the shell-style command to install the package. Having downloaded the source archive for package gam, the command would be:
    R CMD INSTALL gam_0.98.tar.gz

The INSTALL utility is used to install packages that we write ourselves as well, so detailed discussion appears in Chapter 4.

The package for this book

In order to follow the examples and suggested computations in the book, you should install the SoDA package. It is available from CRAN by any of the mechanisms shown above. In addition to the many references to this package in the book itself, it will be a likely source for new ideas, enhancements, and corrections related to the book.

2.6 Getting R

R is an open-source system, in particular a system licensed under the GNU General Public License. That license requires that the source code for the system be freely available. The current source implementing R can be obtained over the Web. This open definition of the system is a key support when we are concerned with trustworthy software, as is the case with all similar open-source systems.

Relatively simple use of R, and first steps in programming with R, on the other hand, don't require all the resources that would be needed to create your local version of the system starting from the source. You may already have a version of R on your computer or network. If not, or if you want a more recent version, binary copies of R can be obtained for the commonly used platforms, from the same repository. It's easier to start with binary, although as your own programming becomes more advanced you may need more of the source-related resources anyway.

The starting point for obtaining the software is the central R Web site, r-project.org. You can go there to get the essential information about R. Treat that as the up-to-date authority, not only for the software itself but also for detailed information about R (more on that on page 31). The main Web site points you to a variety of pages and other sites for various purposes. To obtain R, one goes to the CRAN repository, and from there to either "R Binaries" or "R Sources".
Downloading software may involve large transfers over the Web, so you are encouraged to spread the load. In particular, you should select from a list of mirror sites, preferably picking one geographically near your own location. When we talk about the
CRAN site from now on, we mean whichever one of the mirror sites you have chosen.

R is actively maintained for three platforms: Windows, Mac OS X, and Linux. For these platforms, current versions of the system can be obtained from CRAN in a form that can be directly installed, usually by a standard installation process for that platform. For Windows, one obtains an executable setup program (a ".exe" file); for Mac OS X, a disk image (a ".dmg" file) containing the installer for the application. The Linux situation is a little less straightforward, because the different flavors of Linux differ in details when installing R. The Linux branch of "R Binaries" branches again according to the flavors of Linux supported, and sometimes again within these branches according to the version of this flavor. The strategy is to keep drilling down through the directories, selecting at each stage the directory that corresponds to your setup, until you finally arrive at a directory that contains appropriate files (usually ".rpm" files) for the supported versions of R.

Note that for at least one flavor of Linux (Debian), R has been made a part of the platform. You can obtain R directly from the Debian Web site. Look for Debian packages named "r-base", and other names starting with "r-". If you're adept at loading packages into Debian, working from this direction may be the simplest approach. However, if the version of Debian is older than the latest stable version of R, you may miss out on some later improvements and bug fixes unless you get R from CRAN.

For any platform, you will eventually download a file (".exe", ".dmg", ".rpm", or other), and then install that file according to the suitable ritual for this platform. Installation may require you to have some administration privileges on the machine, as would be true for most software installations.
(If installing software at all is a new experience for you, it may be time to seek out a more experienced friend.) Depending on the platform, you may have a choice of versions of R, but it's unlikely you want anything other than the most recent stable version, the one with the highest version number. The platform's operating system will also have versions, and you generally need to download a file asserted to work with the version of the operating system you are running. (There may not be any such file if you have an old version of the operating system, or else you may have to settle for a comparably ancient version of R.) And just to add further choices, on some platforms you need to choose from different hardware (for example, 32-bit versus 64-bit architecture). If you don't know which choice applies, that may be another indication that you should seek expert advice.

Once the binary distribution has been downloaded and installed, you should have direct access to R in the appropriate mechanism for your platform.

Installing from source

Should you? For most users of R, not if they can avoid it, because they will likely learn more about programming than they need to or want to. For readers of this book, on the other hand, many of these details will be relevant when you start to seriously create or modify software. Getting the source, even if you choose not to install it, may help you to study and understand key computations.

The instructions for getting and for installing R from source are contained in the online manual, R Installation and Administration, available from the Documentation link at the r-project.org Web site.

2.7 Online Information About R

Information for users is in various ways both a strength and a problem with open-source, cooperative enterprises like R. At the bottom, there is always the source, the software itself. By definition, no software that is not open to study of all the source code can be as available for deep study. In this sense, only open-source software can hope to fully satisfy the Prime Directive by offering unlimited examination of what is actually being computed.

But on a more mundane level, some open-source systems have a reputation for favoring technical discussions aimed at the insider over user-oriented documentation. Fortunately, as the R community has grown, an increasing effort has gone into producing and organizing information. Users who have puzzled out answers to practical questions have increasingly fed back the results into publicly available information sources.

Most of the important information sources can be tracked down starting at the main R Web page, r-project.org. Go there for the latest pointers. Here is a list of some of the key resources, followed by some comments about them.

Manuals: The R distribution comes with a set of manuals, also available at the Web site.
There are currently six manuals: An Introduction to R, Writing R Extensions, R Data Import/Export, The R Language Definition, R Installation and Administration, and R Internals. Each is available in several formats, notably as Web-browsable HTML documents.

Help files: R itself comes with files that document all the functions and other objects intended for public use, as well as documentation files on other topics (for example, ?Startup, discussing how an R session starts). All contributed packages should likewise come with files documenting their publicly usable functions. The quality control tools in R largely enforce this for packages on CRAN. Help files form the database used to respond to the help requests from an R session, either in response to the Help menu item or through the `?` operator or help() function typed by the user. The direct requests in these forms only access terms explicitly labeling the help files; typically, the names of the functions and a few other general terms for documentation (these are called aliases in discussions of R documentation). For example, to get help on a function in this way, you must know the name of the function exactly. See the next item for alternatives.

Searching: R has a search mechanism for its help files that generalizes the terms available beyond the aliases somewhat and introduces some additional searching flexibility. See ?help.search for details. The r-project.org site has a pointer to a general search of the files on the central site, currently using the Google search engine. This produces much more general searches. Documentation files are typically displayed in their raw, LaTeX-like form, but once you learn a bit about this, you can usually figure out which topic in which package you need to look at. And, beyond the official site itself, you can always apply your favorite Web search to files generally. Using "R" as a term in the search pattern will usually generate appropriate entries, but it may be difficult to avoid plenty of inappropriate ones as well.

The Wiki: Another potentially useful source of information about R is the site wiki.r-project.org, where users can contribute documentation.
As with other open Wiki sites, this comes with no guarantee of accuracy and is only as good as the contributions the community provides. But it has the key advantage of openness, meaning that in some "statistical" sense it reflects what R users understand, or at least that subset of the users sufficiently vocal and opinionated to submit to the Wiki. The strength of this information source is that it may include material that users find relevant but that developers ignore for whatever reason (too trivial, something users would never do, etc.). Some Wiki sites have sufficient support from their user community that they can function as the main information source on their topic. As of this writing, the R Wiki has not reached that stage, so it should be used as a supplement to other information sources, and not the primary source, but it's a valuable resource nevertheless.

The mailing lists: There are a number of e-mail lists associated officially with the R project (officially in the sense of having a pointer from the R Web page, r-project.org, and being monitored by members of R core). The two most frequently relevant lists for programming with R are r-help, which deals with general user questions, and r-devel, which deals generally with more "advanced" questions, including future directions for R and programming issues. As well as a way to ask specific questions, the mailing lists are valuable archives for past discussions. See the various search mechanisms pointed to from the mailing list Web page, itself accessible as the Mailing lists pointer on the r-project.org site. As usual with technical mailing lists, you may need patience to wade through some long tirades and you should also be careful not to believe all the assertions made by contributors, but often the lists will provide a variety of views and possible approaches.

Journals: The electronic journal R News is the newsletter of the R Foundation, and a good source for specific tutorial help on topics related to R, among other R-related information. See the Newsletter pointer on the cran.r-project.org Web site. The Journal of Statistical Software is also an electronic journal; its coverage is more general as its name suggests, but many of the articles are relevant to programming with R.
See the Web site jstatsoft.org. A number of print journals also have occasional articles of direct or indirect relevance, for example, Journal of Computational and Graphical Statistics and Computational Statistics and Data Analysis.
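The help and search facilities in the list above can be tried from any R session. The following is a minimal sketch, assuming a standard R installation with the stats and utils packages available; the search term "quantile" and the variable names are arbitrary illustrations, not taken from the text.

```r
# Exact lookup: succeeds only if you know the alias (here a topic name).
# This is the same request as typing ?Startup at the console.
startup_help <- help("Startup")

# Broader search over the installed help files, restricted here to one
# package to keep the search small.
hits <- help.search("quantile", package = "stats")

# Search the names of visible objects by regular expression; useful when
# you remember only part of a function's name.
readers <- apropos("^read\\.")
```

Printing `startup_help` or `hits` at the console displays the corresponding help page or list of matches. The exact-alias requirement of `help()` is why the search tools are worth knowing when the function name is uncertain.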
2.8 What's Hard About Using R?

This chapter has outlined the computations involved in using R. An R session consists of expressions provided by the user, typically typed into an R console window. The system evaluates these expressions, usually either showing the user results (printed or graphic output) or assigning the result as an object. Most expressions take the form of calls to functions, of which there are many thousands available, most of them in R packages available on the Web.

This style of computing combines features found in various other languages and systems, including command shells and programming languages. The combination of a functional style with user-level interaction—expecting the user to supply functional expressions interactively—is less common. Beginning users react in many ways, influenced by their previous experience, their expectations, and the tasks they need to carry out. Most readers of this book have selected themselves for more than a first encounter with the software, and so will mostly not have had an extremely negative reaction. Examining some of the complaints may be useful, however, to understand how the software we create might respond (and the extent to which we can respond). Our mission of supporting effective exploration of data obliges us to try.

The computational style of an R session is extremely general, and other aspects of the system reinforce that generality, as illustrated by many of the topics in this book (the general treatment of objects and the facilities for interacting with other systems, for example). In response to this generality, thousands of functions have been written for many techniques. This diversity has been cited as a strength of the system, as indeed it is. But for some users exactly this computational style and diversity present barriers to using the system.
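The session model summarized at the start of this section (expressions that are evaluated, then either shown or assigned, and that are mostly function calls) can be illustrated with a short sketch. The data values and object names below are hypothetical, chosen only for illustration.

```r
# An expression whose value is simply shown: at the console, the result
# is printed automatically.
sqrt(2) * 10

# The same kind of expression, with the result assigned as an object
# instead of being printed.
x <- c(2.5, 3.1, 4.7, 5.0)
xbar <- mean(x)

# Most expressions are function calls, and that includes operators:
# x[1] + x[2] is a call to the function named "+".
total <- sum(x)
same_total <- `+`(x[1], x[2]) + x[3] + x[4]
```

Nothing here is selected from a menu; each step is composed by the user as an expression, which is the contrast with selection-and-response interfaces drawn in this section.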
Requiring the user to compose expressions is very different from the mode of interaction users have with typical applications in current computing. Applications such as searching the Web, viewing documents, or playing audio and video files all present interfaces emphasizing selection-and-response rather than composing by the user. The user selects each step in the computation, usually from a menu, and then responds to the options presented by the software as a result. When the user does have to compose (that is, to type) it is typically to fill in specific information such as a Web site, file or optional feature desired. The eventual action taken, which might be operationally equivalent to evaluating an expression in R, is effectively defined by the user's interactive path through menus, forms and other specialized tools in the interface. Based on the principles espoused
in this book, particularly the need for trustworthy software, we might object to a selection-and-response approach to serious analysis, because the ability to justify or reproduce the analysis is much reduced. However, most non-technical computing is done by selection and response.

Even for more technical applications, such as producing documents or using a database system, the user's input tends to be relatively free form. Modern document-generating systems typically format text according to selected styles chosen by the user, rather than requiring the user to express controls explicitly. These differences are accentuated when the expressions required of the R user take the form of a functional, algebraic language rather than free-form input.

This mismatch between requirements for using R and the user's experience with other systems contributes to some common complaints. How does one start, with only a general feeling of the statistical goals or the "results" wanted? The system itself seems quite unhelpful at this stage. Failures are likely, and the response to them also seems unhelpful (being told of a syntax error or some detailed error in a specific function doesn't suggest what to do next). Worse yet, computations that don't fail may not produce any directly useful results, and how can one decide whether this was the "right" computation?

Such disjunctions between user expectations and the way R works become more likely as the use of R spreads. From the most general view, there is no "solution". Computing is being viewed differently by two groups of people, prospective users on one hand, and the people who created the S language, R and the statistical software extending R on the other hand.

The S language was designed by research statisticians, initially to be used primarily by themselves and their colleagues for statistical research and data analysis. (See the Appendix, page 475.)
A language suited for this group to communicate their ideas (that is, to “program”) is certain to be pitched at a level of abstraction and generality that omits much detail necessary for users with less mathematical backgrounds. The increased use of R and the growth in software written using it bring it to the notice of such potential users far more than was the case in the early history of S. In addition to questions of expressing the analysis, simply choosing an analysis is often part of the difficulty. Statistical data analysis is far from a routine exercise, and software still does not encapsulate all the expertise needed to choose an appropriate analysis. Creating such expert software has been a recurring goal, pursued most actively perhaps in the 1980s, but it must be said that the goal remains far off. So to a considerable extent the response to such user difficulties must
the Mountains, and hover over the Coast Region generally, literally deluging Western Oregon and Washington, at certain seasons of the year, with rains and fogs. The year before, at Fort Vancouver, they had had one hundred and twenty consecutive days of rain, in one year, without counting the intervening showers; and they said, it wasn't "much of a year for rain" either! Another year, they didn't see the sun there for eighty days together, without reckoning the occasional fogs. No wonder the Oregonians are called "Web-Feet." They do say, the children there are all born web-footed, like ducks and geese, so as to paddle about, and thus get along well in that amphibious region. Perhaps this is rather strong, even for Darwinism; but I can safely vouch for Oregon's all-sufficing rains and fogs, whatever their effects on the species.

Our fellow-passengers down the Columbia were chiefly returning miners, going below to winter and recruit; but rough as they were and merry at times, they were, as a rule, self-respecting and orderly. Our Fenian friends, who had raced with us down Powder River and Grande Ronde Valleys and across the Blue Mountains, turned up here again—"Shanks," "Fatty," and all—and subsequently embarked on the same steamer with us at Portland for San Francisco. A few Chinamen also were on board; but they behaved civilly, and were treated kindly.
CHAPTER XVI.

FORT VANCOUVER TO SAN FRANCISCO.

Fort Vancouver is an old Government Post, established in 1849, when Washington Territory was still a part of Oregon, and all the great region there was yet a wilderness. The village of Vancouver, a parasite on its outskirts, had grown up gradually; but had long since been distanced by Portland, across the Columbia in Oregon. A fine plateau, with a bold shore, made the Post everything desirable; but back of the post-grounds, the unbroken forest was still everywhere around it. It was now Headquarters of the Department of the Columbia, and the base for all military operations in that section. Here troops and supplies were gathered, for all the posts up the Columbia and its tributaries; though Portland, rather, seemed to be the natural brain of all that region. So, too, it controlled and supplied the forts at the mouth of the Columbia and the posts on Puget Sound; and, indeed, was of prime importance to the Government in many ways.

Gen. Steele, in command of the Department, was an old Regular officer, who during the war commanded first in Missouri, afterwards around Vicksburg, then in Arkansas, and always with ability. He is now no more (dying in 1868), but some things he related in speaking of the war seem worth preserving. He said, Gen. Sherman was undoubtedly a great soldier; but he owed much to the rough schooling of his first campaigns, and improved from year to year. He said, Sherman in '62 was "scary" about Price's movements in Missouri and cited as an instance, that he once ordered the depot at Rolla broken up and the troops withdrawn, for fear Price would "gobble up" everybody and everything. He (Steele) then a Colonel, but in command at Rolla, appealed to Gen. Halleck, and was allowed
to remain; and subsequently Sherman, with his customary frankness, admitted his mistake. So, he said, Sherman in '63, when campaigning around Vicksburg, had little confidence in Grant's famous movement to the rear, via Grand Gulf and the Big Black, though the results were so magnificent. He said Sherman was somewhere up the Yazoo, with Porter and the gun-boats, and from there wrote him (Steele), in command of the Corps during Sherman's absence, that the proposed movement was perilous, and would probably fail, ruining them all; but, "nevertheless," he added, right loyally, "We must support Grant cordially and thoroughly, dear Steele, whatever happens." Subsequently, after they had landed at Grand Gulf—repulsed Pemberton and hurled him back on Vicksburg—cleaned Joe Johnston out of Jackson and chased him out of the country—and were crossing the Big Black in triumph, the movement now apparently a sure thing, Sherman and he were lying down to rest a little, at a house near the bridge, while the troops were filing over. Presently, an orderly announced Gen. Grant and staff riding by, when Sherman instantly sprang up, and rushing out of the house bareheaded seized Grant by the hand, and shaking it very warmly exclaimed, "I congratulate you, General, with all my heart, on the success of your movement. And, by heaven, sir, the movement is yours, too; for nobody else would endorse it!" He added, he never heard of Sherman's "protesting" against the movement, as reported afterwards in the newspapers, and didn't believe he ever had—"was too soldierly, by far, for that"—but he (Steele), knew all the facts at the time, and the above was about the Truth of History.

Poor Steele! He was a true Army bachelor, fond of horses and dogs, and a connoisseur in both. He was besides a man of fine intelligence, and after dinner told a camp-story capitally. I remember several he told, with great gusto, while we shared his cosy quarters at Vancouver; but have not space for them here.
Afterwards, we met him again in San Francisco, on leave of absence, the beloved of all army circles, and the favorite of society. May he rest in peace!
But to return to Fort Vancouver. We spent several days there very pleasantly, getting the bearings of things from there as a centre, and were loath to leave its hospitable quarters. It was now the first week in December; but the grapes were still hanging on the vines at Maj. N.'s quarters, and all about the post the grass was springing fresh and green, as in April in the East. We had fog or rain, or both together, about every day; no heavy down-pours, however, but gentle drizzles, as if the Oregon-Washington sky was only a great sieve, with perpetual water on 'tother side. They said, this was their usual weather from fall to spring, and then they had a delightful summer; though sometimes occasional snow-storms, sweeping down from the Mountains in January or February, gave them a taste of winter. Such snows, however, were light, and never lasted long. It seems, the Gulf Stream of the Pacific, sweeping up from the tropics, bears the isothermal lines so far north on this coast, that here at Fort Vancouver in the latitude of Montreal, they have the climate of the Carolinas in winter, with little of their excessive heats in summer. Walla-Walla, in latitude 46°, boasts the range of Washington, D. C. in 39°; and San Francisco, on the line of New York, claims the climate of Savannah.

One evening while there, after a day of weary rain, the clouds suddenly broke away, and just at sunset we caught another noble view of Mount Hood again. A thin, veil-like cloud enrobed his feet, extending much of the way up; but above, his heaven-kissing head rose right regally, and his snowy crown became transfigured through all the changes—from pink to purple, and into night—as the day faded out. He looked still loftier and grander, than we had yet seen him, as if piercing the very sky, and was really superb. Aye, superbus. Haughty, imperial, supremely proud—which is about what the Romans meant, if I mistake not.
A ride of six miles down the Columbia, on the little steamer Fanny Troup, and then twelve miles up the Willamette, landed us at Portland, Oregon, the metropolis of all that region. The distance from Fort Vancouver, as the crow flies, is only about six miles, but by water it is fully eighteen, as above stated. Here we found a thrifty busy town, of eight or ten thousand people, with all the eastern
evidences of substantial wealth and prosperity. Much of the town was well built, and the rest was rapidly changing for the better. Long rows of noble warehouses lined the wharves, many of the stores were large and even elegant, and off in the suburbs handsome residences were already springing up, notwithstanding the abounding stumps nearly everywhere. The town seemed unfortunately located, the river-plateau was so narrow there; but just across the Willamette was East Portland, a growing suburb, with room plenty and to spare. A ferry-boat, plying constantly, connected the two places, and made them substantially one. Portland already boasted water, gas, and Nicholson pavements; and had more of a solid air and tone, than any city we had seen since leaving the Missouri. The rich black soil, on which she stands, makes her streets in the rainy season, as then, sloughs or quagmires, unless macadamised or Nicholsoned; but she was at work on these, and they promised soon to be in good condition. Several daily papers, two weekly religious ones, and a fine Mercantile Library, all spoke well for her intelligence and culture, while her Public School buildings and her Court-House would have been creditable anywhere. The New England element was noticeable in many of her citizens, and Sunday came here once a week, as regularly as in Boston or Bangor. The Methodists and Presbyterians both worshipped in goodly edifices, and the attendance at each the Sunday we were there was large and respectable.

Being the first city of importance north of San Francisco, and the brain of our northwest coast, Portland was full of energy and vigor, and believed thoroughly in her future. The great Oregon Steam Navigation Company had their headquarters here, and poured into her lap all the rich trade of the Columbia and its far-reaching tributaries, that tap Idaho, Montana, and even British America itself.
So, also, the coastwise steamers, from San Francisco up, all made Portland their terminus, and added largely to her commerce. Back of her lay the valley of the Willamette, and the rich heart of Oregon; and her wharves, indeed, were the gateways to thousands of miles of territory and trade, in all directions. Nearer to the Sandwich
Islands and China, by several hundred miles, than California, she had already opened a brisk trade with both, and boasted that she could sell sugars, teas, silks, rice, etc., cheaper than San Francisco. Victoria, the British city up on Puget Sound, had once been a dangerous rival; but Portland had managed to beat her out of sight, and claimed now she would keep her beaten. It was Yankee Doodle against John Bull; and, of course, in such a contest, Victoria went to the wall!

It seemed singular, however, that the chief city of the northwest coast should be located there—a hundred miles from the sea, and even then twelve miles up the little Willamette. Your first thought is, Portland has no right to be at all, where she now is. But, it appears, she originally got a start, from absorbing and controlling the large trade of the Willamette, and when the Columbia was opened up to navigation rapidly grew into importance, by her heavy dealings in flour, wool, cattle, lumber, etc. The discovery of mines in Idaho and Montana greatly invigorated her, and now she had got so much ahead, and so much capital and brains were concentrated here, that it seemed hard for any new place to compete with her successfully. [14] Moreover, we were told, there are no good locations for a town along the Columbia from the ocean up to the Willamette, nor on the Willamette up to Portland. Along the Columbia, from the ocean up, wooded hills and bluffs come quite down to the water, and the whole back country, as a rule, is still a wilderness of pines and firs; while the Willamette up to Portland, they said, was apt to overflow its banks in high water. Hence, Portland seemed secure in her supremacy, at least for years to come, though no doubt at no distant day a great city will rise on Puget Sound, that will dominate all that coast, up to Sitka and down to San Francisco.
From want of time, we failed to reach the Posts on Puget's Sound; but all accounts agreed, that—land-locked by Vancouver's and San Juan islands—we there have one of the largest and most magnificent harbors in the world. With the Northern Pacific Railroad linking it to Duluth and the great lakes, commerce will yet seek its great advantages; and the Boston, if not the New York, of the Pacific will yet flourish where now are
only the wilds of Washington. The Sound already abounded in saw-mills, and the ship-timber and lumber of Washington we subsequently found famed in San Francisco, and throughout California. She was then putting lumber down in San Francisco, cheaper than the Californians could bring it from their own foot-hills, and her magnificent forests of fir and pine promised yet to be a rare blessing to all the Pacific Coast.

The Portlanders, of course, were energetic, go-ahead men, from all parts of the North, with a good sprinkling from the South. Outside of Portland, however, the Oregonians appeared to be largely from Missouri, and to have retained many of their old Missouri and so-called "conservative" ideas still. All through our Territories, indeed, Missouri seemed to have been fruitful of emigrants. Kentucky, Indiana, Illinois, were everywhere well represented; but Missouri led, especially in Idaho and Oregon. This fact struck us repeatedly, and was well accounted for by friend Meacham's remark (top of the Blue Mountains), "the left wing of Price's army is still encamped in this region." The tone of society, in too many places, seemed to be of the Nasby order, if not worse. No doubt hundreds of deserters and draft-sneaks, from both armies, had made their way into those distant regions; and then, besides, the influence of our old officials, both civil and military, had long been pro-slavery, and this still lingered among communities, whom the war had not touched, and among whom school-houses and churches were still far too few. Of course, we met some right noble and devoted Union men everywhere, especially in Colorado; but elsewhere, and as a rule, they did not strike us as numerous, nor as very potential. In saying this, I hope I am not doing the Territories injustice; but this is how their average public opinion impressed a passing traveller, and other tourists we met en route remarked the same thing.
Here at Portland, John Chinaman turned up again, and seemed to be behaving thoroughly well. At Boisè, we found these heathen paying their stage-fare, and riding down to the Columbia, while many Caucasians were walking, and here at Portland they appeared alike
  • 63.
    thrifty and prosperous. Their advent here had been comparatively recent, and there was still much prejudice against them, especially among the lower classes; but they were steadily winning their way to public favor by their sobriety, their intelligence and thrift, and good conduct generally. Washing and ironing, and household service generally, seemed to be their chief occupations, and nearly everybody gave them credit for industry and integrity. Mr. Arrigoni, the proprietor of our hotel (and he was one of the rare men, who know how to "keep a hotel"), spoke highly of their capacity and honesty, and said he wanted no better servants anywhere. One of them, not over twenty-one, had a contract to do the washing and ironing for the Arrigoni House, at a hundred dollars per month, and was executing it with marked fidelity. He certainly did his work well, judging by what we saw of the hotel linen. In walking about the town, we occasionally came upon their signs, over the door of some humble dwelling, as for example, "Ling & Ching, Laundry;" "Hop Kee, washing and ironing;" "Ching Wing, shoemaker;" "Chow Pooch, doctor;" etc. As far as we could see, they appeared to be intent only on minding their own business, and as a class were doing more hearty honest work by far, than most of their bigoted defamers. We could not refrain from wishing them well, they were so sober, industrious, and orderly; for, after all, are not these the first qualities of good citizenship the world over? We left Portland, Dec. 11th, on the good steamer Oriflamme, for San Francisco. For a wonder, it was a calm clear day, with the bracing air of our Octobers in the east, and as we glided out of the Willamette into the noble Columbia, we had a last superb view of Mts. Jefferson, Hood, Adams and St. Helens all at the same time. Sometimes Rainier also is visible from here, but ordinarily only Hood and St. Helens appear.
We thought this the finest view of these splendid snow-peaks that we had had yet, and it seemed strange no artist had yet attempted to group them all in one grand landscape, from the mouth of the Willamette as a stand-point. Or, if he could not get them all in, he might at least combine Hood and St. Helens. The breadth and scope, the grandeur and sublimity of such a
    picture, with the Columbia in the foreground, and the great range of the Cascade Mountains in the perspective, would make a painting, that would live forever. We watched them all, with the naked eye and through the glass, until we were far down the Columbia, and to the last, Hood was the same "Dread ambassador from earth to heaven!" How he soared and towered, beyond and above everything, as if communing with the Almighty! Lofty as were the rest, they seemed small by his majestic side. St. Helens, however, though not so imperial, was perhaps more simply and chastely beautiful. An unbroken forest of fir, deep green verging into black, girt her feet, while above she "swelled vast to heaven," a perfect snow sphere rather than cone, whose celestial whiteness dazzled the eye. She looked like a virgin's or a nun's white breast, unsullied by sin, and standing sharply out against the glorious azure of that December sky, seemed indeed a perfect emblem of purity and beauty. Farther down the river, we detected a light smoke or vapor, drifting dreamily away from her summit, and Capt. Conner of the Oriflamme said this was not unusual, though St. Helens was not rated as a volcano. He thought it steam or vapor, caused by internal heat melting the snow, rather than smoke; but the effect was about the same. We reached the mouth of the Columbia, the same evening; but Capt. Conner thought it risky to venture over the bar, until morning. The next morning early, we lifted anchor, and steamed down to Astoria—a higgledy-piggledy village, of only four or five hundred inhabitants still, though begun long before prosperous Portland. Her anchorage seemed fair; but ashore the land abounded in a congeries of wooded bluffs and ridges, that evidently made a town or farms there difficult, if not impossible.
A short street or two of straggling houses, propped along the hillsides, was about all there was of Astoria; and yet she was a port of entry, with a custom-house and full corps of officials, while Portland with all her enterprise and commerce was not, and could not get to be. What her custom-officials would have to do, were it not for the business of Portland, it seemed pretty hard to say. A venture of John Jacob Astor's a half century before, as a trading post with the Indians, she had never become of much importance, because lacking a good back country; and it appeared, had no future now, because wanting a good town-site. This was unfortunate perhaps for Oregon, and the whole Columbia region; but over it Portland rejoiced, and continued to wax fat. Of course, it had begun to rain again, and by the time we had passed the ordeal of the custom-house at Astoria, the weather had thickened up into a drizzly fog, that caused Capt. C. much anxiety—especially, when he observed the barometer steadily going down. The bar of the Columbia, always bad, is peculiarly rough in winter, and only the voyage before the Oriflamme had to lay to here, nearly a week, unable to venture out. Her provisions became exhausted, and she had to "clean out" Astoria, and all the farm-houses up and down the river for miles, before she finally got away. Our company of four hundred passengers had no fancy for an experience of this sort, and "dirty" as the weather promised to be, Capt. C. at last decided to try the bar, even if we had to return, hoping to find better skies when fairly afloat in blue water. Our engines once in motion, we soon ran down past Forts Stevens and Cape Disappointment, at the mouth of the Columbia, on the Oregon and Washington sides respectively, with the black throats of their heavy cannon gaping threateningly at us. Both forts seem necessary there, as they completely command the mouth of the Columbia, and so hold the key to all that region. But life in them must be an almost uninterrupted series of rains and fogs, with the surf forever thundering at your feet, and one can but pity the officers and men really exiled there.
Gathered about the flag-staff or lounging along the ramparts, they gazed wistfully at us as we steamed past; and already in the distance we could see the white-caps, racing in over the dreaded bar. Heading for the north channel, we put all steam on, and once out of the jaws of the Columbia were soon fairly a-dancing on the bar. The wind and tide both strong, were both dead ahead,
    which made our exit about as bad, as could well be. The sea went hissing by, or broke into huge white-caps all about us. The engines creaked and groaned, and at times seemed to stand still, as if exhausted with the struggle. The good ship Oriflamme pitched and tossed, battling with the waves like a practiced pugilist, yet ever advanced, though sometimes apparently drifting shoreward. At one period, indeed, Capt. C. feared we would have to about ship and run for the Columbia—we progressed so slowly; but something of a lull in the wind just then helped us on, and at last we saw by the receding head-lands, that we were fairly over the bar and out into the broad Pacific. We congratulated ourselves in thus getting speedily to sea; but our tussle on the bar had been too much for the majority of our passengers, and soon our bulwarks were thronged with scores "casting up their accounts" with Father Neptune. Sea-sickness, that deathliest of all human ailments, had set in, and our "rough and tumble" with the waves had been so sharp, that many began to suffer from it, who declared they had never been attacked before. A notable New Yorker, a brawny son of Æsculapius at that, bravely protested, that sea-sickness was "Only a matter of the imagination. Anyone can overcome it. It only requires a vigorous exercise of the will." But, unfortunately for his theory, soon afterwards he himself became the sickest person on board, not excepting the ladies. My own experience ended with a qualm or two; but the majority of our passengers suffered very much, for several days. Our steamer really had accommodations for only about one hundred passengers; but some four hundred had crowded aboard of her at Portland, mostly miners eager to get "below" to winter, and those who had no state-rooms now "roughed it" pitiably. They lay around loose—on deck, in the cabin, in the gang-way, everywhere—the most disconsolate-looking fellows I ever saw, outside of a yellow-fever hospital.
The few ladies aboard were even sicker; but these all had state-rooms, and kept them mostly for the voyage. The weather continued raw and the sea rough, most of the way down the coast, and our voyage of eight hundred miles from Portland to San Francisco, as a whole, could hardly be called
    agreeable. We had fog, and rain, and head-winds all the way down, and with the exception of a day or two, it was really cold and uncomfortable. The steam-heating apparatus of the vessel was out of order, and the only place for us all to warm was at a register in the Social Hall—a narrow little cabin on deck, that would not accommodate over thirty persons at the farthest. There was a similar place for the ladies, but they usually filled this themselves. Groups huddled here all day, smoking and talking, and when the weather permitted also swarmed about the smoke-stacks. And then, besides, as already stated, our ship was badly overcrowded. Of our 400 passengers, less than a quarter had state-rooms, and the rest were left to shift for themselves. After the sea-sickness began to abate, we filled two or three tables every meal; and when bed-time came, mattrasses thronged the cabin from end to end. How it was down in the steerage, where the miners and Chinamen mostly congregated, one need not care to imagine. Fortunately great-coats and blankets abounded, or many would have suffered much. We found many choice spirits aboard, and in spite of wind and weather enjoyed ourselves, after all, very fairly. When it did not rain too hard, we walked the deck and talked for hours; and when everything else failed, we always found something of interest in the gulls that followed us by hundreds, and the great frigate-birds with their outstretched pinions, and the ever-rolling boundless sea. Our table-fare was always profuse and generally excellent, especially the Oregon apples and pears they gave us for dessert; and had it not been for our broken heating apparatus, no doubt we would have got along very satisfactorily after all, all things considered. We arrived off the Golden Gate, late at night, Dec. 14th, only four days out from Portland; but the sea was still so rough, that we feared to venture in.
Next morning, however, when the mist broke away a little, we up steam and headed again for San Francisco. We had a tough time getting in, nearly as bad as getting out of the Columbia. We had to combat a strong wind dead-ahead, and to wrestle with a heavy sea. But, nevertheless, our good ship held on her course bravely; and at last, weathering Point Reyes, and
    rounding Fort Point, we steamed up past frowning Alcatraz, and with booming cannon dropped anchor at the Company's wharf. The storm we had encountered was reported as one of the worst known on the coast for years, and we were glad once more to touch terra firma, and strike hands with a live civilization. In a half hour we were ashore and at the Occidental, a hostelry worthy of San Francisco or any other city. And so, we had reached California at last. All hail, the Golden Gate! And 'Frisco, plucky, vain young metropolis, hail! Bragging, boasting, giddy as you are, there is much excuse for you. Surely, with your marvellous growth, and far-reaching schemes, you have a right to call yourself the New York of the Pacific Coast, if that contents you.
    CHAPTER XVII. SAN FRANCISCO.

    Geography demonstrates the matchless position of San Francisco, as metropolis of the Pacific coast, and assures her supremacy perhaps forever. The Golden Gate, a strait six miles long by one wide, with an average depth of twenty-four fathoms—seven fathoms at the shallowest point—is her pathway to the Pacific. At her feet stretches her sheltered and peerless bay, fifty miles long by five wide, with Oakland as her Brooklyn just across it. Beyond, the Sacramento and the San Joaquin empty their floods, the drainage of the Sierra Nevadas, and afford channels for trade with much of the interior. Her system of bays—San Pablo, Suisun, and San Francisco proper—contain a superficial area of four hundred square miles, of which it is estimated, eight feet in depth pour in and out of the Golden Gate every twenty-four hours. On all that coast, for thousands of miles, she seems to be the only really great harbor; and then, besides, all enterprise and commerce have so centred here, that hereafter it will be difficult, if not impossible, to wrest supremacy from her. Until we reached Salt-Lake, New York everywhere ruled the country, and all business ideas turned that way; but from there on, the influence of Gotham ceased, and everything tended to "'Frisco," as many lovingly called her. This was her general name, indeed, for short, all over the Pacific coast; though the Nevadans spoke of her, as "the Bay" still. The city itself stands on a peninsula of shifting dunes or sand-hills, at the mouth of the harbor, much the same as if New York were built at Sandy Hook. It was a great mistake, that its founders did not locate it at Benicia, or Vallejo, or somewhere up that way, where it would have been out of the draft of the Golden Gate, had better wharfage, and been more easily defended. But, it seems, when the gold fever first broke out, in 1849, the early vessels all came
    consigned to Yerba Buena, as the little hamlet was then called; and as their charter-parties would not allow them to ascend the Bay farther, their cargoes were deposited on the nearest shore, and hence came San Francisco. It took a year or more then to hear from New York or London, and before further advices were received, so great was the rush of immigrants, the town was born and the city named. Benicia tried to change things afterwards; but 'Frisco had got the start, and kept it, in spite of her false location. Her military defences are Fort Point at the mouth of the Golden Gate, Fort San Josè farther up the harbor, and Alcatraz on an island square in the entrance, which with other works yet to be constructed would cross-fire and command all the approaches by water, thus rendering the city fairly impregnable. From the first, she seems to have had a fight with the sand-hills, and she was still pluckily maintaining it. She had cut many of them down, and hurled them into the sea, to give her a better frontage. Her "made" land already extended out several blocks, and the work was still going on. With a great penchant for right-angles, as if Philadelphia was her model city, she was pushing her streets straight out, in all directions, no matter what obstacles intervened. One would have thought, that with an eye to economy, as well as the picturesque, she would have flanked some of her sand-hills by leading her streets around them; but no! she marched straight at and over them, with marvellous audacity and courage, like the Old Guard at Waterloo, or the Boys in Blue at Chattanooga. Some were inaccessible to carriages; still she pushed straight on, and left the inhabitants to clamber up to their eyrie-like residences, as best they could. Many of these hills were still shifting sand, and in places lofty fences had been erected as a protection against sand-drifts; just as our railroads East sometimes build fences, as a protection against snow-drifts.
The sand seemed of the lightest and loosest character, and when the breeze rose filled the atmosphere at all exposed points. And yet, when properly irrigated, it really seemed to produce about everything abundantly. While inspecting one of the harbor forts, I saw a naked drift on one side of a sand-fence, and on the
    other a flower-garden of the most exquisite character, while just beyond was a vegetable and fruit-garden, that would have astonished people East. A little water had worked the miracle, and this a faithful wind-mill continued to pump up, from time to time as needed. Towards the south, the sand-hills seemed less of an obstruction, and thither the city was now drifting very rapidly. Real-estate there was constantly on the rise, and houses were springing up as if by magic in a night. The city-front, heretofore much confined, was now extending southward accordingly. It was about decided to build a sea-wall of solid granite, all along the front, two miles or more in length, at a cost of from two to three millions of dollars. This expenditure seemed large; but, it was maintained, was not too great for the vast and growing commerce of the city. But a few years before, it was a common thing for ships to go East empty or in ballast, for want of a return cargo; but in 1867 San Francisco shipped grain alone to the amount of thirteen millions of dollars, and of manufactures about as much more. Here are some other statistics that are worth one's considering. In 1849, then called Yerba Buena, she numbered perhaps 1,000 souls, all told; in 1869, nearly 200,000. In 1868, 59,000 passengers arrived by sea, and only 25,000 departed, leaving a net gain of 34,000. The vessels which entered the bay that year, numbered 3,300, and measured over 1,000,000 tons. She exported 4,000,000 sacks of wheat that year, and half a million barrels of flour. Her total exports of all kinds were estimated at not less than $70,000,000, and her imports about the same. Her sales of real-estate aggregated $27,000,000, and of mining and other stocks $115,000,000, on which she paid over $5,000,000 of dividends. The cash value of her real and personal property was estimated at $200,000,000.
She sent away six tons of gold, and forty tons of silver every month, and in all since 1849 had poured into the coffers of the world not less than $1,030,000,000. [15] Her net-work of far-reaching and gigantic enterprises already embraced the whole Pacific Coast, northward to Alaska and southward to Panama, while beyond she stretched out her invisible arms to Japan and China, and shook hands with the Orient.
    One cloudless morning, after days of dismal drizzle, an enthusiastic Forty-Niner took me up Telegraph Hill, and bade me "view the landscape o'er!" I remembered when a school-boy reading Dana's "Two Years before the Mast," in which he speaks so contemptuously of Yerba Buena, and its Mexican Rip Van Winkles. What a change here since then! Off to the west rolled the blue Pacific, sea and sky meeting everywhere. Then came Fort Point, with its formidable batteries, commanding the Golden Gate; and then the old Presidio, with the stars and stripes waving over it. Farther inland were the stunted live-oaks and gleaming marbles of Lone Mountain Cemetery, with the Broderick Monument rising over all. Then came the live, busy, bustling, pushing city, with its quarter of a million of inhabitants nearly, soon to be a million, its wharves thronged with the ships of all nations, but with harbor-room to spare sufficient to float the navies of the world. Beyond, lay Oakland, loveliest of suburbs, smiling in verdure and beauty, with Mount Diabolo towering in the distance—his snow-crowned summit flashing in the sunlight. The Sacramento and Stockton boats, from the heart of California were already in. Past the Golden Gate, and up the noble bay, with boom of welcoming cannon, came the Hong Kong steamer fresh from Japan. The Panama steamer, with her fires banked and flag flying, was just ready to cast off. While off to the south, a long train of cars, from down the bay and San Josè, came thundering in. A hundred church spires pierced the sky; the smoke from numberless mills and factories, machine-shops and foundries, drifted over the harbor; the horse-car bells tinkled on every side—the last proofs of American progress—and all around us were the din and boom of Yankee energy, and thrift, and go-ahead-ative-ness, in place of the old Rip Van Winkleism. I don't wonder, that all good Pacific Coasters believe in San Francisco, and expect to go there when they die!
Her hotels, her school-houses, her churches, her Bank of California, her Wells-Fargo Express, her Mission Woollen Mills, her lines of ocean steam-ships, and a hundred other things, all suggest great wealth and brains; and yet they are only the first fruits of nobler fortune yet to come. She is what Carlyle might call an undeniable fact, a substantial verity; and, in spite of her "heavy job of work," moves
    onward to empire with giant strides. She contained already fully a third of the population of the whole state of California, and was "lifting herself up like a young lion" in all enterprises—at all times and everywhere—on the Pacific slope. Her faulty location, however, gives her a climate, that can scarcely be called inviting, notwithstanding all that Californians claim for their climate generally. It is true, the range of the thermometer there indicates but a moderate variation of temperature, with neither snow nor frost, usually. But her continual rains in winter, and cold winds and fogs in summer, must be very trying to average nerves and lungs. We found it raining on our arrival there in December, with the hills surrounding the bay already turning green; and it continued to rain and drizzle right along, pretty much all the time, until we departed for Arizona in February. Sometimes it would break away for an hour or two, and the sun would come out resplendently, as if meaning to shine forever; and then, suddenly, it would cloud over, and begin to drizzle and rain again, as if the whole heavens were only a gigantic sieve. Really, it did rain there sometimes the easiest of any place I ever saw—not excepting Fort Vancouver. Going out to drive, or on business, we got caught thus several times, and learned the wisdom of carrying stout umbrellas, or else wearing bang-up hats and water-proof coats, like true Californians. Once, for a fortnight nearly, it rained in torrents, with but little intermission, and then the whole interior became flooded—bridges were washed away, roads submerged, etc. In the midst of this, one night, we had a sharp passage of thunder and lightning—a phenomenon of rare occurrence on that coast—followed by a slight earthquake, and then it rained harder than ever. But at last, the winter rains came to an end, as all things must, and then we had indeed some superb weather, worthy of Italy or Paradise.
Californians vowed their winter had been an unusual one; that their January was usually good, and their February very fine; but, of course, things must be reported as we found them. As a rule, nobody seemed to mind the perpetual drizzle, so to speak; but with slouched hats and light overcoats, or infrequent umbrellas, everybody tramped the streets, as business or
    pleasure called, and the general health of the city continued good. The few fair days we had in January and early February were as soft and balmy, as our May or June, and all 'Frisco made the most of them. The ladies literally swarmed along Montgomery street, resplendent in silks and jewelry, and all the drives about the city—especially the favorite one to the Cliff-House and sea-lions—were thronged with coaches and buggies. Meanwhile, the islands in the harbor and the surrounding hills and country, so dead and barren but a few weeks before, had now become superbly green, and the whole bay and city lay embosomed in emerald. We left there the middle of February for Arizona, and did not get back until late in May. Then, when we returned we found the rains long gone, the vegetation fast turning to yellow—grain ripening in the fields—strawberries and peas on the table—and the summer winds and fogs in full vogue. At sunrise, it would be hot, even sultry, and you would see persons dressed in white linen. By nine or ten a. m., the wind would rise—a raw damp wind, sometimes with fog, sweeping in from the Pacific—and in the evening, you would see ladies going to the Opera with full winter furs on. How long this lasted, I cannot say; but this was the weather we experienced, as a rule, late in May and early in June. Heavy great-coats, doubtless, are never necessary there. And so, on the other hand, thin clothing is seldom wanted. Many indeed said, they wore the same clothing all seasons of the year, and seldom found it uncomfortable either way. The truth seemed to be, that for hardy persons the climate was excellent—the air bracing and stimulating—but invalids were better off in the interior. Consumptives could not stand the winds and fogs at all; and it was a mooted question, as to whether the large percentage of suicides just then, was not due in part to climatic influences.
The really healthy, however, appeared plump and rosy, and the growing children promised well for the future. Had 'Frisco been built at Benicia, or about there, she would have escaped much of her climatic misery. Even across the bay, at Oakland, they have a much smoother climate. But she would "squat" on a sandspit, at the mouth of the Golden Gate, where there is a perpetual suck of wind
    and fog—from the ocean, into the bay, and up the valley of the Sacramento—and now must make the most of her situation. Montgomery Street is the Broadway or Chestnut Street of San Francisco, and California her Wall Street. Her hotels, shops, and banking-houses are chiefly here, and many of them are very handsome edifices. The Occidental, Cosmopolitan, and Lick-House hotels, the new Mercantile Library, and Bank of California, are stately structures, that would do credit to any city. Their height, four and five stories, seemed a little reckless, considering the liability of the Coast to earthquakes; but the people made light of this, notwithstanding some of their best buildings showed ominous cracks "from turret to foundation stone." So long as they stood, everything was believed secure; and commerce surged and roared along the streets, as in New York and London. Brick, well strengthened by iron, seemed to be the chief building material in the business parts of the city, though stone was coming into use, obtained from an excellent quarry on Angel Island. The Bank of California had been constructed of this, and was much admired by everybody. The private residences, however, seemed chiefly frame, and were seldom more than two and a half stories high. Doubtless more heed is given to earthquakes here, though your true Californian would be slow to acknowledge this. Nevertheless, deep down in his heart—at "bed-rock," as he would say—his household gods are esteemed of more importance, than his commercial commodities. In the suburbs, Mansard roofs were fast coming into vogue, and everywhere there was a general breaking out of Bay-Window. Brown seemed to be the favorite color, doubtless to offset the summer sand-storms, and the general prevalence of bay-windows may also be due partly to these. Convenience and comfort—often elegance and luxury—appeared everywhere, and to an extent that was surprising, for a city so young and raw.
Shade-trees were still rare, because only the native scrubby live-oaks, with deep penetrating roots, can survive the long and dry summers there. But shrubbery and flowers, prompted by plentiful irrigation, appeared on every side, and the air was always redolent of perfume. The most unpretending homes had their gems of flower-gardens, with evergreens, fuchsias, geraniums, pansies, and the variety and richness of their roses were a perpetual delight. A rill of water, with trickling side streams, made the barren sand-hills laugh with verdure and beauty, and gaunt wind-mills in every back-yard kept up the supply. The wind-mill in California rises to the dignity of an institution, and is a godsend to the whole coast. In winter, of course, they are not needed. But throughout the long and rainless summer, when vegetation withers up and blows away, the steady sea-breeze keeps the wind-mills going, and these pump up water for a thousand irrigating purposes. The vegetable gardens about the city, and California farmers generally, all patronize them, more or less, and thus grow fruits and vegetables of exquisite character, and almost every variety, the year round. The markets and fruit-stands of San Francisco, groaning with apples, pears, peaches, plums, pomegranates, oranges, grapes, strawberries, etc., have already become world-renowned, and the Pacific Railroad now places them at our very doors. Montgomery street repeats Broadway in all but its vista, but with something more perhaps of energy and dash. The representative New Yorker always has a trace of conservatism somewhere; but your true Californian laughs at precedent, and is embodied go-ahead-ativeness. In costume, he is careless, not to say reckless, insisting on comfort at all hazards, and running greatly to pockets. Stove-pipe hats are an abomination to him, and tight trowsers nowhere; but beneath his slouch-hat are a keen eye and nose, and his powers of locomotion are something prodigious. Cleaner-cut, more wide-awake, and energetic faces are nowhere to be seen. Few aged men appear, but most average from twenty-five to forty years.
Resolute, alert, jaunty, bankrupt perhaps to-day, but to-morrow picking their flints and trying it again, such men mean business in all they undertake, and carry enterprise and empire in the palms of their hands. The proportion of ladies on Montgomery street, however, usually seemed small, and the quality inferior to that of the sterner sex. Given to jewelry and loud colors, and still louder manners, there was a fastness about them, that jarred upon one's Eastern sense,
    though some noble specimens of womanhood now and then appeared. Doubtless, the hotel and apartment-life of so many San Franciscans had something to do with this, as it is fatal to the more modest and domestic virtues; but it must be doubted, whether this will account for it entirely. Evidently, California is still "short" of women, at least of the worthier kind, and until she completes her supply will continue to over-estimate and spoil what she has. At least, this is the impression her Montgomery street dames make upon a stranger, and unfortunately there is much elsewhere to confirm it. Respect for the Sabbath seemed to be a growing virtue, but there was still room for much improvement. Many of the stores and shops on Montgomery and Kearney streets were open on Sunday, the same as other days; and it seemed to be the favorite day for pic-nics and excursions, to Oakland and San Mateo. Processions, with bands of music, were not infrequent, and at Hayes' Park in the Southern suburbs the whole Teuton element seemed to concentrate on that day, for a general saturnalia. On the other hand, there was a goodly array of well-filled churches, and their pastors preached with much fervency and power. The Jewish Synagogue is a magnificent structure, one of the finest in America, and deserves more than a passing notice. It is on Sutter street, in a fine location overlooking the city, and cost nearly half a million of dollars. The gilding and decoration generally inside, viewed from the organ-loft, are superb. But few of the large choir were Jews, and scarcely any could read the old Hebrew songs and chants in the original; so these were printed in English, as the Hebrew sounds, and thus they maintained the ancient custom of singing and chanting only in Hebrew! Their music, nevertheless, was grand and inspiring, and it would be well, for our Gentile churches, to emulate it. This was called the Progressive Synagogue.
The congregation had recently shortened the ancient service from three hours to an hour and a half, by leaving out some of the long prayers—"vain repetitions," it is presumed—and the consequence was, a split in this most conservative of churches. The good old conservative brethren, of
course, could not stand the abbreviation. They were fully persuaded, they could never get to Paradise, with only an hour and a half's service. So, they seceded, and set up for themselves. Very prosperous and wealthy are the Jews of San Francisco; and, indeed, all over the Pacific Coast, our Hebrew friends enjoy a degree of respectability, that few attain East. They number in their ranks many of the leading bankers, merchants, lawyers, etc., of San Francisco; and more than one of them sits upon the Bench, gracing his seat. Poor Thomas Starr King's church is a model in its way, and the congregation that assembles there one of the most cultivated and refined on the Pacific Coast. Their pastor, Dr. Stebbins, though not equal to his great predecessor, in some respects, is a man of marked thought and eloquence; and, by his broad Christian charity, was doing a noble work in San Francisco. So, Dr. Stone, formerly of Boston, was preaching to large audiences, and declaring "the whole counsel of God," without fear or favor. His church is plain but large and commodious, and was always thronged with attentive worshippers. Dr. Wadsworth, lately of Philadelphia, was not attracting the attention he did East; but his church was usually well-filled, and he was exerting an influence and power for good much needed. The Methodists, our modern ecclesiastical sharp-shooters, did not seem as live and aggressive, as they usually do elsewhere; but we were told they were a great and growing power on the Coast, for all that, and everybody bade them God speed. The Episcopalians, as a rule, I regret to say, appeared to make but little impression, and were perhaps unfortunate in their chief official. The Catholics, embracing most of the old Spanish population and much of the foreign element, were vigorous and aggressive, and made no concealment of the fact, that they were aiming at supremacy.
In this cosmopolitan city, the Chinese, too, have their Temples, or Josh-Houses; but they were much neglected, and John Chinaman, indeed, religiously considered, seemed well on the road to philosophic indifference. During the past decade, however, things on the whole had greatly improved, morally and religiously, as the population had become
more fixed and settled; and all were hoping for a still greater improvement, with the completion of the Railroad, and the resumption of old family ties East. The drinking-saloons were being more carefully regulated. The gambling-hells, no longer permitted openly, were being more and more driven into obscurity and secrecy. Law and order were more rigidly enforced. The vigilance committees of former years still exerted their beneficent example. The Alta, Bulletin, and Times, then the three great papers of the city and Coast, all noble journals, were all open and pronounced in behalf of good morals and wholesome government; and it is not too much to say, that the prospect for the future was certainly very gratifying, not to say cheering. "Forty-Niners," (Bret Harte's Argonauts) and other early comers, declared themselves amazed, that they were getting on, as well as they did. "Yes," said one of the best of them, a man of great shrewdness and ability, "I grant, we Californians have been pretty rough customers, and have not as many religious people among us yet, as we ought to have; but then, what we have are iron-clad, you bet!" I suspect that is about so. A man, who is really religious in California, will likely be so anywhere. The severity of his temptations, if he resist them, will make him invulnerable; and all the "fiery darts of the wicked one," elsewhere, will fall harmless at his feet. Faithful Monitors are they, battling for Jesus; and in the end, we know, will come off more than conquerors. With all our hearts, let us bid them God speed!
CHAPTER XVIII. SAN FRANCISCO (continued).

Here in San Francisco, our National greenbacks were no longer a legal tender, but everything was on a coin basis. Just as in New York, you sell gold and buy greenbacks, if you want a convenient medium of exchange, so here we had to sell greenbacks and buy gold. A dime was the smallest coin, and "two bits" (twenty-five cents) the usual gratuity. A newspaper cost a dime, or two for twenty-five cents—the change never being returned. Fruits and vegetables were cheap, but dry-goods, groceries, clothing, books, etc., about the same in gold, as East in greenbacks. The general cost of living, therefore, seemed to be about the same as in New York, plus the premium on gold. California and the Pacific slope generally had refused to adopt the National currency, and it was still a mooted question whether they had lost or gained by this. At first, they thought it a great gain to be rid of our paper dollars; but public opinion had changed greatly, and many were getting to think they had made a huge mistake, in not originally acquiescing in the national necessity. The prosperity of the East during the war, and the pending sluggishness of trade on the Coast (still continuing), were much commented on, as connected with this question of Coin vs. Greenbacks; but it was thought too late to remedy the matter now. This hostility to our Greenbacks did not seem to arise from a want of patriotism, so much as from a difference of opinion, as to the necessity or propriety of their using a paper currency, when they had all the gold and silver they wanted, and were exporting a surplus by every steamer. If there was a speck of Secession there at first, California afterwards behaved very nobly, especially when she came with her bullion by the many thousands to the rescue of the Sanitary Commission; and Starr King's memory was still treasured
everywhere, as that of a martyr for the Union. The oncoming Pacific Railroad was constantly spoken of, as a new "bond of union," to link the Coast to the Atlantic States as with "hooks of steel;" and, evidently, nothing (unless it may be the Chinese Question) can disturb the repose of the Republic there, for long years to come. The people almost universally spoke lovingly and tenderly of the East, as their old "home," and thousands were awaiting the completion of the Railroad to go thither once again. Their great passion, however, just then, was for territorial aggrandizement. Mr. Seward had just announced his purchase of Alaska, and of course, everybody was delighted, as they would have been if he had bought the North Pole, or even the tip end of it. Next they wanted British Columbia and the Sandwich Islands, and hoped before long also to possess Mexico and down to the Isthmus. The Sitka Ice Company, which for some years had supplied San Francisco and the Coast with their only good ice, was proof positive, that there was cold weather sometimes in Alaska; nevertheless, they claimed, the Sage of Auburn had certainly shown himself to be a great statesman, by going into this Real Estate business, however hyperborean the climate. It was soon alleged to be a region of fair fields and dimpled meadows, of luscious fruits and smiling flowers, of magnificent forests and inexhaustible mines, as well as of icebergs and walrusses; and straightway a steamer cleared for Sitka, with a full complement of passengers, expecting to locate a "city" there and sell "corner lots," start a Mining Company and "water" stock, or initiate some other California enterprise. Christmas and New Year in San Francisco were observed very generally, and with even more spirit than in the East. The shops and stores had been groaning with gifts and good things for some time, and on Christmas Eve the whole city seems to pour itself into Montgomery street.
Early in the evening, there was a scattering tooting of trumpets, chiefly by boys; but along toward midnight, a great procession of men and boys drifted together, and traversing Montgomery, Kearney, and adjacent streets, made the night hideous
with every kind of horn, from a dime trumpet to a trombone. New Year was ushered in much the same way, though not quite so elaborately. On both of these winter holidays there happened to be superb weather, much like what we have East in May, with the sky clear, and the air crisp, and the whole city—with his wife and child—seemed to be abroad. The good old Knickerbocker custom of New Year calls was apparently everywhere accepted, and thoroughly enjoyed. Every kind of vehicle was in demand, and "stag" parties of four or five gentlemen—out calling on their lady friends—were constantly met, walking hilariously along, or driving like mad. Quite a number of army officers happened to be in San Francisco just then, and their uniforms of blue and brass made many a parlor gay. Of names known east, there were Generals Halleck, McDowell, Allen, Steele, Irvin Gregg, French, King, Fry, etc., and these with their brother officers were everywhere heartily welcomed. Indeed, army officers are nowhere more esteemed or better treated, than on the Pacific Coast, and all are usually delighted with their tour of duty there. In former years, many of them married magnificent ranches—encumbered, however, with native señoritas—and here and there we afterwards met them, living like grand seignors on their broad and baronial acres. Ranches leagues in extent, and maintaining thousands of cattle and sheep, are still common in California, and some of the best of these belong to ex-army officers. Their owners, however, do but little in the way of pure farming, and are always ready to give a quarter section or so to any stray emigrant, who will settle down and cultivate it—especially to old comrades. The great feature of San Francisco, of course, is her peerless bay. Yet noble as it is for purposes of commerce, it avails little for pleasure excursions; and 'Frisco, indeed, might be better off in this respect.
A trip to Oakland is sometimes quite enjoyable, and the ride by railroad down the peninsula, skirting the bay, to San Josè, is always a delight. But the bay itself is fickle and morose in winter, and in summer must be raw and gusty. The suck of wind, from the Pacific into the interior, through the Golden Gate, as through a funnel, always keeps the bay more or less in a turmoil; and during
the time we were there, it seemed quite neglected, except for business purposes. One day, in the middle of January, however, we had duties that took us to Alcatraz and Angel Island, and essayed the trip thither in a little sloop. On leaving the Occidental, the sky was overcast, and we had the usual drizzle of that winter; but before we reached Meigg's Wharf, it had thickened into a pouring rain, and as we crossed to Alcatraz squalls were churning the outer bay into foam in all directions. After an hour or two there, on that rocky fortress, the key of San Francisco, with the wind and rain dashing fitfully about us, we took advantage of a temporary lull to re-embark for Angel Island. We had hardly got off, however, before squall after squall came charging down upon us; and as we beat up the little strait between Angel Island and Socelito, the sloop careening and the waves breaking over us, it seemed at times as if we were in a fair way of going to the bottom. Just as we rounded the rocky point of the Island, before reaching the landing, a squall of unusual force struck us athwart the bows, wave after wave leaped aboard, and for awhile our gallant little craft quivered in the blast like a spent race-horse, as she struggled onward. An abrupt lee shore was on one side, the squall howling on the other; but we faced it out, and in a lull, that soon followed, shot by the landing (it being too rough to halt there), and weathering the next point dropped anchor in a little cove behind it, just in time to escape another squall even fiercer than the former. Had we been off either point, or out in the bay, when this last one struck us, no doubt we would have gone ashore or to Davy Jones' locker; and altogether, as our Captain said, it was a "nasty, dirty day," even for San Francisco. Returning, we had skies less treacherous and a smoother run; but were glad to reach the grateful welcome and spacious halls of the Occidental, best of hotels, again.
It may be, that the bay was a little ruder that day, than usual; but it bears a bad name for sudden gusts and squalls, and San Franciscans give it a wide berth generally. Sometimes, in summer, it is afflicted by calms as well as squalls; we heard some amusing stories of parties becalmed there until late at night, unable to reach either shore; so that, altogether, however useful otherwise,
it can hardly be regarded as adding much per se to the pleasures of a life in 'Frisco. As an offset to this, however, all orthodox San Franciscans, swear by the Cliff-House and the sea-lions. To "go to the Cliff," is the right thing to do in San Francisco, and not to go to the Cliff-House is not to see or know California. In the summer, people drive there in the early morning, to breakfast and return before the sea-breeze rises, and then hundreds of gay equipages throng the well-kept road. Even in winter, at the right hour, you are always sure to meet many driving out or in. Of course, we went to the "Cliff"—wouldn't have missed going there for anything. Past Lone Mountain Cemetery, that picturesque city of the dead, the fine graveled road strikes straight through the sand-hills, for five or six miles, to the Pacific; and when you reach the overhanging bluff, on which the hotel perches like an eagle's nest, you have a grand view of the Golden Gate and the far-stretching sea beyond. On the very verge of the horizon hang the Farallones, pointing the way to Japan and China, and the white sails of vessels beating in or out the harbor dot the ocean far and near. Just in front of the hotel are several groups of high shelving rocks, among which the ocean moans and dashes ceaselessly, and here the seals or "sea-lions," as 'Frisco lovingly calls them, have a favorite rendezvous and home. The day we were there, there appeared to be a hundred or more of them, large and small, swimming about the rocks or clambering over them, while pelicans and gulls kept them company. Some were small, not larger than a half-grown sturgeon, while others again were huge unwieldy monsters, not unlike legless oxen, weighing perhaps a thousand pounds or more. "Ben Butler" was an immense, overgrown creature, as selfish and saucy, apparently, as he could well be; and another, called "Gen. Grant," was not much better.
They kicked and cuffed the rest overboard quite indiscriminately, though now and then they were compelled to take a plunge themselves. Many contented themselves with merely gamboling around the water's edge; but others had somehow managed slimily to roll and climb forty or fifty feet up the rocks, and there lay sunning themselves in supreme felicity, like veteran
politicians snug in office. Sometimes two or three would get to wrangling about the same position, as if one part of the rocks were softer than another, and then they would bark and howl at each other, and presently essay to fight in the most clumsy and ludicrous way. "Ben Butler," or "Gen. Grant," would usually settle the squabble, by a harsh bark, or by flopping the malcontents overboard, and then would resume his nap with becoming satisfaction. Uncouth, and yet half-human in their way, with a cry that sometimes startled you like a distant wail, we watched their movements from the piazza of the hotel with much interest, and must congratulate 'Frisco on having such a first-class "sensation." May her "sea-lions" long remain to her as a "lion" of the first water, and their numbers and renown never grow less! In former years, they were much shot at and annoyed, by thoughtless visitors. But subsequently the State took them under her protection, and now it was a penal offence to injure or disturb them. This is right, and California should be complimented, for thus trying to preserve and perpetuate this interesting colony of her original settlers. Returning, we had a superb drive down the beach, with the surf thundering at our wheels; and thence, by a winding road over and through the hills, reached the city again. It was a glorious day in February, after a fortnight of perpetual drizzle—a June day for beauty, but toned by an October breeze—the sun flashing overhead like a shield of gold; the road, over and between the hills, gave us from time to time exquisite glimpses of the sea or bay and city; every sense seemed keyed to a new life and power of enjoyment; and the memory of that "drive to the Cliff," is something wonderfully clear and charming still. It would be surprising, if Californians did not brag considerably about it. They are not famed for modesty, and would be heathens, if they kept silence.
Californians are proverbial for their ups and downs, and we heard much of their varying fortunes. You will scarcely meet a leading citizen, who has not been down to "hard-pan" once or twice in his career, and everybody seems to enjoy telling about it. In former
years, many had been rich in "feet" or "corner-lots," who yet had not enough "dust" to buy a "square-meal;" and men with Great Expectations, but small cash in hand, were still not infrequent. I ran foul of an old school-mate one day, who arrived in California originally as captain of an ox-team, which he had driven across the Plains. But now he was deep in mining-stocks, and twenty-vara lots, and was rated as a millionaire. I met another who for years lost all he invested in "feet." But luckily, at last, he went into Savage and Yellow Jacket, and now he owned handsome blocks on Montgomery and California streets, and lived like a prince at the Occidental. Another still, named O., an eccentric genius, came out to California early, and his uncle (already there) secured him a place in a dry-goods house. In a few months, the house failed, and O. fell back on his uncle's hands again. Then he was given a place in a silk-house, but in a short time this also failed. A fatality seemed to accompany the poor fellow. Wherever he went, the houses either failed, closed up, or burned out; and thus, time after time, he came back to his uncle, like a bad penny. Once he was reduced so low, he went to driving a dray, glad to get even that; and again, turned chiffonier, and eked out a precarious living by collecting the old bones, scraps of tin, sheet-iron, etc., that lay scattered about the suburbs. Finally, he wisely concluded he had "touched bottom," and that California was no place for him. So, his kind-hearted uncle bought him a ticket home by the "Golden City," and supposed when he bade him good-bye on her gang-way, that that would be the last he would see of O. in California. But a week or so afterwards, early one Sunday morning, he was roused up by some one rapping lustily at the door, and opening it lo! there was his hopeful nephew again—"large as life and twice as natural!"
It seems, the ill-fated steamer, when two or three hundred miles down the Coast, had caught fire and been beached, with the loss of many lives; but O., strange to say, had escaped scot-free, and now was on hand again. He now tried two or three more situations, thinking his "luck" perhaps had turned, but failed in all of them or they soon failed; and finally set out for the East again, but this time across the Plains, driving a "bull-team." He got safely back to New York, and taking hold of his father's business