Getting Started with R


Why use R?

R is a statistical software environment that is widely used by statisticians, social scientists, and data analysts. R is different from “point-and-click” software packages like Microsoft Excel, SPSS, or Tableau in that it requires the user to write code via a command line interface. For this reason, R is also often referred to as a programming language. However, the R environment is much more interactive than other programming languages like C or Java which makes it easier to learn and use.

Here are a few key advantages of R.

  • R is free and open source1
  • R is available on Windows, Mac OS, and UNIX/Linux
  • R is flexible: you can write, modify, save, and share your own code
  • R is powerful: you can do everything from making high-quality graphics to running sophisticated statistical machine learning models
  • R is popular: there is a large and growing online community of users making it easy to find answers to any problem you run into

Lastly, learning R is a tangible and highly-valued skill you can put on your CV!

Downloading R and RStudio

To download R, go to http://cran.r-project.org/mirrors.html and choose the CRAN mirror closest to your physical location. Then select “Download R for…” and choose your operating system. For other questions, see the R FAQ.

After downloading R, you should also download RStudio, which is an integrated development enviroment (IDE) for R. In short, it provides a much more interactive and user-friendly interface for using R. To download, go here, select “RStudio Desktop - Free”, and select the installer corresponding to your operating system.

Installing and Loading Packages

As you will see, one of the most attractive features of R is its library of over 10,000 packages. R packages – which are collections of R functions, code, and data sets – allow us to use code written by others in order to use certain data sets, make certain graphs, or run certain models.

For example, in this course we will discuss a variety of regression models including linear regression models and regression trees. While the R function to estimate a linear regression model (called lm) is included in “base” R, the function to estimate a regression tree is not. Rather than writing the code ourselves, we can download an R package!

One package that provides code for estimating regression trees is called rpart. To use functions within the rpart package, we must first install it.

Alternatively, you can navigate to the “Packages” tab in RStudio (likely in the lower right panel), click “Install”, and search for rpart.

Note: You only need to install a package once! After a package is installed, it will remain installed until you upgrade your version of R/RStudio.

However, in each R session (i.e., each time you open RStudio), you will need to load the package.

Again, an alternative is to navigate to the “Packages” tab in RStudio, find the package name, and click on the box to the left of the name.