# Getting Started with R

## Why use R?

R is a statistical software environment that is widely used by statisticians, social scientists, and data analysts. R is different from “point-and-click” software packages like Microsoft Excel, SPSS, or Tableau in that it requires the user to write code via a command line interface. For this reason, R is also often referred to as a programming language. However, the R environment is much more interactive than programming languages like C or Java, which makes it easier to learn and use.

Here are a few key advantages of R.

- R is **free** and **open source**^{1}
- R is available on Windows, Mac OS, and UNIX/Linux
- R is **flexible**: you can write, modify, save, and share your own code
- R is **powerful**: you can do everything from making high-quality graphics to running sophisticated statistical machine learning models
- R is **popular**: there is a large and growing online community of users, making it easy to find answers to any problem you run into

Lastly, R is a tangible and highly valued skill you can put on your CV!

## Downloading R and RStudio

To download R, go to http://cran.r-project.org/mirrors.html and choose the CRAN mirror closest to your physical location. Then select “Download R for…” and choose your operating system. For other questions, see the R FAQ.

After downloading R, you should also download RStudio, which is an integrated development environment (IDE) for R. In short, it provides a much more interactive and user-friendly interface for using R. To download it, go to the RStudio download page, select “RStudio Desktop - Free”, and select the installer corresponding to your operating system.

## Installing and Loading Packages

As you will see, one of the most attractive features of R is its library of over 10,000 packages. R packages – which are collections of R functions, code, and data sets – allow us to use code written by others in order to use certain data sets, make certain graphs, or run certain models.

For example, in this course we will discuss a variety of regression models including linear regression models and regression trees. While the R function to estimate a linear regression model (called `lm`) is included in “base” R, the function to estimate a regression tree is not. Rather than writing the code ourselves, we can download an R package!
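As a quick illustration of the first point, the sketch below fits a linear regression with `lm` without installing anything. The built-in `mtcars` data set and the variable name `fit_lm` are chosen here purely for illustration and are not part of the course material.

```r
# lm() ships with a standard R installation (it lives in the 'stats'
# package, which is attached by default), so no install step is needed.
# mtcars is a small example data set bundled with R (illustrative choice).
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Show the estimated intercept and slope
coef(fit_lm)
```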

One package that provides code for estimating regression trees is called `rpart`. To use functions within the `rpart` package, we must first install it.
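From the R console, installation is typically done with `install.packages()`. The sketch below wraps the call in a `requireNamespace()` check so that nothing is downloaded when the package is already present; that check and the explicit `repos` argument are additions for illustration, and a plain `install.packages("rpart")` works as well.

```r
# Install rpart from CRAN only if it is not already installed.
# The repos URL (CRAN's cloud mirror) is an optional, illustrative choice;
# omit it to use whichever mirror your R session is configured with.
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart", repos = "https://cloud.r-project.org")
}
```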

Alternatively, you can navigate to the “Packages” tab in RStudio (likely in the lower right panel), click “Install”, and search for **rpart**.

**Note: You only need to install a package once!** After a package is installed, it will generally remain installed until you upgrade to a new version of R; updating RStudio alone does not remove installed packages.

However, in each R session (i.e., each time you open RStudio), you will need to *load the package*.
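In code, loading is done with `library()`. The sketch below loads `rpart` and then, as a quick check that its functions are available, fits a small regression tree; the `mtcars` data set, the formula, and the name `fit_tree` are illustrative assumptions rather than part of the course material.

```r
# Load rpart for the current session (repeat this in every new session)
library(rpart)

# Illustrative only: fit a regression tree on the built-in mtcars data.
# method = "anova" requests a regression (rather than classification) tree.
fit_tree <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

# Print a text summary of the fitted tree's splits
print(fit_tree)
```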

Again, an alternative is to navigate to the “Packages” tab in RStudio, find the package name, and click on the box to the left of the name.