Getting Started with R

Who are we with respect to R?

“great frustration and much suckiness…”

R, RStudio, RStudio Cloud

R is the computational engine; RStudio is the interface

RStudio Cloud is the same, but online.

Organizing R

For any new project in R, create an R project. Projects allow RStudio to leave notes for itself (e.g., history), always start a new R session when opened, and always set the working directory to the project directory. If you never have to set the working directory at the top of a script, that’s a good thing! [1]

And create a system for organizing the objects in this project!
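
Because a project fixes the working directory at the project root, file paths in scripts can be relative. A minimal sketch, assuming a hypothetical project with data/ and scripts/ folders (these names are illustrative, not taken from the slides). It builds the layout in a temporary directory and calls setwd() only to imitate what opening an RStudio Project does for you:

```r
# Hypothetical layout built in a temporary directory, so no real project is touched.
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "scripts"), showWarnings = FALSE)

# Opening an RStudio Project sets the working directory for you;
# setwd() here only imitates that for the demo.
old_wd <- setwd(project_dir)

# Paths are relative to the project root, so the script contains no absolute path.
notes_path <- file.path("data", "notes.txt")
writeLines("hello from the project root", notes_path)
found <- file.exists(notes_path)

# Restore the previous working directory.
setwd(old_wd)
```

In a real project you would skip both setwd() calls and simply write file.path("data", "notes.txt").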

File Structure Example

I’ll try to structure shared projects in a similar manner.

R Packages

Functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages.

R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover them on the Comprehensive R Archive Network (CRAN), with more in active development on GitHub.

To use a package, install it once:

  • You can install packages via point-and-click: Tools > Install Packages…, enter tidyverse (or a different package name), then click Install.
  • Or you can use this command in the console: install.packages("tidyverse")

RStudio Cloud allows me to set up a base project to share, so in the beginning, all of the packages you need should be installed in our shared space.

In each new R session, you’ll have to load the package if you want access to its functions: e.g., type library(tidyverse).
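
The install-once, load-every-session pattern can be sketched as follows. To keep the example runnable offline, it assumes the stats package (which ships with R) as a stand-in for tidyverse; swap in any package name you need:

```r
# "stats" ships with R and stands in here for a package such as "tidyverse",
# so the sketch runs without downloading anything.
pkg <- "stats"

# Install once: only needed if the package is not already in your library.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load in every new session to make its functions available by name.
library(pkg, character.only = TRUE)
med <- median(c(1, 3, 5))

# package::function() calls a function without loading the whole package.
med_again <- stats::median(c(1, 3, 5))
```

Interactively you would normally just type library(tidyverse); the character.only = TRUE form is only needed because the package name is stored in a variable here.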

R Basics

  • R is case sensitive
  • Everything in R is an object (vectors, lists, matrices, data frames)
  • # demarcates code comments
  • <- is the assignment operator, how we name new objects in the R environment
  • $ is the accessor operator (or extractor), how we call named variables within an R data frame (in base R)
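
A short sketch of these basics in one place. The object names and the small scores data frame are made up for illustration:

```r
# Everything after a hash is a comment and is ignored by R.
x <- 5     # <- assigns the value 5 to the name x
X <- 10    # X is a different object from x: R is case sensitive

# A small made-up data frame (one kind of object).
scores <- data.frame(name = c("Ana", "Ben", "Cy"), score = c(90, 85, 95))

# $ extracts a named column as a vector.
score_vector <- scores$score
mean_score <- mean(scores$score)
```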

Reading in data

You can import pretty much any data format into R if you know the right command (and package):

  • CSV: read.csv (base R), read_csv (tidyverse)
  • Excel: read_excel (readxl)
  • Stata, SPSS, SAS: e.g., read.dta (foreign), read_dta (haven)
  • JSON, fixed-width, TXT, DAT, shape files, etc.
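
As a rough illustration of the CSV case, the sketch below writes a tiny made-up file to a temporary location and reads it back with base R. The file contents and column names are invented for the example:

```r
# Write a tiny made-up CSV to a temporary file so there is something to import.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,value", "1,a,10.5", "2,b,20", "3,a,NA"), csv_path)

# Base R import; readr::read_csv(csv_path) is the tidyverse counterpart.
dat <- read.csv(csv_path)

dim(dat)      # number of rows and columns
names(dat)    # column names taken from the header row
```

By default read.csv() treats the string NA as a missing value, so the last entry of value is imported as NA rather than as text.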

Primary data types include numeric, integer, logical, and character; plus factors.
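
One way to see these types is to ask R for the class of a few values. The sizes factor is a made-up example:

```r
class(3.14)     # "numeric"
class(2L)       # "integer" (the L suffix marks an integer)
class(TRUE)     # "logical"
class("text")   # "character"

# A factor stores categories as integer codes plus a set of labels (levels).
sizes <- factor(c("small", "large", "small"), levels = c("small", "large"))
class(sizes)        # "factor"
levels(sizes)       # "small" "large"
as.integer(sizes)   # underlying codes: 1 2 1
```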

Some initial R commands

  • Examining data: names(), head(), tail(), str(), glimpse()
  • Summarizing data: summary(), table(), xtabs()
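
These commands can be tried on mtcars, a data frame that ships with R (chosen here only because it needs no import; it is not part of the workshop data). glimpse() comes from dplyr, so the sketch sticks to base R and mentions it in a comment:

```r
# mtcars is a data frame that ships with R.
names(mtcars)        # column names
head(mtcars, 3)      # first 3 rows
tail(mtcars, 3)      # last 3 rows
str(mtcars)          # compact structure; dplyr::glimpse() gives a similar view

summary(mtcars$mpg)                            # min, quartiles, mean, max
cyl_counts <- table(mtcars$cyl)                # counts per number of cylinders
cyl_by_am <- xtabs(~ cyl + am, data = mtcars)  # two-way cross-tabulation
```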

dplyr

Part of the tidyverse, dplyr is a package for data manipulation. The package implements a grammar for transforming data, based on verbs/functions that define a set of common tasks.

dplyr functions are for data frames.

  • first argument of dplyr functions is always a data frame
  • followed by function specific arguments that detail what to do

dplyr cheatsheet!

Some initial dplyr commands

select() - extract variables

select() helpers include

  • select(.data, var1:var10): select range of columns
  • select(.data, -c(var1, var2)): select every column except var1 and var2
  • select(.data, starts_with("string")): select columns whose names start with… (or ends_with("string"))
  • select(.data, contains("string")): select columns whose names contain…
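
The helpers can be tried on a toy data frame. The sketch below assumes dplyr is installed and borrows the column names used in the pipe example (Charge, CodeSection, CaseType, Class, Fine); the rows themselves are invented purely for illustration:

```r
library(dplyr)

# Toy stand-in for the workshop data; every value here is made up.
general <- data.frame(
  Charge      = c("Speeding", "Trespassing", "Littering"),
  CodeSection = c("A-101", "B-202", "C-303"),
  CaseType    = c("Traffic", "Criminal", "Civil"),
  Class       = c("1", "2", "3"),
  Fine        = c(150, 0, 50)
)

by_range   <- select(general, Charge:CaseType)      # a range of columns
all_but    <- select(general, -c(Class, Fine))      # every column except these
starts_c   <- select(general, starts_with("C"))     # names starting with "C"
ends_e     <- select(general, ends_with("e"))       # names ending with "e"
has_code   <- select(general, contains("Code"))     # names containing "Code"
```

Note that the string helpers ignore case by default, so starts_with("c") would match the same columns as starts_with("C").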

filter() - extract rows

Logical tests

  • x < y: less than
  • x >= y: greater than or equal to
  • x == y: equal to
  • x != y: not equal to
  • x %in% y: is a member of
  • is.na(x): is NA
  • !is.na(x): is not NA

Boolean operators for multiple conditions

  • a & b: and
  • a | b: or
  • xor(a, b): exclusive or (exactly one is true)
  • !a: not

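
A few of these tests and operators in action with filter(). This is a sketch on invented data (dplyr assumed installed); the general data frame below only mimics the column names used later in these notes and includes one missing Fine on purpose:

```r
library(dplyr)

# Toy data with one missing fine; every value here is made up.
general <- data.frame(
  Charge   = c("Speeding", "Trespassing", "Littering", "Expired meter", "Jaywalking"),
  CaseType = c("Traffic", "Criminal", "Civil", "Traffic", "Traffic"),
  Fine     = c(150, 0, 50, 25, NA)
)

positive_fines  <- filter(general, Fine > 0)                          # single test
traffic_fines   <- filter(general, Fine > 0 & CaseType == "Traffic")  # and
free_or_civil   <- filter(general, Fine == 0 | CaseType == "Civil")   # or
non_traffic     <- filter(general, CaseType %in% c("Criminal", "Civil"))
missing_fine    <- filter(general, is.na(Fine))
known_fine      <- filter(general, !is.na(Fine))
```

filter() keeps only rows where the condition is TRUE, so the row with a missing Fine is dropped by Fine > 0 rather than raising an error.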

arrange() - reorder rows
  • Reverse the order (largest to smallest) with desc()
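
For instance, sorting a small invented data frame by Fine in both directions (dplyr assumed installed; the values are hypothetical):

```r
library(dplyr)

# Toy data; every value here is made up.
general <- data.frame(
  Charge = c("Speeding", "Trespassing", "Littering", "Expired meter"),
  Fine   = c(150, 0, 50, 25)
)

smallest_first <- arrange(general, Fine)         # default: smallest to largest
largest_first  <- arrange(general, desc(Fine))   # desc() reverses the order
```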

Pipes!

The pipe (%>%) allows you to chain together functions by passing (piping) the result on the left into the first argument of the function on the right.

Less good alternatives

  • Create nested functions
arrange(
  select(
    filter(general, Fine > 0), 
    Charge, CodeSection, CaseType, Class, Fine), 
  desc(Fine))
  • Run and save intermediate steps
tmp <- filter(general, Fine > 0)
tmp <- select(tmp, Charge, CodeSection, CaseType, Class, Fine)
arrange(tmp, desc(Fine))

With the pipe, we call each function in sequence (read the pipe as “and then…”)

general %>% 
  filter(Fine > 0) %>% 
  select(Charge, CodeSection, CaseType, Class, Fine) %>% 
  arrange(desc(Fine))
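
To convince yourself that the nested and piped forms do the same work, you can run both on a toy version of general and compare the results. The rows below are invented (only the column names follow the example above), and dplyr is assumed to be installed:

```r
library(dplyr)

# Toy stand-in for the workshop data; every value here is made up.
general <- data.frame(
  Charge      = c("Speeding", "Trespassing", "Littering", "Expired meter"),
  CodeSection = c("A-101", "B-202", "C-303", "A-104"),
  CaseType    = c("Traffic", "Criminal", "Civil", "Traffic"),
  Class       = c("1", "2", "3", "4"),
  Fine        = c(150, 0, 50, 25)
)

# Nested version: read from the inside out.
nested <- arrange(
  select(
    filter(general, Fine > 0),
    Charge, CodeSection, CaseType, Class, Fine),
  desc(Fine))

# Piped version: read top to bottom, "and then...".
piped <- general %>%
  filter(Fine > 0) %>%
  select(Charge, CodeSection, CaseType, Class, Fine) %>%
  arrange(desc(Fine))

same_result <- isTRUE(all.equal(nested, piped))
```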

Keyboard shortcut to create %>%

  • Mac: cmd + shift + m
  • Windows: ctrl + shift + m

Let’s Play with R!

Artwork by @allison_horst


  1. Especially since no one seems to understand paths and directories any more.