Public Interest::Data Ethics & Practice

Getting Started with R

“great frustration and much suckiness…”
R, RStudio, RStudio Cloud
Organizing R
R Packages

R Basics

Reading in data
Some initial R commands

dplyr

Some initial dplyr commands

select() - extract variables
filter() - extract rows
arrange() - reorder rows

Pipes!

Let’s Play with R!

Getting Started with R

Who are we wrt R?

“great frustration and much suckiness…”

R, RStudio, RStudio Cloud

R is the computational engine; RStudio is the interface

RStudio Cloud is the same, but online.

Organizing R

For any new project in R, create an RStudio Project. A Project lets RStudio leave notes for itself (e.g., history), always starts a new R session when opened, and always sets the working directory to the Project directory. If you never have to set the working directory at the top of a script, that’s a good thing!¹

And create a system for organizing the objects in this project!

I’ll try to structure shared projects in a similar manner.

R Packages

Functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages.

R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover them online on the Comprehensive R Archive Network (CRAN), with more in active development on GitHub.

To use a package, install it once:

You can install packages via point-and-click: Tools > Install Packages..., enter tidyverse (or a different package name), then click Install.
Or you can use this command in the console: install.packages("tidyverse")

RStudio Cloud allows me to set up a base project to share, so in the beginning, all of the packages you need should be installed in our shared space.

In each new R session, you’ll have to load the package if you want access to its functions: e.g., type library(tidyverse).
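A minimal sketch of the install-once, load-every-session pattern. The requireNamespace() check is an addition for illustration (it is not from the slides); it simply skips the download when the package is already installed.

```r
# Install once: skip the download if tidyverse is already installed.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load in every new R session to get access to the package's functions.
library(tidyverse)

# Because every function belongs to a package, a function can also be
# called without loading the package, using package::function.
dplyr::n_distinct(c("a", "b", "a"))
```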

R Basics

R is case sensitive
Everything in R is an object (vectors, lists, matrices, data frames)
# demarcates code comments
<- is the assignment operator, how we name new objects in the R environment
$ is the accessor operator (or extractor), how we call named variables within an R data frame (in base R)
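The basics above can be tried in a few lines of base R. The object names and values here (fines, Fines, cases) are made up for illustration.

```r
# Everything after a # is a comment and is not run.
fines <- c(50, 0, 125)    # <- assigns a numeric vector to the name `fines`
Fines <- c("a", "b")      # a different object: R is case sensitive

# A small data frame with hypothetical values
cases <- data.frame(
  Charge = c("Speeding", "Littering", "Trespass"),
  Fine   = fines
)

cases$Fine          # $ extracts the column named Fine as a vector
mean(cases$Fine)    # functions can then operate on that vector
```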

Reading in data

You can import pretty much any data format into R if you know the right command (and package):

CSV: read.csv (base R), read_csv (tidyverse)
Excel: read_excel (readxl)
Stata, SPSS, SAS: e.g., read.dta (foreign), read_dta (haven)
JSON, fixed-width, TXT, DAT, shape files, etc.

Primary data types include numeric, integer, logical, and character; plus factors.
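A small, self-contained sketch of reading a CSV and checking the resulting data types. The file contents are made up, and the file is written to a temporary location only so the example runs anywhere; with real data you would pass your own file path.

```r
# Write a tiny made-up CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("Charge,Fine,Paid",
             "Speeding,50,TRUE",
             "Littering,0,FALSE"), path)

# Base R reader; readr::read_csv(path) is the tidyverse analogue
cases <- read.csv(path)

# Show each column's type: Charge is character (in R 4.0 or later),
# Fine is integer, Paid is logical
str(cases)

# Convert a character column to a factor when it represents categories
cases$Charge <- factor(cases$Charge)
levels(cases$Charge)
```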

Some initial R commands

Examining data: names(), head(), tail(), str(), glimpse()
Summarizing data: summary(), table(), xtabs()
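To see what these commands return, here is a sketch using mtcars, a data frame that ships with R. Using mtcars is an assumption for illustration; it is not the data used in the course.

```r
# Examining data
names(mtcars)      # column names
head(mtcars, 3)    # first 3 rows
tail(mtcars, 3)    # last 3 rows
str(mtcars)        # structure: dimensions, column types, first values
# dplyr::glimpse(mtcars) gives a similar overview if dplyr is installed

# Summarizing data
summary(mtcars$mpg)                   # min, quartiles, mean, max of one variable
cyl_counts <- table(mtcars$cyl)       # counts of each value
cyl_counts
xtabs(~ cyl + gear, data = mtcars)    # cross-tabulation of two variables
```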

dplyr

Part of the tidyverse, dplyr is a package for data manipulation. The package implements a grammar for transforming data, based on verbs/functions that define a set of common tasks.

dplyr functions are for data frames.

The first argument of a dplyr function is always a data frame, followed by function-specific arguments that detail what to do.

dplyr cheatsheet!

Some initial dplyr commands

select() - extract variables

select() helpers include:

select(.data, var1:var10): select a range of columns
select(.data, -c(var1, var2)): select every column except var1 and var2
select(.data, starts_with("string")): select columns whose names start with "string" (or end with it, using ends_with("string"))
select(.data, contains("string")): select columns whose names contain "string"
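A short sketch of these helpers in action. The data frame cases and its column names are hypothetical, chosen so that each helper matches something.

```r
library(dplyr)

# Made-up data frame for illustration
cases <- data.frame(
  case_id     = 1:3,
  fine_amount = c(50, 0, 125),
  fine_paid   = c(TRUE, NA, FALSE),
  court_code  = c("A", "B", "A")
)

select(cases, case_id:fine_paid)          # a range of columns
select(cases, -c(case_id, court_code))    # every column except these two
fine_cols <- select(cases, starts_with("fine"))  # fine_amount and fine_paid
fine_cols
select(cases, ends_with("code"))          # court_code
select(cases, contains("paid"))           # fine_paid
```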

filter() - extract rows

Logical tests

x < y: less than
x >= y: greater than or equal to
x == y: equal to
x != y: not equal to
x %in% y: is a member of
is.na(x): is NA
!is.na(x): is not NA

Boolean operators for multiple conditions

a & b: and
a | b: or
xor(a, b): exclusive or (exactly one is TRUE)
!a: not
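A sketch of how these tests and operators combine inside filter(). The data frame and its values are made up for illustration. Note that filter() keeps only rows where the condition is TRUE, so rows where the test evaluates to NA are dropped.

```r
library(dplyr)

# Made-up data for illustration
cases <- data.frame(
  Charge = c("Speeding", "Littering", "Trespass", "Speeding"),
  Class  = c("M", "I", "M", "I"),
  Fine   = c(50, 0, NA, 125)
)

positive <- filter(cases, Fine > 0)                   # rows with a positive fine
positive
filter(cases, Fine > 0 & Class == "M")                # both conditions must hold
filter(cases, Charge == "Littering" | Fine >= 100)    # either condition may hold
filter(cases, Charge %in% c("Speeding", "Trespass"))  # membership in a set
known <- filter(cases, !is.na(Fine))                  # drop rows with a missing fine
known
```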

arrange() - reorder rows

Reverse the order (largest to smallest) with desc()
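For example, with a small made-up data frame (names and values are hypothetical):

```r
library(dplyr)

cases <- data.frame(
  Charge = c("Speeding", "Littering", "Trespass"),
  Fine   = c(50, 0, 125)
)

ascending  <- arrange(cases, Fine)        # default: smallest to largest
descending <- arrange(cases, desc(Fine))  # desc() reverses: largest to smallest

ascending
descending
```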

Pipes!

The pipe (%>%) allows you to chain together functions by passing (piping) the result on the left into the first argument of the function on the right.

Less good alternatives

Create nested functions

arrange(
  select(
    filter(general, Fine > 0), 
    Charge, CodeSection, CaseType, Class, Fine), 
  desc(Fine))

Run and save intermediate steps

tmp <- filter(general, Fine > 0)
tmp <- select(tmp, Charge, CodeSection, CaseType, Class, Fine)
arrange(tmp, desc(Fine))

With the pipe, we call each function in sequence (read the pipe as “and then…”)

general %>% 
  filter(Fine > 0) %>% 
  select(Charge, CodeSection, CaseType, Class, Fine) %>% 
  arrange(desc(Fine))
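To run the comparison end to end, here is a sketch with a stand-in for general. The column names come from the code above, but every value, and the extra Costs column, is made up for illustration and does not reflect the course data.

```r
library(dplyr)

# Hypothetical stand-in for the `general` data frame
general <- data.frame(
  Charge      = c("Charge A", "Charge B", "Charge C"),
  CodeSection = c("X-1", "X-2", "X-3"),
  CaseType    = c("Traffic", "Criminal", "Traffic"),
  Class       = c("I", "M", "I"),
  Fine        = c(0, 250, 75),
  Costs       = c(10, 20, 30)
)

# Piped version: filter, and then select, and then arrange
piped <- general %>%
  filter(Fine > 0) %>%
  select(Charge, CodeSection, CaseType, Class, Fine) %>%
  arrange(desc(Fine))

# Nested version of the same steps, read from the inside out
nested <- arrange(
  select(
    filter(general, Fine > 0),
    Charge, CodeSection, CaseType, Class, Fine),
  desc(Fine))

piped
identical(piped, nested)  # both forms apply the same functions in the same order
```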

Keyboard shortcut to create %>%

Mac: cmd + shift + m
Windows: ctrl + shift + m

Let’s Play with R!

Artwork by @allison_horst

¹ Especially since no one seems to understand paths and directories any more.

Michele Claibourn

2022-01-19
