Public Interest Data: Ethics & Practice
Who are we with respect to R?
R is the computational engine; RStudio is the interface.
RStudio Cloud is the same, but online.
For any new project in R, create an R project. Projects allow RStudio to leave notes for itself (e.g., history), always start a new R session when opened, and always set the working directory to the project directory. If you never have to set the working directory at the top of a script, that’s a good thing![1]
And create a system for organizing the objects in this project!
I’ll try to structure shared projects in a similar manner.
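One possible organizing system is sketched below. It assumes a layout of data/, code/, and output/ subfolders; these folder names are hypothetical, not a course requirement. To run anywhere, the sketch builds the folders inside a temporary directory; in a real project you would create them at the project root, which RStudio already uses as the working directory.

```r
# Hypothetical folder layout for an R project; the names are illustrative only.
project_dir <- file.path(tempdir(), "my_project")   # stand-in for a real project root
subfolders  <- c("data", "code", "output")

for (folder in subfolders) {
  # recursive = TRUE also creates project_dir itself if it does not exist yet
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Inside a real project, scripts can then use short relative paths such as
# file.path("data", "my_file.csv")  (a hypothetical file name)
list.files(project_dir)
```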
Functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages.
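A quick way to check which package a function belongs to is sketched below. It uses only functions that ship with R; `environmentName(environment(f))` works for ordinary R functions (not for a handful of "primitive" built-ins such as `sum`).

```r
# environmentName(environment(f)) reports the package a function comes from
environmentName(environment(sd))     # "stats"
environmentName(environment(head))   # "utils"

# find() searches the attached packages for a function by name
find("mean")

# package::function calls a function from a specific package explicitly
stats::median(c(1, 3, 5))            # 3
```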
R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover these packages online on the Comprehensive R Archive Network (CRAN), with more in active development on GitHub.
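To see the packages bundled with your own R installation, you can ask for those with priority "base" or "recommended", as in the sketch below. The exact count depends on your R version and how R was installed (some minimal installs omit the recommended packages), so treat "about 30" as approximate.

```r
# Packages that ship with R carry a priority of "base" or "recommended"
ip <- installed.packages(priority = c("base", "recommended"))

# unique() guards against the same package being installed in two libraries
bundled <- unique(rownames(ip))

length(bundled)   # roughly 30, depending on your setup
head(bundled)
```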
To use a package, install it once: in RStudio’s Packages pane, click Install, type tidyverse (or a different package name), then click Install. Or run install.packages("tidyverse") in the console.
RStudio Cloud allows me to set up a base project to share, so in the beginning, all of the packages you need should be installed in our shared space.
In each new R session, you’ll have to load the package if you want access to its functions: e.g., type library(tidyverse).
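Loading attaches a package to the search path for the current session only. The sketch below uses tools, a package that ships with R but is not attached by default, as a stand-in for tidyverse so it runs without a download; library(tidyverse) behaves the same way once tidyverse is installed.

```r
# Installed but (typically) not attached yet: search() lists what is attached
"package:tools" %in% search()

library(tools)   # attach for this session only

"package:tools" %in% search()   # TRUE now
file_ext("fines.csv")           # "csv"; file_ext() comes from tools
```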
The # symbol demarcates code comments.
<- is the assignment operator, how we name new objects in the R environment.
$ is the accessor (or extractor) operator, how we call named variables within an R data frame (in base R).
You can import pretty much any data format into R if you know the right command (and package):
read.csv (base R), read_csv (readr, part of the tidyverse)
read_excel (readxl)
read.dta (foreign), read_dta (haven)
Primary data types include numeric, integer, logical, and character; plus factors.
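The sketch below ties these pieces together: it writes a tiny hypothetical CSV (the column names and values are made up), imports it with base R’s read.csv, assigns it with <-, pulls out columns with $, and checks each column’s data type. The "character" result for Charge assumes R 4.0 or later, where strings are no longer converted to factors on import.

```r
# Write a tiny hypothetical CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Charge,Fine,Paid",
             "Speeding,100,TRUE",
             "Parking,0,FALSE"), csv_path)

# <- assigns the imported data frame to the name fines
fines <- read.csv(csv_path)   # base R; readr::read_csv() is the tidyverse version

# $ extracts one named variable (column) from the data frame
fines$Fine                    # 100 0

# Each column has a data type
class(fines$Charge)           # "character" (R 4.0 or later)
class(fines$Fine)             # "integer"
class(fines$Paid)             # "logical"
class(fines$Fine / 2)         # "numeric"

# Convert a character column to a factor with a fixed set of levels
fines$Charge <- factor(fines$Charge)
levels(fines$Charge)          # "Parking" "Speeding"
```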
Inspect a data frame with names(), head(), tail(), str(), glimpse() (dplyr)
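Here is a minimal sketch of those inspection functions on a small, hypothetical data frame (the values are made up for illustration). It assumes dplyr is installed, since glimpse() is re-exported by dplyr.

```r
library(dplyr)   # for glimpse()

# A small hypothetical data frame standing in for real data
fines <- data.frame(
  Charge = c("Speeding", "Parking", "Littering", "Speeding"),
  Fine   = c(100, 0, 50, 150),
  Class  = c("A", "B", "B", "A")
)

names(fines)     # column names: "Charge" "Fine" "Class"
head(fines, 2)   # first 2 rows
tail(fines, 2)   # last 2 rows
str(fines)       # type and first few values of each column
glimpse(fines)   # dplyr's compact, transposed view of the same information
```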
Summarize variables with summary(), table(), xtabs()
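A short sketch of the summary functions, again on hypothetical toy data: summary() describes a numeric variable, table() counts the values of a categorical one, and xtabs() builds a cross-tabulation from a formula.

```r
# Hypothetical toy data
fines <- data.frame(
  Fine  = c(100, 0, 50, 150),
  Class = c("A", "B", "B", "A"),
  Paid  = c(TRUE, FALSE, TRUE, TRUE)
)

summary(fines$Fine)                  # min, quartiles, mean (75), max
table(fines$Class)                   # counts: A = 2, B = 2
xtabs(~ Class + Paid, data = fines)  # two-way table of counts
```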
Part of the tidyverse, dplyr is a package for data manipulation. The package implements a grammar for transforming data, based on verbs (functions) that define a set of common tasks.
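The sketch below shows four dplyr verbs on a hypothetical toy data frame. Each one takes a data frame as its first argument and returns a new data frame, which is what makes them easy to chain.

```r
library(dplyr)

# Hypothetical toy data
fines <- data.frame(
  Charge = c("Speeding", "Parking", "Littering"),
  Fine   = c(100, 0, 50)
)

kept    <- filter(fines, Fine > 0)          # keep rows that meet a condition
trimmed <- select(fines, Charge)            # keep chosen columns
sorted  <- arrange(fines, Fine)             # reorder rows
doubled <- mutate(fines, Fine2 = Fine * 2)  # add a new column

# Every result is still a data frame
sapply(list(kept, trimmed, sorted, doubled), is.data.frame)
```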
dplyr functions are for data frames.
The first argument of dplyr functions is always a data frame.
select() helpers include
Logical tests | Boolean operators for multiple conditions |
---|---|
x < y: less than | a & b: and |
x >= y: greater than or equal to | a \| b: or |
x == y: equal to | xor(a, b): exclusive or (exactly one is TRUE) |
x != y: not equal to | !a: not |
x %in% y: is a member of | |
is.na(x): is NA | |
!is.na(x): is not NA | |
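These tests and operators go inside filter(). A sketch on hypothetical toy data follows; note that filter() keeps only rows where the condition is TRUE, so a row whose condition evaluates to NA (here, the row with a missing Class) is dropped.

```r
library(dplyr)

# Hypothetical toy data with one missing value
fines <- data.frame(
  Charge = c("Speeding", "Parking", "Littering", "Speeding"),
  Class  = c("A", "B", "B", NA),
  Fine   = c(100, 0, 50, 150)
)

filter(fines, Fine >= 50 & Class == "B")              # and: 1 row (Littering)
filter(fines, Fine == 0 | Class == "A")               # or: 2 rows
filter(fines, Charge %in% c("Parking", "Littering"))  # membership: 2 rows
filter(fines, !is.na(Class))                          # not missing: 3 rows
```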
desc(), used inside arrange(), sorts a variable in descending order.
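A minimal sketch of the difference, using hypothetical toy data: arrange() sorts in ascending order by default, and wrapping a variable in desc() reverses that.

```r
library(dplyr)

# Hypothetical toy data
fines <- data.frame(Charge = c("Speeding", "Parking", "Littering"),
                    Fine   = c(100, 0, 50))

arrange(fines, Fine)         # ascending (the default): 0, 50, 100
arrange(fines, desc(Fine))   # descending: 100, 50, 0
```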
The pipe (%>%) allows you to chain together functions by passing (piping) the result on the left into the first argument of the function on the right.
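The first-argument rule can be seen with any function, not just dplyr verbs: x %>% f(y) is the same call as f(x, y). The sketch below assumes dplyr is loaded, since dplyr re-exports %>% from the magrittr package.

```r
library(dplyr)   # re-exports %>% from magrittr

# The left-hand side becomes the first argument of the call on the right
round(3.14159, 2)        # 3.14
3.14159 %>% round(2)     # 3.14, the same call

# Chaining: "take these values, and then take square roots, and then sum"
c(1, 4, 9) %>% sqrt() %>% sum()   # 6
```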
Less good alternatives:
# Nested calls have to be read from the inside out
arrange(
  select(
    filter(general, Fine > 0),
    Charge, CodeSection, CaseType, Class, Fine),
  desc(Fine))
# Saving each step clutters the environment with temporary objects
tmp <- filter(general, Fine > 0)
tmp <- select(tmp, Charge, CodeSection, CaseType, Class, Fine)
arrange(tmp, desc(Fine))
With the pipe, we call each function in sequence (read the pipe as “and then…”):
general %>%
  filter(Fine > 0) %>%
  select(Charge, CodeSection, CaseType, Class, Fine) %>%
  arrange(desc(Fine))
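To check that the three versions really do the same thing, the sketch below runs all of them on a hypothetical stand-in for `general`. Only the column names come from the code above; every value, and the extra Hearing column that select() drops, is made up.

```r
library(dplyr)

# Hypothetical stand-in for the course's general data frame
general <- data.frame(
  Charge      = c("Charge A", "Charge B", "Charge C", "Charge D"),
  CodeSection = c("S-1", "S-2", "S-3", "S-4"),
  CaseType    = c("Traffic", "Traffic", "Criminal", "Criminal"),
  Class       = c("1", "2", "1", "3"),
  Fine        = c(100, 0, 250, 50),
  Hearing     = c("H1", "H2", "H3", "H4")   # extra column that select() drops
)

# 1. Nested calls
nested <- arrange(select(filter(general, Fine > 0),
                         Charge, CodeSection, CaseType, Class, Fine),
                  desc(Fine))

# 2. Temporary objects
tmp <- filter(general, Fine > 0)
tmp <- select(tmp, Charge, CodeSection, CaseType, Class, Fine)
stepwise <- arrange(tmp, desc(Fine))

# 3. The pipe
piped <- general %>%
  filter(Fine > 0) %>%
  select(Charge, CodeSection, CaseType, Class, Fine) %>%
  arrange(desc(Fine))

identical(nested, piped) && identical(stepwise, piped)   # TRUE
piped$Fine                                               # 250 100 50
```

The result is identical in all three cases; the pipe just makes the sequence of steps read top to bottom.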
Keyboard shortcut to create %>% in RStudio: Ctrl+Shift+M (Windows/Linux) or Cmd+Shift+M (Mac).
[1] Especially since no one seems to understand paths and directories any more.