Public Interest::Data Ethics & Practice
dplyr
Summarize according to a summary function
Summary functions include
Summary Functions | |
---|---|
first(): first value | sum(): sum of values |
last(): last value | n(): number of values |
nth(.x, n): nth value | n_distinct(): number of distinct values |
min(): minimum value | mean(): mean value |
max(): maximum value | var(): variance |
median(): median value | sd(): standard deviation |
quantile(.x, probs = .25): quantile (here, the 25th percentile) | IQR(): interquartile range |
Things to note:
* multiple summary functions can be called within the same command
* we can give the summary values new names (though we don’t have to)
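A minimal sketch of both notes, using the built-in mtcars data (the dataset and the column names n_cars, mean_mpg and q25_mpg are chosen here for illustration only):

```r
library(dplyr)

# Several summary functions in one summarize() call.
overall <- mtcars %>%
  summarize(
    n_cars   = n(),                         # named: number of rows
    mean_mpg = mean(mpg),                   # named: mean of mpg
    q25_mpg  = quantile(mpg, probs = .25),  # named: 25th percentile of mpg
    n_distinct(cyl)                         # unnamed: the column is named after the expression
  )

overall
```

The unnamed summary still works, but its column ends up called `n_distinct(cyl)`, which is awkward to refer to later. That is the main reason to name summary values.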
Summarize is especially helpful when combined with group_by
Aggregate/group by value(s) of column(s).
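For example, grouping before summarizing gives one row of summaries per group. This sketch again assumes the built-in mtcars data, with cyl as an arbitrary grouping column:

```r
library(dplyr)

# One summary row per distinct value of cyl.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    n_cars   = n(),        # rows in each group
    mean_mpg = mean(mpg),  # group mean
    sd_mpg   = sd(mpg)     # group standard deviation
  )

by_cyl
```

Without the group_by() step the same summarize() call would return a single row for the whole data frame.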
mutate
Create new columns or alter existing columns
if_else, case_when
df <- df %>%
  mutate(newvar = if_else(condition, value_if_true, value_if_false, value_if_na))
df <- df %>%
  mutate(newvar = case_when(
    condition1 ~ value1,
    condition2 ~ value2,
    condition3 ~ value3,
    TRUE ~ value_everything_else
  ))
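A concrete version of the two templates, as a sketch on the built-in mtcars data. The cutoffs and the labels ("yes", "small", and so on) are made up for illustration. In if_else(), the fourth slot (value_if_na above) corresponds to the argument named missing.

```r
library(dplyr)

cars <- mtcars %>%
  mutate(
    # Two-way split; `missing` is the value used where the condition is NA.
    efficient = if_else(mpg > 25, "yes", "no", missing = "unknown"),
    # Conditions are checked in order; the first one that is TRUE wins.
    size = case_when(
      cyl == 4 ~ "small",
      cyl == 6 ~ "medium",
      cyl == 8 ~ "large",
      TRUE     ~ "other"   # catch-all for anything not matched above
    )
  )

count(cars, efficient, size)
```

if_else() suits a single yes/no condition; case_when() is easier to read than nested if_else() calls once there are three or more outcomes.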
across() can also be used within mutate.
Factors are variables which take on a limited number of values, aka categorical variables. In R, factors are stored as a vector of integer values with the corresponding set of character values you’ll see when displayed (colloquially, labels; in R, levels).
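A small sketch tying these two points together: across() inside mutate() converts two columns to factors at once, and the result shows the levels alongside the integer codes stored underneath. The choice of mtcars and of the cyl and gear columns is an assumption for illustration.

```r
library(dplyr)

cars <- mtcars %>%
  mutate(across(c(cyl, gear), factor))  # apply factor() to both columns

cyl_levels <- levels(cars$cyl)          # the labels R displays: "4" "6" "8"
cyl_codes  <- as.integer(cars$cyl)      # the integer codes stored underneath

cyl_levels
head(cyl_codes)
```

The codes index into the levels, so a code of 2 here means the second level, "6".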
forcats
The forcats package, part of the tidyverse, provides helper functions for working with factors.
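A few forcats helpers in action, as a sketch. The three functions shown (fct_relevel, fct_infreq, fct_recode) are an illustrative selection rather than a list taken from these notes, and the size vector is made up.

```r
library(forcats)

size <- factor(c("small", "large", "medium", "large", "large", "small"))
levels(size)  # alphabetical by default: "large" "medium" "small"

# Put the levels in a meaningful order by hand.
size_ordered <- fct_relevel(size, "small", "medium", "large")

# Order the levels by how often they occur, most frequent first.
size_by_freq <- fct_infreq(size)

# Rename levels using new_name = "old_name".
size_renamed <- fct_recode(size, S = "small", M = "medium", L = "large")

levels(size_ordered)
levels(size_by_freq)
levels(size_renamed)
```

Level order matters mostly for plots and model output, where categories appear in level order rather than in the order they occur in the data.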
tidyr
Split data from one column into multiple columns, or combine multiple columns into one
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)
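A round trip through both functions, as a minimal sketch. The people data frame and its column names are hypothetical.

```r
library(tidyr)

people <- data.frame(
  first = c("Ada", "Grace"),
  last  = c("Lovelace", "Hopper"),
  stringsAsFactors = FALSE
)

# Two columns -> one. The input columns are dropped because remove = TRUE by default.
united <- unite(people, col = "full_name", first, last, sep = "_")

# One column -> two, splitting on the same separator.
split_again <- separate(united, col = full_name, into = c("first", "last"), sep = "_")

united
split_again
```

Note that sep in separate() is treated as a regular expression, so a separator such as "." would need escaping. Recent tidyr releases mark separate() as superseded in favor of separate_wider_delim() and related functions, though separate() still works.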
stringr
stringr provides a set of functions to make working with strings easier. Built on stringi, it implements some of the most frequently used string manipulation functions.
All functions in stringr start with str_ and take a vector of strings as the first argument. Some key functions:
str_sub(x, start = 1L, end = -1L)
str_pad(string, width, side = c("left", "right", "both"), pad = " ") or str_trim(string, side = c("both", "left", "right")) or str_wrap(string, width = 80, indent = 0, exdent = 0)
str_to_upper(string) or str_to_lower(string) or str_to_title(string)
str_detect(), str_count(), str_subset(), str_locate(), str_extract(), str_replace()
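Several of these functions applied to one small vector, as a sketch. The example strings and patterns are made up for illustration.

```r
library(stringr)

x <- c("  apple pie ", "banana split", "cherry tart")

trimmed  <- str_trim(x)                        # strip leading/trailing whitespace
first3   <- str_sub(trimmed, 1, 3)             # "app" "ban" "che"
padded   <- str_pad("7", width = 3, pad = "0") # "007" (side defaults to "left")
titled   <- str_to_title(trimmed)              # "Apple Pie" "Banana Split" "Cherry Tart"

has_an   <- str_detect(trimmed, "an")          # FALSE TRUE FALSE
n_a      <- str_count(trimmed, "a")            # 1 3 1
c_words  <- str_subset(trimmed, "^c")          # "cherry tart"
first_wd <- str_extract(trimmed, "[a-z]+")     # "apple" "banana" "cherry"
snake    <- str_replace(trimmed, " ", "_")     # replaces the first match only
```

The pattern arguments are regular expressions by default, which is why "^c" matches only strings that start with c.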
See the stringr documentation for more, and the stringr vignette on regular expressions.
Combine code, results, and prose into dynamic and reproducible documents suitable for sharing! These notes are made with R Markdown!