Public Interest::Data Ethics & Practice
dplyr

summarize(): summarize rows according to a summary function

Summary functions include:
| Function: result | Function: result |
|---|---|
| first(): first value | sum(): sum of values |
| last(): last value | n(): number of values |
| nth(x, n): nth value | n_distinct(): number of distinct values |
| min(): minimum value | mean(): mean value |
| max(): maximum value | var(): variance |
| median(): median value | sd(): standard deviation |
| quantile(x, probs = 0.25): 25th percentile (first quartile) | IQR(): interquartile range |
Things to note:
- multiple summary functions can be called within the same command
- we can give the summary values new names (though we don’t have to)
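A minimal sketch of both points above, using a small made-up data frame (the column names `group` and `value` are hypothetical). Several summary functions run in one summarize() call; named arguments become the new column names, while the unnamed median(value) is left for dplyr to name after the expression itself.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Several summary functions in a single summarize() call
res <- df %>%
  summarize(
    avg_value = mean(value),        # named: column is "avg_value"
    n_rows = n(),                   # number of rows
    n_groups = n_distinct(group),   # number of distinct groups
    third_value = nth(value, 3),    # value in the 3rd row
    median(value)                   # unnamed: column is "median(value)"
  )
res
```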
summarize() is especially helpful when combined with group_by(), which aggregates (groups) rows by the value(s) of one or more columns.
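A short sketch of group_by() followed by summarize(), again on hypothetical data: the summary functions are computed once per group instead of once for the whole table, giving one row per group.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# One output row per value of `group`
by_group <- df %>%
  group_by(group) %>%
  summarize(avg_value = mean(value), n_rows = n())
by_group
```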
mutate(): create new columns or alter existing columns
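if_else(), case_when()

```r
# if_else(): one condition; the optional fourth argument (`missing`)
# is used wherever the condition is NA
df <- df %>%
  mutate(newvar = if_else(condition, value_if_true, value_if_false, value_if_na))

# case_when(): conditions are checked in order and the first TRUE wins;
# TRUE ~ ... acts as the catch-all for everything else
df <- df %>%
  mutate(newvar = case_when(
    condition1 ~ value1,
    condition2 ~ value2,
    condition3 ~ value3,
    TRUE ~ value_everything_else
  ))
```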
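The templates above use placeholder names; here is a runnable sketch with a hypothetical `score` column. It also shows across(), which applies one function to several columns inside mutate(). Note that case_when() treats an NA condition as not matching, so the NA score falls through to the catch-all.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(
  score = c(95, 82, NA, 67),
  bonus = c(1, 2, 3, 4)
)

df <- df %>%
  mutate(
    # NA scores get the `missing` value
    passed = if_else(score >= 70, "yes", "no", missing = "unknown"),
    # first matching condition wins; NA falls through to TRUE ~ ...
    grade = case_when(
      score >= 90 ~ "A",
      score >= 80 ~ "B",
      score >= 70 ~ "C",
      TRUE ~ "below C"
    )
  )

# across(): double both numeric columns in one step
doubled <- df %>%
  mutate(across(c(score, bonus), ~ .x * 2))
```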
across() can also be used within mutate() to apply the same function to several columns at once.

forcats

Factors are variables that take on a limited number of values, aka categorical variables. In R, factors are stored as a vector of integer values with a corresponding set of character values you’ll see when displayed (colloquially, labels; in R, levels).

The forcats package, part of the tidyverse, provides helper functions for working with factors, including:
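A minimal sketch of how a factor stores integer codes alongside its levels, plus three forcats helpers (fct_relevel(), fct_infreq(), fct_recode()). These helpers are examples chosen for illustration, not necessarily the ones this section lists; the `sizes` vector is hypothetical.

```r
library(forcats)

# A factor: integer codes plus a set of levels (labels)
sizes <- factor(c("small", "large", "medium", "small", "small"))
levels(sizes)       # levels are alphabetical by default
as.integer(sizes)   # the underlying integer codes

# fct_relevel(): put levels in a meaningful order
sizes_ordered <- fct_relevel(sizes, "small", "medium", "large")

# fct_infreq(): order levels by frequency, most common first
sizes_by_freq <- fct_infreq(sizes)

# fct_recode(): rename levels (new_name = "old_name")
sizes_short <- fct_recode(sizes, S = "small", M = "medium", L = "large")
```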
tidyr

Split data from one column into multiple columns, or combine multiple columns into one.
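A small sketch of the round trip between unite() and separate(), assuming a hypothetical data frame with `first` and `last` name columns: unite() pastes the two columns together with `sep`, and separate() splits the result back apart on the same separator.

```r
library(tidyr)

# Hypothetical example data
df <- data.frame(first = c("Ada", "Alan"), last = c("Lovelace", "Turing"))

# unite(): combine two columns into one, joined by a space
united <- unite(df, col = "full_name", first, last, sep = " ")

# separate(): split that column back into two
split_again <- separate(united, col = full_name, into = c("first", "last"), sep = " ")
```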
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE): combine several columns into one

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...): split one column into several

stringr

stringr provides a set of functions to make working with strings easier. Built on stringi, it implements some of the most frequently used string manipulation functions.
All functions in stringr start with str_ and take a vector of strings as the first argument. Some key functions:
- Extract a substring: str_sub(string, start = 1L, end = -1L)
- Pad, trim, or wrap: str_pad(string, width, side = c("left", "right", "both"), pad = " "), str_trim(string, side = c("both", "left", "right")), str_wrap(string, width = 80, indent = 0, exdent = 0)
- Change case: str_to_upper(string), str_to_lower(string), str_to_title(string)
- Match patterns: str_detect(), str_count(), str_subset(), str_locate(), str_extract(), str_replace()

See the stringr documentation for more, and the stringr vignette on regular expressions.
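A quick sketch of several of the functions listed above on a hypothetical character vector. Each takes the vector of strings as its first argument and returns one result per element (str_pad() is shown on a single string).

```r
library(stringr)

# Hypothetical example data, with stray leading/trailing spaces
x <- c("  apple pie ", "Banana split", "cherry tart  ")

trimmed  <- str_trim(x)                 # remove whitespace at both ends
titled   <- str_to_title(trimmed)       # capitalize each word
first3   <- str_sub(trimmed, 1, 3)      # characters 1 through 3
has_an   <- str_detect(trimmed, "an")   # does the pattern occur?
n_a      <- str_count(trimmed, "a")     # how many times does "a" occur?
under    <- str_replace(trimmed, " ", "_")  # replace the first match only
padded   <- str_pad("7", width = 3, pad = "0")  # pads on the left by default
```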
Combine code, results, and prose into dynamic and reproducible documents suitable for sharing! These notes are made with R Markdown!
Because nobody can remember all of this!
Artwork by @allison_horst
XKCD, Randall Munroe, https://xkcd.com/2494/