| aggregateData {iNZightTools} | R Documentation |
Aggregates a dataframe into summaries of all numeric variables, grouped by the specified categorical variables, and returns the result along with the tidyverse code used to generate it.
aggregateData( .data, vars, summaries, summary_vars, varnames = NULL, quantiles = c(0.25, 0.75), custom_funs = NULL )
.data |
a dataframe or survey design object to aggregate |
vars |
a character vector of categorical variables in .data to group by |
summaries |
summaries to generate for the groups defined by vars |
summary_vars |
names of variables in the dataset to calculate summaries of |
varnames |
name templates for created variables (see details). |
quantiles |
if requesting quantiles, specify the desired quantiles here |
custom_funs |
a list of custom functions (see details). |
An aggregated dataframe containing the summaries, with the tidyverse code attached.
The aggregateData function accepts any R function that returns a single value (such as mean, var, sd, sum, IQR). The default name of each new variable is {var}_{fun}, where {var} is the variable name and {fun} is the summary function used. You may pass new names via the varnames argument, which should be either a vector the same length as summary_vars or a named list (where the names are the summary functions). In either case, use {var} to represent the variable name, e.g., {var}_mean or min_{var}.
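To make the naming rule concrete, here is a minimal base R sketch of how a template such as {var}_{fun} could expand. The helper expand_name is hypothetical and is not part of iNZightTools; the package's own substitution may differ in detail.

```r
# Hypothetical helper (not an iNZightTools function): fill a name template
# by replacing the literal placeholders {var} and {fun}.
expand_name <- function(template, var, fun) {
  out <- gsub("{var}", var, template, fixed = TRUE)
  gsub("{fun}", fun, out, fixed = TRUE)
}

default_name <- expand_name("{var}_{fun}", "Sepal.Length", "mean")
custom_name <- expand_name("min_{var}", "Sepal.Length", "min")
print(default_name) # "Sepal.Length_mean"
print(custom_name)  # "min_Sepal.Length"
```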
You can also include the summary missing, which will count the number of missing values in the variable. It has default name {var}_missing.
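As a rough illustration of what a per-group missing count means, the following base R sketch counts NA values within each group of a made-up data frame. The data and the names toy and height_missing are assumptions for illustration only; this is not the code the package generates.

```r
# Toy data (made up for illustration): two groups with some missing values.
toy <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  height = c(1.2, NA, 3.4, NA, NA)
)

# Count the missing values of `height` within each level of `group`.
height_missing <- tapply(toy$height, toy$group, function(x) sum(is.na(x)))
print(height_missing)
```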
For the quantile summary, there is the additional argument quantiles. A new variable will be created for each specified quantile 'p'. To name these variables, use {p} in varnames (the default is {var}_q{p}).
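The sketch below shows, in base R only, the kind of result a quantile summary produces: one new column per requested probability, per group. How {p} is rendered in the final names (for example 25 versus 0.25) is not stated here, so the percentage form used below is an assumption.

```r
probs <- c(0.25, 0.75) # same values as the default `quantiles` argument

# One row per species, one column per requested quantile.
q <- t(sapply(
  split(iris$Sepal.Length, iris$Species),
  quantile, probs = probs, names = FALSE
))

# Assumed naming: {var}_q{p}, with p written as a percentage.
colnames(q) <- paste0("Sepal.Length_q", probs * 100)
print(q)
```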
Custom functions can be passed via the custom_funs argument. This should be a list, and each element should have a name and either an expr or fun element. Expressions should operate on a variable x. The function should be a function of x and return a single value.
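# expression form: width (range) of the variable, ignoring missing values
cust_funs <- list(name = '{var}_width', expr = diff(range(x, na.rm = TRUE)))
# function form: standard error of the mean
cust_funs <- list(name = '{var}_stderr',
    fun = function(x) {
        s <- sd(x)
        n <- length(x)
        s / sqrt(n)
    }
)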
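To see what these two custom summaries compute, the following base R sketch applies the same calculations per group with tapply, without going through aggregateData. The names stderr_fun, se_by_species and width_by_species are chosen here for illustration.

```r
# Same standard error calculation as the `fun` example, written standalone.
stderr_fun <- function(x) {
  s <- sd(x)
  n <- length(x)
  s / sqrt(n)
}

# Standard error of Sepal.Length within each level of Species.
se_by_species <- tapply(iris$Sepal.Length, iris$Species, stderr_fun)

# Width (max minus min) per group; na.rm belongs inside range().
width_by_species <- tapply(
  iris$Sepal.Length, iris$Species,
  function(x) diff(range(x, na.rm = TRUE))
)

print(se_by_species)
print(width_by_species)
```

Both functions return a single value per group, which is the requirement stated above for any summary passed to aggregateData.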
Tom Elliott, Owen Jin
aggregated <-
    aggregateData(iris,
        vars = c("Species"),
        summaries = c("mean", "sd", "iqr")
    )
cat(code(aggregated))
head(aggregated)
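For comparison, a rough base R equivalent of the mean summary in this example is sketched below. It is only a cross-check of the grouping idea, not the tidyverse code that aggregateData attaches to its result; the renaming step imitates the default {var}_{fun} pattern.

```r
# Group means of every numeric column in iris, by Species (base R only).
means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Imitate the default {var}_{fun} naming for the summary columns.
names(means)[-1] <- paste0(names(means)[-1], "_mean")
print(means)
```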