Skip to contents

Calculates the estimate and confidence interval of a module-level statistic based on all values obtained by jackknifing the module.

Usage

summarizeJackknifeStats(
  stats_df,
  stats = setdiff(colnames(stats_df), c("regulator", "type", "id", "gene_removed",
    "module_size", "replicate", "species", "replicate1", "replicate2", "species1",
    "species2")),
  summary_method = ifelse(endsWith(stats, "monophyl"), "mean", "median"),
  conf_level = 0.95
)

Arguments

stats_df

Data frame of the statistics per jackknife module version:

regulator

Character, transcriptional regulator.

module_size

Integer, the numer of target genes assigned to a regulato.r

type

Character, module type (orig = original or jk = jackknifed).

id

Character, the unique ID of the module version (format: nameOfRegulator_jk_nameOfGeneRemoved in case of module type 'jk' and nameOfRegulator_orig in case of module type 'orig').

gene_removed

Character, the name of the gene removed by jackknifing (NA in case of module type 'orig').

replicate

Character, the name of the replicate (optional, only needed if the statistics are calculated per replicate).

species

Character, the name of the species (optional, only needed if the statistics are calculated per replicate or per species).

replicate1, replicate2

Character, the names of the replicates compared (optional, only needed if the statistics are calculated per replicate pair).

species1, species2

Character, the names of the species compared (optional, only needed if the statistics are calculated per replicate pair or species pair).

{{nameOfStat}}

Numeric, integer or logical, one or more columns containing the values of the statistic(s) specified in 'stats'.

stats

Character, the name of the statistic(s) that need to be summarized.

summary_method

Character or character vector, the summary method ("mean" or "median") to be used for the statistics in 'stats'. If only one value is provided, the same summary method will be used for all statistics, if a vector is provided, each element of the vector will be used for the corresponding element in 'stats'. By default, "mean" for statistics measuring monophyleticity and "median" for all other statistics.

conf_level

Numeric, confidence level of the interval (default: 0.95).

Value

A data frame of the statistics per module:

regulator

Character, transcriptional regulator.

module_size

Module size, the numer of target genes assigned to a regulator.

replicate

The name of the replicate (only if the column is present in the input 'stats_df').

species

The name of the species (only if the column is present in the input 'stats_df').

replicate1, replicate2

The names of the replicates compared (only if the column is present in the input 'stats_df').

species1, species2

The names of the species compared (only if the column is present in the input 'stats_df').

{{nameOfStat}}

Numeric, one or more columns containing the estimates (mean or median) of the statistic(s) specified in 'stats'.

var_{{nameOfStat}}

Numeric, one or more columns containing the variances of the statistic(s) specified in 'stats'.

lwr_{{nameOfStat}}

Numeric, one or more columns containing the lower bounds of the confidence interval for the statistic(s) specified in 'stats'.

upr_{{nameOfStat}}

Numeric, one or more columns containing the upper bounds of the confidence interval for the statistic(s) specified in 'stats'.

Details

To gain information about the confidence of various statistics, jackknifing can be used: each member gene of a module is removed and the statistics are re-calculated. This way, the median or mean and its confidence interval across the jackknifed versions can be used to estimate a statistic of interest for the module as a whole. For continuous statistics the median, for boolean statistics the mean is recommended as the summary method.

Examples

pres_stats <- summarizeJackknifeStats(pres_stats_jk)
tree_stats <- summarizeJackknifeStats(tree_stats_jk,
                                      c("total_tree_length", "within_species_diversity"))