
Fit (weighted) linear model between tree-based statistics
Source: R/fitTreeStatsLm.R
Fits a (weighted) linear model between the total tree length and within-species diversity (in case the focus of interest is conservation and overall divergence) or between the subtree length and the diversity of a species (in case the focus of interest is the species-specific divergence).
Arguments
- tree_stats
Data frame of the tree-based statistics for the pruned modules.
Columns required in case the focus of interest is conservation and overall divergence:
- regulator
Character, transcriptional regulator.
- module_size
Integer, the number of target genes assigned to a regulator.
- total_tree_length
Numeric, total tree length per module (typically the median across all jackknife versions of the module).
- var_total_tree_length
Numeric, the variance of total tree lengths across all jackknifed versions of the module (optional, only needed for weighted regression).
- within_species_diversity
Numeric, within-species diversity per module (typically the median across all jackknife versions of the module).
Columns required in case the focus of interest is species-specific divergence:
- regulator
Character, transcriptional regulator.
- module_size
Integer, the number of target genes assigned to a regulator.
- {{species}}_subtree_length
Numeric, the sum of the branch lengths in the subtree that is defined by the replicates of the species and includes the internal branch connecting these replicates to the rest of the tree (typically the median across all jackknife versions of the module).
- var_{{species}}_subtree_length
Numeric, the variance of the subtree length of a species across all jackknifed versions of the module (optional, only needed for weighted regression).
- {{species}}_diversity
Numeric, within-species diversity of the given species per module (typically the median across all jackknife versions of the module).
- focus
Character, the focus of interest in terms of cross-species conservation, either "overall" if the focus of interest is conservation and overall divergence, or the name of a species if the focus of interest is the divergence between that particular species and all others.
- weighted_lm
Logical indicating whether the linear regression should be weighted or not (default: TRUE). If TRUE, tree_stats is expected to contain the column var_total_tree_length (or var_{{species}}_subtree_length if focus is set to a species), and the weights will be inversely proportional to these variances. If no jackknifing was performed and thus these variances were not calculated, please set this parameter to FALSE.
Value
An object of class lm.
Details
The linear models output by this function can be used to identify conserved and diverged modules, and to identify target genes within these modules that contribute the most to the conservation/divergence. For details, please see findConservedDivergedModules and findConservedDivergedTargets.
The focus of interest can be specified using the parameter focus. If focus is set to "overall" (default), the linear model will be fit between the total tree length and within-species diversity, and subsequent analysis using findConservedDivergedModules and findConservedDivergedTargets can identify modules and target genes that are conserved or diverged across all species. If focus is set to the name of a species in the dataset, the linear model will be fit between the subtree length and the diversity of that species, and subsequent analysis using findConservedDivergedModules and findConservedDivergedTargets can identify modules and target genes that are diverged between the species and all others. Please note that if the aim is to find conserved modules, focus should always be set to "overall".
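As a rough illustration of how focus relates to the columns listed under Arguments, the mapping can be sketched as follows. The helper columnsForFocus is hypothetical and not part of the package, and "human" is only an example species name; the sketch simply mirrors the column naming described above.

```r
# Hypothetical helper (NOT part of the package): maps `focus` to the
# response, predictor and variance columns described under Arguments.
columnsForFocus <- function(focus = "overall") {
  if (focus == "overall") {
    list(response  = "total_tree_length",
         predictor = "within_species_diversity",
         variance  = "var_total_tree_length")
  } else {
    # `focus` is assumed to be a species name used as a column prefix
    list(response  = paste0(focus, "_subtree_length"),
         predictor = paste0(focus, "_diversity"),
         variance  = paste0("var_", focus, "_subtree_length"))
  }
}

overall_cols <- columnsForFocus("overall")
human_cols   <- columnsForFocus("human")  # "human" is an example species name
```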
The function fits the linear model corresponding to the focus of interest by calling lm. If a weighted model is desired (default), the weights are defined to be inversely proportional to the variance of the dependent variable (total tree length or the subtree length of a species). If no jackknifing was performed and thus the variance is unknown, please set weighted_lm to FALSE.
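The weighting described above can be approximated with a direct call to lm. This is a minimal sketch under stated assumptions, not the function's actual implementation: the data are simulated, and the weights are taken as exactly 1 / variance, which is one choice that satisfies "inversely proportional".

```r
set.seed(1)

# Simulated tree statistics in the shape expected for focus = "overall"
# (all values are made up for illustration)
tree_stats <- data.frame(
  regulator                = paste0("TF", 1:20),
  module_size              = 30L,
  within_species_diversity = runif(20, 0.5, 2),
  var_total_tree_length    = runif(20, 0.01, 0.1)
)
tree_stats$total_tree_length <- 1 + 2 * tree_stats$within_species_diversity +
  rnorm(20, sd = sqrt(tree_stats$var_total_tree_length))

# Weighted fit: modules with a noisier total tree length get less weight
fit_w <- lm(total_tree_length ~ within_species_diversity,
            data = tree_stats,
            weights = 1 / var_total_tree_length)

# Unweighted fit, analogous to weighted_lm = FALSE
fit_u <- lm(total_tree_length ~ within_species_diversity,
            data = tree_stats)
```

Rescaling all weights by a constant leaves the fitted coefficients unchanged, so only the relative sizes of the variances matter for the regression line.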