Calculates the module eigengene for each input module.
Usage
calculateEigengenes(
pruned_modules,
sce,
direction_of_regulation = "+_only",
module_names = as.character(unique(pruned_modules$regulator)),
per_species = FALSE,
pseudotime_column = "pseudotime",
cell_type_column = "cell_type",
n_cores = 1L
)Arguments
- pruned_modules
Data frame of the pruned modules, required columns:
- regulator
Character, transcriptional regulator.
- target
Character, target gene of the transcriptional regulator (member of the regulator's pruned module).
- direction
Character specifying the direction of interaction between the regulator and the target, either "+" or "-" (only required if
direction_of_regulationis set to "+_only" or "+-_separately").
- sce
SingleCellExperimentobject containing the expression data (logcounts and metadata) for all network genes. Required metadata columns:- species
Character, the name of the species.
- {{pseudotime_column}}
Numeric, inferred pseudotime (optional).
- {{cell_type_column}}
Character, cell type annotation (optional).
- direction_of_regulation
Character specifying how positively and negatively regulated targets of the same transcriptional regulator should be treated, one of "+_only", "+-_separately", "all_together". If "+_only", the eigengene is calculated only for the positively regulated targets, the negatively regulated targets are removed. If "+-_separately", the eigengene is calculated separately for the positively and negatively regulated targets. If "all_together", the eigengene is calculated for all targets, irrespective of the direction of regulation.
- module_names
Character vector, the names of the modules for which the eigengenes should be calculated (default: all unique names in the column
regulatorofpruned_modules).- per_species
Logical, if FALSE (default), the eigengenes are calculated across all cells, if TRUE, the eigengenes are calculated per species.
- pseudotime_column
Character, the name of the pseudotime column in the metadata of
sce(default: "pseudotime", if there is no pseudotime column, it should be set to NULL).- cell_type_column
Character, the name of the cell type annotation column in the metadata of
sce(default: "cell_type", if there is no cell type column, it should be set to NULL).- n_cores
Integer, the number of cores (default: 1).
Value
A data frame of eigengenes with the following columns:
- cell
Character, the cell barcode.
- species
Character, the name of the species.
- {{pseudotime_column}}
Numeric, inferred pseudotime (only present if
pseudotime_columnis not NULL).- {{cell_type_column}}
Character, cell type annotation (only present if
cell_type_columnis not NULL).- module
Character, transcriptional regulator and in case the eigengene was calculated for the positively or negatively regulated targets only, the direction of interaction (format: nameOfRegulator(+) or nameOfRegulator(-)).
- eigengene
Numeric, the eigengene (i.e. the first principal component of the scaled and centered logcounts) of the module. In case
per_speciesis TRUE, it is calculated for each species separately.- mean_expr
Numeric, the mean of the scaled and centered logcounts across all genes in the module.
- regulator_expr
Numeric, the scaled and centered logcounts of the regulator.
Details
A concept adapted from WGCNA, the eigengene summarizes the expression profile of an entire module, and it is calculated as the first principal component of the module expression data. Effectively, it is a weighted mean of the individual genes' expression profiles.
As the first step, the logcounts are subsetted to keep only module member genes and the resulting count matrix is scaled and centered per gene. Next, singular value decomposition is performed on the scaled and centered count matrix using svd, with nu = 1 and nv = 1 (only 1 left and 1 right singular vector computed). The right singular vector is taken as the eigengene. Finally, this eigengene is aligned along the average expression of the module: if the correlation of the two vectors is negative, the eigengene is negated, if the correlation is positive, the eigengene is kept as it is.
If a module contains both activated and repressed targets of the transcriptional regulator, calculating the eigengene (or any other summary expression profiles) across both directions of regulation does not make biological sense and leads to the dilution of signal. It makes more sense to calculate the eigengene either for the activated targets only (direction_of_regulation = "+_only") or for the activated and repressed targets separately (direction_of_regulation = "+-_separately"). In these cases, the module column of the output will specify not only the name of the transcriptional regulator but also the direction of regulation (format: nameOfRegulator(+) or nameOfRegulator(-)). If an eigengene across all targets is desired irrespective of the direction of regulation, direction_of_regulation should be set to "all_together".
If the aim is to compare the eigengenes across species, it is recommended to calculate the eigengenes per species by setting per_species to TRUE. In this case, the scaling, centering and SVD is performed for each species separately.
If the user plans to plot the eigengene along pseudotime and/or cell types, the corresponding columns of the sce object can be specified by the arguments of pseudotime_column and cell_type_column, and then the pseudotime and cell type information of each cell will be added to the output.
