Wrapper and helper functions to use bulk RNA-seq differential expression methods with single-cell data
Developed by Christoph Hafemeister in the Developmental Cancer Genomics group at St. Anna Children's Cancer Research Institute (CCRI)
Motivated by our observation that single-cell RNA-seq differential expression tests within a sample should use pseudo-bulk data of pseudo-replicates.
Install from GitHub
remotes::install_github('aitoe96/DElegate')
Given a Seurat object s, run differential expression tests between each cluster and the rest of the cells.
de_results <- DElegate::findDE(object = s)
Or find all cluster markers and show the top 5 for each cluster:
marker_results <- DElegate::FindAllMarkers2(object = s)
dplyr::filter(marker_results, feature_rank < 6)
An overview of the functionality, including examples, can be found here
DElegate is an R package that allows bulk RNA-seq differential expression methods to be used with single-cell data. It is a light wrapper around DESeq2, edgeR, and limma, similar to the Libra package. In contrast to Libra, DElegate focuses on a few DE methods and will assign cells to pseudo-replicates if no true replicates are available.
All DElegate functionality is contained in one function - findDE(). It has one mandatory input argument: object, which may be of class
- Seurat: the count matrix will be extracted from the 'RNA' assay
- SingleCellExperiment: the count matrix will be extracted via counts()
- dgCMatrix: sparse matrix of the Matrix package
- matrix
To indicate the cell group memberships, you have several options, depending on input type:
- Seurat: in the object via Idents(object), or use the group_column argument
- SingleCellExperiment: in the object via colLabels(object), or use the group_column argument
- dgCMatrix or matrix: use the meta_data and group_column arguments
DElegate uses bulk RNA-seq DE methods and relies on replicates. If no true replicates are available, it assigns cells to pseudo-replicates. However, if replicates are available in the input, the replicate_column argument can be used to indicate where to find the labels.
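To make the idea of pseudo-replicates concrete, here is a minimal base-R sketch of random pseudo-replicate assignment followed by pseudo-bulk aggregation. It is an illustration only and not DElegate's internal code: the toy data, the variable names, and the choice of three pseudo-replicates per group (n_rep) are all assumptions.

```r
# Illustrative only: random pseudo-replicate assignment and pseudo-bulk sums.
# Not DElegate's implementation; n_rep = 3 is an arbitrary choice.
set.seed(1)

# Toy count matrix: 5 genes x 12 cells, two groups of 6 cells each
counts <- matrix(rpois(5 * 12, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12)))
group <- rep(c("A", "B"), each = 6)

# Within each group, randomly assign cells to one of n_rep pseudo-replicates
n_rep <- 3
pseudo_rep <- ave(seq_along(group), group, FUN = function(i) {
  sample(rep_len(seq_len(n_rep), length(i)))
})

# Sum counts over all cells that share a group and pseudo-replicate label
sample_id <- paste(group, pseudo_rep, sep = "_rep")
pseudo_bulk <- t(rowsum(t(counts), group = sample_id))

# One column per group/pseudo-replicate combination (2 groups x 3 replicates)
dim(pseudo_bulk)
```

The resulting genes-by-samples matrix has the shape that bulk methods such as edgeR, DESeq2, or limma expect as input.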
To tell findDE() which cell groups to compare, use the compare argument. We provide several ways to set up the comparisons that will be tested:
- 'each_vs_rest', the default, does multiple comparisons, one per group vs all remaining cells
- 'all_vs_all', also does multiple comparisons, covering all group pairs
- a length one character vector, e.g. 'MONOCYTES', does one comparison between that group and the remaining cells
- a length two character vector, e.g. c('T CELLS', 'B CELLS'), does one comparison between those two groups
- a list of length two, e.g. list(c('T CELLS', 'B CELLS'), c('MONOCYTES')), does one comparison after combining groups
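The options above can be sketched as a set of values passed to compare one at a time. This is a hedged usage sketch: it assumes a Seurat object named s whose cluster identities include the example labels used in the text ('MONOCYTES', 'T CELLS', 'B CELLS'). The list names and the de_by_comparison variable are made up for illustration.

```r
# Each element is one valid form of the compare argument described above.
# Group names are the example labels from the text, not real data.
comparisons <- list(
  each_vs_rest = 'each_vs_rest',
  all_vs_all   = 'all_vs_all',
  one_vs_rest  = 'MONOCYTES',
  pairwise     = c('T CELLS', 'B CELLS'),
  combined     = list(c('T CELLS', 'B CELLS'), c('MONOCYTES'))
)

# Run only if DElegate is installed and a Seurat object `s` exists
if (requireNamespace('DElegate', quietly = TRUE) && exists('s')) {
  de_by_comparison <- lapply(comparisons, function(cmp) {
    DElegate::findDE(object = s, compare = cmp)
  })
}
```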
Finally, there are currently three DE methods supported
- 'edger' uses edgeR::glmQLFit
- 'deseq' uses DESeq2::DESeq(test = 'Wald')
- 'limma' uses limma::eBayes(trend = TRUE, robust = TRUE)
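As a rough sketch, one could run the same comparison with each of the three methods and keep the results side by side. The text above does not name the argument that selects the method; the example assumes it is called method, so check ?DElegate::findDE before relying on it. It also assumes a Seurat object named s with 'T CELLS' and 'B CELLS' identities.

```r
# The three method labels listed above
de_methods <- c('edger', 'deseq', 'limma')

# Run only if DElegate is installed and a Seurat object `s` exists.
# ASSUMPTION: the selecting argument is named `method` (not stated in the text).
if (requireNamespace('DElegate', quietly = TRUE) && exists('s')) {
  de_by_method <- lapply(
    stats::setNames(de_methods, de_methods),
    function(m) {
      DElegate::findDE(object = s,
                       compare = c('T CELLS', 'B CELLS'),
                       method = m)
    }
  )
}
```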
For complete details, consult the package documentation: ?DElegate::findDE.
DElegate supports parallelization via the future package.
For example, to use the multicore strategy with 12 workers you may call future::plan(strategy = 'future::multicore', workers = 12) before DE testing.
See more details at the future website
Note that every comparison is run single-threaded, but multiple comparisons will be done in parallel.
Troubleshooting: You may get an error regarding future.globals.maxSize, the maximum allowed total size of global variables. The default value is 500 MiB, which may be too small. You can increase it, for example to 8 GB, using options(future.globals.maxSize = 8 * 10^9).
For reporting progress updates, DElegate relies on the progressr package. By default no progress updates are rendered, but they can be turned on in an R session with progressr::handlers(global = TRUE), and the default presentation can be changed (e.g. progressr::handlers(progressr::handler_progress)).
See more details at the progressr website