Support fine-tuning when integrating annotations from multiple references.

Fine-tuning was actually turned on by default, so it was already (silently) enabled by
the previous update to the latest singlepp. Now we have documentation and
options to turn it on/off or to modify the fine-tuning threshold.
LTLA committed Sep 8, 2024
1 parent 257b5ac commit eb3c953
Showing 8 changed files with 44 additions and 22 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: SingleR
Title: Reference-Based Single-Cell RNA-Seq Annotation
-Version: 2.7.2
-Date: 2024-09-06
+Version: 2.7.3
+Date: 2024-09-07
Authors@R: c(person("Dvir", "Aran", email="[email protected]", role=c("aut", "cph")),
person("Aaron", "Lun", email="[email protected]", role=c("ctb", "cre")),
person("Daniel", "Bunis", role="ctb"),
4 changes: 2 additions & 2 deletions R/RcppExports.R
@@ -1,8 +1,8 @@
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

-classify_integrated <- function(test, results, integrated_build, quantile, nthreads) {
-.Call('_SingleR_classify_integrated', PACKAGE = 'SingleR', test, results, integrated_build, quantile, nthreads)
+classify_integrated <- function(test, results, integrated_build, quantile, use_fine_tune, fine_tune_threshold, nthreads) {
+.Call('_SingleR_classify_integrated', PACKAGE = 'SingleR', test, results, integrated_build, quantile, use_fine_tune, fine_tune_threshold, nthreads)
}

#' @importFrom Rcpp sourceCpp
5 changes: 4 additions & 1 deletion R/classifySingleR.R
@@ -133,7 +133,10 @@ classifySingleR <- function(
test=test,
trained=trained,
check.missing=FALSE,
-quantile=quantile
+quantile=quantile,
+fine.tune=fine.tune,
+tune.thresh=tune.thresh,
+num.threads=num.threads
)
}
}
18 changes: 12 additions & 6 deletions R/combineRecomputedResults.R
@@ -12,6 +12,8 @@
#' or (ii) running \code{trainSingleR} on each reference separately and manually making a list of the trained outputs.
#' @param warn.lost Logical scalar indicating whether to emit a warning if markers from one reference in \code{trained} are absent in other references.
#' @param quantile Numeric scalar specifying the quantile of the correlation distribution to use for computing the score, see \code{\link{classifySingleR}}.
+#' @param fine.tune A logical scalar indicating whether fine-tuning should be performed.
+#' @param tune.thresh A numeric scalar specifying the maximum difference from the maximum correlation to use in fine-tuning.
#' @param allow.lost Deprecated.
#'
#' @return A \linkS4class{DataFrame} is returned containing the annotation statistics for each cell or cluster (row).
@@ -32,17 +34,17 @@
#' @details
#' Here, the strategy is to perform classification separately within each reference,
#' then collate the results to choose the label with the highest score across references.
-#' For a given cell in \code{test}, we extract its assigned label from \code{results} for each reference.
-#' We also retrieve the marker genes associated with that label and take the union of markers across all references.
+#' For a given cell in \code{test}, we extract its assigned label from each reference in \code{results}, along with the marker genes associated with that label.
+#' We take the union of the markers for the assigned labels across all references.
#' This defines a common feature space in which the score for each reference's assigned label is recomputed using \code{ref};
#' the label from the reference with the top recomputed score is then reported as the combined annotation for that cell.
#'
-#' A key aspect of this approach is that each entry of \code{results} is generated with reference-specific markers.
-#' This avoids the inclusion of noise from irrelevant genes in the within-reference assignments.
+#' A key aspect of this approach is that each entry of \code{results} is generated separately for each reference.
+#' This avoids problems with uninteresting technical or biological differences between references that could otherwise introduce noise by forcing irrelevant genes into the marker list.
#' Similarly, the common feature space for each cell is defined from the most relevant markers across all references,
#' analogous to one iteration of fine-tuning using only the best labels from each reference.
-#' Compare this to the alternative approach of creating a common feature space, where we force all per-reference classifications to use the same set of markers;
-#' this would slow down each individual classification as many more genes are involved.
+#' Indeed, if fine-tuning is enabled, the common feature space is iteratively refined from the labels with the highest scores, using the same process described in \code{\link{classifySingleR}}.
+#' This allows us to distinguish between closely-related labels from different references.
#'
#' @section Dealing with mismatching gene availabilities:
#' It is recommended that the universe of genes be the same across all references in \code{trained}.
@@ -107,6 +109,8 @@ combineRecomputedResults <- function(
test,
trained,
quantile=0.8,
+fine.tune=TRUE,
+tune.thresh=0.05,
assay.type.test="logcounts",
check.missing=FALSE,
warn.lost=TRUE,
@@ -167,6 +171,8 @@ combineRecomputedResults <- function(
results=collated,
integrated_build=ibuilt,
quantile=quantile,
+use_fine_tune = fine.tune,
+fine_tune_threshold = tune.thresh,
nthreads=num.threads
)

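The recompute-and-combine strategy in the @details above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not SingleR's or singlepp's implementation: each reference's assigned label is modelled by a hypothetical `Label` struct holding its reference profiles and marker gene indices, the score is taken as the `quantile` of Spearman correlations computed over the union of markers (interpolated like R's default quantile type, which singlepp may not use), and every function name here is made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical: one label assigned by one reference, with its reference
// profiles (one vector per reference sample, indexed by gene) and markers.
struct Label {
    std::vector<std::vector<double>> profiles;
    std::vector<std::size_t> markers;
};

// Ranks with ties averaged, so that Pearson on ranks gives Spearman.
std::vector<double> ranks(const std::vector<double>& x) {
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return x[a] < x[b]; });
    std::vector<double> r(x.size());
    for (std::size_t i = 0; i < order.size();) {
        std::size_t j = i;
        while (j + 1 < order.size() && x[order[j + 1]] == x[order[i]]) ++j;
        double avg = (i + j) / 2.0 + 1.0;  // 1-based average rank of the tie group
        for (std::size_t k = i; k <= j; ++k) r[order[k]] = avg;
        i = j + 1;
    }
    return r;
}

// Spearman correlation; assumes equal lengths and non-constant inputs.
double spearman(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    std::vector<double> ra = ranks(a), rb = ranks(b);
    double n = static_cast<double>(ra.size());
    double ma = std::accumulate(ra.begin(), ra.end(), 0.0) / n;
    double mb = std::accumulate(rb.begin(), rb.end(), 0.0) / n;
    double num = 0, va = 0, vb = 0;
    for (std::size_t i = 0; i < ra.size(); ++i) {
        num += (ra[i] - ma) * (rb[i] - mb);
        va += (ra[i] - ma) * (ra[i] - ma);
        vb += (rb[i] - mb) * (rb[i] - mb);
    }
    return num / std::sqrt(va * vb);
}

// Linearly interpolated quantile of a non-empty vector.
double quantile_of(std::vector<double> x, double q) {
    assert(!x.empty());
    std::sort(x.begin(), x.end());
    double h = (x.size() - 1) * q;
    std::size_t lo = static_cast<std::size_t>(std::floor(h));
    std::size_t hi = std::min(lo + 1, x.size() - 1);
    return x[lo] + (h - lo) * (x[hi] - x[lo]);
}

// Score of one label for one cell, using only the genes in `genes`.
double score_label(const std::vector<double>& cell, const Label& label,
                   const std::vector<std::size_t>& genes, double q) {
    auto subset = [&](const std::vector<double>& v) {
        std::vector<double> out;
        for (std::size_t g : genes) out.push_back(v[g]);
        return out;
    };
    std::vector<double> cors;
    for (const auto& prof : label.profiles) cors.push_back(spearman(subset(cell), subset(prof)));
    return quantile_of(cors, q);
}

// `assigned[r]` is the label that reference r assigned to `cell`. Scores are
// recomputed in the union of the assigned labels' markers, written to
// `scores`, and the index of the top-scoring reference is returned.
std::size_t recompute_best(const std::vector<double>& cell,
                           const std::vector<Label>& assigned, double q,
                           std::vector<double>& scores) {
    std::set<std::size_t> uni;
    for (const auto& lab : assigned) uni.insert(lab.markers.begin(), lab.markers.end());
    std::vector<std::size_t> genes(uni.begin(), uni.end());
    scores.clear();
    for (const auto& lab : assigned) scores.push_back(score_label(cell, lab, genes, q));
    return static_cast<std::size_t>(std::max_element(scores.begin(), scores.end()) - scores.begin());
}
```

For example, with a test cell `{1, 2, 3, 4}`, a first reference whose label has the single profile `{4, 3, 2, 1}`, and a second reference whose label has profiles `{1, 2, 3, 4}` and `{1, 2, 4, 3}`, `recompute_best()` picks the second reference with a score of 0.96 at `quantile = 0.8`.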
3 changes: 3 additions & 0 deletions inst/NEWS.Rd
@@ -18,6 +18,9 @@ The \code{approximate=} argument is deprecated.
\item Soft-deprecated \code{check.missing=} in \code{classifySingleR()} and \code{combineRecomputedResults()}.
This is because any filtering will cause a mismatch between the row names of \code{tests} and the \code{test.genes} in \code{trained}.
Rather, filtering should be done prior to \code{trainSingleR()}, as is done in the main \code{SingleR()} function.
+\item \code{combineRecomputedResults()} now supports fine-tuning to resolve closely-related labels from different references.
+This is similar to the fine-tuning in \code{classifySingleR()} where the feature space is iteratively redefined as the union of markers of labels with near-highest scores.
}}
\section{Version 2.0.0}{\itemize{
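The NEWS entry, together with the `tune.thresh` documentation ("maximum difference from the maximum correlation"), describes an iterative narrowing loop. A hedged C++ sketch of such a loop follows. The `Scorer` callback is a hypothetical stand-in for the recomputed-score step, the stopping rule (one candidate left, or no candidate dropped) is assumed to mirror the fine-tuning in `classifySingleR()`, and none of these names are singlepp's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Hypothetical scorer: recomputed score of candidate `c` using only `genes`.
using Scorer = std::function<double(std::size_t c, const std::vector<std::size_t>& genes)>;

// Union of the marker genes of the current candidates.
std::vector<std::size_t> union_markers(const std::vector<std::size_t>& cands,
                                       const std::vector<std::vector<std::size_t>>& markers) {
    std::set<std::size_t> uni;
    for (std::size_t c : cands) uni.insert(markers[c].begin(), markers[c].end());
    return std::vector<std::size_t>(uni.begin(), uni.end());
}

// Each round redefines the feature space from the current candidates' markers,
// rescores them, and keeps only those within `tune_thresh` of the top score.
// Stops when one candidate remains or none was dropped; returns the winner.
std::size_t fine_tune(std::vector<std::size_t> cands,
                      const std::vector<std::vector<std::size_t>>& markers,
                      const Scorer& score, double tune_thresh) {
    assert(!cands.empty());
    while (true) {
        std::vector<std::size_t> genes = union_markers(cands, markers);
        std::vector<double> s;
        for (std::size_t c : cands) s.push_back(score(c, genes));
        auto top = std::max_element(s.begin(), s.end());
        std::vector<std::size_t> kept;
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (s[i] >= *top - tune_thresh) kept.push_back(cands[i]);
        }
        // The top scorer is always kept, so `kept` is never empty and the
        // candidate set strictly shrinks on every round that does not return.
        if (kept.size() == 1 || kept.size() == cands.size()) {
            return cands[static_cast<std::size_t>(top - s.begin())];
        }
        cands = kept;
    }
}
```

Passing the scorer as a callback keeps the narrowing logic separate from the correlation step. Setting `tune_thresh` to 0 keeps only the top candidate after the first round, which behaves like a single recompute over the union of all assigned labels' markers.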
18 changes: 12 additions & 6 deletions man/combineRecomputedResults.Rd

10 changes: 6 additions & 4 deletions src/RcppExports.cpp
@@ -11,16 +11,18 @@ Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif

// classify_integrated
-SEXP classify_integrated(Rcpp::RObject test, Rcpp::List results, SEXP integrated_build, double quantile, int nthreads);
-RcppExport SEXP _SingleR_classify_integrated(SEXP testSEXP, SEXP resultsSEXP, SEXP integrated_buildSEXP, SEXP quantileSEXP, SEXP nthreadsSEXP) {
+SEXP classify_integrated(Rcpp::RObject test, Rcpp::List results, SEXP integrated_build, double quantile, bool use_fine_tune, double fine_tune_threshold, int nthreads);
+RcppExport SEXP _SingleR_classify_integrated(SEXP testSEXP, SEXP resultsSEXP, SEXP integrated_buildSEXP, SEXP quantileSEXP, SEXP use_fine_tuneSEXP, SEXP fine_tune_thresholdSEXP, SEXP nthreadsSEXP) {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Rcpp::traits::input_parameter< Rcpp::RObject >::type test(testSEXP);
Rcpp::traits::input_parameter< Rcpp::List >::type results(resultsSEXP);
Rcpp::traits::input_parameter< SEXP >::type integrated_build(integrated_buildSEXP);
Rcpp::traits::input_parameter< double >::type quantile(quantileSEXP);
+Rcpp::traits::input_parameter< bool >::type use_fine_tune(use_fine_tuneSEXP);
+Rcpp::traits::input_parameter< double >::type fine_tune_threshold(fine_tune_thresholdSEXP);
Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
-rcpp_result_gen = Rcpp::wrap(classify_integrated(test, results, integrated_build, quantile, nthreads));
+rcpp_result_gen = Rcpp::wrap(classify_integrated(test, results, integrated_build, quantile, use_fine_tune, fine_tune_threshold, nthreads));
return rcpp_result_gen;
END_RCPP
}
@@ -120,7 +122,7 @@ END_RCPP
}

static const R_CallMethodDef CallEntries[] = {
-{"_SingleR_classify_integrated", (DL_FUNC) &_SingleR_classify_integrated, 5},
+{"_SingleR_classify_integrated", (DL_FUNC) &_SingleR_classify_integrated, 7},
{"_SingleR_classify_single", (DL_FUNC) &_SingleR_classify_single, 6},
{"_SingleR_find_classic_markers", (DL_FUNC) &_SingleR_find_classic_markers, 6},
{"_SingleR_grouped_medians", (DL_FUNC) &_SingleR_grouped_medians, 4},
4 changes: 3 additions & 1 deletion src/classify_integrated.cpp
@@ -3,7 +3,7 @@
#include <vector>

//[[Rcpp::export(rng=false)]]
-SEXP classify_integrated(Rcpp::RObject test, Rcpp::List results, SEXP integrated_build, double quantile, int nthreads) {
+SEXP classify_integrated(Rcpp::RObject test, Rcpp::List results, SEXP integrated_build, double quantile, bool use_fine_tune, double fine_tune_threshold, int nthreads) {
Rtatami::BoundNumericPointer curtest(test);
TrainedIntegratedPointer iptr(integrated_build);

@@ -43,6 +43,8 @@ SEXP classify_integrated(Rcpp::RObject test, Rcpp::List results, SEXP integrated
singlepp::ClassifyIntegratedOptions<double> opts;
opts.num_threads = nthreads;
opts.quantile = quantile;
+opts.fine_tune = use_fine_tune;
+opts.fine_tune_threshold = fine_tune_threshold;
singlepp::classify_integrated(*(curtest->ptr), previous_results, *iptr, buffers, opts);

return Rcpp::List::create(