This repository contains functions to model the per-gene variance in expression from a gene-by-cell matrix of (log-transformed) expression values. Genes with high variance are considered to be more interesting and are prioritized for further analyses. The code itself was originally derived from the scran R package and has been factored out into a separate C++ library for easier re-use.
Given a tatami::Matrix of log-expression values for each gene in each cell, we can compute the per-gene variances and model the trend with respect to the mean across genes:
#include "scran_variances/scran_variances.hpp"
std::shared_ptr<tatami::Matrix<double, int> > mat = some_data_source();
scran_variances::ModelGeneVariancesOptions opt;
auto res = scran_variances::model_gene_variances(*mat, opt);
res.means; // vector of mean log-expression values, one per gene.
res.variances; // vector of variances of the log-expression values, one per gene.
res.fitted; // vector of fitted values of the mean-variance trend for each gene.
res.residuals; // vector of residuals from the trend.
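For intuition about what the per-gene means and variances represent, here is a minimal sketch that computes them for a small dense matrix stored in plain std::vector objects. It does not use tatami or scran_variances: the GeneStats struct and the compute_gene_stats() function are hypothetical names introduced for illustration, and the sketch assumes that the sample variance (dividing by the number of cells minus one) is the quantity of interest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration; not part of scran_variances.
struct GeneStats {
    std::vector<double> means;
    std::vector<double> variances;
};

// Computes the mean and sample variance of each row (gene) across its columns (cells).
// Assumes that every gene has at least two cells.
inline GeneStats compute_gene_stats(const std::vector<std::vector<double> >& log_expr) {
    GeneStats out;
    out.means.reserve(log_expr.size());
    out.variances.reserve(log_expr.size());

    for (const auto& gene : log_expr) {
        const std::size_t ncells = gene.size();
        assert(ncells >= 2);

        double sum = 0;
        for (double x : gene) {
            sum += x;
        }
        const double mean = sum / ncells;

        // Second pass over the data: sum of squared deviations from the mean.
        double ss = 0;
        for (double x : gene) {
            ss += (x - mean) * (x - mean);
        }

        out.means.push_back(mean);
        out.variances.push_back(ss / (ncells - 1));
    }

    return out;
}
```

Calling compute_gene_stats() on a matrix with one inner vector per gene yields two vectors of the same length as the number of genes, analogous in shape to res.means and res.variances above.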
Typically, the residuals are used for feature selection, as these account for non-trivial mean-variance trends in transformed count data.
scran_variances::ChooseHighlyVariableGenesOptions copt;
copt.top = 5000;
auto chosen = scran_variances::choose_highly_variable_genes_index(
res.residuals.size(),
res.residuals.data(),
copt
);
// Create the HVG submatrix for downstream analysis.
auto hvg_subset = tatami::make_DelayedSubset(mat, chosen, /* by_row = */ true);
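To make the selection step concrete, the following is a rough sketch of picking the indices of the genes with the largest residuals. It is a stand-in written with the standard library only, not the implementation behind choose_highly_variable_genes_index(); the top_indices() name, the tie-breaking rule and the choice to return indices in increasing order are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration; not the library's implementation.
// Returns the indices of the `top` largest values in `stats`, in increasing index order.
inline std::vector<std::size_t> top_indices(const std::vector<double>& stats, std::size_t top) {
    std::vector<std::size_t> order(stats.size());
    std::iota(order.begin(), order.end(), static_cast<std::size_t>(0));

    // Never request more genes than are available.
    top = std::min(top, order.size());

    // Move the indices of the `top` largest statistics to the front.
    // Ties are broken in favor of the smaller index (an arbitrary choice).
    std::partial_sort(order.begin(), order.begin() + top, order.end(),
        [&](std::size_t l, std::size_t r) {
            if (stats[l] == stats[r]) {
                return l < r;
            }
            return stats[l] > stats[r];
        });

    order.resize(top);
    assert(order.size() == top);

    // Report the chosen genes in their original row order.
    std::sort(order.begin(), order.end());
    return order;
}
```

Indices in increasing order are convenient when they are used to subset rows, since the subset then preserves the original ordering of the genes.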
Users can also fit a trend directly to their own statistics.
scran_variances::FitVarianceTrendOptions fopt;
fopt.span = 0.5;
fopt.minimum_mean = 1;
auto fit = scran_variances::fit_variance_trend(100, means, variances, fopt);
fit.fitted; // fitted values for all genes.
fit.residuals; // residual values for all genes.
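The relationship between the fitted values and the residuals can be sketched with a much simpler trend than the one the library fits. The example below substitutes an ordinary least-squares straight line purely to stay short; it is not the library's trend-fitting method. The LineFit struct and fit_line_trend() function are hypothetical, and defining each residual as the variance minus the fitted value is an assumption about the sign convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for illustration; it mirrors the `fitted` and `residuals`
// fields shown above but is not the library's own type.
struct LineFit {
    std::vector<double> fitted;
    std::vector<double> residuals;
};

// Fits variance = intercept + slope * mean by ordinary least squares,
// as a simple stand-in for a smoother mean-variance trend.
inline LineFit fit_line_trend(const std::vector<double>& means, const std::vector<double>& variances) {
    const std::size_t n = means.size();
    assert(n == variances.size() && n >= 2);

    double mx = 0, my = 0;
    for (std::size_t i = 0; i < n; ++i) {
        mx += means[i];
        my += variances[i];
    }
    mx /= n;
    my /= n;

    double sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = means[i] - mx;
        sxx += dx * dx;
        sxy += dx * (variances[i] - my);
    }

    // Fall back to a flat trend if all means are identical.
    const double slope = (sxx > 0 ? sxy / sxx : 0);
    const double intercept = my - slope * mx;

    LineFit out;
    out.fitted.reserve(n);
    out.residuals.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double f = intercept + slope * means[i];
        out.fitted.push_back(f);
        out.residuals.push_back(variances[i] - f); // assumed convention: observed minus fitted.
    }
    return out;
}
```

Under this convention, a gene with a positive residual is more variable than the trend predicts for genes of similar abundance.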
Check out the reference documentation for more details.
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
scran_variances
GIT_REPOSITORY https://github.com/libscran/scran_variances
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_variances)
Then you can link to scran_variances to make the headers available during compilation:
# For executables:
target_link_libraries(myexe libscran::scran_variances)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_variances)
Alternatively, if the library is already installed, find_package() can locate it:
find_package(libscran_scran_variances CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_variances)
To install the library, use:
mkdir build && cd build
cmake .. -DSCRAN_VARIANCES_TESTS=OFF
cmake --build . --target install
By default, this will use FetchContent to fetch all external dependencies.
If you want to install them manually, use -DSCRAN_VARIANCES_FETCH_EXTERN=OFF.
See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I.
This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.