Skip to content

Commit

Permalink
3.5.0 release (#40)
Browse files Browse the repository at this point in the history
* implement logging of latent genotypes (#14)

* Fix NA values in allele frequency vector (#20)

The code now checks for NA values in the allele frequency vector and replaces them with 0. A warning message is displayed if NA values are detected, indicating a potential problem with the MCMC chain or loci with no diversity.

* Add start_profiler and stop_profiler functions (#24)

- Added start_profiler function to R/RcppExports.R
- Added stop_profiler function to R/RcppExports.R
- Modified src/Makevars to include ENABLE_PROFILER flag and link with -lprofiler
- Added start_profiler and stop_profiler functions to src/RcppExports.cpp
- Created new file src/profiler.cpp containing the implementation of start_profiler and stop_profiler functions

* Refactor Chain class to include methods for calculating likelihood, prior, and posterior for a specific sample. (#33)

- Add `get_llik(int sample)` method to calculate the likelihood for a specific sample.
- Add `get_prior(int sample)` method to calculate the prior for a specific sample.
- Add `get_posterior(int sample)` method to calculate the posterior for a specific sample.

These changes allow for more granular calculations of likelihood, prior, and posterior values based on individual samples.

* Implement maximum runtime (#34)

* Implement maximum runtime

* log chain runtime

* add utility to detect if openmp is enabled

* ungroup input data when loading long form (#37)

* fix memory crash in prob_any_missing (#38)

The code fix addresses a memory crash issue in the `prob_any_missing` function. The fix includes adding a condition to check if `maxNumEvents` is less than `totalEvents`. If true, it fills the `probVec` vector with 1.0 and returns it. This prevents the memory crash and ensures proper execution of the function.

* restore accidentally removed openmp library to compilation

* Add new datasets namibia_data and regional_allele_frequencies. Update vignette with code to use the new datasets. (#27)

Add new datasets namibia_data and regional_allele_frequencies. Update vignette with code to use the new datasets.

- Added new datasets namibia_data and regional_allele_frequencies.
- Updated vignette with code to use the new datasets.

* update citation and manuscript
  • Loading branch information
m-murphy authored Oct 30, 2024
1 parent 9755e93 commit ade4c16
Show file tree
Hide file tree
Showing 14 changed files with 102 additions and 5 deletions.
4 changes: 4 additions & 0 deletions R/RcppExports.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,10 @@ run_mcmc_rcpp <- function(args) {
.Call(`_moire_run_mcmc`, args)
}

openmp_enabled <- function() {
.Call(`_moire_openmp_enabled`)
}

start_profiler <- function(str) {
.Call(`_moire_start_profiler`, str)
}
Expand Down
6 changes: 5 additions & 1 deletion R/mcmc.R
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,9 @@
#' adaptation steps. Only used if `adapt_temp` is TRUE.
#' @param max_initialization_tries Number of times to try to initialize the
#' chain before giving up
#' @param max_runtime Maximum runtime in minutes. If the MCMC is running for
#' more than this amount of time, the function will stop and return the current
#' state of the MCMC.
run_mcmc <-
function(data,
is_missing = FALSE,
Expand Down Expand Up @@ -91,7 +94,8 @@ run_mcmc <-
adapt_temp = TRUE,
pre_adapt_steps = 25,
temp_adapt_steps = 25,
max_initialization_tries = 10000) {
max_initialization_tries = 10000,
max_runtime = Inf) {
start_time <- Sys.time()
args <- as.list(environment())
mcmc_args <- as.list(environment())
Expand Down
1 change: 1 addition & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#' @importFrom rlang .data
load_long_form_data <- function(df, warn_uninformative = TRUE) {
uninformative_loci <- df |>
dplyr::ungroup() |>
dplyr::group_by(.data$locus) |>
dplyr::summarise(total_alleles = length(unique(.data$allele))) |>
dplyr::filter(.data$total_alleles == 1) |>
Expand Down
7 changes: 6 additions & 1 deletion man/run_mcmc.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 11 additions & 0 deletions src/RcppExports.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,16 @@ BEGIN_RCPP
return rcpp_result_gen;
END_RCPP
}
// openmp_enabled
SEXP openmp_enabled();
RcppExport SEXP _moire_openmp_enabled() {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Rcpp::RNGScope rcpp_rngScope_gen;
rcpp_result_gen = Rcpp::wrap(openmp_enabled());
return rcpp_result_gen;
END_RCPP
}
// start_profiler
SEXP start_profiler(SEXP str);
RcppExport SEXP _moire_start_profiler(SEXP strSEXP) {
Expand All @@ -45,6 +55,7 @@ END_RCPP

static const R_CallMethodDef CallEntries[] = {
{"_moire_run_mcmc", (DL_FUNC) &_moire_run_mcmc, 1},
{"_moire_openmp_enabled", (DL_FUNC) &_moire_openmp_enabled, 0},
{"_moire_start_profiler", (DL_FUNC) &_moire_start_profiler, 1},
{"_moire_stop_profiler", (DL_FUNC) &_moire_stop_profiler, 0},
{NULL, NULL, 0}
Expand Down
31 changes: 31 additions & 0 deletions src/chain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1016,6 +1016,37 @@ float Chain::get_llik() { return llik; }
float Chain::get_prior() { return prior; }
float Chain::get_posterior() { return llik * temp + prior; }

float Chain::get_llik(int sample) {
int idx = sample * genotyping_data.num_loci;

#ifdef HAS_EXECUTION
return std::reduce(std::execution::unseq, genotyping_llik_new.begin() + idx,
genotyping_llik_new.begin() + idx + genotyping_data.num_loci);
#else
float llik = 0.0f;
#pragma omp simd reduction(+:llik)
for (int i = 0; i < genotyping_data.num_loci; ++i) {
llik += genotyping_llik_new[idx + i];
}
return llik;
#endif
}
float Chain::get_prior(int sample) {
float prior = 0.0f;
if (params.allow_relatedness) {
prior += relatedness_prior_new[sample];
}
prior += coi_prior_new[sample];
prior += eps_neg_prior_new[sample];
prior += eps_pos_prior_new[sample];
return prior;
}

float Chain::get_posterior(int sample) {
float posterior = get_llik(sample) * temp + get_prior(sample);
return posterior;
}

void Chain::calculate_genotype_likelihood(int sample_idx, int locus_idx)
{
int idx = sample_idx * genotyping_data.num_loci + locus_idx;
Expand Down
3 changes: 3 additions & 0 deletions src/chain.h
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,9 @@ class Chain
float get_llik();
float get_prior();
float get_posterior();
float get_llik(int sample);
float get_prior(int sample);
float get_posterior(int sample);

void set_llik(float llik);
void set_temp(float temp);
Expand Down
21 changes: 19 additions & 2 deletions src/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,13 +29,20 @@ Rcpp::List run_mcmc(Rcpp::List args)
params.adapt_temp ? "Yes" : "No");
}

enum events {
START_COMPUTATION
};

Timer<events, std::chrono::minutes> timer;

MCMC mcmc(genotyping_data, params);
MCMCProgressBar pb(params.burnin, params.samples, params.use_message);
Progress p(params.burnin + params.samples, params.verbose, pb);
pb.set_llik(mcmc.get_llik());

int step = 0;
while (step < params.burnin)
timer.record_event(events::START_COMPUTATION);
while (step < params.burnin && timer.time_since_event(events::START_COMPUTATION).count() < params.max_runtime)
{
Rcpp::checkUserInterrupt();
mcmc.burnin(step);
Expand All @@ -54,7 +61,7 @@ Rcpp::List run_mcmc(Rcpp::List args)
}

step = 0;
while (step < params.samples)
while (step < params.samples && timer.time_since_event(events::START_COMPUTATION).count() < params.max_runtime)
{
Rcpp::checkUserInterrupt();
mcmc.sample(step);
Expand All @@ -72,7 +79,10 @@ Rcpp::List run_mcmc(Rcpp::List args)
}
}

bool max_runtime_reached = timer.time_since_event(events::START_COMPUTATION).count() >= params.max_runtime && step < params.samples;

mcmc.finalize();
float runtime = timer.time_since_event(events::START_COMPUTATION).count();

Rcpp::List acceptance_rates;
Rcpp::List sampling_variances;
Expand Down Expand Up @@ -123,6 +133,7 @@ Rcpp::List run_mcmc(Rcpp::List args)
res.push_back(Rcpp::wrap(mcmc.prior_sample));
res.push_back(Rcpp::wrap(mcmc.posterior_burnin));
res.push_back(Rcpp::wrap(mcmc.posterior_sample));
res.push_back(Rcpp::wrap(mcmc.data_llik_store));
res.push_back(Rcpp::wrap(mcmc.m_store));
res.push_back(Rcpp::wrap(mcmc.mean_coi_store));
res.push_back(Rcpp::wrap(mcmc.p_store));
Expand All @@ -137,6 +148,9 @@ Rcpp::List run_mcmc(Rcpp::List args)
res.push_back(Rcpp::wrap(mcmc.temp_gradient));
res.push_back(Rcpp::wrap(acceptance_rates));
res.push_back(Rcpp::wrap(sampling_variances));
res.push_back(Rcpp::wrap(max_runtime_reached));
res.push_back(Rcpp::wrap(runtime));


Rcpp::StringVector res_names;
res_names.push_back("llik_burnin");
Expand All @@ -145,6 +159,7 @@ Rcpp::List run_mcmc(Rcpp::List args)
res_names.push_back("prior_sample");
res_names.push_back("posterior_burnin");
res_names.push_back("posterior_sample");
res_names.push_back("data_llik");
res_names.push_back("coi");
res_names.push_back("lam_coi");
res_names.push_back("allele_freqs");
Expand All @@ -159,6 +174,8 @@ Rcpp::List run_mcmc(Rcpp::List args)
res_names.push_back("temp_gradient");
res_names.push_back("acceptance_rates");
res_names.push_back("sampling_variances");
res_names.push_back("max_runtime_reached");
res_names.push_back("total_runtime");

res.names() = res_names;
return res;
Expand Down
12 changes: 12 additions & 0 deletions src/mcmc.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ MCMC::MCMC(GenotypingData genotyping_data, Parameters params)
r_store.resize(genotyping_data.num_samples);
eps_neg_store.resize(genotyping_data.num_samples);
eps_pos_store.resize(genotyping_data.num_samples);
data_llik_store.resize(genotyping_data.num_samples);
swap_acceptances.resize(params.pt_chains.size() - 1, 0);
swap_barriers.resize(params.pt_chains.size() - 1, 0.0);
swap_indices.resize(params.pt_chains.size(), 0);
Expand Down Expand Up @@ -253,6 +254,7 @@ void MCMC::sample(int step)
eps_neg_store[jj].push_back(chain.eps_neg[jj]);
eps_pos_store[jj].push_back(chain.eps_pos[jj]);
r_store[jj].push_back(chain.r[jj]);
data_llik_store[jj].push_back(chain.get_llik(jj));

if (params.record_latent_genotypes) {
for (size_t kk = 0; kk < genotyping_data.num_loci; ++kk)
Expand Down Expand Up @@ -282,3 +284,13 @@ void MCMC::finalize()
float MCMC::get_llik() { return chains[swap_indices[0]].get_llik(); }
float MCMC::get_prior() { return chains[swap_indices[0]].get_prior(); }
float MCMC::get_posterior() { return chains[swap_indices[0]].get_posterior(); }

// [[Rcpp::export]]
SEXP openmp_enabled()
{
#ifdef _OPENMP
return Rcpp::wrap(true);
#else
return Rcpp::wrap(false);
#endif
}
1 change: 1 addition & 0 deletions src/mcmc.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ class MCMC
std::vector<std::vector<int>> m_store{};
std::vector<std::vector<std::vector<float>>> p_store{};
std::vector<std::vector<std::vector<std::vector<int>>>> latent_genotypes_store{};
std::vector<std::vector<float>> data_llik_store{};
std::vector<std::vector<float>> eps_pos_store{};
std::vector<std::vector<float>> eps_neg_store{};
std::vector<std::vector<float>> r_store{};
Expand Down
2 changes: 1 addition & 1 deletion src/mcmc_progress_bar.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ void MCMCProgressBar::update(float progress)
}
else
{
// stop and record time no more than every .1 seconds
// stop and record time no more than every 1 second
if (clock_.time_since_event(events::UPDATE_CONSOLE).count() < 1000)
{
return;
Expand Down
1 change: 1 addition & 0 deletions src/parameters.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ Parameters::Parameters(const Rcpp::List &args)
temp_adapt_steps = UtilFunctions::r_to_int(args["temp_adapt_steps"]);
max_initialization_tries = UtilFunctions::r_to_int(args["max_initialization_tries"]);
record_latent_genotypes = UtilFunctions::r_to_bool(args["record_latent_genotypes"]);
max_runtime = UtilFunctions::r_to_float(args["max_runtime"]);

// Model
max_coi = UtilFunctions::r_to_int(args["max_coi"]);
Expand Down
1 change: 1 addition & 0 deletions src/parameters.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ class Parameters
int pre_adapt_steps;
int temp_adapt_steps;
int max_initialization_tries;
float max_runtime;

bool record_latent_genotypes;

Expand Down
6 changes: 6 additions & 0 deletions src/prob_any_missing.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,12 @@ std::vector<float> probAnyMissingFunctor::vectorized(
const std::size_t totalEvents = eventProbs.size();

std::vector<float> probVec(maxNumEvents - minNumEvents + 1, 0.0);

if (maxNumEvents < totalEvents) {
std::fill_n(probVec.begin(), maxNumEvents - minNumEvents + 1, 1.0);
return probVec;
}

std::fill_n(probVec.begin(), totalEvents - 1, 1.0);

// Calculate via inclusion-exclusion principle
Expand Down

0 comments on commit ade4c16

Please sign in to comment.