To avoid loading all kmer matrices into memory in the model pipeline (both the score and model rules do this), we can create background distributions from all kmers in a dataset. These could be constructed ahead of time in a dedicated 'build-dist' pipeline, or built on the fly, one per family, each in its own thread. The score and model rules can then load these background distributions, each of which should be only slightly larger than the kmer vector itself, combine them, and use the result to score and model*.

*I'm not yet clear on how the model step would work.
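A rough sketch of the idea, assuming the background is just a vector of summed kmer counts. The function names (`count_kmers`, `build_background`, `combine`, `to_frequencies`) are hypothetical and not part of the existing pipeline. Each family's sequences are streamed one at a time, so only the running count vector is held in memory, and the per-family vectors are summed afterwards:

```python
from collections import Counter
from itertools import product


def count_kmers(sequence, k):
    """Count every overlapping kmer in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_background(sequences, k, alphabet):
    """Accumulate kmer counts over an iterable of sequences, one at a time.

    Only the running count vector (one entry per possible kmer) is kept in
    memory, never a full per-sequence kmer matrix.
    """
    totals = Counter({"".join(p): 0 for p in product(alphabet, repeat=k)})
    for seq in sequences:
        for kmer, n in count_kmers(seq, k).items():
            if kmer in totals:  # skip kmers containing out-of-alphabet characters
                totals[kmer] += n
    return totals


def combine(backgrounds):
    """Sum per-family count vectors into one combined background."""
    combined = Counter()
    for bg in backgrounds:
        combined.update(bg)  # Counter.update adds counts rather than replacing
    return combined


def to_frequencies(counts):
    """Normalize counts to relative frequencies (all zeros if there are no counts)."""
    total = sum(counts.values())
    return {kmer: (n / total if total else 0.0) for kmer, n in counts.items()}
```

Because the combination step is a plain sum, it should not matter whether the per-family vectors come from a 'build-dist' run or from separate threads. How the combined vector would feed into the model rule is left open here, as in the issue.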
This would allow the creation of generalized kmer background distribution files that could be pre-constructed for particular k/alphabet combinations. The user then wouldn't have to supply a background and could train a model against a pre-built one instead. These files could be included in the repo.
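One possible way to organize such pre-built files, as a sketch only: the JSON format, the `background_k{k}_{alphabet}.json` naming scheme, and the function names below are all assumptions for illustration and do not exist in the repo. A user-supplied file takes precedence, otherwise the bundled file for the requested k/alphabet combination is used:

```python
import json
from pathlib import Path


def background_path(directory, k, alphabet_name):
    """Hypothetical naming scheme: one JSON file per k/alphabet combination."""
    return Path(directory) / f"background_k{k}_{alphabet_name}.json"


def save_background(directory, k, alphabet_name, counts):
    """Write a kmer -> count mapping to the file for this k/alphabet."""
    path = background_path(directory, k, alphabet_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(counts, sort_keys=True))
    return path


def load_background(directory, k, alphabet_name, user_file=None):
    """Prefer a user-supplied file; otherwise fall back to the pre-built one."""
    path = Path(user_file) if user_file else background_path(directory, k, alphabet_name)
    if not path.exists():
        raise FileNotFoundError(f"no background distribution at {path}")
    return json.loads(path.read_text())
```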