Use PMI stat file handling functions #9

Open
benmwebb opened this issue Nov 20, 2019 · 7 comments

Comments

@benmwebb
Member

Rather than reading stat files with our own code, we should use the IMP.pmi.output.ProcessOutput class. This handles both v1 and v2 statfiles, and also RMF files (stat file information can be written into the RMF file itself rather than a separate text file).
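A minimal sketch of what this could look like, assuming `ProcessOutput.get_fields` accepts a list of keys and returns a dict mapping each key to a list of per-frame values. The helper names, the `Total_Score` key, and the threshold in the usage note are illustrative assumptions, not part of the existing script.

```python
def load_stat_fields(stat_file, keys):
    """Read the requested keys from a stat file through PMI."""
    # Imported lazily so the selection helper below works without IMP installed.
    import IMP.pmi.output

    po = IMP.pmi.output.ProcessOutput(stat_file)
    return po.get_fields(keys)


def select_good_scoring(fields, score_key, max_score):
    """Return indices of frames whose score is at or below max_score."""
    # float() is applied in case the values come back as strings.
    return [
        index
        for index, value in enumerate(fields[score_key])
        if float(value) <= max_score
    ]
```

Usage would then be roughly `fields = load_stat_fields(path, ["Total_Score"])` followed by `select_good_scoring(fields, "Total_Score", 100.0)`, with the same call working whichever stat file format sits behind `path`.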

@shruthivis
Collaborator

That part of the protocol (GoodScoringModelSelector.py) has been superseded by the new model-selection methods from @iecheverria and @ichem001, so I'm not sure it is worth investing a lot of time in revamping this script. Only the actin tutorial (and perhaps a couple of older application papers?) uses it. Perhaps the actin tutorial should be updated to use the new analysis protocol?

@iecheverria

iecheverria commented Nov 20, 2019 via email

@shruthivis
Collaborator

shruthivis commented Nov 20, 2019 via email

@saltzberg

saltzberg commented Nov 20, 2019 via email

@ichem001
Contributor

ichem001 commented Nov 20, 2019

@saltzberg
Instead of writing one giant RMF file per sample, maybe we could write one small RMF and one big DCD file for each sample. This would also make deposition to Zenodo almost automatic, since we need DCD files at the end of the day anyway, so it might kill two birds with one stone.
We could either link both DCD files to the ensembles or concatenate the DCD files with catDCD from the VMD/NAMD group.
What do you think?

@benmwebb
Member Author

> Instead of writing one giant RMF file per sample - maybe we could write one small RMF and a big DCD file for each sample

This is essentially what happens internally anyway: everything is converted to a monstrous NumPy array of coordinates, which is about as efficient as it can be. I don't much like DCD as a long-term solution, since you lose all of the topology information and can only store coordinates. I'd rather overhaul RMF to make it more efficient at storing multiple conformations (on my lengthy list of things to fix).
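To give a rough feel for the size of such a coordinate array, here is a back-of-the-envelope sketch. It assumes a dense array of shape (models, particles, 3) holding 8-byte floats; the actual internal layout and dtype are not stated in this thread, and the function name is hypothetical.

```python
def coordinate_array_bytes(n_models, n_particles, bytes_per_value=8):
    """Bytes needed for a dense (n_models, n_particles, 3) coordinate array."""
    # Three Cartesian components per particle, one value each.
    return n_models * n_particles * 3 * bytes_per_value


# Hypothetical ensemble: 10,000 models of 500 particles each.
size_mb = coordinate_array_bytes(10_000, 500) / 1e6
print(f"{size_mb:.0f} MB")
```

Like a DCD file, an array of this kind carries coordinates only, so the topology still has to come from somewhere else.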

@saltzberg

@ichem001
The single large RMF that I am talking about would replace the ./analysis/sample_A/tons_of_one_frame.rmf3s files, not the final output.

Reading individual RMF files with rmf_slice is exceedingly slow: it takes almost half of the total clustering time. The run_extract_models.py step in PMI_analysis can be changed to output two RMF files (sample_A and sample_B) for each cluster. These can be read into imp-sampcon an order of magnitude faster than individual RMFs for each model.
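The bookkeeping behind "two RMF files per cluster" could be sketched as follows. This is only an illustration of the grouping step: the input layout, the function names, and the output file naming are all hypothetical, and the actual writing of frames into each RMF file is left out.

```python
from collections import defaultdict


def group_models_by_output(assignments):
    """Map (cluster_id, sample) to the ordered list of model ids.

    `assignments` is an iterable of (model_id, cluster_id, sample) tuples,
    where sample is "A" or "B". This layout is an assumption.
    """
    groups = defaultdict(list)
    for model_id, cluster_id, sample in assignments:
        groups[(cluster_id, sample)].append(model_id)
    return dict(groups)


def output_name(cluster_id, sample):
    """Hypothetical naming scheme for the per-cluster, per-sample RMF file."""
    return f"cluster.{cluster_id}/sample_{sample}.rmf3"
```

Each group would then become one multi-frame RMF file, so the downstream reader opens two files per cluster instead of one file per model.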
