Use PMI stat file handling functions #9

benmwebb · 2019-11-20T00:14:03Z

Rather than reading stat files with our own code, we should use the IMP.pmi.output.ProcessOutput class. This handles both v1 and v2 statfiles, and also RMF files (stat file information can be written into the RMF file itself rather than a separate text file).
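A minimal sketch of what reading a stat file through `IMP.pmi.output.ProcessOutput` might look like. It assumes IMP is installed and that `get_fields()` returns a dict mapping each requested key to a list of per-frame values, possibly as strings, hence the explicit `float()` conversion. The helper names `fields_to_floats` and `read_stat_fields` are hypothetical, not part of PMI.

```python
def fields_to_floats(fields):
    """Convert a {key: [values]} dict to floats.

    Assumption: values may come back from the stat file as strings.
    """
    return {key: [float(v) for v in values] for key, values in fields.items()}


def read_stat_fields(stat_file, keys):
    """Read the given keys from a PMI stat file (or RMF file) via ProcessOutput."""
    # Imported here so that fields_to_floats() is usable without IMP installed.
    import IMP.pmi.output

    po = IMP.pmi.output.ProcessOutput(stat_file)

    # get_keys() lists the fields present in the file; fail early if any
    # requested key is absent rather than returning partial data.
    available = set(po.get_keys())
    missing = [k for k in keys if k not in available]
    if missing:
        raise KeyError("Keys not found in %s: %s" % (stat_file, missing))

    return fields_to_floats(po.get_fields(keys))
```

For example, `read_stat_fields("stat.0.out", ["Total_Score"])` would return the per-frame total scores; both the file name and the key here are only illustrative.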


shruthivis · 2019-11-20T02:34:56Z

That part of the protocol (GoodScoringModelSelector.py) is superseded by @iecheverria and @ichem001's new methods for selecting models for analysis. So I'm not sure it is worth investing a lot of time in revamping this script. Only the actin tutorial (and perhaps a couple of older application papers?) uses this. Perhaps the actin tutorial should be updated to include the new analysis protocol?

iecheverria · 2019-11-20T06:28:59Z

Yes, this is true. The idea is to move the analysis away from arbitrary cutoffs and start looking at all sampled models in a probabilistic way. I still find selecting good-scoring models useful for preliminary analysis while simulations are still running, for example to check how well the good-scoring models satisfy the data and whether the representation needs to be adjusted. Moving forward, I'm planning to incorporate everything, including what is in PMI_analysis and sampcon, into the PMI analysis module. I can add the new analysis protocol to the actin tutorial. Do we have a full set of trajectories? Where are they stored?



shruthivis · 2019-11-20T09:22:42Z

https://github.com/salilab/actin_tutorial/tree/master/modeling has run1.zip and run2.zip, which presumably correspond to the full set of trajectories from modeling.


saltzberg · 2019-11-20T14:02:00Z

Shruthi is correct. Ignacia, as it happens, I'm reworking the actin_tutorial this week to include PMI_analysis in preparation for a workshop I'm giving in a couple of weeks. Has the workflow changed recently (in the past few months)? As for integrating into PMI, I found some major bottlenecks in imp-sampcon, one of which requires changes to PMI_analysis, so maybe hold off a bit. The major workflow change is going from outputting and reading sets of individual RMF files for sample_A and sample_B to a single RMF file each for sample_A and sample_B. I'm hoping to have it finished and tested by the beginning of next week.



ichem001 · 2019-11-20T21:01:28Z

@saltzberg
Instead of writing one giant RMF file per sample - maybe we could write one small RMF and a big DCD file for each sample. This would also make deposition to Zenodo almost automatic, since we need DCD files at the end of the day, so it might kill two birds with one stone.
We could either link both DCD files to the ensembles or concatenate the DCD files with catDCD from the VMD/NAMD group.
What do you think?

benmwebb · 2019-11-20T21:08:48Z

> Instead of writing one giant RMF file per sample - maybe we could write one small RMF and a big DCD file for each sample

This is essentially what happens internally anyway - everything is converted to a monstrous numpy array of coordinates, which is about as efficient as it can be. I don't much like DCD as a long-term solution since you lose all of the topology information and can only store coordinates. I'd rather overhaul RMF to make it more efficient at storing multiple conformations (on my lengthy list of things to fix).

saltzberg · 2019-11-20T21:24:27Z

@ichem001
The single large RMF files that I am talking about replace the ./analysis/sample_A/tons_of_one_frame.rmf3s, not the final output.

Reading individual RMF files with rmf_slice is exceedingly slow: it takes almost half of the total time for clustering. The PMI_analysis run_extract_models.py step can be changed to output two RMF files (sample_A and sample_B) for each cluster. These can be read into imp-sampcon an order of magnitude faster than individual RMFs for each model.
