Updating tutorial

salilab · Nov 15, 2024 · d983657
1 parent 2b79c9c

Showing 4 changed files with 17 additions and 10 deletions.
diff --git a/doc/heterogeneity.md b/doc/heterogeneity.md
@@ -1,13 +1,19 @@
 Modeling of heterogeneity {#heterogeneity}
 ====================================
 
-# Heterogeneity modeling step 1: gather information {#heterogeneity1}
+Here, we describe the first modeling problem in our composite workflow: how to build heterogeneity models using IMP. In this tutorial, heterogeneity modeling only includes protein copy number; however, in general, other types of information, such as the coarse location in the final state, could also be included in heterogeneity models.
+
+# Heterogeneity modeling step 1: gathering of information {#heterogeneity1}
+
+We begin heterogeneity modeling with the first step of integrative modeling, gathering information. Heterogeneity modeling will rely on copy number information about the complex. In this case, we utilize the X-ray crystal structure of the fully assembled Bmi1/Ring1b-UbcH5c complex from the protein data bank (PDB), and synthetically generated protein copy numbers during the assembly process.
 
 \image html Input_heterogeneity.png width=600px
 
-# Heterogeneity modeling step 2: representation, scoring, and search process {#heterogeneity2}
+The PDB structure of the complex informs the final state of our model and constrains the maximum copy number for each protein, while the protein copy number data informs the stoichiometry of the complex during assembly.
 
-We first must select which snapshots to model. Here, we choose only to model snapshots at 0 minutes, 1 minute, and 2 minutes because ET and SAXS data are only available at those time points. We know this complex has three protein chains (A, B, and C), and we choose to model these chains based on their protein copy number data. We then use `prepare_protein_library`, [documented here](https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1spatiotemporal_1_1prepare__protein__library.html), to calculate the protein copy numbers for each snapshot model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each of these snapshot models. Here, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure. 
+# Heterogeneity modeling step 2: representation, scoring function, and search process {#heterogeneity2}
+
+Next, we represent, score, and search for heterogeneity models. These operations are performed by the `heterogeneity_modeling.py` script in the `Heterogeneity/Heterogeneity_Modeling` folder. As ET and SAXS data are only available at 0 minutes, 1 minute, and 2 minutes, we choose to create heterogeneity models at these three time points. We know this complex has three protein chains (A, B, and C), and we choose to model these chains based on their protein copy number data. We then use `prepare_protein_library`, [documented here](https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1spatiotemporal_1_1prepare__protein__library.html), to calculate the protein copy numbers for each snapshot model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each of these snapshot models. The choices made in this topology file are important for the representation, scoring function, and search process for snapshot models, and are [discussed later](@ref snapshot_representation). For heterogeneity modeling, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure.
 
 \code{.py}
 # 1a - parameters for prepare_protein_library:
@@ -25,8 +31,9 @@ IMP.spatiotemporal.prepare_protein_library.prepare_protein_library(times, exp_co
                                                 template_topology=template_topology, template_dict=template_dict)
 \endcode
 
-From the output of `prepare_protein_library`, we see that there are 3 snapshot models at each time point (it is possible to have more snapshot models than copy numbers if multiple copies of the protein exist in the complex). We then wrote `generate_all_snapshots`, which creates a directory for each snapshot, copies the necessary files into that directory, and submits a job script to run sampling. The job script will likely need to be customized for the user's computer or cluster.
-
+From the output of `prepare_protein_library`, we see that there are 3 heterogeneity models at each time point (it is possible to have more snapshot models than copy numbers if multiple copies of the protein exist in the complex). For each heterogeneity model, we see 2 files:
+- `*.config`, a file with a list of proteins represented in the heterogeneity model.
+- `*_topol.txt`, a topology file for snapshot modeling corresponding to this heterogeneity model.
 
 # Heterogeneity modeling step 3: assessment {#heterogeneity_assess}
 

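The two output files listed in the hunk above can be collected per heterogeneity model with a few lines of standard-library Python. This is only a minimal sketch, not part of the commit: the helper name `pair_heterogeneity_outputs` is hypothetical, and it assumes that a config file named `<prefix>.config` sits next to a topology file named `<prefix>_topol.txt` (for example `1_0min.config` and `1_0min_topol.txt`), a pairing that is inferred from the file patterns rather than confirmed by the tutorial.

```python
from pathlib import Path


def pair_heterogeneity_outputs(directory):
    """Map each model prefix to its (.config path, _topol.txt path or None).

    Hypothetical helper: assumes '<prefix>.config' is paired with
    '<prefix>_topol.txt' in the same directory.
    """
    directory = Path(directory)
    pairs = {}
    for config in sorted(directory.glob("*.config")):
        prefix = config.stem  # e.g. '1_0min' for '1_0min.config'
        topol = directory / f"{prefix}_topol.txt"
        # Record None when the expected topology file is missing.
        pairs[prefix] = (config, topol if topol.exists() else None)
    return pairs
```

Running it on the `Heterogeneity/Heterogeneity_Modeling` folder after `prepare_protein_library` has finished should, under the naming assumption above, return one entry per heterogeneity model; a `None` in the second slot flags a model whose topology file was not written.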
diff --git a/doc/snapshot.md b/doc/snapshot.md
@@ -1,17 +1,17 @@
 Modeling of snapshots {#snapshots}
 ====================================
 
-Here, we describe the second modeling problem in our composite workflow, how to build models of static snapshots using IMP. We note that this process is similar to previous tutorials of [actin](https://integrativemodeling.org/tutorials/actin/) and [RNA PolII](https://integrativemodeling.org/tutorials/rnapolii_stalk/).
+Here, we describe the second modeling problem in our composite workflow: how to build static snapshot models using IMP. We note that this process is similar to the previous tutorials on [actin](https://integrativemodeling.org/tutorials/actin/) and [RNA PolII](https://integrativemodeling.org/tutorials/rnapolii_stalk/).
 
-# Snapshot modeling step 1: gather information {#snapshots1}
+# Snapshot modeling step 1: gathering of information {#snapshots1}
 
 We begin snapshot modeling with the first step of integrative modeling, gathering information. Snapshot modeling utilizes structural information about the complex. In this case, we utilize heterogeneity models, the X-ray crystal structure of the fully assembled Bmi1/Ring1b-UbcH5c complex from the protein data bank (PDB), synthetically generated electron tomography (ET) density maps during the assembly process, and physical theories.
 
 \image html Input_snapshot.png width=600px
 
 The heterogeneity models inform protein copy numbers for the snapshot models. The PDB structure of the complex informs the structure of the individual proteins. The time-dependent ET data informs the size and shape of the assembling complex. Physical theories inform connectivity and excluded volume.
 
-# Snapshot modeling step 2: representation, scoring, and search process {#snapshots2}
+# Snapshot modeling step 2: representation, scoring function, and search process {#snapshots2}
 
 Next, we represent, score and search for snapshot models. To do so, navigate to the `Snapshots/Snapshots_Modeling/` folder. Here, you will find two python scripts. The first, `static_snapshot.py`, uses IMP to represent, score, and search for models of a single static snapshot. The second, `start_sim.py`, automates the creation of a snapshot model for each heterogeneity model.
 

diff --git a/doc/trajectory.md b/doc/trajectory.md
@@ -3,15 +3,15 @@ Modeling of Trajectories {#trajectories}
 
 Here, we describe the final modeling problem in our composite workflow: how to build trajectory models using IMP.
 
-# Trajectory modeling step 1: gather information {#trajectories1}
+# Trajectory modeling step 1: gathering of information {#trajectories1}
 
 We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the biomolecular process. In this case, we utilize heterogeneity models, snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles.
 
 \image html Input_trajectories.png width=600px
 
 Heterogeneity models inform the possible compositional states at each time point and measure how well a compositional state agrees with input information. Snapshot models provide structural models for each heterogeneity model and measure how well those structural models agree with input information about their structure. Physical theories of macromolecular dynamics inform transitions between states. SAXS data informs the size and shape of the assembling complex and is left for validation.
 
-# Trajectory modeling step 2: representation, scoring, and search process {#trajectories2}
+# Trajectory modeling step 2: representation, scoring function, and search process {#trajectories2}
 
 Trajectory modeling connects alternative snapshot models at adjacent time points, followed by scoring the trajectories based on their fit to the input information, as described in full [here](https://www.biorxiv.org/content/10.1101/2024.08.06.606842v1.abstract).
 

diff --git a/modeling/Heterogeneity/Heterogeneity_Modeling/.1_0min_topol.txt.swp b/modeling/Heterogeneity/Heterogeneity_Modeling/.1_0min_topol.txt.swp