Updating tutorial

salilab · Dec 2, 2024 · 82bcd0b · 82bcd0b
1 parent 430ca69
commit 82bcd0b
Show file tree

Hide file tree

Showing 4 changed files with 16 additions and 16 deletions.
diff --git a/Jupyter/.ipynb_checkpoints/spatiotemporal-colab-checkpoint.ipynb b/Jupyter/.ipynb_checkpoints/spatiotemporal-colab-checkpoint.ipynb
@@ -44,7 +44,7 @@
     "Modeling of heterogeneity<a id=\"notebook_heterogeneity\"></a>\n",
     "====================================\n",
     "\n",
-    "Here, we describe the first modeling problem in our composite workflow, how to build models of heterogeneity modeling using IMP. In this tutorial, heterogeneity modeling only includes protein copy number; however, in general, other types of information, such as the coarse location in the final state, could also be included in heterogeneity models.\n",
+    "Here, we describe the first modeling problem in our composite workflow, how to build models of heterogeneity using IMP. In this tutorial, heterogeneity modeling only includes protein copy number; however, in general, other types of information, such as the coarse location in the final state, could also be included in heterogeneity models.\n",
     "\n",
     "# Heterogeneity modeling step 1: gathering of information<a id=\"notebook_heterogeneity1\"></a>\n",
     "\n",
@@ -181,7 +181,7 @@
     "\n",
     "Beads and Gaussians in our model belong to either a *rigid body* or *flexible string*. The positions of all beads and Gaussians in a single rigid body are constrained during sampling and do not move relative to each other. Meanwhile, flexible beads can move freely during sampling, but are restrained by sequence connectivity.\n",
     "\n",
-    "To begin, we built a topology file with the representation for the model of the complete system, `spatiotemporal_topology.txt`, located in `Heterogeneity/Heterogeneity_Modeling/`. This complete topology was used as a template to build topologies of each heterogeneity model. Based on our observation of the structure of the complex, we chose to represent each protein with at least 2 separate rigid bodies, and left the first 28 residues of protein C as flexible beads. Rigid bodies were described with 1 bead for every residue, and 10 residues per Gaussian. Flexible beads were described with 1 bead for every residue and 1 residue per Gaussian. A more complete description of the options available in topology files is available in the the [TopologyReader](https://integrativemodeling.org/2.21.0/doc/ref/classIMP_1_1pmi_1_1topology_1_1TopologyReader.html) documentation.\n",
+    "To begin, we built a topology file with the representation for the model of the complete system, `spatiotemporal_topology.txt`, located in `Heterogeneity/Heterogeneity_Modeling/`. This complete topology was used as a template to build topologies for each snapshot model. Based on our observation of the structure of the complex, we chose to represent each protein with at least 2 separate rigid bodies, and left the first 28 residues of protein C as flexible beads. Rigid bodies were described with 1 bead for every residue, and 10 residues per Gaussian. Flexible beads were described with 1 bead for every residue and 1 residue per Gaussian. A more complete description of the options available in topology files is available in the the [TopologyReader](https://integrativemodeling.org/2.21.0/doc/ref/classIMP_1_1pmi_1_1topology_1_1TopologyReader.html) documentation.\n",
     "\n",
     "```\n",
     "|molecule_name | color | fasta_fn | fasta_id | pdb_fn | chain | residue_range | pdb_offset | bead_size | em_residues_per_gaussian | rigid_body | super_rigid_body | chain_of_super_rigid_bodies | \n",
@@ -484,11 +484,11 @@
     "\n",
     "# Trajectory modeling step 1: gathering of information<a id=\"notebook_trajectories1\"></a>\n",
     "\n",
-    "We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize heterogeneity models, snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles.\n",
+    "We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles.\n",
     "\n",
     "<img src=\"images/Input_trajectories.png\" width=\"600px\" />\n",
     "\n",
-    "Heterogeneity models inform the possible compositional states at each time point and measure how well a compositional state agrees with input information. Snapshot models provide structural models for each heterogeneity model and measure how well those structural models agree with input information about their structure. Physical theories of macromolecular dynamics inform transitions between states. SAXS data informs the size and shape of the assembling complex and is left for validation.\n",
+    "Snapshot models inform transitions between sampled time points, and their scores inform trajectory scores. Physical theories of macromolecular dynamics inform transitions between sampled time points. SAXS data informs the size and shape of the assembling complex and is left for validation.\n",
     "\n",
     "# Trajectory modeling step 2: representation, scoring function, and search process<a id=\"notebook_trajectories2\"></a>\n",
     "\n",
@@ -504,10 +504,10 @@
     "To score trajectory models, we incorporate both the scores of individual snapshot models, as well as the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.* memoryless), the weight of a trajectory model takes the form:\n",
     "\n",
     "$$\n",
-    "W(\\chi) \\propto   \\displaystyle\\prod^{T}_{t=0} P( X_{N,t}, N_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1}),\n",
+    "W(\\chi) \\propto   \\displaystyle\\prod^{T}_{t=0} P( X_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{t+1} | X_{t},D_{t,t+1}),\n",
     "$$\n",
     "\n",
-    "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{N,t}, N_{t} | D_{t})$ is the snapshot model score; and $W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n",
+    "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{t} | D_{t})$ is the snapshot model score; and $W(X_{t+1} | X_{t},D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n",
     "\n",
     "### Searching for good scoring models<a id=\"notebook_trajectory_searching\"></a>\n",
     "\n",

diff --git a/Jupyter/spatiotemporal-colab.ipynb b/Jupyter/spatiotemporal-colab.ipynb
@@ -44,7 +44,7 @@
     "Modeling of heterogeneity<a id=\"notebook_heterogeneity\"></a>\n",
     "====================================\n",
     "\n",
-    "Here, we describe the first modeling problem in our composite workflow, how to build models of heterogeneity modeling using IMP. In this tutorial, heterogeneity modeling only includes protein copy number; however, in general, other types of information, such as the coarse location in the final state, could also be included in heterogeneity models.\n",
+    "Here, we describe the first modeling problem in our composite workflow, how to build models of heterogeneity using IMP. In this tutorial, heterogeneity modeling only includes protein copy number; however, in general, other types of information, such as the coarse location in the final state, could also be included in heterogeneity models.\n",
     "\n",
     "# Heterogeneity modeling step 1: gathering of information<a id=\"notebook_heterogeneity1\"></a>\n",
     "\n",
@@ -56,7 +56,7 @@
     "\n",
     "# Heterogeneity modeling step 2: representation, scoring function, and search process<a id=\"notebook_heterogeneity2\"></a>\n",
     "\n",
-    "Next, we represent, score and search for heterogeneity models models. A single heterogeneity model is a set of protein copy numbers, scored according to its fit to experimental copy number data at that time point. As ET and SAXS data, are only available at 0 minutes, 1 minute, and 2 minutes, we choose to create heterogeneity models at these three time points. We then use `prepare_protein_library`, to calculate the protein copy numbers for each snapshot model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each of these snapshot models. The choices made in this topology file are important for the representation, scoring function, and search process for snapshot models, and are discussed later. For heterogeneity modeling, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure.\n"
+    "Next, we represent, score and search for heterogeneity models models. A single heterogeneity model is a set of protein copy numbers, scored according to its fit to experimental copy number data at that time point. As ET and SAXS data, are only available at 0 minutes, 1 minute, and 2 minutes, we choose to create heterogeneity models at these three time points. We then use `prepare_protein_library`, to calculate the protein copy numbers for each heterogeneity model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each corresponding snapshot model. The choices made in this topology file are important for the representation, scoring function, and search process for snapshot models, and are discussed later. For heterogeneity modeling, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure.\n"
    ]
   },
   {
@@ -133,7 +133,7 @@
    "id": "b56fbe48-da12-412e-ac2e-dca673e04a43",
    "metadata": {},
    "source": [
-    "From the output of `prepare_protein_library`, we see that there are 3 heterogeneity models at each time point (it is possible to have more snapshot models than copy numbers if multiple copies of the protein exist in the complex). For each heterogeneity model, we see 2 files:\n",
+    "From the output of `prepare_protein_library`, we see that there are 3 heterogeneity models at each time point (it is possible to have more heterogeneity models than copy numbers if multiple copies of the protein exist in the complex). For each heterogeneity model, we see 2 files:\n",
     "- *.config, a file with a list of proteins represented in the heterogeneity model\n",
     "- *_topol.txt, a topology file for snapshot modeling corresponding to this heterogeneity model.\n",
     "\n",
@@ -156,7 +156,7 @@
     "Modeling of snapshots<a id=\"notebook_snapshots\"></a>\n",
     "====================================\n",
     "\n",
-    "Here, we describe the second modeling problem in our composite workflow, how to build models of static snapshot models using IMP. We note that this process is similar to previous tutorials of [actin](https://integrativemodeling.org/tutorials/actin/) and [RNA PolII](https://integrativemodeling.org/tutorials/rnapolii_stalk/).\n",
+    "Here, we describe the second modeling problem in our composite workflow, how to build static snapshot models using IMP. We note that this process is similar to previous tutorials of [actin](https://integrativemodeling.org/tutorials/actin/) and [RNA PolII](https://integrativemodeling.org/tutorials/rnapolii_stalk/).\n",
     "\n",
     "# Snapshot modeling step 1: gathering of information<a id=\"notebook_snapshots1\"></a>\n",
     "\n",
@@ -181,7 +181,7 @@
     "\n",
     "Beads and Gaussians in our model belong to either a *rigid body* or *flexible string*. The positions of all beads and Gaussians in a single rigid body are constrained during sampling and do not move relative to each other. Meanwhile, flexible beads can move freely during sampling, but are restrained by sequence connectivity.\n",
     "\n",
-    "To begin, we built a topology file with the representation for the model of the complete system, `spatiotemporal_topology.txt`, located in `Heterogeneity/Heterogeneity_Modeling/`. This complete topology was used as a template to build topologies of each heterogeneity model. Based on our observation of the structure of the complex, we chose to represent each protein with at least 2 separate rigid bodies, and left the first 28 residues of protein C as flexible beads. Rigid bodies were described with 1 bead for every residue, and 10 residues per Gaussian. Flexible beads were described with 1 bead for every residue and 1 residue per Gaussian. A more complete description of the options available in topology files is available in the the [TopologyReader](https://integrativemodeling.org/2.21.0/doc/ref/classIMP_1_1pmi_1_1topology_1_1TopologyReader.html) documentation.\n",
+    "To begin, we built a topology file with the representation for the model of the complete system, `spatiotemporal_topology.txt`, located in `Heterogeneity/Heterogeneity_Modeling/`. This complete topology was used as a template to build topologies for each snapshot model. Based on our observation of the structure of the complex, we chose to represent each protein with at least 2 separate rigid bodies, and left the first 28 residues of protein C as flexible beads. Rigid bodies were described with 1 bead for every residue, and 10 residues per Gaussian. Flexible beads were described with 1 bead for every residue and 1 residue per Gaussian. A more complete description of the options available in topology files is available in the the [TopologyReader](https://integrativemodeling.org/2.21.0/doc/ref/classIMP_1_1pmi_1_1topology_1_1TopologyReader.html) documentation.\n",
     "\n",
     "```\n",
     "|molecule_name | color | fasta_fn | fasta_id | pdb_fn | chain | residue_range | pdb_offset | bead_size | em_residues_per_gaussian | rigid_body | super_rigid_body | chain_of_super_rigid_bodies | \n",
@@ -514,7 +514,7 @@
     "Trajectory models are constructed by enumerating all connections between adjacent snapshot models and scoring these trajectory models according to the equation above. This procedure results in a set of weighted trajectory models.\n",
     "\n",
     "## Computing trajectory models<a id=\"autotoc1v43\"></a>\n",
-    "To compute trajectory models, we first copy all necessary files to a new directory, `data`. These files are (i) `{state}_{time}.config` files, which include the subcomplexes that are in each state, (ii) `{state}_{time}_scores.log`, which is a list of all scores of all structural models in that snapshot model, and (iii) `exp_comp{prot}.csv`, which is the experimental copy number for each protein (`{prot}`) as a function of time. Here, we copy files related to the snapshots (`*.log` files) from the `modeling` directory, as we skipped computing snapshots due to the computational expense.\n"
+    "To compute trajectory models, we first copy all necessary files to a new directory, `data`. These files are (i) `{state}_{time}.config` files, which include the subcomplexes that are in each state, (ii) `{state}_{time}_scores.log`, which is a list of all scores of all structural models in that snapshot model, and (iii) `exp_comp{prot}.csv`, which is the experimental copy number for each protein (`{prot}`) as a function of time. Here, we copy files related to the snapshot models (`*.log` files) from the `modeling` directory, as we skipped computing snapshot models due to the computational expense.\n"
    ]
   },
   {

diff --git a/doc/heterogeneity.md b/doc/heterogeneity.md
@@ -1,7 +1,7 @@
 Modeling of heterogeneity {#heterogeneity}
 ====================================
 
-Here, we describe the first modeling problem in our composite workflow, how to build models of heterogeneity modeling using IMP. In this tutorial, heterogeneity modeling only includes protein copy number; however, in general, other types of information, such as the coarse location in the final state, could also be included in heterogeneity models.
+Here, we describe the first modeling problem in our composite workflow, how to build models of heterogeneity using IMP. In this tutorial, heterogeneity modeling only includes protein copy number; however, in general, other types of information, such as the coarse location in the final state, could also be included in heterogeneity models.
 
 # Heterogeneity modeling step 1: gathering of information {#heterogeneity1}
 
@@ -15,7 +15,7 @@ The PDB structure of the complex informs the final state of our model and constr
 
 Next, we represent, score and search for heterogeneity models models. These operations are performed by the `heterogeneity_modeling.py` in the `Heterogeneity/Heterogeneity_Modeling` folder. A single heterogeneity model is a set of protein copy numbers, scored according to its fit to experimental copy number data at that time point.
 
-As ET and SAXS data, are only available at 0 minutes, 1 minute, and 2 minutes, we choose to create heterogeneity models at these three time points. We then use `prepare_protein_library`, [documented here](https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1spatiotemporal_1_1prepare__protein__library.html), to calculate the protein copy numbers for each snapshot model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each of these snapshot models. The choices made in this topology file are important for the representation, scoring function, and search process for snapshot models, and are [discussed later.] (@ref snapshot_representation) For heterogeneity modeling, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure. 
+As ET and SAXS data, are only available at 0 minutes, 1 minute, and 2 minutes, we choose to create heterogeneity models at these three time points. We then use `prepare_protein_library`, [documented here](https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1spatiotemporal_1_1prepare__protein__library.html), to calculate the protein copy numbers for each heterogeneity model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each of the corresponding snapshot models. The choices made in this topology file are important for the representation, scoring function, and search process for snapshot models, and are [discussed later.] (@ref snapshot_representation) For heterogeneity modeling, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure. 
 
 \code{.py}
 # 1a - parameters for prepare_protein_library:
@@ -33,7 +33,7 @@ IMP.spatiotemporal.prepare_protein_library.prepare_protein_library(times, exp_co
                                                 template_topology=template_topology, template_dict=template_dict)
 \endcode
 
-From the output of `prepare_protein_library`, we see that there are 3 heterogeneity models at each time point (it is possible to have more snapshot models than copy numbers if multiple copies of the protein exist in the complex). For each heterogeneity model, we see 2 files:
+From the output of `prepare_protein_library`, we see that there are 3 heterogeneity models at each time point (it is possible to have more heterogeneity models than copy numbers if multiple copies of the protein exist in the complex). For each heterogeneity model, we see 2 files:
 - *.config, a file with a list of proteins represented in the heterogeneity model
 - *_topol.txt, a topology file for snapshot modeling corresponding to this heterogeneity model.