diff --git a/Jupyter/.ipynb_checkpoints/spatiotemporal-colab-checkpoint.ipynb b/Jupyter/.ipynb_checkpoints/spatiotemporal-colab-checkpoint.ipynb
index 895e7bd0a..c55a16c91 100644
--- a/Jupyter/.ipynb_checkpoints/spatiotemporal-colab-checkpoint.ipynb
+++ b/Jupyter/.ipynb_checkpoints/spatiotemporal-colab-checkpoint.ipynb
@@ -14,10 +14,10 @@
"\n",
"Biomolecules are constantly in motion; therefore, a complete depiction of their function must include their dynamics instead of just static structures. We have developed an integrative spatiotemporal approach to model dynamic systems.\n",
"\n",
- "Our approach applies a composite workflow, consisting of three modeling problems to compute (i) heterogeneity models, (ii) snapshot models, and (iii) trajectory models.\n",
+ "Our approach applies a composite workflow, consisting of three modeling problems to compute (i) heterogeneity models, (ii) snapshot models, and (iii) a trajectory model.\n",
"Heterogeneity models describe the possible biomolecular compositions of the system at each time point. Optionally, other auxiliary variables can be considered, such as the coarse location in the final state when modeling an assembly process.\n",
"For each heterogeneity model, one snapshot model is produced. A snapshot model is a set of alternative standard static integrative structure models based on the information available for the corresponding time point.\n",
- "Then, trajectory models are created by connecting alternative snapshot models at adjacent time points. These trajectory models can be scored based on both the scores of static structures and the transitions between them, allowing for the creation of trajectories that are in agreement with the input information by construction.\n",
+ "Then, a trajectory model (*i.e.*, a set of trajectories ranked by their agreement with input information) is computed by connecting alternative snapshot models at adjacent time points. Trajectories are scored based on both the scores of static structures and the transitions between them, so that the resulting trajectories agree with the input information by construction.\n",
"\n",
"If you use this tutorial or its accompanying method, please site the corresponding publications:\n",
"\n",
@@ -56,7 +56,7 @@
"\n",
"# Heterogeneity modeling step 2: representation, scoring function, and search process\n",
"\n",
- "Next, we represent, score and search for heterogeneity models models. A single heterogeneity model is a set of protein copy numbers, scored according to its fit to experimental copy number data at that time point. As ET and SAXS data, are only available at 0 minutes, 1 minute, and 2 minutes, we choose to create heterogeneity models at these three time points. We then use `prepare_protein_library`, to calculate the protein copy numbers for each snapshot model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each of these snapshot models. The choices made in this topology file are important for the representation, scoring function, and search process for snapshot models, and are discussed later. For heterogeneity modeling, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure.\n"
+ "Next, we represent, score, and search for heterogeneity models. A single heterogeneity model is a set of protein copy numbers, scored according to its fit to experimental copy number data at that time point. As ET and SAXS data are only available at 0 minutes, 1 minute, and 2 minutes, we choose to create heterogeneity models at these three time points. We then use `prepare_protein_library` to calculate the protein copy numbers for each heterogeneity model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each corresponding snapshot model. The choices made in this topology file are important for the representation, scoring function, and search process for snapshot models, and are discussed later. For heterogeneity modeling, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure.\n"
]
},
{
@@ -133,13 +133,13 @@
"id": "b56fbe48-da12-412e-ac2e-dca673e04a43",
"metadata": {},
"source": [
- "From the output of `prepare_protein_library`, we see that there are 3 heterogeneity models at each time point (it is possible to have more snapshot models than copy numbers if multiple copies of the protein exist in the complex). For each heterogeneity model, we see 2 files:\n",
+ "From the output of `prepare_protein_library`, we see that there are 3 heterogeneity models at each time point (it is possible to have more heterogeneity models than copy numbers if multiple copies of the protein exist in the complex). For each heterogeneity model, we see 2 files:\n",
"- *.config, a file with a list of proteins represented in the heterogeneity model\n",
"- *_topol.txt, a topology file for snapshot modeling corresponding to this heterogeneity model.\n",
"\n",
"# Heterogeneity modeling step 3: assessment\n",
"\n",
- "Now, we have a variety of heterogeneity models. In general, there are four ways to assess a model: estimate the sampling precision, compare the model to data used to construct it, validate the model against data not used to construct it, and quantify the precision of the model. Here, we will focus specifically on comparing the model to experimental data, as other assessments will be performed later, when the trajectory models are assessed.\n",
+ "Now, we have a variety of heterogeneity models. In general, there are four ways to assess a model: estimate the sampling precision, compare the model to data used to construct it, validate the model against data not used to construct it, and quantify the precision of the model. Here, we will focus specifically on comparing the model to experimental data, as other assessments will be performed later, when the trajectory model is assessed.\n",
"\n",
"Next, we can plot the modeled and experimental copy numbers simultaneously for each protein, as shown below for proteins A (a), B (b), and C (c).\n",
"\n",
@@ -156,7 +156,7 @@
"Modeling of snapshots\n",
"====================================\n",
"\n",
- "Here, we describe the second modeling problem in our composite workflow, how to build models of static snapshot models using IMP. We note that this process is similar to previous tutorials of [actin](https://integrativemodeling.org/tutorials/actin/) and [RNA PolII](https://integrativemodeling.org/tutorials/rnapolii_stalk/).\n",
+ "Here, we describe the second modeling problem in our composite workflow: how to build static snapshot models using IMP. We note that this process is similar to that described in previous tutorials on [actin](https://integrativemodeling.org/tutorials/actin/) and [RNA PolII](https://integrativemodeling.org/tutorials/rnapolii_stalk/).\n",
"\n",
"# Snapshot modeling step 1: gathering of information\n",
"\n",
@@ -480,7 +480,7 @@
"Modeling of a Trajectory\n",
"====================================\n",
"\n",
- "Here, we describe the final modeling problem in our composite workflow, how to build models of trajectory models using IMP.\n",
+ "Here, we describe the final modeling problem in our composite workflow: how to build a trajectory model using IMP.\n",
"\n",
"# Trajectory modeling step 1: gathering of information\n",
"\n",
@@ -492,29 +492,29 @@
"\n",
"# Trajectory modeling step 2: representation, scoring function, and search process\n",
"\n",
- "Trajectory modeling connects alternative snapshot models at adjacent time points, followed by scoring the trajectory models based on their fit to the input information, as described in full [here](https://www.biorxiv.org/content/10.1101/2024.08.06.606842v1.abstract).\n",
+ "Trajectory modeling connects alternative snapshot models at adjacent time points, followed by scoring the trajectories based on their fit to the input information, as described in full [here](https://www.biorxiv.org/content/10.1101/2024.08.06.606842v1.abstract).\n",
"\n",
"## Background behind integrative spatiotemporal modeling\n",
"### Representing the model\n",
"\n",
- "We choose to represent dynamic processes as a trajectory of snapshot models, with one snapshot model at each time point. In this case, we computed snapshot models at 3 time points (0, 1, and 2 minutes), so a single trajectory model will consist of 3 snapshot models, one at each 0, 1, and 2 minutes. The modeling procedure described here will produce a set of scored trajectory models, which can be displayed as a directed acyclic graph, where nodes in the graph represent the snapshot model and edges represent connections between snapshot models at neighboring time points.\n",
+ "We choose to represent dynamic processes as a trajectory of snapshot models, with one snapshot model at each time point. In this case, we computed snapshot models at 3 time points (0, 1, and 2 minutes), so a single trajectory will consist of 3 snapshot models, one each at 0, 1, and 2 minutes. The modeling procedure described here will produce a set of scored trajectories, which can be displayed as a directed acyclic graph, where nodes in the graph represent snapshot models and edges represent connections between snapshot models at neighboring time points.\n",
"\n",
"### Scoring the model\n",
"\n",
- "To score trajectory models, we incorporate both the scores of individual snapshot models, as well as the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.* memoryless), the weight of a trajectory model takes the form:\n",
+ "To score trajectories, we incorporate both the scores of individual snapshot models and the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.*, memoryless), the weight of a trajectory takes the form:\n",
"\n",
"$$\n",
"W(\\chi) \\propto \\displaystyle\\prod^{T}_{t=0} P( X_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{t+1} | X_{t},D_{t,t+1}),\n",
"$$\n",
"\n",
- "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{t} | D_{t})$ is the snapshot model score; and $W(X_{t+1} | X_{t},D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n",
+ "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{t} | D_{t})$ is the snapshot model score; and $W(X_{t+1} | X_{t},D_{t,t+1})$ is the transition score. Trajectory weights ($W(\\chi)$) are normalized so that the sum of all trajectory weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n",
"\n",
"### Searching for good scoring models\n",
"\n",
- "Trajectory models are constructed by enumerating all connections between adjacent snapshot models and scoring these trajectory models according to the equation above. This procedure results in a set of weighted trajectory models.\n",
+ "Trajectories are constructed by enumerating all connections between adjacent snapshot models and scoring these trajectories according to the equation above. This procedure results in a set of weighted trajectories.\n",
"\n",
"## Computing trajectory models\n",
- "To compute trajectory models, we first copy all necessary files to a new directory, `data`. These files are (i) `{state}_{time}.config` files, which include the subcomplexes that are in each state, (ii) `{state}_{time}_scores.log`, which is a list of all scores of all structural models in that snapshot model, and (iii) `exp_comp{prot}.csv`, which is the experimental copy number for each protein (`{prot}`) as a function of time. Here, we copy files related to the snapshots (`*.log` files) from the `modeling` directory, as we skipped computing snapshots due to the computational expense.\n"
+ "To compute trajectories, we first copy all necessary files to a new directory, `data`. These files are (i) `{state}_{time}.config` files, which include the subcomplexes that are in each state, (ii) `{state}_{time}_scores.log`, which is a list of all scores of all structural models in that snapshot model, and (iii) `exp_comp{prot}.csv`, which is the experimental copy number for each protein (`{prot}`) as a function of time. Here, we copy files related to the snapshot models (`*.log` files) from the `modeling` directory, as we skipped computing snapshot models due to the computational expense.\n"
]
},
{
@@ -692,13 +692,13 @@
"metadata": {},
"source": [
"After running `spatiotemporal.create_DAG`, a variety of outputs are written:\n",
- "- `cdf.txt`: the cumulative distribution function for the set of trajectory models.\n",
- "- `pdf.txt`: the probability distribution function for the set of trajectory models.\n",
- "- `labeled_pdf.txt`: Each row has 2 columns and represents a different trajectory model. The first column labels a single trajectory model as a series of snapshot models, where each snapshot model is written as `{state}_{time}|` in sequential order. The second column is the probability distribution function corresponding to that trajectory model.\n",
+ "- `cdf.txt`: the cumulative distribution function for the set of trajectories.\n",
+ "- `pdf.txt`: the probability distribution function for the set of trajectories.\n",
+ "- `labeled_pdf.txt`: Each row has 2 columns and represents a different trajectory. The first column labels a single trajectory as a series of snapshot models, where each snapshot model is written as `{state}_{time}|` in sequential order. The second column is the probability distribution function corresponding to that trajectory.\n",
"- `dag_heatmap.eps` and `dag_heatmap`: image of the directed acyclic graph from the set of models.\n",
- "- `path*.txt`: files where each row includes a `{state}_{time}` string, so that rows correspond to the states visited over that trajectory model. Files are numbered from the most likely path to the least likely path.\n",
+ "- `path*.txt`: files where each row includes a `{state}_{time}` string, so that rows correspond to the states visited over that trajectory. Files are numbered from the most likely path to the least likely path.\n",
"\n",
- "Now that we have a trajectory model, we can plot the directed acyclic graph (left) and the series of centroid models from each snapshot model along the most likely trajectory model (right). Each row corresponds to a different time point in the assembly process (0 min, 1 min, and 2 min). Each node is shaded according to its weight in the final model ($W(X_{N,t}N_{t})$). Proteins are colored as A - blue, B - orange, and C - purple.\n",
+ "Now that we have a trajectory model, we can plot the directed acyclic graph (left) and the series of centroid models from each snapshot model along the most likely trajectory (right). Each row corresponds to a different time point in the assembly process (0 min, 1 min, and 2 min). Each node is shaded according to its weight in the final model ($W(X_{N,t}N_{t})$). Proteins are colored as A - blue, B - orange, and C - purple.\n",
"\n",
"\n"
]
@@ -715,9 +715,9 @@
"\n",
"## Sampling precision\n",
"\n",
- "To begin, we calculate the sampling precision of the models. The sampling precision is calculated by using `spatiotemporal.create_DAG` to reconstruct the set of trajectory models using 2 independent sets of samplings for snapshot models. Then, the overlap between these snapshot models is evaluated using `analysis.temporal_precision`, which takes in two `labeled_pdf` files.\n",
+ "To begin, we calculate the sampling precision of the models. The sampling precision is calculated by using `spatiotemporal.create_DAG` to reconstruct the set of trajectories using 2 independent sets of samplings for snapshot models. Then, the overlap between these snapshot models is evaluated using `analysis.temporal_precision`, which takes in two `labeled_pdf` files.\n",
"\n",
- "The temporal precision can take values between 1.0 and 0.0, and indicates the overlap between the two models in trajectory space. Hence, values close to 1.0 indicate a high sampling precision, while values close to 0.0 indicate a low sampling precision. Here, the value close to 1.0 indicates that sampling does not affect the weights of the trajectory models.\n"
+ "The temporal precision can take values between 1.0 and 0.0, and indicates the overlap between the two models in trajectory space. Hence, values close to 1.0 indicate a high sampling precision, while values close to 0.0 indicate a low sampling precision. Here, the value close to 1.0 indicates that sampling does not affect the weights of the trajectories.\n"
]
},
{
@@ -864,9 +864,9 @@
"source": [
"## Model precision\n",
"\n",
- "Next, we calculate the precision of the model, using `analysis.precision`. Here, the model precision calculates the number of trajectory models with high weights. The precision ranges from 1.0 to 1/d, where d is the number of trajectory models. Values approaching 1.0 indicate the model set can be described by a single trajectory model, while values close to 1/d indicate that all trajectory models have similar weights.\n",
+ "Next, we calculate the precision of the model, using `analysis.precision`. Here, the model precision quantifies the number of trajectories with high weights. The precision ranges from 1.0 to 1/d, where d is the number of trajectories. Values approaching 1.0 indicate the model set can be described by a single trajectory, while values close to 1/d indicate that all trajectories have similar weights.\n",
"\n",
- "The `analysis.precision` function reads in the `labeled_pdf` of the complete model, and calculates the precision of the model. The value close to 1.0 indicates that the set of models can be sufficiently represented by a single trajectory model."
+ "The `analysis.precision` function reads in the `labeled_pdf` of the complete model, and calculates the precision of the model. The value close to 1.0 indicates that the set of models can be sufficiently represented by a single trajectory."
]
},
{
@@ -915,7 +915,7 @@
"id": "70a6e0a0-98d5-4b63-a839-221a4fdc493b",
"metadata": {},
"source": [
- "After comparing the model to EM data, we aimed to compare the model to copy number data, and wrote the `forward_model_copy_number` function to evaluate the copy numbers from our set of trajectory models. The output of `forward_model_copy_number` is written in `forward_model_copy_number/`. The folder contains `CN_prot_{prot}.txt` files for each protein, which have the mean and standard deviation of protein copy number at each time point. We can then plot these copy numbers from the forward models against those from the experiment, as shown below."
+ "After comparing the model to EM data, we aimed to compare the model to copy number data, and wrote the `forward_model_copy_number` function to evaluate the copy numbers from our set of trajectories. The output of `forward_model_copy_number` is written in `forward_model_copy_number/`. The folder contains `CN_prot_{prot}.txt` files for each protein, which have the mean and standard deviation of protein copy number at each time point. We can then plot these copy numbers from the forward models against those from the experiment, as shown below."
]
},
{
@@ -1058,7 +1058,7 @@
"id": "de81dea5-9d98-4424-ba64-d7a2def893b9",
"metadata": {},
"source": [
- "Here, we plot the comparison between the experimental data used in model construction and the set of trajectory models. This analysis includes the cross-correlation coefficient between the experimental EM density and the forward density of the set of sufficiently good scoring modeled structures in the highest weighted trajectory model (a), as well as comparisons between experimental and modeled protein copy numbers for proteins A (b), B (c), and C (d). Here, we see the model is in good agreement with the data used to construct it.\n",
+ "Here, we plot the comparison between the experimental data used in model construction and the set of trajectories. This analysis includes the cross-correlation coefficient between the experimental EM density and the forward density of the set of sufficiently good scoring modeled structures in the highest weighted trajectory (a), as well as comparisons between experimental and modeled protein copy numbers for proteins A (b), B (c), and C (d). Here, we see the model is in good agreement with the data used to construct it.\n",
"\n",
""
]
@@ -1268,11 +1268,11 @@
"id": "64e1a2f6-7258-462d-bde3-7734743b5aa1",
"metadata": {},
"source": [
- "Finally, we plot the results for assessing the spatiotemporal model with data not used to construct it. Comparisons are made between the centroid structure of the most populated cluster in each snapshot model at each time point and the experimental SAXS profile for 0 (a), 1 (b), and 2 (c) minutes. Further, we plot both the sampling precision (dark red) and the RMSD to the PDB structure (light red) for each snapshot model in the highest trajectory model (d).\n",
+ "Finally, we plot the results for assessing the spatiotemporal model with data not used to construct it. Comparisons are made between the centroid structure of the most populated cluster in each snapshot model at each time point and the experimental SAXS profile for 0 (a), 1 (b), and 2 (c) minutes. Further, we plot both the sampling precision (dark red) and the RMSD to the PDB structure (light red) for each snapshot model in the highest weighted trajectory (d).\n",
"\n",
"\n",
"\n",
- "To quantitatively compare the model to SAXS data, we used the $\\chi^2$ to compare each snapshot model to the experimental profile. We note that the $\\chi^2$ are substantially lower for the models along the highest trajectory model (1_0min, 1_1min, and 1_2min) than for other models, indicating that the highest weighted trajectory model is in better agreement with the experimental SAXS data than other possible trajectory models.\n",
+ "To quantitatively compare the model to SAXS data, we used the $\\chi^2$ statistic to compare each snapshot model to the experimental profile. We note that the $\\chi^2$ values are substantially lower for the models along the highest weighted trajectory (1_0min, 1_1min, and 1_2min) than for other models, indicating that the highest weighted trajectory is in better agreement with the experimental SAXS data than other possible trajectories.\n",
"\n",
"\n",
"\n",
diff --git a/Jupyter/images/Spatiotemporal_Model.png b/Jupyter/images/Spatiotemporal_Model.png
index 66571a5c9..2788a3c58 100644
Binary files a/Jupyter/images/Spatiotemporal_Model.png and b/Jupyter/images/Spatiotemporal_Model.png differ
diff --git a/Jupyter/images/static_snapshots.png b/Jupyter/images/static_snapshots.png
index d9ac15238..7a077e9e3 100644
Binary files a/Jupyter/images/static_snapshots.png and b/Jupyter/images/static_snapshots.png differ
diff --git a/Jupyter/images/static_snapshots_noCC.png b/Jupyter/images/static_snapshots_noCC.png
index 37490a9ee..16001ae83 100644
Binary files a/Jupyter/images/static_snapshots_noCC.png and b/Jupyter/images/static_snapshots_noCC.png differ
diff --git a/Jupyter/spatiotemporal-colab.ipynb b/Jupyter/spatiotemporal-colab.ipynb
index c79518689..e9e37f135 100644
--- a/Jupyter/spatiotemporal-colab.ipynb
+++ b/Jupyter/spatiotemporal-colab.ipynb
@@ -14,10 +14,10 @@
"\n",
"Biomolecules are constantly in motion; therefore, a complete depiction of their function must include their dynamics instead of just static structures. We have developed an integrative spatiotemporal approach to model dynamic systems.\n",
"\n",
- "Our approach applies a composite workflow, consisting of three modeling problems to compute (i) heterogeneity models, (ii) snapshot models, and (iii) trajectory models.\n",
+ "Our approach applies a composite workflow, consisting of three modeling problems to compute (i) heterogeneity models, (ii) snapshot models, and (iii) a trajectory model.\n",
"Heterogeneity models describe the possible biomolecular compositions of the system at each time point. Optionally, other auxiliary variables can be considered, such as the coarse location in the final state when modeling an assembly process.\n",
"For each heterogeneity model, one snapshot model is produced. A snapshot model is a set of alternative standard static integrative structure models based on the information available for the corresponding time point.\n",
- "Then, trajectory models are created by connecting alternative snapshot models at adjacent time points. These trajectory models can be scored based on both the scores of static structures and the transitions between them, allowing for the creation of trajectories that are in agreement with the input information by construction.\n",
+ "Then, a trajectory model (*i.e.*, a set of trajectories ranked by their agreement with input information) is computed by connecting alternative snapshot models at adjacent time points. Trajectories are scored based on both the scores of static structures and the transitions between them, so that the resulting trajectories agree with the input information by construction.\n",
"\n",
"If you use this tutorial or its accompanying method, please site the corresponding publications:\n",
"\n",
@@ -139,7 +139,7 @@
"\n",
"# Heterogeneity modeling step 3: assessment\n",
"\n",
- "Now, we have a variety of heterogeneity models. In general, there are four ways to assess a model: estimate the sampling precision, compare the model to data used to construct it, validate the model against data not used to construct it, and quantify the precision of the model. Here, we will focus specifically on comparing the model to experimental data, as other assessments will be performed later, when the trajectory models are assessed.\n",
+ "Now, we have a variety of heterogeneity models. In general, there are four ways to assess a model: estimate the sampling precision, compare the model to data used to construct it, validate the model against data not used to construct it, and quantify the precision of the model. Here, we will focus specifically on comparing the model to experimental data, as other assessments will be performed later, when the trajectory model is assessed.\n",
"\n",
"Next, we can plot the modeled and experimental copy numbers simultaneously for each protein, as shown below for proteins A (a), B (b), and C (c).\n",
"\n",
@@ -480,7 +480,7 @@
"Modeling of a Trajectory\n",
"====================================\n",
"\n",
- "Here, we describe the final modeling problem in our composite workflow, how to build models of trajectory models using IMP.\n",
+ "Here, we describe the final modeling problem in our composite workflow: how to build a trajectory model using IMP.\n",
"\n",
"# Trajectory modeling step 1: gathering of information\n",
"\n",
@@ -492,29 +492,29 @@
"\n",
"# Trajectory modeling step 2: representation, scoring function, and search process\n",
"\n",
- "Trajectory modeling connects alternative snapshot models at adjacent time points, followed by scoring the trajectory models based on their fit to the input information, as described in full [here](https://www.biorxiv.org/content/10.1101/2024.08.06.606842v1.abstract).\n",
+ "Trajectory modeling connects alternative snapshot models at adjacent time points, followed by scoring the trajectories based on their fit to the input information, as described in full [here](https://www.biorxiv.org/content/10.1101/2024.08.06.606842v1.abstract).\n",
"\n",
"## Background behind integrative spatiotemporal modeling\n",
"### Representing the model\n",
"\n",
- "We choose to represent dynamic processes as a trajectory of snapshot models, with one snapshot model at each time point. In this case, we computed snapshot models at 3 time points (0, 1, and 2 minutes), so a single trajectory model will consist of 3 snapshot models, one at each 0, 1, and 2 minutes. The modeling procedure described here will produce a set of scored trajectory models, which can be displayed as a directed acyclic graph, where nodes in the graph represent the snapshot model and edges represent connections between snapshot models at neighboring time points.\n",
+ "We choose to represent dynamic processes as a trajectory of snapshot models, with one snapshot model at each time point. In this case, we computed snapshot models at 3 time points (0, 1, and 2 minutes), so a single trajectory will consist of 3 snapshot models, one each at 0, 1, and 2 minutes. The modeling procedure described here will produce a set of scored trajectories, which can be displayed as a directed acyclic graph, where nodes in the graph represent snapshot models and edges represent connections between snapshot models at neighboring time points.\n",
"\n",
"### Scoring the model\n",
"\n",
- "To score trajectory models, we incorporate both the scores of individual snapshot models, as well as the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.* memoryless), the weight of a trajectory model takes the form:\n",
+ "To score trajectories, we incorporate both the scores of individual snapshot models and the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.*, memoryless), the weight of a trajectory takes the form:\n",
"\n",
"$$\n",
"W(\\chi) \\propto \\displaystyle\\prod^{T}_{t=0} P( X_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{t+1} | X_{t},D_{t,t+1}),\n",
"$$\n",
"\n",
- "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{t} | D_{t})$ is the snapshot model score; and $W(X_{t+1} | X_{t},D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n",
+ "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{t} | D_{t})$ is the snapshot model score; and $W(X_{t+1} | X_{t},D_{t,t+1})$ is the transition score. Trajectory weights ($W(\\chi)$) are normalized so that the sum of all trajectory weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n",
"\n",
"### Searching for good scoring models\n",
"\n",
- "Trajectory models are constructed by enumerating all connections between adjacent snapshot models and scoring these trajectory models according to the equation above. This procedure results in a set of weighted trajectory models.\n",
+ "Trajectories are constructed by enumerating all connections between adjacent snapshot models and scoring these trajectories according to the equation above. This procedure results in a set of weighted trajectories.\n",
"\n",
"## Computing trajectory models\n",
- "To compute trajectory models, we first copy all necessary files to a new directory, `data`. These files are (i) `{state}_{time}.config` files, which include the subcomplexes that are in each state, (ii) `{state}_{time}_scores.log`, which is a list of all scores of all structural models in that snapshot model, and (iii) `exp_comp{prot}.csv`, which is the experimental copy number for each protein (`{prot}`) as a function of time. Here, we copy files related to the snapshot models (`*.log` files) from the `modeling` directory, as we skipped computing snapshot models due to the computational expense.\n"
+ "To compute trajectories, we first copy all necessary files to a new directory, `data`. These files are (i) `{state}_{time}.config` files, which include the subcomplexes that are in each state, (ii) `{state}_{time}_scores.log`, which is a list of all scores of all structural models in that snapshot model, and (iii) `exp_comp{prot}.csv`, which is the experimental copy number for each protein (`{prot}`) as a function of time. Here, we copy files related to the snapshot models (`*.log` files) from the `modeling` directory, as we skipped computing snapshot models due to the computational expense.\n"
]
},
{
@@ -692,13 +692,13 @@
"metadata": {},
"source": [
"After running `spatiotemporal.create_DAG`, a variety of outputs are written:\n",
- "- `cdf.txt`: the cumulative distribution function for the set of trajectory models.\n",
- "- `pdf.txt`: the probability distribution function for the set of trajectory models.\n",
- "- `labeled_pdf.txt`: Each row has 2 columns and represents a different trajectory model. The first column labels a single trajectory model as a series of snapshot models, where each snapshot model is written as `{state}_{time}|` in sequential order. The second column is the probability distribution function corresponding to that trajectory model.\n",
+ "- `cdf.txt`: the cumulative distribution function for the set of trajectories.\n",
+ "- `pdf.txt`: the probability distribution function for the set of trajectories.\n",
+ "- `labeled_pdf.txt`: Each row has 2 columns and represents a different trajectory. The first column labels a single trajectory as a series of snapshot models, where each snapshot model is written as `{state}_{time}|` in sequential order. The second column is the probability distribution function corresponding to that trajectory.\n",
"- `dag_heatmap.eps` and `dag_heatmap`: image of the directed acyclic graph from the set of models.\n",
- "- `path*.txt`: files where each row includes a `{state}_{time}` string, so that rows correspond to the states visited over that trajectory model. Files are numbered from the most likely path to the least likely path.\n",
+ "- `path*.txt`: files where each row includes a `{state}_{time}` string, so that rows correspond to the states visited over that trajectory. Files are numbered from the most likely path to the least likely path.\n",
"\n",
- "Now that we have a trajectory model, we can plot the directed acyclic graph (left) and the series of centroid models from each snapshot model along the most likely trajectory model (right). Each row corresponds to a different time point in the assembly process (0 min, 1 min, and 2 min). Each node is shaded according to its weight in the final model ($W(X_{N,t}N_{t})$). Proteins are colored as A - blue, B - orange, and C - purple.\n",
+ "Now that we have a trajectory model, we can plot the directed acyclic graph (left) and the series of centroid models from each snapshot model along the most likely trajectory (right). Each row corresponds to a different time point in the assembly process (0 min, 1 min, and 2 min). Each node is shaded according to its weight in the final model ($W(X_{t})$). Proteins are colored as A - blue, B - orange, and C - purple.\n",
"\n",
"\n"
]
@@ -715,9 +715,9 @@
"\n",
"## Sampling precision\n",
"\n",
- "To begin, we calculate the sampling precision of the models. The sampling precision is calculated by using `spatiotemporal.create_DAG` to reconstruct the set of trajectory models using 2 independent sets of samplings for snapshot models. Then, the overlap between these snapshot models is evaluated using `analysis.temporal_precision`, which takes in two `labeled_pdf` files.\n",
+ "To begin, we calculate the sampling precision of the models. The sampling precision is calculated by using `spatiotemporal.create_DAG` to reconstruct the set of trajectories using 2 independent sets of samplings for snapshot models. Then, the overlap between these snapshot models is evaluated using `analysis.temporal_precision`, which takes in two `labeled_pdf` files.\n",
"\n",
- "The temporal precision can take values between 1.0 and 0.0, and indicates the overlap between the two models in trajectory space. Hence, values close to 1.0 indicate a high sampling precision, while values close to 0.0 indicate a low sampling precision. Here, the value close to 1.0 indicates that sampling does not affect the weights of the trajectory models.\n"
+ "The temporal precision can take values between 1.0 and 0.0, and indicates the overlap between the two models in trajectory space. Hence, values close to 1.0 indicate a high sampling precision, while values close to 0.0 indicate a low sampling precision. Here, the value close to 1.0 indicates that sampling does not affect the weights of the trajectories.\n"
]
},
{
@@ -864,9 +864,9 @@
"source": [
"## Model precision\n",
"\n",
- "Next, we calculate the precision of the model, using `analysis.precision`. Here, the model precision calculates the number of trajectory models with high weights. The precision ranges from 1.0 to 1/d, where d is the number of trajectory models. Values approaching 1.0 indicate the model set can be described by a single trajectory model, while values close to 1/d indicate that all trajectory models have similar weights.\n",
+ "Next, we calculate the precision of the model, using `analysis.precision`. Here, the model precision calculates the number of trajectories with high weights. The precision ranges from 1.0 to 1/d, where d is the number of trajectories. Values approaching 1.0 indicate the model set can be described by a single trajectory, while values close to 1/d indicate that all trajectories have similar weights.\n",
"\n",
- "The `analysis.precision` function reads in the `labeled_pdf` of the complete model, and calculates the precision of the model. The value close to 1.0 indicates that the set of models can be sufficiently represented by a single trajectory model."
+ "The `analysis.precision` function reads in the `labeled_pdf` of the complete model, and calculates the precision of the model. The value close to 1.0 indicates that the set of models can be sufficiently represented by a single trajectory."
]
},
{
@@ -915,7 +915,7 @@
"id": "70a6e0a0-98d5-4b63-a839-221a4fdc493b",
"metadata": {},
"source": [
- "After comparing the model to EM data, we aimed to compare the model to copy number data, and wrote the `forward_model_copy_number` function to evaluate the copy numbers from our set of trajectory models. The output of `forward_model_copy_number` is written in `forward_model_copy_number/`. The folder contains `CN_prot_{prot}.txt` files for each protein, which have the mean and standard deviation of protein copy number at each time point. We can then plot these copy numbers from the forward models against those from the experiment, as shown below."
+ "After comparing the model to EM data, we aimed to compare the model to copy number data, and wrote the `forward_model_copy_number` function to evaluate the copy numbers from our set of trajectories. The output of `forward_model_copy_number` is written in `forward_model_copy_number/`. The folder contains `CN_prot_{prot}.txt` files for each protein, which have the mean and standard deviation of protein copy number at each time point. We can then plot these copy numbers from the forward models against those from the experiment, as shown below."
]
},
{
@@ -1058,7 +1058,7 @@
"id": "de81dea5-9d98-4424-ba64-d7a2def893b9",
"metadata": {},
"source": [
- "Here, we plot the comparison between the experimental data used in model construction and the set of trajectory models. This analysis includes the cross-correlation coefficient between the experimental EM density and the forward density of the set of sufficiently good scoring modeled structures in the highest weighted trajectory model (a), as well as comparisons between experimental and modeled protein copy numbers for proteins A (b), B (c), and C (d). Here, we see the model is in good agreement with the data used to construct it.\n",
+ "Here, we plot the comparison between the experimental data used in model construction and the set of trajectories. This analysis includes the cross-correlation coefficient between the experimental EM density and the forward density of the set of sufficiently good scoring modeled structures in the highest weighted trajectory (a), as well as comparisons between experimental and modeled protein copy numbers for proteins A (b), B (c), and C (d). Here, we see the model is in good agreement with the data used to construct it.\n",
"\n",
""
]
@@ -1268,11 +1268,11 @@
"id": "64e1a2f6-7258-462d-bde3-7734743b5aa1",
"metadata": {},
"source": [
- "Finally, we plot the results for assessing the spatiotemporal model with data not used to construct it. Comparisons are made between the centroid structure of the most populated cluster in each snapshot model at each time point and the experimental SAXS profile for 0 (a), 1 (b), and 2 (c) minutes. Further, we plot both the sampling precision (dark red) and the RMSD to the PDB structure (light red) for each snapshot model in the highest trajectory model (d).\n",
+ "Finally, we plot the results for assessing the spatiotemporal model with data not used to construct it. Comparisons are made between the centroid structure of the most populated cluster in each snapshot model at each time point and the experimental SAXS profile for 0 (a), 1 (b), and 2 (c) minutes. Further, we plot both the sampling precision (dark red) and the RMSD to the PDB structure (light red) for each snapshot model in the highest weighted trajectory (d).\n",
"\n",
"\n",
"\n",
- "To quantitatively compare the model to SAXS data, we used the $\\chi^2$ to compare each snapshot model to the experimental profile. We note that the $\\chi^2$ are substantially lower for the models along the highest trajectory model (1_0min, 1_1min, and 1_2min) than for other models, indicating that the highest weighted trajectory model is in better agreement with the experimental SAXS data than other possible trajectory models.\n",
+ "To quantitatively compare the model to SAXS data, we used the $\\chi^2$ to compare each snapshot model to the experimental profile. We note that the $\\chi^2$ are substantially lower for the models along the highest weighted trajectory (1_0min, 1_1min, and 1_2min) than for other models, indicating that the highest weighted trajectory is in better agreement with the experimental SAXS data than other possible trajectories.\n",
"\n",
"\n",
"\n",
diff --git a/doc/heterogeneity.md b/doc/heterogeneity.md
index ec6b6e398..63c6a1070 100644
--- a/doc/heterogeneity.md
+++ b/doc/heterogeneity.md
@@ -39,8 +39,8 @@ From the output of `prepare_protein_library`, we see that there are 3 heterogene
# Heterogeneity modeling step 3: assessment {#heterogeneity_assess}
-Now, we have a variety of heterogeneity models. In general, there are four ways to assess a model: estimate the sampling precision, compare the model to data used to construct it, validate the model against data not used to construct it, and quantify the precision of the model. Here, we will focus specifically on comparing the model to experimental data, as other assessments will be performed later, when the [trajectory models are assessed.] (@ref trajectory_assess)
-
-In the `Heterogeneity/Heterogeneity_Assessment` folder, there is a single script, `plot_heterogeneity.m`. This script plots the modeled and experimental copy numbers simultaneously, as shown below for proteins A (a), B (b), and C (c). From these plots, we observe that the range of possible experimental copy numbers are well sampled by the heterogeneity models, indicating that we are prepared for [snapshot modeling.] (@ref snapshots)
+Now, we have a variety of heterogeneity models. In general, there are four ways to assess a model: estimate the sampling precision, compare the model to data used to construct it, validate the model against data not used to construct it, and quantify the precision of the model. Here, we will focus specifically on comparing the model to experimental data, as other assessments will be performed later, when the [trajectory model is assessed](@ref trajectory_assess).
\image html Heterogeneity_Assessment.png width=600px
+
+In the `Heterogeneity/Heterogeneity_Assessment` folder, there is a single script, `plot_heterogeneity.m`. This script plots the modeled and experimental copy numbers simultaneously, as shown in the figure for proteins A (a), B (b), and C (c). From these plots, we observe that the range of possible experimental copy numbers is well sampled by the heterogeneity models, indicating that we are prepared for [snapshot modeling](@ref snapshots).
\ No newline at end of file
diff --git a/doc/images/Spatiotemporal_Model.png b/doc/images/Spatiotemporal_Model.png
index 66571a5c9..2788a3c58 100644
Binary files a/doc/images/Spatiotemporal_Model.png and b/doc/images/Spatiotemporal_Model.png differ
diff --git a/doc/images/static_snapshots.png b/doc/images/static_snapshots.png
index d9ac15238..7a077e9e3 100644
Binary files a/doc/images/static_snapshots.png and b/doc/images/static_snapshots.png differ
diff --git a/doc/images/static_snapshots_noCC.png b/doc/images/static_snapshots_noCC.png
index 37490a9ee..16001ae83 100644
Binary files a/doc/images/static_snapshots_noCC.png and b/doc/images/static_snapshots_noCC.png differ
diff --git a/doc/mainpage.md b/doc/mainpage.md
index 3efd3fdf2..865ac838e 100644
--- a/doc/mainpage.md
+++ b/doc/mainpage.md
@@ -7,10 +7,10 @@ Integrative spatiotemporal modeling in IMP {#mainpage}
Biomolecules are constantly in motion; therefore, a complete depiction of their function must include their dynamics instead of just static structures. We have developed an integrative spatiotemporal approach to model dynamic systems.
-Our approach applies a composite workflow, consisting of three modeling problems to compute (i) heterogeneity models, (ii) snapshot models, and (iii) trajectory models.
+Our approach applies a composite workflow, consisting of three modeling problems to compute (i) heterogeneity models, (ii) snapshot models, and (iii) a trajectory model.
Heterogeneity models describe the possible biomolecular compositions of the system at each time point. Optionally, other auxiliary variables can be considered, such as the coarse location in the final state when modeling an assembly process.
For each heterogeneity model, one snapshot model is produced. A snapshot model is a set of alternative standard static integrative structure models based on the information available for the corresponding time point.
-Then, trajectory models are created by connecting alternative snapshot models at adjacent time points. These trajectory models can be scored based on both the scores of static structures and the transitions between them, allowing for the creation of trajectories that are in agreement with the input information by construction.
+Then, a set of trajectories ranked by their agreement with input information is computed by connecting alternative snapshot models at adjacent time points (*i.e.*, the “trajectory model”). This trajectory model can be scored based on both the scores of static structures and the transitions between them, allowing for the creation of trajectories that are in agreement with the input information by construction.
If you use this tutorial or its accompanying method, please cite the corresponding publications:
diff --git a/doc/trajectory.md b/doc/trajectory.md
index 7c568cff7..23ff97db3 100644
--- a/doc/trajectory.md
+++ b/doc/trajectory.md
@@ -1,7 +1,7 @@
Modeling of a Trajectory {#trajectories}
====================================
-Here, we describe the final modeling problem in our composite workflow, how to build models of trajectory models using IMP.
+Here, we describe the final modeling problem in our composite workflow: how to build a trajectory model using IMP.
# Trajectory modeling step 1: gathering of information {#trajectories1}
@@ -13,27 +13,27 @@ Snapshot models inform transitions between sampled time points, and their scores
# Trajectory modeling step 2: representation, scoring function, and search process {#trajectories2}
-Trajectory modeling connects alternative snapshot models at adjacent time points, followed by scoring the trajectory models based on their fit to the input information, as described in full [here](https://www.biorxiv.org/content/10.1101/2024.08.06.606842v1.abstract).
+Trajectory modeling connects alternative snapshot models at adjacent time points, followed by scoring the trajectories based on their fit to the input information, as described in full [here](https://www.biorxiv.org/content/10.1101/2024.08.06.606842v1.abstract).
## Background behind integrative spatiotemporal modeling
### Representing the model {#trajectory_representation}
-We choose to represent dynamic processes as a trajectory of snapshot models, with one snapshot model at each time point. In this case, we computed snapshot models at 3 time points (0, 1, and 2 minutes), so a single trajectory model will consist of 3 snapshot models, one at each 0, 1, and 2 minutes. The modeling procedure described here will produce a set of scored trajectory models, which can be displayed as a directed acyclic graph, where nodes in the graph represent the snapshot model and edges represent connections between snapshot models at neighboring time points.
+We choose to represent dynamic processes as a trajectory of snapshot models, with one snapshot model at each time point. In this case, we computed snapshot models at 3 time points (0, 1, and 2 minutes), so a single trajectory will consist of 3 snapshot models, one each at 0, 1, and 2 minutes. The modeling procedure described here will produce a set of scored trajectories, which can be displayed as a directed acyclic graph, where nodes in the graph represent snapshot models and edges represent connections between snapshot models at neighboring time points.
### Scoring the model {#trajectory_scoring}
-To score trajectory models, we incorporate both the scores of individual snapshot models, as well as the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.* memoryless), the weight of a trajectory model takes the form:
+To score trajectories, we incorporate both the scores of individual snapshot models and the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.*, memoryless), the weight of a trajectory takes the form:
\f[
W(\chi) \propto \displaystyle\prod^{T}_{t=0} P( X_{t} | D_{t}) \cdot \displaystyle\prod^{T-1}_{t=0} W(X_{t+1} | X_{t},D_{t,t+1})
\f]
-where \f$t\f$ indexes times from 0 until the final modeled snapshot (\f$T\f$); \f$P(X_{t} | D_{t})\f$ is the snapshot model score; and \f$W(X_{t+1} | X_{t}, D_{t,t+1})\f$ is the transition score. Trajectory model weights (\f$W(\chi)\f$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.
+where \f$t\f$ indexes times from 0 until the final modeled snapshot (\f$T\f$); \f$P(X_{t} | D_{t})\f$ is the snapshot model score; and \f$W(X_{t+1} | X_{t}, D_{t,t+1})\f$ is the transition score. Trajectory weights (\f$W(\chi)\f$) are normalized so that the sum of all trajectory weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.
### Searching for good scoring models {#trajectory_searching}
-Trajectory models are constructed by enumerating all connections between adjacent snapshot models and scoring these trajectory models according to the equation above. This procedure results in a set of weighted trajectory models.
+Trajectories are constructed by enumerating all connections between adjacent snapshot models and scoring these trajectories according to the equation above. This procedure results in a set of weighted trajectories.
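
The enumeration and weighting described above can be illustrated with a small, self-contained sketch. This is not the code inside `spatiotemporal.create_DAG`; it is a toy under stated assumptions. The dictionary `toy_snapshots`, the function names, and the numeric scores are all hypothetical, and snapshot scores are treated directly as probabilities for simplicity. The transition rule is the allow/disallow rule stated above: a transition is kept only if all proteins in the first snapshot model are present in the second.

```python
from itertools import product

# Hypothetical toy data: for each time point, each snapshot model label maps to
# (snapshot score treated as a probability, set of proteins it contains).
toy_snapshots = {
    "0min": {"1_0min": (0.6, {"A"}), "2_0min": (0.4, {"B"})},
    "1min": {"1_1min": (0.7, {"A", "B"}), "2_1min": (0.3, {"B", "C"})},
    "2min": {"1_2min": (1.0, {"A", "B", "C"})},
}
time_points = ["0min", "1min", "2min"]


def transition_allowed(first_proteins, second_proteins):
    """Allow a transition only if every protein in the first model is in the second."""
    return first_proteins <= second_proteins


def weigh_trajectories(snapshots, times):
    """Enumerate all trajectories and return normalized weights keyed by label tuples."""
    weights = {}
    per_time = [list(snapshots[t].items()) for t in times]
    for path in product(*per_time):
        labels = tuple(label for label, _ in path)
        proteins = [prots for _, (_, prots) in path]
        # Disallowed transitions have weight zero, so the trajectory is dropped.
        if not all(transition_allowed(a, b) for a, b in zip(proteins, proteins[1:])):
            continue
        weight = 1.0
        for _, (score, _) in path:
            weight *= score  # product of snapshot scores along the trajectory
        weights[labels] = weight
    total = sum(weights.values())
    # Normalize so that the weights of all trajectories sum to 1.0.
    return {labels: w / total for labels, w in weights.items()}


trajectory_weights = weigh_trajectories(toy_snapshots, time_points)
for labels, w in sorted(trajectory_weights.items(), key=lambda kv: -kv[1]):
    print("|".join(labels), round(w, 3))
```

In this toy, the path from `1_0min` to `2_1min` is removed because protein A is absent from the second snapshot model, and the remaining three trajectories share the total weight.
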
## Code for integrative spatiotemporal modeling {#trajectory_example}
@@ -50,7 +50,7 @@ state_dict = {'0min': 3, '1min': 3, '2min': 1}
create_data_and_copy_files(state_dict)
\endcode
-We then build the spatiotemporal graph by running `spatiotemporal.create_DAG`, [documented here](https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1spatiotemporal_1_1create__DAG.html). This function represents, scores, and searches for trajectory models.
+We then build the spatiotemporal graph by running `spatiotemporal.create_DAG`, [documented here](https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1spatiotemporal_1_1create__DAG.html). This function represents, scores, and searches for trajectories.
\code{.py}
# then the trajectory model is created based on all the copied data
@@ -81,13 +81,13 @@ The inputs we included are:
- draw_dag (bool): whether to write out an image of the directed acyclic graph.
After running `spatiotemporal.create_DAG`, a variety of outputs are written:
-- `cdf.txt`: the cumulative distribution function for the set of trajectory models.
-- `pdf.txt`: the probability distribution function for the set of trajectory models.
-- `labeled_pdf.txt`: Each row has 2 columns and represents a different trajectory model. The first column labels a single trajectory model as a series of snapshot models, where each snapshot model is written as `{state}_{time}|` in sequential order. The second column is the probability distribution function corresponding to that trajectory model.
-- `dag_heatmap.eps` and `dag_heatmap`: image of the directed acyclic graph from the set of models.
-- `path*.txt`: files where each row includes a `{state}_{time}` string, so that rows correspond to the states visited over that trajectory model. Files are numbered from the most likely path to the least likely path.
+- `cdf.txt`: the cumulative distribution function for the set of trajectories.
+- `pdf.txt`: the probability distribution function for the set of trajectories.
+- `labeled_pdf.txt`: Each row has 2 columns and represents a different trajectory. The first column labels a single trajectory as a series of snapshot models, where each snapshot model is written as `{state}_{time}|` in sequential order. The second column is the probability distribution function corresponding to that trajectory.
+- `dag_heatmap.eps` and `dag_heatmap`: image of the directed acyclic graph from the set of trajectories.
+- `path*.txt`: files where each row includes a `{state}_{time}` string, so that rows correspond to the states visited over that trajectory. Files are numbered from the most likely path to the least likely path.
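
As a rough illustration of how the `labeled_pdf.txt` layout described above might be read back, here is a minimal parsing sketch. It assumes the two columns are separated by whitespace, which is not confirmed by this tutorial; the function name `parse_labeled_pdf` and the example rows are hypothetical and are not real output.

```python
def parse_labeled_pdf(text):
    """Parse rows of the form '<state>_<time>|<state>_<time>|... <probability>'.

    Assumption: the label and the probability are separated by whitespace.
    Returns a list of (list_of_snapshot_labels, probability) tuples.
    """
    trajectories = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split once from the right so the label column stays intact.
        label, probability = line.rsplit(None, 1)
        # Each snapshot model is written as '{state}_{time}|'; drop empty pieces.
        states = [piece for piece in label.split("|") if piece]
        trajectories.append((states, float(probability)))
    return trajectories


# Hypothetical example content, made up for illustration only.
example_text = "1_0min|1_1min|1_2min| 0.9\n2_0min|1_1min|1_2min| 0.1\n"
parsed = parse_labeled_pdf(example_text)
best_states, best_probability = max(parsed, key=lambda item: item[1])
print(best_states, best_probability)
```

If the real files use a different delimiter, only the `rsplit` call would need to change.
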
-Now that we have a trajectory model, we can plot the directed acyclic graph (left) and the series of centroid models from each snapshot model along the most likely trajectory model (right). Each row corresponds to a different time point in the assembly process (0 min, 1 min, and 2 min). Each node is shaded according to its weight in the final model (\f$W(X_{N,t}N_{t})\f$). Proteins are colored as A - blue, B - orange, and C - purple.
+Now that we have a trajectory model, we can plot the directed acyclic graph (left) and the series of centroid models from each snapshot model along the most likely trajectory (right). Each row corresponds to a different time point in the assembly process (0 min, 1 min, and 2 min). Each node is shaded according to its weight in the final model (\f$W(X_{t})\f$). Proteins are colored as A - blue, B - orange, and C - purple.
\image html Spatiotemporal_Model.png width=600px
@@ -99,7 +99,7 @@ Navigate to `Trajectories/Trajectories_Assessment` and run `trajectories_assessm
## Sampling precision {#trajectory_sampling_precision}
-To begin, we calculate the sampling precision of the models. The sampling precision is calculated by using `spatiotemporal.create_DAG` to reconstruct the set of trajectory models using 2 independent sets of samplings for snapshot models. Then, the overlap between these snapshot models is evaluated using `analysis.temporal_precision`, which takes in two `labeled_pdf` files.
+To begin, we calculate the sampling precision of the models. The sampling precision is calculated by using `spatiotemporal.create_DAG` to reconstruct the set of trajectories from two independent sets of samplings for snapshot models. Then, the overlap between the two resulting sets of trajectories is evaluated using `analysis.temporal_precision`, which takes in two `labeled_pdf` files.
\code{.py}
# state_dict - universal parameter
@@ -153,7 +153,7 @@ print("")
print("")
\endcode
-The output of `analysis.temporal_precision` is written in `analysis_output_precision/temporal_precision.txt`, shown below. The temporal precision can take values between 1.0 and 0.0, and indicates the overlap between the two models in trajectory space. Hence, values close to 1.0 indicate a high sampling precision, while values close to 0.0 indicate a low sampling precision. Here, the value close to 1.0 indicates that sampling does not affect the weights of the trajectory models.
+The output of `analysis.temporal_precision` is written in `analysis_output_precision/temporal_precision.txt`, shown below. The temporal precision can take values between 1.0 and 0.0, and indicates the overlap between the two models in trajectory space. Hence, values close to 1.0 indicate a high sampling precision, while values close to 0.0 indicate a low sampling precision. Here, the value close to 1.0 indicates that sampling does not affect the weights of the trajectories.
\code{.txt}
Temporal precision between ../outputA/labeled_pdf.txt and ../outputB/labeled_pdf.txt:
@@ -162,7 +162,7 @@ Temporal precision between ../outputA/labeled_pdf.txt and ../outputB/labeled_pdf
## Model precision {#trajectory_precision}
-Next, we calculate the precision of the model, using `analysis.precision`. Here, the model precision calculates the number of trajectory models with high weights. The precision ranges from 1.0 to 1/d, where d is the number of trajectory models. Values approaching 1.0 indicate the model set can be described by a single trajectory model, while values close to 1/d indicate that all trajectory models have similar weights.
+Next, we calculate the precision of the model, using `analysis.precision`. Here, the model precision reflects how concentrated the weights are among the trajectories. The precision ranges from 1/d to 1.0, where d is the number of trajectories. Values approaching 1.0 indicate that the model set can be described by a single trajectory, while values close to 1/d indicate that all trajectories have similar weights.
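
To make the stated range concrete, the sketch below uses the sum of squared normalized weights, which is one quantity that equals 1.0 when a single trajectory carries all the weight and 1/d when d trajectories are equally weighted. Whether `analysis.precision` uses exactly this formula is an assumption here, and the function name `sum_of_squared_weights` is hypothetical.

```python
def sum_of_squared_weights(weights):
    """Return the sum of squared normalized weights (assumed precision-like measure)."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return sum(p * p for p in normalized)


# One dominant trajectory among d = 4: the measure reaches its upper bound.
single = sum_of_squared_weights([1.0, 0.0, 0.0, 0.0])
# Four equally weighted trajectories: the measure reaches its lower bound, 1/d.
uniform = sum_of_squared_weights([0.25, 0.25, 0.25, 0.25])
print(single, uniform)
```
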
\code{.py}
## 2 - calculation of precision of the model
@@ -178,7 +178,7 @@ print("")
print("")
\endcode
-The `analysis.precision` function reads in the `labeled_pdf` of the complete model, and writes the output file to `analysis_output_precision/precision.txt`, shown below. The value close to 1.0 indicates that the set of models can be sufficiently represented by a single trajectory model.
+The `analysis.precision` function reads in the `labeled_pdf` of the complete model, and writes the output file to `analysis_output_precision/precision.txt`, shown below. The value close to 1.0 indicates that the set of models can be sufficiently represented by a single trajectory.
\code{.txt}
Precision of ../Trajectories_Modeling/output/labeled_pdf.txt:
@@ -200,7 +200,7 @@ print("")
The output of `ccEM` is written in `ccEM_output/`. It contains forward densities for each snapshot model (`MRC_{state}_{time}.mrc`) and `ccEM_calculations.txt`, which contains the cross-correlation to the experimental EM profile for each snapshot model.
-After comparing the model to EM data, we aimed to compare the model to copy number data, and wrote the `forward_model_copy_number` function to evaluate the copy numbers from our set of trajectory models.
+After comparing the model to EM data, we aimed to compare the model to copy number data, and wrote the `forward_model_copy_number` function to evaluate the copy numbers from our set of trajectories.
\code{.py}
# 3b - comparison of the model to data used in modeling (copy number)
@@ -213,7 +213,7 @@ print("")
The output of `forward_model_copy_number` is written in `forward_model_copy_number/`. The folder contains `CN_prot_{prot}.txt` files for each protein, which have the mean and standard deviation of protein copy number at each time point.
-Here, we plot the comparison between the experimental data used in model construction and the set of trajectory models. This analysis includes the cross-correlation coefficient between the experimental EM density and the forward density of the set of sufficiently good scoring modeled structures in the highest weighted trajectory model (a), as well as comparisons between experimental and modeled protein copy numbers for proteins A (b), B (c), and C (d). Here, we see the model is in good agreement with the data used to construct it.
+Here, we plot the comparison between the experimental data used in model construction and the set of trajectories. This analysis includes the cross-correlation coefficient between the experimental EM density and the forward density of the set of sufficiently good scoring modeled structures in the highest weighted trajectory (a), as well as comparisons between experimental and modeled protein copy numbers for proteins A (b), B (c), and C (d). Here, we see the model is in good agreement with the data used to construct it.
\image html Spatiotemporal_Assessment_Included.png width=1200px
@@ -252,11 +252,11 @@ print("")
The output of this function is written in `RMSD_calculation_output`. The function outputs `rmsd_{state}_{time}.png` files, which plot the RMSD for each structural model within each snapshot model. This data is then summarized in `RMSD_analysis.txt`, which includes the minimum RMSD, average RMSD, and number of structural models in each snapshot model.
-Here, we plot the results for assessing the spatiotemporal model with data not used to construct it. Comparisons are made between the centroid structure of the most populated cluster in each snapshot model at each time point and the experimental SAXS profile for 0 (a), 1 (b), and 2 (c) minutes. Further, we plot both the sampling precision (dark red) and the RMSD to the PDB structure (light red) for each snapshot model in the highest trajectory model (d).
+Here, we plot the results for assessing the spatiotemporal model with data not used to construct it. Comparisons are made between the centroid structure of the most populated cluster in each snapshot model at each time point and the experimental SAXS profile for 0 (a), 1 (b), and 2 (c) minutes. Further, we plot both the sampling precision (dark red) and the RMSD to the PDB structure (light red) for each snapshot model in the highest weighted trajectory (d).
\image html Spatiotemporal_Assessment_Unused.png width=1200px
-To quantitatively compare the model to SAXS data, we used the \f$\chi^2\f$ to compare each snapshot model to the experimental profile. We note that the \f$\chi^2\f$ are substantially lower for the models along the highest trajectory model (1_0min, 1_1min, and 1_2min) than for other models, indicating that the highest weighted trajectory model is in better agreement with the experimental SAXS data than other possible trajectory models.
+To quantitatively compare the model to SAXS data, we used the \f$\chi^2\f$ to compare each snapshot model to the experimental profile. We note that the \f$\chi^2\f$ values are substantially lower for the models along the highest weighted trajectory (1_0min, 1_1min, and 1_2min) than for other models, indicating that the highest weighted trajectory is in better agreement with the experimental SAXS data than other possible trajectories.
\image html Chi2_Table.png width=600px