Commit
Showing 2 changed files with 26 additions and 37 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,30 @@ | ||
Step 1: gather information for both snapshot and trajectory modeling {#heterogeneity} | ||
Modeling of heterogeneity {#heterogeneity} | ||
==================================== | ||
|
||
# Gathering information {#gathering} | ||
# Heterogeneity modeling step 1: gather information {#heterogeneity1} | ||
|
||
In the first step, information about the process of interest is gathered. This information can theoretically be applied to either modeling static snapshots or modeling trajectories, and can be used for representing the model, scoring the model, sampling alternative models, filtering sampled models, or validating the models. | ||
# Heterogeneity modeling step 2: representation, scoring, and search process {#heterogeneity2} | ||
|
||
For this tutorial, we used the X-ray crystal structure of the complete Bmi1/Ring1b-UbcH5c complex (a), synthetically generated electron tomography (ET) density maps during the assembly process (b), synthetically generated protein copy numbers during the assembly process, which can be calculated from experiments such as fluorescence correlation spectroscopy (FCS) (c), and synthetically generated small-angle X-ray scattering (SAXS) profiles during the assembly process (d). The crystal structure of the complex informs the final state of our model as well as the structure of the individual proteins. The time-dependent ET and SAXS data give two inputs to inform the size and shape of the assembling complex. The protein copy number data informs the stoichiometry of the complex during assembly. | ||
We first must select which snapshots to model. Here, we choose only to model snapshots at 0 minutes, 1 minute, and 2 minutes because ET and SAXS data are only available at those time points. We know this complex has three protein chains (A, B, and C), and we choose to model these chains based on their protein copy number data. We then use `prepare_protein_library`, [documented here](https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1spatiotemporal_1_1prepare__protein__library.html), to calculate the protein copy numbers for each snapshot model and to use the topology file of the full complex (`spatiotemporal_topology.txt`) to generate a topology file for each of these snapshot models. Here, we choose to model 3 protein copy numbers at each time point, and restrict the final time point to have the same protein copy numbers as the PDB structure. | ||
|
||
\image html Input.png width=600px | ||
\code{.py} | ||
import IMP.spatiotemporal.prepare_protein_library | ||
# 1a - parameters for prepare_protein_library: | ||
times = ["0min", "1min", "2min"] | ||
exp_comp = {'A': '../../Input_Information/gen_FCS/exp_compA.csv', | ||
'B': '../../Input_Information/gen_FCS/exp_compB.csv', | ||
'C': '../../Input_Information/gen_FCS/exp_compC.csv'} | ||
expected_subcomplexes = ['A', 'B', 'C'] | ||
template_topology = 'spatiotemporal_topology.txt' | ||
template_dict = {'A': ['Ubi-E2-D3'], 'B': ['BMI-1'], 'C': ['E3-ubi-RING2']} | ||
nmodels = 3 | ||
|
||
These pieces of information are stored in the `Input_Information` folder. In addition to containing the raw data used for the tutorial, this folder contains the code necessary to generate the synthetic data. This code is described in `README` files in each directory, but is not the focus of our tutorial. | ||
# 1b - calling prepare_protein_library | ||
IMP.spatiotemporal.prepare_protein_library.prepare_protein_library(times, exp_comp, expected_subcomplexes, nmodels, | ||
template_topology=template_topology, template_dict=template_dict) | ||
\endcode | ||
|
||
The `FASTA` folder contains `3rpg.fasta.txt`, which provides the sequence information for each protein in the Bmi1/Ring1b-UbcH5c complex. The `PDB` folder contains the PDB structure for the fully assembled Bmi1/Ring1b-UbcH5c complex, [3RPG](https://www.rcsb.org/structure/3rpg). | ||
From the output of `prepare_protein_library`, we see that there are 3 snapshot models at each time point (it is possible to have more snapshot models than copy numbers if multiple copies of the protein exist in the complex). We then wrote `generate_all_snapshots`, which creates a directory for each snapshot, copies the necessary files into that directory, and submits a job script to run sampling. The job script will likely need to be customized for the user's computer or cluster. | ||
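The directory-preparation part of that workflow can be sketched as follows. This is only an illustration, not the tutorial's `generate_all_snapshots`: the helper name `make_snapshot_dirs`, the `Snapshot_{state}_{time}` directory naming, and the placeholder file `start_sim.py` are all assumptions, and job submission is left out because it depends on the user's computer or cluster.

```python
import shutil
import tempfile
from pathlib import Path


def make_snapshot_dirs(root, times, snapshots_per_time, files_to_copy):
    """Create one directory per snapshot and copy shared input files into it.

    Hypothetical helper: the directory naming scheme is an assumption.
    """
    created = []
    for time in times:
        for state in range(1, snapshots_per_time + 1):
            snapshot_dir = Path(root) / f"Snapshot_{state}_{time}"
            snapshot_dir.mkdir(parents=True, exist_ok=True)
            for src in files_to_copy:
                shutil.copy(src, snapshot_dir)
            created.append(snapshot_dir)
    return created


# Demonstration in a temporary directory with a placeholder input file.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "start_sim.py"  # hypothetical file name
    script.write_text("# placeholder sampling script\n")
    dirs = make_snapshot_dirs(Path(tmp) / "snapshots",
                              ["0min", "1min", "2min"], 3, [script])
    n_created = len(dirs)
    all_have_script = all((d / "start_sim.py").exists() for d in dirs)
```

A real version would finish each iteration by submitting the job script for that snapshot, for example through the scheduler available on the target cluster.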
|
||
The `gen_FCS` folder contains protein copy number data for each protein as a function of time. Our code will use the `exp_comp{prot}.csv` files, where {prot} is the protein corresponding to that copy number data. Each csv file has 3 rows, which correspond to the time at which the data was taken ("Time"), the mean protein copy number at that time ("mean"), and the standard deviation in protein copy number at that time ("std"). | ||
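As a rough illustration of how such a file might be read, the sketch below parses copy number data into a dictionary keyed by time point. It assumes that "Time", "mean", and "std" form the header of a comma-separated table with one record per time point; that layout is an assumption, and a transposed file would need a different reader. The function name `read_copy_numbers` and the numeric values are made up for the example.

```python
import csv
import io


def read_copy_numbers(handle):
    """Map each time label to its (mean, std) protein copy number."""
    reader = csv.DictReader(handle)
    return {row["Time"]: (float(row["mean"]), float(row["std"]))
            for row in reader}


# Made-up values for illustration only; not taken from the tutorial data.
example = io.StringIO(
    "Time,mean,std\n"
    "0min,0.5,0.2\n"
    "1min,1.0,0.3\n"
    "2min,1.5,0.4\n"
)
copy_numbers = read_copy_numbers(example)
```

For a file on disk, the same function could be called on `open(path, newline="")`, with `path` pointing at one of the `exp_comp{prot}.csv` files.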
|
||
The `ET_data` folder contains the time-dependent ET data. Briefly, at each time point, a subset of the Bmi1/Ring1b-UbcH5c complex proteins was used to compute a density map, and random noise was then added to this true density map. The results are stored as `add_noise/{time}_noisy.mrc` and `add_noise/{time}_noisy.gmm`, where {time} is the time point at which the time-dependent ET data was calculated. | ||
|
||
The `gen_SAXS` folder contains the time-dependent SAXS data. Experimental SAXS profiles are forward profiles calculated from the true structure by [FoXS](https://modbase.compbio.ucsf.edu/foxs/), and are stored as `{time}_exp.dat`, where {time} is the time point at which the time-dependent SAXS data was calculated. | ||
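The file naming patterns for the ET and SAXS data can be expanded per time point with a small helper. This is a sketch under assumptions: the function name `data_paths` is hypothetical, and it presumes that `ET_data` and `gen_SAXS` sit directly inside `Input_Information`, as `gen_FCS` does.

```python
from pathlib import PurePosixPath


def data_paths(time, root="Input_Information"):
    """Build the expected ET and SAXS file paths for one time point.

    Assumes ET_data and gen_SAXS are direct subfolders of `root`.
    """
    base = PurePosixPath(root)
    return {
        "et_map": base / "ET_data" / "add_noise" / f"{time}_noisy.mrc",
        "et_gmm": base / "ET_data" / "add_noise" / f"{time}_noisy.gmm",
        "saxs": base / "gen_SAXS" / f"{time}_exp.dat",
    }


paths = {time: data_paths(time) for time in ["0min", "1min", "2min"]}
saxs_1min = str(paths["1min"]["saxs"])
et_gmm_0min = str(paths["0min"]["et_gmm"])
```

Keeping the patterns in one place makes it easier to check that every modeled time point has both an ET map and a SAXS profile before sampling starts.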
|
||
In addition to the four types of data used here, a variety of data could be useful for the spatiotemporal modeling of protein complexes. IMP currently features restraints for a variety of experimental data or prior models, including chemical cross-links, Förster resonance energy transfer, comparative structural models, and deep-learning structural models, all of which could inform spatiotemporal modeling through a procedure similar to the one presented here. | ||
|
||
Next, we will demonstrate how to perform [Snapshot modeling steps 2-4: representation, scoring, and search process](@ref snapshot1). | ||
# Heterogeneity modeling step 3: assessment {#heterogeneity_assess} | ||
|