
Data for paper: Protein shape sampled by ion mobility mass spectrometry consistently improves protein structure prediction


smturzo/IMMS


IMMS

Supplementary data for the paper: Protein shape sampled by ion mobility mass spectrometry consistently improves protein structure prediction

Authors: SM Bargeen Alam Turzo, Justin T. Seffernick, Amber D. Rolland, Micah T. Donor, Sten Heinze, James S. Prell, Vicki Wysocki, Steffen Lindert

The folders within this directory contain:

NC_DATA : This directory contains all the output files from the per_residue_solvent_exposure application.
Paper_Figures : This directory contains all the scripts necessary to regenerate the figures in the paper mentioned above.
Top_Predicted_Structures : This directory contains all the top predicted structures from AlphaFold2, RoseTTAFold, Ion-Mobility Score Function, Rosetta Score Function, and Proxy Score Function.
Example_Commands : This directory contains example scripts that were used to run AlphaFold2, RoseTTAFold, and Rosetta (ab initio and comparative modeling).
Example_Data : This directory contains the example data (zipped) for 3A1Y (PDB ID from the ideal dataset) on which rescoring with IM can be performed for reproducibility.

NC_DATA

This directory contains the following folders and file:
  • CCS_0percent_reduced_NC: Output files containing the neighbor count (NC) results for models predicted with the IM score function when CCS_Ideal was used without reduction.

  • CCS_2percent_reduced_NC: Output files containing the neighbor count (NC) results for models predicted with the IM score function when CCS_Ideal was reduced by 2%.

  • CCS_5percent_reduced_NC: Output files containing the neighbor count (NC) results for models predicted with the IM score function when CCS_Ideal was reduced by 5%.

  • CCS_10percent_reduced_NC: Output files containing the neighbor count (NC) results for models predicted with the IM score function when CCS_Ideal was reduced by 10%.

  • CCS_20percent_reduced_NC: Output files containing the neighbor count (NC) results for models predicted with the IM score function when CCS_Ideal was reduced by 20%.

  • CCS_30percent_reduced_NC: Output files containing the neighbor count (NC) results for models predicted with the IM score function when CCS_Ideal was reduced by 30%.

  • native_neighbor_count: Output files containing the neighbor count (NC) results for the native crystal structures.

  • All output files from the per_residue_solvent_exposure application are named XXXX.per_res_solv_expo, where XXXX is the PDB ID of the protein from the ideal dataset.

  • The file rgs_with_reduced_ccs_data_and_of_native is a comma-separated file containing the radius of gyration for all the models predicted with IM for the 60 proteins in the ideal dataset when the CCS was not reduced (0) and when it was reduced by 2%, 5%, 10%, 20%, and 30%. It also contains the radius of gyration for the native crystal structures.
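
The folder and file naming described above can be turned into a small path helper. This is a minimal sketch, not part of the repository: the function name `nc_output_path` and the `root` argument are hypothetical, and it assumes the PDB ID in the file name is written exactly as you pass it in (the letter case used in the repository is not stated here).

```python
from pathlib import Path

# CCS reduction levels (in percent) that have their own folder in NC_DATA.
REDUCTION_LEVELS = (0, 2, 5, 10, 20, 30)


def nc_output_path(root, pdb_id, reduction=None):
    """Build the expected path of a per_residue_solvent_exposure output file.

    reduction=None selects the native crystal structure folder;
    otherwise it must be one of REDUCTION_LEVELS.
    (Hypothetical helper; only the folder and file names come from the README.)
    """
    if reduction is None:
        folder = "native_neighbor_count"
    elif reduction in REDUCTION_LEVELS:
        folder = f"CCS_{reduction}percent_reduced_NC"
    else:
        raise ValueError(f"no NC_DATA folder for a {reduction}% CCS reduction")
    return Path(root) / folder / f"{pdb_id}.per_res_solv_expo"
```

For example, `nc_output_path("NC_DATA", "3A1Y", 5)` points at the neighbor count output for 3A1Y when CCS_Ideal was reduced by 5%, and omitting `reduction` points at the corresponding native file.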

Paper_Figures

  • This directory contains all the Python scripts needed to regenerate Figures 2-6 and Figures S1-S3, each in its appropriate directory.
    • figure2.py : This script will regenerate Fig 2a, 2b, 2c, 2d in the directory FIG2.
      • The FIG2 directory has the files :
        • impact_parcs_ccs_time.csv : This file contains CCS and Timing data of PARCS and IMPACT.
        • parcs_pred_exp_ccs.csv : This file contains predicted CCS from PARCS and experimental CCS for proteins in the experimental dataset.
        • std_dev_ccsparcs_randrot.csv : This file contains the average standard deviation data from varying random rotations for the PARCS evaluation dataset.
    • figure3_figure4.py : This script will regenerate Fig 3 a(i), a(ii), b(i), b(ii) in the directory FIG3. It will also regenerate Fig 4 a(i), a(ii) and c in the directory FIG4.
      • The FIG3 directory also has:
        • Blender images of native 2ZZQ, 3KYJ and 3VZ9 aligned to the proxy score function predicted model, Rosetta score function predicted model and IM predicted model. These images correspond to panel c in Fig 3.
      • The FIG4 directory also has:
        • Blender images of native 3ENI aligned to the AF predicted model and IM predicted model. It also has Blender images of 4D6F aligned to the RF predicted model and IM predicted model. All of these correspond to panel b in Fig 4.
    • figure5.py : This script will regenerate Fig 5 a(i), a(ii), b(i) and b(ii) in the directory FIG5.
      • The FIG5 directory also has:
        • Blender images of native 1QOY, 1VXG, and 1YTQ aligned to their respective predicted models from the proxy score function, Rosetta score function and IM score function. These images are used in panel c of Fig 5.
    • figure6a_figure6b.py : This script will regenerate Fig6 a_i_1qoy, a_i_1ytq, a_i_2mlt, a_i_1vxg, a_ii_1qoy, a_ii_1ytq, a_ii_2mlt, a_ii_1vxg, b_i, b_ii that were used in panel (a) and (b) of Fig 6.
      • The FIG6 directory also has:
        • All scores (RS and IM), RMSD, and TM-score values for 1QOY, 1YTQ, 2MLT, 1VXG that were used to create the score vs RMSD and score vs TM-score plots.
  • This directory also contains the directories Exp_Score_File_Pnear_Results and Ideal_Score_File_Pnear_Results.
    • These two folders contain all raw data from structure prediction results for both the experimental and ideal datasets.
    • These two folders also contain the RG scores from the proxy score functions for both the experimental and ideal datasets.

Top_Predicted_Structures

  • This directory has the folders containing top predicted structures from AlphaFold2 (AF), RoseTTAFold (RF), Rosetta Score Function (RS), Ion-Mobility Score Function (IM), and Proxy Score Function (RG), as well as the native crystal structures as reference in the folder NATIVES.
    • IM : This folder is subdivided into two folders: one containing the top IM score predicted structures for the experimental dataset (EXP) and the other for the ideal dataset (IDEAL).
    • RS : This folder is subdivided into two folders: one containing the top RS score predicted structures for the experimental dataset (EXP) and the other for the ideal dataset (IDEAL).
    • RG : This folder is subdivided into two folders: one containing the top RG score predicted structures for the experimental dataset (EXP) and the other for the ideal dataset (IDEAL).
    • RF : This folder is subdivided into two folders: one containing the top RF predicted structures for the experimental dataset (EXP) and the other for the ideal dataset (IDEAL).
    • AF : This folder is subdivided into two folders: one containing the top AF predicted structures for the experimental dataset (EXP) and the other for the ideal dataset (IDEAL).
    • NATIVES : This folder is subdivided into two folders: one containing the native crystal structures for the experimental dataset (EXP) and the other for the ideal dataset (IDEAL).
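
The layout described above (one folder per method, each split by dataset) can be enumerated with a short script. This is a sketch under assumptions: it takes EXP and IDEAL to be the literal subfolder names, and the function name `structure_dirs` is hypothetical rather than something shipped in this repository.

```python
from itertools import product
from pathlib import Path

# Method folders listed in the README; NATIVES holds the reference structures.
METHODS = ("IM", "RS", "RG", "RF", "AF", "NATIVES")
# Assumed literal subfolder names for the experimental and ideal datasets.
DATASETS = ("EXP", "IDEAL")


def structure_dirs(root="Top_Predicted_Structures"):
    """Return every expected method/dataset directory under root."""
    return [Path(root) / method / dataset
            for method, dataset in product(METHODS, DATASETS)]


if __name__ == "__main__":
    # Report which of the expected directories are present locally.
    for directory in structure_dirs():
        status = "found" if directory.is_dir() else "missing"
        print(f"{directory.as_posix()}: {status}")
```

Running the script from the repository root is a quick way to check that a clone or download contains all twelve expected folders.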

Example_Data

Example_Commands

Instructions to run PARCS and the IM score function

Note 1: These instructions can also be found in the SI of our paper, "Protein shape sampled by ion mobility mass spectrometry consistently improves protein structure prediction".
Note 2: Since PARCS (for predicting collision cross sections in Rosetta) and IM (for scoring structures with IM data) are new implementations in Rosetta, users have access to this code through the developer license agreement of Rosetta. These implementations will also be available in the next release of Rosetta.
