Add IM2Deep feature generator (#121)
* im2deep

* models + debug

* fix

* add models

* fixes in im2deep.py

* fixes in im2deep.py

* im2deep implementation

* CCS shift calculation fix

* models + plot

* add unused argument

* IM2Deep plot correct labels

* calibrate per charge option

* new models and reference

* correct format reference

* fix model name in IM2Deep

* change use of reference dataset

* IM2Deep calibrate correctly

* Implement IM2Deep package

* GUI changes im2deep

* add im2deep to pyproject

* change dockerfile

* optimise spectrum parsing

* update toml file

* added debug logging

* requested changes

* change numpy version upper limit

* fix bug

* Use newer setup-python action; revert numpy upper limit

* Fix numpy version for py311

* Update pyproject.toml

Use semi-colon instead of comma's in dependencies

* requested changes

* Update ms2rescore/feature_generators/im2deep.py

---------

Co-authored-by: rodvrees <[email protected]>
Co-authored-by: RalfG <[email protected]>
5 people authored Feb 21, 2024
1 parent d8340f5 commit 667822e
Showing 12 changed files with 416 additions and 106 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/publish.yml
@@ -14,7 +14,7 @@ jobs:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.11"

@@ -47,7 +47,7 @@ jobs:
steps:
- uses: actions/checkout@v4

- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"

11 changes: 6 additions & 5 deletions Dockerfile
@@ -1,8 +1,10 @@
FROM ubuntu:focal
FROM python:3.10

# ARG DEBIAN_FRONTEND=noninteractive

LABEL name="ms2rescore"

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ms2rescore
# ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ms2rescore

ADD pyproject.toml /ms2rescore/pyproject.toml
ADD LICENSE /ms2rescore/LICENSE
@@ -11,8 +13,7 @@ ADD MANIFEST.in /ms2rescore/MANIFEST.in
ADD ms2rescore /ms2rescore/ms2rescore

RUN apt-get update \
&& apt-get install -y python3-pip procps libglib2.0-0 libsm6 libxrender1 libxext6 \
&& rm -rf /var/lib/apt/lists/* \
&& pip3 install ms2rescore/
&& apt install -y procps git-lfs \
&& pip install /ms2rescore

ENTRYPOINT [""]
11 changes: 11 additions & 0 deletions docs/source/config_schema.md
@@ -10,6 +10,7 @@
- **`deeplc`**: Refer to *[#/definitions/deeplc](#definitions/deeplc)*.
- **`maxquant`**: Refer to *[#/definitions/maxquant](#definitions/maxquant)*.
- **`ionmob`**: Refer to *[#/definitions/ionmob](#definitions/ionmob)*.
- **`im2deep`**: Refer to *[#/definitions/im2deep](#definitions/im2deep)*.
- **`rescoring_engine`** *(object)*: Rescoring engine to use and its configuration. Leave empty to skip rescoring and write features to file. Default: `{"mokapot": {}}`.
- **`.*`**: Refer to *[#/definitions/rescoring_engine](#definitions/rescoring_engine)*.
- **`percolator`**: Refer to *[#/definitions/percolator](#definitions/percolator)*.
@@ -43,6 +44,14 @@
- **One of**
- *string*
- *null*
- **`psm_id_rt_pattern`**: Regex pattern to extract retention time from the PSM identifier. Requires at least one capturing group. Default: `null`.
- **One of**
- *string*
- *null*
- **`psm_id_im_pattern`**: Regex pattern to extract ion mobility from the PSM identifier. Requires at least one capturing group. Default: `null`.
- **One of**
- *string*
- *null*
- **`psm_id_pattern`**: Regex pattern to extract index or scan number from PSM file. Requires at least one capturing group. Default: `"(.*)"`.
- **One of**
- *string*
@@ -75,6 +84,8 @@
- **`ionmob_model`** *(string)*: Path to Ionmob model directory. Default: `"GRUPredictor"`.
- **`reference_dataset`** *(string)*: Path to Ionmob reference dataset file. Default: `"Meier_unimod.parquet"`.
- **`tokenizer`** *(string)*: Path to tokenizer json file. Default: `"tokenizer.json"`.
- <a id="definitions/im2deep"></a>**`im2deep`** *(object)*: Ion mobility feature generator configuration using IM2Deep. Can contain additional properties. Refer to *[#/definitions/feature_generator](#definitions/feature_generator)*.
- **`reference_dataset`** *(string)*: Path to IM2Deep reference dataset file. Default: `"Meier_unimod.parquet"`.
- <a id="definitions/mokapot"></a>**`mokapot`** *(object)*: Mokapot rescoring engine configuration. Additional properties are passed to the Mokapot brew function. Can contain additional properties. Refer to *[#/definitions/rescoring_engine](#definitions/rescoring_engine)*.
- **`write_weights`** *(boolean)*: Write Mokapot weights to a text file. Default: `false`.
- **`write_txt`** *(boolean)*: Write Mokapot results to a text file. Default: `false`.
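
As a rough illustration of how the new options might fit together, the sketch below writes a configuration as a Python dict and applies a `psm_id_im_pattern` to a made-up PSM identifier. The `ms2rescore` top-level key, the example identifier, the regex itself, and the use of `re.search` are assumptions for illustration; only the option names and the `reference_dataset` default come from the schema above.

```python
import re

# Hypothetical configuration; key names follow the schema above,
# the nesting under "ms2rescore" is an assumption.
config = {
    "ms2rescore": {
        "feature_generators": {
            "basic": {},
            "im2deep": {"reference_dataset": "Meier_unimod.parquet"},
        },
        # Exactly one capturing group, which should match the ion mobility value
        "psm_id_im_pattern": r"im=([0-9.]+)",
    }
}

# Made-up PSM identifier that embeds retention time and ion mobility
psm_id = "scan=1042;rt=35.2;im=0.987"

match = re.search(config["ms2rescore"]["psm_id_im_pattern"], psm_id)
ion_mobility = float(match.group(1)) if match else None
print(ion_mobility)
```

A pattern without a capturing group would leave nothing to extract, which is presumably why the schema requires at least one.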
14 changes: 10 additions & 4 deletions ms2rescore/core.py
@@ -8,7 +8,7 @@

from ms2rescore.feature_generators import FEATURE_GENERATORS
from ms2rescore.parse_psms import parse_psms
from ms2rescore.parse_spectra import get_missing_values
from ms2rescore.parse_spectra import fill_missing_values
from ms2rescore.report import generate
from ms2rescore.rescoring_engines import mokapot, percolator

@@ -55,11 +55,17 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:
)

# TODO: avoid hard coding feature generators in some way
rt_required = "deeplc" in config["feature_generators"] and None in psm_list["retention_time"]
im_required = "ionmob" in config["feature_generators"] and None in psm_list["ion_mobility"]
rt_required = ("deeplc" in config["feature_generators"]) and (
None in psm_list["retention_time"]
)
im_required = (
"ionmob" in config["feature_generators"] or "im2deep" in config["feature_generators"]
) and (None in psm_list["ion_mobility"])
logger.debug(f"RT required: {rt_required}, IM required: {im_required}")

if rt_required or im_required:
logger.info("Parsing missing retention time and/or ion mobility values from spectra...")
get_missing_values(config, psm_list, missing_rt=rt_required, missing_im=im_required)
fill_missing_values(config, psm_list, missing_rt=rt_required, missing_im=im_required)

# Add rescoring features
for fgen_name, fgen_config in config["feature_generators"].items():
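
A note on testing for several generator names at once: in Python, `"a" or "b" in d` parses as `"a" or ("b" in d)`, so it always yields the truthy string `"a"` regardless of what `d` contains. A minimal sketch of an explicit membership check follows; the helper `any_generator_enabled` and the sample config are hypothetical and not part of this commit.

```python
def any_generator_enabled(feature_generators, names=("ionmob", "im2deep")):
    """Return True if at least one of `names` is a key in `feature_generators`."""
    return any(name in feature_generators for name in names)


# Sample config without any ion mobility feature generator
config = {"feature_generators": {"basic": {}, "deeplc": {}}}

# Parsed as `"ionmob" or ("im2deep" in ...)`: short-circuits to the string "ionmob"
naive = "ionmob" or "im2deep" in config["feature_generators"]

# Explicit check: each name is tested against the mapping
explicit = any_generator_enabled(config["feature_generators"])

print(repr(naive), explicit)
```

With the explicit form, spectrum files only need to be parsed for ion mobility when one of the listed generators is actually configured.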
2 changes: 2 additions & 0 deletions ms2rescore/feature_generators/__init__.py
@@ -7,11 +7,13 @@
from ms2rescore.feature_generators.ionmob import IonMobFeatureGenerator
from ms2rescore.feature_generators.maxquant import MaxQuantFeatureGenerator
from ms2rescore.feature_generators.ms2pip import MS2PIPFeatureGenerator
from ms2rescore.feature_generators.im2deep import IM2DeepFeatureGenerator

FEATURE_GENERATORS = {
"basic": BasicFeatureGenerator,
"ms2pip": MS2PIPFeatureGenerator,
"deeplc": DeepLCFeatureGenerator,
"maxquant": MaxQuantFeatureGenerator,
"ionmob": IonMobFeatureGenerator,
"im2deep": IM2DeepFeatureGenerator,
}
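
The mapping above lets a generator be selected by its configuration key. A minimal sketch of that lookup pattern is shown below, using hypothetical stand-in classes (`DemoGenerator`, `DemoIM2Deep`) and a hypothetical `build_generators` helper rather than the real MS²Rescore classes; how `core.py` actually instantiates generators is not visible in this diff.

```python
class DemoGenerator:
    """Hypothetical stand-in for a feature generator base class."""

    def __init__(self, **kwargs):
        self.options = kwargs


class DemoIM2Deep(DemoGenerator):
    """Hypothetical stand-in; not the real IM2DeepFeatureGenerator."""


# Name-to-class registry, analogous in shape to FEATURE_GENERATORS
DEMO_REGISTRY = {"im2deep": DemoIM2Deep}


def build_generators(fgen_configs, registry=DEMO_REGISTRY):
    """Instantiate one generator per configured name; fail early on unknown names."""
    generators = []
    for name, options in fgen_configs.items():
        if name not in registry:
            raise ValueError(f"Unknown feature generator: {name!r}")
        generators.append(registry[name](**options))
    return generators


generators = build_generators({"im2deep": {"calibrate_per_charge": True}})
print(type(generators[0]).__name__, generators[0].options)
```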
228 changes: 228 additions & 0 deletions ms2rescore/feature_generators/im2deep.py
@@ -0,0 +1,228 @@
"""
IM2Deep ion mobility-based feature generator.
IM2Deep is a fully modification-aware peptide ion mobility predictor. It uses a deep convolutional
neural network to predict collisional cross section (CCS) values based on the atomic composition
of the (modified) amino acid residues in the peptide. See
`github.com/compomics/IM2Deep <https://github.com/compomics/IM2Deep>`_ for more information.
"""

import contextlib
import logging
import os
from inspect import getfullargspec
from itertools import chain
from typing import List, Optional

import numpy as np
from im2deep.im2deep import predict_ccs
from psm_utils import PSMList

from ms2rescore.feature_generators.base import FeatureGeneratorBase

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
logger = logging.getLogger(__name__)


class IM2DeepFeatureGenerator(FeatureGeneratorBase):
"""IM2Deep collision cross section feature generator."""

def __init__(
self,
*args,
lower_score_is_better: bool = False,
spectrum_path: Optional[str] = None,
processes: int = 1,
calibrate_per_charge: bool = True,
**kwargs,
):
"""
Initialize the IM2DeepFeatureGenerator.
Parameters
----------
lower_score_is_better : bool, optional
A boolean indicating whether lower scores are better for the generated features.
spectrum_path : str or None, optional
Optional path to the spectrum file used for IM2Deep predictions.
processes : int, optional
Number of parallel processes to use for IM2Deep predictions.
calibrate_per_charge : bool, optional
A boolean indicating whether to calibrate CCS values per charge state.
**kwargs : dict, optional
Additional keyword arguments.
Returns
-------
None
"""
super().__init__(*args, **kwargs)
self.lower_score_is_better = lower_score_is_better
self.spectrum_path = spectrum_path
self.processes = processes
self.deeplc_kwargs = kwargs or {}

self._verbose = logger.getEffectiveLevel() <= logging.DEBUG

# Lazy-load DeepLC
from deeplc import DeepLC

self.im2deep = DeepLC

# Remove any kwargs that are not DeepLC arguments
self.im2deep_kwargs = {
k: v for k, v in self.deeplc_kwargs.items() if k in getfullargspec(DeepLC).args
}
self.im2deep_kwargs.update({"config_file": None})

# TODO: Implement im2deep_retrain?

self.im2deep_predictor = None
self.calibrate_per_charge = calibrate_per_charge

@property
def feature_names(self) -> List[str]:
return [
"ccs_observed_im2deep",
"ccs_predicted_im2deep",
"ccs_error_im2deep",
"abs_ccs_error_im2deep",
"perc_ccs_error_im2deep",
]

def add_features(self, psm_list: PSMList) -> None:
"""Add IM2Deep-derived features to PSMs"""

logger.info("Adding IM2Deep-derived features to PSMs")

# Get easy-access nested version of PSMlist
psm_dict = psm_list.get_psm_dict()

# Run IM2Deep for each spectrum file
current_run = 1
total_runs = sum(len(runs) for runs in psm_dict.values())

for runs in psm_dict.values():
# Reset IM2Deep predictor for each collection of runs
self.im2deep_predictor = None
self.selected_model = None
for run, psms in runs.items():
logger.info(
f"Running IM2Deep for PSMs from run ({current_run}/{total_runs}): `{run}`..."
)

# Disable wild logging to stdout by TensorFlow, unless in debug mode
with (
contextlib.redirect_stdout(open(os.devnull, "w"))
if not self._verbose
else contextlib.nullcontext()
):
# Make new PSM list for this run (chain PSMs per spectrum to flat list)
psm_list_run = PSMList(psm_list=list(chain.from_iterable(psms.values())))

logger.debug("Calibrating IM2Deep...")

# Convert ion mobility to CCS and calibrate CCS values
psm_list_run_df = psm_list_run.to_dataframe()
psm_list_run_df["charge"] = [
peptidoform.precursor_charge
for peptidoform in psm_list_run_df["peptidoform"]
]
psm_list_run_df["ccs_observed"] = psm_list_run_df.apply(
lambda x: self.im2ccs(
x["ion_mobility"],
x["precursor_mz"], # TODO: Why does ionmob use calculated mz?
x["charge"],
),
axis=1,
)

# Create dataframe with high confidence hits for calibration
cal_psm_df = self.make_cal_df(psm_list_run_df)

# Make predictions with IM2Deep
logger.debug("Predicting CCS values...")
calibrated_predictions = predict_ccs(
psm_list_run, cal_psm_df, write_output=False
)

# Add features to PSMs
logger.debug("Adding features to PSMs...")
predictions = calibrated_predictions
observations = psm_list_run_df["ccs_observed"]
# Signed error; the absolute and percentage errors are derived from it below
ccs_diffs_run = predictions - observations
for i, psm in enumerate(psm_list_run):
psm["rescoring_features"].update(
{
"ccs_observed_im2deep": observations[i],
"ccs_predicted_im2deep": predictions[i],
"ccs_error_im2deep": ccs_diffs_run[i],
"abs_ccs_error_im2deep": np.abs(ccs_diffs_run[i]),
"perc_ccs_error_im2deep": np.abs(ccs_diffs_run[i])
/ observations[i]
* 100,
}
)

current_run += 1

def im2ccs(self, reverse_im, mz, charge, mass_gas=28.013, temp=31.85, t_diff=273.15):
"""
Convert ion mobility to CCS.
Parameters
----------
reverse_im : float
Inverse reduced ion mobility (1/K0).
mz : float
Precursor m/z.
charge : int
Precursor charge.
mass_gas : float, optional
Mass of the drift gas in Da, by default 28.013 (molecular nitrogen)
temp : float, optional
Temperature in Celsius, by default 31.85
t_diff : float, optional
Offset to convert Celsius to Kelvin, by default 273.15
Returns
-------
float
Collisional cross section (CCS).
Notes
-----
Adapted from theGreatHerrLebert/ionmob (https://doi.org/10.1093/bioinformatics/btad486)
"""

SUMMARY_CONSTANT = 18509.8632163405
reduced_mass = (mz * charge * mass_gas) / (mz * charge + mass_gas)
return (SUMMARY_CONSTANT * charge) / (
np.sqrt(reduced_mass * (temp + t_diff)) * 1 / reverse_im
)

# TODO: replace threshold by identified psms?
def make_cal_df(self, psm_list_df, threshold=0.95):
"""Make dataframe for calibration of IM2Deep predictions.
Parameters
----------
psm_list_df : pd.DataFrame
DataFrame with PSMs.
threshold : float, optional
Percentile rank of the PSM score above which a PSM counts as a high confidence hit, by
default 0.95 (i.e., the top 5% of scores).
Returns
-------
pd.DataFrame
DataFrame with high confidence hits for calibration."""

psm_list_df = psm_list_df[
psm_list_df["charge"] < 5
] # predictions do not go higher for IM2Deep
high_conf_hits = list(
psm_list_df["spectrum_id"][psm_list_df["score"].rank(pct=True) > threshold]
)
logger.debug(
f"Number of high confidence hits for calculating shift: {len(high_conf_hits)}"
)
# Filter df for high_conf_hits
cal_psm_df = psm_list_df[psm_list_df["spectrum_id"].isin(high_conf_hits)]
return cal_psm_df
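
For reference, the conversion in `im2ccs` can be reproduced without NumPy. The sketch below mirrors the expression in the diff (which appears to be a Mason-Schamp-type relation, adapted from ionmob) using only the standard library. The function name `inverse_mobility_to_ccs` and the example input values are made up for illustration.

```python
import math

SUMMARY_CONSTANT = 18509.8632163405  # constant copied from im2ccs in the diff


def inverse_mobility_to_ccs(
    one_over_k0, mz, charge, mass_gas=28.013, temp=31.85, t_diff=273.15
):
    """Convert inverse reduced ion mobility (1/K0) to CCS, as in im2ccs."""
    ion_mass = mz * charge  # mass term as used in the diff
    reduced_mass = (ion_mass * mass_gas) / (ion_mass + mass_gas)
    temperature_k = temp + t_diff  # Celsius to Kelvin
    # Dividing by (sqrt(...) * 1 / one_over_k0) equals multiplying by one_over_k0
    return SUMMARY_CONSTANT * charge * one_over_k0 / math.sqrt(reduced_mass * temperature_k)


# Illustrative input: doubly charged precursor at m/z 500 with 1/K0 = 1.0
ccs = inverse_mobility_to_ccs(one_over_k0=1.0, mz=500.0, charge=2)
print(f"CCS: {ccs:.1f}")
```

For a fixed m/z and charge, the result scales linearly with 1/K0, so a relative error in the measured mobility carries over unchanged to the observed CCS.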
28 changes: 27 additions & 1 deletion ms2rescore/gui/app.py
@@ -359,15 +359,20 @@ def __init__(self, *args, **kwargs):
self.deeplc_config = DeepLCConfiguration(self)
self.deeplc_config.grid(row=2, column=0, pady=(0, 20), sticky="nsew")

self.im2deep_config = Im2DeepConfiguration(self)
self.im2deep_config.grid(row=3, column=0, pady=(0, 20), sticky="nsew")

self.ionmob_config = IonmobConfiguration(self)
self.ionmob_config.grid(row=3, column=0, pady=(0, 20), sticky="nsew")
self.ionmob_config.grid(row=4, column=0, pady=(0, 20), sticky="nsew")

def get(self) -> Dict:
"""Return the configuration as a dictionary."""
basic_enabled, basic_config = self.basic_config.get()
ms2pip_enabled, ms2pip_config = self.ms2pip_config.get()
deeplc_enabled, deeplc_config = self.deeplc_config.get()
im2deep_enabled, im2deep_config = self.im2deep_config.get()
ionmob_enabled, ionmob_config = self.ionmob_config.get()

config = {}
if basic_enabled:
config["basic"] = basic_config
@@ -522,6 +527,27 @@ def get(self) -> Dict:
return enabled, config


class Im2DeepConfiguration(ctk.CTkFrame):
def __init__(self, *args, **kwargs):
"""IM2Deep configuration frame."""
super().__init__(*args, **kwargs)

self.configure(fg_color="transparent")
self.grid_columnconfigure(0, weight=1)

self.title = widgets.Heading(self, text="im2deep")
self.title.grid(row=0, column=0, columnspan=2, pady=(0, 5), sticky="ew")

self.enabled = widgets.LabeledSwitch(self, label="Enable im2deep", default=False)
self.enabled.grid(row=1, column=0, pady=(0, 10), sticky="nsew")

def get(self) -> Dict:
"""Return the configuration as a dictionary."""
enabled = self.enabled.get()
config = {}
return enabled, config


class RescoringEngineConfig(ctk.CTkFrame):
def __init__(self, *args, **kwargs):
"""Rescoring engine configuration frame."""