merge im2deep into timsrescore branch #121

Merged 47 commits on Feb 21, 2024

Commits
b45007f
im2deep
rodvrees Jan 9, 2024
0330096
Merge remote-tracking branch 'origin/timsRescore' into timsRescore
rodvrees Jan 10, 2024
f6c9564
models + debug
rodvrees Jan 10, 2024
7585d51
fix
rodvrees Jan 10, 2024
a0f792f
add models
rodvrees Jan 10, 2024
b4660fd
fixes in im2deep.py
rodvrees Jan 11, 2024
1a18a42
fixes in im2deep.py
rodvrees Jan 11, 2024
b93df98
im2deep implementation
rodvrees Jan 11, 2024
782c7a0
CCS shift calculation fix
rodvrees Jan 12, 2024
7604153
models + plot
rodvrees Jan 12, 2024
aa9cde0
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 16, 2024
6ccb429
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 17, 2024
00d6704
add unused argument
ArthurDeclercq Jan 17, 2024
2645378
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 17, 2024
a99b492
IM2Deep plot correct labels
rodvrees Jan 22, 2024
1f6e973
calibrate per charge option
rodvrees Jan 26, 2024
301d632
Merge branch 'timsRescore' of github.com:rodvrees/ms2rescore into tim…
rodvrees Jan 26, 2024
dd8bd18
new models and reference
rodvrees Jan 29, 2024
ad0f3d7
correct format reference
rodvrees Jan 29, 2024
3a5b1ab
fix model name in IM2Deep
rodvrees Jan 29, 2024
3b4899a
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 31, 2024
fd09f88
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 31, 2024
012e9d0
change use of reference dataset
ArthurDeclercq Jan 31, 2024
d07c3b6
Merge branch 'timsRescore' of https://github.com/rodvrees/ms2rescore …
ArthurDeclercq Jan 31, 2024
13155e6
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 31, 2024
aea9d9f
IM2Deep calibrate correctly
rodvrees Jan 31, 2024
52d0761
Merge branch 'timsRescore' of github.com:rodvrees/ms2rescore into tim…
rodvrees Jan 31, 2024
167deb7
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 31, 2024
337d81a
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Jan 31, 2024
ff34fc2
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Feb 1, 2024
f5ec8e5
Merge branch 'compomics:timsRescore' into timsRescore
ArthurDeclercq Feb 1, 2024
65f134c
Implement IM2Deep package
rodvrees Feb 2, 2024
e1f0ddf
Merge branch 'timsRescore' of github.com:rodvrees/ms2rescore into tim…
rodvrees Feb 2, 2024
a9fc45f
GUI changes im2deep
ArthurDeclercq Feb 5, 2024
fa97517
add im2deep to pyproject
ArthurDeclercq Feb 5, 2024
a4fd01a
change dockerfile
ArthurDeclercq Feb 5, 2024
721cd08
optimise spectrum parsing
ArthurDeclercq Feb 5, 2024
ceed348
update toml file
ArthurDeclercq Feb 6, 2024
fcd393c
added debug logging
ArthurDeclercq Feb 6, 2024
3fc994f
requested changes
ArthurDeclercq Feb 15, 2024
42debd8
change numpy version upper limit
ArthurDeclercq Feb 15, 2024
c0713ed
fix bug
ArthurDeclercq Feb 15, 2024
fcf579d
Use newer setup-python action; revert numpy upper limit
RalfG Feb 15, 2024
48fae0e
Fix numpy version for py311
RalfG Feb 17, 2024
9063a84
Update pyproject.toml
RalfG Feb 18, 2024
7ede2f8
requested changes
ArthurDeclercq Feb 20, 2024
b2a126a
Update ms2rescore/feature_generators/im2deep.py
RalfG Feb 21, 2024
4 changes: 2 additions & 2 deletions .github/workflows/publish.yml
@@ -14,7 +14,7 @@ jobs:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.11"

@@ -47,7 +47,7 @@ jobs:
steps:
- uses: actions/checkout@v4

- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"

11 changes: 6 additions & 5 deletions Dockerfile
@@ -1,8 +1,10 @@
FROM ubuntu:focal
FROM python:3.10

# ARG DEBIAN_FRONTEND=noninteractive

LABEL name="ms2rescore"

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ms2rescore
# ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ms2rescore

ADD pyproject.toml /ms2rescore/pyproject.toml
ADD LICENSE /ms2rescore/LICENSE
@@ -11,8 +13,7 @@ ADD MANIFEST.in /ms2rescore/MANIFEST.in
ADD ms2rescore /ms2rescore/ms2rescore

RUN apt-get update \
&& apt-get install -y python3-pip procps libglib2.0-0 libsm6 libxrender1 libxext6 \
&& rm -rf /var/lib/apt/lists/* \
&& pip3 install ms2rescore/
&& apt install -y procps git-lfs \
&& pip install /ms2rescore

ENTRYPOINT [""]
11 changes: 11 additions & 0 deletions docs/source/config_schema.md
@@ -10,6 +10,7 @@
- **`deeplc`**: Refer to *[#/definitions/deeplc](#definitions/deeplc)*.
- **`maxquant`**: Refer to *[#/definitions/maxquant](#definitions/maxquant)*.
- **`ionmob`**: Refer to *[#/definitions/ionmob](#definitions/ionmob)*.
- **`im2deep`**: Refer to *[#/definitions/im2deep](#definitions/im2deep)*.
- **`rescoring_engine`** *(object)*: Rescoring engine to use and its configuration. Leave empty to skip rescoring and write features to file. Default: `{"mokapot": {}}`.
- **`.*`**: Refer to *[#/definitions/rescoring_engine](#definitions/rescoring_engine)*.
- **`percolator`**: Refer to *[#/definitions/percolator](#definitions/percolator)*.
@@ -43,6 +44,14 @@
- **One of**
- *string*
- *null*
- **`psm_id_rt_pattern`**: Regex pattern to extract retention time from psm identifier. Requires at least one capturing group. Default: `null`.
- **One of**
- *string*
- *null*
- **`psm_id_im_pattern`**: Regex pattern to extract ion mobility from psm identifier. Requires at least one capturing group. Default: `null`.
- **One of**
- *string*
- *null*
- **`psm_id_pattern`**: Regex pattern to extract index or scan number from PSM file. Requires at least one capturing group. Default: `"(.*)"`.
- **One of**
- *string*
@@ -75,6 +84,8 @@
- **`ionmob_model`** *(string)*: Path to Ionmob model directory. Default: `"GRUPredictor"`.
- **`reference_dataset`** *(string)*: Path to Ionmob reference dataset file. Default: `"Meier_unimod.parquet"`.
- **`tokenizer`** *(string)*: Path to tokenizer json file. Default: `"tokenizer.json"`.
- <a id="definitions/im2deep"></a>**`im2deep`** *(object)*: Ion mobility feature generator configuration using IM2Deep. Can contain additional properties. Refer to *[#/definitions/feature_generator](#definitions/feature_generator)*.
- **`reference_dataset`** *(string)*: Path to IM2Deep reference dataset file. Default: `"Meier_unimod.parquet"`.
- <a id="definitions/mokapot"></a>**`mokapot`** *(object)*: Mokapot rescoring engine configuration. Additional properties are passed to the Mokapot brew function. Can contain additional properties. Refer to *[#/definitions/rescoring_engine](#definitions/rescoring_engine)*.
- **`write_weights`** *(boolean)*: Write Mokapot weights to a text file. Default: `false`.
- **`write_txt`** *(boolean)*: Write Mokapot results to a text file. Default: `false`.
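
The `psm_id_rt_pattern` and `psm_id_im_pattern` options each require at least one capturing group. The following is a minimal sketch of how such a pattern could pull a numeric value out of a PSM identifier. The helper `extract_float`, the identifier layout, and the use of `re.search` are assumptions made for illustration; this diff does not show how ms2rescore applies the patterns internally.

```python
import re
from typing import Optional


def extract_float(psm_id: str, pattern: Optional[str]) -> Optional[float]:
    """Return the first capturing group of `pattern` in `psm_id` as a float.

    Hypothetical helper: returns None when no pattern is configured
    (the documented default) or when the pattern does not match.
    """
    if pattern is None:
        return None
    match = re.search(pattern, psm_id)
    if match is None:
        return None
    return float(match.group(1))


# Hypothetical identifier layout, invented for this example.
psm_id = "run1_scan=1042_rt=35.7_im=0.985"

rt = extract_float(psm_id, r"rt=([0-9.]+)")  # would play the role of psm_id_rt_pattern
im = extract_float(psm_id, r"im=([0-9.]+)")  # would play the role of psm_id_im_pattern
print(rt, im)
```

The parentheses in each pattern mark the part of the identifier that is kept; everything outside the group only anchors the match.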
14 changes: 10 additions & 4 deletions ms2rescore/core.py
@@ -8,7 +8,7 @@

from ms2rescore.feature_generators import FEATURE_GENERATORS
from ms2rescore.parse_psms import parse_psms
from ms2rescore.parse_spectra import get_missing_values
from ms2rescore.parse_spectra import fill_missing_values
from ms2rescore.report import generate
from ms2rescore.rescoring_engines import mokapot, percolator

@@ -55,11 +55,17 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:
)

# TODO: avoid hard coding feature generators in some way
rt_required = "deeplc" in config["feature_generators"] and None in psm_list["retention_time"]
im_required = "ionmob" in config["feature_generators"] and None in psm_list["ion_mobility"]
rt_required = ("deeplc" in config["feature_generators"]) and (
None in psm_list["retention_time"]
)
im_required = (
"ionmob" in config["feature_generators"] or "im2deep" in config["feature_generators"]
) and (None in psm_list["ion_mobility"])
logger.debug(f"RT required: {rt_required}, IM required: {im_required}")

if rt_required or im_required:
logger.info("Parsing missing retention time and/or ion mobility values from spectra...")
get_missing_values(config, psm_list, missing_rt=rt_required, missing_im=im_required)
fill_missing_values(config, psm_list, missing_rt=rt_required, missing_im=im_required)

# Add rescoring features
for fgen_name, fgen_config in config["feature_generators"].items():
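
When checking whether either of two feature generators is configured, each name needs its own membership test, because `in` binds more tightly than `or`. The sketch below uses a plain dictionary standing in for `config["feature_generators"]`; the variable names are illustrative only.

```python
# Stand-in for config["feature_generators"]; neither IM generator is configured.
feature_generators = {"basic": {}, "ms2pip": {}}

# `in` binds tighter than `or`, so this parses as
# `"ionmob" or ("im2deep" in feature_generators)`.
# The non-empty string "ionmob" is truthy, so the result is always "ionmob".
chained = "ionmob" or "im2deep" in feature_generators

# Testing each name separately gives the intended boolean.
explicit = "ionmob" in feature_generators or "im2deep" in feature_generators

# An equivalent form that scales to more generator names.
im_generators = {"ionmob", "im2deep"}
via_set = not im_generators.isdisjoint(feature_generators)

print(repr(chained), explicit, via_set)
```

With the chained form, ion mobility parsing would be requested whenever any PSM lacks an ion mobility value, regardless of which feature generators are enabled.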
2 changes: 2 additions & 0 deletions ms2rescore/feature_generators/__init__.py
@@ -7,11 +7,13 @@
from ms2rescore.feature_generators.ionmob import IonMobFeatureGenerator
from ms2rescore.feature_generators.maxquant import MaxQuantFeatureGenerator
from ms2rescore.feature_generators.ms2pip import MS2PIPFeatureGenerator
from ms2rescore.feature_generators.im2deep import IM2DeepFeatureGenerator

FEATURE_GENERATORS = {
"basic": BasicFeatureGenerator,
"ms2pip": MS2PIPFeatureGenerator,
"deeplc": DeepLCFeatureGenerator,
"maxquant": MaxQuantFeatureGenerator,
"ionmob": IonMobFeatureGenerator,
"im2deep": IM2DeepFeatureGenerator,
}
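
A name-to-class mapping like `FEATURE_GENERATORS` lets the configuration select generators by key. The sketch below shows the general registry pattern with hypothetical stand-in classes (`FeatureGeneratorSketch`, `ChargeFeatures`, `LengthFeatures`) and plain dictionaries as PSMs; how ms2rescore actually passes each generator's configuration to its constructor is not visible in this diff, so the `**fgen_config` call is an assumption.

```python
from typing import Dict, List, Type


class FeatureGeneratorSketch:
    """Hypothetical stand-in for a feature generator base class."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def add_features(self, psms: List[dict]) -> None:
        raise NotImplementedError


class ChargeFeatures(FeatureGeneratorSketch):
    def add_features(self, psms: List[dict]) -> None:
        for psm in psms:
            psm["rescoring_features"]["charge"] = psm["charge"]


class LengthFeatures(FeatureGeneratorSketch):
    def add_features(self, psms: List[dict]) -> None:
        scale = self.kwargs.get("scale", 1)
        for psm in psms:
            psm["rescoring_features"]["length"] = len(psm["peptide"]) * scale


# Registry: configuration keys map to generator classes.
REGISTRY: Dict[str, Type[FeatureGeneratorSketch]] = {
    "charge": ChargeFeatures,
    "length": LengthFeatures,
}

config = {"feature_generators": {"charge": {}, "length": {"scale": 2}}}
psms = [{"peptide": "ACDK", "charge": 2, "rescoring_features": {}}]

# Instantiate and run every configured generator, in configuration order.
for name, fgen_config in config["feature_generators"].items():
    generator = REGISTRY[name](**fgen_config)
    generator.add_features(psms)

print(psms[0]["rescoring_features"])
```

Adding a new generator then only requires a new class and one extra registry entry, which is the shape of the `"im2deep": IM2DeepFeatureGenerator` addition above.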
228 changes: 228 additions & 0 deletions ms2rescore/feature_generators/im2deep.py
@@ -0,0 +1,228 @@
"""
IM2Deep ion mobility-based feature generator.

IM2Deep is a fully modification-aware peptide ion mobility predictor. It uses a deep convolutional
neural network to predict collision cross section (CCS) values based on the atomic composition of
the (modified) amino acid residues in the peptide. See
`github.com/compomics/IM2Deep <https://github.com/compomics/IM2Deep>`_ for more information.

"""

import contextlib
import logging
import os
from inspect import getfullargspec
from itertools import chain
from typing import List, Optional

import numpy as np
from im2deep.im2deep import predict_ccs
from psm_utils import PSMList

from ms2rescore.feature_generators.base import FeatureGeneratorBase

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
logger = logging.getLogger(__name__)


class IM2DeepFeatureGenerator(FeatureGeneratorBase):
    """IM2Deep collision cross section feature generator."""

    def __init__(
        self,
        *args,
        lower_score_is_better: bool = False,
        spectrum_path: Optional[str] = None,
        processes: int = 1,
        calibrate_per_charge: bool = True,
        **kwargs,
    ):
        """
        Initialize the IM2DeepFeatureGenerator.

        Parameters
        ----------
        lower_score_is_better : bool, optional
            A boolean indicating whether lower scores are better for the generated features.
        spectrum_path : str or None, optional
            Optional path to the spectrum file used for IM2Deep predictions.
        processes : int, optional
            Number of parallel processes to use for IM2Deep predictions.
        calibrate_per_charge : bool, optional
            A boolean indicating whether to calibrate CCS values per charge state.
        **kwargs : dict, optional
            Additional keyword arguments.

        Returns
        -------
        None
        """
        super().__init__(*args, **kwargs)
        self.lower_score_is_better = lower_score_is_better
        self.spectrum_path = spectrum_path
        self.processes = processes
        self.deeplc_kwargs = kwargs or {}

        self._verbose = logger.getEffectiveLevel() <= logging.DEBUG

        # Lazy-load DeepLC
        from deeplc import DeepLC

        self.im2deep = DeepLC

        # Remove any kwargs that are not DeepLC arguments
        self.im2deep_kwargs = {
            k: v for k, v in self.deeplc_kwargs.items() if k in getfullargspec(DeepLC).args
        }
        self.im2deep_kwargs.update({"config_file": None})

        # TODO: Implement im2deep_retrain?

        self.im2deep_predictor = None
        self.calibrate_per_charge = calibrate_per_charge

    @property
    def feature_names(self) -> List[str]:
        return [
            "ccs_observed_im2deep",
            "ccs_predicted_im2deep",
            "ccs_error_im2deep",
            "abs_ccs_error_im2deep",
            "perc_ccs_error_im2deep",
        ]

    def add_features(self, psm_list: PSMList) -> None:
        """Add IM2Deep-derived features to PSMs"""

        logger.info("Adding IM2Deep-derived features to PSMs")

        # Get easy-access nested version of PSMlist
        psm_dict = psm_list.get_psm_dict()

        # Run IM2Deep for each spectrum file
        current_run = 1
        total_runs = sum(len(runs) for runs in psm_dict.values())

        for runs in psm_dict.values():
            # Reset IM2Deep predictor for each collection of runs
            self.im2deep_predictor = None
            self.selected_model = None
            for run, psms in runs.items():
                logger.info(
                    f"Running IM2Deep for PSMs from run ({current_run}/{total_runs}): `{run}`..."
                )

                # Disable wild logging to stdout by TensorFlow, unless in debug mode
                with (
                    contextlib.redirect_stdout(open(os.devnull, "w"))
                    if not self._verbose
                    else contextlib.nullcontext()
                ):
                    # Make new PSM list for this run (chain PSMs per spectrum to flat list)
                    psm_list_run = PSMList(psm_list=list(chain.from_iterable(psms.values())))

                    logger.debug("Calibrating IM2Deep...")

                    # Convert ion mobility to CCS and calibrate CCS values
                    psm_list_run_df = psm_list_run.to_dataframe()
                    psm_list_run_df["charge"] = [
                        peptidoform.precursor_charge
                        for peptidoform in psm_list_run_df["peptidoform"]
                    ]
                    psm_list_run_df["ccs_observed"] = psm_list_run_df.apply(
                        lambda x: self.im2ccs(
                            x["ion_mobility"],
                            x["precursor_mz"],  # TODO: Why does ionmob use calculated mz?
                            x["charge"],
                        ),
                        axis=1,
                    )

                    # Create dataframe with high confidence hits for calibration
                    cal_psm_df = self.make_cal_df(psm_list_run_df)

                    # Make predictions with IM2Deep
                    logger.debug("Predicting CCS values...")
                    calibrated_predictions = predict_ccs(
                        psm_list_run, cal_psm_df, write_output=False
                    )

                    # Add features to PSMs
                    logger.debug("Adding features to PSMs...")
                    predictions = calibrated_predictions
                    observations = psm_list_run_df["ccs_observed"]
                    # Keep the sign here so that `ccs_error_im2deep` differs from
                    # `abs_ccs_error_im2deep`; absolute values are taken below.
                    ccs_diffs_run = predictions - observations
                    for i, psm in enumerate(psm_list_run):
                        psm["rescoring_features"].update(
                            {
                                "ccs_observed_im2deep": observations[i],
                                "ccs_predicted_im2deep": predictions[i],
                                "ccs_error_im2deep": ccs_diffs_run[i],
                                "abs_ccs_error_im2deep": np.abs(ccs_diffs_run[i]),
                                "perc_ccs_error_im2deep": np.abs(ccs_diffs_run[i])
                                / observations[i]
                                * 100,
                            }
                        )

                current_run += 1

    def im2ccs(self, reverse_im, mz, charge, mass_gas=28.013, temp=31.85, t_diff=273.15):
        """
        Convert ion mobility to CCS.

        Parameters
        ----------
        reverse_im : float
            Inverse reduced ion mobility (1/K0).
        mz : float
            Precursor m/z.
        charge : int
            Precursor charge.
        mass_gas : float, optional
            Mass of gas, by default 28.013
        temp : float, optional
            Temperature in Celsius, by default 31.85
        t_diff : float, optional
            Offset to convert Celsius to Kelvin, by default 273.15

        Notes
        -----
        Adapted from theGreatHerrLebert/ionmob (https://doi.org/10.1093/bioinformatics/btad486)

        """

        SUMMARY_CONSTANT = 18509.8632163405
        reduced_mass = (mz * charge * mass_gas) / (mz * charge + mass_gas)
        return (SUMMARY_CONSTANT * charge) / (
            np.sqrt(reduced_mass * (temp + t_diff)) * 1 / reverse_im
        )

    # TODO: replace threshold by identified psms?
    def make_cal_df(self, psm_list_df, threshold=0.95):
        """Make dataframe for calibration of IM2Deep predictions.

        Parameters
        ----------
        psm_list_df : pd.DataFrame
            DataFrame with PSMs.
        threshold : float, optional
            Threshold for high confidence hits, by default 0.95.

        Returns
        -------
        pd.DataFrame
            DataFrame with high confidence hits for calibration."""

        psm_list_df = psm_list_df[
            psm_list_df["charge"] < 5
        ]  # predictions do not go higher for IM2Deep
        high_conf_hits = list(
            psm_list_df["spectrum_id"][psm_list_df["score"].rank(pct=True) > threshold]
        )
        logger.debug(
            f"Number of high confidence hits for calculating shift: {len(high_conf_hits)}"
        )
        # Filter df for high_conf_hits
        cal_psm_df = psm_list_df[psm_list_df["spectrum_id"].isin(high_conf_hits)]
        return cal_psm_df
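
The conversion in `im2ccs` and the error features built from it can be tried out without the rest of the pipeline. The following is a standalone sketch using only the standard library: `im_to_ccs` mirrors the arithmetic of `im2ccs` with `math.sqrt` in place of `np.sqrt`, and `ccs_error_features` is a hypothetical helper that keeps the sign in `ccs_error`, which is an assumption about what that feature is meant to carry. The precursor values are invented for illustration.

```python
import math

# Constant copied from the im2ccs method in this diff.
SUMMARY_CONSTANT = 18509.8632163405


def im_to_ccs(reverse_im, mz, charge, mass_gas=28.013, temp=31.85, t_diff=273.15):
    """Convert inverse reduced ion mobility (1/K0) to CCS, mirroring im2ccs."""
    # Reduced mass of the ion (mass approximated as mz * charge) and the drift gas.
    reduced_mass = (mz * charge * mass_gas) / (mz * charge + mass_gas)
    # temp is given in Celsius; adding t_diff converts it to Kelvin.
    return (SUMMARY_CONSTANT * charge) / (
        math.sqrt(reduced_mass * (temp + t_diff)) * 1 / reverse_im
    )


def ccs_error_features(observed, predicted):
    """Signed, absolute and percentage CCS error for one PSM (hypothetical helper)."""
    error = predicted - observed  # assumption: the sign is kept in ccs_error
    return {
        "ccs_error": error,
        "abs_ccs_error": abs(error),
        "perc_ccs_error": abs(error) / observed * 100,
    }


# Hypothetical precursor: m/z 500, charge 2, 1/K0 = 1.0
ccs = im_to_ccs(reverse_im=1.0, mz=500.0, charge=2)
features = ccs_error_features(observed=400.0, predicted=390.0)
print(round(ccs, 1), features)
```

For a fixed m/z and charge, the computed CCS scales linearly with `reverse_im`, which makes a quick sanity check on observed values straightforward.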
28 changes: 27 additions & 1 deletion ms2rescore/gui/app.py
@@ -359,15 +359,20 @@ def __init__(self, *args, **kwargs):
self.deeplc_config = DeepLCConfiguration(self)
self.deeplc_config.grid(row=2, column=0, pady=(0, 20), sticky="nsew")

self.im2deep_config = Im2DeepConfiguration(self)
self.im2deep_config.grid(row=3, column=0, pady=(0, 20), sticky="nsew")

self.ionmob_config = IonmobConfiguration(self)
self.ionmob_config.grid(row=3, column=0, pady=(0, 20), sticky="nsew")
self.ionmob_config.grid(row=4, column=0, pady=(0, 20), sticky="nsew")

def get(self) -> Dict:
"""Return the configuration as a dictionary."""
basic_enabled, basic_config = self.basic_config.get()
ms2pip_enabled, ms2pip_config = self.ms2pip_config.get()
deeplc_enabled, deeplc_config = self.deeplc_config.get()
im2deep_enabled, im2deep_config = self.im2deep_config.get()
ionmob_enabled, ionmob_config = self.ionmob_config.get()

config = {}
if basic_enabled:
config["basic"] = basic_config
@@ -522,6 +527,27 @@ def get(self) -> Dict:
return enabled, config


class Im2DeepConfiguration(ctk.CTkFrame):
    def __init__(self, *args, **kwargs):
        """IM2Deep configuration frame."""
        super().__init__(*args, **kwargs)

        self.configure(fg_color="transparent")
        self.grid_columnconfigure(0, weight=1)

        self.title = widgets.Heading(self, text="im2deep")
        self.title.grid(row=0, column=0, columnspan=2, pady=(0, 5), sticky="ew")

        self.enabled = widgets.LabeledSwitch(self, label="Enable im2deep", default=False)
        self.enabled.grid(row=1, column=0, pady=(0, 10), sticky="nsew")

    def get(self) -> Dict:
        """Return the configuration as a dictionary."""
        enabled = self.enabled.get()
        config = {}
        return enabled, config


class RescoringEngineConfig(ctk.CTkFrame):
def __init__(self, *args, **kwargs):
"""Rescoring engine configuration frame."""