Merge branch 'main' of https://github.com/compomics/psm_utils into diann-io
RalfG committed Nov 6, 2024
2 parents 00c2714 + 27b8591 commit 8fc6f19
Showing 27 changed files with 1,206 additions and 146 deletions.
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,48 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.1.1] - 2024-10-01

### Fixed

- `io`: Fix Sage filename pattern for automatic file type inference
- `io.flashlfq`: Fix writing PSMs without protein accession
- `io.flashlfq`: Fix column names `Peptide Monoisotopic Mass` and `Protein Accession`.
- `io.idxml`: Fix parsing if spectra file name not present [#92](https://github.com/compomics/psm_utils/issues/92)

## [1.1.0] - 2024-09-05

### Added

- `Peptidoform`: Add `modified_sequence` property to return the modified sequence in ProForma format, but without charge state.
- `io`: Add support for reading and writing FlashLFQ generic TSV files.


## [1.0.1] - 2024-08-28

### Fixed

- `io.percolator`: Fix and improve ScanNr inferring and writing
- `io.percolator`: Infer style from file extension if not provided (enables dynamic style determination in, for instance, `convert` function).

## [1.0.0] - 2024-08-14

### Added

- Peptidoform: Allow comparison between a peptidoform and a peptidoform string; allow direct indexing with square brackets, which indexes or slices parsed_sequence (in #89)

### Fixed

- TSV: Avoid flooding logs when reading a different file format by raising exception when three consecutive rows could not be parsed (in #88)

## [0.9.1] - 2024-07-17

### Fixed

- `io.xtandem`: Fix parsing PSMs and complete protein names in XTandem (by @julianu in #83)
- `io.tsv`: Fix warning formatting when parsing TSV (by @paretje in #85)
- `io`: Fix support for mzIdentML and pepXML files from Comet (by @paretje in #87)

## [0.9.0] - 2024-05-01

### Added
1 change: 1 addition & 0 deletions README.rst
@@ -91,6 +91,7 @@ Supported file formats
===================================================================================================================== ======================== =============== ===============
`AlphaDIA precursors TSV <https://alphadia.readthedocs.io/en/latest/quickstart.html#output-files>`_ ``alphadia`` ✅ ❌
`DIA-NN TSV <https://github.com/vdemichev/DiaNN#output>`_ ``diann`` ✅ ❌
`FlashLFQ generic TSV <https://github.com/smith-chem-wisc/FlashLFQ/wiki/Identification-Input-Formats>`_ ``flashlfq`` ✅ ✅
`FragPipe PSM TSV <https://fragpipe.nesvilab.org/docs/tutorial_fragpipe_outputs.html#psmtsv/>`_ ``fragpipe`` ✅ ❌
`ionbot CSV <https://ionbot.cloud/>`_ ``ionbot`` ✅ ❌
`OpenMS idXML <https://www.openms.de/>`_ ``idxml`` ✅ ✅
10 changes: 9 additions & 1 deletion docs/source/api/psm_utils.io.rst
@@ -23,6 +23,14 @@ psm_utils.io.diann
:inherited-members:


psm_utils.io.flashlfq
#####################

.. automodule:: psm_utils.io.flashlfq
:members:
:inherited-members:


psm_utils.io.fragpipe
##################

@@ -76,7 +84,7 @@ psm_utils.io.mzid


psm_utils.io.parquet
-#################
+####################

.. automodule:: psm_utils.io.parquet
:members:
90 changes: 90 additions & 0 deletions example_files/PXD053286-G1_PTMiprophet.slice.pep.xml
@@ -0,0 +1,90 @@
<?xml version="1.0" encoding="UTF-8"?>
<msms_pipeline_analysis date="2022-05-17T22:37:25"
xmlns="http://regis-web.systemsbiology.net/pepXML"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://regis-web.systemsbiology.net/pepXML /tools/bin/TPP/tpp/schema/pepXML_v122.xsd"
summary_xml="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Pipe_Mferi_M/G1_iprophet.pep.xml">
<analysis_summary analysis="ptmprophet" time="2022-05-17T23:41:43">
<ptmprophet_summary version="TPP v5.1.0 Syzygy, Build 202012091755-8315 (Linux-x86_64)"
options="M:15.994915,n:42.010565 MZTOL=0.4 G1_iprophet.pep.xml G1_PTMiprophet.pep.xml">
<inputfile name="G1_iprophet.pep.xml" />
<inputfile name="20220511_M1_Mferi_0535_VC_i01_comet.pep.xml"
directory="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data" />
<inputfile name="20220511_M1_Mferi_0535_VC_i02_comet.pep.xml"
directory="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data" />
</ptmprophet_summary>
</analysis_summary>
<msms_run_summary
base_name="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data/20220511_M1_Mferi_0535_VC_i01"
msManufacturer="UNKNOWN" msModel="UNKNOWN" raw_data_type="raw" raw_data=".mzXML">
<sample_enzyme name="trypsin">
<specificity cut="KR" no_cut="P" sense="C" />
</sample_enzyme>
<search_summary
base_name="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data/20220511_M1_Mferi_0535_VC_i01"
search_engine="Comet" search_engine_version="2019.01 rev. 5"
precursor_mass_type="monoisotopic" fragment_mass_type="monoisotopic" search_id="1">
<search_database
local_path="/mnt/nfs/DataPool/FastaDataBases/Mferi_G5847_GCF_000327395.faa_REV.fasta"
type="AA" />
<enzymatic_search_constraint enzyme="Trypsin" max_num_internal_cleavages="3"
min_number_termini="2" />
<aminoacid_modification aminoacid="M" massdiff="15.994900" mass="147.035385"
variable="Y" symbol="*" />
<terminal_modification terminus="N" massdiff="42.010565" mass="43.018390" variable="Y"
protein_terminus="Y" symbol="#" />
<aminoacid_modification aminoacid="C" massdiff="57.021464" mass="160.030649"
variable="N" />
</search_summary>
<spectrum_query spectrum="20220511_M1_Mferi_0535_VC_i01.05387.05387.2" start_scan="5387"
end_scan="5387" precursor_neutral_mass="1006.560276" assumed_charge="2" index="143"
retention_time_sec="1413.1" experiment_label="Mferi_M1">
<search_result>
<search_hit hit_rank="1" peptide="SNLFLMLK" peptide_prev_aa="M" peptide_next_aa="Q"
protein="WP_008364460.1" num_tot_proteins="1" num_matched_ions="11"
tot_num_ions="14" calc_neutral_pep_mass="1006.552139" massdiff="0.008137"
num_tol_term="2" num_missed_cleavages="0" num_matched_peptides="55"
protein_descr="ABC transporter permease [Mycoplasma feriruminatoris]">
<modification_info modified_peptide="n[43]SNLFLMLK" mod_nterm_mass="43.01839"></modification_info>
<search_score name="xcorr" value="1.214" />
<search_score name="deltacn" value="1.000" />
<search_score name="deltacnstar" value="0.000" />
<search_score name="spscore" value="453.5" />
<search_score name="sprank" value="1" />
<search_score name="expect" value="1.17E+00" />
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="0.5131"
all_ntt_prob="(0.0000,0.0000,0.5131)">
<search_score_summary>
<parameter name="fval" value="-0.3767" />
<parameter name="ntt" value="2" />
<parameter name="nmc" value="0" />
<parameter name="massd" value="8.084" />
<parameter name="isomassd" value="0" />
</search_score_summary>
</peptideprophet_result>
</analysis_result>
<analysis_result analysis="interprophet">
<interprophet_result probability="0.00133535"
all_ntt_prob="(0,0,0.00133535)">
<search_score_summary>
<parameter name="nss" value="0.1289" />
<parameter name="nrs" value="-0.7221" />
<parameter name="nse" value="-0.5741" />
<parameter name="nsi" value="0" />
<parameter name="nsm" value="0.9838" />
<parameter name="nsp" value="20" />
</search_score_summary>
</interprophet_result>
</analysis_result>
<analysis_result analysis="ptmprophet">
<ptmprophet_result prior="1" ptm="PTMProphet_n42.0106"
ptm_peptide="n(1.000)SNLFLMLK">
<mod_terminal_probability terminus="n" probability="1.000" />
</ptmprophet_result>
</analysis_result>
</search_hit>
</search_result>
</spectrum_query>
</msms_run_summary>
</msms_pipeline_analysis>
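
For orientation, the fields in this example file can be pulled out with the standard library alone. The following is a minimal sketch, not the `psm_utils` pepXML reader: `read_hits` is a hypothetical helper, and `SNIPPET` is a trimmed copy of the file above that keeps only one spectrum query and a few of its attributes. The namespace URI is taken from the `xmlns` attribute of the example.

```python
import xml.etree.ElementTree as ET

# pepXML namespace, copied from the xmlns attribute of the example file.
NS = {"pep": "http://regis-web.systemsbiology.net/pepXML"}

# Trimmed copy of the example file: one spectrum query with one search hit.
SNIPPET = """<msms_pipeline_analysis xmlns="http://regis-web.systemsbiology.net/pepXML">
  <msms_run_summary>
    <spectrum_query spectrum="20220511_M1_Mferi_0535_VC_i01.05387.05387.2"
      start_scan="5387" end_scan="5387" assumed_charge="2">
      <search_result>
        <search_hit hit_rank="1" peptide="SNLFLMLK" protein="WP_008364460.1">
          <modification_info modified_peptide="n[43]SNLFLMLK" mod_nterm_mass="43.01839"/>
          <search_score name="xcorr" value="1.214"/>
          <search_score name="expect" value="1.17E+00"/>
          <analysis_result analysis="ptmprophet">
            <ptmprophet_result prior="1" ptm="PTMProphet_n42.0106"
              ptm_peptide="n(1.000)SNLFLMLK">
              <mod_terminal_probability terminus="n" probability="1.000"/>
            </ptmprophet_result>
          </analysis_result>
        </search_hit>
      </search_result>
    </spectrum_query>
  </msms_run_summary>
</msms_pipeline_analysis>"""


def read_hits(xml_text):
    """Return one dict per search_hit element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    hits = []
    for query in root.iterfind(".//pep:spectrum_query", NS):
        for hit in query.iterfind("pep:search_result/pep:search_hit", NS):
            mod_info = hit.find("pep:modification_info", NS)
            ptm = hit.find(".//pep:ptmprophet_result", NS)
            hits.append(
                {
                    "spectrum": query.get("spectrum"),
                    "charge": int(query.get("assumed_charge")),
                    "peptide": hit.get("peptide"),
                    # Optional elements may be absent, so guard against None.
                    "modified_peptide": (
                        mod_info.get("modified_peptide") if mod_info is not None else None
                    ),
                    "scores": {
                        s.get("name"): float(s.get("value"))
                        for s in hit.iterfind("pep:search_score", NS)
                    },
                    "ptm_peptide": ptm.get("ptm_peptide") if ptm is not None else None,
                }
            )
    return hits


hits = read_hits(SNIPPET)
print(hits[0]["peptide"], hits[0]["modified_peptide"], hits[0]["ptm_peptide"])
```

Every lookup passes the namespace map because pepXML elements live in a default namespace; an unqualified `find("search_hit")` would return nothing.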
18 changes: 18 additions & 0 deletions example_files/example.flashlfq.tsv
@@ -0,0 +1,18 @@
File Name Scan Retention Time Precursor Charge Base Sequence Full Sequence Peptide Monoisotopic Mass Protein Accession
SmallCalibratible_Yeast 24.80555 2 KAPAGGAADAAAK KAPAGGAADAAAK
SmallCalibratible_Yeast 24.95372 2 KAPAAAPAASK KAPAAAPAASK
SmallCalibratible_Yeast 24.77032 2 KQAIETANK KQAIETANK
SmallCalibratible_Yeast 24.17319 2 RVDEGGAQDK RVDEGGAQDK
SmallCalibratible_Yeast 24.26695 2 KDAEPQSDSTTSK KDAEPQSDSTTSK
SmallCalibratible_Yeast 24.10798 2 EKAEAEAEK EKAEAEAEK
SmallCalibratible_Yeast 24.06874 2 EKAEAEAEK EKAEAEAEK
SmallCalibratible_Yeast 24.77398 2 FKEEDEKESQR FKEEDEKESQR
SmallCalibratible_Yeast 24.90638 2 YDHEASSSYK YDHEASSSYK
SmallCalibratible_Yeast 24.40345 3 SKDVTDSATTKK SKDVTDSATTKK
SmallCalibratible_Yeast 24.71679 2 FKEEDEKESQR FKEEDEKESQR
SmallCalibratible_Yeast 24.39968 2 ALKQEGAANK ALKQEGAANK
SmallCalibratible_Yeast 24.67303 2 SKDVTDSATTK SKDVTDSATTK
SmallCalibratible_Yeast 24.45053 2 KLEDHPK KLEDHPK
SmallCalibratible_Yeast 24.77398 1 HIDAGAK HIDAGAK
SmallCalibratible_Yeast 24.9022 2 YLAKEEEKK YLAKEEEKK
SmallCalibratible_Yeast 24.76278 2 YAGEVSHDDK YAGEVSHDDK
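
A file of this shape can be read with the standard `csv` module. The sketch below is not the `psm_utils.io.flashlfq` reader. It assumes the file is tab-delimited (the diff view above does not preserve the original delimiters) and that the header splits into the seven column names listed in `COLUMNS`. `read_flashlfq` is a hypothetical helper, and the two sample rows are copied from the example with the last two fields assumed empty.

```python
import csv
import io

# Column names as read from the header row of the example file.
# The split into seven columns and the tab delimiter are assumptions.
COLUMNS = [
    "File Name",
    "Scan Retention Time",
    "Precursor Charge",
    "Base Sequence",
    "Full Sequence",
    "Peptide Monoisotopic Mass",
    "Protein Accession",
]

# Two rows from the example; mass and accession are assumed to be empty.
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    "SmallCalibratible_Yeast\t24.80555\t2\tKAPAGGAADAAAK\tKAPAGGAADAAAK\t\t\n"
    "SmallCalibratible_Yeast\t24.40345\t3\tSKDVTDSATTKK\tSKDVTDSATTKK\t\t\n"
)


def read_flashlfq(handle):
    """Yield one dict per row, converting numeric fields (hypothetical helper)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        mass = row["Peptide Monoisotopic Mass"]
        yield {
            "file_name": row["File Name"],
            "retention_time": float(row["Scan Retention Time"]),
            "charge": int(row["Precursor Charge"]),
            "base_sequence": row["Base Sequence"],
            "full_sequence": row["Full Sequence"],
            # Empty fields become None so missing values stay explicit.
            "mass": float(mass) if mass else None,
            "protein_accession": row["Protein Accession"] or None,
        }


rows = list(read_flashlfq(io.StringIO(SAMPLE)))
for row in rows:
    print(row["full_sequence"], row["charge"], row["retention_time"])
```

Mapping empty strings to `None` keeps rows without a protein accession distinguishable from rows with a real value, which is the case the 1.1.1 changelog entry for `io.flashlfq` refers to.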
2 changes: 1 addition & 1 deletion psm_utils/__init__.py
@@ -1,6 +1,6 @@
"""Common utilities for parsing and handling PSMs, and search engine results."""

-__version__ = "0.9.0"
+__version__ = "1.1.1"
__all__ = ["Peptidoform", "PSM", "PSMList"]

from warnings import filterwarnings
13 changes: 10 additions & 3 deletions psm_utils/io/__init__.py
@@ -10,6 +10,7 @@

import psm_utils.io.alphadia as alphadia
import psm_utils.io.diann as diann
import psm_utils.io.flashlfq as flashlfq
import psm_utils.io.fragpipe as fragpipe
import psm_utils.io.idxml as idxml
import psm_utils.io.ionbot as ionbot
@@ -31,6 +32,12 @@
from psm_utils.psm_list import PSMList

FILETYPES = {
"flashlfq": {
"reader": flashlfq.FlashLFQReader,
"writer": flashlfq.FlashLFQWriter,
"extension": ".tsv",
"filename_pattern": r"^.*\.flashlfq\.tsv$",
},
"ionbot": {
"reader": ionbot.IonbotReader,
"writer": None,
@@ -65,7 +72,7 @@
"reader": pepxml.PepXMLReader,
"writer": None,
"extension": ".pepxml",
-"filename_pattern": r"^.*\.pepxml$",
+"filename_pattern": r"^.*\.pep\.?xml$",
},
"percolator": {
"reader": percolator.PercolatorTabReader,
@@ -101,13 +108,13 @@
"reader": sage.SageTSVReader,
"writer": None,
"extension": ".tsv",
-"filename_pattern": r"^.*(?:_|\.).sage.tsv$",
+"filename_pattern": r"^.*(?:_|\.)sage.tsv$",
},
"sage_parquet": {
"reader": sage.SageParquetReader,
"writer": None,
"extension": ".parquet",
-"filename_pattern": r"^.*(?:_|\.).sage.parquet$",
+"filename_pattern": r"^.*(?:_|\.)sage.parquet$",
},
},
"fragpipe": {
"reader": fragpipe.FragPipeReader,
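
The effect of the `filename_pattern` changes in this file can be checked with the standard `re` module. In the sketch below the patterns are copied verbatim from the diff. The `matches` helper and the two Sage file names are hypothetical, and using `re.match` is an assumption about how the patterns are applied, since the matching code is not part of this diff. The pepXML and FlashLFQ file names are the example files added in this commit.

```python
import re

# Old and new pattern values, copied from the diff.
OLD_SAGE = r"^.*(?:_|\.).sage.tsv$"
NEW_SAGE = r"^.*(?:_|\.)sage.tsv$"
OLD_PEPXML = r"^.*\.pepxml$"
NEW_PEPXML = r"^.*\.pep\.?xml$"
FLASHLFQ = r"^.*\.flashlfq\.tsv$"


def matches(pattern, filename):
    """Return True if the file name matches the pattern (hypothetical helper)."""
    return re.match(pattern, filename) is not None


# (label, pattern, file name) triples; the Sage names are made up.
checks = [
    ("old sage", OLD_SAGE, "results.sage.tsv"),
    ("new sage", NEW_SAGE, "results.sage.tsv"),
    ("new sage", NEW_SAGE, "results_sage.tsv"),
    ("old pepxml", OLD_PEPXML, "PXD053286-G1_PTMiprophet.slice.pep.xml"),
    ("new pepxml", NEW_PEPXML, "PXD053286-G1_PTMiprophet.slice.pep.xml"),
    ("new pepxml", NEW_PEPXML, "sample.pepxml"),
    ("flashlfq", FLASHLFQ, "example.flashlfq.tsv"),
]

results = [(label, name, matches(pattern, name)) for label, pattern, name in checks]
for label, name, ok in results:
    print(f"{label:<11} {name:<42} {ok}")
```

The old Sage pattern had a stray `.` after the separator group, so it required one extra character between the `_` or `.` and `sage`, and an ordinary name such as `results.sage.tsv` did not match. In the new pattern the `.` before `tsv` is still unescaped, so it matches any single character, not only a literal dot.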