libpappsomspp
Library for mass spectrometry
Loading...
Searching...
No Matches
semiglobalalignment.h
Go to the documentation of this file.
1/**
2 * \file pappsomspp/processing/specpeptidoms/semiglobalalignment.h
3 * \date 24/03/2025
4 * \author Aurélien Berthier
5 * \brief protein to spectrum alignment
6 *
7 * C++ implementation of the SpecPeptidOMS algorithm described in :
8 * (1) Benoist, É.; Jean, G.; Rogniaux, H.; Fertin, G.; Tessier, D. SpecPeptidOMS Directly and
9 * Rapidly Aligns Mass Spectra on Whole Proteomes and Identifies Peptides That Are Not Necessarily
10 * Tryptic: Implications for Peptidomics. J. Proteome Res. 2025.
11 * https://doi.org/10.1021/acs.jproteome.4c00870.
12 */
13
14/*
15 * Copyright (c) 2025 Aurélien Berthier
16 * <aurelien.berthier@ls2n.fr>
17 *
18 * This program is free software: you can redistribute it and/or modify
19 * it under the terms of the GNU General Public License as published by
20 * the Free Software Foundation, either version 3 of the License, or
21 * (at your option) any later version.
22 *
23 * This program is distributed in the hope that it will be useful,
24 * but WITHOUT ANY WARRANTY; without even the implied warranty of
25 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
26 * GNU General Public License for more details.
27 *
28 * You should have received a copy of the GNU General Public License
29 * along with this program. If not, see <http://www.gnu.org/licenses/>.
30 */
31
32#pragma once
33
34#include <boost/numeric/ublas/matrix.hpp>
35#include "spomsspectrum.h"
37#include "scorevalues.h"
38#include "locationsaver.h"
39#include "scenario.h"
40
41namespace pappso
42{
43namespace specpeptidoms
44{
45
46struct KeyCell
47{
48 std::size_t n_row;
49 int score;
50 std::size_t beginning;
52};
53
55{
56 /** @brief reinitialize to default score_values
57 */
58 void reset();
59
60 std::vector<std::size_t> peaks; // List of the spectrum's peaks used by the alignment
61 QString interpretation; // String representing the alignment, with mass shifts and sequence shifts
62 int score = 0; // Final alignment score
63 double begin_shift = 0.0, // Shift between the spectrum's first peak (at 19.02 Da) and the
64 // alignment's last peak (first in N->C order).
65 end_shift = 0.0; // Missing mass between the alignment's total mass (including begin_shift) and
66 // the spectrum precursor mass.
67 std::vector<double> shifts; // List of mass shifts present in the alignment
68 std::size_t SPC = 0, // SPC : Shared Peak Count
69 beginning = 0, // Localization of the alignment's first amino acid in the peptide sequence
70 end = 0; // Localization of the alignment's last amino acid in the peptide sequence
71};
72
74{
75 public:
76 /**
77 * Default constructor
78 */
79 SemiGlobalAlignment(const ScoreValues &score_values,
80 const pappso::PrecisionPtr precision_ptr,
81 const AaCode &aaCode);
82
83 /**
84 * Destructor
85 */
87
88 /**
89 * @brief perform the first alignment search between a protein sequence and a spectrum. The member
90 * location heap is filled with the candidates locations.
91 * @param spectrum Spectrum to align
92 * @param protein_seq Protein sequence to align.
93 * @param protein_id ID of the protein to align.
94 */
95 void
96 fastAlign(const SpOMSSpectrum &spectrum, const QString &protein_seq, const QString &protein_id);
97
98 /**
99 * @brief performs the second alignment search between a protein subsequence and a spectrum.
100 * @param spectrum Spectrum to align
101 * @param protein_seq Protein sequence to align.
102 * @param protein_id ID of the protein to align.
103 * @param beginning Index of the beginning of the subsequence to align.
104 * @param length Length of the subsequence to align.
105 */
106 void preciseAlign(const SpOMSSpectrum &spectrum,
107 const QString &protein_seq,
108 const QString &protein_id,
109 const std::size_t beginning,
110 const std::size_t length);
111
112 /**
113 * @brief performs the post-processing : generates corrected spectra and align them
114 * @param spectrum Spectrum to align
115 * @param protein_seq Protein sequence to align.
116 * @param protein_id ID of the protein to align.
117 * @param beginning Index of the beginning of the subsequence to align.
118 * @param length Length of the subsequence to align.
119 * @param shifts List of potential precursor mass errors to test.
120 */
121 void postProcessingAlign(const SpOMSSpectrum &spectrum,
122 const QString &protein_seq,
123 const QString &protein_id,
124 std::size_t beginning,
125 std::size_t length,
126 const std::vector<double> &shifts);
127
128 /**
129 * @brief Returns a copy of m_location_saver.
130 */
132
133 /**
134 * @brief Returns a copy of m_scenario.
135 */
136 Scenario getScenario() const;
137
138 /**
139 * @brief Returns a const ref to m_best_alignment.
140 */
141 const Alignment &getBestAlignment() const;
142
143 /**
144 * @brief Returns a list of the potential mass errors corresponding to the provided alignment in
145 * the provided protein sequence.
146 * @param aa_code the amino acid code of reference to get aminon acid masses
147 * @param alignment Alignment for which to get the potential mass errors.
148 * @param protein_seq Protein sequence corresponding to the provided alignment.
149 */
150 static std::vector<double> getPotentialMassErrors(const pappso::AaCode &aa_code,
151 const Alignment &alignment,
152 const QString &protein_seq);
153
154 private:
155 std::vector<KeyCell> m_interest_cells;
156 std::vector<std::pair<std::size_t, KeyCell>> m_updated_cells;
158 const int min_score = 15;
164
165 /**
166 * @brief Stores the best alignment from m_scenario in m_best_alignment
167 * @param sequence Protein sequence of the current alignment.
168 * @param spectrum Spectrum currently being aligned.
169 * @param offset Size of the protein sequence minus beginning of the alignment. Used to compute
170 * the position of the alignment in the protein sequence.
171 */
172 void
173 saveBestAlignment(const QString &sequence, const SpOMSSpectrum &spectrum, std::size_t offset);
174
175 /**
176 * @brief Recursively performs the correction of the alignment.
177 * @param protein_seq Protein sequence to align.
178 * @param protein_id ID of the protein to align.
179 * @param spectrum Spectrum to align.
180 * @param peaks_to_remove Peaks to remove from the spectrum.
181 * @param offset Size of the protein sequence minus beginning of the alignment. Used to compute
182 * the position of the alignment in the protein sequence.
183 */
184 void correctAlign(int recursive_call_count,
185 const QString &protein_seq,
186 const QString &protein_id,
187 const SpOMSSpectrum &spectrum,
188 std::vector<std::size_t> peaks_to_remove,
189 std::size_t offset);
190
191 /**
192 * @brief updates the scores of the alignment matrix for a given amino acid as well as the
193 * location heap/scenario.
194 * @param sequence Reversed sequence of the protein being aligned
195 * @param row_number number of the row to update (== index in sequence of the amino acid being
196 * aligned)
197 * @param aa_positions list of the AaPositions of the current amino acid
198 * @param spectrum Spectrum being aligned
199 * @param fast_align Whether to use the fast version of the algorithm (for 1st alignemnt step)
200 * @param protein Id of the protein being aligned.
201 */
202 void updateAlignmentMatrix(const QString &sequence,
203 const std::size_t row_number,
204 const std::vector<AaPosition> aa_positions,
205 const SpOMSSpectrum &spectrum,
206 const bool fast_align,
207 const QString &protein);
208
209 /**
210 * @brief indicates if a perfect shift is possible between the provided positions
211 * @param sequence Reversed sequence of the protein being aligned
212 * @param spectrum Spectrum being aligned
213 * @param origin_row beginning row of the aa gap to verify (== index of the first missing aa in
214 * sequence)
215 * @param current_row row being processed (== index of the current AaPosition in sequence)
216 * @param l_peak left peak index of the mz gap to verify
217 * @param r_peak right peak index of the mz gap to verify
218 */
219 bool perfectShiftPossible(const QString &sequence,
220 const SpOMSSpectrum &spectrum,
221 const std::size_t origin_row,
222 const std::size_t current_row,
223 const std::size_t l_peak,
224 const std::size_t r_peak) const;
225
226 /**
227 * @brief indicates if a perfect shift is possible from the spectrum beginning to the provided
228 * peak.
229 * @param sequence Reversed sequence of the protein being aligned
230 * @param spectrum Spectrum being aligned
231 * @param current_row row being processed (== index of the current AaPosition in sequence)
232 * @param r_peak right peak index of the mz gap to verify
233 */
234 std::size_t perfectShiftPossibleFrom0(const QString &sequence,
235 const SpOMSSpectrum &spectrum,
236 const std::size_t current_row,
237 const std::size_t r_peak) const;
238
239 /**
240 * @brief indicates if a perfect shift is possible between the provided positions
241 * @param sequence Reversed sequence of the protein being aligned
242 * @param spectrum Spectrum being aligned
243 * @param end_row Index of the last aligned row.
244 * @param end_peak Index of the last aligned peak.
245 */
246 std::size_t perfectShiftPossibleEnd(const QString &sequence,
247 const SpOMSSpectrum &spectrum,
248 std::size_t end_row,
249 std::size_t end_peak) const;
250};
251} // namespace specpeptidoms
252} // namespace pappso
collection of integer code for each amino acid 0 => null 1 to 20 => amino acid sorted by there mass (...
Definition aacode.h:44
void saveBestAlignment(const QString &sequence, const SpOMSSpectrum &spectrum, std::size_t offset)
Stores the best alignment from m_scenario in m_best_alignment.
const Alignment & getBestAlignment() const
Returns a const ref to m_best_alignment.
void preciseAlign(const SpOMSSpectrum &spectrum, const QString &protein_seq, const QString &protein_id, const std::size_t beginning, const std::size_t length)
performs the second alignment search between a protein subsequence and a spectrum.
Scenario getScenario() const
Returns a copy of m_scenario.
void updateAlignmentMatrix(const QString &sequence, const std::size_t row_number, const std::vector< AaPosition > aa_positions, const SpOMSSpectrum &spectrum, const bool fast_align, const QString &protein)
updates the scores of the alignment matrix for a given amino acid as well as the location heap/scenar...
void postProcessingAlign(const SpOMSSpectrum &spectrum, const QString &protein_seq, const QString &protein_id, std::size_t beginning, std::size_t length, const std::vector< double > &shifts)
performs the post-processing : generates corrected spectra and align them
bool perfectShiftPossible(const QString &sequence, const SpOMSSpectrum &spectrum, const std::size_t origin_row, const std::size_t current_row, const std::size_t l_peak, const std::size_t r_peak) const
indicates if a perfect shift is possible between the provided positions
std::size_t perfectShiftPossibleEnd(const QString &sequence, const SpOMSSpectrum &spectrum, std::size_t end_row, std::size_t end_peak) const
indicates if a perfect shift is possible between the provided positions
std::vector< std::pair< std::size_t, KeyCell > > m_updated_cells
static std::vector< double > getPotentialMassErrors(const pappso::AaCode &aa_code, const Alignment &alignment, const QString &protein_seq)
Returns a list of the potential mass errors corresponding to the provided alignment in the provided p...
void correctAlign(int recursive_call_count, const QString &protein_seq, const QString &protein_id, const SpOMSSpectrum &spectrum, std::vector< std::size_t > peaks_to_remove, std::size_t offset)
Recursively performs the correction of the alignment.
SemiGlobalAlignment(const ScoreValues &score_values, const pappso::PrecisionPtr precision_ptr, const AaCode &aaCode)
void fastAlign(const SpOMSSpectrum &spectrum, const QString &protein_seq, const QString &protein_id)
perform the first alignment search between a protein sequence and a spectrum. The member location hea...
std::size_t perfectShiftPossibleFrom0(const QString &sequence, const SpOMSSpectrum &spectrum, const std::size_t current_row, const std::size_t r_peak) const
indicates if a perfect shift is possible from the spectrum beginning to the provided peak.
LocationSaver getLocationSaver() const
Returns a copy of m_location_saver.
tries to keep as much as possible monoisotopes, removing any possible C13 peaks and changes multichar...
Definition aa.cpp:39
const PrecisionBase * PrecisionPtr
Definition precision.h:122
void reset()
reinitialize to default score_values
std::vector< std::size_t > peaks