
Mass spectrometry software

Mass spectrometry software is any software for data acquisition, analysis or data representation in mass spectrometry.

Most of the following tools work on the mass spectrometry data formats mzData and mzXML.
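As a rough illustration of what reading such a format involves: mzXML commonly stores each spectrum's peak list as base64-encoded 32-bit floats in network byte order, alternating m/z and intensity values. The sketch below decodes such a string under that assumption, with no compression. The function name decode_peaks and the sample values are illustrative only and do not belong to any library.

```python
import base64
import struct

def decode_peaks(b64_text, precision=32):
    """Decode a base64 peak string into (m/z, intensity) pairs.

    Assumes uncompressed, big-endian (network order) floats stored as
    alternating m/z and intensity values.
    """
    raw = base64.b64decode(b64_text)
    fmt_char = "f" if precision == 32 else "d"
    width = 4 if precision == 32 else 8
    count = len(raw) // width
    values = struct.unpack(">" + fmt_char * count, raw[:count * width])
    return list(zip(values[0::2], values[1::2]))

# Round trip on made-up data: encode two peaks, then decode them again.
peaks = [(100.5, 2000.0), (250.25, 150.0)]
flat = [value for pair in peaks for value in pair]
encoded = base64.b64encode(struct.pack(">4f", *flat)).decode("ascii")
decoded = decode_peaks(encoded)
```

Real files may use 64-bit precision or compressed peak data, so a production reader should take these details from the attributes of each spectrum rather than assume them.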

If you are in the market for mass spectrometry software, consider your application first. For standard sequenced organisms, a standard database search engine will usually suffice. If the research may involve unsequenced organisms, a de novo sequencing algorithm is essential. Many vendors produce valued database search engines, and each algorithm has its own strengths and weaknesses. For this reason, many researchers seek software that offers a complete package of tools, such as de novo sequencing, database protein identification, and possibly quantification.




PEAKS

PEAKS is designed for peptide sequencing and protein identification from tandem mass spectrometry (MS/MS) data.

Besides search engine protein identification (Protein ID), it is one of the earliest and most successful adopters of de novo sequencing (both automated and manual) and sequence tag based searching (SPIDER).[citation needed]

In short, de novo sequencing is peptide sequencing performed without prior knowledge of the amino acid sequence. PEAKS performs it at approximately 1 spectrum per second, so a run of 1000 spectra takes about 20 minutes.

Among other things, PEAKS provides a complete sequence for each peptide, confidence scores on individual amino acid assignments, simple reporting for high-throughput analysis, and the detail needed for in-depth investigations.

One of the most useful tools in any form of research is the ability to compare results. PEAKS will cross check test results automatically with other protein ID search engines, such as Sequest, OMSSA, X!Tandem and Mascot. This approach guards against false positive peptide assignments.
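The idea of cross-checking can be sketched as a simple vote: an identification is accepted only when at least two engines report the same peptide for the same spectrum. This is a generic illustration and not a description of how PEAKS implements its comparison. The function name consensus, the threshold, and the example identifications are all hypothetical.

```python
from collections import Counter

def consensus(assignments, min_engines=2):
    """Keep identifications that at least min_engines engines agree on.

    assignments maps engine name -> {spectrum id: peptide sequence}.
    """
    votes = {}
    for ids in assignments.values():
        for spectrum, peptide in ids.items():
            votes.setdefault(spectrum, Counter())[peptide] += 1

    accepted = {}
    for spectrum, counter in votes.items():
        peptide, n_engines = counter.most_common(1)[0]
        if n_engines >= min_engines:
            accepted[spectrum] = peptide
    return accepted

# Hypothetical results from three engines for two spectra.
assignments = {
    "Sequest": {"s1": "PEPTIDEK", "s2": "SAMPLER"},
    "Mascot": {"s1": "PEPTIDEK", "s2": "SIMPLER"},
    "X!Tandem": {"s1": "PEPTIDEK"},
}
accepted = consensus(assignments)
```

Here spectrum s1 is kept because all three engines agree, while s2 is dropped because the two engines that report it disagree.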

PEAKS reads all standard vendor data formats: ABI, Agilent, Bruker, Thermo, Waters, etc.

The package is considered reliable. It combines de novo sequencing (promoted as the industry gold standard), a database search engine, a meta server for comparing multiple methods easily, a sequence homology tool, and quantitation. It scales well and processes data very quickly.


ProTrawler

ProTrawler is an LCMS data reduction application that reads raw mass spectrometry vendor data (from a variety of well-known instrument companies) and creates lists of {mass, retention time, integrated signal intensity} triplets summarizing the LCMS chromatogram. The measurements are reported with errors, which are essential for performing dynamic binning for comparisons between data sets. ProTrawler operates in two modes: a highly visual, hands-on (expert) mode for developing the parameters used in data reduction, and a fully automated mode for processing many chromatograms in batch. ProTrawler's data reduction workflow includes background elimination, noise estimation, peak shape estimation, shape deconvolution, and isotopic and charge-state list deconvolution (factoring in errors and signal noise) to give a list of features. Typically, ProTrawler reduces 1 GB of raw data to 10 KB of processed results with a detection sensitivity of three orders of magnitude in 25% of the data acquisition time. No formal Bayesian methods are used, but sophisticated statistical inference is employed throughout. ProTrawler has been used for bacterial protein biomarker discovery efforts as well as for IPEx-related applications.


Regatta

Regatta is an LCMS list comparison application that works hand-in-hand with ProTrawler to provide an environment for filtering and normalizing LCMS results lists of {mass, retention time, integrated intensity} triplets. Its input is not restricted to ProTrawler output; it also accepts Excel/CSV files. To compare lists, Regatta solves the Transitive Property of Equality problem that arises in the comparison of analytical list data, viz., if Peak A in Sample A overlaps Peak B in Sample B, and Peak B overlaps Peak C in Sample C, but Peak A does not overlap Peak C, then can we say that we've measured the same analyte in all three samples or not? Regatta also implements multivariate analysis (e.g., hierarchical cluster analysis and principal component analysis) as well as statistical tests (e.g., coefficients of variation). Regatta has been used successfully for biomarker discovery.
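The transitivity problem is easy to reproduce with three made-up peaks. In the sketch below, two peaks overlap when their mass and retention-time error intervals intersect. Chaining overlaps with a union-find structure (single linkage) puts A, B and C in one group even though A and C do not overlap directly. This shows one possible grouping policy, not the method Regatta uses, which the description above does not specify. All names and values are hypothetical.

```python
from itertools import combinations

def overlaps(p, q):
    """Peaks overlap when both their mass and retention-time error intervals intersect."""
    return (abs(p["mass"] - q["mass"]) <= p["mass_err"] + q["mass_err"]
            and abs(p["rt"] - q["rt"]) <= p["rt_err"] + q["rt_err"])

def single_linkage_groups(peaks):
    """Group peaks connected by any chain of pairwise overlaps (union-find)."""
    parent = list(range(len(peaks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(peaks)), 2):
        if overlaps(peaks[i], peaks[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, peak in enumerate(peaks):
        groups.setdefault(find(i), []).append(peak["name"])
    return sorted(groups.values())

# One peak from each of three samples, with reported measurement errors.
peaks = [
    {"name": "A", "mass": 1000.00, "mass_err": 0.02, "rt": 30.0, "rt_err": 0.5},
    {"name": "B", "mass": 1000.03, "mass_err": 0.02, "rt": 30.4, "rt_err": 0.5},
    {"name": "C", "mass": 1000.06, "mass_err": 0.02, "rt": 30.8, "rt_err": 0.5},
]
a, b, c = peaks
chain = (overlaps(a, b), overlaps(b, c), overlaps(a, c))
groups = single_linkage_groups(peaks)
```

A stricter policy would require every pair in a group to overlap, which would split these three peaks. Choosing between the two policies is exactly the question the paragraph above poses.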


SPIDER

Common database search engines are unable to recognize some peptides.[citation needed] SPIDER, a sequence tag based search tool, complements protein identification by quickly seeking homology in proposed protein sequences. Partial sequence recognition allows for a greater understanding of post-translational modifications and sequence mutations.

BLAST style homology fails when confronted with common sequence substitutions such as I/L, N/GG, SAT/TAS.
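These substitutions are hard to catch because the alternatives have the same, or nearly the same, mass, so a mass spectrum alone cannot tell them apart. The check below uses standard monoisotopic residue masses. The helper names and the 0.001 Da tolerance are illustrative choices.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "N": 114.04293, "I": 113.08406, "L": 113.08406,
}

def residue_sum(seq):
    """Sum of residue masses for a short sequence fragment."""
    return sum(RESIDUE_MASS[aa] for aa in seq)

def same_mass(seq_a, seq_b, tol=0.001):
    """True when two fragments differ by no more than tol daltons."""
    return abs(residue_sum(seq_a) - residue_sum(seq_b)) <= tol

pairs = [("I", "L"), ("N", "GG"), ("SAT", "TAS")]
results = {pair: same_mass(*pair) for pair in pairs}
```

Every pair in the list comes out as equal in mass, whereas a pair such as A and G does not.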



SEQUEST

SEQUEST is a tandem mass spectrometry data analysis program [1]. It matches collections of tandem mass spectra to peptide sequences that have been generated from databases of protein sequences.

This tool is most useful in the context of shotgun proteomics. Starting with a complex mixture of proteins, this strategy typically employs trypsin to digest the proteins into peptides. These peptides are separated by liquid chromatography en route to a tandem mass spectrometer. The mass spectrometer then isolates ions of a particular peptide, subjects them to collision-induced dissociation, and records the resulting fragments in a tandem mass spectrum. This process, repeated for several hours, produces thousands of tandem mass spectra. Identifying such a data collection requires automation, and Sequest was the first software to fill that need.
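The digestion step has a direct computational counterpart, since search engines digest database proteins in silico. The sketch below applies the commonly used trypsin rule of cleaving after K or R unless the next residue is P. The function name, the missed_cleavages parameter, and the example sequence are illustrative assumptions.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from cleaving after K or R, except before P."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal remainder

    # Join neighbouring pieces to model incomplete digestion.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

# Made-up sequence: the R is followed by P, so no cleavage occurs there.
fully_cleaved = tryptic_digest("MKWVTFRPLLKAA")
with_one_missed = tryptic_digest("MKWVTFRPLLKAA", missed_cleavages=1)
```

Allowing missed cleavages enlarges the candidate list, which is one reason search time grows with more permissive settings.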

Sequest, like many engines, identifies each tandem mass spectrum individually. The software evaluates protein sequences from a database to compute the list of peptides that could result from each. The peptide's intact mass is known from the mass spectrum, and Sequest uses this information to determine the set of candidate peptide sequences that could meaningfully be compared to the spectrum, including only those near the mass of the observed peptide ion. For each candidate peptide, Sequest projects a theoretical tandem mass spectrum and compares it to the observed tandem mass spectrum by cross correlation. The candidate sequence with the best matching theoretical tandem mass spectrum is reported as the best identification for this spectrum.
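The steps above can be sketched in a heavily simplified form: filter candidates by intact mass, predict singly charged b and y fragment ions, bin both spectra, and score each candidate by the zero-offset correlation minus the mean correlation over nearby offsets. This is a toy in the spirit of the cross-correlation idea, not the actual Sequest implementation. It ignores intensities and spectrum preprocessing, and all function names, tolerances, and bin sizes are assumptions.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def fragment_mz(seq):
    """m/z values of singly charged b and y ions."""
    ions = []
    for i in range(1, len(seq)):
        ions.append(sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON)
        ions.append(sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON)
    return ions

def binned(mz_values, n_bins=2000, width=1.0):
    """Presence/absence vector over fixed-width m/z bins."""
    vec = [0.0] * n_bins
    for mz in mz_values:
        idx = int(mz / width)
        if 0 <= idx < n_bins:
            vec[idx] = 1.0
    return vec

def shifted_dot(x, y, shift):
    """Dot product of x with y displaced by shift bins."""
    n = len(x)
    return sum(x[i] * y[i + shift] for i in range(max(0, -shift), min(n, n - shift)))

def xcorr_like(observed, theoretical, max_shift=75):
    """Zero-offset correlation minus the mean over nonzero offsets."""
    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    background = sum(shifted_dot(observed, theoretical, s) for s in shifts) / len(shifts)
    return shifted_dot(observed, theoretical, 0) - background

def best_match(observed_mz, precursor_mass, candidates, tol=0.5):
    """Return (score, peptide) for the best candidate within the mass tolerance."""
    obs = binned(observed_mz)
    scored = [(xcorr_like(obs, binned(fragment_mz(pep))), pep)
              for pep in candidates
              if abs(peptide_mass(pep) - precursor_mass) <= tol]
    return max(scored) if scored else None

# Made-up test case: the "observed" spectrum is the ideal one for PEPTIDE.
observed = fragment_mz("PEPTIDE")
result = best_match(observed, peptide_mass("PEPTIDE"),
                    ["PEPTIDE", "EDITPEP", "SAMPLER"])
```

In this example SAMPLER is rejected by the mass filter. EDITPEP has the same mass as PEPTIDE and passes the filter, but its predicted fragments do not line up with the observed ones, so PEPTIDE scores highest.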

While Sequest is very successful in terms of sensitivity, it is quite slow to process data, and there are concerns about its specificity (especially if multiple PTMs are present).


Mascot

Matrix Science produces an algorithm called "Mascot" that performs mass spectrometry data analysis through a statistical evaluation of matches between observed and projected peptide fragments rather than cross correlation. As of version 2.2, support for peptide quantitation methods is provided in addition to the identification features.

Mascot was formerly the dominant database search engine and is used by competitors for benchmarking. Improvements by competitors and new technologies have since caught up with and surpassed its performance. Although very sensitive, the software is significantly slower when searching for multiple PTMs.

VIPER and Decon2LS

The "Proteomics Research Resource for Integrative Biology" distributes software tools (VIPER [2], Decon2LS, and others) that can be used to analyze the accurate mass and chromatographic retention time of LC-MS features. This is sometimes referred to as the Accurate Mass and Time tag approach (AMT tag approach). These tools are generally used for proteomics.


Phenyx

Phenyx is developed by Geneva Bioinformatics (GeneBio) in collaboration with the Swiss Institute of Bioinformatics (SIB). Phenyx incorporates OLAV, a family of statistical scoring models, to generate and optimize scoring schemes that can be tailored for all kinds of instruments, instrumental set-ups and general sample treatments [3]. It does not, however, work on raw, unprocessed data. Phenyx computes a score to evaluate the quality of a match between a theoretical and an experimental peak list (i.e. mass spectrum). A match is thus a collection of observations deduced from this comparison. A basic peptide score is the sum of raw scores for up to twelve physico-chemical properties. This basic score is ultimately transformed into a normalized z-Score and a p-Value.
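The transformation from a summed score to a z-score and p-value can be illustrated generically. The sketch below sums per-property scores, standardizes the total against a set of null scores (for example, scores of random matches), and reads the p-value from the upper tail of a standard normal distribution. It does not reproduce the OLAV models. The null set, the normal approximation, and all names and numbers are assumptions made for the example.

```python
from statistics import NormalDist, mean, stdev

def basic_score(property_scores):
    """Sum raw scores for up to twelve properties."""
    if len(property_scores) > 12:
        raise ValueError("expected at most twelve property scores")
    return sum(property_scores)

def z_and_p(score, null_scores):
    """Standardize a score against null scores; p is the upper normal tail."""
    mu, sigma = mean(null_scores), stdev(null_scores)
    z = (score - mu) / sigma
    p = 1.0 - NormalDist().cdf(z)
    return z, p

# Made-up numbers: three property scores and five null-match scores.
score = basic_score([2.0, 1.5, 3.5])
z, p = z_and_p(score, [1.0, 2.0, 3.0, 4.0, 5.0])
```

A large positive z and a small p indicate a match that scores well above what random matches achieve.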

In addition to regular peptide and protein identification features, Phenyx offers a number of additional functions, such as: a result comparison interface to visualise multiple results side by side; an import function to incorporate results from other search engines; and a manual validation feature to accept or reject identifications, with protein scores recalculated dynamically.


OpenMS and TOPP

OpenMS is a C++ software library for LC/MS data management and analysis. It offers an infrastructure for the development of mass spectrometry related software. OpenMS is free software available under the LGPL.

TOPP - The OpenMS Proteomics Pipeline - is a set of small applications that can be chained to create analysis pipelines tailored for a specific problem. TOPP is developed using the data structures and algorithms provided by OpenMS. TOPP is free software available under the LGPL.

OpenMS and TOPP are a joint project of the Algorithmic Bioinformatics group at the Free University of Berlin, the Department for Simulation of Biological Systems of Tübingen University and the Junior Research Group for Protein-Protein Interactions and Computational Proteomics at Saarland University.


X! Tandem

X! Tandem is open source software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.

This software has a very simple application programming interface (API): it takes an XML file of instructions on its command line and writes the results to an XML file whose path is specified in the input XML file. The output format is documented in a separate PDF. This format is used for all of the X! series search engines, as well as the GPM and GPMDB.
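A minimal instruction file can be generated programmatically. The sketch below builds one with Python's standard library. The bioml root element and the note labels reflect the X! Tandem input format as commonly documented and should be checked against the documentation for the installed version. All file names and the taxon are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

def build_tandem_input(spectrum_path, output_path, taxon,
                       defaults_path, taxonomy_path):
    """Return an X! Tandem style instruction document as a string.

    Label strings are assumptions to verify against the X! Tandem docs.
    """
    root = ET.Element("bioml")
    settings = {
        "list path, default parameters": defaults_path,
        "list path, taxonomy information": taxonomy_path,
        "protein, taxon": taxon,
        "spectrum, path": spectrum_path,
        "output, path": output_path,
    }
    for label, value in settings.items():
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical file names; the search engine would then be run with the
# saved instruction file as its single command-line argument.
xml_text = build_tandem_input("spectra.mgf", "results.xml", "yeast",
                              "default_input.xml", "taxonomy.xml")
```

Because the output path is part of the instructions, a batch of searches can be prepared by writing one such file per spectrum file.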

Unlike some earlier generation search engines, all of the X! series search engines calculate statistical confidence (expectation values) for all of the individual spectrum-to-sequence assignments. They also reassemble all of the peptide assignments in a data set onto the known protein sequences and assign the statistical confidence that this assembly and alignment is non-random. Therefore, separate assembly and statistical analysis software, e.g. PeptideProphet and ProteinProphet, does not need to be used.

X! Tandem is fast, but it is comparatively weak on sensitivity and prone to false negatives.


References

  1. Eng JK et al. (1994). "An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database". JASMS.
  2. Monroe ME et al. (2007). "VIPER: an advanced software package to support high-throughput LC-MS peptide identification". Bioinformatics. PMID 17545182.
  3. Colinge J, Masselot A, Giron M, Dessingy T, Magnin J (2003). "OLAV: towards high-throughput tandem mass spectrometry data identification". Proteomics 3 (8): 1454–63. doi:10.1002/pmic.200300485. PMID 12923771.
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Mass_spectrometry_software". A list of authors is available in Wikipedia.