Raw conversion

6/22/2023

The field of computational proteomics is approaching the big data age (1), driven both by continuous growth in the number of samples analysed per experiment and by the growing amount of data obtained in each analytical run. At the same time, more data is now publicly available in proteomics repositories, which in turn means that there is increasing benefit to be had from the reanalysis of millions of mass spectra (2–5) to find new biological insights (e.g. novel variants and post-translational modifications (6)). However, in order to process these large amounts of (public) data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures (7). The development of computational proteomics tools has historically favoured the Microsoft Windows operating system, with tools such as Proteome Discoverer, MaxQuant (8), PeaksDB and Mascot Distiller (9). An important driver for this bias has been the lack of cross-platform libraries to access instrument output data files (RAW files) from major instrument providers (10).

Unfortunately, the vast majority of cross-platform proteomics tools are not able to operate directly on the proprietary formats generated by the diverse mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability, and to increase integration capabilities with popular workflow systems such as Galaxy or Nextflow, we have also built a Conda package and a BioContainers container around ThermoRawFileParser. In addition, we implemented a user-friendly interface (ThermoRawFileParserGUI) for those users not familiar with command-line tools. Finally, we performed a benchmark of ThermoRawFileParser and msconvert to verify that the converted mzML files contain reliable quantitative results.
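To illustrate why conversion to an open format matters for cross-platform processing, the following is a minimal Python sketch of reading MGF text with nothing but the standard library. It is not part of ThermoRawFileParser: the function name `parse_mgf` and the example spectra are hypothetical, and the sketch assumes the common MGF layout of `BEGIN IONS` / `END IONS` blocks containing `KEY=value` parameter lines followed by whitespace-separated m/z and intensity pairs.

```python
from typing import Dict, List

# Hypothetical example data, made up for illustration only.
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=hypothetical_scan_1
PEPMASS=445.12 15000.0
CHARGE=2+
100.5 200.0
150.25 350.5
END IONS
BEGIN IONS
TITLE=hypothetical_scan_2
PEPMASS=512.3
CHARGE=3+
110.0 50.0
END IONS
"""


def parse_mgf(text: str) -> List[Dict]:
    """Parse MGF-formatted text into a list of spectra.

    Each spectrum is a dict with:
      "params": dict of KEY -> value strings (e.g. TITLE, PEPMASS, CHARGE)
      "peaks":  list of (mz, intensity) float tuples
    """
    spectra: List[Dict] = []
    current = None  # spectrum being built, or None when outside a block
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=... or CHARGE=2+
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z followed by intensity
                parts = line.split()
                if len(parts) >= 2:
                    current["peaks"].append((float(parts[0]), float(parts[1])))
    return spectra


if __name__ == "__main__":
    for spectrum in parse_mgf(EXAMPLE_MGF):
        title = spectrum["params"].get("TITLE", "<untitled>")
        print(title, len(spectrum["peaks"]), "peaks")
```

Because the format is plain text, a reader of this kind runs unchanged on Linux clusters and cloud instances, which is not possible when the only access path to the spectra is a vendor library tied to one operating system.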
