This is the introduction to the LCMS data processing approach SOULS (Segmentation of untargeted LCMS spectra). It is based on the xcms package [1-3] and particularly suitable for the development of large machine learning models over time. In contrast to the xcms workflow, it allows separate processing, as a unique peak list independent of the processing batch is achieved by summing up the signal intensities within defined segments in the LCMS spectra. The application of the SOULS approach requires an Linux OS.
## Installation
```{r setup}
devtools::install_github("AGSeifert/SOULS")
library(SOULS)
```
This is the introduction to the LCMS data processing approach SOULS (Segmentation of untargeted LCMS spectra). It is based on the xcms package [1-3] and particularly suitable for the development of large machine learning models over time. In contrast to the xcms workflow, it allows separate processing, as a unique peak list independent of the processing batch is achieved by summing up the signal intensities within defined segments in the LCMS spectra.
## Data
To demonstrate the functionality of this approach, [example data is provided here](https://www.fdr.uni-hamburg.de/record/13535).
Please download the and unzip the file. The phenodata folder contains a csv file with information about the samples, which can be customized individually (e.g. name, variety, geogr. origin, instrument, harvesting year etc). The other folders provide mzML files (we recommend converting the vendor specific files to mzML format with MSConvert [4]). The example files have been shortened to one minute to enable processing on a normal laptop. For real data sets, we recommend using a workstation. The development of large machine learning models over time requires the definition of one sample as a reference sample for the retention time alignment. For example, this could be the sample with the most detected peaks. This reference sample is added to each new data set to be processed, making the retention time alignment independent of the processing batch. The "Samples" folder contains the samples to be processed in the respective batch.