The increasing amount of publicly available proteomics data creates opportunities for data scientists to investigate quality metrics in novel ways. QuaMeter IDFree was used to generate quality metrics from 665 RAW files and 97 WIFF files representing publicly available "shotgun" mass spectrometry datasets. These experiments were selected to represent Mycobacterium tuberculosis lysates, mouse myeloid-derived suppressor cells (MDSCs), and exosomes derived from human cell lines. We demonstrate machine learning techniques to detect outliers within experiments and show that quality metrics may be used to distinguish sources of variability among these experiments. In particular, nested ANOVA performed on a principal component analysis of an SDS-PAGE shotgun experiment showed that runs of fractions from the same gel region clustered together, rather than clustering by technical replicate, close temporal proximity, or even biological sample. This indicates that the individual fraction may have had a higher impact on the quality metrics than these other factors. In addition, we identify sample type, instrument type, mass analyzer, fragmentation technique, and digestion enzyme as sources of variability. From a quality control perspective, we illustrate the importance of study design, and in particular of run order, in limiting the impact of technical variability.
Kriek Marina, Monyai Koena, Magcwebeba Tandeka U, Plessis Nelita Du, Stoychev Stoyan H, Tabb David L
QuaMeter, exosomes, fractionation, Mycobacterium tuberculosis, myeloid-derived suppressor cells, quality control, shotgun, study design
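
The abstract mentions detecting outlier runs within an experiment from per-run quality metrics but does not name the technique used. As a rough illustration only, the sketch below substitutes a simple robust z-score (median and median absolute deviation) computed per metric across runs; it is not the authors' method. The function names, the threshold of 3.5, the metric names (`ms2_count`, `median_peak_width`), and all values are hypothetical and are not QuaMeter IDFree output.

```python
from statistics import median


def robust_z_scores(values):
    """Return median/MAD-based z-scores for one metric across runs."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All runs (or most of them) share one value; nothing to flag.
        return [0.0 for _ in values]
    # 1.4826 rescales the MAD to be comparable to a standard deviation
    # when the data are approximately normal.
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_outlier_runs(metrics_by_run, threshold=3.5):
    """Flag runs whose robust z-score exceeds the threshold on any metric.

    metrics_by_run maps a run name to a dict of metric name -> value.
    The threshold is an assumed, commonly used cutoff, not one from the study.
    """
    runs = sorted(metrics_by_run)
    metric_names = sorted(metrics_by_run[runs[0]])
    flagged = set()
    for name in metric_names:
        scores = robust_z_scores([metrics_by_run[r][name] for r in runs])
        for run, z in zip(runs, scores):
            if abs(z) > threshold:
                flagged.add(run)
    return sorted(flagged)


# Hypothetical metric values for illustration only; not data from the study.
example = {
    "run_01": {"ms2_count": 41000.0, "median_peak_width": 18.0},
    "run_02": {"ms2_count": 40500.0, "median_peak_width": 17.7},
    "run_03": {"ms2_count": 41800.0, "median_peak_width": 18.4},
    "run_04": {"ms2_count": 40900.0, "median_peak_width": 18.1},
    "run_05": {"ms2_count": 12000.0, "median_peak_width": 17.9},
}

print(flag_outlier_runs(example))
```

In this toy input, `run_05` has a much lower `ms2_count` than the other runs and is the only run flagged. A per-metric screen like this ignores correlations between metrics, which is one reason a multivariate approach such as the principal component analysis mentioned in the abstract may be preferable in practice.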