The University of Sheffield
3 files

Merging tag-based proteomic experiments

posted on 2018-02-26, 10:11 authored by Andrew Landels
This dataset/code forms part of Andrew Landels' thesis: "Improving proteomic methods and investigating H2 production in Synechocystis sp. PCC6803"

The code in this section is split into two separate scripts, both written in Mathematica. The first (MaxQuant_to_SignifiQunat) takes output files generated by MaxQuant and reorders their contents into a format that can be input to SignifiQuant. SignifiQuant is a program in the in-house proteomics pipeline of the Department of Chemical and Biological Engineering at the University of Sheffield. The script reads one or more files in a specified directory, collects all peptide information, and writes a new file containing all the required data. It is therefore both a conversion script and a data-collection script.
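The following Python sketch is a rough illustration of the collection step. It gathers selected columns from every tab-delimited table in a directory and writes them to one combined file. It is not a port of the Mathematica script. The column names in `KEEP_COLUMNS`, the added `Source file` column and the output layout are hypothetical placeholders, because the exact SignifiQuant input format is not described here. The only format assumption is that MaxQuant writes its result tables (e.g. `peptides.txt`) as tab-separated text.

```python
import csv
from pathlib import Path

# Hypothetical column selection: the real SignifiQuant input layout is not
# documented here, so these names are placeholders to adapt.
KEEP_COLUMNS = ["Sequence", "Proteins", "Reporter intensity 1", "Reporter intensity 2"]


def collect_peptides(directory, pattern="*.txt", keep=KEEP_COLUMNS):
    """Read every tab-delimited table matching `pattern` in `directory` and
    return the selected columns of all rows, tagged with the source file name."""
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            # Skip whole files whose header lacks a required column
            # (e.g. a different MaxQuant table in the same folder).
            if reader.fieldnames is None or not set(keep) <= set(reader.fieldnames):
                continue
            for record in reader:
                row = {"Source file": path.name}
                row.update({col: record[col] for col in keep})
                rows.append(row)
    return rows


def write_combined(rows, out_path, keep=KEEP_COLUMNS):
    """Write the collected rows to a single tab-delimited file."""
    fieldnames = ["Source file"] + list(keep)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Because files are skipped by header, unrelated tables in the same folder are ignored. Writing the combined file outside the scanned directory, or with a different extension, stops it from being re-read on a later run.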

The second script investigates methods for merging two biologically replicated datasets, where one dataset is a complete experimental replicate of the other. The theory behind this methodology is described in chapter 4.6 of the aforementioned thesis. Briefly, the code examines the label intensity distributions and log-transforms the data. It then applies median correction to fix the median at 0, and scales the data to give an equal gradient between the 40th and 60th percentiles.
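The normalisation steps can be sketched in Python with NumPy, under several assumptions. The log base is 2. The target spread between the 40th and 60th percentiles is exactly 1. Intensities are strictly positive, with missing or zero values already removed. None of these choices is taken from the original Mathematica code, and the function name `normalise_channels` is hypothetical.

```python
import numpy as np


def normalise_channels(intensities, low=40, high=60):
    """Log-transform, median-centre and rescale each label channel.

    `intensities` is a 2-D array of strictly positive reporter intensities,
    one column per label channel (assumed layout). Each column is
    log2-transformed, shifted so its median is 0, and then divided by its
    `low`-to-`high` percentile spread. Every channel therefore ends up with
    the same gradient across the central part of its distribution.
    """
    logged = np.log2(np.asarray(intensities, dtype=float))
    centred = logged - np.median(logged, axis=0)
    spread = np.percentile(centred, high, axis=0) - np.percentile(centred, low, axis=0)
    return centred / spread
```

Dividing by a positive constant leaves a zero median at zero. The median correction therefore survives the rescaling, and only the spread of each channel changes.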

The protein data in the repeat experiment are then scaled by the protein data in the initial experiment. This slightly disrupts the balance achieved by median correction, though not significantly. The two datasets are then plotted against each other in a scatter plot, which shows a systematic improvement in between-experiment repeatability. A principal component analysis (PCA) was then performed. It showed that the data cluster much more closely by experimental condition (principal component 1) than by deviation between experimental replicates (principal component 2), which demonstrates the success of the method.
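A PCA of this kind can be sketched with a singular value decomposition in NumPy. This is a generic PCA, not necessarily the exact procedure used in the thesis. The data layout (one row per sample, one column per protein) and the function name `pca_scores` are assumptions for illustration.

```python
import numpy as np


def pca_scores(data, n_components=2):
    """Project samples onto their first principal components.

    `data` has one row per sample (e.g. each label channel in each experiment)
    and one column per protein (assumed layout). Columns are mean-centred
    before an SVD. Returns the sample scores and the fraction of total
    variance explained by each returned component.
    """
    x = np.asarray(data, dtype=float)
    centred = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

The signs of principal components are arbitrary. Compare clusters by their relative positions along each component rather than by absolute score values.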

This method effectively combines two proteomic datasets that are completely independent experimental repeats. It demonstrates for the first time that this approach is feasible in tag-based proteomic investigations.


Funding: EU FP7 308518



  • The project has ethical approval, and the approval number has been included in the description field


  • The data complies with the institution and funders' policies on access and sharing

Sharing and access restrictions

  • The data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • Headings and units are explained in the files


    Department of Chemical and Biological Engineering