The University of Sheffield
7 files

  • iTRAQ_TMT-complexity_emPAI.nb (250.83 kB)
  • proteome_size_histogram.nb (151.98 kB)
  • emPAI_database.csv (38.08 kB)
  • emPAI_values.csv (31.82 kB)
  • bg.csv (11.98 kB)
  • Synechocystis_uniprot_20150813.fasta (1.5 MB)
  • LICENSE.txt (1.05 kB)

Proteomic background in Synechocystis

software
posted on 2018-02-26, 10:11 authored by Andrew Landels
This dataset/code forms part of Andrew Landels' thesis: "Improving proteomic methods and investigating H2 production in Synechocystis sp. PCC6803" http://etheses.whiterose.ac.uk/id/eprint/19034

The code for the methodology described below was written in Wolfram Mathematica (10.1); the notebook file is "iTRAQ_TMT-complexity_emPAI.nb".

An in-depth proteomic dataset, comprising two 8-plex iTRAQ experiments comparing a mutant against wild-type (WT) Synechocystis under two different conditions, was generated on a Q-Exactive HF mass spectrometer (data not included in this repository due to size constraints). To calculate the emPAI scores, the number of ‘observable’ peptides for each protein was calculated as follows. The complete proteome for Synechocystis PCC6803 (Kazusa strain) was downloaded as a FASTA file from UniProt (taxonomy:1111708, accessed August 2015, 3517 protein entries); this file is available in this repository.
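As a rough illustration of the first step, reading the FASTA file into memory, a minimal Python sketch is given here. It is not the Mathematica code from the notebook; the function name `read_fasta` and the toy sequences are hypothetical and serve only to show the file layout.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy two-entry input (hypothetical sequences, not taken from the real proteome).
example = """>TOY1 hypothetical protein one
MKTAYIAKQR
QISFVK
>TOY2 hypothetical protein two
MSDNRPLK
""".splitlines()

proteome = read_fasta(example)
print(len(proteome))
```

With the real file, passing an open handle for "Synechocystis_uniprot_20150813.fasta" to `read_fasta` should, if the file is unchanged, yield a number of entries matching the 3517 reported above. Entries with identical header lines would overwrite one another in this sketch.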

This was then merged with the spike-in proteins to make a single database for analysing the data. Using one database meant that effects on statistical methods such as false discovery were equal across all analyses. The FASTA file was processed in Wolfram Mathematica (version 10.1) to generate an in-silico digest of each protein, excluding any peptides that fell outside a 1000 – 7500 dalton window. This window replicates the 2+ and 3+ ions observable in the 500 – 2500 m/z window used during the mass spectrometry scan: a 2+ ion at 500 m/z corresponds to roughly 1000 Da, and a 3+ ion at 2500 m/z to roughly 7500 Da. The emPAI scores for all identified proteins were calculated using the following formula:
\[
emPAI = 10^{(\frac{N_{observed}}{N_{observable}})} -1
\]
Where $N_{observed}$ is the number of unique peptides observed for a given protein, and $N_{observable}$ is the total number of unique peptides that could be observed for a given protein.
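The digest, mass filter and emPAI formula described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the notebook's Mathematica code: the description does not name the protease or the cleavage rules, so the sketch assumes trypsin (cleavage after K or R, except before P) with no missed cleavages, uses standard monoisotopic residue masses, and ignores the mass added by iTRAQ labels. All function names are hypothetical.

```python
from typing import List, Set

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-termini)


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(protein: str) -> List[str]:
    """ASSUMPTION: trypsin, cleave after K/R unless followed by P, no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def observable_peptides(protein: str, min_mass: float = 1000.0,
                        max_mass: float = 7500.0) -> Set[str]:
    """Unique digest peptides whose mass lies inside the 1000-7500 Da window."""
    kept = set()
    for pep in tryptic_peptides(protein):
        if any(aa not in RESIDUE_MASS for aa in pep):
            continue  # skip peptides with non-standard residue codes (e.g. X)
        if min_mass <= peptide_mass(pep) <= max_mass:
            kept.add(pep)
    return kept


def empai(n_observed: int, n_observable: int) -> float:
    """emPAI = 10^(N_observed / N_observable) - 1."""
    if n_observable <= 0:
        raise ValueError("protein has no observable peptides")
    return 10 ** (n_observed / n_observable) - 1


# Toy protein (hypothetical sequence): only the middle peptide is heavy enough.
toy = "MK" + "L" * 10 + "K" + "GR"
n_observable = len(observable_peptides(toy))
print(n_observable, empai(1, n_observable))
```

Counting unique peptides with a set mirrors the definition of $N_{observable}$ given above; a different protease, missed-cleavage allowance or label mass would change the counts.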

These data were then plotted as a histogram to identify the protein concentration distribution and dynamic range. Dynamic range was calculated by taking the exponential of the difference between the maximal and minimal emPAI values.
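A minimal Python sketch of this last step follows. The description does not state the base of the exponential or whether the emPAI values were log-transformed beforehand, so the sketch assumes log10-transformed values and a base-10 exponential, in which case the dynamic range equals the ratio of the largest to the smallest emPAI value. The input values and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def dynamic_range(log_values: Sequence[float], base: float = 10.0) -> float:
    """ASSUMPTION: inputs are log-transformed; returns base^(max - min)."""
    return base ** (max(log_values) - min(log_values))


def histogram(values: Sequence[float], bin_width: float) -> Dict[float, int]:
    """Count values per bin; keys are the lower edges of the occupied bins."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}


# Hypothetical emPAI values for identified proteins (all > 0, so log10 is defined).
empai_values = [0.01, 0.25, 0.5, 2.0, 10.0]
log_empai = [math.log10(v) for v in empai_values]

print(dynamic_range(log_empai))
print(histogram(log_empai, bin_width=0.5))
```

If the notebook applies the exponential to untransformed emPAI values, or uses a natural exponential, the numbers would differ; the sketch only shows the shape of the calculation.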

Funding

EU FP7 308518

Ethics

  • There is no personal data or any that requires ethical approval

Policy

  • The data complies with the institution and funders' policies on access and sharing

Sharing and access restrictions

  • The data can be shared openly

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • Headings and units are explained in the files

    Department of Chemical and Biological Engineering
