Proteomic background in Synechocystis

software

posted on 2018-02-26, 10:11 authored by Andrew Landels

This dataset/code forms part of Andrew Landels' thesis: "Improving proteomic methods and investigating H2 production in Synechocystis sp. PCC6803" http://etheses.whiterose.ac.uk/id/eprint/19034

The code for the methodology described below was written in Wolfram Mathematica (version 10.1); the notebook file is "iTRAQ_TMT-complexity_emPAI.nb".

An in-depth proteomic dataset was generated on a Q Exactive HF mass spectrometer. It comprises two 8-plex iTRAQ experiments comparing a mutant against wild-type (WT) Synechocystis under two different conditions. The raw data are not included in this repository due to size constraints. To calculate emPAI scores, the number of ‘observable’ peptides for each protein was determined as follows. The complete proteome of Synechocystis PCC6803 (Kazusa strain) was downloaded from UniProt as a FASTA file (taxonomy: 1111708; accessed August 2015; 3517 protein entries). This file is available in this repository.

This was then merged with the spike-in protein sequences to create a single database for analysing the data. Doing so ensured that statistical procedures such as false discovery rate estimation were affected equally in all analyses. The FASTA file was processed in Wolfram Mathematica (version 10.1) to generate an in-silico digest of each protein. Any peptides outside a 1000–7500 Da window were excluded, to replicate the 2+ and 3+ ions observable within the 500–2500 m/z window used in the mass spectrometry scan. For example, a 2+ ion at 500 m/z corresponds to a neutral mass of roughly 1000 Da, and a 3+ ion at 2500 m/z to roughly 7500 Da. The emPAI score for each identified protein was then calculated using the following formula.
\[
\mathrm{emPAI} = 10^{N_{observed}/N_{observable}} - 1
\]
where $N_{observed}$ is the number of unique peptides observed for a given protein, and $N_{observable}$ is the number of unique in-silico peptides for that protein that fall within the 1000–7500 Da window (i.e. the total number of unique peptides that could be observed).
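The pipeline above can be sketched in Python. This sketch is not a translation of the Mathematica notebook. It reads a FASTA file, digests each protein in silico, counts the unique peptides in the 1000–7500 Da window to obtain $N_{observable}$, and applies the emPAI formula. The following assumptions are not stated in the source:

- trypsin cleavage (after K/R, not before P) with no missed cleavages;
- unmodified monoisotopic masses, with no iTRAQ label or other modification masses added;
- peptides containing non-standard residues are skipped.

All function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide for the free termini
PROTON = 1.007276   # mass of a proton (Da)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a protonated ion seen at `mz` with the given charge."""
    return charge * (mz - PROTON)


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, skipping blank lines."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def tryptic_digest(seq: str) -> List[str]:
    """Assumed trypsin rule: cleave after K/R unless the next residue is P."""
    peptides: List[str] = []
    start = 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide: str) -> Optional[float]:
    """Unmodified monoisotopic mass, or None if a residue is non-standard."""
    if any(aa not in RESIDUE_MASS for aa in peptide):
        return None
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def n_observable(seq: str, low: float = 1000.0, high: float = 7500.0) -> int:
    """Count unique in-silico peptides whose mass lies within [low, high] Da."""
    unique: Set[str] = set()
    for pep in tryptic_digest(seq):
        mass = peptide_mass(pep)
        if mass is not None and low <= mass <= high:
            unique.add(pep)
    return len(unique)


def empai(n_observed: int, n_observable_peptides: int) -> float:
    """emPAI = 10^(N_observed / N_observable) - 1."""
    if n_observable_peptides <= 0:
        raise ValueError("protein has no observable peptides in the mass window")
    return 10 ** (n_observed / n_observable_peptides) - 1
```

`neutral_mass(500, 2)` (about 998 Da) and `neutral_mass(2500, 3)` (about 7497 Da) reproduce the reasoning behind the 1000–7500 Da window. The merged search database can be formed by concatenating record lists, e.g. `read_fasta(proteome_text) + read_fasta(spike_in_text)`, where both variable names are placeholders. For a protein with 5 observed peptides out of 5 observable ones, `empai(5, 5)` gives 9.0.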

The emPAI scores were then plotted as a histogram to show the distribution of protein abundances and the dynamic range. The dynamic range was calculated as the exponential of the difference between the maximum and minimum emPAI values.
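The histogram binning and the dynamic-range value can be sketched in Python. The dynamic range follows the text literally: the exponential of the difference between the maximum and minimum emPAI values. The default of 20 equal-width bins is an arbitrary assumption, since the notebook's binning is not described here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], n_bins: int = 20) -> List[int]:
    """Count values into n_bins equal-width bins spanning [min, max]."""
    if not values or n_bins < 1:
        raise ValueError("need at least one value and at least one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum lies on the upper edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def dynamic_range(values: Sequence[float]) -> float:
    """Dynamic range as defined in the text: exp(max emPAI - min emPAI)."""
    return math.exp(max(values) - min(values))
```

Passing `math.log10` of each score to `histogram` instead of the raw scores spreads proteins across orders of magnitude. This is often easier to read for abundance data, but that choice is a suggestion here, not part of the described method.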

Funding

EU FP7 308518

Ethics

The data contain no personal data and nothing that requires ethical approval

Policy

The data comply with the institution's and funders' policies on access and sharing

Sharing and access restrictions

The data can be shared openly

Data description

The file formats are open or commonly used

Methodology, headings and units

Headings and units are explained in the files

Keywords

Synechocystis PCC6803 emPAI Proteomic Background Bioinformatics

Licence

MIT