Peptide Sets

A total of ~1.4 million individual peptides representing human proteins will be synthesized over the course of the ProteomeTools project. These are segmented as follows:

  • 550,000 tryptic peptides, covering essentially all human genes and isoforms
  • 200,000 non-tryptic peptides, i.e. products of alternative proteases
  • 350,000 post-translationally modified peptides representing phosphorylation, acetylation, methylation, ubiquitinylation and glycosylation
  • 200,000 peptides representing other interesting biology, such as disease-associated mutations, HLA neo-antigens, protease cleavage products, small open reading frames or translated lncRNAs

In addition, peptides will be chemically labeled with tandem mass tags (TMT) and dimethyl labels to provide fragmentation data for these quantification methods.
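Chemical labeling adds a fixed mass to every reactive primary amine, which shifts labeled precursors (and label-carrying fragments) relative to the unlabeled synthetic peptides. The Python sketch below computes that shift. It assumes complete labeling of the peptide N-terminus and every lysine, uses standard monoisotopic delta masses for TMT 6/10/11-plex (+229.162932 Da) and light dimethyl (+28.0313 Da) labels, and its function names are hypothetical rather than part of any ProteomeTools tool.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once for the free peptide termini

# Delta mass added per labeled primary amine (Da); assumed standard values.
LABEL_DELTA = {
    "none": 0.0,
    "TMT": 229.162932,          # TMT 6/10/11-plex reagent
    "dimethyl_light": 28.0313,  # 2 x CH2 from formaldehyde + cyanoborohydride
}


def label_sites(sequence: str) -> int:
    """Primary amines assumed to be labeled: N-terminus plus each lysine."""
    return 1 + sequence.count("K")


def peptide_mass(sequence: str, label: str = "none") -> float:
    """Neutral monoisotopic mass of an unmodified peptide, optionally labeled."""
    base = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return base + label_sites(sequence) * LABEL_DELTA[label]


if __name__ == "__main__":
    for name in LABEL_DELTA:
        print(f"{name:>15}: {peptide_mass('LVNELTEFAK', name):.4f}")
```

Real labeling reactions can be incomplete or can over-label hydroxyl groups; the sketch deliberately ignores both.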

We welcome participation by the scientific community in this project. In particular, we would be interested in suggestions for sets of peptides to synthesize so that these can be incorporated into the project. Within reason, we can also share the ProteomeTools Peptide Library (PROPEL) with scientists willing to measure these peptides on platforms not available to this project.

 

Released Datasets

The following peptide sets are available as raw data from the PRIDE repository and as fragmentation data in ProteomicsDB.org. Ready-to-use spectral libraries will follow in due course. Please refer to the 2017 publication in Nature Methods for further details and for tables of the peptides and data currently available. The seven sets of tryptic peptides in the current release contain 549,411 peptides covering 19,749 human genes as annotated in UniProt/SwissProt (version 2016-07-20; 42,164 protein sequences).

 

Unmodified, tryptic peptides

'Proteotypic Peptide' Set

This set contains 124,875 tryptic peptides that have been frequently and confidently identified in ProteomicsDB.org, covering 15,855 protein-coding human genes (by SwissProt). This data is designated as “TUM_first_pool” in PRIDE.

'Missing Gene' Set

For canonical gene products lacking confident protein-level identification in ProteomicsDB.org, all unique tryptic peptides between 7 and 30 amino acids long (with at most one missed cleavage site) were synthesized and measured. This set contains 140,458 peptides covering 4,818 protein-coding human genes (by SwissProt). This data is designated as “TUM_second_pool” in PRIDE.
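The selection rule for this set can be sketched as an in-silico tryptic digest followed by a uniqueness filter. This Python sketch assumes the simplified trypsin rule (cleave after K or R unless the next residue is P), treats a peptide as unique when it arises from exactly one gene in the supplied proteome, and uses hypothetical function names and toy sequences; the actual ProteomeTools selection pipeline may differ in details such as isoform handling or isoleucine/leucine ambiguity.

```python
import re
from collections import defaultdict


def tryptic_peptides(protein, min_len=7, max_len=30, max_missed=1):
    """Return tryptic peptides of `protein` within the length window.

    Assumed rule: cleave after K or R unless the next residue is P.
    Peptides spanning up to `max_missed` internal cleavage sites are kept.
    """
    # Zero-width split (Python 3.7+) after K/R when not followed by P;
    # empty strings (e.g. after a C-terminal K/R) are dropped.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        # Join fragment i with 0..max_missed following fragments.
        for missed in range(max_missed + 1):
            if i + missed >= len(fragments):
                break
            pep = "".join(fragments[i:i + missed + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return peptides


def unique_peptides(proteome):
    """Map each gene to the tryptic peptides that no other gene produces.

    `proteome` is a hypothetical dict of gene name -> protein sequence.
    I/L ambiguity is ignored in this sketch.
    """
    owners = defaultdict(set)
    for gene, sequence in proteome.items():
        for pep in tryptic_peptides(sequence):
            owners[pep].add(gene)
    unique = defaultdict(set)
    for pep, genes in owners.items():
        if len(genes) == 1:
            unique[next(iter(genes))].add(pep)
    return dict(unique)


if __name__ == "__main__":
    # Toy sequences, not real human proteins; "AAAAAAAR" is shared and dropped.
    toy = {"GENE_A": "GGGGGGGKAAAAAAAR", "GENE_B": "AAAAAAARSSSSSSSK"}
    for gene, peps in sorted(unique_peptides(toy).items()):
        print(gene, sorted(peps))
```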

'SRMAtlas' Set

The ProteomeTools Consortium obtained a subset of the SRMAtlas collection of peptides. This set contains 90,967 peptides mapping to 19,099 protein-coding human genes (by SwissProt). This data is designated as “SRMAtlas” in PRIDE.

'Isoform' Set

This set covers protein isoforms annotated in SwissProt using unique tryptic peptides. It contains 125,512 peptides covering 9,354 human genes. This data is designated as “TUM_isoform” in PRIDE.

'Missing Gene' Addon Set

This set extends the 'Missing Gene' Set by 54,710 peptides for canonical gene products lacking confident protein-level identification in ProteomicsDB.org. This data is designated as “TUM_second_addon” in PRIDE.

'Proteotypic when labeled' Set

This set contains 29,141 peptides that are often detected in studies employing tandem mass tag labels. Note that the peptides in this set are not isobarically labeled. This data is designated as “TUM_proteo_TMT” in PRIDE.

'Resynthesis of proteotypic peptides' Set

This set contains 12,760 peptides that were part of the initial “proteotypic” peptide set but were not detected by liquid chromatography-mass spectrometry (LC-MS). This data is designated as “TUM_proteo_TMT” in PRIDE.
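The per-set counts above add up to more than the 549,411 peptides quoted for the whole release. A quick Python check of that arithmetic follows; it assumes the gap reflects peptides present in more than one set (the resynthesis set, for instance, is drawn from the 'Proteotypic Peptide' Set by definition), and the names in it are illustrative only.

```python
# Per-set peptide counts as listed in the release description above.
RELEASED_SETS = {
    "Proteotypic Peptide": 124_875,
    "Missing Gene": 140_458,
    "SRMAtlas": 90_967,
    "Isoform": 125_512,
    "Missing Gene Addon": 54_710,
    "Proteotypic when labeled": 29_141,
    "Resynthesis of proteotypic peptides": 12_760,
}
STATED_UNIQUE_TOTAL = 549_411  # total quoted for the seven sets


def overlap_estimate(counts, unique_total):
    """Return (sum of per-set counts, entries beyond each peptide's first occurrence)."""
    total = sum(counts.values())
    return total, total - unique_total


if __name__ == "__main__":
    total, repeats = overlap_estimate(RELEASED_SETS, STATED_UNIQUE_TOTAL)
    print(f"sum of per-set counts: {total:,}")        # sum of per-set counts: 578,423
    print(f"implied repeated entries: {repeats:,}")   # implied repeated entries: 29,012
```

The difference says nothing about which sets overlap; answering that would require the peptide tables themselves.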

 

Upcoming Datasets

The ProteomeTools Consortium is currently working to synthesize, measure and release the following peptide sets to the community.

Peptide sets containing post-translationally modified peptides are expected soon. Peptides and data sets for acetylation, methylation, ubiquitination (K-GG), O-linked glycosylation and formerly N-glycosylated peptides (PNGase F-treated peptides) have been generated. We are currently generating extensive libraries for serine, threonine and tyrosine phosphorylation.
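At the precursor level, each modification in these sets shifts the peptide mass by a fixed amount. The Python sketch below adds standard monoisotopic delta masses (values as tabulated in Unimod) to an unmodified peptide mass. Using HexNAc as the O-linked glycan and deamidation (N to D, +0.984016 Da) as the mark left by PNGase F treatment are assumptions for illustration, and `modified_mass` is a hypothetical helper.

```python
# Monoisotopic delta masses (Da) for the modification types named above.
PTM_DELTA = {
    "phospho": 79.966331,      # +HPO3 on S, T or Y
    "acetyl": 42.010565,       # +C2H2O, e.g. on K
    "methyl": 14.015650,       # +CH2 (mono-methylation)
    "GG": 114.042927,          # Gly-Gly remnant on K after tryptic digestion of ubiquitinated proteins
    "HexNAc": 203.079373,      # e.g. O-GlcNAc on S or T (assumed example glycan)
    "deamidation": 0.984016,   # N -> D left behind when PNGase F removes an N-glycan
}


def modified_mass(unmodified_mass: float, mods: dict) -> float:
    """Add PTM delta masses to a neutral monoisotopic peptide mass.

    `mods` maps a key of PTM_DELTA to the number of sites carrying it.
    """
    return unmodified_mass + sum(PTM_DELTA[name] * n for name, n in mods.items())


if __name__ == "__main__":
    # A hypothetical 1000 Da peptide carrying one phosphate and one K-GG site.
    print(f"{modified_mass(1000.0, {'phospho': 1, 'GG': 1}):.6f}")
```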