CMPD: cancer mutant proteome database

Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Lichieh Julie Chu, Ting-Wen Chen, Kai-Ping Chang, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu

Whole-exome sequencing, which centers on the protein-coding regions of disease/cancer-associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole-exome and whole-genome sequencing projects have been launched by various institutions, such as NCI, the Broad Institute and TCGA, to provide a comprehensive catalog of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and on a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exonic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge to integrate genomic and proteomic datasets, CMPD collected over 2 million genetic alterations, which not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.


Figure 1. Overview of CMPD

Genetic alterations were gathered from large-scale cancer genomics studies such as the NCI-60 WES, CCLE DNA sequencing, and TCGA WES/WGS projects. A wide variety of annotation sources were integrated into the CMPD database to facilitate the functional interpretation of these alterations. The coding variants were introduced into protein sequences according to their respective transcripts to generate a collection of mutant protein sequences. Sample-specific tryptic peptides containing mutated amino acids can also be generated for proteomic searches.
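
The last two steps of the overview, introducing a coding variant into a protein sequence and deriving the tryptic peptide that carries the mutated residue, can be illustrated with a minimal Python sketch. It is not the CMPD pipeline: the function names, the example sequence, and the cleavage rule (cut after K or R unless the next residue is P, with no missed cleavages) are assumptions made for illustration, and only single amino-acid substitutions are handled.

```python
def apply_substitution(protein: str, ref: str, pos: int, alt: str) -> str:
    """Return the protein with the residue at 1-based `pos` changed from `ref` to `alt`."""
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} is outside the protein (length {len(protein)})")
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + alt + protein[pos:]


def tryptic_peptides(protein: str):
    """Yield (start, peptide) pairs with 0-based start offsets.

    Assumed rule: cleave after K or R unless followed by P; no missed cleavages.
    """
    start = 0
    last = len(protein) - 1
    for i, residue in enumerate(protein):
        if i == last or (residue in "KR" and protein[i + 1] != "P"):
            yield start, protein[start:i + 1]
            start = i + 1


def mutant_tryptic_peptides(protein: str, ref: str, pos: int, alt: str) -> list[str]:
    """Return the tryptic peptides of the mutant protein that cover the mutated residue."""
    mutant = apply_substitution(protein, ref, pos, alt)
    index = pos - 1  # 0-based index of the mutated residue
    return [
        peptide
        for start, peptide in tryptic_peptides(mutant)
        if start <= index < start + len(peptide)
    ]


# Hypothetical example: an A4V substitution in a made-up 16-residue sequence.
example_protein = "MKTAYIAKQRQISFVK"
example_peptides = mutant_tryptic_peptides(example_protein, "A", 4, "V")
```

Checking the reference residue before substituting guards against applying a variant to the wrong transcript or coordinate system. A fuller treatment would also need to handle insertions, deletions, frameshifts, and substitutions that create or destroy a cleavage site next to a missed cleavage.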