Welcome to mSignatureDB
What are Mutational Signatures?
Cancer is a genetic disease arising from sequence changes in DNA. These somatic mutations can be caused by endogenous and exogenous factors such as DNA repair deficiency, the activity of DNA cytidine deaminases (APOBECs), and mutagenic exposures (e.g., tobacco, ultraviolet light and toxic chemicals). The catalogue of somatic mutations in a cancer genome bears the signatures of the mutational processes that have operated during tumor development, reflecting the cumulative effects of DNA damage and repair. Alexandrov et al., at the Wellcome Trust Sanger Institute (WTSI), analyzed the somatic mutation catalogues of 7,042 tumors from 30 different cancer types and revealed 21 signatures of mutational processes using the WTSI Mutational Signature Framework. Linking these signatures to the endogenous processes and exogenous mutagens mentioned above provides a better understanding of cancer biology.
The profile of each signature is displayed as the contributions of 96 trinucleotide contexts, which are composed of the 6 types of substitution (C>A, C>G, C>T, T>A, T>C and T>G) combined with 4 possible 5' bases and 4 possible 3' bases (6*4*4=96). The Wellcome Trust Sanger Institute (WTSI) has categorized 30 reference mutational signatures in the COSMIC database based on analyses of 10,952 exomes and 1,048 genomes across 40 types of human cancer from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC).
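The 96 contexts described above can be enumerated directly. The following is a minimal sketch; the bracketed label format (e.g. A[C>A]A) is a commonly used notation assumed here for illustration, and the function name is hypothetical.

```python
from itertools import product

BASES = "ACGT"
# The 6 substitution types, written relative to the pyrimidine (C or T) base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_contexts():
    """Return the 96 mutation channels as labels such as 'A[C>A]A'."""
    contexts = []
    for substitution in SUBSTITUTIONS:
        # Every combination of a 5' base and a 3' base flanking the mutated base.
        for five_prime, three_prime in product(BASES, BASES):
            contexts.append(f"{five_prime}[{substitution}]{three_prime}")
    return contexts


contexts = trinucleotide_contexts()
print(len(contexts))   # 6 * 4 * 4 = 96
print(contexts[:4])
```

A signature profile is then a vector of 96 non-negative values, one per context, that sum to 1.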
What have we done?
Although 30 reference mutational signatures have been categorized in the COSMIC database, large cohorts and advanced bioinformatics skills are required to perform the same analysis as WTSI. Moreover, the quantification of known signatures in custom cohorts is not possible under the current framework of the COSMIC database. Because many of the signatures are found across different cancer types, we were motivated to construct a database of mutational signatures in cancers and to make such analyses more accessible to general researchers.
mSignatureDB (http://tardis.cgu.edu.tw/msignaturedb) integrates R packages and in-house scripts to determine the contributions of the 30 known COSMIC signatures in 15,780 individual tumors across 73 cancer projects (33 TCGA cancer projects and 40 ICGC cancer projects). We have also incorporated the clinical information of each TCGA/ICGC sample into the database. Users can compare mutational signatures between tumor stages or subsets of patients and systematically investigate differences between clinical categories for each tumor type. These features are missing from existing signature analysis packages and databases.
A user-friendly interface is also provided to explore the signature distributions within and across cancer projects. mSignatureDB also allows users to perform signature analysis on their own datasets, quantifying the contributions of signatures at sample resolution, a unique feature that is not available in other related databases.
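To illustrate what quantifying signature contributions in a single sample means, the sketch below fits known signature profiles to a sample's mutation counts by non-negative least squares, using a simple multiplicative update. This is an assumption made for illustration only and is not the mSignatureDB pipeline, which relies on R packages and in-house scripts. The function name, the two toy signatures and the 4-channel layout (instead of 96 contexts) are all hypothetical.

```python
def fit_exposures(signatures, counts, iterations=500):
    """Estimate non-negative exposures so that the sum over signatures of
    exposure * signature approximates counts (illustrative method only).

    signatures: list of K profiles, each a list of C non-negative values.
    counts:     list of C observed mutation counts for one sample.
    """
    k = len(signatures)
    channels = range(len(counts))
    # Start from an even split of the total mutation count.
    exposures = [sum(counts) / k] * k
    for _ in range(iterations):
        # Spectrum reconstructed from the current exposures.
        recon = [sum(exposures[j] * signatures[j][c] for j in range(k))
                 for c in channels]
        updated = []
        for j in range(k):
            numerator = sum(signatures[j][c] * counts[c] for c in channels)
            denominator = sum(signatures[j][c] * recon[c] for c in channels)
            # Multiplicative update keeps every exposure non-negative.
            updated.append(exposures[j] * numerator / denominator
                           if denominator > 0 else 0.0)
        exposures = updated
    return exposures


# Hypothetical toy data: two signatures over 4 channels.
SIG_A = [0.7, 0.1, 0.1, 0.1]
SIG_B = [0.1, 0.1, 0.1, 0.7]
# Sample built as 30 mutations from SIG_A plus 10 from SIG_B.
sample_counts = [22, 4, 4, 10]

exposures = fit_exposures([SIG_A, SIG_B], sample_counts)
print([round(e, 2) for e in exposures])  # approximately [30.0, 10.0]
```

Dividing each exposure by the total gives the relative contribution of each signature in the sample, which is the kind of per-sample value that can then be compared across clinical categories.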
If you make use of the data presented here, please cite the following article:
Po-Jung Huang, Ling-Ya Chiu, Chi-Ching Lee, Yuan-Ming Yeh, Kuo-Yang Huang, Cheng-Hsun Chiu, Petrus Tang*
Nucleic Acids Research (2018) Database issue, gkx1133, https://doi.org/10.1093/nar/gkx1133