1 Background

The CoMut plot is widely used in cancer research publications as a visual summary of the mutational landscape of a cancer cohort. It shows gene mutation rates and sample mutation burdens alongside the relevant clinical details, and is a common first step in analyzing the recurrence and co-occurrence of gene mutations across samples. To support both custom data and comparison with TCGA/ICGC data, we created CoMutPlotter, a web-based tool that produces publication-quality graphs in an easy-to-use and automated manner.

2 Implementation

2.1 CoMutPlotter framework

CoMutPlotter takes mutational profiles from whole-exome sequencing or whole-genome sequencing as input.

2.2 Data input

2.2.1 Supported Input File Formats

Three dominant file formats are supported:
1. Variant Call Format (VCF)
2. Mutation Annotation Format (GDC MAF)
3. Tab Separated Values (ICGC TSV)

2.2.2 Supported format for sample metadata/clinical information

As shown in the following figure, sample metadata must be supplied in tab-delimited format, and the sample ID column must be named “sample”. Additional phenotypes or clinical features should be provided as separate columns, with one row per sample.
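
The layout described above can be checked before upload with a few lines of Python. This is a minimal sketch, not part of CoMutPlotter: the function name `read_sample_metadata` and the example columns `gender` and `stage` are hypothetical, and the only requirement taken from this document is the tab-delimited layout with a “sample” column.

```python
import csv
import io


def read_sample_metadata(handle):
    """Parse tab-delimited metadata into {sample_id: {feature: value}}.

    Hypothetical helper for illustration; it only enforces the rule
    that the sample ID column is named "sample".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or "sample" not in reader.fieldnames:
        raise ValueError('metadata must contain a "sample" column')
    metadata = {}
    for row in reader:
        sample_id = row.pop("sample")
        if sample_id in metadata:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        metadata[sample_id] = row
    return metadata


# Example file content; the feature columns are made up for this sketch.
example = "sample\tgender\tstage\nS1\tfemale\tII\nS2\tmale\tIII\n"
metadata = read_sample_metadata(io.StringIO(example))
```

Rejecting duplicate sample IDs is a design choice of this sketch, made so that each sample maps to exactly one set of clinical features.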

2.2.3 Upload interface

2.2.4 For study cohorts with a large number of VCF files

To make data management and analysis more efficient, mutation profiles in all supported formats are converted to MAF format before subsequent analyses. For study cohorts with a large number of VCF files, a custom script for file format conversion is available for download (https://drive.google.com/file/d/1erLDY6gocUMilhW7KLAzyW1X9rqCKhMC/view).

Usage: ./vcf2maf.R [-[-DB|d] [<character>]] [-[-out|o] [<character>]] [-[-help|h]]
    -d|--DB      Select annotation DB:hg19 ,hg38 ,default: hg19 (optional)
    -o|--out     output file name, default: custom.maf (optional) 
    -h|--help    This help
 
 Jul. 13, 2018
Please contact Po-Jung Huang at Department of Biomedical Sciences, Chang Gung University.
            (pjhuang@gap.cgu.edu.tw) if you have any question!

The resulting custom.maf can then be uploaded to the CoMutPlotter server.

2.3 Functional consequence annotation

CoMutPlotter uses Oncotator as its default annotation package so that annotation results are comparable between custom cohorts and TCGA/ICGC studies, and so that variant annotation is a reproducible step.

Oncotator 1.9.9.0 was retrieved from the Docker public repository and installed on our server for functional consequence annotation.

2.4 Cancer driver genes identification

MutSigCV version 1.4 was retrieved from the Docker public repository and installed on our server for identifying genes that are significantly mutated in cancer genomes, using a model with mutational covariates.

The prepareMutSig function from maftools was used to reconcile the Hugo_Symbol entries in the MAF with the non-Hugo symbols in the covariate files provided with the MutSigCV package.

2.5 Mutational signature recognition

To date, 30 distinct mutational signatures have been identified and catalogued in the COSMIC database using the WTSI Mutational Signature Analysis Framework. The WTSI framework cannot quantify known signatures in individual samples when sample sizes are small. The deconstructSigs R package was adopted to address this issue by identifying and quantifying mutational signatures within a single tumor sample.
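
The idea of quantifying known signatures in one sample can be illustrated as a non-negative least-squares fit: find non-negative weights whose weighted sum of reference signatures best reconstructs the sample's mutation spectrum. The sketch below uses a generic projected gradient descent. It is not the algorithm implemented in deconstructSigs and is not CoMutPlotter code; the function name `fit_signatures` and the 4-channel toy signatures are hypothetical (real COSMIC signatures have 96 trinucleotide channels).

```python
def fit_signatures(profile, signatures, n_iter=2000):
    """Estimate relative signature contributions for one sample.

    profile:    list of channel fractions for the sample
    signatures: {name: list of channel fractions}, same channel order
    Minimizes the squared reconstruction error subject to weights >= 0
    by projected gradient descent, then rescales the weights to sum to 1.
    """
    names = list(signatures)
    # Step size from the squared Frobenius norm, an upper bound on the
    # largest eigenvalue of S^T S, which keeps the iteration stable.
    step = 1.0 / sum(v * v for sig in signatures.values() for v in sig)
    weights = {name: 1.0 / len(names) for name in names}
    for _ in range(n_iter):
        # Residual = reconstructed spectrum minus observed spectrum.
        residual = [
            sum(weights[n] * signatures[n][c] for n in names) - profile[c]
            for c in range(len(profile))
        ]
        for n in names:
            grad = sum(r * s for r, s in zip(residual, signatures[n]))
            # Gradient step followed by projection onto weights >= 0.
            weights[n] = max(0.0, weights[n] - step * grad)
    total = sum(weights.values())
    return {n: (w / total if total > 0 else 0.0) for n, w in weights.items()}


# Toy reference signatures over 4 channels (made up for this sketch).
toy_signatures = {
    "A": [0.7, 0.1, 0.1, 0.1],
    "B": [0.1, 0.7, 0.1, 0.1],
    "C": [0.1, 0.1, 0.7, 0.1],
}
# A sample built as 60% signature A and 40% signature B.
toy_profile = [0.46, 0.34, 0.10, 0.10]
contributions = fit_signatures(toy_profile, toy_signatures)
```

On this toy input the fitted contributions approach 0.6 for A, 0.4 for B and 0 for C, which is the mixture the sample was built from.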

2.6 Visualization and web modules

  • A bar plot represents the mutation rate of each sample, and heatmaps display the mutation types, significance values of cancer driver genes, and sample information.

    • Bar plot

    • Heat maps

  • Stacked bar charts and dot matrices drawn with ggplot2 display the contributions of mutational signatures in individual samples.

  • The gtable R package is used to render the CoMut Plot.

  • The Shiny R package is used to construct the framework of CoMutPlotter.

2.7 Output

The output of CoMutPlotter consists of three major components:

  1. CoMut Plot
  2. Cross-project comparison
  3. Download & Report Generation

2.7.1 CoMut Plot

  1. Mutation rate per sample, stratified by translational effects such as SNPs, insertions and deletions
    The mutation rate is calculated as the number of somatic mutations per million bases.
  2. Sample metadata is displayed as a heatmap
  3. Gene mutation is displayed as a bar plot, stratified by mutation type
  4. Significantly mutated genes identified by MutSigCV (q <= 0.1)
    The color gradient is rendered according to -log10(q)
  5. Mutation landscape and effects are displayed as a heatmap
  6. Contributions of the 30 COSMIC signatures in individual samples are displayed as a percentage stacked bar chart
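
Two of the quantities in the list above are simple calculations: the mutation rate per million bases and the -log10(q) value used for the color gradient of significantly mutated genes. The following is a minimal sketch of both. The function names, the 30 Mb target size, the example q-values and the `q_floor` guard against q = 0 are assumptions made for illustration; only the per-million-bases definition, the q <= 0.1 cutoff and the -log10(q) scale come from this document.

```python
import math


def mutation_rate_per_mb(n_mutations, n_bases):
    """Number of somatic mutations per million bases."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_mutations / (n_bases / 1e6)


def significant_genes(q_values, q_cutoff=0.1, q_floor=1e-300):
    """Return {gene: -log10(q)} for genes with q <= q_cutoff.

    q_floor is a hypothetical guard so that q = 0 does not break
    the logarithm; it is not taken from MutSigCV or CoMutPlotter.
    """
    scores = {}
    for gene, q in q_values.items():
        if q <= q_cutoff:
            scores[gene] = -math.log10(max(q, q_floor))
    return scores


# 150 somatic mutations over an assumed 30 Mb sequenced target.
rate = mutation_rate_per_mb(150, 30_000_000)

# Made-up q-values; TTN fails the q <= 0.1 cutoff and is dropped.
scores = significant_genes({"TP53": 1e-8, "KRAS": 0.01, "TTN": 0.6})
```

Here `rate` is 5 mutations per million bases, and `scores` keeps TP53 and KRAS with gradient values of about 8 and 2.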

2.7.2 Interactive filters

2.7.2.1 Filter for displaying the most frequently/significantly mutated genes

  • Order by % altered samples
  • Order by -log10(FDR) of MutSigCV
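
The first ordering can be sketched as follows: count, for each gene, the fraction of samples that carry at least one mutation, then keep the top-ranked genes. This is an illustrative sketch rather than CoMutPlotter's code; the function names, the (gene, sample) input layout and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def percent_altered(mutations, n_samples):
    """mutations: iterable of (gene, sample) pairs.

    Returns {gene: percentage of samples with at least one mutation}.
    A sample with several mutations in one gene is counted once.
    """
    samples_by_gene = defaultdict(set)
    for gene, sample in mutations:
        samples_by_gene[gene].add(sample)
    return {g: 100.0 * len(s) / n_samples for g, s in samples_by_gene.items()}


def top_genes(scores, n):
    """Top n genes by descending score; ties are broken alphabetically."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:n]


# Made-up mutation calls for a cohort of 4 samples.
calls = [
    ("TP53", "S1"), ("TP53", "S2"), ("TP53", "S2"),
    ("KRAS", "S1"), ("EGFR", "S3"), ("TP53", "S3"),
]
altered = percent_altered(calls, n_samples=4)
ranked = top_genes(altered, 2)
```

The same `top_genes` helper could rank genes by -log10(FDR) instead, by passing a dictionary of those values.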

2.7.2.2 Filter for selecting genes of interest

2.7.2.3 Filter for selecting specific mutation types

2.7.2.4 Inspecting contributions of mutational signatures in individual samples

Contributions of known mutational signatures (https://cancer.sanger.ac.uk/cosmic/signatures) in individual samples were quantified using the deconstructSigs package.

The percentage stacked bar chart is a common way to display the contributions of mutational signatures in each sample. However, the signatures become difficult to distinguish from each other when many of them are identified in a single sample.

To address this issue, we provide a dot matrix as an alternative way to display the contributions of mutational signatures. As shown in the following figure, the dot matrix makes it easy to inspect the landscape of mutational signatures across samples.

2.7.3 Cross-project comparison

2.7.4 Download & Report Generation