CPAP: Cancer Panel Analysis Pipeline

Huang, P.-J., Yeh, Y.-M., Gan, R.-C., Lee, C.-C., Chen, T.-W., Lee, C.-Y., Liu, H., Chen, S.-J., Tang, P.

Targeted sequencing using next‐generation sequencing technologies is being rapidly adopted for clinical sequencing and cancer marker tests. However, no bioinformatics tool is currently available for the analysis and visualization of multiple targeted sequencing datasets. In the present study, we use cancer panel targeted sequencing datasets generated by the Life Technologies Ion Personal Genome Machine Sequencer as an example to illustrate how to develop an automated pipeline for the comparative analysis of multiple datasets. Cancer Panel Analysis Pipeline (CPAP) uses standard output files from variant calling software to generate a distribution map of SNPs among all of the samples, drawn as a circular diagram by Circos. The diagram is hyperlinked to a dynamic HTML table that allows users to identify target SNPs by applying different filters. CPAP also integrates additional information about the identified SNPs by linking to an integrated SQL database compiled from SNP‐related databases, including dbSNP, 1000 Genomes Project, COSMIC, and dbNSFP. CPAP takes only 17 min to complete a comparative analysis of 500 datasets. CPAP not only provides an automated platform for the analysis of multiple cancer panel datasets but can also serve as a model for any customized targeted sequencing project.
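The step from per-sample variant calls to a Circos distribution map can be illustrated with a minimal sketch. It is not CPAP's actual code: it assumes the variant caller writes tab-delimited VCF text, that each sample is available as a list of lines, and that the Circos karyotype names chromosomes with an `hs` prefix. The function names and the example coordinates used with them are hypothetical.

```python
from collections import defaultdict


def parse_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt) from VCF text lines, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # A VCF record may list several alternate alleles separated by commas.
        for allele in alt.split(","):
            yield chrom, pos, ref, allele


def count_samples_per_variant(samples):
    """Map each variant to the set of sample names in which it was called."""
    carriers = defaultdict(set)
    for name, lines in samples.items():
        for variant in parse_vcf_lines(lines):
            carriers[variant].add(name)
    return carriers


def circos_heatmap_rows(carriers):
    """Format one 'chr start end value' row per variant for a heatmap track.

    Assumption: the karyotype uses 'hs'-prefixed chromosome names (hs1, hs7, ...).
    """
    rows = []
    for (chrom, pos, ref, alt), names in sorted(carriers.items()):
        number = chrom[3:] if chrom.startswith("chr") else chrom
        rows.append(f"hs{number} {pos} {pos} {len(names)}")
    return rows
```

Given a dictionary that maps sample names to VCF lines, `circos_heatmap_rows(count_samples_per_variant(samples))` returns rows whose last column is the number of samples carrying each variant, which is the value a heatmap track would color by.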



Figure 1. Two different representative views of CPAP.

A: Circos View summarizes all of the identified genomic variants from multiple samples in a circular ideogram, representing variant counts as concentric heatmap tracks. A job information box is located in the upper-left corner, summarizing the counts of samples, genes, and variants, as well as the execution time and the versions of the annotation databases. Buttons for sequence retrieval, Excel download, and switching between different views (Circos and Table) and resolutions (SVG and PNG) are located in the upper-left corner. Pop-up windows containing detailed information on items, such as gene symbol, amplicon, and specific sample, are displayed when the mouse hovers over the Circos plot. A hyperlink to a subset of the annotation table is also embedded in the Circos plot for retrieving a subgroup of variants according to specific items. B: Table View is composed of job information, filters, pie charts, and the main annotation table. Users can apply various filters to obtain subsets of variants of interest. After unwanted records are filtered out, the remaining annotation results, as well as the regenerated Circos plot, can be downloaded according to the timestamp recorded in the filter box.
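The filtering described for the Table View can be sketched as follows. This is an illustration only, assuming each row of the annotation table is a dictionary; the field names `gene`, `n_samples`, and `dbsnp_id` and the filter parameters are hypothetical and do not reflect CPAP's actual schema.

```python
def filter_variants(rows, gene=None, min_samples=1, exclude_dbsnp=False):
    """Return the annotation rows that pass every active filter.

    rows          -- list of dicts with hypothetical keys 'gene', 'n_samples', 'dbsnp_id'
    gene          -- keep only this gene symbol (None disables the filter)
    min_samples   -- keep variants seen in at least this many samples
    exclude_dbsnp -- drop variants that already carry a dbSNP identifier
    """
    kept = []
    for row in rows:
        if gene is not None and row["gene"] != gene:
            continue
        if row["n_samples"] < min_samples:
            continue
        if exclude_dbsnp and row["dbsnp_id"]:
            continue
        kept.append(row)
    return kept
```

The rows that remain after filtering are the ones a regenerated Circos plot or a downloaded table would be built from.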