DSAP: deep-sequencing small RNA analysis pipeline
DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
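The first two steps (cleanup and clustering) can be illustrated with a minimal Python sketch. This is not DSAP's actual implementation: the adaptor sequence, the minimum tag length, the rule of discarding tags that consist of a single repeated nucleotide, and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter

ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"  # assumed 3' adaptor; replace with the one used
MIN_LEN = 15                       # assumed minimum tag length after trimming

# Matches a tag made of one nucleotide (or N) repeated, e.g. poly-A.
HOMOPOLYMER = re.compile(r"^(A+|T+|C+|G+|N+)$")


def parse_tags(lines):
    """Yield (sequence, copy number) from 'sequence<TAB>count' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, count = line.split("\t")
        yield seq.upper(), int(count)


def trim_adaptor(seq, adaptor=ADAPTOR, min_overlap=6):
    """Cut the read at the adaptor, or at a partial adaptor at its 3' end."""
    pos = seq.find(adaptor)
    if pos != -1:
        return seq[:pos]
    # Try progressively shorter adaptor prefixes at the end of the read.
    for k in range(len(adaptor) - 1, min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[:-k]
    return seq


def clean_and_cluster(lines):
    """Step (i) cleanup, then step (ii) grouping identical cleaned tags."""
    clusters = Counter()
    for seq, count in parse_tags(lines):
        seq = trim_adaptor(seq)
        if len(seq) < MIN_LEN or HOMOPOLYMER.match(seq):
            continue  # too short, or a homopolymer run
        clusters[seq] += count  # identical cleaned tags share one cluster
    return clusters


if __name__ == "__main__":
    tag = "TGAGGTAGTAGGTTGTATAGTT"  # example 22-nt tag
    demo = [
        tag + ADAPTOR + "\t10",      # full adaptor present
        tag + ADAPTOR[:9] + "\t5",   # partial adaptor at the 3' end
        "A" * 22 + "\t7",            # poly-A tag, discarded
    ]
    for seq, copies in clean_and_cluster(demo).items():
        print(seq, copies)
```

In this sketch, two raw tags that differ only in how much adaptor they carry collapse into one cluster whose count is the sum of their copy numbers.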
Identification of putative miRNAs from the deep-branching unicellular flagellates
MicroRNAs (miRNAs) are a class of extensively studied RNAi-associated small RNAs that play a critical role in eukaryotic gene regulation. However, knowledge of miRNAs and their regulation in unicellular eukaryotes is very limited. To better understand the origin of the miRNA regulation system, we used deep-sequencing technology to investigate miRNA expression patterns in four deep-branching unicellular flagellates: Giardia lamblia, Trichomonas vaginalis, Tritrichomonas foetus, and Pentatrichomonas hominis. In addition to the known miRNAs that have been described in G. lamblia and T. vaginalis, we identified 14 ancient animal miRNA families and 13 plant-specific families. Bioinformatics analysis also identified four novel miRNA candidates with reliable precursor structures derived from mature tRNAs. Our results indicate that miRNAs are likely to be a general feature of gene regulation throughout unicellular and multicellular eukaryotes, and that some of them may derive from unconventional ncRNAs such as snoRNA and tRNA.
A comprehensive expression profile of microRNAs and other classes of non-coding small RNAs in barley under phosphorous-deficient and -sufficient conditions.
Phosphorus (P) is essential for plant growth. MicroRNAs (miRNAs) play a key role in phosphate homeostasis. However, little is known about the effect of P on miRNA expression in barley (Hordeum vulgare L.). In this study, we used Illumina’s next-generation sequencing technology to sequence small RNAs (sRNAs) in barley grown under P-deficient and P-sufficient conditions. We identified 221 conserved miRNAs and 12 novel miRNAs, of which 55 were present only in the P-deficient treatment and 32 only in the P-sufficient treatment. In total, 47 miRNAs were significantly differentially expressed between the two P treatments (|log2| > 1). We also identified many other classes of sRNAs, including sense and antisense sRNAs, repeat-associated sRNAs, transfer RNA (tRNA)-derived sRNAs and chloroplast-derived sRNAs, some of which were also significantly differentially expressed between the two P treatments. Of all the sRNAs identified, antisense sRNAs were the most abundant class in both P treatments. Surprisingly, about one-fourth of sRNAs were derived from the chloroplast genome, and a chloroplast-encoded tRNA-derived sRNA was the most abundant of all the sRNAs sequenced. Our data provide valuable clues for understanding the properties of sRNAs and new insights into the potential roles of miRNAs and other classes of sRNAs in the control of phosphate homeostasis.
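The |log2| > 1 cutoff can be illustrated with a short sketch. It assumes that the cutoff refers to the log2 ratio of normalized expression between the two treatments. The reads-per-million normalization, the pseudocount and the function names are illustrative assumptions, not details taken from the study, and the sketch covers only the fold-change cutoff, not any statistical test of significance.

```python
import math


def reads_per_million(count, library_size):
    """Normalize a raw read count by library size (assumed normalization)."""
    return count * 1_000_000 / library_size


def log2_ratio(count_a, total_a, count_b, total_b, pseudocount=1.0):
    """log2 ratio of normalized expression in condition A versus B.

    The pseudocount (an assumption) avoids division by zero for miRNAs
    that are absent from one library.
    """
    rpm_a = reads_per_million(count_a, total_a) + pseudocount
    rpm_b = reads_per_million(count_b, total_b) + pseudocount
    return math.log2(rpm_a / rpm_b)


def passes_cutoff(ratio, threshold=1.0):
    """True when the absolute log2 ratio exceeds the threshold."""
    return abs(ratio) > threshold


if __name__ == "__main__":
    # Hypothetical counts: 300 reads versus 100 reads, one million reads each.
    ratio = log2_ratio(300, 1_000_000, 100, 1_000_000)
    print(round(ratio, 3), passes_cutoff(ratio))
```

A cutoff of 1 on the log2 scale corresponds to at least a two-fold change in either direction.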
CPAP: Cancer Panel Analysis Pipeline
Huang, P.-J., Yeh, Y.-M., Gan, R.-C., Lee, C.-C., Chen, T.-W., Lee, C.-Y., Liu, H., Chen, S.-J., Tang, P.
Vanno: A Visualization‐Aided Variant Annotation Tool
Po-Jung Huang1,2,‡, Chi-Ching Lee1,‡, Bertrand Chin-Ming Tan3, Yuan-Ming Yeh4, Kuo-Yang Huang5, Ruei-Chi Gan1, Ting-Wen Chen1, Cheng-Yang Lee1, Sheng-Ting Yang6, Chung-Shou Liao6, Hsuan Liu2,7,† and Petrus Tang1,2,5
Next‐generation sequencing (NGS) technologies have revolutionized the field of genetics and are trending toward clinical diagnostics. Exome and targeted sequencing in a disease context represent a major clinical application of NGS, given their utility and cost‐effectiveness. With the ongoing discovery of disease‐associated genes, various gene panels have been launched for both basic research and diagnostic tests. However, fundamental inconsistencies among the diverse annotation sources, software packages, and data formats have complicated the subsequent analysis. To manage disease‐associated NGS data, we developed Vanno, a Web‐based application for in‐depth analysis and rapid evaluation of disease‐causative genome sequence alterations. Vanno integrates information from biomedical databases, functional predictions from available evaluation models, and mutation landscapes from TCGA cancer types. A highly integrated framework that incorporates filtering, sorting, clustering, and visual analytic modules is provided to facilitate exploration of oncogenomics datasets at different levels, such as gene, variant, protein domain, or three‐dimensional structure. Such a design is crucial for extracting knowledge from sequence alterations and translating biological insights into clinical applications. Taken together, Vanno supports almost all disease‐associated gene tests and exome sequencing panels designed for NGS, providing a complete solution for targeted and exome sequencing analysis. Vanno is freely available at http://cgts.cgu.edu.tw/vanno.
CMPD: cancer mutant proteome database
Whole-exome sequencing, which centers on the protein-coding regions of disease/cancer-associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole exome/genome sequencing projects have been launched by various institutions, such as NCI, Broad Institute and TCGA, to provide a comprehensive catalog of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exonic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge between genomic and proteomic datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) collected over 2 million genetic alterations. It not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.
APOBEC3A is an oral cancer prognostic biomarker in Taiwanese carriers of an APOBEC deletion polymorphism
Oral squamous cell carcinoma (OSCC) is a prominent cancer worldwide, particularly in Taiwan. By integrating omics analyses in 50 matched samples, we uncover in Taiwanese patients a predominant mutation signature associated with cytidine deaminase APOBEC, which correlates with the upregulation of APOBEC3A expression in the APOBEC3 gene cluster at 22q13. APOBEC3A expression is significantly higher in tumors carrying APOBEC3B-deletion allele(s). High-level APOBEC3A expression is associated with better overall survival, especially among patients carrying APOBEC3B-deletion alleles, as examined in a second cohort (n=188; p=0.004). The frequency of APOBEC3B-deletion alleles is ~50% in 143 genotyped OSCC-Taiwan samples (27 A3B−/−: 89 A3B+/−: 27 A3B+/+), compared to the 5.8% found in 314 OSCC-TCGA samples. We thus report a frequent APOBEC mutational profile, which relates to an APOBEC3B-deletion germline polymorphism in Taiwanese oral squamous cell carcinoma that impacts expression of APOBEC3A, and is shown to be of clinical prognostic relevance. Our finding might be recapitulated by genomic studies in other cancer types.
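The ~50% figure can be reproduced from the reported genotype counts, assuming it is a deletion-allele frequency computed over two alleles per sample. The function name below is illustrative.

```python
def deletion_allele_frequency(hom_del, het, hom_wt):
    """Fraction of deletion alleles among all alleles in a diploid cohort.

    hom_del: samples with two deletion alleles (A3B-/-)
    het:     samples with one deletion allele (A3B+/-)
    hom_wt:  samples with no deletion allele (A3B+/+)
    """
    n_samples = hom_del + het + hom_wt
    deletion_alleles = 2 * hom_del + het
    return deletion_alleles / (2 * n_samples)


if __name__ == "__main__":
    # Genotype counts reported for the 143 genotyped samples.
    freq = deletion_allele_frequency(27, 89, 27)
    print(f"{freq:.1%}")  # 50.0%
```

With 27 homozygous-deletion and 89 heterozygous samples, 143 of the 286 alleles carry the deletion.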
VAReporter: variant reporter for cancer research of massive parallel sequencing
Po-Jung Huang1,2,4, Chi-Ching Lee3, Ling-Ya Chiu4, Kuo-Yang Huang5, Yuan-Ming Yeh4, Chia-Yu Yang4, Cheng-Hsun Chiu2 and Petrus Tang2,4,6*
Background: High-throughput sequencing technologies have become an increasingly critical aspect of precision medicine because they allow better identification of disease targets, which contributes to lower health care costs and improved clinical outcomes. In particular, disease-oriented targeted enrichment sequencing is becoming a widely accepted application for diagnostic purposes. It can interrogate known diagnostic variants as well as identify novel biomarkers from panels covering the entire human coding exome or disease-associated genes.
Results: We introduce a workflow named VAReporter to facilitate the management of variant assessment in disease-targeted sequencing, the identification of pathogenic variants, the interpretation of biological effects and the prioritization of clinically actionable targets. State-of-the-art algorithms that account for mutation phenotypes are used to rank the importance of mutated genes through visual analytic strategies. We established an extensive annotation source by integrating a wide variety of biomedical databases and followed the American College of Medical Genetics and Genomics (ACMG) guidelines for interpretation and reporting of sequence variations.
Conclusions: In summary, VAReporter is the first web server designed to provide a “one-stop” resource for both individual diagnosis and large-scale cohort studies, and is freely available at http://rnd.cgu.edu.tw/vareporter.
Keywords: NGS, Exomes, SNV annotation, TCGA, ICGC
circlncRNAnet: an integrated web-based resource for mapping functional networks of long or circular forms of noncoding RNAs
Despite their lack of protein-coding potential, long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) have emerged as key determinants in gene regulation, acting to fine-tune transcriptional and signaling output. These noncoding RNA transcripts are known to affect expression of messenger RNAs (mRNAs) via epigenetic and post-transcriptional regulation. Given their widespread target spectrum, as well as extensive modes of action, a complete understanding of their biological relevance will depend on integrative analyses of systems data at various levels.
While a handful of publicly available databases have been reported, existing tools do not fully capture, from a network perspective, the functional implications of lncRNAs or circRNAs of interest. Through an integrated and streamlined design, circlncRNAnet aims to broaden the understanding of ncRNA candidates by testing in silico several hypotheses of ncRNA-based functions, on the basis of large-scale RNA-seq data. This web server is implemented with several features that represent advances in the bioinformatics of ncRNAs: (1) a flexible framework that accepts and processes user-defined next-generation sequencing–based expression data; (2) multiple analytic modules that assign and productively assess the regulatory networks of user-selected ncRNAs by cross-referencing extensively curated databases; (3) an all-purpose, information-rich workflow design that is tailored to all types of ncRNAs. Outputs on expression profiles, co-expression networks and pathways, and molecular interactomes, are dynamically and interactively displayed according to user-defined criteria.
In short, users may apply circlncRNAnet to obtain, in real time, multiple lines of functionally relevant information on circRNAs/lncRNAs of interest. circlncRNAnet thus provides a “one-stop” resource for in-depth analyses of ncRNA biology. circlncRNAnet is freely available at http://app.cgu.edu.tw/circlnc/.
mSignatureDB: a database for deciphering mutational signatures in human cancers
Po-Jung Huang1,2, Ling-Ya Chiu3, Chi-Ching Lee2,4, Yuan-Ming Yeh2,3, Kuo-Yang Huang5, Cheng-Hsun Chiu2,6 and Petrus Tang3,6,*
Cancer is a genetic disease caused by somatic mutations; however, the understanding of the causative biological processes generating these mutations is limited. A cancer genome bears the cumulative effects of mutational processes during tumor development. Deciphering mutational signatures in cancer is a new topic in cancer research. The Wellcome Trust Sanger Institute (WTSI) has categorized 30 reference signatures in the COSMIC database based on the analyses of ∼10 000 sequencing datasets from TCGA and ICGC. Large cohorts and bioinformatics skills are required to perform the same analysis as WTSI. The quantification of known signatures in custom cohorts is not possible under the current framework of the COSMIC database, which motivated us to construct a database for mutational signatures in cancers and make such analyses more accessible to general researchers. mSignatureDB (http://tardis.cgu.edu.tw/msignaturedb) integrates R packages and in-house scripts to determine the contributions of the published signatures in 15 780 individual tumors from 73 TCGA/ICGC cancer projects, making it possible to compare signature patterns within and between projects. mSignatureDB also allows users to perform signature analysis on their own datasets, quantifying contributions of signatures at sample resolution, which is a unique feature of mSignatureDB not available in other related databases.
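The idea of quantifying the contributions of known signatures in a single sample can be sketched as a non-negative least-squares fit. The sketch below is not the method used by mSignatureDB, whose R packages and scripts are not described here. It swaps in a generic multiplicative-update solver, and it uses two invented six-channel toy signatures rather than the COSMIC reference signatures. All names and numbers are assumptions for illustration.

```python
def fit_signature_weights(catalogue, signatures, iterations=200):
    """Estimate non-negative weights w so that sum_k w[k] * signatures[k]
    approximates the sample's mutation catalogue (least-squares sense).

    Uses multiplicative updates, which keep every weight non-negative.
    """
    n_sig = len(signatures)
    n_chan = len(catalogue)
    weights = [1.0 / n_sig] * n_sig  # start from equal contributions
    # Projection of the catalogue onto each signature (constant).
    target = [sum(sig[c] * catalogue[c] for c in range(n_chan))
              for sig in signatures]
    for _ in range(iterations):
        # Catalogue reconstructed from the current weights.
        recon = [sum(weights[k] * signatures[k][c] for k in range(n_sig))
                 for c in range(n_chan)]
        for k in range(n_sig):
            denom = sum(signatures[k][c] * recon[c] for c in range(n_chan))
            if denom > 0:
                weights[k] *= target[k] / denom
    return weights


if __name__ == "__main__":
    # Toy signatures over six mutation channels (each sums to 1).
    sig_a = [0.50, 0.30, 0.10, 0.05, 0.03, 0.02]
    sig_b = [0.05, 0.05, 0.10, 0.20, 0.30, 0.30]
    # A sample built as 70% signature A and 30% signature B.
    sample = [0.7 * a + 0.3 * b for a, b in zip(sig_a, sig_b)]
    weights = fit_signature_weights(sample, [sig_a, sig_b])
    print([round(w, 3) for w in weights])
```

Because the toy sample is an exact mixture of the two signatures, the fitted weights should approach the mixing proportions. Real catalogues are noisy and use far more channels, so the fit is only approximate there.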
CoMutPlotter: a web tool for visual summary of mutations in cancer cohorts
Po-Jung Huang1,2,3, Hou-Hsien Lin4, Chi-Ching Lee5, Ling-Ya Chiu4, Shao-Min Wu2, Yuan-Ming Yeh3, Petrus Tang2, Cheng-Hsun Chiu3, Ping-Chiang Lyu4,* and Pei-Chien Tsai1,2,3,*
The CoMut plot is widely used in cancer research publications as a visual summary of mutational landscapes in cancer cohorts. This summary plot displays gene mutation rates and sample mutation burdens alongside relevant clinical details, which is a common first step in analyzing the recurrence and co-occurrence of gene mutations across samples. cBioPortal and iCoMut are two web-based tools that allow users to create intricate visualizations from pre-loaded TCGA and ICGC data. For custom data analysis, only a few command-line packages are currently available, which makes CoMut plots difficult to produce, especially for researchers without advanced bioinformatics skills. To address the need to compare custom data with TCGA/ICGC data, we have created CoMutPlotter, a web-based tool for producing publication-quality graphs in an easy-to-use and automatic manner.
We introduce a web-based tool named CoMutPlotter to lower the barriers between complex cancer genomic data and researchers, providing intuitive access to mutational profiles from TCGA/ICGC projects as well as custom cohort studies. A wide variety of file formats are supported by CoMutPlotter to translate cancer mutation profiles into biological insights and clinical applications, which include Mutation Annotation Format (MAF), Tab-separated values (TSV) and Variant Call Format (VCF) files.
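The two summaries at the heart of a CoMut plot, gene mutation rate and sample mutation burden, can be sketched from a MAF-like table. This is not CoMutPlotter's code. It assumes a tab-separated input with the standard MAF columns Hugo_Symbol and Tumor_Sample_Barcode, and the sample data and function name are invented for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_maf(handle):
    """Return (gene mutation rate, sample mutation burden) from a MAF-like
    tab-separated file with Hugo_Symbol and Tumor_Sample_Barcode columns.

    Rate = fraction of samples with at least one mutation in the gene.
    Burden = number of mutation records per sample.
    Samples without any mutation do not appear in the file, so they are
    not counted in the denominator here.
    """
    burden = Counter()
    samples_by_gene = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = row["Tumor_Sample_Barcode"]
        gene = row["Hugo_Symbol"]
        burden[sample] += 1
        samples_by_gene[gene].add(sample)
    n_samples = len(burden)
    rate = {gene: len(samples) / n_samples
            for gene, samples in samples_by_gene.items()}
    return rate, dict(burden)


if __name__ == "__main__":
    toy_maf = (
        "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
        "TP53\tS1\tMissense_Mutation\n"
        "TP53\tS2\tNonsense_Mutation\n"
        "KRAS\tS1\tMissense_Mutation\n"
        "TP53\tS1\tFrame_Shift_Del\n"
        "PIK3CA\tS3\tMissense_Mutation\n"
    )
    rate, burden = summarize_maf(io.StringIO(toy_maf))
    # Genes sorted by mutation rate, as in the rows of a CoMut plot.
    for gene in sorted(rate, key=rate.get, reverse=True):
        print(gene, round(rate[gene], 2))
    print(burden)
```

A gene mutated twice in the same sample counts once toward its mutation rate but twice toward that sample's burden.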
In summary, CoMutPlotter is the first tool of its kind that accepts VCF files, the most widely used file format, as input. CoMutPlotter also provides the much-requested function of comparing mutation patterns between a custom cohort and TCGA/ICGC projects. Contributions of COSMIC mutational signatures in individual samples are also included in the summary plot, which is a unique feature of our tool.
CoMutPlotter is freely available at http://tardis.cgu.edu.tw/comutplotter.
Keywords: Cancer, Mutational profile, Mutational signature, TCGA