New Integrated Mitochondrial DNA Bioinformatics Pipeline to Improve Quality Assessment of Putative Pathogenic Variants from NGS Experiments
Page: 1-39 (39)
Author: Luigi Donato, Simona Alibrandi, Rosalia D’Angelo, Concetta Scimone, Antonina Sidoti and Alessandra Costa
DOI: 10.2174/9789811481802120010003
Abstract
Mitochondria are among the most essential and most intensively investigated organelles of eukaryotic cells. Because of the functions they carry out, especially cellular respiration, mitochondria are exposed to continuous oxidative stress that, over time, can damage the mitochondrial genome (mtDNA), leading, for example, to several neurodegenerative and age-related diseases. Today, the growth of next-generation sequencing (NGS) techniques allows researchers to improve mtDNA variant detection while increasing the quantity and complexity of the data produced, which makes molecular diagnosis of mitochondrial diseases more challenging. The main issues encountered when working with mtDNA high-throughput sequencing concern the detection and interpretation of low heteroplasmy and homoplasmy levels, variants unrelated to the exhibited phenotype, and the identification of variants of unknown significance (VUS). To perform an accurate analysis of mtDNA variants produced by NGS experiments, we propose an integrated approach based on the complementary use of the most recent algorithms applied to mtDNA data, extracting the most from each one. The workflow comprises four macro-phases (mitogenome alignment/assembly, variant calling, variant annotation, and in silico prediction of variant effects), each characterized by a mixed output from several tools and databases rich in complementary information on mtDNA variants. In this way, a higher-quality output can be obtained, leading to improved genetic counseling for patients affected by primary mitochondrial pathologies.
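As an illustration of the heteroplasmy issue mentioned in this abstract, the following minimal Python sketch estimates the heteroplasmy level of a single mtDNA site as the fraction of reads carrying the alternate allele. It is not the chapter's pipeline: the `SiteCounts` type, the function names, the read counts, and the 2% and 98% cut-offs are all hypothetical assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one mtDNA position (hypothetical example type)."""
    position: int
    ref_reads: int
    alt_reads: int


def heteroplasmy_fraction(site: SiteCounts) -> float:
    """Return the fraction of reads supporting the alternate allele."""
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        raise ValueError(f"no coverage at position {site.position}")
    return site.alt_reads / depth


def classify(site: SiteCounts, low: float = 0.02, high: float = 0.98) -> str:
    """Label a site using assumed cut-offs (low/high are illustrative only)."""
    fraction = heteroplasmy_fraction(site)
    if fraction < low:
        return "below_threshold"   # too few alternate reads to trust
    if fraction > high:
        return "homoplasmic"       # nearly all reads carry the variant
    return "heteroplasmic"         # mixed population of mtDNA molecules


if __name__ == "__main__":
    # Made-up counts, not real data.
    for site in (SiteCounts(100, 900, 100), SiteCounts(200, 5, 995)):
        print(site.position, round(heteroplasmy_fraction(site), 3), classify(site))
```

In practice the lower cut-off would depend on sequencing depth and error rate, which is why low heteroplasmy levels are hard to interpret.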
Variant Calling on RNA Sequencing Data: State of Art and Future Perspectives
Page: 40-52 (13)
Author: Luigi Donato, Simona Alibrandi, Rosalia D’Angelo, Concetta Scimone, Antonina Sidoti and Alessandra Costa
DOI: 10.2174/9789811481802120010004
Abstract
In recent decades, scientific research has undergone an important change in how studies are conceived. The development of new analytical technologies capable of generating large amounts of data led to the transition from the reductionist scientific model to the holistic one. Among these “high-throughput” technologies, next-generation sequencing (NGS) has exponentially increased the amount of knowledge about complex living systems. Bioinformatics and biostatistics are two disciplines that developed together with the NGS platforms to allow more accurate data analysis and management. NGS technology can be applied equally to DNA and RNA, originally for the detection of variants and the analysis of gene expression, respectively. In recent years, however, calling variants from RNA-seq data has become increasingly feasible. Here we discuss the analytical approaches that distinguish the analysis of DNA sequencing data from that of RNA sequencing data, highlighting the informative potential of RNA-seq data beyond the quantification of gene expression. We therefore discuss the application of variant calling pipelines to transcriptome data. Furthermore, the possibility of identifying single nucleotide variants starting from RNA samples allows the characterization of two important mechanisms that regulate gene expression: RNA editing and genomic imprinting. The study of these two biological mechanisms is probably the most stimulating resource obtained from RNA-seq and clearly requires adequate bioinformatics support, which is now being developed.
An Innovative Gene Prioritization Pipeline for DNA-Sequencing Analyses
Page: 53-73 (21)
Author: Luigi Donato, Simona Alibrandi, Rosalia D’Angelo, Concetta Scimone, Antonina Sidoti and Alessandra Costa
DOI: 10.2174/9789811481802120010005
Abstract
In recent decades, the development of next-generation sequencing (NGS) technologies has made it possible to understand the molecular mechanisms underlying various genetic diseases. The huge amount of data obtained from these experiments must be carefully analyzed. One of the most sensitive steps is gene prioritization, already performed by several widely used computational tools such as Endeavour, ToppGene, and Candid, which retains only the genes most probably associated with the disease of interest. Furthermore, among these genes, it is important to choose those that show the highest statistical significance, to obtain a more reliable result. This represents one of the major limitations for many researchers. In this work, we propose an innovative method that could help researchers reduce a large amount of data by applying filters before the prioritization process, which is carried out by ToppGene, currently considered the most powerful tool. We performed prioritization of candidate genes obtained by whole-exome sequencing (WES) of a patient affected by an orphan form of retinitis pigmentosa. We identified new mutations and polymorphic variants in known associated/causative genes and in genes not yet related to the disease. The upstream application of different filters allowed us to work with a smaller number of genes and therefore to introduce less statistical bias. Furthermore, ToppGene has proven to be a complete, reliable tool for carrying out the prioritization process.
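The idea of filtering before prioritization can be sketched in a few lines of Python. The abstract does not say which filters the authors applied, so the criteria below (a maximum population allele frequency and a set of protein-altering consequences) are assumptions made for illustration, as are the `Variant` type, the `prefilter_genes` function, and the placeholder gene names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Variant:
    """One annotated variant (hypothetical, simplified record)."""
    gene: str
    population_af: float   # allele frequency in a reference population
    consequence: str       # predicted effect on the transcript


# Assumed set of consequences worth keeping; not taken from the chapter.
PROTEIN_ALTERING: Set[str] = {"missense", "nonsense", "frameshift", "splice_site"}


def prefilter_genes(variants: Iterable[Variant], max_af: float = 0.01) -> List[str]:
    """Return the sorted genes carrying at least one rare, protein-altering variant."""
    kept = {
        v.gene
        for v in variants
        if v.population_af <= max_af and v.consequence in PROTEIN_ALTERING
    }
    return sorted(kept)


if __name__ == "__main__":
    # Placeholder variants, not real data.
    calls = [
        Variant("GENE_A", 0.0005, "missense"),
        Variant("GENE_B", 0.2500, "missense"),     # too common, dropped
        Variant("GENE_C", 0.0010, "synonymous"),   # not protein-altering, dropped
        Variant("GENE_D", 0.0000, "frameshift"),
    ]
    print(prefilter_genes(calls))
```

The resulting shorter gene list is the kind of input that would then be passed to a prioritization tool.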
New Integrated Differential Expression Approach for RNA-Seq Data Analysis
Page: 74-102 (29)
Author: Luigi Donato, Simona Alibrandi, Rosalia D’Angelo, Concetta Scimone, Antonina Sidoti and Alessandra Costa
DOI: 10.2174/9789811481802120010006
Abstract
The correct identification of differentially expressed genes (DEGs) is a key step in many areas of genetic research. Since the 1990s, many different approaches, methods, algorithms and statistical tools have been developed to analyze the expression levels of thousands of genes.
However, due to the growing complexity of managing, processing and interpreting sequencing data in order to obtain reliable results, there is no consensus about the most appropriate protocols and tools for the identification of differentially expressed genes, starting from RNA-Seq data.
Thus, we propose an integrated and comprehensive approach that combines the most widely used algorithms for DEG analysis, starting from the raw count data table. The proposed method consists of three main steps: 1) preliminary data analysis and visualization; 2) differential gene expression analysis, using Bioconductor packages (DESeq2, edgeR, limma, SAMseq, tweeDEseq) and standard ANOVA (ez and afex packages); 3) integration of results into two main graphical outputs, using the SuperExactTest, UpSetR and ComplexHeatmap packages.
In this way, a more robust output can be obtained simply and without prior bioinformatics knowledge.
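The integration step described in this abstract amounts to comparing the gene lists returned by several methods. The Python sketch below shows the underlying set logic only: it groups genes by the exact combination of methods that called them (the quantity an UpSet-style plot displays) and extracts a consensus list. It does not call any of the R packages named above; the method labels are used as plain dictionary keys, and the gene identifiers and function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def exclusive_intersections(calls: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Group genes by the exact set of methods that reported them as DEGs."""
    membership: dict[str, set[str]] = {}
    for method, genes in calls.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(method)

    groups: dict[frozenset[str], set[str]] = {}
    for gene, methods in membership.items():
        groups.setdefault(frozenset(methods), set()).add(gene)
    return groups


def consensus(calls: dict[str, set[str]], min_methods: int) -> set[str]:
    """Return genes reported by at least `min_methods` of the methods."""
    counts = Counter(gene for genes in calls.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_methods}


if __name__ == "__main__":
    # Placeholder DEG lists, not real results.
    deg_calls = {
        "DESeq2": {"g1", "g2", "g3"},
        "edgeR": {"g2", "g3", "g4"},
        "limma": {"g3", "g5"},
    }
    for methods, genes in exclusive_intersections(deg_calls).items():
        print(sorted(methods), sorted(genes))
    print("called by >= 2 methods:", sorted(consensus(deg_calls, 2)))
```

Requiring agreement between several methods trades sensitivity for robustness, so the `min_methods` threshold is a choice the analyst has to justify.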
Innovations in Data Visualization for Straightforward Interpretation of Nucleic Acid Omics Outcomes
Page: 103-129 (27)
Author: Luigi Donato, Simona Alibrandi, Rosalia D’Angelo, Concetta Scimone, Antonina Sidoti and Alessandra Costa
DOI: 10.2174/9789811481802120010007
Abstract
With the increasing availability of big data in every field of science, the development of visualization tools able to simplify the interpretation of such quantities of data is essential. However, many scientists lack a clear working knowledge of data visualization and have serious difficulty implementing it, especially for omics data. Thus, bioinformatics specialists continuously develop new algorithms and tools to perform deeper analyses of these data, along with innovative methods to simplify the representation of their output.
In this work, we evaluated a set of free tools that we considered highly suitable for enhancing the interpretation of next-generation sequencing analysis outcomes, particularly for exomic and transcriptomic experiments.
Visualization of both kinds of omics data is frequently employed in biomedical research to access knowledge within a genomic context, to communicate, and to explore datasets in order to formulate well-defined hypotheses. To this end, it is necessary to adopt dedicated algorithms and tools specific to each kind of analysis.
The Circos and VIsualization of VAriants (VIVA) tools gave us a straightforward, summarized representation of exomic outcomes, while the Omics Playground platform produced powerful results from RNA-Seq analyses. Finally, both omics sources served as input for pathway analysis with the ClueGO and CluePedia tools, which produced enriched network maps useful for drawing novel insights from the data obtained.
Today, a huge variety of visualization tools is available to data scientists and it can be difficult to select the right one. Data visualization users should, thus, mainly focus their choice on ease of use and whether a tool has the features they need.
Subject Index
Page: 130-138 (9)
Author: Luigi Donato, Simona Alibrandi, Rosalia D’Angelo, Concetta Scimone, Antonina Sidoti and Alessandra Costa
DOI: 10.2174/9789811481802120010008
Introduction
Bioinformatics, and by extension the omic sciences (the collective disciplines that depend on extensive datasets of biological information), present a data management challenge for researchers all over the world. Big data collected as part of research projects and experiments can be complex, with several kinds of variables involved. Because bioinformatics and information technology tools are also continuously changing, these fields need a multidisciplinary approach. Advances in Bioinformatics, Biostatistics and Omic Sciences attempts to realize an integrated approach across all omic sciences, exploring innovative bioinformatics and biostatistical methodologies that enable researchers to unveil hidden sides of biological phenomena. This volume presents reviews on the following topics, which give a glimpse of recent advances in the field:
- New Integrated Mitochondrial DNA Bioinformatics Pipeline to Improve Quality Assessment of Putative Pathogenic Variants from NGS Experiments
- Variant Calling on RNA Sequencing Data: State of Art and Future Perspectives
- An Innovative Gene Prioritization Pipeline for DNA-Sequencing Analyses
- New Integrated Differential Expression Approach for RNA-Seq Data Analysis
- Innovations in Data Visualization for Straightforward Interpretation of Nucleic Acid Omics Outcomes
This volume serves as a guide for graduate students in bioinformatics as well as researchers planning new projects as part of their professional and academic activities.