Abstract
Background: Transcriptome annotation is the basis for understanding gene structures and analysing gene expression. The transcriptome annotation of many organisms such as humans is far from incomplete, due partly to the challenge in the identification of isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide unprecedented opportunity for detecting isoforms due to their long length that exceeds the length of most isoforms. One limitation of current TGS reads-based isoform detection methods is that they are exclusively based on sequence reads, without incorporating the sequence information of annotated isoforms.
Objective: We aim to develop a method to detect isoforms by incorporating annotated isoforms.
Methods: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at exon-exon junctions is extracted from annotated isoforms as “short feature sequences”, which is used to distinguish splice isoforms. Second, we align these feature sequences to long reads and partition long reads into groups that contain the same set of feature sequences, thereby avoiding the pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Therefore, our method can detect not only known but also novel isoforms.
Results: Tested on two datasets from Calypte anna and Zebra Finch, IsoDetect shows higher speed and good accuracies compared with four existing methods.
Conclusion: IsoDetect may become a promising method for isoform detection.
Keywords: Isoform detection, sequencing technologies, TGS, transcriptome annotation, Calypte anna, Zebra finch.
Graphical Abstract