Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

HSS-Bin: An Unsupervised Metagenomic Binning Method Based on Hybrid Sequence Feature Recognition and Spectral Clustering

Author(s): Xiao Ding, Chang-Chang Cao, Xu-Ying Liu, Fu-Dong Cheng, Xing Luo and Xiao Sun

Volume 11, Issue 3, 2016

Pages: 330-339 (10 pages)

DOI: 10.2174/1574893611666151203222815

Abstract

Rapidly developing next-generation sequencing technologies have significantly advanced metagenomics research, yet they also present serious challenges for the analysis of metagenomic data. Metagenomic samples can contain thousands of microbial species, so sequencing datasets can contain fragments from thousands of different genomes. Clustering the sequencing reads by their genome of origin, a step known as binning, is therefore usually performed to expedite further studies. Current binning methods fall into two categories: supervised methods, which require reference genomes, and unsupervised methods, which do not.

We present an unsupervised binning method that combines a novel sequence feature recognition method with a spectral clustering algorithm. The sequence feature is a hybrid of sequence correlation and sequence composition analyses. Experiments on simulated and actual metagenomic datasets suggest that combining sequence composition with an intrinsic correlation of oligonucleotides, both extracted from tetranucleotide analyses, performs better than either feature alone. Clustering is performed with a spectral clustering algorithm, a high-performance unsupervised clustering method. The method is available as an open-source package called HSS-bin (Hybrid Sequence feature and Spectral clustering unsupervised metagenomic binning) at http://bioinfo.seu.edu.cn/HSS-bin/.
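
To make the general idea concrete, the following is a minimal Python sketch of composition-based binning: it computes tetranucleotide frequency vectors, builds a pairwise similarity matrix, and splits the sequences into two groups with a simple spectral step. It is not the HSS-bin implementation. The abstract does not define the correlation feature, the similarity measure, or the number of bins, so the sketch omits the correlation feature and assumes cosine similarity and a two-way split. All function names (`tetra_freq`, `affinity_matrix`, `spectral_bipartition`) are hypothetical.

```python
import math
from itertools import product

# All 256 tetranucleotides over the DNA alphabet, with a lookup index.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetra_freq(seq):
    """Return the 256-dimensional tetranucleotide frequency vector of seq.

    Windows containing characters other than A, C, G, T are skipped.
    """
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


def cosine(u, v):
    """Cosine similarity; assumed here, not taken from the abstract."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def affinity_matrix(seqs):
    """Pairwise similarity between the composition vectors of seqs."""
    feats = [tetra_freq(s) for s in seqs]
    return [[cosine(a, b) for b in feats] for a in feats]


def spectral_bipartition(W, iters=200):
    """Split items into two groups from a symmetric, non-negative affinity W.

    Uses the second eigenvector of the normalized affinity
    D^-1/2 W D^-1/2, found by power iteration with deflation.
    """
    n = len(W)
    d = [sum(row) for row in W]
    if any(x <= 0 for x in d):
        raise ValueError("every item needs positive total affinity")
    # Shift to (S + I) / 2 so that all eigenvalues are non-negative.
    M = [[0.5 * (W[i][j] / math.sqrt(d[i] * d[j]) + (1.0 if i == j else 0.0))
          for j in range(n)] for i in range(n)]
    # The leading eigenvector of S is proportional to sqrt(d).
    top = [math.sqrt(x) for x in d]
    norm = math.sqrt(sum(t * t for t in top))
    top = [t / norm for t in top]
    v = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]  # remove leading component
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm == 0:
            break
        v = [a / norm for a in v]
    # The sign of each entry assigns the item to one of the two groups.
    return [0 if x >= 0 else 1 for x in v]


if __name__ == "__main__":
    reads = ["AATTATAT" * 10, "ATATAATT" * 10, "GGCCGCGC" * 10, "GCGCGGCC" * 10]
    print(spectral_bipartition(affinity_matrix(reads)))
```

In this toy input the first two sequences are AT-only and the last two are GC-only, so the two pairs share no tetranucleotides and should land in different groups. Real metagenomic reads would need more than two bins and a more careful similarity measure.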

We evaluated HSS-bin’s performance using both simulated and actual metagenomic datasets. Experimental results indicate that HSS-bin can handle metagenomic sequencing data with non-uniform species abundance, short sequences, and complex phylogenetic diversity with high accuracy. Our method performs well on actual metagenomic datasets and on datasets simulated from a complex metagenomic community.

Keywords: Metagenomics, unsupervised binning, sequence features, spectral clustering.

