Detecting Atypical Examples of Known Domain Types by Sequence Similarity Searching: The SBASE Domain Library Approach

Somdutta Dhir; Mircea Pacurar; Dino Franklin; Zoltan Gaspari; Attila Kertesz-Farkas; Andras Kocsor; Frank Eisenhaber; Sandor Pongor

doi:10.2174/138920310794109148

Abstract

SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences, the SBASE domain library, together with standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful for detecting rare, atypical examples of known domain types, which are sometimes missed even by more sophisticated methodologies. The approach does not require multiple alignment or machine learning techniques, and it can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.

Keywords: SBASE domain library, sequence similarity searching, protein domain prediction, atypical domain detection, Known Domain Types, protein sequence, Pair-wise comparison methods, priori classified database, Generative models, Discriminative models, kernel methods, vector machines, Network models, PageRank algorithm, BLAST-based, curation-based classification, regular expressions, sequence profiles, hidden Markov models, FTHOM approach, Uniprot, domain annotations, predictive coverage, ANNOTATOR, Supervised Cross-Validation