Abstract
Detection and characterization of the structural domains of a protein are crucial for determining its tertiary structure, elucidating its functions, and designing and producing its biologically active analogs. Identification of domain segments at the sequence level is also important in structural genomics and in evolutionary studies. The diversity of domain folds and sequences, together with the high structural flexibility of inter-domain linker regions, poses great challenges for determining multi-domain protein structures, even from X-ray crystallographic or NMR spectroscopic data or by homology modeling. The problems multiply in the absence of such data or of sequence homologies. Interestingly, though, identification of protein domains is a unique research problem in which ab initio computational investigations supersede experimental ones or offer better applications of the latter. Advances in bioinformatics and computational biology in post-genomic research have led to a plethora of approaches, algorithms and web servers for predicting protein domains from 3D coordinates, from partial structural information including secondary structure, or from the primary sequence alone. Here we assess the state-of-the-art developments in the field. Trend-setting as well as widely used computational methods and web servers/databases are reviewed with a focus on their applicability, novelty and strength in mining the multiple features of sequence and structure that contribute to the formation, distinction and diversity of protein domains. Future possibilities of a unified system with optimal decision support are highlighted.
Keywords: Protein structural domains, homology modeling, template matching, linker index, motifs, secondary structure