Abstract
Protein domain boundary prediction is critical for understanding protein structure and function. In this study, we present a novel method, the order profile domain linker propensity index (OPI), which uses evolutionary information extracted from protein sequence frequency profiles calculated from multiple sequence alignments. OPI first converts a protein sequence into a smoothed, normalized numeric order profile, from which domain linkers are predicted. Because it discriminates between the different frequencies of amino acids in the sequence frequency profiles, OPI performs better than our previous method, the binary profile domain linker propensity index (PDLI). We tested the new method on two datasets, SCOP-1 and SCOP-2, and achieved precisions of 0.82 and 0.91, respectively. OPI also outperforms other residue-level and profile-level indexes, as well as other state-of-the-art methods.
Keywords: Domain boundary, domain linker, multiple sequence alignments, sequence-based prediction, computational modelling, pseudo amino acid, GPCR type, substrate-enzyme-product triads, mutagenesis, homology, DomSSEA, IGRN, CHOPnet, Nagarajan, SSEP-Domain, beta-turn types, entropy index, GHL, KDH, ASTRAL, protein sequence frequency profiles, PSI-BLAST, lower Z-score, OPI, SCOP-1 dataset, SCOP-2 dataset, PDLI, REI, sequence-level, protein-protein interaction
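The abstract describes OPI only at a high level: a sequence frequency profile derived from a multiple sequence alignment is turned into a smoothed, normalized numeric profile, and domain linkers are predicted from it. The sketch below illustrates that general pipeline under stated assumptions; it is not the published OPI formula. The per-residue propensity values, the frequency-weighted scoring, the window size of 3, the z-score normalization, the threshold of 1.0, and all function names (`position_scores`, `smooth`, `normalize`, `predict_linkers`) are hypothetical choices made for illustration, and it assumes that high scores indicate linker positions.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

# One profile column: amino acid -> observed frequency at that alignment position.
Column = Dict[str, float]


def position_scores(profile: Sequence[Column], propensity: Dict[str, float]) -> List[float]:
    """Frequency-weighted mean of (hypothetical) per-residue linker propensities."""
    scores = []
    for column in profile:
        total = sum(column.values())
        if total == 0:
            scores.append(0.0)  # empty column: no evidence either way
            continue
        weighted = sum(freq * propensity.get(aa, 0.0) for aa, freq in column.items())
        scores.append(weighted / total)
    return scores


def smooth(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def normalize(values: Sequence[float]) -> List[float]:
    """Z-score normalization; a constant profile maps to all zeros."""
    if not values:
        return []
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def predict_linkers(
    profile: Sequence[Column],
    propensity: Dict[str, float],
    window: int = 3,
    threshold: float = 1.0,
) -> List[int]:
    """Positions whose smoothed, normalized score reaches the (hypothetical) threshold."""
    z = normalize(smooth(position_scores(profile, propensity), window))
    return [i for i, v in enumerate(z) if v >= threshold]


# Toy data (hypothetical): a nine-position profile dominated by Leu,
# except position 4, which is rich in Pro and Gly.
toy_propensity = {"P": 1.0, "G": 0.8}  # unlisted residues default to 0.0
toy_profile = [{"L": 1.0}] * 9
toy_profile[4] = {"P": 0.7, "G": 0.3}

print(predict_linkers(toy_profile, toy_propensity))  # → [3, 4, 5]
```

Smoothing spreads the single Pro/Gly-rich column over its neighbours, so the predicted linker region in this toy example is three residues wide; a larger window would widen it further at the cost of positional resolution.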