Abstract
Background: Rapidly growing protein and annotation databases call for efficient tools to process this valuable information. Biologists frequently need to find proteins similar to a given protein, a task for which BLAST tools are commonly used. With the development of biomedical ontologies such as the Gene Ontology, methods have been designed to measure the functional (semantic) similarity between two proteins. These methods work well on individual protein pairs, but they are not suited to query processing, in which a query protein must be compared with every protein in a database to find the most similar ones.
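The Python sketch below shows the naive way to answer a query with a pairwise measure: score the query against every annotated protein, then sort. It is an illustration under stated assumptions, not SimExact. The Jaccard overlap of GO term sets is a hypothetical stand-in for a real semantic similarity measure, and the protein IDs and term labels are placeholders. Each query costs one pairwise comparison per database protein. Real semantic measures typically draw on the ontology's structure, so each comparison is more expensive than this set overlap.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical stand-in for a pairwise functional similarity measure:
    the fraction of GO terms shared by two proteins' annotation sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_function(
    query: str, annotations: Dict[str, Set[str]], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Naive query processing: compare the query with every other protein
    (one pairwise score each), then return the top_k highest scores."""
    q_terms = annotations[query]
    scores = [
        (protein, jaccard(q_terms, terms))
        for protein, terms in annotations.items()
        if protein != query
    ]
    # Highest similarity first; ties broken by protein ID for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


# Placeholder annotations: protein IDs and term labels are illustrative only.
annotations = {
    "P1": {"t1", "t2", "t3"},
    "P2": {"t1", "t2"},
    "P3": {"t4"},
    "P4": {"t1", "t2", "t3", "t4"},
}
ranking = rank_by_function("P1", annotations)
print([p for p, _ in ranking])  # → ['P4', 'P2', 'P3']
```

Because every query repeats the full scan, the total cost grows with database size times the cost of a single pairwise comparison.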
Objective: Our aim is to enable searching for functionally similar proteins within an acceptable response time.
Methods: We propose SimExact, a novel method for high-speed searching of functionally similar proteins.
Results: Our experiments show that SimExact returns the correct results required for protein searching. We provide a fully functional prototype online tool (www.datafurnish.com/protsem.php) that generates a ranked list of proteins similar to a query protein, with a response time of less than 20 seconds in our setup. We also used SimExact to search for protein pairs with a high disparity between functional similarity and sequence similarity.
Conclusion: SimExact makes such searches practical; without it, they could not be completed in a reasonable time.
Keywords: SimExact, protein function similarity, protein query, protein similarity measures, Gene Ontology, protein.