Abstract
In the postgenomic era, various electronic procedures are available for protein sequence annotation, the process of enriching a protein with structural and functional features after its electronic translation from the corresponding gene or mRNA. The demand for reliable annotation systems is particularly urgent given the volume of genomic data produced daily by next-generation sequencing machines. In this paper we present a procedure that enhances the annotation performance of the previously described Bologna Annotation Resource (BAR+). BAR+ is based on clustering the graphs that represent the similarity among a large number of protein sequences; here we apply community detection algorithms to detect subclusters within any such graph. When a cluster is endowed with specific Gene Ontology (GO) terms associated with both Biological Process and Molecular Function, our procedure allows fine-tuning of the annotation process and generates subclusters that group proteins sharing closely related GO terms.
Keywords: Clustering, community detection, protein sequence annotation.
Graphical Abstract