Abstract
Aims: The many miRNAs discovered so far have been divided into biologically representative families to organize and systematize their study and, chiefly, to promote a better understanding of their functions. Clustering miRNA sequences can corroborate these family-based organizations and can also help explore whether sequences belonging to the same cluster have similar biological functions.
Observations: Because members of the same miRNA family tend to function in biologically similar ways, a well-structured family can help detect the functions of miRNAs that have not yet been associated with any existing family.
Methods: This paper empirically investigates the suitability of organizing miRNAs into families by clustering miRNA sequences with an algorithm based on a particular type of graph, the minimal spanning tree (MST). Seven miRNA families stored in the online miRBase database were used as input to the MST-based clustering algorithm, and the clustering results were compared to assess the algorithm's suitability for identifying these families.
Results: The experiments were motivated by two goals: to identify refinements that could be pursued in miRNA family organizations, and to investigate how the chosen graph-based clustering algorithm performs in miRNA-related domains.
Conclusion: Interesting and useful results are presented and discussed, particularly those related to detecting information through visualization of the final induced graphs and their connected components (clusters). In particular, the experimental results suggest that some miRNA families could be refined by grouping some of their miRNAs into sub-families.
Keywords: Data mining, enzyme, graph theory, minimal spanning tree, miRNA clustering, unsupervised learning.