Abstract
Objective: As a kind of post-transcriptional modification (PTCM) in RNA, 2'-O-methylation occurs in both developmental processes and disease formation. Accordingly, both basic research and drug development face a challenging problem: given an uncharacterized RNA sequence composed of the nucleotides A (adenine), C (cytosine), G (guanine), and U (uracil), which of them can undergo 2'-O-methylation, and which cannot? Unfortunately, no computational method has so far been developed to address this problem.
Method: To fill this gap, we propose a predictor called iRNA-2methyl. It is formed by incorporating a series of sequence-coupled factors into the general PseKNC (pseudo K-tuple nucleotide composition), followed by fusing 12 basic random forest classifiers into four ensemble predictors, one each for identifying the modification sites at A, C, G, and U along the RNA sequence concerned.
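The abstract does not state the window size or the fusion rule, so the following is only a minimal Python sketch of the general idea under stated assumptions: candidate sites are taken as fixed-length windows centred on each occurrence of a given nucleotide, and three base-classifier votes per nucleotide type (12 classifiers across four types) are fused by simple majority. The names `candidate_windows` and `majority_vote`, the `flank` parameter, and the padding character are hypothetical and are not taken from iRNA-2methyl.

```python
from collections import Counter


def candidate_windows(seq, target, flank=2, pad="-"):
    """Return (position, window) pairs centred on each occurrence of `target`.

    Assumption: sites near the sequence ends are padded with `pad` so that
    every window has length 2 * flank + 1.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, base in enumerate(seq):
        if base == target:
            # Position i in `seq` sits at index i + flank in `padded`,
            # so the window starts at index i in `padded`.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def majority_vote(votes):
    """Fuse binary votes (1 = modified, 0 = unmodified) from base classifiers.

    Assumption: simple majority voting; an odd number of voters avoids ties.
    """
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return Counter(votes).most_common(1)[0][0]
```

In such a setup, each window would be encoded as a feature vector and passed to the base classifiers for its nucleotide type, whose individual votes are then combined by `majority_vote`.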
Results: Rigorous jackknife cross-validation indicated that the success rates are very high (>93%). For the convenience of most experimental scientists, a user-friendly web server for iRNA-2methyl has been established at http://www.jci-bioinfo.cn/iRNA-2methyl, through which users can easily obtain their desired results without having to work through the complicated mathematical equations involved.
Conclusion: The proposed predictor iRNA-2methyl is expected to become a very useful bioinformatics tool for medicinal chemistry, helping to design effective drugs against diseases related to 2'-O-methylation.
Keywords: 2'-O-Me modification sites, RNA, Sequence-coupled model, Ensemble classifier, Fusion, PseKNC.