Abstract
Quantitative structure–activity relationship (QSAR) models have been applied in bioorganic chemistry mainly to the study of small molecules, while applications to biopolymers remain underdeveloped. MicroRNAs (miRNAs), which are small non-coding RNAs, regulate a variety of biological processes and are good candidates for scaling up the application of QSAR and complex networks (CN). In this work, we selected microRNAs and predicted their activity profiles, which we then represented as a large network that may be used to identify stem cell microRNAs (smiRNAs) with similar action. The propensity of a small RNA sequence to act as a miRNA depends on its secondary structure, which can be explained in terms of folding thermodynamic and topological parameters; these can be used for fast identification of miRNAs at early stages of stem cell development and to gain insight into cellular differentiation processes and diseases such as cancer. First, we calculated thermodynamic parameters and topological descriptors for 432 small RNA sequences. The model correctly recognized 203 smiRNAs (94.0%) and 216 non-smiRNAs (100.0%), which were divided into training and validation series to extend model validation for network construction. ROC curve analysis (area = 0.99) demonstrated that the present model differs significantly from a random classifier. In addition, a double-ordinate Cartesian plot of cross-validated residuals, standard residuals and leverages defined the domain of applicability of the model as a square area bounded by a ±2 band for residuals and a leverage threshold of h = 0.0466. Last, we described the methodology for combining QSAR and CN in a study that allows us to differentiate the activity of smiRNAs. The predicted network has 216 nodes (smiRNAs), 1948 edges (pairs of smiRNAs with similar activity), and a low coverage density, d = 8.4%. Comparative studies with real networks suggest that our network not only behaves ideally but also resembles known network models in several respects. The combination of QSAR and CN can be used for fast and accurate selection of new smiRNAs with potential use in bioorganic and medicinal chemistry.
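The summary figures quoted above can be cross-checked with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes the network is undirected with no self-loops, so that density is d = 2E / (N(N − 1)), and it assumes the 432 sequences split into 216 smiRNAs and 216 non-smiRNAs, as implied by the reported percentages. The function and variable names are hypothetical.

```python
# Cross-check of the figures reported in the abstract.
# Assumptions (not stated explicitly in the source): the network is
# undirected without self-loops, and the 432 sequences split evenly
# into 216 smiRNAs and 216 non-smiRNAs.

def edge_density(n_nodes: int, n_edges: int) -> float:
    """Fraction of all possible undirected node pairs that are connected."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

def rate(correct: int, total: int) -> float:
    """Fraction of cases classified correctly."""
    return correct / total

N_SMIRNA = 216           # assumed: 432 total minus 216 non-smiRNAs
N_NON_SMIRNA = 216
CORRECT_SMIRNA = 203
CORRECT_NON_SMIRNA = 216

density = edge_density(n_nodes=216, n_edges=1948)
sensitivity = rate(CORRECT_SMIRNA, N_SMIRNA)
specificity = rate(CORRECT_NON_SMIRNA, N_NON_SMIRNA)
accuracy = rate(CORRECT_SMIRNA + CORRECT_NON_SMIRNA, N_SMIRNA + N_NON_SMIRNA)

print(f"density     = {density:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
print(f"accuracy    = {accuracy:.1%}")
```

Under these assumptions the computed density and per-class rates agree with the reported 8.4%, 94.0% and 100.0%; the overall accuracy is a derived quantity that the abstract does not state.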
Keywords: Cancer, complex network, miRNA prediction, 0D, 1D and 2D descriptors, QSAR, RNA structure, spectral moments, stem cells, thermodynamic parameters, topological descriptors.