Abstract
Aim and Objective: A metabolic pathway is an important type of biological pathway composed of a series of chemical reactions. Metabolic pathways provide the essential molecules and energy for living organisms. To date, several metabolic pathways have been uncovered, but they remain incomplete. A number of prediction methods have been built to assign chemicals to particular metabolic pathways, and these methods can further be used to predict novel latent chemicals for a given pathway. However, none of them uses chemical properties at the system level to construct prediction models.
Method: In this study, we applied a network integration method that extracts topological features from different chemical networks, each representing associations between chemicals based on a different property, and fuses the resulting high-dimensional vector representations into a single low-dimensional vector representation for each chemical. These compact representations were fed into a Support Vector Machine (SVM) to construct the prediction model. Because one chemical can participate in more than one pathway type, we constructed an SVM-based binary prediction model for each pathway type to determine whether a given chemical can participate in that pathway type. Furthermore, the Synthetic Minority Over-sampling Technique (SMOTE) was adopted to reduce the influence of the imbalanced dataset.
Results and Conclusion: Each binary model performed well and outperformed the classic prediction model, indicating that the proposed models can be useful tools for integrating heterogeneous information to assign chemicals to the correct metabolic pathways.
Keywords: Metabolic pathway, feature extraction, feature fusion, Mashup, Synthetic Minority Over-sampling Technique, support vector machine.
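Illustrative sketch: the per-pathway scheme described under Method (one binary SVM per pathway type, trained on an oversampled dataset) could look roughly like the following. This is not the authors' implementation. The function names, the toy data, the RBF kernel, and the simplified SMOTE-style interpolation written in NumPy are all assumptions made for illustration; the fused low-dimensional chemical vectors are replaced here by random features.

```python
import numpy as np
from sklearn.svm import SVC


def smote_like(X_min, n_new, k=5, rng=None):
    """Simplified SMOTE-style oversampling (illustrative stand-in).

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n < 2 or n_new <= 0:
        return np.empty((0, X_min.shape[1]))
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def fit_binary_models(X, Y, seed=0):
    """Train one binary SVM per pathway type (one column of Y each)."""
    models = []
    for j in range(Y.shape[1]):
        y = Y[:, j]
        pos, neg = X[y == 1], X[y == 0]
        # Oversample whichever class is smaller until both are equal in size.
        minority, label = (pos, 1) if len(pos) <= len(neg) else (neg, 0)
        X_new = smote_like(minority, abs(len(pos) - len(neg)), rng=seed + j)
        X_bal = np.vstack([X, X_new])
        y_bal = np.concatenate([y, np.full(len(X_new), label)])
        models.append(SVC(kernel="rbf").fit(X_bal, y_bal))
    return models


def predict_pathways(models, X):
    """Return a 0/1 matrix: rows are chemicals, columns are pathway types."""
    return np.column_stack([m.predict(X) for m in models])


# Toy data: 60 hypothetical chemicals, 8 features, 2 pathway types.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
Y = np.zeros((60, 2), dtype=int)
Y[:6, 0] = 1    # rare pathway type (imbalanced)
Y[3:33, 1] = 1  # balanced pathway type; chemicals 3 to 5 belong to both
X[:, 0] += 3 * Y[:, 0]
X[:, 1] += 3 * Y[:, 1]

models = fit_binary_models(X, Y)
pred = predict_pathways(models, X)
```

Because each pathway type has its own model, a chemical can receive a positive prediction for several pathway types at once, which is the multi-label situation the Method paragraph describes. In a real evaluation, oversampling would be applied only to the training folds, never to the test data.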