Abstract
Given a compounds-forming system, i.e., a system consisting of a set of compounds and the relationships among them, can it form a biologically meaningful pathway? This is a fundamental problem in systems biology. A large amount of information on different organisms, at both the genetic and metabolic levels, has now been collected and stored in dedicated databases, and these data make it feasible to address this problem. Metabolic pathways are one kind of compounds-forming system. We analyzed them in yeast by extracting biological and graph features from each of 13,736 compounds-forming systems, of which 136 were positive pathways, i.e., known metabolic pathways from KEGG, and 13,600 were negative. Each compounds-forming system was represented by 144 features, of which 88 were graph features and 56 were biological features. “Minimum Redundancy Maximum Relevance” and “Incremental Feature Selection” were used to analyze these features, and 16 optimal features were selected as giving the most successful prediction for a query compounds-forming system. Jackknife cross-validation showed that the overall success rate of identifying the positive pathways was 74.26%. It is anticipated that this approach and its encouraging result will help shed light on this important topic.
Keywords: Compounds-forming system, Metabolic pathway, Minimum redundancy maximum relevance, Nearest neighbor algorithm, Jackknife cross-validation, KEGG, HBV virus, AAC model, PseAAC, mRMR, local density change, Incremental Feature Selection (IFS), MaxRel features q, benchmark dataset
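The selection procedure summarized in the abstract (a ranked feature list, Incremental Feature Selection, and a nearest neighbor predictor scored by jackknife cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the Euclidean distance, the toy data, and the ranking `[0, 2, 1]` are all assumptions made for illustration; in the study the ranking would come from mRMR, which is not reproduced here.

```python
import math

def nearest_neighbor_label(X, y, query_idx, features):
    """Label of the closest *other* sample, using squared Euclidean
    distance restricted to `features` (distance choice is an assumption)."""
    best_label, best_dist = None, math.inf
    for i, row in enumerate(X):
        if i == query_idx:  # jackknife: leave the query sample out
            continue
        dist = sum((row[f] - X[query_idx][f]) ** 2 for f in features)
        if dist < best_dist:
            best_label, best_dist = y[i], dist
    return best_label

def jackknife_positive_rate(X, y, features):
    """Fraction of positive samples (label 1) that leave-one-out 1-NN recovers."""
    positives = [i for i, label in enumerate(y) if label == 1]
    hits = sum(nearest_neighbor_label(X, y, i, features) == 1 for i in positives)
    return hits / len(positives)

def incremental_feature_selection(X, y, ranked_features):
    """Score every prefix of a ranked feature list and keep the best prefix."""
    scores = [
        jackknife_positive_rate(X, y, ranked_features[:k])
        for k in range(1, len(ranked_features) + 1)
    ]
    best_k = scores.index(max(scores)) + 1  # first prefix reaching the top score
    return ranked_features[:best_k], scores

# Hypothetical toy data: 2 positive and 2 negative systems, 3 features each.
X = [
    [1.0, 5.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 5.1, 0.1],
    [0.0, 0.1, 2.1],
]
y = [1, 1, 0, 0]
ranked = [0, 2, 1]  # stand-in for an mRMR ranking (assumed, not computed)

selected, scores = incremental_feature_selection(X, y, ranked)
print(selected, scores)  # [0] [1.0, 0.0, 0.0]
```

In this toy case only feature 0 separates the classes, so adding the lower-ranked features lowers the jackknife success rate and the one-feature prefix is kept. The same loop applied to the 144 ranked features is the kind of procedure that would yield an optimal subset such as the 16 features reported above.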