Abstract
Predicting gene co-expression is of great importance because co-expression helps explain the molecular and functional mechanisms of cells. High-performance prediction methods are therefore needed to reduce prediction errors. We developed a novel method, SeqNet, that combines heterogeneous features, namely gene expression values and various sequence-based features (SBF), with a random forest (RF) classifier to predict co-expressed genes. SeqNet outperforms current state-of-the-art methods. Our results also indicate that SBF are effective in detecting co-expressed genes, and that the highest performance is achieved when SBF are combined with gene expression data. This may reflect the ability of heterogeneous features to capture functional relationships between genes. We conclude that SBF improve the performance of methods for predicting co-expressed genes, and that SeqNet can predict gene co-expression relationships even when gene expression data are insufficient.
Keywords: Codon usage, gene co-expression network, machine learning methods, prediction of gene co-expression, random forest, sequence-based features.
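The abstract names codon usage among the sequence-based features but does not say how SeqNet builds its feature vectors. Purely as an illustration, the sketch below assumes one possible pair-level encoding. Two hypothetical SBF are computed for a pair of coding sequences: the cosine similarity of their codon-usage profiles and the absolute difference in GC content. When expression profiles are available, their Pearson correlation is appended. All function names here are hypothetical and not SeqNet's code. In a full pipeline, vectors like these for labeled co-expressed and non-co-expressed pairs would be used to train a random-forest classifier, for example scikit-learn's `RandomForestClassifier`. That training step is not shown.

```python
import math
from itertools import product

# All 64 codons in a fixed order so that usage vectors are comparable across genes.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(cds: str) -> list[float]:
    """Relative frequency of each codon in an in-frame coding sequence (hypothetical SBF)."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing N or other ambiguity codes
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def pair_features(cds_a, cds_b, expr_a=None, expr_b=None) -> list[float]:
    """Hypothetical feature vector for one gene pair.

    Sequence-based part: codon-usage similarity and GC-content difference.
    Expression part (only if both profiles are given): Pearson correlation.
    """
    feats = [
        cosine(codon_usage(cds_a), codon_usage(cds_b)),
        abs(gc_content(cds_a) - gc_content(cds_b)),
    ]
    if expr_a is not None and expr_b is not None:
        feats.append(pearson(expr_a, expr_b))
    return feats
```

For example, `pair_features("ATGGCTGCTAAA", "ATGGCTGCAAAA")` yields a two-element, sequence-only vector. Adding expression profiles, as in `pair_features(a, b, [1, 2, 3], [2, 4, 6])`, appends a third feature. This mirrors, in a simplified way, the idea that sequence features alone can still be used when expression data are lacking.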