Abstract
Nicotinamide adenine dinucleotide (NAD) plays an important role in cellular metabolism, acting as a hydride-accepting and hydride-donating coenzyme in energy production. Identifying NAD-protein interacting sites can significantly aid in understanding NAD-dependent metabolism and pathways, and could further contribute useful information for drug development. In this study, a computational method is proposed to predict NAD-protein interacting sites using sequence information and structure-based information. All models developed in this work are evaluated using 7-fold cross-validation. The results show that the position-specific scoring matrix (PSSM) is an encouraging input feature for predicting NAD interacting sites. To address the unbalanced dataset, an ensemble support vector machine (SVM), which is an assembly of many individual SVM classifiers, is developed to predict the NAD interacting sites. The ensemble achieved an overall accuracy (Acc) of 87.31% with a Matthews correlation coefficient (MCC) of 0.56. In contrast, the single SVM approach achieved an accuracy of only 80.86% with an MCC of 0.38. These results indicate that prediction accuracy can be remarkably improved by the ensemble SVM classifier approach.
Keywords: Ensemble support vector machines, encoded PSSM profile, NAD-protein interacting sites, position-specific scoring matrix (PSSM), unbalanced data set, 7-fold cross-validation
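
To make the two reported metrics and the ensemble idea concrete, the following minimal Python sketch computes Acc and MCC from binary labels and combines the outputs of several classifiers. It is an illustration only: the abstract does not state how the individual SVM outputs are combined, so the majority-vote rule is an assumption, and the function names (`confusion_counts`, `accuracy`, `mcc`, `majority_vote`) are hypothetical. The label convention (1 = interacting residue, 0 = non-interacting residue) is likewise assumed.

```python
import math
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels (assumed: 1 = interacting, 0 = not)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Overall accuracy: fraction of residues classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def majority_vote(votes: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier 0/1 predictions; a residue is positive only if
    strictly more than half of the classifiers vote 1 (assumed combination rule)."""
    n_classifiers = len(votes)
    n_samples = len(votes[0])
    return [
        1 if 2 * sum(v[i] for v in votes) > n_classifiers else 0
        for i in range(n_samples)
    ]
```

Because non-interacting residues typically far outnumber interacting ones, accuracy alone can look high for a classifier that rarely predicts the positive class. MCC uses all four confusion-matrix counts, which is why it is the more informative figure for comparing the single and ensemble classifiers on an unbalanced data set.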