Abstract
Nuclear receptors constitute a superfamily of proteins that serve as ligand-activated transcription factors. They typically reside in the cytosol and, after ligand binding, migrate to the nucleus to exert their biological action. Their ligands are small lipophilic molecules, including retinoids, steroids, thyroxine, and vitamin D. As important regulators of gene expression, nuclear receptors constitute 13% of the proteins targeted by drugs, so identifying the ligand binding pockets on these proteins is important. A support vector machine (SVM) classifier was built to identify nuclear receptor ligand binding pockets. The positive dataset consisted of the ligand binding pockets of known nuclear receptor-ligand complex structures. The negative dataset consisted of ligand binding pockets of proteins other than nuclear receptors and non-ligand-binding pockets of nuclear receptors. The SVM model yielded a 10-fold cross-validation accuracy of 96% using a linear kernel. Knowing the class of a nuclear receptor is also helpful for designing a “class-specific” drug. For a multiclass dataset comprising nuclear receptors belonging to three different classes, the SVM classification model yielded an average 10-fold cross-validation accuracy of 92%. The SVM algorithm thus identifies and classifies nuclear receptor binding pockets with high accuracy (96% and 92%, respectively). The top-ranked features indicate the hydrophobic nature of the ligand binding pocket of nuclear receptors, and conserved leucine and phenylalanine residues form a distinguishing feature of these binding pockets. Along with the identification of nuclear receptor (NR) binding pockets, the important top-ranked features are listed; these would be useful in screening possible drug molecules with NRs as molecular targets.
Keywords: fpocket, ligand binding pockets, nuclear hormone receptors, NucleaRDB, shell-wise features, support vector machines.
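
As an illustration of the evaluation protocol named in the abstract (a linear SVM scored by 10-fold cross-validation), the following is a minimal sketch and not the authors' implementation. The abstract does not state which SVM software or pocket descriptors were used, so everything here is an assumption: the two-feature toy "pockets" are synthetic, the helper names (`train_linear_svm`, `cross_validate`, `make_toy_pockets`) are hypothetical, and the linear SVM is trained with a Pegasos-style stochastic subgradient method chosen only to keep the example free of external dependencies.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)              # last weight acts as a (regularised) bias
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            xi = X[i] + [1.0]                # append constant feature for the bias
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink step from the L2 penalty
            if margin < 1.0:                 # hinge loss is active: move towards y*x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1

def cross_validate(X, y, k=10, seed=0):
    """Return the held-out accuracy of each of the k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]    # k disjoint folds covering every sample
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return accuracies

def make_toy_pockets(n_per_class=100, seed=1):
    """Synthetic stand-in data: +1 = NR binding pocket, -1 = any other pocket."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            # two hypothetical descriptors, e.g. scaled hydrophobicity scores
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y

X, y = make_toy_pockets()
fold_acc = cross_validate(X, y)
mean_acc = sum(fold_acc) / len(fold_acc)
print(f"mean 10-fold accuracy: {mean_acc:.2f}")
```

The reported figure in such a protocol is the mean of the per-fold accuracies, each computed on pockets the model did not see during training. The toy data above are deliberately well separated, so the printed accuracy says nothing about the 96% and 92% values reported in the abstract.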