Abstract
This study investigates the prediction of interspecies transmission of avian influenza viruses (AIVs) using machine learning methods. We identified 87 signature positions in AIV protein sequences using an information entropy method and encoded these positions with five amino acid factor scores (AAFactors) condensed from 491 physicochemical and biochemical properties of amino acids. We constructed four prediction models by combining these AAFactor features with commonly used machine learning techniques: Decision Tree, Naive Bayes, Random Forest, and Support Vector Machine. Cross-validation results demonstrated the power of AAFactors in predicting avian-to-human transmission of AIVs. Comparative analysis revealed the strengths and weaknesses of the different machine learning methods and the importance of the different AAFactors to the prediction.
Keywords: AAIndex, amino acid factor, avian influenza A virus, decision tree, interspecies transmission, machine learning, naive bayes, random forest, support vector machine.
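As an illustration of the entropy-based position analysis mentioned in the abstract, the following minimal Python sketch computes the Shannon entropy of each column in a set of aligned sequences. The abstract does not specify how entropy is used to select the 87 signature positions, so the per-column formulation, the function names (column_entropy, position_entropies), and the toy alignment are assumptions made for illustration only and are not the procedure used in the study.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # Sum of p * log2(1/p) over the observed residues, with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def position_entropies(sequences):
    """Entropy of every column in a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    # zip(*sequences) yields one tuple of residues per alignment column.
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment (not real AIV data): four sequences, four positions.
toy_alignment = ["MKTA", "MKSA", "MRTA", "MKTG"]
entropies = position_entropies(toy_alignment)
```

In this toy alignment the first column is fully conserved and so has zero entropy, while the remaining columns each contain one variant residue and have higher entropy. How such scores are turned into a list of signature positions (for example, by comparing avian and human strains) is left open here because the abstract does not describe it.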