Abstract
Following several studies on the prediction of mutations, we examine how three sampling strategies affect prediction performance for H5N1 hemagglutinins: sampling by year, sampling by number of mutations, and sampling by the unpredictable portion of amino-acid pairs. The results show that the sampling strategy plays an important role in prediction and should be taken into account when predicting the next generation of mutations in influenza A virus proteins.
Keywords: Amino acid, hemagglutinin, influenza, logistic regression, model, mutation, prediction, RNA, virus