Abstract
In recent years, research in bioinformatics has increasingly focused on the problem of class imbalance. A classification task is said to be class-imbalanced when the number of instances belonging to one or several classes substantially exceeds that of the other classes. Class imbalance often degrades classification performance on the minority classes. This article reviews the most widely used class imbalance learning methods and their applications to various bioinformatics problems, including disease diagnosis based on gene expression data and protein mass spectrometry data, translation initiation site recognition based on DNA sequences, protein function classification using amino acid sequences, activity prediction of drug molecules, and recognition of precursor microRNAs (pre-miRNAs). It also summarizes the current challenges and possible future trends of class imbalance learning methods in bioinformatics.
Keywords: Activity prediction of drug molecules, bioinformatics, class imbalance, gene expression, protein function classification, protein mass spectrometry, recognition of precursor microRNA, translation initiation site recognition.