Abstract
In Machine Learning, classification is considered a supervised learning
technique to predict class samples based on labeled data. Classification techniques have
been applied to various domains such as intrusion detection, credit card fraud detection,
etc. However, classification techniques on all these domains have been applied to
balanced datasets. Balanced datasets are those which contain equal proportion of
majority and minority examples. However, in real-time, obtaining balanced datasets is
difficult because majority of the datasets tend to be imbalanced. Developing a model
for classifying imbalanced datasets is a challenge, particularly in the medical domain.
Accurate identification of a disease-affected patient within time is critical as any
misclassification leads to severe consequences. However the imbalanced nature of most
of the real-time datasets presents a challenge for most of the conventional machine
learning algorithms. For the past few years, researchers have developed models using
Conventional machine learning algorithms (linear and nonlinear) are stating
unsatisfactory performance in classifying imbalanced datasets. To address this problem
of skewed datasets several statistics techniques & robust machine Learning techniques
have been developed by the researchers. The discussion on handling imbalanced
datasets in the healthcare domain using machine learning techniques is a primary focus
of this chapter.