Handbook of Artificial Intelligence

Machine Learning Algorithms for Health Care Data Analytics Handling Imbalanced Datasets

Author(s): T. Sajana* and K.V.S.N. Rama Rao

Pp: 75-96 (22)

DOI: 10.2174/9789815124514123010006


Abstract

In machine learning, classification is a supervised learning technique that predicts the class of samples based on labeled data. Classification techniques have been applied to various domains, such as intrusion detection and credit card fraud detection. However, in all these domains, classification techniques have been applied to balanced datasets, that is, datasets containing equal proportions of examples in each class. In real-world settings, obtaining balanced datasets is difficult because most datasets tend to be imbalanced. Developing a model that classifies imbalanced datasets is a challenge, particularly in the medical domain, where identifying a disease-affected patient accurately and in time is critical because any misclassification can have severe consequences. The imbalanced nature of most real-world datasets therefore presents a challenge for most conventional machine learning algorithms. In recent years, researchers have reported that models built with conventional machine learning algorithms (linear and nonlinear) perform unsatisfactorily when classifying imbalanced datasets. To address this problem of skewed datasets, researchers have developed several statistical techniques and robust machine learning techniques. The primary focus of this chapter is the handling of imbalanced datasets in the healthcare domain using machine learning techniques.
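To make the imbalance problem concrete, here is a minimal Python sketch on hypothetical toy data (not taken from the chapter). It shows why plain accuracy can look good on a skewed medical dataset while the minority (diseased) class is never detected, and it then applies simple random oversampling, one common resampling technique chosen here purely as an illustration; it is not necessarily one of the methods the chapter discusses. The helper names `imbalance_ratio` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Plain random oversampling (illustrative choice, not the chapter's method):
    duplicate randomly chosen rows of each smaller class until every class
    matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: 95 "healthy" (0) and 5 "diseased" (1) patient records.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

# A trivial classifier that always predicts "healthy" scores high accuracy
# but identifies none of the diseased patients (minority-class recall is 0).
always_healthy = [0] * len(y)
accuracy = sum(p == t for p, t in zip(always_healthy, y)) / len(y)
minority_recall = sum(p == t == 1 for p, t in zip(always_healthy, y)) / y.count(1)
print(imbalance_ratio(y), accuracy, minority_recall)  # 19.0 0.95 0.0

# After oversampling, both classes contain 95 records.
X_bal, y_bal = random_oversample(X, y)
print(imbalance_ratio(y_bal), len(y_bal))  # 1.0 190
```

Random oversampling only duplicates existing minority records, so it can encourage overfitting; in practice it is applied to the training split only, never to the data used for evaluation.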

© 2024 Bentham Science Publishers