Abstract
This study aims to develop a churn prediction model that can assist
telecommunication companies in predicting which customers are most likely to
churn. The model is developed by employing machine learning techniques on big data
platforms. Customer churn is one of the most critical issues, especially for
high-investment telecommunication companies. Accordingly, these companies are looking
for ways to predict which customers are likely to churn and to take the necessary
actions to reduce churn. To accomplish its objective, the study first compares eight machine
learning techniques, i.e., ridge classifier, gradient boosting, adaptive boosting, bagging
classifier, k-nearest neighbour (kNN), decision tree, logistic regression, and random
forest. Based on five performance evaluation metrics (i.e., accuracy, AUC score,
precision, recall, and F score), kNN is selected because it outperforms
the other techniques. Second, the selected technique is used to predict the likelihood of
customers churning.
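
As an illustration of the five evaluation metrics named above, the following minimal sketch computes them for a binary churn label in plain Python. It is not the study's implementation: the function names (confusion_counts, roc_auc, evaluate), the 0.5 decision threshold, the use of F1 as the F score, and the toy labels and scores are all assumptions made for this example, and the numbers are not results from the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as 'churn'."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly.

    Ties count as half a correct pair. This is the pairwise form of the
    area under the ROC curve; it is O(n^2), which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, AUC, precision, recall, and F1 for one classifier.

    `threshold` (assumed 0.5 here) turns churn scores into hard predictions.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "auc": roc_auc(y_true, scores),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data, made up for illustration only (1 = churned, 0 = stayed).
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
toy_metrics = evaluate(toy_labels, toy_scores)
```

Applying evaluate to the held-out predictions of each candidate classifier yields one row of metrics per technique, which is the kind of table on which a comparison such as the one described here can be based.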