Abstract
Objective: Defects in delivered software products not only have financial implications but also affect the reputation of the organisation and lead to wastage of time and human resources. This paper aims to detect defects in software modules.
Methods: Our approach proceeds in three sequential steps. First, the SMOTE algorithm is combined with the K-means clustering algorithm to deal with the class imbalance problem. Second, a set of key features is selected based on the inter-class and intra-class coefficients of correlation. Third, ensemble modelling is used to predict defects in software modules. After careful examination, an ensemble of XGBoost, Decision Tree, and Random Forest, combined by hard voting, was chosen for defect prediction owing to the numerous merits of the ensemble approach.
Results: We used five open-source datasets from the NASA PROMISE software engineering repository. The results obtained with our approach are compared with those of the individual algorithms used in the ensemble. Confidence intervals for the performance of our approach on the evaluation metrics, namely accuracy, precision, recall, F1 score and AUC score, are also constructed at a significance level of 0.01 (that is, a 99% confidence level).
Conclusion: The results are presented graphically.
Keywords: Software defects, feature selection, class imbalance, ensemble modelling, hard voting, confidence interval
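
To make the hard-voting and confidence-interval steps concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the three base models (XGBoost, Decision Tree, Random Forest) have already produced binary predictions; the prediction lists, the function names, and the choice of a normal-approximation (Wald) interval for accuracy are all illustrative assumptions, since the abstract does not say how the intervals were constructed.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def hard_vote(model_predictions):
    """Majority (hard) vote: one label per sample, chosen by most models.

    model_predictions is a list of per-model label lists of equal length.
    """
    voted = []
    for sample_votes in zip(*model_predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def proportion_confidence_interval(p_hat, n, alpha=0.01):
    """Wald interval for a proportion (assumed method, not from the paper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot leave that range
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical predictions (1 = defective, 0 = non-defective) for 8 modules
xgb_pred = [1, 0, 1, 1, 0, 0, 1, 0]
tree_pred = [1, 1, 0, 1, 0, 0, 0, 0]
rf_pred = [0, 0, 1, 1, 0, 1, 1, 0]
y_true = [1, 0, 1, 1, 0, 0, 1, 1]

ensemble_pred = hard_vote([xgb_pred, tree_pred, rf_pred])
acc = accuracy(y_true, ensemble_pred)
low, high = proportion_confidence_interval(acc, len(y_true), alpha=0.01)
print(ensemble_pred, acc, (low, high))
```

With three voters and binary labels a tie cannot occur, so the majority label is always well defined. The Wald interval is crude for a sample this small and is used here only to show how a significance level of 0.01 translates into an interval; the same idea would apply to the other metrics listed in the Results.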