Systematic Review of Machine Learning-Based Open-Source Software
Maintenance Effort Estimation

Chaymae Miloudi; Laila Cheikhi; Alain Abran
doi:10.2174/2666255816666220609110712
Abstract

Background: Software maintenance is known as a laborious activity in the software lifecycle and is often considered more expensive than other activities. Open-Source Software (OSS) has recently gained considerable acceptance in industry, and the Maintenance Effort Estimation (MEE) of such software has emerged as an important research topic. In this context, researchers have conducted a number of Open-source software Maintenance Effort Estimation (O-MEE) studies based on statistical as well as Machine Learning (ML) techniques for better estimation.
Objective: The objective of this study is to perform a Systematic Literature Review (SLR) to analyze and summarize the empirical evidence on O-MEE ML techniques in current research through a set of five Research Questions (RQs) related to several criteria (e.g., data pre-processing tasks, data mining tasks, parameter tuning methods, accuracy criteria and statistical tests, as well as the ML techniques reported in the literature to outperform others).
Methods: We performed a systematic literature review of 36 primary empirical studies published from 2000 to June 2020, selected based on an automated search of six digital databases.
Results: The findings show that Bayesian networks, decision trees, support vector machines, and instance-based reasoning were the most used ML techniques; few studies opted for ensemble or hybrid techniques. Researchers have paid comparatively little attention to O-MEE data pre-processing in terms of feature selection, methods for handling missing values and imbalanced datasets, and parameter tuning of ML techniques. Classification is the most frequently addressed data mining task, evaluated with accuracy criteria such as Precision, Recall, and Accuracy, and with the Wilcoxon and Mann-Whitney statistical tests.
Conclusion: This SLR identifies a number of gaps in the current research and suggests areas for further investigation. For instance, because OSS data come from sources in different formats, researchers should pay more attention to data pre-processing, and they should develop new models using ensemble techniques, which have been shown to perform better.
Keywords: Machine learning techniques, maintenance effort estimation, open-source software, empirical studies, data preprocessing, ensemble techniques.

Journal: Recent Advances in Computer Science and Communications
Publisher: Bentham Science Publisher
Print ISSN: 2666-2558
Online ISSN: 2666-2566
