Abstract
Background: Breast cancer is one of the leading causes of death among women worldwide. Tools for early detection that combine the large volume of available clinical data with machine learning and artificial intelligence models could therefore offer a cost-effective, potentially game-changing way to save women's lives. In recent years, researchers have worked hard to put such tools into practice.
Introduction: Breast cancer is a serious health issue, and early detection is crucial to improving patient outcomes. It is the second most prevalent chronic illness in women after lung cancer and can also affect men, though far less frequently.
Objective: A wide range of methods, including oversampling, feature selection, fuzzy logic, and machine learning algorithms, has been explored for the early detection of breast cancer. This study aims to predict the type of breast cancer and its risk factors.
Methods: The support vector machine-synthetic minority oversampling technique (SVM-SMOTE) was used to balance the Wisconsin Breast Cancer Original (WBCO) dataset from the UCI repository. To improve model performance and reduce the number of features, features were selected using particle swarm optimization (PSO). Thirteen models were then applied: Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Adaptive Boosting (AB), Categorical Boosting (CB), Light Gradient Boosting Machine (LGBM), Multi-Layer Perceptron (MLP), and Extra Trees (ET). The PSO-selected features were also used as fuzzy input variables, on which a fuzzy logic system was trained.
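The balancing step can be illustrated with a minimal sketch of the interpolation idea that SMOTE-type methods share. This is plain SMOTE, not SVM-SMOTE: SVM-SMOTE additionally restricts the seed points to minority samples near the decision boundary of a fitted SVM, which is omitted here. The function name smote_oversample, the toy data, and the parameter values are assumptions made for illustration and are not taken from the study.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data on the 1-10 integer scale used by the WBCO features
majority = [[1, 1, 2], [2, 1, 1], [1, 2, 3], [3, 1, 2], [2, 2, 1], [1, 1, 1]]
minority = [[8, 7, 6], [9, 8, 7], [7, 9, 8]]

# Generate just enough synthetic minority points to match the majority count
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region already occupied by the minority data rather than duplicating existing rows.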
Results: The highest accuracies obtained were 92.64%, 98.31%, and 97.19% using SVM-SMOTE, SVM-SMOTE+PSO, and PSO+fuzzy logic, respectively.
Conclusion: Using only three attributes (clump thickness, marginal adhesion, and bland chromatin) in the fuzzy logic system, we obtained satisfactory results.
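To make the idea of a three-input fuzzy system concrete, the following is a minimal sketch of how such a system might score a sample. The membership functions, the two rules, and the function names (low, high, fuzzy_risk) are hypothetical choices for illustration only; the abstract does not specify the membership shapes or rule base actually used. The only fact assumed from the dataset is that each attribute takes values from 1 to 10.

```python
def clamp(v):
    """Restrict a value to the interval [0, 1]."""
    return max(0.0, min(1.0, v))


def low(x):
    """Hypothetical 'low' membership: 1 at x=1, falling to 0 at x=6."""
    return clamp((6 - x) / 5)


def high(x):
    """Hypothetical 'high' membership: 0 at x=4, rising to 1 at x=10."""
    return clamp((x - 4) / 6)


def fuzzy_risk(clump, adhesion, chromatin):
    """Return a score in [0, 1]; 0 leans benign, 1 leans malignant.

    Hypothetical rules (AND = min, OR = max):
      R1: if all three inputs are low         -> benign
      R2: if any two of the inputs are high   -> malignant
    """
    benign = min(low(clump), low(adhesion), low(chromatin))
    malignant = max(
        min(high(clump), high(adhesion)),
        min(high(clump), high(chromatin)),
        min(high(adhesion), high(chromatin)),
    )
    total = benign + malignant
    if total == 0:
        return 0.5  # no rule fires: report an undecided score
    # Zero-order Sugeno-style defuzzification with outputs 0 and 1
    return malignant / total


score_low = fuzzy_risk(1, 1, 1)
score_high = fuzzy_risk(10, 10, 10)
score_mid = fuzzy_risk(5, 5, 5)
```

A weighted-average (Sugeno-style) output is used here only because it keeps the sketch short; a Mamdani system with a centroid defuzzifier would be an equally plausible reading of "fuzzy logic".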