Abstract
Background: Fault diagnosis based on the analysis of industrial equipment data supports system maintenance and helps reduce economic losses.
Objective: This study aimed to reduce the influence of irrelevant features and to train the FIR-XgBoost model efficiently.
Methods: This article proposes an Extreme Gradient Boosting (XgBoost) approach based on feature importance ranking (FIR) for fault classification in complex, high-dimensional industrial systems. The Gini index is used to rank the features by importance, and features are then selected according to their position in the ranking.
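The ranking-then-selection step can be illustrated with a small, dependency-free sketch. It assumes the usual Gini impurity G(S) = 1 - Σ_c p_c², where p_c is the fraction of samples in S with class c, and scores each feature by the largest impurity decrease achievable with a single threshold split. This single-split score is a simplification chosen for illustration: the abstract does not specify how the Gini importances are computed, and in a tree ensemble they would normally be accumulated over all splits. The function names, the toy data, and k = 2 are hypothetical (the reported experiments keep k = 50 features); training the XgBoost classifier on the reduced matrix is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity G(S) = 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest Gini decrease from one threshold split on a single feature.

    Simplification (assumption): a single-split score, not the impurity
    decrease accumulated over every split of a boosted tree ensemble.
    """
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        # Only consider thresholds that fall between distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_features(rows, labels):
    """Return (ranking, scores): feature indices from most to least important,
    and the per-feature Gini-decrease scores."""
    n_features = len(rows[0])
    scores = [best_split_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


def select_top_k(rows, ranking, k):
    """Keep only the k highest-ranked columns, in ranking order."""
    keep = ranking[:k]
    return [[r[j] for j in keep] for r in rows]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is constant.
rows = [
    [0.1, 5.0, 3.0],
    [0.2, 5.0, 1.0],
    [0.9, 5.0, 2.0],
    [1.0, 5.0, 4.0],
]
labels = [0, 0, 1, 1]

ranking, scores = rank_features(rows, labels)
reduced = select_top_k(rows, ranking, k=2)
print(ranking, reduced)
```

On the toy data, feature 0 splits the two classes perfectly (score 0.5), feature 2 only partially (score 1/6), and the constant feature 1 scores 0, so it is the one discarded when k = 2.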
Results: The dataset from the PHM 2021 data challenge, which concerns a fuse thermal imaging process, is used. FIR-XgBoost achieved a classification accuracy of 99.63%, outperforming the other algorithms compared. A case study shows that combining ensemble learning with feature selection yields excellent fault classification performance.
Conclusion: Data-driven machine learning methods are proposed for solving high-dimensional fault classification problems on the PHM 2021 Data Challenge dataset. The proposed FIR-XgBoost method centers on retaining the most important features and reducing redundancy in the sensor data. Because features are retained according to an explicit importance ranking, FIR-based feature selection offers better interpretability than the other algorithms considered. Furthermore, FIR-XgBoost retaining the 50 most important features achieved the best fault classification performance among the compared algorithms and can be implemented in specific industrial processes.
Keywords: Fault classification, ensemble learning, feature importance ranking, PHM 2021 data challenge, XgBoost, algorithm.