Abstract
Background: Feature selection (FS) is critical for high-dimensional data analysis. Ensemble-based feature selection (EFS) is a commonly used approach for developing FS techniques. Rank aggregation (RA) is an essential step in EFS, in which results from multiple models are pooled to estimate feature importance. However, the literature relies primarily on static, rule-based methods for this step, which may not always yield an optimal feature set. The objective of this study is to improve EFS performance by introducing dynamic learning into the RA step.
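To make the RA step concrete, here is a minimal sketch of one common static, rule-based aggregation rule, mean-rank aggregation: each model's importance scores are converted to ranks, the ranks are averaged across models, and features are ordered by their average rank. This is an illustrative assumption of what a "static rule" looks like, not the SRA method proposed in this study; the function names and toy scores are hypothetical.

```python
def ranks_from_scores(scores):
    """Convert one model's importance scores to ranks (1 = most important).

    sorted() is stable, so tied scores keep their original feature order.
    """
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def mean_rank_aggregation(score_matrix, feature_names):
    """Static rule: average each feature's rank over all models, lowest first."""
    per_model = [ranks_from_scores(row) for row in score_matrix]
    n_models = len(per_model)
    averaged = [
        (name, sum(ranks[j] for ranks in per_model) / n_models)
        for j, name in enumerate(feature_names)
    ]
    # A lower average rank means the feature is consistently more important.
    return sorted(averaged, key=lambda item: item[1])


def select_top_k(aggregated, k):
    """Keep the k features with the best (lowest) average rank."""
    return [name for name, _ in aggregated[:k]]


# Hypothetical importance scores: rows = models, columns = features.
scores = [
    [0.9, 0.1, 0.5, 0.3],   # model A
    [0.7, 0.2, 0.8, 0.1],   # model B
    [0.6, 0.05, 0.4, 0.5],  # model C
]
features = ["x1", "x2", "x3", "x4"]

aggregated = mean_rank_aggregation(scores, features)
print(aggregated)
print(select_top_k(aggregated, 2))  # ['x1', 'x3']
```

The averaging rule is fixed in advance and treats every model equally, regardless of how well each model's ranking reflects the outcome. A supervised approach such as SRA instead learns the aggregation rules from data; this sketch only illustrates the static baseline that the abstract contrasts against.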
Method: This study proposes a novel Supervised Rank Aggregation (SRA) approach that allows the RA step to dynamically learn and adapt the model aggregation rules used to obtain feature importance.
Results: We evaluate the algorithm's performance in simulation studies, apply it to real research studies, and compare it with various existing RA methods. The proposed SRA method performs better than or on par with existing methods in terms of both feature selection and the predictive performance of the resulting model.
Conclusion: The SRA method provides an alternative to existing RA approaches for EFS. Although the current study is limited to continuous cross-sectional outcomes, the approach could also be applied to other endpoints, such as longitudinal, categorical, and time-to-event data.