Abstract
Background: CpG island (CGI) detection and methylation prediction play important roles in studying the complex mechanisms by which CGIs participate in genome regulation. In recent years, machine learning (ML) has increasingly been applied to CGI detection and CGI methylation prediction to improve on the accuracy of traditional methods. However, there are few systematic reviews of the application of ML to these two tasks. Therefore, this systematic review aims to provide an overview of the application of ML in CGI detection and methylation prediction.
Methods: The review followed the PRISMA guidelines. The search strategy was applied to articles indexed in PubMed and published from 2000 to July 10, 2022. Two independent researchers screened the articles according to the retrieval strategy and identified a total of 54 articles. We then developed quality assessment questions to assess study quality and retained 46 articles that met the eligibility criteria. Based on these articles, we first summarized the applications of ML methods in CGI detection and methylation prediction, and then identified the strengths and limitations of these studies.
Results: We discuss the challenges and future research directions.
Conclusion: This systematic review will contribute to the selection of algorithms and the future development of more efficient algorithms for CGI detection and methylation prediction.