Current Medical Imaging

ISSN (Print): 1573-4056
ISSN (Online): 1875-6603

Review Article

A Review on Lung Cancer Diagnosis Using Data Mining Algorithms

Author(s): Farzad Heydari and Marjan Kuchaki Rafsanjani*

Volume 17, Issue 1, 2021

Published on: 25 June, 2020

Page: [16 - 26] Pages: 11

DOI: 10.2174/1573405616666200625153017

Abstract

Because lung cancer has serious consequences, medical associations use computer-aided diagnostic procedures to diagnose the disease more accurately. Although lung cancer damages the body, early diagnosis can extend patients' survival. Data mining techniques are practical for diagnosing lung cancer in its early stages. This paper surveys a number of leading data mining-based approaches to cancer diagnosis. It also compares these approaches in terms of selection criteria and presents the advantages and disadvantages of each method.

Keywords: Lung cancer, machine learning, data mining algorithms, detection accuracy, diagnosis, MRI.

Erratum In:
A Review on Lung Cancer Diagnosis Using Data Mining Algorithms


© 2024 Bentham Science Publishers