Abstract
Among childhood cancers, acute lymphoblastic leukaemia (ALL) has been the most extensively studied, driven by the desire to improve survival rates. DNA microarray technology has expanded rapidly, providing an extensive source of data that promises to improve cancer diagnosis and prognosis and to identify key targets for drug development. DNA microarray data have been analysed using statistical methods as well as machine learning and data mining approaches. In this paper, we present a comprehensive review of the machine learning approaches that have been applied to ALL microarray data. Building on research conducted by biological and medical childhood leukaemia research groups, machine learning has been used to enhance cancer diagnosis and subtype classification, to develop novel therapeutic approaches and to stratify patients accurately by risk. These methods have been applied in four major areas of microarray data analysis: gene selection, clustering, classification and pathway analysis. Each machine learning algorithm has its own advantages and drawbacks. This paper summarizes these, together with outstanding challenges and directions for future research. This review aims to serve as a starting point for those interested in microarray analysis in general and in cancer research in particular.
Keywords: Childhood acute lymphoblastic leukaemia, machine learning, classification, clustering, gene regulatory networks