Abstract
OPLS discriminant analysis (OPLS-DA) was successfully applied to select a limited number of gene transcripts needed to discriminate PTPN11- and RAS-mutated cells in acute lymphoblastic leukaemia (ALL) patients. The initial set of 273 variables with VIP (1) values higher than 2.0 in the OPLS-DA model was further reduced to 200 by eliminating the less informative variables in the PCA class models adopted for SIMCA classification. These 200 transcripts not only discriminate PTPN11- from RAS-mutated cells with satisfactory accuracy but also indicate clearly that wild-type samples belong to neither of the mutated class models. Within this list it was possible to identify candidate genes that could be involved in the molecular mechanisms distinguishing PTPN11 and RAS mutations in ALL. Among them is CBFA2T2, a member of the “ETO” family, known for its homology to and association with the product of the RUNX1-CBFA2T1 gene fusion generated by the t(8;21) translocation, a frequent cause of acute myeloid leukaemia.
Keywords: ALL, OPLS, leukaemia, gene transcripts, mutations, SIMCA classification, Partial Least Squares, PLS, PLS-DA, SIMCA, PDQuest, OPLS-DA, acute lymphoblastic leukaemia, PTPN11, RAS, PCA, DModX, DCrit, 17AAG, EPH136, CBFA2T2, MTGR1, CBFA2T1, MTG8, ETO, N-CoR, SMRT, mSin3A, HDAC, RUNX1-CBFA2T1
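The SIMCA step summarised in the abstract, in which a separate PCA model is built for each mutated class and a sample is accepted only by models it lies close to, can be illustrated with a minimal sketch. It is not the software or the exact statistics used in the study. The data are simulated, the number of components, the residual normalisation, and the fixed cutoff `d_crit` (standing in for DCrit) are assumptions, and the function names are hypothetical.

```python
import numpy as np

def fit_class_model(X, n_components=2):
    """Fit a PCA class model to one class; X has shape (samples, variables)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes of the centred class data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings (variables, components)
    resid = Xc - Xc @ P @ P.T                     # part not explained by the model
    n, k = X.shape
    dof = (n - n_components - 1) * (k - n_components)
    s0 = np.sqrt((resid ** 2).sum() / dof)        # pooled residual SD of the class
    return mean, P, s0

def distance_to_model(model, X):
    """Residual SD of each row of X, divided by the class residual SD."""
    mean, P, s0 = model
    Xc = X - mean
    resid = Xc - Xc @ P @ P.T
    k, a = P.shape
    return np.sqrt((resid ** 2).sum(axis=1) / (k - a)) / s0

def assign(models, X, d_crit=2.0):
    """List, per sample, the classes whose model accepts it (possibly none)."""
    out = []
    for row in np.atleast_2d(X):
        out.append([name for name, m in models.items()
                    if distance_to_model(m, row[None, :])[0] <= d_crit])
    return out

# Simulated data only: three groups with different centres and latent structure.
rng = np.random.default_rng(0)
n_vars = 20

def make_class(centre):
    W = rng.normal(size=(2, n_vars))              # group-specific latent directions
    def draw(n):
        scores = rng.normal(size=(n, 2))
        return centre + scores @ W + 0.1 * rng.normal(size=(n, n_vars))
    return draw

draw_ptpn11, draw_ras, draw_wt = make_class(0.0), make_class(5.0), make_class(-5.0)

# Only the two mutated classes get a model; wild type is used for testing only.
models = {"PTPN11": fit_class_model(draw_ptpn11(30)),
          "RAS": fit_class_model(draw_ras(30))}

print(assign(models, draw_ptpn11(3)))
print(assign(models, draw_ras(3)))
print(assign(models, draw_wt(3)))
```

Because each class is modelled independently, a sample can be accepted by one model, by several, or by none. The last outcome is the one the abstract reports for wild-type samples.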