Abstract
The decision tree is one of the most elegant and versatile techniques in machine
learning. It is a simple tool that predicts and explains the relationship between an
object and its target value by recursively partitioning the input space. Tree
ensembles such as random forests and gradient-boosted trees significantly improve the
predictive power of supervised models built from weak tree predictors. In a random
forest, the generalization error of the model depends on the correlation between the
trees and on the strength of the individual trees. The
random selection of features at each node split is at the core of the random forest and
makes it as effective as other complex machine learning techniques at a lower
computational cost. This is appealing for the analysis of large data matrices, such as
those generated by infrared spectroscopy, because most analysts do not have access to
computers with the processing capacity that those complex models require.
In addition, techniques based on the decision tree are more robust to noise, which is
advantageous for the analysis of trace-level contaminants. In this chapter, we present the
techniques based on decision trees and apply them to classification, regression, and
feature selection problems in spectra obtained experimentally and from public
repositories. We also compare the performance of techniques based on the decision tree
with that of other chemometric tools.
Keywords: Analytical screening, ATR, CART, Chemometrics, Data mining, Discriminant analysis, Decision trees, Feature selection, FTIR, Gradient boosting machines, Machine learning, Non-parametric models, Non-linearity, Neural networks, NIR, Predictive models, Supervised learning, Random forest, Regression, Validation.
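
The two random forest ideas mentioned in the abstract, the random choice of candidate features at each node split and the dependence of the generalization error on tree correlation and tree strength, can be illustrated with a minimal sketch. It assumes Breiman's (2001) upper bound on the generalization error, mean correlation × (1 − s²) / s², where s is the strength of the individual trees. The function names, the 1800-channel spectrum, and the correlation and strength values are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
import math
import random


def breiman_error_bound(mean_correlation, strength):
    """Upper bound on forest generalization error: rho * (1 - s^2) / s^2.

    Assumes Breiman's (2001) bound; strength s must be positive.
    """
    if strength <= 0:
        raise ValueError("strength must be positive")
    return mean_correlation * (1.0 - strength ** 2) / strength ** 2


def candidate_features(n_features, rng, mtry=None):
    """Draw the random subset of feature indices considered at one node split."""
    if mtry is None:
        # Square root of the feature count is a common default for classification.
        mtry = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), mtry))


rng = random.Random(0)

# Hypothetical spectrum with 1800 wavenumber channels: only a small random
# subset of channels is evaluated at each split, which keeps the cost low.
subset = candidate_features(1800, rng)

# Same tree strength, different correlation between trees (illustrative values).
bound_low_corr = breiman_error_bound(0.2, 0.6)
bound_high_corr = breiman_error_bound(0.6, 0.6)

print(len(subset), "of 1800 features considered at this split")
print("bound with low correlation: ", round(bound_low_corr, 3))
print("bound with high correlation:", round(bound_high_corr, 3))
```

With the strength held fixed, the bound grows with the correlation between trees, which is why decorrelating the trees through random feature selection helps the ensemble.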