Potential Use of Tree-based Tools for Chemometric Analysis of Infrared Spectra

Pp: 46-67 (22)

Abstract

The decision tree is one of the most elegant and versatile machine learning techniques. It is a simple tool for predicting and explaining the relationship between an object's measured features and its target value by recursively partitioning the input space. Tree ensembles such as random forests and gradient boosted trees use trees as weak predictors and significantly improve on the predictive power of a single tree. In a random forest, the generalization error depends on the strength of the individual trees and on the correlation between them. Random selection of features at each node split is at the core of the random forest: it makes the method as effective as more complex machine learning techniques at a lower computational cost. This is appealing for the analysis of large data matrices, such as those generated by infrared spectroscopy, because most analysts do not have access to computers with the processing capacity that those complex models require. Techniques based on decision trees are also more robust to noise, which is an advantage in the analysis of trace-level contaminants. In this chapter, we present techniques based on decision trees and apply them to classification, regression, and feature selection problems, using spectra obtained experimentally and spectra from public repositories. We also compare the performance of tree-based techniques with that of other chemometric tools.
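
The two ideas named in the abstract, recursive partitioning and random feature selection at each node split, can be illustrated with a minimal sketch. This is not the chapter's implementation: it uses only the Python standard library, every function name is hypothetical, and the "spectra" are synthetic toy data in which class 1 carries an assumed absorption band on channels 8 to 10.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(X, y, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_features, rng, depth=0, max_depth=6):
    """Recursively partition the input space; leaves hold the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    # Random feature selection at this node: the core of the random forest.
    feature_ids = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li],
                       max_features, rng, depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri],
                       max_features, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Follow the splits until a leaf (a bare class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, seed=0):
    """Grow each tree on a bootstrap sample of the training spectra."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    max_features = max(1, int(p ** 0.5))  # common default for classification
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_features, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def make_spectra(n_per_class, n_channels=20, seed=1):
    """Synthetic toy spectra (an assumption, not real infrared data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 0.1) for _ in range(n_channels)]
            if label == 1:
                for ch in (8, 9, 10):  # assumed absorption band
                    row[ch] += 1.0
            X.append(row)
            y.append(label)
    return X, y

X_train, y_train = make_spectra(20, seed=1)
X_test, y_test = make_spectra(20, seed=2)
forest = fit_forest(X_train, y_train)
hits = sum(predict_forest(forest, row) == label
           for row, label in zip(X_test, y_test))
accuracy = hits / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because each node sees only a random subset of channels, individual trees differ from one another, which lowers the correlation between them. In practice, an established library implementation would normally be preferred over a hand-written one like this.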

Keywords: Analytical screening, ATR, CART, Chemometrics, Data mining, Discriminant analysis, Decision trees, Feature selection, FTIR, Gradient boosting machines, Machine learning, Non-parametric models, Non-Linearity, Neural Networks, NIR, Predictive models, Supervised learning, Random forest, Regression, Validation.