Abstract
Minimum-margin nearest neighbor instance selection is a data reduction technique for feed-forward neural networks that can improve scalability and make incremental learning computationally feasible. The technique uses the Euclidean distance function to produce a reduced training set that preserves the distribution model of the original training set and yields classification accuracy comparable to that of the network trained on the original training set. Nevertheless, the technique does not account for differences in attribute ranges, so attributes with large ranges can dominate the Euclidean distance calculation. This paper studies the integration of six normalization techniques: Min-Max, Z-score, Mean MAD, Median MAD, Hyperbolic Tangent Estimator, and Modified Hyperbolic Tangent, with the minimum-margin nearest neighbor instance selection algorithms to improve the data reduction performance, the execution time reduction performance of the data reduction techniques, and the classification performance of the feed-forward neural network. Experimental results on real-world datasets from the UCI Machine Learning Repository and the ELENA project confirmed that the Min-Max, Z-score, Mean MAD, and Median MAD normalization techniques improved the data reduction performance more effectively than the Modified Hyperbolic Tangent and Hyperbolic Tangent Estimator techniques. Median MAD normalization improved the execution time reduction performance of the data reduction techniques most effectively. In addition, most normalization techniques improved the accuracy, error rate, precision, recall, and F1-score of the feed-forward neural network when the final training sets included both the normalized original training sets and the reduced normalized training sets.
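For reference, the standard formulations of five of these normalization techniques are sketched below; the exact variants used in this study (in particular the Modified Hyperbolic Tangent) may differ from these common forms. Here $x$ denotes an attribute value, $\mu$ and $\sigma$ the attribute's mean and standard deviation, $\tilde{x}$ its median, and the denominators of the MAD variants the mean and median absolute deviations, respectively.

\begin{align*}
\text{Min-Max:} \quad & x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \\
\text{Z-score:} \quad & x' = \frac{x - \mu}{\sigma} \\
\text{Mean MAD:} \quad & x' = \frac{x - \mu}{\tfrac{1}{n}\sum_{i=1}^{n} \lvert x_i - \mu \rvert} \\
\text{Median MAD:} \quad & x' = \frac{x - \tilde{x}}{\operatorname{median}_i \lvert x_i - \tilde{x} \rvert} \\
\text{Tanh estimator:} \quad & x' = \frac{1}{2}\left[\tanh\!\left(0.01\,\frac{x - \mu}{\sigma}\right) + 1\right]
\end{align*}

Applying any of these transformations per attribute before the Euclidean distance calculation places all attributes on a comparable scale, which is what prevents large-range attributes from dominating the instance selection step.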
Keywords: Data mining, data normalization, data reduction, instance selection, nearest neighbor, neural network, parallel algorithm.