Abstract
Background: Patents reveal that the big data produced by security information make risk-level evaluation in risk assessment very difficult, because collecting, storing, and managing that information incurs high computational and communication overheads. In today's big data era, information security risk assessment faces vast amounts of security information and therefore requires special technologies that can process large volumes of data efficiently. Traditional machine learning methods have weaknesses that need to be overcome. We review well-known machine learning methods such as SVM, decision trees (DT), and rough sets, and identify the following weaknesses: 1) their parameters need to be optimized, 2) they incur high computational costs, and 3) they suffer from excessive objectivity.
Methods: This study proposes an evolutionary algorithm, the Clustering and Support Subset (CSS) algorithm, to address these problems. CSS (1) discretizes the evaluated risk level and (2) extracts rules from the sample set to predict the risk level of the testing set.
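The abstract does not detail how CSS discretizes the risk level beyond a k-means step whose k is supplied by the user. The sketch below illustrates only that discretization idea under stated assumptions: the evaluated risk level is assumed to be a scalar score, clustered with plain one-dimensional k-means (Lloyd's algorithm) using evenly spaced initial centroids, and clusters are then ranked so that level 1 is the lowest-risk cluster. The function names `kmeans_1d` and `discretize_risk` are hypothetical, and the support-subset rule extraction step is not modelled here.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: cluster scalar risk scores into k groups (Lloyd's algorithm)."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    lo, hi = min(values), max(values)
    # Assumed deterministic initialisation: k centroids evenly spaced over [lo, hi].
    if k == 1:
        centroids = [(lo + hi) / 2]
    else:
        centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each score joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels


def discretize_risk(values: List[float], k: int) -> List[int]:
    """Hypothetical helper: map each risk score to a discrete level 1..k (1 = lowest risk)."""
    centroids, labels = kmeans_1d(values, k)
    # Rank clusters by centroid so level numbers follow increasing risk.
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {cluster: level + 1 for level, cluster in enumerate(order)}
    return [rank[lab] for lab in labels]


if __name__ == "__main__":
    scores = [0.1, 0.15, 0.2, 0.5, 0.55, 0.9, 0.95]
    print(discretize_risk(scores, k=3))  # three risk levels for seven illustrative scores
```

Sorting clusters by centroid is a design choice made here so that discrete labels carry an ordinal meaning; raw k-means cluster indices have no inherent order.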
Results: A number of experiments were performed on datasets from the UCI Machine Learning Repository to compare the classification performance of CSS with that of traditional classification algorithms. The experimental results indicate that CSS predicts the risk level efficiently and with a higher accuracy rate, and the experiments also demonstrate the application of CSS to information security risk assessment.
Conclusion: The CSS algorithm is an alternative machine learning method with three advantages: 1) fewer parameters: there is only one parameter, k, in the k-means step, and it is given by the user; 2) greater efficiency: its time complexity can be calculated and shows that CSS has a lower computational cost; 3) higher accuracy: the experimental results indicate that CSS has stronger classification ability. Therefore, both its efficiency and its predictive performance are better than those of ANN, SVM, and C4.5. Consequently, CSS can be used as an alternative algorithm for information security risk assessment.
Keywords: Support subset, clustering, information security, risk assessment, classification, CSS, machine learning.