Compressed Learning and Its Applications to Subcellular Localization

Zhong-Long Zheng; Li Guo; Jiong Jia; Chen-Mao Xie; Wen-Cai Zeng; Jie Yang

doi:10.2174/092986611796011464

Abstract

One of the main challenges in biological applications is to predict protein subcellular localization automatically and accurately. To this end, a wide variety of machine learning methods have been proposed in recent years. Most of them focus on finding the optimal classification scheme, and few take into account reducing the complexity of biological systems. Traditionally, such bio-data are analyzed by first performing feature selection before classification. Motivated by Compressed Sensing (CS) theory, we propose a methodology that performs compressed learning with a sparseness criterion, such that feature selection and dimension reduction are merged into one analysis. The proposed methodology decreases the complexity of the biological system while increasing the accuracy of protein subcellular localization. Experimental results are encouraging, indicating that these sparse methods are promising for complicated biological problems, such as predicting the subcellular localization of Gram-negative bacterial proteins.

Keywords: Subcellular localization, dimensionality reduction, compressive sensing, subspace learning, Gram-negative proteins, multiple-location proteins, position-specific score matrix, pseudo amino acid, Compressed Learning, dipeptide components, K-Nearest Neighbor, fuzzy K-NN, complexity measure factor, Principal Component Analysis, low-dimensional vectors, Benchmark dataset, Periplasm, Unified Compressed Learning, encoding scheme