Abstract
Knowing where a protein occurs in the cell is an important step toward understanding its structure and function. A computational method that accurately predicts the subcellular location of proteins would therefore be valuable for interpreting the raw data produced by large-scale genome sequencing projects. Prediction of protein subcellular localization is now a hot topic in the bioinformatics community and has been studied extensively over the past several years. Many computational methods have been proposed, but the problem is still far from solved. Apart from the prediction algorithm itself, the main factor influencing the performance of these methods is the technique used to extract features that characterize proteins, i.e. the protein encoding scheme. To improve on existing methods, many different approaches have been taken towards developing efficient and accurate predictors of protein subcellular localization, ranging from sorting-signal-based systems to machine learning, as well as a variety of alignment-free techniques based on the physicochemical properties of amino acid sequences. This review describes the inherent difficulties in developing a protein subcellular localization method and surveys the feature extraction techniques previously employed in this area. It is intended to serve as a guide for readers working in this field.
Keywords: Subcellular localization, protein function, feature extraction, protein encoding, computational biology, bioinformatics