Abstract
With the avalanche of protein sequences generated in the post-genomic age, many typical topics in bioinformatics, proteomics and systems biology involve identifying various attributes of uncharacterized proteins or depend on this kind of knowledge. Unfortunately, acquiring the desired information purely through biochemical experiments is both time-consuming and costly. Therefore, it is highly desirable to develop automated methods for quickly and accurately identifying various attributes of proteins based on their sequence information alone. This is where bioinformatics and artificial intelligence (AI) techniques converge. To establish powerful computational methods in this regard, one of the key procedures is to find an effective mathematical expression for the protein samples that can truly reflect their intrinsic correlation with the target to be predicted. To realize this, the pseudo amino acid (PseAA) composition, or PseAAC, was proposed. Stimulated by the concept of PseAAC, a series of different PseAAC modes were developed to deal with proteins or protein-related systems. This review focuses mainly on those PseAAC modes that were formulated via cellular automata. By using some optimal space-time evolution rules of cellular automata, a protein sequence can be represented by a unique image, the so-called cellular automata (CA) image or CAI. Many important features, which are deeply hidden in long and complicated amino acid sequences, can be clearly revealed through their CAIs. It is anticipated that, owing to its power, intuitiveness and relative simplicity, the CAI approach holds great potential in bioinformatics and other related areas.
Keywords: Pseudo amino acid composition, cellular automata, sequence-derived features, protein sequence image expression, protein attributes, fast Fourier transform, G-protein-coupled receptors, complexity factor, amino acids