Abstract
We evaluated the occurrence frequency of i-peptides in the protein sequences of two reference datasets, containing structured and disordered protein domains respectively. Moreover, we identified the most frequent i-peptides (i = 2, 3, 4) in these sequences in order to select specific i-peptides for each structural class. Based on these specific i-peptides, we developed a new binary classification method that predicts whether a given protein sequence should be classified as “disordered” or “structured”. The best results were obtained with tri-peptides, which capture structural information from sequences much more effectively than di-peptides.
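The abstract does not state the exact selection rule or scoring function, so the following Python is only a minimal sketch of the general idea under explicit assumptions. It assumes that an i-peptide is any contiguous substring of length i, counted over overlapping windows. It also assumes that the "specific" i-peptides of a class are those among its `top_n` most frequent i-peptides that are not among the other class's `top_n`. Finally, it assumes that a sequence is labeled by whichever class-specific set it hits more often. The function names, the `top_n` parameter and the toy sequences are all hypothetical.

```python
from collections import Counter


def ipeptide_counts(sequences, i):
    """Count overlapping i-peptides (substrings of length i) across all sequences."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - i + 1):
            counts[seq[start:start + i]] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw i-peptide counts into occurrence frequencies summing to 1."""
    total = sum(counts.values())
    return {pep: n / total for pep, n in counts.items()} if total else {}


def specific_ipeptides(seqs_a, seqs_b, i, top_n):
    """Hypothetical selection rule: keep the top_n most frequent i-peptides of
    each class that do not also appear in the other class's top_n."""
    top_a = {pep for pep, _ in ipeptide_counts(seqs_a, i).most_common(top_n)}
    top_b = {pep for pep, _ in ipeptide_counts(seqs_b, i).most_common(top_n)}
    return top_a - top_b, top_b - top_a


def classify(seq, disordered_set, structured_set, i):
    """Hypothetical scoring: label by the class whose specific i-peptides occur
    more often in seq; ties default to "structured"."""
    hits_disordered = hits_structured = 0
    for start in range(len(seq) - i + 1):
        pep = seq[start:start + i]
        if pep in disordered_set:
            hits_disordered += 1
        elif pep in structured_set:
            hits_structured += 1
    return "disordered" if hits_disordered > hits_structured else "structured"


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from any reference dataset.
    disordered = ["PEPEPESKSK", "SKSKPEPE"]
    structured = ["LVIVLVAIVL", "IVLLVAVL"]
    dis_set, str_set = specific_ipeptides(disordered, structured, i=2, top_n=3)
    print(classify("PEPESK", dis_set, str_set, i=2))  # -> disordered
    print(classify("LVLVIV", dis_set, str_set, i=2))  # -> structured
```

Setting `i=3` gives the tri-peptide variant with no other changes. The tie rule and the top-N intersection test are arbitrary choices made to keep the sketch short. A real classifier would need the selection criterion and score actually used in the study.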
Keywords: Binary classification method, disordered proteins, disorder prediction, peptides occurrence frequency, protein folding, protein unfolding