Abstract
Background: Open and accessible regions of the chromosome are more likely to be bound by transcription factors, which are important for nuclear processes and biological functions. Studying changes in chromosome flexibility can help to discover and analyze disease markers and improve the efficiency of clinical diagnosis. Existing methods for characterizing chromosome flexibility from Hi-C data include the Flexibility-Rigidity Index (FRI) and the Gaussian Network Model (GNM). However, these methods require chromosome structure data obtained from 3D biological experiments, which are time-consuming and expensive.
Objective: The folding and coiling of the DNA double helix strongly affect chromosome flexibility and function. Motivated by the success of genomic sequence analysis in biomolecular function analysis, we aim to develop a method that predicts chromosome flexibility from genomic sequence data alone.
Methods: We propose a new method, named "DeepCFP", that uses deep learning models to predict chromosome flexibility from genomic sequence features alone. The model was tested on the GM12878 cell line.
Results: Our model reached a maximum accuracy of 91%. The performance of DeepCFP is close to that of FRI and GNM.
Conclusion: DeepCFP achieves high performance using genomic sequence data alone.
Keywords: Deep learning, chromosome flexibility, genomic sequence, neural network, genomic sequence analysis, attention algorithm.
Graphical Abstract