Abstract
Background: The hybridization stability of single- and double-stranded DNA sequences has been studied extensively, and it has a significant impact on bio-computing, bio-sensing and bio-quantification technologies such as microarrays, real-time PCR and DNA sequencing. In many bioinformatics applications, DNA duplex hybridization is traditionally estimated using GC-content and melting temperature calculations based on the sequence base composition.
Objective: In this study we explore the equivalence of the two approaches when estimating DNA sequence hybridization, and we show that GC-content is far from a perfect predictor of DNA strand hybridization strength compared to experimentally determined melting temperatures.
Method: To test the assumption that the GC-content of a DNA strand is a good indicator of its melting temperature, we formulate a research hypothesis and apply the Pearson product-moment correlation statistical model to measure the strength of the linear association between GC-content and melting temperature.
Results: We built a manually curated set of 373 experimental data points collected from 21 publications, each point representing a DNA strand with a length between 4 and 35 nucleotides and its corresponding experimentally determined melting temperature, measured under specific sequence and salt concentrations. For each data point we calculated the corresponding GC-content, and we separated the set into 12 subsets to minimize the variability of experimental conditions.
Conclusion: Based on the calculated Pearson product-moment correlation coefficients, we conclude that GC-content only seldom correlates well with experimentally determined melting temperatures, and thus it is not a strictly necessary constraint when used to control the uniformity of DNA strands.
Keywords: DNA sequence, GC-content, hybridization, melting temperature, oligonucleotides, Pearson correlation.
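The calculation described in the Method can be illustrated with a minimal sketch: compute the GC-content of each strand, then the Pearson product-moment correlation coefficient between GC-content and melting temperature within one subset. The function names `gc_content` and `pearson_r`, and the `toy_subset` values, are assumptions made for illustration; the toy values are invented placeholders and are not taken from the 373 curated data points.

```python
from math import sqrt


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def pearson_r(xs, ys) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical (sequence, Tm in degrees C) pairs, invented for illustration
# only; they are NOT experimental values from the study.
toy_subset = [
    ("ATATATAT", 20.0),
    ("ATGCATAT", 30.0),
    ("GCGCATAT", 40.0),
    ("GCGCGCAT", 50.0),
]

gc_values = [gc_content(seq) for seq, _ in toy_subset]
tm_values = [tm for _, tm in toy_subset]
r = pearson_r(gc_values, tm_values)
print(f"Pearson r between GC-content and Tm: {r:.3f}")
```

The toy subset is deliberately constructed to be perfectly linear, so it only demonstrates the mechanics. In a real analysis, each of the 12 subsets would be passed through the same two steps separately, so that strands measured under different experimental conditions are not pooled.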
Graphical Abstract