non-covalent hydrogen bonds betwixt base pairs of the DNA-Double-Helix visualized through an electron microscope
Hydrogen bonding and stability
Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content. But, contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly; stabilization is mainly due to stacking interactions.
The larger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of single-ringed chemical structures called pyrimidines. Purines are complementary only with pyrimidines: pyrimidine-pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine-purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. Purine-pyrimidine base pairing of AT or GC or UA (in RNA) results in proper duplex structure. The only other purine-pyrimidine pairings would be AC and GT and UG (in RNA); these pairings are mismatches because the patterns of hydrogen donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA (see wobble base pair).
Paired DNA and RNA molecules are comparatively stable at room temperature, but the two nucleotide strands will separate above a melting point that is determined by the length of the molecules, the extent of mispairing (if any), and the GC content. Higher GC content results in higher melting temperatures; it is, therefore, unsurprising that the genomes of extremophile organisms such as Thermus thermophilus are particularly GC-rich. On the converse, regions of a genome that need to separate frequently — for example, the promoter regions for often-transcribed genes — are comparatively GC-poor (for example, see TATA box). GC content and melting temperature must also be taken into account when designing primers for PCR reactions.
A base pair (bp) is a unit consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, Watson-Crick base pairs (guanine-cytosine and adenine-thymine) allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this based-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base pairing patterns that identify particular regulatory regions of genes.
Intramolecular base pairs can occur within single-stranded nucleic acids. This is particularly important in RNA molecules (e.g., transfer RNA), where Watson-Crick base pairs (guanine-cytosine and adenine-uracil) permit the formation of short double-stranded helices, and a wide variety of non-Watson-Crick interactions (e.g., G-U or A-A) allow RNAs to fold into a vast range of specific three-dimensional structures. In addition, base-pairing between transfer RNA (tRNA) and messenger RNA (mRNA) forms the basis for the molecular recognition events that result in the nucleotide sequence of mRNA becoming translated into the amino acid sequence of proteins via the genetic code.
The size of an individual gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the number of total base pairs is equal to the number of nucleotides in one of the strands (with the exception of non-coding single-stranded regions of telomeres). The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion bases long and to contain 20,000–25,000 distinct protein-coding genes. A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA. The total amount of related DNA base pairs on Earth is estimated at 5.0 × 1037 and weighs 50 billion tonnes. In comparison, the total mass of the biosphere has been estimated to be as much as 4 TtC (trillion tons of carbon).
DNA could store all of the world's data in one room By Robert Service, March 2, 2017 , 2:00 PM
Humanity has a data storage problem: More data were created in the past 2 years than in all of preceding history. And that torrent of information may soon outstrip the ability of hard drives to capture it. Now, researchers report that they’ve come up with a new way to encode digital data in DNA to create the highest-density large-scale data storage scheme ever invented. Capable of storing 215 petabytes (215 million gigabytes) in a single gram of DNA, the system could, in principle, store every bit of datum ever recorded by humans in a container about the size and weight of a couple of pickup trucks. But whether the technology takes off may depend on its cost.
DNA has many advantages for storing digital data. It’s ultracompact, and it can last hundreds of thousands of years if kept in a cool, dry place. And as long as human societies are reading and writing DNA, they will be able to decode it. “DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete,” says Yaniv Erlich, a computer scientist at Columbia University. And unlike other high-density approaches, such as manipulating individual atoms on a surface, new technologies can write and read large amounts of DNA at a time, allowing it to be scaled up.
Scientists have been storing digital data in DNA since 2012. That was when Harvard University geneticists George Church, Sri Kosuri, and colleagues encoded a 52,000-word book in thousands of snippets of DNA, using strands of DNA’s four-letter alphabet of A, G, T, and C to encode the 0s and 1s of the digitized file. Their particular encoding scheme was relatively inefficient, however, and could store only 1.28 petabytes per gram of DNA. Other approaches have done better. But none has been able to store more than half of what researchers think DNA can actually handle, about 1.8 bits of data per nucleotide of DNA. (The number isn’t 2 bits because of rare, but inevitable, DNA writing and reading errors.)
Erlich thought he could get closer to that limit. So he and Dina Zielinski, an associate scientist at the New York Genome Center, looked at the algorithms that were being used to encode and decode the data. They started with six files, including a full computer operating system, a computer virus, an 1895 French film called Arrival of a Train at La Ciotat, and a 1948 study by information theorist Claude Shannon. They first converted the files into binary strings of 1s and 0s, compressed them into one master file, and then split the data into short strings of binary code. They devised an algorithm called a DNA fountain, which randomly packaged the strings into so-called droplets, to which they added extra tags to help reassemble them in the proper order later. In all, the researchers generated a digital list of 72,000 DNA strands, each 200 bases long.
They sent these as text files to Twist Bioscience, a San Francisco, California–based startup, which then synthesized the DNA strands. Two weeks later, Erlich and Zielinski received in the mail a vial with a speck of DNA encoding their files. To decode them, the pair used modern DNA sequencing technology. The sequences were fed into a computer, which translated the genetic code back into binary and used the tags to reassemble the six original files. The approach worked so well that the new files contained no errors, they report today in Science. They were also able to make a virtually unlimited number of error-free copies of their files through polymerase chain reaction, a standard DNA copying technique. What’s more, Erlich says, they were able to encode 1.6 bits of data per nucleotide, 60% better than any group had done before and 85% the theoretical limit.
“I love the work,” says Kosuri, who is now a biochemist at the University of California, Los Angeles. “I think this is essentially the definitive study that shows you can [store data in DNA] at scale.”
However, Kosuri and Erlich note the new approach isn’t ready for large-scale use yet. It cost $7000 to synthesize the 2 megabytes of data in the files, and another $2000 to read it. The cost is likely to come down over time, but it still has a long ways to go, Erlich says. And compared with other forms of data storage, writing and reading to DNA is relatively slow. So the new approach isn’t likely to fly if data are needed instantly, but it would be better suited for archival applications. Then again, who knows? Perhaps those giant Facebook and Amazon data centers will one day be replaced by a couple of pickup trucks of DNA.
Posted in: Technology