How much information is in DNA?
71 comments
·May 8, 2025chromatin
Daniel_sk
Exactly. The compression level of DNA is orders of magnitude better than anything we can even come close to. DNA usually doesn't even contain specific counts (like 5 fingers on a hand) or sizes of organs and so on - these are given by the processes that run in parallel and cause the cells to hit spatial / chemical / electrical or other limits. It's like putting lots of house builders at specific places where the house should be, and each one would just keep building a wall until he hits another one. There is no compressed house plan, it's a compressed "engine" that builds the result.
Earw0rm
Comparing it to machine code on CD/DVD might make more sense then. Machine code where every line has been hand-optimised by nature's hackers over 500 million years.
And in that context, hundreds of MBs is a heck of a lot of complexity.
clickety_clack
You put my reaction to this in much more educated terms. I’ve always felt that thinking of DNA as bits was a bit simplistic. Just because we store information as bits it doesn’t mean that nature does.
Not that it means they can’t be right, but the author also doesn’t seem to have any particular expertise in genetics. Their ideas need to survive a lot more criticism by people who know what they’re talking about before you could start to see them as convincing.
ses1984
The raw bits of the base pairs are just one component of the information, but it’s like a maximally compressed version of the info.
The laws of physics are another component.
From there you would need to simulate nature to be able to decompress all the data, like how computer programs can use procedural generation.
Imagine a game like Minecraft. You can generate practically infinitely many screenshots of Minecraft worlds, but all that data can be derived from the game code and the jvm.
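To make the procedural generation analogy concrete, here is a minimal sketch in Python. It is not how Minecraft actually generates terrain; the function name `generate_world` and the tile set are made up for illustration. The point is only that a few bytes of seed plus a fixed program deterministically expand into a much larger output.

```python
import random

def generate_world(seed: int, width: int = 64, height: int = 64) -> list[str]:
    """Expand a small integer seed into a grid of terrain tiles, deterministically."""
    rng = random.Random(seed)  # the seed is the only input that varies
    tiles = "~.^#"             # hypothetical tile set: water, grass, hill, rock
    return ["".join(rng.choice(tiles) for _ in range(width)) for _ in range(height)]

world = generate_world(seed=42)
print(f"{len(world) * len(world[0])} tiles generated from one integer seed")
```

The same seed always gives the same world, so the seed and the generator together carry everything needed to reproduce the output.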
jampekka
> The raw bits of the base pairs are just one component of the information, but it’s like a maximally compressed version of the info.
This sounds a bit suspect. A maximally compressed version would be very sensitive to mutations, which wouldn't be great for adaptation via mutations. My understanding is that only a small fraction of mutations lead to unviable phenotypes.
Also, AFAIK the current understanding is that the majority of DNA is "junk", i.e., it doesn't seem to affect the phenotype. Which would be a partial explanation for the above.
The process of genetic expression is indeed something like procedural generation, but if maximal compression is about something like Kolmogorov complexity, the produced phenotype doesn't contain more information than the genetic information.
deng
He does mention structural interactions as well as duplications/deletions/inversions. I would argue methylation is more like an annotation of DNA and not part of the DNA itself, but that's a matter of opinion.
In the end, the author literally says: "nobody knows". Yes, you cannot compare a linear sequence of bits to a macromolecule that interacts structurally with its environment, and the author does not make that claim. The question he tries to answer is: how much data is needed to re-create a similar macromolecule that interacts in a similar way. His main point, on which you both agree: the exons alone are surely not enough, because the encoded proteins are just a (small?) part of how DNA interacts.
kjkjadksj
Exons are almost like functions, whereas a gene is almost like a class definition. In different tissues in the body a gene might be alternatively spliced to lead to different protein isoforms. In effect, this makes use of only a subset of the available functions in the class, depending on certain input parameters or how the class is called.
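As a rough sketch of that analogy (a toy model, not real biology): the class `Gene`, the sequences, and the tissue-to-exon table below are all invented for illustration. The same gene yields different isoforms depending on which exons the "caller" keeps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    exons: tuple[str, ...]  # exon sequences in genomic order

    def splice(self, keep: set[int]) -> str:
        """Join only the exons whose index is in `keep`, preserving their order."""
        return "".join(seq for i, seq in enumerate(self.exons) if i in keep)

# Hypothetical gene with three short exons.
gene = Gene("toy_gene", ("AUGGCU", "GAAUUC", "CCGUAA"))

# Hypothetical splicing choices per tissue (the "input parameters").
SPLICING_BY_TISSUE = {"muscle": {0, 1, 2}, "neuron": {0, 2}}

isoforms = {tissue: gene.splice(keep) for tissue, keep in SPLICING_BY_TISSUE.items()}
for tissue, transcript in isoforms.items():
    print(tissue, transcript)
```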
throwanem
This is a Star Trek version of the subject, in that it is pure technobabble which happens to mention a few real terms.
foobarian
I find that even if this just provides a lower bound it is still an interesting piece of information.
lotharcable
Yeah...
We know now that environmental factors change how DNA is expressed as well through epigenetics.
I don't know how any of it works. Something to do with the shape of the DNA when it is wound up and how it changes the output when RNA produces proteins.
This is how parents can do things like pass some of the athleticism they earn through training to their children. It is possible for athletic parents to pass genes in such a way that it produces children even more athletic than they were.
All of this means that DNA has the ability to encode information and produce proteins in different ways using the same sequences.
So I am guessing that a lot of the DNA that is considered "junk" may not actually be. They are just missing a piece of the puzzle in how it gets read in.
moralestapia
But all of those emergent effects are accounted for in the DNA sequence [1], so the estimate is fine.
1. Maaaaybe you could make a case for DNA methylation, but that still requires some DNA signatures so ...
stenl
A much more detailed and thoughtful (and peer reviewed) take on the same question from my colleague Jussi Taipale: https://www.embopress.org/doi/full/10.15252/embj.201696114
vintermann
Information can only be defined with respect to states where you 1. Can tell (or could in theory tell) the difference and 2. Care about the difference between states. The differences you care about, and the ones you don't, are baked in whenever you use any definition of information.
It doesn't matter much, unless you use it to sneak in what you think we should care about, or use it to make philosophical arguments whose circularity is carefully hidden.
tringuyen_cse
I have a similar view. The question of how much information there is does not matter by itself without some context/application.
tetris11
I thought the main advantage of DNA storage was the physical size of it, and how many different genomes you could have stacked next to each other in the same -70 degree space.
Millions of chimeric cells on the same petri dish? That's 1PB on a single glass slide.
Depending on the sequencing tech paired with the rise of Spatial data, the read speed could be formidable.
Needlessly complex setup though. Let's just stick with metals for now.
out_of_protocol
DNA self-disintegrates very fast. It only works in living cells because it is being repaired non-stop.
throwanem
Even reading is a destructive process, and the physics involved are incomprehensibly complex by comparison with anything in the digital domain.
kjkjadksj
There are ways to read it nondestructively. One way does trade resolution but once prepped the DNA itself can be imaged to be read.
coolcase
How does forensics for old cases work? Is it because of probability? You don't need all the data, but if enough of it matches, it convinces the scientist of the match?
shishironline
Sorry, it is one of the most stable organic molecules and can stay intact for thousands of years. That is why the Jurassic Park like fantasies are based on a truth and many extinct species have been brought to life through DNA in reality too.
chermi
I think maybe they are talking about the very tightly packed yet still functionally accessible 3d structure that is chromatin, not individual strands.
misnome
No, they haven’t. Any claims otherwise are as real as “T Rex Leather” handbags.
gfalcao
I would like to get a reasonably good intuition in regards to the total amount of compound DNA from human bodies at different biochemical states, in different locations around the world (different climates). By "compound DNA" I mean, including DNA of bacterium, fungi and viruses living within one's body. For instance, gut bacteria acquired and maintained based on food intake and environmental influence.
gfalcao
In other words, by how much does the perceived amount of DNA data, in gigabytes, grow in different circumstances? Would it grow by a few more gigabytes?
timewizard
> But mitochondrial DNA is tiny so I won’t mention it again.
Which is a bummer because it is circular. There is also a point on the strand where two separate genes overlap. The end of one has the same code as the beginning of another.
So even DNA has its own native compression scheme.
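A toy sketch of what that overlap saves, in Python. The sequences and the helper `merge_with_overlap` are made up for illustration and are not real mitochondrial genes; the idea is just that when the end of one sequence equals the start of the next, the shared letters only need to be stored once.

```python
def merge_with_overlap(first: str, second: str) -> tuple[str, int]:
    """Merge two sequences, storing the longest suffix/prefix overlap only once.

    Returns the merged sequence and the number of letters saved.
    """
    max_len = min(len(first), len(second))
    for k in range(max_len, 0, -1):      # try the longest overlap first
        if first.endswith(second[:k]):
            return first + second[k:], k
    return first + second, 0             # no overlap at all

merged, saved = merge_with_overlap("ATGCCTAA", "TAAGGC")
print(merged, saved)
```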
xvilka
If we add more nucleotides[1] than standard 4, we could encode much more.
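For a back-of-the-envelope feel, each position in an alphabet of k equally likely symbols carries at most log2(k) bits. A minimal sketch in Python, assuming a genome length of roughly 3.1 billion bases (a round figure used here only for illustration) and ignoring any biological constraints on which sequences are viable:

```python
import math

def bits_per_base(alphabet_size: int) -> float:
    """Upper bound on information per position for a given alphabet size."""
    return math.log2(alphabet_size)

def capacity_megabytes(num_bases: int, alphabet_size: int) -> float:
    """Raw capacity in megabytes (10**6 bytes), with no compression or redundancy."""
    return num_bases * bits_per_base(alphabet_size) / 8 / 1e6

GENOME_BASES = 3_100_000_000  # assumed rough length, for illustration only

for k in (4, 6, 8):
    print(f"{k} letters: {bits_per_base(k):.3f} bits/base, "
          f"{capacity_megabytes(GENOME_BASES, k):.0f} MB")
```

Going from 4 to 8 letters raises the bound from 2 to 3 bits per base, so the gain is 1.5x rather than 2x.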
amelius
Another question is:
How much information can you __store__ in DNA without affecting the organism too much?
timewizard
Very little. The base pairs have specific electrochemical properties. The content of DNA controls its structure.
roxolotl
Discussion from earlier this week: https://news.ycombinator.com/item?id=43927321
Pretty sure the substack and main site are the same. First paragraph is at least.
gitroom
Man, the back and forth here before coffee is actually kinda hilarious - I get all worked up before caffeine too, but honestly, DNA being this messy scratchpad feels way more interesting than treating it like a tidy CD. The messiness kinda rules, if you ask me.
sgt
Exactly why I prefer organizing my windows (on my computer) in a chaotic way, rather than a tiling manager. That's the DNA way of doing it ;)
RainbowcityKun
- Cells work like this because DNA is under constant attack from mutations.
- Mutations most commonly arise during cell replication.
It's fascinating to realize that the "messiness" of DNA isn't a bug, but a feature—a side effect of evolution's raw material supply chain.
Mutations, repeats, transposons, and imperfect repairs all contribute to a noisy genomic landscape. But it's exactly this noise that enables biological diversity. No mutations, no variation. No variation, no selection. No selection, no evolution.
The genome is not a blueprint—it's a living, adapting scratchpad. Messiness is the canvas on which nature paints diversity.
esafak
Don't forget sexual reproduction.
nickpsecurity
Let me add to that. It requires a universe with specific laws that remain stable and encourage optimization. Then, a planet hospitable to life. Then, specific creatures with biological machinery more complex than anything humans have created. The machinery has plenty of reliability and adaptation baked in.
Godless evolution suggests randomness produced all of it over time. Yet, that's never worked in anything we've built. Even our GAs required laws, an environment, a computer, software, and fine-tuning. Pre-existing or by intelligent design (human inventors). Without these, they produced no results.
So, I'll correct you by saying empirical data suggests evolution didn't produce this. We're seeing God's design skills in adaptive, resilient, complex, self-replicating systems. His work is truly beautiful to behold. Humans still can't produce something similar from scratch. Actually, they can't even be sure how the existing design works.
kaibee
> Godless evolution suggests randomness produced all of it over time.
Nope. Randomness _and_ a selection function. Natural selection, ie: surviving to create the next generation.
> Yet, that's never worked in anything we've built.
It works completely fine in things we've built. We don't have the processing power to simulate something on the scale of the computational complexity happening in a small tide pool, though. But you can see 'evolution by natural selection' in a rule set as simple as Game of Life.
> Even our GA's required laws, an environment, a computer, software, and fine-tuning. Pre-existing or by intelligent design (human inventors). Without these, it produced no results.
The laws/environment/computer are the equivalent of having a universe with physical laws. If you want to claim that god created the universe and tuned the constants of the universe, well, maybe. Or maybe every possible universe exists and we're just not around in the ones that don't lead to conscious life, in the same way that Game of Life universe is too simple/constrained to evolve conscious life on the scales we can simulate.
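For a sense of how small the Game of Life rule set is, here is a minimal sketch in Python on an unbounded grid. It only demonstrates the rules themselves and the well-known glider pattern; it does not by itself demonstrate selection. The function name `step` is just a label chosen for this example.

```python
from collections import Counter

def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Advance Conway's Game of Life by one generation on an unbounded grid."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

# The glider: a five-cell pattern that moves one cell diagonally every 4 generations.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)
print(sorted(state))
```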
nickpsecurity
"Randomness _and_ a selection function"
It takes more than that for the chemical bonds to form, for the encoding to exist, for the bootstrapping environments to form, for the transitions to happen, and so on. Also, if a selection function exists, where did it come from and why does it work? Why does the math work? Why isn't math less useful or changing constantly?
"But you can see 'evolution by natural selection' in a rule set as simple as Game of Life."
That's false. You're repeating the same false premises as in the original claim I refuted. If godless and random could do it, then the questions below would all be No.
Does the game run in an environment made by intelligent designers? Does that environment need to be maintained?
Does it require rules made and maintained by intelligent designers?
Does it take an initial state in those rules to get to the specific outcomes you are looking for?
Does it produce simple, temporary patterns that are useless? Or complex machinery that's actually useful?
Or did all of the above happen randomly, keep happening, and produce increasingly complex and useful things?
"Or maybe every possible universe exists"
Science starts from observations to produce hypotheses. That is a faith-based belief popular in science fiction. It's also sort of a cop out because they're going to imagine something as infinite as God, but not mention God, to hope this would pop out randomly. If one does, they still have the "maintain it with stability over long periods" problem for that or those universes. They'll probably drag it deeper into infinity to say it will finally happen accidentally. Let's do science instead.
What we observe is a universe that is highly chaotic, almost every cubic inch is deadly, and the safest places are dead. We see nothing happening from it with Earth and humans being mind-boggling exceptions. Looking deeper at classical physics, we find reality itself also emerges in an orderly fashion from endless, quantum events that should be too random to support order. It also appears to work perfectly without failure for long periods of time.
We've also observed countless phenomena that are truly random and chaotic, like July 4th fireworks, which never produce life or complex machines. Never self-replicating artifacts whose complexity increases over time. Never emergent intelligence from anything that didn't show evidence of design or have human input. We have billions of observations of chaotic events which themselves sometimes have a high magnitude of particles, chemicals, etc. Also, nothing lasts on its own due to physics, with our intelligent designs requiring maintenance over time.
Our first hypothesis is that our reality should be total chaos. Our second hypothesis is something with unimaginable power is forcing a specific order to consistently come out of chaos. Second hypothesis is that the universe doesn't support life without being forced to. Third hypothesis is an intelligent being went uphill against the deadly universe to create us and our planet. Fourth hypothesis is that being is sustaining us despite a whole universe of threats to our lives. Fifth is that the creator is perfect. God is the Occam's Razor explanation of all of this.
There's also revelatory knowledge. God revealed Himself to us via His Word which came with prophecies, miracles, and testable predictions about lifestyles. Jesus, who died for humanity's sins, had a perfect life on top of the same, other attributes. Nobody and nothing else had these traits to support their claimed revelations. So, outside empirical knowledge, revelatory knowledge reinforces the God theory into a highly-proven, saving belief. The life transformations that follow add anecdotal evidence to it.
SalmoShalazar
Really? Creationists on HN? There are mountains of peer reviewed research articles you can read to see that evolution is real and evidence based. To claim otherwise is idiocy.
nickpsecurity
Most top scientists were deists or Christians at one point. Newton's Principia Mathematica was even written to glorify God. Clearly, neither atheists nor evolutionists found the number of people making that claim to be good enough to ignore another claim.
Scientists tell us all ideas, whether a proposal or dissent, are evaluated strictly on evidential merit. Yet, evolution as origin of life had little evidence, many flaws, was forced on people anyway, and dissenting papers aren't allowed.
If it is dogmatic, and dissent isn't allowed, it is not science at all. Just a godless religion or political domination done with scientific wording in their papers. A consensus by people who force everyone to think one way isn't a scientific consensus. A theory whose rebuttals aren't even allowed in scientific journals isn't a scientific theory.
Until alternatives are allowed, and a real debate happens, I reject macro-evolution as either the truth or even a scientific consensus. I'll throw in some example counters, most being strong, which I wasn't taught in high school or college.
https://www.epm.org/resources/2010/Oct/3/ten-major-flaws-evo...
https://www.icr.org/article/four-scientific-reasons-that-ref...
jacktensuited
[dead]
RainbowcityKun
I want to clarify first: I'm not trying to defend "evolutionary theory" itself — what I'm pointing out is:
> Mutation, chaos, and randomness may actually be the fertile ground where biological diversity emerges.
At the same time, I fully agree with your key point:
> "The adaptive, complex, self-replicating systems we see don’t persist just because of pure randomness."
In my view, this doesn’t necessarily mean a “God” designed it in a human-like way. But it does point to a deeper structural order and cosmic regularity.
Maybe we can call it a kind of “design of laws,” rather than a personal designer.
After all, nature seems to operate within a set of elegant, consistent rules:
- F = ma (Newton's 2nd Law): A foundational rule in classical mechanics.
- E = mc² (Einstein): Energy and mass are interchangeable.
- V = IR (Ohm’s Law): Governs how voltage, current, and resistance relate.
- a² + b² = c² (Pythagorean Theorem): Geometry’s timeless backbone.
- Entropy always increases (2nd Law of Thermodynamics): Order tends toward disorder unless something resists it.
So maybe we can say:
- In religious terms, this is “God’s design.”
- In philosophical terms, it’s the “underlying order of the universe.”
- In scientific terms, it’s the “laws of nature, structural stability, and the boundary conditions of evolution.”
nickpsecurity
Why are those rules there? Why don't they break? Why are they small, elegant, and beautiful? Why are many connected to each other in harmony?
This doesn't fit what random, survival-oriented processes produce. It doesn't fit what random, chaotic systems produce. It looks more like an intelligent being designed and maintained the universe. That should amaze you.
They also hardwired us with a specific morality. Children are born looking for God, wanting to be loved, and with a sense of justice (fairness). That the creation has these morals implies the creator either has them or knows of them. If people have done evil, they should be quite afraid of what that implies.
Divine revelation came later with miracles as proof. God's Word told us we have to seek God, love others, do good, and do justice. That fits with our natural design. That specific God fits the profile of one who would design that elegant universe with only human life in it. That should reinforce the need to repent and follow Christ, or burn alive. In His Word, he also said He created us very personally before He began driving those laws you're talking about.
The article massively undersells the information content of the genome in several key ways. A non-comprehensive list of these (before my morning coffee, forgive me) includes:
- DNA methylation (https://en.wikipedia.org/wiki/DNA_methylation)
- Interactions of alleles (what the article refers to as the "two versions of each base pair")
- Duplications, deletions, inversions, and other structural variations (https://www.genome.gov/genetics-glossary/Structural-Variatio...)
- Physical proximity interactions in 3-dimensional space (https://cmbl.biomedcentral.com/articles/10.1186/s11658-023-0...)
- Combinatorial effect (massive) of different alleles in complex systems
Overall, it's not sensible to compare a linear sequence of bits, like a CD (sibling comment) or DVD (the article), to the linear sequence of the genome and conclude that their information content, based on length alone, is in any way comparable.