Duplication Isn't Always an Anti-Pattern
53 comments · December 2, 2025 · ninkendo
swatcoder
Yup. People are taught DRY very early on, as an introductory "engineering" practice above the nuts and bolts of writing code.
But nobody really teaches the distinction between two passages that happen to have an identical implementation vs two passages that represent an identical concept, so they start aggressively DRY'ing up the former even though the practice is only really suited for the latter subset of them.
As you note, when you blindly de-duplicate code that's only identical by happenstance (which is a lot), it's only a matter of time before the concepts making them distinct in the first place start applying pressure for differentiation again and you end up with that nasty spaghetti splatter.
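A tiny sketch of the difference (made-up Go, hypothetical names): today both checks are byte-for-byte identical, but they answer different questions, so merging them into one helper couples rules that will drift apart.

    package main

    import "fmt"

    // Hypothetical example: two identical implementations, two different concepts.

    // validUsername: "what may a person call themselves?"
    func validUsername(s string) bool {
        return len(s) >= 3 && len(s) <= 32
    }

    // validSlug: "what fits in a URL?" Same code today by happenstance,
    // but tomorrow it may forbid spaces or uppercase; the username rule won't.
    func validSlug(s string) bool {
        return len(s) >= 3 && len(s) <= 32
    }

    func main() {
        fmt.Println(validUsername("grug dev"), validSlug("grug dev"))
    }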
RaftPeople
> But nobody really teaches the distinction between two passages that happen to have an identical implementation vs two passages that represent an identical concept, so they start aggressively DRY'ing up the former even though the practice is only really suited for the latter subset of them.
Even identical implementations might make more sense to keep duplicated once you factor in organizational coupling: different business groups, each with their own change-management cycles and requirements.
clickety_clack
I think grug has the best refactoring advice: https://grugbrain.dev/
hinkley
I would probably still be working with one of these assholes if I hadn’t gotten laid off. Dude was 40. How tf have you not learned better by now?
gaigalas
Is there any code (yours, open source, doesn't matter) that you would recommend as non "huffman compressed"? Give us an example of what you like.
ShipEveryWeek
I like doing this for data models - but it’s easy for people to go overboard
gardenhedge
Yeah, I have seen that too. And it's easily sold to non-technical managers.
theoldgreybeard
I had the pleasure of Sandi Metz coming to a company I worked for and giving us a “boot camp” of sorts on all of the engineering principles she espouses, and it had a profound impact on how I view software development. Whatever the company paid for her to come - it was worth every penny.
“Prefer duplication over the wrong abstraction”
breckenedge
Sandi’s books and conference talks are excellent. My faves:
https://www.poodr.com/ https://www.youtube.com/watch?v=PJjHfa5yxlU
theoldgreybeard
Poodr is one of the best programming books ever written. Even if you don’t program in Ruby you should read it anyway (and pick up a bit of Ruby just for fun) because there are lots of great concepts to internalize that are useful in almost all programming languages.
codr7
Couldn't agree more!
You develop a sense for when the time is right over the years, by maintaining over-engineered pieces of shit, many written by yourself.
To beginners it seems like coming up with the idea and building it is the difficult part; it isn't, not even close. The only difficult parts worth mentioning are keeping complexity on a tight leash and maintaining conceptual integrity.
bmitch3020
I'm a fan of the Go proverb "a little copying is better than a little dependency"[1] and also the "rule of three"[2] when designing a shared dependency.
I think the JS developers could take a lesson from the Go proverb. I often write something from scratch to avoid a dependency because of the overhead of maintaining dependencies (or dealing with dependencies that cease to be maintained). If I only need a half dozen lines of code, I'm not going to import a dependency with a couple hundred lines of code, including lots of features I don't need.
The "rule of three" helps avoid premature abstractions. Put the code directly in your project instead of in a library the first time. The second time, copy what you need. And the third time, figure out what's common between all the uses, and then build the abstraction that fits all the projects. The avoids over-optimizing on a single use case and refactoring/deprecating APIs that are already in use.
[1]: https://go-proverbs.github.io/ [2]: https://en.wikipedia.org/wiki/Rule_of_three_(computer_progra...
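To make the proverb concrete, here's roughly the kind of thing I mean (a hypothetical sketch, not from the proverbs site): a half dozen lines written in-project instead of pulling in a string-utility dependency.

    package main

    import "fmt"

    // Hypothetical sketch: truncate shortens s to at most max runes.
    // It's the half dozen lines we actually need, rather than a utility
    // dependency with dozens of features we don't.
    func truncate(s string, max int) string {
        r := []rune(s)
        if len(r) <= max {
            return s
        }
        return string(r[:max]) + "..."
    }

    func main() {
        fmt.Println(truncate("a little copying is better than a little dependency", 20))
    }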
bob1029
In terms of code & data, I would say that duplication is mostly upside because the cost of refactoring is negligible. If all call sites are truly duplicate usages, then normalizing them must be trivial. Otherwise, you are dealing with something that seems like duplication but is not. The nuance of things being so similar that we would prefer they be the same (but unfortunately they cannot be) is where we will find some of the shittiest decisions in software engineering. We should not be in a rush to turn the problem domain into a crystalline structure. The focus should be on making our customer happy and keeping our options open.
That said, I have found other areas of tech where duplication is very costly. If you are doing something like building a game, avoiding use of abstractions like prefabs and scriptable objects will turn into a monster situation. Failure to identify ways to manage common kinds of complexity across the project will result in rapid failure. I think this is what actually makes game dev so hard. You have to come up with some concrete domain model & tool chain pretty quickly that is also reasonably normalized or no one can collaborate effectively. The art quality will fall through the basement level if a designer has to touch the same NPC in 100+ places every time they iterate a concept.
arealaccount
Classic similar blog post https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
vanschelven
When a colleague told my father that "duplication is always bad" he grabbed a random memo from that colleague's desk and said "I bet there's at least 3 copies of this piece of paper in this building". That drove the point home alright.
esafak
This is not the riposte you might think it is. Duplication serves a different purpose in print; it lets you avoid having to pass the memo around.
vanschelven
I think the riposte is against the word "always" and it worked precisely because one could blindly pick a counter example from the physical space of the discussion.
I.e. it worked because it smashed the broad statement and forced a discussion about particulars. Now who was right about those, I have no idea, since I wasn't even present.
lateforwork
If you have only one copy of the code then you only have to fix the bug in one place, as opposed to a dozen. So there is significant cost savings. But there is a problem: when you make a bug fix you have to test all the different places it is used. If you don't then you could be breaking something while fixing something. If you have comprehensive automated tests then you can have just one copy of the code--if you introduce a bug while fixing a bug the automated tests will catch it.
If you don't have comprehensive test automation then you have to consider whether you can manually test all the places it is used. If the code is used in multiple products at your company--and you aren't even familiar with some of those products--then you can't manually test all the places it is used. Under such circumstances it may be preferable for each team to have duplicate copies of some code. Not ideal, but practical.
hinkley
I don’t think it’s the cost of making the same fix n times, it’s the cost of missing one, or two. Especially if customers notice first.
lateforwork
Right, but you have to consider the cost of incorporating bug fixes without fully testing them. That too can introduce new failures that are noticed by customers first.
porridgeraisin
Yep. This is the actual reason behind duplication I've seen 99% of the time.
jamesbelchamber
"Don't Repeat Yourself" is a great rule of thumb which, at least in writing Terraform configuration, became absolute tyranny. Strange webs of highly coupled code with layers of modules, all in an effort to be DRY - almost everywhere I've seen Terraform.
Trying to explain why a little duplication is preferable to bad abstractions, and specifically preferable to tightly coupling two unrelated systems together because they happened to sort-of interact with the same resource, was endless and tiring and - ultimately - often futile.
hinkley
On the terraform comment, things that change together ship together is a good mantra.
If you keep having to make edits in two independent systems every time you want to make one change, something is wrong. If you’re leaving footguns around because changing one thing affects two or more systems, but you aren’t at liberty to change them both in production, that’s also something wrong.
Ekaros
I don't do too much Terraform. But isn't the DRY really happening at the provider level? And when you are using it, most of the time it really doesn't make much sense to try not to repeat yourself, unless you are dealing with actually identical resources, or deploying multiple times, say dev, test and prod.
miduil
Testability and developability: ideally you structure your terraform/terragrunt code in a way that you can bootstrap an almost equivalent test environment. For example, when using the "AWS Well-Architected" method you would be able to bootstrap a similar environment on a separate AWS account that's part of your organization.
Unfortunately, the Terraform module system is extremely lacking, and in many ways you're totally right - if your module is just replicating all the provider arguments, it just feels wrong.
esafak
Were they deduplicating incorrectly, or does terraform lack the ability to do so well?
baby_souffle
Bit of both, really. There are some common techniques that would be a lot simpler or more robust if Terraform supported variables and expressions like lambdas in more places (tofu is getting there…), but it's also a failure to realize that Terraform is meant to compose many small modules together, not to pass 150 different inputs into one omni-module.
hbogert
you can shorten to:
> writing Terraform configuration, became absolute tyranny.
gaigalas
Explaining is hard. Examples often work better. You need to be able to show an example where the code would be made worse by applying DRY; otherwise it's hard to argue using just vague descriptions.
I totally agree with preferring duplication, but only when such an example can be shown. Otherwise it's too easy, and I've seen people try to use this argument to justify slop many times.
ilitirit
I had a lengthy argument about this in our architecture forum. I argued that "re-use" shouldn't be included as an Enterprise (keyword here) Architecture principle because there are clear use cases where duplication is preferable to the alternatives, e.g. decoupling deployment and testing, etc. I had a lot of resistance, and eventually we just ended up with an EA principle with a ton of needless caveats.
It's unfortunate that so many people end up parroting fanciful ideas without fully appreciating the different contexts around software development.
StellarScience
> It's unfortunate that so many people end up parroting fanciful ideas without fully appreciating the different contexts around software development.
Of course that's true of both sides of this discussion too.
I really value DRY, but of course I have seen cases where a little duplication is preferable. Lately I've seen a steady stream of these "duplication is ok" posts, and I worry that newer programmers will use it to justify copy-paste-modifying 20-30-line blocks of code without even trying to create an appropriate abstraction.
The reality of software is, as you suggest, that there are many good rules of thumb, but also lots of exceptions, and judgment is required in applying them.
brandensilva
I have to agree, it's much easier to remove and consolidate duplicative work than unwind a poor abstraction that is embedded everywhere.
And I think it's easy to see why small companies lean on duplication: it's too easy to screw up abstractions without more engineering heads involved to get them right.
StellarScience
> it's much easier to remove and consolidate duplicative work than unwind a poor abstraction that is embedded everywhere.
It's not easy to deduplicate after a few years have passed, and one copy had a bugfix, another got a refactoring improvement, and a third copy got a language modernization.
With poor abstractions, at least you can readily find all the places that the abstraction is used and improve them. Whereas copy-paste-modified code can be hard to even find.
bonesss
With poor abstractions I can improve abstractions and ensure holistic impact because of the reuse. Then I’m left with well factored reusable code full of increasingly powerful abstractions. Productivity increases over time. Abstractions improve and refine over time. Domain understanding is deepened.
With duplicated messes you may be looking at years before a logical point to attack across the stack is even available, because the team is duplicating and producing duplicated efforts on an ongoing basis. Every issue, every hotfix, every customer request, every semi-complete update, every deviation puts pressure to produce, with duplication available as the quickest and possibly only method. And there are geological nuances to each copy-and-paste exercise that often have rippling effects…
The necessary abstractions often haven't even been conceived of, even in immature form. Domain understanding is buried under layers of incidental complexity. Superstition around troublesome components takes over decision making. And a few years of plugging the same dams with the same fingers drains and scares off proper IT talent. Up-front savings transmute into tech debt, with every incentive for every actor at every point to make the collective situation worse by repeating the same short-term reasoning.
Learning to abstract and modularize properly is the underlying issue. Learn to express yourself in maintainable fashion, then Don’t Repeat Yourself.
brandensilva
I feel AI does a decent job of fixing the dupes and consolidating them into one instance. Abstractions can have far deeper connections and embeddings, making them really hard to undo and reform, but to each their own on what works for them.
locknitpicker
> I have to agree, it's much easier to remove and consolidate duplicative work than unwind a poor abstraction that is embedded everywhere.
That is basically the core tenet of "Write Everything Twice" (WET)
next_xibalba
I’ve been working on a new framework for the last five years. White paper dropping soon. It’s called “Write Everything Thrice” (WET). Lmk if you want the link to my substack where I’m cooking up more stuff like this.
tomaskafka
Only if you include an ICO of WETtoken and a line of longevity supplements.
didip
I don’t understand why so many engineers have tendencies to dedupe code.
Data, which is more important than code imho, is duplicated constantly. Why can't code have some duplication?
I've had coworkers in the past that treat code like it needs to be compressed. Like, in the Huffman coding sense. Find code that exists in two places, put it in one place, then call it from the original places. Repeat until there's no more duplication.
It results in a brittle nightmare because you can no longer change any of it: the responsibility of the refactored functions is simply "whatever the original code was doing before it was de-duplicated", and they don't represent anything logical.
Then, if two places that had "duplicated" code before the refactoring need to start doing different things, the common functions get new options/parameters to cover the different use cases, until those get so huge that they start needing to get broken up too, and then the process repeats until you have a zillion functions called "process_foo" and "execute_bar", and nothing makes sense any more.
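Roughly the shape it ends up taking (made-up Go, hypothetical names and flags): the "shared" function's only job is to be whatever its call sites used to do, so each new difference grows another flag.

    package main

    import (
        "fmt"
        "log"
        "strings"
    )

    // Hypothetical example: processFoo exists only because two call sites
    // once looked alike. Each flag below was added when one caller needed
    // to diverge from the others.
    func processFoo(items []string, skipEmpty, legacyMode, emitAuditLog bool) []string {
        out := make([]string, 0, len(items))
        for _, it := range items {
            if skipEmpty && it == "" {
                continue // one caller needed this; now every caller pays for the flag
            }
            if legacyMode {
                it = strings.ToUpper(it) // another caller's quirk, hoisted into the "shared" code
            }
            if emitAuditLog {
                log.Printf("processed %q", it)
            }
            out = append(out, it)
        }
        return out
    }

    func main() {
        fmt.Println(processFoo([]string{"a", "", "b"}, true, false, false))
    }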
I've since become allergic to any sort of refactoring that feels like this kind of compression. All code needs to justify its existence, and it has to have an obvious name. It can't just be "do this common subset of what these 2 other places need to do". It's common sense, obviously, but I still have to explain it to people in code review. The tendency to want to "compress" your code seems to be strong, especially in more junior engineers.