
The Ingredients of a Productive Monorepo

bob1029

This thread is reminding me of a prior one about complexity merchants. I am seeing a lot of sentiment that there is somehow a technical sacrifice by moving to a monorepo.

This is absolutely ludicrous unless you fail to grasp the power of a hierarchical file system. I don't see how a big mess like CI/CD is made easier by spreading it out to more points of configuration.

To me the whole point of a monorepo is atomic commits for the whole org. The power of this is really hard to overstate when you are trying to orchestrate the efforts of lots of developers - contrary to many claims. Rebasing in one repo and having one big meeting is a hell of a lot easier than doing it N times.

Even if the people on the team hate each other and refuse to collaborate directly, I still don't see a reason not to use a monorepo. In this scenario, the monorepo becomes a useful management and HR tool.

jayd16

Here's the issue... Being able to track commits across multiple projects is just a nice-to-have. Does it even significantly increase your ability to track dependencies or trigger downstream tests? You can track that stuff with multi-repo automations. Maybe it helps a bit, but it's not free and it's not complete.

Your deploys don't become atomic. Your builds don't become atomic. At best you get to tell yourself you can version a bit more loosely.

Mono-repos themselves do not scale easily. Inherently it's a harder technical problem to solve. You need to toss git and find something better, which is not easy. It's work. It's so much work that it is incredibly clear that you've never experienced it yourself.

gugagore

Do you have go-to multi repo automations and other tools for operating across multiple repos?

gengstrand

It is true that there are significant benefits to a monorepo, but they come at a cost. Managing a monorepo is more expensive than managing a polyrepo. For the details behind that claim, check out https://www.exploravention.com/blogs/monorepo/

The question is this. Do the costs of monorepo justify the benefits for your situation? The answer is not always yes.

codethief

Note that the costs depend on the scale and size of the monorepo, and a polyrepo is not without costs either (which people often like to forget or ignore because they are less visible).

I spent some years at small to mid-sized companies (~30-100 devs) that would have profited from a monorepo. However, people were in the habit of splitting repositories every other month. Sometimes a single team would own a given repository, but more often than not several teams would contribute to each repository.

I have serious PTSD from that time. Every single pipeline in every repo worked differently, tools were different, setup, scripts and commands were different. In some repositories you could trust the CI pipeline, in others you absolutely couldn't. CI performance gains in one repo wouldn't translate to another. And of course you often still had some tech debt lying around from when repositories got split and people forgot to clean up thoroughly. Uggh.

Now, to be fair, people did put quite a bit of effort into their setups and pipelines and all that, and it wasn't that the outcome of that effort in and of itself was totally bad. Not at all. But everyone did things differently. And often teams just re-did work that other people had already done and solved – there was a ton of overhead.

Finally, the worst part was inter-repository dependencies. People argued coupling would be loose and all that, so we could easily split repos, but in reality there were so many implicit dependencies because ultimately all repositories made up one big application. Merge requests had to be coordinated. One repo imported files from another. Later a CI pipeline in one repo triggered a pipeline in another repo…

This brings me to another problem with polyrepos: They harm discoverability and cause certain dependencies not to be encoded in code. In a monorepo, in contrast, when people wonder where they can find X, or whether anyone uses or depends on Y, the answer is usually only a `ripgrep` or `git grep` away. In a polyrepo, however, they need to know where to look first.
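
For example (a trivial sketch, with `PaymentClient` as a made-up symbol), answering "who uses this?" across every team's code is a single search from the repo root:

```
# Every file in the whole org that mentions the symbol:
rg -l 'PaymentClient'
# Or with plain git, restricted to tracked files:
git grep -n 'PaymentClient'
```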

Born from all these experiences, my mantra has been: Split repos only if you really have to and have a very good reason! (Entirely different applications or ecosystems; different parts of the code owned by different & independent teams; …)

marcosdumay

> The power of this is really hard to overstate

A really useful heuristic when you are designing programming environments is: the more power you give to a team of developers, the more problems you will have.

Technically, atomic commits are not more power, they are less. But they do empower the team to work with bad interfaces. And that's a power that creates problems.

wocram

Just today I saw someone try to set up a new project and stumble over setting it up as its own repository, instead of just having a central place to add it.

There's also a vicious feedback loop of separating projects across repos creating a huge impediment for touching anything outside of the project you're directly working in.

yencabulator

> To me the whole point of a monorepo is atomic commits for the whole org.

The belief that a monorepo makes a change somehow more atomic is one of the traps.

From the article:

> The greatest power and biggest lie of the monorepo is that it is possible to make atomic commits across your entire codebase. [...]

> Your monorepo now contains many different deployable artifacts that deploy at different times. It is also technically possible to make, for example, a breaking change to a service’s interface, a service’s implementation, and the service’s clients all in one PR. However, this PR will break when you deploy it because you do not deploy your service and all of its clients atomically. While this is also possible in a world with many repositories, the requirement to do this change in multiple pull requests is often enough to remind engineers that breaking changes to a service contract are not safe to make.

> Your users must understand that your deployment system operates asynchronously with respect to what happens in the monorepo. Its primary interaction with the monorepo is to go and pick up the “latest” build artifacts for a particular service; everything else happens on timetables that are potentially not under your control and can happen arbitrarily far in the future.

> A common CI job in a monorepo is to validate service contracts and make sure that they are not broken unless the author deliberately intended to do so, and they are required to provide a justification as to why such a change is OK.
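
One concrete shape that contract-validation job often takes is a breaking-change check against the IDL definitions; a minimal sketch, assuming protobuf contracts and the buf CLI (other ecosystems have equivalent tools):

```
# Fail CI if the protos in this commit break wire or source compatibility
# with what is currently on the main branch.
buf breaking --against '.git#branch=main'
```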

crazygringo

That's confusing two different things, though.

A monorepo does make changes atomic in the code. There's no trap there.

You're talking about deployment, and yes when deployment is staggered, then obviously all atomic changes need to be backward-compatible, or else be very carefully orchestrated. But that doesn't have anything to do with monorepo vs polyrepo. That's just staggered deployment.

You have to deal with backwards compatibility in both cases. But at least with the monorepo you can see and track and merge all the changes related to a feature in one place, and be able to roll them back in one place.

yencabulator

There's no such thing as "when deployment is staggered": deployment is always staggered. It's a distributed problem, so by definition it is not synchronous.

(Or you turn services off for the duration of the deploy. Most companies do not want that these days.)

Also, you're missing this part of the article:

> While this is also possible in a world with many repositories, the requirement to do this change in multiple pull requests is often enough to remind engineers that breaking changes to a service contract are not safe to make.

cmrdporcupine

The push to fragmentation and atomism is so strong with this generation of devs. The obsession with microservices, dozens of small repositories, splitting everything up from fear of "monoliths."

What they're doing is creating a mass of complexity that is turning org-chart problems into future technical ones and at the same time not recognizing the intrinsic internal dependencies of the software systems they're building.

Luckily my current job is not like this, but the last one was, and I couldn't believe the wasted hours spent doing things as simple as updating the fields in a protobuf schema file.

bluGill

That push to fragmentation is in large part because of hard lessons learned from the problems of a monolith.

The answer is IMO somewhere in between. Microservices can get too tiny and thus the system becomes impossible to understand. However, a monolith is impossible to understand as well.

The real problem is that you need good upfront architecture to figure out how the whole system fits together. However, that is really hard to get right (and Agile discourages it - which is right for small projects, where those architects would add complex things to mitigate problems you will never have).

jyounker

Monolith != Monorepo. They're independent concepts.

ecoffey

In my experience microservices are easier to manage and understand when organized in a monorepo.

layer8

That indicates a strong coupling between those microservices.

mherkender

Large software projects cycle back and forth between fragmentation and defragmentation. There is no right answer, only what's right for each project at the time.

Relevant xkcd: https://xkcd.com/2044/

nobodywillobsrv

While things like git submodules have problems, are they not sort of in the best-of-both-worlds camp?

I never quite got the arguments/grumblings about this stuff, but perhaps I do not understand what friction people are hitting up against. It feels to me the problem of multiple repositories is the same problem as having to manage versions of dependencies, like in pip, that you do not even own.

Perhaps people are using tools that mean they are not able to search/click through their multi repo setup even though the code connects?

Too

Yes and No.

One top level "superrepo" is the best of both worlds. You don't break the limits of git, you can include third party repos, you can seal off access to some secret repos. All while still giving the benefit of everything checked out in one place. Tooling for flattening the git log exists and is relatively cheap to build yourself, and it's easy to design CI that runs both big and small jobs in the superrepo and the subcomponents.
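
As a rough illustration of the log-flattening tooling mentioned above (a sketch that assumes a flat superrepo with all submodules initialized; the output format is arbitrary):

```
# Interleave every submodule's history into one chronological log.
git submodule foreach --quiet \
  'git log --pretty=format:"%at %h [$name] %s%n"' | sort -rn | cut -d" " -f2-
```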

Nested submodules = Game over. Don't ever do it. This is the worst of both worlds. Common components included twice or more need to be aligned to exact same revision, without help of semantic versioning. Components far down the dependency tree need to be uplifted recursively through a chain of submodules, resulting in 10 commits with no common CI for one tiny change.

lxe

So there are 2 kinds of big tech monorepos.

One is the kind described in the article here: "THE" monorepo of the (mostly) entire codebase, requiring custom VCS, custom CI, and a team of 200 engineers supporting this whole thing. Uber and Meta and I guess Google do it this way now. It takes years of pain to reach this point. It usually starts with the other kind of "monorepo":

The other kind is the "multirepo monorepo" where individual teams decide to start clustering their projects in monorepos loosely organized around orgs. The frontend folks want to use Turborepo and they hate Bazel. The Java people want to use Bazel and don't know that anything else really exists. The Python people do whatever the python people do these days after giving up on Poetry, etc... Eventually these might coalesce into larger monorepos.

Either approach costs millions of dollars and millions of hours of developers' time and effort. The effort is largely defensible to the business leaders by skillful technology VPs, and the resulting state is mostly supported by the developers who chose to forget the horror that they had to endure to actually reach it.

przmk

At my current $dayjob, there is a backend that is split into ~11 git repos, which results in a single feature being split among 4-5 merge requests, and it's very annoying. We're about to begin evaluating monorepos to group them all (among other projects). What would the alternative to a monorepo be in this case, knowing that we can't bundle the repos together?

bluGill

Is a monorepo the answer, or is the real problem that you just have a bad repo split?

I can't answer that question, and there are reasons to go monorepo anyway. However, if your problem is a bad polyrepo split, going to a monorepo is the obvious answer, but it isn't the only answer. Monorepo and polyrepo each have very significant problems (see the article for monorepo problems) that are unique to that setup. You have to choose which set of problems to live with and mitigate them as best you can.

layer8

The general rule is that things should be versioned together that change together. Separate repositories should be thought of similarly to separately versioned libraries. Dependencies between repositories should have stable interfaces. Design decisions that are likely to change should be encapsulated within a module, so that these decisions are hidden from other modules (a seminal paper about that is [0]). These considerations should guide any split into separate repositories.

[0] https://wstomv.win.tue.nl/edu/2ip30/references/criteria_for_...

whstl

"11 repos with 4-5 merge requests" doesn't sound like Google-level, so I don't see why a monorepo wouldn't work without much work.

nightpool

I think you might just have a badly architected backend. get rid of your microservices first and then we'll see how you're feeling

matthew16550

As an aside, I've found IntelliJ very helpful in this situation: it can load many repos into one project, and then doing commits / pushes / branches etc. across various repos at the same time just seemed to work the way I wanted without much thinking about it.

dustingetz

use git subtree - first to concatenate the minor repos into one major repo, and then subtree split from that point forward to publish subtree folders back to the minor repos, if needed (e.g. open source projects to github). works for us with about 8 minor repos, eliminated submodule pain entirely. only the delivery lead has to even know the minor repos exist.
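
roughly, the flow looks like this (repo names and prefixes are made up; `git subtree split` is what produces the synthetic history that gets pushed back out):

```
# One-time import: pull a minor repo into a folder of the major repo, keeping history.
git subtree add --prefix=libs/widgets git@github.com:example/widgets.git main

# Later, publish changes made under that folder back out to the minor repo.
git subtree push --prefix=libs/widgets git@github.com:example/widgets.git main
```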

przmk

I have already briefly looked at git-subtree. From what I can gather, it doesn't help much with my use-case. You still need to manually pull from each subtree and push branches individually to each project. The end result is still 4-5 merge requests to handle on Gitlab for a single new feature.

I might have missed something.

mystified5016

Yup, at work I have a few projects split across several repos in like four languages. A completely new feature implemented across the whole stack involves PRs in up to 8 different repos. Potentially more.

To be totally honest, yes this is an unbelievable pain in the ass, but I much prefer the strict isolation. Having worked with (much, much) smaller monorepos, I find the temptation to put code anywhere it fits too strong, and things quickly get sloppy. With isolated repos, my brain much more clearly understands the boundaries and separation of concerns.

Then again, this results in a lot of code duplication that is not trivial to resolve. Submodules help to a degree, but when the codebase is this diverse, you're gonna have to copy some code somewhere.

I view it sort of like the split between inheritance and composition. You can either inherit code from the entire monorepo, or build projects from component submodules plugged together. I much prefer the latter solution, but clearly the former works for some people.

hiddencost

Do these 11 repos end up in separate binaries?

Because it sounds like you just need flag based feature releases.

przmk

It ends up with 6 deployables that are coupled together (let's say micro-services). There are surely better ways to structure the project but our CI/CD pipeline doesn't allow us to do so and it is not handled by our team anyway. I haven't seen any good way to make my life easier for merges, tech reviews, deployments, etc…

no_wizard

There's no language-agnostic orchestration system that is both easy to implement and has the core features that make a monorepo pleasant to use.

Bazel is complex and isn't the easiest to pick up for many (though to Google's credit the documentation is getting better). Buck isn't any better in this regard. Pants seems the easiest of all the big ones I've seen, but it's also a bit quirky, though much easier to get started with. NX is all over the place in my experience.

Until recently, too, most of these monorepo systems didn't have good web ecosystem support, and even those that do don't handle every build case you want them to, which means you have to extend them in some way and maintain that.

It also doesn't help that most CI systems don't have good defaults for these tools and can be hard to set up properly to take advantage of their strengths (like shared cross-machine caching).

As an aside, the best monorepo tooling I have ever used was Rush[0] from Microsoft. If you are working in a frontend / node monorepo or considering it, do take a look. It works great and really makes working in a monorepo extremely uniform and consistent. It does mean doing things 'the rush way' but the trade off is worth it.

[0]: https://rushjs.io

Pawka

It's worth noting that most monorepos won't reach the same size as repositories from Google, Uber, or other tech giants. Some companies introduce new services every day, but for some, the number of services remains steady.

If a company has up to 100 services, there won't be VCS scale problems, LSP will be able to fit the tags of the entire codebase in a laptop's memory, and it is probably _almost_ fine to run all tests on CI.

TL;DR not every company will/should/plan to be the size of Google.

CamouflagedKiwi

I do think the 'run all tests on CI' part is not that fine; it bites a lot earlier than the others do. Git is totally fine for a few hundred engineers and 100ish services (assuming nobody does anything really bad to it, but then it fails for 10 engineers anyway), but running all tests rapidly becomes an issue even with tens of engineers.

That is mitigated a lot by a really good caching system (and even more by full remote build execution) but most times you basically end up needing a 'big iron' build system to get that, at which point it should be able to run the changed subset of tests accurately for you anyway.
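
For what it's worth, once you have a build-graph-aware tool the "changed subset" query is fairly direct. A minimal sketch, assuming Bazel and a change confined to a made-up //payments package (real setups map changed files to targets with something like bazel-diff):

```
# Run every test target in the repo that transitively depends on //payments/...
bazel query 'kind(".*_test", rdeps(//..., //payments/...))' | xargs bazel test
```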

anon7000

There are also so many types of slow tests in web systems. Any kind of e2e test like Cypress or Playwright can easily take a minute. Integration tests that render components and potentially even access a DB take many times longer than a basic unit test. It doesn't take very many of the slow group to really start slowing your system down. At that point, what matters is how much money you're willing to pay to scale your build agents either vertically or (more likely) horizontally.

zbentley

I think that discussions in this area get muddied by people using different definitions of “rapidly”. There are (at least) two kinds of speed WRT tests being run for a large code base.

First, there is “rapidly” as pertains to the speed of running tests during development of a change. This is “did I screw up in an obvious way” error checking, and also often “are the tests that I wrote as part of this change passing” error checking. “Rapid” in this area should target low single digits of minutes as the maximum allowed time, preferably much less. This type of validation doesn’t need to run all tests—or even run a full determinator pass to determine what tests to run; a cache, approximation, or sampling can be used instead. In some environments, tests can be run in the development environment rather than in CI for added speed.

Then there is “rapidly” as pertains to the speed of running tests before deployment. This is after the developer of a change thinks their code is pretty much done, unless they missed something—this pass checks for “something”. Full determinator runs or full builds are necessary here. Speed should usually be achieved through parallelism and, depending on the urgency of release needs, by spending money scaling out CI jobs across many cores.

Now the hot take: in nearly every professional software development context it is fine if “rapidly” for the pre-deployment category of tests is denominated in multiple hours.

Yes, really.

Obviously, make it faster than that if you can, but if you have to trade away “did I miss something” coverage, don’t. Hours are fine, I promise. You can work on something else or pick up the next story while you wait—and skip the “but context switching!” line; stop feverishly checking whether your build is green and work on the next thing for 90min regardless.

“But what if the slow build fails and I have to keep coming back and fixing stuff with a 2+ hour wait time each fix cycle? My precious sprint velocity predictability!”—you never had predictability; you paid that cost in fixing broken releases that made it out because you didn’t run all the tests. Really, just go work on something else while the big build runs, and tell your PM to chill out (a common organizational failure uncovered here is that PMs are held accountable for late releases but not for severe breakage caused by them pushing devs to release too early and spend less time on testing).

“But flakes!”—fix the flakes. If your organization draws a hard “all tests run on every build and spurious failures are p0 bugs for the responsible team” line, then this problem goes away very quickly—weeks, and not many of them. Shame and PagerDuty are powerful motivators.

“But what if production is down?” Have an artifact-based revert system to turn back the clock on everything, so you don’t need to wait hours to validate a forward fix or cherry-picked partial revert. Yes, even data migrations.

Hours is fine, really. I promise.

DanielHB

My company has been moving towards having monorepos per language stack. Decent compromise

Kinrany

That sounds worse than either option. At that point put it all in one repo with a directory for each language.

aldanor

And then at some point your Rust people write a Python module in Rust via pyo3, and it has to be integrated into Python build system and Python type checkers, but also needs local rust crates as build dependencies and local python packages as runtime dependencies... hm.

jolt42

Coupling and Cohesion likely has nothing to do with the language.

amelius

This will start to become a problem if the stacks need to communicate with each other using versioned protocols.

lolinder

Why can't you just use versioning in your external-to-the-monorepo APIs and use HEAD within the monorepo? Nothing about combining some projects into a monorepo forces you into dropping everything else we know about software release cycles.

NiloCK

Maybe I miss the point here, but it seems to me that versioning the protocols is the specific solution to maintaining interop between different implementations.

mamcx

And for small teams, what we want/need is the "all deps" mono-repo.

I wanna link other repos I depend on, but those repos can be read-only. And then all the tools work without extra friction.

P.S.: This could be another wish for jj!

echelon

As a former IC at a large monorepo company, I preferred monorepos over polyrepos.

It was the "THE" monorepo, and it made understanding the company's service graph, call graph, ownership graph, etc etc. incredibly clear. Crystal clear. Vividly so.

Polyrepos are tribal knowledge. You don't know where anything lives and you can't look it up or discover it. Every team does their own thing. Inheriting new code is a curse. Code archeology feels like an adventure in root cause analysis in a library of hidden and cryptic tomes.

Polyrepos are like messages and knowledge locked away inside Discord or Slack channels with bad retention policies. Everything atrophies in the dark corners.

If monorepos cost millions, I'd say polyrepos do just the same in a different way.

Monorepos are a continent of giant megafauna. Large resources, monotrophic.

Polyrepos are a thousand species living and dying, some thriving, some never to be known, most completely in the dark.

shawabawa3

Every time I've seen monorepos compared with polyrepos it's always "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"

Why can't we add millions of dollars of tool engineering on top of polyrepos to get some of the benefits of monorepos without a lot of the pain? E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure

And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository

dezgeg

> Every time I've seen monorepos compared with polyrepos it's always "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"

The costs of the infra/build/CI work are of course more visible when there is a dedicated team doing it. If there is no such central team, the cost is just invisibly split between all the teams. In my experience this is more costly overall, due to every team rolling their own thing and requiring them to be jack-of-all-trades in rolling their own infra/build/CI.

> And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository

If repository permissions aren't set centrally but every team gets to micromanage them, then they usually end up too restrictive and you don't get even read-only access.

phinnaeus

Great call out. Amazon has an extremely effective polyrepo setup and it’s a shame there’s no open source analog. Probably because it requires infrastructure outside of the repo software itself. I’ve been toying around with building it myself but it’s a massive project and I don’t have much free time.

bluGill

My company has millions of dollars in tooling for our polyrepo. It would not be hard to throw several more million into the problem.

If you have a large project there is no getting around the issues you will have. Just a set of pros and cons.

There are better tools for polyrepo that you can start with, but there are a lot of things we have that I wish I could get upstreamed (there is good reason the open source world would not accept our patches even if I cleaned them up).

CamouflagedKiwi

I think one reason is that there are various big companies (Google, Microsoft, Meta) who have talked about the tech they've deployed to make monorepos work, but I at least have never seen an equivalent big successful company describe their polyrepo setup, how they solved the pain points and what the tech around it looks like.

codethief

> Why can't we add millions of dollars of tool engineering on top of polyrepos

I don't think the "stock polyrepo" characterization is apt. Organizations using polyrepos already do invest that kind of money. Unfortunately, this effort is not visible because it's spread out across repos and every team does their own thing. So then people erroneously conclude that monorepos are much more expensive. Like the GP said:

> Polyrepos are a thousand species living and dying, some thriving, some never to be known, most completely in the dark.

maccard

> Every time I've seen monorepos compared with polyrepos it's always "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"

Not quite - it's "vs stock polyrepo with millions of dollars of engineering effort in manually doing what the monorepo tooling does".

friendzis

Exactly. Take your monorepo, split it into n repos by directory at a certain depth from the root, write a very rudimentary VCS wrapper script to sync all the repos in tandem, and you have already solved a lot of pain points.
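
Such a wrapper really can be rudimentary. A sketch, assuming the split-out repos are sibling checkouts listed in a repos.txt file:

```
#!/usr/bin/env bash
# Usage: ./sync-all.sh <branch> - fetch every repo and put it on the same branch.
branch="$1"
while read -r repo; do
  git -C "$repo" fetch origin
  git -C "$repo" switch -C "$branch" "origin/$branch" 2>/dev/null \
    || git -C "$repo" switch -C "$branch"
done < repos.txt
```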

> E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure

Bitbucket does this out-of-the box :)

threeseed

a) At least with Github Actions it is trivial to support polyrepos. At my company we have thousands of repositories which we can easily handle because we can sync templated CI/CD workflows from a shared repository to any number of downstream ones.

b) When you are browsing through repositories you see a description, tags, technologies used, contributors, number of commits, releases etc. Massive difference in discovery versus a directory.

nssnsjsjsjs

Monorepo monoliths make it hard to experiment. Getting something as benign as a later version of .NET becomes a mammoth task requiring the architecture team, and everything stays old. Want to use a reasonable tool? No chance.

zelphirkalt

I don't see how it immediately follows from monorepo usage that its parts cannot have separate runtimes and dependency versions. Perhaps the monorepo tooling is still that bad, idk, but there seems to be no inherent reason for that.

AlotOfReading

One thing I don't usually see discussed in monorepo vs multi repo discussions is there's an inverse Conway's law that happens: choosing one or the other will affect the structure of your organization and the way it solves problems. Monorepos tend to invite individual heroics among common infrastructure teams, for example. Because there are so many changes going in at once, anything touching a common area has a huge number of potential breakages, so effort to deliver even a single "feature" skyrockets. Doing the same thing in a multi-repo may require coordinating several PRs over a couple of weeks and some internal politics, but that might also be split among different developers who aren't even on a dedicated build team.

Adverblessly

> Because there are so many changes going in at once, anything touching a common area has a huge number of potential breakages, so effort to deliver even a single "feature" skyrockets.

If a specific change in a monorepo is so centrally embedded it requires incredible effort to do atomically (the benefit of having the monorepo in the first place), you are still able to split it into multiple gradual changes (and "require coordinating several PRs over a couple of weeks and some internal politics, but that might also be split among different developers who aren't even on a dedicated build team.").

So in a monorepo you can still enjoy the same advantage you describe for multi repo, and you'll even have much better visibility into the rollout of your gradual change thanks to the monorepo.

CamouflagedKiwi

That's an optimistic take on what happens in the polyrepo setup. A common alternative (I suspect by far the more common one) is that changes are made to the common area but not propagated to downstream repos, which all end up pinned to different versions of the common repo and struggle to update once they get ~years out of date.

oivey

Yeah. My experience is that the teams managing shared repos tend to shift responsibility for integrating their changes onto their users. They then also more often make breaking changes because they’re insulated from the costs of those changes.

williamdclt

the obvious result of that is: the changes are often not integrated for ages, if ever. Which means at some point it becomes a problem and the cost to do the integration has become much higher.

bluGill

We have a person dedicated to bringing in changes to our polyrepo. Nothing is considered done until it is in his mainline branches, so there is an incentive to get things integrated. Nothing goes in until it passes the full test suite, which he verifies you ran before integrating and then runs again to be sure.

As someone who works on core parts that are likely to break everything, I spend half of my time just integrating things and another quarter trying to figure out how to make my things either less core or not need changes so often.

nitwit005

I'd caution that a monorepo isn't a full fix to that. People often make multiple versions of libraries. You have separate 2.X and 3.X versions, with independent source code (or branches), and ask people to migrate to the new one.

There's not really a way around that when you need some behavioral change for the code using the library.

makeitdouble

Is your underlying assumption that the organization doesn't want to go one way or the other in the first place and is nudged by the technical choice afterwards?

I think most of the time the philosophical decision (more shared parts or better separation) is made before deciding how you'll deal with the repos.

Now, if an org changes direction mid-way, the handling of the code can still be adapted without fundamentally switching the repo structure. Many orgs are multi-repo but their engineers have access to almost all of the code, and monorepo teams can still have strong isolation of what they're doing, up to having different CI rules and deployment management.

TeMPOraL

I think GP's claiming it's a feedback loop, not one-directional relationship. Communication structure of an organization ends up reflected in the structure of systems it designs, and at the same time, the structure of a system influences the communication structure of the organization building it.

This makes sense if you consider that:

1) Changes to system structure, especially changes to fundamentals when the system is already being built, are difficult, expensive and time consuming. This gives system designs inertia that grows over time.

2) Growing the teams working on a system means creating new organizational units; the more inertia system has, the more sense it makes for growth to happen along the lines suggested by system architecture, rather than forcing the system to change to accommodate some team organization ideals.

Monorepo/multirepo is a choice that's very difficult to change once work on building the system starts, and it's a choice you commit at the very beginning (and way before the choice starts to matter) - a perfect recipe for not a mere nudge, but a scaffolding the organization itself will grow around, without even realizing it.

tveita

Having done this a few times, I suspect the norm is that the decision is taken without a full understanding of the tradeoffs, both because the decision is taken before engineering has matured and can be hard to change later, and because the disadvantages are easy to downplay - "Sure, but our team will always stay on top of dependencies."

Typically someone has read a few blog posts like the ones linked to, and has some vague ideas of the positives but doesn't have a full understanding of how the disadvantages will shape their workflow.

I've seen people with experience at hobby or small scale successfully campaigning for a switch at work and then hitting a wall - in both directions. Updating every call site for a breaking change doesn't sound that onerous, and at a small scale it isn't. Having each team update versioned dependencies doesn't sound that hard, and at a small scale it isn't.

Just like with languages, don't listen to anyone who tells you this will solve all your problems. One of the options is merely the least bad for your situation.

wocram

Much more common is that an easy cross-project change in a monorepo simply isn't done in a polyrepo because of how much more onerous it is.

Flux159

This definitely tracks with my experience in big tech - managing large scale build systems ends up taking a team that works on the build system itself. The underlying repo technology itself needs to work at scale & that was with a virtual file system that downloaded source files on demand when you needed to access them.

One thing that this article didn't mention is that most development was done either on your development server running in a datacenter (think ~50-100 cores) - or on an "on demand" machine that was like a short-lived container that generally stayed up to date with known good commits every few hours. The IDE was integrated with devservers / machines, and language servers and other services were generally prewarmed or automatically set up via chef/ansible, etc. Rarely would you want to run the larger monorepos on your laptop client (the exception would generally be mobile apps, Mac OS apps, etc.).

swgillespie

Yeah - I worked on that build team probably at the same place you did!

I think for a lot of users it's more important that the monorepo devenv be reproducible than be specifically local or specifically remote. It's certainly easier to pull this off when it's a remote devserver that gets regularly imaged.

codethief

> Yeah - I worked on that build team probably at the same place you did!

I did not work at that place but the story sounds very familiar – I believe there might have been a blog post about that remote development environment here on HN some time ago?

zer00eyz

> One thing that this article didn't mention is that most development was done either on your development server running in a datacenter (think ~50-100 cores)

I have done this for many small teams as well.

It remains pretty hard to get engineers to stop "thinking locally" when doing development. And with what modern hardware looks like (in terms of cost and density) it makes a lot of sense to find a rack somewhere for your dev team... It's easy enough to build a few boxes that can run dev, staging, test and whatever other on-demand tooling you need, with room to grow.

When you're close to your infrastructure and it looks that much like production, when you have to share the same playground the code inside a monorepo starts to look very different.

> managing large scale build systems ends up taking a team that works on the build system itself

This is what stops a lot of small teams from moving to monorepo. The thing is, your 10-20 person shop is never going to be google or fb or ms. They will never have large build system problems. Maintaining all of it MIGHT be someone's part time job IF you have a 20 person team and a very complex product. Even that would be pushing it.

rwieruch

Over the past four years, I’ve set up three monorepos for different companies as contract work. The experience was positive, but it’s essential to know your tools.

Since our monorepos were used exclusively for frontend applications, we could rely entirely on the JavaScript/TypeScript ecosystem, which kept things manageable.

What I learned is that a good monorepo often behaves like a “polyrepo in disguise.” Each project within it can be developed, hosted, and even deployed independently, yet they all coexist in the same codebase. The key benefit: all projects can share code (like UI components) to ensure a consistent look and feel across the entire product suite.
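
A typical shape of such a setup, with made-up names (each app keeps its own build and deploy pipeline, while the shared packages are plain imports):

```
apps/
  dashboard/        # developed, hosted, and deployed on its own
  marketing-site/   # likewise
packages/
  ui/               # shared components every app imports
  config/           # shared lint/tsconfig presets
```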

If you're looking for a more practical guide, check out [0].

[0] https://www.robinwieruch.de/javascript-monorepos/

wocram

This isn't a polyrepo in disguise. This is a monorepo done correctly.

bittermandel

I firmly believe that us at Molnett (a serverless cloud) going for a strict monorepo built with Bazel has been paramount to us being able to build the platform with a small team of ~1.5 full-time engineers.

We can start the entire platform, Kubernetes operators and all, locally on our laptops using Tilt + Bazel + Kind. This works on both Mac and Linux. This means we can validate essentially all functionality, even our Bottlerocket-based OS with Firecracker, locally without requiring a personal development cluster or such.

We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.

It's been a HUGE blessing. It has taken some effort, will take continuous effort and to be fair it has been crucial to have an ex Google SRE on the team. I would never want to work in another way in the future.

EDIT: To clarify, our repo is essentially only Golang, Bash and Rust.

eadmund

> a small team of ~1.5 full-time engineers

Yes, with one and a half FTEs you should only have a single repo.

My experience with Bazel has been extremely bad, but I don’t think that it should necessarily be avoided completely. It may actually have some value on extremely large multi-team projects. But for less than two FTEs it seems like massive overkill.

I believe that you could do what you need with Kind (and maybe Tilt?), without Bazel.

> We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.

Go kind of does that for you already, with go.mod. Since kubectl is a Go program, you could achieve that goal the same way.
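
Concretely, recent Go versions can pin and run tools straight from go.mod (a sketch; whether kubectl in particular is pleasant to consume as a Go module is a separate question):

```
# Go 1.24+: record the tool and its version in go.mod, then run the pinned copy.
go get -tool golang.org/x/tools/cmd/stringer
go tool stringer -type=Color
```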

> it has been crucial to have an ex Google SRE on the team

I wonder how many additional team members y’all could afford in return for an ex-Googler’s salary expectations.

I sincerely hope that y’all find the maintenance expense of Bazel to be worth it going forward. Hopefully you will!

bittermandel

I don't think you are wrong at all. We are all founders with an OK salary and this is our area of expertise, so we're able to take advantage of our previous experiences and reap the benefits. We're probably uniquely positioned here.

I had massive issues at my previous employer with Bazel. They did not try to make Bazel work for non-SREs, which as you can imagine didn't work very well. So it's definitely not a silver bullet!

We should probably write a blog post about our setup!

wocram

Bazel is notoriously difficult to set up to begin with, but not so difficult to use after that. What was your bad experience?

munksbeer

> My experience with Bazel has been extremely bad

Would you mind elaborating and providing some examples of what was bad?

We have a monorepo built using bazel, and at first when new to bazel, I was pretty frustrated. But now I can't think of any issue I've had with it recently.

But we do have a relatively simple setup.

lbhdc

I am in a pretty similar situation as you, and have really been feeling the benefits of going all in on bazel.

> We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.

Currently I have to run `bazel run <tool>`. Your solution sounds way better. How does yours work?

peterldowns

Not the OP but you can use tools like direnv + mise/asdf/nix so that every time a developer cd's into the monorepo, their shell environment loads a pinned, declaratively-configured set of dependencies and tools whose definitions are part of the monorepo.

The way I'd naively set up something like OP described would be to have direnv + nix flake deliver you a copy of bazelisk, and then have some custom shell scripts added to $PATH that alias `go = bazel run go`, `kubectl = bazel run kubectl` or whatever custom wrappers you want.

(Handwaving and I know the above isn't quite correct)

codethief

Came here to post this. I've used asdf/mise for years (and recently also Nix + direnv) and it works tremendously well.

mikn

Hi! Previously mentioned ex-Google SRE! There are a few layers to it - to make it work "ok" you need to first have a tool runner wrapper rule that does something similar to:

```
ctx.actions.write(output="""
tool_path=$(realpath {tool_short_path})
cd ${{BUILD_WORKING_DIRECTORY}}
exec $tool_path
""".format(tool_short_path=tool.short_path))
```

The purpose of this rule is to ensure that the tool's CWD is actually where you are inside the repository and not within the runfiles folder that Bazel prepared for you.

The second step is to set up a symlink target, similar to this:

```
#! /usr/bin/env bash
tool_name=$(basename $0)
exec -a "$tool_name" bazel run --ui_event_filters=-info,-stdout,-stderr --noshow_progress //tools/bin:$tool_name -- "$@"
```

We need to filter out all UI events since some tools we intercept (such as jq) expect stdout to be clean of other output when used programmatically.

We then create a symlink for each tool name (say kubectl) to this script from another folder, and then we use `direnv` to inject the folder of symlinks into the user's paths with an `.envrc` file in the repository root like this:

```
PATH=$PWD/tools/path:$PATH
```

We have had this in place for quite a while now - it does seem like this pattern has caught some more wind and buildbuddy.io has released a ruleset: https://github.com/buildbuddy-io/bazel_env.bzl paired with https://github.com/theoremlp/rules_multitool achieves the same thing that we have built internally, the main difference being that with the bazel run wrapper we have made, you always run the latest version, whereas with the bazel_env pattern you need to manually rerun their target to get the latest binaries. :)

peterldowns

Any chance you'll be releasing your rules? I'd love to see how you do it.

lbhdc

Thanks! I am gonna give this a try.

eddd-ddde

Probably something similar to a py venv that you run to activate some aliases in your environment?

teitoklien

we run everything under systemd managed services instead of k8s and deploy via ansible playbooks at our company, and we similarly use tmuxinator to spin up all the backend api, search engine, databases like qdrant, meilisearch, etc and frontend services all in dev mode with all the terminals auto opening in window panes inside a tmux shell.

It really makes development in dev mode super simple and easy, and running all of the services in the local dev environment is as simple as running one command, ‘tmuxinator’, at the root of our monorepo, and boom, everything is up.

Monorepos truly outcompete individual repos for almost all projects; it's far more pleasurable ever since I changed to this method of development.

mattmanser

The question here is why you are using the microservice pattern and k8s with 2 devs. That pattern is not designed for that small-scale an operation and adds tons of completely unnecessary complexity.

And does it really matter what you go with when you've got 1.5 engineers?

It's a non-problem at that scale as both engineers are intimately aware of how the entire build process works and can keep it in their head.

At that scale I've done no repo at all, repo stored on Dropbox, repo in VCS, SVN, whatever, and it all still worked fine.

It really hasn't added anything at all to your success.

BTW, it's still common for developers to start entire repos on their own laptops with zero hassles in tons of dev shops that haven't been silly and used k8s with 2 developers.

In fact, at the start of my career I worked with 10 or so developers on the shitty old MS one where you had to lock files so no one else could use them. You'd check out files to allow you to change them (very different to git checkout), otherwise they'd be read-only on your drive.

And the build was a massive VB script we had to run manually with params.

And it still worked.

We got some moaning when we moved to SVN too at how much better the old system was. Which was ridiculous as you used to have to run around and ask people to unlock key files to finish a ticket, which was made worse as we had developer consultants who'd be out of office for days on end.

So then you'd have to go hassle the greybeard who had admin rights to unlock the file for you (although he wasn't actually that old and didn't have a beard).

bittermandel

I think this take is quite shallow and lacks insight into how one would actually build a somewhat complex technical platform.

We are not using a microservice pattern at all. I am not sure where you get that from. If anything we have several "macro services".

Our final setup is quite complex as we are building a literal cloud provider, but in practice we have a Go API, a Docker registry, a Temporal Worker and a Kubernetes controller. What's complicated is everything else around it. We run our platform on bare-metal and thus have auxiliary services like a full-blown Kubernetes cluster, Ory Hydra + Kratos, SpiceDB, Cilium, Temporal Cluster + Workers and some other small things. We need to be able to test this locally to feel safe to release to production. And in turn our production environment is almost identical to our local environments.

None of that would be possible unless we've done something similar to what we have built today. Most companies cannot run their entire stack on their laptop, more unlikely that they could run a full cloud provider.

bee_rider

Keeping code in Dropbox kinda sucks even with 1 or .5 developers though. That said, yeah, a regular old git or (I assume, never used it) svn seems fine.

eadmund

> Keeping code in Dropbox kinda sucks even with 1 or .5 developers though. That said, yeah, a regular old git or (I assume, never used it) svn seems fine.

What you do is store the git repo in Dropbox, and developers just use it as a remote. With backups, this could actually go a reasonably long time, although I personally wouldn’t suggest it.

yc-kraln

The answer, of course, is "it depends".

We have something like ~40 repos in our private GitLab instance, and each one has its own CI system, which compiles, runs tests, builds packages for distribution, etc. Then there's a CI task which integrates a file system image from those ~40 repos' packages, runs integration tasks, etc.

Many of those components communicate with each other with a flatbuffers-defined message, which of course itself is a submodule. Luckily, flatbuffers allows for progressive enhancement, but I digress--essentially, these components have some sort of inter-dependency which at the absolute latest surfaces at the integration phase.

Is this actually a multi-repo, or is it just a mono-repo with lots of sub-modules? Would we have benefits if we moved to a mono-repo (the current round-trip CI time for full integration is ~35 minutes, many of the components compile and test in under 10s)? Maybe.

Everything is a tradeoff. Anything can work, it's about what kinds of frustrations you're willing to put up with.

lihaoyi

I wrote a bit about monorepo tooling in this blog post. It covers many of the same points as the OP, but in a lot more detail.

- https://mill-build.org/blog/2-monorepo-build-tool.html

People like to rave about monorepos, and they are great if set up correctly, but there are a lot of intricacies that often go on behind the scenes to make a monorepo successful, which are easy to overlook since usually some "other" team (devops team, devtools team, etc.) is shouldering all that burden. Still worth it, but it must be approached with caution.

spankalee

I love monorepos, but in large organizations they have a counter-intuitive incentive for teams to _not_ allow other teams to depend on them, which can _reduce_ code reuse - the opposite of what some adopters want.

The issue is that users of a library can put almost infinite friction on the library. If the library team wants to make a change, they have to update all the use sites, but Hyrum's Law will get you because users will do the damndest things.

So for the top organization, it's good if many other teams can utilize a great team's battle-tested library, but for the library team it's just liability (unless making common code is their job). In a place like Google you either end up with internal copies and forks, strict access control lists, or libraries that are slow as molasses to change.

eddd-ddde

Well when making a library that's intended to be shared, you REALLY need to stop for a second and think about the API. Ideally APIs don't change, and when they do, you better have planned for large scale changes, or just use a new function and mark the old one deprecated.

I don't think there's anything wrong with copy pasting some useful piece of code too, not everything has to be a library you depend on, for small enough things.

kccqzy

At least the benefit of a monorepo is that you can find all the use sites in the first place and correct these wrong uses. You can even correct them atomically if you so wish.

wocram

All software with dependencies needs to respect its dependents. A monorepo doesn't really change anything about the relationship between a library and its users, except that the library or the users are somewhat more empowered to change each other.

ec109685

I would still say code is more likely to be reused in the monorepo versus trying to take an external dependency in the poly repo case. Just the ease of making a change to target your case is so much higher.

gorgoiler

An unspoken truth of a monorepo is that everyone is committed to developing on trunk, and trunk is never allowed to be broken. The consequence of this is that execution must be configurable at runtime: feature flags and configuration options with old and new code alongside each other.

You can have a monorepo and still fail if every team works on their own branch and then attempts to integrate into trunk the week before your quarterly release process begins.

You can fail if a core team builds a brand new version of the product on master with all new tests such that everything is green on every commit, but your code is unreleasable because customers aren’t ready for v2 and you need to keep that v1 compatibility around.

946789987649

I didn't know places still had quarterly releases. That seems like the thing to resolve, rather than the monorepo question.

surajrmal

Android is only recently switching to quarterly releases instead of yearly. Most popular Linux distros only have major releases every 6 months. While Chrome cuts a release branch every 4 weeks, it soaks it in a beta channel for another 4. Same goes for the Rust compiler toolchain, albeit on a 6-week cadence.

gorgoiler

It’s more common than you think if you expand your view of release a bit. On the one hand you very much still have shrink-wrap software (for example, all firmware) that ships on a very slow cadence.

On the other hand even the big tech companies will only expose code paths very slowly and very conservatively. Meta’s Threads.app for example combined both a constant churn of innovation on master with a very measured gating of new features shipping to the public.

The best teams do indeed, as you say, ship and test finished builds on a weekly or daily basis even if the stuff that gets under the customers’ / users’ / clients’ noses appears on a far less regular basis. After all, any kind of severe bug could necessitate a release at any moment.

bluGill

Not all the world is a website or even internet-connected. Not all the world is free of safety concerns.

If you work in medical or aviation areas, every release legally needs extensive - months of - testing before you can release. If there are issues found in that testing you start over. Not all tests can be automated.

I work in agriculture. For the entire month of July there will be nobody in the world using a planter or any of the software on it, so there is no point in a release then. The lack of users means we cannot use automated rollback if the change somehow fails for customers - we could, but it would be months of changes rolled back when Brazil starts planting season.

vegetablepotpie

Every company that uses SAFe agile has quarterly, or bi-quarterly, releases [1].

[1] https://www.servicenow.com/docs/bundle/yokohama-it-business-...

jonthepirate

I'm on the build team at DoorDash. We're in year 1 of our Bazel monorepo journey. We are heavy into Go, already have remote execution and caching working, and are looking to add support for Python & C++ soon.

If this sort of stuff happens to be something you might want to work on, our team has multiple openings... if you search for "bazel" on our careers page, you'll find them.

ecoffey

Monorepos are one of the few things I’ve drunk the koolaid on. I joke that the only thing worse than being in a monorepo is not being in one.

codethief

Thanks, I'll steal that one! :-)