What would a Kubernetes 2.0 look like
210 comments
June 19, 2025
fideloper
zzyzxd
But you are not talking about maintaining Kubernetes, you are talking about maintaining a CI/CD system, a secret management system, some automation to operate databases, and so on.
Instead of editing some YAML files, in the "old" days these software vendors would've asked you to maintain a cronjob, Ansible playbooks, a systemd unit, bash scripts...
aljgz
"Low Maintenance" is relative to alternatives. In my experience, any time I was dealing with K8s I needed much lower maintenance to get the same quality of service (everything from [auto]scaling, to faileover, deployment, rollback, disaster recovery, DevOps, ease of spinning up a completely independent cluster) compared to not using it. YMMV.
SOLAR_FIELDS
It's yet another argument, like many arguments against Kubernetes, that essentially boils down to "Kubernetes is complicated!"
No, deployment of a distributed system itself is complicated, regardless of the platform you deploy it on. Kubernetes is only "complicated" because it can do all of the things you need to do to deploy software, in a standard way. You can simplify by not using Kube, but then you have to hand roll all of the capabilities that Kube just gives you for free. If you don't need a good hunk of those capabilities, you probably don't need Kube.
nostrebored
When choosing distributed systems platforms to work with, k8s vs. rolling your own orchestration isn’t the decision anyone is making. It’s k8s vs cloud vendors that want your money in exchange for the headaches.
danpalmer
Exactly, there are a lot of comparisons that aren't apples to apples. If you're comparing Kubernetes to a fixed-size pool of resources running a fixed set of applications, each with their own fixed resources, who cares? That's not how most deployments work today.
One could make the argument that deployments today that necessitate K8s are too complex; I think there's a more convincing argument there. But my previous company was extremely risk averse in architecture (no resumé-driven development) and eventually moved on to K8s, and systems at my current company often end up being way simpler than anyone would expect, yet at scale the coordination without a K8s equivalent would be too much work.
turtlebits
Sounds self inflicted. Stop installing so much shit. Everything you add is just tech debt and has a cost associated, even if the product is free.
If autoscaling doesn't save more $$ than the tech debt/maintenance burden, turn it off.
ozim
I agree with your take.
But I think a lot of people are in a state where they need to run stuff the way it is, because "just turn it off" won't work.
Like a system that, after years on k8s, is coupled to its quirks, and people who don't know how to set up and run stuff without k8s.
pnathan
vis-a-vis running a roughly equivalent set of services cobbled together, it's wildly low maintenance, to the point of fire and forget.
you do have to know what you're doing and not fall prey to the "install the cool widget" trap.
ljm
I’ve been running k3s on hetzner for over 2 years now with 100% uptime.
In fact, it was so low maintenance that I lost my SSH key for the master node and I had to reprovision the entire cluster. Took about 90 mins including the time spent updating my docs. If it was critical I could have got that down to 15 mins tops.
20€/mo for a k8s cluster using k3s, exclusively on ARM, 3 nodes 1 master, some storage, and a load balancer with automatic dns on cloudflare.
Bombthecat
Yeah, as soon as you've got your helm charts and node installers.
Installing is super fast.
We don't back up the cluster, for example, for that reason (except databases etc.); we just reprovision the whole cluster.
verst
How often do you perform version upgrades? Patching of the operating system of the nodes or the control plane, etc.? Things quickly get complex if application uptime/availability is critical.
NathanFlurry
The #1 problem with Kubernetes is it's not something that "Just Works." There's a very small subset of engineers who can stand up services on Kubernetes without having it fall over in production – not to mention actually running & maintaining a Kubernetes cluster on your own VMs.
In response, there's been a wave of "serverless" startups because the idea of running anything yourself has become understood as (a) a time sink, (b) incredibly error prone, and (c) very likely to fail in production.
I think a Kubernetes 2.0 should consider what it would look like to have a deployment platform that engineers can easily adopt and feel confident running themselves – while still maintaining itself as a small-ish core orchestrator with strong primitives.
I've been spending a lot of time building Rivet to scratch my own itch: an orchestrator & deployment platform that I can self-host and scale trivially: https://github.com/rivet-gg/rivet
We currently advertise as the "open-source serverless platform," but I often think of the problem as "what does Kubernetes 2.0 look like." People are already adopting it to push the limits into things that Kubernetes would traditionally be good at. We've found the biggest strong point is that you're able to build roughly the equivalent of a Kubernetes controller trivially. This unlocks features like more complex workload orchestration (game servers, per-tenant deploys), multitenancy (vibe coding per-tenant backends, LLM code interpreters), metered billing per tenant, more powerful operators, etc.
stuff4ben
I really dislike this take and I see it all the time. Also I'm old and I'm jaded, so it is what it is...
Someone decides X technology is too heavy-weight and wants to just run things simply on their laptop because "I don't need all that cruft". They spend time and resources inventing technology Y to suit their needs. Technology Y gets popular and people add to it so it can scale, because no one runs shit in production off their laptops. Someone else comes along and says, "damn, technology Y is too heavyweight, I don't need all this cruft..."
"There are neither beginnings nor endings to the Wheel of Time. But it was a beginning.”
adrianmsmith
It’s also possible for things to just be too complex.
Just because something’s complex doesn’t necessarily mean it has to be that complex.
mdaniel
IMHO, the rest of that sentence is "be too complex for some metric within some audience"
I can assure you that trying to reproduce kubernetes with a shitload of shell scripts, autoscaling groups, cloudwatch metrics, and hopes-and-prayers is too complex for my metric within the audience of people who know Kubernetes
supportengineer
Because of promo-driven, resume-driven culture, engineers are constantly creating complexity. No one EVER got a promotion for creating LESS.
wongarsu
Or too generic. A lot of the complexity is from trying to support all use cases. For each new feature there is a clear case of "we have X happy users, and Y people who would start using it if we just added Z". But repeat that often enough and the whole thing becomes so complex and abstract that you lose those happy users.
The tools I've most enjoyed (including deployment tools) are those with a clear target group and vision, along with leadership that rejects anything that falls too far outside of it. Yes, it usually doesn't have all the features I want, but it also doesn't have a myriad of features I don't need
NathanFlurry
I hope this isn't the case here with Rivet. I genuinely believe that Kubernetes does a good job for what's on the tin (i.e. container orchestration at scale), but there's an evolution that needs to happen.
If you'll entertain my argument for a second:
The job of someone designing systems like this is to decide what are the correct primitives and invest in building a simple + flexible platform around those.
The original cloud primitives were VMs, block devices, LBs, and VPCs.
Kubernetes became popular because it standardized primitives (pods, PVCs, services, RBAC) that containerized applications needed.
Rivet's taking a different approach, investing in three different primitives based on how most organizations deploy their applications today:
- Stateless Functions (a la Fluid Compute)
- Stateful Workers (a la Cloudflare Durable Objects)
- Containers (a la Fly.io)
I fully expect to raise a few hackles claiming these are the "new primitives" for modern applications, but our experience shows it's solving real problems for real applications today.
Edit: Clarified "original cloud primitives"
RattlesnakeJake
See also: JavaScript frameworks
themgt
The problem Kubernetes solves is "how do I deploy this" ... so I go to Rivet (which does look cool) docs, and the options are:
* single container
* docker compose
* manual deployment (with docker run commands)
But erm, realistically how is this a viable way to deploy a "serverless infrastructure platform" at any real scale?
My gut response would be ... how can I deploy Rivet on Kubernetes, either in containers or something like kube-virt to run this serverless platform across a bunch of physical/virtual machines? How is docker compose a better more reliable/scalable alternative to Kubernetes? So alternately then you sell a cloud service, but ... that's not a Kubernetes 2.0. If I was going to self-host Rivet I'd convert your docs so I could run it on Kubernetes.
NathanFlurry
Our self-hosting docs are very rough right now – I'm fully aware of the irony given my comment. It's on our roadmap to get them up to snuff within the next few weeks.
If you're curious on the details, we've put a lot of work to make sure that there's as few moving parts as possible:
We have our own cloud VM-level autoscaler that's integrated with the core Rivet platform – no k8s or other orchestrators in between. You can see the meat of it here: https://github.com/rivet-gg/rivet/blob/335088d0e7b38be5d029d...
For example, Rivet has an API to dynamically spin up a cluster on demand: https://github.com/rivet-gg/rivet/blob/335088d0e7b38be5d029d...
Once you start the Rivet "seed" process with your API key, everything from there is automatic.
Therefore, self-hosted deployments usually look like one of:
- Plugging your cloud API token into Rivet for autoscaling (recommended)
- Fixed # of servers (hobbyist deployments that were manually set up, simple Terraform deployments, or bare metal)
- Running within Kubernetes (usually because it depends on existing services)
hosh
It's been my experience that nothing in infra and ops will ever "just work". Even something like Heroku will run into scaling issues, and questions of how much you are willing to pay for it.
If people's concern is that they want a deployment platform that can be easily adopted and used, it's better to understand Kubernetes as the primitives on top of which the PaaS people actually want can be built.
Having said all that, Rivet looks interesting. I recognize some of the ideas from the BEAM ecosystem. Some of the appeal to me has less to do with deploying at scale, and more to do with resiliency and local-first.
akdor1154
I think the YAML/HCL point and the package system point overlap.
I wouldn't so much go HCL as something like Jsonnet, Pkl, Dhall, or even (inspiration, not recommendation) Nix – we need something that allows a schema for powering an LSP, with enough expressivity to avoid the need for Helm's templating monstrosity, and ideally with the ability for users to override things that library/package authors haven't provided explicit hooks for.
Does that exist yet? Probably not, but the above languages are starting to approach it.
pas
k8s could provide a strongly-typed DSL, like cdk8s+
it's okay to be declarative (the foundation layer), but people usually think in terms of operations (deploy this, install that, upgrade this, assign some vhost, filter that, etc.)
and even for the declarative stuff, a builder pattern works well; it can be super convenient, sufficiently terse, and typed, and easily composed (and assembled via normal languages, not templates)
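a rough Go sketch of what such a builder could feel like (the API below is hypothetical, loosely in the spirit of cdk8s+, not any real client library):

package main

import (
    "encoding/json"
    "fmt"
)

// Hypothetical fluent builder; not a real Kubernetes client API, just a
// sketch of "typed, terse, composable" configuration.
type DeploymentBuilder struct {
    name     string
    image    string
    replicas int
}

func NewDeployment(name string) *DeploymentBuilder {
    return &DeploymentBuilder{name: name, replicas: 1}
}

func (b *DeploymentBuilder) Image(img string) *DeploymentBuilder { b.image = img; return b }
func (b *DeploymentBuilder) Replicas(n int) *DeploymentBuilder   { b.replicas = n; return b }

// Manifest renders the builder into the plain declarative object the API expects.
func (b *DeploymentBuilder) Manifest() map[string]any {
    return map[string]any{
        "apiVersion": "apps/v1",
        "kind":       "Deployment",
        "metadata":   map[string]string{"name": b.name},
        "spec": map[string]any{
            "replicas": b.replicas,
            "template": map[string]any{
                "spec": map[string]any{
                    "containers": []map[string]string{{"name": b.name, "image": b.image}},
                },
            },
        },
    }
}

func main() {
    m := NewDeployment("web").Image("nginx:1.27").Replicas(3).Manifest()
    out, _ := json.MarshalIndent(m, "", "  ")
    fmt.Println(string(out)) // declarative output, built with a typed, composable API
}

the point being that the structure stays typed and composable, and the declarative object only materializes at the very end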
...
well. anyway. maybe by the time k8s2 rolls around Go will have at least normal error handling.
otterley
First, K8S doesn't force anyone to use YAML. It might be idiomatic, but it's certainly not required. `kubectl apply` has supported JSON since the beginning, IIRC. The endpoints themselves speak JSON and grpc. And you can produce JSON or YAML from whatever language you prefer. Jsonnet is quite nice, for example.
Second, I'm curious as to why dependencies are a thing in Helm charts and why dependency ordering is being advocated, as though we're still living in a world of dependency ordering and service-start blocking on Linux or Windows. One of the primary idioms in Kubernetes is looping: if the dependency's not available, your app is supposed to treat that is a recoverable error and try again until the dependency becomes available. Or, crash, in which case, the ReplicaSet controller will restart the app for you.
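A minimal Go sketch of that startup idiom, with a made-up in-cluster address standing in for the real dependency:

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    // Hypothetical in-cluster dependency; the Service address is made up.
    const dep = "postgres.db.svc.cluster.local:5432"
    deadline := time.Now().Add(5 * time.Minute)
    for {
        conn, err := net.DialTimeout("tcp", dep, 3*time.Second)
        if err == nil {
            conn.Close()
            break // dependency reachable, carry on with startup
        }
        if time.Now().After(deadline) {
            // Give up and crash; the ReplicaSet controller restarts the pod and we try again.
            log.Fatalf("dependency %s still unreachable: %v", dep, err)
        }
        time.Sleep(5 * time.Second) // recoverable error: wait and retry
    }
    log.Println("dependency is up, starting the app")
}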
You can't have dependency conflicts in charts if you don't have dependencies (cue "think about it" meme here), and you install each chart separately. Helm does let you install multiple versions of a chart if you must, but woe be unto those who do that in a single namespace.
If an app truly depends on another app, one option is to include the dependency in the same Helm chart! Helm charts have always allowed you to have multiple application and service resources.
cbarrick
Big +1 to dependency failure should be recoverable.
I was part of an outage caused by a fail-closed behavior on a dependency that wasn't actually used and was being turned down.
Dependencies among servers are almost always soft. Just return a 500 if you can't talk to your downstream dependency. Let your load balancer route around unhealthy servers.
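As a toy Go illustration (the downstream URL is made up): a health endpoint that fails soft, so the load balancer simply drains the instance instead of the whole service falling over:

package main

import (
    "log"
    "net/http"
    "time"
)

func main() {
    // Hypothetical downstream health URL, for illustration only.
    const downstream = "http://billing.internal.svc.cluster.local/healthz"
    client := &http.Client{Timeout: 2 * time.Second}

    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        resp, err := client.Get(downstream)
        if err != nil {
            // Soft dependency: report unhealthy so the LB routes around this instance.
            http.Error(w, "downstream unavailable", http.StatusServiceUnavailable)
            return
        }
        resp.Body.Close()
        w.WriteHeader(http.StatusOK)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}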
Arrowmaster
You say supposed to. That's great when building your own software stack in house, but how much software is available that can run on Kubernetes yet was created before it existed? Somebody figured out it could run in Docker, and then later someone realized it's not that hard to make it run in Kubernetes because it already runs in Docker.
You can make an opinionated platform that does things how you think is the best way to do them, and people will do it how they want anyway with bad results. Or you can add the features to make it work multiple ways and let people choose how to use it.
delusional
> One of the primary idioms in Kubernetes is looping
Indeed, working with kubernetes I would argue that the primary architectural feature of kubernetes is the "reconciliation loop". Observe the current state, diff a desired state, apply the diff. Over and over again. There is no "fail" or "success" state, only what we can observe and what we wish to observe. Any difference between the two is iterated away.
I think it's interesting that the dominant "good enough technology" of mechanical control, the PID feedback loop, is quite analogous to this core component of kubernetes.
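A toy version of that loop in Go, with stub observe/desired/apply functions standing in for the real controller machinery:

package main

import (
    "fmt"
    "time"
)

// A toy "spec vs. world": just a replica count.
type state struct{ replicas int }

var world = state{replicas: 1} // stand-in for the actual cluster

func observe() state { return world }                                        // current state
func desired() state { return state{replicas: 3} }                           // declared state
func apply(s state)  { world = s; fmt.Println("reconciled to", s.replicas) } // close the gap

func main() {
    // No "success" or "fail" end state: observe, diff, apply, forever.
    for {
        if cur, want := observe(), desired(); cur != want {
            apply(want)
        }
        time.Sleep(2 * time.Second)
    }
}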
p_l
PID feedback loop, OODA loop, and blackboard systems (AI design model) are all valid metaphors that k8s embodies, with first two being well known enough that they were common in presentations/talks about K8s around 1.0
tguvot
i developed a system like this (with a reconciliation loop, as you call it) some years ago. there is most definitely a failed state (for multiple reasons), but as part of the "loop" you can have logic to fix it up in order to bring it to the desired state.
we had integrated monitoring/log analysis to correlate failures with "things that happen"
pm90
Hard disagree with replacing yaml with HCL. Developers find HCL very confusing. It can be hard to read. Does it support imports now? Errors can be confusing to debug.
Why not use protobuf, or similar interface definition languages? Then let users specify the config in whatever language they are comfortable with.
vanillax
Agree, HCL is terrible. K8s YAML is fine. I have yet to hit a use case that can't be solved with its types. If you are doing too much, perhaps a config map is the wrong choice.
ofrzeta
It's just easier to shoot yourself in the foot with no proper type support (or enforcement) in YAML. I've seen Kubernetes updates fail when the version field was set to 1.30 and it got interpreted as the float 1.3. Sure, someone made a mistake, but the config language should/could stop you from making it.
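A small Go snippet (using gopkg.in/yaml.v3) that shows the trap; the unquoted 1.30 comes back as the float 1.3:

package main

import (
    "fmt"

    "gopkg.in/yaml.v3"
)

func main() {
    var cfg map[string]interface{}
    // Unquoted 1.30 is a YAML float, so the trailing zero is gone by the time you read it.
    if err := yaml.Unmarshal([]byte("version: 1.30"), &cfg); err != nil {
        panic(err)
    }
    fmt.Printf("%v (%T)\n", cfg["version"], cfg["version"]) // prints: 1.3 (float64)
}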
XorNot
Yeah, but that's on the application schema. If it's a version string, why are you accepting floats for it?
geoctl
You can very easily build and serialize/deserialize HCL, JSON, YAML or whatever you can come up with outside Kubernetes from the client-side itself (e.g. kubectl). This has actually nothing to do with Kubernetes itself at all.
Kwpolska
There aren’t that many HCL serialization/deserialization tools. Especially if you aren’t using Go.
dilyevsky
Maybe you know this but Kubernetes interface definitions are already protobufs (except for crds)
cmckn
Sort of. The hand-written go types are the source of truth and the proto definitions are generated from there, solely for the purpose of generating protobuf serializers for the hand-written go types. The proto definition is used more as an intermediate representation than an “API spec”. Still useful, but the ecosystem remains centered on the go types and their associated machinery.
dilyevsky
Given that i can just take generated.proto and ingest it in my software then marshal any built-in type and apply it via standard k8s api, why would I even need all the boilerplate crap from apimachinery? Perfectly happy with existing rest-y semantics - full grpc would be going too far
acdha
> Developers find HCL very confusing. It can be hard to read. Does it support imports now? Errors can be confusing to debug.
This sounds a lot more like “I resented learning something new” than anything about HCL, or possibly confusing learning HCL simultaneously with something complex as a problem with the config language rather than the problem domain being configured.
aduwah
The issue is that you do not want a dev learning HCL, the same as you don't want your SRE team learning Next and React out of necessity.
The ideal solution would be to have an abstraction that is easy to use and does not require learning a whole new concept (especially one as ugly as HCL). Learning HCL is also just the tip of the iceberg; next comes sinking into the dependencies between components, outputs read from a bunch of workspaces, etc. It is simply wasted time to have devs keeping up with the whole Terraform heap that SREs manage and keep evolving under the hood. The same dev time is better spent creating features.
acdha
Why? If they can’t learn HCL, they’re not going to be a successful developer.
If your argument is instead that they shouldn’t learn infrastructure, then the point is moot because that applies equally to every choice (knowing how to indent YAML doesn’t mean they know what to write). That’s also wrong as an absolute position but for different reasons.
XorNot
HCL can be represented as JSON and thus can be represented as YAML. Why did we need another configuration language, which I'll also simply swap out for a proper programming language at any serious level of usage?
dochne
My main beef with HCL is a hatred for how it implemented for loops.
Absolutely loathsome syntax IMO
mdaniel
It was mentioned pretty deep in another thread, but this is just straight up user hostile
variable "my_list" { default = ["a", "b"] }
resource whatever something {
for_each = var.my_list
}
The given "for_each" argument value is unsuitable: the "for_each" argument must be a map, or set of strings, and you have provided a value of type tuple.
znpy
> Hard disagree with replacing yaml with HCL.
I see some value instead. Lately I've been working on Terraform code to bring up a whole platform in half a day (aws sub-account, eks cluster, a managed nodegroup for karpenter, karpenter deployment, ingress controllers, LGTM stack, public/private dns zones, cert-manager and a lot more) and I did everything in Terraform, including Kubernetes resources.
What I appreciated about creating Kubernetes resources (and helm deployments) in HCL is that it's typed and has a schema, so any ide capable of talking to an LSP (language server protocol - I'm using GNU Emacs with terraform-ls) can provide meaningful auto-completion as well proper syntax checking (I don't need to apply something to see it fail, emacs (via the language server) can already tell me what I'm writing is wrong).
I really don't miss having to switch between my ide and the Kubernetes API reference to make sure I'm filling each field correctly.
NewJazz
I do something similar except with pulumi, and as a result I don't need to learn HCL, and I can rely on the excellent language servers for e.g. Typescript or python.
wredcoll
But... so do yaml and json documents?
dangus
Confusing? Here I am working on the infrastructure side, thinking that I'm working with a baby configuration language for dummies who can't code when I use HCL/Terraform.
The idea that someone who works with JavaScript all day might find HCL confusing seems hard to imagine to me.
To be clear, I am talking about the syntax and data types in HCL, not necessarily the way Terraform processes it, which I admit can be confusing/frustrating. But Kubernetes wouldn’t have those pitfalls.
icedchai
Paradoxically, the simplicity itself can be part of the confusion: the anemic "for loop" syntax, crazy conditional expressions to work around the lack of "if" statements; combine this with "count" and you can get some weird stuff. It becomes a flavor all its own.
mdaniel
orly, what structure does this represent?
outer {
  example {
    foo = "bar"
  }
  example {
    foo = "baz"
  }
}
it reminds me of the insanity of toml [lol]
[[whut]]
foo = "bar"
[[whut]]
foo = "baz"
only at least with toml I can $(python3.13 -c 'import tomllib, sys; print(tomllib.loads(sys.stdin.read()))') to find out, but with hcl too bad
XorNot
Oh thank god I'm not the only one who sees it.
People complain about YAML then go and use TOML.
nunez
I _still_ think Kubernetes is insanely complex, despite all that it does. It seems less complex these days because it's so pervasive, but complex it remains.
I'd like to see more emphasis on UX for v2 for the most common operations, like deploying an app and exposing it, then doing things like changing service accounts or images without having to drop into kubectl edit.
Given that LLMs are it right now, this probably won't happen, but no harm in dreaming, right?
Pet_Ant
Kubernetes itself contains so many layers of abstraction. There are pods, which is the core new idea, and it's great. But now there are Deployments, and ReplicaSets, and namespaces... and it makes me wish we could just use Docker Swarm.
Even Terraform seems to live on just a single-layer and was relatively straight-forward to learn.
Yes, I am in the middle of learning K8s so I know exactly how steep the curve is.
jakewins
The core idea isn’t pods. The core idea is reconciliation loops: you have some desired state - a picture of how you’d like a resource to look or be - and little controller loops that indefinitely compare that to the world, and update the world.
Much of the complexity then comes from the enormous amount of resource types - including all the custom ones. But the basic idea is really pretty small.
I find terraform much more confusing - there’s a spec, and the real world.. and then an opaque blob of something I don’t understand that terraform sticks in S3 or your file system and then.. presumably something similar to a one-shot reconciler that wires that all together each time you plan and apply?
vrosas
Someone saying "This is complex but I think I have the core idea" and someone responding "That's not the core idea at all" is hilarious and sad. BUT ironically what you just laid out about TF is exactly the same: you just manually trigger the loop (via CI/CD) instead of a thing waiting for new configs to be loaded. The state file you're referencing is just a cache of the current state, and TF reconciles the old and new state.
mdaniel
> a one-shot reconciler that wires that all together each time you plan and apply?
You skipped the { while true; do tofu plan; tofu apply; echo "well shit"; patch; done; } part since the providers do fuck-all about actually, no kidding, saying whether the plan could succeed
jonenst
To me the core of k8s is pod scheduling on nodes, networking ingress (e.g. nodeport service), networking between pods (everything addressable directly), and colocated containers inside pods.
Declarative reconciliation is (very) nice but not irreplaceable (and actually not mandatory, e.g. kubectl run xyz)
NathanFlurry
We're still missing a handful of these features, but this is the end goal with what we're building over at Rivet: https://github.com/rivet-gg/rivet
This whole thing started from scratching my own itch of wanting an orchestrator that I can confidently stand up, deploy to, then forget about.
stackskipton
Ops type here, after looking at Rivet, I've started doing The Office "Dear god no, PLEASE NO"
Most people are looking for a container management runtime with an HTTP(S) frontend that will handle automatic certificates from Let's Encrypt.
I don't want Functions/Actors or to require this massive suite:
FoundationDB: Actor state
CockroachDB: OLTP
ClickHouse: Developer-facing monitoring
Valkey: Caching
NATS: Pub/sub
Traefik: Load balancers & tunnels
This is just switching Kubernetes cloud lock in with KEDA and some other more esoteric operators to Rivet Cloud lock in. At least Kubernetes is slightly more portable than this.
Oh yea, I don't know what Clickhouse is doing with monitoring but Prometheus/Grafana suite called, said they would love for you to come home.
mdaniel
I recognize that I'm biased, but you'll want to strongly consider whether https://rivet.gg/docs/config is getting your audience where they can be successful, as compared to (e.g.) https://kubernetes.io/docs/reference/generated/kubernetes-ap...
coderatlarge
where is that in the design space, relative to where Google's internal cluster management has converged after the many years and the tens of thousands of engineers who have sanded it down under heavy fire since the original Borg?
throwaway5752
I've come to think that it is a case of "the distinctions between types of computer programs are a human construct" problem.
I agree with you on a human level. Operators and controllers remind me of COM and CORBA, in a sense. They are highly abstract things that are intrinsically so flexible that they allow judgement (and misjudgement) in design.
For simple implementations, I'd want k8s-lite: more opinionated and less flexible, something which doesn't allow for as much shooting oneself in the foot. For very complex implementations, though, I've felt existing abstractions to be limiting. There is a reason why a single cluster is sometimes the basis for cell boundaries in cellular architectures.
I sometimes wonder if one single system - kubernetes 2.0 or anything else - can encompass the full complexity of the problem space while being tractable to work with by human architects and programmers.
nine_k
> I'd want k8s-lite, that was more opinionated and less flexible
You seem to want something like https://skateco.github.io/ (still compatible to k8s manifests).
Or maybe even something like https://uncloud.run/
Or if you still want real certified Kubernetes, but small, there is https://k3s.io/
mdaniel
Ah, so that explains it: https://github.com/skateco/skate#:~:text=leverages%20podman%...
mountainriver
We have started working on a sort of Kubernetes 2.0 with https://github.com/agentsea/nebulous -- still pre-alpha
Things we are aiming to improve:
* Globally distributed
* Lightweight: can easily run as a single binary on your laptop while still scaling to thousands of nodes in the cloud
* Tailnet as the default network stack
* Bittorrent as the default storage stack
* Multi-tenant from the ground up
* Live migration as a first-class citizen
Most of these needs were born out of building modern machine learning products, and the subsequent GPU scarcity. With ML taking over the world though this may be the norm soon.
hhh
Wow… Cool stuff, the live migration is very interesting. We do autoscaling across clusters across clouds right now based on pricing, but actual live migration is a different beast
Thaxll
This is not Kubernetes; this is a custom-made solution to run GPUs.
nine_k
Since it still can consume Kubernetes manifests, it's of interest for k8s practitioners.
Since k8s manifests are a language, there can be multiple implementations of it, and multiple dialects will necessarily spring up.
mountainriver
Which is the future of everything, and which Kubernetes does a very bad job at.
mdaniel
You stopped typing; what does Kubernetes do a bad job at with relation to scheduling workloads that declare they need at least 1 GPU resource but should be limited to no more than 4 GPU resources on a given Node? https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus...
znpy
> * Globally distributed
Non-requirement?
> * Tailnet as the default network stack
That would probably be the first thing I look to rip out if I ever was to use that.
Kubernetes assuming the underlying host only has a single NIC has been a plague for the industry, setting it back ~20 years and penalizing everyone that's not running on the cloud. Thank god there are multiple CNI implementations.
Only recently with Multus (https://www.redhat.com/en/blog/demystifying-multus) some sense seem to be coming back into that part of the infrastructure.
> * Multi-tenant from the ground up
How would this be any different from kubernetes?
> * Bittorrent as the default storage stack
Might be interesting, unless you also mean seeding public container images. Egress traffic is crazy expensive.
mountainriver
>> * Globally distributed >Non-requirement?
It is a requirement because you can't reliably find GPUs in a single region, and Kubernetes doesn't run across multiple regions.
>> * Tailnet as the default network stack
> That would probably be the first thing I look to rip out if I ever was to use that.
This is fair; we find it very useful because it easily scales across clouds and even bridges them locally. It was the simplest solution we could implement to get those properties, but in no way would we need to be married to it.
>> * Multi-tenant from the ground up
> How would this be any different from kubernetes?
Kubernetes is deeply not multi-tenant; anyone who has tried to build a multi-tenant solution over kube has dealt with this. I've done it at multiple companies now; it's a mess.
>> * Bittorrent as the default storage stack
> Might be interesting, unless you also mean seeding public container images. Egress traffic is crazy expensive.
Yeah, egress cost is a concern here, but it's lazy, so you don't pay for it unless you need it. This seemed like the lightest solution to sync data when you do live migrations cross-cloud. For instance, I need to move my dataset and ML model to another cloud, or just replicate it there.
stackskipton
What is use case for multiple NICs outside bonding for hardware failure?
Every time I’ve had multiple NICs on a server with different IPs, I’ve regretted it.
mdaniel
I'd guess management access, or the old school way of doing vLANs. Kubernetes offers Network Policies to solve the risk of untrusted workloads in the cluster accessing both pods and ports on pods that they shouldn't https://kubernetes.io/docs/concepts/services-networking/netw...
Network Policies are also defense in depth, since another Pod would need to know its sibling Pod's name or IP to reach it directly, the correct boundary for such things is not to expose management toys in the workload's Service, rather create a separate Service that just exposes those management ports
Akin to:
interface Awesome { String getFavoriteColor(); }
interface Management { void setFavoriteColor(String value); }
class MyPod implements Awesome, Management {}
but then only make either Awesome, or Management, available to the consumers of each behavior
znpy
A NIC dedicated to SAN traffic, for example. People who are serious about networked storage don't run their storage network I/O on the same NIC where they serve traffic.
nine_k
> Non-requirement
> the first thing I look to rip out
This only shows how varied the requirements are across the industry. One size does not fit all, hence multiple materially different solutions spring up. This is only good.
znpy
> One size does not fit all, hence multiple materially different solutions spring up.
Sooo… like what kubernetes does today?
mdaniel
heh, I think you didn't read the room given this directory https://github.com/agentsea/nebulous/blob/v0.1.88/deploy/cha...
Also, ohgawd please never ever do this ever ohgawd https://github.com/agentsea/nebulous/blob/v0.1.88/deploy/cha...
mountainriver
Why not? We can run on Kube and extend it to multi-region when needed, or we can run on any VM as a single binary, or just your laptop.
If you mean Helm, yeah I hate it but it is the most common standard. Also not sure what you mean by the secret, that is secure.
mdaniel
Secure from what, friend? It's a credential leak waiting to happen, to say nothing of the need to now manage IAM Users in AWS. That is the 2000s way of authenticating with AWS, and reminds me of people who still use passwords for ssh. Sure, it works fine, until some employee leaves and takes the root password with them
johngossman
Not a very ambitious wishlist for a 2.0 release. Everyone I talk to complains about the complexity of k8s in production, so I think the big question is: could you do a 2.0 with sufficient backward compatibility that it could be adopted incrementally, and make it simpler? Backward compat almost always means complexity increases, as the new system does its new things plus all the old ones.
herval
The question is always what part of that complexity can be eliminated. Every “k8s abstraction” I’ve seen to date either only works for a very small subset of stuff (eg the heroku-like wrappers) or eventually develops a full blown dsl that’s as complex as k8s (and now you have to learn that job-specific dsl)
mdaniel
Relevant: Show HN: Canine – A Heroku alternative built on Kubernetes - https://news.ycombinator.com/item?id=44292103 - June, 2025 (125 comments)
herval
yep, that's the latest of a long lineage of such projects (one of which I worked on myself). Others include Kubero, Dokku, Porter, kr0, etc. There was a moment back in 2019 when every big tech company was trying to roll out their own K8s DSL (I know of Twitter, Airbnb, WeWork, etc.).
For me, the only thing that really changed was LLMs - chatgpt is exceptional at understanding and generating valid k8s configs (much more accurately than it can do coding). It's still complex, but it feels I have a second brain to look at it now
mrweasel
What I would add is "sane defaults", as in unless you pick something different, you get a good enough load balancer/network/persistent storage/whatever.
I'd agree that YAML isn't a good choice, but neither is HCL. Ever tried reading Terraform, yeah, that's bad too. Inherently we need a better way to configure Kubernetes clusters and changing out the language only does so much.
IPv6, YES, absolutely. Everything Docker, container, and Kubernetes should have been IPv6-only internally from the start. Want IPv4? That should be handled by a special-case ingress controller.
zdw
Sane defaults is in conflict with "turning you into a customer of cloud provider managed services".
The longer I look at k8s, the more I see it as "batteries not included" around storage, networking, etc., with the result being that the batteries come with a bill attached from AWS, GCP, etc. K8s is less of an open source project and more a way to encourage dependency on these extremely lucrative gap-filler services from the cloud providers.
JeffMcCune
Except you can easily install Calico, Istio, and Ceph on used hardware in your garage and get an experience nearly identical to every hyperscaler, using entirely free open source software.
zdw
Having worked on on-prem K8s deployments, yes, you can do this. But getting it to production grade is very different than a garage-quality proof of concept.
ChocolateGod
I find it easier to read Terraform/HCL over YAML for the simple fact that it doesn't rely on me trying to process invisible characters.
benced
I found Kubernetes insanely intuitive coming from the frontend world. I was used to writing code that took in data and made the UI react to it; now I write code that the control plane uses to reconcile resources with config.
hosh
While we're speculating:
I disagree that YAML is so bad. I don't particularly like HCL. The tooling I use doesn't care though -- as long as I can still specify things in JSON, I can generate (not template) what I need. It would be more difficult to generate HCL.
I'm not a fan of Helm, but it is the de facto package manager. The main reason I don't like Helm has more to do with its templating system: templated YAML is very limiting compared to using a full-fledged language platform to generate a data structure that can be converted to JSON. There are some interesting things you can do with that. (cdk8s is like this, but it is not a good example of what you can do with a generator.)
On the other hand, if HCL allows us to use modules, scoping, and composition, then maybe it is not so bad after all.
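A minimal sketch of that generate-don't-template idea in Go, with hand-rolled, simplified structs (not the real Kubernetes API types) marshalled to JSON that kubectl can apply:

package main

import (
    "encoding/json"
    "fmt"
)

// Simplified stand-ins for the real Kubernetes types; just enough structure
// to show "build a data structure, then serialize it".
type Metadata struct {
    Name   string            `json:"name"`
    Labels map[string]string `json:"labels,omitempty"`
}

type ConfigMap struct {
    APIVersion string            `json:"apiVersion"`
    Kind       string            `json:"kind"`
    Metadata   Metadata          `json:"metadata"`
    Data       map[string]string `json:"data"`
}

func main() {
    cm := ConfigMap{
        APIVersion: "v1",
        Kind:       "ConfigMap",
        Metadata:   Metadata{Name: "app-config", Labels: map[string]string{"app": "web"}},
        Data:       map[string]string{"LOG_LEVEL": "info"},
    }
    out, _ := json.MarshalIndent(cm, "", "  ")
    fmt.Println(string(out)) // pipe into `kubectl apply -f -`
}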
mikeocool
How about release 2.0 and then don’t release 2.1 for a LONG time.
I get that in the early days such a fast-paced release/EOL schedule made sense. But now, something that operates at such a low level shouldn't require non-security upgrades every 3 months and have breaking API changes at least once a year.
"Low maintenance", welp.
I suppose that's true in one sense - in that I'm using EKS heavily, and don't maintain cluster health myself (other than all the creative ways I find to fuck up a node). And perhaps in another sense: it'll try its hardest to run some containers no matter how many times I make it OOMkill itself.
Buttttttttt Kubernetes is almost pure maintenance in reality. Don't get me wrong, it's amazing to just submit some yaml and get my software out into the world. But the trade off is pure maintenance.
The workflows to set up a cluster, decide which chicken-and-egg trade-off you want to get ArgoCD running, register other clusters if you're doing a hub-and-spoke model ... are just, like, one single act in the circus.
Then there's installing all the operators of choice from https://landscape.cncf.io/. I mean that page is a meme, but how many of us run k8s clusters without at least 30 pods running "ancillary" tooling? (Is "ancillary" the right word? It's stuff we need, but it's not our primary workloads).
A repeat circus is spending hours figuring out just the right values.yaml (or, more likely, hours templating it, since we're ArgoCD'ing it all, right?)
> As an side, I once spent HORUS figuring out to (incorrectly) pass boolean values around from a Secrets Manager Secret, to a k8s secret - via External Secrets, another operator! - to an ArgoCD ApplicationSet definition, to another values.yaml file.
And then you have to operationalize updating your clusters - and all the operators you installed/painstakingly configured. Given the pace of releases, this is literally, pure maintenance that is always present.
Finally, if you're autoscaling (Karpenter in our case), there's a whole other act in the circus (wait, am I still using that analogy?) of replacing your nodes "often" without downtime, which gets fun in a myriad of interesting ways (running apps with state is fun in kubernetes!)
So anyway, there's my rant. Low fucking maintenance!