Yoke: Infrastructure as code, but actually
168 comments
March 2, 2025 · solatic
ljm
K8S is at a point now where I'd probably try to configure whatever I can inside the cluster as an operator or controller.
There are going to be situations where that isn't practical, but the ability to describe all the pieces of your infra as a CRD is quite nice and it takes some pain out of having things split between terraform/pulumi/cdk and yaml.
At that point, you're just running your own little cloud instead of piggybacking on someone else's. Just need a dry-run pipeline so you can review changes before applying them to the cluster.
solatic
Sure, but the Kubernetes cluster itself, plus its foundational extra controllers (e.g. FluxCD) are basically static and therefore should be configured in Terraform.
SOLAR_FIELDS
That’s only true if you go with an architecture that involves doing so in terraform. A common pattern I implement is an initial management cluster bootstrap that runs Argo; after that it’s possible to manage everything, including cluster components of “child” clusters, using Argo. Can use either the Cluster API provider or Crossplane for that, or one of the cloud-specific ones like ACK.
One single imperative helm install command to start the whole train rolling, and after that it’s all IaC.
url00
Can you expand a bit on the kinds of things you are doing in operators and controllers? I've been wary of putting too much in the cluster... but maybe I should be doing more.
klooney
https://registry.terraform.io/providers/hashicorp/random/lat... is also very useful for this sort of thing, in case you want a persistent random value per resource- shuffle, id, pet, and password are all super handy.
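A minimal sketch of that usage, with illustrative names and wiring (not taken from the linked docs):

    resource "random_pet" "server" {
      length = 2 # e.g. "casual-mongoose"; stays stable across applies
    }

    resource "random_password" "db" {
      length  = 24
      special = false
    }

    output "server_name" {
      # persisted in state, so the value doesn't churn on every plan
      value = random_pet.server.id
    }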
akdor1154
Hill I will die on: Terraform being less expressive than a real language is a feature, not a drawback.
CDK/Pulumi/Yoke are optimised for being easy to write, but code should be optimised to be easy to READ.
Sure, cdk/pulumi/yoke lets you write the most clever and succinct construction you can compose in your favourite language... however, whoever comes across your clever code next will probably want to hit you, especially if it's not a dev from your immediate team, and especially if you have succumbed to blurring the lines between your IaC code and your app code.
If they instead come across some bog-standard terraform that maybe has a bunch of copy-paste and is a bit more verbose... Who cares? Its function will be obvious; there is no mental overhead needed.
On the flipside, Helm templating is an absolute abomination and I would probably take anything over needing to immerse myself in that filth, so maybe Yoke is worth a look after all. But the REAL answer is a real config language, still.
Aeolun
> code should be optimised to be easy to READ
You say that as if it’s impossible to write clear code. As soon as you have any form of multiple resources (e.g. create x of y) I’ll take the real programming language over terraform.
dijksterhuis
> As soon as you have any form of multiple resources
terraform handles this with for_each. need 10 EBS volumes on 10 EC2 instances? for_each and link the instance id of each value. done. there's a bunch of stuff i now don't have to worry about (does the instance actually exist yet? other validation edge cases?)
https://developer.hashicorp.com/terraform/language/meta-argu...
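roughly, a minimal sketch of that pattern (instance set, sizes, and device name are illustrative):

    resource "aws_instance" "app" {
      for_each      = toset(["a", "b", "c"])
      ami           = var.ami_id
      instance_type = "t3.micro"
    }

    resource "aws_ebs_volume" "data" {
      for_each          = aws_instance.app # one volume per instance
      availability_zone = each.value.availability_zone
      size              = 10
    }

    resource "aws_volume_attachment" "data" {
      for_each    = aws_instance.app
      device_name = "/dev/sdf"
      volume_id   = aws_ebs_volume.data[each.key].id
      instance_id = each.value.id # terraform orders creation for us
    }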
> You say that as if it’s impossible to write clear code.
not the parent, but i feel their usage of the word “code” was in error. i don’t care about how, i care about what.
the HCL is purely a definition/description of what the infrastructure looks like. what resources will be created? that is all it is. i want that. to define the infra and move on. i don’t want low level control of every minutia to do with infrastructure. i want to read a config file and just know what resources will exist in the account. wanna know every resource that exists? `terraform state list` … job done. no reading code required.
HCL/terraform is to define my cloud resources, not to control them or their creation. if i want control, then i need to whip out some go/python.
that’s my vibe on CDK libraries/platform APIs versus terraform.
Aeolun
I don’t understand how these things follow.
I’ll be the first to agree that CDK sucks, but Pulumi is essentially Terraform as a programming language, with all the niceties of terraform and more besides.
The format that defines what infra looks like is… very hard to parse with terraform (for me). It seems optimized for defining a list of static resources, and everything else is bolted on. The for_each or count constructs do not make intuitive sense to me anyway.
That said, by all means, use Terraform, as long as it’s not CDK I’ll be happy.
Spivak
You can understand every single terraform codebase using nothing other than the terraform documentation itself. All abstractions are provided by the language itself.
Clear isn't really the word I would use; it's more that the real work being done is exposed and always visible.
Aeolun
Modules aren’t provided by the language though?
paulddraper
Fortunately, Terraform has CDKTF [1], which allows you to use common languages such as Python, Java, and TypeScript to author Terraform infra.
I use it daily and find it greatly liberating.
patrick451
>whoever comes across your clever code next will probably want to hit you, especially if it's not a dev from your immediate team, and especially if you have succumbed to blurring the lines between your idk code and your app code.
If you want to maximize the number of people who have a chance of understanding what is happening, python is your huckleberry. They are going to want to hit the guy who wrote everything in a bizarre language called HCL that nobody outside of infra has ever seen or heard of.
> If they instead come across some bog-standard terraform that maybe has a bunch of copy-paste and is a bit more verbose... Who cares? Its function will be obvious, there is no mental overhead needed.
"bog standard" is doing a lot of heavy lifting here. You can write simple python or esoteric python and you can write simple terraform or esoteric terraform.
liampulles
As the Go proverb goes: "clear is better than clever". https://go-proverbs.github.io/
danw1979
I think a majority of the rants about Terraform I read are written from the perspective of someone managing inherently ephemeral infrastructure - things that are easily disposed of and reprovisioned quickly. The author of such a critique is likely managing an application stack on top of an account that someone else has provided them, a platform team maybe. CDK probably works for you in this case.
Now, if you belong to that platform team and have to manage the state of tens of thousands of "pet" resources that you can't just nuke and recreate using the CDK (because some other team depends on their availability) then Terraform is the best thing since sliced bread; it manages state and drift, and the declarative nature of the DSL is desirable.
Horses for courses.
bayindirh
> Horses for courses.
I think, along with YMMV, these are the two most important things we need to keep in mind. With a plethora of technologies and similar tools, we generally read the tin superficially but not the manual, and then we declare "This is bollocks!"
Every tool is targeted towards a specific use and thrives in specific scenarios. Calling a tool bad at something it wasn't designed for is akin to getting angry at your mug because it doesn't work as well when upside down [0].
robertlagrant
For me Terraform's biggest strength is also its biggest source of pain: it can integrate all sorts of technologies under one relatively vendor-agnostic umbrella and enforce a standard workflow across a huge amount of change. However, that means any bug in any provider is sort of Terraform's fault, if only in the developer's mind.
gregmac
Having debugged this sort of thing before, it's actually really hard to figure that out.
The entire stack is kind of bad at both logging and having understandable error messages.
You get things like this:
╷
│ Error: googleapi: Error 400: The request has errors, badRequest
│
│ with google_cloudfunctions_function.function,
│ on main.tf line 46, in resource "google_cloudfunctions_function" "function":
│ 46: resource "google_cloudfunctions_function" "function" {
Is this a problem with the actual terraform or passing a variable in or something? Is it a problem with the googleapi provider? Is it a problem with the API? Or did I, as the writer of this, simply forget a field?
In complex setups, this will be deep inside a module inside a module, and as the developer who did not use any google_cloudfunctions_function directly, you're left wondering what the heck is going on.
JohnMakin
The nice thing though, if you are a developer, is that most of these providers' code is open source. We've had cases where we've forked providers and fixed bugs on our own before we could get something merged in. I've personally fixed several provider bugs on my own out of annoyance - terraform's just a wrapper around cloud APIs, usually, and you can be in control of how that works.
stego-tech
These sorts of posts are fascinating "nerd snipes" to cryptids like me. On the surface, they look incredibly interesting and I want to learn more! Terraform isn't code? Please explain to me why not, you have my attention.
Then I get to the real meat of the issue, which is often along the lines of, "I'm a software developer who has to handle my own infrastructure and I hate it, because infrastructure doesn't behave like software." Which, fair! That is a fair critique! Infrastructure does not behave like software, and that's intentional!
It's almost certainly because I come from the Enterprise Tech world rather than Software Dev world, where the default state of infrastructure is permanent and mutable, forever. Modern devs, who (rightly!) like immutable containers and block storage and build tools to support these deployments by default, just don't get why the Enterprise tech stack is so much more different, and weird, and...crufty compared to their nifty and efficient CI/CD pipeline, just like I cannot fully appreciate the point of such a pipeline when I'm basically deploying bespoke machines for internal teams on the regular because politics dictates customer service over enterprise efficiency.
It's the difference between building an assembly line for Corollas and Camrys (DevOps), and building a Rolls-Royce Phantom to spec for a VIP client (BizTech). That's not to say there hasn't been immense pressure to transform the latter into more like the former, and I've been part of some of those buildouts and transitions in my career (with some admittedly excellent benefits - Showback! Tenancy! Lifecycles!), but these gripes about Terraform are admittedly lost on me, because I'll never really encounter them.
And if I did, I don't need to pickup programming to fix it necessarily. I just need to improve my existing system integrations so Ansible runbooks can handle the necessary automation for me.
JohnMakin
Thanks for posting this, I favorited it - having carved out a weird niche in my career as an "infra" guy, inevitably I deal with a lot of IAC. I run into this attitude a lot from devs - they are indeed annoyed by managing infrastructure, because it innately is not like software! I know I'm reiterating what you said, but it is so important to understand this.
Here is a thing I run into a lot:
"Our infra is brittle and becoming a chore to manage, and is becoming a huge risk. We need IAC!" (At this point, I don't think it's a bad idea to reach for this)
But then -
"We need to manage all our IAC practices like dev ones, because this is code, so we will use software engineering practices!"
Now I don't entirely disagree with the above statement, but I have caveats. I try to treat my IAC like "software" as much as I can, but as you pointed out, this can break down. Example: managing large terraform repositories that touch tons of things across an organization can become a real pain with managing state + automation + normal CI/CD practices. I can push a terraform PR, get approved, but I won't actually know whether what I did was valid until I try to push it live. As opposed to software, where you can be reasonably confident that the code is going to mostly work how you intend before you deploy it. Often in infra, the only way to know is to try/apply it. Rollback procedures are entirely different, etc.
It also breaks down as others have noted trying to use terraform to manage dynamic resources that aren't supposed to be immutable (like Kubernetes). I still do it, but it's loaded with foot guns I wouldn't recommend to someone that hasn't spent years doing this kind of thing.
mdaniel
> I can push a terraform PR, get approved, but I won't actually know whether what I did was valid until I try to push it live
Our concession to this risk was that once a merge request was approved, the automation was free to run the apply pipeline step, leaving open the very likely possibility that TF shit itself. However, since it wasn't actually merged yet, you could push fixes until TF stopped shitting itself.
I'm cognizant that solution doesn't "scale," in that if you have a high throughput repo those merge requests will almost certainly clash, but it worked for us because it meant less merge request overhead (context switching). It also, obviously, leveraged the "new pushes revoke merge request approval" which I feel is good hygiene but some places are "once approved, always approved"
anonfordays
>It's almost certainly because I come from the Enterprise Tech world rather than Software Dev world, where the default state of infrastructure is permanent and mutable, forever. Modern devs, who (rightly!) like immutable containers and block storage and build tools to support these deployments by default, just don't get why the Enterprise tech stack is so much more different
This is generally true, but the interesting thing about Terraform is it was created specifically to work in the world of "immutable by default." This is why Terraform automatically creates and destroys instead of mutating in many (most?) cases, shys away from using provisioners to mutate resources after creation, etc.
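A minimal sketch of that default (resource details are illustrative):

    resource "aws_instance" "app" {
      # Changing the AMI here is planned as a replacement (-/+ in the plan),
      # not an in-place mutation of the running instance.
      ami           = var.ami_id
      instance_type = "t3.micro"

      lifecycle {
        create_before_destroy = true # stand up the replacement first
      }
    }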
stego-tech
Yep, and that's why I only very recently picked it up in Enterprise world, where the AWS team used it to deploy resources. What used to take them ~45min by hand using prebuilt AMIs now takes ~500 lines of Terraform "code" and several hours of troubleshooting every time Terraform (or whatever fork they're now using post-Hashicorp) updates/changes, because Enterprise architecture is mutable by default and cannot simply be torn down and replaced.
anonfordays
>What used to take them ~45min by hand using prebuilt AMIs now takes ~500 lines of Terraform "code" and several hours of troubleshooting every time
This is just operational immaturity. No one should be building anything "by hand," everything should be automated. Deploying instances from prebuilt AMIs takes a dozen or so lines of Terraform code. Terraform can spin up dozens of instances in less than 5 minutes with a dozen lines of code: https://dev.to/bennyfmo_237/deploying-basic-infrastructure-o...
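For instance, something along these lines (the AMI name filter and counts are illustrative):

    data "aws_ami" "app" {
      most_recent = true
      owners      = ["self"]

      filter {
        name   = "name"
        values = ["app-server-*"]
      }
    }

    resource "aws_instance" "app" {
      count         = 12
      ami           = data.aws_ami.app.id
      instance_type = "t3.micro"
    }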
If you're not operationally mature enough, the problem isn't the tool, it's you. This is basic Terraform usage.
>because Enterprise architecture is mutable by default and cannot simply be torn down and replaced.
This is no longer correct/true. Maybe for laggards it's true, but modern enterprises with modern ops teams using modern tooling are deploying most of everything with immutability in mind. Enterprise architecture is immutable by default now, and destroying and replacing is the norm.
ctrlp
what sort of cryptid are you?
stego-tech
As the username implies, the "dinosaur on the internet" kind. The classic trope of the IT person who live(d) in their windowless cave, surrounded by a cacophony of whirling fans and grinding hard drives, retired kit repurposed into a lab since the budget never allowed for a proper one. Graphic tees and blue jeans, an enigmatic mystery to the masses who complain stuff is broken but also that they don't know why I'm here since everything always works.
So just your average IT person, really. What we lack in social graces, we make up for with good humor, excellent media recommendations, and a real passion for what we create, because we like seeing our users smile at their own lives being made easier. I guess the "cryptid" part comes in because I'm actively trying to improve said sociability and round out my flaws, unlike the stereotypical portrayals of the BOFH or IT Crowd.
ctrlp
the stegotech, a mythical beast that, unlike the BOFH, will not bite the hand that submits a support ticket
throwanem
I miss that kind of work.
voidfunc
I ditched Terraform years ago and just interact with the raw cloud provider SDKs now. It's much easier to evolve actual code long-term and deal with the weird edge cases that come up when you're not beholden to the straitjacket that is configuration masquerading as code.
Oh yea, and we can write tests for all that provisioning logic too.
plmpsu
How are you handling creating multiple resources in parallel? Or rolling back changes after an unsuccessful run?
gorgoiler
Not OP, but for rolling back we just… revert the change to the setup_k8s_stuff.py script !
In practice it’s a module that integrates with quite a large number of things in the monolith because that’s one of the advantages of Infrastructure as Actual Code: symbols and enums and functions that have meaningful semantics in your business logic are frequently useful in your infrastructure logic too. The Apples API runs on the Apples tier, the Oranges API runs on the Oranges tier, etc. etc.
People call me old fashioned (“it’s not the 1990s any more”) but when I deploy something it’s a brand new set of instances to which traffic gets migrated. We don’t modify in place with anything clever and I imagine reverting changes in a mutable environment is indeed quite hard to get right (and what you are hinting at?)
solatic
> I imagine reverting changes in a mutable environment is indeed quite hard to get right (and what you are hinting at?)
I guess you're not managing any databases then? Because you can't just treat those immutably, you have to manage the database in-place.
michaelmior
> revert the change to the setup_k8s_stuff.py script
What about resources that were created by the code you reverted?
inopinatus
A very small shell script.
kikimora
I’ve been thinking about this for a long time. But doesn’t it bring a host of other issues? For example, I need to update instance RAM from 4 to 8 GB, but how do I know if the instance exists or should be created? I need to make a small change; how do I know what parts of my scripts to run?
klooney
Here are the things that TF does that you are probably not going to get around to in a comprehensive way:
- State tracking, especially all of the tedious per cloud resource details
- Parallelism- TF defaults to 10 threads at a time. You won't notice this when you write a demo to deploy one thing, but it really matters as you accrete more things.
- Dependency tracking- hand in hand with the parallelism, but this is what makes it possible. It is tedious, resource-by-resource blood, sweat and tears stuff, enabled by the inexpressive nature of HCL (see the sketch below).
Plus, you know, all of the work that has already been done by other people to wrap a million quirky APIs in a uniform way.
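The dependency tracking falls out of references between resources; a minimal sketch (resources are illustrative):

    resource "aws_vpc" "main" {
      cidr_block = "10.0.0.0/16"
    }

    resource "aws_subnet" "a" {
      # The reference to aws_vpc.main.id is an implicit dependency: the subnet
      # waits for the VPC, while unrelated resources proceed in parallel
      # (10 at a time by default, tunable with -parallelism).
      vpc_id     = aws_vpc.main.id
      cidr_block = "10.0.1.0/24"
    }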
voidfunc
You write code to do these things? If there's a requirement for you to be able to do such a thing, make it a feature, implement it with tests, and voila: no different than any other feature or bug you work on, is it?
diggan
> For example, I need to update instance RAM from 4 to 8 Gb but how do I know if the instance exists or should be created?
    let front_id = if instance_exists("front_balancer") {
        fetch_instance("front_balancer").id
    } else {
        create_new_instance("front_balancer", front_balancer_opts).id
    };
Or however else you would manage that sort of thing in your favorite programming language.
> I need to make a small change, how do I know what parts of my scripts to run?
Either just re-run the parts you know you've changed (manually or based on git diffs), or even better, make the entire thing idempotent and you won't have to care: re-run the entire program after each change and it'll automagically work.
michaelmior
> Either just re-run the parts you know you've changed (manually or based on git diffs)
This is exactly the sort of thing Terraform is designed to avoid because it can obviously get quite messy. Agreed that making things idempotent solves that problem, but it's not always obvious/easy how to do so.
kikimora
I get the idea but I don't think it addresses the issue. There has to be a function that a) checks if the instance exists, b) checks if the instance state is what I want (e.g. it has 8 GB RAM), c) if not, updates the instance. Ideally it also locks the environment while doing this to prevent race conditions. It can be written but seems quite cumbersome. The complexity of this code, and also the time it takes to run, is what concerns me most.
I guess this is why terraform state is there. IMHO state is IaC's biggest weakness because you have to keep it consistent with actual cloud state. If we could just query state from the cloud, make it performant, and be able to automatically (or just fast enough) select resources to be updated, it would be ideal.
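For contrast, in Terraform that 4-to-8 GB change is a one-attribute edit, and the provider owns the exists/needs-update logic via the plan diff (instance types are illustrative):

    resource "aws_instance" "app" {
      ami           = var.ami_id
      instance_type = "t3.large" # was t3.medium (4 GB); plan shows an in-place resize
    }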
evantbyrne
I went through the same evolution, even built a PaaS for AWS, but I kept going and now just deploy my own stuff to VMs with Swarm via one command in Rove. It's great. And yes, I know Kubernetes; I use it at work. It's an unnecessary waste of time.
dijksterhuis
> Swarm
docker swarm is so simple and easy compared to the utter behemoth that is k8s, and basically is all you need for CRUD webapps 80-90% of the time. add an RDS instance and you’re set.
i will always pick swarm in a small company* whenever possible until k8s or ECS makes sense because something has changed and it’s needed.
don’t start with complexity.
* - bigger companies have different needs.
evantbyrne
People have really been sleeping on Swarm. I sometimes even see people trying to recreate Swarm features with Compose. Wish more devs knew about it.
solatic
Terraform added tests somewhat recently: https://developer.hashicorp.com/terraform/language/tests
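A minimal sketch of the format (the resource and assertion are illustrative, not taken from the linked docs):

    # buckets.tftest.hcl
    run "logs_bucket_is_prefixed" {
      command = plan

      assert {
        condition     = startswith(aws_s3_bucket.logs.bucket, "acme-")
        error_message = "log bucket must use the acme- prefix"
      }
    }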
imp0cat
And eventually, you end up with your own in-house Terraform.
beacon294
I agree that the SDK is better for many use cases. I do like terraform for static resources like aws vpc, networking, s3 buckets, etc.
abound
I think I've commented this elsewhere, but using Cue [1] is also great for this purpose, with no extra infrastructure. E.g. you define a Cue Template [2], which seems analogous to Yoke/ATC's CRDs, and then your definitions just include the data.
Here's an example of Vaultwarden running on my K8s cluster:
    deployment: bitwarden: {
      spec: {
        template: {
          spec: {
            containers: [{
              image: "vaultwarden/server:1.32.7"
              env: [{
                name:  "ROCKET_PORT"
                value: "8080"
              }, {
                name: "ADMIN_TOKEN"
                valueFrom: secretKeyRef: {
                  name: "bitwarden-secrets"
                  key:  "ADMIN_TOKEN"
                }
              }]
              volumeMounts: [{
                name:      "data"
                mountPath: "/data"
                subPath:   "bitwarden"
              }]
              ports: [{
                containerPort: 8080
                name:          "web"
              }]
            }]
            volumes: [{
              name: "data"
              persistentVolumeClaim: claimName: "local-pvc"
            }]
          }
        }
      }
    }
And simpler services are, well, even simpler:

    deployment: myapp: spec: template: spec: containers: [{
      ports: [{
        containerPort: 8080
        name: "web"
      }]
    }]
And with Cue, you get strongly typed values for everything, and can add tighter constraints as well. This expands to the relevant YAML resources (Services, Deployments, etc), which then get applied to the cluster. The nice thing about this approach is that the cluster doesn't need to know anything about how you manage your resources.
Cyphus
I really want to dive in with Cue, but one thing that I got burned on when using jsonnet to generate CloudFormation templates years ago was lack of discoverability for newcomers to the repo.
Taking your sample code as an example, someone might look at the myapp deployment definition and ask: “does this deployment get created in the default namespace or does it automatically create a myapp namespace? What’s the default number of replicas? Are there any labels or annotations that get automatically added?” Etc.
On the flip side, there’s a potential lack of “greppability.” The user may have found a problem with a deployed resource in, say, the development cluster, and go to grep for some resource-specific string in the repo, only to come up empty because that string is not in the source but rather generated by the templating system.
To be clear, both of these problems can affect any method of generating config, be it yoke, helm, ksonnet, kustomize, or cue. It’s like a curse of abstraction. The more you make things into nice reusable components, the easier it is for you to build upon, and the harder it is for others to jump in and modify.
At least with Cue you get properly typed values and parameter validation built in, which puts it miles ahead of “everything is a string” templating systems like the helm templates the article complains about.
strangelove026
I was kind of interested in cue earlier last year as IIRC it can be served by helm and is much much better than templating yaml. Never really got started with it. Wish they had an LSP too.
mdaniel
What the hell is going on with their bot copy-pasting every comment on that issue? What a mess
Anyway, I wanted to ask what you meant by "served by helm?" I knew about https://github.com/stefanprodan/timoni and https://github.com/holos-run/holos but I believe they are merely "inspired by helm" and not "cue for helm"
nosefrog
Reminds me of gcl (yikes).
bbu
Looks promising but it starts with a (justified) rant about terraform and then goes into how to replace Helm.
I am confused. Can yoke be used to create and manage infrastructure or just k8s resources?
thayne
Indeed. This isn't really a replacement for terraform, unless you are only using terraform to manage k8s resources. Which probably isn't most people who are currently using Terraform.
xena
Author here. It's mainly for k8s resources, but if you install operators like external-dns or something like Crossplane into your cluster, you can manage infra too.
groestl
> into your cluster
I guess the point is: what if you don't have a cluster.
darkwater
And also: what manages the Kubernetes cluster lifecycle in the cloud provider, or on bare metal?
There is life before (and beyond) Kubernetes.
sureglymop
What alternative to terraform would one use to set up the whole cluster before provisioning any resources?
I currently have a custom script that is a mix between terraform and ansible that sets up a proxmox cluster, then a k3s cluster and a few haproxys with keepalived on top. Granted, maybe not the most standard setup.
e12e
Do you have a complex Ansible setup? For the few bespoke VMs I need, I've been able to get away with cloud-init so far - but they're explicitly set up to be reasonable to nuke and recreate. If they had more personality and needed to be managed more as pets, I would probably need to reach for something like Ansible - or see if I could build images (VM or Docker).
But then with images I'm on the hook for patching... Not simply auto-patching via apt...
glitchcrab
I use the cluster-api provider for Proxmox running in an ephemeral cluster (usually kind) to bootstrap the Kubernetes cluster to a point where Flux gets installed and it takes over managing itself. I then throw the kind cluster away as I no longer need it.
e12e
I've considered dropping terraform (openTofu) for our k8s resources since k8s is stateful anyway.
But that would complicate synchronization with resources outside of k8s, like tailscale, DNS, managed databases, cloud storage (S3 compatible) - and even mapping k8s ingress to load_balancer and external DNS.
So far I feel that keeping everything in terraform is the simplest and most reasonable solution - mostly because everything can be handled by a single tool and language.
bbu
ok, that makes sense. A better Helm would be nice. timoni.sh is getting better and better, but Cue is a big hurdle.
Unfortunately, I'm not a big fan of the yaml-hell that crossplane is either.
But as a Terraform replacement systeminit.com is still the strongest looking contender.
jonasdegendt
> A better Helm would be nice.
Consider CDK8s (Typescript or Go) or Jsonnet. We evaluated Cue and the two aforementioned options and ended up with CDK8s using Typescript and it's incredibly powerful.
danw1979
It’s just a dunk on terraform to promote yet another K8s provisioning thing.
WatchDog
I'm quite happy with CDK[0].
My experience is only with the main AWS CloudFormation-based version of CDK; there is also CDK for Terraform, which supports any resource that Terraform supports, though some of what I'm about to say is not applicable to that version.
What I like about CDK is that you can write real code, and it supports a wide range of languages, although TypeScript is the best experience.
Provided that you don't use any of the `fromLookup` type functions, you can run and test the code without needing any actual credentials to your cloud provider.
CDK essentially compiles your code into a cloudformation template; you can run the build without credentials, then deploy the built cloudformation template separately.
You don't need to worry about your terraform server crashing halfway through a deployment, because cloudformation runs the actual deployment.
chuckadams
My main problem with CDK is that it only outputs a CloudFormation stack. I can sign up for a new cloud account, spin up a k8s cluster, deploy everything to it, and restore the database snapshot faster than CF will finish a job that's stuck on UPDATE_COMPLETE_CLEANUP_IN_PROGRESS.
Of course there's also cdk8s, but I'll probably go with Pulumi instead if I need that. Right now I'm happy with helmfile, though not so much with helm itself. So I'll definitely be giving Yoke a look.
cedws
In your experience how often have you had template builds succeed but then fail at apply time? This kind of issue is what I find most frustrating about IaC today: your 'code' 'compiling' means nothing because all of the validations are serverside, and sometimes you won't find out something's wrong until Terraform is already half done applying. I want to be able to declare my infrastructure, be able to fully validate it offline, and have it work first try when I apply it.
Aeolun
I find Pulumi very nice here because it persists state after every successful resource creation. If it breaks somewhere in the middle, the next run will just pick up where it left off last time.
CDK… well, CDK doesn’t get in an invalid state often either, but that’s because it spends 30m rolling back every time something goes wrong.
WatchDog
I've had less such issues with CDK, versus raw cloudformation, or terraform, but it can still happen.
jon-wood
> Yoke is a project that takes this basic idea to the next level. With Yoke, you write your infrastructure definitions in Go or Rust, compile it to WebAssembly, and then you take input and output Kubernetes manifests that get applied to the cluster.
This just puts me in mind of https://howfuckedismydatabase.com/nosql/
skinkestek
>> Wait, there's something here that I'm not getting. Why are you compiling the code to WebAssembly instead of just running it directly on the server?
> Well, everything's a tradeoff. Let's imagine a world where you run the code on the server directly.
> If you're using a language like Python, you need to have the Python runtime and any dependencies installed. This means you have to incur the famous wrath of pip (pip hell is a real place and you will go there without notice). If you're using a language like Go, you need to have either the Go compiler toolchain installed or prebuilt binaries for every permutation of CPU architecture and OS that you want to run your infrastructure on. This doesn't scale well.
> One of the main advantages of using WebAssembly here is that you can compile your code once and then run it anywhere that has a WebAssembly runtime, such as with the yoke CLI or with Air Traffic Controller.
At this point, why not use a proper runtime like JVM or .Net?
Then one can also easily use reasonable languages like C#, Java or Kotlin.
LoganDark
> At this point, why not use a proper runtime like JVM or .Net?
Because then you are forced to only use managed languages?
skinkestek
Ahh, good point.
I guess Rust (and maybe other unmanaged languages) can be compiled to WebAssembly?
LoganDark
https://logandark.net/calc is C++ compiled to WebAssembly using Emscripten. Back from I think 2018.
These days Rust is practically the poster child of compiling to WebAssembly because it's so easy. Most WASM content I see is actually about Rust.
sunrunner
> a proper runtime like JVM or .Net?
Or (god forbid) a proper runtime like just the OS and architecture of the platform you're running the tool on.
This article calling out Go because you have to prebuild binaries for every OS and architecture combination beforehand (even though you probably already know this combination and it's likely small), and saying that doesn't scale, but then requiring every user to not just get the WASM binary they want to run but _also_ the (OS- and architecture-specific) runtime for the environment, seems unfair.
Anything where you distribute your application in a way that isn't immediately usable on a target platform without having to set up an environment of some kind feels like a 'Shift right deployment' kind of thinking, where it's fine to do the bare minimum to make your tool available and it's now an end user's problem to get everything set up to run it.
In some cases this is fine and the cost is low (same-language library usage for example) but when it comes to tool distribution every time I see 'cargo install' for someone's cute Rust project or an instruction to 'just do the following ten steps and you can run the executable' I usually just back away from GitHub and don't bother.
beacon294
> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
It's possible a few ways. I prefer modules, and this LLM answer describes an older way with count and for_each.
It's always possible that some incarnation of the problem space has a gotcha that needs a workaround, but I doubt it would be a blocker.
https://www.perplexity.ai/search/if-you-really-do-think-that...
INTPenis
Who are these ops people that want to write golang and rust? It seems like a tiny niche. If you're that comfortable writing golang or rust then why not just become a developer?
I'm a lifelong ops person, since 2000, and I use Ansible or Terraform daily.
I often wanted to learn golang better but I just never had the motivation. I'm a mean Python scripter, I can write a system integration in hours, but there's something about compiled languages I just never could get into.
I'm saying this only because the whole point of yoke is to define your infrastructure in Golang so that you can add in the missing pieces with Golang, free to use Golang for anything beyond the pre-defined infrastructure providers in Yoke. So you're now a Golang developer. Congrats.
bionhoward
A big benefit is the compiler catches bugs so you don’t have to wait around for your Python program to crash at runtime. Also, if the type system is more “legit” then you can skip a ton of defensive parsing of inputs.
Could be more about developers who know Golang and Rust wanting to deploy their apps (no need to pigeonhole anyone into just dev or just ops)
ForTheKidz
I don't really see a distinction between developer and ops person in this context. The whole point of all of these tools is to make infrastructure into code. Go isn't the choice I would have made but it's fine.
abofh
Being a developer is not necessarily a life-long goal of ops people. I like playing with all the toys in the toybox - sometimes I need to write code to make things play the way I want. I went to school to become a programmer, and discovered that while I enjoyed programming for myself, I hated doing it for others -- otoh, I had been working as an ops guy to pay beer money, and found that it was a lot more aligned with my interests.
Most of these tools though, are written by engineers who don't want to understand the tools they were given, and want to write their own. Even Pulumi's vaunted 'aws-native' package is just built on top of Cloud Control, which is built on top of CloudFormation, which is often cited as the reason terraform (which pulumi is based on top of) was created: "eww, I don't like that tool".
Which is all to say - people write code because they have a problem - engineering, operational, it doesn't matter. Assuming an ops person wants to become a developer is akin to assuming all developers want to become managers, and that all managers want to become TV stars. The logic presumes a viewpoint that simply isn't true.
empath75
I've written a bunch of k8s operators in go (and rust more recently). That's what basically everyone working with k8s does once they reach a certain level of complexity.
I don't really understand, in fact, why you'd use yoke instead of just writing an operator with kubebuilder or kube-rs.
davidmdm91
Writing your own operator is a lot of complexity when what you want to do is deploy a package.
Via Yoke's AirTrafficController, it's easy to define a package as a CRD and its implementation (a program that takes the CR and outputs the desired underlying resources), and the AirTrafficController takes care of the rest.
You no longer need to think about reconciler loops, figuring out any orphaned state, etc.
Kinrany
Supposedly it's a package manager as well, so if there's a package, you'd be able to use it without writing any code.
postpawl
A lot of important devops tools like Kubernetes and Grafana are written in golang, and it’s often handy to be able to import their code to use in your own code to automate those things.
INTPenis
But again, you're now a developer.
And I'm asking who are these developers using IaC tooling? It seems to me like it was made for ops.
All power to you if you take on both roles, but that's a good way to get burned out. I'm a devops person so the devs can focus on just code, and I can focus on making the best and safest infrastructure for them to run their code in.
terrabitz
I feel like the distinction between the two is fairly contrived these days. I'm an SRE, and we're constantly building tooling to help us better manage infrastructure, improve reliability, improve DX, etc. On the flip side, we also push a lot of the responsibility for infrastructure management to our devs: we maintain the modules and IaC pipelines, and the developers hook up the building blocks they need. It can actually help avoid burnout because our team doesn't become a bottleneck for infrastructure provisioning.
Say what you want about IaC in Go or other programming languages, but it can definitely help strengthen the whole "developers own their infrastructure" model, since developers don't have to learn an additional language syntax.
postpawl
Those developers are working on “Internal Development Platforms” and building their own abstractions on top of tools like Kubernetes and Grafana to simplify things for developers. This page explains it pretty well: https://internaldeveloperplatform.org/what-is-an-internal-de...
> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
It depends on where the source of dynamism is coming from, but yes you can do this in Terraform. You get the instances with data.aws_instances, feed it into aws_route53_record with a for_each, and you're done. Maybe you need to play around with putting them into different modules because of issues with dynamic state identifiers, but it's not remotely the most complicated Terraform I've come across.
That's a separate question from whether or not it's a good idea. Terraform is a one-shot CLI tool, not a daemon, and it doesn't provide auto-reconciliation on its own (albeit there are daemons like Terraform Enterprise / TerraKube that will run Terraform on a schedule for you and thus provide auto-reconciliation). Stuff like DNS records for Kubernetes ingress is much better handled by external-dns, which itself is statically present in a Kubernetes cluster and therefore might be more properly installed with Terraform.
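A minimal sketch of that shape, with illustrative names (and with the usual caveat that for_each keys must be known at plan time):

    data "aws_instances" "app" {
      instance_tags = {
        Role = "app"
      }
    }

    locals {
      # instance ID => private IP
      instance_ips = zipmap(data.aws_instances.app.ids, data.aws_instances.app.private_ips)
    }

    resource "aws_route53_record" "app" {
      for_each = local.instance_ips

      zone_id = var.zone_id
      name    = "${each.key}.internal.example.com"
      type    = "A"
      ttl     = 300
      records = [each.value]
    }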