Automatic K8s pod placement to match external service zones
toredash · October 8, 2025
Hi HN,
I wanted to share something I've been working on to solve a Kubernetes problem: the scheduler has no awareness of the network topology of the external services that workloads communicate with. If a pod talks to a database (e.g. AWS RDS), K8s does not know it should schedule the pod in the same AZ as the database. If the pod lands in the wrong AZ, it causes unnecessary cross-AZ network traffic, adding latency (and costs $).
I've made a tool I call "Automatic Zone Placement", which automatically aligns pod placement with external dependencies.
Testing shows that placing the pod in the same AZ resulted in a ~175-375% performance increase, measured with small, frequent SQL requests. That's not really surprising: same-AZ latency is much lower than cross-AZ latency, and lower latency = increased performance.
The tool has two components:
1) A lightweight lookup service: A dependency-free Python service that takes a domain name (e.g., your RDS endpoint) and resolves its IP to a specific AZ.
2) A Kyverno mutating webhook: the policy intercepts pod creation requests. If a pod has a specific annotation, the webhook calls the lookup service and injects the required nodeAffinity to schedule the pod onto a node in the correct AZ (sketched below).
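To illustrate the end result (the annotation key and endpoint are made-up placeholders for this example, not necessarily what the tool uses), a pod that only declares its external dependency ends up with a standard zone-based nodeAffinity after mutation:

```yaml
# Before: the workload only declares its external dependency (hypothetical annotation key).
apiVersion: v1
kind: Pod
metadata:
  name: api
  annotations:
    azp.example.com/endpoint: "mydb.abc123.eu-west-1.rds.amazonaws.com"
spec:
  containers:
    - name: api
      image: my-api:latest
---
# After mutation: the webhook has resolved the endpoint to an AZ (say eu-west-1a)
# and injected the matching nodeAffinity using the well-known zone label.
apiVersion: v1
kind: Pod
metadata:
  name: api
  annotations:
    azp.example.com/endpoint: "mydb.abc123.eu-west-1.rds.amazonaws.com"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - eu-west-1a
  containers:
    - name: api
      image: my-api:latest
```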
The goal is to make this an automatic process; the alternative is to manually add a nodeAffinity spec to your workloads. But resources move between AZs, e.g. during maintenance events for RDS instances. I built this with AWS services in mind, but the concept is generic enough to be used in on-premise clusters to make scheduling decisions based on rack, row, or data center properties.
I'd love some feedback on this, happy to answer questions :)
darkwater
Interesting project! Kudos for the release. One question: how are failure scenarios managed, i.e. AZP fails for whatever reason and is in a crash loop? Just "no hints" to the scheduler, and that's it?
toredash
If the AZP deployment fails then yes, you're correct, there are no hints anywhere. If the lookup to AZP fails for whatever reason, it is noted in the Kyverno logs. And depending on whether you *require* this policy to take effect or not, you have to decide whether you want pods to fail at the scheduling step. In most cases, you don't want to stop scheduling :)
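For context, this trade-off is usually expressed through the policy's failure policy. A hedged sketch of the relevant Kyverno knob (field placement can differ between Kyverno versions, and the actual rule body is omitted):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: automatic-zone-placement
spec:
  # Ignore: if the webhook or the AZP lookup fails, pods are still admitted,
  # just without the zone hint. Fail: admission (and thus scheduling) is blocked.
  failurePolicy: Ignore
  rules:
    - name: inject-zone-affinity
      match:
        any:
          - resources:
              kinds:
                - Pod
      # mutate rule that calls the lookup service and patches nodeAffinity
      # omitted for brevity
```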
mathverse
Typically you have a multi-AZ setup for app deployments for HA. How would you solve this without traffic management control?
ruuda
> Have you considered alternative solutions?
How about, don't use Kubernetes? The lack of control over where the workload runs is a problem caused by Kubernetes. If you deploy an application as e.g. systemd services, you can pick the optimal host for the workload, and it will not suddenly jump around.
mystifyingpoi
> The lack of control
This project literally sets the affinity. That's precisely the control you claim is missing.
arccy
k8s doesn't lack control: you can select individual nodes, AZs, regions, etc. with the standard affinity settings.
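For example, pinning a workload to a zone by hand is already a one-liner with the well-known topology label (zone name is illustrative), no webhook needed:

```yaml
spec:
  nodeSelector:
    topology.kubernetes.io/zone: eu-west-1a
```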
Cool idea, I like that. Though I'm curious about the lookup service. You say:
> To gather zone information, use this command ...
Why couldn't most of this information be gathered by the lookup service itself? A point could be made about excessive IAM permissions, but the simple case of an RDS reader residing in a given AZ could be handled by simply listing the subnets and finding which one a given IP belongs to.
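A rough boto3 sketch of what I mean (hypothetical helper, not the project's code; assumes ec2:DescribeSubnets permission and that the endpoint resolves to an in-VPC IP):

```python
# Resolve the endpoint, list the VPC's subnets, and map the IP to an AZ by CIDR match.
import ipaddress
import socket

import boto3


def zone_for_endpoint(endpoint: str, vpc_id: str) -> str | None:
    ip = ipaddress.ip_address(socket.gethostbyname(endpoint))
    ec2 = boto3.client("ec2")
    subnets = ec2.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )["Subnets"]
    for subnet in subnets:
        # The subnet whose CIDR contains the IP tells us the AZ.
        if ip in ipaddress.ip_network(subnet["CidrBlock"]):
            return subnet["AvailabilityZone"]
    return None


print(zone_for_endpoint("mydb.abc123.eu-west-1.rds.amazonaws.com", "vpc-0123456789abcdef0"))
```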