
Case Study: ByteDance Uses eBPF to Enhance Networking Performance

tptacek

Netkit, which is what this is built on, is pretty neat. For transmitting packets from one container/VM to another, the conventional solution is to give each its own veth device. When you do that, the kernel network stack, at like the broad logic level, is sort of oblivious to the fact that the devices aren't real ethernet devices and don't have to go through the ethernet motions to transact.

Netkit replaces that logic with a simple pairing of sending and receiving eBPF programs; it's an eBPF cut-through for packet-level networking between networks that share a host kernel. It's faster, and it's simpler to reason about; the netkit.c code is pretty easy to read straight through.
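To make the "pairing" idea concrete, here is a toy Python model, not kernel code. It assumes each end of the pair runs an attached program on transmit and, on a pass verdict, hands the packet straight to its peer. `ToyNetkitDev`, `make_pair`, and `drop_empty` are hypothetical names for illustration; only the verdict names echo the kernel's netkit actions, and the real logic lives in netkit.c.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Verdict names echo the kernel's netkit actions; the string values are
# placeholders for this sketch, not the kernel's numeric values.
NETKIT_PASS = "pass"
NETKIT_DROP = "drop"

Program = Callable[[bytes], str]


@dataclass
class ToyNetkitDev:
    """Hypothetical stand-in for one end of a netkit pair."""
    name: str
    program: Optional[Program] = None  # stand-in for an attached eBPF program
    peer: Optional["ToyNetkitDev"] = field(default=None, repr=False, compare=False)
    received: List[bytes] = field(default_factory=list)

    def xmit(self, packet: bytes) -> str:
        # Run the attached program on transmit; with no program, default to pass.
        verdict = self.program(packet) if self.program else NETKIT_PASS
        if verdict == NETKIT_PASS and self.peer is not None:
            # Cut-through: deliver directly to the peer, no ethernet emulation.
            self.peer.received.append(packet)
        return verdict


def make_pair(primary_prog: Optional[Program] = None,
              peer_prog: Optional[Program] = None):
    """Create two devices wired to each other, like the two ends of a pair."""
    a = ToyNetkitDev("nk0", primary_prog)
    b = ToyNetkitDev("nk1", peer_prog)
    a.peer, b.peer = b, a
    return a, b


def drop_empty(packet: bytes) -> str:
    """Example policy: drop zero-length packets, pass everything else."""
    return NETKIT_DROP if len(packet) == 0 else NETKIT_PASS


if __name__ == "__main__":
    host, container = make_pair(primary_prog=drop_empty)
    print(host.xmit(b"hello"), container.received)
    print(host.xmit(b""), container.received)
```

The point of the sketch is only the shape: one program decision, then a direct hand-off to the peer, with nothing in between.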

charleslmunger

>When you do that, the kernel network stack, at like the broad logic level, is sort of oblivious to the fact that the devices aren't real ethernet devices and don't have to go through the ethernet motions to transact.

Is that true even for virtio-net? I guess I just assumed all these virtual devices worked like virtiofs and had low overhead fast paths for host and guest communication.

XorNot

Yeah this is a surprise to me too - my impression was things like loopback and virtio devices were used explicitly because they don't pretend to ever be real devices, and thus bypass all the real device handling.

What additional overhead is cut out by the netkit approach?

tptacek

Are you using virtual machines? They're not.

The big win here as I understand it is that it gives you roughly the same efficient inter-device forwarding path that XDP gives you: you can bounce from one interface to another in eBPF without converting buffers back into skbuffs and snaking them through the stack again.

lsnd-95

It would be nice to see an implementation of TCP fusion (on Solaris) or SIO_LOOPBACK_FASTPATH (on Windows) for Linux.

sirjaz

Someone on HN giving kudos to Windows for once. Has hell frozen over?

jiveturkey

Came here to say the same. I'm glad Linux is finally catching up to Solaris.

preisschild

Cilium (a Kubernetes CNI) has been able to use netkit instead of veth bridges since netkit was introduced in the kernel:

https://isovalent.com/blog/post/cilium-netkit-a-new-containe...

akamaka

Thanks for the clear explanation!

erulabs

I'd love to see a more complete picture of ByteDance's TikTok infra. They released "KubeAdmiral" (1), so I'm assuming they're using eBPF via a Kubernetes CNI, and I see ByteDance listed on Cilium's GitHub (2). They're also using KubeRay (3) to orchestrate huge inference tasks. It's annoying that a company I definitely do not want to work for has such incredibly interesting infrastructure!

1. https://github.com/kubewharf/kubeadmiral

2. https://github.com/cilium/cilium/blob/main/USERS.md

3. https://www.anyscale.com/blog/how-bytedance-scales-offline-i...

koakuma-chan

They also made monoio, an io_uring-based async runtime for Rust: https://github.com/bytedance/monoio

dilyevsky

I also heard they replaced k8s etcd with a shim [0] similar to kine because their clusters are so large.

[0] - https://github.com/kubewharf/kubebrain

ddxv

Here's my list of the tools and business SDKs found in their decompiled app:

https://appgoblin.info/apps/com.zhiliaoapp.musically/sdks

nighthawk454

> eBPF is a technology that can run programs in a privileged context such as the operating system kernel. It is the successor to the Berkeley Packet Filter (BPF, with the "e" originally meaning "extended") filtering mechanism in Linux and is also used in non-networking parts of the Linux kernel as well.

> It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring changes to kernel source code or loading kernel modules. Safety is provided through an in-kernel verifier which performs static code analysis and rejects programs which crash, hang or otherwise interfere with the kernel negatively.

https://en.wikipedia.org/wiki/EBPF?useskin=vector
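As a rough illustration of "static code analysis that rejects programs which hang", here is a toy checker in Python. It is not the kernel verifier: the three-opcode instruction set and the `toy_verify` function are made up for this sketch, and it only enforces one classic rule, that jumps must go forward and stay in range, so every accepted program is guaranteed to reach `exit`. The real verifier does far more (register and memory tracking, for instance), and newer kernels also accept bounded loops.

```python
from typing import List, Tuple

# Hypothetical mini instruction set for this sketch:
#   ("nop",)          do nothing
#   ("jmp", offset)   jump relative to the next instruction
#   ("exit",)         end the program
Insn = Tuple


def toy_verify(prog: List[Insn]) -> Tuple[bool, str]:
    """Accept only programs that provably terminate under the toy rules."""
    if not prog:
        return False, "empty program"
    if prog[-1][0] != "exit":
        return False, "last instruction must be exit"
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op == "jmp":
            target = pc + 1 + insn[1]
            if target <= pc:
                # A backward (or self) jump could loop forever.
                return False, f"back-edge at {pc}"
            if target >= len(prog):
                return False, f"jump out of range at {pc}"
        elif op not in ("nop", "exit"):
            return False, f"unknown opcode at {pc}"
    return True, "ok"


if __name__ == "__main__":
    print(toy_verify([("jmp", 1), ("nop",), ("exit",)]))   # forward jump
    print(toy_verify([("nop",), ("jmp", -2), ("exit",)]))  # backward jump
```

Because execution can only move forward and the last instruction is `exit`, nothing the checker accepts can spin forever, which is the "hang" half of the guarantee in the quote.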

throw78311

I guess this is why everything is under Federation/default now, the old mess was annoying to work with.