DeepSeek open-sources DeepEP – a library for MoE training and inference
February 25, 2025 · helloericsf · 18 comments
- Efficient and optimized all-to-all communication
- Both intranode and internode support with NVLink and RDMA
- High-throughput kernels for training and inference prefilling
- Low-latency kernels for inference decoding
- Native FP8 dispatch support
- Flexible GPU resource control for computation-communication overlapping

X: https://x.com/deepseek_ai/status/1894211757604049133
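To make the last feature a bit more concrete, here is a minimal, generic CUDA sketch of computation-communication overlap using two streams, where an async host-to-device copy stands in for NVLink/RDMA dispatch traffic. This is only an illustration of the general pattern, not DeepEP's implementation; the kernel and buffer names are made up.

```cuda
// Illustrative only: generic overlap of "communication" (an async copy on one
// stream) with "computation" (a kernel on another stream). All names here are
// made up for the sketch; this is not DeepEP code.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummy_expert_compute(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;  // stand-in for expert FFN work
}

int main() {
    const int n = 1 << 20;
    float *recv_buf, *compute_buf, *host_buf;
    cudaMalloc(&recv_buf, n * sizeof(float));
    cudaMalloc(&compute_buf, n * sizeof(float));
    cudaMemset(compute_buf, 0, n * sizeof(float));
    cudaMallocHost(&host_buf, n * sizeof(float));  // pinned, so the copy is truly async

    cudaStream_t comm_stream, compute_stream;
    cudaStreamCreate(&comm_stream);
    cudaStreamCreate(&compute_stream);

    // The copy (standing in for dispatch traffic) overlaps with the compute
    // kernel because the two run on independent streams.
    cudaMemcpyAsync(recv_buf, host_buf, n * sizeof(float),
                    cudaMemcpyHostToDevice, comm_stream);
    dummy_expert_compute<<<(n + 255) / 256, 256, 0, compute_stream>>>(compute_buf, n);

    cudaStreamSynchronize(comm_stream);
    cudaStreamSynchronize(compute_stream);
    printf("overlap done\n");

    cudaFree(recv_buf);
    cudaFree(compute_buf);
    cudaFreeHost(host_buf);
    cudaStreamDestroy(comm_stream);
    cudaStreamDestroy(compute_stream);
    return 0;
}
```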
ofou
You gotta love these guys; they're really pushing the open-source frontier for all of us. Thanks for sharing.
grg0
Open AI™ (with a space)
hackit2
Kind of ironic that DeepSeek is more Open than ChatGPT
gostsamo
They do it for their own reasons, but OpenAI are straight up liars: they are neither open nor do they give a fuck about humanity.
echelon
I hope you're reading this, Sam Altman:
Make Open AI open.
Or else you'll lose to the ecosystem.
deyiao
Now it includes the highly anticipated PTX! Of course, I don't understand it, but I've already clicked the star and even the fork button, which basically means I've mastered it, right? I feel incredibly powerful right now...
deyiao
Is the PTX that everyone was looking forward to included this time?
find0x90
Yes, there's some in the csrc/kernels directory. Search for 'asm' to find uses of it.
Bimos
The PTX instructions they talked about in the tech report should be referring to this code, right?
zardinality
"For extreme performance, we discover and use a behavior-out-of-doc PTX instruction: ld.global.nc.L1::no_allocate.L2::256B. This instruction will lead to an undefined behavior: accessing volatile GPU memory with non-coherent read-only PTX modifiers .nc. But the correctness is tested to be guaranteed with .L1::no_allocate on Hopper architectures, and performance will be much better. If you find kernels not working on some other platforms, you may add DISABLE_AGGRESSIVE_PTX_INSTRS=1 to setup.py and disable this, or file an issue."
rvz
Round 2 of open-source releases from an actual "Open AI™" company, licensed under MIT.
Once again, DeepSeek is more open than the $157B+ company that claims to be "Open".
Almost no one is talking about Meta's Llama, and everyone should expect them to release Llama 4 with reasoning.
The objective is to not be squeezed in the middle of the race to zero.
I feel like a kid in a candy shop. Some of these tricks would take way too long to reverse-engineer correctly based on the papers. I hope that the releases this week start a renaissance in the use of MoE as a baseline for academic models.