DualPipe: Bidirectional pipeline parallelism algorithm

xnhbx

> DualPipe was created and developed by Jiashi Li and Chengqi Deng and Wenfeng Liang.

A CEO who codes.

anonzzzies

When my company was still working closely with CN factories a few years ago (before the bans / clients no longer wanting to work with companies working with China, etc.), the CEOs of the factories we worked with had all been electronic engineers at that company or another before; they all could jump in, debug schematics, solder, and write firmware themselves. And they did. These were places with massive campuses with towering buildings full of robots and a few (relative to the massive space) employees doing maintenance etc. + prototyping.

larodi

It sounds so much more reasonable to have a director who is actually technical, doesn't it? I'm absolutely amazed how this (in the East) contrasts with the understanding (in the West) that directors need to know finance, strategic planning, and marketing rather than the actual nuance of the work.

tway223

To be blunt, this is exactly what is wrong with the “leadership” mindset in the West: decisions are often made without understanding the “nuances”, yet the people making them are confident they will work.

tantalor

"developed" and "codes" have different meanings.

ikeashark

Yes, but in this context they are very close to each other in meaning.

Besides, Liang does indeed code a significant amount and has contributed to almost all of their published papers.

danielhanchen

I attached all 3 algorithms 1F1B (1 forward 1 backward), ZB1P (zero bubble pipeline parallelism) and DualPipe as a picture here: https://x.com/danielhanchen/status/1894937006352031832 for those interested :)

Bimos

Maybe add Chimera as well?

https://arxiv.org/pdf/2107.06925

isoprophlex

It looks as if Chimera has marginally fewer bubbles than DualPipe?

danielhanchen

Oh more nice pictures :)

alphan0n

Off topic, but this is the Rick and Morty episode where Rick creates a perfectly level space.

The symmetry is uuugh.

danielhanchen

You'll have to refresh my memory :) Is there a YouTube clip of it?

Cyphase

https://www.youtube.com/watch?v=-MwCJpEuC44

puppycodes

Sorry, for us utter simpletons: can someone explain what it does?

fasterergpes

It makes it so that having more GPUs makes training run faster. The worst case has been that you can only use their memory and gain no speed at all.
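
To make the "no speed at all" worst case concrete, here is a minimal sketch. It assumes an idealized pipeline in which a model is split into equal stages, one per GPU, each stage takes one time step per micro-batch, and communication is free. The function names are hypothetical, and this models a plain pipeline schedule, not DualPipe itself.

```python
def pipeline_steps(stages: int, micro_batches: int) -> int:
    """Time steps until the last micro-batch leaves the last stage."""
    return stages + micro_batches - 1


def utilization(stages: int, micro_batches: int) -> float:
    """Fraction of the time each GPU is busy (the rest is 'bubble')."""
    # Each GPU works for exactly one step per micro-batch.
    return micro_batches / pipeline_steps(stages, micro_batches)


def speedup_over_one_gpu(stages: int, micro_batches: int) -> float:
    """Speedup versus one (hypothetical) GPU large enough to hold every stage."""
    # A single GPU would need one step per stage per micro-batch.
    single_gpu_steps = stages * micro_batches
    return single_gpu_steps / pipeline_steps(stages, micro_batches)


if __name__ == "__main__":
    for m in (1, 4, 16, 64):
        print(f"4 GPUs, {m:>2} micro-batches: "
              f"utilization={utilization(4, m):.2f}, "
              f"speedup={speedup_over_one_gpu(4, m):.2f}x")
```

With a single micro-batch, only one GPU is ever busy, so four GPUs give a speedup of exactly 1x: the extra cards contribute memory and nothing else. Splitting the batch into more micro-batches shrinks the idle fraction, and schedules such as 1F1B, ZB1P, and DualPipe are different ways of shrinking it further.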

qrios

In very simple words: it is one way to reduce the white squares in the picture from @danielhanchen[1].

In more complex words: imagine a processor that needs 10 clock cycles to process each instruction, but that can also accept a new input on every clock cycle and start processing it in a pipeline. After the first input you have to wait ten clock cycles for its output. But if you keep feeding the input line on every cycle, you will also get an output on every cycle from then on.

In the case of GPUs, it is no longer a matter of a single pipeline but of multiple pipelines in parallel. Depending on your data and algorithm, there can be thousands in parallel.

[1] https://x.com/danielhanchen/status/1894937006352031832
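
The processor analogy above can be sketched as a tiny cycle-by-cycle simulation. This is an illustration under simple assumptions (every stage takes exactly one cycle, one new input can enter per cycle, nothing stalls), and `simulate_pipeline` is a hypothetical helper name, not part of any library.

```python
from collections import deque


def simulate_pipeline(inputs, depth=10):
    """Return (item, cycle_finished) pairs for a `depth`-stage pipeline.

    Items must not be None, because None marks an empty stage.
    """
    stages = [None] * depth          # stages[0] is the first stage
    pending = deque(inputs)
    finished = []
    cycle = 0
    while pending or any(item is not None for item in stages):
        cycle += 1
        # Everything advances one stage; a new input (if any) enters stage 0.
        new_item = pending.popleft() if pending else None
        stages = [new_item] + stages[:-1]
        # Whatever sits in the last stage completes at the end of this cycle.
        if stages[-1] is not None:
            finished.append((stages[-1], cycle))
    return finished


if __name__ == "__main__":
    for item, cycle in simulate_pipeline(["i0", "i1", "i2", "i3", "i4"]):
        print(f"{item} finished at cycle {cycle}")
```

The first result appears only after the full ten-cycle latency, and every later result follows one cycle after the previous one. The empty stages before the pipeline fills and after the inputs run out correspond to the white squares in the picture.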

optimalplusone

I hope all the open-sourcing DeepSeek is doing encourages American labs to do more of the same. Surely they'll realize their momentum is more of a moat than their tech at any one point in time.

jpcom

Does this remind anyone else of the Pied Piper compression algorithm?

aqueueaqueue

Middle out or something?

snake_doc

Hmm, wasn't the SM re-allocation also supposed to be part of this? It doesn't look like it was included; I may be misremembering the explanation.
