Diffusion Beats Autoregressive in Data-Constrained Settings

smokel

I fail to understand why we would lack data. Sure, there is limited (historical) text, but if we just open up all available video and send interactive robots out into the world, we'll drown in data. Then there is simulated data, and tons of sensors that can capture vastly more.

Edit: from the source [1], this quote pretty much sums it all up: "Our 2022 paper predicted that high-quality text data would be fully used by 2024, whereas our new results indicate that might not happen until 2028."

[1] https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-...

macawfish

I have a feeling this technique might make waves: https://openreview.net/forum?id=c05qIG1Z2B#discussion
