Tom and Jerry One-Minute Video Generation with Test-Time Training
19 comments
April 8, 2025
quantumHazer
really impressive work considering the reported size of the model and training hours.
trunch
50+ hours on 256 H100s is considered impressively low training?
Really makes me wonder if any of this incredibly computationally expensive research is worth it. It seems useful only for promising a future in which humans are given less opportunity to express themselves creatively, while being delivered an infinitely producible amount of AI-generated 'content' to passively consume.
skyyler
>Really makes me wonder if any of this incredibly computationally expensive research is worth it
I'm wondering the same thing. 256 H100s were hot for two days straight to be able to make short clips of cartoons that almost don't look like shit?
It just isn't compelling to me.
burgrkng
Compared to the resource costs for humans to prop up the industry, a handful of DCs that can do this and still improve is cheap.
Work phones, laptops, personal stuff. We duplicate a lot of resource use for one person to have a career.
There will still be pencil and paper. There are still creative things to do. Do we even get that these days? Where’s our generation's LOTR or Star Wars? Yep, just prequels and sequels of the same old.
Are we that creative, copy-pasting and running git pull on deps someone else maintains? IT is librarian work these days. Little in the day-to-day is novel creativity.
Your argument is not a compelling one. It feels like a hand-wavy nod to a human soul, while ignoring that we all complain about soul-crushing jobs capturing so much of our agency and sucking the fun out of life, since it’s just the same todos on a different day. It's not that creative, and we tacitly notice and complain but keep doing it.
It’s a really lame circular routine, and it's my lived experience being around my peers these days: "Oh, I hate my job, but this new thing is an abomination and an affront to my chosen job. I’m gonna be someone someday! Don’t take it away! Unicorn! Disrupt!"
altcognito
So, costs roughly 15k?
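A back-of-the-envelope sketch of what that guess implies, using only the figures quoted in this thread (50+ hours, 256 H100s, roughly 15k). Reading "15k" as US dollars is an assumption, and no actual rental price is stated anywhere here; the rate below is just what the guess works out to per GPU-hour.

```python
# Figures quoted in the thread: "50+ hours on 256 H100s" and "roughly 15k".
HOURS = 50                 # lower bound; the thread says "50+"
NUM_GPUS = 256
QUOTED_COST_USD = 15_000   # the "15k" guess, assumed to be US dollars


def gpu_hours(hours: float, num_gpus: int) -> float:
    """Total GPU-hours consumed by a run."""
    return hours * num_gpus


def implied_rate(cost_usd: float, hours: float, num_gpus: int) -> float:
    """Price per GPU-hour that the quoted total cost would imply."""
    return cost_usd / gpu_hours(hours, num_gpus)


total = gpu_hours(HOURS, NUM_GPUS)
rate = implied_rate(QUOTED_COST_USD, HOURS, NUM_GPUS)
print(f"{total:,.0f} GPU-hours, implied rate ${rate:.2f} per GPU-hour")
```

Since the run is quoted as "50+" hours, the GPU-hour total is a floor and the implied per-hour rate is a ceiling.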
quantumHazer
Sorry, you're right lol. I'm just accustomed to other major labs' gazillions of hours of training.
soupfordummies
Reading the prompts reminds me of this interesting short story from Steven Millhauser called "Cat 'N Mouse"[1]
Would be really cool to just use this (or parts of it) as one of the prompts and see what results.
[1] - https://www.newyorker.com/magazine/2004/04/19/cat-n-mouse
keiferwiseman
Looks pretty bad, but considering this was impossible a couple years ago (as far as I know), it’s very impressive progress.
onemoresoop
Aside from memes I do not see the progress value.
blamarvt
Are you saying you don't see the value in video generation? The potential for unlimited high quality and customizable content generation?
quantumHazer
who said that personalised, infinite content generation is a good thing? I watch movies and listen to music because I want to be challenged in some way. I don’t want tailored content that prevents me from exploring new territory and keeps me trapped in a personalised echo chamber.
andy12_
The main progress value is that Test-Time Training appears to work very well in practice. I think that as labs begin to test it at scale in LLMs, it will become commonplace in next-generation models.
onemoresoop
Sure, I'm not saying they’re not useful tools, but let’s not buy into the hype and pretend they’re some silver bullet. I'm aware they’ll change how we do programming and other tasks, but I don’t think they’ll completely displace human thinking. As for art, I’m not sure artists will cease to exist either. Unless we as a species cease to exist, but then what is all this progress for?
This is by no means a comment about the quality of the project, but my god it's very uncanny in some frames. I feel like this would open up a lot of doors to creepypasta content. I'd love to play around with this