TScale – distributed training on consumer GPUs
19 comments · May 4, 2025 · zitterbewegung
mdaniel
I suspect this was prematurely published to HN and was in fact just someone's weekend project
https://github.com/Foreseerr/TScale/blob/aa2638c53c74dd33280...
https://github.com/Foreseerr/TScale/blob/aa2638c53c74dd33280...
and I struggle to think of what would lead one to the urge to implement a key=value config file parser in 2025 https://github.com/Foreseerr/TScale/blob/aa2638c53c74dd33280...
On top of that, people who do $(git add . && git commit -myolo) drive me crazy https://github.com/Foreseerr/TScale/blob/main/logs/125m_1T_f...
comex
> and I struggle to think of what would lead one to the urge to implement a key=value config file parser in 2025
C/C++ culture never changes.
As many new build tools and package managers as people come up with, the ‘default’ environment is still one where adding dependencies is hard, so people roll their own utilities instead.
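For what it's worth, a hand-rolled key=value reader is about fifteen lines of standard C++, which is exactly why people keep writing them instead of adding a dependency. A minimal sketch (not TScale's actual parser, just an illustration of the genre):

    #include <fstream>
    #include <map>
    #include <string>

    // Minimal key=value config reader -- the kind of utility C++ projects
    // hand-roll rather than pull in a dependency. A sketch, not TScale's
    // actual parser.
    std::map<std::string, std::string> ReadConfig(const std::string& path) {
        std::map<std::string, std::string> cfg;
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line)) {
            if (line.empty() || line[0] == '#') continue;  // skip blanks and comments
            auto eq = line.find('=');
            if (eq == std::string::npos) continue;         // ignore malformed lines
            cfg[line.substr(0, eq)] = line.substr(eq + 1);
        }
        return cfg;
    }

No new build-system entry, no dependency audit; the trade-off is that every project ends up with its own slightly different config dialect.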
fizx
What is this 1T index technique they seem so hyped about?
emorning3
> In this case we build a model with 1T index which we lookup for every token to make prediction with much smaller model.
This index seems to be used to minimize the size of models.
I'm familiar with term indexing as described in The Handbook of Automated Reasoning and I imagine that this index helps them recognize 'generalizations'.
In the same way that a single rewrite rule can reduce an infinite number of expressions, not just one, a generalization can stand in for many concrete entries and so shrink the model.
Generally, such an index would be some kind of prefix tree.
Just a guess; guessing is fun.
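To make the guess concrete: if that's right, the core operation would be a longest-suffix lookup over recent tokens in a trie. A hypothetical sketch (all names and the payload layout are my assumptions, not TScale's code):

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Hypothetical prefix-tree over token sequences. Paths are keyed
    // newest-token-first, so a root-to-node path spells out a suffix of the
    // context; each node stores a prediction payload for that suffix.
    // This illustrates the guess above, not TScale's actual design.
    struct TrieNode {
        std::unordered_map<int32_t, TrieNode*> children;  // token id -> child
        int32_t predicted_token = -1;                     // -1: no payload here
    };

    // Walk backwards from the newest token and return the payload at the
    // deepest matching node, i.e. the longest context suffix in the index.
    int32_t LookupLongestSuffix(const TrieNode* root,
                                const std::vector<int32_t>& context) {
        const TrieNode* node = root;
        int32_t best = -1;
        for (auto it = context.rbegin(); it != context.rend(); ++it) {
            auto child = node->children.find(*it);
            if (child == node->children.end()) break;
            node = child->second;
            if (node->predicted_token != -1) best = node->predicted_token;
        }
        return best;  // -1 means a miss; fall back to the small model
    }

On a miss the small model would have to predict unaided, which would fit the quoted claim that the index lets a much smaller model make the prediction.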
TYMorningCoffee
Can the inference piece be partitioned over multiple hosts?
Edit: partitioned (or otherwise arranged algorithmically) in a way that overcomes the network bottleneck
Maxious
> prima.cpp is a distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices— laptops, desktops, phones, and tablets (GPU or no GPU, it’s all good). With it, you can run QwQ-32B, Qwen 2.5-72B, Llama 3-70B, or DeepSeek R1 70B right from your local home cluster!
happyPersonR
Pretty sure llama.cpp can already do that
TYMorningCoffee
I forgot to clarify: I meant dealing with the network bottleneck.
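To put rough numbers on it: with a layer-wise (pipeline) split, only one activation vector crosses the wire per token at each host boundary, so bandwidth is tiny and per-hop latency is what hurts. A back-of-envelope sketch with illustrative, unmeasured dimensions:

    #include <cstdio>

    // Back-of-envelope traffic for pipeline-partitioned inference: only the
    // activation vector at each host boundary crosses the network per token.
    // All numbers are illustrative assumptions, not measurements.
    int main() {
        const double hidden_dim = 8192;   // embedding width, ~70B-class model
        const double bytes_per_elem = 2;  // fp16 activations
        const double cuts = 3;            // 4 hosts -> 3 pipeline boundaries
        const double tokens_per_sec = 10; // target decode speed

        double per_token = hidden_dim * bytes_per_elem * cuts;  // bytes
        double sustained = per_token * tokens_per_sec;          // bytes/s
        std::printf("per token: %.0f KiB, sustained: %.0f KiB/s\n",
                    per_token / 1024.0, sustained / 1024.0);    // 48, 480
    }

At roughly 48 KiB per token that is home-LAN friendly; the serial token-by-token dependency is what makes round-trip latency, not bandwidth, the real constraint.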
revskill
Interesting that you put the code in a code folder, not src.
ArtTimeInvestor
Even with consumer GPUs, the AI stack is completely dependent on ASML, isn't it?
Thought experiment: What would happen if the Dutch government decided that AI is bad for mankind and shuts down ASML? Would the world be stuck in terms of AI? For how long?
bgnn
That's a silly thought. ASML isn't controlled by the Dutch government.
Also, everything in computing is dependent on semiconductors. ASML is just one player. There are tens of thousands of companies involved in the industry, and some of them are sole suppliers of critical materials, machines, or software. It's wrong to single out ASML.
mschuster91
> ASML isn't controlled by the Dutch government.
Of course they are. It was the Dutch government that ordered ASML not to export its newest machines to China.
wokkel
Actually, it was the USA pressuring the Dutch government.
TechDebtDevin
ASML publishes most of the research, and there's not much stopping people from building their own EUV lithography machines. It's just very, very hard: basically the equivalent of doing magic. China is making incredible progress on this front.
airstrike
The problem with these things is that there are always trade secrets that aren't published anywhere. So you'd need to actually hire people with specific knowledge to be able to replicate it.
The world (and the West specifically) definitely needs to build redundancy ASAP here.
I'm trying to run this, but fo.cpp doesn't exist in the repository. I made an issue: https://github.com/Foreseerr/TScale/issues/1