LegoGPT: Generating Physically Stable and Buildable Lego
142 comments · May 9, 2025
benterix
> to generate and optimize possibilities within a set of domain-specific constraints
Well, yes, we've been doing this for several decades, many people call it metaheuristics. There is a wide array of algorithms in there. An excellent and light intro can be found here: https://cs.gmu.edu/~sean/book/metaheuristics/
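For a taste of how small these algorithms can be, a bare-bones stochastic hill-climber looks roughly like this (the objective below is a made-up toy function, not anything from the book):

```python
import random

def hill_climb(objective, x0, step=0.1, iters=10_000):
    """Minimal stochastic hill-climber: propose a random tweak,
    keep it only if the grade improves."""
    best_x, best_score = x0, objective(x0)
    for _ in range(iters):
        candidate = [xi + random.gauss(0, step) for xi in best_x]
        score = objective(candidate)
        if score > best_score:  # maximisation
            best_x, best_score = candidate, score
    return best_x, best_score

# Toy objective with a peak at (1, -2); any "grade a candidate" function works here.
print(hill_climb(lambda x: -((x[0] - 1) ** 2 + (x[1] + 2) ** 2), [0.0, 0.0]))
```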
eurekin
Metaheuristics? I always thought it was something like "I don't know how many neurons to put in the hidden layer... and I also don't know how many hidden layers I need, so let's make it part of the optimisation problem and find out on its own".
PeterStuer
That is usually called Hyperparameter tuning.
benterix
As for hyperparameter tuning, the existing solutions such as Optuna or Katib (in KubeFlow) also use metaheuristics, e.g. CMA-ES.
haberman
The description in your link says:
> What is a Metaheuristic? A common but unfortunate name for any stochastic optimization algorithm intended to be the last resort before giving up and using random or brute-force search. Such algorithms are used for problems where you don't know how to find a good solution, but if shown a candidate solution, you can give it a grade.
That sounds like "the AI came up with a solution where cars can crash, let's give that solution a bad grade."
I was hoping for something more like "the problem is specified such that invalid solutions aren't even representable, so only acceptable solutions are considered."
kmacdough
> I was hoping for something more like "the problem is specified such that invalid solutions aren't even representable, so only acceptable solutions are considered."
How on earth would one come up with a model where "crashing cars isn't representable"? I don't think you recognize how ill-defined and nonsensical this expectation is, especially when you consider that such a car may encounter a situation where a crash is unavoidable, where there's certainly room for damage control. Sliding scales ALWAYS work better for optimization anyway, since regression is so powerful.
dvfjsdhgfv
> I was hoping for something more like "the problem is specified such that invalid solutions aren't even representable, so only acceptable solutions are considered."
It may (roughly) work this way. For example, when you do hyperparameter tuning, you specify upper and lower bounds (so that "invalid solutions aren't even representable").
The problem is, you often have no idea what will work and what won't, and e.g. your HPO algorithm might hit the bounds, suggesting that it might make sense to extend them before the next run.
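For illustration, a rough sketch of such bounded search with Optuna (the parameters and objective here are invented, and the CMA-ES sampler assumes the separate `cmaes` package is installed):

```python
import optuna

def objective(trial):
    # The bounds make out-of-range values unrepresentable to the sampler.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Stand-in for "train a model and return its validation score".
    return -abs(lr - 1e-3) - abs(dropout - 0.2)

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.CmaEsSampler(),  # CMA-ES, as mentioned above
)
study.optimize(objective, n_trials=50)
print(study.best_params)  # if a best value sits at a bound, consider widening the range
```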
jllyhill
Thanks, but by some strange coincidence this is exactly the book I have right now. In the introduction the author says, "I think these notes would best serve as a complement to a textbook". Do you happen to know any good textbooks on that topic?
benterix
Everybody has their own preferences; what worked for me was Metaheuristics: From Design to Implementation by this guy:
mzl
Or more generally the whole field of combinatorial optimization, of which metaheuristics is a (small) part.
dvfjsdhgfv
I believe you are right in principle regarding the small part. However, my personal impression is that in practical applications metaheuristics are huge (although these things are hard to quantify).
lolinder
A simple version of this that already shines with existing LLMs is JSON Schema mode. You can go quite a long way towards making illegal states unrepresentable and then turn a model loose in the constrained sandbox, with the guarantee that anything it produces will be at least valid if not correct: it's basically type safety for LLM output.
The same mechanism that underlies JSON Schema support can be applied to any sort of validation and correction, and yeah, I'd love to see more of this kind of thing!
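As a toy illustration of that "type safety" idea (the schema below is invented, and real JSON Schema mode enforces this during decoding rather than after the fact), a post-hoc check with the `jsonschema` package looks like this:

```python
import json
from jsonschema import validate, ValidationError

# Hypothetical schema: the model may only emit this shape of object.
schema = {
    "type": "object",
    "properties": {
        "action": {"type": "string", "enum": ["add_brick", "remove_brick"]},
        "x": {"type": "integer", "minimum": 0, "maximum": 31},
        "y": {"type": "integer", "minimum": 0, "maximum": 31},
    },
    "required": ["action", "x", "y"],
    "additionalProperties": False,
}

llm_output = '{"action": "add_brick", "x": 3, "y": 7}'  # pretend this came from the model

try:
    validate(instance=json.loads(llm_output), schema=schema)
    print("valid")
except (ValidationError, json.JSONDecodeError) as err:
    print("reject / re-sample:", err)
```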
zelos
You'd probably use some kind of MILP or CLP based model for that kind of thing, wouldn't you? The constraints define the search space and the solver algorithm then explores it.
Narew
I haven't read how they apply the constraints, but there is similar stuff when you force an LLM to generate structured output like JSON. llama.cpp allows matching a custom grammar, for example.
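For example, a tiny GBNF grammar, sketched here through llama-cpp-python's grammar support (the model path is a placeholder; treat the exact API as an assumption):

```python
from llama_cpp import Llama, LlamaGrammar

# Tiny GBNF grammar: the model may only emit "red", "yellow" or "green".
gbnf = r'''
root  ::= color
color ::= "red" | "yellow" | "green"
'''

llm = Llama(model_path="model.gguf")  # placeholder path
grammar = LlamaGrammar.from_string(gbnf)
out = llm("What colour is a ripe tomato? Answer with one word: ",
          grammar=grammar, max_tokens=8)
print(out["choices"][0]["text"])
```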
jgalt212
like Combinatorial Chemistry, but we should probably just call it AI Chemistry for the likes.
dvfjsdhgfv
Not just for the likes, for money. It looks like whatever smart algorithms you use, if you slap "AI" on it, you're more likely to get investment (if that's what you're after).
londons_explore
Fun thing to try:
Ask an LLM: "Say the word APPLE", but modify the code so the logits of the tokens for Apple/apple/APPLE are permanently set to -Inf, i.e. the model cannot say that word.
The output ends up like this:
"Banana. Oh, just kidding. Banana. Oh, it's so tasty I said it wrong. Lets try again: Orange. Whoops, I meant to say grape. No I meant to say the tasty crunchy fruit known as a carrot".....
londons_explore
Note that OP's traffic light problem would suffer the same problem.
I.e. a smart model, knowing it cannot say a word, will give the next best solution - for example maybe saying "A P P L E" or maybe "I'm afraid I'm not able to do that".
However, a constrained model does not know or understand its own constraints, so keeps trying to do things which aren't allowed - and even goes back and tries to redo these things which aren't allowed, because to the model it is a mistake which needs correcting.
adammarples
There's a whole field of constrained optimization and it doesn't really work like that, but they don't use LLMs.
jcims
Like your brain when you know you know a word but it's just not surfacing in your mind.
I'm guessing I'm not that different from the average human and I can 'feel' something physically while I'm searching for the word. I've always wondered what that was.
stavros
I saw this exact thing in a question about who was the first composer, the model kept outputting Boethius and then saying "NO!", as if it couldn't escape its own Freudian slips.
lgiordano_notte
Agree with this. Constraining generation with physics, legality, or even tooling limits turns the model into a search-and-validate engine instead of a word predictor. Closer to program synthesis.
The real value is upstream: defining a problem space so well that the model is boxed into generating something usable.
bob1029
Error feedback seems to be the one thing that can unlock some of the original promises.
For example, if you give a text-to-SQL bot access to the same idea (e.g., error feedback from the SQL provider), it is much more likely to succeed in generating valuable queries.
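A minimal sketch of that feedback loop, with a stand-in generate_sql function in place of the actual model call and sqlite3 as the SQL provider:

```python
import sqlite3

def generate_sql(question, error=None):
    """Stand-in for an LLM call. A real implementation would put the schema
    and, on retries, the previous error message into the prompt."""
    if error is None:
        return "SELECT namez FROM users"  # deliberately wrong first attempt
    return "SELECT name FROM users"       # "fixed" after seeing the error

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")

error = None
for attempt in range(3):
    sql = generate_sql("List all user names", error)
    try:
        print(f"attempt {attempt + 1} ok:", conn.execute(sql).fetchall())
        break
    except sqlite3.Error as e:
        error = str(e)  # feed the provider's error message back to the model
        print(f"attempt {attempt + 1} failed:", error)
```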
sschueller
This is probably going to get a letter from LEGO's lawyers.
If you want to be safe, do not use the word LEGO. Use "bricks", or in German, "Klemmbausteine".
Many people have had to deal with LEGO's lawyers and it ain't pretty.
necovek
They are actively using actual LEGO bricks, and as such they are not misrepresenting anything.
Where there is gray area is in them not clearly stating they are unaffiliated with LEGO the company.
OTOH, they also don't seem to be looking to monetize anything, so there's less risk of LEGO having a plausible claim that they are hurting its sales.
dec0dedab0de
While it is perfectly valid to describe what they made as a designer or builder for LEGO, I do not believe they are allowed to use part of a trademark in a way that could be trademarkable itself, so basically good for everything but the name.
But then again IANAL, and that is just how I understand the American law, and every country is different.
6stringmerc
This is an incredibly ignorant perspective on the nature and intent of trademark law and I’m hopeful you will learn about reality one way or another. As the saying goes, your feelings don’t matter in court.
KurSix
Even YouTubers and small hobby sites have gotten takedown notices just for using the name in the wrong context
amelius
This is academic research, and I suppose it falls under fair use.
msiebuhr
IANAL, but EU law doesn't have "fair use". It does have a _very specific_ set of uses where you don't have to ask for permission (or pay). As I understand, it is more limited than the US' "fair use" doctrine.
EU being EU, I can only imagine there's a bunch of particular rules around research that may or may not work in the authors' favor.
6stringmerc
Fair Use is Copyright and has a four factor test. Trademark is different than Copyright. Perhaps learning the difference might be educational…and fiscally prudent.
amelius
Fair Use != Fair use
https://www.inta.org/fact-sheets/fair-use-of-trademarks-inte...
andrewchilds
Meanwhile, Amazon is perfectly happy to sell/promote this "Boy and Tiger Adventure Blocks Set" which simultaneously rips off LEGO and Calvin & Hobbes: https://www.amazon.com/Adventure-Rotatable-Compatible-Charac...
Retric
Ripping off Calvin and Hobbes, sure, but not Lego.
https://en.wikipedia.org/wiki/Hilary_Page#Kiddicraft
LEGO is based on earlier designs of interlocking bricks; they are well known because they got really good at affordability, high tolerances, and durability, not because they invented the concept. Beyond that, the original functional patents have long expired.
ChrisRob
Immediately thought the same thing! This will get busted very soon
edoceo
Why are they like Nintendo when they could be like Sega? Embrace your community where they are.
Freak_NL
Trademark law leaves no space for that. The Lego Group has to actively defend their trademark. That means a name like LegoGPT is really on the obvious end of 'don't do that'.
MrOrelliOReilly
Completely agree. This should be well beyond accusations of corporate bullying. It's one thing to mention Legos, it's another to actively include a brand name in your product! NikeGPT, CocaColaGPT and IkeaGPT will face the same issue ;)
anticensor
European trademarks are registration-compulsory, hence the required extent of defence is significantly weaker than that required for keeping common-law trademarks.
cluckindan
The registered trademark is LEGO, in all caps.
Also, they don't tend to go after fan-made things like this; based on some googling, they typically throw the book at counterfeit producers who are eating into their profits.
makeitdouble
Sega's [0] main business is pachinko (so gambling). To them, the Sonic brand being used by fans has very few consequences, if it isn't outright building much-needed goodwill toward their other brands.
That's where Nintendo is fundamentally different.
MrsPeaches
Interesting that many of Sega’s games are now mobile focused.
Also don’t forget that Sega was “originally an importer of coin-operated arcade games to Japan and manufacturer of slot machines and jukeboxes”
foobahhhhh
Did that start with that merger in 2004, so that back in the Sonic heyday it wasn't into gambling?
andrewchilds
TIL the pachinko connection perfectly explains the visual/sound/game design of the Sonic games.
ygouzerh
They probably have a culture of "patents".
They just won the market because, historically, they reused the existing interlocking-brick concept from a company called Kiddicraft, found a way to make it lock better... and patented it before the original company and other companies could implement it.
We can say that they became famous half for engineering reasons, and half thanks to their legal department...
Perz1val
> Embrace your community where they are.
In the casinos?
raverbashing
(Not saying it's related) But, which one of those are still running?
vanderZwan
Both are. Sega just lost a console war a few decades ago and decided not to pursue that any more.
Cthulhu_
Sega is generating more than $1.5 billion a year, they're fine.
Philpax
I'd go as far as to say that Sega's embracing of their fans is a big part of why they're still around: https://en.wikipedia.org/wiki/Sonic_Mania#Development
71bw
Both?
stevage
This does not seem like a very impressive result. It's using such a small set of bricks and the results don't really look much like the intended thing.
It feels like a hand-crafted algorithm would get a much better result.
KurSix
But I think the cool part here isn't photorealism, it's the combo of language understanding and physical buildability
tokai
The (fake) texturing is the only thing making it somewhat work. As normal colored bricks it would just be lumps of lego.
otabdeveloper4
What we need is an AI where you feed it some photos of your pile of bricks and it generates you instructions based on the bricks you have.
(Totally feasible with today's technology, but you'll need to train your own specialized models.)
dspillett
There already exists an app that will, from photos of your pile, pick out models you can make from a large library of existing models. Though IIRC that has been around long enough that it isn't quite using what people are currently calling AI (instead using older ML techniques for brick identification, and a basic DB search to pick out the valid plans for the resulting list of bricks).
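The DB-search half is essentially a multiset-containment check; a toy sketch with invented part numbers and model names:

```python
from collections import Counter

# Hypothetical output of the brick-recognition step: part id -> count.
inventory = Counter({"3001": 8, "3003": 12, "3020": 4})

# Hypothetical model library: name -> required parts.
library = {
    "small car":  Counter({"3001": 4, "3020": 2}),
    "tiny house": Counter({"3001": 10, "3003": 6}),  # needs more 3001 than we have
}

buildable = [name for name, parts in library.items()
             if not parts - inventory]  # Counter subtraction drops satisfied requirements
print(buildable)  # ['small car']
```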
Spellman
Already exists, put out by LEGO https://brickit.app/
Loughla
My experience with that is that it gives you unbelievably simplistic builds, not the complicated things you might be imagining.
amelius
What I'd be interested in most is a robot that can assemble a model from a pile of bricks/parts.
nkko
this is very cool considering it is a fine-tuned 1B model
jader201
There's a bug on the page (on iPhone, at least): once you scroll to the gifs, it starts auto-loading them without you doing anything, making it really hard to navigate anywhere at that point.
Aeolun
When will people finally learn to never autoplay.
vachina
Autoplay is fine; it's Safari opting to autoplay in FULLSCREEN. Firefox et al. play them in their respective video containers.
pragmatick
The opposite for me on Firefox Desktop - I didn't realize they were gifs and wondered what the pictures were supposed to tell me.
MangoTec
this should be fixable with `playsinline` on the video element: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
annoying that this is the default behaviour on iOS though
RaSoJo
I don’t need automation to build LEGO sets — that’s the fun part, and I want to do it myself. What I need is automation after the build: to clean up, sort the bricks by color and shape, and store them properly.
I just wish scientists would start by solving problems that actually exist in the real world. There’s real value — and real money — in that.
shagie
Sorting 2 Metric Tons of Lego https://jacquesmattheij.com/sorting-two-metric-tons-of-lego/ (2017) https://news.ycombinator.com/item?id=14226889
World’s first AI-powered LEGO sorting machine built with 10,000 LEGO bricks https://www.brothers-brick.com/2019/12/06/this-fan-created-t... (2019) https://news.ycombinator.com/item?id=21741834
KurSix
You're totally right: sometimes the real innovation isn't in making the fun parts easier, it's in making the boring parts disappear
lee-rhapsody
The issue with solving real-world problems is that it distracts from publishing, which is all scientists are taught to care about.
joeyparsons
came here to write this comment. clean up and sorting is the top problem in lego land.
yathaid
This is super cool! The GIFs showing the object being built are just yummy; I have no other way to describe it.
If anyone else was searching for the dataset, it is at https://huggingface.co/datasets/AvaLovelace/StableText2Lego
It "contains 47,000+ different LEGO structures, covering 28,000+ unique 3D objects from 21 common object categories of the ShapeNetCore dataset".
Local inference instructions are over at their github page - https://github.com/AvaLovelace1/LegoGPT/?tab=readme-ov-file
psiops
I noticed that "a basic sofa" involves placing some floating bricks if built in the order of the animation. It hints at the way this model generates the designs. The automated assembly of generated LEGO structures using robots would have serious trouble creating these designs, I reckon.
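A crude sketch of the kind of build-order check that would catch this, with bricks simplified to axis-aligned boxes on a unit grid and "support" meaning the brick rests on the ground or touches an already placed brick one level above or below:

```python
def footprint(brick):
    """Cells covered by a brick; brick = (x, y, z, width, depth)."""
    x, y, z, w, d = brick
    return {(x + i, y + j) for i in range(w) for j in range(d)}, z

def build_order_ok(bricks):
    """Check that each brick, in placement order, touches the ground (z == 0)
    or connects to a brick placed one level directly above or below it."""
    placed = []
    for brick in bricks:
        cells, z = footprint(brick)
        supported = z == 0 or any(
            abs(z - pz) == 1 and cells & pcells for pcells, pz in placed
        )
        if not supported:
            return False
        placed.append((cells, z))
    return True

# The "floating seat" problem: a brick placed at height 1 before anything under it.
print(build_order_ok([(0, 0, 1, 2, 4), (0, 0, 0, 2, 4)]))  # False
print(build_order_ok([(0, 0, 0, 2, 4), (0, 0, 1, 2, 4)]))  # True
```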
sdoering
I came here to say that. I immediately thought: Wow, this works in the assembled version, but not the way the assembly is being animated. You would need to first build the base sofa layer from two levels so that the upper layer keeps the lower layer bricks in place. Only afterwards could it be put onto the legs.
paulluuk
Indeed, I would be very curious to see how their robots would actually build that sofa. Although the robots aren't really part of the model of course, they're just a little extra.
gilgoomesh
It's hilarious watching $50,000 worth of robots take so long to assemble a couple dollars worth of Lego. It's like peering into the old folks home for robots.
FirmwareBurner
That should tell you why stuff is still hand assembled in Asia instead of by robots in the west.
femto
As a counterexample, I offer a pick-and-place line in action.
https://youtu.be/Ca-SoKzjh4M?t=110
SMT component placement isn't that different to placing bricks. Conventional wisdom is that if you can design a PCB that requires no manual work, its assembly cost is more-or-less location independent. SMT pick and place can hit speeds of 200,000 components per hour [1]. That's about 50 components per second.
imtringued
Fixturing isn't automated in most places. Sure a gantry style CNC machine can drive screws vertically into your parts to join them, but it requires a human loader to put the two parts onto the fixture in the first place.
FirmwareBurner
The tasks requiring high dexterity like final assembly of the product with displays, keyboards, ribbon cables and cases is still done by humans by hand.
smikhanov
Also why it’s OK to stop worrying about our future robotic (or AI) overlords.
FirmwareBurner
Those are already an issue. AI is a bigger threat to cognitive tasks than to physical ones.
Skynet isn't gonna attack you with Terminators wielding a "phased plasma rifle in the 40W range"; it will be auto-rejecting your job application, your health insurance claims, and your credit score, and brainwashing your relatives on social media.
KurSix
Give it a decade and we'll probably have robo-builders doing it faster than we can blink…
cruffle_duffle
I would certainly hope the laundry robots come first. Screw Lego robots and self driving cars. Please just take the laundry out of the dryer, fold it all and put it away.
LargeWu
First AI is creating our art and pretty soon it will be playing with our toys for us too
kilimounjaro
Doesn't seem to add much over just converting a 3D model into voxels and therefore bricks.
Using bricks other than 2x2 and 2x4 blocks creatively to make interesting things is really important; I'm not sure what type of algorithm would best auto-generate beautiful MOCs, though. Was thinking of doing a $50,000 Kaggle comp for this, what do others think?
carstenhag
Have the authors never heard of Lego being one of the companies that are super strict about their trademark? They file takedown notices etc. on every project they see, even if the brick design merely has the little thingies on top/bottom...
W0lfEagle
Great. Please do cabinets next. Constrain to some specified material such as 2.5m by 1.25m 18mm ply. Iterate designs by text and output the model, cut list, and assembly instructions. Simple, right?
haberman
> To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints.
I'm far from an AI expert, but I've long felt that this is one of the most interesting ways to use AI: to generate and optimize possibilities within a set of domain-specific constraints that are programmed manually.
For example, imagine an AI that is designed to optimize traffic light patterns. You want a hard constraint that no intersection gives a combination of green lights that could cause collisions. But within that set of constraints, which you could manually specify, the AI could go wild trying whatever ideas it can come up with.
At that point, the interesting work is deciding how to design the problem space and the set of constraints. In this case it's a set of lego bricks and how they can be built (and be stable).
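To make the traffic-light example concrete, here is a toy sketch in that spirit: only conflict-free green combinations are ever enumerated, so the search (plain random search here, standing in for something smarter) can only ever score valid candidates. All the data below is invented:

```python
import random
from itertools import combinations

movements = ["N", "S", "E", "W"]
# Hard constraint: movements that must never be green at the same time.
conflicts = {("N", "E"), ("N", "W"), ("S", "E"), ("S", "W")}
conflict_set = {tuple(sorted(c)) for c in conflicts}

def conflict_free(greens):
    return all(tuple(sorted(p)) not in conflict_set
               for p in combinations(greens, 2))

# Enumerate the legal phase space once: invalid combinations simply don't exist in it.
phases = [g for r in range(1, len(movements) + 1)
          for g in combinations(movements, r) if conflict_free(g)]

def score(plan):
    """Stand-in for a traffic simulation: reward serving every movement,
    lightly penalise very long cycles."""
    served = {m for greens, _ in plan for m in greens}
    return 10 * len(served) - 0.1 * sum(duration for _, duration in plan)

best = None
for _ in range(5000):  # the search can go wild: every candidate is valid by construction
    plan = [(random.choice(phases), random.randint(5, 60)) for _ in range(3)]
    if best is None or score(plan) > score(best):
        best = plan
print(best)
```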