Type-constrained code generation with language models
51 comments
May 13, 2025
homebrewer
Hejlsberg mentioned the ability to quickly provide accurate type information to LLMs as one of the reasons for rewriting tsc in Go.
tough
But isn't TypeScript already a typed language to begin with?
habitue
This is about the speed with which the compiler can advise an LLM that a particular thing checks or doesn't check. TypeScript is much slower than Go.
tough
okay so basically faster compiling means a tighter feedback loop for the LLM to -know- whether the code compiles or not? interesting
is Go faster than Rust?
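For anyone picturing that feedback loop, here is a minimal sketch of it. Everything in it is an assumption for illustration: `fake_model` is a hypothetical stand-in for an LLM call, and Python's built-in `compile()` stands in for a real type checker (it only catches syntax errors, not type errors). The point is just that a faster checker means each round of this loop is cheaper.

```python
from typing import Callable, Optional, Tuple


def check(source: str) -> Optional[str]:
    """Return None if the source compiles, otherwise a short error message."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def generate_until_valid(
    generate: Callable[[Optional[str]], str], max_rounds: int = 3
) -> Tuple[str, int]:
    """Ask the generator for code, feeding checker errors back until it passes."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, round_no
    raise RuntimeError(f"still failing after {max_rounds} rounds: {feedback}")


def fake_model(feedback: Optional[str]) -> str:
    """Hypothetical model: gets it wrong first, fixes it once it sees an error."""
    if feedback is None:
        return "def add(a, b) return a + b"  # missing colon
    return "def add(a, b):\n    return a + b\n"


code, rounds = generate_until_valid(fake_model)
```

With this stub the first attempt is rejected and the second one passes, so `rounds` ends up as 2.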
cpfiffer
We (.txt, the outlines people) had a brief thread about this paper on twitter if you're interested: https://x.com/dottxtai/status/1922322194379551128
ArcaneMoose
I think TypeScript is uniquely positioned to be the optimal language for LLMs. Tons of training data (benefiting from all the JS examples as well) plus the structure of types for LLMs to follow and tools to enforce.
pram
LLMs work well with any static analysis tool. I frequently instruct Claude to use stuff like “go vet” and “deadcode” when it goes on a tear and writes a bunch of broken trash and declares mission accomplished.
koakuma-chan
> LLMs work well with any static analysis tool.
tsc error messages are so bad that every time my LLM sees one of those "SomeType is not assignable to SomeLongAssTypeDontEvenTryToUnderstandWhatsGoingOnHere<<<<>>>>>>>>>>>>>>>>>>>>" it just gives up and casts to any. goes for python too.
floydnoel
ha, that's always been my biggest gripe with ts
AaronAPU
I can’t be the only one who hopes this was a joke.
OutOfHere
There are languages that constrain types a lot more tightly than TypeScript, e.g. Kotlin, Rust, and Haskell. The more constrained the types, the more correct the program could be.
mindwok
Yep, and Rust famously goes beyond this by modelling memory ownership at compile time.
In fact, the more behaviour we can model at compile time the better when it comes to LLMs - there are some cool ideas here like transpiling Rust into languages for formal verification. See https://github.com/formal-land/coq-of-rust as an example.
Formal verification was one of those things that was previously so annoying to do that it rarely made it past academic use cases or extremely important libraries, but I think LLMs take the tedium out of it. Perhaps formal verification will have a "test driven development" type of moment in the sun thanks to this.
koakuma-chan
Can LLMs properly code in Rust yet? There is way more TypeScript code out there compared to Rust, and I doubt structured output can alleviate this.
muglug
Really cool results!
That this research comes out of universities, and not large AI labs, makes me think those labs believe that larger models are still the way to go.
aibrother
+1 this seems like healthy development
slt2021
we really need LLMs trained on ASTs instead of tokens. is there any research on this?
tough
ASTrust: Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations
https://arxiv.org/abs/2407.08983
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
https://arxiv.org/abs/2401.03003
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation
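To make the AST-vs-token distinction concrete, here is a small sketch using Python's standard `ast` module. It is not taken from any of the papers above; it only shows the kind of structured view of a program that a syntax-aware model could consume instead of a flat token stream. The sample function `area` is made up for the example.

```python
import ast

# A tiny program to inspect (illustrative only).
source = "def area(w, h):\n    return w * h\n"

# Parse the text into a tree of typed nodes.
tree = ast.parse(source)

# Flatten the tree into a list of node type names (breadth-first walk).
node_types = [type(node).__name__ for node in ast.walk(tree)]

# The tree carries enough structure to regenerate equivalent source text.
round_tripped = ast.unparse(tree)

print(ast.dump(tree, indent=2))
```

`ast.unparse` and the `indent` argument to `ast.dump` need Python 3.9 or newer.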
tough
The code can be found here: https://github.com/eth-sri/type-constrained-code-generation
notnullorvoid
The general idea seems very promising, I had been hoping someone would do something like this since seeing JSON schema structured outputs for LLMs.
Need to dig in a bit more on the implementation, but I was surprised that the paper didn't mention hooking into an existing language service/server. There's more than types that an LLM could leverage from existing language tooling. Auto imports are a good example: they're handy for a human developer to keep a linear writing flow, something an LLM needs even more.
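A toy sketch of the general idea, for readers who haven't seen constrained decoding before. This is not the paper's algorithm: the "language" is a made-up one-line assignment grammar, `TYPES` is a hypothetical type environment, and `fake_scores` is a stand-in for model logits. It only shows the mechanism of rejecting any next token that would make the prefix impossible to complete into a well-typed program.

```python
from typing import Callable, Dict, List

# Hypothetical type environment for the toy language.
TYPES = {"count": "int", "name": "str"}


def is_valid_prefix(tokens: List[str]) -> bool:
    """Accept prefixes of: total = <int operand> (+ <int operand>)* <end>."""
    for i, tok in enumerate(tokens):
        if i > 0 and tokens[i - 1] == "<end>":
            return False  # nothing may follow the end marker
        if i == 0:
            ok = tok == "total"
        elif i == 1:
            ok = tok == "="
        elif i % 2 == 0:
            # Operand slot: only int literals or int-typed variables.
            ok = tok.isdigit() or TYPES.get(tok) == "int"
        else:
            ok = tok in ("+", "<end>")
        if not ok:
            return False
    return True


def constrained_greedy(
    score: Callable[[List[str]], Dict[str, float]],
    valid: Callable[[List[str]], bool],
    max_len: int,
    end: str = "<end>",
) -> List[str]:
    """Greedy decoding that skips tokens which break prefix validity."""
    out: List[str] = []
    for _ in range(max_len):
        ranked = sorted(score(out).items(), key=lambda kv: kv[1], reverse=True)
        for token, _score in ranked:
            if valid(out + [token]):
                out.append(token)
                break
        else:
            raise ValueError("no candidate keeps the prefix valid")
        if out[-1] == end:
            break
    return out


def fake_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model that likes the str-typed variable too much."""
    table = [
        {"total": 1.0},
        {"=": 1.0},
        {"name": 0.9, "count": 0.6, "1": 0.2},
        {"+": 0.7, "<end>": 0.3},
        {"name": 0.8, "1": 0.5},
    ]
    return table[len(prefix)] if len(prefix) < len(table) else {"<end>": 0.9, "+": 0.1}


result = constrained_greedy(fake_scores, is_valid_prefix, max_len=10)
```

Unconstrained, this fake model would write `total = name + name`. With the filter in place the str-typed `name` is rejected in both operand slots and the output is `total = count + 1`.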
koakuma-chan
The vibe code society would benefit way more if libraries hosted their docs in a way that's easy to copy and paste into an LLM.
tough
many docs now include llms.txt https://llmstxt.org/
koakuma-chan
I saw that but it doesn't work for me. I use Gemini 2.5 Pro Preview right now, and it cannot fetch content from links. What I am looking for is a large text file with public API class, function, etc. signatures, plain text documentation and code examples.
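For Python libraries at least, a rough version of that text file can be generated locally. A minimal sketch, assuming the library is importable and using only the standard `inspect` module; `api_summary` is a name made up here, and the stdlib `json` module is used purely as a demo target.

```python
import inspect
import json


def api_summary(module) -> str:
    """List public functions and classes with signature and first doc line."""
    lines = []
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # skip private names
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue  # skip submodules, constants, etc.
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            signature = "(...)"  # some builtins expose no signature
        doc_lines = (inspect.getdoc(obj) or "").splitlines()
        first_doc_line = doc_lines[0] if doc_lines else ""
        kind = "class" if inspect.isclass(obj) else "def"
        lines.append(f"{kind} {name}{signature}  # {first_doc_line}")
    return "\n".join(lines)


summary = api_summary(json)
print(summary)
```

The output is plain text that can be pasted into a prompt. It only covers top-level names, so a real tool would also need to walk class methods and submodules.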
bmc7505
The correct way to do this is with finite model theory but we're not there yet.
compacct27
Honestly it's already working great in Cursor. Even adapting one type structure to another is quickly handled.
nikolayasdf123
nice. the speed of AI development is accelerating so fast
Also worth checking out MultiLSPy, effectively a Python wrapper around multiple LSPs: https://github.com/microsoft/multilspy
Used in multiple similar publications, including "Guiding Language Models of Code with Global Context using Monitors" (https://arxiv.org/abs/2306.10763), which uses static analysis beyond the type system to filter out e.g. invalid variable names, invalid control flow etc.
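A very rough sketch of what "filter out invalid variable names" can look like. This is not the monitor from that paper and does not use MultiLSPy or a language server; it is a simplification that collects defined names with Python's `ast` module and ignores scoping rules entirely. The function names and the sample source are hypothetical.

```python
import ast
import builtins
from typing import List, Set


def names_in_scope(source: str) -> Set[str]:
    """Collect names defined anywhere in the source, plus builtins (flat scope)."""
    names = set(dir(builtins))
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)  # assignment targets
        elif isinstance(node, ast.arg):
            names.add(node.arg)  # function parameters
        elif isinstance(node, ast.alias):
            names.add(node.asname or node.name.split(".")[0])  # imports
    return names


def filter_candidates(source: str, candidates: List[str]) -> List[str]:
    """Keep only candidate identifiers that refer to something defined."""
    known = names_in_scope(source)
    return [c for c in candidates if c in known]


context = "import os\nitems = [1, 2]\ndef total(xs):\n    return sum(xs)\n"
proposed = ["itmes", "items", "total", "len", "os", "undefined_thing"]
kept = filter_candidates(context, proposed)
```

Here the misspelled `itmes` and `undefined_thing` are dropped, while `items`, `total`, `len`, and `os` survive. A language server would give the same kind of answer with proper scoping.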