Launch HN: Relace (YC W23) – Models for fast and reliable codegen

53 comments

·May 27, 2025

Hey HN community! We're Preston and Eitan, and we're building Relace (https://relace.ai). We're trying to make building code agents easy and cheap.

Here’s an example of our apply model vs. whole file edits: https://youtu.be/J0-oYyozUZw

Building reliable code agents is hard. Beyond simple prototypes, any app with code generation in production quickly runs into two problems -- how do you reliably apply diffs, and how do you manage codebase context?

We're focused on solving these two problems at order-of-magnitude lower price and latency.

Our first model that we released, in February, is the Fast Apply model -- it merges code snippets with files at 4300 tok/s. It is more reliable (in terms of merge errors) than Sonnet, Qwen, Llama, or any other model at this task. Each file takes ~900ms and gives an instantaneous user experience, as well as saving ~40% on Claude 4 output tokens.

Our second model focuses on retrieval. For both vibe-coded and enterprise codebases, retrieving only the files relevant to a user request saves both on SoTA input token cost and reduces the number of times code agents need to view files. Our reranker (evals below) can scan a million-line codebase in ~1-2s, and our embedding model outperforms any other embedding model for retrieval as evaluated on a corpus of Typescript/React repositories.

There are many different ways to build coding agents, but being able to edit code reliably and retrieve the most relevant parts of the codebase is going to be a foundational issue. We're excited to be building ways to make it more accessible to millions of users who don't want to spend $$$ on Claude.

These models are used in production, millions of times per week. If you've used Lovable, Create.xyz, Magic Patterns, Codebuff, Tempo Labs then you've used us!

Here's a link to try it out: https://app.relace.ai, and here are our docs: https://docs.relace.ai.

We've opened up free access for prototyping on our website to everyone, and the limits should be enough for personal coding use and building small projects (correct us if it’s not). We integrate directly with Open-Source IDE's like Continue.dev. Please try us out, we'd love to hear your feedback!

Visit

jumploops

We looked into many different diff/merge strategies[0] before finding Relace.

Their apply model was a simple drop-in that reduced the latency of our UX substantially, while keeping error rates low.

Great work Preston and Eitan!

[0] https://aider.chat/docs/more/edit-formats.html

eborgnia

Thanks for the support!

bcyn

Very interested to see what the next steps are to evolve the "retrieval" model - I strongly believe that this is where we'll see the next stepwise improvement in coding models.

Just thinking about how a human engineer approaches a problem. You don't just ingest entire relevant source files into your head's "context" -- well, maybe if your code is broken into very granular files, but often files contain a lot of irrelevant context.

Between architecture diagrams, class relationship diagrams, ASTs, and tracing codepaths through a codebase, there should intuitively be some model of "all relevant context needed to make a code change" - exciting that you all are searching for it.

ankit219

I have a different pov on retrieval. It's a hard problem to solve in a generalizable format with embeddings. I believe this can be solved at a model level where its used to fix an issue. With the model providers (oai, anthropic) going full stack, there is a possibility they solve it at reinforcement learning level. Eg: when you teach a model to solve issues in a codebase, the first step is literally getting the right files. Here basic search (with grep) would work very well as with enough training, you want the model to have an instinct about what to search given a problem. similar to how an experienced dev has that instinct about a given issue. (This might be what the tools like cursor are also looking at). (nothing against anyone, just sharing a pov, i might be wrong)

However, the fast apply model is a thing of beauty. Aider uses it and it's just super accurate and very fast.

bcyn

Definitely agree with you that it's a problem that will be hard to generalize a solution for, and that the eventual solution is likely not embeddings (at least not alone).

cocoflunchy

Relevant interview extract from the Claude Code team: https://x.com/pashmerepat/status/1926717705660375463

> Boris from the Claude Code team explains why they ditched RAG for agentic discovery. > "It outperformed everything. By a lot"

ankit219

This is very cool. They explained the solution better than I did. If I knew, I would have just linked this :)

null

[deleted]

eborgnia

Adding extra structural information about the codebase is an avenue we're actively exploring. Agentic exploration is a structure-aware system where you're using a frontier model (Claude 4 Sonnet or equivalent) that gives you an implicit binary relevance score based on whatever you're putting into context -- filenames, graph structures, etc.

If a file is "relevant" the agent looks at it and decides if it should keep it in context or not. This process repeats until there's satisfactory context to make changes to the codebase.

The question is whether we actually need a 200b+ parameter model to do this or if we can distill the functionality onto a much smaller, more economical model. A lot of people are already choosing to do it with Gemeni (due to the 1m context window), and they write the code with Claude 4 Sonnet.

Ideally, we want to be able to run this process cheaply in parallel to get really fast generations. That's the ultimate goal we're aiming towards

darkteflon

This is very interesting. There was also an article and discussion a couple of days ago on using diffusion models for edit/apply tasks at ~2k tps[1].

If I understand correctly, the ‘apply’ model takes the original code, an edit snippet, and produces a patch. If the original code has a lot of surrounding context (e.g., let’s say you pass it the entire file rather than trying to assess which bits are relevant in advance), are speed and/or performance materially affected (assuming the input code contains no duplication of the code to be amended)?

Does / how well does any of this generalise to non-code editing? Could I use Relace Apply to create patches for, e.g., plain English markdown documents? If Apply is not a good fit, is anyone aware of something suitable in the plain English space?

[1] https://news.ycombinator.com/item?id=44057820

eborgnia

The diffusion approach is really interesting -- it's something we haven't checked out for applying edits just yet. It could work quite well though!

You can definitely use it for markdown, but we haven't seen anyone test it for plaintext yet. I'm sure it would work though, let us know if you end up trying it!

darkteflon

Perfect, thanks for the reply - I absolutely will try it, we have a specific need for this capability.

diggan

Looks interesting and useful if the accuracy numbers are as told. Kind of sad it's only available via a remote API though, makes the product more like a traditional SaaS-API. The marketing keeps talking about "models" yet the actual thing you use is only the API, would have been nice to be able to run locally. Although I do understand that it's harder to make money in that case.

I got curious about what datasets you used for training the models? Figured the easiest would be to scrape git repositories for commits from there, but seems there are also quality issues with an approach like that.

eborgnia

Open source git repos are a really good place to get data -- it requires a lot of munging to get it into a useful format, but that's the name of the game with model training.

It's on the roadmap to make public evals people can use to compare their options. A lot of the current benchmarks aren't really specialized for these prompt-to-app use cases

harrisreynolds

Nice! I am currently writing a new version of my no-code platform, WeBase [1], to use AI to generate and edit applications.

Currently just using foundation models from OpenAI and Gemini but will be very interested to try this out.

My current approach is to just completely overwrite files with new updated version but I am guessing using something like Relace will make the whole process more efficient... is that correct?

I'll watch your video later but I would love to learn more about common use cases. It could even be fun to write a blog post for your blog comparing my "brut force" approach to something more intelligent using Relace.

[1] https://www.webase.com (still points to the old "manual" version)

diggan

> My current approach is to just completely overwrite files with new updated version

Overwriting full files work great <100 lines or so, but once you want to be able to edit files above that, it kind of gets very slow (and costly if using paid APIs), so using some sort of "patch format" makes a lot of sense.

eborgnia

Happy to collaborate, shoot us an email at info@relace.ai :)

bradly

Great job. I think this is a great area to focus on.

I am a solo developer who after trying to run local llm models for code and not being satisfied with the results is back to copy/pasting from browser tabs. I use vim so getting llm/lsp integration working reliably has felt questionable and not something I enjoying tinkering with. I tried aider with Google's Geminis models, but I never got the IAM accounts, billing quotas, and acls properly configured to get things to just work. I thought it would be fairly straight forward to build a local model based on my Gemfile, codebase, whatever else and have a local llm be both a better and cheaper experience than claude code which I blew threw $5 results that weren't usable or didn't save time after.

The sign up experience was really smooth. Like anything it else, is so easy to over complicate or be too clever, so I commend you for having the discipline to get it straight forward and to the point.

After account verification I didn't feel I understood what to do when landing on the Add Code Playground experience. It took me a while to grok what the three editors were doing and why there was JavaScript on the left and python on the right, but with an option for JavaScript. I found https://docs.relace.ai/docs/instant-apply/quickstart in the docs and at myself would be a better place to land after signup. I'd even recommend having the tabs on those snippets to be able to just grab a curl command and tip my toe in.

I think my biggest miss was my own assumption that a custom model was going to be a local model. Not that it was represented that way, but my brain was lumping those things together prematurely.

eborgnia

Hey, really appreciate the detailed sign up journey here! Getting the simplest flow is hard, and it's something we obsess over. The docs have been a work in progress for the past couple of months, but now that they are getting better I think it's a good idea to make them more front and center for new users.

We are trying to make this as accessible as possible to the open-source community, with our free tier, but feel free to reach out if you need expanded rate limits. Cheers :)

blef

We are using Relace in production to apply code for a month and this is crazy how easy it has been to integrate it (less than 30 minutes). The most impressive thing when you come from a general purpose LLM is the speed and the accuracy relace brings.

In the past we were using o4-mini which had an annoying issue at adding newline when not needed and was slow (5s+), relace fixed all these issues.

eborgnia

Glad it's working out -- thanks for the support :)

bigyabai

> We're trying to make building code agents easy and cheap.

What is your plan to beat the performance and cost of first-party models like Claude and GPT?

eborgnia

Hey -- good question! We're focused on a narrower task right now that aims to save frontier tokens (both input & output). Our merge + retrieval models are simply smaller LLMs that save you from passing in too much context to Sonnet, and allow you to output fewer tokens. These are cheap for us to run while still maintaining or improving accuracy.

ramoz

I can import my entire codebase to Gemini and get more than a nuanced similarity score in terms of agent guidance.

What’s the differentiator or plan for arbitrary query matching?

Latency? If you think about it - not really a huge issue. Spend 20s-1M mapping an entire plan with Gemini for a feature.

Pass that to Claude Code.

At this point you want non-disruptive context moving forward and presumably any new findings would only be redundant with what is in long context already.

Agentic discovery is fairly powerful even without any augmentations. I think Claude Code devs abandoned early embedding architectures.

eborgnia

Hey, these are really interesting points. The question of agentic discovery vs. one-shot retrieval is really dependent on the type of product.

For Cline or Claude Code where there's a dev in the loop, it makes sense to spend more money on Gemeni ranking or more latency on agentic discovery. Prompt-to-app companies (like Lovable) have a flood of impatient non-technical users coming in, so latency and cost become a big consideration.

That's when using a more traditional retrieval approach can be relevant. Our retrieval models are meant to work really well with non-technical queries on these vibe-coded codebases. They are more of a supplement to the agentic discovery approaches, and we're still figuring out how to integrate them in a sensible way.

nico

Very interesting. Can these models be used in editors/agents like aider or roo? I can see also see a use case of some sort of plugin or browser extension, to easily apply the patches provided by GPT/Claude on their web interfaces (without having to copy/paste and manually edit the files in the editor)

Also, would love to see more concrete examples of using the Apply model

Reading here: https://docs.relace.ai/docs/instant-apply/quickstart

Is it correct, that first I need to: 1) have some code, 2) create a patch of the code with the changes I want, 3) call the Apply model with the full code + patch to make the changes and provide the result?

Do you have metrics to compare that workflow with just passing the code from 1) with a prompt for the changes to something like gpt/claude?

pfunctional

(Preston, other guy on the team)

Yes, they can -- I actually tried a semantic edit implementation in Aider. It got the "correct edit format" percentage to 100%, but didn't really budge the overall percent correct on SOTA models. I should push it sometime, since it really helps the reliability of these local models like Qwen3. If you reach out to me, I can try to share some of this code with you as well (it needs to be cleaned up).

But yes, 1. have some code, 2. create a patch (semantic, diff, or udiff formats all work), and 3. apply will return it to you very fast. There's roughly a 10-15% merge error rate when we last benchmarked on using Claude 3.7 Sonnet to create diff patches, and with us it was 4%; and you can use the Apply as a backup if the merge fails.

conartist6

What's the semantic diff format?

mercurialsolo

Good job on the launch - will give it a spin for our coding agent. Having worked a bunch with the agents i see below as the next evolution or leap in agents.

I see 2 big factors to improve ability of coding agents today

- on device model - context (or understanding of modules) - not only retrieving the relevant sections or codebase but creating a version (transforming it) which is readily consumable by a model and used to focus on the problem at hand.

This requires both a macro global context of the codebase and the ability to retrieve the local context of the problem being solved.

Augment context e.g. does a fairly good job of context compression and retrieval among coding agents. Fast indexing & retrieval is a good step forward to enable open context compression

eborgnia

Thank you :)

Please do reach out, we love talking to builders in this space & would love to share notes & give you free access. eborgnia@relace.ai

mousetree

This looks great - congrats. Would I be able to build an internal "Chat with your code" type product? I'm looking at the "code reranker" api docs and they require a list of files including the code itself. This would probably be infeasible to send all our code just to get relevancy. Are there any local ways of reducing the list of files and code we send?

We're using Github Copilot chat (the interface hosted at https://github.com/copilot) for this purpose today but I'm curious how one would build such a thing ourselves.

rbitar

Excited to try this out, it will solve two problems we’ve had: applying a code diff reliably and selecting which files from a large codebase to use for context.

We quickly discovered that RAG using a similarity search over embedded vectors can easily miss relevant files, unless we cast a very wide net during retrieval.

We’ve also had trouble getting any LLM to generate a diff format (such as universal diff) reliably so your approach to applying a patch is exciting.