Show HN: Semantic grep for Claude Code (RUST) (local embeddings)

rane

I tried it on my relatively small project.

    ~/c/l/web % ck --sem 'error handling'
    ℹ Semantic search: top 10 results, threshold ≥0.6
    ⠹ Searching with semantic mode...
All I got was a spinning M2 Mac fan, and after a minute I gave up.

rane

Cool. Some AI fluff can be detected in the README.

For example, under the "Why CK?" section, "For teams" has no substance compared to "For developers".

MarkMarine

I saw this comment a little bit back and I don’t think the OP expanded on it, but this looks like a fantastic idea to me:

sam0x17 20 days ago:

Didn't want to bury the lede, but I've done a bunch of work with this myself. It goes fine as long as you give it both the textual representation and the ability to walk along the AST. You give it the raw source code, and then also give it the ability to ask a language server to move a cursor that walks along the AST, and then every time it makes a change you update the cursor location accordingly. You basically have a cursor in the text and a cursor in the AST and you keep them in sync so the LLM can't mess it up. If I ever have time I'll release something, but right now I'm just experimenting locally with it for my Rust stuff.

On the topic of LLMs understanding ASTs, they are also quite good at this. I've done a bunch of applications where you tell an LLM a novel grammar it's never seen before _in the system prompt_, and that plus a few translation examples is usually all it takes for it to learn fairly complex grammars. Combine that with a feedback loop between the LLM and a compiler for the grammar, where you don't let it produce invalid sentences and when it does you just feed it back the compiler error, and you get a pretty robust system that can translate user input into valid sentences in an arbitrary grammar.

https://news.ycombinator.com/item?id=44941999
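
The compiler feedback loop described in the quoted comment can be sketched roughly as below. This is only an illustration under stated assumptions: `generate_with_feedback`, `check_balanced`, and `fake_llm` are hypothetical names, the "grammar" is a toy balanced-parentheses check standing in for a real compiler, and the "LLM" is a stub that repairs its output once it sees an error.

```python
from typing import Callable, Optional


def generate_with_feedback(
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    request: str,
    max_attempts: int = 3,
) -> str:
    """Ask `propose` for a sentence and retry with the checker's error until it is valid."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        candidate = propose(request, error)  # the previous error (if any) is fed back in
        error = check(candidate)             # None means the sentence is valid
        if error is None:
            return candidate
    raise ValueError(f"no valid sentence after {max_attempts} attempts: {error}")


def check_balanced(text: str) -> Optional[str]:
    """Toy 'compiler' (assumption): accept only balanced parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unexpected ')'"
    return None if depth == 0 else "missing ')'"


def fake_llm(request: str, error: Optional[str]) -> str:
    """Stub model (assumption): first answer is invalid, the retry is corrected."""
    return "(add 1 2" if error is None else "(add 1 2)"


result = generate_with_feedback(fake_llm, check_balanced, "add one and two")
```

In a real setup `propose` would call a model with the grammar in the system prompt and `check` would invoke the actual compiler; the loop structure is the part being illustrated.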

mellosouls

This looks interesting and I look forward to trying it but the title here should really just use the description of the repo, or that be adjusted.

Apart from anything else, it appears to be very misleading, as Rust (ironically) is not one of the supported languages according to the documentation.

anthonyronning

I clicked on this because it said rust in the title. Very disappointed.

Runonthespot

I'll add Rust, Ruby, Elixir, and Clojure next. The title says Rust because the tool is written in Rust, sorry about that!

abyesilyurt

What model are you using to create the embeddings?

Runonthespot

BAAI/bge-small-en-v1.5, but I'm considering switching to Google's latest EmbeddingGemma. The model is fairly switchable.

0x696C6961

This is cool, but I don't understand why it tries to re-implement (a subset of) grep. Not only that, but the grep-like behaviour is the default and I need to opt-in to the semantic search using the --sem flag. If I want grep I can use grep/ripgrep.

Runonthespot

Fair comment. The initial thinking was to have both, and in fact a hybrid mode too, which fuses results so you can get chunks that match both semantically and on keyword search in one result set. A reranker could be added later.
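
One common way to fuse a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below is an assumption for illustration, not a description of how ck's hybrid mode actually combines results; the function name, the `k=60` constant, and the sample hits are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from a keyword search and a semantic search.
keyword_hits = ["auth.py:10", "login.py:42", "util.py:7"]
semantic_hits = ["login.py:42", "session.py:3", "auth.py:10"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Chunks that appear in both lists rise to the top, which matches the stated goal of surfacing matches that are strong both semantically and on keywords.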

alvis

Or, another way of thinking about it: how big a penalty are we talking about for semantic vs. conventional grep?

My thinking is that for a large codebase, sorting embedding matches may be more efficient than reading all the files, and hence there is no point in putting semantic search behind a --semantic flag.

ozten

This generalizes to a whole new category of tools: UX which requires more thought and skill, but is way more powerful. Human devs are mostly too lazy to use such tools, but LLMs will put in the work to use them.

Alifatisk

At this point, we aren't even saying it's written in Rust anymore, we just mention it in the title whenever possible.

I did look into the core features and I gotta say, that looked quite cool. It's like Google search, but for the codebase. What does it take to support other languages?

Runonthespot

It supports most languages but needs a bit of tree-sitter setup to do semantic chunking. Let me know what languages you’d like added

t0mas88

Java would be useful as well for larger backend codebases.

Alifatisk

Thanks for your quick response, most large codebases I've been fiddling with are Ruby!

benzible

I'd love to see Elixir support.

Bigsy

Clojure would be awesome

dmd

What does this have to do with Claude Code?

Runonthespot

Mainly I wrote it because I noticed Claude's "by design" use of grep meant it couldn't search the code base for things it didn't already know the name of, or find "the auth section". But equally, it's well documented that e.g. Cursor's old RAG technique wasn't that great.

My idea was to make a tool that just does a quick and simple embedding on each file, and uses that to provide a semantic alternative that is much closer to grep in nature, but allows an AI tool like Claude Code to run it from the command line - with some parameters.

Arguably it could be an MCP server, but in my experience setting up a server for a basic tool like this is a whole lot of hassle.

I'm fairly confident that this is a useful tool for CC as it started using it while I was coding it, and even when buggy, was more than willing to work around the issues for the benefit of having semantic search!
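
The idea of embedding each file and ranking by similarity can be sketched as follows. This is a toy illustration under stated assumptions: the three-dimensional vectors are made up (a real setup would get them from an embedding model), and `cosine` and `semantic_search` are hypothetical names. The `top_k=10` and `threshold=0.6` defaults mirror the "top 10 results, threshold ≥0.6" line shown in the output earlier in the thread.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, index, top_k=10, threshold=0.6):
    """Return (score, path) pairs at or above `threshold`, best first, at most `top_k`."""
    scored = [(cosine(query_vec, vec), path) for path, vec in index.items()]
    hits = [(score, path) for score, path in scored if score >= threshold]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top_k]


# Made-up embeddings (assumption): one vector per file.
index = {
    "auth.py": [0.9, 0.1, 0.0],
    "errors.py": [0.1, 0.9, 0.2],
    "README.md": [0.0, 0.1, 0.9],
}
query = [0.2, 1.0, 0.1]  # pretend this is the embedding of "error handling"

results = semantic_search(query, index)
paths = [path for _, path in results]
```

Because the query never has to contain a literal identifier, this kind of lookup can find "the error handling part" even when the caller does not know what the functions are named.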

furyofantares

CC is so good with grep that I'm half expecting to clutter its context with bad results from semantic search. But also half optimistic at this just improving its search.

If you're getting useful results from hybrid mode, that's very interesting to me, since the well-constructed grep patterns that Claude executes don't really look like they'd work great for semantic search! But intuition is often wrong on this stuff.

I am very curious about your thoughts on speed. I'd rather any tools Claude invokes be as fast as possible so it can get feedback immediately and execute again.

alvis

A proper title could be "Semantic grep with completely local embeddings"

Title aside, the tool, if it works as described, is pretty insane.

skybrian

This looks very useful.

Looks like you have to build an index. When should it be rebuilt? Any support for automatic rebuilds?

Runonthespot

Yes. Files are hashed and checked whenever you search, so the index should always remain up to date. Only changed files are reindexed. You can also inspect the metadata (chunking semantics, embeddings). It’s all in the .ck sidecar.
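
A hash-based check of this kind might look roughly like the sketch below. It is an assumption for illustration only: the choice of SHA-256, the in-memory `known_hashes` dict, and the function names are hypothetical, and nothing here reflects the actual layout of the .ck sidecar. Deleted files are not handled in this sketch.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Content hash of one file (SHA-256 is an assumption, any stable hash would do)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_files(root: Path, known_hashes: dict) -> list:
    """Return files whose content changed since the last call and record their new hashes."""
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = str(path.relative_to(root))
        digest = file_digest(path)
        if known_hashes.get(key) != digest:
            known_hashes[key] = digest  # remember the new state
            changed.append(path)        # only these need re-embedding
    return changed
```

Calling `stale_files` before each search and re-embedding only the returned paths keeps the index current without a separate rebuild step.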

ayhanfuat

Isn't Claude Code's selling point that it doesn't use embeddings?

joshuanapoli

I don’t think that “Claude Code” is relevant to this semantic grep tool.

Runonthespot

Bear in mind that Claude Code uses grep by default. If you watch, you'll see that when it's looking for something it doesn't know the name of, it flails around with different patterns. Try this tool: tell CC to take a look using ck --help and take it for a spin.

CC in my case likes it so much that it started using it instead of grep to debug the repo, and began suggesting its own additions.

Runonthespot

Note that it’s grep AND semantic, so Claude can start with a grep strategy and, if it finds nothing, switch to semantic. Since it’s local and fast, it keeps in sync easily enough.

brookst

How do you tell CC to use it? Just as an entry in CLAUDE.md?