Evaluating modular RAG with reasoning models
5 comments · February 25, 2025

mkesper
Latency must be brutal here. This will not be possible for any chat application, I guess.
bauefi
It depends on how you do retrieval. If you just use dense embeddings, for example, you can get the latency of one search query down to maybe 400ms. In that case, multiple sequential lookups would be OK, but your embeddings need to be good enough, of course.
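A minimal sketch of what a single dense-embedding lookup in such a sequential loop might look like (the embedding matrix, dimensions, and hop logic here are all placeholders, not the commenter's actual setup; a real system would use a trained embedding model and an ANN index rather than brute-force cosine similarity):

```python
import numpy as np

# Hypothetical sketch: dense retrieval via cosine similarity over an
# in-memory embedding matrix of 10k "documents". Embeddings are random
# placeholders standing in for a real embedding model's output.
rng = np.random.default_rng(0)
DIM = 384  # a common small sentence-embedding dimension

corpus = rng.standard_normal((10_000, DIM)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # L2-normalize

def dense_lookup(query_vec: np.ndarray, k: int = 5) -> list[int]:
    """Return indices of the top-k most similar documents."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = corpus @ q  # cosine similarity, since vectors are normalized
    return list(np.argsort(-scores)[:k])

# Multiple sequential lookups, as in a multi-hop RAG loop: each hop's
# query would normally be derived from the previous hop's retrieved docs.
query = rng.standard_normal(DIM).astype(np.float32)
for hop in range(3):
    top_ids = dense_lookup(query, k=5)
    # ...feed retrieved docs to the model, form the next query (omitted)...
    query = corpus[top_ids[0]]  # placeholder for the derived next-hop query
```

If each lookup costs ~400ms end to end, three sequential hops already add over a second before generation starts, which is the latency trade-off being discussed.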
emil_sorensen
Yep, even with a small bump in performance (which we only saw for a subset of coding questions), it wouldn't be worth the huge latency penalty. Though that will surely come down over time.
emil_sorensen
Curious if anyone else has run similar experiments?
Is RAG any good for coding tasks?