Show HN: K(r)ep - A high-performance string search utility
15 comments
·April 16, 2025aw1621107
finnh
With this comment showing that search was inconsistent and gave incorrect results: https://news.ycombinator.com/item?id=43335277
why the repost?
globnomulous
He posted the project 36 days ago, 35 days ago, and today. The commit history appears to show that he has been working on it, but that's no excuse for this sort of re-posting.
The guy has a pattern of not just posting his own projects but contributing nothing meaningful to discussion. Either he makes a sales pitch when introducing the project or he thanks people for their feedback and fails to engage in any way with substantive feedback or criticism.
Last time he posted this project he couldn't even be bothered to reply to a very generous response by the author of ripgrep: https://news.ycombinator.com/context?id=43334661
I'm not a moderator (thank God) but it's patently clear to me that there is an etiquette, essential for any productive conversation, that this guy isn't following and that his submissions should be ignored.
burntsushi
He also posts threads on reddit and then deletes everything:
https://old.reddit.com/r/programming/comments/1jra9y0/introd...
https://old.reddit.com/r/programming/comments/1jpk8sw/krep_a...
https://old.reddit.com/r/commandline/comments/1jqhssc/krep_b...
daviducolo
I reposted the project because I have released a new version fixing everything that users have encountered.
alkh
The fact that even in the test example it took ripgrep 0.115 s with 97% CPU usage vs Krep's 0.106 s with 328% CPU usage is a testament of why ripgrep is still preferable. Such a huge tradeoff in resource consumption for such a small speed gain is hard to justify
burntsushi
ripgrep author here.
Like many grep tools, it also has significant performance cliffs that are very easy to hit:
$ time rg -o -c '[A-Z]' subtitles2016-sample.en
51835353
real 4.391
user 4.344
sys 0.042
maxmem 923 MB
faults 0
$ time krep-1.0.2 -E -o -c '[A-Z]' subtitles2016-sample.en
^C
real 8:02.95
user 3:04:10.58
sys 4.203
maxmem 919 MB
faults 0
ripgrep has cliffs too, but they tend to be rarer and/or still give decent perf.daviducolo
I have just released version 1.1.0 which should fix the problems reported by users.
https://github.com/davidesantangelo/krep/releases/tag/v1.1.0
MattPalmer1086
There's a few issues with the Horspool implementation.
To obtain non case sensitive searching, you set the shift in the bad char table for both the upper and lower case variants. That's fine. But when you come to do the search, you ignore this and only look up the lower case version if it's non case sensitive. This is unnecessary - the valid shifts for both cases are already in the table, since you put them there. So you can avoid doing a lookup on the lower table.
I think you also only ever shift by one if you get a full match, but this is also unnecessary. You can shift by whatever the last char bad char shift is and you won't miss any patterns, even if they can overlap.
Finally, you have a test to see if the shift is zero, and to set it to one if it is. You should never get a shift of zero from the bad char table (you don't process the last char of the pattern when creating the bad char table, so there is never a distance of zero from the end).
daviducolo
Thank you for the report! Here’s what I changed:
1. Always use the raw bad_char_table[text[i + pattern_len‑1]] shift—no extra lowercase lookup at runtime. 2. After a full match, advance by that same table shift instead of just doing i++, so you never miss overlapping occurrences. 3. Removed the “if shift==0 then shift=1” guard—since we build the table excluding the last pattern byte, zero shifts can’t occur.
https://github.com/davidesantangelo/krep/releases/tag/v1.0.6
MattPalmer1086
Looks better! A couple of other minor comments, since I'm looking at it:
1. You don't need to assign the last char of the pattern inside the search loop every time around the loop. It will never change (line 1143). Just do it once before the search loop starts.
2. tc_last is the text aligned with the end of the pattern window (line 1142). You don't need to get it again when getting the shift (lines 1201, 1208). bad char is the last char which you already have.
daviducolo
I have applied your suggestions. Thank you again.
https://github.com/davidesantangelo/krep/releases/tag/v1.1.1
null
Posted here about a month ago: https://news.ycombinator.com/item?id=43333946