Show HN: Transductive regular expressions for text editing
41 comments
·February 7, 2025danielparks
Cool, I’m interested to see where you go with this.
I found the operator precedence unnatural, and it looks like a lot of other folks in this thread did too. I would naturally assume `cat:dog` would be equivalent to `(cat):(dog)` rather than `ca(t:d)og`.
froh
sweet, I did my "Diplom" CS thesis around 1997 on finite state transducers. was mich less trivial than I'd thought. the ask was to implement composition and DFAs where possible. also for composed transducers. "algebra of finite state transducers". use case was morphology. the topic was heavily underestimated and I had to finish half way through. so: chapeau :-)
anyhow
wrt syntax, are you sure you want ':' to bind stronger than concatenation 'ab' ?
Etheryte
I have to say, incredibly bold of you to essentially hinge your graduation on whether you can regex hard enough or not.
c0nstantine
yeah. Transducers are very old topic. For some reason they were not connected to a specific language like regex.
> wrt syntax, are you sure you want ':' to bind stronger than concatenation 'ab' ?
That's something I am still not sure about. I took a hundred examples and it looked more natural this way (: lower then .). But I can change it with the change of one digit in the code, literally. That's why I'm posting here. I need some real feedback.
zoogeny
> Avoid using * or + in the right part, as it can cause infinite loops
Why not just disallow this? I understand it would make the grammar more difficult to specify - but I don't see any good reason to keep it.
c0nstantine
Fair point. I agree. Now it is better to disable it.
The rationale was to implement a fun operation called transducer composition. It is possible to do simple operation on strings and compose trre's like filters. But I haven't finished it yet. So again, a fair point.
shawnz
Another question about this issue: in the case of `:a*` for example, why doesn't it just pick empty string as the replacement text and immediately exit? Why would it be an infinite loop if `-g` isn't specified?
And then if `-g` were specified, you could use this to create infinite generators, which could be a useful construct in some cases -- like `yes` but more powerful.
EDIT: Another interesting use case, if I am understanding correctly: if this worked, then you could use `:(<regex>)` to have it output an example of a string that matches any given regex. `-g :(<regex>)` produces a generator of every string in the language matched by that regex. `-g :(<regex>) | head -n 100` would give you 100 examples.
crazygringo
> Regular expressions is a great tool for searching patterns in text. But I always found it unnatural for text editing.
The entire purpose of this project seems to hinge on this assertion, but there isn't a single example.
I don't understand what makes regex unnatural for editing? What is meant by editing? Why do people struggle with groups?
There are lots of examples of the syntax for this project, but why is it better than regular regex?
If there were a few examples showing "here's the vanilla regex version, here's my version, here's why my version makes this easier" I might be able to understand this project.
c0nstantine
Fair point. The most explicit example if you need to change something in context. For example if we need to change 'y' to 'Y' only if it occurs between x and y you would do something like this in python.
pattern = r'(x)y(z)'
replacement = r'\1Y\2'
result = re.sub(pattern, replacement, text)
I would like to replace it with 'xy:Yz' pattern:
result = re.trre('xy:Yz', text)
If you need your x, z to be more complicated patterns or even regex themselves it can be more handy using this approach.
rendaw
I think it's an interesting project, but is this a functional replacement for regex substitutions? It seems like you can only replace text with something else at the same location. Separating the replacement from the matching can be unweildy, but you can reorder, repeat, and insert text between things too.
IIUC
crazygringo
Thanks!
I guess I'm still struggling to see how it's simpler overall.
Most of the examples on your page don't involve groups at all, e.g.:
$ echo 'catcatcat' | trre '((cat):(dog))*'
dogdogdog
That already seems a lot more complicated than just: re.sub('cat', 'dog', 'catcatcat')
I don't need to use groups that often in regex replacements, and when I do I'm already trying to do something complicated, and it's not clear to me why the colon syntax is easier to write, easier to understand, or if it's as flexible.Not trying to criticize the project, just genuinely trying to understand the specific strengths and limitations of the proposed syntax. E.g. what if I want to turn xyz into zYx?
c0nstantine
>> E.g. what if I want to turn xyz into zYx?
echo 'zyx' | ./trre 'xy:Yz|zy:Yx'
It is still a regular language. I do not introduce references.
You are right in sense the `sed` is far superior editor. But here I see some advantages: - the current implementation is super small; it is direct translation to an automaton - the complex patterns may be compiled in a more efficient way using deterministic transducer. I can't defend this claim now but I have some evidences - there are some tricks you can do using 'generative' part of it, e.g. and you even can find levenshtein distance of 1 between two strings just by generating substitutions/insertions/deletions and implement a simple spell checker.
Overall, I think you have a good point. Maybe it is just marginal improvement (if any). It was more comfortable to write in this style instead of group usage. I used it for some time and found it handy (especially as extended `tr`).
Etheryte
I would say regex is usually pretty write once edit never. Building a prototype that tries to look beyond that is a great way to see what a better future might look like in this regard.
everdimension
Come on, it's about replacements. They're easier to express (meaning literally easier to type out) with the author's syntax
Great project
null
bawolff
[flagged]
andrewla
I feel like this is very underspecified, The very first example:
$ echo 'cat' | trre 'c:da:ot:g'
dog
Feels strange. What is happening here; the grammar says TRRE <- TRRE* TRRE|TRRE TRRE.TRRE
TRRE <- REGEX REGEX:REGEX
What is the parse tree here? Why is "c" not being replaced with "da"? Or why isn't c being removed and "da" being replaced by "ot"?I do like the idea of having a search/replace semantic that is more intuitive than grouping operators; back in MS-DOS days you could do "ren .log .txt" and this would work which feels bananas to my modern bash-minded way of thinking, but it's very obvious looking at this what it is supposed to do.
kccqzy
Yes it is underspecified. The deletion example shows that an empty string is possibly a REGEX. So you can essentially treat any position as containing as many empty string regexes as you want. So there are indeed infinite number of parses.
If we instead require regex to be non-empty (breaking the deletion examples), then the ambiguity becomes that of concatenation: whether it's '(((c:d)(a:o))(t:g))' or '((c:d)((a:o)(d:g)))'. Assuming associativity, this would not matter.
danielparks
This is a matter of operator precedence and tokenization. Tokens are single characters in this language, and there is an invisible operator between them.
If the operator were explicit (let’s call it ~), the example would look like this:
$ echo 'cat' | trre 'c:d~a:o~t:g'
dog
With unnecessary parentheses: $ echo 'cat' | trre '(c:d)~(a:o)~(t:g)'
dog
null
Imustaskforhelp
From what it feels as to how it works, it seems that c:d and a: (nothing) and ot:g
but yes now that I read it , it also makes confusion , theoretically your point makes valid , I also believe that c should be replaced by da after I read the repo , I am not sure ...
null
sabellito
I love the idea, wanna se where it goes. Gives me the same vibe as when jq came out all those years ago.
i3oi3
Are the examples all actual outputs of the program? It's entirely possible that my understanding of the grammar is off, but it looks like these examples are wrong:
$ echo 'cat dog' | trre 'c:bat|d:hog' bat hog
$ echo '' | trre ':a*' # <- do NOT do this dog
$ echo '' | trre ':(repeat-10-times){10}' dog
c0nstantine
The second line actually is an output. I've modified the README. The last example is a typo. Fixed. Thanks!
meonkeys
And the first one? Wouldn't the output be
batat hogog
wfn
What a pleasure your C code is :) very nice (currently reading it)
My only quick comment is - the link to `theory.pdf` in README is broken (your pdf is in docs/ dir, so just need to change the url to include docs/).
c0nstantine
Thank you! For the feedback and pointing to the typo. Fixed. Actually my C is very rusty and I am bit uncomfortable about this.
mordechai9000
I would probably use this in a text editor if it was available. I don't struggle with group logic, but I often forget which tools use \ to reference capture groups, and which use $. (If it's Microsoft or Microsoft-adjacent, it's probably $.)
rjh29
and of course Perl supports both!
the_arun
> $ echo 'xor' | '(x:)or' 'xor'
> cat
I got lost here.
c0nstantine
Typo. Fixed. Thanks. Too many cats in examples.
trashburger
Also too many dogs it seems, as the infinite and finite loop examples also produce "dog".
pipeline_peak
I just ask chat-gpt to handle things like this. No need to memorize trivial syntax.
metadat
When I saw this headline, I got excited about the prospect of a new innovation in the application of Regular Expressions. After reading, I was scratching my head because trre doesn't provide any new capabilities and is essentially just yet another flavor of regex. Additional complexity without a significant upside.
Trre seems like an arbitrarily different version of `sed -e /a/b/ ..`. This method of search and replace is essentially ported everywhere else, from awk to vim to IntelliJ, and has always gotten the job done adequately and feels as natural as anything else I've learned.
Am I missing something?
p.s I just realized I've been regex'ing heavily for 21 (!) years now, time flies.
null
An extension of regular expressions for text editing, with a grep-like command-line tool. If you, like me, struggle with group logic in regular expressions, you might find it useful.
I wanted to do this for a very long time. It is more of a sketch or prototype. I'd really appreciate your feedback!