Fixing Left and Mutual Recursions in Grammars
12 comments
·February 2, 2025earleybird
Use an Earley parser. A properly implemented Earley parser will do unambiguous grammars with left/right/mutual recursion in linear time. The worst case ambiguous I've worked with is UBDA (Earley, 1970). UBDA produces the equivalent of all the permutations of a binary tree - which is Catalan(n). If you generate all the combinations as you parse it won't complete before the heat death of the universe for fairly short strings.
tempodox
An interesting and worthwhile topic. This kind of problem pops up regularly.
Those diagrams are really handy, does anyone know how they were made? I'm always looking for alternatives / companions to the Railroad Diagram Generator (https://rr.red-dove.com/ui).
Rochus
These diagrams indeed look good. But actually I never use those for parser construction, but consider EBNF more useful, especially also when analyzing and correcting a grammar. I was surprised that there was no decent editor to develop grammars which has built-in support to find and show issues in the grammar like left recursion or ambiguities (at least I didn't find one), so I implemented my own: https://github.com/rochus-keller/ebnfstudio/. I used it in all of my programming language projects to create LL(n) versions including some pretty complicated grammars like Simula or Verilog.
aardshark
draw.io can do similar diagramming. You can generate diagrams programmatically through their CSV format.
Unfortunately documentation for using it programmatically is a bit lacking, but you can for example host a CSV somewhere and use their editor to load it. It's all client side so will automatically update.
brightprogramer
Use mermaid js : https://mermaid.js.org/syntax/stateDiagram.html
mrkeen
Imo this missed the punchline.
I banged my head on the table for days trying to fix a left-recursive grammar, just by moving bits around and hoping.
The key is precedence! Parse loose expressions first (plus, minus, etc), these loose expressions can in turn call into tighter expressions (mul, div), and so on down to parentheses.
Parentheses do form a loop - that is, they call all the way back to the top - but at long as the parser makes progress i.e. can only proceed if it moves past a '(', no infinite loops and you're golden.
WhyIsItAlwaysHN
I think you're semi-right here, but the term precedence is ambiguous.
Left recursion causes rules to be ambiguous. In `<S> | a`, it's always ambiguous if the input has ended. Thus the data structure that would be traversed can be a circular graph. In `<S> := a | <S>, S := ε', the rules are unambiguous and the data structure that is traversed has the shape of a tree.
You get precedence rules by defining the shape of this tree, thus which rules will be visited before which.
However it doesn't have to be strict precedence. Some rules can have an ambiguous order as long as the execution still follows the shape of a tree.
For example, in my simple rule above, it doesn't matter if you visit the ε rule or the other rule first in your implementation (although it's less efficient to visit ε first).
So, I think precedence is a side-effect of doing this transformation.
brightprogramer
Precedence is more of a way to remove ambiguity from grammar. Grammars are just hard to figure out in one go.
mrkeen
I appreciate the dismissal, but I stand by what I said.
I used to think 'precedence was just for removing ambiguity', so I didn't pay attention to it.
Then I discovered it was the key to structuring grammars to avoid left recursion, and everything changed for me.
I've used it successfully in the past and I'll do so in the future.
tgv
I think you're not using the word precedence properly. It seems as if you're trying to explicitly instruct a top-down parser.
E.g., the precedence in these two grammars (for simple arithmetic expressions) is identical
S -> F | S + F
F -> T | F * T
T -> 0 | 1 | 2 | ...
S -> F | F + S
F -> T | T * F
T -> 0 | 1 | 2 | ...
but one recurses on the left side, and the other on the right.brightprogramer
I see, I'd like to give it a try then. I'll try adding some precedence here. I don't know how, though.
This calls for some reading...
The article should add more context. Left-recursion is usually a problem for top-down parsers (as the cited paper does, https://www.microsoft.com/en-us/research/wp-content/uploads/...), while right-recursion is "bad" for bottom-up parsers (by "bad" I mean that it isn't impossible to have right-recursion, but it might be less efficient in terms of memory).