
How Janet's PEG module works


14 comments

April 11, 2025

mplanchard

I did Advent of Code in Janet the year before last I think, and really loved the PEG support. Essentially every day started out with making a quick grammar to parse the problem into whatever data structures I was using. It's intuitive, pretty easy to pick up and adjust, and powerful.
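To give a flavor of the "write a quick grammar, get data structures back" workflow described above, here is a minimal Python sketch of PEG-style combinators. It is not Janet's API (Janet's module takes grammar data structures through functions such as `peg/match`). The combinator names `lit`, `number`, `seq`, `many`, and `match_all` are hypothetical, and the grammar assumes a puzzle line of comma-separated integers.

```python
from typing import Any, Callable, List, Optional, Tuple

# A parser takes (text, pos) and returns (new_pos, captures), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[int, List[Any]]]]


def lit(s: str) -> Parser:
    """Match a literal string; produces no captures."""
    def parse(text: str, pos: int):
        return (pos + len(s), []) if text.startswith(s, pos) else None
    return parse


def number() -> Parser:
    """Match one or more ASCII digits and capture them as an int."""
    def parse(text: str, pos: int):
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        return (end, [int(text[pos:end])]) if end > pos else None
    return parse


def seq(*parts: Parser) -> Parser:
    """Match each part in order, concatenating captures; fail if any part fails."""
    def parse(text: str, pos: int):
        caps: List[Any] = []
        for part in parts:
            result = part(text, pos)
            if result is None:
                return None
            pos, new_caps = result
            caps.extend(new_caps)
        return pos, caps
    return parse


def many(part: Parser) -> Parser:
    """Match `part` zero or more times, greedily (like PEG's `*`)."""
    def parse(text: str, pos: int):
        caps: List[Any] = []
        while True:
            result = part(text, pos)
            if result is None or result[0] == pos:  # stop on failure or no progress
                return pos, caps
            pos, new_caps = result
            caps.extend(new_caps)
    return parse


def match_all(grammar: Parser, text: str) -> Optional[List[Any]]:
    """Return the captures if `grammar` consumes the whole input, else None."""
    result = grammar(text, 0)
    return result[1] if result is not None and result[0] == len(text) else None


# Grammar for one line of input: number ("," number)*
csv_ints = seq(number(), many(seq(lit(","), number())))

print(match_all(csv_ints, "7,13,42"))  # [7, 13, 42]
print(match_all(csv_ints, "7,,42"))    # None
```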

3036e4

I did that as well, in 2023. Tried to use PEGs as much as possible. Have only good memories of the PEGs, but have not had much reason to use them since. Janet+PEG is definitely something I will consider for future projects whenever I need to parse something. Even for something that would otherwise be just a small regular expression I think writing a PEG instead may make some sense for readability.

wodenokoto

I had plans to try the same, but with a Python background and having never touched any Lisp-like languages or any macros, I found the Janet for Mortals book surprisingly difficult to follow and gave up.

norir

I am not a fan of PEG. It is straightforward to write a fast parser generator for languages that require just one character of lookahead to disambiguate any alternation in the grammar. This gets you most of the expressivity of PEG and nearly optimal performance, since you only need to look at one character to choose an alternative and there is no backtracking. Just as importantly, it avoids the implicit ambiguities that PEG's ordered-choice resolution can hide from the grammar author. Those hidden ambiguities lead to unexpected parse results that can be quite difficult to debug and fix in the grammar.
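A minimal Python sketch of the hidden-ambiguity problem, assuming hypothetical combinators `lit`, `seq`, and `choice` (PEG ordered choice). When one alternative is a prefix of another, the first one listed wins silently, and the parse fails later with no hint that the grammar was ambiguous.

```python
from typing import Callable, Optional

# A recognizer takes (text, pos) and returns the position after the match, or None.
Recognizer = Callable[[str, int], Optional[int]]


def lit(s: str) -> Recognizer:
    """Match a literal string."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parts: Recognizer) -> Recognizer:
    """Match each part in order; fail if any part fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        for part in parts:
            end = part(text, pos)
            if end is None:
                return None
            pos = end
        return pos
    return parse


def choice(*alts: Recognizer) -> Recognizer:
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                # Committed: later alternatives are never retried, even if
                # whatever follows this choice fails.
                return end
        return None
    return parse


def matches(grammar: Recognizer, text: str) -> bool:
    """True if the grammar consumes the whole input."""
    return grammar(text, 0) == len(text)


# Hypothetical statement: keyword, space, identifier "x".
# "for" is a prefix of "foreach", so listing it first shadows "foreach".
shadowed = seq(choice(lit("for"), lit("foreach")), lit(" "), lit("x"))
fixed = seq(choice(lit("foreach"), lit("for")), lit(" "), lit("x"))

print(matches(shadowed, "foreach x"))  # False: choice took "for", then " " failed on "e"
print(matches(fixed, "foreach x"))     # True
```

A generator that insists on one character of lookahead would report a conflict here, since both alternatives begin with `f`; the PEG simply picks one without saying so.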

It does require a bit more thought to design an unambiguous language but I think it's worth it. While there is a learning curve for designing such languages, it becomes natural with practice and it becomes hard to go back to ambiguous languages.
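For contrast, a minimal sketch of the one-character-lookahead style, written by hand as recursive descent rather than generated. The grammar (integers and bracketed lists) and the class name `PredictiveParser` are assumptions for illustration. Each alternative starts with a distinct character, so a single `peek()` picks the branch and nothing is ever backtracked.

```python
from typing import List, Union

DIGITS = set("0123456789")
Value = Union[int, List["Value"]]


class PredictiveParser:
    """Hypothetical grammar:
        value  := number | list
        number := digit+
        list   := "[" [ value ("," value)* ] "]"
    Every alternative starts with a distinct character, so one peek decides it.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        """Return the next character, or "" at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def parse_value(self) -> Value:
        ch = self.peek()
        if ch in DIGITS:  # FIRST(number) = {0..9}
            return self.parse_number()
        if ch == "[":     # FIRST(list) = {"["}
            return self.parse_list()
        raise SyntaxError(f"unexpected {ch!r} at {self.pos}")

    def parse_number(self) -> int:
        start = self.pos
        while self.peek() in DIGITS:
            self.pos += 1
        return int(self.text[start:self.pos])

    def parse_list(self) -> List[Value]:
        self.expect("[")
        items: List[Value] = []
        if self.peek() != "]":
            items.append(self.parse_value())
            while self.peek() == ",":
                self.pos += 1
                items.append(self.parse_value())
        self.expect("]")
        return items


def parse(text: str) -> Value:
    parser = PredictiveParser(text)
    result = parser.parse_value()
    if parser.pos != len(text):
        raise SyntaxError(f"trailing input at {parser.pos}")
    return result


print(parse("[1,[2,3],[]]"))  # [1, [2, 3], []]
```

Errors surface at the exact character where no alternative can start, which is part of what makes this style easy to debug.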

janzer

For those further interested in PEG vs. LL(1) parsers: the first few sections of the Python PEP [1] in which CPython switched from an LL(1) parser to a PEG parser give a nice short introduction to both, along with the rationale for the switch.

[1] https://peps.python.org/pep-0617/

PaulHoule

It still seems to me the PEG revolution hasn't arrived.

PEG has the possibility for composable grammars (why not smack some SQL code in the middle of Python?) but it needs a few more affordances, particularly an easy way to handle operator precedence.
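The comment doesn't say what the "easy way" should look like. One common table-driven technique is precedence climbing, sketched below in Python under assumed names (`PRECEDENCE`, `RIGHT_ASSOC`, `parse_expr`). Precedence and associativity live in a table instead of one grammar rule per precedence level, which is how a plain PEG usually has to encode them. The sketch evaluates integer expressions directly to stay short.

```python
import re
from typing import List, Tuple

# Hypothetical operator table: symbol -> precedence (higher binds tighter).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}
APPLY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,  # integer division keeps the sketch in ints
    "^": lambda a, b: a ** b,
}


def tokenize(text: str) -> List[str]:
    """Split into integer literals and single non-space characters."""
    return re.findall(r"\d+|\S", text)


def parse_expr(tokens: List[str], pos: int = 0, min_prec: int = 1) -> Tuple[int, int]:
    """Precedence climbing: consume operators whose precedence is >= min_prec.

    Returns (value, next position).
    """
    lhs, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in PRECEDENCE:
        op = tokens[pos]
        prec = PRECEDENCE[op]
        if prec < min_prec:
            break
        # Left-associative operators only accept tighter operators on the right.
        next_min = prec if op in RIGHT_ASSOC else prec + 1
        rhs, pos = parse_expr(tokens, pos + 1, next_min)
        lhs = APPLY[op](lhs, rhs)
    return lhs, pos


def parse_atom(tokens: List[str], pos: int) -> Tuple[int, int]:
    tok = tokens[pos] if pos < len(tokens) else None
    if tok == "(":
        value, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    if tok is not None and tok.isdecimal():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text: str) -> int:
    tokens = tokenize(text)
    value, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value


print(evaluate("1 + 2 * 3"))  # 7
print(evaluate("2 ^ 3 ^ 2"))  # 512, since ^ is right-associative
```

Adding an operator here means adding a table entry, not restructuring the grammar.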

I think current parser generators suck and that more programmers would be using them if anybody cared about making compiler technology easier to use but the problems are: (1) people who understand compiler technology can get things done with the awful tools we have now and (2) mostly those folks think it is performance über alles.

With the right tools, the "Lisp is better because it is homoiconic" argument would finally die. With properly architected compilers, adding

  unless(X) { ... } -> if(!X) { ... }

to Java would take just one grammar production, one transformation operator and maybe a new class in the AST (which might be codegenned), plus something to tell the compiler where to find these things. Less code than the POM file.
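A minimal Python sketch of the "one transformation operator" part of this idea, using hypothetical AST classes (`Unless`, `If`, `Not`, `Var`, `Call`) rather than any real Java compiler API. A grammar production would be responsible for producing `Unless` nodes; the rewrite then lowers them to existing constructs before later compiler stages see them.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Var:
    name: str


@dataclass
class Not:
    operand: "Expr"


@dataclass
class Call:
    func: str
    args: List["Expr"]


@dataclass
class If:
    cond: "Expr"
    body: List["Stmt"]


@dataclass
class Unless:
    """The new statement form: unless(cond) { body }."""
    cond: "Expr"
    body: List["Stmt"]


Expr = Union[Var, Not, Call]
Stmt = Union[If, Unless, Call]


def desugar(stmt: Stmt) -> Stmt:
    """Apply unless(X) { B } -> if(!X) { B }, recursing into statement bodies."""
    if isinstance(stmt, Unless):
        return If(Not(stmt.cond), [desugar(s) for s in stmt.body])
    if isinstance(stmt, If):
        return If(stmt.cond, [desugar(s) for s in stmt.body])
    return stmt


tree = Unless(Var("done"), [Call("work", [])])
print(desugar(tree))
```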

I gave up on reStructuredText because it didn't support unparsing. I could picture all kinds of scenarios where I'd want to turn something else into RST, or take RST, combine it with other data, and turn it back into RST. RST had the potential to work with or without a schema, but that potential was never realized.

behnamoh

> "Lisp is better because it is homoiconic"

- Lisp is better because it manipulates the same data that the program code is represented in (car works on a data list, and it works on a code list as well).

- Lisp is better (at least, Common Lisp) because of image-and-REPL-driven development. Good luck finding exactly that level of flexibility in other REPL-ful languages.

- Lisp is better because of hot code reloading and restarts. Only Elixir/Erlang have a similar mechanism.

- Lisp is better because of structural editing (e.g., paredit). No more character-level editing.

I could go on but just wanted to point out that homoiconicity isn't the entire deal with Lisp.
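A rough Python imitation of the first point: once a reader turns source text into nested lists, the same `car`/`cdr` helpers work on a data list and on a parsed program, and the program can be edited as ordinary data. In a real Lisp the reader and these operations are built in; the `read`, `car`, `cdr`, and `evaluate` functions here are hypothetical stand-ins.

```python
import operator
from functools import reduce
from typing import Any, List


def tokenize(src: str) -> List[str]:
    return src.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: List[str]) -> Any:
    """Read one s-expression, consuming tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return items
    try:
        return int(tok)
    except ValueError:
        return tok  # symbols are plain strings in this sketch


def car(x: list) -> Any:
    return x[0]


def cdr(x: list) -> list:
    return x[1:]


FUNCTIONS = {"+": operator.add, "*": operator.mul}


def evaluate(x: Any) -> Any:
    """Integers evaluate to themselves; a list is (function arg ...)."""
    if isinstance(x, int):
        return x
    return reduce(FUNCTIONS[car(x)], [evaluate(arg) for arg in cdr(x)])


code = read(tokenize("(+ 1 (* 2 3))"))
print(car([10, 20, 30]))  # 10: car on a data list
print(car(code))          # +: car on a code list
print(evaluate(code))     # 7
code[0] = "*"             # edit the program as ordinary data
print(evaluate(code))     # 6
```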

zem

brag is a pretty user-friendly parser generator for Racket: https://docs.racket-lang.org/brag/index.html

thesz

> It is straightforward to write a fast parser generator for languages that require just one character of lookahead...

Then you get VHDL.

https://news.ycombinator.com/item?id=15017974

You need (at least an approximation to) the symbol table for correct lexing.
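The textbook instance of this is C's "lexer hack", where whether `(a)*b` is a cast or a multiplication depends on whether `a` was declared as a type. The sketch below (hypothetical `lex` and `lex_statements` functions) shows the general pattern of the parser feeding declarations back to the lexer; it does not attempt to model the specific VHDL case in the linked thread.

```python
import re
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (kind, text)
IDENT_START = re.compile(r"[A-Za-z_]")


def lex(src: str, type_names: Set[str]) -> List[Token]:
    """Tokenize src; identifiers currently declared as types become TYPE_NAME."""
    tokens: List[Token] = []
    for text in re.findall(r"[A-Za-z_]\w*|\S", src):
        if IDENT_START.match(text):
            kind = "TYPE_NAME" if text in type_names else "IDENT"
        else:
            kind = "PUNCT"
        tokens.append((kind, text))
    return tokens


def lex_statements(statements: List[str]) -> List[List[Token]]:
    """Lex statements in order, feeding `typedef ... NAME` back to the lexer.

    This stands in for the parser updating the symbol table between statements.
    """
    type_names: Set[str] = set()
    result: List[List[Token]] = []
    for stmt in statements:
        tokens = lex(stmt, type_names)
        if tokens and tokens[0][1] == "typedef":
            type_names.add(tokens[-1][1])  # simplification: last token is the new name
        result.append(tokens)
    return result


before, _, after = lex_statements(["(a)*b", "typedef int a", "(a)*b"])
print(before[1])  # ('IDENT', 'a'): "(a)*b" reads as a multiplication
print(after[1])   # ('TYPE_NAME', 'a'): the same text now starts a cast
```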

Or Postgres/MariaDB's SQL with the DELIMITER statement that can change semicolon to something else.
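A minimal Python sketch of why this hurts a fixed lexer: the statement terminator becomes state that changes in the middle of the input. It imitates the `DELIMITER` command as used in the MySQL/MariaDB command-line clients, with a hypothetical function name `split_statements`, and it ignores quoting and comments.

```python
from typing import List


def split_statements(script: str) -> List[str]:
    """Split a SQL script into statements, honoring `DELIMITER <token>` lines.

    Simplification: quotes and comments are ignored, so a delimiter inside a
    string literal would still end the statement.
    """
    delimiter = ";"
    statements: List[str] = []
    buffer = ""
    for line in script.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("DELIMITER "):
            delimiter = stripped.split(None, 1)[1]  # change the terminator mid-stream
            continue
        buffer += line + "\n"
        while delimiter in buffer:
            stmt, buffer = buffer.split(delimiter, 1)
            if stmt.strip():
                statements.append(stmt.strip())
    if buffer.strip():
        statements.append(buffer.strip())
    return statements


example = """CREATE TABLE t (x INT);
DELIMITER //
CREATE PROCEDURE p()
BEGIN
  INSERT INTO t VALUES (1);
  INSERT INTO t VALUES (2);
END//
DELIMITER ;
SELECT * FROM t;"""

for stmt in split_statements(example):
    print(repr(stmt))
```

The semicolons inside the procedure body survive only because the terminator was switched to `//` first; a lexer with a hard-coded `;` would cut the procedure into pieces.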