Literate programming: Knuth is doing it wrong (2014)
123 comments
January 13, 2025 · fanf2
wiz21c
I like your comment.
But let's take Jupyter Notebook as an example. It's nice, but not as good as Knuth's stuff. Why? Because the notebook forces you to follow the evaluation order, whereas Knuth allows you to follow the order of your ideas. For everything else, Jupyter is simply much better. But fundamentally, JNB can't match the flexibility of Knuth's LP.
So I'd say that Jupyter notebooks are really nice for explaining a recipe, but LP is better at explaining ideas (more like algorithms).
However, presentations of algorithms usually have more to do with ideas and concepts. And thus one may wonder whether writing an LP document is even necessary, since the core ideas don't actually need code to be explained.
So-so... :-)
fanf2
WRT modern tooling, I was thinking more in terms of languages like Haskell that don’t constrain the order of presentation so much, or tools like rustdoc that embrace hypertext and allow the reader to choose their own adventure.
leephillips
This is solved by the Pluto notebook, which makes presentation order independent of execution order. Only works for Julia, but shows that such a notebook is possible.
snops
Marimo does this for python, by tracking dependencies.
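A minimal sketch of the dependency-tracking idea, assuming each cell is plain Python source and that a cell depends on whichever cell assigns the names it reads. The function names are hypothetical, and this is not how Pluto or Marimo are actually implemented; it only shows why presentation order and execution order can be decoupled.

```python
import ast
from graphlib import TopologicalSorter  # Python 3.9+

def names_defined_and_used(source):
    """Return (defined, used) variable names for one cell's source."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used - defined

def execution_order(cells):
    """Order cells so each runs after the cells defining the names it reads."""
    info = [names_defined_and_used(src) for src in cells]
    definer = {name: i for i, (defined, _) in enumerate(info) for name in defined}
    # Map each cell to the set of cells it depends on (its predecessors).
    graph = {
        i: {definer[name] for name in used if name in definer}
        for i, (_, used) in enumerate(info)
    }
    return list(TopologicalSorter(graph).static_order())

# Cells in presentation order: the conclusion comes first, the setup last.
cells = [
    "print(total)",
    "total = sum(values)",
    "values = [1, 2, 3]",
]
order = execution_order(cells)   # [2, 1, 0]

namespace = {}
for index in order:
    exec(cells[index], namespace)  # prints 6
```

The sketch ignores cells that redefine a name and does not detect cycles itself; `TopologicalSorter` raises `CycleError` if two cells depend on each other.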
PittleyDunkin
It's one of two literate programming books I've read—the other one is PBRT, which provides about the same level of quality as the TeXBook IMO (I haven't read metafont).
> On balance I think modern tools achieve Knuth’s goals better than tangle/weave. Modern languages let you organize code much closer to its narrative structure, without WEB’s unhygienic scoping. Editors and IDEs and documentation browsers give you hyperlinks everywhere.
I just don't think this is a technical concern to begin with. It's true that you can jump around code a lot easier, but this doesn't make the issue of laying out the code as a linear narrative with interleaved code and text any easier of a task. I do think it's an excellent, excellent way to present code, though.
WorldMaker
I think "linear narrative" is exactly the hard part. Most programs don't have a linear narrative, they have a "hub-and-spoke" narrative of some sort: a central core, then a bunch of branches of "features" that may or may not interact. Some of that is because "do one thing" is great in the Unix philosophy but doesn't really describe things like business-oriented development. But I think, more interestingly, as Diataxis [0] maybe suggests, learning about any project is maybe necessarily "two-dimensional" at best/least. Someone entirely new to a codebase is probably going to need a different "narrative" than someone familiar with it. Maintenance and new work are different narratives from the original design.
Perhaps literate programming is so "hard" simply because we aren't able to nail down a single linear narrative.
I don't know what the Diataxis of Literate Programming looks like, but it does wriggle with some ideas.
creer
Yes the problem is all these different literate needs.
There is a whole range of things we should want to document and encompass. From an outline idea or plan of how the software is planned to work and is currently implemented, all the way to more like programming journal or engineering notebook on the development, showing stuff that was tried and failed, record of performance experiments, all the way to day to day commits. Day to day commits is probably enough quantity that it will do fine with a separate system - but then should still have pointers or references in the engineering notebook aspect of the whole thing. And then of course multi-user by now. And for many software bases, this cannot be linked to "in order" execution like a python notebook. Execution is too variable, long, on-going and environment dependent. It's possible that what this ends up looking like is pairing "extensive in-code documentation" with a separate "overview narrative" and a separate "engineering journal" (with thoughts and rationales and test results pasted in or git-ted.)
But I don't throw stones at "typesetting". Nowadays "mind-map" is probably more appropriate or free-form layout, and there is a lot to be said for throwing low cost graphical representations including napkin diagrams here and there in the documentation. If we are trying to make it easy on the programmers to provide all this input, then let's make it easy.
(But then I object to the lack of "linearity, diff-ability, text-ability" of mind-map formats and my reaction to spreadsheets like one of the comments requests: "oh god no, let's not HIDE all this in countless tiny little boxes that must be opened one by one!" - but I would love a linear, text-based computable spreadsheet format.)
Joker_vD
> The order of presentation of a program is dictated by the needs of a single-pass compiler
That never really made sense to me. All those "if"s, and "while"s, and "break"s, and "return"s jump to the not-yet-generated places in code just fine; a similar technique could be used for delaying the resolution of function/procedure calls as well.
Now, generating initialized data/rodata sections is something that single-pass compilers do struggle with (that's why Pascal didn't have array literals), and it's understandable: the Modula compiler, which got them (or was it Modula-2?), had to hold all of that initialized data section in memory and then dump it to the disk only after the code-generation (and patching the offsets) was done. But dealing with not-yet-defined code labels? That's something you have to do anyhow.
Someone
> All those "if"s, and "while"s, and "break"s, and "return"s jump to the not-yet-generated places in code just fine; a similar technique could be used for delaying the resolution of function/procedure calls as well.
Not quite. When compiling "if"s, and "while"s, and "break"s, a compiler will make use of the fact that code is structured. Because of that, a stack of addresses is sufficient to track them (similar to how Forth uses the return stack to compile such constructs).
For returns, the compiler doesn’t resolve the target address; that happens at runtime.
For function calls, a compiler would need a map mapping function names to addresses.
Also, generating good error messages is harder if you allow forward references. You cannot generate an error before seeing the definition, even if, say, you encounter two calls to foo before seeing its definition.
Joker_vD
Of course you can do all of this! After all, the sources of e.g. Wirth's Modula-2 single-pass compiler for the Macintosh (that generates native MC68000 code) have survived, and it doesn't use "stack of addresses": it uses "fixup()" calls when compiling conditions, loops, references to (imported) global variables etc. and, surely enough, calls to functions. It even has "CheckUDProc()" call at the end of the module-translation procedure to check whether there are any undefined procedures left in the symbol table.
It would be entirely possible to treat calls to undefined functions as implicit forward declarations; he simply never chose to do so.
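A toy sketch of the fixup technique discussed above, assuming a made-up two-instruction target (`CALL`, `RET`). The class and method names are hypothetical; they are loosely modeled on the "fixup()" and "CheckUDProc()" calls mentioned here, not taken from the actual Modula-2 compiler source.

```python
class Emitter:
    """Toy single-pass code generator with backpatched call targets."""

    def __init__(self):
        self.code = []       # emitted instructions, e.g. ["CALL", 2]
        self.defined = {}    # procedure name -> address of its first instruction
        self.fixups = {}     # procedure name -> addresses of CALLs awaiting a target

    def define(self, name):
        """Record that `name` starts here and patch every earlier call to it."""
        address = len(self.code)
        self.defined[name] = address
        for site in self.fixups.pop(name, []):
            self.code[site][1] = address

    def call(self, name):
        """Emit a call; if the target is unknown, leave a hole to patch later."""
        target = self.defined.get(name)
        if target is None:
            self.fixups.setdefault(name, []).append(len(self.code))
        self.code.append(["CALL", target])

    def ret(self):
        self.code.append(["RET", None])

    def check_undefined(self):
        """End-of-module check: names still awaiting a definition are errors."""
        return sorted(self.fixups)

e = Emitter()
e.define("G")   # G is compiled first and calls F, which has not been seen yet
e.call("F")
e.ret()
e.define("F")   # F arrives later; the pending call in G is patched here
e.call("G")
e.ret()
```

After this runs, the call inside G points at address 2 and `check_undefined()` returns an empty list. The sketch does not address the argument-conversion problem raised further down the thread, where the size of the call sequence itself depends on the callee's declared types.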
int_19h
The problem with functions is that when you see something like this:
F(X);
there may be implicit conversions happening to X here depending on its type, and, crucially, the declared type of the argument - e.g. Integer to Real. This would be a separate function call in many cases, so it's not just a single address to patch - indeed, you don't even know the size of the code that will need to be inserted in advance.
Joker_vD
Yes, and the way that e.g. C famously handles it is that it treats it as X being implicitly int, if it's an integral expression that produces something shorter than long, and a double, if X is a floating-point-valued expression, and whatever happens to structs when invoking implicitly-declared functions.
This, again, can be checked for the correctness at the end of the translation of a particular compilation unit, C just decided to not bother with this at all and leave it to the linker to figure out (even though it started its life as a language with 2-and-a-half passes compiler), and Pascal and its descendants decided to require either forward declarations, or using function pointers. I personally think that e.g.
var F: proc(whatever): whatever;
proc G() begin
... F(whatever); ...
end G;
proc F0(whatever): whatever begin
... G(); ...
end F0;
begin
F := F0;
end module.
is a needless pessimization but e.g. Oberon does it this way.
pragma_x
> You just feed your TeX document to ghc and it picks the code out of the \begin{code} blocks.
This is starting to remind me of Jupyter notebooks, as you are expected to have documentation (well-rendered markdown) blocks and dedicated code blocks. Now I wonder if notebooks were conceived with Knuth's vision in mind.
kccqzy
Yes Jupyter notebooks are inspired by Mathematica notebooks, and the documentation for the latter explicitly mentions literate programming.
PaulHoule
I spent a lot of time thinking about no/low code and one conclusion I came to was that there was "the program" and then there was the way it displayed. Consider
https://www.knime.com/why-visual-workflows
the topology of the connection between the blocks is essential to execute the program; that doesn't require coordinates for the blocks, but the visual editor does. The ideal tool has a clean separation between these. People are dimly aware of the "hairball" graph problem and looking for a visualization algorithm that banishes them:
https://blog.tomsawyer.com/untangle-the-hairball-with-bundle...
yet the real problem is that if your goal is to communicate you want to tell stories with visualizations and you need a visualization organized around a story -- and for each story you tell you need a different visualization.
I see the same thing with literate software. I can think of a number of interesting stories to tell about last month's chess program (e.g. "move generation", "alpha-beta search", for instance.) For a particular story I want to focus on certain things and completely eliminate other things. The same code might appear in more than one story. To make all this work there has to be a clean separation.
WillAdams
The fundamental question here is one which I don't think there is an agreed-upon answer for:
>What does an algorithm look like?
I am working on a rather visually-oriented tool, and while I did a fair bit of early development using BlockSCAD, and also have access to OpenSCAD Graph Editor, it hasn't made sense to show the algorithms visually because it's a lot of work making them suitably expressive.
One almost wishes that the specialized library used for:
https://www.youtube.com/watch?v=aVwxzDHniEw
was published and widely used.
Similarly, why can't we have more things such as:
https://mathcs.clarku.edu/~djoyce/java/elements/elements.htm...
mjochim
The library developed and used by 3blue1brown [1] is open-source [2] and seems to fit the same use case. I don’t know about widely-used, though.
[1] https://3blue1brown.com [2] https://github.com/ManimCommunity/manim
PaulHoule
An example I'll call out is an art show I saw by
https://en.wikipedia.org/wiki/Mark_Lombardi
a 'conspiracy theorist' who died mysteriously in 2000 at the age of 48. He would make large numbers of pencil sketches (50+) that started out as hairballs and gradually he'd try different layouts until they told a clear story. (At least some of the conspiracies, such as BCCI, were real, thus the quotes)
The same kind of work is necessary if you want to reveal some network of relationships that is essential to understanding some technological system, rule base, etc.
kayvulpe
I cannot find any of his work in high-resolution but those diagrams are exhilarating. Thank you.
PaulHoule
As a kid I felt algebra >> geometry (like I want to divide an angle by three and why waste my time with a system that can't! sure you can learn what a system can and can't do but that can be taught more directly with examples from computing) so as much as I read about Elements in math books by the likes of Martin Gardner it struck me as serious malpractice that "Great Books" advocates wanted kids to read it. (It's better than reading Newton's Principia if you want to learn physics or calculus though...)
I like what that site is trying to do but the upper levels don't communicate the affordances you would find if you drilled in. Also there is graph structure in Elements that I don't see visualized; also Elements uses a lot of weird vocabulary that would be a lot easier to deal with if it were hyperlinked to a glossary.
I've been interested in old Asian texts like the https://en.wikipedia.org/wiki/Kojiki and https://en.wikipedia.org/wiki/Romance_of_the_Three_Kingdoms where I have, charitably, 5% reading comprehension of the language but could get a lot with the graph structure materialized (like the chain of succession from Amaterasu to the Emperor) and also would like to see the original text, plus human-generated English translation if available, LLM-based translations, links to properly resolved characters and words in the dictionary, etc. (Right now I am digging into about 800,000 images with Chinese language metadata with some crude tools, really just getting out named entities makes me tickled pink.)
nathancahill
Along the same lines, I quite like the regex visualizer (Railroad diagram): https://regexper.com/#%2F%5E%28%28%5Ba-f0-9%5D%7B32%7D%29%2B...
jasonpeacock
The problem with literate programming is that most people aren’t good writers, and they aren’t interested in developing their writing skills - they only want to code.
Already it’s pulling teeth just to get literate commit messages…
mrweasel
Side note, regarding commit messages: I have a few colleagues who will go through the commit message just as thoroughly as they do the code, and match up the code with the commit. Over the past few years that has taught me to write pretty decent commit messages, to the point where I go "That is nicely done" when encountering my own commits.
Still can't write outside commit messages, but I guess that can be learned as well.
kubb
You know who is a good writer? LLMs. Imagine a model interrogating you about a piece of code and writing perfect documentation.
JK, the LLM will get bad input and it will spit out bad output.
flir
LLMs tend to write classic "what, not why" comments/commit messages. The idea of the LLM interrogating the programmer for the "why" is interesting, though.
Or maybe reading the ticket? (That might just be moving the problem somewhere else though).
toxik
I wonder if an LLM could make ”atomic commits” out of my N pending changes.
nzach
It almost can... You can use something like gptcommit to automatically create a commit message. But the results are pretty bad. It can't produce anything beyond placeholder/filler text.
I don't need a message explaining that we introduced an if to return when i is greater than len(items). I want a commit message explaining why it blew up in production after being in production for over a year. What changed? Did it have any other implications? Is there a ticket for this bug, or maybe a thread in Slack?
lupire
The other problem is that literate programming works well for code that has complicated ideas behind a small amount of code, but most business code is simple ideas behind a lot of boilerplatey code. The hard part is in making it all fit together, not explaining what each part means.
exe34
it might be worth separating out the clever business-specific algorithms, etc. these can be done with literate programming and then the rest of the plumbing can be done as usual.
hitchstory
>Already it’s pulling teeth just to get literate commit messages…
I usually push back on this because those commit messages almost never actually get read. It's an investment whose dividends are nebulous and hard to pin down.
It's rare that I look at a commit, and it's even rarer that I read it and wish that it had a better message.
There are all sorts of other documentation I routinely wish people put more effort into writing (comments, a "why" attached to every test, how to guides, tutorials), but rarely ever a commit message.
fsmv
If you wrote better commit messages you might look at them more because they're a lot more useful.
At work I see it as explaining why the code is there so when people check the blame layer they can find out and not delete my code if my reason is still relevant.
At home it helps a lot to write changelogs later when I do a release and it helps so much to see the last few commits when I pick a project up again after a month or two.
hitchstory
I mostly just message the person who wrote it if I have questions when I look at a commit.
I suppose if I painstakingly write 1000 beautiful commit messages I could save myself from having that one conversation when somebody else has a question about one of those commits.
norir
Writing good commit messages can be good self-promotion. You may not realize who is watching the repo and this is an easy way to differentiate yourself from your colleagues and make your work appear interesting and important.
Also, I have found that writing commit messages often forced me to rework a poor solution whose weakness became apparent only when I tried to explain it. In other words, often the value of the commit message is not the message itself but rather the process that produced it. It's a bit like musical scales. Almost no one is performing scales at a concert but also almost no one is performing without scales as a part of their regular practice.
hitchstory
>can be good self-promotion. You may not realize
When I said nebulous and hard to pin down this is kinda what I meant.
Do you routinely monitor commit messages on adjacent teams' repos? I don't. I don't know anybody who does. When I want to know something about their code I ping them a Slack message and vice versa.
I don't think the CTO is reading commit messages either; they're too busy.
I think it's good to ask when and why people actually do read commit messages to make sure what you write aligns with what they want to see.
paulddraper
Surely you do it often enough that an extra 90s would not be an undue cost?
hitchstory
I'd estimate I probably commit ~30-50 times a day. I dig into and read a commit message once every 3 months.
At 90 seconds a commit that's about an hour a day spent writing beautiful commits.
That one hour a day could be substituted with one conversation "hey, why did you do xyz in [ linktocommit ]?" every 3 months.
If you don't do these back of the envelope calculations in your head when trying to figure out if something is worth doing I highly encourage it.
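The back-of-the-envelope estimate above can be written out as a few lines of arithmetic. The 90 seconds and the 30-50 commits per day come from the thread; the number of workdays per quarter is an added assumption (roughly 13 weeks of 5 days).

```python
SECONDS_PER_COMMIT = 90        # extra time per commit message (from the thread)
COMMITS_PER_DAY = (30, 50)     # estimated low and high (from the thread)
WORKDAYS_PER_QUARTER = 65      # assumption: about 13 weeks of 5 workdays

# Daily cost in minutes at the low and high commit counts.
minutes_per_day = [n * SECONDS_PER_COMMIT / 60 for n in COMMITS_PER_DAY]

# Total cost over the three months between the occasions a message gets read.
hours_per_quarter = [m * WORKDAYS_PER_QUARTER / 60 for m in minutes_per_day]
```

This gives 45 to 75 minutes a day, consistent with "about an hour", and roughly 49 to 81 hours per quarter under the workday assumption.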
sparker72678
Seems like many devs don't even like to code. They just want to get paid.
(Not suggesting there's something wrong with that, per se. But good luck getting someone who's just in it for the money to go above and beyond.)
dhosek
I used to do everything in WEB/CWEB back in the 80s/90s. My biggest difference from Knuth was that I always started with an outline of the program as my first section (or perhaps second after an introductory section talking about what the program was meant to do). This made sense both from an expository standpoint and from a development standpoint as I could then assemble the program piecewise by filling in the sections that I had outlined at the beginning and generally each block of code fit on a single page with its documentation.
Problems I ran into were (a) for Pascal WEB, it was hard to get away from Knuth’s string pool–based handling of strings, which I didn’t especially love, and (b) for CWEB, it made sense to have both the .c and .h files specified in the .cweb file, but this meant that a lot of the efficiencies of make, such as they are, would be broken since the .h file would get updated every time I updated the .c file, forcing recompilation of other chunks of the code even though there were no actual changes. Perhaps a more intelligent version of ctangle which could screw with modification dates of files based on whether the code was actually changed would have helped, but…
That said, the weave output of TeX and Metafont does provide amazing documentation of the internals of how those programs work and they make for excellent reading. Alas, that sort of thing is hard to justify in contemporary business contexts.
froh
> Perhaps a more intelligent version of ctangle which could screw with modification dates of files based on whether the code was actually changed would have helped, but…
yes, it would --- noweb did two things right: they created a helper utility, `cpif`, which checks if a tangled file has changed at all and leaves the existing file alone if there was no change.
and they radically simplified the markup, which made it most accessible.
and they created (third good thing) a modular architecture with a well-defined, easily parsable intermediate textual representation to pipe text and code chunks through whichever utilities you want, for syntax highlighting, indexing, whatever.
and then, of all options, they chose the icon language for a reimplementation. from awk and c, available anywhere, to iconoclastic icon. bummer...
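The `cpif` idea mentioned above, replacing a file only when its contents differ, can be sketched in a few lines. This is an illustration of the behaviour as described in this comment, not noweb's implementation; `write_if_changed` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path

def write_if_changed(path, new_text):
    """Write new_text to path only if it differs; return True if written."""
    target = Path(path)
    if target.exists() and target.read_text() == new_text:
        return False          # leave the file, and its modification time, alone
    target.write_text(new_text)
    return True

with tempfile.TemporaryDirectory() as directory:
    header = Path(directory) / "parser.h"
    first = write_if_changed(header, "int parse(void);\n")    # file is created
    second = write_if_changed(header, "int parse(void);\n")   # identical: skipped
```

Because the second call leaves the modification time untouched, make sees no reason to rebuild anything that includes the header.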
taeric
Your starting with a general outline of it feels exactly right, to me. I thought Knuth actually did that quite often, as well?
In a sibling post, I describe it as how you would narrate the code to someone. You wouldn't go line-by-line in that. Instead, you would likely start by describing the general structure. Then, you'd start talking about bursts of changes.
It can be difficult to explain, as often times this can be done by talking about the functions that you are adding. In which case, it doesn't look much different from many other environments. Once you get to the non-linear additions of code where you are scattering variables and such through multiple sections, though, it starts to really shine.
kragen
I feel like you could run ctangle in one directory and use a ten-line awk script to copy the files with different checksums into the directory where you run the main Make?
dhosek
One of the challenges is that cweb wrote #line directives to its output so a file might change, but not in a meaningful way.
coliveira
You could write to a separate header file and use a script to copy to the right place only when the header was modified.
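One way such a script might decide whether a tangled file changed "in a meaningful way" is to compare the sources with `#line` directives stripped. A minimal sketch, assuming the directives sit on their own lines; the function names and the `prog.w` file name are hypothetical.

```python
import re

# Matches a whole "#line ..." directive line (optional space around '#').
LINE_DIRECTIVE = re.compile(r"^[ \t]*#[ \t]*line\b.*$", re.MULTILINE)

def without_line_directives(c_source):
    """Blank out #line directives so that only the real code is compared."""
    return LINE_DIRECTIVE.sub("", c_source)

def meaningfully_different(old_source, new_source):
    """True if the two sources differ in anything other than #line directives."""
    return without_line_directives(old_source) != without_line_directives(new_source)

old = '#line 10 "prog.w"\nint x = 1;\n'
renumbered = '#line 12 "prog.w"\nint x = 1;\n'
edited = '#line 12 "prog.w"\nint x = 2;\n'

same_code = meaningfully_different(old, renumbered)   # False
real_change = meaningfully_different(old, edited)     # True
```

The trade-off is that skipping the copy leaves stale `#line` numbers in the installed file, so compiler and debugger messages may point a few lines off in the literate source until the next real change.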
wduquette
In my experience, code bases are best structured for navigability, I.e., so that you can find what you’re looking for. Literate programming is about narrative, telling the story of the code in a clear way so that you can build your mental model. These two needs are frequently orthogonal.
lou1306
But that is the point of the article. Literate programming was built on the premise "what if you could have _one_ document that you could either typeset as an article/report _or_ compile as a program?", but then most implementations only cater to the second part of the equation and essentially default to "comments on steroids". Being able to expose pieces of code in a non-linear way would be a basic necessity, so that you can put the "interesting stuff" front and center and only focus on the minutiae (e.g., the #includes) later in the exposition.
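The non-linear exposition described here rests on named chunks that are expanded into compiler order. A minimal sketch of that expansion step, borrowing noweb's `<<name>>` reference notation; the `tangle` function and the chunk names are illustrative only, and real tools also handle chunk definitions spread over several places, cycles, and much more.

```python
import re

CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")

def tangle(chunks, root):
    """Expand the chunk named `root`, recursively inlining <<name>> references."""
    lines = []
    for line in chunks[root].splitlines():
        match = CHUNK_REF.match(line)
        if match:
            indent, name = match.groups()
            # Inline the referenced chunk, keeping the reference's indentation.
            lines.extend(indent + inner for inner in tangle(chunks, name).splitlines())
        else:
            lines.append(line)
    return "\n".join(lines)

# Chunks in narrative order: the interesting part first, the imports last.
chunks = {
    "mean": (
        "def mean(values):\n"
        "    <<reject empty input>>\n"
        "    return math.fsum(values) / len(values)"
    ),
    "reject empty input": (
        "if not values:\n"
        "    raise ValueError('no values')"
    ),
    "program": "<<imports>>\n\n<<mean>>",
    "imports": "import math",
}
program = tangle(chunks, "program")
```

The resulting `program` starts with the import and then the function, even though the document introduced them the other way round.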
josh-sematic
Firm agree. To me the best way to build a mental model of some code is to be able to quickly answer the questions that arise in my mind as I read it. The order in which these questions arise differ for each reader, and indeed for each intention that you approach the code with. There is therefore no one “perfect linear order” the code could be presented in and the best you can do is make it easy for the reader to construct their own reading order by being able to easily navigate around.
alganet
For today, I would focus on literate testing. The linear unit test is a better bottle for narrative content than the non-linear code.
In a similar line, I believe automatic documentation should be generated from tests, not the implementation. Then it's always up to date.
Text has many forms. Poems with many kinds of metrics, prose with all kinds of tones. Some forms are atemporal, some get old. None of it is "wrong", just different bottles to convey different ideas to different audiences.
bluGill
I disagree. The most important documentation is for people who don't want to read your code, just use it. Thus API documentation is important. Tests sort of do that, but they get into the wrong details - I don't want to read through a dozen different edge conditions for the first argument before you get to what the second argument does...
alganet
I said documentation generated from literate tests (this does not exist yet), not tests as the final documentation artifact. The result would be similar to API docs, but derived from a more authoritative resource (tests, not docblocks).
> I don't want to read through a dozen different edge conditions
Neither do I. You mean you want the happy path in a distinct scenario. That is a very common perception amongst both testers and documentation writers. Good test suites have the happy path distinct from the edge conditions as well as good docs.
bluGill
I don't think it is possible to usefully deliver what you want, but if you can do that I'm all for it. Just having the code in my documentation checked that it compiles (which often means assumptions because I shouldn't have to put in boilerplate)
fjfaase
For 'Advent of Code' I have been using a kind of literary programming method based on Markdown files. I have written a parser that can read Markdown files with segments of C code and put all the segments in the correct order to be compiled. For an example, see: https://github.com/FransFaase/AdventOfCode2023/blob/main/Day...
I also used it as a documentation for parser I have been developing, which shows the literary programming style in a better way. The program can combine several Markdown files into a single C program. See: https://github.com/FransFaase/RawParser?tab=readme-ov-file#d...
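For readers who want the general shape of such a tool, here is a much simplified sketch in Python. It assumes the C segments are fenced blocks tagged `c` and simply concatenates them in document order; the tool described above is different and does more (it reorders segments and handles "..." elisions), and `extract_c` is a hypothetical name.

```python
FENCE = "`" * 3   # built this way to avoid a literal fence inside this example

def extract_c(markdown):
    """Return the bodies of all C fenced blocks, concatenated in document order."""
    out, inside = [], False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not inside and stripped.lower() == FENCE + "c":
            inside = True           # opening fence of a C block
        elif inside and stripped == FENCE:
            inside = False          # closing fence
        elif inside:
            out.append(line)        # a line of C code
    return "\n".join(out) + "\n"

doc = "\n".join([
    "# Day 1",
    "We start with the includes.",
    FENCE + "c",
    "#include <stdio.h>",
    FENCE,
    "Then the entry point.",
    FENCE + "c",
    'int main(void) { puts("hi"); return 0; }',
    FENCE,
])
c_source = extract_c(doc)
```

The prose lines are dropped and `c_source` holds the two code segments, ready to hand to a C compiler.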
drivers99
That looks great. Found it in your IParse repo. I'm starting to work through "Write a C Compiler" by Nora Sandler and I think this could be very useful for me. Besides, I'm curious how it puts things in order and lets you mention functions multiple times, using "..." to skip what's already there, etc.
fjfaase
Look for the file MarkDownC.cpp and search for 'elipses', which stands for '...'. The code makes use of AbstractParseTreeCursor, which is a kind of smart pointer. (There is quite a bit of code that is commented out.) You can contact me through the email mentioned at the bottom of my website www.iwriteiam.nl
camel-cdr
> There's a fundamental problem with generating a beautifully typeset document for a codebase: it's dead. [...]
> You can't work with it, you can't try to make changes to it to see what happens, and you certainly can't run it interactively.
I think an aspect where "classical" literate programming excels is in the form of printed educational books.
A great example of this is LCC "A Retargetable Compiler for ANSI C" by David R. Hanson.
Reading about how to write compilers in theory is fine, but it's very nice to learn from a full working example implementation.
bluGill
True, and Knuth is an author of books, so it makes sense for him to write code for books when writing code for books. However, code that isn't bound into a book has different needs and should be different. I've printed out code before (the floor gives a lot more space to spread out text than even the largest monitor, and taking a pen to circle useful parts is helpful) - but I recycled that paper as soon as the bug was found/fixed. In that case I'm never interested in how the code is documented/supposed to work in the author's imagination - I care about what it actually does.
WillAdams
Note that it is possible to apply a [draft] option to some LaTeX documentclasses so as to avoid that appearance and encourage folks to treat as not-yet final/finished.
taeric
I used to agree with this post's general premise, but I have come to think it is wrong.
The idea seems to be that a literate program would be ordered such that a reader would not need any familiarity with general boilerplate of the language that is being presented. But there is no real benefit for that. Indeed, it makes everything so much harder if you are trying to have meaningfully original narrative for every single line.
Instead, people should know roughly what the outline of a C program is if they are going to try and read a C program. Regardless of if it is done literately or not. To that end, having some generic imports at the top of a file that has a scattering of globals the author typically uses makes a ton of sense.
The idea, restated, is to allow you to narrate code. So, think of how you would approach reading anyone's source code that wasn't reformatted in this way. You'd take a look at the general outline of the file. Probably take a quick peek at the basic imports. If there are a common set of top level variables you expect to see, check for those real quick. Then, start trying to find the narrative of the code.
flerchin
It's funny: the very things the author decries are what IDEs do for us automatically. Hide the imports, expose the abstractions when needed.
Sylvain78
Oscar winning book about rendering accurately 3D scenes, written in literate programming : https://www.pbrt.org/
WillAdams
I've tried to collect a list of books written thus at:
https://www.goodreads.com/review/list/21394355-william-adams...
(please excuse _A Philosophy of Software Design_ EDIT and the two _Structure..._ books --- they were just too influential on me to not list)
I would be glad of any I have missed, or other such lists.
magicalhippo
By far the best programming book I've ever read, by orders of magnitude.
I think the literate programming style fit this book well, but the main reason is simply that it tackles not just the theoretical side but also the practical side in great detail.
In my experience a lot of programming books focus a lot on either and ignore the other, and that leads to frustration when trying to implement stuff in practice.
lupire
Fascinating.
I think it would be much easier to read if it were formatted in 2-pane format, with the English and math on one pane, and the code on the side, aligned to match.
Working from the source code of the program/book and the Literate Code renderer program, it wouldn't be so hard to write a variant transformation to lay out in side-by-side fashion.
Possibly could need some extra markup code, though much could be inferred simply from "comment-followed-by-code implies comment-aside-code" rule.
As it is, the inline code is a distraction from the text and math, and vice versa.
WillAdams
There should be no reason one couldn't use LaTeX to create such a typeset representation.
Integrating the inline code is the responsibility of the writer.
BlueTemplar
The problem with (La)TeX is that it's designed for fixed layout documents.
Great for print and slides, bad for most digital documents.
mtrovo
> You can't work with it, you can't try to make changes to it to see what happens, and you certainly can't run it interactively. All you can do, literally, is curl up with it in bed. And promptly fall asleep.
I totally agree with this. Generating static typeset docs essentially leaves them "dead" because they can't be poked at, run, or updated in real time. That's why I'm a big believer in the value of tests and instrumentation, which stay alive alongside the code. In practice, we don't need every aspect of a project (e.g. code, tests, docs, and instrumentation) to be simultaneously visible at all times, and I think part of the complexity of onboarding into a new project is the overwhelming feeling you get the first time you see a piece of code with all its tests, all its comments, and all its instrumentation thrown at you as if you should pay attention to everything. When we're refactoring, we focus on a particular chunk of code and its tests; when we're debugging, we need instrumentation. For everything else, we can loop back later.
We already have powerful ways to merge issues, commits, and discussions, but we still rely on massive documentation dumps that go stale fast. A more adaptive approach (perhaps using an LLM to generate contextual help from version control) would allow us to focus on the core process of writing and verifying code. I especially like the idea of code that can "talk back" and give just the right amount of detail based on what we're trying to accomplish at the moment. Sometimes it suffices to have novice-friendly, high-level explanations; other times, we want to jump in and change the behaviour of a very specific feature across some specific files. Maintaining documentation for every possible scenario is a huge burden, which is why I think flexible, on-demand help could be a game changer.
There are at least two aspects to literate programming.
Knuth wrote TeX in Pascal, which has a couple of limitations that annoyed him:
- Pascal has no module system: you can’t break a program up into multiple files
- The order of presentation of a program is dictated by the needs of a single-pass compiler
The tooling that supports literate programming is mostly about overcoming these problems with Pascal. (Knuth also uses it to split up larger procedures in a manner that makes me think of very un-structured C macros.) Most other languages are more flexible so they have much less need for a technology like tangle/weave.
The opposite extreme is Literate Haskell, which doesn’t need any preprocessor. You just feed your TeX document to ghc and it picks the code out of the \begin{code} blocks. Or instead of Knuth style you can write in Bird style, where the file is a plain text document with code marked by > quote blocks.
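To make the Bird style concrete, here is a rough Python sketch of what "picking the code out" amounts to: keep the lines that start with `>` and drop the marker. It is an illustration only; GHC's own preprocessing is more careful (for instance about blank lines between text and code, and about \begin{code} blocks), and `unlit_bird` is a hypothetical name.

```python
def unlit_bird(text):
    """Keep only Bird-style code lines (those starting with '>'), minus the marker."""
    code = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Drop the '>' and, if present, the single space after it.
            code.append(line[2:] if line.startswith("> ") else line[1:])
    return "\n".join(code) + "\n"

literate = "\n".join([
    "The factorial of n, written the obvious way.",
    "",
    "> factorial :: Integer -> Integer",
    "> factorial n = product [1..n]",
    "",
    "Nothing else in this file is code.",
])
haskell = unlit_bird(literate)
```

Everything not marked with `>` is commentary, which is the inverse of the usual convention where comments are the marked part.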
The other aspect of literate programming is organizing the code like a book. This is very difficult to do well. When I think of how I approach a codebase, it reminds me of textbooks that have a complicated diagram describing multiple suggested orders of reading (a chapter per module?), or the informal introduction to Algol 68 with its orthogonal table of contents. Ideally a program’s commentary should span both the explanatory and reference documentation quadrants.
What strikes me about Knuth’s literate programming is how hypertextual it is, albeit using printed cross-references and indexes. It seems to be in desperate need of interactive pixels. And an underlying programming language that has better support for Knuth’s preferred size of code fragment.
On balance I think modern tools achieve Knuth’s goals better than tangle/weave. Modern languages let you organize code much closer to its narrative structure, without WEB’s unhygienic scoping. Editors and IDEs and documentation browsers give you hyperlinks everywhere.
It’s then “just” a matter of writing the exposition. Less literate programming, more literate programmer.