Show HN: Jq-Like Tool for Markdown
20 comments
· February 23, 2025
verdverm
> GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug
This is because GitHub is not building the features we need; instead, they are putting their energy towards the AI land grab. Bitbucket, by contrast, has a feature where you can block PRs using a checkbox list outside of the description box. There are better ways to solve this first example from the OP's README.
Cool project. I write mainly MDX these days, so it would be cool to see support for that dialect.
yshavit
The Markdown parsing library I'm using supports MDX, so it shouldn't be too difficult to come up with syntax for those components. I haven't done that yet, but mostly because I didn't want to go down that path until I knew there was interest and had a concrete use case or two to inform the query syntax.
If you want to open an enhancement request issue, I'm happy to take a look (PRs also welcome, but not required). If you're not on GitHub, let me know and we can figure out some other way to get the request tracked.
Thanks for taking a look at the project!
codelion
it's a shame when core feature development seems to lag. i've also been working w/ MDX lately & agree that support would be a great addition.
echelon
> This is because GitHub is not building the features we need, instead they are putting their energy towards the AI land grab.
You throw the ball to where it's going. Gitlab might be delivering more value in the short term, but if things wind up looking significantly different in ten years, they might be in for a world of hurt. Innovator's dilemma is real.
It's a danger to ignore the tectonic changes happening. It's also incredibly risky to lean fully in, because we're not sure where the value accrues or which systems are the most important to build. It doesn't seem like foundation models are it.
It's smart to build basic scaffolding, let the first movers make all the expensive mistakes, then integrate the winning approaches into your platform. That requires a lot of energy though.
lanstin
Ironically, one of the reasons Markdown (and other text-based file formats) became popular is that you could use regular find/grep to analyze them, and version control to manage them.
zahlman
I don't think anyone ever really expected to see widespread use of regexes to alter the structure of a Markdown document. Honestly, while something like "look for numbers and surround them with double-asterisks to put them in boldface" is feasible enough (and might even work!), I can't imagine that a lot of people would do that sort of thing very often (or want to) anyway.
If a document is supposed to have structure - even something as simple as nested lists of paragraphs - it doesn't seem realistic to expect regular text manipulation tools to do a whole lot with them. Something like "remove the second paragraph of the third entry in the fourth bullet-point list" is well beyond any sane use of any regex dialect that might be powerful enough. (Keeping in mind that traditional regexes can't balance brackets; presumably they can't properly track indentation levels either.)
See also: TOML - generally quite human-editable, but still very much structured with potentially arbitrary nesting.
cdbattags
Definitely, but it's neat nonetheless because more and more things are "structured Markdown" these days. Extremely useful for AI reasoning and outputs.
monsieurbanana
> because you could use regular find/grep to analyze it
They were meant to be analyzable in some ways. Count lines, extract headers, maybe sed-replace some words. But being able to operate/analyze over multiline strings was never a strong point of unix tools.
unglaublich
My flow is to go through the Pandoc JSON AST and then use Jq. This works for other input formats, too.
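For illustration, here is a minimal sketch of that flow, written in Python rather than jq. It assumes the JSON shape that `pandoc -t json` emits: a top-level `blocks` list whose nodes look like `{"t": ..., "c": ...}`. The sample document is hand-written and the helper names `inline_text` and `headers` are hypothetical.

```python
import json

# Hand-written sample resembling `pandoc -t json` output for a tiny document.
# This is an assumption for illustration; real output may carry more metadata.
SAMPLE = json.loads("""
{
  "pandoc-api-version": [1, 23],
  "meta": {},
  "blocks": [
    {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "Hello"}, {"t": "Space"},
                        {"t": "Str", "c": "world"}]},
    {"t": "Header", "c": [2, ["next-steps", [], []],
                          [{"t": "Str", "c": "Next"}, {"t": "Space"},
                           {"t": "Str", "c": "steps"}]]}
  ]
}
""")


def inline_text(inlines):
    """Flatten inline nodes to plain text (only Str and Space are handled)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def headers(doc):
    """Return (level, text) for each top-level Header block."""
    return [
        (block["c"][0], inline_text(block["c"][2]))
        for block in doc["blocks"]
        if block["t"] == "Header"
    ]


print(headers(SAMPLE))  # [(1, 'Intro'), (2, 'Next steps')]
```

The positional `c` arrays (level, attributes, inlines) are what make this format feel like an interchange format rather than something tuned for ad hoc queries.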
saghm
I've never had a need for parsing markdown like this, but I have to wonder: would it make sense to go through HTML instead, given that it's what markdown is designed to compile to? At that point, I'd assume there are any number of existing XML tools that would work, and my (maybe naive) assumption is that typical markdown documents would be relatively flat compared to how deeply nested "native" HTML/XML often gets, so it doesn't seem like most queries would require particularly complex XPath to be able to specify.
MathMonkeyMan
I did this for a tool that checks relative links in markdown files, e.g. readmes in a repo.
markdown -> xhtml -> sxml -> logic (racket)
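A rough sketch of the same idea in Python rather than Racket, assuming the Markdown has already been rendered to well-formed XHTML. The sample markup and the `relative_links` helper are hypothetical, and this is not the tool described above, only the general shape of the pipeline's last step.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical XHTML as a Markdown renderer might emit it, wrapped in a single
# root element so that it parses as well-formed XML.
XHTML = """
<div>
  <h1>Project</h1>
  <p>See <a href="docs/setup.md">setup</a> and
     <a href="https://example.com">the site</a>.</p>
  <ul>
    <li><a href="../LICENSE">license</a></li>
    <li><a href="#usage">usage</a></li>
  </ul>
</div>
"""


def relative_links(xhtml):
    """Collect hrefs that point at relative paths (no scheme, no host)."""
    root = ET.fromstring(xhtml)
    links = []
    for anchor in root.iter("a"):
        href = anchor.get("href", "")
        parsed = urlparse(href)
        # Skip absolute URLs and pure in-page fragments such as "#usage".
        if not parsed.scheme and not parsed.netloc and parsed.path:
            links.append(href)
    return links


print(relative_links(XHTML))  # ['docs/setup.md', '../LICENSE']
```

From here, a link checker would resolve each path against the file's directory and test whether the target exists.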
yshavit
I'm curious how ergonomic you find that? I did look at the pandoc JSON initially, and found it fairly awkward to work with. It's a great interchange format, but doesn't seem optimized for either human interaction or scripting. (It's definitely possible to use it for scripting, it just felt cumbersome to me, personally.)
broodbucket
I think you'd benefit from having some more real-world-ish examples in the README, speaking as someone who doesn't intuit what I'd want to use this for.
twinkjock
Thanks for sharing this Yuval! Thanks as well for using permissive licenses so I can use this at work.
imglorp
Curious, which license can't you use at work for a simple shell tool? Considering you're not linking against it, even GPL3 should be okay, right?
nodesocket
How is it parsing? Just normal string and regex matching or transforming markdown to an intermediate structured language?
yshavit
For the markdown, I'm using https://github.com/wooorm/markdown-rs, which is a formal parser that produces an AST. For the query language, I have a very simple hand-rolled parser.
Hamzahkm
[flagged]
There have been a few times I wanted the ability to select some text out of a Markdown doc. For example, a GitHub CI check to ensure that PRs / issues / etc are properly formatted.
This can be done to some extent with regex, but those expressions are brittle and hard to read or edit later. mdq uses a familiar pipe syntax to navigate the Markdown in a structured way.
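To make the brittleness concrete, here is a small Python sketch (not mdq, and not a full Markdown parser) that counts unchecked task items in a hypothetical PR body. The naive regex also counts a checkbox that only appears inside a fenced code block; the second function tracks fences by hand, which is exactly the kind of special-casing that piles up. The names `PR_BODY`, `naive_unchecked`, and `fence_aware_unchecked` are made up for this example.

```python
import re

# Hypothetical PR description. The "~~~" fence contains template text that
# merely looks like a task item.
PR_BODY = """\
## Checklist

- [x] Tests added
- [ ] Docs updated

## Notes

Example of the template syntax:

~~~
- [ ] this line is inside a code fence, not a real task
~~~
"""

# Matches an unchecked task item at the start of a line.
UNCHECKED = re.compile(r"^[ \t]*[-*] \[ \]", re.MULTILINE)


def naive_unchecked(body):
    """Count every line that looks like an unchecked box, fenced or not."""
    return len(UNCHECKED.findall(body))


def fence_aware_unchecked(body):
    """Count unchecked boxes, skipping lines inside ``` or ~~~ fences."""
    count = 0
    in_fence = False
    for line in body.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
            continue
        if not in_fence and UNCHECKED.match(line):
            count += 1
    return count


print(naive_unchecked(PR_BODY))        # 2
print(fence_aware_unchecked(PR_BODY))  # 1
```

Even the fence-aware version ignores indented code blocks, HTML comments, and which section a checkbox belongs to, so a parser-backed query tends to be the more maintainable route.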
It's in 0.x because I don't want to fully commit to the syntax being stable, in case real-world testing shows that the syntax needs tweaking. But I think the project is in a pretty good spot overall, and would be interested in feedback!