Backlog.md – Markdown‑native Task Manager and Kanban visualizer for any Git repo
17 comments · July 6, 2025 · mrlesk
mitjam
Really love this.
Would love to see an actual end-to-end example video of you creating, planning, and implementing a task using your preferred models and apps.
mrlesk
Will definitely do. I am also planning to run a benchmark with various models to see which one is more effective at building a full product, starting from a PRD and using Backlog for managing tasks.
bazooka5798
I'd love to see OpenRouter connectivity to try non-Claude models for some of the planning parts of the cycle.
westurner
Is there an established benchmark for building a full product?
- SWE-bench leaderboard: https://www.swebench.com/
- Which metrics for e.g. "SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork"? https://news.ycombinator.com/item?id=43101314
- MetaGPT, MGX: https://github.com/FoundationAgents/MetaGPT :
> Software Company as Multi-Agent System
> MetaGPT takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc. Internally, MetaGPT includes product managers / architects / project managers / engineers. It provides the entire process of a software company along with carefully orchestrated SOPs.
- Mutation-Guided LLM-based Test Generation: https://news.ycombinator.com/item?id=42953885
- https://news.ycombinator.com/item?id=41333249 :
- codefuse-ai/Awesome-Code-LLM > Analysis of AI-Generated Code, Benchmarks: https://github.com/codefuse-ai/Awesome-Code-LLM :
> 8.2 Benchmarks: Integrated Benchmarks, Evaluation Metrics, Program Synthesis, Visually Grounded Program Synthesis, Code Reasoning and QA, Text-to-SQL, Code Translation, Program Repair, Code Summarization, Defect/Vulnerability Detection, Code Retrieval, Type Inference, Commit Message Generation, Repo-Level Coding
- underlines/awesome-ml/tools.md > Benchmarking: https://github.com/underlines/awesome-ml/blob/master/llm-too...
- formal methods workflows, coverage-guided fuzzing: https://news.ycombinator.com/item?id=40884466
- "Large Language Models Based Fuzzing Techniques: A Survey" (2024) https://arxiv.org/abs/2402.00350
unshavedyak
Would love more detail on your integration with Claude. Are you telling Claude to use Backlog to plan task X? It feels like an MCP integration or something similar might make it feel more native?
Though I've not had much luck getting Claude to use MCPs natively, so maybe that's off base, heh.
ttoinou
Seems like a great idea. How would that work with multiple branches? One task might be implemented in a different branch, and we might want a global overview, in the main branch, of all the tasks being coded.
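For the multi-branch question above, one possible way to get a global overview is to list the task files on each branch and merge the results. This TypeScript sketch is hypothetical and not a Backlog.md feature: the `tasksByBranch` name is made up, and it assumes each branch's listing is plain, unquoted, newline-separated paths, such as the output of `git ls-tree -r --name-only <branch> backlog`. It does not handle git's quoting of unusual path names.

```typescript
// Hypothetical sketch, not part of Backlog.md: merge per-branch listings of
// task files into one overview of which branches each task file exists on.
// Assumes plain, unquoted, newline-separated paths, e.g. the output of
//   git ls-tree -r --name-only <branch> backlog

// Matches the "task-<task-id> - <task-title>.md" naming convention.
const TASK_FILE = /^task-\d+ - .+\.md$/;

function tasksByBranch(listings: Record<string, string>): Map<string, string[]> {
  const overview = new Map<string, string[]>();
  for (const [branch, output] of Object.entries(listings)) {
    for (const path of output.split("\n")) {
      // Compare base names so the directory layout does not matter.
      const name = path.split("/").pop() ?? "";
      if (!TASK_FILE.test(name)) continue; // skip non-task files and blank lines
      const branches = overview.get(name) ?? [];
      branches.push(branch);
      overview.set(name, branches);
    }
  }
  return overview;
}

// Example with hand-written listings standing in for real git output.
const overview = tasksByBranch({
  main: "backlog/task-1 - Fix typo.md\nREADME.md",
  feature: "backlog/task-1 - Fix typo.md\nbacklog/task-2 - Add login.md\n",
});
for (const [task, branches] of overview) {
  console.log(`${task}: ${branches.join(", ")}`);
}
// task-1 - Fix typo.md: main, feature
// task-2 - Add login.md: feature
```

Keying the overview by file name means a task whose title was edited on one branch shows up as two entries; keying by the numeric id instead would merge them.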
> All data is saved under backlog folder as human‑readable Markdown with the following format task-<task-id> - <task-title>.md (e.g. task-12 - Fix typo.md).
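The naming convention in the quoted README line can be illustrated with a pair of TypeScript helpers. This is a sketch, not Backlog.md's code: `taskFileName` and `parseTaskFileName` are hypothetical names, and the sketch assumes numeric task ids.

```typescript
// Hypothetical helpers for the quoted convention
//   task-<task-id> - <task-title>.md   (e.g. "task-12 - Fix typo.md")
// Illustrative only; not taken from Backlog.md's source.

interface TaskRef {
  id: number;
  title: string;
}

// Build the file name for a task.
function taskFileName(task: TaskRef): string {
  return `task-${task.id} - ${task.title}.md`;
}

// Recover the id and title from a file name, or return null when the name
// does not follow the convention. The title may itself contain " - ".
function parseTaskFileName(fileName: string): TaskRef | null {
  const match = /^task-(\d+) - (.+)\.md$/.exec(fileName);
  if (match === null) return null;
  return { id: Number(match[1]), title: match[2] };
}

console.log(taskFileName({ id: 12, title: "Fix typo" })); // task-12 - Fix typo.md
const parsed = parseTaskFileName("task-12 - Fix typo.md");
console.log(parsed?.id, parsed?.title); // 12 Fix typo
console.log(parseTaskFileName("notes.md")); // null
```

A real tool would probably also need to strip characters such as "/" that are not allowed in file names; this sketch leaves titles untouched.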
If every "task" is one .md file, that could be a problem: I believe AIs have trouble editing big files. They can't easily append text to a big file because of the context window, so we have to force a workaround, running a command-line tool to append text instead of editing the file. This means tasks have to stay small, or we have to avoid putting too much information in each one.
jedimastert
Can we change the title to include that this is a tool for AI? I thought it was just gonna be a visualizer.
The tagline from the repo seems fine: "A tool for managing project collaboration between humans and AI Agents in a git ecosystem"
tptacek
This is a good idea. But the screenshots you have show lots of tasks in a project; how are you dispatching tasks (once planned) to an agent, and how are agents navigating the large amount of Markdown task content you're producing without blowing out their context budget?
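To make the context-budget concern concrete: one hypothetical approach is to give an agent a short, one-line-per-task index first and let it open only the task files it needs. This TypeScript sketch is not how Backlog.md is documented to work; the `TaskSummary` shape, the `buildIndex` name, and measuring the budget in characters rather than tokens are all assumptions.

```typescript
// Hypothetical sketch: a compact task index that stays within a character
// budget, so an agent can choose a task before loading its full Markdown.
// Characters stand in for tokens here, which is a simplification.

interface TaskSummary {
  id: number;
  title: string;
  status: string;
}

interface IndexResult {
  text: string; // newline-joined summary lines
  omitted: number; // tasks left out because the budget ran out
}

function buildIndex(tasks: TaskSummary[], maxChars: number): IndexResult {
  // Sort by id so the index is stable regardless of input order.
  const sorted = [...tasks].sort((a, b) => a.id - b.id);
  const lines: string[] = [];
  let used = 0;
  for (const t of sorted) {
    const line = `task-${t.id} [${t.status}] ${t.title}`;
    // Every line after the first also costs one newline separator.
    const cost = line.length + (lines.length > 0 ? 1 : 0);
    if (used + cost > maxChars) break;
    lines.push(line);
    used += cost;
  }
  return { text: lines.join("\n"), omitted: sorted.length - lines.length };
}

const result = buildIndex(
  [
    { id: 2, title: "Add login", status: "To Do" },
    { id: 1, title: "Fix typo", status: "Done" },
    { id: 3, title: "Write docs", status: "In Progress" },
  ],
  50,
);
console.log(result.text);
// task-1 [Done] Fix typo
// task-2 [To Do] Add login
console.log(result.omitted); // 1
```

Stopping at the first task that does not fit keeps the index a contiguous, id-ordered prefix; skipping that task and continuing would pack in more entries, but the index would then have gaps.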
rumblefrog
Is there an alternative that integrates with a Jira instance?
Many of my tasks already exist in the form of Jira tickets; it would be interesting to prompt it to take over a specific ticket and update that ticket's progress as well.
mrlesk
For that kind of task I would go with Taskmaster AI. It has MCP integration and could probably connect to Jira.
Backlog is more for smaller projects where you wouldn’t normally have a project management tool.
QRY
Ooh, definitely trying this out! I ended up homebrewing a whole context-maintenance ritual, but it was a pain to get an AI agent to apply it consistently, so it spun out into building a whole project management... thing.
This looks much more thought out, thanks for sharing!
bearjaws
Like the idea of a self-hosted kanban in Git. One thing you should do in your repo is add the installation instructions to the README :)
I see it's a TS app, so I'm sure the Bun bundle is the install, but it's always good to include in your 5-minute intro.
mrlesk
You’re absolutely right
Joking aside, there is an npm/bun install -g backlog.md at the top, but I can add an extra one in the 5-minute intro.
I am using Bun’s new fullstack single-file builds. I’m really impressed by how easy it was to set everything up.
adobbs
Brilliant! Thank you for sharing.
Had similar success making a few more Markdown files to help guide the agent, but I never would have thought of something this useful.
Will try your workflow and backlog on a build this week.
TimMeade
That looks fascinating. I will certainly be testing it in the morning! Thanks!
I threw Claude Code at an existing codebase a few months back and quickly quit: untangling its output was slower than writing from scratch. The fix turned out to be process, not model horsepower.
Iteration timeline
==================
• 50% task success - added README.md + CLAUDE.md so the model knew the project.
• 75% - wrote one Markdown file per task; Codex plans, Claude codes.
• 95%+ - built Backlog.md, a CLI that turns a high-level spec into those task files automatically (yes, using Claude/Codex to build the tool).
Three-step loop that works for me:
1. Generate tasks - Codex / Claude Opus → self-review.
2. Generate plan - same agent, “plan” mode → tweak if needed.
3. Implement - Claude Sonnet / Codex → review & merge.
For simple features I can even run this from my phone: ChatGPT app (Codex) → GitHub app → ChatGPT app → GitHub merge.
Repo: https://github.com/MrLesk/Backlog.md
Would love feedback and happy to answer questions!