
Show HN: I built an AI that turns GitHub codebases into easy tutorials

bilalq

This is actually really cool. I just tried it out using an AI studio API key and was pretty impressed. One issue I noticed was that the output was a little too much "for dummies". Spending paragraphs to explain what an API is through restaurant analogies is a little unnecessary. And then followed up with more paragraphs on what GraphQL is. Every chapter seems to suffer from this. The generated documentation seems more suited for a slightly technical PM moreso than a software engineer. This can probably be mitigated by refining the prompt.

The prompt would also maybe be better if it encouraged variety in diagrams. For some things, a flowchart would fit better than a sequence diagram (e.g., a durable state-machine workflow written with AWS Step Functions).
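Something as small as a hint in the generation prompt might be enough; a rough sketch of what I mean (the variable names here are illustrative, not the tool's actual code):

    # Hypothetical addition to the chapter-writing prompt
    diagram_hint = (
        "Pick the diagram type that fits the concept: a Mermaid flowchart for "
        "state machines and branching workflows, a sequence diagram only for "
        "multi-party request/response interactions."
    )
    prompt = base_prompt + "\n" + diagram_hint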

cushychicken

Answers like this are sort of what makes me wonder what most engineers are smoking when they think AI isn’t valuable.

I don’t think the outright dismissal of AI is smart. (And, OP, I don’t mean to imply that you are doing that. I mean this generally.)

I also suspect people who level these criticisms have never really used a frontier LLM.

Feeding in a whole codebase that I’m familiar with, and hearing the LLM give good answers about its purpose and implementation from a completely cold read is very impressive.

Even if the LLM never writes a line of code - this is still valuable, because helping humans understand software faster means you can help humans write software faster.

linotype

Many devs still think their job is to write code, not to build the products their business needs. I use LLMs extensively, and they've helped me work better and faster.

grugagag

LLMs excel at some things and work very poorly at others. People working on different problems have had different experiences, sometimes opposite ends of the spectrum.

kaycebasques

> hearing the LLM give good answers about its purpose and implementation from a completely cold read

Cold read ability for this particular tool is still an open question. As others have mentioned, a lot of the example tutorials are for very popular codebases that are probably well-represented in the language model's training data. I'm personally going to test it on my private, undocumented repos.

tossandthrow

> Even if the LLM never writes a line of code - this is still valuable, because helping humans understand software faster means you can help humans write software faster.

IMHO, AI-generated text is generally not valuable, and I assume, until proven wrong, that it provides little to no value.

I have seen so many startups fold after they made some AI product that appeared impressive on the surface but provided no substantial value.

Now, I will be impressed by the AI that can remove code without affecting the product.

jonahx

> Now, I will be impressed by the AI that can remove code without affecting the product.

Current AIs can already do this decently, with the usual caveats about possible mistakes/oversight.

otabdeveloper4

Summarization is one thing LLMs can do well, yes. (That's not what this current hype cycle is selling, though.)

panny

> Answers like this are sort of what makes me wonder what most engineers are smoking when they think AI isn’t valuable.

I'll just wait for a winner to shake out and learn that one. I've gotten tired of trying AIs only to get slop.

CodeMage

> Answers like this are sort of what makes me wonder what most engineers are smoking when they think AI isn’t valuable.

Honestly, I wonder if I'm living in some parallel universe, because my experience is that "most engineers" are far from that position. The reactions I'm seeing are either "AI is the future" or "I have serious objections to and/or problems with AI".

If you're calling the latter group "the outright dismissal of AI", I would disagree. If I had to call it the outright dismissal of anything, it would be of AI hype.

> I also suspect people who level these criticisms have never really used a frontier LLM.

It's possible. At my workplace, we did a trial of an LLM-based bot that would generate summaries for our GitHub PRs. I have no idea whether it's a "frontier" LLM or not, but I came out of that trial equally impressed, disappointed, and terrified.

Impressed, because its summaries got so many details right. I could immediately see the use for a tool like that: even when the PR author provides a summary of the PR, it's often hard to figure out where to start looking at the PR and in which order to go through changes. The bulleted list of changes from the bot's summary was incredibly useful, especially because it was almost always correct.

Disappointed, because it would often get the most important thing wrong. For the very first PR that I made, it got the whole list of changes right, but the explanation of what the PR did was the opposite of the truth. I made a change to make certain behavior disabled by default and added an option to enable it for testing purposes, and the bot claimed that the behavior was impossible before this change and the PR made it possible if you used this option.

Terrified, because I can see how alluring it is for people to think that they can replace critical thinking with AI. Maybe it's my borderline burnout speaking, but I can easily imagine the future where the pressure from above to be more "efficient" and to reduce costs brings us to the point where we start trusting faulty AI and the small mistakes start accumulating to the point where great damage is done to millions of people.

> Even if the LLM never writes a line of code - this is still valuable, because helping humans understand software faster means you can help humans write software faster.

I have my doubts about this. Yes, if we get an AI that is reliable and doesn't make these mistakes, it can help us understand software faster, as long as we're willing to make the effort to actually understand it, rather than delegating to the AI's understanding.

What I mean by that is that there are different levels of understanding. How deep do you dive before you decide it's "deep enough" and trust what the AI said? This is even more important if you start also using the AI to write the code and not just read it. Now you have even less motivation to understand the code, because you don't have to learn something that you will use to write your own code.

I'll keep learning how to use LLMs, because it's necessary, but I'm very worried about what we seem to want from them. I can't think of any previous technological advance that aimed to replace human critical thinking and creativity. Why are we even pursuing efficiency if it isn't to give us more time and freedom to be creative?

doug_durham

The value is that it got the details correct, as you admit. That alone is worth the price of admission. Even if I need to rewrite or edit parts, it has saved me time and has raised the quality of PRs being submitted across the board. The key point with these tools is *Accountability*. As an engineer you are still accountable for your work; using any tool doesn't take that away. If the PR tool gets it wrong and you still submit it, that's on the engineer. If you have a culture of accountability, then there is nothing to be terrified of. And by the way, the most recent tools are really, really good at PRs and commit messages.

voidUpdate

Well, companies lock "frontier LLMs" behind paywalls, and I don't want to pay for something that might not be of any use to me.

GaggiX

Gemini 2.5 Pro Experimental (a frontier model) has a free tier of 5 requests per minute (RPM) and 25 requests per day (RPD).

Gemini 2.5 Flash Preview 04-17, another powerful model, has 10 RPM and 500 RPD.

OpenAI also allows you to use their API for free if you agree to share the tokens.

kaycebasques

> Spending paragraphs to explain what an API is through restaurant analogies is a little unnecessary. And then followed up with more paragraphs on what GraphQL is.

It sounds like the tool (as it's currently set up) may not actually be that effective at writing tutorial-style content in particular. Tutorials [1] are usually heavily action-oriented and take you from a specific start point to a specific end point to help you get hands-on experience in some skill. Some technical writers argue that there should be no theory whatsoever in tutorials. However, it's probably easy to tweak the prompts to get more action-oriented content with less conceptual explanation (and exclamation marks).

[1] https://diataxis.fr/tutorials/

neop1x

>> This can probably be mitigated by refining the prompt

Sometimes it explains things like I am a child, and sometimes it doesn't explain things well enough. I think fixing this with a simple prompt change won't work; it may fix one part and make things worse in another. This is a problem I have with LLMs: you can fine-tune the prompt for a specific case, but I find it difficult to write a universally working prompt. The problem seems to be that the LLM "does not understand my intent": it can't deduce what I need and "proactively" help. It follows the requirements in the prompt, but the prompt has to (and can't) handle all situations. I am getting tired of LLMs.

hackernewds

Exactly. It's rather impressive, but at the same time the audience is always going to be engineers, so perhaps it can be curated to still be technical to a degree? I can't imagine a scenario where I have to explain my ETL pipeline to the VP.

trcf21

From flow.py:

    Ensure the tone is welcoming and easy for a newcomer to understand{tone_note}.

    - Output only the Markdown content for this chapter.

    Now, directly provide a super beginner-friendly Markdown output (DON'T need ```markdown``` tags)

So just a change here might do the trick if you’re interested.
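For example, a hypothetical way to thread an audience level through that prompt (the names here are mine, not the repo's actual variables):

    # Sketch: let the caller pick an audience instead of hardcoding "newcomer"
    TONE_NOTES = {
        "beginner": "Ensure the tone is welcoming and easy for a newcomer to understand.",
        "engineer": ("Write for an experienced engineer: be precise, skip analogies, "
                     "and focus on architecture and trade-offs."),
    }

    def build_prompt(chapter_context: str, audience: str = "beginner") -> str:
        return (f"{chapter_context}\n\n{TONE_NOTES[audience]}\n"
                "Output only the Markdown content for this chapter.")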

But I wonder how Gemini would manage different levels. In my experience (mostly edtech, and not in English), it's really hard to tune the tone properly and not just get a black-and-white answer (5-year-old vs. expert talk). Does anyone have advice on that?

porridgeraisin

This has given me decent success:

"Write simple, rigorous statements, starting from first principles, and making sure to take things to their logical conclusion. Write in straightforward prose, no bullet points and summaries. Avoid truisms and overly high-level statements. (Optionally) Assume that the reader {now put your original prompt whatever you had e.g 5 yo}"

Sometimes I write a few more lines with the same meaning as above, and sometimes fewer; they all work more or less OK. Occasionally I get better results with small tweaks, but nothing to make a pattern out of -- a useless endeavour anyway, since these models change in minute ways every release, and in neural nets the blast radius of a small change is huge.

swashbuck1r

While the doc generator is a useful example app, the really interesting part is how you used Cursor to start a PocketFlow design doc, then fine-tuned the details of the design doc to describe the PocketFlow execution graph and utilities you wanted the doc generator to follow... and then used Cursor to generate all the code for the doc-generator application.

This really shows off that the simple node graph, shared storage and utilities patterns you have defined in your PocketFlow framework are useful for helping the AI translate your documented design into (mostly) working code.

Impressive project!

See design doc https://github.com/The-Pocket/Tutorial-Codebase-Knowledge/bl...

And video https://m.youtube.com/watch?v=AFY67zOpbSo

mooreds

I had not used Gemini before, so I spent a fair bit of time yak-shaving to get access to the right APIs and set up my Google project. (I have an OpenAI key, but it wasn't clear how to use that service.)

I changed it to use this line:

   api_key=os.getenv("GEMINI_API_KEY", "your-api_key")
instead of the default project/location option.

and I changed it to use a different model:

    model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-preview-03-25")
I used the preview model because I got rate limited and the error message suggested it.
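For anyone else doing the same yak shave, those two environment variables end up in a client call roughly like this (a sketch based on the google-genai package; the repo's actual wrapper may differ):

    import os
    from google import genai

    client = genai.Client(api_key=os.getenv("GEMINI_API_KEY", "your-api_key"))
    model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-preview-03-25")
    # one-off sanity check that the key and model work
    response = client.models.generate_content(model=model, contents="Hello")
    print(response.text)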

I used this on a few projects from my employer:

- https://github.com/prime-framework/prime-mvc a largish open source MVC java framework my company uses. I'm not overly familiar with this, though I've read a lot of code written in this framework.

- https://github.com/FusionAuth/fusionauth-quickstart-ruby-on-... a smaller example application I reviewed and am quite familiar with.

- https://github.com/fusionauth/fusionauth-jwt a JWT java library that I've used but not contributed to.

Overall thoughts:

Lots of exclamation points.

Thorough overview, including some things that were not application-specific (Rails routing).

Great analogies. Seems to lean on them pretty heavily.

Didn't see any inaccuracies in the tutorials I reviewed.

Pretty amazing overall!

mooreds

If you want to see what output looks like (for smaller projects--the OP shared some for other, more popular projects), I posted a few of the tutorials to my GitHub:

https://github.com/mooreds/prime-mvc-tutorial

https://github.com/mooreds/railsquickstart-tutorial

https://github.com/mooreds/fusionauth-jwt-tutorial/

Other than renaming the index.md file to README.md and modifying it slightly, I made no changes.

Edit: added note that there are examples in the original link.

mooreds

Update: billing was delayed, but for 4 tutorials it cost about $5.

manofmanysmiles

I love it! I effectively achieve similar results by asking Cursor lots of questions!

Like at least one other person in the comments mentioned, I would like a slightly different tone.

Perhaps a good feature would be a "style template" that can be chosen to match your preferred writing style.

I may submit a PR, though not if it takes a lot of time.

zh2408

Thanks—would really appreciate your PR!

TheTaytay

Woah, this is really neat. My first step for many new libraries is to clone the repo, launch Claude code, and ask it to write good documentation for me. This would save a lot of steps for me!

randomcatuser

Exactly what I did today! (for Codex!) The output here is actually slightly better!

I bet in the next few months we'll be getting dynamic, personalized documentation for every library!! Good times

fforflo

If you want to use Ollama to run local models, here’s a simple example:

    from ollama import chat, ChatResponse

    def call_llm(prompt: str, use_cache: bool = True, model: str = "phi4") -> str:
        response: ChatResponse = chat(
            model=model,
            messages=[{'role': 'user', 'content': prompt}],
        )
        return response.message.content
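Assuming you've already pulled the model locally (`ollama pull phi4`), calling it is then just:

    # quick smoke test against the local model
    print(call_llm("Summarize what the requests library does in two sentences."))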

mooreds

Is the output as good?

I'd love the ability to run the LLM locally, as that would make it easier to run on non public code.

fforflo

It's decent enough. But you'd probably have to use a model like llama2, which may set your GPU on fire.

Too

How well does this work on unknown code bases?

The tutorial on requests looks uncannily good for being generated with no prior context. The use cases and examples it gives are too specific. It makes up terminology for concepts that are not mentioned once in the repository, like "functional API" and "hooks checkpoints". There must be thousands of tutorials on requests online that every AI was already trained on. How do we know it is not using them?

chairhairair

A company (mutable.ai) was acquired by Google last year for essentially doing this, but outputting a wiki instead of a tutorial.

kaycebasques

I meant to write a blog post about mutable.ai but didn't get around to it before the product shut down.

I did however archive the wiki that it generated for the project I work on: https://web.archive.org/web/20240815184418/wiki.mutable.ai/g...

(The images aren't working. I believe those were auto-generated class inheritance or dependency diagrams.)

* The first paragraph is pretty good.

* The second paragraph is incorrect to call pw_rpc the "core" of Pigweed. That implies that you must always use pw_rpc and that all other modules depend on it, which is not true.

* The subsequent descriptions of modules all seemed decent, IIRC.

* The big issue is that the wiki is just a grab-bag summary of different parts of the codebase. It doesn't feel coherent, and it doesn't mention the other 100+ modules that the Pigweed codebase contains.

When working on a big codebase, I imagine that tools like mutable.ai and Pocket Flow will need specific instruction on what aspects of the codebase to document.

zh2408

Their site seems to be down. I can't find their results.

codetrotter

Were they acquired? Or did they give up and the CEO found work at Google?

https://news.ycombinator.com/item?id=42542512

The latter is what this thread claims ^

chairhairair

I don’t know the details of the deal, but their YC profile indicates they were acquired.

cowsandmilk

You're going to trust the person who started that thread, who had no idea what happened to the company and jumped to conclusions based on LinkedIn?

nxobject

It sounds like it'd be perfect for Google's NotebookLM portfolio -- at least if they wanted to scale it up.

null

[deleted]

gregpr07

I built Browser Use. Dayum, the results for our lib are really impressive; you didn't touch the outputs at all? One problem we have is keeping the docs in sync with the current codebase (code examples break sometimes). I wonder if I could use parts of Pocket Flow to help with that.

cehrlich

As a maintainer of a different library, I think there's something here. A revised version of this tool that also gets fed the docs and is asked to find inaccuracies could be great. Even if false positives and false negatives are, let's say, 20% each, it would still be better than before, as final decisions are made by a human.

zh2408

Thank you! And correct, I didn't modify the outputs. For small changes, you can just feed the commit history and ask an LLM to modify the docs. If there are lots of architecture-level changes, it would be easier to just feed the old docs and rewrite - it usually takes <10 minutes.
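A minimal sketch of that incremental flow (the function and prompt here are illustrative; `call_llm` is whatever LLM wrapper you already use, e.g., the Gemini or Ollama ones discussed above):

    import subprocess

    def update_tutorial(doc_path: str, since: str = "HEAD~20") -> str:
        # collect recent commits with patches, then ask the model to reconcile the docs
        diff = subprocess.run(
            ["git", "log", "-p", f"{since}..HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        with open(doc_path) as f:
            old_docs = f.read()
        prompt = (
            "Update this tutorial chapter so it matches the code changes below. "
            "Only rewrite sections the diff invalidates.\n\n"
            f"--- TUTORIAL ---\n{old_docs}\n\n--- GIT LOG ---\n{diff}"
        )
        return call_llm(prompt)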

esjeon

At the top is some neat high-level stuff, but below that it quickly turns into code-written-in-human-language.

I think it should be possible to extract more useful usage patterns by poking into the related unit tests. How to use the library is what matters most to tutorial readers.

remoquete

This is nice and fun for getting a fast read on an unknown codebase, but, as others have said here and elsewhere, it doesn't replace human-written documentation.

https://passo.uno/whats-wrong-ai-generated-docs/

kaycebasques

My bet is that the combination of humans and language models is stronger than humans alone or models alone. In other words, there's a virtuous cycle developing where the codebases that embrace machine documentation tools end up getting higher-quality docs in the long run. For example, last week I tried out a codebase summary tool. It had some inaccuracies, and I knew exactly where it was pulling the incorrect data from. I fixed that data, re-ran the summarization tool, and was satisfied to see a more accurate summary. But yes, it's probably key to keep human technical writers (like myself!) in the loop.

remoquete

Indeed. Augmentation is the way forward.

axelr340

We are also building a tool for understanding codebases. It visually shows the features implemented in a codebase, along with their hierarchy and traceability to the associated code.

Here is an example feature map for the Spot robot SDK from Boston Dynamics with 100k lines of code: https://product-map.ai/app/public?url=https://github.com/bos...