
Diagrams AI can, and cannot, generate

diggan

A mistake I see people repeating over and over is never restarting their conversations with an edited initial message.

Instead of doing what the author does here, sending messages back and forth so the conversation grows longer and longer and each reply gets worse and worse until the LLM seems like a dumb rock, rewrite your initial message to cover everything that went wrong or was misunderstood, and aim to get whatever you want solved in the first message. You'll get much higher-quality answers. If the LLM misunderstood, don't reply "No, what I meant was..."; instead, rewrite the first message so it's clearer.

This is at least true for all ChatGPT, Claude and DeepSeek models, YMMV with other models.

swatcoder

Yup.

Inasmuch as these are collaborative document generators at their core, "minimally ambiguous prompt and conforming reply" is a strongly represented document structure and so we benefit by setting them up to complete one.

Likewise, "tragi-comic dialog between increasingly frustrated instructor and bumbling pupil" is also a widely represented document structure that we benefit by trying to avoid.

Chatbot training works to minimize the chance of an LLM engaging in the latter, because dialog is an intuitive interface that users enjoy, but we can avoid the problem more successfully by just providing a new and less ambiguous prompt in a new session, as you suggest.

dartos

> dialog is an intuitive interface that users enjoy

Do people enjoy chat interfaces in their workflows?

I always thought that cursor/copilot/copy.ai/v0.dev were so popular because they break away from the chat UI.

Dialog is cool when exploring but, imo, really painful when trying to accomplish a task. LLMs are far too slow to make a real fluid conversation.

yuvalr1

This means the leading UI for LLMs - the chat - is the wrong UI, at least for some of the tasks. We should instead have a single query text field, like in search engines, that you continue to edit and refine, just like in complex search queries.

freehorse

I like Zed's approach, where the whole discussion is a plain text file you can edit like any other text, which lets you change anything in the "discussion" regardless of whether it was generated by you or the LLM. It makes things like that much simpler: you can correct small mistakes in the LLM's response without unnecessary back-and-forth, cut parts out of the discussion to reduce context size, or guide the discussion where you actually want it by removing distractions. I don't understand why the dominant approach is an actual, realistic chat interface where you can only add a new response, or at best create "threads".

diggan

> I don't understand why the dominant approach is an actual, realistic chat interface where you can only add a new response, or in best case create "threads".

I'm not 100% sure either, I think it might just be a first-iteration UX that is generally useful, but not specifically useful for use cases like coding.

To kind of work around this, I generally keep my prompts as .md files on disk and treat them like templates, with variables like $SRC that get replaced with the actual code when I "compile" them. So I write a prompt, paste it into ChatGPT, notice something is wrong, edit my template on disk, then paste it into a new conversation. Iterate until it works. I ended up putting the CLI I use for this here, in case others wanna try the same approach: https://github.com/victorb/prompta
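The template idea above can be sketched in a few lines of stdlib Python. (The real CLI is the linked prompta; the function name and template file here are illustrative, not its actual API.)

```python
from pathlib import Path
from string import Template

def compile_prompt(template_path: str, **variables: str) -> str:
    """Read a prompt template from disk and substitute $VARIABLES into it."""
    template = Template(Path(template_path).read_text())
    return template.substitute(variables)

# Example: a hypothetical template file "refactor.md" containing
#   Refactor the following code to use dataclasses:
#   $SRC
# would be compiled with:
#   prompt = compile_prompt("refactor.md", SRC=Path("models.py").read_text())
# and the result pasted into a *fresh* conversation each iteration.
```

The point of keeping the template on disk is that every revision goes into the template, not into a follow-up message, so each attempt starts from a clean context.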

smrq

I wonder if people in general would have a healthier understanding of LLMs if this mode of interaction was more common. Perhaps it would be more clear that the LLM is a very souped up autocomplete, instead of some other mind to debate.

refsab

I've found the most useful LLM UIs for me are tree-like, with lots of branches where you go back and forth between your prompts. You branch off anywhere and edit the top or the leaves as you go.

If one branch doesn't work out, you go back to the last node that gave good results, or to the top, and create another branch from there with a different prompt.

Or you branch when you want to take things in a different direction but don't want all the baggage from recent nodes.

Example: https://exoloom.io/trees

sgillen

I still think there is value in chats and retaining context. But there is also value in starting clean when necessary. Giving users control and teaching people how to use it is the way IMO.

nomel

The problem with retaining context is that it gets polluted. That pollution gets you into a latent space with errors, which is probably not where you want your next token prediction sourced from.

The reasonable alternative is a chat interface that lets you edit any text, the AI response or your prompts, and regenerate from any point. This is why I use the API "playground" interfaces or something like LibreChat. Deepseek at least has prompt editing/regeneration.
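The "edit anything, regenerate from any point" idea boils down to the fact that a conversation is just a list of messages, so truncating and editing the history before resending is plain data manipulation. A minimal sketch (this is my own illustration, not LibreChat's or any playground's actual code):

```python
def regenerate_from(history: list[dict], index: int, edited_content: str) -> list[dict]:
    """Keep messages before `index`, replace the message at `index` with an
    edited version, and drop everything after it (the polluted context)."""
    edited = dict(history[index], content=edited_content)
    return history[:index] + [edited]

history = [
    {"role": "user", "content": "Draw a sequence diagram of the login flow."},
    {"role": "assistant", "content": "...a diagram with a wrong actor..."},
    {"role": "user", "content": "No, what I meant was..."},
]

# Instead of appending corrections, rewrite the first message and resend:
fresh = regenerate_from(history, 0,
    "Draw a sequence diagram of the login flow. "
    "Actors: Browser, Gateway, Auth service.")
```

`fresh` now holds a single, clearer user message to send to the API, with none of the erroneous context retained.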

diggan

> This means the leading UI for LLMs - the chat - is the wrong UI

For coding, I'd agree. But people seemingly use LLMs for a lot more than coding, which I don't have any experience with myself. I do agree with the idea that we haven't found the right UX for programming with LLMs yet. I'm getting even worse results with Aider, Cursor and all of those than with my approach outlined above, so that doesn't seem like the right way either.


barnas2

I've also started adding "Ask any questions you think are relevant before starting" to the end of my prompts. It usually results in at least one question that addresses something I didn't think to add to my prompt.

kordlessagain

I’ve been saying “stop writing code until we agree what needs to be done”.

godelski

It seems like the author in fact did do this. They asked Claude the same message. I really doubt they repeated the entire conversation to get to that point, but I may be wrong.

From personal experience, I agree with you, but I wouldn't make that critique here, as it is far from a magic bullet. Honestly, for the first examples it seems faster to learn Mermaid and write the diagrams yourself. Mermaid can be learned in a rather short time; the basic syntax is fairly trivial and essentially obvious. As an added benefit, you then keep that knowledge and can use it later on. This will certainly feel slower than the iterative back and forth with an LLM, whether through follow-up messages or by refining your one-shot prompt, but I'm not convinced the difference is huge as measured by the clock on the wall.[0]

[0] idk, going back and forth with an LLM and refining my initial messages feels slow to me. It reminds me of print statement debugging in a compiled language. Lots of empty time.
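For a sense of how trivial the basic syntax is, a complete Mermaid flowchart is just nodes and labeled arrows (a generic made-up example, not one from the article):

```mermaid
flowchart LR
    A[User submits form] --> B{Valid?}
    B -->|yes| C[Save to DB]
    B -->|no| D[Show errors]
```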

diggan

> It seems like the author in fact did do this.

It doesn't seem like that to me. At one point in the article: "There are also a few issues [...] Let's fix with the prompt", followed by a prompt that refers to the previous message. Almost all prompts after that seem to depend on the context before them.

My point is that instead of doing that, revise the original initial message so the very first response from the LLM doesn't contain any errors, because (in my experience) that's way easier and faster than trying to correct errors by adding more messages, since all of them (even o1 Pro) seem to lose track of what's important in the conversation really fast.

godelski

I'm just saying I don't think they repeated that whole process for Claude.

bpodgursky

100%

To be honest, this would help a lot with human-driven iteration too, if it were biologically feasible to erase a conversation from a brain.

dingnuts

alright, time for you to go watch Eternal Sunshine of the Spotless Mind so that you can disabuse yourself of that notion

danenania

I built Plandex[1], an open source AI coding agent, partly to enable this workflow.

It has `log` and `rewind` commands that allow you to easily back up to any previous point in the conversation and start again from there with an updated prompt. Plandex also has branches, which can be helpful for not losing history when using this approach.

You’re right that it’s often a way to get superior results. Having mistakes or bad output in the conversation history tends to beget more mistakes and bad output, even if you are specifically directing the LLM to fix those things. Trial and error with a new prompt and clean context avoids this problem.

1 - https://plandex.ai

P.S. I wrote a bit about the pros and cons of this approach vs. continuing to prompt iteratively in Plandex’s docs here: https://docs.plandex.ai/core-concepts/prompts#which-is-bette...

01100011

I tried this approach when attempting to get Deepseek-r1 and GrokV3 to create a simple CUDA application. It was necessary because the iterative approach kept leading to hangs and divergent behaviors. I still wasn't able to get a working application, however.

kordlessagain

I love Claude, but whoever works on their UI needs to be slapped a bit. Code output covering the stop button on my laptop, page lockups on iPhone/Chrome with certain artifacts (even after reloading), crazy slow typing on the computer, and refusal to "continue" a chat with a cheaper model. Simply providing a summary of the chat on running out of tokens would let me start another conversation, or at least a warning that I was getting close.

LASR

We use mermaidjs as a supercharged version of chain-of-thought for generating some sophisticated decompositions of the intent.

Then we inject the generated Mermaid diagrams back into subsequent requests. Reasoning performance improves across a whole variety of applications.
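The two-step pipeline described above might look something like this (the prompt wording and function names are my assumptions, not the poster's actual system):

```python
def decomposition_prompt(intent: str) -> str:
    """Step 1: ask the model to decompose the intent as a Mermaid diagram,
    acting as a structured form of chain-of-thought."""
    return (
        "Decompose the following task into steps and dependencies. "
        "Output only a Mermaid flowchart:\n\n" + intent
    )

def reasoning_prompt(intent: str, mermaid_diagram: str) -> str:
    """Step 2: inject the diagram generated in step 1 back into the
    subsequent request, so the model reasons over its own decomposition."""
    return (
        "Task:\n" + intent + "\n\n"
        "A decomposition of this task as a Mermaid diagram:\n"
        "```mermaid\n" + mermaid_diagram + "\n```\n\n"
        "Using the decomposition above, carry out the task."
    )
```

The first prompt's output (a Mermaid string) becomes the second prompt's input, which is what gives the injection its effect.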

Garlef

Neat idea!

Could you go into a bit more detail on how you encode the intent?

BOOSTERHIDROGEN

Any simple examples?

graphviz

Random thoughts:

Sketching backed by automated cleanup can be good for entering small diagrams. There used to be an iOS app based on graphviz: http://instaviz.com

Constraint-based interactive layout may be underinvested, as a consequence of too many disappointments and false starts in the 1980s.

LLMs seem ill-suited to solving the optimization of combinatorial and geometric constraints and objectives required for good diagram layout. Overall, one has to admire the directness and simplicity of mermaid. Also, it would be great to someday see a practical tool with the quality and generality of the ultra-compact grid layout prototype from the Monash group, https://ialab.it.monash.edu/~dwyer/papers/gridlayout2015.pdf (2015!!)

ttd

Oh wow, thank you for linking that paper. I've been working on an interactive tool for a while and have been musing on new constraint and layout types to add. Anecdotally, a lot of mainstream graph layout algorithms work well for small to medium-complexity inputs, but quickly start generating visual spaghetti beyond that. So this looks incredibly apropos for me.

relaxing

App is unavailable in the US :(

teleforce

Thanks for the link to the Monash paper.

>LLMs seem ill-suited to solving the optimization of combinatorial and geometric constraints and objectives required for good diagram layout.

I think this is where the LLM's distant NLP cousin can help, namely CUE, since it's fundamentally based on feature structures from the deterministic side of NLP, unlike LLMs, which are stochastic NLP [1],[2],[3].

Based on the Monash paper, Constraint Programming (CP) is one of the popular approaches being used for automatic grid layout.

Since CUE is a constraint configuration language belonging to CP, and given its NLP background, it should be easier and more seamless to integrate with LLMs. If someone can crack this, it will be a new generation of LLM that can produce good and accurate diagrams from prompts, and a boon for architects, designers and engineers. Speaking of engineers, if this approach can also be used for IC layout design (analog and digital), not only for diagrams, it could easily disrupt the multi-billion-dollar industry of very expensive IC design software and manpower.

I hope I'm not getting ahead of myself, but ultimately this combo could probably solve the "holy grail" problem mentioned toward the end of the paper's conclusions: a layout model that somehow incorporates routing in a way that is efficiently solvable to optimality. After all, some people in computer science consider CP the "holy grail" of programming [4].

Please, someone, make a startup out of this, or an existing YC startup like JITX (hi Patrick) could look into this potentially fruitful endeavor of a hybrid LLM combo for automated IC design [5].

Perhaps your random thoughts are not so random but deterministic non-random in nature, pardon the pun.

[1] Cue – A language for defining, generating, and validating data:

https://news.ycombinator.com/item?id=20847943

[2] Feature structure:

https://en.m.wikipedia.org/wiki/Feature_structure

[3] The Logic of CUE:

https://cuelang.org/docs/concept/the-logic-of-cue/

[4] Solving Combinatorial Optimization Problems with Constraint Programming and OscaR [video]:

https://m.youtube.com/watch?v=opXBR00z_QM

[5] JITX: Automatic circuit board design:

https://www.ycombinator.com/companies/jitx

vunderba

Related: a nice time-saver I've been using since they added image recognition support to ChatGPT is taking a quick snap of my crudely hand-sketched diagrams (on graph paper) with my phone and asking ChatGPT to convert them to Mermaid syntax.
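The photo-to-Mermaid trick above can be driven programmatically via the OpenAI chat completions image-input format. A sketch of building such a request (the prompt wording is mine, not the commenter's, and the function is hypothetical):

```python
import base64

def sketch_to_mermaid_request(image_path: str, model: str = "gpt-4o") -> dict:
    """Build a chat completion request asking a vision-capable model to
    transcribe a photo of a hand-drawn diagram into Mermaid syntax."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Convert this hand-drawn diagram to Mermaid syntax."},
                # Images are passed inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }
```

The returned dict can be passed to `client.chat.completions.create(**request)` with the official `openai` client.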

seeingnature

Comments like these are why I come to Hacker News! I'm working on a project right now where I've been learning Mermaid, but I've gotten to the point where it would be a lot easier for me to draw it out and convert it this way. I'll try this!

30minAdayHN

I was thinking about a similar topic and started to wonder if I could generate a diagram of a large codebase.

I figured LLMs are great at compressing information and thought of putting that to good use by compressing a large codebase into a single diagram. Since the entire codebase doesn't fit in the context window, I built a recursive LLM tool that calls itself.

It takes two params: the current diagram state, and the new files it needs to expand the diagram.

The seed is an empty diagram and an entry point into the source code. I also extended it to complexity analysis.

It worked magically well. Here are a couple of diagrams it generated:

https://gist.github.com/priyankc/27eb786e50e41c32d332390a42e...

https://gist.github.com/priyankc/0ca04f09a32f6d91c6b42bd8b18...

If you are interested in trying out, I've blogged here: https://updates.priyank.ch/projects/2025/03/12/complexity-an...
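The recursive loop described above (seed with an empty diagram plus an entry point, then repeatedly feed the current diagram and newly discovered files back to the model) might be structured like this. `llm_expand` and `list_imports` are hypothetical callables standing in for the actual model call and import scanner:

```python
def build_diagram(entry_files, list_imports, llm_expand):
    """Recursively grow a codebase diagram. Each iteration passes the tool's
    two params, (current diagram state, new files to expand with), to the
    model via `llm_expand`, then follows imports to find the next batch."""
    seen: set[str] = set()
    diagram = ""                      # seed: empty diagram
    frontier = list(entry_files)      # seed: entry point(s) into the code
    while frontier:
        batch = [f for f in frontier if f not in seen]
        if not batch:
            break
        seen.update(batch)
        diagram = llm_expand(diagram, batch)   # one "turn" of the recursion
        frontier = [dep for f in batch for dep in list_imports(f)]
    return diagram
```

With a real `llm_expand`, each call would prompt the model with the current diagram and the new files' source, asking it to return the merged diagram.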

stared

GPT-4o is not particularly good at this kind of logic, at least compared to other current models. Trying something that is at least in the top 10 of the WebDev Arena leaderboard (https://web.lmarena.ai/leaderboard) would help.

Make sure it is allowed to think before doing (not necessarily in a dedicated thinking mode; it can be a regular prompt to design the graph before implementing it). Also make sure to state in the prompt who the graph is for (e.g. "a clean graph, suitable for a blog post for a technical audience").

McNutty

You have got more patience than me. I have tried to use these tools to generate (basic) network diagrams and by the time I reached your third step I already knew that it was time to quit and draw it out myself. Diagrams need to be correct and accurate otherwise they're just art. I also need any amendments to be made to the same diagram, not to have it regenerated each time.

I do like the idea of another commenter here who takes a photo of their whiteboard and instructs the AI tool to turn it into a structured diagram. That seems to be well within reach of these tools.

larodi

Claude does quite alright. Over the past year and a half I've made several dozen Mermaid diagrams of all kinds, and only the most complex were perhaps out of reach.

It also really depends on the printing.

layer8

Printing?

larodi

hmmm let me remember what did I want to say. hmmm hmm hmmm.

depends on the prompting I guess :D

sorry

RKFADU_UOFCCLEL

The "AI" we have now is just a tweening algorithm on a different medium. You won't be able to get it to do anything specific, except when that's a point between 2 existing works. As for this blog, it's nigh unreadable for those not following the current fad web frameworks. Who's to say the user doesn't have to log in to get to the gateway? Gateway can mean different things. Why can the user choose to upload images instead of logging in? What was the purpose of the log in?

victorbjorklund

I have had good success with D2 diagrams with Claude: https://victorbjorklund.com/build-diagrams-as-code-with-d2-d...

It has built-in icons for common things like cloud services.

chillingeffect

I've had similar results asking ChatGPT to generate input files for Graphviz "dot". Pretty good. E.g. I asked it to summarize a complex article and draw the people named and their relationships. I also got it to embellish the diagrams a little, but it needed a lot of guidance to know what kinds of nodes to add.

But it was good at arranging the elements in timeline order, for example.

cadamsdotcom

Thanks for writing this up. Some questions for the author:

Interesting perspective, but it's a bit incomplete without a comparison of various models and how they perform.

Kind of like Simon Willison’s now-famous “pelican on a bicycle” test, these diagrams might be done better by some models than others.

Second, this presents a static picture of things, but AI moves really fast! It’d also be great to understand how this capability is improving over time.

ndr_

I talk about this, kind of, in my article about process visualization (in German, behind a paywall and in print). It's not rigorous in the sense that I assign points, but a picture emerges along the way. Based on the full set of practical examples there, I would recommend the "v1" of Claude 3.5 Sonnet. GPT-4.5 also looks good, but I haven't run the full suite.

https://www.heise.de/ratgeber/Prozessvisualisierung-mit-gene...

submeta

Try asking the LLM to generate PlantUML markup (use case, statechart, etc.), which has some diagram types beyond what Mermaid offers. Then paste it into the free PlantUML renderer. Works pretty well.

I also experimented with BPMN markup (XML) and realized there are already repos on GitHub that create BPMN diagrams from prompts.

You can also ask LLMs to create SVG directly.

antman

BPMN diagrams from a prompt? Did you try any that were good?

monkeydust

PlantUML works pretty well with OpenAI models.

cptnmrvl

PlantUML also supports AWS icons.