Rethinking CLI interfaces for AI
July 19, 2025 · swax
skydhash
The one thing that I always wonder is how varied those interactions with an agent really are. My workflow is enough of a routine that I just write scripts and create functions and aliases to improve ergonomics. Anything that has to do with interacting with the computer can be automated.
swax
Yea, a lot of this is experimental. I basically have plain-text instructions per agent, all talking to each other, coordinating and running an entire pipeline to do what would typically be hard coded. There are definite pros and cons: a lot of unpredictability, of course, but also resilience and flexibility in the ways they can work around unexpected errors.
yoavm
Or you can give your AI agent access to your terminal. I've been using https://github.com/hiraishikentaro/wezterm-mcp/ with gemini-cli, and it generally lets the agent use the terminal like I would, so stuff like scrolling inside interactive TUIs more or less just works.
j45
Appreciate the share!
I might give access to a terminal in a locked-down VM; I don't know about a shell.
SoftTalker
At least give it its own login (and no sudo privileges).
gtirloni
Somehow a whole industry is now fine with Heisenbugs being a regular part of the dev workflow.
fullstackwife
the salary-raise-and-promo-project industry within large corps is fine with that
then there is everyone else, who is supposed to deliver software that works, like always, and they are not fine with built-in flakiness
BrianCripe
Agree 100% that CLI interface design needs to be altered to include AI Agents as a new type of user persona, but I don't think it's as drastic of a change as one might expect.
We designed desktop GUIs & web browsers on top of the terminal to let a type of user interact without speaking "lower level" commands, but we've also created abstractions to hide complexity for ourselves at this layer. We just so happen to call them CLI apps, scripts, Makefile targets, Taskfile tasks, Justfile recipes, unix tools, etc. Each consists of a pseudo-natural-language short-code name combined with schema-validated options and some context around what each option does (via the --help view). The trick is giving human developers and AI agents access to the same tools, each through the interface optimized for it.
In an experiment to let my agents share the exact same 'tools' that I use for developing in a repository, I gave them direct access to load and self-modify the local project Justfile via MCP: https://github.com/toolprint/just-mcp
Just as (pun intended) I create tools for myself to repeat common tasks with sane defaults and some parameters, my agents immediately gain the same access, and I can restrict permissions so they use these instead of ANY bash command (i.e. "Bash(just:*)"). The agent can also assist in creating tools for me or itself to use that save on time and token usage. I'd love to see the paradigm evolve to the point where it feels more like warp.dev, where you don't have to switch between two text boxes to choose whether you're talking in natural language or instructing to run a known 'tool'.
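For the curious, the shape of the pattern is roughly this (a sketch; the recipe names and the Rust commands are illustrative, not taken from just-mcp):

```
# Justfile: shared task definitions for me and the agent alike

# list available recipes (what the agent sees first)
default:
    just --list

# run the test suite with sane defaults; extra args pass through
test *ARGS:
    cargo test {{ARGS}}

# format and lint in one deterministic step
check:
    cargo fmt --check
    cargo clippy -- -D warnings
```

With the agent's permissions restricted to "Bash(just:*)", it can only invoke these named recipes rather than arbitrary shell.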
skydhash
Interfaces and tools are orthogonal. It's like a hammer: the head is what is used on the nail, while the handle is shaped to fit the human hand. We can modify one without modifying the other. Another good example is Magit (or Lazygit) and git. Magit is designed to be used interactively, while git is more about the domain of version control.
Workflows are human processes; what we do is name them and identify their parameters. The actual tools that implement those workflows don't matter that much at a human scale, other than cognitive load. So I don't care much about gcc's various options. What I want is `make debug` or `make release` (or just `make`). And cognitive load is lowered because those are all I have to remember, and they are deterministic.
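A minimal sketch of that shape, assuming a C project for concreteness (targets and flags are illustrative):

```
# Makefile: two names to remember; gcc's option soup stays hidden in here
# (recipe lines must be indented with a tab)
CC     = gcc
CFLAGS = -Wall -Wextra
SRC    = $(wildcard src/*.c)

.PHONY: all debug release

all: release

debug:
	$(CC) $(CFLAGS) -g -O0 -fsanitize=address,undefined $(SRC) -o app

release:
	$(CC) $(CFLAGS) -O2 -DNDEBUG $(SRC) -o app
```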
An agent is not a good bridge between humans and tools, because agents increase cognitive load, while all interface design has been about lowering it. There's no typing "make test" and getting a nice output of all the lines that have been flagged (plus some integration like Vim's quickfix, which can quickly bring you to each line). Instead it's typing a lot and praying that it actually does something good.
chubot
I do think it’s interesting how Claude Code makes shell and dev automation more important – it also makes testing and code review more important
So there is probably some room for innovation here
But most of these seem like problems with Claude (and maybe fundamental problems with LLMs), not problems with the CLI interface:
> This started a game of whack-a-mole where the LLM would also attempt to change the pre-commit hooks! I had to fix it by denying Edit(.git/hooks/pre-commit) to my project’s .claude/settings.json. I look forward to its next lazy innovation.
> If you watch Claude Code, you’ll see that it often uses head -n100 to limit the results a priori. It also gets lost about which directory it’s in, and it will frustratingly flail around trying to run commands in different directories until it finds the right one.
kordlessagain
Agree with the whack-a-mole effect, where it goes from nailing the problem or bug to absolutely destroying the code. I would offer some of these MCP tools I wrote/had written to solve the problem: https://github.com/kordless/gnosis-evolve. Tools are in contrib-tools.
It has helped tremendously to have a dedicated build service that CC can control through MCP, versus running Docker itself, because it can then restart the container and test. And the fuzzy-search tool and diff editor seem to perform better than the replacement strategy Claude Code uses, most of the time. I continue to work on the editor when I run into issues with it, so I'm happy to help anyone interested in implementing their own file-editing (and search) strategy.
You will need to convert these to Claude Code format, but all you need to do is ask CC to do it for you...
pbronez
Your license is interesting. To meet your intent, I suggest you revisit this definition:
> “Military Entity" means any armed forces branch, defense department, or military organization of any nation or alliance.
As written, this only applies to nation states. It excludes many kinds of human organizations that use force to impose their will on others. The word for this is “Terrorist.”
While that term has been applied to many groups for many reasons, it technically means “the use of violence against non-combatants to achieve political or ideological aims.”
Terrorism + nation-state militaries should cover most everyone you intend here, including organized crime and private military contractors. You could add "financial gain" to the definition if you want to ensure those last two are captured.
com2kid
I'd argue that many CLI tools output too much log spew by default, making humans take up the burden of parsing through masses of output to find the one useful line.
For another example of where this is a problem, look at any large company that pays to keep logs in Kibana; the amount of over-logging paid for is insane.
Approximately 1/3rd of my Claude Code tokens are spent parsing CLI output, which is insane! Often Claude doesn't catch what it needs in the massive log output, and I have to go through the logs myself to find the problem and point it out to Claude.
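One mitigation, sketched as a plain shell wrapper (not something Claude Code provides; the filter patterns are illustrative): keep the full log on disk and hand the agent only a digest.

```
# run a noisy command; keep the full log on disk, surface only a digest
runq() {
  local log status
  log=$(mktemp)
  "$@" >"$log" 2>&1
  status=$?
  grep -iE 'error|warn|fail' "$log" | head -n 20  # likely-relevant lines
  tail -n 5 "$log"                                # plus how it ended
  echo "exit status $status; full log at $log"
  return "$status"
}
```

Then something like `runq npm test` costs the agent a few dozen lines instead of thousands, and the full log is still there if the digest misses the real problem.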
jerpint
I’ve been building a context-engineering tool for collaborating with LLMs. The CLI is for the human and the MCP is for the LLM, but they both map to the same core commands:
https://github.com/jerpint/context-llemur
I’ve actually bootstrapped ctx with ctx and found it very useful !
It basically stops me from having to repeat myself over and over to different agents
duncanfwalker
The --no-verify example is interesting, because I can imagine the same hint being useful for junior engineers. In general it's hard to give the right level of advice in CLI docs, because you don't always know who the consumer will be and so what knowledge can be assumed. The thing that makes LLMs different is that there's no problem with being verbose in the docs, because you're not wasting any human's time. It would be cool if you could write docs that provide extra advice like in the example, and then the interface adapted to the user's context: for LLMs, provide everything; for human users, learn what they know and give them just the right level of advice.
AchintyaAshok
I think part of this is that we're in a transition phase. The shell commands we have built were built for human consumption (ex. manpages). They were built around the expectation that we learn how to use them through experimentation, or are taught by more knowledgeable peers. In the AI world, we basically need to assume that role of guide/sherpa for the LLM.
Another idea that I've been thinking about is a context hierarchy, from low to high utility:
1. Base: the AI reads the tool description, manpage, etc.
2. General human advice: "typically use grep this way", etc.
3. Specific advice: "for this project / implementation, this is how you use the tool".
Currently the best interface for providing our insights is via MCPs. At https://toolprint.ai/ we're building a human- (or machine-) driven way to supplement Claude/Cursor, etc. with that knowledge around tool use.
A practical way in which we dogfood our own product is with the Linear MCP. If you connect that and ask an agent to create a new issue, it predictably fails, because there are no instructions on which Linear project to select or on the correct way to provide a description given Linear's quirks. When we connect the Linear MCP via the toolprint MCP, it gets pre-primed context around these edge cases to improve tool use.
skydhash
The shell is an interface; the computer is the tool. Then we find that we have workflows that are actually routine, and we create scripts to handle them. Then we find that they are contextual, and we create task runners to provide the context. Then our mental capacity is freed, while the computer takes care of the menial stuff. And everything is good.
That is generally how it goes for power users, and people that take the time to RTFM.
But now I see people that don't want to determine their workflows. It's just ad-hoc use, spending their time on local decisions that don't matter that much instead of grasping the big picture and then solving it. Maybe it helps them look busy.
So I don't want an agent for Linear. What I want is maybe a bash alias "isc" (for "issue create") that pops up nano, where I write the issue in git-commit format (title + blank line + description). Upon saving, the rest is done automatically, because it can determine the project based on some .trackerrc I put in the root of the project. Or maybe a "linear-issues" emacs command with a transient interface (again, the correct project can be determined automatically).
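A sketch of what that could look like in bash (the `tracker` CLI and the .trackerrc convention are hypothetical stand-ins for whatever issue-tracker tooling you have):

```
# isc: "issue create" in git-commit format (title, blank line, description)
isc() {
  local f title body
  f=$(mktemp) || return
  ${EDITOR:-nano} "$f" || return
  title=$(head -n 1 "$f")
  body=$(tail -n +3 "$f")
  # .trackerrc at the repo root sets PROJECT (hypothetical convention)
  source "$(git rev-parse --show-toplevel)/.trackerrc"
  tracker issue create --project "$PROJECT" \
    --title "$title" --description "$body"
  rm -f "$f"
}
```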
jasonriddle
This is a great post, thank you for sharing. I like the idea of giving hints to the LLMs.
To clarify: the example that was provided using `command_not_found_handler`, is that possible to implement in bash? Or perhaps were you saying this would be a nice-to-have if this functionality existed?
withzombies
The `command_not_found_handler` can be added to your .zshrc as is; bash supports the same hook under the name `command_not_found_handle` (no trailing "r"), so it works in .bashrc with that one rename.
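A minimal sketch of the idea for .zshrc (the hints themselves are made up; tailor them to your machine):

```
# zsh spelling; in .bashrc the function must be named command_not_found_handle
command_not_found_handler() {
  echo "command not found: $1" >&2
  case "$1" in
    python) echo "hint: this system has 'python3', not 'python'" >&2 ;;
    *)      echo "hint: project tasks live in the Makefile; try 'make help'" >&2 ;;
  esac
  return 127
}
```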
layer8
If I can learn how to use the Bulk Rename Utility (it’s actually quite useful once you get to grips with it), then AI should be able to as well. ;)
There’s the saying that computers should adapt to us, rather than the other way around, but now this makes me wonder which side LLMs are on in that picture.
kristopolous
Just yesterday I updated a tool I made in 2020 for parsing and snipping sections of manpages, adding an LLM-ingestion feature that fits partial manpages into tight context windows (https://github.com/day50-dev/Mansnip).
There may be something more generalizable here.
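Not Mansnip itself, but a rough plain-shell approximation of the snipping idea, assuming the usual all-caps section headings in man output:

```
# print a single manpage section, e.g. `mansection grep OPTIONS`
mansection() {
  man "$1" | col -bx | awk -v want="$2" '
    $0 == want             { on = 1; next }  # start at the requested heading
    on && /^[A-Z][A-Z ]+$/ { exit }          # stop at the next heading
    on                     { print }
  '
}
```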
tomrod
That's pretty cool, man (pun intended).
swax
Throwing this out there: I have a command-line driver for LLMs. Lots of little tricks in there to adapt the CLI to make it amenable for LLMs, like interrupting a long-running process periodically and asking the LLM if it wants to kill it or continue waiting, and allowing the LLM to use and understand apps that use the alternate screen buffer (to some degree).
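For a flavor of the long-running-process trick, a bash-level sketch (this is not NAISYS's actual code, just the shape of the idea):

```
# run a command, surfacing a check-in prompt while it's still going
run_with_checkin() {
  "$@" &
  local pid=$!
  while sleep 30 && kill -0 "$pid" 2>/dev/null; do
    # this is where the LLM would be asked: keep waiting, or kill $pid?
    echo "[check-in] pid $pid still running after another 30s" >&2
  done
  wait "$pid"
}
```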
Overall I try to keep it as thin a wrapper as I can. The better the model, the less wrapper is needed. It's a good way to measure model competence. The code is here https://github.com/swax/NAISYS and context logs here for examples - https://test.naisys.org/logs/
I have agents built with it that do research on the web for content, run python scripts, update the database, maintain a website, etc., all running through the CLI; if an agent calls APIs, it does so with curl. Example agent instructions here: https://github.com/swax/NAISYS/tree/main/agents/scdb/subagen...