What Is the Most Minimal Agentic Coder You Can Write?

mrieck

I recently looked at the source for Cline because I’m making my own coding agent, and they do tool calling without using the API's native tool-calling feature.

See how they ask the agent to use XML to call tools: https://github.com/cline/cline/blob/main/src/core/prompts/sy...

And also: https://github.com/cline/cline/blob/main/src/core/Cline.ts
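As a rough illustration of the XML approach, here is a minimal sketch of pulling a tool call out of plain model text. The tool names (`run_shell`, `read_file`) and the nested-parameter layout are assumptions made up for this example, not copied from Cline's prompt or parser.

```python
import re

# Hypothetical tool names for illustration; not taken from Cline's prompt.
TOOL_PATTERN = re.compile(r"<(run_shell|read_file)>(.*?)</\1>", re.DOTALL)
# Each parameter is assumed to be a simple <name>value</name> pair.
PARAM_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_tool_call(text):
    """Return (tool_name, params) for the first tool tag found in text, or None."""
    match = TOOL_PATTERN.search(text)
    if match is None:
        return None
    tool, body = match.group(1), match.group(2)
    params = {name: value.strip() for name, value in PARAM_PATTERN.findall(body)}
    return tool, params


reply = "I will list the files.\n<run_shell>\n<command>ls -la</command>\n</run_shell>"
print(parse_tool_call(reply))
```

The model only ever emits text, so this works with any chat endpoint, at the cost of doing your own parsing and error handling.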

ai-christianson

In RA.Aid (https://github.com/ai-christianson/RA.Aid) we support normal tool calling, but we also have our own agent backend called Code Is All You Need.

The way it works is we prompt the LLM to generate code to call the tools directly. We then parse the AST to validate it and restrict it to a very small/safe subset (whitelisted function calls.)

HF smolagents does something similar and there are a few papers out there validating the approach.
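The validate-by-AST idea can be sketched roughly as follows. This is not RA.Aid's actual code: the whitelist contents and the rule that arguments must be positional literals are assumptions chosen to keep the example small.

```python
import ast

# Hypothetical whitelist; the real allowed set is not shown in this thread.
ALLOWED_CALLS = {"read_file", "run_shell"}


def validate_tool_code(source):
    """Raise ValueError unless source is only whitelisted calls with literal args.

    ast.parse itself raises SyntaxError if the source is not valid Python.
    """
    tree = ast.parse(source)
    for statement in tree.body:
        # Every top-level statement must be a bare call such as tool("arg").
        if not (isinstance(statement, ast.Expr) and isinstance(statement.value, ast.Call)):
            raise ValueError("only bare function calls are allowed")
        call = statement.value
        # The callee must be a plain name from the whitelist (no attributes).
        if not (isinstance(call.func, ast.Name) and call.func.id in ALLOWED_CALLS):
            raise ValueError("call to a non-whitelisted function")
        # Arguments must be positional literals, so nothing nested can run.
        if call.keywords or not all(isinstance(arg, ast.Constant) for arg in call.args):
            raise ValueError("arguments must be positional literals")
    return tree


validate_tool_code('read_file("notes.txt")\nrun_shell("ls")')  # passes silently
```

Only after validation would the code be executed, with the whitelisted names bound to the real tool functions.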

bhouston

That makes sense because Cline existed prior to tool use being part of the standard LLM APIs. I think OpenAI had it at that point, but Anthropic and Ollama hadn't yet standardized on an API.

But the vendors are training their models specifically on tool use with their expected internal formats so you might as well leverage that if you are starting fresh now.

pcwelder

Awesome! I essentially started with the same premise when coding "wcgw" [1]: give a single shell tool to ChatGPT (later Claude).

Chronology that followed:

1. Some processes take a long time: add status checking.

2. Need to kill long-running processes or interact with tools: add an option to send control characters; BashInteraction tool.

3. File editing using sed and echo is not reliable, add FileEdit tool.

4. Large output; add truncation based on tokens.

5. But wait I need command output to get truncated at the beginning but file reading `cat $file` to truncate at the end. Add special ReadFile tool.

6. File writing using 'echo "content" >file' needs unnecessary quote escaping, leading to degraded quality; add special WriteFile tool.

7. Reading one file at a time is slow, change ReadFile to ReadFiles tool.

8. Directly writing files is not as good as first planning things. Prevent writing using architect mode.

And so on.

[1] https://github.com/rusiaaman/wcgw
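Steps 4 and 5 above come down to choosing which end of the text to keep. A minimal sketch, assuming whitespace-separated chunks as a rough stand-in for real tokenizer tokens (the function name and marker text are made up for this example, not taken from wcgw):

```python
import re

MARKER = "[truncated]"


def truncate(text, max_tokens, keep):
    """Keep at most max_tokens chunks from the 'head' or 'tail' of text."""
    # Each chunk is a run of non-space characters plus its trailing whitespace,
    # so joining chunks preserves the original line breaks.
    tokens = re.findall(r"\S+\s*", text)
    if len(tokens) <= max_tokens:
        return text
    if keep == "tail":
        # Command output: the most recent lines matter, so drop the beginning.
        return MARKER + "\n" + "".join(tokens[len(tokens) - max_tokens:])
    if keep == "head":
        # File reads: the start of the file matters, so drop the end.
        return "".join(tokens[:max_tokens]) + "\n" + MARKER
    raise ValueError("keep must be 'head' or 'tail'")


print(truncate("1 2 3 4 5", 2, "tail"))
print(truncate("1 2 3 4 5", 2, "head"))
```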

bhouston

Yes, it seems to be the same progression. This blog post covers my progression from the minimalist beginnings:

https://benhouston3d.com/blog/building-an-agentic-code-from-...

My chronology can be summarized as:

Phase 1:

- Add async shell commands with input/output/status checking/kill - combines all of the required functionality.

- Text editing that can read/write sub-blocks rather than whole files. Not multiple files yet (but I can look into that now).

- Add fetch

- Add browser tools with async input/output

- Add sub-agents (to reduce context, send them on research tasks or other sub-tasks to report back.)

Phase 2:

- GitHub mode (basically prompt it to use git and gh CLI tools in its workflows)

- GitHub Actions integration (run automatically in the cloud in response to triggers)

- Token caching (reduce costs by 80%)

I will check out your planning stage. I've been having my tool run off GitHub issues so I get it to write a plan there and then implement, which is sort of the same thing in a way.
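The first Phase 1 item (one shell tool covering start, input, output, status, and kill) might look roughly like this. It is a sketch under assumptions, not the code from the blog post: the `ShellJobs` class and its method names are hypothetical, output is captured through a temp file, and killing only targets the shell process itself, not any grandchildren it spawned.

```python
import os
import subprocess
import tempfile


class ShellJobs:
    """Hypothetical registry of background shell commands, keyed by job id."""

    def __init__(self):
        self._jobs = {}

    def start(self, command):
        """Start command in the background and return its job id."""
        fd, log_path = tempfile.mkstemp(suffix=".log")
        proc = subprocess.Popen(
            command, shell=True, stdin=subprocess.PIPE,
            stdout=fd, stderr=subprocess.STDOUT,
        )
        os.close(fd)  # the child keeps its own copy of the descriptor
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = (proc, log_path)
        return job_id

    def status(self, job_id):
        """Report whether the job is running, its exit code, and output so far."""
        proc, log_path = self._jobs[job_id]
        exit_code = proc.poll()  # None while the process is still running
        with open(log_path, "rb") as log:
            output = log.read().decode(errors="replace")
        return {"running": exit_code is None, "exit_code": exit_code, "output": output}

    def send_input(self, job_id, text):
        """Write text to the job's stdin."""
        proc, _ = self._jobs[job_id]
        proc.stdin.write(text.encode())
        proc.stdin.flush()

    def kill(self, job_id):
        """Kill the job's shell process and wait for it to exit."""
        proc, _ = self._jobs[job_id]
        proc.kill()
        proc.wait()
```

Exposing `start`, `status`, `send_input`, and `kill` as actions of a single tool keeps the tool list short while still letting the agent poll or stop long-running commands.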

joshka

This was an experiment from a couple of years back with making the agent basically re-write itself based on the instructions you provide. You could probably code golf this shorter depending on the language. Arguably, this doesn't solve the problem directly, but it allows you to vibe your way there.

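        import openai

        # Legacy API: openai.Edit and code-davinci-edit-001 come from the
        # pre-1.0 openai package and have since been deprecated by OpenAI.
        instruction = input("Enter an instruction: ")

        # Read this script's own source via its full path, so it works
        # from any working directory.
        with open(__file__) as f:
            script_code = f.read()

        # Ask the model to rewrite the script according to the instruction.
        response = openai.Edit.create(
            model="code-davinci-edit-001",
            input=script_code,
            instruction=instruction,
            temperature=0,
        )

        # Overwrite this script with the edited version.
        new_script_code = response["choices"][0]["text"]
        with open(__file__, "w") as f:
            f.write(new_script_code)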

bhouston

(Hacker News is really messed up. I didn't submit this 2hr ago but rather last week, see here: https://news.ycombinator.com/submitted?id=bhouston But now it is on the front page with 3 upvotes saying it is 2hr old. Anyhow, I'll take it.)

hnuser123456

Posts that fall off the front page have a chance to re-appear on the front page a few days later.

Anyways, back to the actual content: my current best agent framework is about 500 lines or 20 KB of Python, and a large part of that is strings of instructions for various stages of development.

Like you, I start with a system prompt about "you will be given a programming project description, please write a full attempt at the project and we will execute and provide results", then extract the code, run it, copy the console output or error to the script, and ask if the project description is fully satisfied, looping until it judges that the project is fully complete.
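That loop can be sketched in a few lines. This is an illustration, not the framework described above: `ask_llm` is a hypothetical callable (prompt string in, reply string out) that you would back with whatever API you use, and the prompt wording and the `DONE` convention are assumptions.

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # built this way so the sketch can sit inside a fenced block


def extract_code(reply):
    """Return the body of the first fenced code block, or the reply itself."""
    match = re.search(FENCE + r"(?:python)?\n(.*?)" + FENCE, reply, re.DOTALL)
    return match.group(1) if match else reply


def run_code(code, timeout=60):
    """Run code in a fresh interpreter and return combined stdout and stderr."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "error: timed out"
    return result.stdout + result.stderr


def develop(description, ask_llm, max_iterations=5):
    """Generate, run, and judge code until the model replies DONE."""
    prompt = f"Project description:\n{description}\nWrite a full attempt."
    for _ in range(max_iterations):
        code = extract_code(ask_llm(prompt))
        output = run_code(code)
        verdict = ask_llm(
            f"Description:\n{description}\nCode:\n{code}\nOutput:\n{output}\n"
            "Reply DONE if the description is fully satisfied, "
            "otherwise explain what is missing."
        )
        if verdict.strip().startswith("DONE"):
            return code
        # Feed the failed attempt and the critique into the next round.
        prompt = (
            f"Project description:\n{description}\nPrevious attempt:\n{code}\n"
            f"Output:\n{output}\nFeedback:\n{verdict}\nWrite a full new attempt."
        )
    return None
```

Note that this runs model-written code directly on your machine, so a sandbox or container is worth considering before leaving it unattended.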

I do also have some minified versions that are around 60-70 LOC, but at this point my interest is shifting to more features that are still implemented simply and modularly. For example, I'd like to manage multi-file projects, let the LLM test interactive CLI programs, add a backup and rollback system for when the LLM gets stuck and can't produce a functioning build or has a major regression, add git repos, and add web browsing (for coding documentation research) and GUI development with Playwright and a local multimodal model. Ideally I could tell my computer to work on a project while I'm in the office or asleep, and wake up the next day to a significantly improved complex project. Eventually I'd like it to work on itself and develop its own automated developer, using entire programming projects of increasing complexity as test cases, measuring how quickly a framework can accomplish a task of a given difficulty, and optimizing for how quickly it can reliably develop complex applications.

Some of my test projects include:

Write a script to interact with an LLM API endpoint at localhost:123/v1/

Write a Python script to get the public IP and local weather using free APIs

Stuff like that.

I could share some of my agent frameworks with you when I get home tonight if you'd like.

bhouston

> I could share some of my agent frameworks with you when I get home tonight if you'd like.

I'm super interested! Please share!

aricblunk

Okay, finally got home and finished enough errands for now, here you go:

https://github.com/aricblunk/pyautoaidev

The Python script with the short name is the entire script.

The text file with the medium-length name is a log of a full run, which was successful on the first iteration with no additional feedback from me.

The Python and text files with fdbk and iter in the name are created each time the LLM makes and runs a new code attempt; each text file is the log of the output of the Python script with the matching name. Iter is incremented each time the LLM self-retries, and Fdbk is incremented each time the human gives additional input after the LLM thought it had accomplished all goals.

neilv

IIUC, someone (dang?) can give a post a second chance.

But the way the software does it involves changing the apparent timestamps.

(It will also claim that people made comments when they didn't, which is problematic -- from potentially making it look like someone was goofing off when they were supposed to be crunching at work, to making it look like they said something after they'd already changed their mind based on new information, to potential legal problems when someone appears to be speaking on a topic during a period they aren't allowed to.)

bryanlarsen

This likely means that somebody else submitted your URL 2 hours ago. HN translates duplicate submissions into upvotes.

I wouldn't be surprised if HN weights resubmission more strongly than simple upvotes since it's a much stronger indication of interest.

senko

Might have been resurfaced by a mod to give it a second chance: https://news.ycombinator.com/item?id=26998309
