
Goose: An open-source, extensible AI agent that goes beyond code suggestions

alexkehayias

So I gave goose a whirl and I actually really like the approach they are taking, especially because I use emacs and not vscode. I would recommend people try it out on an existing project—the results are quite good for small, additive features and even ones that are full stack.

Here's a short writeup of my notes from trying to use it https://notes.alexkehayias.com/goose-coding-ai-agent/

terhechte

Which LLM did you use with Goose? That really affects the quality of the outcome

alexkehayias

I’m using gpt-4o which I think is the default for the OpenAI configuration.

Haven’t tried other models yet, but I’d like to see how o3-mini performs once it’s been added.

alexjhancock

Hi - Alex here from the Goose team. We do already support o3-mini. If you're using the OpenAI provider you can enter that model in the settings screen of the app. Let me know if you have any trouble.

XorNot

I don't know how useful this is, but my immediate reaction to the animation on the front page was "that's literally worse than the alternative".

Because the example given was "change the color of a component".

Now, it's obviously fairly impressive that a machine can go from plain text to identifying a react component and editing it...but the process to do so literally doesn't save me any time.

"Can you change the current colour of headercomponent.tsx to <some color> and increase the vertical size to 15% of vh" takes longer to type than it would take to just open the file and make the change.

Moreover, the example is in a very "standard" format. What happens if I'm not using styled components? What happens if that color is set from a function? In fact, none of the examples shown seem game-changing in any way (e.g. the Confluence example is also what a basic script could do, or a workflow, or anything else). It's still essentially "two mouse clicks" versus writing out a longer English sentence and then, I would guess, waiting substantially more time for inference to run.

taneq

On the one hand, this isn’t a great example for you because you already knew how to do that. There’s probably no good way to automate trivial changes that you can make off the top of your head, and have it be faster than just doing it yourself.

I’ve found LLMs most useful for doing things with unfamiliar tooling, where you know what you want to achieve but not exactly how to do it.

On the other hand, it’s an okay test case because you can easily verify the results.

robertwt7

I agree the process doesn't save any of our time. However, aren't examples supposed to be simple?

Take the Aider example: https://github.com/Aider-AI/aider It's asked to add a param and typing to a function. Would that save us more time? I don't think so, but it's a good peek at what it can do.

Just like any other hello world example, I suppose.

two_handfuls

Examples are supposed to be simple when they illustrate a process we already know works.

With AI the challenge is that we need to convince the reader that the tool will work. So that calls for a different kind of example.

throwaway290

If you don't know how to implement it, how can you be sure the LLM will do it correctly?

If the task is not simple, then break it into simple tasks. Then each of them is as easy as a color change.

pjm331

Yeah the fact that just composing the first prompt would take me longer than just doing the thing is my biggest blocker to using any of these tools on a regular basis

ehnto

Which also assumes it gets it right on the first prompt, and not after 15 minutes of prompt hacking, giving up, and doing it the old-fashioned way anyway.

The risk of wasted time is higher than the proposed benefit, for most of my current use cases. I don't do heaps of glue code, it's mostly business logic, and one off fixes, so I have not found LLMs to be useful day to day at work.

Where it has been useful is when I need to do a task with tech I don't use often. I usually know exactly what I want to do but don't have the myriad arcane details. A great example would be needing to do a complex MongoDB query when I don't normally use Mongo.
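A hypothetical example of the kind of query meant here (collection and field names are invented for illustration): per-customer order totals over the last 30 days, written as a PyMongo-style aggregation pipeline.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: an "orders" collection with created_at, customer_id,
# and an items array of {qty, price}. Top 10 customers by spend, last 30 days.
pipeline = [
    {"$match": {"created_at": {"$gte": datetime.now(timezone.utc) - timedelta(days=30)}}},
    {"$unwind": "$items"},
    {"$group": {
        "_id": "$customer_id",
        "total": {"$sum": {"$multiply": ["$items.qty", "$items.price"]}},
    }},
    {"$sort": {"total": -1}},
    {"$limit": 10},
]
# With pymongo: db.orders.aggregate(pipeline)
```

Knowing that `$unwind` exists (and that it goes before the `$group`) is exactly the arcane detail an LLM can supply when you don't use Mongo day to day.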

oxidant

Cursor + Sonnet has been great for scaffolding tests.

I'll stub out tests (just a name and `assert true`) and have it fill them in. It usually gets them wrong, but I can fix one and then have it update the rest to match.

Not perfect, but beats writing all the tests myself.
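The stub-then-fill workflow described above can be sketched like this (pytest-style; the function under test and all names are invented): stub every case with `assert True`, let the model fill them in, hand-correct one, then ask it to update the rest to match.

```python
# Hypothetical function under test.
def total(items):
    return sum(qty * price for qty, price in items)

# Filled in by the model, then hand-corrected:
def test_total_empty():
    assert total([]) == 0

def test_total_single_item():
    assert total([(2, 5.0)]) == 10.0

# Still a stub -- the next thing to hand the model:
def test_total_mixed_items():
    assert True
```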

juunpp

It advertises that it runs locally and that it is "extensible" but then requires you to set up a remote/external provider as the first step of installation? That's a rather weird use of "local" and "extensible". Do words mean anything anymore?

raincole

You went as far as checking how it works (hence "requires you to set up a remote/external provider as the first step").

But you didn't bother checking the very next section in the sidebar, Supported LLM Providers, where Ollama is listed.

The attention span issue today is amusing.

anonzzzies

> The attention span issue today is amusing.

I find it rather depressing. I know it's a more complex thing, but it really feels irl like people have no time for anything past a few seconds before moving onto the next thing. Shows in the results of their work too often as well. Some programming requires very long attention span and if you don't have any, it's not going to be good.

unification_fan

But this is an elevator pitch. I didn't come here to be marketed to, yet I am being marketed to.

So if you're going to market something to me at least do it right. My attention span is low because I don't really give a shit about this.

EVa5I7bHFq9mnYK

But people really have no time. There is only one brain and thousands of AI startups pitching something every day.

tonygiorgio

Can’t you just run Ollama and give it a localhost endpoint? I don’t think it’s within scope to reproduce the whole local LLM stack when anyone wanting to do this today can easily use existing, better tools to solve that part of it.
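For what it's worth, Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1, so any client speaking that protocol can be pointed at it. A minimal sketch (the model name is just whatever you've pulled locally):

```python
import json
from urllib import request

# Ollama's OpenAI-compatible chat endpoint.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(prompt, model="qwen2.5-coder"):
    """OpenAI-style chat request body; works against any compatible server."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(prompt):
    """POST one prompt to the local server and return the reply text."""
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```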


demarq

Did you not see Ollama?

hiyer

You can use it with ollama too

kylecazar

Yeah, they seem to be referring to the Goose agent/CLI being local, not the models themselves.

anonzzzies

You can run ollama, so no, not only Goose itself.

kylecazar

Fair, but the repeated references to local/on-machine on the project's homepage, which OP criticized, are, I would think, in reference to the Goose agent.

po

Have many other projects put MCP servers (https://modelcontextprotocol.io/introduction) to use since it was announced? I haven't seen very many others using it yet.

paulgb

Cursor also just got support this week. Overall it’s still early (MCP only came out a couple of months ago) but seeing multiple clients that allow it to be used with non-Anthropic models, and getting good results, makes me bullish for MCP.

My colleague has been working on an MCP server that runs Python code in a sandbox (through https://forevervm.com, shameless plug). I’ve been using Goose a lot since it was announced last week for testing, and while it’s rough in some spots, the results have been good.

Keyframe

I don't know if anyone finds this useful, but it seems rather useless / not working? I tried with numerous online and local LLMs for good measure. I installed the computerController extension and tried a couple dozen different phrasings of "open a website (url) in a browser and save a screenshot". Most of the time it wouldn't even open the website, and I never got a screenshot. At best it opened the website once and saved an HTML file (even though I asked for a screenshot); that was the one time in the bunch it did something instead of complaining it couldn't find AppleScript or whatever on a Linux machine. I qualified the ask by telling it it's on Linux. It even managed to find which distro it was on. Really weird overall.

DrewHintz

I’ve had luck with a workflow similar to:

- git clone a repo

- Open Goose with that directory

- Instruct it to discover what the repo does

- Ask it to make changes to the code, being detailed with my instructions.

I haven't tried computerController, only Goose’s main functionality.

pzo

This looks very promising. I only played with it a little yesterday, but they really need to polish the UI. Compared to the desktop versions of ChatGPT or Perplexity, it's in a much lower league. Some feedback for the team:

1) use better font and size

2) allow adjusting shortcuts, with nice defaults that are easy to change

3) integrate with a local Whisper model so I can type by voice, triggered with a global shortcut

4) change the background to blend with the default system OS theme so we don't have a useless ugly top bar and ugly bottom bar

5) shortcut buttons to easily copy part of a conversation or the full conversation, activate web search, star a conversation so it's easy to find in history, etc.

They should take more inspiration from Raycast / Perplexity / ChatGPT / Arc Browser / Warp AI / Cursor UIs.

yencabulator

This is amusing/worrying.

> Make sure to confirm all changes with me before applying.

https://block.github.io/goose/docs/guides/using-goosehints

So, we're supposed to rely on the LLM not hallucinating that it's allowed to do what it wants?
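(For context, per the linked guide a .goosehints file is free-form natural-language guidance that Goose reads from the project; a minimal sketch, with the contents invented apart from the quoted line:)

```text
# .goosehints
This is a Python project; run tests with pytest before declaring a task done.
Make sure to confirm all changes with me before applying.
Never run destructive shell commands without asking first.
```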

ramesh31

> So, we're supposed to rely on the LLM not hallucinating that it's allowed to do what it wants?

Yes. Frontier models have been moving at light speed over the last year. Hallucinations are almost completely solved, particularly with Anthropic models.

It won't be long before statements like this sound the same as "so you mean I have to trust that my client will always have a stable internet connection to reach out to this remote server for data?".

feznyng

This is missing the human-language ambiguity problem. If you don't perfectly specify your requirements and it misinterprets what you're asking for, that's going to be a problem regardless of how smart it is. This is fine with code editing, since you've got version control, but not so great when running commands in your terminal that can't be as trivially reverted.

Besides that, you can absolutely still trick top of the line models: https://embracethered.com/blog/posts/2024/claude-computer-us...

Hallucination might be getting better, gullibility less so.

threecheese

As a regular Claude user, incorrectness is not anywhere near solved; it may be domain-dependent, but just yesterday 3.5 invented a Mac CLI tool that does not exist (and would’ve been pretty useful if it had). I cannot take anything factual at face value, which is actually OK as long as net/net I’m still more productive.

register

But... how does it work? The documentation is really confusing. How do you make it aware of code files and project structure?

DrewHintz

I tell it to discover that itself by asking leading questions: “What does this repo do?” “How do you run its unit tests?” “What does file foo do?”

bckr

Today I decided that what I need is:

- prompt from command line directly to Claude

- suggestions dumped into a file under ./tmp/ (ignored by git)

- iterate on those files

- shuttle test results over to Claude

Getting those files merged with the source files is also important, but I’m not confident in a better way than copy-pasting at this point.
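A rough sketch of that loop, assuming the Anthropic Messages API over plain HTTP (the model alias, file naming, and helper names are my guesses, not a finished tool):

```python
import json
import os
import time
from urllib import request

API_URL = "https://api.anthropic.com/v1/messages"

def ask_claude(prompt, model="claude-3-5-sonnet-latest"):
    """One-shot prompt to the Anthropic Messages API; returns the text reply."""
    body = {
        "model": model,
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]

def dump_suggestion(text, out_dir="tmp"):
    """Write a suggestion under ./tmp/ (git-ignored) for review before merging."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"suggestion-{int(time.time())}.md")
    with open(path, "w") as f:
        f.write(text)
    return path

# Usage: dump_suggestion(ask_claude("Refactor total() in src/app.py to ..."))
```

The merge-back step stays manual here, which matches the copy-paste approach above; test results can be shuttled back by appending them to the next prompt.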

lordswork

I'm building this for myself because I want it too. I recommend you do the same, because it's been really fun to build and teaches you a lot about what LLMs can and cannot do well.

I plan to share it on Github, but waiting for my employer's internal review process to let me open source it as my own project, since they can legally claim all IP developed by me while employed there.

horsawlarway

I also built this for myself, and I really do suggest it as a good project to get a grounded idea of what models can handle.

Mainly - tool calling support just merged in llama.cpp (https://github.com/ggerganov/llama.cpp/pull/9639) this week, and it's been a fun exercise to put local LLMs through the wringer to see how they do at handling it.

It's been a mixture of "surprisingly well" and "really badly".
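For flavor, tool calling here means OpenAI-style tool schemas on the chat endpoint, with the client routing the model's tool calls back to local functions. A minimal sketch (tool name and dispatcher are invented):

```python
import json

# OpenAI-style tool definition; the model replies with tool_calls
# naming one of these, plus JSON-encoded arguments.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call):
    """Route one model-emitted tool call to a local Python function."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_weather":
        return f"sunny in {args['city']}"  # stand-in for a real lookup
    raise ValueError(f"unknown tool: {name}")
```

The "really badly" failure mode is usually the model mangling the arguments JSON or inventing a tool name, which is why the dispatcher validates both.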

vessenes

Aider is fantastic. Worth a look.

bckr

I’ve been playing with it and I don’t like it that much? I’m not sure why. It feels a little buggy and like it’s doing too much.

vessenes

Interesting. I occasionally feel that way with Claude as the backend. Still the best backend that’s reliable, although o3-mini-high in architecture mode with Claude is very good.

I find Claude wants to edit files that I don’t like to be edited often. Two ways I deal with that - first, you can import ‘read only’ files, which is super helpful for focusing. Second, you can use a chat mode first to talk over plans, and when you’re happy say “go”.

I think the thing to do is try and use it at a fairly high level, then drop down and critique. Essentially use it as a team of impressive motivation and mid quality.

yoyohello13

You’re being downvoted for some reason but I feel the same. It’s cool tech but I’ve found I often need to revert changes. It’s far too aggressive with tweaking files. Maybe I can adjust in the settings, idk. Also, it’s expensive as hell to run with Claude sonnet. Cost me like $0.01 per action on a small project, insane. At this point I still prefer the chat interface.

anonzzzies

There are many (ignored) requests to automatically pick the relevant files, like Cursor, Copilot, and Cline do, without having to specify them. Not having that makes it much worse than the others. I was a fan before the others existed, but having to add your files manually is not a thing anymore.

bckr

Hmm, I want to add my own files. This is because in my workflow I often turn to the web UI in order to get a fresh context.

I do like the idea of letting the model ask for source code.

It’s all about attention / context.

bckr

I’ve almost finished an interactive file selector inspired by git add --interactive, with the addition of a tree display.

I’m giving myself the option to output collated code to a file, or copy it to clipboard, or just hold onto it for the next prompt.

I know aider does this stuff, but because I’m automating my own workflow, it’s worth doing it myself.

dbdoskey

You should try Cline. I found it with Anthropic to be invaluable.

bckr

I may give it a spin after finishing this project :)

bArray

Running locally is such an important feature, running elsewhere is an instant non-starter for me. I also want the LLM to be able to read the code to build an in-context solution, but not be able to make changes unless they are strictly accepted.

93po

I'm confused what this does that Cursor doesn't. The example it shows on the front page is something Cursor can also easily do.

lordswork

The one that stands out most to me is that it doesn't bundle the AI features with the editor. This is a standalone application that runs on the side of your editor/tools of choice.

netfl0

Open source licensing where the limit is on the AI backend, right?