Show HN: Pickaxe – A TypeScript library for building AI agents
18 comments
June 20, 2025
movedx01
This is great, and I keep my fingers crossed for Hatchet!
One use case I imagine is key here is background/async agents, OpenAI Codex/Jules style, so it's great if I can durably run them with Pickaxe (btw I believe I've read somewhere in the Temporal docs or a webinar that Codex was built on that ;). But how do I get a real-time, resumable message stream back to the client? The user might reload the page or return after 15 minutes, etc. I haven't been able to think of an elegant way to model this in a distributed system.
gabrielruttner
perfect use case, and this was one of the reasons we built pickaxe. we have a number of coding agent/pr review platforms powered by hatchet with similar patterns already... more to come on the compute side for this use case soon
agent->client streaming is on the very short-term roadmap (order of weeks), but we haven't broadly rolled it out since it's not 100% ready for prime time.
we do already have wait-for-event support for client->agent eventing [1] in this release!
cmdtab
Whether there's support for streaming events was pretty much my first question. Any way we could be a beta tester? ༼ つ ◕_◕ ༽つ
gabrielruttner
100%! shoot me a note at gabe [at] hatchet [dot] run and we can share some details on the signatures that exist but are going to change
jskalc92
If I understand it correctly, tools' and agents' run() methods work in a similar way to React hooks, correct?
Depending on execution order, the tool is either called or a cached value is returned. That way local state can be replayed, and that's why the "no side effects" rule is in place.
I like it. Just, what's the recommended way to have a chat assistant agent with multiple tools? The message history would need to be passed to the very top-level agent.run call, wouldn't it?
gabrielruttner
yes, it's similar, and based on some feedback we've been toying around with a `pickaxe.memo(() => {})` utility to quickly wrap small chunks of code, similar to `useMemo`.
we'll be continuously improving the docs on this project, but since pickaxe is built on hatchet it supports concurrency [1]. so for a chat use case, you can pass the chat history to the top-level agent but propagate cancellation to other message runs in the session to handle the case where the user sends a few messages in a row. we'll work an example into the patterns section for this!
[1] https://docs.hatchet.run/home/concurrency#cancel-in-progress
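To make the replay-by-execution-order idea concrete, here is a rough, illustrative sketch of how a memo-style wrapper could cache each step's result by the order in which it runs, so a replayed execution returns the stored value instead of re-running the step. None of these names (`createReplayContext`, `memo`, `fetchWeather`) are Pickaxe's actual API; they are hypothetical stand-ins for the mechanism being described.

```typescript
// Hypothetical sketch only -- not Pickaxe's real API.
// Each step's result is keyed by the order in which it runs, so a replayed
// execution returns the cached value instead of re-executing the step.
type StepLog = unknown[];

function createReplayContext(log: StepLog) {
  let cursor = 0;
  return async function memo<T>(fn: () => Promise<T>): Promise<T> {
    const index = cursor++;
    if (index < log.length) {
      // Replaying: return the previously recorded result.
      return log[index] as T;
    }
    // First execution: run the step and record its result.
    const result = await fn();
    log.push(result);
    return result;
  };
}

declare function fetchWeather(city: string): Promise<string>; // placeholder

// The second run replays the first step from the log, which is why steps
// must be side-effect free and deterministic in ordering.
async function example() {
  const log: StepLog = [];
  const memo = createReplayContext(log);
  const first = await memo(async () => fetchWeather("sf")); // executed, recorded

  // ...process crashes and is retried with the same log...
  const replay = createReplayContext(log);
  const second = await replay(async () => fetchWeather("sf")); // returned from log
  return { first, second };
}
```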
blixt
I see the API rarely mentions the exact message structure (system prompt, assistant/user history, etc.) or the choice of model (other than defaultLanguageModel). And it's not immediately clear to me how `toolbox.pickAndRun` can access any context from an ongoing agentic flow other than within the one prompt. But this is just from skimming the docs; maybe all of this is supported?
The reason I ask is because I've had a lot of success using different models for different tasks, constructing the system prompt specifically for each task, and also choosing between the "default" long assistant/tool_call/user/(repeat) message history vs. constantly pruning it (bad for caching but sometimes good for performance). And it would be nice to know a library like this could allow experimentation with these strategies.
gabrielruttner
gabe, hatchet cofounder here. thanks for this feedback and i agree!
under the hood we're using the vercel ai sdk to make tool calls, so this is easily extended [1]. this is the only "opinionated" api for calling llm apis that is "bundled" within the sdk, and we were torn on how to expose it for this exact reason, but since it's so common we decided to include it.
some things we're thinking about are overloading `defaultLanguageModel` with a map for different use cases, or allowing users to "eject" the tool picker to customize it as needed. i've opened a discussion [2] to track this.
[1] https://github.com/hatchet-dev/pickaxe/blob/main/sdk/src/cli...
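For the kind of per-task experimentation blixt describes, one interim approach (until something like a `defaultLanguageModel` map exists) is to call the Vercel AI SDK directly inside individual tasks, choosing the model and system prompt per task. The sketch below uses the public `ai`, `@ai-sdk/openai`, and `@ai-sdk/anthropic` packages; the task names and prompts are invented for illustration and this is not Pickaxe's bundled picker.

```typescript
// Sketch: choosing a model and system prompt per task with the Vercel AI SDK.
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// Hypothetical per-task model map.
const taskModels = {
  classify: openai("gpt-4o-mini"), // cheap and fast for routing
  summarize: anthropic("claude-3-5-sonnet-20241022"), // longer-form summaries
} as const;

async function classify(input: string) {
  const { text } = await generateText({
    model: taskModels.classify,
    system: "You are a strict classifier. Reply with a single category name.",
    prompt: input,
  });
  return text;
}

async function summarize(input: string) {
  const { text } = await generateText({
    model: taskModels.summarize,
    system: "Summarize the document in three bullet points.",
    prompt: input,
  });
  return text;
}
```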
cmdtab
I think providing examples and sample code is better than tying your API to the AI SDK.
Because AI providers iterate on their APIs so fast, many features arrive in the AI SDK weeks or months later (support for OpenAI computer use has been pending forever, for example).
I like the current API where you can wait for an event. Similarly, it would be great to have an API for streaming and receiving messages, with everything else handled by the developer, so they could use the AI SDK and stream the final response manually.
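The approach cmdtab describes (the library handles eventing, the developer streams the final response themselves) might look roughly like the sketch below using the AI SDK's `streamText`. Only the `ai` / `@ai-sdk/openai` calls are real APIs; the `write` callback stands in for whatever transport (SSE, websocket, etc.) the application uses.

```typescript
// Sketch: streaming the final LLM response manually with the Vercel AI SDK,
// independent of any agent library. The transport plumbing is illustrative.
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

async function streamReply(prompt: string, write: (chunk: string) => void) {
  const result = streamText({
    model: openai("gpt-4o"),
    prompt,
  });

  // Forward tokens to the client as they arrive.
  for await (const chunk of result.textStream) {
    write(chunk);
  }
}
```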
awaseem
Love to see more frameworks like this in the TypeScript ecosystem! How does this compare to Mastra: https://mastra.ai/
gabrielruttner
thanks! we think of mastra and other frameworks as "batteries included" for patterns like memory and reasoning. this is great for many but not all projects. i think mastra is doing a great job balancing some of this by simply wrapping vercel's ai sdk (we took some inspiration here in our tool picker, and it's our recommendation for llm calls).
we're leaning away from being a framework in favor of being a library specifically because we're seeing teams looking to implement their own business logic for most core agentic capabilities where things like concurrency, fairness, or resource contention become problematic (think many agents reading 1000s of documents in parallel).
unlike most frameworks, we've been working on the orchestrator, hatchet, first (for over a year) and are basing these patterns on what we've seen our most successful companies already doing.
put shortly: pickaxe brings orchestration and best practices, but you're free to implement against your own requirements.
randomcatuser
Oh this is really cool! I was building out a bit of this with Restate this past week, but this seems really well put together :) will give it a try!
abelanger
Thanks! Would love to hear more about what type of agent you're building.
We've heard pretty often that durable execution is difficult to wrap your head around, and we've also seen more of our users (including experienced engineers) relying on Cursor and Claude Code while building. So one of the experiments we've been running is making sure that agent code written by LLMs is durable, by using our MCP server so the agents can follow best practices while generating code: https://pickaxe.hatchet.run/development/developing-agents#pi...
Our MCP server is super lightweight and basically just tells the LLM to read the docs here: https://pickaxe.hatchet.run/mcp/mcp-instructions.md (along with some tool calls for scaffolding)
I have no idea if this is useful or not, but we were able to get Claude to generate complex agents that follow durable-execution best practices (no side effects or non-determinism between retries), which we took as a good sign.
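As a concrete illustration of the "no side effects or non-determinism between retries" rule: anything random or time-dependent should be produced inside a checkpointed step, not in the surrounding orchestration code, so a replay sees the same values. In the sketch below, `durableStep` and `chargeCard` are hypothetical stand-ins for whatever checkpointed-task primitive and side effect an application has; this is not Pickaxe's actual API.

```typescript
// Sketch of the determinism rule, not Pickaxe's API. `durableStep` stands in
// for a checkpointed-task primitive; `chargeCard` is an arbitrary side effect.
declare function durableStep<T>(name: string, fn: () => Promise<T>): Promise<T>;
declare function chargeCard(orderId: string): Promise<void>;

async function badAgent() {
  // BAD: generated outside any step, so a retry produces a *different* id
  // and the replayed history no longer matches the first run.
  const orderId = crypto.randomUUID();
  await durableStep("charge", () => chargeCard(orderId));
}

async function goodAgent() {
  // GOOD: the id is created inside a step, so it is recorded once and
  // replayed identically on retry.
  const orderId = await durableStep("create-order-id", async () =>
    crypto.randomUUID()
  );
  await durableStep("charge", () => chargeCard(orderId));
}
```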
almosthere
What I really like about it is that this kind of project helps people learn what an agent is.
abelanger
Thanks! Our favorite resources on this (both have been posted on HN a few times):
- https://www.anthropic.com/engineering/building-effective-age...
- https://github.com/humanlayer/12-factor-agents
That's also why we implemented pretty much all relevant patterns in the docs (e.g. https://pickaxe.hatchet.run/patterns/prompt-chaining).
If there's an example or pattern that you'd like to see, let me know and we can get it released.
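For readers unfamiliar with the prompt-chaining pattern linked above: it is simply sequential LLM calls where each step's output feeds the next, with an optional gate in between. A minimal sketch using the Vercel AI SDK follows; the prompts and the `draftAndPolish` helper are invented for illustration and are not taken from the Pickaxe docs.

```typescript
// Sketch of prompt chaining: two sequential LLM calls with a gate in between.
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");

async function draftAndPolish(topic: string): Promise<string> {
  // Step 1: produce an outline.
  const outline = await generateText({
    model,
    prompt: `Write a three-point outline for a short post about: ${topic}`,
  });

  // Gate: bail out early if the first step produced nothing useful.
  if (outline.text.trim().length === 0) {
    throw new Error("empty outline, aborting chain");
  }

  // Step 2: expand the outline into prose.
  const draft = await generateText({
    model,
    prompt: `Expand this outline into two short paragraphs:\n${outline.text}`,
  });
  return draft.text;
}
```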
golergka
Fantastic. That's exactly what I've wanted to build for a long time but never got around to, so I kept writing ad-hoc, incomplete, overlapping stuff each time.
zegl
As a long-time Hatchet user, I understand why you've created this library, but it also disappoints me a little bit. I wish more engineering time were spent on making the core platform more stable and performant.
abelanger
Definitely understand the frustration. The difficulty of Hatchet being general-purpose is that being performant for every use case can be tricky, particularly when combining many features (concurrency, rate limiting, priority queueing, retries with backoff, etc.). We should be more transparent about which combinations of use cases we're focused on optimizing.
We spent a long time optimizing the single-task FIFO use case, which is what we typically benchmark against. Performance for that pattern is I/O-bound at > 10k/s, which is a good sign (we just need better disks). So a pure durable-execution workload should run very performantly.
We're focused on improving multi-task and concurrency use cases now. Our benchmarking setup recently added support for those patterns. More on this soon!
Hey HN, Gabe and Alexander here from Hatchet. Today we're releasing Pickaxe, a TypeScript library for building scalable, fault-tolerant AI agents.
Here's a demo: https://github.com/user-attachments/assets/b28fc406-f501-442...
Pickaxe provides a simple set of primitives for building agents which can automatically checkpoint their state and suspend or resume processing (also known as durable execution) while waiting for external events (like a human in the loop). The library is based on common patterns we've seen when helping Hatchet users run millions of agent executions per day.
Unlike other tools, Pickaxe is not a framework. It does not have any opinions or abstractions for implementing agent memory, prompting, context, or calling LLMs directly. Its only focus is making AI agents more observable and reliable.
As agents start to scale, there are generally three big problems that emerge:
1. Agents are long-running compared to other parts of your application. Extremely long-running processes are tricky because deploying new infra or hitting request timeouts on serverless runtimes will interrupt their execution.
2. They are stateful: they generally store internal state which governs the next step in the execution path.
3. They require access to lots of fresh data, which can either be queried during agent execution or needs to be continuously refreshed from a data source.
(These problems are more specific to agents which execute remotely -- locally running agents generally don't have these problems)
Pickaxe is designed to solve these issues by providing a simple API which wraps durable execution infrastructure for agents. Durable execution is a way of automatically checkpointing the state of a process, so that if the process fails, it can be replayed from the checkpoint rather than starting over from the beginning. This model is also particularly useful when your agent needs to wait for an external event or human review in order to continue execution. To support this pattern, Pickaxe uses a Hatchet feature called `waitFor`, which durably registers a listener for an event: even if the agent isn't actively listening when the event fires, the event is guaranteed to be processed by Hatchet, stored in the execution history, and used to resume processing. This infrastructure is powered by what is essentially a linear event log, which stores the entire execution history of an agent in a Postgres database managed by Hatchet.
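To make the wait-for-event flow concrete, here is a rough sketch of an agent that suspends until a human-approval event arrives and then resumes. The identifiers below (`pickaxe.agent`, `ctx.waitFor`, the event key, `requestHumanReview`) are approximations based on the description above, not verified Pickaxe signatures; consult the docs for the real API.

```typescript
// Rough sketch of durable wait-for-event; identifiers approximate the
// described behavior and are not the verified Pickaxe API.
declare const pickaxe: {
  agent<I, O>(config: {
    name: string;
    fn: (
      input: I,
      ctx: { waitFor(opts: { eventKey: string }): Promise<unknown> }
    ) => Promise<O>;
  }): (input: I) => Promise<O>;
};
declare function requestHumanReview(documentId: string): Promise<void>;

const reviewAgent = pickaxe.agent({
  name: "review-agent",
  fn: async (input: { documentId: string }, ctx) => {
    // Kick off the human review (the side effect itself would live in a task).
    await requestHumanReview(input.documentId);

    // Durably suspend until the approval event arrives. Even if this worker
    // dies in the meantime, the event is recorded in the execution history
    // and the agent resumes from this checkpoint.
    const approval = await ctx.waitFor({
      eventKey: `review:approved:${input.documentId}`,
    });

    return { approved: approval };
  },
});
```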
Full docs are here: https://pickaxe.hatchet.run/
We'd greatly appreciate any feedback you have and hope you get the chance to try out Pickaxe.