Claude Skills
149 comments
October 16, 2025
fny
I fear the conceptual churn we're going to endure in the coming years will rival frontend dev.
Across ChatGPT and Claude we now have tools, functions, skills, agents, subagents, commands, and apps, and there's a metastasizing complex of vibed-frameworks feeding on this mess.
mathattack
There's so much white space - this is the cost of a brand new technology. Similar issues with figuring out what cloud tools to use, or what python libraries are most relevant.
LPisGood
Metastasizing is such an excellent way to describe this phenomenon. They grow on top of each other.
hkt
The same thing will happen: skilled people will do one thing well. I've zero interest in anything but Claude Code in a dev container and, while mindful of the lethal trifecta, will give Claude as much access to a local dev environment and its associated tooling as I would give to a junior developer.
mercurialsolo
Subagents, MCP, skills - I wonder how they're supposed to interact with each other?
Feels like a fair bit of overlap here. It's fine to proceed in a direction where you are upgrading the spec and enabling Claude with additional capabilities, but one can pretty much use any of these approaches and end up with the same capability for an agent.
Right now it feels like a UX upgrade from MCP: instead of needing a JSON spec you can use markdown in a file / folder and provide multi-modal inputs.
josefresco
I just tested the canvas-design skill and the results were pretty awful.
This is the skill description:
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.
What it created was an abstract, art-museum-esque poster with random shapes and no discernible message. It may have been trying to design a playing card but just failed miserably, which is my experience with most AI image generators.
It certainly spent a lot of time and effort creating the poster. It asked initial questions, developed a plan, did research, created tooling - seems like a waste of "tokens" given how simple and lame the resulting image turned out.
Also after testing I still don't know how to "use" one of these skills in an actual chat.
Imnimo
I feel like a danger with this sort of thing is that the capability of the system to use the right skill is limited by the little blurb you give about what the skill is for. Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job. But Claude is always starting from ground zero and skimming your descriptions.
mbesto
> Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job.
Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).
More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual, than it is a skill building exercise that can be applied to developing an instrument, task, solution, etc.
mediaman
It's a false dichotomy. LLMs are already being trained with RL to have goal directedness.
He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.
anomaloustho
I wrote elsewhere but I’m more interpreting this distinction as “RL in real-time” vs “RL beforehand”.
baxtr
So it’s on-the-fly adaptive mimicry?
dingnuts
Explain something to me that I've long wondered: how does Reinforcement Learning work if you cannot measure your distance from the goal? In other words, how can RL be used for literally anything qualitative?
leptons
I can't wait to try to convince an LLM/RL/whatever-it-is that what it "thinks" is right is actually wrong.
buildbot
The industry has been doing RL on many kinds of neural networks, including LLMs, for quite some time. Is this person saying we should do RL on some kind of non-neural-network design? Why is that more likely to bring AGI than an LLM?
> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.
Citation?
anomaloustho
Looks like they added the link. But I think the distinction is doing RL in real time vs. being pre-trained the way an LLM is.
And I associate that part with AGI being able to do cutting-edge research and explore new ideas like humans can. When that seems to "happen" with LLMs, it's been more debatable (e.g. there was an existing paper that the LLM was able to tap into).
I guess another example would be an AGI doing RL in real time to get really good at a video game with completely different mechanics, in the same way a human could. Today, that wouldn't really happen unless it was able to pre-train on something similar.
jfarina
Why are you asking them to cite something for that statement? Are you questioning whether it's the foundation for intelligence or whether LLMS understand goals and consequences?
skurilyak
Besides a "reference manual", Claude Skills are analogous to a "toolkit with an instruction manual" in that they include both instructions (manuals) and executable functions (tools/code).
hbarka
For humans, it's not uncommon to have a clever realization by way of serendipity. How do you skill AI to have serendipity?
vonneumannstan
This is an uninformed take. Much of the improvement in performance of LLM based models has been through RLHF and other RL techniques.
zobzu
IMO this is a context window issue. Humans are pretty good at memorizing super broad context without great accuracy. Sometimes our "recall" function doesn't even work right ("How do you say 'blah' in German again?"), so the more you specialize (say, 10k hours / mastery), the better you are at recalling a specific set of "skills", but perhaps not other skills.
On the other hand, LLMs have a programmatic context with consistent storage and the ability to have perfect recall; they just don't always generate the expected output in practice, as the cost to go through ALL context is prohibitive in terms of power and time.
Skills - or really just context insertion - is simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.
When you start thinking about it that way, it makes sense - and it helps using these tools more effectively too.
ryancnelson
I commented here already about deli-gator (https://github.com/ryancnelson/deli-gator), but your summary nailed what I didn't mention before: context.
I'd been re-teaching Claude to craft REST API calls with curl every morning for months before I realized that skills would let me delegate that to cheaper models, reuse cached-token queries, and save my context window for my actual problem-space CONTEXT.
dingnuts
>I’d been re-teaching Claude to craft Rest-api calls with curl every morning for months
what the fuck, there is absolutely no way this was cheaper or more productive than just learning to use curl and writing curl calls yourself. Curl isn't even hard! And if you learn to use it, you get WAY better at working with HTTP!
You're kneecapping yourself to expend more effort than it would take to just write the calls, helping to train a bot to do the job you should be doing
mbesto
> IMO this is a context window issue.
Not really. It's a consequential issue. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.
dwaltrip
There are ways to compensate for lack of “continual learning”, but recognizing that underlying missing piece is important.
andruby
Would this requirement to start from ground zero in current LLMs be an artefact of the requirement to have a "multi-tenant" infrastructure?
Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.
Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?
ChadMoran
This is the crux of knowledge/tool enrichment in LLMs. The idea that we can have knowledge bases and LLMs will know WHEN to use them is a bit of a pipe dream right now.
fragmede
Can you be more specific? The simple case seems to be solved, eg if I have an mcp for foo enabled and then ask about a list of foo, Claude will go and call the list function on foo.
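For that simple case, all that sits behind it is a tool definition the model can see and choose to call. A minimal sketch with the official Python MCP SDK's FastMCP helper (the "foo" server and list_foo tool are made-up names, not anything from this thread):

```python
# Minimal sketch of an MCP server exposing one tool (hypothetical names).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("foo")

@mcp.tool()
def list_foo() -> list[str]:
    """Return the list of foo items."""
    return ["foo-1", "foo-2", "foo-3"]

if __name__ == "__main__":
    # Serves over stdio; the client (Claude) reads the tool's name and
    # docstring and decides when to call it based on the user's request.
    mcp.run()
```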
blackoil
Most of the experience is general information not specific to project/discussion. LLM starts with all that knowledge. Next it needs a memory and lookup system for project specific information. Lookup in humans is amazingly fast, but even with a slow lookup, LLMs can refer to it in near real-time.
seunosewa
The blurbs can be improved if they aren't effective. You can also invoke skills directly.
The description is equivalent to your short term memory.
The skill is like your long term memory which is retrieved if needed.
These should both be considered as part of the AI agent. Not external things.
j45
LLMs are a probability-based calculation, so they will always skim to some degree, always guess to some degree, and often pick the best choice available to them even though it might not be the best.
For folks for whom this seems elusive, it's worth learning how the internals actually work; it helps a great deal in how to structure things in general, and then over time, as the parent comment said, for individual cases specifically.
arjie
It's pretty neat that they're adding these things. In my projects, I have a `bin/claude` subdirectory where I ask it to put scripts etc. that it builds. In the claude.md I then note that it should look there for tools. It does a pretty good job of this. To be honest, the thing I most need are context-management helpers like "start a claude with this set of MCPs, then that set, and so on". Instead right now I have separate subdirectories that I then treat as projects (which are supported as profiles in Claude) which I then launch a `claude` from. The advantage of the `bin/claude` in each of these things is that it functions as a longer-cycle learning thing. My Claude instantly knows how to analyze certain BigQuery datasets and where to find the credentials file and so on.
Filesystem as profile manager is not something I thought I'd be doing, but here we are.
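Roughly, the layout described above looks like this (a sketch; the file names are made up):

```
project/
  CLAUDE.md          # notes that reusable scripts live in bin/claude/ and to check there first
  bin/claude/
    bq_report.sh     # e.g. a script Claude built earlier to analyze a BigQuery dataset
    load_creds.sh    # knows where the credentials file lives
```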
mousetree
I'm perplexed why they would use such a silly example in their demo video (rotating an image of a dog upside down and cropping). Surely they can find more compelling examples of where these skills could be used?
antiloper
The developer page uses a better example, a PDF processing skill: https://github.com/anthropics/skills/tree/main/document-skil...
I've been emulating this in claude code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.
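The manual version is just an @-mention in the prompt, which pulls the file's contents into context; something like this (the guide path is hypothetical):

```
> @docs/guides/pdf-extraction.md extract the tables from reports/q3.pdf and save them as CSV
```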
Mouvelie
You'd think so, eh ? https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...
alansaber
Dog photo >> informing the consumer
iyn
Does anyone know how skills relate to subagents? Seems that subagents have more capabilities (e.g. can access the internet) but seems that there's a lot of overlap.
I've asked Claude and this is what it answered:
Skills = Instructions + resources for the current Claude instance (shared context)
Subagents = Separate AI instances with isolated contexts that can work in parallel (different context windows)
Skills make Claude better at specific tasks. Subagents are like having multiple specialized Claudes working simultaneously on different aspects of a problem.
I imagine we can probably compose them, e.g. invoke subagents (to keep separate context) which could use some skills and, in the end, summarize the findings / provide output without "polluting" the main context window.
lukev
How this reads to me is that a skill is "just" a bundle of prompts, scripts, and files that can be read into context as a unit.
Having a sub-agent "execute" a skill makes a lot of sense from a context-management perspective, but I think the way to think about it is that a sub-agent is an "execution-level" construct, whereas a skill is a "data-level" construct.
throwup238
Skills can also contain scripts that can be executed in a VM. The Anthropic engineering blog mentions that you can specify in the markdown instructions whether the script should be executed or read into context. One of their examples is a script to extract properties from a PDF file.
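Concretely, a skill is a folder containing a SKILL.md (YAML frontmatter with a name and the description blurb, followed by the full instructions) plus any bundled scripts or resources. A rough sketch, not Anthropic's actual PDF skill, with an invented script name:

```markdown
---
name: pdf-properties
description: Extract metadata (title, author, page count) from PDF files when the user asks about a PDF's properties.
---

# PDF properties

Don't read the PDF into context. Instead run the bundled script:

    python scripts/extract_properties.py <path-to-pdf>

It prints a JSON object with the title, author, and page count.
```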
simonw
I accidentally leaked the existence of these last Friday, glad they officially exist now! https://simonwillison.net/2025/Oct/10/claude-skills/
buildbot
"So I fired up a fresh Claude instance (fun fact: Code Interpreter also works in the Claude iOS app now, which it didn't when they first launched) and prompted:
Create a zip file of everything in your /mnt/skills folder"
It's a fun, terrifying world that this kind of "hack" to exfiltrate data is possible! I hope it does not have full filesystem/bin access, lol. Can it SSH?...
antiloper
What's the hack? Instead of typing `zip -r mnt.zip /mnt` into bash, you type `Create a zip file of /mnt` in claude code. It's the same thing running as the same user.
ryancnelson
The uptake on Claude Skills seems to have a lot of momentum already! I was fascinated on Tuesday by "Superpowers", https://blog.fsck.com/2025/10/09/superpowers/ … and then packaged up all the tool-building I've been working on for a while into somewhat tidy skills that I can delegate agents to:
http://github.com/ryancnelson/deli-gator
I'd love any feedback.
skinnymuch
Delegation is super cool. I can sometimes end up having too much Linear issue context coming in. I.e., frequently I want a Linear issue's description and last comment retrieved, but the Linear MCP grabs all comments, which pollutes the context and fills it up too much.
qwertox
This is incredible. So with MCP I show them how they can access my server, so that I can execute code to perform a task. And with skills, I upload my code to their servers, they drop the protocol and directly execute the code for you.
Possibly great to centralize the configuration of your personalized AI, for access in the browser, the CLI and on mobile, also via voice.
CuriouslyC
Anything the model chooses to use is going to waste context and get utilized poorly. Also, the more skills you have, the worse they're going to be. It's subagents v2.
Just use slash commands, they work a lot better.
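For reference, a custom slash command in Claude Code is just a markdown prompt file under .claude/commands/; e.g. a hypothetical .claude/commands/fix-lint.md, invoked in a session as /fix-lint, might contain:

```markdown
Run the project's linter, then fix every reported issue without changing behavior.
Show me a summary of the changes before committing anything.
```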
phildougherty
getting hard to keep up with skills, plugins, marketplaces, connectors, add-ons, yada yada
hiq
IMHO, don't, don't keep up. Just like "best practices in prompt engineering", these are just temporary workarounds for current limitations, and they're bound to disappear quickly. Unless you really need the extra performance right now, just wait until models get you this performance out of the box instead of investing in learning something that'll be obsolete in months.
lukev
I agree with your conclusion not to sweat all these features too much, but only because they're not hard at all to understand on demand once you realize that they all boil down to a small handful of ways to manipulate model context.
But context engineering is very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have come precisely from innovations in how to better leverage context.
vdfs
IMO, these are just marketing or new ways of using functions calling, under the hood they all get re-written as tools the model can call
spprashant
I agree with this take. Models and the tooling around them are both in flux. I'd rather not spend time learning something in detail only for these companies to pull the plug for the next big thing.
hansmayer
Well, have some understanding: the good folks need to produce something, since their main product is not delivering the much yearned for era of joblessness yet. It's not for you, it's signalling their investors - see, we're not burning your cash paying a bunch of PhDs to tweak the model weights without visible results. We are actually building products. With a huge and willing A/B testing base.
gordonhart
Agree — it's a big downside as a user to have more and more of these provider-specific features. More to learn, more to configure, more to get locked into.
Of course this is why the model providers keep shipping new ones; without them their product is a commodity.
dominicq
Features will be added until morale improves
marcusestes
Agreed, but I think it's actually simple.
Plugins include:
* Commands
* MCPs
* Subagents
* Now, Skills
Marketplaces aggregate plugins.
input_sh
It's so simple you didn't even name all of them properly.
tempusalaria
All these things are designed to create lock-in for the companies. They don't really fundamentally add to the functionality of LLMs. Devs should focus on working directly with the model-generation APIs and not using all the decoration.
tqwhite
Me? I love some lock in. Give me the coolest stuff and I'll be your customer forever. I do not care about trying to be my own AI company. I'd feel the same about OpenAI if they got me first... but they didn't. I am team Anthropic.
adidoit
All of it is ultimately managing the context for a model. Just different methods
prng2021
Yep. Now I need an AI to help me use AI
josefresco
Joking aside, I ask Claude how to use Claude... all the time! Sometimes I ask ChatGPT about Claude. It actually doesn't work well because they don't imbue these AI tools with any special knowledge about how they work; they seem to rely on public documentation, which usually lags behind the breakneck pace of these feature releases.
andoando
Train AI to set up / train AI on doing tasks. Bam.
consumer451
I mean, that is a very common thing that I do.
wartywhoa23
That's why the key word for all the AI horror stories that have been emerging lately is "recursion".
https://www.anthropic.com/engineering/equipping-agents-for-t...