Claude Skills
149 comments
October 16, 2025
fny
I fear the conceptual churn we're going to endure in the coming years will rival frontend dev.
Across ChatGPT and Claude we now have tools, functions, skills, agents, subagents, commands, and apps, and there's a metastasizing complex of vibed-frameworks feeding on this mess.
mathattack
There's so much white space - this is the cost of a brand new technology. Similar issues with figuring out what cloud tools to use, or what python libraries are most relevant.
LPisGood
Metastasizing is such an excellent way to describe this phenomenon. They grow on top of each other.
hkt
The same thing will happen: skilled people will do one thing well. I've zero interest in anything but Claude Code in a dev container and, while mindful of the lethal trifecta, will give Claude as much access to a local dev environment and its associated tooling as I would give to a junior developer.
mercurialsolo
Subagents, MCP, skills - I wonder how they're supposed to interact with each other?
Feels like a fair bit of overlap here. It's fine to proceed in a direction where you are upgrading the spec and enabling Claude with additional capabilities, but one can pretty much use any of these approaches and end up with the same capability for an agent.
Right now it feels like a UX upgrade from MCP: instead of needing a JSON spec you can use markdown in a file / folder and provide multi-modal inputs.
josefresco
I just tested the canvas-design skill and the results were pretty awful.
This is the skill description:
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.
What it created was an abstract, art-museum-esque poster with random shapes and no discernible message. It may have been trying to design a playing card but just failed miserably, which is my experience with most AI image generators.
It certainly spent a lot of time and effort creating the poster. It asked initial questions, developed a plan, did research, created tooling - seems like a waste of "tokens" given how simple and lame the resulting image turned out.
Also after testing I still don't know how to "use" one of these skills in an actual chat.
Imnimo
I feel like a danger with this sort of thing is that the capability of the system to use the right skill is limited by the little blurb you give about what the skill is for. Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job. But Claude is always starting from ground zero and skimming your descriptions.
mbesto
> Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job.
Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).
More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual, than it is a skill building exercise that can be applied to developing an instrument, task, solution, etc.
mediaman
It's a false dichotomy. LLMs are already being trained with RL to have goal directedness.
He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.
anomaloustho
I wrote elsewhere but I’m more interpreting this distinction as “RL in real-time” vs “RL beforehand”.
baxtr
So it’s on-the-fly adaptive mimicry?
dingnuts
Explain something to me that I've long wondered: how does Reinforcement Learning work if you cannot measure your distance from the goal? In other words, how can RL be used for literally anything qualitative?
leptons
I can't wait to try to convince an LLM/RL/whatever-it-is that what it "thinks" is right is actually wrong.
buildbot
The industry has been doing RL on many kinds of neural networks, including LLMs, for quite some time. Is this person saying we should do RL on some kind of non-neural-network design? Why is that more likely to bring AGI than an LLM?
> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.
Citation?
anomaloustho
Looks like they added the link. But I think the distinction is doing RL in real time vs. being pre-trained the way an LLM is.
And I associate that part with AGI being able to do cutting-edge research and explore new ideas like humans can. When that seems to "happen" with LLMs, it's been more debatable (e.g. there was an existing paper that the LLM was able to tap into).
I guess another example would be an AGI doing RL in real time to get really good at a video game with completely different mechanics, in the same way a human could. Today, that wouldn't really happen unless it was able to pre-train on something similar.
jfarina
Why are you asking them to cite something for that statement? Are you questioning whether it's the foundation for intelligence or whether LLMS understand goals and consequences?
skurilyak
Besides a "reference manual", Claude Skills are analogous to a "toolkit with an instruction manual" in that they include both instructions (manuals) and executable functions (tools/code).
hbarka
For humans, it's not uncommon to have a clever realization by way of serendipity. How do you skill AI to have serendipity?
vonneumannstan
This is an uninformed take. Much of the improvement in performance of LLM based models has been through RLHF and other RL techniques.
zobzu
IMO this is a context window issue. Humans are pretty good at memorizing super broad context without great accuracy. Sometimes our "recall" function doesn't even work right ("How do you say 'blah' in German again?"), so the more you specialize (say, 10k hours / mastery), the better you are at recalling a specific set of "skills", but perhaps not other skills.
On the other hand, LLMs have a programmatic context with consistent storage and the ability to have perfect recall; they just don't always generate the expected output in practice, as the cost to go through ALL context is prohibitive in terms of power and time.
Skills - or really just context insertion - is simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.
When you start thinking about it that way, it makes sense - and it helps using these tools more effectively too.
ryancnelson
I commented here already about deli-gator (https://github.com/ryancnelson/deli-gator), but your summary nailed what I didn't mention before: context.
I'd been re-teaching Claude to craft REST API calls with curl every morning for months before I realized that skills would let me delegate that to cheaper models, reuse cached-token queries, and save my context window for my actual problem-space CONTEXT.
dingnuts
>I’d been re-teaching Claude to craft Rest-api calls with curl every morning for months
what the fuck, there is absolutely no way this was cheaper or more productive than just learning to use curl and writing curl calls yourself. Curl isn't even hard! And if you learn to use it, you get WAY better at working with HTTP!
You're kneecapping yourself to expend more effort than it would take to just write the calls, helping to train a bot to do the job you should be doing
mbesto
> IMO this is a context window issue.
Not really. It's a consequential issue. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.
dwaltrip
There are ways to compensate for lack of “continual learning”, but recognizing that underlying missing piece is important.
andruby
Would this requirement to start from ground zero in current LLMs be an artefact of the requirement to have a "multi-tenant" infrastructure?
Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.
Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?
ChadMoran
This is the crux of knowledge/tool enrichment in LLMs. The idea that we can have knowledge bases and LLMs will know WHEN to use them is a bit of a pipe dream right now.
fragmede
Can you be more specific? The simple case seems to be solved, eg if I have an mcp for foo enabled and then ask about a list of foo, Claude will go and call the list function on foo.
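For that simple case, all that sits behind it is a tool definition the model can see and choose to call. A minimal sketch with the official Python MCP SDK's FastMCP helper (the "foo" server and list_foo tool are made-up names, not anything from this thread):

```python
# Minimal sketch of an MCP server exposing one tool (hypothetical names).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("foo")

@mcp.tool()
def list_foo() -> list[str]:
    """Return the list of foo items."""
    return ["foo-1", "foo-2", "foo-3"]

if __name__ == "__main__":
    # Serves over stdio; the client (Claude) reads the tool's name and
    # docstring and decides when to call it based on the user's request.
    mcp.run()
```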
blackoil
Most of the experience is general information not specific to project/discussion. LLM starts with all that knowledge. Next it needs a memory and lookup system for project specific information. Lookup in humans is amazingly fast, but even with a slow lookup, LLMs can refer to it in near real-time.
seunosewa
The blurbs can be improved if they aren't effective. You can also invoke skills directly.
The description is equivalent to your short term memory.
The skill is like your long term memory which is retrieved if needed.
These should both be considered as part of the AI agent. Not external things.
j45
LLMs are a probability-based calculation, so they will always skim to some degree, always guess to some degree, and often pick the best choice available to them even though it might not be the best.
For folks for whom this seems elusive, it's worth learning how the internals actually work; it helps a great deal in how to structure things in general, and then over time, as the parent comment said, for individual cases specifically.
arjie
It's pretty neat that they're adding these things. In my projects, I have a `bin/claude` subdirectory where I ask it to put scripts etc. that it builds. In the claude.md I then note that it should look there for tools. It does a pretty good job of this. To be honest, the thing I most need are context-management helpers like "start a claude with this set of MCPs, then that set, and so on". Instead right now I have separate subdirectories that I then treat as projects (which are supported as profiles in Claude) which I then launch a `claude` from. The advantage of the `bin/claude` in each of these things is that it functions as a longer-cycle learning thing. My Claude instantly knows how to analyze certain BigQuery datasets and where to find the credentials file and so on.
Filesystem as profile manager is not something I thought I'd be doing, but here we are.
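Roughly, the layout described above looks like this (a sketch; the file names are made up):

```
project/
  CLAUDE.md          # notes that reusable scripts live in bin/claude/ and to check there first
  bin/claude/
    bq_report.sh     # e.g. a script Claude built earlier to analyze a BigQuery dataset
    load_creds.sh    # knows where the credentials file lives
```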
mousetree
I'm perplexed why they would use such a silly example in their demo video (rotating an image of a dog upside down and cropping). Surely they can find more compelling examples of where these skills could be used?
antiloper
The developer page uses a better example, a PDF processing skill: https://github.com/anthropics/skills/tree/main/document-skil...
I've been emulating this in claude code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.
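The manual version is just an @-mention in the prompt, which pulls the file's contents into context; something like this (the guide path is hypothetical):

```
> @docs/guides/pdf-extraction.md extract the tables from reports/q3.pdf and save them as CSV
```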
Mouvelie
You'd think so, eh ? https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...
alansaber
Dog photo >> informing the consumer
iyn
Does anyone know how skills relate to subagents? Seems that subagents have more capabilities (e.g. can access the internet) but seems that there's a lot of overlap.
I've asked Claude and this is what it answered:
Skills = Instructions + resources for the current Claude instance (shared context)
Subagents = Separate AI instances with isolated contexts that can work in parallel (different context windows)
Skills make Claude better at specific tasks. Subagents are like having multiple specialized Claudes working simultaneously on different aspects of a problem.
I imagine we can probably compose them, e.g. invoke subagents (to keep separate context) which could use some skills and, in the end, summarize the findings / provide output without "polluting" the main context window.
lukev
How this reads to me is that a skill is "just" a bundle of prompts, scripts, and files that can be read into context as a unit.
Having a sub-agent "execute" a skill makes a lot of sense from a context-management perspective, but I think the way to think about it is that a sub-agent is an "execution-level" construct, whereas a skill is a "data-level" construct.
throwup238
Skills can also contain scripts that can be executed in a VM. The Anthropic engineering blog mentions that you can specify in the markdown instructions whether the script should be executed or read into context. One of their examples is a script to extract properties from a PDF file.
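Concretely, a skill is a folder containing a SKILL.md (YAML frontmatter with a name and the description blurb, followed by the full instructions) plus any bundled scripts or resources. A rough sketch, not Anthropic's actual PDF skill, with an invented script name:

```markdown
---
name: pdf-properties
description: Extract metadata (title, author, page count) from PDF files when the user asks about a PDF's properties.
---

# PDF properties

Don't read the PDF into context. Instead run the bundled script:

    python scripts/extract_properties.py <path-to-pdf>

It prints a JSON object with the title, author, and page count.
```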
simonw
I accidentally leaked the existence of these last Friday, glad they officially exist now! https://simonwillison.net/2025/Oct/10/claude-skills/
buildbot
"So I fired up a fresh Claude instance (fun fact: Code Interpreter also works in the Claude iOS app now, which it didn't when they first launched) and prompted:
Create a zip file of everything in your /mnt/skills folder"
It's a fun, terrifying world that this kind of "hack" to exfiltrate data is possible! I hope it does not have full filesystem/bin access, lol. Can it SSH?...
antiloper
What's the hack? Instead of typing `zip -r mnt.zip /mnt` into bash, you type `Create a zip file of /mnt` in claude code. It's the same thing running as the same user.
ryancnelson
The uptake on Claude Skills seems to have a lot of momentum already! I was fascinated on Tuesday by "Superpowers", https://blog.fsck.com/2025/10/09/superpowers/ … and then packaged up all the tool-building I've been working on for a while into somewhat tidy skills that I can delegate agents to:
http://github.com/ryancnelson/deli-gator
I'd love any feedback.
skinnymuch
Delegation is super cool. I can sometimes end up having too much Linear issue context coming in. I.e., frequently I want a Linear issue's description and last comment retrieved, but the Linear MCP grabs all comments, which pollutes the context and fills it up too much.
qwertox
This is incredible. So with MCP I show them how they can access my server, so that I can execute code to perform a task. And with skills, I upload my code to their servers, they drop the protocol and directly execute the code for you.
Possibly great to centralize the configuration of your personalized AI, for access in the browser, the CLI and on mobile, also via voice.
CuriouslyC
Anything the model chooses to use is going to waste context and get utilized poorly. Also, the more skills you have, the worse they're going to be. It's subagents v2.
Just use slash commands, they work a lot better.
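For reference, a custom slash command in Claude Code is just a markdown prompt file under .claude/commands/; e.g. a hypothetical .claude/commands/fix-lint.md, invoked in a session as /fix-lint, might contain:

```markdown
Run the project's linter, then fix every reported issue without changing behavior.
Show me a summary of the changes before committing anything.
```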
phildougherty
getting hard to keep up with skills, plugins, marketplaces, connectors, add-ons, yada yada
hiq
IMHO, don't, don't keep up. Just like "best practices in prompt engineering", these are just temporary workarounds for current limitations, and they're bound to disappear quickly. Unless you really need the extra performance right now, just wait until models get you this performance out of the box instead of investing in learning something that'll be obsolete in months.
lukev
I agree with your conclusion not to sweat all these features too much, but only because they're not hard at all to understand on demand once you realize that they all boil down to a small handful of ways to manipulate model context.
But context engineering is very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have come precisely from innovations in how to better leverage context.
vdfs
IMO, these are just marketing or new ways of using functions calling, under the hood they all get re-written as tools the model can call
spprashant
I agree with this take. Models and the tooling around them are both in flux. I'd rather not spend time learning something in detail only for these companies to pull the plug for the next big thing.
hansmayer
Well, have some understanding: the good folks need to produce something, since their main product is not delivering the much yearned for era of joblessness yet. It's not for you, it's signalling their investors - see, we're not burning your cash paying a bunch of PhDs to tweak the model weights without visible results. We are actually building products. With a huge and willing A/B testing base.
gordonhart
Agree — it's a big downside as a user to have more and more of these provider-specific features. More to learn, more to configure, more to get locked into.
Of course this is why the model providers keep shipping new ones; without them their product is a commodity.
dominicq
Features will be added until morale improves
marcusestes
Agreed, but I think it's actually simple.
Plugins include:
* Commands
* MCPs
* Subagents
* Now, Skills
Marketplaces aggregate plugins.
input_sh
It's so simple you didn't even name all of them properly.
tempusalaria
All these things are designed to create lock-in for the companies. They don't really fundamentally add to the functionality of LLMs. Devs should focus on working directly with the model-generation APIs and not using all the decoration.
tqwhite
Me? I love some lock in. Give me the coolest stuff and I'll be your customer forever. I do not care about trying to be my own AI company. I'd feel the same about OpenAI if they got me first... but they didn't. I am team Anthropic.
adidoit
All of it is ultimately managing the context for a model. Just different methods
prng2021
Yep. Now I need an AI to help me use AI
josefresco
Joking aside, I ask Claude how to use Claude... all the time! Sometimes I ask ChatGPT about Claude. It actually doesn't work well because they don't imbue these AI tools with any special knowledge about how they work; they seem to rely on public documentation, which usually lags behind the breakneck pace of these feature releases.
andoando
Train AI to set up / train AI on doing tasks. Bam.
consumer451
I mean, that is a very common thing that I do.
wartywhoa23
That's why the key word for all the AI horror stories that have been emerging lately is "recursion".
https://www.anthropic.com/engineering/equipping-agents-for-t...