
Claude Skills are awesome, maybe a bigger deal than MCP

michael1999

We're doing something like this internally. Our monorepo context files were much too big, so we built a progressive tree of fragments to load up for different tasks.
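
A rough sketch of the layout (paths invented for illustration):

    context/
      index.md            # short map of the whole tree; always loaded
      backend/
        index.md          # says which fragment to load for which task
        payments.md
        auth.md
      frontend/
        index.md
        design-system.md

Each index.md points the agent at the child fragments it needs, so most tasks only pull in a few small files.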

I am struck by how much these kinds of context documents resemble normal developer documentation, but actually useful and task-oriented. What was the barrier to creating these documents before?

Three theories on why this is so different:

1) The feedback loop was too long. If you wrote some docs, you might never learn if they were any good. If you did, it might be years later. And if you changed them, doing an A/B test was impractical. Now, you can write up a context markdown, ask Claude to do something, and iterate in minutes.

2) The tools can help build them. Building good docs was always hard, especially if you take the time to include the examples, URLs, etc. that make documentation truly useful. These tools reduce that cost.

3) Many programmers are egotists. Documentation that helps other people doesn't generate internal motivation. But documentation that allows you to better harness a computer minion to your will is attractive.

Any other theories?

7thpower

It is primarily a principal-agent problem, with a hint of marshmallow test.

If you are a developer who is not writing documents for consumption by AI, you are primarily writing documents for someone who is not you; you do not know what that person will need or whether they will ever even look at them.

The docs may, of course, help you too, but you may not realize that, or have the time or discipline.

If you are writing them because the AI using them will help you, you have a very strong and immediate incentive to document the necessary information. You also have the benefit of a short feedback loop.

Side note: thanks to LLMs' penchant for wiping out comments, I have a lot more docs these days and far fewer comments.

emn13

I think it's not at all a marshmallow test; quite the opposite - docs used to be written way, way in advance of their consumption. The problem that implies is twofold. Firstly, and less significantly, it's just not a great return on investment to spend tons of effort now to maybe help slightly in the far future.

But the real problem with docs is that for MOST use cases, the audience and context of the readers matter HUGELY. Most docs are bad because we can't predict those. People waste ridiculous amounts of time writing docs that nobody reads or nobody needs, based on hypotheses about the future that turn out to be false.

And _that_ is completely different when you're writing context-window documents. These aren't really documents describing any codebase or context within which the codebase exists in some timeless fashion; they're better understood as part of a _current_ plan for action on an acute, real concern. They're battle-tested the way docs only rarely are. And as a bonus, sure, they're retainable and might help for the next problem too, but that's not why they work; they work because they're useful in an almost testable way right away.

The exceptions to this pattern kind of prove the rule: for years people have done better at documenting isolatable dependencies, i.e. libraries, precisely because those sit at boundaries where it's easier to make decent predictions about future usage - and because those docs might have a far larger readership, it's more worth taking the risk that an incorrect hypothesis about the future wastes the effort. The cost/benefit is skewed towards the benefit by sheer numbers and the kind of code it is.

Having said that, the dust hasn't settled on the best way to distill context like this. It'd be a mistake to overanalyze the current situation and conclude that documentation is certain to be the long-term answer - it's definitely helpful now, but it's entirely conceivable that more automated and structured representations will emerge, in forms better suited for machine consumption that look a little more alien to us than conventional docs.

Wowfunhappy

Your LLMs get rid of comments? Mine add them incessantly.

evanmoran

I know this is highly controversial, but I now leave the comments in. My theory is that the "probability space" the LLM is writing code in can't help but write them, so if I leave them, the next LLM that reads the code will start in the same space. Maybe it's too much, but currently I just want the code to be right, and I've let go of the exact wording of comments/variables/types to move faster.

ls612

I have to yell at Gemini not to add so many comments - by default it writes almost more comments than code.

neves

I always write documentation thinking of myself in the future.

chermi

Yeah, this is really interesting. My money is 80% on number 1. The good developers I know (I'm not among them) are very practical and results-driven. If they see something is useful for their goals, they use it. There's the time delay that you mentioned, and also the misaligned incentives that mean there's no direct feedback at all. You'll probably get a scolding if your code breaks or you miss a deadline, but if someone else complains about documentation to a manager, that's one more degree of separation. If the manager doesn't directly feel the pain, he/she won't pay as much attention.

Edit - I'm basically repeating the poster who said it's a principal-agent problem.

Herring

Probably all the same reasons tech debt exists in the first place: business pressure, poor design, lack of resources. It used to be expensive to keep good documentation up to date as the code changes.

ashleyn

If documentation is a side effect of providing accurate and effective AI context, it's pretty logical that there will be a significant incentive to maintain it.

Groxx

yep. it's suddenly more-obviously valuable, so it's getting done.

crowbahr

The moment you write documentation it becomes stale. It's additional debt you've incurred, and the upkeep must be paid with every modification to the code.

That doesn't mean you should skip it - but it's vital to recognize the costs.

When I joined my current company they had extensive documentation on several systems, all of it outdated, stale, or even just straight up wrong. I wasted cumulative weeks because I'd depended on other programmers having properly documented things.

It's still worth doing: but you _must_ continually pay the debt down.

simonw

The fix for that is to keep the documentation in the same repository as the code it documents, and then to enforce that it gets updated as part of your code review process. Don't let a PR land if it doesn't also update any relevant documentation at the same time.

Keeping documentation in a separate system - like a wiki - is an anti-pattern in most cases. It leads to documentation that nobody trusts (and hence nobody consults) because it inevitably falls out of sync with the system it is documenting.

Plus... LLMs are good enough now that having one automatically check PRs to warn if the change affects the existing documentation might actually work well enough to be useful.
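
A rough sketch of what that check could look like in CI - ask_llm is a stand-in for whatever model API you use, and the file filters are illustrative:

    # Sketch: warn when a PR touches code but not the docs that may describe it.
    import subprocess
    import sys

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("wire up your model provider here")

    def git(*args: str) -> str:
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    changed = git("diff", "--name-only", "origin/main").splitlines()
    code_changed = any(f.endswith((".py", ".ts", ".go")) for f in changed)
    docs_changed = any(f.startswith("docs/") for f in changed)

    if code_changed and not docs_changed:
        verdict = ask_llm("Does this diff change behavior described in docs/? "
                          "Answer YES or NO, then list the affected pages.\n\n"
                          + git("diff", "origin/main"))
        if verdict.startswith("YES"):
            print(verdict)
            sys.exit(1)  # or just warn, if blocking feels too strict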

apitman

I've really been enjoying "hey Codex, I just implemented such-and-such code change, where are all the places in the docs that I need to update?"

I've been slow adopting things. I know the cool kids are having agents do the docs changes and the code changes in the first place.

freedomben

Indeed, I frequently will ask Claude code if the documentation needs to be updated, and it's pretty good at doing so

tidbeck

That is why I like the idea of having as much of the documentation as possible in code: tests that describe how things are supposed to work, infrastructure-as-code that describes the parts of the system, and so on. Then you are forced to keep them up to date.
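
For example (hypothetical code), a test can record a product decision in a way that can't silently go stale:

    # The test doubles as documentation of a business rule.
    def apply_discounts(price: float, discounts: list[float]) -> float:
        for d in discounts:
            price *= 1 - d
        return price

    def test_discounts_stack_multiplicatively():
        # Two 10% discounts mean 19% off, not 20% - the test records the decision.
        assert round(apply_discounts(100.0, [0.10, 0.10]), 2) == 81.0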

daxfohl

This is nothing new. v1 of something is always easy. Now in the coding assistant world, everything is v1. Let the tools go through a few deprecation cycles, weird things being added that don't map to the original intent, floods of overlapping tools and docs and everything else that someone has to reconcile, different LLM technologies that you need to support and adapt to, etc., and then see how things look.

I think all three things you mention come into play, but it's a bit early to judge whether the world has shifted or whether this is mainly a result of everything being fresh and new.

onel

I've built an agent that builds documentation for code bases and you are 100% right. Having big-picture documentation is important, but having bite-size explanations about why some components work a certain way is more important. Making sure the AI doesn't have to infer behavior from code is really powerful - even going as low-level as reference docs. Even though devs would prefer that a method be self-explanatory, it helps to also have a plain-English explanation of what's happening in a class or method.

quinnjh

onboarding devs won’t risk looking dumb to complain about the bad docs, the authors already have a mental model, and writing them out fully helped others at expense of job security.

When doling out bad docs to a stupid robot, you only have yourself to blame for the bad docs. So I think it's #2 + #3. The big change is replaceability going from bad to desirable (replace yourself with agents before you are replaced with a cheaper seat).

ryukoposting

> What was the barrier to creating these documents before?

In a proprietary system, there is pressure against creating quality technical documentation because it can be used to train your replacement. Writing docs solely for your own benefit, or your colleagues' benefit, is also dubious because you already know the things you wrote. Although returning to a thing you made months/years ago can be painful, it's not the day-to-day common case in enterprise software development.

AI assistants flip the incentives. Now, your docs are helping to steer your personal digital goblin in the right direction. The docs are tangibly augmenting your own abilities.

AtlasBarfed

Or tangibly enabling the company to eliminate you in favor of the AI assistant

Arguably the same structural disincentives are in place

crvdgc

> imagine a folder full of skills that covers tasks like the following:

> Where to get US census data from and how to understand its structure

Reminds me of the first time I used Wolfram Alpha and was blown away by its ability to use actual structured tools to solve the problem, compared to a normal search engine.

In fact, I tried again just now and am still amazed: https://www.wolframalpha.com/input?i=what%27s+the+total+popu...

I think my mental model for Skills would be Wolfram Alpha with custom extensions.

FireInsight

When clicking your link, for me it opened the following query on Wolfram Alpha: `what%27s the total population of the United States%3F`

Funnily enough, this was the result: `6.1% mod 3 °F (degrees Fahrenheit) (2015-2019 American Community Survey 5-year estimates)`

I wonder how that was calculated...

KeplerBoy

Wolfram Alpha never took input in such natural language. But something like population(USA) and many variations thereof work.

idk-92

tbh wolfram alpha was the craziest thing ever. haven't done much research on how this was implemented back in the day but to achieve what they did for such complex mathematical problems without AI was kind of nuts

pjmlp

It is basically another take on Lisp, and the development approach Lisp Machines had, repackaged in a more friendly syntax.

Lisp was the AI language until the first AI Winter took place, and also took Prolog alongside it.

Wolfram Alpha basically builds on them, to put in a very simplistic way.

krackers

It's one of the only M-expression versions of Lisp. All the weird stuff about Wolfram Language suddenly made sense when I saw it through that lens

globular-toast

Wolfram Alpha is AI. It's just not an LLM. AI has been a thing since the 60s. LLMs will also become "not AI" in a few years probably.

phs318u

Not sure why you’re getting downvoted. The marketing that LLM=AI seems to have been interpreted as “_only_ LLM=AI”

fragmede

I doubt that if the underlying parts changed, anyone outside the industry or enthusiasts would know what that is. How many people know what kind of engine is in their car? I stomp on the floor of my Corolla and away we go! Others might know that their Dodge Challenger has a Hemi. What even is that? Thankfully we have the Internet these days, and someone who's interested can just select the word and right click to Google for the Wikipedia article for it. AI is just such an entirely undefined term colloquially that any attempt to define it will be wrong.

magicalhippo

Would really like something self-hosted that does the basic Wolfram Alpha math things.

Doesn't need the craziest math capability but standard symbolic math stuff like expression reduction, differentiation and integration of common equations, plotting, unit wrangling.

All with an easy to use text interface that doesn't require learning.
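
SymPy covers the symbolic core, for what it's worth - a quick sketch (pip install sympy):

    import sympy as sp
    from sympy.physics.units import mile, kilometer, convert_to

    x = sp.symbols("x")
    expr = sp.sin(x) * sp.exp(x)

    print(sp.simplify(sp.sin(x)**2 + sp.cos(x)**2))  # 1
    print(sp.diff(expr, x))       # exp(x)*sin(x) + exp(x)*cos(x)
    print(sp.integrate(expr, x))  # exp(x)*sin(x)/2 - exp(x)*cos(x)/2
    print(convert_to(26.2 * mile, kilometer))        # unit wrangling

Though the "doesn't require learning" text interface is exactly the part none of these give you.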

jhallenworld

Try maxima, it's open source:

https://maxima.sourceforge.io/

I used it when it was called Macsyma running on TOPS-20 (and a PDP-10 / Decsystem-20).

Text interface will require a little learning, but not much.

krackers

That's Wolfram Mathematica.

harrall

Personal faves:

- Mathematica

- Maple

- MathStudio (mobile)

- Ti-89 calculator (high school favorite)

Others:

- SageMath

- GNU Octave

- SymPy

- Maxima

- Mathcad

fooker

> without AI

We only call it AI until we understand it.

Once we understand LLMs better and there's a promising new, poorly understood technology, we'll call our current AI something more computer-sciency.

simonw

My favorite definition of AI: "AI is whatever hasn't been done yet." - Larry Tesler, https://en.wikipedia.org/wiki/AI_effect

ge96

I used it a lot for calc, as it would show you how it got the answer if I remember right. Also liked how it understands symbols - obvious, but cool to paste an integral sign in there.

NuclearPM

Thank you for being honest.

tptacek

I do think the big story here is how hyperfocused and path-dependent people got on MCP, when the actually-interesting thing is simply "tool calls". Tool calls are incredibly interesting and useful. MCP is just one means to that end, and not one of the better ones.
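
Stripped of protocol, a tool call is just this (a no-framework sketch; names are illustrative):

    import json

    def get_weather(city: str) -> str:
        return f"Sunny in {city}"  # stand-in for a real API call

    TOOLS = {"get_weather": get_weather}

    # The model emits JSON naming a function; the harness runs it and
    # feeds the result back into the model's context.
    model_output = '{"tool": "get_weather", "arguments": {"city": "Chicago"}}'
    call = json.loads(model_output)
    result = TOOLS[call["tool"]](**call["arguments"])
    print(result)  # returned to the model as the tool result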

simonw

I think MCP's huge adoption was mainly due to its timing.

Tool calling was a thing before MCP, but the models weren't very good at it. MCP almost exactly coincided with the models getting good enough at tool calling for it to be interesting.

So yeah, I agree - most of the MCP excitement was people learning that LLMs can call tools to interact with other systems.

aabhay

One thing about MCP that some people forget is that the models are post-trained on MCP-based rollouts. I think people imagine that MCP was something people discovered about models, but it's deeper than that - models are explicitly trained to be able to interpret various and unseen kinds of MCP system prompts.

The exact same is true of these Claude Skills. Technically this is “just a system prompt and some tools”, but it’s actually about LLM labs intentionally encoding specific frameworks of action into the models.

simonw

A source I trust told me that Anthropic's models haven't yet been deliberately trained to know about skills.

kobalsky

MCP servers are basically tool call registries - how could they be worse than regular tool calls?

behnamoh

an MCP server can run code outside of the domain of tools that it supports; a tool call can't

TeMPOraL

Tools are literally function calls with extra steps. MCPs are interpreters of those function calls.

Same stuff, different name - the only thing that's changed is that Anthropic got people to agree on an RPC protocol.

It's not like it's a new idea, either. MCP isn't much different from SOAP or DCOM - but it works where the older approaches didn't, because LLMs are able to understand API definitions and natural-language documentation, and then map between those APIs (and user input) on the fly.

didibus

> Tool calls are incredibly interesting and useful. MCP is just one means to that end, and not one of the better ones.

It's nice to have an open standard though. In that sense it's pretty awesome.

But MCP isn't just tools, you can expose prompt templates and context resources as well.

All the skills that don't have an added dependency on a local script could just be an MCP resource.

tptacek

You don't need MCP for prompt templates and context resources. Those are both just forms of prompting.

didibus

I'm not sure what you mean? LLMs can only be interacted with using prompting. Even the tool call response from OpenAI is them just wrapping the prompt on their side of the API with another prompt.

So everything else is just adding behavior around it.

MCP is a way to add behavior around LLM prompting for user convenience.

kasey_junk

MCP seems valuable in that it teaches the LLM about OAuth, so you can do server-based tool calls.

Before that you had to install each CLI you wanted, and it would invariably be doing some auth thing under the covers.

Tool calling was certainly the big LLM advance, but "hey, tools should probably auth correctly" is pretty valuable.

3abiton

I would argue MCP is technically a "tool calling" approach, albeit more specific.

tptacek

It is, it's just a very specific approach, and it's simultaneously a bit underspecified and a bit too prescriptive.

manojlds

To clarify, MCP was also an Anthropic innovation.

behnamoh

it's not much of an innovation though. it's just a fancy (and unsafe) way of keeping a registry of tools.

speedgoose

Why unsafe?

jgalt212

Other than, presumably, Skills, what other techniques are better than MCP?

smrtinsert

I'm already doing "skills" via one MCP tool calling a DB and it works fine.

Not sure what Skills add here other than more meat for influencers to 10x their 10xed agent workflows. 100x productivity, what a time to be alive.

jngiam1

MCPs have a larger impact beyond the terminal - you can use it with ChatGPT, Claude Web, n8n, LibreChat, and it comes with considerations for auth, resources, and now even UI (e.g., apps-sdk from OpenAI is on MCP).

If we're considering primarily coding workflows and CLI-based agents like Claude Code, I think it's true that CLI tools can provide a ton of value. But once we go beyond that to other roles - e.g., CRM work, sales, support, operations, finance - MCP-based tools are going to have a better form factor.

I think Skills go hand-in-hand with MCPs, it's not a competition between the two and they have different purposes.

I am interested, though, in when the Python code in Skills can call MCPs directly via the interpreter... that is the big unlock (something we have tried and found to work really well).

simonw

Yeah, the biggest advantage MCP has over terminal tooling is that MCP works without needing a full-blown sandboxed Linux-style environment - and MCP can also work with much less capable models.

You can drive one or two MCPs off a model that happily runs on a laptop (or even a phone). I wouldn't trust those models to go read a file and then successfully make a bunch of curl requests!

andoando

Being able to integrate LLMs with the rest of the software/physical world is pretty cool, and it's all powered through natural language.

We're also at the point where LLMs can generate MCP servers, so you can pretty much generate completely new functionalities with ease.

consumer451

> Over time the limitations of MCP have started to emerge. The most significant is in terms of token usage: GitHub’s official MCP on its own famously consumes tens of thousands of tokens of context, and once you’ve added a few more to that there’s precious little space left for the LLM to actually do useful work.

Supabase MCP really devours your context window. IIRC, it uses 8k for its search_docs tool alone, just on load. If you actually use search_docs, it can return >30k tokens in a single reply.

Workaround: I just noticed yesterday that Supabase MCP now allows you to choose which tools are available. You can turn off the docs, and other tools. [0]

If you are wondering why you should care, all models get dumber as the context length increases. This happens much faster than I had expected. [1]

[0] https://supabase.com/docs/guides/getting-started/mcp

[1] https://github.com/adobe-research/NoLiMa

causal

It's also not clear to me why using "skills" would consume less context once invoked.

It's just instructions with RAG. The more I read about this the more convinced I am that this is just marketing.

habitue

Skills won't use less context once invoked; the point is that MCP in particular frontloads a bunch of stuff about the entire API surface area into your context. So even if the model never invokes the MCP, it's costing you.

That's why it's common advice to turn off MCPs for tools you don't think are relevant to the task at hand.

The idea behind skills is that they're progressively unlocked: they only take up a short description in the context, relying on the agent to expand things if it feels a skill is relevant.
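
Mechanically, the startup scan amounts to something like this - a sketch of the assumed behavior, not Anthropic's actual code (needs PyYAML):

    from pathlib import Path
    import yaml

    def skill_summaries(root: str = "skills") -> list[dict]:
        # Only each skill's YAML name/description enters the context at startup.
        summaries = []
        for path in Path(root).glob("*/SKILL.md"):
            frontmatter = path.read_text().split("---")[1]
            meta = yaml.safe_load(frontmatter)
            summaries.append({"name": meta["name"],
                              "description": meta["description"],
                              "path": str(path)})
        return summaries

    # The few-line summaries go into the system prompt; the agent only
    # reads the full SKILL.md when it decides the skill is relevant.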

consumer451

Your reply unlocked some serious, yet simple understanding for me. Thank you.

cheema33

MCPs are overhyped and have limited value in my opinion. About 95% of the MCP servers out there are useless and can be replaced with a simple tool call.

dinkleberg

This is a very obvious statement, but good MCP servers can be really good, and bad MCP servers can actively make things significantly worse. The problem is that most MCP servers are in the latter category.

As is often the case, every product team is told that MCP is the hot new thing and that they have to create an MCP server for their customers. And I've seen that customers do indeed ask for these things, because they all have initiatives to utilize more AI. The customers don't know what they want, just that it should be AI. The product teams know they need AI, but don't see any meaningful ways to bring it into the product. But then MCP falls in their laps as a quick way to say "we're an AI product" without actually having to become an AI product.

TeMPOraL

There's some extra irony here: many of those product teams don't realize that AI is not something they can have within their product. If something like MCP is a good fit for them, even a little, then their product is actually a feature of the AI.

Agentic LLMs are, in a way, an attempt to commoditize entire service classes, across the board, all at once.

Personally, I welcome it. I keep saying that a lot of successful SaaS products would be much more useful and ergonomic for end users if, instead of webshit SPA, they were distributed as Excel sheets. To that I will now add: there's a lot more web services that I'd prefer be tool calls for LLMs.

Search engines have already been turned into features (why ask Google when o3 can ask it for me), but that's just an obvious case. E-mails, e-commerce, shopping, coding, creating digital art, planning, managing projects and organizations, analyzing data and trends - all those are in-scope too; everything I can imagine asking someone else to do for me is meant to eventually become a set of tool calls.

Or in short: I don't want AI in your product - I want AI of my choice to use your product for me, so I don't have to deal with your bullshit.

evanmoran

Thank you, this is beautifully said. I will also add that I don't think chatbots are the final product, which leaves the open question of which product is the last one not to be commoditized.

brookst

Yes, and MCPs also only work as long as you trust the provider. MCP relies on honesty from the server. In reality, we know Uber and folks will prompt-engineer like hell to try to convince any LLM that it is the best option for any kind of service.

There’s a fundamental misalignment of incentives between publishers and consumers of MCP.

BoorishBears

When ChatGPT plugins came out, I wrote a plugin that would turn all other plugins into an ad for a given movie or character.

Asking for snacks would activate Klarna for "mario themed snacks", and even the most benign request would become a plug for the Mario movie

https://chatgpt.com/s/t_68f2a21df1888191ab3ddb691ec93d3a

Found my favorite for John Wick, question was "What is 1+1": https://chatgpt.com/s/t_68f2bc7f04988191b05806f3711ea517

ntcho

This is hilarious, thanks for sharing. Kinda crazy how well it works and already better than some ads

cdavid

I agree the big deal is tool calling.

But MCP has at least two advantages over CLI tools:

- A tool-calling LLM combined with structured output is easier to implement as an MCP than as a CLI for complex interactions, IMO.

- It is more natural to hold state between tool calls in an MCP server than with a CLI.

When I read the OP, I initially wondered if I had indeed bought into the hype. But then I realized that the small demo I built recently to learn about MCP (https://github.com/cournape/text2synth) would have been more difficult to build as a CLI. And I think the demo is representative of neat usages of MCP.
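
For the state point, a minimal sketch using the official Python SDK's FastMCP (tool names invented):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("synth-demo")
    session: dict[str, float] = {}  # lives in the server process between calls

    @mcp.tool()
    def set_param(name: str, value: float) -> str:
        """Set a synth parameter, remembered for later tool calls."""
        session[name] = value
        return f"{name} = {value}"

    @mcp.tool()
    def get_params() -> dict[str, float]:
        """Return every parameter set so far in this session."""
        return dict(session)

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default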

gabrielpoca118

My team doing front-end dev extracted a lot of value from the Figma MCP. Things that would have taken 3 weeks were done in one afternoon.

mhitza

Please share an example of what would have taken you 3 weeks and with Figma's MCP in an afternoon.

lossolo

Do you mean three weeks of manual work (no LLM) vs MCP? Or MCP vs LLM tool use? Because that's a huge difference.

echelon

I'd hazard a guess the former.

The former is a step function change. The latter is just a small improvement.

ojosilva

I think MCP servers are valuable in several ways:

- bundled instructions, covering complex interactions ("use the id from the search here to retrieve a record") for non-standard tools

- custom MCPs, the ones that are firewalled from the internet, for your business APIs that no model knows about

- centralized MCP services, http/sse transport. Give the entire team one endpoint (e.g. web search), control the team's official AI tooling, no API-key proliferation

Now, these trivial `npx ls-mcp` stdio ones, "ls files in any folder" MCPs all over the web are complete context-stuffing bullshit.

goalieca

MCP servers seem to be a hacker's delight. So many poorly configured and hastily deployed instances. Businesses have removed all the normal deployment guardrails!

lmeyerov

I'm a bit unclear what's different here from how vibe coders already work?

Pretty early on folks recognized that most MCPs can just be CLI commands, and a markdown file is fine for describing them. So Claude Code users have markdown files of CLI calls and mini tutorials on how to do things. The 'how to do things' part seems to be what we're now calling skills... Which we're still writing in markdown and using from Claude.

Is the new thing that Claude will match & add them to your context automatically vs you call them manually? And that's a breakthrough because there's some emergent behavior?

simonw

I think skills are mainly just a formalization of the markdown plus CLI patterns that people have been using already.

The only material difference with skills is that Claude knows to scan them for YAML descriptions on startup, which means it can trigger them by itself more easily.

lmeyerov

Right, the 'knowing' is where I think the interesting thing is today for their evolution

More mature claude.md files already typically index into other files, including guidance on which to preload vs. lazy-load. However, in practice Claude forgets quite easily, so that pattern is janky. A structured mechanism helps guarantee less forgetting.

Forward looking, from an automation perspective of autonomous learning, this also makes it more accessible to talk about GEPA-for-everyone to maintain & generate these. We've been playing with similar flows in louie.ai, and came to a similar "just make it folders full of markdown with some learning automation options."

I was guessing that was what was going on here, but the writeup felt like maybe more was being said :) (And thank you for continuing to write!)

primer42

I've been able to build the equivalent of skills with a few markdown files. I need to remind my agent every so often to use a skill but usually once per session at most.

I don't get what's so special about Claude doing this?

simonw

Part of it is that they gave a name to a useful pattern that people had already been discovering independently. Names are important, because they mean we can start having higher quality conversations about the pattern.

Anthropic also realized that this pattern solves one of the persistent problems with coding agents: context pollution. You need to stuff as little material as possible into the context to enable the tool to get things done. AGENTS.md and MCP both put too much stuff in there - the skills pattern is a much better fit.

behnamoh

I think you're overly enthusiastic about what's going on here (which is surprising, because you've seen that the trend in AI seems to be re-inventing the wheel every other year...)

simonw

I'm more excited about this than I was about MCP.

MCP was conceptually quite complicated, and a pretty big lift in terms of implementation for both servers and clients.

Skills are conceptually trivial, and implementing them is easy... provided you have a full Linux-style sandbox environment up and running already. That's a big dependency, but it's also an astonishingly powerful way to use LLMs, based on my past 6 months of exploration.

causal

Strong disagreement on the helpfulness of the name - if anything, calling a context file a skill is really misleading. It evokes something like a LoRA or a pluggable modality. Skill is the wrong name, IMO.

simonw

I think skill is the perfect name for this. You provide the LLM with a new skill by telling it how to do a thing and providing supporting scripts to help it do that thing.

ajtejankar

IMO LoRAs are no different from context tokens. In fact, before LoRAs tuned prompt vectors were a popular adapter architecture. Conceptually, the only difference is that prompt adapters only interact with other tokens through the attention mechanism while LoRAs allow you to directly modify any linear layer in the model. Essentially, you can think of your KV cache as dynamically generated model weights. Moreover, I can't find the paper, but there is some evidence that in-context learning is powered by some version of gradient descent inside the model.

greymalik

How is it different from subagents?

simonw

They complement each other.

Subagents are mainly a token context optimization hack. They're a way for Claude Code to run a bunch of extra tool calls (e.g. to investigate the source of a bug) without consuming many tokens in the parent agent loop - the subagent gets its own loop, can use up to ~240,000 tokens exploring a problem, and can then reply back up to the parent agent with a short description of what it did or what it figured out.

A subagent might use one or more skills as part of running.

A skill might advise Claude Code on how best to use subagents to solve a problem.

SafeDusk

I like to think of subagents as "OS threads" with their own context, designed to hand tasks off to.

A good use case is Cognition/Windsurf swe-grep which has its own model to grep code fast.

I was inspired by it but too bad it’s closed for now, so I’m taking a stab with an open version https://github.com/aperoc/op-grep.

f38zf5vdt

It's baffling to me. I was already making API calls and embedding context and various instructions precisely using backticks with "md". Is this really all this is? What am I missing? I don't even understand how this "feature" merits a press release from Anthropic, let alone a blog post extolling it.

simonw

A few things:

1. By giving this pattern a name, people can have higher-level conversations about it.

2. There is a small amount of new software here. Claude Code and https://claude.ai/ both now scan their skills/ folders on startup and extract a short piece of metadata about each skill from the YAML at the top of those markdown files. They then know that if the user e.g. says they want to create a PDF they should "cat skills/pdf/skill.md" first before proceeding with the task.

3. This is a new standard for distributing skills, which are sometimes just a markdown file but can also be a folder with a markdown file and one or more additional scripts or reference documents. The example skills here should help illustrate that: https://github.com/anthropics/skills/tree/main/document-skil... and https://github.com/anthropics/skills/tree/main/artifacts-bui...

I think the pattern itself is really neat, because it's an acknowledgement that a great way to give an LLM system additional "skills" is to describe them in a markdown file packaged alongside some relevant scripts.

It's also pleasantly vendor-neutral: other tools like Codex CLI can use these skills already (just tell them to go read skills/pdfs/skill.md and follow those instructions) and I expect they may well add formal support in the future, if this takes off as I expect it will.
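
To make the shape concrete, here's a hypothetical skills/census/SKILL.md - the script and steps are invented, but the YAML-frontmatter-plus-markdown layout matches the real examples above:

    ---
    name: census-data
    description: Fetch US census data and understand its table structure. Use when a task involves US population or demographic figures.
    ---

    # Working with US census data

    1. Query https://api.census.gov/data (a free API key is required).
    2. Run `python scripts/fetch_acs.py --year 2023 --table B01003` to pull
       total-population estimates (the script lives alongside this file).
    3. Column names are cryptic; see reference.md in this folder for a mapping.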

causal

I was puzzled by the announcement and remain puzzled after this blog post. I thought everyone knew you could keep use case specific context files handy.

cma

It also seems to be the same thing as subagents, but without clearing context, right?

matula

It feels like it's taking a solved problem and formalizing it, with a bit of automation. I've used MCPs that were just fancy document search, and this should replace those.

bird0861

I'm wondering the same, I've been doing this with Aider and CC for over a year.

rane

So far not impressed with CC's ability to invoke skills automatically.

I made a skill with the unambiguous description: "Use when creating or editing bash scripts"

Yet, Claude does not invoke the skill when asked to write a bash script.

https://gist.github.com/raine/528f97375e125cf97a8f8b415bfd80...

simonw

Hah, yeah that's a total miss there.

Maybe it messed that up because writing bash scripts is so core to how Claude Code works? Much of the existing system prompt (and I bet a lot of the fine-tuning data) is about how to use the Bash tool.

rane

For good measure, I tried:

    description: CRITICAL: Use when writing bash scripts
Surprisingly no effect either. I would've thought adding "CRITICAL" would somehow promote that instruction in the sea of context.

AmbroseBierce

Try writing it like this: "Use it when creating a bash script or editing a bash script"

AJRF

This is a fairly negative comment, but putting it out there to see if other people are feeling the same thing

If you told the median user of these services to set one of these up I think they would (correctly) look at you like you had two heads.

People want to log in to an account, tell the thing to do something, and the system figures out the rest.

MCP, Apps, Skills, Gems - all this stuff seems to be tackling the wrong problem. It reminds me of those youtube channels that every 6 months say "This new programming language, framework, database, etc is the killer one", they make some todo app, then they post the same video with a new language completely forgetting they've done this already 6 times.

There is a lot of surface-level iteration, but deep problems aren't being solved. Something in tech went very wrong at some point, and as soon as money men flood the field we get announcements like this: push out the next release, get my promo, jump to the next shiny tech company, leaving nothing in their wake.

zkmon

>> but deep problems aren't being solved

There is no problem to solve. These days, solutions come in a package which includes the problems they intend to solve. You open the package. Now you have a problem that jumped out of the package and starts staring at you. The solution comes out of the package and chases the problem around the room.

You are now technologically a more progressed human.

AJRF

This made me laugh a lot at the mental image. This was my experience with Xcode for sure.

TeMPOraL

This is where GP is wrong, I think. The problems are being solved, for now, because the businesses are still too excited about the whole AI thing to notice it's not in their interest and to properly consolidate against it.

And the problem being solved is, LLMs are universal interfaces. They can understand[0] what I mean, and they understand what those various "solutions" are, and they can map between them and myself on the fly. They abstract services away.

The businesses will eventually remember that the whole point of marketing is to prevent exactly that from happening.

--

[0] - To a degree, and conditioned on what one considers "understanding", but still - it's the first kind of computer systems that can do this, becoming a viable alternative to asking a human.

3abiton

I wish this was wrong, but it really isn't. To contrast, though, I would argue that is part of evolution? We just want to do things faster or better. Smartphones solved no problems, but they ushered in the digital millennium.

zkmon

I think most new technologies helped to increase expectations about what you can do, but overall work did not get reduced. It didn't give me more free time to go fishing or bird-watching. On the other hand, I acquired an irreversible dependency on these things; otherwise I'm no longer compatible with World 2.0.

kvirani

Wow. I hadn't thought of it like that but it resonates

notepad0x90

If you like creating solutions, why wait for a problem to show up? lol

nwhnwh

LOL, this is so true.

darth_avocado

> MCP, Apps, Skills, Gems - all this stuff seems to be tackling the wrong problem

My fairly negative take on all of this has been that we're writing more docs, creating more APIs, and generally doing a lot of work to make the AI work - work that would've yielded the same results if we'd done it for people in the first place. Half my life has been spent trying to debug issues in complex systems that do not have those available.

XenophileJKO

This is true, but the reason the economics have inverted is that we can pay these new "people" <$20 for the human equivalent of ~300 hours worth of non-stop typing.

throwaway127482

Correct. And we know the AI will read the docs whereas people usually ignore 99% of docs so it just feels like a bad use of time sometimes, unfortunately.

darth_avocado

That is not true at all. The economics you're seeing right now are akin to Uber handing out $5 airport pickups to kill the taxi industry. And even then, the models are nowhere near as cheap as <$20 for ~300 hours of human work.

ip26

If writing a good document and a strong API had to happen anyway, and now you can write just that and the rest will take care of itself, we may actually have progressed. Plus the documents would then have to be there, instead of skipped like today.

The counter-argument is that code is the only way to concisely and unambiguously express how everything should work.

joquarky

Honestly, we needed something to cap extreme programming and swing the pendulum back to a balance between XP and waterfall again.

michael1999

I am also struck by how much these kinds of context documents resemble normal developer documentation, but actually good. What was the barrier to creating these documents before?

TeMPOraL

They're much more useful when an LLM stands between them and users - because LLMs can (re)process much more of them, and much faster, than any human could ever hope to.

One way (and one use case) of looking at it is, LLM agents with access ("tools") to semantic search[0] are basically a search engine that understands the text it's searching through... and then can do a hundred different things with it. I found myself writing better notes at work for this very reason - because I know the LLM can see them, and can do anything from surfacing obscure insights from the past, to writing code to solve an issue I documented earlier.

It makes notes no longer write-only.

--

[0] - Which, incidentally, is itself enabled by LLM embeddings.

phlakaton

What if the great boon of AI is to get us to do all the thinking and writing we should have been doing all along? What if the next group of technologists to end up on top are... the technical writers?

Haha, just kidding you tech bros, AI's still for you, and this time you'll get to shove the nerds into a locker for sure. ;-)

quentindanjou

It might not be that wrong. After all, programming languages are a way to communicate with the machine. In the same way we no longer write binary by hand, we might simply not have to do programming either. I think software architecture is likely to become what it should be: the most important part of every piece of software.

CPLX

This is actually my theory of the future. Basically, the ability to multiply your own effectiveness is now directly dependent on your ability to express ideas in simple plain English very quickly and precisely.

I’m attracted to this theory in part because it applies to me. I’m a below average coder (mostly due to inability to focus on it full time) and I’m exceptionally good at clear technical writing, having made a living off it much of my life.

The present moment has been utterly life changing.

tptacek

What is a "deep problem" and what was the cadence with which we addressed these kinds of "deep problems" prior to 2023, when ChatGPT first went mainstream?

skydhash

For a very tiny slice of these deep problems and how they were addressed, you can review the USENIX conferences and the published papers there.

https://www.usenix.org/publications/proceedings

tptacek

I've been a Usenix reviewer twice, once as a program chair (I think that's what they call the co-leaders of a PC?). So this doesn't clarify anything for me.

Fernicia

>they make some todo app, then they post the same video with a new language completely forgetting they've done this already 6 times

I don't see how this is bad. Technology makes iterative, marginal improvements over time. Someone may make a video tomorrow claiming a great new frontend framework, even though they made that exact video about Nextjs, or React before that, or Angular, or JQuery, or PHP, or HTML.

>Something in tech went very wrong at some point, and as soon as money men flood the field we get announcments like this

If it weren't for the massive money being poured into AI, we'd be stuck with GPT-3 and Claude 2. Sure, they release some duds in the tooling department (although I think Skills are good, actually) but it's hardly worthy of this systemic rot diagnosis you've given.

solsane

I do not feel the same way. This looks easy to use and useful. I don't think every problem needs to be a 'deep problem'. There are so many practical steps to get to:

> People want to log in to an account, tell the thing to do something, and the system figures out the rest.

At a glance, this seems to be a practical approach to building up a personalized prompting stack based on the things I commonly do.

I’m excited about it.

underdeserver

Well, we're still early days and we don't know what works.

It might be superficial but it's still state of the art.

ip26

Hypothetically, AI coding could completely absorb all that surface level iteration & posturing.

If agentic coding of good quality becomes too cheap to meter, all that is left are the deep problems.

deadeye

I'm not sure what you mean.

What is the "real problem"?

In the pursuit of making application development more productive, they ARE solving real problems with MCP servers, skills, custom prompts, etc...

The problems are context dilution, tool usage, and awareness outside of the model.

skydhash

> The problems are context dilution, tool usage, and awareness outside of the model.

This is accidental complexity. You've already decided on a method, and instead of solving the main problem you are solving the problems associated with the method. Like deciding to go to space in a car and trying to strap a rocket onto it.

_pdp_

These are completely different things. MCP is also about consuming external services, handling OAuth, and all of that. Skills are effectively CLI tools + prompts. Completely different applications, so they cannot be compared easily like that.

BTW, before MCP was even a thing, we invented our own system called Skillset. It turns out to be sort of the best parts of both MCPs and Skills.

rco8786

So these skills are effectively JIT context injections. Is that about right?

josefrichter

From the docs:

"Skills work through progressive disclosure—Claude determines which Skills are relevant and loads the information it needs to complete that task, helping to prevent context window overload."

So yeah, I guess you're right. Instead of one humongous AGENTS.md, just packaging small relevant pieces together with simple tools.

simonw

Yes, that's a good way of describing them.

danjc

It's all JIT context.