
Claude Integrations

111 comments · May 1, 2025

todsacerdoti

Check out 2500+ MCP servers at https://mcp.pipedream.com

n_ary

Is this the beginning of the apps-for-everything era, and the moment SaaS for your LLM finally begins? Initially we had the internet, but the value came when web apps replaced installed apps and became SaaS. Now, if LLMs can use specific remote MCPs, which are effectively SaaS for your LLM, the remote-MCP-powered service can charge a subscription to do wonderful things and voila! Let the new golden age of SaaS for LLMs begin, and let the old fad (replace job XYZ with AI) die already.

OtherShrezzing

I'd love a _tip jar_ MCP, where the LLM vendor can automatically tip my website for using its content/feature/service in a query's response. Even if the amount is absolutely minuscule, in aggregate, this might make up for ad revenue losses.

clvx

I'm more excited that I can now run a custom site, hook an MCP up to it, and get all the cool intelligence I used to pay a SaaS for, without having to integrate with them, while also governing my own data. It's a massive win. I can see AI-assisted coding replicating current SaaS services that I can run internally. If my shop ran a specific stack, I could aim to have all my supporting apps in that stack via AI-assisted coding, simplifying operations, and hook up MCPs to get intelligence from all of them.

Truly, OSS should be more interesting in the next decade for this alone.

throwaway7783

MCP is yet another interface for an existing SaaS (like UI and APIs), but now magically "agent enabled". And $$$ of course

naravara

On one hand, yes this is very cool for a whole host of personal uses. On the other hand giving any company this level of access to as many different personal data sources as are out there scares the shit out of me.

I’d feel a lot better if we had something resembling a comprehensive data privacy law in the United States because I don’t want it to basically be the Wild West for anyone handling whatever personal info doesn’t get covered under HIPAA.

falcor84

Absolutely agreed, but just wanted to mention that it's essentially the same level of access you would give to Zapier, which is one of their top examples of MCP integrations.

throwup238

The leapfrogging at this point is getting insane (in a good way, I guess?). The amount of time each state-of-the-art feature gets before it's supplanted is down to a few weeks at this point.

LLMs were always a fun novelty for me until OpenAI Deep Research, which started to actually produce useful results on more complex programming questions (where I still needed to write all the code by hand but had to pull together lots of different libraries and APIs), but it was limited to 10/month on the cheaper plan. Then Google Deep Research upgraded to 2.5 Pro with a paid usage limit of 20/day, which allowed me to just throw everything at it, to the point where I'm still working through reports that are a week or more old. It also searched up to 400 sources at a time, significantly more than OpenAI, which made it quite useful in historical research like identifying first-edition copies of books.

Now Claude is releasing the same research feature with integrations (excited to check out the Cloudflare MCP auth solution and hoping Val.town gets something similar), and a run time of up to 45 minutes. The pace of change was overwhelming half a year ago, now it's just getting ridiculous.

user_7832

I agree with your overall message - rapid growth appears to encourage competition and forces companies to put their best foot forward.

However, unfortunately, I cannot shower much praise on Claude 3.7. If you (or anyone) asks why (3.7 seems much better than 3.5, surely?), then I'm moderately sure that you use Claude much more for coding than for any kind of conversation. In my opinion, even 3.5 Haiku (which is available for free during high loads) is better than 3.7 Sonnet.

Here’s a simple test. Try asking 3.7 to intuitively explain anything technical - say, mass dominated vs spring dominated oscillations. I’m a mechanical engineer who studied this stuff and I could not understand 3.7’s analogies.

I understand that coders are the largest single group of Claude's users, but Claude went from being my most used app to being used only after both ChatGPT and Gemini, something I genuinely regret.

airstrike

I too like 3.5 better than 3.7, and I use it pretty often. It's like 3.7 is better on 2 metrics but worse on 10 different ones.

joshstrange

I use Claude mostly for coding/technical things and something about 3.7 does not feel like an upgrade. I haven't gone back to 3.5 (mostly started using Gemini Pro 2.5 instead).

I haven't been able to use Claude research yet (it's not rolled out to the Pro tier), but o1 -> o3 deep research was a massive jump IMHO. It still isn't perfect, but o1 would often give me trash results, whereas o3 deep research actually starts to be useful.

3.5->3.7 (even with extended thinking) felt like a nothingburger.

ilrwbwrkhv

None of those reports are any good though. Maybe for shallow research, but I haven't found them deep. Can you share what kind of research you've been trying where it has done a great job of actual deep research?

conroy

Remote MCP servers are still in a strange space. Anthropic updated the MCP spec about a month ago with a new Streamable HTTP transport, but it doesn't appear that Claude supports that transport yet.

When I hooked up our remote MCP server, Claude sent a GET request to the endpoint. According to the spec, clients that want to support both transports should first attempt to POST an InitializeRequest to the server URL; if that returns a 4xx, they should fall back to the older SSE transport.
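The negotiation the spec describes can be sketched as a small decision function. This is illustrative only, not from any real MCP SDK: the names (`probe_transport`, `post_initialize`) are hypothetical, and a real client would perform an actual HTTP POST rather than take an injected callable.

```python
# Sketch of the transport-negotiation logic described above: try the new
# Streamable HTTP transport first by POSTing an InitializeRequest, and
# fall back to SSE on a 4xx. post_initialize is a hypothetical callable
# that performs the POST and returns the HTTP status code.

def probe_transport(post_initialize):
    status = post_initialize()
    if 200 <= status < 300:
        return "streamable-http"   # server accepted the InitializeRequest
    if 400 <= status < 500:
        return "sse"               # server predates the new transport
    raise RuntimeError(f"unexpected status {status}")

# A server that rejects the POST is treated as SSE-only:
print(probe_transport(lambda: 405))  # -> sse
print(probe_transport(lambda: 200))  # -> streamable-http
```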

VSerge

Ongoing demo of integrations with Claude by a bunch of A-list companies: Linear, Stripe, PayPal, Intercom, etc. It's live now at: https://www.youtube.com/watch?v=njBGqr-BU54

In case the above link doesn't work later on, the page for this demo day is here: https://demo-day.mcp.cloudflare.com/

zoogeny

I'm curious what kind of research people are doing that takes 45 minutes of LLM time. Is this a poke at the McKinsey consultant domain?

Perhaps I am just frivolous with my own time, but I tend to use LLMs in a more iterative way for research. I get partial answers, probe for more information, and direct the attention of the LLM away from areas I am familiar with and towards areas I am less familiar with. I feel that if I just let it loose for 45 minutes, it would spend too much time on areas I do not find valuable.

This seems more like a play for "replacement" rather than "augmentation". Although, I suppose if I had infinite wealth, I could kick off 10+ research agents each taking 45 minutes and then review their output as it became available, then kick off round 2, etc. That is, I could do my process asynchronously instead of interactively.

throwup238

That iterative research process is exactly how I use Google Deep Research, since it has a 20/day rate limit. Research a problem, notice some offhand assumption or remark the report made, and fire off another research run asking about it. It depends on what you work on; in my case I often have to do hours of research for 30 minutes of work, like when integrating a bunch of different vendors' APIs or poring over datasheets for EE, so it's worth firing off a research run and then working on something else for 10-20 minutes (it helps that the Gemini app fires off a push notification when the report is done. Anthropic, please do this, even for requests made from the web app!).

As for long research times, one thing I’ve been using it for is historical research on old books. Gemini DeepResearch was the first one able to properly explain the nuances of identifying a chimeral first edition Origin of Species after taking half an hour and reading 400 sources. It went into all the important details like spelling errors and the properties of chimeral FY2** copies found in various libraries around the world.

pton_xd

"To start, you can choose from Integrations for 10 popular services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid. ... Each integration drastically expands what Claude can do."

Give us an LLM with better reasoning capabilities, please! All this other stuff just feels like a distraction.

atonse

I disagree. They can walk and chew gum at the same time, and this practical stuff is very important.

I've been using the Atlassian MCP for nearly a month now, and it's completely changed (and eliminated) the feeling of having an overwhelming backlog.

I can have it do things like "find all the tickets related to profile editing and combine them into one epic", where it works perfectly. Or "help me prioritize the 15 tickets assigned to me this sprint", and it'll actually go through and suggest "maybe you can do these two tickets first since they seem smaller, then do this big one". I haven't hooked it up to my calendar yet.

But I'd love for it to suggest things like "do this one ticket that requires a lot of heads-down time on Wednesday since you don't have any meetings. I can create a block on your calendar so that nobody will schedule a meeting then."

Those are all superhuman things that can be done with MCP and a smart model.

I've defined rules in cursor that say "when I ask you to mark something ready for test, change the status and assign it to <x person>, and leave a comment summarizing the changes"

If you look at my JIRA comments now, you'd wonder how I had so much time to write such thorough comments. I don't; Cursor and whatever model it's running are doing it for me.

It's been an absolute game changer. MCP is going to be what the App Store was to mobile. Yes, you can get by without it, but actually hooking into all your daily tools is when this stuff gets insanely valuable in a practical sense.

OJFord

> If you look at my JIRA comments now, you'd wonder how I had so much time to write such thorough comments. I don't, Cursor and whatever model is doing it for me.

How do your colleagues feel about it?

atonse

Well, I had half a mind to not tell them, to see what they'd say, but I was also excited to show everyone so they could be empowered by it too.

One of them said “yeah I was wondering cuz you never write that much” - as a leader, I actually don’t set a good example of how to leave quality JIRA comments. And my view with all these things is that I have to lead by example, not by orders.

With the help of these kinds of tools, we can improve the quality of these comments. And I wouldn't expect others to write them manually; rather, I wanted to show that everyone's use of JIRA on the team can improve.

warkdarrior

My colleagues' LLM assistants think that my LLM assistant leaves great JIRA comments.

organsnyder

I have Claude hooked up to our project management system, GitHub, and my calendar (among other things). It's already proving extremely useful for various project management tasks.

zoogeny

Honestly, that backlog management idea is probably the first time an MCP actually sounded appealing to me.

I'm not in that world at the moment, but I've been the lead on several projects where the backlog became a dumping ground after years of neglect. You end up with this tiered backlog thing where one level of backlog gets too big, so you create a second tier for the stuff you are actually going to work on. Pretty soon you end up with duplicates in the second-tier backlog for items already in the base-level backlog, since no one even looks at that old backlog anymore.

I've done a lot of tidy up myself when I inherit this kind of mess, just closing tickets we definitely will never get to, de-duping, adding context when available, grouping into epics, tagging with relevant "tech-debt", "security", "bug", "automation", etc. But when there are 100s of tickets it is a slog. Having an LLM do this makes so much sense.

Centigonal

Building integrations is a more predictable way of developing a competitive advantage than research, even if a smaller one. I think most of the leading AI companies are adopting a multi-arm strategy of research plus product/ecosystem development to balance their risks.

WhitneyLand

The integrations feel so RAG-ish. It talks, tells you it's going to use a tool, searches, talks about what it found…

Hope one day it will be practical to do nightly finetunes of a model per company with all core corporate data stores.

This could create a seamless native model experience that knows about (almost) everything you’re doing.

notgiorgi

why is finetuning talked about so much less than RAG? is it not viable at all?

omneity

RAG is infinitely more accessible and cheaper than finetuning. But it is true that finetuning is getting severely overlooked in situations where it would outperform alternatives like RAG.

computerex

It's significantly harder to get right; it's a very big stepwise increase in technical complexity over in-context learning/RAG.

There are now lighter versions of finetuning that don't update all the model weights but instead train small adapter layers (LoRA), which is far more viable commercially at the moment, in my opinion.
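The reason a LoRA adapter is so much cheaper to train can be shown with back-of-envelope arithmetic: a full update to a d x d weight matrix is replaced by two low-rank factors of shapes d x r and r x d. The dimensions below are illustrative, not taken from any specific model.

```python
# Parameter arithmetic behind LoRA: instead of updating a full d x d
# weight matrix, train two small factors A (r x d) and B (d x r) and
# add (alpha/r) * B @ A to the frozen weights at inference time.

def full_update_params(d):
    return d * d            # trainable params for a full-weight update

def lora_params(d, r):
    return 2 * d * r        # trainable params for the two LoRA factors

d, r = 4096, 8              # a typical hidden size and a small LoRA rank
print(full_update_params(d))                        # -> 16777216
print(lora_params(d, r))                            # -> 65536
print(full_update_params(d) // lora_params(d, r))   # -> 256 (256x fewer)
```

At rank 8 on a 4096-wide layer, the adapter trains roughly 0.4% of the parameters a full update would, which is why it fits on commodity GPUs.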

mring33621

I'm not an expert in either, but RAG is like dropping some 'useful' info into the prompt context, while finetuning is more like performing a mix of retraining, appending re-interpretive model layers, and/or brain surgery.

I'll leave it to you to guess which one is harder to do.
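The "dropping useful info into the prompt context" half of that analogy can be sketched in a few lines. This is a toy: real RAG systems use embeddings and a vector store, while the retriever here just scores documents by naive word overlap to show the shape of the idea. All names are illustrative.

```python
# Minimal RAG sketch: retrieve the most relevant snippet by word
# overlap, then inject it into the prompt ahead of the question.

def retrieve(query, documents, k=1):
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "MCP servers expose tools over a standard protocol.",
    "LoRA trains a small adapter instead of all model weights.",
]
print(build_prompt("how does LoRA adapter training work?", docs))
```

The model never changes; only the prompt does, which is why RAG is cheap to run and works even against closed models behind an API.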

disgruntledphd2

RAG is much cheaper to run.

ijk

There were initial difficulties in finetuning that made it less appealing early on, and that's snowballed a bit into having more of a focus on RAG.

Some of the issues still exist, of course:

* Finetuning takes time and compute; for one-off queries using in-context learning is vastly more efficient (i.e., look it up with RAG).

* Early results with finetuning had trouble reliably memorizing information. We've got a much better idea of how to add information to a model now, though it takes more training data.

* Full finetuning is very VRAM intensive; optimizations like LoRA were initially good at transferring style and not content. Today, LoRA content training is viable but requires training code that supports it [1].

* If you need a very specific memorized result and it's costly to get it wrong, good RAG is pretty much always going to be more efficient, since it injects the exact text in context. (Bad RAG makes the problem worse, of course).

* Finetuning requires more technical knowledge: you've got to understand the hyperparameters, avoid underfitting and overfitting, evaluate the results, etc.

* Finetuning requires more data. RAG works with a handful of datapoints; finetuning requires at least three orders of magnitude more data.

* Finetuning requires extra effort to avoid forgetting what the model already knows.

* RAG works pretty well when the task that you are trying to perform is well-represented in the training data.

* RAG works when you don't have direct control over the model (i.e., API use).

* You can't finetune most of the closed models.

* Big, general models have outperformed specialized models over the past couple of years; if it doesn't work now, just wait for OpenAI to make their next model better on your particular task.

On the other hand:

* Finetuning generalizes better.

* Finetuning has more influence on token distribution.

* Finetuning is better at learning new tasks that aren't as present in the pretraining data.

* Finetuning can change the style of output (e.g., instruction training).

* When finetuning pays off, it gives you a bigger moat (no one else has that particular model).

* You control which tasks you are optimizing for, without having to wait for other companies to maybe fix your problems for you.

* You can run a much smaller, faster specialized model because it's been optimized for your tasks.

* Finetuning + RAG outperforms just RAG. Not by a lot, admittedly, but there's some advantages.

Plus, RL training for reasoning has been demonstrating unexpectedly effective improvements on relatively small amounts of data and compute.

So there's reasons to do both, but the larger investment that finetuning requires means that RAG has generally been more popular. In general, the past couple of years have been won by the bigger models scaling fast, but with finetuning difficulty dropping there is a bit more reason to do your own finetuning.

That said, for the moment the expertise + expense + time of finetuning makes it a tough business proposition if you don't have a very well-defined task to perform, a large dataset to leverage, or other way to get an advantage over the multi-billion dollar investment in the big models.

[1] https://unsloth.ai/blog/contpretraining

pyryt

I would love to do this on my codebase after every commit

bjornsing

The strategic business dynamic here is very interesting. We used to have "GPT-wrapper SaaS". I guess what we're about to see now is the opposite: "SaaS/MCP-wrapper GPTs".

joshwarwick15

Created a list of remote MCP servers here so people can keep track of new releases - https://github.com/jaw9c/awesome-remote-mcp-servers

OJFord

Where's the permissioning, the data protection?

People will say 'aaah ad company' (me too sometimes) but I'd honestly trust a Google AI tool with this way more. Not just because it already has access to my Google Workspace obviously, but just because it's a huge established tech firm with decades of experience in trying not to lose (or have taken) user data.

Even if they get the permissions right and it can only read my stuff when I'm just asking it to 'research', now Anthropic has all that data and a target on their backs. And I don't even know what 'all that' is: whatever it explored and deemed maybe useful.

Maybe I'm just transitioning into old guy not savvy with latest tech, but I just can't trust any of this 'go off and do whatever seems correct or helpful with access to my filesystem/Google account/codebase/terminal' stuff.

I like chat-only (well, +web) interactions where I control the input and take the output, but even that is not an experience that gives me any confidence in granting uncontrolled access to things and trusting it to always do something correct and reasonable. It's often confidently incorrect too! I wouldn't give an intern free rein in my shell either!

joshwarwick15

Permissioning: OAuth. Data protection: local LLMs.

boh

I think all the retail LLMs are working to broaden the available context, but in most practical use cases it's the ability to minimize and filter the context that would produce the most value. Even a single PDF with too many similar datapoints leads to confused output. They need to switch gears from the high-growth, "everything is possible and available" narrative to one that narrows the scope. The "hallucination" gap is widening with more context, not shrinking.

fhd2

Definitely my experience. I manage context like a hawk, be it with Claude-as-Google-replacement or LLM integrations into systems. Too little and the results are off. Too much and the results are off.

Not sure what Anthropic and co can do about that, but integrations feel like a step in the wrong direction. Whenever I've tried tool use, it was orders of magnitude more expensive and generally inferior to a simple model call with curated context from SerpApi and such.

loufe

Couldn't agree more. I wish all major model makers would build tools into their proprietary UIs to "summarize contents and start a new conversation with that base". My biggest slowdown with working with LLMs while coding is moving my conversation to a new thread because context limit is hit (Claude) or the coherent-thought threshold is exceeded (Gemini).

roordan

This is my concern as well. How successful is it at selecting the correct tool out of hundreds or thousands?

Unlike what this integration is pushing, for LLM usage in production products where high accuracy (99%+) is a requirement, you have to give the model a very limited tool set to get any degree of success.

energy123

There's a niche for the kitchen sink approach. It's a type of search engine.

Throw in all context --> ask it what is important for problem XYZ --> curate what it tells you, and feed that to another model to actually solve XYZ
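The two-stage pipeline sketched above (distill, curate, then solve) can be written as a few lines of glue. `ask_model` here is a hypothetical stand-in for any LLM API call, and `curate` stands in for the human or programmatic filtering step; only the shape of the flow comes from the comment.

```python
# Sketch of the "kitchen sink" approach: stage 1 throws all context at
# a model and asks what matters; the distilled result is curated, and
# stage 2 solves the problem using only the curated context.

def kitchen_sink(all_context, problem, ask_model, curate):
    # Stage 1: ask what is important for this problem.
    relevant = ask_model(
        f"Given this context:\n{all_context}\n"
        f"List only what matters for: {problem}")
    # Curation step: trim the distilled context before the solve call.
    curated = curate(relevant)
    # Stage 2: solve with the curated context only.
    return ask_model(f"Using:\n{curated}\nSolve: {problem}")

# Toy run with a fake model that just echoes the last prompt line:
fake_model = lambda prompt: prompt.splitlines()[-1]
result = kitchen_sink("lots of context", "XYZ", fake_model, str.strip)
print(result)  # -> Solve: XYZ
```

The point of the split is that the expensive, hallucination-prone solve step never sees the raw context dump, only what survived curation.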

Etheryte

This has been my experience as well. The moment you turn internet access on, Kagi Assistant starts outputting garbage. Turn it off and you're all good.

medhir

you hit the nail on the head. my experience with prompting LLMs is that providing extra context that isn’t explicitly needed leads to “distracted” outputs

ketzo

I mean, to be honest, they gotta do both to achieve what they’re aiming for.

A truly useful AI assistant has context on my last 100,000 emails - and also recalls the details of each individual one perfectly, without confusion or hallucination.

Obviously I’m setting a high bar here; I guess what I’m saying is “yes, and”

mikepurvis

That's a tough pill to swallow when your company valuation is $62B based on the premise that you're building a bot capable of transcendent thought, ready to disrupt every vertical in existence.

Tackling individual use-cases is supposed to be something for third party "ecosystem" companies to go after, not the mothership itself.
