BBC threatens AI firm with legal action over unauthorised content use
17 comments · June 20, 2025
simonw
hadrien01
They also write this:
> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.
simonw
Oh wow, I missed that! That's from the docs for that Perplexity‑User user-agent, at which point presumably there's no point in listing that in robots.txt at all?
dabeeeenster
I mean, that's just not true.
esskay
Which part? It's widely established and known that many AI crawlers ignore robots.txt, Perplexity being one of them. [1]
[1] https://www.tomshardware.com/tech-industry/artificial-intell...
whilenot-dev
For what it's worth, this statement here regarding Perplexity-User:
> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.
...was added sometime between 30.01.2025 [0] and 07.02.2025 [1], and makes it sound like robots.txt was not respected by that bot anyway.
[0]: https://web.archive.org/web/20250130164401/https://docs.perp...
[1]: https://web.archive.org/web/20250207113929/https://docs.perp...
simonw
Great catch there.
seydor
> In a statement, Perplexity said: "The BBC's claims are just one more part of the overwhelming evidence that the BBC will do anything to preserve Google's illegal monopoly."
Unless Perplexity has a way to indirectly pay writers the way Google does, this is very rich.
> four popular AI chatbots - including Perplexity AI - were inaccurately summarising news stories, including some BBC content.
One of the interesting things about the failures of LLMs is that news sources have become more concise and more authoritative. Even Google fails to get facts right with its AI summaries, so one is compelled even more to go read the website instead. And I'm not sure LLMs will ever be able to tell truth from lies.
riskable
How is Perplexity different from running a Jupyter Notebook, or anything, really, that lets you download a web page programmatically? I can spin up an AWS instance, log in, run `python`, and scrape the BBC's content as much as I want. Why aren't they suing Amazon (and every other company that lets you download stuff via their systems) for providing the same functionality?
A very old argument: If you don't want people scraping or downloading your content don't put it on the (public) Internet!
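The "anything that downloads a page programmatically" point above is literally a few lines of the Python standard library. A minimal sketch (the fetch URL is illustrative; the parser is run against a tiny inline sample so the snippet is self-contained):

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collect the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def page_title(html: str) -> str:
    parser = TitleGrabber()
    parser.feed(html)
    return parser.title.strip()

# Against a live site you would fetch first, e.g. (URL illustrative):
#   html = urllib.request.urlopen("https://www.bbc.com/news").read().decode()
sample = "<html><head><title>BBC News - Home</title></head><body></body></html>"
print(page_title(sample))  # BBC News - Home
```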
Imagine we had LLM-like functionality in the 1980s: Sony announces a new VCR that can read a recorded news show and print out a summary on a connected Imagewriter II. People start using it to summarize the publicly-broadcast BBC news programs.
Today's scenario would be like the BBC sues Sony for providing that functionality.
ethbr1
Because copyright is intrinsically linked to scale.
1000000x'ing fair use... might no longer be fair use.
The balances between society and copyright need to change when scale changes drastically.
To address the elephant in the room -- what happens when there are only leeches and no sources, because we've let them hijack first-party news revenue without creating a replacement?
fcatalan
To be honest not visiting some websites is one of my main uses of Perplexity.
For example I like to watch F1 and I like to know the times for all sessions in my timezone during the weekend.
It's surprisingly hard to find this information, because the Google search is SEOed to hell and back by sites that hide the information behind endless articles full of irrelevant AI slop and 2 million intrusive ads, and that's assuming they have the information at all, or have it right.
Perplexity wades through all that shit, gives me a neatly formatted table and has never been wrong so far.
So I can see where the BBC is coming from but I also don't really want them to win.
esskay
> In a statement, Perplexity said: "The BBC's claims are just one more part of the overwhelming evidence that the BBC will do anything to preserve Google's illegal monopoly."
That's got to be the most delusional response they could've given. It's not the BBC's, or any other news publisher's, job to preserve Google's monopoly. The comparison would only work if Google were replacing a link to a BBC article in the search results with a direct copy of said article on the Google search results page.
oneeyedpigeon
I'd love to see some—any—of this "overwhelming evidence". I suspect it does not exist. I'd also love to ask Perplexity why they think the BBC would have any kind of bias toward Google; it just doesn't make any sense.
randall
this is the most non sequitur press statement ever.
josefritzishere
Good. I hope BBC gets a historically large judgement and Google has to learn a valuable lesson.
It looks to me like this is mainly about RAG - Perplexity answers user questions by running searches and then displaying content from those searches to users, and the BBC are arguing that this content display violates their copyright.
Unsurprisingly this article confuses the issue somewhat by also talking about training models on content. I understand why that's in there - it's a hot topic, especially in the UK right now - but I don't think it's directly relevant to this complaint.
The note about robots.txt is interesting: "The BBC said in its letter that while it disallowed two of Perplexity's crawlers, the company 'is clearly not respecting robots.txt'."
Perplexity describe their user-agents here: https://docs.perplexity.ai/guides/bots
I had a look at https://www.bbc.com/robots.txt and it does indeed block both PerplexityBot ("designed to surface and link websites in search results on Perplexity" - I think that's their search index crawler) and Perplexity-User ("When users ask Perplexity a question, it might visit a web page to help provide an accurate answer and include a link to the page in its response").
But... I checked the Internet Archive for a random earlier date - Feb 2025 - https://web.archive.org/web/20250208052005/https://www.bbc.c... - and back then the BBC were blocking PerplexityBot but not Perplexity-User.
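The effect of those two robots.txt entries can be reproduced with Python's stdlib parser. The rules below are an illustrative snippet mirroring what bbc.com reportedly now serves for those two user-agents, not the BBC's actual full file:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block both Perplexity user-agents site-wide.
rules = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
    print(agent, rp.can_fetch(agent, "https://www.bbc.com/news"))
```

A compliant crawler would check `can_fetch` before requesting the page; the whole dispute is about what happens when a fetcher decides these rules don't apply to it.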