
Has LLM killed traditional NLP?

76 comments · January 15, 2025

itissid

LLM design/use has only about as much to do with engineering as building a plane has to do with actually flying it.

Every business is kind of a unicorn in its problems, and NLP is a small part of them. Even if an LLM performed cheaply enough to do the NLP, how would you replace parts like: 1. An evaluation system that uses calibration (human labels). 2. Ground-truth collection (human + sometimes semi-automated). 3. QA testing by end users.

Even if LLMs made it easier to do NLP, those pieces are so tightly correlated with it that your NLP process is still hugely influenced by them, so you still need an engineer. And if you have an engineer who does NLP and nothing else, you are hyper-specialized: the equivalent of only building planes, maybe 0.01% of the engineering work out there.

axegon_

No, it has not, and it will not in the foreseeable future. This is one of my responsibilities at work. LLMs are not feasible when you have a dataset of 10 million items that you need to classify relatively fast and at a reasonable cost. LLMs are great at mid-complexity tasks on a reasonable volume of data - they can take away the tedious job of figuring out what you are looking at, or even come up with some basic mapping. But anything at large volumes? Nah. Real-life example: "is '20 bottles of ferric chloride' a service or a product?"

One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M - get help.

segmondy

You are not pushing it at 100. I can classify "is '20 bottles of ferric chloride' a service or product?" in probably 2 seconds with a 4090. Something most people don't realize is that you can run multiple inferences in parallel. So with something like a 4090 and some solid few-shot examples, instead of having it classify one example at a time, you can do 5 per prompt. We can probably run 100 parallel inferences at 5 at a time, for a rate of about 250 items a second on a 4090. So in 11 hours I'll be done. I'm going with a 7-8B model, too; some of the 1.5-3B models are great and will run even faster. A competent developer who knows Python and how to use an OpenAI-compatible API can put this together in 10-15 minutes, with no data science/scikit-learn or other NLP toolchain experience.
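For the curious, a minimal sketch of that setup, assuming a local OpenAI-compatible server (e.g. vLLM or llama.cpp) on localhost:8000; the model name, the batch size of 5, and the items list are illustrative, not from the comment above:

    # Sketch: parallel few-shot classification against a local
    # OpenAI-compatible server (endpoint, model, and batch size
    # are assumptions).
    from concurrent.futures import ThreadPoolExecutor
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    SYSTEM = 'For each numbered item, answer "service" or "product", one per line.'

    def classify_batch(batch):
        prompt = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(batch))
        resp = client.chat.completions.create(
            model="qwen2.5-7b-instruct",  # any small instruct model
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content.splitlines()

    items = ["20 bottles of ferric chloride", "annual HVAC maintenance"]
    batches = [items[i:i + 5] for i in range(0, len(items), 5)]
    with ThreadPoolExecutor(max_workers=100) as pool:
        labels = [lab for out in pool.map(classify_batch, batches) for lab in out]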

So for personal, medium, or even large workloads, I think it has killed it; the workload needs to be extremely large before that changes. If you are classifying or segmenting comments on a social media platform where you need to deal with billions a day, then an LLM would be a very inefficient approach, but for 90+% of use cases I think it wins.

I'm assuming you are going to run it locally because everyone is paranoid about their data. It's even cheaper if you use a cloud API.

mikeocool

If you have to classify user input as it's being entered in order to provide a response (so it can't be batched), 2 seconds could potentially be really slow.

Though LLMs sure have made creating training data to train old school models for those cases a lot easier.

alexwebb2

I think your intuition on this might be lagging a fair bit behind the current state of LLMs.

System message: answer with just "service" or "product"

User message (variable): 20 bottles of ferric chloride

Response: product

Model: OpenAI GPT-4o-mini

$0.075 / 1M tokens (batch input) * 27 input tokens * 10M jobs = $20.25

$0.300 / 1M tokens (batch output) * 1 output token * 10M jobs = $3.00

It's a sub-$25 job.

You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.
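The whole "pipeline" here is a single API call. A sketch of the synchronous version (the prices quoted above assume OpenAI's Batch API, which takes the same request bodies uploaded as JSONL):

    # Sketch: the one-call classification priced above.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": 'answer with just "service" or "product"'},
            {"role": "user", "content": "20 bottles of ferric chloride"},
        ],
        max_tokens=1,  # the answer is a single token
    )
    print(resp.choices[0].message.content)  # -> product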

simonw

You might be able to use an even cheaper model. Google Gemini 1.5 Flash 8B is Input: $0.04 / Output: $0.15 per 1M tokens.

17 input tokens and 2 output tokens * 10 million jobs = 170,000,000 input tokens, 20,000,000 output tokens... which costs a total of $6.38 https://tools.simonwillison.net/llm-prices

As for rate limits, https://ai.google.dev/pricing#1_5flash-8B says 4,000 requests per minute and 4 million tokens per minute - so you could run those 10 million jobs in about 2500 minutes or 42 hours.

elicksaur

How do you validate these classifications?

segmondy

The same way you validate it if you didn't use an LLM.

jeswin

Isn't it easier and cheaper to validate than to classify (which requires expensive engineers)? I mean, the skill isn't as expensive; many companies do this at scale.

scarface_74

You need a domain expert either way. I mentioned in another reply that one of my niches is implementing call centers with Amazon Connect and Amazon Lex (the NLP engine).

https://news.ycombinator.com/item?id=42748189

I don't know beforehand the domain they are working in; I do validation testing with them.

bugglebeetle

The same way you check performance for any problem like this: by creating one or more manually labeled test datasets, randomly sampled from the target data, and looking at the resulting precision, recall, F-scores, etc. LLMs change pretty much nothing about evaluation for most NLP tasks.
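Concretely, that evaluation is only a few lines; a sketch assuming hypothetical gold and pred lists drawn from a hand-labeled random sample:

    # Sketch: scoring LLM output against a hand-labeled sample.
    from sklearn.metrics import classification_report

    gold = ["product", "service", "product", "product"]  # human labels
    pred = ["product", "service", "service", "product"]  # LLM output
    print(classification_report(gold, pred, digits=3))   # per-class precision/recall/F1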

axegon_

Yeah... Let's talk time needed for 10M prompts and how that fits into a daily pipeline. Enlighten us, please.

FloorEgg

Run them all in parallel with a cloud function in less than a minute?

LeafItAlone

>You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.

How much for the “prompt engineer”? Who is going to be doing the work and validating the output?

blindriver

You do not need a prompt engineer to create: “answer with just "service" or "product"”

Most classification prompts can be extremely easy and intuitive. The idea that you have to hire a completely different prompt engineer is kind of funny. In fact, you might be able to get the LLM itself to help revise the prompt.

alexwebb2

All software engineers are (or can be) prompt engineers, at least to the level of trivial jobs like this. It's just an API call and a one-liner instruction. Odds are very good at most companies that they have someone on staff who can knock this out in short order. No specialized hiring required.

IanCal

Prompt engineering is less and less of an issue the simpler the job is and the more powerful the model is. You also don't need someone with deep nlp knowledge to measure and understand the output.

vlovich123

That's the argument the article makes, but the reasoning is a little questionable on a few fronts:

- It uses f16 for the data format whereas quantization can reduce the memory burden without a meaningful drop in accuracy, especially as compared with traditional NLP techniques.

- LLMs typically outperform OpenCV + NER on output quality.

- You can choose to replace just part of the pipeline instead of using the LLM for everything (e.g. using text-only 3B or 1B models to replace the NER model while keeping OpenCV)

- LLM compute cost per unit of quality per watt is constantly decreasing. Even if it's too expensive today, the system you've spent time building, tuning, and maintaining is quickly becoming obsolete.

- Talking with new grads from NLP programs, the focus is now basically all on LLMs.

- Capability and quality per model size keep increasing, which means your existing RAM and performance budget keeps absorbing problems that previously seemed out of reach.

Now, of course, traditional techniques are valuable because they can be an important tool in bringing down costs (a fixed-function accelerator vs general-purpose compute), but I think it's going to become more niche and specialized, with most tasks transitioning to LLMs.

The “bitter lesson” essay is really relevant to these kinds of discussions.

vlovich123

Not an independent player, so it's obviously important to be critical of papers like this [1], but it's claiming a ~10x drop in LLM inference cost every year. That lines up with the technical papers I'm seeing that keep improving performance, plus the related HW improvements.

That's obviously not sustainable indefinitely, but these kinds of exponentials are precisely why people often draw incorrect conclusions about how long change will take. Just a reminder: CPUs doubled in performance every 18 months, and for 20 years that kept upending software companies that weren't in tune with the cycle (i.e. those that focused on performance instead of features). For example, even if you're spending $10k/month on the LLM vs $100/month to process the 10M items, it can still be more beneficial to go the LLM route, because you can buy cheaper expertise to put together the LLM pipeline than the NLP route, making up the ~$120k/year difference (assuming the performance otherwise works and the improved quality and robustness of the LLM solution isn't already providing extra revenue to offset it).

[1] https://a16z.com/llmflation-llm-inference-cost/

simonw

What NLP approaches are you using to solve the "is '20 bottles of ferric chloride' a service or a product?" problem?

blindriver

That’s sort of like asking a horse and buggy driver whether automobiles are going to put them out of business.

I think for the most part, casual NLP is dead because of LLMs. And LLM costs are going to plummet soon, so the large-scale NLP you're talking about is probably dead within 5 years or less. The fact that you can replace programmers with prompts is huge in my opinion: no one needs to learn an NLP API anymore, just stuff it into a prompt. Once the cost of running LLMs drops to the cost of programmers, it's game over.

arandomhuman

>The fact that you can replace programmers with prompts

this is how you end up with thousands of lines of slop with no idea how any of it functions.

dartos

> LLM costs

Inference costs, not training costs.

> The fact that you can replace programmers

You can’t… not for any real project. For quick mockups they’re serviceable

> That’s sort of like asking a horse and buggy driver whether automobiles

Kind of an insult to OP, no? Horse and buggy drivers were not highly educated experts in their field.

Maybe take the word of domain experts rather than AI company marketing teams.

chaos_emergent

> Inference costs, not training costs.

Why does training cost matter if you have a general intelligence that can do the task for you and that keeps getting cheaper to run?

> for quick mockups they’re serviceable

I know multiple startups that use LLMs as their core bread-and-butter intelligence platform instead of tuned but traditional NLP models.

> take the word of domain experts

I guess? I wouldn’t call myself an expert by any means but I’ve been working on NLP problems for about 5 years. Most people I know in NLP-adjacent fields have converged around LLMs being good for most (but obviously not all) problems.

> kind of an insult

Depends on whether you think OP intended to offend, ig

blindriver

> Maybe take the word of domain experts rather than AI company marketing teams.

Appeal to authority is a well known logical fallacy.

I know personally how dead NLP is because I was never able to get traditional NLP working, but once ChatGPT came around, I was able to classify texts extremely easily. It's transformational.

I was able to get ChatGPT to classify posts by how political they were on a scale of 1 to 10 and by their political leaning, and then to classify the person's likely political affiliation.

All of this without needing to learn any APIs or anything about NLP. Sorry, but given my experience, traditional NLP is dead in the water right now, except in terms of cost. And costs will go down exponentially, as they always do. Right now I'm waiting for the RTX 5090 so I can just do it myself with an open-source LLM.
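For illustration, the kind of prompt being described can be this small (a sketch via the API rather than the ChatGPT UI; the model name and JSON schema are assumptions):

    # Sketch: political-content classification with a single prompt.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Rate how political the post is on a 1-10 scale and its leaning "
                '(left/center/right/none). Reply as JSON: {"score": n, "leaning": "..."}'
            )},
            {"role": "user", "content": "Post text goes here"},
        ],
        response_format={"type": "json_object"},
    )
    print(resp.choices[0].message.content)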

elwebmaster

The reply didn't say the expert is uneducated, just that their tool is obsolete. Better to look at facts the way they are; sugar-coating doesn't serve anyone.

otabdeveloper4

> The fact that you can replace programmers with prompts

No, you can't. The only thing LLM's replace is internet commentators.

blindriver

As I explained below, I avoided having to learn anything about ML, PyTorch, or any other APIs when classifying posts by how political they were and which affiliation they had. That was holding me back, and it was easily replaced by an LLM and a prompt. It literally took me minutes to do what would have taken days or weeks, and the results are more than good enough.

portaouflop

No you can’t; LLMs are dog shit at internet banter, too neutered

devjab

While I agree with both you and the article, I also think it depends on more than just the volume of your data. We classify quite a lot of documents - around 10-100k a month, some rather large, others simple invoices. We used to have a couple of AI specialists who handled the classification with local NLP models, but when they left we had to find alternatives. For us that was the cloud AI services we already use, and the result has been a document warehouse that is both easier for the business to manage and a "pipeline" that is much cheaper than having those AI specialists on the payroll.

I imagine this wouldn't be the case if we were doing more classification projects, but we aren't. We did try to find replacements first, but it was impossible for us to attract any talent, which isn't too much of a surprise considering it's mainly maintenance work. Using external consultants for that maintenance proved to cost almost as much as having two full-time employees.

bloomingkales

I suspect any solution like that will be wholesale thrown away in a year or two. Unless the damn thing is going to make money in the next 2-3 years, we are all mostly writing throwaway code.

Everything is such an opportunity cost nowadays. It's like trying to capture value out of a transient, amorphous cloud: you can't hold any of it in your hand, but the phenomenon is clearly occurring.

DebtDeflation

The question seems malformed to me.

Text classification, clustering, named entity recognition, etc. are NLP tasks. LLMs can perform these tasks. ML models that are not LLMs (or even not deep learning models) can also perform these tasks. Is the author perhaps asking if the concept of a "completion" has replaced all of these tasks?

When I hear "traditional NLP" I think not of the above types of tasks but rather the methodology employed for performing them. For example, building a pipeline to do stemming/lemmatization, part of speech tagging, coreference resolution, etc. before the text gets fed to a classifier model. This was SOTA 10 years ago but I don't think many people are still doing it today.

scarface_74

For my use case, definitely.

I have worked on AWS Connect (online call center) and Amazon Lex (the backing NLP engine) projects.

Before LLMs, it was a tedious process of trying to figure out all of the different "utterances" people could say, across all the languages you had to support. With LLMs, it's just prompting.

https://chatgpt.com/share/678bab08-f3a0-8010-82e0-32cff9c0b4...

I used something like this with Amazon Bedrock and a Lambda hook for Amazon Lex. Of course it wasn't booking a flight; it was another system.

The above is a simplified version. In the real world, I gave it a list of intents (book flights, reserve a room, rent a car) and the properties - "slots" - I needed for each intent.
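In that spirit, a sketch of the intent/slot prompt pattern (the intent names and slot fields here are illustrative, not the production prompt; the real system ran on Amazon Bedrock behind a Lex Lambda hook):

    # Illustrative intent/slot extraction prompt (names and fields
    # are assumptions).
    PROMPT = """You are a call-center router. Classify the caller's request
    into one intent: book_flight, reserve_room, rent_car, or unknown.
    Then extract that intent's slots. Reply as JSON.

    book_flight slots: origin_airport, destination_airport, date
    reserve_room slots: city, check_in, check_out
    rent_car slots: city, pickup_date, return_date

    Caller: {utterance}"""

    print(PROMPT.format(utterance="I need to fly from Cordele GA to Boston next Tuesday"))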

elicksaur

Thank you for sharing an actual prompt thread. So much of the LLM debate is washed in biases, and it is very helpful to share concrete examples of outputs.

scarface_74

The “Cordele, GA” example surprised me. I was expecting a value of “null” for the airport code, since I knew that city has a population of 12K and no airport within its metropolitan statistical area. It returned an airport that was close.

Having world knowledge is a godsend. I also just tried a prompt with “Alpharetta, GA”, a city north of Atlanta, and it returned ATL. A traditional NLP system could never do that without a lot more work.

gtirloni

How about the costs?

scarface_74

We measure savings in terms of call deflections. Clients we work with say that each time a customer talks to an agent it costs $2-$5. That's not even taking into account call abandonments.

IanCal

My baseline when advising people is that if anyone you pay needs to read the output, or you are directly replacing any kind of work, then even frontier-model inference costs are irrelevant. Of course you need to work out whether that's truly the case, but people worry about cost in places where it's just irrelevant. If it costs $2 when a customer gets to an agent, each avoided case pays for around a million words read/generated. That's expensive compared to most API calls, but irrelevant when counting human costs.

vedant

The title of this article feels like asking "has electricity killed oil lamps?"

darepublic

I remember using the Stanford NLP library around 2016. It would do part-of-speech tagging of the words in a sentence (labelling each word with its grammatical function). It was pretty good, but it reliably failed on certain words where context determined the tag. When GPT-3 came out, the first thing I tested it on was part-of-speech tagging, in particular the sentences the Stanford library had trouble with. It aced everything; I was impressed.
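A sketch of the same spot-check today ("Time flies like an arrow" is a classic POS-ambiguity example; the prompt wording and model are illustrative):

    # Sketch: part-of-speech tagging via an LLM prompt.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            'Tag each word with its Penn Treebank POS tag: "Time flies like an arrow."'}],
    )
    print(resp.choices[0].message.content)  # e.g. Time/NN flies/VBZ like/IN an/DT arrow/NN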

RancheroBeans

NLP is an important part of upcoming RAG frameworks like Microsoft’s LazyGraphRAG. So I think it’s more like NLP is a tool used when the time is right.

https://www.microsoft.com/en-us/research/blog/lazygraphrag-s...

politelemon

I could use some help understanding: is this a set of tools or techniques for answering questions? The name made me think it's related to creating embeddings, but it seems to be much more?

leobg

There are AI bros that will call an LLM to do what you could do with a regex. I’ve seen people do the chunking for RAG using an LLM…
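For instance, paragraph-level chunking for RAG is a few lines of plain Python (a sketch; the max_chars limit is an arbitrary choice):

    # Sketch: regex-based paragraph chunking, no LLM call needed.
    import re

    def chunk(text, max_chars=2000):
        paras = re.split(r"\n\s*\n", text)  # split on blank lines
        chunks, cur = [], ""
        for p in paras:
            if cur and len(cur) + len(p) > max_chars:
                chunks.append(cur)
                cur = ""
            cur = (cur + "\n\n" + p).strip()
        if cur:
            chunks.append(cur)
        return chunks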

oliwary

This article seems to be paywalled, unfortunately. While LLMs are very useful when tasks are complex and/or there is not a lot of training data, I still think traditional NLP pipelines have a very important role to play, including when:

- Depending on the complexity of the task and the required results, SVMs or BERT can be enough in many cases and require far fewer resources, especially if a lot of training data is available. Training these models on LLM outputs could also be an interesting way to get there (see the sketch after this list).

- When resources are constrained or latency is important.

- There is labeled data for classes that have no semantic connection between them, so explaining the class to an LLM would be tricky.
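A sketch of the distillation idea from the first point: label a sample with an LLM, then train a cheap local model on those labels (the example texts and labels are illustrative):

    # Sketch: TF-IDF + linear SVM trained on LLM-produced labels.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["20 bottles of ferric chloride", "annual HVAC maintenance"]
    llm_labels = ["product", "service"]  # from an initial LLM pass

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, llm_labels)
    print(clf.predict(["office cleaning contract"]))  # cheap bulk classification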

eminent101

> This article seems to be paywalled unfortunately.

I am no fan of Medium paywalled articles but if it helps you, here's the article on archive - https://archive.is/J53CE