The case for the return of fine-tuning
15 comments
October 19, 2025 · meander_water
soVeryTired
Have you used PaddleOCR? I'm surprised they're claiming SOTA without comparing against Amazon Textract or Azure doc intelligence (LayoutLM v3 under the hood, as far as I know).
I've played around with doc recognition quite a bit, and as far as I can tell those two are best-in-class.
empiko
Fine-tuning is a good technique to have in the toolbox, but in reality it is feasible only for some use cases. On one hand, many NLP tasks are already easy enough for LLMs to reach near-perfect accuracy, so fine-tuning isn't needed. On the other hand, really complex tasks are very difficult to fine-tune for, and clever data collection might be pretty expensive. Fine-tuning can help with the use cases somewhere in the middle: not too simple, not too complex, and feasible for data collection.
libraryofbabel
What would you say is an example of one of those “middle” tasks it can help with?
CaptainOfCoit
An example I just found worked very well with fine-tuning: I wanted to extract any frame that contained a full-screen presentation slide from various videos I've archived, only when it's full-screen, without capturing videos, and with some other constraints.
Naturally I reached for CLIP+ViT, which got me a ~60% success rate out of the box. From there, I created a tiny training script that read `dataset/{slide,no_slide}` and trained a new classification head on that data. After adding ~100 samples of each class, the success rate landed at 95%, which was good enough to call it done and circle back to iterate once I have more data.
I ended up with a 2.2 kB "head_weights.safetensors" file that increased accuracy by ~35 percentage points, which felt really nice.
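For anyone curious what that kind of setup looks like in practice, here is a minimal sketch of a frozen-CLIP-encoder-plus-trainable-head classifier over a `dataset/{slide,no_slide}` folder. The model name, file layout, and hyperparameters are assumptions for illustration, not the commenter's actual script.

```python
# Sketch: freeze a CLIP image encoder, train only a small linear head.
# Model, paths, and hyperparameters below are assumptions, not the original script.
from pathlib import Path

import torch
import torch.nn as nn
from PIL import Image
from safetensors.torch import save_file
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    # Encode images with the frozen CLIP vision tower; no gradients needed.
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt").to(device)
    with torch.no_grad():
        return clip.get_image_features(**inputs)

# Expected layout (assumed): dataset/no_slide/*.jpg and dataset/slide/*.jpg
paths, labels = [], []
for label, name in enumerate(["no_slide", "slide"]):
    for p in Path("dataset", name).glob("*.jpg"):
        paths.append(p)
        labels.append(label)

X = embed(paths)                       # (N, 512) image embeddings
y = torch.tensor(labels, device=device)

# Tiny classification head; this is the only part that gets trained.
head = nn.Linear(X.shape[1], 2).to(device)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(head(X), y)
    loss.backward()
    opt.step()

# Save only the head weights; the file is just a few kB.
save_file({k: v.detach().cpu() for k, v in head.state_dict().items()},
          "head_weights.safetensors")
```

At inference time the same frozen encoder embeds each candidate frame and the saved head does the slide/no-slide call, which is why the artifact stays tiny.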
leblancfg
Fine-tuning was never really hard to do locally if you had the hardware. What I'd like to read in an article like this is more detail on why it's making a comeback.
Curious to hear others’ thoughts on this
melpomene
This website loads at impressive speeds (from Europe)! Rarely seen anything more snappy. Dynamic loading of content as you scroll, small compressed images without looking like it (webp). Well crafted!
hshdhdhehd
Magic of a CDN? Plus avoiding JS probably. Haven't checked source though.
oli5679
The OpenAI fine-tuning API is pretty good. You need to label an evaluation benchmark anyway to systematically iterate on prompts and context, and it often produces good results if you give it 50-100 examples, either beating frontier models or letting a far cheaper and faster model catch up.
It requires no local GPUs, just creating a JSONL file of examples and posting it to OpenAI.
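For reference, a minimal sketch of that workflow with the official `openai` Python client might look like the following; the file name, base model, and example content are assumptions, not the commenter's setup.

```python
# Sketch of the hosted fine-tuning workflow described above.
# Training data, file name, and base model are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# train.jsonl: one chat example per line, e.g.
# {"messages": [{"role": "system", "content": "Extract the invoice total as JSON."},
#               {"role": "user", "content": "...invoice text..."},
#               {"role": "assistant", "content": "{\"total\": 1234.56}"}]}
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed base model
)

print(job.id, job.status)  # poll later with client.fine_tuning.jobs.retrieve(job.id)
```

Once the job finishes, the resulting model name can be used in place of the base model in normal chat-completion calls.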
deaux
They don't offer it for the GPT-5 series, so much of the time fine-tuning Gemini 2.5 Flash is a better deal.
CuriouslyC
Fine-tuning by pretraining over an RL-tuned model is dumb AF. RL task tuning works quite well.
HarHarVeryFunny
You may have no choice in how the model you're fine-tuning was trained, and may have no interest in the verticals it was RL-tuned for.
In any case, platforms like tinker.ai support both SFT and RL.
CuriouslyC
Why would you choose a model whose trained-in priors don't match your use case? Also, keep in mind that RL'd-in behavior includes things like reasoning and how to answer questions correctly, so you're literally taking smart models and making them dumber by doing SFT. To top it off, SFT only produces really good results when you have traces that closely model the actual behavior you're trying to get the model to display. If you're just trying to fine-tune in a knowledge base, a well-tuned RAG setup plus better prompts wins every time.
imcritic
Because you need a solution to your problem, the available tools are what they are and nothing else, and you don't have enough resources to train your own model.
A couple of examples I have seen recently that make me agree with OP:
- PaddleOCR, a 0.9B model that reaches SOTA accuracy across text, tables, formulas, charts & handwriting. [0]
- A 3B and an 8B model that perform HTML-to-JSON extraction at GPT-5-level accuracy, at 40-80x lower cost and with faster inference. [1]
I think it makes sense to fine-tune when you're optimizing for a specific task.
[0] https://huggingface.co/papers/2510.14528
[1] https://www.reddit.com/r/LocalLLaMA/comments/1o8m0ti/we_buil...