Putting Andrew Ng's OCR models to the test
25 comments
February 28, 2025
ritvikpandey21
good catch on the 1654, will edit that on our blog! try it multiple times -- we've noticed, especially for tabular data, it's fairly nondeterministic. we trialed it over 10 times on many financial CIMs and observed this phenomenon.
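if you want to reproduce the check yourself, here's a rough sketch of what we mean by trialing it repeatedly (extract_profit_loss is a hypothetical wrapper around whatever extraction tool you're testing):

    from collections import Counter
    from typing import Callable

    def check_determinism(pdf_path: str,
                          extract_profit_loss: Callable[[str], str],
                          runs: int = 10) -> Counter:
        """Run the same document through the extractor several times and
        count how many distinct values come back for one target cell."""
        values = Counter()
        for _ in range(runs):
            values[extract_profit_loss(pdf_path)] += 1
        # more than one key in the counter means the output drifts run to run
        return values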
ritvikpandey21
Today, Andrew Ng, one of the legends of the AI world, released a new document extraction service that went viral on X:
https://x.com/AndrewYNg/status/1895183929977843970
At Pulse, we put the models to the test with complex financial statements and nested tables – the results were underwhelming to say the least, and suffered from many of the same issues we see when simply dumping documents into GPT or Claude.
moralestapia
That's the standard tier of competence you expect from Ng. Academia is always close but no cigar.
teruakohatu
> That's the standard tier of competence you expect from Ng. Academia is always close but no cigar.
Academics do research. You should not expect an academic paper to be turned into a business or a production system overnight.
The first neural network machine, the Mark I Perceptron, was built in the late 1950s for image recognition. It took nearly 70 years of non-commercial research to bring us to the very useful multimodal LLMs of today.
mattmanser
It's more that they had to wait for processing power to catch up.
One of my slightly older friends got an AI doctorate in the 00s and would always lament that a business would never bother reading his thesis; they'd just recreate what he did themselves in a few weeks.
It's easy to forget now that in the 90s/00s/10s AI research was mainly viewed as a waste of time. The recurring joke was that general AI was just 20 years away, and had been for the last few decades.
ritvikpandey21
don't be mistaken, andrew's a legend! he's done some incredible work -- google brain, coursera, baidu ai, etc.
panny
It seems like you missed the point. Andrew Ng is not there to give you production grade models. He exists to deliver a proof of concept that needs refinements.
>Here's an idea that could use some polish, but I think as an esteemed AI researcher that it could improve your models. -- Andrew Ng
>OH MY GOSH! IT ISN'T PRODUCTION READY OUT OF THE BOX, LOOK AT HOW DUMB THIS STUFFED SHIRT HAPPENS TO BE!!! -- You
Nobody appreciates a grandstander. You're treading on thin ice by attacking someone who has given so much to the AI community and asked for so little in return. Andrew Ng clearly does this because he enjoys it. You are here to self-promote, and it reflects poorly on you.
yorwba
This is not about some paper Ng published with a new idea that needs some polishing before being useful in the real world.
It's a product released by a company Ng cofounded. So expecting production-readiness isn't asking for too much in my opinion.
ritvikpandey21
we respect andrew a lot, as we mentioned in our blog! he's an absolute legend in the field -- founded google brain and coursera, worked heavily on baidu ai. this is more to remind everyone not to blindly trust new document extraction tools without really putting them through their paces!
serjester
Personally I find it frustrating they called it "agentic" parsing when there's nothing agentic about it. Not surprised the quality is lackluster.
pierre
If you want to try agentic parsing, we added support for sonnet-3.7 and gemini 2.0 agentic parse in LlamaParse: cloud.llamaindex.ai/parse (select advanced options / parse with agent, then a model).
However, this comes at a high cost in tokens and latency, but results in much better parse quality. Hopefully with newer models this can be improved.
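If you'd rather script it than click through the UI, here is a rough sketch with the llama-parse Python client (API key and file name are placeholders; this only shows the basic call -- the agent/model selection above lives under advanced options in the web UI):

    # pip install llama-parse
    from llama_parse import LlamaParse

    parser = LlamaParse(
        api_key="llx-...",        # LlamaCloud API key
        result_type="markdown",   # "markdown" or "text"
        verbose=True,
    )

    # load_data returns a list of Document objects with the parsed text
    documents = parser.load_data("financial_statement.pdf")
    print(documents[0].text)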
ritvikpandey21
we're not the biggest believers in 'agentic' parsing! we definitely do believe there's a specific role for LLMs in the data ingestion pipeline, but it's more for converting bar graphs/charts/figures into structured markdown.
we're experimenting internally with some agentic zooming around documents -- we'll make our findings public!
krashidov
How does Pulse compare to Reducto and Gemini? Claude is actually pretty good at PDFs (much better than GPT).
ritvikpandey21
claude is definitely better than gpt -- but both have their flaws! they pretty much fall flat on their face with nested entries, low-fidelity images, etc. (we detailed this heavily in our blog post here [1])
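for reference, the 'just dump the pdf into claude' baseline we keep comparing against looks roughly like this with the anthropic python sdk (model id, file name, and prompt are only examples):

    import base64
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # send the PDF as a base64 document block and ask for markdown tables
    with open("financial_statement.pdf", "rb") as f:
        pdf_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    message = client.messages.create(
        model="claude-3-7-sonnet-20250219",  # example model id
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "document",
                 "source": {"type": "base64",
                            "media_type": "application/pdf",
                            "data": pdf_b64}},
                {"type": "text",
                 "text": "Extract every table in this document as markdown. "
                         "Preserve nested rows and do not invent numbers."},
            ],
        }],
    )
    print(message.content[0].text)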
other ocr providers are doing a great job -- we personally believe we have the highest-accuracy tool on the market. we're not here to dunk on anyone, just to provide unbiased feedback when putting new document extraction tools through a challenge.
Ishirv
good read, saw your recent raise in BI - congrats!
sidmanchkanti21
appreciate it!
ritvikpandey21
thanks man!
j7ake
Honestly he’s famous for pedagogy and research papers, not real-world products.
Not surprised it’s underwhelming.
what
> - Over 50% hallucinated values in complex financial tables
> - Completely fabricated numbers in several instances
Why are these different bullet points? Which one is the correct number of wrong values?
ritvikpandey21
to keep the read from getting too long, we only included one example. we tried over 50 docs and found a couple with pie charts/bar graphs that weren't parsed at all. there were also a few instances where entire columns of entries were incorrect due to mismatching.
kneegerman
>grifter grifts diggity
I took the screenshot of the bill in their article and ran it through the tool at https://va.landing.ai/demo/doc-extraction. The tool doesn't hallucinate any of the values the way the article reports. In fact, the value for Profit/loss for continuing operations is 1654 in their extraction, which is the ground truth, yet they've drawn a red bounding box around it.