Tips for using Gemini 2.0 for PDF ingestion
5 comments
March 4, 2025 · jtrueb
petercooper
No direct recommendation for that use case, but one strategy I've heard of, which works well with complex documents (or where hallucinations are Very Bad™, like invoice processing), is to run multiple techniques and models at once in a quorum approach. For example: ingest the PDF directly into Gemini, run OCR and ingest the extracted text, and perhaps also use another model such as GPT. If they all agree on a fact, you're (probably) good. If not, the item gets bumped up to a human for correction.
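A minimal sketch of the voting step, assuming each pipeline (direct Gemini PDF ingestion, OCR then text ingestion, another model) has already produced one string per field. The function names, the source labels, and the normalization rule (collapse whitespace, ignore case) are hypothetical choices for illustration, not any library's API. By default every source must agree; anything short of that is flagged for human review.

```python
from collections import Counter
from typing import Optional


def normalize(value: str) -> str:
    # Hypothetical rule: collapse whitespace and ignore case so that
    # "INV-001 " and "inv-001" count as the same answer.
    return " ".join(value.split()).casefold()


def quorum_vote(
    extractions: dict[str, Optional[str]],
    min_agree: Optional[int] = None,
) -> tuple[Optional[str], bool]:
    """Combine one field's value as extracted by several pipelines.

    `extractions` maps a source label to the value that source found
    (None if it found nothing). Returns (normalized_value, needs_human_review).
    By default all sources must agree (unanimity).
    """
    if not extractions:
        return None, True
    required = len(extractions) if min_agree is None else min_agree
    counts = Counter(normalize(v) for v in extractions.values() if v is not None)
    if not counts:
        return None, True
    value, votes = counts.most_common(1)[0]
    if votes >= required:
        return value, False
    # Disagreement: don't guess, bump it to a human.
    return None, True


if __name__ == "__main__":
    # Source labels below are hypothetical stand-ins for the three pipelines.
    invoice_total = {"gemini_pdf": "1,234.50", "ocr_then_llm": "1,234.50", "gpt": "1,234.50 "}
    print(quorum_vote(invoice_total))  # ('1,234.50', False)

    due_date = {"gemini_pdf": "2025-03-04", "ocr_then_llm": "2025-03-14", "gpt": "2025-03-04"}
    print(quorum_vote(due_date))               # (None, True) -> human review
    print(quorum_vote(due_date, min_agree=2))  # ('2025-03-04', False)
```

Requiring unanimity is the conservative setting for the "hallucinations are Very Bad™" case; lowering `min_agree` to a majority trades some safety for fewer human reviews.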
thelittleone
Interesting, although it would be great to see some comparative results, e.g., with and without the HTML alt tag approach.
javier123454321
Honestly, though, I hope Google's NotebookLM https://notebooklm.google/ doesn't end up in the Google graveyard. It's great for feeding in a decent amount of information and helping you process it. I've had great success with it.
null
Anyone have recommendations for chip datasheets? I've explored a couple of options so far, but getting some bitfields wrong is super annoying.
I see plenty of examples, like the one here, that involve easier extractions, which a PDF-to-HTML or PDF-to-Markdown converter with OCR would probably get right anyway.