Nvidia-Ingest: Multi-modal data extraction

hammersbald

Is there an OCR toolkit or an ML model which can reliably extract tables from invoices?

benpacker

All frontier multi-modal LLMs can do this; there's likely something lighter-weight as well.

In my experience, the latest Gemini is best at vision and OCR.
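
A minimal sketch of what that looks like in practice, assuming the google-generativeai Python SDK; the model name, file name, and prompt here are illustrative, not anything the thread specifies:

    # Sketch: extract an invoice table with a multimodal LLM (Gemini).
    # Assumes `pip install google-generativeai pillow` and a GOOGLE_API_KEY env var.
    import os
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")

    invoice = Image.open("invoice.png")
    prompt = ("Extract every line-item table from this invoice as CSV "
              "with columns: description, quantity, unit_price, total.")

    response = model.generate_content([prompt, invoice])
    print(response.text)  # CSV text; validate the numbers before trusting them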

ixaxaar

Ah, so NIM is a set of microservices on top of various models, and this is another set of microservices using NIM microservices to do large-scale OCR?

And on top of that, it's integrated with Prometheus, has a 160GB VRAM requirement, and so on?

Looks like this is targeted at enterprises, or maybe governments, trying to digitize at scale.

greatgib

I have a hard time understanding what they mean by "early access microservices".

Does it mean that it is yet another wrapper library to call their proprietary cloud API?

Or that, once you have the specific access rights, you can pull a proprietary Docker image with proprietary binaries inside, which acts as the server used by the library available on GitHub?

theossuary

The latter. NIM is Nvidia's umbrella branding for proprietary containerized AI models, which is being pushed hard by Jensen. They build models and containers, push them to ngc.nvidia.com, and then provide reference architectures which rely on them. In this case the images are in an invite-only org, so to use the Helm chart you have to sign up, request access, and then use an API key to pull the image.

You can imagine how fun it is to debug.
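
For context, that access flow looks roughly like this, sketched with the docker Python SDK (docker-py); the `$oauthtoken` username is the standard NGC convention, the API key is a placeholder, and the image path matches the container listing further down the thread:

    # Sketch of pulling a gated NIM image once early-access approval is granted.
    # Assumes `pip install docker` and an NGC API key in the NGC_API_KEY env var.
    import os
    import docker

    client = docker.from_env()

    # NGC registries authenticate with the literal username "$oauthtoken"
    # and the API key as the password.
    client.login(
        username="$oauthtoken",
        password=os.environ["NGC_API_KEY"],
        registry="nvcr.io",
    )

    # Image path is from the invite-only early-access org.
    image = client.images.pull(
        "nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest", tag="24.10"
    )
    print(image.id)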

jappgar

Nvidia getting in on the lucrative gpt-wrapper market.

joaquincabezas

lol, while checking which OCR it uses (PaddleOCR), I found a line with the text "TODO(Devin)" and was pretty excited, thinking they were already using Devin AI...

"Devin Robison" is the author of the package!! Funny; I guess it will end up similar to what happened with the name Alexa.

vardump

Sounds pretty useful. What are the system requirements?

  Prerequisites
  Hardware
  GPU    Family        Memory   # of GPUs (min.)
  H100   SXM or PCIe   80GB     2
  A100   SXM or PCIe   80GB     2
Hmm, perhaps this is not for me.

shutty

Wow, perhaps I need a Kubernetes cluster just for a demo:

    CONTAINER ID   IMAGE                                                    
    0f2f86615ea5   nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.10     
    de44122c6ddc   otel/opentelemetry-collector-contrib:0.91.0              
    02c9ab8c6901   nvcr.io/ohlfw0olaadg/ea-participants/cached:0.2.0        
    d49369334398   nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.1.0                
    508715a24998   nvcr.io/ohlfw0olaadg/ea-participants/nv-yolox-structured-images-v1:0.2.0
    5b7a174a0a85   nvcr.io/ohlfw0olaadg/ea-participants/deplot:1.0.0                                                                     
    430045f98c02   nvcr.io/ohlfw0olaadg/ea-participants/paddleocr:0.2.0                                                                  
    8e587b45821b   grafana/grafana                                                         
    aa2c0ec387e2   redis/redis-stack                                                       
    bda9a2a9c8b5   openzipkin/zipkin                                                       
    ac27e5297d57   prom/prometheus:latest

threeseed

You can just use k3s/rke2 and run everything on the same node.

fsniper

That may be the least of your worries, considering it requires 2x [A/H]100 GPUs with 80GB of VRAM each.
