LangExtract: Python library for extracting structured data from language models

hm-nah

Oly Chit! This is a BIG deal! Sub-page citations…in-context RAG…built-in HTML UI…this is like the holy grail of deterministic text extraction. I’m trying this ASAP Rocky.

constantinum

There is also Unstract (open source), which helps with structured data extraction. Key differences:

1. Unstract has a pre-processing layer (OCR) that converts documents into LLM-readable formats, which helps improve accuracy and control costs.

2. Unstract also connects to your existing data sources, making it an out-of-the-box ETL tool.

https://github.com/Zipstack/unstract

fudged71

Any idea how it compares with docetl?

oriettaxx

impressive, really