Extraction
Extract structured data from documents using schemas.
Extraction lets you pull structured information from documents automatically. Define what you're looking for with a schema, run extraction on selected documents, and get organized results you can export.
Getting started
- Create a Schema: Define what data to extract
- Run Extraction: Process documents and get results
- Review Results: Verify and troubleshoot
How it works
- Create a schema: Define field names and types (text, number, enum, etc.)
- Select documents: Choose individual documents or a folder (up to 50 documents per run)
- Run extraction: Processing happens in the background
- View results: See structured data with provenance (the source quote for each value)
- Export CSV: Download for analysis in Excel, Sheets, etc.
The Extract page
Navigate to Extract in the sidebar to see:
| Tab | Contents |
|---|---|
| Runs | Your extraction jobs: status, results, and export |
| Schemas | Available schemas: your own, your team's, and admin templates |
Schemas
A schema defines what to extract:
- Field names — Column headers for your results
- Field types — Text, Number, Boolean, Enum, Object, Array
- Instructions — Natural language guidance for accuracy
Create your own or clone from templates.
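As a rough illustration of the three parts of a schema, the sketch below models fields in plain Python. It is not the product's schema format: `SchemaField`, `validate_schema`, the lowercase type names, the `options` attribute, and the enum and duplicate-name checks are all hypothetical. Only the list of field types and the 50-field limit come from this page.

```python
from dataclasses import dataclass, field
from typing import List

# Field types listed on this page; the lowercase spellings are an assumption.
FIELD_TYPES = {"text", "number", "boolean", "enum", "object", "array"}
MAX_FIELDS_PER_SCHEMA = 50  # from the Limits table


@dataclass
class SchemaField:
    name: str               # becomes a column header in the results
    type: str               # one of FIELD_TYPES
    instructions: str = ""  # natural language guidance for accuracy
    # Allowed values for enum fields (hypothetical attribute).
    options: List[str] = field(default_factory=list)


def validate_schema(fields: List[SchemaField]) -> List[str]:
    """Return a list of problems; an empty list means the sketch looks valid."""
    problems = []
    if len(fields) > MAX_FIELDS_PER_SCHEMA:
        problems.append(f"too many fields: {len(fields)} > {MAX_FIELDS_PER_SCHEMA}")
    seen = set()
    for f in fields:
        if f.type not in FIELD_TYPES:
            problems.append(f"{f.name}: unknown type {f.type!r}")
        # Assumed rule: an enum field should list its allowed values.
        if f.type == "enum" and not f.options:
            problems.append(f"{f.name}: enum field needs at least one option")
        # Assumed rule: field names are unique because they become columns.
        if f.name in seen:
            problems.append(f"{f.name}: duplicate field name")
        seen.add(f.name)
    return problems


invoice_schema = [
    SchemaField("vendor_name", "text", "Legal name of the company issuing the invoice."),
    SchemaField("total_amount", "number", "Grand total including tax."),
    SchemaField("currency", "enum", "Currency of the total.", options=["USD", "EUR", "GBP"]),
    SchemaField("is_paid", "boolean", "True only if the document states payment was received."),
]

print(validate_schema(invoice_schema))
```

Specific instructions like "Grand total including tax" tend to matter more than the field name alone, since the name only labels the column.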
Provenance
Every extracted value includes its source quote: the exact text passage it came from. Click through to verify accuracy or check the surrounding context.
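If you want to spot-check provenance outside the app, one simple approach is to confirm that each source quote really appears in the document text. The sketch below assumes you have the document text and a result record on hand; the record layout (`field`, `value`, `source_quote`) and the helper names are hypothetical, not the product's export format.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_document(source_quote: str, document_text: str) -> bool:
    """Return True if the quote appears in the document after normalization."""
    quote = normalize(source_quote)
    if not quote:
        # An empty quote would trivially match, so treat it as missing provenance.
        return False
    return quote in normalize(document_text)


# Hypothetical document text and result record, for illustration only.
document_text = "Invoice #1042\nTotal due:   $1,250.00\nPayment terms: net 30"
result = {
    "field": "total_amount",
    "value": 1250.00,
    "source_quote": "Total due: $1,250.00",
}

print(quote_in_document(result["source_quote"], document_text))
```

A quote that fails this check is a good candidate for manual review, although a mismatch can also come from differences in how the text was extracted from the original file.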
Processing speed
Extraction processes roughly 10 pages per minute.
| Documents × pages per document | Estimated time |
|---|---|
| 10 × 20 = 200 pages | ~20 minutes |
| 50 × 20 = 1,000 pages | ~100 minutes |
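The estimates in the table follow from dividing the total page count by the approximate rate of 10 pages per minute. A minimal sketch of that arithmetic, where the function name `estimate_minutes` is ours and the rate is the rough figure from this page rather than a guarantee:

```python
PAGES_PER_MINUTE = 10  # approximate rate stated on this page


def estimate_minutes(num_documents: int, pages_per_document: int) -> float:
    """Estimate run time in minutes from total pages at the approximate rate."""
    if num_documents < 0 or pages_per_document < 0:
        raise ValueError("counts must be non-negative")
    total_pages = num_documents * pages_per_document
    return total_pages / PAGES_PER_MINUTE


print(estimate_minutes(10, 20))  # 200 pages
print(estimate_minutes(50, 20))  # 1,000 pages
```

Actual times will vary with document content, so treat the result as a planning figure.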
Limits
| Limit | Value |
|---|---|
| Documents per run (folder) | 50 |
| Fields per schema | 50 |
| Items extracted per document | 5,000 |
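If you have more documents than a single run allows, one way to plan ahead is to group them into batches of at most 50 before creating runs. The sketch below is only a planning aid: `split_into_runs` and the document IDs are hypothetical, and it does not call any product API.

```python
from typing import List, Sequence

MAX_DOCUMENTS_PER_RUN = 50  # from the Limits table


def split_into_runs(
    document_ids: Sequence[str],
    max_per_run: int = MAX_DOCUMENTS_PER_RUN,
) -> List[List[str]]:
    """Group document IDs into consecutive batches no larger than max_per_run."""
    if max_per_run < 1:
        raise ValueError("max_per_run must be at least 1")
    return [
        list(document_ids[i:i + max_per_run])
        for i in range(0, len(document_ids), max_per_run)
    ]


# 120 hypothetical document IDs split into runs of 50, 50, and 20.
doc_ids = [f"doc-{n:03d}" for n in range(120)]
runs = split_into_runs(doc_ids)
print([len(run) for run in runs])
```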