Extraction Reference
This page is the complete reference for extraction schemas, runs, and limits. It documents all extraction options, field types, status values, and system limits.

Field types:

| Type | Description | Examples and notes |
|---|---|---|
| Text | Free-form text of any length | Names, descriptions, quotes |
| Number | Integer or decimal values | 2024, 3.14, -5 |
| Boolean | True/false value | Displays as "Yes" or "No" |
| Enum | One choice from predefined list | Must define at least 2 values |
| Object | Group of related sub-fields | Address with city, state, zip |
| Array | List of multiple values | ["value1", "value2"] |

Field options:

| Option | Description |
|---|---|
| Required | Field must have a value (empty not allowed) |
| Description | Field-specific guidance for the AI |

Schema visibility:

| Visibility | Who can see | Who can edit |
|---|---|---|
| Private | Only you | Only you |
| Shared | Your team | Only you |
| Public | All users | Only you |
| Template | All users | Administrators only |

Users who are not administrators cannot edit or delete template schemas; they can only clone them.
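
The visibility rules above can be read as two small permission checks. The following is a minimal sketch, not the product's implementation: the function names and the `is_owner`, `same_team`, and `is_admin` flags are hypothetical, and "Only you" is assumed to mean the schema's owner.

```python
def can_view(visibility, is_owner=False, same_team=False):
    """Who can see a schema, following the visibility table."""
    if visibility == "Private":
        return is_owner
    if visibility == "Shared":
        return is_owner or same_team
    if visibility in ("Public", "Template"):
        return True
    raise ValueError(f"Unknown visibility: {visibility}")


def can_edit(visibility, is_owner=False, is_admin=False):
    """Who can edit a schema, following the visibility table."""
    if visibility == "Template":
        # Assumption: only administrators edit templates, even if
        # another user originally created the schema.
        return is_admin
    if visibility in ("Private", "Shared", "Public"):
        return is_owner
    raise ValueError(f"Unknown visibility: {visibility}")
```

Under these assumptions a teammate can view a Shared schema but cannot edit it, which matches the "Only you" entries in the table.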

Run status values:

| Status | Description | Actions available |
|---|---|---|
| Queued | Waiting to start processing | Cancel |
| Running | Processing documents (shows X/Y progress) | Cancel |
| Complete | Successfully finished | View results, Export CSV, Delete |
| Failed | Error occurred during processing | View error, Delete |
| Cancelled | Stopped by user | View partial results, Delete |
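
For scripts that track runs, the status table can be mirrored as a lookup. This is an illustrative sketch only: `RunStatus`, `AVAILABLE_ACTIONS`, and the helper functions are hypothetical names, and the action strings are copied from the table rather than from any API.

```python
from enum import Enum


class RunStatus(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    COMPLETE = "Complete"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Actions copied from the status table; not an official API.
AVAILABLE_ACTIONS = {
    RunStatus.QUEUED: ["Cancel"],
    RunStatus.RUNNING: ["Cancel"],
    RunStatus.COMPLETE: ["View results", "Export CSV", "Delete"],
    RunStatus.FAILED: ["View error", "Delete"],
    RunStatus.CANCELLED: ["View partial results", "Delete"],
}


def is_finished(status):
    """A run is treated as finished once it can no longer be cancelled."""
    return "Cancel" not in AVAILABLE_ACTIONS[status]


def can_export_csv(status):
    """Per the table, CSV export is listed only for Complete runs."""
    return "Export CSV" in AVAILABLE_ACTIONS[status]
```

Note that Cancelled runs list only partial results, so this sketch does not treat them as exportable.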

Result columns:

| Column | Description | Default visible |
|---|---|---|
| Document | Source document name | Yes |
| [Schema fields] | One column per field | Yes |
| Source Quote | Text passage used for extraction | Yes |
| Chunk | Section/chunk identifier | No |

Field validation rules:

| Rule | Error message |
|---|---|
| Field name required | "Field name cannot be empty" |
| No reserved prefixes | "Field name cannot start with '_' or '$'" |
| Reserved names blocked | "[name] is a reserved field name" |
| Enum minimum | "Enum must have at least 2 values" |
| Enum uniqueness | "Duplicate enum value" |

Reserved field names (cannot be used): id, created_at, updated_at, _provenance, _meta
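
To check field definitions before creating a schema, the validation rules can be approximated locally. This is a minimal sketch, assuming the rules are applied in the order shown; `validate_field` and its parameters are hypothetical, and the real validator may order or combine messages differently (for example, for `_meta`, which is both reserved and underscore-prefixed).

```python
RESERVED_NAMES = {"id", "created_at", "updated_at", "_provenance", "_meta"}
RESERVED_PREFIXES = ("_", "$")


def validate_field(name, field_type="Text", enum_values=None):
    """Return the list of error messages for one field definition."""
    errors = []
    if not name or not name.strip():
        errors.append("Field name cannot be empty")
    elif name in RESERVED_NAMES:
        # Assumption: the reserved-name check runs before the prefix check.
        errors.append(f"{name} is a reserved field name")
    elif name.startswith(RESERVED_PREFIXES):
        errors.append("Field name cannot start with '_' or '$'")

    if field_type == "Enum":
        values = enum_values or []
        if len(values) < 2:
            errors.append("Enum must have at least 2 values")
        if len(set(values)) != len(values):
            errors.append("Duplicate enum value")
    return errors
```

For example, `validate_field("status", "Enum", ["open", "closed"])` returns an empty list, while `validate_field("$total")` returns the reserved-prefix message.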

CSV export format:

| Data type | CSV representation |
|---|---|
| Text | Plain text |
| Number | Numeric value |
| Boolean | true or false |
| Enum | Selected value |
| Array | JSON string: ["val1","val2"] |
| Object | Flattened: Field.Subfield |
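
The CSV mapping above can be reproduced with a short flattening routine, which is useful when comparing an export against source data. This is a sketch under stated assumptions: `flatten_record` and `to_csv` are hypothetical helpers, missing values are assumed to export as empty cells, and the actual exporter may order columns differently.

```python
import csv
import io
import json


def flatten_record(record, prefix=""):
    """Flatten one extracted record into {column name: CSV cell text}."""
    row = {}
    for name, value in record.items():
        column = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Object: one column per sub-field, named Field.Subfield.
            row.update(flatten_record(value, column))
        elif isinstance(value, bool):
            # Boolean: true or false. Checked before numbers because
            # bool is a subclass of int in Python.
            row[column] = "true" if value else "false"
        elif isinstance(value, list):
            # Array: compact JSON string such as ["val1","val2"].
            row[column] = json.dumps(value, separators=(",", ":"))
        elif value is None:
            row[column] = ""  # assumption: missing values are empty cells
        else:
            row[column] = str(value)
    return row


def to_csv(records):
    """Write flattened records to CSV text, one column per flattened key."""
    rows = [flatten_record(r) for r in records]
    columns = []
    for row in rows:
        for column in row:
            if column not in columns:
                columns.append(column)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because array cells contain commas and quotes, the `csv` module quotes them on output; a spreadsheet or CSV parser restores the original JSON string.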

System limits:

| Limit | Value |
|---|---|
| Documents per folder extraction | 50 |
| Items extracted per document | 5,000 |
| Fields per schema | 50 |
| Schema nesting depth | 5 levels |
| Schema definition size | 64 KB |
| Minimum enum values | 2 |
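
A schema and document set can be checked against these limits before a run is started. The sketch below is illustrative only: the field layout (a list of dicts with an optional `fields` key for sub-fields) is hypothetical, the field count is assumed to apply to top-level fields, and 64 KB is assumed to mean 65,536 bytes of JSON-encoded schema.

```python
import json

MAX_DOCUMENTS_PER_RUN = 50
MAX_FIELDS_PER_SCHEMA = 50
MAX_NESTING_DEPTH = 5
MAX_SCHEMA_BYTES = 64 * 1024  # assumption: 64 KB = 65,536 bytes


def nesting_depth(fields):
    """Depth of a list of field dicts; a flat schema has depth 1."""
    child_depths = [nesting_depth(f["fields"]) for f in fields if f.get("fields")]
    return 1 + max(child_depths, default=0)


def check_limits(fields):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(fields) > MAX_FIELDS_PER_SCHEMA:
        problems.append(f"{len(fields)} fields exceeds {MAX_FIELDS_PER_SCHEMA}")
    depth = nesting_depth(fields)
    if depth > MAX_NESTING_DEPTH:
        problems.append(f"nesting depth {depth} exceeds {MAX_NESTING_DEPTH}")
    size = len(json.dumps(fields).encode("utf-8"))
    if size > MAX_SCHEMA_BYTES:
        problems.append(f"schema size {size} bytes exceeds {MAX_SCHEMA_BYTES}")
    return problems


def batch_documents(documents, size=MAX_DOCUMENTS_PER_RUN):
    """Split a document list into batches that fit the per-run limit."""
    return [documents[i:i + size] for i in range(0, len(documents), size)]
```

For a folder of 120 documents, `batch_documents` yields batches of 50, 50, and 20, each of which fits within one folder extraction.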

Extraction processes approximately 10 pages per minute, so the estimated run time in minutes is roughly the total page count divided by 10.

| Total pages | Estimated time |
|---|---|
| 100 pages | ~10 minutes |
| 500 pages | ~50 minutes |
| 1,000 pages | ~100 minutes |
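
The estimates in the table follow directly from the approximate rate of 10 pages per minute. A small sketch of the arithmetic, with hypothetical helper names and no allowance for schema complexity or queue time:

```python
PAGES_PER_MINUTE = 10  # approximate rate stated on this page


def estimate_minutes(total_pages, pages_per_minute=PAGES_PER_MINUTE):
    """Rough processing time in minutes for a given page count."""
    return total_pages / pages_per_minute


def format_estimate(total_pages):
    """Render the estimate as hours and minutes for display."""
    minutes = round(estimate_minutes(total_pages))
    hours, remainder = divmod(minutes, 60)
    if hours:
        return f"~{hours} h {remainder} min"
    return f"~{remainder} min"
```

With these helpers, `format_estimate(1000)` gives `~1 h 40 min`, the same figure as the 100 minutes listed in the table.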

Validation errors:

| Error | Solution |
|---|---|
| "Field name cannot be empty" | Enter a name for the field |
| "Field name cannot start with '_' or '$'" | Rename the field so it does not start with an underscore or a dollar sign |
| "Enum must have at least 2 values" | Add more values or change to Text |
| "Duplicate enum value" | Remove the duplicate |

Common causes of failed runs:

| Cause | Solution |
|---|---|
| Network/server issues | Try starting a new run |
| Corrupted documents | Remove problem documents |
| Very large documents | Split documents or increase timeout |

Common causes of empty or missing values:

| Cause | Solution |
|---|---|
| Information not in document | Expected behavior |
| Unclear field name | Make the field name more specific |
| Information in unexpected format | Add extraction guidance in the field's Description |

Processing time depends on document size and schema complexity. Long run times are normal for large documents with many fields.