Moongraph

Structured Extraction

How Moongraph extracts structured data from documents using schemas.

Structured extraction lets you pull specific information out of documents automatically. Instead of reading through hundreds of pages to find details, you define what you're looking for and the system finds and organizes it for you.

What extraction does

You define a schema that describes what to extract: field names such as "Author Name" or "Publication Date", each with a type. You then run extraction on selected documents, and the system produces a spreadsheet-like table of results.

Example: You have 50 research papers. You want to extract author names, publication dates, and key findings from each. Extraction processes all 50 and gives you a structured table you can filter, sort, and export.

Key concepts

Schemas

A schema is a template describing what to extract:

  • Field names — What to call each piece of data
  • Field types — What kind of data it is (text, number, enum, etc.)
  • Instructions — Optional guidance for accuracy

Schemas can be reused across multiple extraction runs.
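
To make the three parts of a schema concrete, here is a minimal Python sketch. It is only an illustration of the idea: the `Field` and `Schema` classes, their attribute names, and the example instructions are assumptions made for this page, not Moongraph's actual schema format or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical model of one schema field (not Moongraph's real format)."""
    name: str                            # what to call the piece of data
    type: str                            # kind of data, e.g. "text", "number", "enum"
    instructions: Optional[str] = None   # optional guidance for accuracy


@dataclass
class Schema:
    """Hypothetical model of a schema: a named list of fields."""
    name: str
    fields: List[Field]


# Example schema for the research-paper scenario described earlier.
paper_schema = Schema(
    name="Research paper summary",
    fields=[
        Field("Author Name", "text"),
        Field("Publication Date", "text",
              instructions="Use the date printed on the paper."),
        Field("Key Findings", "text",
              instructions="Summarize in one or two sentences."),
    ],
)

# Because the schema is just a template, the same object could be
# passed to any number of extraction runs.
field_names = [f.name for f in paper_schema.fields]
print(field_names)
```

Keeping instructions optional per field mirrors the description above: a field needs only a name and a type, and guidance is added where accuracy matters.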

Runs

A run is an extraction job. You select a schema and a set of documents, and the system processes each document in the background. Results appear as each document finishes.

Processing runs at roughly 10 pages per minute. A run of 50 documents with 20 pages each (about 1,000 pages in total) therefore takes about 100 minutes.
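
The estimate above is simple arithmetic, and a small helper can reproduce it for other document sets. This is a rough sketch: the function name `estimate_minutes` is made up for illustration, and the 10 pages per minute figure is the approximate rate quoted above, so actual run times may differ.

```python
PAGES_PER_MINUTE = 10  # approximate throughput quoted in this section


def estimate_minutes(page_counts, pages_per_minute=PAGES_PER_MINUTE):
    """Estimate run time in minutes from a list of per-document page counts."""
    return sum(page_counts) / pages_per_minute


# 50 documents of 20 pages each: 1,000 pages in total.
minutes = estimate_minutes([20] * 50)
print(minutes)  # 100.0
```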

Provenance

Every extracted value includes provenance — a direct quote from the source document showing where the data came from. This lets you:

  • Verify accuracy
  • Check original context
  • Cite sources

Field types

Type    | What it captures                  | Use case
Text    | Free-form text                    | Names, descriptions, quotes
Number  | Numeric values                    | Years, counts, measurements
Boolean | Yes/No answer                     | "Is peer reviewed?", "Contains tables?"
Enum    | One choice from a predefined list | Categories, ratings, classifications
Object  | Grouped sub-fields                | Addresses (city, state, zip)
Array   | List of multiple values           | Multiple authors, keywords
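
As a rough way to picture the six field types, the sketch below maps each one to a Python value check. The lowercase type names and the `matches_type` function are assumptions chosen for illustration. They are not Moongraph identifiers, and the real system may validate values differently.

```python
def matches_type(value, field_type, choices=None):
    """Return True if value fits the given (hypothetical) field type name."""
    if field_type == "text":
        return isinstance(value, str)
    if field_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type == "enum":
        # Exactly one choice from the predefined list.
        return value in (choices or [])
    if field_type == "object":
        # Grouped sub-fields, e.g. {"city": ..., "state": ..., "zip": ...}.
        return isinstance(value, dict)
    if field_type == "array":
        return isinstance(value, list)
    raise ValueError(f"unknown field type: {field_type}")


print(matches_type(2021, "number"))                      # True
print(matches_type("Draft", "enum", ["Final", "Draft"]))  # True
print(matches_type("Other", "enum", ["Final", "Draft"]))  # False
```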

Row granularity

What determines one row in results?

Extraction infers what counts as a row from your schema. For example, with the fields "Author", "Book", and "Character":

  • Flat fields → One row per character (author/book repeated)
  • Characters as Array → One row per book (characters listed together)
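
The difference between the two layouts can be shown with plain Python data. This is a sketch of the resulting table shapes only: the `rows_flat` and `rows_array` functions are hypothetical and do not represent how the extraction engine works internally.

```python
# One book with two characters, used to compare the two row layouts.
book = {
    "Author": "Jane Austen",
    "Book": "Pride and Prejudice",
    "Characters": ["Elizabeth Bennet", "Fitzwilliam Darcy"],
}


def rows_flat(record):
    """Flat fields: one row per character, with author and book repeated."""
    return [
        {"Author": record["Author"], "Book": record["Book"], "Character": c}
        for c in record["Characters"]
    ]


def rows_array(record):
    """Characters as Array: one row per book, characters listed together."""
    return [dict(record)]


flat = rows_flat(book)
grouped = rows_array(book)
print(len(flat))     # 2
print(len(grouped))  # 1
```

The flat layout suits filtering and sorting by character, while the array layout keeps one row per book and is easier to scan when the book is the unit of interest.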

Use extraction instructions to be explicit:

"Extract one row for each book. List all characters in the Characters field."

Visibility

Control who can use your schema:

Level    | Who can see it
Private  | Only you
Shared   | Your team
Public   | All users
Template | Admin-created, available to everyone, cannot be edited

When to use extraction

Extraction works well for:

  • Pulling consistent fields from similar documents (research papers, contracts, reports)
  • Building datasets from document collections
  • Extracting facts for analysis or comparison

Extraction is less suited for:

  • Documents with highly variable structure
  • Very open-ended information needs
  • One-off questions (use Agent instead)
