Moongraph

Supported Formats

File types, sizes, and format requirements for document upload.

Moongraph supports several document formats. This page lists the supported file types and size limits, and gives tips for best results.

Supported File Types

| Format | Extension | Notes |
| --- | --- | --- |
| PDF | .pdf | Text extraction with layout awareness |
| Word | .doc, .docx | Converted and processed |
| Excel | .xls, .xlsx | Converted and processed |
| PowerPoint | .ppt, .pptx | Converted and processed |
| PNG | .png | OCR for text extraction |
| JPEG | .jpg, .jpeg | OCR for text extraction |
| TIFF | .tiff, .tif | OCR for text extraction |
| GIF | .gif | OCR for text extraction |
| WebP | .webp | OCR for text extraction |
| Text | .txt | Direct text ingestion |
| Markdown | .md | Parsed as text content |
| HTML | .html, .htm | Parsed and extracted |
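
If you want to pre-sort files before uploading, the table can be expressed as a small lookup. This is a minimal sketch, not part of Moongraph: the group labels and the `processing_path` helper are hypothetical names chosen for illustration, and only the extensions come from the table.

```python
from pathlib import Path
from typing import Optional

# Extensions are copied from the table of supported file types.
# The group labels are illustrative, not Moongraph terminology.
PROCESSING_PATHS = {
    "pdf": {".pdf"},
    "office": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"},
    "image_ocr": {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".gif", ".webp"},
    "text": {".txt", ".md"},
    "html": {".html", ".htm"},
}


def processing_path(filename: str) -> Optional[str]:
    """Return the group a filename falls into, or None if its extension is not listed."""
    ext = Path(filename).suffix.lower()  # compare case-insensitively
    for group, extensions in PROCESSING_PATHS.items():
        if ext in extensions:
            return group
    return None
```

The check looks only at the extension, so a misnamed file will still be classified by its name rather than its contents.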

PDF Documents

PDFs are the most common format and work well with Moongraph.

Text-Based PDFs

PDFs with embedded text (created from Word documents, exported from software, etc.) process quickly and accurately.

Scanned PDFs

PDFs containing scanned images require OCR. Quality depends on:

  • Scan resolution: 300 DPI minimum recommended
  • Clarity: Sharp, clean scans work best
  • Orientation: Properly oriented pages process better

Scanned documents take longer to process and may have OCR errors. If accuracy is critical, review the extracted content.
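
To judge whether a scan meets the 300 DPI recommendation, divide its pixel dimensions by the physical page size in inches. The sketch below is plain arithmetic, assuming you know the paper size that was scanned; the function names are hypothetical.

```python
RECOMMENDED_DPI = 300  # minimum recommended for scanned documents


def effective_dpi(width_px: int, height_px: int,
                  width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixel densities."""
    return min(width_px / width_in, height_px / height_in)


def meets_recommended_dpi(width_px: int, height_px: int,
                          width_in: float, height_in: float) -> bool:
    """True if the scan reaches the recommended density in both directions."""
    return effective_dpi(width_px, height_px, width_in, height_in) >= RECOMMENDED_DPI
```

For example, a US Letter page (8.5 x 11 inches) needs at least 2550 x 3300 pixels to reach 300 DPI; a 1275 x 1650 scan of the same page is only 150 DPI.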

Password-Protected PDFs

Password-protected PDFs will fail to process. Remove password protection before uploading.
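
One way to catch protected files before uploading is to look for an `/Encrypt` entry, which encrypted PDFs carry in their trailer. The sketch below is a rough heuristic using only the standard library, not how Moongraph detects protection; `looks_encrypted` is a hypothetical helper, and a dedicated PDF library would give a more reliable answer.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer.

    This can report a false positive if the literal text "/Encrypt" happens
    to appear elsewhere in an unencrypted file.
    """
    return b"/Encrypt" in pdf_bytes
```

To check a file on disk, pass its contents, for example `looks_encrypted(Path("contract.pdf").read_bytes())` with `Path` imported from `pathlib`.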

Images

Images (PNG, JPEG, TIFF, GIF, WebP) are processed using OCR to extract text.

Best Practices for Images

  • Resolution: Higher is better. 300 DPI minimum for documents.
  • Clarity: Avoid blurry or low-contrast images
  • Format: PNG preserves quality better than JPEG for text documents
  • Content: Works best with printed/typed text. Handwriting is less reliable.

Limitations

  • Very large images may be resized for processing
  • Complex layouts (multi-column, mixed text/graphics) may not extract perfectly
  • Handwritten content has lower accuracy than printed text

Text Files

Plain text and Markdown files are ingested directly without parsing overhead.

  • Fast to process
  • No OCR needed
  • What you upload is what gets indexed

File Size Limits

| Constraint | Limit |
| --- | --- |
| Maximum file size | 35 MB per file |
| Maximum pages | No hard limit, but very large documents (500+ pages) are slower |
| Bulk upload | Up to 50 files at once |

Very large documents (500+ pages, or files close to the 35 MB limit) may time out or fail. Consider splitting them into smaller sections.
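
The limits in the table can be checked locally before an upload. This is a minimal sketch under two assumptions: that 1 MB means 1024 * 1024 bytes (the page does not say), and that you already have each file's name and size; `check_batch` is a hypothetical helper, not a Moongraph API.

```python
from typing import List, Tuple

MAX_FILE_BYTES = 35 * 1024 * 1024  # 35 MB per file; binary megabytes assumed
MAX_BATCH_FILES = 50               # bulk upload limit


def check_batch(files: List[Tuple[str, int]]) -> List[str]:
    """Return a list of problems for (name, size_in_bytes) pairs; empty if none."""
    problems = []
    if len(files) > MAX_BATCH_FILES:
        problems.append(f"batch has {len(files)} files; limit is {MAX_BATCH_FILES}")
    for name, size in files:
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 35 MB limit")
    return problems
```

Sizes can be gathered with `os.path.getsize` or `Path.stat().st_size`. The page count is not checked here because reading it depends on the file format.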

Unsupported Formats

Currently not supported:

  • Audio files
  • Video files

URL import and audio transcription are planned features.

If you need support for additional formats, contact cole@moongraph.io.

Tips for Best Results

Before Uploading

  1. Remove passwords from protected PDFs
  2. Convert Office documents to PDF
  3. Use descriptive filenames: they help with organization and metadata
  4. Check scan quality for image-based documents

Document Quality

  • Clear, readable text extracts better than blurry or faded content
  • Consistent formatting helps the chunking algorithm
  • Standard fonts are more reliable than decorative ones

Large Collections

  • Upload in batches of 20-50 documents
  • Monitor the first batch for any format issues
  • Failed documents don't block others in the queue
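
Splitting a large collection into upload-sized groups is simple to script. The sketch below only does the grouping; the upload itself is left out because it depends on how you send files to Moongraph. The `batches` helper is a hypothetical name.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With 120 files and the default size, this yields groups of 50, 50, and 20. Uploading the first group alone and checking the results before continuing follows the advice above.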

Troubleshooting

"Document failed to process"

Common causes:

  • Password protection
  • Corrupted file
  • Unsupported format disguised as PDF
  • Very large file timing out

Try: Remove protection, re-download the original, convert format, or split into smaller files.
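
A file with the wrong extension can often be spotted by its first few bytes. The sketch below compares a file's leading bytes against the well-known signatures of several supported formats. It is an illustration, not Moongraph's own validation, and `sniff_format` is a hypothetical helper.

```python
from typing import Optional

# Standard leading byte signatures for some of the supported formats.
SIGNATURES = {
    "pdf": (b"%PDF-",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "tiff": (b"II*\x00", b"MM\x00*"),
}


def sniff_format(head: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file, or return None if unknown."""
    for name, prefixes in SIGNATURES.items():
        if head.startswith(prefixes):  # startswith accepts a tuple of prefixes
            return name
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None
```

Reading the first 16 bytes of a file is enough for these checks. If a file named `report.pdf` does not sniff as `pdf`, it is likely a different format with a misleading extension. Some valid PDFs carry a few stray bytes before the `%PDF-` header, so treat a mismatch as a prompt to inspect the file rather than as proof.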

"OCR quality is poor"

Try:

  • Higher resolution scan (300+ DPI)
  • Cleaner original document
  • Better lighting/contrast if re-scanning
  • PNG instead of JPEG for screenshots

"Text looks garbled"

Possible causes:

  • Non-standard fonts
  • Complex layouts
  • Poor quality scan

The extracted content may still be usable for semantic search even if not perfectly formatted.
