Supported Formats
File types, sizes, and format requirements for document upload.
Moongraph supports several document formats. This page covers what's supported, size limits, and tips for best results.
Supported File Types
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text extraction with layout awareness |
| Word | .doc, .docx | Converted and processed |
| Excel | .xls, .xlsx | Converted and processed |
| PowerPoint | .ppt, .pptx | Converted and processed |
| PNG | .png | OCR for text extraction |
| JPEG | .jpg, .jpeg | OCR for text extraction |
| TIFF | .tiff, .tif | OCR for text extraction |
| GIF | .gif | OCR for text extraction |
| WebP | .webp | OCR for text extraction |
| Text | .txt | Direct text ingestion |
| Markdown | .md | Parsed as text content |
| HTML | .html, .htm | Parsed and extracted |
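For a local pre-upload check, the table above can be transcribed into a lookup. This is a minimal sketch, not part of Moongraph: the function name `processing_path` and the category labels are hypothetical, and matching extensions case-insensitively is an assumption about how uploads are classified.

```python
from pathlib import Path
from typing import Optional

# Extension -> processing path, transcribed from the supported-types table.
# The category labels are illustrative, not Moongraph identifiers.
PROCESSING_BY_EXTENSION = {
    ".pdf": "text extraction",
    ".doc": "conversion", ".docx": "conversion",
    ".xls": "conversion", ".xlsx": "conversion",
    ".ppt": "conversion", ".pptx": "conversion",
    ".png": "ocr", ".jpg": "ocr", ".jpeg": "ocr",
    ".tiff": "ocr", ".tif": "ocr", ".gif": "ocr", ".webp": "ocr",
    ".txt": "direct text",
    ".md": "direct text",
    ".html": "html parsing", ".htm": "html parsing",
}


def processing_path(filename: str) -> Optional[str]:
    """Return the expected processing path, or None if the extension is not listed."""
    # Assumption: extensions are compared case-insensitively.
    return PROCESSING_BY_EXTENSION.get(Path(filename).suffix.lower())
```

A call such as `processing_path("scan.tif")` returns the OCR category, while an unlisted extension such as `.mp3` returns `None`, so unsupported files can be filtered out before upload.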
PDF Documents
PDFs are the most common format and work well with Moongraph.
Text-Based PDFs
PDFs with embedded text (created from Word documents, exported from software, etc.) process quickly and accurately.
Scanned PDFs
PDFs containing scanned images require OCR. Quality depends on:
- Scan resolution: 300 DPI minimum recommended
- Clarity: Sharp, clean scans work best
- Orientation: Properly oriented pages process better
Scanned documents take longer to process and may have OCR errors. If accuracy is critical, review the extracted content.
Password-Protected PDFs
Password-protected PDFs will fail to process. Remove password protection before uploading.
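Protected PDFs can often be spotted locally before upload. The sketch below relies on the fact that an encrypted PDF declares an `/Encrypt` entry in its trailer; it is only a heuristic, and the function names are hypothetical. It can give false positives (for example, a PDF that merely contains that byte sequence), and it also flags PDFs that restrict permissions without requiring a password to open. Whether Moongraph rejects those is not stated here.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer."""
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Read a file from disk and apply the same heuristic."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read())
```

A dedicated PDF library gives a more reliable answer; this check is only meant to catch the obvious cases without extra dependencies.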
Images
Images (PNG, JPEG, TIFF, GIF, WebP) are processed using OCR to extract text.
Best Practices for Images
- Resolution: Higher is better. 300 DPI minimum for documents.
- Clarity: Avoid blurry or low-contrast images
- Format: PNG preserves quality better than JPEG for text documents
- Content: Works best with printed/typed text. Handwriting is less reliable.
Limitations
- Very large images may be resized for processing
- Complex layouts (multi-column, mixed text/graphics) may not extract perfectly
- Handwritten content has lower accuracy than printed text
Text Files
Plain text and Markdown files are ingested directly without parsing overhead.
- Fast to process
- No OCR needed
- What you upload is what gets indexed
File Size Limits
| Constraint | Limit |
|---|---|
| Maximum file size | 35 MB per file |
| Maximum pages | No hard limit, but very large documents (500+ pages) are slower |
| Bulk upload | Up to 50 files at once |
Very large documents (500+ pages, or files approaching the 35 MB limit) may time out or fail. Consider splitting them into smaller sections.
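The limits in the table can be checked before an upload is attempted. This is a minimal sketch under stated assumptions: `check_limits` is a hypothetical helper, and "35 MB" is read as 35 × 1024 × 1024 bytes, which may differ from how the service counts megabytes.

```python
from typing import Dict, List

# Assumption: "35 MB" is interpreted as 35 MiB. Adjust if the service uses 10^6 bytes.
MAX_FILE_BYTES = 35 * 1024 * 1024
MAX_FILES_PER_UPLOAD = 50


def check_limits(sizes: Dict[str, int]) -> List[str]:
    """Return a list of problems for a planned upload, given {filename: size in bytes}."""
    problems = []
    if len(sizes) > MAX_FILES_PER_UPLOAD:
        problems.append(
            f"{len(sizes)} files exceeds the bulk limit of {MAX_FILES_PER_UPLOAD}"
        )
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            size_mb = size / (1024 * 1024)
            problems.append(f"{name} is {size_mb:.1f} MB, over the 35 MB limit")
    return problems
```

The sizes can be gathered with `os.path.getsize(path)` for each file. An empty result means the upload fits within both limits.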
Unsupported Formats
Currently not supported:
- Audio files
- Video files
URL import and audio transcription are planned features.
If you need support for additional formats, contact cole@moongraph.io.
Tips for Best Results
Before Uploading
- Remove passwords from protected PDFs
- Convert Office documents to PDF
- Use descriptive filenames: they help with organization and metadata
- Check scan quality for image-based documents
Document Quality
- Clear, readable text extracts better than blurry or faded content
- Consistent formatting helps the chunking algorithm
- Standard fonts are more reliable than decorative ones
Large Collections
- Upload in batches of 20-50 documents
- Monitor the first batch for any format issues
- Failed documents don't block others in the queue
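Splitting a large collection into upload-sized groups can be sketched as follows. The helper name `batches` is hypothetical, and the default of 50 simply mirrors the bulk upload limit; a smaller value such as 20 works the same way.

```python
from typing import Iterator, List, Sequence


def batches(paths: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` paths, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

Uploading the first group on its own and reviewing the results before sending the rest follows the advice above about monitoring the first batch.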
Troubleshooting
"Document failed to process"
Common causes:
- Password protection
- Corrupted file
- Unsupported format disguised as PDF
- Very large file timing out
Try: remove the password protection, re-download the original, convert the file to a supported format, or split it into smaller files.
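A file with a `.pdf` extension that is not actually a PDF can be detected by looking for the PDF signature. The sketch below is a local check, not Moongraph's own validation, and `has_pdf_header` is a hypothetical name. It accepts the `%PDF-` marker anywhere in the first 1024 bytes, which is a lenient choice that some PDF readers also make; a strict check would require it at byte 0.

```python
def has_pdf_header(data: bytes) -> bool:
    """True if the PDF signature appears within the first 1024 bytes."""
    return b"%PDF-" in data[:1024]


def file_has_pdf_header(path: str) -> bool:
    """Read only the start of the file and check for the signature."""
    with open(path, "rb") as handle:
        return has_pdf_header(handle.read(1024))
```

A renamed Word or ZIP file, for instance, starts with `PK` rather than `%PDF-` and fails this check.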
"OCR quality is poor"
Try:
- Higher resolution scan (300+ DPI)
- Cleaner original document
- Better lighting/contrast if re-scanning
- PNG instead of JPEG for screenshots
"Text looks garbled"
Possible causes:
- Non-standard fonts
- Complex layouts
- Poor quality scan
The extracted content may still be usable for semantic search even if not perfectly formatted.