OCR and Parsing Pipelines

Parse Engine

| Mode | Meaning | Best for |
| --- | --- | --- |
| local_ocr | Parse PDF locally, then run OCR as needed | General default mode |
| remote_ocr | Remote OCR-first pipeline | OCR-quality-first scenarios |
| baidu_doc | Baidu document parsing pipeline | Structured parsing |
| mineru_cloud | MinerU cloud parsing pipeline | Tables, formulas, and complex layouts |

OCR Provider

| Provider | Meaning | Notes |
| --- | --- | --- |
| aiocr | Remote OpenAI-compatible OCR | Good for higher-quality OCR |
| tesseract | Local Tesseract | Fewer external dependencies |
| paddle_local | Local PaddleOCR | Fully local setup |
| baidu | Baidu OCR | Standalone provider |

AIOCR Chain

| Mode | Meaning | Characteristic |
| --- | --- | --- |
| direct | Send the whole page directly to a vision model | Simplest setup |
| layout_block | Split into local blocks, then recognize block by block | Better for dense mixed layouts |
| doc_parser | Structured document parsing channel | Better structural information |

Scanned Page Mode

| Mode | Meaning | Result |
| --- | --- | --- |
| fullpage | Keep the page as a background and overlay editable text | Safest and closest to the original |
| segmented | Split charts, screenshots, and other image regions into separate objects when possible | More editable afterwards |

For the highest chance of a stable first run, start with:

  • remote_ocr
  • aiocr
  • fullpage

Then adjust based on your needs:

  • try segmented when you want more editable image regions
  • try baidu_doc or mineru_cloud when structural parsing matters more
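
As a rough illustration of the starting combination and the adjustments above, the sketch below models the four option groups as a small validated settings object. The option values come from the tables on this page; the class name `PipelineOptions`, its field names, and the `direct` default for the AIOCR chain are assumptions made for this example and are not the project's actual configuration keys.

```python
from dataclasses import dataclass, replace

# Allowed values are copied from the tables on this page.
PARSE_ENGINES = {"local_ocr", "remote_ocr", "baidu_doc", "mineru_cloud"}
OCR_PROVIDERS = {"aiocr", "tesseract", "paddle_local", "baidu"}
AIOCR_CHAINS = {"direct", "layout_block", "doc_parser"}
SCANNED_PAGE_MODES = {"fullpage", "segmented"}


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical settings holder; field names are illustrative only."""

    parse_engine: str = "remote_ocr"      # recommended starting engine
    ocr_provider: str = "aiocr"           # recommended starting provider
    aiocr_chain: str = "direct"           # assumption: the simplest chain
    scanned_page_mode: str = "fullpage"   # recommended starting page mode

    def __post_init__(self) -> None:
        # Reject any value that is not listed in the corresponding table.
        checks = [
            ("parse_engine", self.parse_engine, PARSE_ENGINES),
            ("ocr_provider", self.ocr_provider, OCR_PROVIDERS),
            ("aiocr_chain", self.aiocr_chain, AIOCR_CHAINS),
            ("scanned_page_mode", self.scanned_page_mode, SCANNED_PAGE_MODES),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(
                    f"{name}={value!r} is not one of {sorted(allowed)}"
                )


# Stable first run: remote_ocr + aiocr + fullpage.
baseline = PipelineOptions()

# Adjustment 1: more editable image regions.
editable = replace(baseline, scanned_page_mode="segmented")

# Adjustment 2: structural parsing matters more.
structured = replace(baseline, parse_engine="mineru_cloud")
```

Because `replace` builds a new object through the constructor, each adjusted variant is validated again, so a mistyped mode name fails early instead of reaching the pipeline.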

Boundaries

This project is better understood as a high-fidelity reconstruction tool for scanned and image-heavy documents. It should not be interpreted as:

  • a guarantee that any PDF becomes a fully structured, fully editable native PPT
  • a promise that complex documents work well without OCR or parsing configuration
  • a system where every page becomes easier to edit than the original source

MIT Licensed