
OCR

SIE supports OCR via four dedicated models plus Florence-2’s <OCR> task:

  • OCR models (zai-org/GLM-OCR, lightonai/LightOnOCR-2-1B, PaddlePaddle/PaddleOCR-VL-1.5, docling). Convert document images or PDFs to Markdown, preserving tables and headings.
  • Florence-2 (microsoft/Florence-2-base). Flat-text OCR via the <OCR> and <OCR_WITH_REGION> task tokens.

For image captioning, object detection, and document QA, see Vision Tasks.

Pick by what you need to extract: structured Markdown for downstream chunking → one of the four dedicated OCR models; flat text or bounding-box OCR over a single image → Florence-2.

For converting document images or PDFs to Markdown, use one of the four dedicated OCR models. They preserve tables, headings, and reading order; Florence-2’s <OCR> task only returns flat text.

| Model | Input | Best for | Notes |
|---|---|---|---|
| zai-org/GLM-OCR | Image | High-quality multilingual OCR | CogViT + GLM-0.5B; bfloat16 only |
| lightonai/LightOnOCR-2-1B | Image | Larger model, 2.1B params | Pixtral encoder + Qwen3 decoder |
| PaddlePaddle/PaddleOCR-VL-1.5 | Image | 109 languages, multi-mode (table/formula/chart) | 0.9B params; smallest |
| docling | Document (PDF/DOCX/HTML) | Multi-page documents, layout-aware | Composite pipeline; OCR is opt-in |
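The selection guidance above can be condensed into a small routing helper. This is a sketch of one possible policy, not part of sie_sdk: `choose_ocr_model` and its `want` values are hypothetical names, the suffix set only covers the formats named in the table, and using GLM-OCR as the default image model is an arbitrary pick among the three image models.

```python
from pathlib import Path

# Whole-document formats this page lists for docling
DOCUMENT_SUFFIXES = {".pdf", ".docx", ".html"}


def choose_ocr_model(path: str, want: str = "markdown") -> tuple[str, dict]:
    """Return (model_id, options) under one illustrative routing policy.

    want: "markdown" (structured output), "text" (flat text), or
    "regions" (text plus bounding boxes).
    """
    if want not in {"markdown", "text", "regions"}:
        raise ValueError(f"unsupported output kind: {want!r}")
    if Path(path).suffix.lower() in DOCUMENT_SUFFIXES:
        # Whole documents go to docling; scanned PDFs also need {"profile": "ocr"}
        return "docling", {}
    if want == "text":
        return "microsoft/Florence-2-base", {"task": "<OCR>"}
    if want == "regions":
        return "microsoft/Florence-2-base", {"task": "<OCR_WITH_REGION>"}
    # Single image, structured Markdown: GLM-OCR here, but LightOnOCR-2-1B
    # or PaddleOCR-VL-1.5 would slot in the same way
    return "zai-org/GLM-OCR", {}
```

The returned pair maps onto `client.extract(model_id, item, options=options)`; the examples on this page omit `options` entirely for default behavior, so pass it only when the dict is non-empty.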

GLM-OCR returns a single Markdown string per page in entities[0].text.

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

with open("page.png", "rb") as f:
    page_bytes = f.read()

result = client.extract(
    "zai-org/GLM-OCR",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]
print(markdown)
```
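Since GLM-OCR returns one Markdown string per page, a multi-page scan takes one `extract` call per page image. The sketch below covers only the merging step: `join_pages` is a hypothetical helper, the stand-in dicts mimic the documented `entities[0].text` shape rather than real output, and the `---` separator between pages is an arbitrary choice.

```python
def join_pages(results: list[dict], separator: str = "\n\n---\n\n") -> str:
    """Concatenate per-page Markdown from a list of extract() results.

    Each result is assumed to carry its page text in entities[0]["text"].
    """
    pages = []
    for i, result in enumerate(results):
        entities = result.get("entities") or []
        if not entities:
            raise ValueError(f"page {i} returned no entities")
        pages.append(entities[0]["text"].strip())
    return separator.join(pages)


# Stand-in results shaped like the GLM-OCR response (not real model output)
fake_results = [
    {"entities": [{"text": "# Page 1\n\nHello"}]},
    {"entities": [{"text": "## Page 2\n\nWorld\n"}]},
]
print(join_pages(fake_results))
```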

LightOnOCR-2-1B uses the same call shape as GLM-OCR: one image per item, with Markdown returned in entities[0].text.

```python
# Reuses client, Item, and page_bytes from the GLM-OCR example
result = client.extract(
    "lightonai/LightOnOCR-2-1B",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]
```
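Both image models take the same `{"data": ..., "format": ...}` dict, so reading files can be factored out. In this sketch `image_payload` is a hypothetical helper, and deriving `format` from the lowercase file extension is an assumption: only `"png"` appears on this page, so check which other format names SIE accepts.

```python
from pathlib import Path


def image_payload(path: str) -> dict:
    """Read an image file into the dict passed inside Item(images=[...])."""
    p = Path(path)
    fmt = p.suffix.lower().lstrip(".")
    if not fmt:
        raise ValueError(f"cannot infer image format from {path!r}")
    # Assumption: the lowercase extension ("png", ...) is a valid format name
    return {"data": p.read_bytes(), "format": fmt}
```

For example, `Item(images=[image_payload("page.png")])` replaces the manual `open`/`read` step in the examples above.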

PaddleOCR-VL supports six task modes via options.task: ocr (default), table, formula, chart, spotting, seal.

```python
# Default OCR mode
result = client.extract(
    "PaddlePaddle/PaddleOCR-VL-1.5",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]

# Table-extraction mode (table.png is a placeholder path)
with open("table.png", "rb") as f:
    table_image = f.read()
result = client.extract(
    "PaddlePaddle/PaddleOCR-VL-1.5",
    Item(images=[{"data": table_image, "format": "png"}]),
    options={"task": "table"},
)
```
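A typo in `options.task` is easy to make, so it can help to validate the mode before calling the server. `paddle_options` is a hypothetical helper that checks against the six modes listed above and returns an empty dict for the default `ocr` mode; it makes no claim about how the server itself handles an unknown task.

```python
# Task modes this page lists for PaddlePaddle/PaddleOCR-VL-1.5
PADDLE_TASKS = ("ocr", "table", "formula", "chart", "spotting", "seal")


def paddle_options(task: str = "ocr") -> dict:
    """Build the options dict for a PaddleOCR-VL extract call, rejecting typos early."""
    task = task.strip().lower()
    if task not in PADDLE_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {PADDLE_TASKS}")
    # "ocr" is the default mode, so it needs no options
    return {} if task == "ocr" else {"task": task}
```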

Docling parses entire PDF/DOCX/HTML files in one call, preserving layout, tables, and headings. Output goes to data (not entities):

| Field | Type | Description |
|---|---|---|
| text | str | Plain-text rendering |
| markdown | str | Markdown with tables and headings preserved |
| document | dict | Full DoclingDocument JSON for downstream chunkers |
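The `markdown` field is convenient for heading-based chunking. The sketch below is a deliberately simple splitter written for illustration, not Docling's own chunker (for that, the `document` JSON is the intended input). `split_by_headings` is a hypothetical name, and it does not special-case `#` lines inside fenced code blocks.

```python
import re

# ATX headings: one to six '#' characters followed by whitespace
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _has_content(lines: list[str]) -> bool:
    return any(line.strip() for line in lines)


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into sections that start at ATX headings.

    Text before the first heading becomes a section whose heading is None.
    """
    sections = []
    heading, lines = None, []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the running section before starting a new one
            if heading is not None or _has_content(lines):
                sections.append({"heading": heading, "text": "\n".join(lines).strip()})
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if heading is not None or _has_content(lines):
        sections.append({"heading": heading, "text": "\n".join(lines).strip()})
    return sections
```

Applied to a docling response, this would be `split_by_headings(result["data"]["markdown"])`.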

Docling ships two profiles:

| Profile | What it does | When to use |
|---|---|---|
| default | Layout + table-structure parsing; uses embedded text from PDFs | Born-digital PDFs and DOCX/HTML (fastest) |
| ocr | Same as default, plus OCR on rasterized pages | Scanned PDFs or image-only documents |

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

with open("report.pdf", "rb") as f:
    pdf_bytes = f.read()

# Default profile: fast, no OCR (born-digital PDFs only)
result = client.extract(
    "docling",
    Item(document={"data": pdf_bytes, "format": "pdf"}),
)
markdown = result["data"]["markdown"]

# OCR profile: needed for scanned PDFs
result_ocr = client.extract(
    "docling",
    Item(document={"data": pdf_bytes, "format": "pdf"}),
    options={"profile": "ocr"},
)
markdown_ocr = result_ocr["data"]["markdown"]
```
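Running the fast default profile first and falling back to `ocr` only when it finds almost no text is one way to avoid paying for OCR on born-digital files. The check below is a heuristic sketch: `looks_scanned` and the 20-character threshold are assumptions, and a PDF that mixes digital and scanned pages can pass it while still missing text.

```python
def looks_scanned(result: dict, min_chars: int = 20) -> bool:
    """Heuristic: a default-profile docling result with almost no text is likely a scan."""
    text = result.get("data", {}).get("text") or ""
    # Born-digital PDFs carry embedded text; image-only pages yield little or none
    return len(text.strip()) < min_chars
```

In practice, `if looks_scanned(result):` rerun the same call with `options={"profile": "ocr"}`.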

For flat-text OCR without layout, the microsoft/Florence-2-base model exposes an <OCR> task token:

```python
with open("page.png", "rb") as f:
    document_image = f.read()

result = client.extract(
    "microsoft/Florence-2-base",
    Item(images=[{"data": document_image, "format": "png"}]),
    options={"task": "<OCR>"},
)
for entity in result["entities"]:
    print(entity["text"])  # extracted text from the document image
```
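When the `<OCR>` task yields more than one entity, the pieces can be collapsed into a single string. This is a minimal sketch; `florence_text` is a hypothetical helper, and joining with newlines is an arbitrary choice.

```python
def florence_text(result: dict) -> str:
    """Join the non-empty text of every entity returned by Florence-2's <OCR> task."""
    return "\n".join(
        entity["text"].strip()
        for entity in result.get("entities", [])
        if entity.get("text", "").strip()
    )
```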

To get text with bounding box positions (the default task):

```python
result = client.extract(
    "microsoft/Florence-2-base",
    Item(images=[{"data": document_image, "format": "png"}]),
    options={"task": "<OCR_WITH_REGION>"},
)
for entity in result["entities"]:
    print(f"{entity['text']} at {entity['bbox']}")
```
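Region output makes it possible to rebuild a rough reading order. The sketch below assumes `entity["bbox"]` is `[x1, y1, x2, y2]` in pixels with a top-left origin; this page does not state the bbox format (Florence-2 natively emits four-point quadrilaterals for this task), so verify it first. `reading_order` and `line_tolerance` are hypothetical names, and grouping boxes by vertical center is a simple heuristic that ignores multi-column layouts.

```python
def reading_order(entities: list[dict], line_tolerance: float = 10.0) -> list[str]:
    """Group <OCR_WITH_REGION> entities into lines, top-to-bottom then left-to-right.

    Assumes entity["bbox"] is [x1, y1, x2, y2] with the origin at the top-left.
    """
    def center_y(entity: dict) -> float:
        _, y1, _, y2 = entity["bbox"]
        return (y1 + y2) / 2

    lines: list[list[dict]] = []  # entities with similar vertical centers
    for entity in sorted(entities, key=center_y):
        if lines and abs(center_y(entity) - center_y(lines[-1][-1])) <= line_tolerance:
            lines[-1].append(entity)
        else:
            lines.append([entity])
    # Within each line, order by left edge (x1) and join the texts
    return [" ".join(e["text"] for e in sorted(line, key=lambda e: e["bbox"][0])) for line in lines]
```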

See Vision Tasks for the full list of Florence-2 task tokens.
