OCR

SIE supports OCR via four dedicated models plus Florence-2’s <OCR> task:

OCR models (zai-org/GLM-OCR, lightonai/LightOnOCR-2-1B, PaddlePaddle/PaddleOCR-VL-1.5, docling). Convert document images or PDFs to Markdown, preserving tables and headings.
Florence-2 (microsoft/Florence-2-base). Flat-text OCR via the <OCR> and <OCR_WITH_REGION> task tokens.

For image captioning, object detection, and document QA, see Vision Tasks.

Pick by what you need to extract: structured Markdown for downstream chunking → one of the four dedicated OCR models; flat text or bounding-box OCR over a single image → Florence-2.
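
The selection rule above can be sketched as a small lookup. The `pick_ocr_model` helper and its `need` labels are hypothetical names introduced here for illustration and are not part of the SIE SDK; only the model IDs come from this page. Defaulting to GLM-OCR for single images is an arbitrary choice in this sketch.

```python
# Hypothetical helper (not part of the SIE SDK): maps an extraction need
# to one of the model IDs listed on this page.
MARKDOWN_MODELS = (
    "zai-org/GLM-OCR",
    "lightonai/LightOnOCR-2-1B",
    "PaddlePaddle/PaddleOCR-VL-1.5",
    "docling",
)
FLORENCE_MODEL = "microsoft/Florence-2-base"


def pick_ocr_model(need: str, multi_page: bool = False) -> str:
    """Return a model ID for the given need.

    need: "markdown" (structured output), "flat_text", or "regions".
    multi_page: True for whole PDF/DOCX/HTML files rather than one image.
    """
    if need == "markdown":
        # docling is the only listed model that accepts whole documents;
        # for single images this sketch arbitrarily picks the first entry.
        return "docling" if multi_page else MARKDOWN_MODELS[0]
    if need in ("flat_text", "regions"):
        return FLORENCE_MODEL
    raise ValueError(f"unknown need: {need!r}")
```

For example, `pick_ocr_model("markdown", multi_page=True)` selects `docling`, while `pick_ocr_model("regions")` selects Florence-2.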

OCR Models

To convert document images or PDFs to Markdown, use one of the four dedicated OCR models. They preserve tables, headings, and reading order, whereas Florence-2’s <OCR> task returns only flat text.

| Model | Input | Best for | Notes |
| --- | --- | --- | --- |
| `zai-org/GLM-OCR` | Image | High-quality multilingual OCR | CogViT + GLM-0.5B; bfloat16 only |
| `lightonai/LightOnOCR-2-1B` | Image | Larger model, 1B params | Pixtral encoder + Qwen3 decoder |
| `PaddlePaddle/PaddleOCR-VL-1.5` | Image | 109 languages, multi-mode (table/formula/chart) | 0.9B params; smallest |
| `docling` | Document (PDF/DOCX/HTML) | Multi-page documents, layout-aware | Composite pipeline; OCR is opt-in |

GLM-OCR

GLM-OCR returns a single Markdown string per page in entities[0].text.

Python

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Read the page image as raw bytes
with open("page.png", "rb") as f:
    page_bytes = f.read()

# One image per item; the page's Markdown comes back in entities[0]
result = client.extract(
    "zai-org/GLM-OCR",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]
print(markdown)

TypeScript

import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");

const result = await client.extract(
  "zai-org/GLM-OCR",
  { images: [pageBytes] },  // Uint8Array of PNG/JPEG data
);
const markdown = result.entities[0].text;
console.log(markdown);

await client.close();
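
Because each call returns the Markdown for a single page, a multi-page scan has to be stitched together on the client side. The following is a minimal sketch of that step, assuming each result is dict-like as in the Python example above; the helper names `page_markdown` and `join_pages` and the `---` page separator are illustrative choices, not SDK features.

```python
from typing import Any, Mapping, Sequence


def page_markdown(result: Mapping[str, Any]) -> str:
    """Return entities[0]["text"], or "" when the result has no entities."""
    entities = result.get("entities") or []
    return entities[0].get("text", "") if entities else ""


def join_pages(
    results: Sequence[Mapping[str, Any]],
    separator: str = "\n\n---\n\n",
) -> str:
    """Concatenate per-page Markdown in page order, skipping empty pages."""
    pages = [page_markdown(r).strip() for r in results]
    return separator.join(p for p in pages if p)
```

The `results` list would come from calling `client.extract("zai-org/GLM-OCR", ...)` once per page image, in page order.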

LightOnOCR-2-1B

Same call shape as GLM-OCR: one image per item, Markdown returned in entities[0].text.

Python

# Reuses client, Item, and page_bytes from the GLM-OCR example
result = client.extract(
    "lightonai/LightOnOCR-2-1B",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]

TypeScript

const result = await client.extract(
  "lightonai/LightOnOCR-2-1B",
  { images: [pageBytes] },
);
const markdown = result.entities[0].text;

PaddleOCR-VL-1.5

PaddleOCR-VL supports six task modes via options.task: ocr (default), table, formula, chart, spotting, seal.

Python

# Reuses client, Item, and page_bytes from the GLM-OCR example

# Default OCR mode
result = client.extract(
    "PaddlePaddle/PaddleOCR-VL-1.5",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]

# Table-extraction mode (table_image: raw PNG bytes of a cropped table)
result = client.extract(
    "PaddlePaddle/PaddleOCR-VL-1.5",
    Item(images=[{"data": table_image, "format": "png"}]),
    options={"task": "table"},
)
table_markdown = result["entities"][0]["text"]

TypeScript

const result = await client.extract(
  "PaddlePaddle/PaddleOCR-VL-1.5",
  { images: [pageBytes] },
  { options: { task: "table" } },  // or "ocr", "formula", "chart", "spotting", "seal"
);
const markdown = result.entities[0].text;
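
A typo in `options.task` is easy to make, so it can help to validate the mode before sending the request. This is a minimal client-side sketch: the `paddle_options` helper is hypothetical, and whether the server also rejects unknown task names is not stated on this page. The six mode names are the ones listed above.

```python
# Task modes listed on this page for PaddlePaddle/PaddleOCR-VL-1.5.
PADDLE_TASKS = ("ocr", "table", "formula", "chart", "spotting", "seal")


def paddle_options(task: str = "ocr") -> dict:
    """Build the options dict for PaddleOCR-VL, rejecting unknown modes."""
    if task not in PADDLE_TASKS:
        raise ValueError(
            f"unsupported task {task!r}; expected one of {PADDLE_TASKS}"
        )
    return {"task": task}
```

The returned dict can be passed as `options=paddle_options("table")` in the Python call shown above.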

Docling (multi-page documents)

Docling parses entire PDF/DOCX/HTML files in one call, preserving layout, tables, and headings. Output goes to data (not entities):

| Field | Type | Description |
| --- | --- | --- |
| `text` | `str` | Plain-text rendering |
| `markdown` | `str` | Markdown with tables and headings preserved |
| `document` | `dict` | Full DoclingDocument JSON for downstream chunkers |

Docling ships two profiles:

| Profile | What it does | When to use |
| --- | --- | --- |
| `default` | Layout + table-structure parsing; uses embedded text from PDFs | Born-digital PDFs and DOCX/HTML (fastest) |
| `ocr` | Same as default + runs OCR on rasterized pages | Scanned PDFs or images-only documents |

Python

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Read the whole document as raw bytes
with open("report.pdf", "rb") as f:
    pdf_bytes = f.read()

# Default profile: fast, no OCR (born-digital PDFs only)
result = client.extract(
    "docling",
    Item(document={"data": pdf_bytes, "format": "pdf"}),
)
markdown = result["data"]["markdown"]

# OCR profile: needed for scanned PDFs
result_ocr = client.extract(
    "docling",
    Item(document={"data": pdf_bytes, "format": "pdf"}),
    options={"profile": "ocr"},
)
markdown_ocr = result_ocr["data"]["markdown"]

TypeScript

import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");

const result = await client.extract(
  "docling",
  { document: { data: pdfBytes, format: "pdf" } },
);
const markdown = result.data.markdown;

// OCR profile: needed for scanned PDFs
const resultOcr = await client.extract(
  "docling",
  { document: { data: pdfBytes, format: "pdf" } },
  { options: { profile: "ocr" } },
);

await client.close();
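
When it is not known in advance whether a PDF is born-digital or scanned, one option is to try the fast `default` profile first and retry with `ocr` only if little text comes back. The sketch below shows that pattern under stated assumptions: the `docling_with_ocr_fallback` helper, the `run` callback, and the `min_chars` threshold are all hypothetical, and it assumes a scanned PDF yields near-empty `data.markdown` under the default profile.

```python
from typing import Any, Callable, Mapping, Tuple

Result = Mapping[str, Any]


def docling_with_ocr_fallback(
    run: Callable[[str], Result],
    min_chars: int = 50,
) -> Tuple[Result, str]:
    """Try the default profile first; retry with OCR if little text returns.

    run(profile) performs one docling call for the named profile
    ("default" or "ocr") and returns the result.
    Returns the chosen result and the profile that produced it.
    """
    result = run("default")
    markdown = (result.get("data") or {}).get("markdown") or ""
    # Assumption: a scanned PDF produces almost no text without OCR.
    if len(markdown.strip()) >= min_chars:
        return result, "default"
    return run("ocr"), "ocr"
```

Here `run` would wrap the `client.extract("docling", ...)` calls from the example above, omitting `options` for `"default"` and passing `options={"profile": "ocr"}` for `"ocr"`. The threshold of 50 characters is arbitrary and should be tuned to the documents at hand.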

Florence-2 OCR (Text from Images)

For flat-text OCR without layout, the microsoft/Florence-2-base model exposes an <OCR> task token:

Python

# Reuses client and Item from the GLM-OCR example;
# document_image holds the raw PNG bytes of the page
result = client.extract(
    "microsoft/Florence-2-base",
    Item(images=[{"data": document_image, "format": "png"}]),
    options={"task": "<OCR>"}
)

# Print the text extracted from the document image
for entity in result["entities"]:
    print(entity["text"])

TypeScript

const result = await client.extract(
  "microsoft/Florence-2-base",
  { images: [documentImage] },  // Uint8Array of PNG data
  { options: { task: "<OCR>" } }
);

for (const entity of result.entities) {
  console.log(entity.text);
}

OCR with Regions

To get text with bounding-box positions, use the <OCR_WITH_REGION> task token (the default task):

Python

# Reuses client, Item, and document_image from the <OCR> example
result = client.extract(
    "microsoft/Florence-2-base",
    Item(images=[{"data": document_image, "format": "png"}]),
    options={"task": "<OCR_WITH_REGION>"}
)

# Each entity carries the recognized text and its bounding box
for entity in result["entities"]:
    print(f"{entity['text']} at {entity['bbox']}")

TypeScript

const result = await client.extract(
  "microsoft/Florence-2-base",
  { images: [documentImage] },
  { options: { task: "<OCR_WITH_REGION>" } }
);

for (const entity of result.entities) {
  console.log(`${entity.text} at ${JSON.stringify(entity.bbox)}`);
}
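
Region results are not guaranteed to arrive in reading order, so a downstream step may want to sort them by position. The following is a rough sketch, assuming `bbox` is a flat list of pixel coordinates alternating x and y (which covers both `[x0, y0, x1, y1]` boxes and 8-value quadrilaterals); the actual `bbox` shape should be checked against a real response. The helper names and the `line_tolerance` bucket size are illustrative.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def top_left(bbox: Sequence[float]) -> Tuple[float, float]:
    """Return (min_y, min_x) for a flat [x, y, x, y, ...] coordinate list."""
    return (min(bbox[1::2]), min(bbox[0::2]))


def in_reading_order(
    entities: Sequence[Mapping[str, Any]],
    line_tolerance: float = 10.0,
) -> List[Mapping[str, Any]]:
    """Sort entities top-to-bottom, then left-to-right within a line.

    Boxes whose top edges round to the same multiple of line_tolerance are
    treated as one line. This is a coarse heuristic, not layout analysis.
    """
    def key(entity: Mapping[str, Any]) -> Tuple[int, float]:
        y, x = top_left(entity["bbox"])
        return (round(y / line_tolerance), x)

    return sorted(entities, key=key)
```

Joining the sorted texts, for instance `" ".join(e["text"] for e in in_reading_order(result["entities"]))`, gives an approximate reading-order transcript for simple single-column pages.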

See Vision Tasks for the full list of Florence-2 task tokens.

What’s Next

Vision Tasks - image captioning, object detection, and document understanding
NER & Entity Extraction - named entity recognition
Relations & Classification - relation extraction and text classification
Full model catalog - all supported models