---
title: OCR
description: Convert document images and PDFs to Markdown with OCR models, plus flat-text OCR via Florence-2.
canonical_url: https://superlinked.com/docs/extract/ocr
last_updated: 2026-05-14
---

SIE supports OCR via four dedicated models plus Florence-2's `<OCR>` and `<OCR_WITH_REGION>` tasks:

- **OCR models** (`zai-org/GLM-OCR`, `lightonai/LightOnOCR-2-1B`, `PaddlePaddle/PaddleOCR-VL-1.5`, `docling`). Convert document images or PDFs to Markdown, preserving tables and headings.
- **Florence-2** (`microsoft/Florence-2-base`). Flat-text OCR via the `<OCR>` and `<OCR_WITH_REGION>` task tokens.

For image captioning, object detection, and document QA, see [Vision Tasks](/docs/extract/vision/).

Pick by what you need to extract. For structured Markdown to feed downstream chunking, use one of the four dedicated OCR models. For flat text or bounding-box OCR over a single image, use Florence-2.

## OCR Models

For converting document images or PDFs to Markdown, use one of the four dedicated OCR models. They preserve tables, headings, and reading order; Florence-2's `<OCR>` task only returns flat text.

| Model | Input | Best for | Notes |
|-------|-------|----------|-------|
| `zai-org/GLM-OCR` | Image | High-quality multilingual OCR | CogViT + GLM-0.5B; bfloat16 only |
| `lightonai/LightOnOCR-2-1B` | Image | Larger model, 1B params | Pixtral encoder + Qwen3 decoder |
| `PaddlePaddle/PaddleOCR-VL-1.5` | Image | 109 languages, multi-mode (table/formula/chart) | 0.9B params; smallest |
| `docling` | Document (PDF/DOCX/HTML) | Multi-page documents, layout-aware | Composite pipeline; OCR is opt-in |

:::note
Quality and latency benchmarks for these models are still in progress and will be published when the eval-matrix work in [sie-internal#578](https://github.com/superlinked/sie-internal/issues/578) lands. The recommendations above reflect input shape and feature differences, not measured performance.
:::
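The "Input" column above can be turned into a small routing helper. This is a minimal sketch, not part of the SDK: `pick_ocr_model` and both suffix sets are hypothetical names, and the image suffix list is an assumption based on the PNG/JPEG formats used in the examples on this page.

```python
from pathlib import Path

# Document formats routed to docling, per the "Input" column of the table.
DOCUMENT_SUFFIXES = {".pdf", ".docx", ".html"}
# Assumed raster formats; this page only shows PNG and JPEG inputs.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pick_ocr_model(path: str, image_model: str = "zai-org/GLM-OCR") -> str:
    """Return the model ID to use for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in DOCUMENT_SUFFIXES:
        return "docling"
    if suffix in IMAGE_SUFFIXES:
        # Any of the three image-input models fits here; the caller chooses.
        return image_model
    raise ValueError(f"Unsupported input type: {suffix or path!r}")


print(pick_ocr_model("report.pdf"))
print(pick_ocr_model("scan.PNG", image_model="PaddlePaddle/PaddleOCR-VL-1.5"))
```

The default image model is an arbitrary choice for the sketch, since no benchmark data is available yet to rank the three.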

### GLM-OCR

Source: [packages/sie_server/src/sie_server/adapters/glm_ocr/__init__.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/adapters/glm_ocr/__init__.py)

GLM-OCR returns a single Markdown string per page in `entities[0].text`.

#### Python

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

with open("page.png", "rb") as f:
    page_bytes = f.read()

result = client.extract(
    "zai-org/GLM-OCR",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]
print(markdown)
```
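Because the model returns one Markdown string per page image, a multi-page scan can be handled by calling `extract` once per page and joining the results. The sketch below assumes each call returns at least one entity, as in the example above. `ocr_pages` and `item_factory` are hypothetical names; pass `Item` from `sie_sdk.types` as `item_factory` to build items the same way as the example.

```python
def ocr_pages(client, page_images, model="zai-org/GLM-OCR", item_factory=dict):
    """OCR each page image separately and join the Markdown in page order.

    `page_images` is a list of PNG byte strings. `item_factory` builds the
    request item from keyword arguments (for example, `Item` from the SDK).
    """
    parts = []
    for page_bytes in page_images:
        item = item_factory(images=[{"data": page_bytes, "format": "png"}])
        result = client.extract(model, item)
        # Same access pattern as the single-page example.
        parts.append(result["entities"][0]["text"])
    # A blank line between pages keeps adjacent Markdown blocks from merging.
    return "\n\n".join(parts)
```

Usage would look like `ocr_pages(client, [page1_bytes, page2_bytes], item_factory=Item)`.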

#### TypeScript

```typescript
import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");

const result = await client.extract(
  "zai-org/GLM-OCR",
  { images: [pageBytes] },  // Uint8Array of PNG/JPEG data
);
const markdown = result.entities[0].text;
console.log(markdown);

await client.close();
```

### LightOnOCR-2-1B

Source: [packages/sie_server/src/sie_server/adapters/lighton_ocr/__init__.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/adapters/lighton_ocr/__init__.py)

Same call shape as GLM-OCR: one image per item, Markdown returned in `entities[0].text`.

#### Python

```python
# Reuses `client`, `Item`, and `page_bytes` from the GLM-OCR example.
result = client.extract(
    "lightonai/LightOnOCR-2-1B",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]
```

#### TypeScript

```typescript
const result = await client.extract(
  "lightonai/LightOnOCR-2-1B",
  { images: [pageBytes] },
);
const markdown = result.entities[0].text;
```

### PaddleOCR-VL-1.5

Source: [packages/sie_server/src/sie_server/adapters/paddleocr_vl/__init__.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/adapters/paddleocr_vl/__init__.py)

PaddleOCR-VL supports six task modes via `options.task`: `ocr` (default), `table`, `formula`, `chart`, `spotting`, `seal`.

#### Python

```python
# Reuses `client`, `Item`, and `page_bytes` from the GLM-OCR example.
# Default OCR mode
result = client.extract(
    "PaddlePaddle/PaddleOCR-VL-1.5",
    Item(images=[{"data": page_bytes, "format": "png"}]),
)
markdown = result["entities"][0]["text"]

# Table-extraction mode; `table_image` holds the PNG bytes of an image containing a table
result = client.extract(
    "PaddlePaddle/PaddleOCR-VL-1.5",
    Item(images=[{"data": table_image, "format": "png"}]),
    options={"task": "table"},
)
```
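Checking the task name on the client side catches typos before a request is sent. This is a minimal sketch: `PADDLE_TASKS` and `paddle_options` are hypothetical helpers, and only the six mode names come from this page.

```python
# The six task modes listed above for PaddleOCR-VL.
PADDLE_TASKS = ("ocr", "table", "formula", "chart", "spotting", "seal")


def paddle_options(task: str = "ocr") -> dict:
    """Build the `options` dict for PaddleOCR-VL, rejecting unknown task names."""
    if task not in PADDLE_TASKS:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {', '.join(PADDLE_TASKS)}"
        )
    return {"task": task}


# Intended use, following the call shape of the example above:
# client.extract("PaddlePaddle/PaddleOCR-VL-1.5", item, options=paddle_options("formula"))
print(paddle_options("table"))
```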

#### TypeScript

```typescript
const result = await client.extract(
  "PaddlePaddle/PaddleOCR-VL-1.5",
  { images: [pageBytes] },
  { options: { task: "table" } },  // or "ocr", "formula", "chart", "spotting", "seal"
);
const markdown = result.entities[0].text;
```

### Docling (multi-page documents)

Source: [packages/sie_server/src/sie_server/adapters/docling/__init__.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/adapters/docling/__init__.py)

Docling parses entire PDF/DOCX/HTML files in one call, preserving layout, tables, and headings. Output goes to `data` (not `entities`):

| Field | Type | Description |
|-------|------|-------------|
| `text` | `str` | Plain-text rendering |
| `markdown` | `str` | Markdown with tables and headings preserved |
| `document` | `dict` | Full DoclingDocument JSON for downstream chunkers |

Docling ships two profiles:

| Profile | What it does | When to use |
|---------|--------------|-------------|
| `default` | Layout + table-structure parsing; uses embedded text from PDFs | Born-digital PDFs and DOCX/HTML (fastest) |
| `ocr` | Same as default + runs OCR on rasterized pages | Scanned PDFs or image-only documents |

#### Python

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

with open("report.pdf", "rb") as f:
    pdf_bytes = f.read()

# Default profile: fast, no OCR (born-digital PDFs only)
result = client.extract(
    "docling",
    Item(document={"data": pdf_bytes, "format": "pdf"}),
)
markdown = result["data"]["markdown"]

# OCR profile: needed for scanned PDFs
result_ocr = client.extract(
    "docling",
    Item(document={"data": pdf_bytes, "format": "pdf"}),
    options={"profile": "ocr"},
)
```
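When it is not known in advance whether a PDF is born-digital or scanned, one option is to try the default profile first and fall back to the `ocr` profile if little text comes back. The sketch below is a heuristic, not documented behavior: it assumes the default profile returns near-empty Markdown for a scanned PDF rather than raising, and `docling_markdown` and the `min_chars` threshold are hypothetical.

```python
def docling_markdown(client, item, min_chars=200):
    """Return (markdown, profile_used), preferring the faster default profile."""
    result = client.extract("docling", item)
    markdown = result["data"]["markdown"]
    # Enough embedded text was found, so the default profile is sufficient.
    if len(markdown.strip()) >= min_chars:
        return markdown, "default"
    # Little or no embedded text: assume a scanned document and re-run with OCR.
    result = client.extract("docling", item, options={"profile": "ocr"})
    return result["data"]["markdown"], "ocr"
```

The trade-off is a second request for scanned inputs. If most of your documents are scans, calling the `ocr` profile directly avoids that extra round trip.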

#### TypeScript

```typescript
import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");

const result = await client.extract(
  "docling",
  { document: { data: pdfBytes, format: "pdf" } },
);
const markdown = result.data.markdown;

// OCR profile: needed for scanned PDFs
const resultOcr = await client.extract(
  "docling",
  { document: { data: pdfBytes, format: "pdf" } },
  { options: { profile: "ocr" } },
);

await client.close();
```

## OCR (Text from Images)

Source: [packages/sie_server/src/sie_server/adapters/florence2/__init__.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/adapters/florence2/__init__.py)

For flat-text OCR without layout, the `microsoft/Florence-2-base` model exposes an `<OCR>` task token:

#### Python

```python
# Reuses `client` and `Item` from the examples above; `document_image` holds PNG bytes.
result = client.extract(
    "microsoft/Florence-2-base",
    Item(images=[{"data": document_image, "format": "png"}]),
    options={"task": "<OCR>"}
)

for entity in result["entities"]:
    print(entity["text"])
# Extracted text from the document image
```

#### TypeScript

```typescript
const result = await client.extract(
  "microsoft/Florence-2-base",
  { images: [documentImage] },  // Uint8Array of PNG data
  { options: { task: "<OCR>" } }
);

for (const entity of result.entities) {
  console.log(entity.text);
}
```

### OCR with Regions

To get text with bounding-box positions, use `<OCR_WITH_REGION>`, which is also the default task:

#### Python

```python
result = client.extract(
    "microsoft/Florence-2-base",
    Item(images=[{"data": document_image, "format": "png"}]),
    options={"task": "<OCR_WITH_REGION>"}
)

for entity in result["entities"]:
    print(f"{entity['text']} at {entity['bbox']}")
```

#### TypeScript

```typescript
const result = await client.extract(
  "microsoft/Florence-2-base",
  { images: [documentImage] },
  { options: { task: "<OCR_WITH_REGION>" } }
);

for (const entity of result.entities) {
  console.log(`${entity.text} at ${JSON.stringify(entity.bbox)}`);
}
```

:::caution
Do **not** pass task tokens like `<OCR>` via the `instruction` parameter. `instruction` appends free text to the task prompt, so a task token passed there produces an invalid prompt such as `<OCR_WITH_REGION><OCR>`. Use `options={"task": "<OCR>"}` instead.
:::
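A small guard can enforce this rule before a request is built. The sketch below is illustrative only: `florence_kwargs` and `TASK_TOKEN` are hypothetical, the regex assumes task tokens are uppercase letters and underscores in angle brackets, and it assumes `instruction` is passed to `extract` as a keyword argument.

```python
import re

# Matches Florence-2 style task tokens such as <OCR> or <OCR_WITH_REGION>.
TASK_TOKEN = re.compile(r"<[A-Z_]+>")


def florence_kwargs(task, instruction=None):
    """Build keyword arguments for `extract`, keeping task tokens out of `instruction`."""
    if not TASK_TOKEN.fullmatch(task):
        raise ValueError(f"{task!r} does not look like a task token")
    kwargs = {"options": {"task": task}}
    if instruction is not None:
        # A task token here would be appended to the prompt and make it invalid.
        if TASK_TOKEN.search(instruction):
            raise ValueError("Pass task tokens via options['task'], not instruction")
        kwargs["instruction"] = instruction
    return kwargs


print(florence_kwargs("<OCR>"))
```

The result can be splatted into the call, for example `client.extract("microsoft/Florence-2-base", item, **florence_kwargs("<OCR>"))`.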

See [Vision Tasks](/docs/extract/vision/#florence-2-task-prompts) for the full list of Florence-2 task tokens.

## What's Next

- [Vision Tasks](/docs/extract/vision/) - image captioning, object detection, and document understanding
- [NER & Entity Extraction](/docs/extract/) - named entity recognition
- [Relations & Classification](/docs/extract/relations/) - relation extraction and text classification
- [Full model catalog](/models#task=extract) - all supported models