
How to extract entities and structured data with SIE

SIE’s extract primitive pulls structured information from unstructured content. It handles named entity recognition (NER), relation extraction, text classification, and vision tasks such as captioning, object detection, and OCR. Models run on your own infrastructure, with no per-call API costs.

from sie_sdk import SIEClient
from sie_sdk.types import Item
client = SIEClient("http://localhost:8080")
text = Item(text="Apple CEO Tim Cook announced the iPhone 16 in Cupertino.")
result = client.extract(
    "urchade/gliner_multi-v2.1",
    text,
    labels=["person", "organization", "product", "location"],
)
for entity in result["entities"]:
    print(f"{entity['label']}: {entity['text']} (score: {entity['score']:.2f})")
# organization: Apple (score: 0.95)
# person: Tim Cook (score: 0.93)
# product: iPhone 16 (score: 0.89)
# location: Cupertino (score: 0.87)
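
Each entity carries a confidence score, so a common next step is to drop low-confidence matches. The sketch below works on a plain dict hand-copied from the sample output above (not a live call), assuming the result can be treated like a dict, as the `result["entities"]` subscripting suggests. The 0.9 cutoff is an arbitrary example value, not an SIE default.

```python
# Hand-copied from the sample output above (start/end omitted);
# in practice this would come from client.extract(...).
result = {
    "entities": [
        {"label": "organization", "text": "Apple", "score": 0.95},
        {"label": "person", "text": "Tim Cook", "score": 0.93},
        {"label": "product", "text": "iPhone 16", "score": 0.89},
        {"label": "location", "text": "Cupertino", "score": 0.87},
    ]
}


def confident_entities(result, min_score=0.9):
    """Keep entities whose score is at least min_score (an arbitrary example cutoff)."""
    return [e for e in result["entities"] if e["score"] >= min_score]


for entity in confident_entities(result):
    print(f"{entity['label']}: {entity['text']}")
# organization: Apple
# person: Tim Cook
```

A fixed threshold is the simplest policy; per-label thresholds are a natural refinement when some entity types score consistently lower.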

For model recommendations, see the full model catalog.


Item accepts three input modes depending on the model:

  • text — plain string. Used by GLiNER, GLiREL, GLiClass, and the rest of the text-only extractors.
  • images — list of image bytes (or {data, format} dicts in Python). Used by Florence-2, Donut, GroundingDINO, OWL-v2, and image-input OCR models like zai-org/GLM-OCR, lightonai/LightOnOCR-2-1B, and PaddlePaddle/PaddleOCR-VL-1.5. See Vision Tasks and OCR.
  • document — raw file bytes (PDF, DOCX, HTML, MD, TXT, RTF, ODT, PPTX, XLSX, CSV). Used by the multi-page docling parser. The Python SDK auto-detects the format from a path suffix; bytes-based callers pass format explicitly. See OCR → Docling.

GLiNER models support zero-shot NER: define any entity types you need at query time, with no predefined schema.

result = client.extract(
    "urchade/gliner_multi-v2.1",
    Item(text="The merger between Acme Corp and Beta Inc requires FTC approval."),
    labels=["company", "regulatory_body", "legal_action"],
)
for entity in result["entities"]:
    print(f"{entity['label']}: {entity['text']}")
# company: Acme Corp
# company: Beta Inc
# regulatory_body: FTC
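
Because labels are free-form, it is often convenient to bucket results by label. A minimal sketch over entity dicts hand-copied from the output above (score and offset fields omitted); any requested label with no match, such as legal_action in this example, is simply absent from the grouping.

```python
from collections import defaultdict

# Entities hand-copied from the zero-shot example output above.
entities = [
    {"label": "company", "text": "Acme Corp"},
    {"label": "company", "text": "Beta Inc"},
    {"label": "regulatory_body", "text": "FTC"},
]


def group_by_label(entities):
    """Collect entity texts under their label, preserving extraction order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


print(group_by_label(entities))
# {'company': ['Acme Corp', 'Beta Inc'], 'regulatory_body': ['FTC']}
```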

Each entity includes character positions for highlighting or downstream processing:

result = client.extract(
    "urchade/gliner_multi-v2.1",
    Item(text="Tim Cook works at Apple."),
    labels=["person", "organization"],
)
for entity in result["entities"]:
    print(f"{entity['label']}: '{entity['text']}' [{entity['start']}:{entity['end']}]")
# person: 'Tim Cook' [0:8]
# organization: 'Apple' [18:23]
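
The offsets follow Python slice semantics (`text[0:8]` is "Tim Cook"), so they can be used directly for highlighting. A minimal sketch on entity dicts hand-copied from the output above; the `[span](label)` markup is an arbitrary choice for illustration, and the function assumes spans do not overlap.

```python
text = "Tim Cook works at Apple."
# Hand-copied from the output above; start/end index into text.
entities = [
    {"label": "person", "text": "Tim Cook", "start": 0, "end": 8},
    {"label": "organization", "text": "Apple", "start": 18, "end": 23},
]


def highlight(text, entities):
    """Wrap each span as [span](label).

    Spans are processed right to left so that inserting markup never
    shifts the offsets of spans not yet processed. Assumes no overlaps.
    """
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = entity["start"], entity["end"]
        text = f"{text[:start]}[{text[start:end]}]({entity['label']}){text[end:]}"
    return text


print(highlight(text, entities))
# [Tim Cook](person) works at [Apple](organization).
```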

To process several texts in one call, pass a list of Items. Setting an id on each Item lets you match results back to their inputs:

documents = [
    Item(id="doc-1", text="Microsoft acquired Activision for $69 billion."),
    Item(id="doc-2", text="Sundar Pichai leads Google's AI initiatives."),
]
results = client.extract(
    "urchade/gliner_multi-v2.1",
    documents,
    labels=["person", "organization", "money"],
)
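
Assuming a batch call returns one result per input Item, each echoing the id you supplied in its id field, results can be keyed by document. In the sketch below, both that response shape and the entity lists are hypothetical placeholders written by hand, not real SIE output.

```python
# Hypothetical batch response: one result dict per input Item, each echoing its id.
# The entity lists are hand-written placeholders, not real model output.
results = [
    {"id": "doc-1", "entities": [{"label": "organization", "text": "Microsoft"}]},
    {"id": "doc-2", "entities": [{"label": "person", "text": "Sundar Pichai"}]},
]


def entities_by_doc(results):
    """Map each document id to the entity texts extracted from it."""
    return {r["id"]: [e["text"] for e in r.get("entities", [])] for r in results}


print(entities_by_doc(results))
# {'doc-1': ['Microsoft'], 'doc-2': ['Sundar Pichai']}
```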

The ExtractResult contains different fields depending on the extraction type used:

| Field | Type | When present |
|---|---|---|
| id | str or None | Always (None unless an id was provided in the input) |
| entities | list[Entity] | NER models (GLiNER) |
| relations | list[Relation] | Relation extraction (GLiREL) |
| classifications | list[Classification] | Classification models (GLiClass) |
| objects | list[DetectedObject] | Object detection (GroundingDINO, OWLv2) |
| data | dict | Document/composite extractors (Docling, Donut, document-mode Florence-2) |

Each Entity has the following fields:

| Field | Type | Description |
|---|---|---|
| text | str | Extracted text span |
| label | str | Entity type label |
| score | float | Confidence score from 0 to 1 |
| start | int | Start character offset (inclusive) |
| end | int | End character offset (exclusive) |
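
Since the populated fields depend on the model, downstream code usually checks which ones are present before processing. A minimal sketch, assuming results are plain dicts; the field names come from the table above, while the helper name and the sample result are hand-written for illustration. Empty lists are treated as absent.

```python
# Extraction fields from the ExtractResult table above (id is always present, so it is excluded).
EXTRACTION_FIELDS = ("entities", "relations", "classifications", "objects", "data")


def present_fields(result):
    """Return the extraction fields that are present and non-empty in an ExtractResult-shaped dict."""
    return [name for name in EXTRACTION_FIELDS if result.get(name)]


# Hand-written NER-style result for illustration.
ner_result = {
    "id": "doc-1",
    "entities": [{"text": "Apple", "label": "organization", "score": 0.95, "start": 0, "end": 5}],
}
print(present_fields(ner_result))
# ['entities']
```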

The server returns msgpack by default. To get JSON over HTTP, send an Accept: application/json header:

curl -X POST http://localhost:8080/v1/extract/urchade/gliner_multi-v2.1 \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"items": [{"text": "Tim Cook is the CEO of Apple."}],
"params": {"labels": ["person", "organization"]}
}'
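
The same JSON request can be made from Python's standard library when the SDK isn't available. A minimal sketch with urllib.request that mirrors the curl call above; build_extract_request is a hypothetical helper, not part of SIE, and actually sending the request requires a running server.

```python
import json
import urllib.request


def build_extract_request(base_url, model, texts, labels):
    """Build a JSON POST to /v1/extract/{model}, mirroring the curl example above."""
    body = {
        "items": [{"text": t} for t in texts],
        "params": {"labels": labels},
    }
    return urllib.request.Request(
        f"{base_url}/v1/extract/{model}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_extract_request(
    "http://localhost:8080",
    "urchade/gliner_multi-v2.1",
    ["Tim Cook is the CEO of Apple."],
    ["person", "organization"],
)
# Sending requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```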

See the HTTP API Reference.


Extraction is also available through SIE's framework integrations, not just the native SDK:

| Framework | Component | Returns |
|---|---|---|
| LangChain | SIEExtractor | Dict with entities, relations, classifications, objects |
| LlamaIndex | create_sie_extractor_tool | Dict with entities, relations, classifications, objects |
| Haystack | SIEExtractor | Typed outputs: Entity, Relation, Classification, DetectedObject |
| DSPy | SIEExtractor | dspy.Prediction with extraction fields |
| CrewAI | SIEExtractorTool | Formatted string with all extraction types |

What is zero-shot NER? Zero-shot NER means you can define your entity types at query time without fine-tuning a model. GLiNER models like urchade/gliner_multi-v2.1 accept arbitrary label strings and extract matching spans from text. There is no fixed list of entity types.

Does SIE support relation extraction? Yes. GLiREL models extract relationships between entities (for example, “CEO of”, “acquired by”). See Relations and Classification.

Can SIE extract data from PDFs and images? Yes. SIE supports four dedicated OCR models: zai-org/GLM-OCR, lightonai/LightOnOCR-2-1B, PaddlePaddle/PaddleOCR-VL-1.5, and docling (multi-page PDF/DOCX/HTML). They convert documents to Markdown while preserving tables and layout. Donut and Florence-2 are also available for image captioning and visual QA. See OCR and Vision Tasks.

Which model should I use for entity extraction? urchade/gliner_multi-v2.1 is a strong default for multilingual NER. It handles zero-shot extraction across 100+ languages. Browse all extraction models in the model catalog.
