Why did we open-source our inference engine? Read the post

← Home

Search

Uses: /encode · /score

Retrieval in two moves: /encode turns documents and queries into vectors for fast recall, then /score reranks the shortlist for precision. Pick a dense, sparse, or multi-vector embedder and pair it with a cross-encoder.

Featured models

Examples

End-to-end projects from our examples that put this task to work.

Featured picks are still being finalized. Latency, throughput and cost are real where we've benchmarked the model on the selected GPU; "—" means no measurement there. Cost is approximate — computed from list GPU prices; your actual price depends on the provider you deploy SIE with.

Open source inference for agents

Open-source inference for the models behind your agents. Run it yourself, or let us run it for you.

Github 2.1K

Contact us

Tell us about your use case and we'll get back to you shortly.

Apply for an inference grant

Free capacity on our hosted cluster for selected projects. Tell us what you run and we reply by email.