The open-source opportunity
We've built TB-scale search systems for years, always on open source models. Embeddings, rerankers, classifiers, OCR. No matter the domain or language, a good OSS model either exists or is a fine-tuning run away.
Hugging Face now adds 100,000 new models each month (source: Hugging Face Hub).
Even the largest proprietary models now have an equally capable open source alternative within months of launch. Yet most companies still spend billions on LLM APIs instead of running their own.
The multi-model problem
Real AI pipelines use many specialized models working together: dense and sparse embeddings, multi-vector representations like ColBERT, vision models like SigLIP, cross-encoder rerankers, NER, classification, OCR. A single document processing pipeline might chain four of these.
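As an illustrative sketch of that chaining (every stage function here is a hypothetical stand-in for a real model call, not an SIE API), a four-model document pipeline is just function composition:

```python
# Sketch of a four-model document pipeline: OCR -> NER -> embed -> rerank.
# Each stage is a toy stand-in for a real model, not an actual SIE call.

def ocr(page_bytes: bytes) -> str:
    return page_bytes.decode("utf-8")                # stand-in for an OCR model

def extract_entities(text: str) -> list[str]:
    return [w for w in text.split() if w.istitle()]  # stand-in for NER

def embed(text: str) -> list[float]:
    return [float(len(text))]                        # stand-in for a dense embedder

def rerank(query: str, docs: list[str]) -> list[str]:
    # stand-in for a cross-encoder: order docs by naive term overlap
    score = lambda d: len(set(query.split()) & set(d.split()))
    return sorted(docs, key=score, reverse=True)

def process(page_bytes: bytes, query: str) -> dict:
    text = ocr(page_bytes)
    return {
        "entities": extract_entities(text),
        "vector": embed(text),
        "ranked": rerank(query, [text]),
    }

result = process(b"Invoice from Acme Corp", "Acme invoice")
```

The point is structural: one request fans out across several small specialized models, so the serving layer needs all of them warm, not one.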
But single-model inference servers weren't built for this. Every model gets its own deployment, its own dedicated GPU pool. Five models, five pools, ~3% total utilization; each provisioned for peak load and idle the rest of the time.
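To make that utilization math concrete, here is back-of-the-envelope arithmetic with assumed, illustrative numbers (not measured data): dedicated per-model pools sit mostly idle, while one shared pool sized for the combined average load plus headroom stays busy.

```python
# Illustrative arithmetic with assumed numbers, not measurements.
num_models = 5
peak_gpus_per_model = 2      # each dedicated pool sized for its own peak
avg_busy_fraction = 0.03     # each model busy ~3% of the time

# Dedicated pools: one per model, provisioned for peak, ~3% utilized.
dedicated_gpus = num_models * peak_gpus_per_model          # 10 GPUs
dedicated_util = avg_busy_fraction                         # ~3%

# Shared pool: size for combined average load, with 2x headroom
# (assumed) to absorb overlapping peaks.
combined_load = num_models * peak_gpus_per_model * avg_busy_fraction  # 0.3 GPU
shared_gpus = max(1, round(combined_load * 2))             # 1 GPU
shared_util = combined_load / shared_gpus                  # ~30%

print(dedicated_gpus, shared_gpus)
```

Under these assumptions the shared pool needs a tenth of the hardware at ten times the utilization; real workloads vary, but the direction of the effect is the same.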
Managed inference providers are chasing general-purpose LLMs and ignoring small models. Open source projects like TEI and vLLM still require home-grown infra around them, and support for new models is best-effort. There hasn't been a good way to self-host hundreds of task-specific models in your own cloud.
Superlinked Inference Engine
SIE is a multi-model inference cluster for search and document processing. Instead of one service per model, it runs all of them behind a single API. We released it under Apache 2.0 for AWS and GCP (Terraform and Helm included for easy setup).
SIE ships with 85+ models across encoding, scoring, and extraction. It lazy-loads them onto shared GPUs with elastic scaling and fast switching for evals and A/B tests. One cluster can handle pipeline and real-time workloads across multiple teams with familiar Kubernetes ergonomics.
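The lazy-loading idea can be sketched as a small in-memory cache with LRU eviction (a minimal sketch; `load_model` is a hypothetical stand-in for pulling weights onto a GPU, and SIE's actual scheduler is more involved than this):

```python
from collections import OrderedDict

# Minimal sketch of lazy model loading with LRU eviction.
# `load_model` is a hypothetical stand-in for an expensive GPU load.

def load_model(name: str) -> str:
    return f"weights:{name}"

class ModelCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache: OrderedDict[str, str] = OrderedDict()

    def get(self, name: str) -> str:
        if name in self.cache:
            self.cache.move_to_end(name)     # mark as recently used
            return self.cache[name]
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        self.cache[name] = load_model(name)  # load lazily on first request
        return self.cache[name]

cache = ModelCache(capacity=2)
cache.get("colbert")
cache.get("siglip")
cache.get("colbert")        # refreshes colbert's recency
cache.get("bge-reranker")   # capacity hit: evicts siglip, the LRU entry
```

Models are loaded only when first requested and the coldest one is dropped when GPU memory is tight, which is what makes fast switching across many models on shared hardware workable.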
Check the quickstart, clone one of our examples, or drop SIE into your existing app via native integrations with Chroma, Weaviate, LangChain, LlamaIndex, DSPy, Haystack, and CrewAI.
We'll be adding hundreds more models and squeezing more performance per dollar out of your GPUs over the coming months. Let us know what you build.
Let's make 2026 the year of open source AI!
Daniel, Ben and the Superlinked team