Evaluating Retrieval Augmented Generation - a framework for assessment

Last Update: April 23, 2024

In this first article of a three-part (monthly) series, we introduce RAG evaluation, outline its challenges, propose an effective evaluation framework, and provide a rough overview of the various tools and approaches you can use to evaluate your RAG application.

Why evaluate RAG?

Retrieval Augmented Generation (RAG) is probably the most useful application of large language models today. RAG enhances content generation by grounding it in existing information: it can combine specific, relevant details from multiple sources to generate more accurate and relevant responses. This makes RAG potentially invaluable in various domains, including content creation, question & answer applications, and information synthesis. RAG does this by combining the strengths of retrieval, usually using dense vector search, and text generation models, like GPT. For a more in-depth introduction to RAG, read here.

Figure: A RAG system using Qdrant as the knowledge store. To determine which Vector Database fits your specific use case, refer to the Vector DB feature matrix.
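To make the retrieve-then-generate flow concrete, here's a minimal sketch of such a pipeline with Qdrant as the knowledge store. It assumes a running Qdrant instance holding an already-populated collection ("docs") whose payload contains a "text" field; the embedding model, LLM, and prompt are illustrative choices rather than part of any reference implementation.

from openai import OpenAI
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # example embedding model
qdrant = QdrantClient(url="http://localhost:6333")  # assumes a running Qdrant instance
llm = OpenAI()                                      # assumes OPENAI_API_KEY is set

def answer(query: str, collection: str = "docs", top_k: int = 3) -> str:
    # 1. Information Retrieval: dense vector search over the knowledge store
    query_vector = encoder.encode(query).tolist()
    hits = qdrant.search(collection_name=collection, query_vector=query_vector, limit=top_k)
    # 2. Context Augmentation: stitch the retrieved chunks into the prompt
    context = "\n\n".join(hit.payload["text"] for hit in hits)  # assumes a "text" payload field
    # 3. Response Generation: ask the LLM to answer grounded in that context
    prompt = f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

Every piece of this pipeline - the embedding model, the index behind the search call, the top_k cutoff, the prompt, and the LLM itself - is something we will want to evaluate separately, which is what the rest of this article is about.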

But to see what is and isn't working in your RAG system, and to refine and optimize it, you have to evaluate it. Evaluation is essential to validate that your application does what users expect. In this article (the first of three, one per month), we go over the broad strokes of our proposed evaluation framework, which includes separate assessments of the model itself, data ingestion, semantic retrieval, and, finally, the RAG application end-to-end, with a high-level discussion of what's involved in each.

In article 2, we'll look at RAGAS (RAG Assessment), learn how to set it up with an example, calculate some of the supported metrics, and compare that with our proposed framework. We'll also examine some examples of our proposed framework. Then, in article 3, we will look at Arize AI's way of evaluating RAG applications, using Phoenix Evals to focus on Retrieval evaluation and Response evaluation.

Why do we need RAG?

RAG significantly enhances vector search with the power of Large Language Models (LLMs), by enabling dynamic content generation based on retrieved knowledge. RAG is indispensable when users seek to generate new content rather than interact with documents or search results directly. It excels in providing contextually rich, informative, and human-like responses. For tasks requiring detailed, coherent explanations, summaries, or responses that transcend the explicit data stored in vectors, RAG is invaluable. Before setting up a RAG system, you should consider conducting feasibility studies to determine how and whether RAG aligns with your specific needs and value expectations.

While vector search efficiently retrieves relevant similar documents/chunks from a document corpus, RAG permits content synthesis and a deeper level of understanding, providing essential context to queries and results generation. In this way, RAG can ensure that answers are unique and tailored to each query, in essence personalized to the user.

These are the promises of RAG. But how do we make sure that our RAG system is achieving its potential?

Evaluating RAG

In the process of scaling RAG from proof-of-concept (POC) to production for clients in academia and finance, we've learned many things: various data processing and enrichment techniques, how to choose a model that understands domain-specific jargon, how those techniques and the model together affect retrieval, and how to craft prompts, with guardrails, that generate trustworthy and informative responses. But our chief takeaway was the importance of evaluation in enhancing and refining our RAG system.

If you can't quantify it, you can't improve it. - Peter Drucker

In the case of RAG, it's important not only to have good metrics, but also to measure things separately. That is:

If you can't retrieve it, you can't generate it.

To see where things are going well, where they can be improved, and where errors may originate, it's important to evaluate each component in isolation. In the following visual, we've classified RAG's components - Information Retrieval, Context Augmentation, and Response Generation - along with what needs evaluation in each:

Figure: Classification of the challenges of RAG evaluation, including the 'Lost in the Middle' problem.

The evaluation framework we propose is meant to ensure granular and thorough measurement, addressing the challenges faced in all three components. Broadly, we want to assess:

  • Retrieval effectiveness - the degree to which the information retrieved from the underlying vector database is semantically relevant to the intention of the user's query.
  • Relevance of responses to retrieved information - how meaningful and aligned the generated responses are with the content and context of retrieved information.
  • Coherence of generated responses - how logically connected, fluent, and contextually consistent generated responses are when compared to the user's query.
  • Up-to-date data - how reflective of real-world changes your RAG application's database is, to meet user expectations. Commonly, your RAG database will be dynamic, and include not just new queries but also continuously revised data from real-time data streams, economic fluctuations, business metrics, or research data, etc.

Strategies for Evaluation

To meet these evaluation challenges systematically, it's best practice to break down our evaluation into different levels, so that we understand what specifically is working well, and what needs improvement.

Figure: Granular levels of RAG evaluation - model, data ingestion, semantic retrieval, and end-to-end.

Let's take a closer look to see what's involved in each of these levels individually.

Model Evaluation

We want to ensure that the model can understand the data that we encode. The Massive Text Embedding Benchmark (MTEB) leverages different public/private datasets to evaluate and report on the different capabilities of individual models. We can use the MTEB to evaluate any model in its list. If, on the other hand, you're working with specialized domains, you may want to put together a specialized dataset to train the model. Another option is to run relevant 'tasks' for your custom model, using instructions available here.

For a custom SentenceTransformer-based model, we can set up evaluation tasks as in the following code. We import, configure, initialize, and then evaluate our model:

import logging

from mteb import MTEB
from sentence_transformers import SentenceTransformer

logging.basicConfig(level=logging.INFO)

model_name = "<<CUSTOM_MODEL_NAME>>"
model = SentenceTransformer(model_name)

# Run all English MTEB tasks on the test splits and write per-task results to disk
evaluation = MTEB(task_langs=["en"])
evaluation.run(model, output_folder=f"results/{model_name}", eval_splits=["test"])
print("--DONE--")

Data Ingestion Evaluation

After we evaluate our model's performance using benchmarks, and (optionally) fine-tune it on the language of our domain, we can then configure data ingestion into our semantic retrieval store (vector store). Various vector databases offer index configurations (of whatever index types the database supports) to influence and enhance retrieval quality. Common index types include Flat (brute force), LSH (Locality Sensitive Hashing), HNSW (Hierarchical Navigable Small World), and IVF (Inverted File Index). Here's one such example, based on HNSW retrieval quality.
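As an illustration, here's a minimal sketch of configuring an HNSW index at collection-creation time in Qdrant (matching the example architecture above); the collection name, vector size, and HNSW parameters (m, ef_construct) are assumed starting points you'd tune against your own recall and latency targets.

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

client = QdrantClient(url="http://localhost:6333")  # assumes a running Qdrant instance

# A denser graph (m) and more build-time effort (ef_construct) generally improve recall
# at the cost of memory and indexing time; the values below are illustrative.
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),  # 384 matches all-MiniLM-L6-v2
    hnsw_config=HnswConfigDiff(m=16, ef_construct=200),
)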

To evaluate data ingestion, we need to observe and measure how changes in related variables affect ingestion outcomes. For example:

  • Chunk size - the size of each data segment, which depends on the token limit of our embedding model. Chunk size substantially determines data granularity and the contextual understanding of the data, which impacts the precision, recall, and relevancy of our results.
  • Chunk overlap - the extent to which adjacent chunks share content. Overlap helps retain context across chunk boundaries, but should be paired with strategies like deduplication and content normalization to mitigate adverse effects (redundancy or inconsistency).
  • Chunking/Text splitting strategy - the process of data splitting and further treatment, based on both data type (e.g. html, markdown, code, or pdf, etc.) and the nuances of your use-case. For example, summarization use-cases might split segments based on chapters or paragraphs; a legal document assistant may section docs based on headings and subsections; a medical literature assistant might split docs based on sentence boundaries or key concepts. (A minimal splitting sketch follows below.)

Utilities like ChunkViz help visualize different chunk splitting strategies on different chunk sizes and chunk overlaps.
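Here's a minimal chunking sketch using LangChain's RecursiveCharacterTextSplitter (an assumed library choice); the chunk size, overlap, and separators are illustrative values to be tuned against your embedding model's token limit and your retrieval metrics.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split a document into overlapping chunks, preferring paragraph, then sentence,
# then word boundaries. Lengths are measured in characters by default.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,    # illustrative; align with your embedding model's token limit
    chunk_overlap=64,  # characters shared between adjacent chunks
    separators=["\n\n", "\n", ". ", " "],
)

with open("document.txt") as f:  # placeholder input file
    chunks = splitter.split_text(f.read())

print(f"{len(chunks)} chunks; first chunk:\n{chunks[0]}")

Re-running your retrieval evaluation after each change to these parameters is what turns chunking from guesswork into a measurable design decision.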

Semantic Retrieval Evaluation

Semantic retrieval evaluation puts the work you've done so far - evaluating your model and data ingestion - to the test, and can be approached like a classic information retrieval problem: using metrics such as precision, recall, accuracy, and F1-score to determine whether retrieved documents are relevant.

First, we establish what we expect the returned results to look like in terms of important parameters. This guides our choice of reference metrics. We have several existing metrics to inform and define our baseline:

  • Precision (exactness of retrieved results), Recall (completeness), or their combination, the F1 score
  • DCG (Discounted Cumulative Gain) and nDCG (normalized DCG), which measure the relevance quality of ranked lists of documents (see the sketch after this list)
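As a rough sketch, these reference metrics can be computed directly against a golden set of relevance judgments; the document ids and graded relevance scores below are made-up placeholders.

from math import log2

def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    # Precision@k and Recall@k for a single query, given retrieved and relevant doc ids.
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def ndcg_at_k(retrieved: list[str], relevance: dict[str, float], k: int) -> float:
    # nDCG@k with graded relevance (doc id -> gain); position discount is log2(rank + 1).
    dcg = sum(relevance.get(doc_id, 0.0) / log2(i + 2) for i, doc_id in enumerate(retrieved[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / log2(i + 2) for i, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

retrieved = ["d3", "d7", "d1"]              # ids returned by vector search, best first
golden = {"d1": 3.0, "d3": 2.0, "d9": 1.0}  # graded relevance judgments from the golden set
precision, recall = precision_recall_at_k(retrieved, set(golden), k=3)
print(f"P@3={precision:.2f}  R@3={recall:.2f}  nDCG@3={ndcg_at_k(retrieved, golden, k=3):.2f}")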

But semantic retrieval, because it goes beyond mere keyword matching, synonyms, and token-level enrichment to encompass broader semantic understanding and context, typically requires a "ground truth" evaluation set - a Golden Set. A Golden Set is a human-annotated reference set of queries and expected results, against which we assess the correctness and faithfulness of what the system retrieves and generates.

To generate a starter pack for evaluation using a Golden Set, we can leverage the T5 Model.
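As a rough illustration, the sketch below uses a T5 checkpoint fine-tuned for query generation (BeIR/query-gen-msmarco-t5-base-v1, an assumed choice) to propose candidate queries for a passage; each generated query-passage pair should still be reviewed and corrected by a human annotator before it enters the Golden Set.

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Assumed checkpoint: a T5 model fine-tuned to generate search queries from passages.
model_name = "BeIR/query-gen-msmarco-t5-base-v1"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

passage = (
    "Retrieval Augmented Generation combines dense vector search with an LLM "
    "to ground generated answers in retrieved documents."
)
inputs = tokenizer(passage, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, do_sample=True, top_p=0.95, num_return_sequences=3)

# Each decoded query, paired with its source passage, is a candidate golden-set entry.
for output in outputs:
    print(tokenizer.decode(output, skip_special_tokens=True))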

End-to-End Evaluation

An end-to-end evaluation of a RAG application assesses the final outputs generated by LLMs in response to given inputs. It requires addressing issues discussed above related to data heterogeneity, domain specificity, and user query and preference diversity. It's impossible to devise a fixed metric or methodology that fits all domains and use-cases.

To address these difficulties, you can assess the fidelity and quality of generated text using established metrics like BLEU and ROUGE, and combine these scores with LLM-based or human evaluation methods. This Quality Criteria paper provides some excellent ideas for creating such a solution.
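For instance, here's a minimal sketch of scoring generated answers against golden-set references with ROUGE and BLEU, assuming Hugging Face's evaluate library; the prediction and reference strings are placeholders.

import evaluate

# Load surface-overlap metrics; each returns a dict of sub-scores.
rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["RAG grounds LLM answers in documents retrieved from a vector database."]
references = ["RAG combines vector search with an LLM so that answers are grounded in retrieved documents."]

print(rouge.compute(predictions=predictions, references=references))  # rouge1, rouge2, rougeL, ...
print(bleu.compute(predictions=predictions, references=references))   # bleu plus n-gram precisions

These surface-overlap scores are cheap to compute but blind to meaning, which is why we recommend combining them with LLM-based or human judgment.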

In sum

To summarize, a robust RAG evaluation strategy requires you to establish methods to automatically evaluate similarity and content overlap between generated responses and reference summaries. You can also leverage human evaluation to assess subjective qualities, such as context relevance, novelty, and fluency. In addition, you can build a classified 'domain - questions' set based on question complexity (easy, medium, hard). Doing this will provide you with an overall sense of the RAG application's performance.

While our proposed evaluation strategy is meant to improve RAG evaluation, we should acknowledge that no evaluation is perfect, and it remains complicated and challenging to formulate a comprehensive set of evaluation metrics.

What's next

We've laid a general foundation for discussing RAG evaluation. In the next article (next month), we'll demystify an existing evaluation framework (RAGAS), and see how well it does at covering all the layers of evaluation we've discussed. Looking forward to seeing you in the next part!
