Boost performance & reduce cost by self-hosting specialized AI models
Introducing SIE, a multi-model inference cluster for search and document processing workloads, released under Apache 2.0.
SIE embeddings and Qdrant retrieval behind a GPT-4 router: cross-encoder reranking, hard filters, and five agent tools for natural-language real-estate search.
How hierarchical cluster-embedding chunking with RAPTOR improves RAG retrieval over vanilla chunking, with a step-by-step implementation and a note on serving embeddings in production with SIE.
Key considerations and trade-offs for picking a vector database that fits your architecture, scale, and operational limits.
How combining keyword search, vector search, and semantic reranking improves RAG retrieval precision and recall.
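A common way to combine keyword and vector result lists before reranking is reciprocal rank fusion (RRF). The sketch below is a minimal, generic illustration of that fusion step, not code from the post; the document ids and result lists are hypothetical:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids (best first) into one
    ranking. Each document scores 1 / (k + rank) per list it appears in;
    k is a smoothing constant that damps the influence of top ranks."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["d3", "d1", "d7"]  # hypothetical BM25 results
vector_hits = ["d3", "d9", "d1"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# fused == ["d3", "d1", "d9", "d7"]
```

In a full hybrid pipeline, the fused candidates would then go to a semantic reranker (e.g. a cross-encoder) for final ordering.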
Build AI apps that generate and compare vector embeddings directly in your browser using TensorFlow.js. No backend required.