
Multi-modal Information Retrieval with Lightning Fast Nearest Neighbour Search


Mor Kapronczay, Superlinked’s lead ML engineer, used his slot at the Budapest Data and ML Forum to prove that search on images, numbers, maps and text can still clock sub-50 ms latencies.

Mor began by showing why vector databases are booming. Roughly forty products now compete for a slice of an estimated $1.6 billion market, all built on the same cosine-similarity trick, which scales neatly with both item count and embedding size. On Superlinked’s Python testbed, a single user-vector query against an HNSW index returns in 12 ms on average, with the 95th percentile at 50 ms.
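To make the cosine-similarity trick concrete, here is a minimal sketch of nearest-neighbour search by exhaustive comparison. It is not the HNSW index from the talk: HNSW is an approximate graph index that avoids scanning every item, while this brute-force version only shows the scoring rule underneath. The hotel ids and three-dimensional vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the ids of the k items most similar to the query (brute force)."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
ITEMS = {
    "hotel_a": [0.9, 0.1, 0.0],
    "hotel_b": [0.1, 0.9, 0.0],
    "hotel_c": [0.7, 0.3, 0.1],
}

print(top_k([1.0, 0.0, 0.0], ITEMS, k=2))
```

The scan costs one dot product per item, which is the part an index such as HNSW replaces with a graph walk over a small fraction of the collection.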

The problem is that real queries rarely stay textual. He flashed the now-famous travel search “popular family-friendly hotels with good Wi-Fi near Manhattan Midtown under 400” and broke it into four biases (numerical, categorical, semantic and location) plus a hard price filter. Text-only embeddings force every extra signal into brittle filters or rerankers, losing nuance and speed.
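One plausible reading of that breakdown, written out as a structured query, might look like the sketch below. The mapping of phrases to signals ("popular" as the numerical bias, "family-friendly" as the categorical one, and so on) is an assumption, as are the `HotelQuery` class, its field names and the sample hotels. Only the hard price filter is applied here, since it is the one part that must exclude items outright rather than nudge the ranking.

```python
from dataclasses import dataclass


@dataclass
class HotelQuery:
    """Hypothetical container for the signals in the example search."""
    numerical_bias: str    # "popular": prefer high popularity
    categorical_bias: str  # "family-friendly": prefer a matching tag
    semantic_bias: str     # "good Wi-Fi": free text for a semantic model
    location_bias: str     # "Manhattan Midtown": prefer nearby items
    max_price: float       # "under 400": hard cut-off, not a preference


QUERY = HotelQuery(
    numerical_bias="popular",
    categorical_bias="family-friendly",
    semantic_bias="good Wi-Fi",
    location_bias="Manhattan Midtown",
    max_price=400.0,
)


def apply_hard_filter(hotels, query):
    """Keep only hotels priced strictly under the limit."""
    return [h for h in hotels if h["price"] < query.max_price]


# Invented sample data.
HOTELS = [
    {"id": "h1", "price": 380.0},
    {"id": "h2", "price": 400.0},
    {"id": "h3", "price": 520.0},
]

print([h["id"] for h in apply_hard_filter(HOTELS, QUERY)])
```

The four biases stay soft: they shape the ordering of whatever survives the filter instead of removing candidates.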

Superlinked’s fix is a mixture-of-encoders blueprint. A location encoder handles latitude–longitude, a category encoder maps tags, a number encoder digests prices, and a semantic encoder still does the heavy text lifting. Their outputs are concatenated into one long vector, whose segments the system can weight dynamically at query time. The design keeps k-NN search fast while letting product teams turn relevance knobs without code changes.
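A rough sketch of the concatenate-then-weight idea follows. It is not Superlinked's implementation: every encoder here is a toy stand-in (a quarter-circle mapping for numbers, a normalised tag vector for categories, a point on the unit sphere for coordinates, word counts over a tiny vocabulary in place of a text model), and all names, price bounds and sample hotels are assumptions. What it does show is that when each segment has unit length, weighting only the query side turns the dot product into a weighted sum of per-segment similarities, so stored item vectors never need re-encoding.

```python
import math

CATEGORIES = ["family", "business", "luxury"]   # hypothetical tag vocabulary
VOCAB = ["wifi", "pool", "quiet", "breakfast"]  # toy stand-in for a text model


def _unit(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def encode_number(value, low=0.0, high=1000.0):
    """Map a number onto a quarter circle so close values point the same way."""
    clipped = min(max(value, low), high)
    angle = (clipped - low) / (high - low) * math.pi / 2
    return [math.cos(angle), math.sin(angle)]


def encode_category(tags):
    return _unit([1.0 if c in tags else 0.0 for c in CATEGORIES])


def encode_location(lat, lon):
    """Latitude and longitude as a point on the unit sphere."""
    la, lo = math.radians(lat), math.radians(lon)
    return [math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la)]


def encode_text(text):
    words = text.lower().split()
    return _unit([float(words.count(w)) for w in VOCAB])


def encode(record, weights=None):
    """Concatenate the segment vectors, scaling each by its weight (default 1)."""
    weights = weights or {}
    parts = {
        "price": encode_number(record["price"]),
        "category": encode_category(record["tags"]),
        "location": encode_location(record["lat"], record["lon"]),
        "text": encode_text(record["text"]),
    }
    vector = []
    for name, part in parts.items():
        vector.extend(weights.get(name, 1.0) * x for x in part)
    return vector


def rank(query, items, weights=None):
    """Order item ids by dot product; only the query vector carries weights."""
    q = encode(query, weights)
    scores = {
        item_id: sum(a * b for a, b in zip(q, encode(rec)))
        for item_id, rec in items.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


HOTELS = {
    "budget_inn": {"price": 150.0, "tags": {"business"}, "lat": 40.7075,
                   "lon": -74.0113, "text": "quiet rooms and fast wifi"},
    "family_stay": {"price": 390.0, "tags": {"family"}, "lat": 40.7549,
                    "lon": -73.9840, "text": "pool breakfast wifi for kids"},
}
# A query price of 0.0 expresses "the cheaper the better" in this toy encoding.
QUERY = {"price": 0.0, "tags": {"family"}, "lat": 40.7549, "lon": -73.9840,
         "text": "wifi"}

print(rank(QUERY, HOTELS))                                     # equal weights
print(rank(QUERY, HOTELS, {"price": 3.0, "category": 0.2}))    # price-heavy
```

With equal weights the family hotel ranks first; raising the price weight and lowering the category weight flips the order, and nothing on the item side changes.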

Mor also called out the state of evaluation. Popular benchmarks such as MTEB and BEIR top out at a 0.6 retrieval score, largely because they ignore metadata and multimodal inputs, which makes them a poor proxy for production workloads. He urged practitioners to publish richer datasets and full-stack eval pipelines rather than “toy text tests”.

Audience questions veered toward hardware, index choice and how to train specialist encoders, but Mor’s parting advice was simple: push as much intent as possible into the first retrieval pass, keep reranking light, and treat every feature, numeric or otherwise, as first-class data. If hallway chatter is any indicator, teams left keen to give the encoder mix a spin.
