Multi-modal Information Retrieval with Lightning Fast Nearest Neighbour Search

Mor Kapronczay, Superlinked’s lead ML engineer, used his slot at the Budapest Data and ML Forum to show that search across images, numbers, maps and text can still clock sub-50 ms latencies.

Mor began by showing why vector databases are booming. Roughly forty products now fight for a slice of an estimated $1.6 billion market, all built on the same cosine-similarity trick that scales neatly with both item count and embedding size. On Superlinked’s Python testbed, a single user vector hits an HNSW index in twelve milliseconds on average, with the 95th percentile sitting at fifty milliseconds.
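
Those latency figures are specific to Superlinked’s internal testbed, but the shape of the measurement is easy to reproduce. Below is a minimal sketch using the open-source hnswlib library on synthetic vectors; the corpus size, dimensionality and index parameters are assumptions for illustration, not the talk’s actual setup.

```python
import time

import hnswlib
import numpy as np

dim, num_items = 384, 100_000          # assumed sizes, not the talk's testbed
rng = np.random.default_rng(42)
data = rng.random((num_items, dim), dtype=np.float32)

# Build a cosine-distance HNSW index over the synthetic corpus.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(data, np.arange(num_items))
index.set_ef(64)                       # search-time breadth vs. recall trade-off

# Time single-vector k-NN queries, the same access pattern as one user vector.
latencies_ms = []
for _ in range(1_000):
    query = rng.random(dim, dtype=np.float32)
    start = time.perf_counter()
    index.knn_query(query, k=10)
    latencies_ms.append((time.perf_counter() - start) * 1_000)

print(f"mean {np.mean(latencies_ms):.1f} ms, p95 {np.percentile(latencies_ms, 95):.1f} ms")
```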

The problem is that real queries rarely stay textual. He flashed the now-famous travel search “popular family-friendly hotels with good Wi-Fi near Manhattan Midtown under 400” and broke it into four biases: numerical, categorical, semantic and location, plus a hard price filter. Text-only embeddings force every extra signal into brittle filters or rerankers, losing nuance and speed.
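
To make that breakdown concrete, here is one hypothetical structured form of the parsed query, keeping the soft biases separate from the hard price filter. The field names and the approximate Midtown coordinates are illustrative only, not Superlinked’s schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ParsedQuery:
    """Illustrative split of the travel query into four soft biases plus one hard filter."""
    text: str                                    # semantic bias
    categories: List[str] = field(default_factory=list)  # categorical bias
    near: Optional[Tuple[float, float]] = None   # location bias as (lat, lon)
    preferred_price: Optional[float] = None      # soft numerical bias
    max_price: Optional[float] = None            # hard filter applied around the k-NN call

query = ParsedQuery(
    text="popular family-friendly hotels with good Wi-Fi",
    categories=["hotel", "family-friendly"],
    near=(40.754, -73.984),                      # roughly Manhattan Midtown
    max_price=400.0,
)
```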

Superlinked's fix is a mixture-of-encoders blueprint. A location encoder handles latitude–longitude, a category encoder maps tags, a number encoder digests prices, and a semantic encoder still does the heavy text lifting. Their outputs are concatenated into one long vector, which the system can weight dynamically at query time. The design keeps k-NN search fast while letting product teams dial relevance knobs without code changes.
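
The mechanics of that concatenate-then-reweight idea fit in a few lines of NumPy. The sketch below is illustrative only: the segment sizes, the deliberately naive number and location encoders, and the brute-force cosine k-NN standing in for the HNSW index are all assumptions, not Superlinked’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalise(v):
    return v / (np.linalg.norm(v) + 1e-9)

# Segment layout of the concatenated vector: one slice per encoder.
SEGMENTS = {
    "semantic": slice(0, 384),    # text embedding (assumed 384-d)
    "category": slice(384, 416),  # tag embedding (assumed 32-d)
    "number":   slice(416, 417),  # price
    "location": slice(417, 419),  # latitude / longitude
}
DIM = 419

def encode_item(text_vec, category_vec, price, lat, lon):
    """Write one segment per encoder into a single concatenated item vector."""
    out = np.zeros(DIM, dtype=np.float32)
    out[SEGMENTS["semantic"]] = normalise(text_vec)
    out[SEGMENTS["category"]] = normalise(category_vec)
    out[SEGMENTS["number"]] = np.log1p(price) / np.log1p(1_000)   # crude [0, 1] scaling
    out[SEGMENTS["location"]] = [lat / 90.0, lon / 180.0]         # crude degree scaling
    return out

def weight_query(query_vec, weights):
    """Scale each segment of the query at search time; the indexed vectors stay untouched."""
    q = query_vec.astype(np.float32).copy()
    for name, seg in SEGMENTS.items():
        q[seg] *= weights.get(name, 1.0)
    return normalise(q)

def knn(corpus, query, k=5):
    """Brute-force cosine k-NN, standing in for the HNSW index."""
    sims = (corpus @ query) / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query) + 1e-9)
    return np.argsort(-sims)[:k]

# Index a toy corpus, then boost location and damp price for a single query.
corpus = np.stack([
    encode_item(rng.normal(size=384), rng.normal(size=32),
                price=rng.uniform(50, 800), lat=40.75, lon=-73.99)
    for _ in range(1_000)
])
query = weight_query(
    encode_item(rng.normal(size=384), rng.normal(size=32), price=350, lat=40.76, lon=-73.98),
    weights={"location": 2.0, "number": 0.5},
)
print(knn(corpus, query, k=5))
```

Scaling a query segment scales that encoder’s contribution to the dot product, which is why relevance weights can change per query without re-indexing anything.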

Mor also called out the state of evaluation. Popular benchmarks such as MTEB and BEIR top out at a 0.6 retrieval score, largely because they ignore metadata and multimodal inputs, making them a poor proxy for production workloads. He urged practitioners to publish richer datasets and full-stack eval pipelines rather than “toy text tests”.

Audience questions veered toward hardware, index choice and how to train specialist encoders, but Mor's parting advice was simple: push as much intent as possible into the first retrieval pass, keep reranking light, and treat every feature, numeric or otherwise, as first-class data. If hallway chatter is any indicator, teams left keen to give the encoder mix a spin.