Multi-modal Information Retrieval with Lightning Fast Nearest Neighbour Search


Mor Kapronczay, Superlinked’s lead ML engineer, used his slot at the Budapest Data and ML Forum to prove that search on images, numbers, maps and text can still clock sub-50 ms latencies.

Mor began by showing why vector databases are booming. Roughly forty products now fight for a slice of an estimated 1.6-billion-dollar market, all built on the same cosine-similarity trick that scales neatly with both item count and embedding size. On Superlinked’s Python testbed, a single user-vector query against an HNSW index returns in twelve milliseconds on average, with the 95th percentile sitting at fifty milliseconds.
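To make the cosine-similarity idea concrete, here is a minimal sketch of exact nearest-neighbour search in plain Python. It is a brute-force scan for illustration only, not an HNSW index (which approximates the same ranking far faster on large collections), and the item names and vectors are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Return the ids of the k items most similar to the query (exact scan)."""
    scored = [(cosine_similarity(query, vec), item_id)
              for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Hypothetical three-dimensional embeddings, made up for this sketch.
ITEMS = {
    "hotel_a": [0.9, 0.1, 0.0],
    "hotel_b": [0.1, 0.9, 0.0],
    "hotel_c": [0.7, 0.3, 0.1],
}

nearest = top_k([1.0, 0.0, 0.0], ITEMS, k=2)
```

The scan costs one dot product per item, which is why production systems swap it for an approximate index once the collection grows.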

The problem is that real queries rarely stay textual. He flashed the now-famous travel search “popular family-friendly hotels with good Wi-Fi near Manhattan Midtown under 400” and broke it into four biases (numerical, categorical, semantic and location) plus a hard price filter. Text-only embeddings force every extra signal into brittle filters or rerankers, losing nuance and speed.
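One way to picture that decomposition is as a small structured object. The sketch below is written by hand, with no query parser involved. The mapping of each phrase to a bias is an assumption made for illustration rather than something stated in the talk, and `ParsedQuery` and `apply_hard_filter` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class ParsedQuery:
    numerical_bias: str    # assumed: "popular" -> prefer high popularity
    categorical_bias: str  # assumed: "family-friendly" tag
    semantic_bias: str     # assumed: free text matched by meaning
    location_bias: str     # assumed: proximity target
    max_price: float       # hard filter: "under 400"

QUERY = ParsedQuery(
    numerical_bias="high popularity",
    categorical_bias="family-friendly",
    semantic_bias="good Wi-Fi",
    location_bias="Manhattan Midtown",
    max_price=400.0,
)

def apply_hard_filter(hotels, query):
    """Keep only hotels strictly under the price cap; biases are scored later."""
    return [h for h in hotels if h["price"] < query.max_price]

# Invented sample records for the sketch.
HOTELS = [
    {"name": "h1", "price": 350.0},
    {"name": "h2", "price": 400.0},
    {"name": "h3", "price": 450.0},
]

candidates = apply_hard_filter(HOTELS, QUERY)
```

Only the price cap removes items outright. The four biases are soft preferences, so they belong in the similarity score rather than in a filter.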


Superlinked's fix is a mixture-of-encoders blueprint. A location encoder handles latitude–longitude, a category encoder maps tags, a number encoder digests prices, and a semantic encoder still does the heavy text lifting. Their outputs are concatenated into one long vector, and the system can weight each segment dynamically at query time. The design keeps k-NN search fast while letting product teams turn relevance knobs without code changes.
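The concatenate-and-weight idea can be sketched in a few lines of plain Python. This is not Superlinked's API: every function below is a hypothetical stand-in, the quarter-circle number encoding and the unit-sphere location encoding are assumptions chosen for simplicity, and the "semantic" vector is passed in precomputed rather than produced by a model.

```python
import math

VOCAB = ["family-friendly", "wifi", "pool"]  # assumed tag vocabulary

def _unit(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def encode_number(value, lo, hi):
    """Place a clamped value on a quarter circle so nearby numbers get nearby vectors."""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    angle = t * math.pi / 2
    return [math.cos(angle), math.sin(angle)]

def encode_category(tags):
    """Normalised multi-hot vector over the vocabulary."""
    return _unit([1.0 if tag in tags else 0.0 for tag in VOCAB])

def encode_location(lat_deg, lon_deg):
    """Latitude and longitude as a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat)]

def encode(price, tags, lat, lon, text_vec, weights=(1.0, 1.0, 1.0, 1.0)):
    """Concatenate the four segments, scaling each by its weight."""
    parts = [encode_number(price, 0, 1000), encode_category(tags),
             encode_location(lat, lon), _unit(text_vec)]
    out = []
    for weight, part in zip(weights, parts):
        out.extend(weight * x for x in part)
    return out

def score(query_vec, item_vec):
    """Dot product: a weighted sum of the per-segment similarities."""
    return sum(q * x for q, x in zip(query_vec, item_vec))

# Two invented hotels at the same spot with the same text vector.
hotel_a = encode(100, {"pool"}, 40.754, -73.984, [1.0, 0.0])
hotel_b = encode(900, {"family-friendly", "wifi"}, 40.754, -73.984, [1.0, 0.0])

def rank(weights):
    """Rank both hotels for a cheap, family-friendly query under given weights."""
    query = encode(100, {"family-friendly", "wifi"}, 40.754, -73.984,
                   [1.0, 0.0], weights)
    scores = {"hotel_a": score(query, hotel_a), "hotel_b": score(query, hotel_b)}
    return sorted(scores, key=scores.get, reverse=True)

price_first = rank((3.0, 1.0, 1.0, 1.0))
tags_first = rank((1.0, 3.0, 1.0, 1.0))
```

Because the item vectors are stored once and only the query carries the weights, changing the weights reorders results without re-indexing, which is what keeps the retrieval a single nearest-neighbour pass in this sketch.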

Mor also called out the state of evaluation. Popular benchmarks such as MTEB and BEIR top out at a 0.6 retrieval score, largely because they ignore metadata and multimodal inputs, which makes them a poor proxy for production workloads. He urged practitioners to publish richer datasets and full-stack eval pipelines rather than “toy text tests”.

Audience questions veered toward hardware, index choice and how to train specialist encoders, but Mor's parting advice was simple: push as much intent as possible into the first retrieval pass, keep reranking light, and treat every feature, numeric or otherwise, as first-class data. If hallway chatter is any indicator, teams left keen to give the encoder mix a spin.

