In a recent live lesson Daniel Svonava and Jason Liu chatted through the nuts and bolts of search that understands far more than text alone.
TL;DR:
Key Topics Covered:
• The pitfalls of treating semi-structured data as unstructured text in search pipelines
• Moving beyond text-to-SQL: handling queries with complex, multi-modal signals
• Strategies for encoding rich metadata (such as timestamps, geospatial data, and user behavior) directly into embeddings (see the timestamp sketch just after this TL;DR)
• Rethinking the role of re-ranking and why it may be redundant with better initial retrieval
• Designing embeddings that are natively aware of both content and context
• Real-world examples from industries like logistics and personalized recommendations
• Challenges in evaluating the effectiveness of metadata-enriched search
Main Argument: Daniel advocates for integrating as much structured signal as possible into the embedding layer, minimizing the need for post-processing and complex re-ranking strategies and making search more precise and scalable.
Hot Take: Re-ranking is often a workaround for inadequate initial retrieval. The most effective systems should surface the best results in the first pass—making re-ranking largely obsolete.
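To make the metadata-encoding point concrete before the recap: a timestamp can be embedded on a circle so similarity tracks how users experience time. The sketch below is a minimal illustration with made-up dimensions, not the API of Superlinked or any particular library.

```python
import numpy as np
from datetime import datetime

def encode_timestamp(ts: datetime) -> np.ndarray:
    """Place hour-of-day and day-of-week on circles so that 23:30 and
    00:30 end up close in vector space instead of 23 units apart."""
    hour_angle = 2 * np.pi * (ts.hour + ts.minute / 60) / 24
    dow_angle = 2 * np.pi * ts.weekday() / 7
    return np.array([
        np.sin(hour_angle), np.cos(hour_angle),
        np.sin(dow_angle), np.cos(dow_angle),
    ])

# Late Friday night vs. early Saturday morning: the hour components
# stay close even though the calendar date has rolled over.
a = encode_timestamp(datetime(2024, 5, 3, 23, 30))  # Friday 23:30
b = encode_timestamp(datetime(2024, 5, 4, 0, 30))   # Saturday 00:30
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```

The same trick generalizes to any cyclical signal, such as day-of-month or season.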
Daniel kicked things off with a tough travel query that folds in popularity scores, family-friendly tags, Wi-Fi sentiment, geo coordinates and a strict price cap – the sort of request that breaks a plain text embedding model.
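For a sense of what such a query carries, here is a hypothetical structured form of it; the field names and values are invented for illustration. Flattening everything into one prose string is exactly what forces a plain text embedding model to treat hard constraints as soft hints.

```python
from dataclasses import dataclass

@dataclass
class HotelQuery:
    """Hypothetical structured form of the travel query from the talk."""
    text: str                       # free-text intent
    min_popularity: float           # popularity-score floor
    family_friendly: bool           # tag constraint
    wifi_sentiment: float           # desired review sentiment about Wi-Fi, in [-1, 1]
    location: tuple[float, float]   # (lat, lon) of the target area
    radius_km: float                # search radius around that point
    max_price: float                # strict per-night price cap

query = HotelQuery(
    text="quiet boutique hotel near the old town",
    min_popularity=0.8,
    family_friendly=True,
    wifi_sentiment=0.5,
    location=(48.2082, 16.3738),
    radius_km=2.0,
    max_price=150.0,
)

# What a plain text embedding model effectively sees: the numeric,
# geo, and price constraints collapse into soft wording.
flattened = (
    f"{query.text}, family friendly, good wifi, "
    f"under ${query.max_price:.0f}, within {query.radius_km} km of {query.location}"
)
```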
He argued that most teams lean on hard filters and post-hoc rerankers. Filters slice away nuance, and rerankers only reshuffle the small slice of the index that survives the first pass. Daniel’s alternative is a mixture of encoders: each data type, whether numbers, locations, graphs, images or free text, gets its own small encoder, and an aggregator then stitches the outputs into one vector that a database can rank in a single hop, as sketched below.
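A minimal sketch of that mixture-of-encoders idea, assuming toy encoders and hand-picked dimensions rather than Superlinked's actual implementation:

```python
import numpy as np

def encode_text(text: str) -> np.ndarray:
    """Stand-in for a sentence-embedding model (hypothetical)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def encode_number(value: float, lo: float, hi: float) -> np.ndarray:
    """Project a scalar onto a small vector so similarity tracks closeness."""
    x = (value - lo) / (hi - lo)  # normalize into [0, 1]
    return np.array([np.sin(np.pi * x), np.cos(np.pi * x)])

def encode_geo(lat: float, lon: float) -> np.ndarray:
    """Cyclical encoding keeps nearby coordinates nearby in vector space."""
    lat_r, lon_r = np.radians(lat), np.radians(lon)
    return np.array([np.sin(lat_r), np.cos(lat_r), np.sin(lon_r), np.cos(lon_r)])

def aggregate(parts: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Concatenate weighted, normalized per-type vectors into one index vector."""
    scaled = [w * p / (np.linalg.norm(p) + 1e-9) for w, p in zip(weights, parts)]
    return np.concatenate(scaled)

# One hotel document becomes a single vector the database can rank in one hop.
doc_vector = aggregate(
    [encode_text("boutique hotel, fast wifi"),
     encode_number(120.0, lo=0.0, hi=500.0),   # price
     encode_geo(48.2082, 16.3738)],            # location
    weights=[1.0, 0.5, 0.5],
)
```

The weights are where the audience's later questions about weighting schemes come in: they set how much each signal pulls on the final ranking.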
Real-world proof points followed. A jobs marketplace saw applications jump by fifty percent after swapping an old keyword engine for the encoder mix. A fashion retailer added more than ten million dollars in revenue after the same upgrade.
Jason asked whether a single giant model could swallow all of this work. Daniel expects modular blocks to stay practical, because each one can be retrained or swapped without touching the rest.
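A rough sketch of why swapping stays cheap: if every block implements one small interface (an assumption for illustration, not a documented API), replacing an encoder is a one-line change.

```python
from typing import Protocol
import numpy as np

class Encoder(Protocol):
    """The one method every modality block exposes (assumed for illustration)."""
    def encode(self, value: object) -> np.ndarray: ...

class MixturePipeline:
    """Encoders are looked up by field name, so retraining or replacing
    one block never touches the others."""
    def __init__(self, encoders: dict[str, Encoder]):
        self.encoders = encoders

    def embed(self, record: dict) -> np.ndarray:
        parts = [self.encoders[field].encode(value) for field, value in record.items()]
        return np.concatenate(parts)

# Swapping in a better text model is a one-line change; only the affected
# slice of each stored vector needs re-embedding.
# pipeline.encoders["description"] = ShinierTextEncoder()
```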
Daniel wrapped with three “commandments”: rely less on blunt filters, keep re-ranking light, and stop treating every piece of data as if it were a string. The audience seemed to agree, judging by the steady stream of questions on weighting schemes and encoder design.
Still hungry for more, or need help improving your search? Why not jump on a call with one of our co-founders? Supercharge your search and recommendations with Superlinked!