Superlinked unlocks the Power of GenAI in the Enterprise with MongoDB Atlas

In the rapidly evolving world of technology, building high-performing Generative AI (GenAI) applications has become a top priority for enterprise tech teams globally. The promise of operational efficiencies and enhanced customer experiences is undeniably appealing. However, a significant challenge stands in the way: enterprise data is complex and data entities have both structured and unstructured attributes. While powerful in understanding and working with unstructured data, GenAI models are notoriously poor at understanding and handling structured data predictably.

The Challenge of Structured Data in GenAI Applications 

Structured data is the backbone of most internal and customer-facing enterprise systems. Consider a Q&A chatbot for a financial analyst: we would expect it to understand that when the analyst asks for “recent” or “at risk” reports, they mean reports from a particular range of dates. Or consider a recommender system for an e-commerce store: we would expect it to recommend, in real time, products that are similarly priced or similarly popular to the one the customer is currently viewing. Unfortunately, traditional GenAI models struggle with numbers, timestamps and categorical data, let alone more complex data structures like time series or graphs. Their strength lies in processing certain types of unstructured data, like text and images, but they falter with structured data, treating important attributes such as product prices, document dates, store locations and content categories as if they were merely text. Because of this mismatch, tech teams often discover that their applications deliver subpar results.
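
To make the mismatch concrete, here is a small sketch with made-up prices. It only illustrates what happens when numbers are handled as text (ordering follows characters rather than magnitude); it is not a model of how any particular GenAI system works.

```python
# Hypothetical product prices, stored once as text and once as numbers.
prices_as_text = ["9.99", "10.50", "100.00", "25.00"]
prices_as_numbers = [float(price) for price in prices_as_text]

# Strings are compared character by character, so "10.50" sorts before "9.99".
sorted_as_text = sorted(prices_as_text)

# Numbers are compared by magnitude, which is what "similarly priced" needs.
sorted_as_numbers = sorted(prices_as_numbers)

print(sorted_as_text)
print(sorted_as_numbers)
```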

In response, many teams attempt to develop and train custom re-ranking models tailored to their specific needs. However, this is a daunting task, which requires significant expertise, time, and resources—luxuries that many enterprises cannot afford. Consequently, most GenAI-powered solutions remain stuck in the proof of concept phase, unable to realize their full potential.

The Powerful Partnership between MongoDB Atlas and Superlinked

The combination of MongoDB Atlas and Superlinked aims to overcome these challenges and revolutionize the way enterprises build and deploy GenAI applications. Here’s how:

Superlinked: Bridging the Gap Between Structured and Unstructured Data

Superlinked’s vector compute framework is a game-changer for data science teams. It enables the creation of custom vector embeddings that seamlessly integrate structured and unstructured data into the same vector space. This unique approach allows enterprises to use vector search to deliver results that take into account both data types, effectively tailoring GenAI models to their specific use cases. The result combines the high-quality performance of a custom model with the convenience of pre-trained GenAI models, offering a significant boost in time-to-market and explainability of the results.
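
As a rough intuition for what "the same vector space" can mean, the sketch below concatenates a toy text vector and a toy rating vector into one vector, scaling each part by a weight. All numbers and helper names are made up for illustration, and this is an assumption about the general idea rather than a description of Superlinked's internals.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def combine(parts, weights):
    """Concatenate unit-length parts, scaling each part by its weight."""
    combined = []
    for part, weight in zip(parts, weights):
        combined.extend(weight * x for x in normalize(part))
    return combined

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy 3-d "text" vectors and 2-d "rating" vectors (made-up values).
item_text, item_rating = [0.2, 0.9, 0.1], [0.6, 0.8]
query_text, query_rating = [0.1, 0.8, 0.3], [1.0, 0.0]

item_vec = combine([item_text, item_rating], [1.0, 1.0])
query_vec = combine([query_text, query_rating], [0.7, 0.3])

# A single dot product over the combined vectors equals the weighted sum
# of the per-part similarities.
text_sim = dot(normalize(item_text), normalize(query_text))
rating_sim = dot(normalize(item_rating), normalize(query_rating))
score = dot(item_vec, query_vec)
print(score, 0.7 * text_sim + 0.3 * rating_sim)
```

The point of the design is that one ordinary vector search can account for both kinds of attributes at once, with the weights deciding how much each part matters.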

MongoDB Atlas: Simplifying Vector Search and Data Management

For enterprises already using MongoDB to store their structured data, Atlas is a natural choice. MongoDB Atlas offers an easy and reliable way to implement vector search at scale without adding complexity to the existing data stack. By using Atlas with custom embeddings generated by Superlinked, businesses can harness the full power of their complex data, deliver high-quality applications and achieve the promise of GenAI.

Bringing GenAI-powered Applications to Production

In summary, the combination of Superlinked and MongoDB Atlas offers a clear path to building and deploying high-quality GenAI-powered applications. By addressing the inherent challenges of complex data, this partnership ensures that enterprises can move beyond the POC and MVP stages, delivering real value to their operations and customers.

“MongoDB’s partnership with Superlinked aims to make it easier for customers to create and maintain entity-level and sub-entity-level vector embeddings for enterprise retrieval augmented generation and other use cases, including analytics or more standard semantic search and recommendation systems.”

Greg Maxson,
Global Lead, AI GTM

Getting started with Atlas and Superlinked 

Below you’ll find a step-by-step guide to building your first simple application with Superlinked, using Atlas as the vector store and search solution. This semantic search application lets users run a free-text search over a database of product reviews. It demonstrates that embedding the unstructured review text and the product’s star rating (as a numeric value) in the same vector space delivers higher-quality, more relevant results.

You can find a complete example here, and as always, refer to the official README for the latest details. 

Experiment in your Python notebook environment

Step 1: Install Superlinked

%pip install superlinked 


Step 2: Define the data schema

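from superlinked.framework.common.schema.id_schema_object import IdField
from superlinked.framework.common.schema.schema import schema
from superlinked.framework.common.schema.schema_object import Integer, String

# We are going to create 2 representations of the data:
## 1. separate text and rating for multimodal Superlinked embeddings
## 2. full_review_as_text for LLM embedding of the stringified review and rating

@schema
class Review:
    id: IdField
    review_text: String
    rating: Integer
    full_review_as_text: String

# Instantiate the schema; the following steps refer to this `review` object
review = Review()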
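
For context on the second representation, the sketch below shows one way a "stringified review and rating" field could be built from the two separate fields. The sentence format and the `stringify_review` helper are hypothetical; the sample dataset's actual `full_review_as_text` format is not shown in this guide.

```python
def stringify_review(review_text, rating):
    """Fold the star rating into the review text (hypothetical format)."""
    return f"{review_text} Rating: {rating} out of 5."

# Made-up rows with the same field names as the schema above.
rows = [
    {"id": 1, "review_text": "Sturdy and well made.", "rating": 5},
    {"id": 2, "review_text": "Broke after a week.", "rating": 1},
]

for row in rows:
    row["full_review_as_text"] = stringify_review(row["review_text"], row["rating"])

print(rows[0]["full_review_as_text"])
```

Once the rating is folded into a sentence like this, a text embedding model sees it only as more words, which is the baseline the rest of the guide compares against.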

Step 3: Create the spaces to encode different parts of data

from superlinked.framework.common.embedding.number_embedding import Mode
from superlinked.framework.dsl.index.index import Index
from superlinked.framework.dsl.space.number_space import NumberSpace
from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace

# Embed review data separately
review_text_space = TextSimilaritySpace(
    text=review.review_text, model="all-MiniLM-L6-v2")

rating_maximizer_space = NumberSpace(review.rating, min_value=1,
    max_value=5, mode=Mode.MAXIMUM)

## Embed the full review as text
full_review_as_text_space = TextSimilaritySpace(
    text=review.full_review_as_text, model="all-MiniLM-L6-v2")

# Combine spaces as vector parts to an index.
## Create one for the stringified review
naive_index = Index([full_review_as_text_space])

## and one for the structured multimodal embeddings
advanced_index = Index([review_text_space, rating_maximizer_space])
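
To build intuition for what a "maximizer" number space is meant to achieve, the sketch below encodes a rating so that higher values score closer to the top of the range. The quarter-circle mapping and the `encode_number` helper are assumptions chosen for illustration, not Superlinked's actual encoding.

```python
import math

def encode_number(value, min_value, max_value):
    """Map a number in [min_value, max_value] onto a quarter circle (unit length)."""
    clipped = min(max(value, min_value), max_value)
    fraction = (clipped - min_value) / (max_value - min_value)
    angle = fraction * math.pi / 2
    return [math.sin(angle), math.cos(angle)]

# For a "prefer the maximum" behavior, compare every rating with the top rating.
query = encode_number(5, 1, 5)

similarities = {
    rating: sum(q * x for q, x in zip(query, encode_number(rating, 1, 5)))
    for rating in range(1, 6)
}

for rating, similarity in similarities.items():
    print(rating, round(similarity, 3))
```

In this toy encoding the similarity grows steadily from the lowest rating to the highest, so a vector search naturally favors better-rated items without treating the rating as text.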

Step 4: Define the query mapping to each index type

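from google.colab import userdata  # Colab secret store; use your own secret handling elsewhere
from superlinked.framework.common.nlq.open_ai import OpenAIClientConfig
from superlinked.framework.dsl.query.param import Param
from superlinked.framework.dsl.query.query import Query

# The natural language queries below use an OpenAI model to parse the user input
openai_config = OpenAIClientConfig(api_key=userdata.get("openai_api_key"), model="gpt-4o")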

# Define your query using dynamic parameters for query text and weights.
## first a query on the naive index - using natural language
naive_query = (
    Query(
        naive_index,
        weights={
            full_review_as_text_space: Param('full_review_as_text_weight')
        },
    )
    .find(review)
    .similar(full_review_as_text_space.text, Param("query_text"))
    .limit(Param('limit'))
    .with_natural_query(Param("natural_query"), openai_config)
)
## and another on the advanced multimodal index - also using natural language
superlinked_query = (
    Query(
        advanced_index,
        weights={
            review_text_space: Param('review_text_weight'),
            rating_maximizer_space: Param('rating_maximizer_weight'),
        },
    )
    .find(review)
    .similar(review_text_space.text, Param("query_text"))
    .limit(Param('limit'))
    .with_natural_query(Param("natural_query"), openai_config)
)


Note: Superlinked supports two ways of setting weights for query parts:

  1. “Natural language” queries - using .with_natural_query to dynamically and automatically parse user queries and set the weights
  2. Pre-defined weights - developers can set weights for each part of the query based on business logic or known user preferences
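
To see why the weights matter, the following sketch ranks three made-up reviews by a weighted sum of a text similarity and a rating similarity. The scores and the `rank` helper are hypothetical and stand in for what the framework computes; the sketch only shows how changing weights changes the order.

```python
# Made-up per-part similarity scores for three reviews.
reviews = [
    {"id": "a", "text_sim": 0.90, "rating_sim": 0.20},
    {"id": "b", "text_sim": 0.70, "rating_sim": 1.00},
    {"id": "c", "text_sim": 0.40, "rating_sim": 0.80},
]

def rank(items, text_weight, rating_weight):
    """Order item ids by the weighted sum of their per-part similarities."""
    def score(item):
        return text_weight * item["text_sim"] + rating_weight * item["rating_sim"]
    return [item["id"] for item in sorted(items, key=score, reverse=True)]

# Zero weight on the rating: only the text match counts.
text_only = rank(reviews, text_weight=1.0, rating_weight=0.0)

# Equal weights: a well-rated review can overtake a better text match.
balanced = rank(reviews, text_weight=1.0, rating_weight=1.0)

print(text_only)
print(balanced)
```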

Step 5: Load the data

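import pandas as pd
from superlinked.framework.common.parser.dataframe_parser import DataFrameParser
from superlinked.framework.dsl.executor.in_memory.in_memory_executor import InMemoryExecutor
from superlinked.framework.dsl.source.in_memory_source import InMemorySource

# Run the app
source: InMemorySource = InMemorySource(review, parser=DataFrameParser(schema=review))
executor = InMemoryExecutor(sources=[source], indices=[naive_index, advanced_index])
app = executor.run()

# Download dataset
data = pd.read_json("https://storage.googleapis.com/superlinked-preview-test-data/amazon_dataset_1000.jsonl", lines=True)

# Ingest data to the framework.
source.put([data])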

Step 6: Run experiments with different weights and user queries, until you are satisfied with the results

# query that is based on the LLM embedded reviews
naive_positive_results = app.query(
    naive_query,
    natural_query='High rated quality products',
    limit=10)
naive_positive_results.to_pandas()

# query based on multimodal Superlinked embeddings
superlinked_positive_results = app.query(
    superlinked_query,
    natural_query='High rated quality products',
    limit=10)
superlinked_positive_results.to_pandas()
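
When comparing the two result sets, a couple of simple summary numbers can help. The sketch below uses made-up rows and hypothetical helpers; it assumes each result row exposes an `id` and a `rating`, which you would need to check against the actual columns of the result DataFrames.

```python
def mean_rating(rows):
    """Average star rating of the returned reviews."""
    return sum(row["rating"] for row in rows) / len(rows)

def overlap(rows_a, rows_b):
    """Number of reviews that appear in both result sets."""
    ids_a = {row["id"] for row in rows_a}
    ids_b = {row["id"] for row in rows_b}
    return len(ids_a & ids_b)

# Made-up top-3 results for the two queries.
naive_rows = [
    {"id": "r1", "rating": 5},
    {"id": "r2", "rating": 2},
    {"id": "r3", "rating": 4},
]
superlinked_rows = [
    {"id": "r1", "rating": 5},
    {"id": "r3", "rating": 4},
    {"id": "r4", "rating": 5},
]

print(mean_rating(naive_rows), mean_rating(superlinked_rows))
print(overlap(naive_rows, superlinked_rows))
```

For a query like "High rated quality products", a higher mean rating in the multimodal results would be one sign that the rating is being treated as a number rather than as text.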

Ready to deploy? 

Step 7: Install Superlinked server (refer to the server README for the latest instructions)

# Clone the repository
git clone https://github.com/superlinked/superlinked

cd <repo-directory>/server
./tools/init-venv.sh
cd runner
source "$(poetry env info --path)/bin/activate"
cd ..

# Make sure you have your docker engine running and activate the virtual environment
./tools/deploy.py up

Step 8: Connect to your Atlas instance (refer to Atlas Vector Search Documentation for full instructions)

from superlinked.framework.dsl.storage.mongo_vector_database import MongoDBVectorDatabase

vector_database = MongoDBVectorDatabase(
    "<USER>:<PASSWORD>@<HOST_URL>",
    "<DATABASE_NAME>",
    "<CLUSTER_NAME>",
    "<PROJECT_ID>",
    "<API_PUBLIC_KEY>",
    "<API_PRIVATE_KEY>",
)
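
Rather than hard-coding credentials in app.py, you may prefer to read them from environment variables. The sketch below is one possible approach; the variable names and the `mongo_settings_from_env` helper are hypothetical, and the returned values simply follow the order of the placeholders shown above.

```python
import os

# Hypothetical environment variable names, one per placeholder value.
REQUIRED_VARIABLES = [
    "MONGO_USER", "MONGO_PASSWORD", "MONGO_HOST_URL", "MONGO_DATABASE_NAME",
    "MONGO_CLUSTER_NAME", "MONGO_PROJECT_ID",
    "MONGO_API_PUBLIC_KEY", "MONGO_API_PRIVATE_KEY",
]

def mongo_settings_from_env(env=os.environ):
    """Return the six connection values in the order used in Step 8."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return (
        f"{env['MONGO_USER']}:{env['MONGO_PASSWORD']}@{env['MONGO_HOST_URL']}",
        env["MONGO_DATABASE_NAME"],
        env["MONGO_CLUSTER_NAME"],
        env["MONGO_PROJECT_ID"],
        env["MONGO_API_PUBLIC_KEY"],
        env["MONGO_API_PRIVATE_KEY"],
    )

# Usage (assuming the variables are set):
# vector_database = MongoDBVectorDatabase(*mongo_settings_from_env())
```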

Step 9: Configure the Superlinked Server - add your configuration from the notebook (Steps 2 to 4) and append the deployment setup

# Copy your configuration to app.py
# ...

# Create a data source to bulk load your production data.
config = DataLoaderConfig("https://storage.googleapis.com/superlinked-sample-datasets/amazon_dataset_ext_1000.jsonl", DataFormat.JSON, pandas_read_kwargs={"lines": True, "chunksize": 100})
source = DataLoaderSource(review, config)

executor = RestExecutor(
    # Add your data source
    sources=[source],
    # Add the indices that contain your configuration
    indices=[naive_index, advanced_index],
    # Create a REST endpoint for each query.
    queries=[
        RestQuery(RestDescriptor("naive_query"), naive_query),
        RestQuery(RestDescriptor("superlinked_query"), superlinked_query),
    ],
    # Connect to MongoDB Atlas using the vector_database defined in Step 8
    vector_database=vector_database,
)

SuperlinkedRegistry.register(executor)
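
The `lines` and `chunksize` entries in `pandas_read_kwargs` are standard pandas `read_json` options: the file is read as one JSON record per line, in batches of the given size. How the server consumes those batches is not covered here, but the idea of chunked JSONL reading can be sketched with the standard library alone (the `read_jsonl_in_chunks` helper is hypothetical):

```python
import io
import json

def read_jsonl_in_chunks(stream, chunksize):
    """Yield lists of parsed records, at most `chunksize` records per list."""
    chunk = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk.append(json.loads(line))
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk

# Three made-up records read in chunks of two.
sample = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
chunk_sizes = [len(chunk) for chunk in read_jsonl_in_chunks(sample, chunksize=2)]
print(chunk_sizes)
```

Reading in chunks keeps memory use bounded when the production dataset is much larger than the 1,000-row sample.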

Step 10: Test your deployment

# Trigger the data load.
curl -X POST 'http://localhost:8080/data-loader/review/run'

# Check the status of the loader.
curl -X GET 'http://localhost:8080/data-loader/review/status'

# Send your first query
curl -X POST \
    'http://localhost:8080/api/v1/search/superlinked_query' \
    --header 'Accept: */*' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "natural_query": "High rated quality products",
        "limit": 10
    }'
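
If you would rather call the endpoint from Python than from curl, the same request can be built with the standard library. This is a minimal sketch that mirrors the curl call above; the `build_search_request` helper is hypothetical, and the response is assumed (not confirmed here) to be JSON.

```python
import json
import urllib.request

def build_search_request(query_name, natural_query, limit,
                         base_url="http://localhost:8080"):
    """Build a POST request equivalent to the curl search call."""
    body = json.dumps({"natural_query": natural_query, "limit": limit})
    return urllib.request.Request(
        url=f"{base_url}/api/v1/search/{query_name}",
        data=body.encode("utf-8"),
        headers={"Accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )

request = build_search_request(
    "superlinked_query", "High rated quality products", 10)

# Sending the request requires the server from Step 7 to be running:
# with urllib.request.urlopen(request) as response:
#     results = json.loads(response.read())  # assumes a JSON response body
```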

Congratulations! You learned how to build your first GenAI-powered application that combines numeric and unstructured data in the same embedding space to deliver high-quality results. Now you are ready to explore additional notebooks here.

We are excited to see the amazing applications that you will build with Superlinked and Atlas - don’t hesitate to share your work with us.

Conclusion

Our winning partnership is designed to empower tech teams, helping them overcome the barriers to effective GenAI implementation and achieve their goals. With Atlas and Superlinked, the future of GenAI in the enterprise is not just promising—it’s here.

Posted by

Daniel Svonava

CEO & Co-founder

2024 Superlinked, Inc.