#Vector Search

Last Update: April 5, 2024

**Scanning to calculate the similarity between vectors quickly** is at the heart of Vector Search. Vector similarity scores encoded by your embedding model/s store valuable feature or characteristic information about your data that can be used in various applications (e.g., content recommendation, clustering, data analysis). There are several ways to perform nearest neighbor search.

Let’s say we have a dataset of 1 million 1000-dimensional vectors and want to quickly find similar vectors for a given query. Naively, this would require 1 billion operations per query – a full scan nearest neighbor search. Full scan is an **exhaustive, brute force approach** to nearest neighbor search. It sequentially checks every record or data block of the dataset.

A full scan is simple, easy to implement, suitable when your dataset has less than 1M vectors, and not constantly changing.

But as your dataset grows past 1M vectors, or is updated frequently, full scan takes more and more time and becomes more resource-intensive. We can significantly expedite our scan using an approximate nearest neighbors (ANN) search.

Approximate Nearest Neighbors (ANN) is a class of algorithms that **expedite** vector similarity search through approximations, avoiding the high computational cost of exhaustive brute force (full scan) search.

ANN algorithms like locality-sensitive hashing (LSH) and hierarchical navigable small world (HNSW) can provide a tunable balance between precision and speed. HNSW and other similar methods permit very high recall, approaching brute force levels but at faster speeds.

ANN provides a crucial advantage over brute force search for large, high-dimensional datasets. ANN-benchmarks’ “Benchmarking Results” demonstrate that brute force algorithms provide the highest precision but at a tradeoff cost of fewer QPS (queries per second). **Hybrid** approaches combining ANN and exact search (i.e., full scan) can provide both speed and accuracy.

To select an ANN method appropriate to your application, you need to evaluate metrics like recall and latency in relation to your dataset scale, dimensionality, and data distribution requirements.

Let’s look at some example code, demonstrating linear (full) scan and ANN search.

`import numpy as np import faiss`

Create dataset of 1 million 1000-dim vectors

`num_vectors = 1000000 vector_dim = 1000 dataset = np.random.rand(num_vectors, vector_dim).astype('float32')`

Define query vector

`query_vector = np.random.rand(vector_dim).astype('float32')`

Create FAISS index

`index = faiss.IndexFlatL2(vector_dim)`

Add vectors to index

`index.add(dataset)`

Linear scan search

`start = time.time() distances, indices = index.search(query_vector.reshape(1, vector_dim), 1) print("Linear scan time: ", time.time() - start)`

Switch to IVFFlat index for approximate search

`index = faiss.index_factory(vector_dim, "IVF1024,Flat") index.train(dataset) index.add(dataset) start = time.time() distances, indices = index.search(query_vector.reshape(1, vector_dim), 1) print("ANN time: ", time.time() - start)`

The code snippet above shows how an ANN algorithm, like IVFFlat from the FAISS library, indexes the vectors, allowing **quick narrowing of the search space**. This lets you significantly speed up your scan, compared to a linear scan, especially as your data set becomes larger.

Different ANN implementations provide different optimization tradeoffs. Choosing the right ANN method requires benchmarking options like HNSW, IVF, and LSH on your dataset.

Search speed is not the only challenge when handling datasets with more than 1 million vectors. Once a dataset is too large to fit into the RAM of a single machine, you may be forced to shard the index to multiple machines. This is costly and increases system complexity. Fortunately, you can use quantization to **reduce the index size, without substantially reducing retrieval quality**.

Quantization reduces the memory required to store vectors by compressing the data but preserving relevant information. There are **three types** of vector quantization: scalar, binary, and product.

**Scalar** quantization reduces the dimensionality of your data by representing each vector component with fewer bits, reducing the amount of memory required for vector storage and speeding up the search process. Scalar quantization achieves a good balance of compression, accuracy, and speed, and is used widely.

**Binary** quantization represents each vector component as binary codes. Binary is the fastest quantization method. But it’s only efficient for high dimensional vectors and a centered distribution of vector components. As a result, binary quantization should only be used with tested models.

**Product** quantization splits the vector dimensions into smaller subvectors, each independently quantized into an index using the subvector’s set of codewords, representing data point clusters, and then combined into a single index representing the original high-dimensional vector. Product quantization provides better compression (minimal memory footprint) but is slower and less accurate than scalar quantization.

ANN search can be combined with quantization to enable running high-performance [Vector Search] pipelines on huge datasets.

Stay updated with VectorHub

Continue Reading