Publication Date: October 22, 2025

When does quantization make sense, and should I use IVF_PQ or IVF_SQ?

This tip is based on the following article: Vector Indexes

As a rule of thumb, consider quantization once your vector index exceeds 50% of available RAM, or once cloud storage costs exceed compute costs.
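To check the RAM threshold for your own dataset, a back-of-the-envelope estimate is usually enough. The sketch below assumes uncompressed vectors stored as 32-bit floats and ignores index overhead (centroids, IDs, graph links); the function names and the sample numbers are illustrative, not taken from any particular database.

```python
def raw_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of the uncompressed vectors, assuming float32 (4 bytes per value)."""
    return num_vectors * dim * bytes_per_value


def exceeds_ram_budget(index_bytes: int, ram_bytes: int, threshold: float = 0.5) -> bool:
    """True when the index takes more than `threshold` of the available RAM."""
    return index_bytes > threshold * ram_bytes


# Hypothetical workload: 10 million 768-dimensional embeddings on a 32 GiB machine.
size = raw_index_bytes(num_vectors=10_000_000, dim=768)
print(f"{size / 2**30:.1f} GiB of raw vectors")
print("quantization worth considering:", exceeds_ram_budget(size, ram_bytes=32 * 2**30))
```

If the check returns True, the compression ratios discussed in this tip tell you roughly how far each option would shrink that footprint.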

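In short, an Inverted File Index (IVF) groups vectors into clusters so that a search scans only the clusters nearest the query instead of the whole dataset. Unlike hash-based indexes, IVF uses clustering to prefilter the data. IVF_SQ and IVF_PQ are two variants that add quantization on top of this clustering to compress the stored vectors. Read more about Inverted File Index (IVF) here.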
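The clustering prefilter can be illustrated with a toy example. This is a minimal sketch, assuming the centroids are already known (in practice they are usually learned with k-means) and using plain Euclidean distance; `nprobe` here is simply the number of clusters scanned per query, and all function names are made up for illustration.

```python
import math


def nearest_centroids(query, centroids, nprobe):
    """Indices of the `nprobe` centroids closest to `query`."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    return order[:nprobe]


def build_inverted_lists(vectors, centroids):
    """Assign every vector id to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        cell = nearest_centroids(vec, centroids, 1)[0]
        lists[cell].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Scan only the vectors stored in the `nprobe` nearest clusters."""
    candidates = [
        vec_id
        for cell in nearest_centroids(query, centroids, nprobe)
        for vec_id in lists[cell]
    ]
    return sorted(candidates, key=lambda vec_id: math.dist(query, vectors[vec_id]))[:k]


# Toy data: two well-separated clusters with assumed centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0)]
lists = build_inverted_lists(vectors, centroids)
print(ivf_search((9.0, 9.5), vectors, centroids, lists))
```

Raising `nprobe` scans more clusters, which trades speed for recall.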

Choose IVF_SQ for predictable performance: it is simpler, quantizes each dimension independently to 8-bit integers, gives a fixed 4x compression over 32-bit floats, and typically maintains 85-90% recall.
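To make the per-dimension quantization concrete, here is a minimal sketch of 8-bit scalar quantization. It assumes a simple min/max range per dimension learned from sample data; real implementations may pick ranges differently, and the function names are illustrative only.

```python
def train_sq(vectors):
    """Learn a (min, max) range for each dimension from sample vectors."""
    return [(min(col), max(col)) for col in zip(*vectors)]


def sq_encode(vec, ranges):
    """Map each float to one byte (0..255) within its dimension's range."""
    codes = []
    for x, (lo, hi) in zip(vec, ranges):
        scale = (hi - lo) or 1.0  # guard against constant dimensions
        q = round((x - lo) / scale * 255)
        codes.append(min(255, max(0, q)))  # clamp values outside the trained range
    return bytes(codes)


def sq_decode(codes, ranges):
    """Approximate reconstruction of the original floats."""
    return [lo + c / 255 * (hi - lo) for c, (lo, hi) in zip(codes, ranges)]


sample = [[0.0, -1.0], [1.0, 1.0], [0.5, 0.0]]
ranges = train_sq(sample)
code = sq_encode([0.25, 0.5], ranges)
print(len(code), "bytes instead of", 4 * len(ranges))  # one byte per dimension vs four
print(sq_decode(code, ranges))
```

Each dimension costs one byte instead of four, which is where the 4x figure comes from; the reconstruction error per dimension is bounded by half a quantization step.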

Use IVF_PQ when you need aggressive compression (8-16x) and can tolerate 75-80% recall: it splits each vector into subvectors and quantizes each one independently against its own codebook, so the compression ratio can be tuned through the number of subvectors and the bits per code.
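The subspace idea and the resulting compression ratio can be sketched as follows. This assumes the per-subspace codebooks have already been trained (typically with k-means), that the vector length is divisible by the number of subspaces, and that the baseline is float32; the tiny codebooks and function names below are made up for illustration.

```python
import math


def pq_encode(vec, codebooks):
    """Return one centroid index per subspace; codebooks[j] lists the centroids of subspace j."""
    sub_dim = len(vec) // len(codebooks)  # assumes len(vec) is divisible by the subspace count
    codes = []
    for j, book in enumerate(codebooks):
        sub = vec[j * sub_dim:(j + 1) * sub_dim]
        codes.append(min(range(len(book)), key=lambda c: math.dist(sub, book[c])))
    return codes


def pq_decode(codes, codebooks):
    """Approximate the vector by concatenating the selected centroids."""
    return [x for j, c in enumerate(codes) for x in codebooks[j][c]]


def pq_compression_ratio(dim, m, nbits=8, bytes_per_value=4):
    """Bits in the raw vector divided by bits in the PQ code (m codes of nbits each)."""
    return (dim * bytes_per_value * 8) / (m * nbits)


# Toy example: a 4-dimensional vector, 2 subspaces, 2 centroids per subspace.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
codes = pq_encode([0.1, 0.2, 0.9, 1.1], codebooks)
print(codes, pq_decode(codes, codebooks))

# 768-dimensional float32 vectors with 8-bit codes.
print(pq_compression_ratio(768, m=384), pq_compression_ratio(768, m=192))
```

Fewer subspaces mean smaller codes but coarser approximations, which is the knob behind the compression-versus-recall trade-off.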

Real-world tip: PQ shines with high-dimensional embeddings (>256 dims) where dimensional correlation is high.

For production, consider a hybrid approach: store your top 5% most-accessed vectors without quantization for perfect recall on popular items, apply SQ to the next 20% for a good balance, and use aggressive PQ for the long tail. Monitor your P95 latency: PQ decoding can add 2-5 ms of overhead, which might push you past your SLAs despite the memory savings.
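One way to derive the three tiers is to rank vectors by access count and cut the ranking at the 5% and 25% marks. The sketch below assumes you already collect per-vector access counts; the function name, the tier labels, and the synthetic counts are hypothetical.

```python
def assign_tiers(access_counts, raw_frac=0.05, sq_frac=0.20):
    """Map each vector id to 'raw', 'sq', or 'pq' by descending access count."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    raw_end = round(len(ranked) * raw_frac)
    sq_end = raw_end + round(len(ranked) * sq_frac)
    tiers = {}
    for rank, vec_id in enumerate(ranked):
        if rank < raw_end:
            tiers[vec_id] = "raw"  # hottest vectors: stored unquantized
        elif rank < sq_end:
            tiers[vec_id] = "sq"   # next slice: scalar quantization
        else:
            tiers[vec_id] = "pq"   # long tail: product quantization
    return tiers


# Synthetic access log: vector 0 is the most popular, vector 99 the least.
counts = {vec_id: 100 - vec_id for vec_id in range(100)}
tiers = assign_tiers(counts)
print(sum(t == "raw" for t in tiers.values()),
      sum(t == "sq" for t in tiers.values()),
      sum(t == "pq" for t in tiers.values()))
```

Access patterns drift over time, so the assignment would need to be recomputed periodically, and vectors that change tier would have to be re-encoded.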
