Keep flat (exact, brute-force) indexing until you hit ~10K vectors if accuracy is critical and sub-100ms latency is acceptable. Beyond that, your choice depends on your constraints:
- For 10K-1M vectors that fit in memory, use IVF_FLAT with sqrt(N) clusters as a starting point - with a small, fixed number of probed clusters this keeps accuracy high while cutting search cost from O(N) to roughly O(sqrt(N)).
- For 1M-10M vectors where memory is limited, switch to IVF_PQ or IVF_SQ, which compress vectors 4-8x through quantization, trading roughly 5-10% accuracy for a 75-88% cut in vector memory.
- Above 10M vectors, go disk-based with DiskANN (95% accuracy) or SPANN (90% accuracy with better scaling).
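The IVF recipe above is simple enough to sketch from scratch. Here is a minimal numpy toy, not a production index: the dataset size, cluster count, and `nprobe` are illustrative, and a real deployment would use a library implementation with a proper trained quantizer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 10_000
xb = rng.standard_normal((n, d)).astype("float32")

nlist = int(np.sqrt(n))  # ~100 clusters: the sqrt(N) starting point

def pairwise_sq(a, b):
    # squared L2 distance between every row of a and every row of b
    return (a ** 2).sum(1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(1)[None, :]

# Crude coarse quantizer: a few Lloyd iterations of k-means.
centroids = xb[rng.choice(n, nlist, replace=False)].copy()
for _ in range(5):
    assign = pairwise_sq(xb, centroids).argmin(1)
    for c in range(nlist):
        members = xb[assign == c]
        if len(members):
            centroids[c] = members.mean(0)
assign = pairwise_sq(xb, centroids).argmin(1)  # final assignment

# Inverted lists: which vector ids live in each cluster.
lists = [np.where(assign == c)[0] for c in range(nlist)]

def ivf_search(q, k=5, nprobe=8):
    # Scan only the nprobe nearest clusters instead of all n vectors.
    near = np.argsort(((centroids - q) ** 2).sum(1))[:nprobe]
    cand = np.concatenate([lists[c] for c in near])
    dist = ((xb[cand] - q) ** 2).sum(1)
    order = np.argsort(dist)[:k]
    return cand[order], dist[order]

ids, dists = ivf_search(xb[0])  # a database vector should find itself first
```

The speedup comes from the candidate set: with balanced clusters, `nprobe` lists hold about `nprobe * sqrt(N)` vectors, which is why raising `nprobe` trades latency back for recall.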
Pro tip: run A/B tests during migration - keep flat indexing for your top 1% most-queried items and serve the long tail with approximate methods, since users are more tolerant of slight inaccuracies in less popular results.
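One way to realize that split is a two-tier index merged at query time. A hedged sketch: the tier sizes are illustrative, and the brute-force scan standing in for the cold tier is a placeholder for whichever approximate index you picked above.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
hot = rng.standard_normal((100, d)).astype("float32")     # top 1%: kept exact/flat
cold = rng.standard_normal((9_900, d)).astype("float32")  # long tail: would be ANN

def flat_search(xb, q, k):
    # exact brute-force scan over one tier
    dist = ((xb - q) ** 2).sum(1)
    idx = np.argsort(dist)[:k]
    return idx, dist[idx]

def tiered_search(q, k=10):
    hi, hd = flat_search(hot, q, k)   # exact on the hot tier
    ci, cd = flat_search(cold, q, k)  # stand-in: swap in an IVF/DiskANN search here
    tier = np.array(["hot"] * len(hi) + ["cold"] * len(ci))
    ids = np.concatenate([hi, ci])
    dists = np.concatenate([hd, cd])
    order = np.argsort(dists)[:k]     # merge the two candidate sets by distance
    return list(zip(tier[order], ids[order], dists[order]))

results = tiered_search(hot[0])
```

The A/B comparison is then between this merged result and an all-flat baseline: popular items keep exact recall, and any quality drop is confined to the tail where users notice it least.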