Publication Date: October 22, 2025

My RAG system misses important insights that span multiple chunks. Should I just increase chunk size or overlap?

This tip is based on the following article: Improving RAG with RAPTOR

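Increasing chunk size or overlap is a band-aid that creates new problems: larger chunks use up more of the context window per retrieved result, and their embeddings get diluted because each vector has to represent more topics.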

The production fix is implementing hierarchical summarization:

After your initial chunking, cluster similar chunks using a Gaussian mixture model (GMM) rather than k-means. GMM allows soft assignments, so a chunk can belong to multiple clusters.
Then generate a summary of each cluster.
Repeat this process recursively on the summaries, 2-3 levels up.
Store both the original chunks AND the cluster summaries as retrievable documents.
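The steps above can be sketched roughly as follows. This is a minimal illustration, not the RAPTOR reference implementation: build_summary_tree and its three callables (embed, soft_cluster, summarize) are hypothetical names, standing in for whatever embedding model, clustering routine, and LLM summarizer you already use.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def build_summary_tree(
    chunks: List[str],
    embed: Callable[[List[str]], List[Vector]],               # hypothetical: your embedding model
    soft_cluster: Callable[[List[Vector]], List[List[int]]],  # hypothetical: cluster ids per text
    summarize: Callable[[List[str]], str],                    # hypothetical: your LLM summarizer
    max_levels: int = 3,
) -> List[Dict]:
    """Return leaf chunks plus cluster summaries as one flat list of retrievable nodes."""
    nodes = [{"level": 0, "text": c} for c in chunks]
    current = list(chunks)
    for level in range(1, max_levels + 1):
        if len(current) < 2:
            break  # nothing left to group
        memberships = soft_cluster(embed(current))
        clusters: Dict[int, List[str]] = {}
        for text, cluster_ids in zip(current, memberships):
            for cid in cluster_ids:  # soft assignment: a text may join several clusters
                clusters.setdefault(cid, []).append(text)
        summaries = [summarize(members) for _, members in sorted(clusters.items())]
        if len(summaries) >= len(current):
            break  # clustering no longer condenses anything, so stop
        nodes.extend({"level": level, "text": s} for s in summaries)
        current = summaries  # the next level clusters the summaries themselves
    return nodes
```

Every node in the returned list, leaf or summary, would then be embedded and indexed together, so a query can match either a specific chunk or a higher-level summary.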

This captures relationships that span chunks without bloating individual chunk sizes. Key implementation detail: use UMAP to reduce the embedding dimensions before clustering, because GMM works poorly on raw high-dimensional vectors. Set n_neighbors=10 for local structure (detailed clusters) or n_neighbors=sqrt(N), where N is the number of items being clustered, for global structure (broad themes). RAPTOR reportedly showed a 20-30% improvement in answering multi-hop questions compared to flat retrieval.
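A minimal sketch of the UMAP-then-GMM step, assuming the umap-learn and scikit-learn packages are installed. The function names, the 0.1 probability threshold, the target of 10 reduced dimensions, and the caller-supplied n_clusters are illustrative assumptions, not values taken from the tip above.

```python
import math
from typing import List, Sequence


def umap_n_neighbors(n_points: int, scope: str = "global") -> int:
    """10 for local structure, about sqrt(N) for global structure."""
    wanted = 10 if scope == "local" else math.isqrt(n_points)
    # Clamp for small inputs: UMAP rejects n_neighbors below 2.
    return max(2, min(wanted, n_points - 1))


def soft_assign(probabilities: Sequence[Sequence[float]],
                threshold: float = 0.1) -> List[List[int]]:
    """Keep every cluster whose probability exceeds the threshold (assumed value)."""
    result = []
    for row in probabilities:
        picked = [k for k, p in enumerate(row) if p > threshold]
        # Fall back to the single most likely cluster if none passes the threshold.
        result.append(picked or [max(range(len(row)), key=lambda k: row[k])])
    return result


def reduce_and_cluster(embeddings, n_clusters: int, scope: str = "global",
                       n_components: int = 10, threshold: float = 0.1) -> List[List[int]]:
    """Reduce with UMAP, fit a GMM, and return the cluster ids for each embedding."""
    # Third-party imports are local so the helpers above work without them.
    import numpy as np
    import umap  # provided by the umap-learn package
    from sklearn.mixture import GaussianMixture

    X = np.asarray(embeddings)
    reducer = umap.UMAP(
        n_neighbors=umap_n_neighbors(len(X), scope),
        n_components=min(n_components, len(X) - 2),  # keep below the sample count
        metric="cosine",
    )
    reduced = reducer.fit_transform(X)
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(reduced)
    return soft_assign(gmm.predict_proba(reduced).tolist(), threshold)
```

With 400 chunks, for example, the global setting resolves to n_neighbors=20 and the local setting stays at 10. In practice the number of clusters would usually be chosen by a model-selection criterion rather than fixed by hand.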
