vidore/colpali-v1.3-hf

Architecture: PaliGemma
Parameters: 3.0B
Tasks: Encode
Outputs: Multi-Vec
Dimensions: 128 (Multi-Vec)
Max Sequence Length: 2,048 tokens
License:
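The "Multi-Vec" output above means the model emits one 128-dimensional vector per query token or document patch rather than a single pooled embedding. Retrieval with such multi-vector models is typically scored by late interaction (MaxSim): each query vector is matched against its most similar document vector, and the maxima are summed. A minimal sketch with NumPy, using random stand-in arrays in place of real model embeddings:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document.

    query_vecs: (num_query_tokens, dim) L2-normalized vectors
    doc_vecs:   (num_doc_patches, dim) L2-normalized vectors
    """
    # Cosine similarity of every query vector against every document vector.
    sim = query_vecs @ doc_vecs.T  # shape: (num_query_tokens, num_doc_patches)
    # For each query vector, keep its best-matching document vector, then sum.
    return float(sim.max(axis=1).sum())

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Stand-ins for real embeddings: 16 query vectors and 1030 document-patch
# vectors, each 128-dimensional as in the "Dimensions" field above.
query = normalize(rng.normal(size=(16, 128)))
doc = normalize(rng.normal(size=(1030, 128)))

score = maxsim_score(query, doc)
```

Because the score sums per-query-vector maxima, a document identical to the query scores exactly one point per query vector; ranking a corpus means computing this score against every document's patch vectors.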

Benchmarks

Quality:

Benchmark                        Domain      NDCG@10  MAP@10  MRR@10
Vidore3ComputerScienceRetrieval  technology  0.7119   0.5767  0.8571
Vidore3FinanceEnRetrieval        finance     0.4660   0.3570  0.6032
Vidore3HrRetrieval               general     0.5481   0.4077  0.6708
Vidore3PharmaceuticalsRetrieval  medical     0.5786   0.4646  0.6847

All four benchmarks are English-language retrieval tasks.

Performance (L4, b1, c4):

Benchmark                        Corpus TPS  Corpus p50  Query TPS  Query p50
Vidore3ComputerScienceRetrieval  6           613.4 ms    170        444.4 ms
Vidore3FinanceEnRetrieval        5           624.9 ms    185        434.0 ms
Vidore3HrRetrieval               6           629.8 ms    184        447.9 ms
Vidore3PharmaceuticalsRetrieval  6           591.7 ms    168        403.3 ms
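The quality numbers above are standard ranking metrics at cutoff 10: NDCG@10 (rank-discounted gain, normalized by the ideal ordering), MAP@10 (precision averaged over relevant hits), and MRR@10 (reciprocal rank of the first relevant hit). A self-contained sketch of how each is computed from one query's ranked relevance labels — the example ranking is hypothetical, not taken from the benchmarks:

```python
import math

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def mrr_at_k(rels, k=10):
    """Reciprocal rank of the first relevant result in the top k."""
    for i, rel in enumerate(rels[:k]):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0

def map_at_k(rels, k=10, num_relevant=None):
    """Average precision at k over the relevant documents for this query."""
    num_relevant = num_relevant or sum(1 for r in rels if r > 0)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, rel in enumerate(rels[:k]):
        if rel > 0:
            hits += 1
            total += hits / (i + 1)  # precision at this hit's rank
    return total / num_relevant

# Hypothetical ranking: binary relevance of the top-10 results for one query,
# with relevant documents at ranks 1, 3, and 6.
rels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(ndcg_at_k(rels), map_at_k(rels), mrr_at_k(rels))
```

The benchmark scores report these values averaged over all queries in each task.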
