google/siglip2-base-patch16-224

SigLIP 2 extends the original SigLIP pretraining objective with several prior, independently developed techniques, combining them into a unified recipe that improves semantic understanding, localization, and dense features.

Overview

Architecture: SigLIP
Parameters: 375M
Tasks: Encode
Outputs: Dense
Dimensions: 768 (dense)
Max Sequence Length: 64 tokens
License: apache-2.0
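
The overview lists a single dense output of 768 dimensions, which is the shape a retrieval index would store per image or caption. The sketch below illustrates how such vectors could be ranked for image-to-text retrieval. It is not this engine's API: the random vectors stand in for real model outputs, `cosine_top_k` is a hypothetical helper, and ranking by cosine similarity is an assumption that this page does not state.

```python
import math
import random

DIM = 768  # dense embedding size listed in the overview


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_top_k(query, corpus, k=10):
    """Return the indices of the k corpus vectors most similar to the query."""
    q = normalize(query)
    scored = []
    for idx, doc in enumerate(corpus):
        d = normalize(doc)
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Placeholder data: random vectors, NOT real SigLIP 2 embeddings.
rng = random.Random(0)
caption_embeddings = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(20)]

# Fake "image" embedding: caption 7 plus a little noise, so caption 7 should rank first.
image_embedding = [x + 0.1 * rng.gauss(0, 1) for x in caption_embeddings[7]]

top = cosine_top_k(image_embedding, caption_embeddings, k=10)
print(top[0])
```

In a real setup the two lists would be filled by encoding captions and images with the model. Because every vector has the same fixed length, the index size grows linearly with the corpus.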

Benchmarks

Flickr30kI2TRetrieval

general retrieval en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783 Queries: 1,000
Quality
nDCG@10: 0.8157
MAP@10: 0.7255
MRR@10: 0.9302
Performance (L4 b1 c8)
Corpus throughput: 1.6K tok/s
Corpus p50 latency: 68.5ms
Query throughput: 13.0 mpix/s
Query p50 latency: 99.0ms
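
For readers unfamiliar with the three quality metrics above, here is a minimal sketch of how each can be computed for a single query with binary relevance. The function names and the toy ranking are illustrative, not taken from the benchmark harness, and normalization conventions vary between evaluation tools: this version divides average precision by the total number of relevant items, which is an assumption. The reported scores would be averages of such per-query values over all queries.

```python
import math


def mrr_at_k(ranked_relevance, k=10):
    """Reciprocal rank of the first relevant item in the top k (0.0 if none)."""
    for rank, rel in enumerate(ranked_relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ap_at_k(ranked_relevance, num_relevant, k=10):
    """Average precision over the top k, divided by the number of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0


def ndcg_at_k(ranked_relevance, num_relevant, k=10):
    """Discounted cumulative gain in the top k, divided by the ideal DCG."""
    dcg = sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(ranked_relevance[:k], start=1)
    )
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(num_relevant, k) + 1)
    )
    return dcg / ideal if ideal else 0.0


# Toy query: two relevant captions exist, retrieved at ranks 2 and 4.
ranking = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(mrr_at_k(ranking))      # 0.5
print(ap_at_k(ranking, 2))    # 0.5
print(ndcg_at_k(ranking, 2))
```

MRR only rewards the first relevant hit, while MAP and nDCG also account for where the remaining relevant captions land, which is one reason MRR@10 can sit well above the other two on the same run.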
