---
title: google/owlv2-base-patch16-ensemble
description: The OWLv2 model (short for Open-World Localization) was proposed in Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, and Neil Houlsby. CLIP architecture, 155M parameters.
canonical_url: https://superlinked.com/models/google-owlv2-base-patch16-ensemble
last_updated: 2026-05-25
---

# google/owlv2-base-patch16-ensemble

OWLv2 (short for Open-World Localization) is an open-vocabulary object detection model proposed in *Scaling Open-Vocabulary Object Detection* by Matthias Minderer, Alexey Gritsenko, and Neil Houlsby.

Source: [google/owlv2-base-patch16-ensemble on Hugging Face](https://huggingface.co/google/owlv2-base-patch16-ensemble)

## Overview

| Field | Value |
|-------|-------|
| Architecture | CLIP |
| Parameters | 155M |
| Tasks | Extract |
| Outputs | Bounding Boxes |
| License | apache-2.0 |
| Inputs | image |
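
The table lists an image as input and bounding boxes as output. A minimal sketch of how this checkpoint might be called follows, assuming the Hugging Face `transformers` zero-shot object detection pipeline and Pillow are installed. The helper names (`detect`, `keep_confident`, `to_xyxy`), the `0.3` score cutoff, and the text labels are illustrative assumptions, not part of this page or of the model card.

```python
from typing import Any

MODEL_ID = "google/owlv2-base-patch16-ensemble"


def keep_confident(detections: list[dict[str, Any]], min_score: float = 0.3) -> list[dict[str, Any]]:
    """Drop detections below a score cutoff (0.3 is an arbitrary illustrative value)."""
    return [d for d in detections if d["score"] >= min_score]


def to_xyxy(detection: dict[str, Any]) -> tuple[float, float, float, float]:
    """Flatten the pipeline's box dict into (xmin, ymin, xmax, ymax) pixel coordinates."""
    box = detection["box"]
    return (box["xmin"], box["ymin"], box["xmax"], box["ymax"])


def detect(image_path: str, labels: list[str], min_score: float = 0.3) -> list[dict[str, Any]]:
    """Run open-vocabulary detection on one image for the given text labels."""
    # Heavy dependencies are imported lazily so the helpers above stay usable without them.
    from PIL import Image
    from transformers import pipeline

    detector = pipeline("zero-shot-object-detection", model=MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    # Each result is a dict with "score", "label", and "box" keys.
    detections = detector(image, candidate_labels=labels)
    return keep_confident(detections, min_score)
```

Calling `detect("example.jpg", ["a cat", "a remote control"])` would download the checkpoint on first use; `example.jpg` is a placeholder path, and the labels are free-text queries chosen by the caller.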

## Benchmarks

### COCO

Domain: general · Task: detection · Language: en

Object detection on COCO natural images

Corpus: 5,000 · Queries: 5,000

**Variant: default_limit-1000**

**Performance (A10G b1 c4):** Detect 0.0 mpix/s · Detect p50 42.1s

**Performance (L4-SPOT b1 c4):** Detect 0.9 mpix/s · Detect p50 901.0ms

**Performance (L4 b1 c4):** Detect 1.1 mpix/s · Detect p50 1.0s

**Variant: default_limit-100**

**Quality:** ap: 0.5171 · ap50: 0.7172 · ap75: 0.5738 · ar@100: 0.6315
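
Assuming these metrics follow the standard COCO definitions, `ap50` and `ap75` count a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5 or 0.75 respectively, and `ap` averages over IoU thresholds from 0.50 to 0.95. The sketch below shows the IoU computation only; the function name and the example boxes are illustrative, and this is not the evaluation code behind the numbers above.

```python
Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap width and height, clamped to zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# A prediction shifted 2 px to the right of a 10x10 ground-truth box.
prediction = (2.0, 0.0, 12.0, 10.0)
ground_truth = (0.0, 0.0, 10.0, 10.0)
overlap = iou(prediction, ground_truth)  # 80 / 120, roughly 0.667
matches_at_50 = overlap >= 0.50
matches_at_75 = overlap >= 0.75
```

In this example the prediction would count as a match at the 0.5 threshold but not at 0.75, which is why `ap75` is typically lower than `ap50`.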

**Performance (RTX-4090 b1 c16):** Detect 4.3 mpix/s · Detect p50 547.4ms
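
The performance lines report a throughput in megapixels per second and a median (p50) latency. The page does not describe how these were measured, so the following is only a sketch of one plausible way to derive such figures from raw timings; the function name, the inputs, and the sample numbers are assumptions made for illustration.

```python
from statistics import median


def summarize(latencies_s: list[float], pixels_per_image: list[int], wall_time_s: float) -> dict[str, float]:
    """Summarize a benchmark run as p50 latency and megapixel throughput.

    latencies_s: per-request latency in seconds.
    pixels_per_image: width * height of each processed image.
    wall_time_s: total elapsed time of the run, which can be shorter than
        the sum of latencies when requests run concurrently.
    """
    total_mpix = sum(pixels_per_image) / 1_000_000
    return {
        "p50_s": median(latencies_s),
        "mpix_per_s": total_mpix / wall_time_s,
    }


# Hypothetical run: three 1-megapixel images finished in 2 seconds of wall time.
stats = summarize([0.5, 0.9, 1.1], [1_000_000, 1_000_000, 1_000_000], 2.0)
```

Because throughput depends on wall-clock time while p50 is a per-request figure, the two can diverge under concurrency.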

[Reference](https://cocodataset.org/)