Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each.
- jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition
  Paper • 2605.08384 • Published • 7
- jinaai/jina-embeddings-v5-omni-small
  Feature Extraction • 2B • Updated • 16.9k • 32
- jinaai/jina-embeddings-v5-omni-nano
  Feature Extraction • 1.0B • Updated • 15.1k • 15
- jinaai/jina-embeddings-v5-omni-nano-text-matching
  Feature Extraction • 0.9B • Updated • 11.5k • 3
