---
license: apple-amlr
license_name: apple-ascl
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---

# πŸ“Έ MobileCLIP-B Zero-Shot Image Classifier
### Hugging Face Inference Endpoint

> **Production-ready wrapper** around Apple’s MobileCLIP-B checkpoint.
> Handles image β†’ text similarity in a single fast call.

---

## πŸ“‘ Sidebar

- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)

---

## ✨ Features

|                              | This repo |
|------------------------------|-----------|
| **Model**                    | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion**            | `reparameterize_model` baked in |
| **Mixed precision**          | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats**  | One-time encoding of the prompts in `items.json` |
| **Per-request work**         | _Only_ image decoding β†’ `encode_image` β†’ softmax |
| **Latency (A10G)**           | < 30 ms once the image arrives |

---

## πŸ“ Repository layout

| Path               | Purpose                                                            |
|--------------------|--------------------------------------------------------------------|
| `handler.py`       | HF entry point (loads the model and text cache, serves requests)   |
| `reparam.py`       | 60-line stand-alone copy of Apple’s `reparameterize_model`         |
| `requirements.txt` | Minimal dependency set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json`       | Your label set (one `id`, `name`, `prompt` entry per class)        |
| `README.md`        | This document                                                      |

---

## πŸš€ Quick start (local smoke-test)

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, pathlib
import handler

app = handler.EndpointHandler()
img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])   # top-5 classes
PY
```

---

## 🌐 Calling the deployed endpoint

```bash
export ENDPOINT="https://.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"

python - "$IMG" <<'PY'
import base64, json, os, sys

import requests

url   = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img   = sys.argv[1]

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
print(json.dumps(resp.json()[:5], indent=2))
PY
```

*Response example*

```json
[
  { "id": 23, "label": "cat",         "score": 0.92 },
  { "id": 11, "label": "tiger cat",   "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
```

---

## βš™οΈ How it works

1. **Startup (runs once per replica)**
   * Downloads / loads MobileCLIP-B (`datacompdr`).
   * Fuses the MobileOne branches via `reparam.py`.
   * Reads `items.json` and encodes every prompt into an `[N, 512]` text-feature tensor.
2. **Per request**
   * Decodes the base-64 JPEG/PNG.
   * Applies the OpenCLIP preprocessing (224 Γ— 224 centre crop + normalisation).
   * Encodes the image, normalises it, and computes cosine similarity against the cached text matrix.
   * Returns a sorted `[{id, label, score}, …]` list.

A code-level sketch of this flow is given at the end of the label-set section below.

---

## πŸ”„ Updating the label set

Edit `items.json`, push, and redeploy.

```json
[
  { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
  { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```

No code changes are required; the handler re-encodes the prompts at start-up.
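Under the hood, that start-up prompt encoding and the per-request scoring described in β€œHow it works” boil down to roughly the following. This is a simplified sketch, not the actual `handler.py`: the `open_clip` model name / pretrained tag, the `classify` helper, and the fixed `100.0` temperature are illustrative assumptions; only `reparam.py`, `items.json`, and the overall flow are taken from this README.

```python
# Sketch of the start-up + per-request flow (assumptions noted above).
import base64, io, json

import torch
import open_clip
from PIL import Image

from reparam import reparameterize_model   # stand-alone copy of Apple's helper

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype  = torch.float16 if device == "cuda" else torch.float32

# --- startup: runs once per replica --------------------------------------
# Assumed open_clip model/pretrained tags for MobileCLIP-B (datacompdr).
model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr"
)
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")
model = reparameterize_model(model.eval()).to(device, dtype)   # fuse MobileOne branches

items = json.load(open("items.json"))
with torch.no_grad():
    tokens     = tokenizer([it["prompt"] for it in items]).to(device)
    text_feats = torch.nn.functional.normalize(model.encode_text(tokens), dim=-1)  # [N, 512]

# --- per request ----------------------------------------------------------
def classify(image_b64: str, top_k: int = 5):
    """Decode base-64 image, encode it, score against the cached text matrix."""
    img   = Image.open(io.BytesIO(base64.b64decode(image_b64))).convert("RGB")
    pixel = preprocess(img).unsqueeze(0).to(device, dtype)
    with torch.no_grad():
        img_feat = torch.nn.functional.normalize(model.encode_image(pixel), dim=-1)
        # Fixed temperature for illustration; the real handler may use the model's logit_scale.
        probs = (100.0 * img_feat @ text_feats.T).softmax(dim=-1).squeeze(0)
    order = probs.argsort(descending=True)[:top_k]
    return [
        {"id": items[i]["id"], "label": items[i]["name"], "score": round(probs[i].item(), 4)}
        for i in order.tolist()
    ]
```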
--- ## βš–οΈ License * **Weights / data** β€” Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data)) * **This wrapper code** β€” MIT ---
Maintained with ❀️ by Your-Team β€” Aug 2025