Update README.md
README.md

---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
language:
- en
base_model:
- google/vit-base-patch16-224
- google-bert/bert-base-uncased
- facebook/wav2vec2-base-960h
---

# OneEncoder: A Unified Text, Image & Audio Model

**OneEncoder** is a lightweight framework for cross-modal alignment that efficiently integrates **text, image, and audio**, with future extensions to other modalities. Unlike traditional methods that rely on massive modality-specific encoders, OneEncoder progressively aligns different data types, making it cost-effective and performant even on small paired datasets.
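
Because this card carries the `pytorch_model_hub_mixin` tag, the checkpoint should be loadable with the `from_pretrained` classmethod that `huggingface_hub.PyTorchModelHubMixin` provides. A minimal sketch; the import path and the repo id `bilalfaye/OneEncoder-text-image-audio` are assumptions (inferred from the sibling models listed under Resources), not confirmed by this card:

```python
# Loading sketch, not verbatim from the OneEncoder codebase.
# The `pytorch_model_hub_mixin` tag implies the model class mixes in
# huggingface_hub.PyTorchModelHubMixin, so from_pretrained() can pull
# the weights straight from the Hub once the class itself is importable.
from one_encoder import OneEncoder  # hypothetical import; see the GitHub repo

model = OneEncoder.from_pretrained("bilalfaye/OneEncoder-text-image-audio")  # repo id assumed
model.eval()  # inference mode: disables dropout etc.
```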

## Key Features

- **Multimodal Alignment**: Initially supports **text, image, and audio**, with extensions to other modalities.
- **Lightweight & Efficient**: Avoids full retraining when new modalities are added.
- **Superior Performance**: Outperforms models that require large specialized datasets.

## Applications

- **Visual Question Answering (VQA)**
- **Image-Text-Audio Retrieval** (see the sketch below)
- **Multimodal Content Understanding**
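
To make the retrieval application concrete, here is a self-contained sketch of cross-modal ranking over aligned embeddings; the random tensors stand in for OneEncoder outputs, whose exact encoder method names this card does not specify:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: in practice these come from OneEncoder's text/image/
# audio encoders, which map every modality into one shared space.
d = 512
query = F.normalize(torch.randn(1, d), dim=-1)      # e.g. a text query, (1, d)
gallery = F.normalize(torch.randn(100, d), dim=-1)  # e.g. image embeddings, (100, d)

# After L2 normalization, cosine similarity is a plain dot product,
# so ranking the whole gallery is a single matrix multiply.
scores = query @ gallery.T               # (1, 100)
top5 = scores.topk(k=5, dim=-1).indices  # indices of the 5 best matches
print(top5)
```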

## Research Paper

**arXiv**: [OneEncoder: Progressive Cross-Modal Alignment](https://arxiv.org/abs/2409.11059)

## Resources

- **GitHub Repo**: [OneEncoder](https://github.com/b-faye/OneEncoder)
- **Hugging Face Demo**: [OneEncoder Retriever](https://huggingface.co/spaces/bilalfaye/OneEncoder-retriever)
- **Demo Notebook**: [OneEncoder Demos](https://github.com/b-faye/OneEncoder/tree/main/demo)
- **OneEncoder for Text & Image**: [HF Model](https://huggingface.co/bilalfaye/OneEncoder-text-image)
- **OneEncoder for Text, Image & Video**: [HF Model](https://huggingface.co/bilalfaye/OneEncoder-text-image-video)
- **OneEncoder for Text, Image & X-ray**: [HF Model](https://huggingface.co/bilalfaye/OneEncoder-text-image-xray)

## Authors

**Bilal FAYE**, Hanane AZZAG, Mustapha LEBBAH, Djamel BOUCHAFFRA

**Note:** This model was trained with `temperature=2.5` and with addition as the fusion operation.
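
To spell out what that note means, here is a minimal, illustrative sketch of a temperature-scaled contrastive loss with additive fusion. It is one plausible reading of the stated hyperparameters, not the repository's actual training code; in particular, where the addition is applied is an assumption:

```python
import torch
import torch.nn.functional as F

def alignment_loss(a: torch.Tensor, b: torch.Tensor, temperature: float = 2.5) -> torch.Tensor:
    """Symmetric InfoNCE-style loss; matching pairs sit on the diagonal.

    Illustrative only: the card states temperature=2.5, but the exact
    loss formulation used during training is not given here.
    """
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.T / temperature     # (B, B) similarity matrix, softened by temperature
    targets = torch.arange(a.size(0))  # i-th sample in a matches i-th sample in b
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# "Addition as fusion operation", read here as summing feature streams
# before alignment (hypothetical; the real fusion site may differ).
text_feat = torch.randn(8, 512)
audio_feat = torch.randn(8, 512)
fused = text_feat + audio_feat                     # additive fusion
loss = alignment_loss(fused, torch.randn(8, 512))  # align against image features
```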