LEAF Embed BEIR
A text embedding model trained using LEAF (Lightweight Embedding Alignment Framework) Distillation to achieve competitive performance on the BEIR benchmark.
Model Description
This model was created by distilling knowledge from Snowflake/snowflake-arctic-embed-m-v1.5 (teacher) into a smaller, more efficient student architecture.
Architecture
| Component | Details |
|---|---|
| Encoder | 8-layer BERT, hidden size 512 |
| Attention Heads | 8 |
| Output Dimension | 768 |
| Parameters | ~65M (vs 109M teacher) |
| Pooling | Mean pooling |
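To verify these details against the uploaded weights, the config can be inspected directly (a minimal sketch, assuming a standard BERT-style config):

```python
from transformers import AutoConfig

# Inspect the student architecture reported in the table above.
config = AutoConfig.from_pretrained("wolfnuker/leaf-embed-beir")
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)
```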
Training
- Method: LEAF Distillation (L2 loss on normalized embeddings)
- Teacher: Snowflake/snowflake-arctic-embed-m-v1.5
- Hardware: NVIDIA B200 GPU on Modal.com
- Training Data: 5M samples from BEIR, MS MARCO, Wikipedia
- Epochs: 3
- Final Teacher-Student Similarity: 77.2%
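The similarity metric is not specified further in this card; a plausible reading is the mean cosine similarity between teacher and student embeddings over held-out text. A rough sketch (the sentences below are placeholders, not the actual evaluation set):

```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
student = SentenceTransformer("wolfnuker/leaf-embed-beir")

sentences = ["A held-out example sentence.", "Another evaluation sentence."]  # placeholders
t = teacher.encode(sentences, convert_to_tensor=True, normalize_embeddings=True)
s = student.encode(sentences, convert_to_tensor=True, normalize_embeddings=True)
print(F.cosine_similarity(t, s, dim=-1).mean().item())  # mean teacher-student cosine similarity
```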
Usage
With Transformers
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("wolfnuker/leaf-embed-beir")
model = AutoModel.from_pretrained("wolfnuker/leaf-embed-beir")

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions via the attention mask.
    token_embeddings = model_output.last_hidden_state
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Example usage
sentences = ["This is an example sentence", "Each sentence is converted to a vector"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoded)

embeddings = mean_pooling(outputs, encoded["attention_mask"])
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # [2, 768]
```
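Because the embeddings are L2-normalized, cosine similarity between sentences reduces to a dot product, e.g. `similarity = embeddings @ embeddings.T`.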
With Sentence-Transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("wolfnuker/leaf-embed-beir")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
```
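For a quick retrieval-style sanity check (the query and documents below are made-up examples):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("wolfnuker/leaf-embed-beir")

query_emb = model.encode("How do embedding models support search?", convert_to_tensor=True)
doc_embs = model.encode(
    [
        "Dense text embeddings map documents to vectors for nearest-neighbour retrieval.",
        "The weather was pleasant yesterday.",
    ],
    convert_to_tensor=True,
)
print(util.cos_sim(query_emb, doc_embs))  # the relevant document should score higher
```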
Evaluation Results
BEIR Benchmark
| Dataset | NDCG@10 |
|---|---|
| NFCorpus | 0.0896 |
Note: This is an initial baseline model. Performance is expected to improve with:
- More training data and epochs
- IE-specific contrastive training (entity masking, relation pairs)
- Hyperparameter tuning
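For reference, the NFCorpus number can be reproduced approximately with the beir toolkit. The sketch below follows the standard BEIR quickstart and assumes the beir package is installed; the dataset URL and batch size are illustrative:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download and load the NFCorpus test split
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Wrap the model in the Sentence-Transformers backend and run exact dense search
retriever = EvaluateRetrieval(
    DRES(models.SentenceBERT("wolfnuker/leaf-embed-beir"), batch_size=64),
    score_function="cos_sim",
)
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg["NDCG@10"])
```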
Training Details
Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 2e-5 → 2e-8 (cosine decay) |
| Batch Size | 320 (64 × 5 gradient accumulation) |
| Warmup Ratio | 10% |
| Mixed Precision | FP16 |
| Max Sequence Length | 256 |
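A hypothetical sketch of how these settings could be wired together with transformers utilities; note that the standard cosine schedule decays to zero rather than to the 2e-8 floor listed above, and the actual training script may differ:

```python
import torch
from transformers import AutoModel, get_cosine_schedule_with_warmup

student = AutoModel.from_pretrained("wolfnuker/leaf-embed-beir")

samples, effective_batch, epochs = 5_000_000, 320, 3       # 320 = 64 x 5 gradient accumulation
num_training_steps = samples // effective_batch * epochs
num_warmup_steps = int(0.10 * num_training_steps)          # 10% warmup

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)
scaler = torch.cuda.amp.GradScaler()                       # FP16 mixed precision
```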
Loss Function
LEAF uses L2 loss on normalized embeddings:
L = MSE(normalize(student_emb), normalize(teacher_emb))
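A minimal PyTorch sketch of this loss (the function name and the toy batch/dimension below are illustrative):

```python
import torch
import torch.nn.functional as F

def leaf_distillation_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    """MSE between L2-normalized student and teacher embeddings."""
    return F.mse_loss(
        F.normalize(student_emb, p=2, dim=-1),
        F.normalize(teacher_emb, p=2, dim=-1),
    )

# Toy check with random 768-dimensional embeddings
print(leaf_distillation_loss(torch.randn(4, 768), torch.randn(4, 768)).item())
```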
Limitations
- Trained primarily on English text
- Initial baseline - further tuning recommended for production use
- Optimized for retrieval, may need adaptation for other tasks
Citation
If you use this model, please cite:
```bibtex
@misc{leaf-embed-beir,
  author    = {RankSaga},
  title     = {LEAF Embed BEIR: Text Embeddings via Distillation},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/wolfnuker/leaf-embed-beir}
}
```
Acknowledgments
- MongoDB LEAF Paper
- Snowflake Arctic Embed
- Modal.com for GPU compute
License
Apache 2.0