LEAF Embed BEIR

A text embedding model trained with LEAF (Lightweight Embedding Alignment Framework) distillation, targeting competitive retrieval performance on the BEIR benchmark.

Model Description

This model was created by distilling knowledge from Snowflake/snowflake-arctic-embed-m-v1.5 (teacher) into a smaller, more efficient student architecture.

Architecture

  • Encoder: 8-layer BERT, hidden size 512
  • Attention heads: 8
  • Output dimension: 768
  • Parameters: ~65M (vs. 109M for the teacher)
  • Pooling: mean pooling
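
The 512-dimensional encoder paired with a 768-dimensional output suggests the student projects its pooled output into the teacher's embedding space. Below is a minimal sketch of such a configuration, assuming a stock transformers BertConfig plus a linear projection; the published checkpoint's exact architecture may differ.

import torch
from transformers import BertConfig, BertModel

# Student encoder: 8 layers, hidden size 512, 8 attention heads.
config = BertConfig(
    num_hidden_layers=8,
    hidden_size=512,
    num_attention_heads=8,
    intermediate_size=2048,  # assumed 4 * hidden_size
)
encoder = BertModel(config)

# Hypothetical projection aligning the 512-d pooled output with the
# teacher's 768-d embedding space.
projection = torch.nn.Linear(512, 768)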

Training

  • Method: LEAF Distillation (L2 loss on normalized embeddings)
  • Teacher: Snowflake/snowflake-arctic-embed-m-v1.5
  • Hardware: NVIDIA B200 GPU on Modal.com
  • Training Data: 5M samples from BEIR, MS MARCO, Wikipedia
  • Epochs: 3
  • Final Teacher-Student Similarity: 77.2% (see the sketch below)
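
The similarity figure is presumably the mean cosine similarity between teacher and student embeddings on held-out text. A minimal sketch of how such a number could be computed follows; the sentences are placeholders, and this is not the actual evaluation script.

import torch
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
student = SentenceTransformer("wolfnuker/leaf-embed-beir")

# Placeholder held-out sentences.
sentences = ["held-out sentence one", "held-out sentence two"]
t = torch.nn.functional.normalize(teacher.encode(sentences, convert_to_tensor=True), dim=1)
s = torch.nn.functional.normalize(student.encode(sentences, convert_to_tensor=True), dim=1)

# Mean cosine similarity between matching teacher/student embeddings.
mean_cosine = (t * s).sum(dim=1).mean()
print(f"teacher-student similarity: {mean_cosine:.3f}")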

Usage

With Transformers

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("wolfnuker/leaf-embed-beir")
model = AutoModel.from_pretrained("wolfnuker/leaf-embed-beir")

# Average the token embeddings over the sequence, ignoring padded positions.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output.last_hidden_state
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Example usage
sentences = ["This is an example sentence", "Each sentence is converted to a vector"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoded)
    embeddings = mean_pooling(outputs, encoded["attention_mask"])
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

print(embeddings.shape)  # [2, 768]
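
Because the embeddings are L2-normalized, a plain dot product gives the cosine similarity used for retrieval scoring. Continuing from the snippet above:

# Cosine similarity between the two example sentences; the embeddings are
# already unit-length, so a dot product is sufficient.
similarity = embeddings[0] @ embeddings[1]
print(f"cosine similarity: {similarity.item():.3f}")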

With Sentence-Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("wolfnuker/leaf-embed-beir")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])

Evaluation Results

BEIR Benchmark

  • NFCorpus: NDCG@10 = 0.0896
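
The NFCorpus score can be checked with the beir evaluation harness. The sketch below follows the standard BEIR quickstart pattern; the dataset URL and API come from the beir repository's documentation, not from this model's training code.

# Sketch: evaluate the model on BEIR's NFCorpus with exact dense retrieval.
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

retriever = EvaluateRetrieval(
    DRES(models.SentenceBERT("wolfnuker/leaf-embed-beir"), batch_size=64),
    score_function="cos_sim",
)
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # dictionary of NDCG scores, including NDCG@10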

Note: This is an initial baseline model. Performance will improve with:

  • More training data and epochs
  • IE-specific contrastive training (entity masking, relation pairs)
  • Hyperparameter tuning

Training Details

Hyperparameters

  • Learning rate: 2e-5 → 2e-8 (cosine decay)
  • Batch size: 320 effective (64 per step × 5 gradient accumulation steps)
  • Warmup ratio: 10%
  • Mixed precision: FP16
  • Max sequence length: 256
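
One way to realize this schedule (linear warmup into a cosine decay toward the 2e-8 floor) with stock PyTorch schedulers is sketched below; the optimizer choice, step counts, and the stand-in student module are assumptions, not taken from the training code.

import torch

student = torch.nn.Linear(512, 768)  # stand-in for the student encoder being trained

# Placeholder values; the real step count depends on dataset size and epochs.
total_steps = 10_000
warmup_steps = int(0.10 * total_steps)

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=warmup_steps),
        torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps, eta_min=2e-8),
    ],
    milestones=[warmup_steps],
)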

Loss Function

LEAF uses L2 loss on normalized embeddings:

L = MSE(normalize(student_emb), normalize(teacher_emb))
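
In code, this amounts to mean squared error between L2-normalized embedding vectors; a minimal PyTorch sketch (the function name is illustrative):

import torch
import torch.nn.functional as F

def leaf_distillation_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    # Normalize both embeddings to unit length, then take the mean squared error.
    student_norm = F.normalize(student_emb, p=2, dim=-1)
    teacher_norm = F.normalize(teacher_emb, p=2, dim=-1)
    return F.mse_loss(student_norm, teacher_norm)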

Limitations

  • Trained primarily on English text
  • Initial baseline; further tuning is recommended for production use
  • Optimized for retrieval; may need adaptation for other tasks

Citation

If you use this model, please cite:

@misc{leaf-embed-beir,
  author = {RankSaga},
  title = {LEAF Embed BEIR: Text Embeddings via Distillation},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/wolfnuker/leaf-embed-beir}
}

Acknowledgments

License

Apache 2.0
