OpenCensor-H1-Mini

OpenCensor-H1-Mini is a lightweight, efficient version of OpenCensor-H1 for detecting profanity, toxicity, and offensive content in Hebrew text. It is fine-tuned from onlplab/alephbert-base.

Model Details

  • Model Name: OpenCensor-H1-Mini
  • Base Model: onlplab/alephbert-base
  • Task: Binary Classification (0 = Clean, 1 = Toxic/Profane)
  • Language: Hebrew
  • Max Sequence Length: 256 tokens (optimized for efficiency)

Performance

Metric       Score
Accuracy     0.9826
F1-Score     0.9823
Precision    0.9812
Recall       0.9835
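As a consistency check, assume the F1-Score above is the binary F1 of the toxic class (label 1). In that case it should equal the harmonic mean of the reported precision P and recall R:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.9812 \times 0.9835}{0.9812 + 0.9835} \approx 0.9823
$$

This agrees with the reported F1-Score of 0.9823.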

Note: The best decision threshold is 0.17. Texts with a predicted score of 0.17 or higher are classified as toxic.
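The following is a minimal sketch of how a threshold like 0.17 can be chosen. It sweeps candidate thresholds over validation scores and keeps the one with the highest F1. This procedure is an assumption, suggested by the validation F1 threshold analysis in Training Graphs, and is not the author's actual script. The names `f1_at_threshold` and `best_threshold`, the 0.01-step grid, and the toy data are all hypothetical.

```python
from typing import Optional, Sequence


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Binary F1 for the toxic class (label 1), where score >= threshold means toxic."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Return the candidate threshold with the highest F1 (the first one wins on ties)."""
    if candidates is None:
        # Hypothetical grid: 0.01, 0.02, ..., 0.99
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Toy validation data, made up for illustration only
val_scores = [0.05, 0.12, 0.20, 0.35, 0.90]
val_labels = [0, 0, 1, 1, 1]
print(best_threshold(val_scores, val_labels))
```

A low threshold such as 0.17 trades a few extra false positives for higher recall on toxic text. Sweeping on held-out data makes that trade-off explicit rather than defaulting to 0.5.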

Training Graphs

[Figure: Validation F1 threshold analysis]
[Figure: Final test metrics]

How to Use

You can use this model directly with the Hugging Face transformers library.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model (eval mode disables dropout)
model_id = "LikoKIko/OpenCensor-H1-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

THRESHOLD = 0.17  # Best threshold reported in Performance

def predict(text):
    # Tokenize input, truncating to the 256-token limit
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=256,
    )

    # Predict: .item() assumes a single output logit; sigmoid maps it to a score in [0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits
        score = torch.sigmoid(logits).item()

    return {
        "text": text,
        "score": round(score, 4),
        "is_toxic": score >= THRESHOLD,
    }

# Example usage
text = "אני אוהב את כולם"  # "I love everyone"
print(predict(text))
```

Training Info

The model was trained using an optimized pipeline featuring:

  • Gradient Accumulation: Accumulates gradients over several mini-batches before each optimizer step, giving a larger effective batch size and more stable training.
  • Smart Text Cleaning: Removes noise while preserving Hebrew, English, and important symbols (@#$%*).
  • Dynamic Padding: Chooses efficient token lengths based on the data distribution.
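Here is a minimal, framework-free sketch of gradient accumulation on a toy one-parameter least-squares model. The model, data, learning rate, and `accum_steps` are hypothetical and are not the training configuration used for OpenCensor-H1-Mini. Each micro-batch gradient is divided by `accum_steps` and summed, and the weight is updated only once every `accum_steps` micro-batches. With equal-sized micro-batches, one accumulated update therefore matches a single step on the combined batch.

```python
from typing import List, Sequence, Tuple

# (inputs, targets) for one micro-batch
MicroBatch = Tuple[Sequence[float], Sequence[float]]


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train_with_accumulation(
    w: float, micro_batches: List[MicroBatch], lr: float, accum_steps: int
) -> float:
    grad_buffer = 0.0
    for step, (xs, ys) in enumerate(micro_batches, start=1):
        # Scale so the buffer holds the average gradient over the effective batch
        grad_buffer += mse_grad(w, xs, ys) / accum_steps
        if step % accum_steps == 0:
            w -= lr * grad_buffer  # one optimizer step per effective batch
            grad_buffer = 0.0
    # Trailing micro-batches that do not fill a full effective batch are dropped in this sketch
    return w


# Two micro-batches of size 2 with accum_steps=2 behave like one batch of size 4
w_accum = train_with_accumulation(0.0, [([1, 2], [2, 4]), ([3, 4], [6, 8])], lr=0.01, accum_steps=2)
w_full = train_with_accumulation(0.0, [([1, 2, 3, 4], [2, 4, 6, 8])], lr=0.01, accum_steps=1)
print(w_accum, w_full)
```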
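The source does not publish its cleaning function. The sketch below shows one way to implement the cleaning described above as a regular-expression allow-list. It keeps the Hebrew Unicode block (U+0590 to U+05FF), ASCII letters, whitespace, and the symbols @#$%*, replaces everything else with a space, and then collapses repeated whitespace. `clean_text` is a hypothetical name, and keeping digits is an assumption not stated in the source.

```python
import re

# Hypothetical allow-list: Hebrew block, ASCII letters, digits (assumption),
# whitespace, and the symbols @#$%*
_DISALLOWED = re.compile(r"[^\u0590-\u05FFA-Za-z0-9@#$%*\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    # Replace disallowed characters with a space so neighbouring words don't merge
    text = _DISALLOWED.sub(" ", text)
    # Collapse runs of whitespace and trim the ends
    return _WHITESPACE.sub(" ", text).strip()


print(clean_text("שלום!!! 😀 world @#"))
```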
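The source does not specify its exact padding strategy. One common reading of dynamic padding is to pad each batch only to its own longest sequence, instead of always padding to the 256-token maximum. In transformers, `DataCollatorWithPadding` provides this behaviour. The sketch below assumes that reading and implements it in plain Python. `pad_batch`, `pad_id=0`, and the token IDs are hypothetical; the real pad token ID comes from the tokenizer.

```python
from typing import Dict, List


def pad_batch(
    batch: List[List[int]], pad_id: int = 0, max_length: int = 256
) -> Dict[str, List[List[int]]]:
    """Pad token-ID lists to the batch's longest sequence, capped at max_length."""
    truncated = [ids[:max_length] for ids in batch]
    target_len = max((len(ids) for ids in truncated), default=0)
    input_ids = [ids + [pad_id] * (target_len - len(ids)) for ids in truncated]
    # 1 marks real tokens, 0 marks padding
    attention_mask = [[1] * len(ids) + [0] * (target_len - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy token IDs: this batch is padded to length 4, not 256
short_batch = pad_batch([[101, 5, 6, 102], [101, 7, 102]])
print(short_batch["input_ids"])
```

Short batches then cost far less compute than fixed 256-token padding, while long inputs are still truncated to the model's limit.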

License

CC-BY-SA-4.0

Citation

```bibtex
@misc{opencensor-h1-mini,
  title = {OpenCensor-H1-Mini: Hebrew Profanity Detection Model},
  author = {LikoKIko},
  year = {2025},
  url = {https://huggingface.co/LikoKIko/OpenCensor-H1-Mini}
}
```