Global Corpus
lumees/turkish-corpus-100b • Updated Nov 30, 2025 • 107M • 1.19k • 3
lumees/multilingual-safety-classification-dataset • Updated Oct 24, 2025 • 213k • 234 • 2
lumees/bulgarian-corpus-33b • Updated Nov 30, 2025 • 34.9M • 920 • 3
lumees/dutch-corpus-200b • Updated Dec 1, 2025 • 170M • 355 • 3

Turkish Retrieval Datasets
lumees/ms-marco-tr-hard-negatives • Updated Nov 27, 2025 • 786k • 47 • 2
lumees/wikipedia-turkish-synthetic-query • Updated Nov 28, 2025 • 19.8k • 29 • 3

Retrieval Models
lumees/lumees-matryoshka-embedding-v1 • Sentence Similarity • 0.6B • Updated Nov 25, 2025 • 15 • 2
lumees/lumees-matryoshka-vision-embedding-v1 • Feature Extraction • Updated Nov 26, 2025 • 4 • 3
lumees/aethel-reranker-en-v1 • Text Ranking • 0.1B • Updated Nov 20, 2025 • 78 • 3

Code Retrieval Datasets
lumees/codesearchnet-hard-negatives • Updated Nov 28, 2025 • 955k • 27 • 2

Safety & Moderation Datasets
Comprehensive collection of high-quality multilingual datasets for NLP research and production.
lumees/multilingual-safety-classification-dataset • Updated Oct 24, 2025 • 213k • 234 • 2
lumees/age-specific-text-simplification • Updated Aug 13, 2025 • 17.2k • 35 • 2