How to use almanach/Biomed-Enriched-classifier with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("almanach/Biomed-Enriched-classifier")
model = AutoModel.from_pretrained("almanach/Biomed-Enriched-classifier")
```

This is the model used to create the Biomed-Enriched dataset.
The model is a fine-tuned xlm-roberta-base. It was trained on 400,000 paragraphs from PubMed Central that were annotated by the Llama 3.1 70B Instruct model.
This classifier was created to extend those initial high-quality annotations to the entire PubMed Open Access dataset. Distilling the LLM annotations into a smaller classifier made it possible to build the large-scale Biomed-Enriched dataset while keeping the annotations consistent.
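Applying a classifier to a whole corpus is essentially a batched map over paragraphs. The sketch below shows that pattern under stated assumptions: `classify`, `Annotation`, `annotate_corpus`, and `batch_size` are hypothetical names, not part of the model's published interface, and the stub classifier in the usage example stands in for real model inference.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


@dataclass
class Annotation:
    # Hypothetical record mirroring the outputs listed in this card.
    domain: str
    doc_type: str
    quality: int


# Hypothetical interface: takes a batch of paragraphs, returns one Annotation each.
Classifier = Callable[[Sequence[str]], List[Annotation]]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group a (possibly huge, streamed) iterable into lists of at most `size`."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, shorter batch
        yield batch


def annotate_corpus(
    paragraphs: Iterable[str], classify: Classifier, batch_size: int = 64
) -> Iterator[Tuple[str, Annotation]]:
    """Lazily pair every paragraph with the classifier's annotation."""
    for batch in batched(paragraphs, batch_size):
        labels = classify(batch)
        if len(labels) != len(batch):
            raise ValueError("classifier must return one annotation per paragraph")
        yield from zip(batch, labels)


if __name__ == "__main__":
    # Stub standing in for the real model (assumption for illustration only).
    def stub(batch: Sequence[str]) -> List[Annotation]:
        return [Annotation("Biomedical", "Study", 3) for _ in batch]

    for text, ann in annotate_corpus(["para A", "para B", "para C"], stub, batch_size=2):
        print(text, ann)
```

Streaming through a generator keeps memory flat regardless of corpus size, which matters when the input is an entire open-access collection.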
The model predicts the following outputs:
- Domain: Clinical, Biomedical, Other
- Type: Clinical Case, Study, Review, Other
- Quality: 1 (low quality) to 5 (high quality)

Base model: FacebookAI/xlm-roberta-base
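The card does not document how the three outputs are laid out in the model's raw scores. As a minimal sketch, assuming one logit vector per output and label orders matching the list above (both assumptions; check the model's config before relying on them), decoding reduces to a softmax and argmax per output. `decode` and the label constants are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

# ASSUMED label orders, copied from the list of outputs above.
DOMAIN_LABELS = ["Clinical", "Biomedical", "Other"]
TYPE_LABELS = ["Clinical Case", "Study", "Review", "Other"]
QUALITY_LABELS = [1, 2, 3, 4, 5]  # assumes quality is a 5-way classification


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(
    domain_logits: Sequence[float],
    type_logits: Sequence[float],
    quality_logits: Sequence[float],
) -> Dict[str, object]:
    """Map each output's logits to its most likely label and that label's probability."""
    heads = (
        ("domain", DOMAIN_LABELS, domain_logits),
        ("type", TYPE_LABELS, type_logits),
        ("quality", QUALITY_LABELS, quality_logits),
    )
    result: Dict[str, object] = {}
    for name, labels, logits in heads:
        if len(logits) != len(labels):
            raise ValueError(f"{name}: expected {len(labels)} logits, got {len(logits)}")
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        result[name] = labels[best]
        result[f"{name}_prob"] = probs[best]
    return result


if __name__ == "__main__":
    print(decode([0.1, 2.0, -1.0], [3.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 5.0]))
```

Keeping the probability alongside each label makes it easy to apply a confidence threshold, for example keeping only paragraphs whose predicted quality is both high and confidently predicted.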