Mistral Nemo GCP Officer v1

A LoRA fine-tune of Mistral Nemo Instruct (12.2B) specialised in Good Clinical Practice (GCP) concepts, terminology, and regulatory guidance for clinical trials.

Model Description

This adapter was trained on a synthetic instruction-following dataset derived from GCP concepts and glossaries. The goal is to produce a model that can accurately explain, summarise, and reason about GCP principles, covering topics such as informed consent, investigator responsibilities, sponsor obligations, IRB/IEC oversight, essential documents, adverse event reporting, and ICH E6(R2) guidelines.

This is a LoRA adapter, not a standalone model. It must be loaded on top of the base model using PEFT.

| Attribute | Value |
|---|---|
| Base model | `mistralai/Mistral-Nemo-Instruct-2407` |
| Parameters (base) | 12.25B |
| Trainable parameters | 9.83M (0.08% of total) |
| Architecture | MistralForCausalLM, 40 layers, GQA (32 heads / 8 KV heads) |
| Context length | 128K tokens (base model); trained with `max_length` 2048 |
| Precision | BF16 |
| License | Apache 2.0 |

Training Details

Data

  • Dataset: 815 synthetic instruction–output pairs covering GCP concepts and glossary terms (v1.0)
  • Format: Alpaca-style (instruction / input / output fields)
  • Split: 90/10 train/eval → 733 training, 82 evaluation examples
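As a rough illustration (the exact prompt template is an assumption, not taken from the original training script), an Alpaca-style record can be rendered into a single training string like this:

```python
def format_alpaca(example):
    # Hypothetical formatter: renders one Alpaca-style record
    # (instruction / input / output) into a single prompt string.
    # Records with an empty "input" field omit the Input line.
    if example.get("input"):
        return (
            f"Instruction: {example['instruction']}\n"
            f"Input: {example['input']}\n"
            f"Output: {example['output']}"
        )
    return f"Instruction: {example['instruction']}\nOutput: {example['output']}"


record = {
    "instruction": "Define 'adverse event' as used in GCP.",
    "input": "",
    "output": "Any untoward medical occurrence in a subject administered a trial intervention.",
}
print(format_alpaca(record))
```

This mirrors the `Instruction: ... / Output:` shape used in the inference example below.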

LoRA Configuration

| Hyperparameter | Value |
|---|---|
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj` |
| Bias | None |
| PEFT version | 0.18.1 |

Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 2 |
| Effective batch size | 4 |
| Learning rate | 2 × 10⁻⁴ |
| Optimizer | AdamW (torch) |
| Weight decay | 0.01 |
| Warmup steps | 100 |
| Precision | BF16 |
| Hardware | NVIDIA H100 NVL |
| Total training time | ~17 minutes (549 steps) |
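The reported step count is consistent with these numbers, assuming the trainer drops the final incomplete batch of each epoch (an assumption about the training loop):

```python
# Sanity-check the reported 549 optimizer steps,
# assuming the last partial batch of each epoch is dropped.
train_examples = 733
effective_batch = 2 * 2   # per-device batch size x gradient accumulation
epochs = 3

steps_per_epoch = train_examples // effective_batch
total_steps = steps_per_epoch * epochs
print(total_steps)  # 549
```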

Training Results

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.6238 | 1.5943 |
| 2 | 1.1876 | 1.5455 |

Validation loss decreased from 1.5943 at epoch 1 to 1.5455 at epoch 2, while training loss continued to drop. The widening gap between training and validation loss suggests the model is approaching the limit of what this dataset size supports; further training would likely overfit rather than generalise.

Usage

Loading the adapter

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "mistralai/Mistral-Nemo-Instruct-2407"
adapter_id = "NvMayMay/mistral-nemo-GCP-officerv1"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)
# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(model, adapter_id)
```

Inference

```python
# Use the same Instruction/Output format the adapter was trained on
prompt = "Instruction: What are the primary responsibilities of a clinical trial sponsor under ICH E6(R2)?\nOutput:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Intended Use

  • Educational tool for learning GCP concepts and clinical trial regulations
  • Rapid look-up and explanation of GCP terminology
  • Drafting study-level summaries of regulatory obligations
  • Supporting training material development for clinical research staff

Limitations

  • Small training set (815 examples): The model may not cover edge cases or nuanced regulatory scenarios
  • Synthetic data only: Responses have not been validated against primary regulatory source documents
  • Not a regulatory authority: Outputs should not be treated as legal or regulatory advice; always verify against the official ICH E6(R2) guideline and applicable local regulations
  • Validation loss plateau: The train/eval loss gap at epoch 2 suggests limited headroom without additional data
  • English only

Citation

If you use this model, please cite the base model and this adapter:

```bibtex
@misc{mistral-nemo-gcp-officerv1,
  title={Mistral Nemo GCP Officer v1},
  author={NvMayMay},
  year={2025},
  url={https://huggingface.co/NvMayMay/mistral-nemo-GCP-officerv1},
}
```