# Mistral Nemo GCP Officer v1
A LoRA fine-tune of Mistral Nemo Instruct (12.2B) specialised in Good Clinical Practice (GCP) concepts, terminology, and regulatory guidance for clinical trials.
## Model Description

This adapter was trained on a synthetic instruction-following dataset derived from GCP concepts and glossaries. The goal is to produce a model that can accurately explain, summarise, and reason about GCP principles, covering topics such as informed consent, investigator responsibilities, sponsor obligations, IRB/IEC oversight, essential documents, adverse event reporting, and ICH E6(R2) guidelines.
**This is a LoRA adapter, not a standalone model.** It must be loaded on top of the base model using PEFT.
| Attribute | Value |
|---|---|
| Base model | mistralai/Mistral-Nemo-Instruct-2407 |
| Parameters (base) | 12.25B |
| Trainable parameters | 9.83M (0.08% of total) |
| Architecture | MistralForCausalLM, 40 layers, GQA (32 heads / 8 KV heads) |
| Context length | 128K tokens (base model); trained with max_length 2048 |
| Precision | BF16 |
| License | Apache 2.0 |
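As a quick arithmetic check of the trainable-parameter figure quoted in the table:

```python
# Sanity-check the trainable-parameter fraction from the table above.
base_params = 12.25e9   # base model parameters
lora_params = 9.83e6    # trainable LoRA parameters
fraction = lora_params / base_params * 100
print(f"{fraction:.2f}%")  # -> 0.08%
```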
## Training Details

### Data

- Dataset: 815 synthetic instruction-output pairs covering GCP concepts and glossary terms (v1.0)
- Format: Alpaca-style (`instruction`/`input`/`output` fields)
- Split: 90/10 train/eval (733 training, 82 evaluation examples)
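For illustration, a record in this layout might look like the following (the field names match the Alpaca format above; the content itself is invented, not drawn from the dataset):

```python
import json

# Hypothetical Alpaca-style record: the keys match the dataset format,
# but the values below are illustrative only.
record = {
    "instruction": "Define 'adverse event' as the term is used in ICH E6(R2).",
    "input": "",
    "output": "An adverse event (AE) is any untoward medical occurrence in a "
              "subject administered a pharmaceutical product, which does not "
              "necessarily have a causal relationship with the treatment.",
}
print(json.dumps(record, indent=2))
```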
### LoRA Configuration
| Hyperparameter | Value |
|---|---|
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Bias | None |
| PEFT version | 0.18.1 |
### Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 2 |
| Effective batch size | 4 |
| Learning rate | 2 × 10⁻⁴ |
| Optimizer | AdamW (torch) |
| Weight decay | 0.01 |
| Warmup steps | 100 |
| Precision | BF16 |
| Hardware | NVIDIA H100 NVL |
| Total training time | ~17 minutes (549 steps) |
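The warmup behaviour implied by the table can be sketched as a small function. This assumes linear warmup to the peak rate over the first 100 steps; the post-warmup decay schedule is not stated in the card, so the rate is simply held constant here.

```python
# Sketch of the learning-rate warmup from the table above:
# linear ramp from 0 to the peak rate over the first 100 steps,
# then (as a simplifying assumption) constant.
PEAK_LR = 2e-4
WARMUP_STEPS = 100

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR

print(lr_at(50))   # halfway through warmup -> 1e-4
print(lr_at(200))  # past warmup -> 2e-4

# The effective batch size in the table is per-device batch * accumulation.
effective_batch = 2 * 2
print(effective_batch)  # -> 4
```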
### Training Results
| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.6238 | 1.5943 |
| 2 | 1.1876 | 1.5455 |
Validation loss decreased from epoch 1 to epoch 2 while training loss continued to fall. The widening gap between training and validation loss at epoch 2 suggests the model is approaching the useful limit for this dataset size.
## Usage

### Loading the adapter

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "mistralai/Mistral-Nemo-Instruct-2407"
adapter_id = "NvMayMay/mistral-nemo-GCP-officerv1"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
```
### Inference

```python
prompt = (
    "Instruction: What are the primary responsibilities of a clinical trial "
    "sponsor under ICH E6(R2)?\nOutput:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Intended Use
- Educational tool for learning GCP concepts and clinical trial regulations
- Rapid look-up and explanation of GCP terminology
- Drafting study-level summaries of regulatory obligations
- Supporting training material development for clinical research staff
## Limitations
- Small training set (815 examples): The model may not cover edge cases or nuanced regulatory scenarios
- Synthetic data only: Responses have not been validated against primary regulatory source documents
- Not a regulatory authority: Outputs should not be treated as legal or regulatory advice; always verify against the official ICH E6(R2) guideline and applicable local regulations
- Validation loss plateau: The train/eval loss gap at epoch 2 suggests limited headroom without additional data
- English only
## Citation

If you use this model, please cite the base model and this adapter:

```bibtex
@misc{mistral-nemo-gcp-officerv1,
  title={Mistral Nemo GCP Officer v1},
  author={NvMayMay},
  year={2025},
  url={https://huggingface.co/NvMayMay/mistral-nemo-GCP-officerv1},
}
```