 - medical
 - summary
 - endocrinology
+---
+# Llama3.2-Medical-Notes-1B-ONNX
+This is a quantized ONNX export of the [Llama3.2-Medical-Notes-1B](https://huggingface.co/GetSoloTech/Llama3.2-Medical-Notes-1B) model, intended for efficient inference and deployment.
+## Model Details
+- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
+- **Fine-tuning Method:** PEFT (Parameter-Efficient Fine-Tuning) using LoRA
+- **Training Framework:** Unsloth library for accelerated fine-tuning and merging
+- **Format:** Quantized ONNX export for optimized inference
+- **Task:** Text Generation (specifically, generating structured SOAP notes)
+## Paper
+- [arXiv: 2507.03033](https://arxiv.org/abs/2507.03033)
+- [medRxiv: 10.1101/2025.07.01.25330679v1](https://www.medrxiv.org/content/10.1101/2025.07.01.25330679v1)
+## Intended Use
+- **Input:** Free-text medical transcripts (doctor-patient conversations or dictated notes).
+- **Output:** Structured medical notes with clearly defined sections (Demographics, Presenting Illness, History, etc.).
+## Usage with ONNX Runtime
+```python
+import onnxruntime as ort
+from transformers import AutoTokenizer
+import numpy as np
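+# Hugging Face repo that hosts the ONNX export
+model_name = "GetSoloTech/Llama3.2-Medical-Notes-1B-ONNX"
+# The tokenizer is shared with the original (non-ONNX) model
+tokenizer = AutoTokenizer.from_pretrained("GetSoloTech/Llama3.2-Medical-Notes-1B")
+# Placeholder: point this at the .onnx file downloaded from the repo above
+onnx_file_path = "path/to/model.onnx"
+# Initialize ONNX Runtime session
+session = ort.InferenceSession(onnx_file_path)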
+SYSTEM_PROMPT = """Convert the following medical transcript to a structured medical note.
+Use these sections in this order:
+1. Demographics
+   - Name, Age, Sex, DOB
+2. Presenting Illness
+   - Bullet point statements of the main problem and duration.
+3. History of Presenting Illness
+   - Chronological narrative: symptom onset, progression, modifiers, associated factors.
+4. Past Medical History
+   - List chronic illnesses and past medical diagnoses mentioned in the transcript. Do not include surgeries.
+5. Surgical History
+   - List prior surgeries with year if known, as mentioned in the transcript.
+6. Family History
+   - Relevant family history mentioned in the transcript.
+7. Social History
+   - Occupation, tobacco/alcohol/drug use, exercise, living situation if mentioned in the transcript.
+8. Allergy History
+   - Drug, food, or environmental allergies and reactions, if mentioned in the transcript.
+9. Medication History
+   - List medications the patient is already taking. Do not include any new or proposed drugs in this section.
+10. Dietary History
+    - If unrelated, write "Not applicable"; otherwise, summarize the diet pattern.
+11. Review of Systems
+    - Head-to-toe, alphabetically ordered bullet points; include both positives and pertinent negatives as mentioned in the transcript.
+12. Physical Exam Findings
+    - Vital Signs (BP, HR, RR, Temp, SpO₂, HT, WT, BMI) if mentioned in the transcript.
+    - Structured by system: General, HEENT, Cardiovascular, Respiratory, Abdomen, Neurological, Musculoskeletal, Skin, Psychiatric—as mentioned in the transcript.
+13. Labs and Imaging
+    - Summarize labs and imaging results.
+14. ASSESSMENT
+    - Provide a brief summary of the clinical assessment or diagnosis based on the information in the transcript.
+15. PLAN
+    - Outline the proposed management plan, including treatments, medications, follow-up, and patient instructions as discussed.
+Please use only the information present in the transcript. If an information is not mentioned or not applicable, state "Not applicable." Format each section clearly with its heading.
+"""
+def generate_structured_note_onnx(transcript):
+    message = [
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
+    ]
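+    # Apply the chat template and tokenize straight to a NumPy array
+    inputs = tokenizer.apply_chat_template(
+        message,
+        tokenize=True,
+        add_generation_prompt=True,
+        return_tensors="np",
+    )
+    # ONNX Runtime expects int64 token ids
+    input_ids = np.asarray(inputs, dtype=np.int64)
+    # Run a single forward pass with ONNX Runtime.
+    # Depending on how the model was exported, the graph may also require
+    # inputs such as attention_mask, position_ids, or past key/values;
+    # check session.get_inputs() for the exact names.
+    outputs = session.run(
+        None,
+        {"input_ids": input_ids}
+    )
+    # A single pass only yields logits, not text. Producing the note requires
+    # an autoregressive decoding loop, which this simplified example omits.
+    return "Generated structured medical note..."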
+# Example usage
+transcript = "Patient is a 45-year-old male presenting with chest pain for the past 2 days..."
+note = generate_structured_note_onnx(transcript)
+print("\n--- Generated Response ---")
+print(note)
+print("---------------------------")
+```
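The example above stops at a single forward pass and returns a placeholder string. As a rough illustration of the missing text generation logic, the sketch below uses plain greedy decoding, not the sampling settings from the Transformers example. The names `greedy_decode` and `make_onnx_step` are hypothetical helpers, not part of any library. The adapter assumes the exported graph accepts only `input_ids` and returns logits of shape `(batch, seq_len, vocab)` as its first output, which may not hold for this export. It also re-runs the full sequence at every step (no key/value cache), so it is slow for long notes.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    eos_token_id: int,
    max_new_tokens: int = 2048,
) -> List[int]:
    """Return only the newly generated token ids (prompt excluded)."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # Greedy choice: index of the largest logit for the last position.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_token_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated


def make_onnx_step(session) -> Callable[[List[int]], Sequence[float]]:
    """Hypothetical adapter around an onnxruntime.InferenceSession.

    Assumes the graph takes only `input_ids` and that its first output
    holds logits of shape (batch, seq_len, vocab).
    """
    import numpy as np  # imported lazily so greedy_decode has no dependencies

    def step(ids: List[int]) -> Sequence[float]:
        input_ids = np.asarray([ids], dtype=np.int64)
        logits = session.run(None, {"input_ids": input_ids})[0]
        return logits[0, -1]  # logits for the last position only

    return step
```

With the `session` and `tokenizer` from the example above, the generated ids could be turned into text with `tokenizer.decode(new_ids, skip_special_tokens=True)`, where `new_ids` is the list returned by `greedy_decode(make_onnx_step(session), prompt_ids, tokenizer.eos_token_id)`.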
+## Alternative Usage with Transformers (Original Model)
+If you prefer to use the original model instead of the ONNX version:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "GetSoloTech/Llama3.2-Medical-Notes-1B"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
+# SYSTEM_PROMPT is the same prompt string defined in the ONNX example above
+def generate_structured_note(transcript):
+    message = [
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
+    ]
+    inputs = tokenizer.apply_chat_template(
+        message,
+        tokenize=True,
+        add_generation_prompt=True,
+        return_tensors="pt",
+    ).to(model.device)
+    outputs = model.generate(
+        input_ids=inputs,
+        max_new_tokens=2048,
+        temperature=0.2,
+        top_p=0.85,
+        min_p=0.1,
+        top_k=20,
+        do_sample=True,
+        eos_token_id=tokenizer.eos_token_id,
+        use_cache=True,
+    )
+    # Keep only the newly generated tokens (drop the prompt)
+    input_token_len = len(inputs[0])
+    generated_tokens = outputs[:, input_token_len:]
+    note = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
+    # Strip the <START_NOTES>/<END_NOTES> wrappers if the model emits them
+    if "<START_NOTES>" in note:
+        note = note.split("<START_NOTES>")[-1].strip()
+    if "<END_NOTES>" in note:
+        note = note.split("<END_NOTES>")[0].strip()
+    return note
+```
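Because the system prompt asks for fifteen sections in a fixed order, a quick check for missing headings can help spot truncated or malformed generations. The sketch below is illustrative only: `missing_sections` and `EXPECTED_SECTIONS` are hypothetical names. It assumes each heading appears on its own line, optionally with a list number, markdown markers, or a trailing colon, which the model's actual output may not follow.

```python
import re
from typing import List

# Section headings copied from the system prompt, in the requested order.
EXPECTED_SECTIONS = [
    "Demographics",
    "Presenting Illness",
    "History of Presenting Illness",
    "Past Medical History",
    "Surgical History",
    "Family History",
    "Social History",
    "Allergy History",
    "Medication History",
    "Dietary History",
    "Review of Systems",
    "Physical Exam Findings",
    "Labs and Imaging",
    "ASSESSMENT",
    "PLAN",
]


def _normalize(line: str) -> str:
    # Drop leading list numbers / markdown markers and trailing colons or asterisks.
    line = re.sub(r"^[\s#*\-\d.)]+", "", line)
    line = re.sub(r"[\s:*]+$", "", line)
    return line.lower()


def missing_sections(note: str, expected: List[str] = EXPECTED_SECTIONS) -> List[str]:
    """Return the expected headings that have no matching line in the note."""
    found = {_normalize(line) for line in note.splitlines()}
    return [name for name in expected if name.lower() not in found]
```

An empty list from `missing_sections(note)` means every heading was found on a line of its own. It says nothing about whether the content under each heading is clinically correct.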
+## Performance Benefits
+The ONNX version is intended to provide:
+- **Faster inference** through ONNX Runtime's optimized execution
+- **Reduced memory footprint** through quantization
+- **Cross-platform compatibility** across the operating systems and hardware that ONNX Runtime supports
+- **Production-ready** inference capabilities
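Actual speedups depend on hardware, the execution provider, and sequence length, so it is worth measuring them on the target machine. The helper below is a minimal timing sketch using only the standard library. `median_latency_ms` is a hypothetical name, and the callable passed in is whatever inference call is being compared.

```python
import time
from statistics import median
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> float:
    """Call `fn` repeatedly and return the median wall-clock latency in milliseconds."""
    # Warm-up calls are not timed; they absorb one-off setup costs.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)
```

For example, `median_latency_ms(lambda: session.run(None, {"input_ids": input_ids}))` would time one ONNX forward pass, assuming `session` and `input_ids` are set up as in the usage example.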
+## Requirements
+- `onnxruntime` for ONNX inference
+- `transformers` for tokenization
+- `numpy` for array operations