xelm-gemma-4b-austronesian-layer-reg

Layer-range L2-SP regularization: the middle layers receive a larger L2 penalty toward the base Gemma-3-4B weights than the first and last layers. This acts as a soft equivalent of layer freezing.
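
To make the idea concrete, here is a minimal pure-Python sketch of a layer-range L2-SP penalty. It is an illustration under stated assumptions, not the training code used for this model: the function names, the layer boundaries, and the coefficient values are all hypothetical, and the factor of 1/2 that some L2-SP formulations include is omitted. The values actually used are in the training config.

```python
from typing import List, Sequence


def layer_coefficient(layer_idx: int, num_layers: int,
                      middle_start: int, middle_end: int,
                      middle_coef: float, outer_coef: float) -> float:
    """Return the penalty strength for one layer.

    Layers with middle_start <= layer_idx < middle_end count as "middle"
    and get middle_coef; the first/last layers get outer_coef.
    All boundaries and coefficients here are hypothetical.
    """
    if not 0 <= layer_idx < num_layers:
        raise ValueError(f"layer_idx {layer_idx} out of range [0, {num_layers})")
    if middle_start <= layer_idx < middle_end:
        return middle_coef
    return outer_coef


def l2sp_penalty(current: Sequence[Sequence[float]],
                 base: Sequence[Sequence[float]],
                 coefs: Sequence[float]) -> float:
    """Sum over layers of coef_l * ||theta_l - theta_base_l||^2.

    current[l] and base[l] are the flattened weights of layer l for the
    fine-tuned and base model respectively.
    """
    total = 0.0
    for coef, cur, ref in zip(coefs, current, base):
        total += coef * sum((c - r) ** 2 for c, r in zip(cur, ref))
    return total


# Toy example: 4 layers, layers 1 and 2 are "middle" (hypothetical values).
num_layers = 4
coefs: List[float] = [
    layer_coefficient(i, num_layers, middle_start=1, middle_end=3,
                      middle_coef=1.0, outer_coef=0.25)
    for i in range(num_layers)
]
base = [[0.0, 0.0] for _ in range(num_layers)]
current = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# Middle layers contribute 1.0 * (1 + 4); outer layers 0.25 * (1 + 2).
print(l2sp_penalty(current, base, coefs))  # 5.75
```

In training, this penalty would be added to the task loss. Setting the middle coefficient very high approaches hard freezing of those layers, while a finite value still lets them drift slightly from the base weights.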

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID; the model weights and tokenizer live in the same repo.
repo_id = "sanchitahuja205/xelm-gemma-4b-austronesian-layer-reg"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

Training recipe

The exact training recipe lives in configs/yaml/train_gemma_layer_range.yaml in the code repo. The resolved config used for this specific run is also included in this model repo as training_config.yaml; load it with pyrallis to reproduce the run bit-for-bit. The command below uses the recipe path from the code repo:

python train.py --config_path configs/yaml/train_gemma_layer_range.yaml

Citation

@misc{ahuja2026parameteralignmentmitigatescatastrophic,
      title={Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models},
      author={Sanchit Ahuja and Terra Blevins},
      year={2026},
      eprint={2606.00284},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.00284},
}

Model size: 4B params. Tensor type: BF16. Weights format: Safetensors.