NorOLMo

This is a base (not instruction-tuned) large language model, continually pre-trained on Norwegian data starting from the English OLMo2-13B model.

The model was trained for 33 000 steps on around 275 billion tokens. Intermediate checkpoints are published here as branches. The main branch contains the model's weights after step 33 000 (stage 3).
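The step and token counts above can be cross-checked against the batch sizes and sequence lengths listed under Training details. The following is a minimal sketch, assuming that the batch size counts sequences, so that one step consumes batch size × sequence length tokens; the names `STAGES` and `tokens_per_stage` are illustrative and not part of any released code.

```python
# Per-stage settings copied from the "Training details" tables.
STAGES = {
    "stage 1": {"steps": 24_000, "batch_size": 2_048, "seq_len": 4_096},
    "stage 2": {"steps": 6_000, "batch_size": 2_048, "seq_len": 4_096},
    "stage 3": {"steps": 3_000, "batch_size": 512, "seq_len": 16_384},
}


def tokens_per_stage(stages):
    """Return the number of training tokens consumed in each stage.

    Assumption: batch size is measured in sequences, so one optimizer
    step consumes batch_size * seq_len tokens.
    """
    return {
        name: s["steps"] * s["batch_size"] * s["seq_len"]
        for name, s in stages.items()
    }


budget = tokens_per_stage(STAGES)
total_steps = sum(s["steps"] for s in STAGES.values())
total_tokens = sum(budget.values())

for name, n_tokens in budget.items():
    print(f"{name}: {n_tokens / 1e9:.1f}B tokens")
print(f"total: {total_steps} steps, {total_tokens / 1e9:.1f}B tokens")
```

Under this assumption the three stages come to roughly 201B, 50B and 25B tokens, which is consistent with the rounded figures quoted in this card.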

Evaluation

Below is a comparison of fully open models that support Norwegian. The figure shows the aggregate score across all 35 NorEval 1.1 tasks, which are grouped into 5 categories. Each task score is first normalized to a 0–100 scale, where 0 is random-baseline performance and 100 is a perfect score; this accounts for different chance levels across tasks (e.g. 25% for 4-choice QA vs. 50% for binary classification). The normalized scores are then averaged within each task category, and the category averages are averaged in turn, which gives equal weight to each category regardless of how many tasks it contains.
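The aggregation described above can be sketched as follows. This is an illustrative reading of the description, not the NorEval implementation: it assumes raw scores and random baselines are both expressed on a 0–100 scale, and the category names and numbers in the example are made up.

```python
from collections import defaultdict
from statistics import mean


def normalize(score, baseline):
    """Rescale a raw score so that the random baseline maps to 0 and a perfect score to 100."""
    return 100.0 * (score - baseline) / (100.0 - baseline)


def aggregate(results):
    """Average normalized scores within each category, then across categories.

    `results` is an iterable of (category, raw_score, random_baseline) tuples.
    Returns (overall_score, per_category_means).
    """
    by_category = defaultdict(list)
    for category, score, baseline in results:
        by_category[category].append(normalize(score, baseline))
    category_means = {c: mean(scores) for c, scores in by_category.items()}
    return mean(list(category_means.values())), category_means


# Hypothetical results: two 4-choice QA tasks and one binary classification task.
example = [
    ("qa", 62.5, 25.0),
    ("qa", 100.0, 25.0),
    ("classification", 75.0, 50.0),
]
overall, per_category = aggregate(example)
print(per_category, overall)
```

Because the category means are averaged rather than the individual tasks, the single classification task in this example carries as much weight as the two QA tasks combined.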

A more detailed evaluation is available in our interactive NorEval dashboard: https://ltgoslo.github.io/llm-dashboard.

Furthermore, an interactive per-checkpoint evaluation with additional ablation studies is available here.

Data Details

Stage 1 (24 000 steps -- 200B tokens)

Data ("pretraining data")

HPLTv3: Bokmål, Nynorsk, Faroese, Icelandic, Danish, Swedish
FinePDFs: Bokmål, Nynorsk, Faroese, Icelandic, Danish, Swedish
OLMo-Mix
Northern Sámi: (Glot500, Northern Sámi Web Corpus, SIKOR North Saami corpus)

Data splits

Data	Percentage	Unique Tokens	Total Tokens	Number of Documents	Average Document Length (tokens)
HPLT Bokmål	39.57	39.8B	79.7B	36.5M	1 092
HPLT Nynorsk	4.95	1.2B	10.0B	1.5M	826
HPLT Faroese	0.46	0.2B	0.9B	0.3M	711
HPLT Icelandic	2.50	5.0B	5.0B	4.3M	1 173
HPLT Swedish	12.09	92.1B	24.4B	97.7M	942
HPLT Danish	12.12	50.1B	24.4B	52.5M	954
FinePDFs Bokmål	8.36	8.4B	16.8B	1.5M	5 604
FinePDFs Nynorsk	1.15	0.3B	2.3B	92.8K	3 117
FinePDFs Faroese	0.17	87.1M	0.3B	20.8K	4 196
FinePDFs Icelandic	1.60	3.2B	3.2B	0.4M	8 855
FinePDFs Swedish	2.48	18.9B	5.0B	4.1M	4 574
FinePDFs Danish	2.45	10.1B	4.9B	2.4M	4 190
Northern Sámi	0.18	46.4M	0.4B	0.2M	288
Wiki (OLMo-Mix)	0.02	0.2B	40.3M	0.3M	667
Alg. Stack (OLMo-Mix)	0.04	0.6B	80.5M	0.1M	4 201
Open Web Math (OLMo-Mix)	0.04	0.6B	80.5M	0.1M	4 199
ArXiv (OLMo-Mix)	0.05	1.0B	0.1B	0.2M	5 210
PeS2o (OLMo-Mix)	0.15	2.5B	0.3B	1.6M	1 641
DCLM (OLMo-Mix)	9.50	48.3B	19.1B	35.1M	1 377
StarCoder (OLMo-Mix)	2.10	30.5B	4.2B	23.6M	1 293

The number of documents is the count of unique documents in each source, not the number of documents seen during training.

For OLMo-Mix, we took only a portion of the original dataset as our unique data.
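One way to read the Unique Tokens and Total Tokens columns together is as a repetition factor: total tokens divided by unique tokens. The sketch below computes it for a few rows of the Stage 1 table. Interpreting the ratio as "number of passes over the source" is an assumption made here for illustration, and `repetition_factor` is a hypothetical helper.

```python
# (source, unique tokens, total tokens), in billions, copied from the Stage 1 table.
STAGE1_ROWS = [
    ("HPLT Bokmål", 39.8, 79.7),
    ("HPLT Nynorsk", 1.2, 10.0),
    ("HPLT Swedish", 92.1, 24.4),
    ("DCLM (OLMo-Mix)", 48.3, 19.1),
]


def repetition_factor(unique_b, total_b):
    """Ratio of tokens used in training to unique tokens available.

    Values above 1 suggest the source was repeated; values below 1
    suggest only part of it was used.
    """
    return total_b / unique_b


for name, unique_b, total_b in STAGE1_ROWS:
    factor = repetition_factor(unique_b, total_b)
    kind = "repeated" if factor > 1 else "subsampled"
    print(f"{name}: {factor:.2f}x ({kind})")
```

On this reading, the smaller Norwegian sources are upsampled (Nynorsk by roughly a factor of eight), while the larger Swedish and English sources are only partially used.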

Stage 2 (6 000 steps -- 50B tokens) and Stage 3 (3 000 steps -- 25B tokens)

Data ("midtraining data")

HPLTv3 (filtered): Bokmål, Nynorsk, Icelandic, Danish, Swedish
FinePDFs-Edu: Bokmål, Nynorsk, Icelandic, Danish, Swedish, English
FinePDFs: Faroese
Northern Sámi: (Glot500, Northern Sámi Web Corpus, SIKOR North Saami corpus)
Stack-Edu
MegaMath Web-Pro
FineMath 4+
InfiWebMath 4+

Data splits

Data	Percentage	Unique Tokens	Total Tokens	Number of Documents	Average Document Length (tokens)
HPLT Bokmål	45.78	23.0B	23.0B	19.0M	1 215
HPLT Nynorsk	7.84	1.0B	3.9B	1.0M	1 003
HPLT Icelandic	6.87	3.5B	3.5B	2.7M	1 268
HPLT Swedish	4.90	2.5B	2.5B	3.6M	3 403
HPLT Danish	7.73	3.9B	3.9B	4.1M	2 950
FinePDFs-Edu Bokmål	2.24	1.1B	1.1B	0.2M	6 897
FinePDFs-Edu Nynorsk	0.28	35.8M	0.1B	9.7K	3 681
FinePDFs Faroese	0.69	87.1M	0.3B	20.8K	4 196
FinePDFs-Edu Icelandic	0.53	0.3B	0.3B	40.1K	6 598
FinePDFs-Edu Swedish	5.80	2.9B	2.9B	0.4M	6 755
FinePDFs-Edu Danish	2.97	1.5B	1.5B	0.3M	5 833
FinePDFs-Edu English	7.00	7.2B	3.5B	1.1M	6 280
Northern Sámi	0.37	46.4M	0.2B	0.2M	288
Stack-Edu	5.00	12.8B	2.5B	15.0M	856
MegaMath Web-Pro	0.84	13.7B	0.4B	15.0M	917
FineMath 4+	0.62	10.1B	0.3B	6.7M	1 512
InfiWebMath 4+	0.54	8.9B	0.3B	6.3M	1 417

Training details

Stage 1

Hyperparameter	Value
Embedding train steps	1 000
Warmup steps	2 000
Total train steps	24 000
Learning rate schedule	Warmup + constant
Learning rate	3e-4
Weight decay	1e-1
Sequence length	4 096
Batch size	2 048
RoPE theta	500 000
Clip grad	1.0
Adam epsilon	1e-8
Adam beta_1	0.9
Adam beta_2	0.95
RMSNorm epsilon	1e-6
Z-loss ratio	1e-5
Diffusion loss ratio	2e-2

Stage 2

Hyperparameter	Value
Decay steps	6 000
Total train steps	6 000
Learning rate schedule	Linear decay
Initial learning rate	3e-4
Final learning rate	1.5e-4
Weight decay	1e-1
Sequence length	4 096
Batch size	2 048
RoPE theta	500 000
Clip grad	1.0
Adam epsilon	1e-8
Adam beta_1	0.9
Adam beta_2	0.95
RMSNorm epsilon	1e-6
Z-loss ratio	1e-5
Diffusion loss ratio	2e-2

Stage 3

Hyperparameter	Value
Decay steps	3 000
Total train steps	3 000
Learning rate schedule	Linear decay
Max learning rate	1.5e-4
Final learning rate	0
Weight decay	1e-1
Sequence length	16 384
Batch size	512
RoPE theta	2 000 000
Clip grad	1.0
Adam epsilon	1e-8
Adam beta_1	0.9
Adam beta_2	0.95
RMSNorm epsilon	1e-6
Z-loss ratio	1e-5
Diffusion loss ratio	2e-2
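Taken together, the three tables describe a single learning-rate curve over 33 000 steps: warmup followed by a constant rate in stage 1, a linear decay from 3e-4 to 1.5e-4 in stage 2, and a linear decay from 1.5e-4 to 0 in stage 3. The following is a minimal sketch of that curve. It assumes that the warmup is linear and starts from 0, and that steps are counted globally across stages; neither detail is stated in the tables. It also ignores the embedding-only training steps, and `learning_rate` is an illustrative name.

```python
PEAK_LR = 3e-4       # stage 1 learning rate
MID_LR = 1.5e-4      # learning rate at the stage 2 / stage 3 boundary
WARMUP_STEPS = 2_000
STAGE1_END = 24_000
STAGE2_END = 30_000  # 24 000 + 6 000
STAGE3_END = 33_000  # 30 000 + 3 000


def learning_rate(step):
    """Learning rate at global step `step`, following the three stage tables."""
    if not 0 <= step <= STAGE3_END:
        raise ValueError(f"step must be in [0, {STAGE3_END}], got {step}")
    if step < WARMUP_STEPS:
        # Stage 1 warmup (assumed linear from 0).
        return PEAK_LR * step / WARMUP_STEPS
    if step < STAGE1_END:
        # Stage 1 constant phase.
        return PEAK_LR
    if step < STAGE2_END:
        # Stage 2: linear decay from 3e-4 to 1.5e-4.
        frac = (step - STAGE1_END) / (STAGE2_END - STAGE1_END)
        return PEAK_LR + frac * (MID_LR - PEAK_LR)
    # Stage 3: linear decay from 1.5e-4 to 0.
    frac = (step - STAGE2_END) / (STAGE3_END - STAGE2_END)
    return MID_LR * (1.0 - frac)


for step in (0, 1_000, 2_000, 24_000, 27_000, 30_000, 33_000):
    print(step, learning_rate(step))
```

The stage boundaries are continuous in this sketch: stage 2 starts at the stage 1 rate, and stage 3 starts where stage 2 ends.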

Acknowledgements

Training was conducted as a part of the HPLT project.

This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant number 10052546].

Base model: allenai/OLMo-2-1124-13B
