Commit 48d7bf7 by KennethEnevoldsen (parent: 2eb2324): Update README.md

Files changed (1): README.md (+21 -12)

README.md:
base_model:
  - common-pile/comma-v0.1-2t
pipeline_tag: text-generation
---

# Munin-7B-Open-pt

Munin-7B-open-pt is a 7-billion-parameter language model continually pre-trained from [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) on 30B tokens drawn from a mix of [Danish Dynaword](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword) and the [Comma v0.1 dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset), both comprising only public domain and openly licensed data.

Munin-7B-open-pt is a base model that can serve as a starting point for fine-tuning and post-training. It has not been instruction-tuned and should not be expected to function as a chat model out of the box.
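
Because it is a raw pre-trained checkpoint, the model is best suited to completion-style generation or as a starting point for your own fine-tuning. The snippet below is a minimal sketch using the Hugging Face `transformers` text-generation API; the repository id is taken from the links in this card, while the prompt and generation settings are illustrative assumptions.

```python
# Minimal sketch: completion-style generation with the base model (not a chat model).
# Assumes transformers, torch, and accelerate are installed; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "danish-foundation-models/munin-7b-open-pt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Plain-text continuation; the base model has no chat template.
prompt = "Danmark er et land i Skandinavien, og"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```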

## Training details

Munin-7B-open-pt has been trained using the [maester](https://github.com/rlrs/maester/tree/main/3aca26960eaa1a16250b3feda40303c240ba4ca1) framework developed as part of [Danish Foundation Models](https://foundationmodels.dk/). All training was performed on a single 8x NVIDIA B200 node (the first of its kind in Denmark) as part of the [SDU UCloud](https://cloud.sdu.dk/) research cloud.

The training was performed in three stages, with the data mix (`open-stageK.py`) and maester (`open-stageK.toml`) configuration files for each stage K available in the corresponding stage subfolder. The datasets can be created using the `create_dataset.py` script provided in this repository.

The characteristics of the three pre-training stages are detailed in the following table:

| Stage | Batch size | Steps | HF path | Data mix | Comments |
|-|-|-|-|-|-|
| stage1 | 262,144 tok | 37,852 | [subfolder="stage1"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage1) | 2/3 [Dynaword](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c) | Excludes depbank, jvj, nordjyllandnews, synne for Dynaword; <br> uses subsets and weighting from the [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR schedule with 1000 steps warmup, constant 1e-5, 1000 steps cooldown |
| stage2 | 524,288 tok | 18,926 | [subfolder="stage2"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage2) | 2/3 [Dynaword](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c) | Excludes depbank, jvj, nordjyllandnews, synne for Dynaword; <br> uses subsets and weighting from the [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR schedule with 500 steps warmup, constant 1e-5, 500 steps cooldown |
| stage3 | 524,288 tok | 18,926 | [subfolder="stage3"](https://huggingface.co/danish-foundation-models/munin-7b-open-pt/tree/main/stage3) | 2/3 [Dynaword](https://huggingface.co/datasets/danish-foundation-models/danish-dynaword/tree/9e230b35e31a510e5ab909112ad5bfc9463b2c23); <br> 1/3 [Common-Pile](https://huggingface.co/common-pile/comma_v0.1_training_dataset/5afc546db324e7f39f297ba757c9a60547151e7c) | Excludes depbank, jvj, nordjyllandnews, synne for Dynaword; <br> uses subsets and weighting from the [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t) cooldown phase for Common-Pile; LR schedule with 500 steps warmup, square root decay from 1e-5 |
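
For reference, the warmup/constant/cooldown schedule used in stage 1 and stage 2 can be sketched as a small function. This is an illustration only: the card does not state the shape of the warmup and cooldown ramps, so linear ramps and a final learning rate of 0 are assumptions here, and stage 3's square-root decay is not reproduced.

```python
# Hypothetical sketch of the stage 1/2 LR schedule described in the table above.
# Assumptions (not stated on the card): linear warmup, linear cooldown to 0.
def lr_at_step(step: int, total_steps: int, peak_lr: float = 1e-5,
               warmup_steps: int = 1000, cooldown_steps: int = 1000) -> float:
    if step < warmup_steps:                      # linear warmup from 0 to peak_lr
        return peak_lr * step / warmup_steps
    if step >= total_steps - cooldown_steps:     # linear cooldown from peak_lr to 0
        remaining = total_steps - step
        return peak_lr * remaining / cooldown_steps
    return peak_lr                               # constant phase at 1e-5

# Example: stage 1 runs for 37,852 steps with 1000-step warmup and cooldown.
print(lr_at_step(0, 37_852), lr_at_step(20_000, 37_852), lr_at_step(37_851, 37_852))
```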
34
+
35
 
36
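
The intermediate stage checkpoints listed in the HF path column can be loaded directly from their subfolders. A minimal sketch with `transformers` (the repository id and subfolder names follow the links in the table; everything else is illustrative):

```python
# Load a specific training-stage checkpoint, e.g. the stage3 subfolder.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "danish-foundation-models/munin-7b-open-pt"
stage = "stage3"  # or "stage1" / "stage2", as listed in the table above

# If the tokenizer is stored at the repository root rather than in the
# stage subfolder, drop the subfolder argument for the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=stage)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=stage, torch_dtype="auto")
```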

## Limitations

Munin-7B-open-pt has primarily been trained on Danish and English data and will likely have poor performance on other languages or programming languages.

As a base model, Munin-7B-Open-pt has not been aligned for safety and may, for example, reflect social biases present in its training data or potentially provide toxic or harmful information.

## License

The model is made available under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) open-source license. It may therefore be used, modified, distributed, and sublicensed for any purpose, including commercial use, without the licensee having to release their own derivative works under the same permissive terms, provided that users retain copyright and license notices and document any modifications they make.

## Project partners & funding

The development of Munin-7B-Open-pt was performed in close collaboration between [Aarhus University](https://chc.au.dk/), the [Alexandra Institute](https://alexandra.dk/), and the [University of Southern Denmark](https://www.sdu.dk/en/forskning/machine-learning) as part of [Danish Foundation Models](https://foundationmodels.dk/).

Funding was provided by the [Danish Ministry of Digital Affairs](https://www.english.digmin.dk/) and the [Danish Ministry of Higher Education and Science](https://ufm.dk/en).

## How to cite

Coming soon.