Visit this space to test out Axon! See how it chooses tokens and how sampling parameters change its behavior.
Axon-Nano-6M is a small-scale language model built on the novel Axon architecture: a recurrent neural network with multi-timescale diagonal memory and local convolution mixing. Unlike Transformers, whose attention cache grows with context length, Axon models carry a fixed-size recurrent state, so inference memory is O(1) per token; this makes them efficient for long-context generation while remaining highly expressive.
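The "diagonal" part is simple enough to sketch: each memory channel decays and accumulates independently, so the recurrence is elementwise rather than a dense matrix multiply. A minimal illustration in PyTorch (the function and gate names here are hypothetical; the real layer lives in `model.py`):

```python
import torch

# Minimal sketch of a diagonal-memory update (hypothetical; see model.py for
# the real layer). The state h has a fixed size, so inference memory stays
# constant no matter how long the context grows -- there is no KV cache.
def diagonal_memory_step(h, x, forget, inp):
    # h, x, forget, inp: (batch, d_memory). The gates act elementwise, i.e.
    # the recurrence matrix is diagonal rather than dense.
    return forget * h + inp * x

h = torch.zeros(1, 128)                      # recurrent state, d_memory = 128
x = torch.randn(1, 128)                      # projected token input
forget = torch.full((1, 128), 0.9)           # elementwise forget gate
inp = torch.full((1, 128), 0.5)              # elementwise input gate
h = diagonal_memory_step(h, x, forget, inp)  # state size never grows
```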
| Component | Configuration |
|---|---|
| Parameters | 6,204,032 |
| Layers | 4 |
| Model Dimension | 256 |
| Fast/Slow Split | 64/192 |
| Memory Dimension | 128 |
| Control Dimension | 128 |
| FFN Multiplier | 6× |
| Local Kernel | 5 |
| Vocab Size | 2,048 (BPE) |
| Max Sequence | 256 |
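For orientation, the table above plausibly maps onto a config object along these lines (the field names are guesses; the authoritative ones are in `config.json` and `model.py`):

```python
from dataclasses import dataclass

# Hypothetical mirror of the configuration table: the values come from the
# table, but the field names are assumptions.
@dataclass
class AxonConfigSketch:
    n_layers: int = 4
    d_model: int = 256        # split into 64 fast / 192 slow channels
    d_memory: int = 128
    d_control: int = 128
    ffn_mult: int = 6         # FFN hidden size = 6 * d_model
    local_kernel: int = 5     # local convolution window
    vocab_size: int = 2048    # BPE
    max_seq_len: int = 256
```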
| Group | Size | Forget Range | Input Max | Purpose |
|---|---|---|---|---|
| Fast | 43 | [0.05, 0.90] | 0.75 | Recent context |
| Medium | 43 | [0.01, 0.98] | 0.60 | Medium-term |
| Slow | 42 | [0.001, 0.995] | 0.45 | Long-term memory |
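Note that the group sizes (43 + 43 + 42) sum to the 128-dimensional memory. The bounded forget ranges suggest each gate is squashed into its group's interval; one plausible parameterization (an assumption, not confirmed by the repo) rescales a sigmoid:

```python
import torch

# Hypothetical: constrain forget gates to a group's [lo, hi] interval with a
# rescaled sigmoid, so slow channels can retain information for far longer
# than fast ones.
def bounded_gate(logits: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    return lo + (hi - lo) * torch.sigmoid(logits)

fast = bounded_gate(torch.zeros(43), 0.05, 0.90)     # recent context
medium = bounded_gate(torch.zeros(43), 0.01, 0.98)   # medium-term
slow = bounded_gate(torch.zeros(42), 0.001, 0.995)   # long-term memory
forget = torch.cat([fast, medium, slow])             # 128 memory channels
```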
| Step | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| 500 | 2.82 | 2.74 | 15.51 |
| 1000 | 2.30 | 2.25 | 9.51 |
| 2000 | 2.03 | 1.97 | 7.14 |
| 3000 | 1.90 | 1.85 | 6.38 |
| 4000 | 1.82 | 1.77 | 5.87 |
| 5000 | 1.79 | 1.73 | 5.62 |
Validation loss was still falling at step 5,000, so the model is slightly undertrained.
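For reference, the perplexity column is just the exponentiated validation loss, so the two columns carry the same information:

```python
import math

# Val PPL = exp(val loss); the final checkpoint's 1.73 loss gives ~5.64,
# matching the table's 5.62 up to rounding of the reported loss.
print(math.exp(1.73))  # 5.6407...
```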
=== Sample 1 ===
[INST] I feel so angry about the public spaces we spent with my new dog after that event [/INST] That is disappointing; it can be frustrating when people do not think about the effort we put in. [INST] Yes, it seems like some people do not understand how their actions affect others [/INST] It does seem unkind, especially when they do not share their concerns to others. [INST] I just wish people could think more before they speak up for no good reason [/INST] That is a good wish, sometimes people do not realize the impact of their actions. [INST] It makes me wonder if they ever think about how their actions affect others [/INST] Caring for others is important, yet many still prefer
=== Sample 2 ===
[INST] How about we talk about the weather today because it is sunny and warm? [/INST] It is great to see you too, the warm sun makes everything bright. [INST] Yes, the air feels better; I love sunny days when it is sunny. [/INST] Sailing on the water really helps us feel refreshed and happy. [INST] What is your favorite part of sunny days to stay positive? [/INST] I enjoy reading books, especially when the sun is shining brightly outside. [INST] That sounds nice, I also like wearing my favorite spoon in my garden. [/INST] Warm milk always brings joy to everyone, do you enjoy spending time with nature?
<|endoftext
=== Sample 3 ===
[INST] Hello, I feel a bit sad today because I heard some false stories I heard about me. [/INST] I understand, those stories can sometimes make us feel fearful or sad. [INST] Yes, it hurts a lot, and I wonder if anyone will ever feel the same way. [/INST] Art can bring peace, but it often seems to come back slowly when we least expect it. [INST] I hope something can bring back some of that sadness and find peace again. [/INST] It is always possible for things to be valued; maybe you will find new ways to share.
<|endoftext|>
[INST] Why do people find it so disgusting when I just sneez
=== Sample 4 ===
[INST] Hello, I feel a bit sad today because everything seems so cold and gloomy outside. [/INST] It is understandable to feel that way when the days seem to pass so quickly. [INST] I worry about how bad the weather can make us feel gloomy sometimes. [/INST] Yes, the current weather can often impact our mood and energy levels. [INST] Do you think the rain will stop soon, or will it stay like this? [/INST] I hope it does, it always makes everything bright and cheerful for outdoor fun. [INST] I hope we can go outside again soon, it would be wonderful outside. [/INST] I also hope for brighter skies, it is important for outdoor fun
=== Sample 5 ===
[INST] I think we should talk about how disgusting the curtain looks for the clock. [/INST] That is true, the colors can look unappealing and gross to many people. [INST] I just cannot stand seeing all that trash like that; it really does not help. [/INST] Yes, it's like the heat is piling up there, making it a mess, which is frustrating. [INST] Do you think we can find a way to stop feeling so dirty and disilled about it? [/INST] It might be wise to find something like that, but it is hard to predict. [INST] True, maybe we can find a way to turn things tidy and clean next time.
Not the most coherent, but decent for a 6M-parameter model.
```python
import torch

from model import AxonModel, AxonConfig
from tokenizer import BPETokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the config and weights from this repo.
config = AxonConfig.from_json("config.json")
model = AxonModel(config).to(device)
model.load_state_dict(torch.load("pytorch_model.pt", map_location=device))
model.eval()

tokenizer = BPETokenizer.from_json("tokenizer.json")

# Greedy decoding: always pick the highest-probability next token.
prompt = "[INST] "
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens], device=device)

with torch.no_grad():
    for _ in range(100):
        logits = model(input_ids)["logits"][:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)

output = tokenizer.decode(input_ids[0].tolist())
print(output)
```
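The greedy loop always produces the full 100 tokens; since the samples above show the model emitting a literal `<|endoftext|>` marker, a simple string-level trim can cut the output there:

```python
# Drop anything the model generated after its first <|endoftext|> marker.
output = output.split("<|endoftext|>")[0]
print(output)
```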
```python
from generation import generate

# Sampling with temperature and nucleus (top-p) filtering via the repo's
# generate helper.
output = generate(
    model=model,
    tokenizer=tokenizer,
    prompt="[INST] ",
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    device=device,
)
print(output)
```
This model is intended for:

- Research and experimentation with the Axon architecture
- Studying small, memory-efficient recurrent language models
- Demos and testing, e.g. via the space linked above

Not intended for:

- Production deployment
- Factual question answering, advice, or any safety-critical use; at 6M parameters the outputs are often incoherent
```bibtex
@misc{axon2025,
  title={Axon: Multi-Timescale Diagonal Memory for Efficient Sequence Modeling},
  author={Oscar Lo},
  year={2025},
  url={https://huggingface.co/oscar128372/axon-tinychat-6M}
}
```
License: MIT