Axon-Nano-6M


Visit this space to try Axon out! You can watch how it chooses tokens and how the sampling parameters change its behavior.

Axon-Nano-6M Inference

Model Description

Axon-Nano-6M is a small-scale language model built on the novel Axon architecture. It is a recurrent neural network with multi-timescale diagonal memory and local convolution mixing. Unlike Transformers, Axon models have O(1) memory per token during inference, making them efficient for long-context generation, while remaining highly expressive.
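Concretely, each layer carries a fixed-size memory state that is updated elementwise at every token, which is where the O(1) inference memory comes from. Here is a minimal sketch of one diagonal recurrent step; the gate names and shapes are illustrative assumptions, not Axon's actual internals:

import torch

# Toy diagonal-memory step: the recurrent state has a fixed size, so the
# memory cost per generated token is constant, unlike a growing KV cache.
d_mem = 128
h = torch.zeros(d_mem)                    # recurrent state, O(1) in sequence length

def axon_step(h, x, forget, inp):
    # Elementwise (diagonal) recurrence: no d_mem x d_mem transition matrix,
    # just per-channel forget and input gates.
    return forget * h + inp * x

x = torch.randn(d_mem)                    # token features (after local conv mixing)
forget = torch.full((d_mem,), 0.9)        # per-channel forget gate in (0, 1)
inp = torch.full((d_mem,), 0.5)           # per-channel input gate
h = axon_step(h, x, forget, inp)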

Key Features

  • Multi-Timescale Memory: Three groups of memory cells with different forget/input gate constraints capture short, medium, and long-range dependencies
  • Efficient Inference: Constant memory footprint regardless of sequence length
  • Parallel Training: Uses a parallel scan for O(L) training complexity (see the sketch after this list)
  • Local + Global Mixing: Depthwise convolutions for local patterns, diagonal RNN for global context
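
Because the recurrence is diagonal, training does not have to iterate token by token: unrolling it yields a closed form that cumprod/cumsum (or a proper parallel scan) can evaluate for all timesteps at once. The sketch below shows that trick under those assumptions; a real implementation would likely scan in log space, since the naive cumulative product underflows for long sequences:

import torch

# Unrolling h_t = f_t * h_{t-1} + i_t * x_t (with h_0 = 0) gives
# h_t = F_t * sum_{s<=t} (i_s * x_s) / F_s, where F_t = prod_{r<=t} f_r,
# so every timestep can be computed at once with cumprod/cumsum.
torch.manual_seed(0)
B, L, D = 2, 32, 128
f_gate = torch.sigmoid(torch.randn(B, L, D, dtype=torch.float64))  # forget gates in (0, 1)
i_gate = torch.sigmoid(torch.randn(B, L, D, dtype=torch.float64))  # input gates
x = torch.randn(B, L, D, dtype=torch.float64)

F = torch.cumprod(f_gate, dim=1)                 # running forget products
h = F * torch.cumsum(i_gate * x / F, dim=1)      # all timesteps in parallel

# Cross-check against the sequential, token-by-token recurrence
h_seq = torch.zeros(B, D, dtype=torch.float64)
for t in range(L):
    h_seq = f_gate[:, t] * h_seq + i_gate[:, t] * x[:, t]
assert torch.allclose(h[:, -1], h_seq, atol=1e-8)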

Architecture Details

Component           Configuration
Parameters          6,204,032
Layers              4
Model Dimension     256
Fast/Slow Split     64/192
Memory Dimension    128
Control Dimension   128
FFN Multiplier      6×
Local Kernel        5
Vocab Size          2,048 (BPE)
Max Sequence        256

Timescale Groups

Group    Size   Forget Range     Input Max   Purpose
Fast     43     [0.05, 0.90]     0.75        Recent context
Medium   43     [0.01, 0.98]     0.60        Medium-term
Slow     42     [0.001, 0.995]   0.45        Long-term memory
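
The ranges above suggest the gates are squashed into per-group intervals rather than left unconstrained. One common way to do that is a rescaled sigmoid; this is an assumption about the parameterization, not confirmed Axon code:

import torch

def constrained_gate(raw, lo, hi):
    # Squash an unconstrained parameter into [lo, hi] via a rescaled sigmoid.
    return lo + (hi - lo) * torch.sigmoid(raw)

# Hypothetical slow-group gates (42 channels, per the table above)
forget = constrained_gate(torch.randn(42), 0.001, 0.995)   # forget in [0.001, 0.995]
inp = constrained_gate(torch.randn(42), 0.0, 0.45)         # input capped at 0.45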

Training Details

  • Dataset: TinyChat (~190M BPE tokens)
  • Steps: 5,000 (~0.9 epochs; see the check after this list)
  • Batch Size: 128
  • Sequence Length: 256
  • Optimizer: HydraX (custom optimizer, more details soon)
  • Learning Rate: 3e-4
  • Hardware: 1x T4 GPU (Google Colab)
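
The epoch figure is consistent with the other numbers, since each step consumes batch size × sequence length tokens:

steps, batch_size, seq_len = 5_000, 128, 256
tokens_seen = steps * batch_size * seq_len      # 163,840,000
print(tokens_seen / 190e6)                      # ~0.86, i.e. roughly 0.9 epochs of ~190M tokens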

Training Curve

Step    Train Loss   Val Loss   Val PPL
500     2.82         2.74       15.51
1000    2.30         2.25       9.51
2000    2.03         1.97       7.14
3000    1.90         1.85       6.38
4000    1.82         1.77       5.87
5000    1.79         1.73       5.62

Both losses were still falling at step 5,000, so the model is slightly undertrained.
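
For reference, the Val PPL column is simply the exponential of the validation loss:

import math

# Perplexity is the exponential of the per-token cross-entropy loss.
for loss in (2.74, 2.25, 1.97, 1.85, 1.77, 1.73):
    print(f"exp({loss}) = {math.exp(loss):.2f}")   # matches the table up to rounding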

Sample Outputs:

=== Sample 1 ===
[INST] I feel so angry about the public spaces we spent with my new dog after that event [/INST] That is disappointing; it can be frustrating when people do not think about the effort we put in. [INST] Yes, it seems like some people do not understand how their actions affect others [/INST] It does seem unkind, especially when they do not share their concerns to others. [INST] I just wish people could think more before they speak up for no good reason [/INST] That is a good wish, sometimes people do not realize the impact of their actions. [INST] It makes me wonder if they ever think about how their actions affect others [/INST] Caring for others is important, yet many still prefer

=== Sample 2 ===
[INST] How about we talk about the weather today because it is sunny and warm? [/INST] It is great to see you too, the warm sun makes everything bright. [INST] Yes, the air feels better; I love sunny days when it is sunny. [/INST] Sailing on the water really helps us feel refreshed and happy. [INST] What is your favorite part of sunny days to stay positive? [/INST] I enjoy reading books, especially when the sun is shining brightly outside. [INST] That sounds nice, I also like wearing my favorite spoon in my garden. [/INST] Warm milk always brings joy to everyone, do you enjoy spending time with nature?
<|endoftext

=== Sample 3 ===
[INST] Hello, I feel a bit sad today because I heard some false stories I heard about me. [/INST] I understand, those stories can sometimes make us feel fearful or sad. [INST] Yes, it hurts a lot, and I wonder if anyone will ever feel the same way. [/INST] Art can bring peace, but it often seems to come back slowly when we least expect it. [INST] I hope something can bring back some of that sadness and find peace again. [/INST] It is always possible for things to be valued; maybe you will find new ways to share.
<|endoftext|>
[INST] Why do people find it so disgusting when I just sneez

=== Sample 4 ===
[INST] Hello, I feel a bit sad today because everything seems so cold and gloomy outside. [/INST] It is understandable to feel that way when the days seem to pass so quickly. [INST] I worry about how bad the weather can make us feel gloomy sometimes. [/INST] Yes, the current weather can often impact our mood and energy levels. [INST] Do you think the rain will stop soon, or will it stay like this? [/INST] I hope it does, it always makes everything bright and cheerful for outdoor fun. [INST] I hope we can go outside again soon, it would be wonderful outside. [/INST] I also hope for brighter skies, it is important for outdoor fun

=== Sample 5 ===
[INST] I think we should talk about how disgusting the curtain looks for the clock. [/INST] That is true, the colors can look unappealing and gross to many people. [INST] I just cannot stand seeing all that trash like that; it really does not help. [/INST] Yes, it's like the heat is piling up there, making it a mess, which is frustrating. [INST] Do you think we can find a way to stop feeling so dirty and disilled about it? [/INST] It might be wise to find something like that, but it is hard to predict. [INST] True, maybe we can find a way to turn things tidy and clean next time.

Not the most coherent, but decent for a 6M-parameter model.

Usage

Quick Start

import torch
from model import AxonModel, AxonConfig
from tokenizer import BPETokenizer

# Load the model weights and tokenizer shipped with this repo
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AxonConfig.from_json("config.json")
model = AxonModel(config).to(device)
model.load_state_dict(torch.load("pytorch_model.pt", map_location=device))
model.eval()

tokenizer = BPETokenizer.from_json("tokenizer.json")

# Encode the chat-format prompt (the model was trained on [INST] ... [/INST] turns)
prompt = "[INST] "
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens], device=device)

# Greedy decoding: take the argmax over the last position's logits each step
with torch.no_grad():
    for _ in range(100):
        logits = model(input_ids)["logits"][:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)

output = tokenizer.decode(input_ids[0].tolist())
print(output)

Generation with Sampling

from generation import generate

output = generate(
    model=model,
    tokenizer=tokenizer,
    prompt="[INST] ",
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    device=device,
)
print(output)
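
If you would rather not depend on generation.py, a minimal nucleus-sampling step can replace the argmax line in the Quick Start loop. This is a generic top-p sketch, not the repo's actual generate implementation:

import torch
import torch.nn.functional as F

def sample_top_p(logits, temperature=0.8, top_p=0.9):
    # Temperature-scale the logits, then sample from the smallest set of
    # tokens whose cumulative probability exceeds top_p.
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens outside the nucleus (always keep the most likely token)
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, next_sorted)

# In the Quick Start loop, replace the argmax line with:
# next_token = sample_top_p(logits)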

Limitations

  • Scale: This is a 6M-parameter model trained only on TinyChat; no other datasets were used
  • Coherence: Long generations may lose coherence
  • Knowledge: Limited factual knowledge due to small training corpus
  • Chat Format: Best used with basic conversational prompts (see starhopp3r/TinyChat)

Intended Use

This model is intended for:

  • Research into efficient RNN architectures
  • Educational purposes
  • Experimentation with novel memory mechanisms

Not intended for:

  • Production applications
  • Factual question answering
  • Safety-critical use cases

Citation

@misc{axon2025,
  title={Axon: Multi-Timescale Diagonal Memory for Efficient Sequence Modeling},
  author={Oscar Lo},
  year={2025},
  url={https://huggingface.co/oscar128372/axon-tinychat-6M}
}

License

MIT
