Axon-Nano-6M


Visit this space to try Axon out! You can watch how it chooses tokens and how the sampling parameters change its behavior.

Axon-Nano-6M Inference

Model Description

Axon-Nano-6M is a small-scale language model built on the novel Axon architecture. It is a recurrent neural network with multi-timescale diagonal memory and local convolution mixing. Unlike Transformers, Axon models have O(1) memory per token during inference, making them efficient for long-context generation, while remaining highly expressive.
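Concretely, each layer carries a fixed-size memory state that is updated elementwise at every token, which is where the O(1) inference memory comes from. Here is a minimal sketch of one diagonal recurrent step; the gate names and shapes are illustrative assumptions, not Axon's actual internals:

import torch

# Toy diagonal-memory step: the recurrent state has a fixed size, so the
# memory cost per generated token is constant, unlike a growing KV cache.
d_mem = 128
h = torch.zeros(d_mem)                    # recurrent state, O(1) in sequence length

def axon_step(h, x, forget, inp):
    # Elementwise (diagonal) recurrence: no d_mem x d_mem transition matrix,
    # just per-channel forget and input gates.
    return forget * h + inp * x

x = torch.randn(d_mem)                    # token features (after local conv mixing)
forget = torch.full((d_mem,), 0.9)        # per-channel forget gate in (0, 1)
inp = torch.full((d_mem,), 0.5)           # per-channel input gate
h = axon_step(h, x, forget, inp)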

Key Features

  • Multi-Timescale Memory: Three groups of memory cells with different forget/input gate constraints capture short, medium, and long-range dependencies
  • Efficient Inference: Constant memory footprint regardless of sequence length
  • Parallel Training: Uses a parallel scan for O(L) training complexity (see the sketch after this list)
  • Local + Global Mixing: Depthwise convolutions for local patterns, diagonal RNN for global context
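
Because the recurrence is diagonal, training does not have to iterate token by token: unrolling it yields a closed form that cumprod/cumsum (or a proper parallel scan) can evaluate for all timesteps at once. The sketch below shows that trick under those assumptions; a real implementation would likely scan in log space, since the naive cumulative product underflows for long sequences:

import torch

# Unrolling h_t = f_t * h_{t-1} + i_t * x_t (with h_0 = 0) gives
# h_t = F_t * sum_{s<=t} (i_s * x_s) / F_s, where F_t = prod_{r<=t} f_r,
# so every timestep can be computed at once with cumprod/cumsum.
torch.manual_seed(0)
B, L, D = 2, 32, 128
f_gate = torch.sigmoid(torch.randn(B, L, D, dtype=torch.float64))  # forget gates in (0, 1)
i_gate = torch.sigmoid(torch.randn(B, L, D, dtype=torch.float64))  # input gates
x = torch.randn(B, L, D, dtype=torch.float64)

F = torch.cumprod(f_gate, dim=1)                 # running forget products
h = F * torch.cumsum(i_gate * x / F, dim=1)      # all timesteps in parallel

# Cross-check against the sequential, token-by-token recurrence
h_seq = torch.zeros(B, D, dtype=torch.float64)
for t in range(L):
    h_seq = f_gate[:, t] * h_seq + i_gate[:, t] * x[:, t]
assert torch.allclose(h[:, -1], h_seq, atol=1e-8)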

Architecture Details

Component           Configuration
Parameters          6,204,032
Layers              4
Model Dimension     256
Fast/Slow Split     64/192
Memory Dimension    128
Control Dimension   128
FFN Multiplier      6×
Local Kernel        5
Vocab Size          2,048 (BPE)
Max Sequence        256

Timescale Groups

Group    Size   Forget Range     Input Max   Purpose
Fast     43     [0.05, 0.90]     0.75        Recent context
Medium   43     [0.01, 0.98]     0.60        Medium-term
Slow     42     [0.001, 0.995]   0.45        Long-term memory
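
The ranges above suggest the gates are squashed into per-group intervals rather than left unconstrained. One common way to do that is a rescaled sigmoid; this is an assumption about the parameterization, not confirmed Axon code:

import torch

def constrained_gate(raw, lo, hi):
    # Squash an unconstrained parameter into [lo, hi] via a rescaled sigmoid.
    return lo + (hi - lo) * torch.sigmoid(raw)

# Hypothetical slow-group gates (42 channels, per the table above)
forget = constrained_gate(torch.randn(42), 0.001, 0.995)   # forget in [0.001, 0.995]
inp = constrained_gate(torch.randn(42), 0.0, 0.45)         # input capped at 0.45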

Training Details

  • Dataset: TinyChat (~190M BPE tokens)
  • Steps: 5,000 (~0.9 epochs; see the check after this list)
  • Batch Size: 128
  • Sequence Length: 256
  • Optimizer: HydraX (custom optimizer, more details soon)
  • Learning Rate: 3e-4
  • Hardware: 1x T4 GPU (Google Colab)
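
The epoch figure is consistent with the other numbers, since each step consumes batch size × sequence length tokens:

steps, batch_size, seq_len = 5_000, 128, 256
tokens_seen = steps * batch_size * seq_len      # 163,840,000
print(tokens_seen / 190e6)                      # ~0.86, i.e. roughly 0.9 epochs of ~190M tokens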

Training Curve

Step    Train Loss   Val Loss   Val PPL
500     2.82         2.74       15.51
1000    2.30         2.25       9.51
2000    2.03         1.97       7.14
3000    1.90         1.85       6.38
4000    1.82         1.77       5.87
5000    1.79         1.73       5.62

Both losses were still falling at step 5,000, so the model is slightly undertrained.
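
For reference, the Val PPL column is simply the exponential of the validation loss:

import math

# Perplexity is the exponential of the per-token cross-entropy loss.
for loss in (2.74, 2.25, 1.97, 1.85, 1.77, 1.73):
    print(f"exp({loss}) = {math.exp(loss):.2f}")   # matches the table up to rounding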

Sample Outputs:

=== Sample 1 ===
[INST] I feel so angry about the public spaces we spent with my new dog after that event [/INST] That is disappointing; it can be frustrating when people do not think about the effort we put in. [INST] Yes, it seems like some people do not understand how their actions affect others [/INST] It does seem unkind, especially when they do not share their concerns to others. [INST] I just wish people could think more before they speak up for no good reason [/INST] That is a good wish, sometimes people do not realize the impact of their actions. [INST] It makes me wonder if they ever think about how their actions affect others [/INST] Caring for others is important, yet many still prefer

=== Sample 2 ===
[INST] How about we talk about the weather today because it is sunny and warm? [/INST] It is great to see you too, the warm sun makes everything bright. [INST] Yes, the air feels better; I love sunny days when it is sunny. [/INST] Sailing on the water really helps us feel refreshed and happy. [INST] What is your favorite part of sunny days to stay positive? [/INST] I enjoy reading books, especially when the sun is shining brightly outside. [INST] That sounds nice, I also like wearing my favorite spoon in my garden. [/INST] Warm milk always brings joy to everyone, do you enjoy spending time with nature?
<|endoftext

=== Sample 3 ===
[INST] Hello, I feel a bit sad today because I heard some false stories I heard about me. [/INST] I understand, those stories can sometimes make us feel fearful or sad. [INST] Yes, it hurts a lot, and I wonder if anyone will ever feel the same way. [/INST] Art can bring peace, but it often seems to come back slowly when we least expect it. [INST] I hope something can bring back some of that sadness and find peace again. [/INST] It is always possible for things to be valued; maybe you will find new ways to share.
<|endoftext|>
[INST] Why do people find it so disgusting when I just sneez

=== Sample 4 ===
[INST] Hello, I feel a bit sad today because everything seems so cold and gloomy outside. [/INST] It is understandable to feel that way when the days seem to pass so quickly. [INST] I worry about how bad the weather can make us feel gloomy sometimes. [/INST] Yes, the current weather can often impact our mood and energy levels. [INST] Do you think the rain will stop soon, or will it stay like this? [/INST] I hope it does, it always makes everything bright and cheerful for outdoor fun. [INST] I hope we can go outside again soon, it would be wonderful outside. [/INST] I also hope for brighter skies, it is important for outdoor fun

=== Sample 5 ===
[INST] I think we should talk about how disgusting the curtain looks for the clock. [/INST] That is true, the colors can look unappealing and gross to many people. [INST] I just cannot stand seeing all that trash like that; it really does not help. [/INST] Yes, it's like the heat is piling up there, making it a mess, which is frustrating. [INST] Do you think we can find a way to stop feeling so dirty and disilled about it? [/INST] It might be wise to find something like that, but it is hard to predict. [INST] True, maybe we can find a way to turn things tidy and clean next time.

Not the most coherent, but decent for a 6M-parameter model.

Usage

Quick Start

import torch
from model import AxonModel, AxonConfig
from tokenizer import BPETokenizer

# Load the model weights and tokenizer shipped with this repo
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AxonConfig.from_json("config.json")
model = AxonModel(config).to(device)
model.load_state_dict(torch.load("pytorch_model.pt", map_location=device))
model.eval()

tokenizer = BPETokenizer.from_json("tokenizer.json")

# Encode the chat-format prompt (the model was trained on [INST] ... [/INST] turns)
prompt = "[INST] "
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens], device=device)

# Greedy decoding: take the argmax over the last position's logits each step
with torch.no_grad():
    for _ in range(100):
        logits = model(input_ids)["logits"][:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)

output = tokenizer.decode(input_ids[0].tolist())
print(output)

Generation with Sampling

from generation import generate

output = generate(
    model=model,
    tokenizer=tokenizer,
    prompt="[INST] ",
    max_new_tokens=200,
    temperature=0.8,
    top_p=0.9,
    device=device,
)
print(output)
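
If you would rather not depend on generation.py, a minimal nucleus-sampling step can replace the argmax line in the Quick Start loop. This is a generic top-p sketch, not the repo's actual generate implementation:

import torch
import torch.nn.functional as F

def sample_top_p(logits, temperature=0.8, top_p=0.9):
    # Temperature-scale the logits, then sample from the smallest set of
    # tokens whose cumulative probability exceeds top_p.
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens outside the nucleus (always keep the most likely token)
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, next_sorted)

# In the Quick Start loop, replace the argmax line with:
# next_token = sample_top_p(logits)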

Limitations

  • Scale: This is a 6M-parameter model trained only on TinyChat; no other datasets were used
  • Coherence: Long generations may lose coherence
  • Knowledge: Limited factual knowledge due to small training corpus
  • Chat Format: Best used with basic conversational prompts (see starhopp3r/TinyChat)

Intended Use

This model is intended for:

  • Research into efficient RNN architectures
  • Educational purposes
  • Experimentation with novel memory mechanisms

Not intended for:

  • Production applications
  • Factual question answering
  • Safety-critical use cases

Citation

@misc{axon2025,
  title={Axon: Multi-Timescale Diagonal Memory for Efficient Sequence Modeling},
  author={Oscar Lo},
  year={2025},
  url={https://huggingface.co/oscar128372/axon-tinychat-6M}
}

License

MIT
