LLM-TRM Sequence TRM

A TRM model pretrained on CoT reasoning traces from SmolLMv3 to reason latently over compressed hidden states.

Model Details

  • Architecture: Tiny Recursive Network with transformer blocks
  • Compressed dimension: 256
  • Layers: 2
  • Heads: 8
  • Latent steps (n): 6
  • Deep recursions (T): 3
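The recursion pattern behind these hyperparameters can be sketched as follows. This is an illustrative toy, not the actual SequenceTRM implementation: the core network, the residual update scheme, and the zero-initialized latent are all assumptions chosen to show how n latent steps nest inside T deep recursions.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Toy sketch of a tiny recursive model: a small shared network is
    applied n latent refinement steps per deep recursion, for T deep
    recursions. Hyperparameters mirror the card (d=256, n=6, T=3)."""

    def __init__(self, d=256, n=6, T=3):
        super().__init__()
        self.n, self.T = n, T
        # Placeholder core; the real model uses transformer blocks.
        self.core = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, x):  # x: [B, L, d]
        z = torch.zeros_like(x)           # latent reasoning state
        for _ in range(self.T):           # deep recursions
            for _ in range(self.n):       # latent refinement steps
                z = z + self.core(z + x)  # refine latent given the input
            x = x + self.core(z)          # update the answer from the latent
        return x

sketch = TinyRecursiveSketch()
out = sketch(torch.randn(2, 5, 256))  # [2, 5, 256]
```

The key property the sketch captures is weight sharing: the same small network is reused n × T times, so depth comes from recursion rather than parameter count.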

Training Metrics

Training complete!
Best loss: 0.004876

Run history (W&B): loss, MSE, and halt loss decreased steadily over training, while cosine similarity, halt threshold, and average halt probability increased toward their final values below.

 Run summary:
         epoch/best_loss 0.00488
 epoch/cosine_similarity 0.99913
    epoch/halt_threshold 0.94917
              epoch/loss 0.00488
     train/avg_halt_prob 1.0
 train/cosine_similarity 0.99913
         train/halt_loss 0.0
              train/loss 0.00488
                train/lr 0.0
               train/mse 0.00488

Usage

```python
import torch
from huggingface_hub import hf_hub_download
from src.train.phase2_trm import SequenceTRM

# Download and load the checkpoint
checkpoint_path = hf_hub_download(repo_id="anonx3247/llm-trm-pretrained-trm", filename="trm.pt")
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# Initialize the TRM with the dimensions it was trained with
trm = SequenceTRM(
    d_compressed=256,
    n_layers=2,
    n_heads=8,
)
trm.load_state_dict(checkpoint["trm_state_dict"])

# Use: takes a [B, L, D'] compressed context, outputs [B, L+1, D']
compressed_hidden = ...  # [B, L, 256]
output = trm(compressed_hidden, n_steps=4)  # [B, L+1, 256]
reasoning_result = output[:, -1, :]  # [B, 256] — the appended reasoning token
```
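To produce a correctly shaped input for the snippet above, hidden states must first be compressed to 256 dimensions. The project presumably trains its own compressor; the random linear projection below is a hypothetical stand-in, and the hidden size of 2048 is an assumption about the source model rather than a documented value.

```python
import torch
import torch.nn as nn

# Hypothetical compressor: an untrained linear projection used only to
# produce a [B, L, 256] tensor of the right shape for the TRM. The real
# pipeline would use the project's trained compression module.
d_model, d_compressed = 2048, 256  # assumed LLM hidden size -> TRM dim
compressor = nn.Linear(d_model, d_compressed)

hidden_states = torch.randn(1, 12, d_model)   # stand-in for LLM hidden states
compressed_hidden = compressor(hidden_states)  # [1, 12, 256]
```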

Part of LLM-TRM

This TRM is part of the LLM-TRM project for integrating Tiny Recursive Models with language models.
