All HF Hub posts

SeaWolf-AI posted an update 2 days ago

Darwin-TTS: 3% of an LLM's Brain Makes TTS Speak with Emotion — Zero Training

We blended 3% of Qwen3-1.7B (LLM) FFN weights into Qwen3-TTS-1.7B's talker module. The result: emotionally enhanced speech synthesis — with zero training, zero data, and zero GPU hours.

Try the Demo: FINAL-Bench/Darwin-TTS-1.7B-Cross

Model Weights: FINAL-Bench/Darwin-TTS-1.7B-Cross

Full Research Article: https://huggingface.co/blog/FINAL-Bench/darwin-tts

Qwen3-1.7B (LLM) and Qwen3-TTS-1.7B's talker share 100% identical architecture — same hidden_size (2048), same layers (28), same heads (16). This enabled pure 1:1 weight blending across 84 FFN tensors with a single lerp operation. At 3% blend, emotion appears. At 5%, emotion intensifies. At 10%, the model breaks — producing 655-second outputs for a 3-second sentence, because the LLM's "keep generating" pattern overwhelms the TTS stop signal.

To our knowledge, this is the first training-free cross-modal weight transfer between an LLM and a TTS model. Prior work either requires adapter training (SmolTolk, 2025), fine-tuning (CSLM, 2025), or massive end-to-end compute (GPT-4o). Darwin-TTS achieves cross-modal capability transfer in under 2 minutes on CPU.

The key insight: TTS models with LLM backbones already "think" in language. We're just restoring 3% of the original LLM's language understanding patterns — particularly those related to emotional semantics and prosody planning. The code is three lines: load the model, load the LLM FFN, call p.lerp_(llm_weight, 0.03).
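
For anyone who wants the shape of it, here is a minimal sketch of that blend (not the authors' exact script; the TTS repo ID, the .talker attribute, and the "mlp" name filter are assumptions, only the lerp itself comes from the post):

```python
# Minimal sketch of the 3% FFN blend described above. NOT the authors' exact
# script: the TTS repo ID, the .talker attribute, and the ".mlp." filter are
# assumptions; only the lerp operation itself comes from the post.
import torch
from transformers import AutoModel, AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
tts = AutoModel.from_pretrained("Qwen/Qwen3-TTS-1.7B", trust_remote_code=True)

# FFN tensors from the LLM, keyed by parameter name.
# 28 layers x 3 projections (gate/up/down) = the 84 tensors mentioned above.
llm_ffn = {n: p for n, p in llm.named_parameters() if ".mlp." in n}
alpha = 0.03  # 3%: emotion appears; 5%: intensifies; 10%: generation breaks

with torch.no_grad():
    for name, p in tts.talker.named_parameters():  # assumed submodule name
        if name in llm_ffn and p.shape == llm_ffn[name].shape:
            p.lerp_(llm_ffn[name], alpha)  # p = (1 - alpha) * p + alpha * llm
```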

We are the creators of the Darwin Evolutionary Merge Framework. Darwin LLM V7 achieved 86.9% on GPQA Diamond (HF Benchmark #3) through CMA-ES-optimized FFN crossbreeding. Darwin-TTS extends this principle from LLM-to-LLM merging to cross-modal LLM-to-TTS transfer. Apache 2.0.

imnotkitty posted an update 1 day ago

Just tried tencent/HY-World-2.0 — a multimodal world model that takes in text or a single image and generates editable 3D scenes.

Unlike Google's Genie and HY-World 1.5, v2.0 generates engine-ready 3D content:
🎮 Direct import into Unreal Engine and Unity — no format wrangling
🧊 Supports multiple 3D asset formats: Mesh, 3DGS, point cloud, etc.
✏️ Fully editable — not a baked video, but actual geometry you can modify
🤖 Also usable for embodied simulation environments

Basically: from "AI generates a world you can look at" → "AI generates a world you can ship."

DedeProGames posted an update 1 day ago

🔥 GRM-2.5 - The most POWERFUL model for local inference

GRM-2.5 is the newest model from Orion LLM Labs. It has consistent raw reasoning and generates very precise responses, comparable to those of large models, while staying at a 4b parameter size.

The GRM-2.5 family consists of these models:
OrionLLM/GRM-2.5 (4b)
OrionLLM/GRM-2.5-Air (0.8b)

Furthermore, GRM-2.5 is the best option for local agentic environments, performing very well at coding, terminal-agent tasks, and more. It is capable of generating 1,000 lines of consistent code, programming like large models.
GRM-2.5 is the best base for fine-tuning to date and has vision, which means it can interpret images and videos.
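
If the repo ships a standard causal-LM checkpoint with a chat template, local inference with transformers could look like this minimal sketch (the prompt and generation settings are illustrative, untested against the actual model):

```python
# Minimal local-inference sketch for the 4b model. Assumes a standard
# causal-LM checkpoint with a chat template; untested against the actual repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionLLM/GRM-2.5"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a binary search in Python."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```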

Benedictat posted an update 1 day ago

Hunyuan HY-World 2.0 Open-Sourced | Unified SOTA for 3D Generation / Reconstruction / Simulation

HY-World 2.0 is a unified 3D world model supporting multimodal inputs including text and images.

Its end-to-end framework simultaneously performs 3D understanding, scene generation, and geometric reconstruction.

Based on HY-Pano-2.0, the model enables panorama generation without camera parameters.

It ensures geometric consistency via spatial agents and trajectory planning, and achieves a joint 3DGS & Mesh representation with WorldMirror 2.0, reaching SOTA performance in novel view synthesis and 3D reconstruction.

Unlike Genie 3 and HY-World 1.5, which only output videos, HY-World 2.0 directly generates editable 3D assets, better meeting real-world research and simulation demands.

kelsend posted an update 1 day ago

Tencent Open-Sources Hunyuan 3D World Model 2.0: Generate Editable 3D Game Worlds from One Sentence, Compatible with Unity/UE

Tencent has officially released and open-sourced Hunyuan 3D World Model 2.0 (HY-World 2.0), enabling AI to evolve from video generation to creating playable, editable 3D worlds.

Core Highlights

Text / Image / Video → Directly generate exportable 3D assets (Mesh / 3DGS / Point Cloud)

Seamlessly integrates with Unity / Unreal Engine for game maps and level prototyping

One-click reconstruction of digital twin scenes from single images/videos, no camera parameters required

Spatial Agent for intelligent navigation trajectories: no wall penetration, consistent spatial height

All-new HY-Pano-2.0 + WorldMirror 2.0 architecture, achieving SOTA in 3D reconstruction and novel view synthesis

Key Breakthrough
Unlike Genie 3 and Hunyuan 1.5, which only output videos, HY-World 2.0 generates re-editable 3D worlds that support collision, interaction, and engine import.

Application Scenarios
Game Development, Indoor Preview, Urban Planning, Digitalization of Cultural Heritage, Embodied AI Simulation

cahlen posted an update 2 days ago

Hugging Face just enabled CUDA kernel repos!! This is crazy cool!

Expect a ton more portable number-theory CUDA kernels in the near future. I'm going to have a hell of a lot of fun with this new feature.

Appreciate it, Hugging Face!

https://huggingface.co/kernels
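
For anyone who hasn't tried it yet, pulling a compiled kernel from the Hub looks roughly like this; the kernels-community/activation repo and the gelu_fast entry point follow the library's README example:

```python
import torch
from kernels import get_kernel  # pip install kernels

# Download a precompiled CUDA kernel straight from the Hugging Face Hub.
activation = get_kernel("kernels-community/activation")

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # kernel writes the result into y
```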

wangbuer999 posted an update about 20 hours ago

Hands-on testing of HY-World 2.0 shows a significant improvement in end-to-end engineering maturity compared to version 1.5.

The model supports direct multimodal input from text, single-frame images, and video. Inference can be launched without camera intrinsic/extrinsic calibration or additional preprocessing.

After panorama generation, the built-in Spatial Agent automatically performs semantic navigation path planning. Combined with spatial consistency constraints from HY-WorldStereo, it ensures artifact-free multi-view generation and stable geometric alignment.

Outputs include standard 3D asset formats such as Mesh, 3DGS, and point clouds, which can be directly imported into Unity/UE.

It is suitable for engineering scenarios including game level prototyping, digital twins, and embodied simulation.

sergiopaniego posted an update 1 day ago

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy.

And… it's already supported in TRL, built by Kashif Rasul. You can really feel the pace of development in the team 🐎

Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. No labels or verifier needed.
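
Conceptually, a single self-distillation step could look like this sketch (illustrative only, not the actual TRL trainer code; the prompt and hyperparameters are made up):

```python
# Illustrative SSD step: sample from the model at T_train, then train on the
# sample with plain cross-entropy. Not the TRL implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"  # same model as the ready-to-run example below
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = tok("Write a function that reverses a linked list.\n",
             return_tensors="pt").to(model.device)

# 1) Sample a completion from the model itself (T_train with top_k/top_p).
sampled = model.generate(**prompt, do_sample=True, temperature=1.0,
                         top_k=50, top_p=0.95, max_new_tokens=256)

# 2) Cross-entropy on the model's own output, masking the prompt tokens so
#    the loss covers only the sampled completion. No labels, no verifier.
labels = sampled.clone()
labels[:, : prompt["input_ids"].shape[1]] = -100
loss = model(input_ids=sampled, labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop
```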

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. Even very noisy samples still help.

Want to dig deeper?

Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer

victor posted an update 3 days ago

Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). Built a Three.js racing game to eval it, and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed

You are going to hear about this model a lot in the coming months - open source, let's go - and thanks z-ai 🚀🚀