Hybrid Naming Scheme & Benchmark Synopsis

This report summarizes baseline and hybrid quantization results for Qwen3-4B-Instruct-2507-unsloth as measured by the Magic Quant pipeline.

Naming Scheme

Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.

| Suffix example | Meaning |
|---|---|
| `BF16` | Unquantized family baseline (16-bit bfloat weights). |
| `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K_M`, `IQ4_NL`, `MXFP4_MOE` | Pure model-wide quantization baselines. |
| `iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K` | Base conversion mode `iq4_nl` with per-group schemes: embeddings (`emb_`), output head (`head_`), MoE router (`moe_rt_`). |
| `...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K` | Extended sensitivity groups: attention Q (`aq_`), attention K+V (`akv_`), FFN down (`fd_`), attention output (`ao_`). |
| `mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K` | MXFP4-centric hybrid with an MoE expert group (`moe_exp_`) and a mix of IQ and Q schemes across tensor groups. |

In general, anything after the base model name is a purely mechanical description of how the weights were transformed, not a new training run.
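
Because the suffix grammar is mechanical, a variant name can be decoded programmatically. The sketch below is a hypothetical helper, not part of the Magic Quant pipeline: it splits a name on `-`, treats the first token as the base conversion mode, and maps each remaining `prefix_SCHEME` token to its tensor group. The `fug_` prefix appears in the result tables but is not defined in the naming table; reading it as FFN up/gate is an assumption.

```python
from typing import Dict, Tuple

# Group prefixes from the naming table. "fug" is used in the result tables but is
# not defined there; "FFN up/gate" is an ASSUMPTION by analogy with "fd" = FFN down.
GROUP_PREFIXES: Dict[str, str] = {
    "emb": "embeddings",
    "head": "output head",
    "moe_rt": "MoE router",
    "moe_exp": "MoE experts",
    "aq": "attention Q",
    "akv": "attention K+V",
    "ao": "attention output",
    "fd": "FFN down",
    "fug": "FFN up/gate (assumed)",
}


def parse_variant(name: str) -> Tuple[str, Dict[str, str]]:
    """Split a variant name into (base_mode, {group_prefix: scheme})."""
    base, *parts = name.split("-")  # schemes such as Q6_K never contain "-"
    groups: Dict[str, str] = {}
    for part in parts:
        for prefix in GROUP_PREFIXES:
            # The scheme is everything after "<prefix>_", underscores included.
            if part.startswith(prefix + "_"):
                groups[prefix] = part[len(prefix) + 1:]
                break
        else:
            raise ValueError(f"unrecognized group token: {part!r}")
    return base, groups


base, groups = parse_variant("mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0")
print(base)
for prefix, scheme in groups.items():
    print(f"  {GROUP_PREFIXES[prefix]:<24} {scheme}")
```

A pure baseline such as `IQ4_NL` contains no `-`, so it parses to its base mode with an empty group map.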

All models were tested with a unified automated harness using llama.cpp tools.

Included tests:

- Throughput:
  - `llama-bench` with descending GPU offload (`-ngl` 35 → 0) and automatic OOM retry.
  - The highest successful TPS is recorded.
- Perplexity:
  - Three domains: general, code, and math.
  - Each domain uses an auto-generated corpus of ~32k tokens.
  - Perplexity is computed with `llama-perplexity` at a 2048-token context.
  - The same GPU retry logic as above applies.
- Precision loss:
  - Each model is compared to its family BF16 baseline.
  - For each PPL domain, precision loss is the absolute relative deviation from BF16 in percent, |PPL_model − PPL_BF16| / PPL_BF16 × 100; the averaged score is the mean over the three domains.
  - Models are ranked by the averaged score.
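
One plausible reading of the throughput procedure is a loop that starts at `-ngl 35` and offloads fewer layers whenever a run runs out of GPU memory. The sketch below is illustrative only: the step of 5 layers, stopping at the first level that fits, the `run_bench(ngl)` callable (returning tokens/s), and the `GpuOomError` exception are all hypothetical. In a real harness, `run_bench` might wrap `llama-bench -m <model> -ngl <n>` and parse its output.

```python
from typing import Callable, Iterable, Optional, Tuple


class GpuOomError(RuntimeError):
    """Hypothetical signal that a benchmark run failed because GPU memory ran out."""


def bench_with_offload_fallback(
    run_bench: Callable[[int], float],
    ngl_levels: Iterable[int] = range(35, -1, -5),  # ASSUMED step: 35, 30, ..., 5, 0
) -> Optional[Tuple[int, float]]:
    """Try offload levels from high to low; return (ngl, tps) for the first run that fits."""
    for ngl in ngl_levels:
        try:
            return ngl, run_bench(ngl)
        except GpuOomError:
            continue  # not enough VRAM at this level: retry with fewer GPU layers
    return None  # every level failed, including CPU-only (ngl=0)
```

Stopping at the first level that fits favors the highest offload, which is usually the fastest; a harness could instead benchmark every level that fits and keep the maximum TPS.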

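Comparison to BF16

Size reduction and TPS change are percentages relative to the BF16 baseline (7.50 GB, 254.70 TPS); a positive TPS change means the variant is faster. All tables are sorted by average precision loss.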

| model_name | size_reduction | tps_change |
|---|---:|---:|
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 46.93% | 46.64% |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 48.00% | 39.41% |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 59.60% | 68.19% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 69.60% | 61.56% |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 65.07% | 83.66% |
| IQ4_NL | 70.27% | 67.59% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 70.67% | 70.40% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 69.47% | 71.89% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 72.00% | 103.44% |
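
The relative columns above can be recomputed from the absolute file sizes and TPS values in the next table: size reduction is (1 − size / size_BF16) × 100 and TPS change is (tps / tps_BF16 − 1) × 100. Recomputing from the rounded values as listed reproduces every row to two decimals. The helper names below are illustrative, not part of the pipeline.

```python
# Absolute BF16 baseline values copied from the results table.
BF16_SIZE_GB, BF16_TPS = 7.50, 254.70


def size_reduction_pct(size_gb: float) -> float:
    """Percent of the BF16 file size saved by a variant."""
    return (1.0 - size_gb / BF16_SIZE_GB) * 100.0


def tps_change_pct(tps: float) -> float:
    """Percent throughput change versus BF16 (positive means faster)."""
    return (tps / BF16_TPS - 1.0) * 100.0


# The IQ4_NL baseline: 2.23 GB, 426.86 TPS.
print(f"IQ4_NL: {size_reduction_pct(2.23):.2f}% smaller, {tps_change_pct(426.86):+.2f}% TPS")
```

For the IQ4_NL row this gives 70.27% and +67.59%, matching the table.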

| model_name | file_size_gb | bench_tps | avg_prec_loss (%) |
|---|---:|---:|---:|
| BF16 | 7.50 | 254.70 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 3.98 | 373.48 | 0.0533 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 3.90 | 355.09 | 0.0728 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 3.03 | 428.37 | 0.1631 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 2.28 | 411.49 | 0.7356 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 2.62 | 467.79 | 0.8322 |
| IQ4_NL | 2.23 | 426.86 | 0.8996 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 2.20 | 434.01 | 1.0426 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 2.29 | 437.81 | 1.1673 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 2.10 | 518.15 | 2.0904 |

Per-domain perplexity (lower is better). Each `_er` column is the ± uncertainty of the perplexity estimate to its left.

| model_name | gen | gen_er (±) | code | code_er (±) | math | math_er (±) |
|---|---:|---:|---:|---:|---:|---:|
| BF16 | 8.8830 | 0.2056 | 1.5469 | 0.0122 | 6.7086 | 0.1369 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 8.8766 | 0.2053 | 1.5463 | 0.0122 | 6.7119 | 0.1368 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 8.8712 | 0.2051 | 1.5476 | 0.0122 | 6.7113 | 0.1368 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 8.8564 | 0.2036 | 1.5473 | 0.0122 | 6.6976 | 0.1358 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 9.0127 | 0.2057 | 1.5546 | 0.0119 | 6.6919 | 0.1331 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 9.0490 | 0.2096 | 1.5535 | 0.0121 | 6.7221 | 0.1358 |
| IQ4_NL | 8.9948 | 0.2072 | 1.5600 | 0.0123 | 6.7484 | 0.1362 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 9.0487 | 0.2082 | 1.5611 | 0.0122 | 6.7317 | 0.1350 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 9.0419 | 0.2084 | 1.5615 | 0.0122 | 6.7602 | 0.1361 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 9.2104 | 0.2106 | 1.5598 | 0.0119 | 6.8261 | 0.1363 |

Per-domain precision loss relative to BF16, in percent:

| model_name | loss_general | loss_code | loss_math |
|---|---:|---:|---:|
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 0.0720 | 0.0388 | 0.0492 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 0.1328 | 0.0453 | 0.0402 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 0.2994 | 0.0259 | 0.1640 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 1.4601 | 0.4978 | 0.2489 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 1.8687 | 0.4267 | 0.2012 |
| IQ4_NL | 1.2586 | 0.8469 | 0.5933 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 1.8654 | 0.9180 | 0.3443 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 1.7888 | 0.9438 | 0.7692 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 3.6857 | 0.8339 | 1.7515 |
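
As a consistency check, the precision-loss columns can be recomputed from the perplexity table. The sketch below assumes precision loss is the absolute relative PPL deviation from BF16 in percent, averaged over the three domains; this definition reproduces the table to the displayed precision. `precision_loss` is a hypothetical helper, not part of the Magic Quant pipeline. Because of the absolute value, a variant whose PPL is slightly *below* BF16 (for example 8.8766 vs 8.8830 on general text for the top hybrid) still registers a small positive loss.

```python
from statistics import mean
from typing import Dict

# BF16 perplexities (general, code, math) copied from the per-domain table.
BF16_PPL: Dict[str, float] = {"general": 8.8830, "code": 1.5469, "math": 6.7086}


def precision_loss(ppl: Dict[str, float], ref: Dict[str, float] = BF16_PPL) -> Dict[str, float]:
    """Absolute relative PPL deviation from the reference, in percent, plus the mean."""
    losses = {d: abs(ppl[d] - ref[d]) / ref[d] * 100.0 for d in ref}
    losses["avg"] = mean(losses.values())  # mean over the three domains only
    return losses


# The IQ4_NL baseline row.
iq4_nl = precision_loss({"general": 8.9948, "code": 1.5600, "math": 6.7484})
print({k: round(v, 4) for k, v in iq4_nl.items()})
```

For the IQ4_NL row this reproduces the listed losses and the 0.8996 average.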