Hybrid Naming Scheme & Benchmark Synopsis

This report summarizes baseline and hybrid quantization results for Qwen3-4B-Instruct-2507-unsloth as measured by the Magic Quant pipeline.

Naming Scheme

Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.

| Suffix Example | Meaning |
|---|---|
| BF16 | Pure full-precision family baseline (no quantization). |
| Q8_0, Q6_K, Q5_K, Q4_K_M, IQ4_NL, MXFP4_MOE | Pure model-wide quantization baselines. |
| iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K | Base conversion mode iq4_nl with per-group schemes: embeddings (emb_), output head (head_), MoE router (moe_rt_). |
| ...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K | Extended sensitivity groups: Attention Q (aq_), Attention K+V (akv_), FFN Down (fd_), Attention Output (ao_). |
| mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K | MXFP4-centric hybrids with an MoE expert group (moe_exp_) and mixed IQ / Q schemes per tensor group. |

In general, anything after the base model name is a purely mechanical description of how the weights were transformed, not a new training run.
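
As an illustration of the convention, the sketch below splits a variant suffix into its base conversion mode and per-group scheme overrides. The helper name, the prefix map, and the parsing rules are assumptions made for this example, not part of the Magic Quant pipeline; in particular the fug_ expansion is an unverified guess, since that group is not defined in the table above.

```python
# Minimal sketch of parsing the hybrid naming convention described above.
# The prefix map and parsing rules are illustrative assumptions only.

GROUP_PREFIXES = {
    "emb": "embeddings",
    "head": "output head",
    "moe_rt": "MoE router",
    "moe_exp": "MoE experts",
    "aq": "attention Q",
    "akv": "attention K+V",
    "ao": "attention output",
    "fd": "FFN down",
    "fug": "FFN up/gate",   # assumption: fug_ is not defined in the naming table
}

def parse_variant(suffix: str):
    """Split e.g. 'mxfp4_moe-emb_Q8_0-aq_Q6_K' into (base_mode, {group: scheme})."""
    base, *parts = suffix.split("-")
    overrides = {}
    for part in parts:
        # Match the longest known prefix first (e.g. 'moe_rt' before 'moe').
        for prefix in sorted(GROUP_PREFIXES, key=len, reverse=True):
            if part.startswith(prefix + "_"):
                overrides[GROUP_PREFIXES[prefix]] = part[len(prefix) + 1:]
                break
    return base, overrides

print(parse_variant("mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K"))
# ('mxfp4_moe', {'attention K+V': 'Q6_K', 'attention output': 'Q5_K', ...})
```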


Benchmark Methodology

All models were tested with a unified automated harness using llama.cpp tools.

Included tests:

  • Throughput:
    llama-bench with descending GPU offload (-ngl 35 → 0) and automatic OOM retry.
    The tokens-per-second (TPS) of the highest successful run is recorded (see the first sketch after this list).

  • Perplexity:
    Three domains: general, code, math.
    Each uses an auto-generated corpus of ~32k tokens.
    Perplexity is computed with llama-perplexity at a 2048-token context.
    Same GPU retry logic as above (see the second sketch after this list).

  • Precision loss:
    Each model is compared to its family BF16 baseline.
    Precision-loss % (the absolute relative change in perplexity vs BF16) is computed for each PPL domain, plus an averaged score; a worked check follows the final table.
    Models are ranked by this averaged metric.
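
The sketch below illustrates the descending GPU-offload retry described in the throughput item. It is an assumption about how such a harness could be wired up, not the Magic Quant implementation: only the llama.cpp flags shown (-m, -ngl) are real llama-bench options, while the helper name, the OOM handling, and the omitted TPS parsing are hypothetical, and the pipeline may step -ngl differently.

```python
# Sketch of the descending GPU-offload retry (-ngl 35 -> 0) described above.
# Helper name and failure handling are illustrative assumptions.
import subprocess

def run_with_gpu_fallback(cmd_prefix, model_path, start_ngl=35):
    """Try a llama.cpp tool with -ngl 35, 34, ..., 0 until a run succeeds."""
    for ngl in range(start_ngl, -1, -1):
        cmd = cmd_prefix + ["-m", model_path, "-ngl", str(ngl)]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            # Simplification: take the first (highest-offload) run that completes;
            # the pipeline records the highest successful TPS.
            return ngl, proc.stdout
        # Otherwise treat the failure as out-of-memory and retry with fewer layers.
    raise RuntimeError(f"all -ngl values failed for {model_path}")

# Throughput: TPS would be parsed from the llama-bench output
# ("model.gguf" is a placeholder path; parsing is omitted here).
ngl, bench_out = run_with_gpu_fallback(["llama-bench"], "model.gguf")
```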
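
Under the same assumptions, the perplexity step would call llama-perplexity on each domain corpus at a 2048-token context, reusing the fallback helper from the previous sketch. The corpus file names are placeholders; only the -f and -c flags are real llama-perplexity options.

```python
# Perplexity per domain at a 2048-token context, reusing run_with_gpu_fallback
# from the sketch above (corpus file names are placeholders).
for domain, corpus in [("general", "general.txt"),
                       ("code", "code.txt"),
                       ("math", "math.txt")]:
    ngl, ppl_out = run_with_gpu_fallback(
        ["llama-perplexity", "-f", corpus, "-c", "2048"], "model.gguf")
    # The final "PPL = x.xxxx +/- y.yyyy" estimate would be parsed from ppl_out.
    print(domain, "completed at -ngl", ngl)
```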


Table - Overview of Results

All values are relative to the family BF16 baseline.

| model_name | size_reduction | tps_change |
|---|---|---|
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 46.93% | 46.64% |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 48.00% | 39.41% |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 59.60% | 68.19% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 69.60% | 61.56% |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 65.07% | 83.66% |
| IQ4_NL | 70.27% | 67.59% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 70.67% | 70.40% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 69.47% | 71.89% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 72.00% | 103.44% |
  • size_reduction is the percentage decrease in file size vs the family BF16 baseline; tps_change is the percentage change in benchmark TPS vs that baseline.

Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
|---|---|---|---|
| BF16 | 7.50 | 254.70 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 3.98 | 373.48 | 0.0533 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 3.90 | 355.09 | 0.0728 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 3.03 | 428.37 | 0.1631 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 2.28 | 411.49 | 0.7356 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 2.62 | 467.79 | 0.8322 |
| IQ4_NL | 2.23 | 426.86 | 0.8996 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 2.20 | 434.01 | 1.0426 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 2.29 | 437.81 | 1.1673 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 2.10 | 518.15 | 2.0904 |
  • avg_prec_loss is the absolute precision-loss % vs BF16, averaged across the three PPL domains.

Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
|---|---|---|---|---|---|---|
| BF16 | 8.8830 | 0.2056 | 1.5469 | 0.0122 | 6.7086 | 0.1369 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 8.8766 | 0.2053 | 1.5463 | 0.0122 | 6.7119 | 0.1368 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 8.8712 | 0.2051 | 1.5476 | 0.0122 | 6.7113 | 0.1368 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 8.8564 | 0.2036 | 1.5473 | 0.0122 | 6.6976 | 0.1358 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 9.0127 | 0.2057 | 1.5546 | 0.0119 | 6.6919 | 0.1331 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 9.0490 | 0.2096 | 1.5535 | 0.0121 | 6.7221 | 0.1358 |
| IQ4_NL | 8.9948 | 0.2072 | 1.5600 | 0.0123 | 6.7484 | 0.1362 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 9.0487 | 0.2082 | 1.5611 | 0.0122 | 6.7317 | 0.1350 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 9.0419 | 0.2084 | 1.5615 | 0.0122 | 6.7602 | 0.1361 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 9.2104 | 0.2106 | 1.5598 | 0.0119 | 6.8261 | 0.1363 |
  • gen = ppl_general, code = ppl_code, math = ppl_math; the *_er columns give the reported error estimate for each perplexity value.

Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
|---|---|---|---|
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 0.0720 | 0.0388 | 0.0492 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 0.1328 | 0.0453 | 0.0402 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 0.2994 | 0.0259 | 0.1640 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 1.4601 | 0.4978 | 0.2489 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 1.8687 | 0.4267 | 0.2012 |
| IQ4_NL | 1.2586 | 0.8469 | 0.5933 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 1.8654 | 0.9180 | 0.3443 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 1.7888 | 0.9438 | 0.7692 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 3.6857 | 0.8339 | 1.7515 |
  • loss_* values are absolute precision-loss % vs BF16 per domain.
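
As a worked check of how the loss columns relate to the PPL table, the snippet below recomputes the IQ4_NL row from the BF16 baseline. The formula (absolute relative perplexity change vs BF16, in percent, averaged over the three domains) is inferred from the reported numbers rather than taken from the pipeline source, but it reproduces the published values.

```python
# Recompute IQ4_NL's precision-loss columns from the PPL table above.
# Formula inferred from the tables: |ppl - ppl_bf16| / ppl_bf16 * 100,
# then averaged across the three domains for avg_prec_loss.
bf16   = {"general": 8.8830, "code": 1.5469, "math": 6.7086}
iq4_nl = {"general": 8.9948, "code": 1.5600, "math": 6.7484}

loss = {d: abs(iq4_nl[d] - bf16[d]) / bf16[d] * 100 for d in bf16}
avg = sum(loss.values()) / len(loss)

print({d: round(v, 4) for d, v in loss.items()})
# {'general': 1.2586, 'code': 0.8469, 'math': 0.5933} -> matches loss_* columns
print(round(avg, 4))
# 0.8996 -> matches avg_prec_loss for IQ4_NL
```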