# Hybrid Naming Scheme & Benchmark Synopsis

This report summarizes baseline and hybrid quantization results for `Qwen3-4B-Instruct-2507-unsloth` as measured by the Magic Quant pipeline.

## Naming Scheme

Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.

| Suffix Example | Meaning |
| -------------- | ------- |
| `BF16` | Pure full-precision family baseline (no quantization). |
| `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K_M`, `IQ4_NL`, `MXFP4_MOE` | Pure model-wide quantization baselines. |
| `iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K` | Base conversion mode `iq4_nl` with per-group schemes: embeddings (`emb_`), output head (`head_`), MoE router (`moe_rt_`). |
| `...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K` | Extended sensitivity groups: Attention Q (`aq_`), Attention K+V (`akv_`), FFN Down (`fd_`), Attention Output (`ao_`). |
| `mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K` | MXFP4-centric hybrids with MoE expert group (`moe_exp_`) and mixed IQ / Q-schemes per tensor group. |

In general, anything after the base model name is a purely mechanical description of **how** the weights were transformed, not a new training run.

---

## Benchmark Methodology

All models were tested with a unified automated harness using `llama.cpp` tools.

**Included tests:**

- **Throughput:** `llama-bench` with descending GPU offload (`-ngl 35 → 0`) and automatic OOM retry. Highest successful TPS is recorded.
- **Perplexity:** Three domains: **general**, **code**, **math**. Each uses an auto-generated corpus of ~**32k tokens**. Perplexity is computed with `llama-perplexity` at **2048-token** context. Same GPU retry logic as above.
- **Precision loss:** Each model is compared to its **family BF16 baseline**. Precision-loss % is computed for all PPL domains, plus an averaged score. Models are ranked by this metric.

---

### Table - Overview of Results Comparing to BF16.

| model_name | size_reduction | tps_change |
| ---------- | -------------- | ---------- |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 46.93% | 46.64% |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 48.00% | 39.41% |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 59.60% | 68.19% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 69.60% | 61.56% |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 65.07% | 83.66% |
| IQ4_NL | 70.27% | 67.59% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 70.67% | 70.40% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 69.47% | 71.89% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 72.00% | 103.44% |

* All percentages compared against the selected family BF16 baseline.

---

### Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
| ---------- | ------------ | --------- | ------------- |
| BF16 | 7.50 | 254.70 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 3.98 | 373.48 | 0.0533 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 3.90 | 355.09 | 0.0728 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 3.03 | 428.37 | 0.1631 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 2.28 | 411.49 | 0.7356 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 2.62 | 467.79 | 0.8322 |
| IQ4_NL | 2.23 | 426.86 | 0.8996 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 2.20 | 434.01 | 1.0426 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 2.29 | 437.81 | 1.1673 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 2.10 | 518.15 | 2.0904 |

* `avg_prec_loss` is the averaged absolute precision-loss % vs BF16.

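
The `size_reduction` and `tps_change` percentages in the overview table appear to follow directly from the `file_size_gb` and `bench_tps` columns relative to the BF16 row. The sketch below reproduces that arithmetic; the function names are hypothetical and the formulas are inferred from the published numbers, not taken from the Magic Quant source.

```python
# Baseline values copied from the BF16 row of the table above.
BASE_SIZE_GB = 7.50
BASE_TPS = 254.70


def size_reduction(file_size_gb: float, base: float = BASE_SIZE_GB) -> float:
    """Percentage by which the file shrank relative to the baseline."""
    return (1.0 - file_size_gb / base) * 100.0


def tps_change(bench_tps: float, base: float = BASE_TPS) -> float:
    """Percentage throughput gain (positive) or loss (negative) vs the baseline."""
    return (bench_tps / base - 1.0) * 100.0


# Values for the pure IQ4_NL baseline row.
print(f"{size_reduction(2.23):.2f}%")   # 70.27%
print(f"{tps_change(426.86):.2f}%")     # 67.59%
```

Under this reading, a `tps_change` above 100% (as in the last row) simply means the variant ran at more than twice the BF16 token rate.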
---

### Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
| ---------- | --- | ------ | ---- | ------- | ---- | ------- |
| BF16 | 8.8830 | 0.2056 | 1.5469 | 0.0122 | 6.7086 | 0.1369 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 8.8766 | 0.2053 | 1.5463 | 0.0122 | 6.7119 | 0.1368 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 8.8712 | 0.2051 | 1.5476 | 0.0122 | 6.7113 | 0.1368 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 8.8564 | 0.2036 | 1.5473 | 0.0122 | 6.6976 | 0.1358 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 9.0127 | 0.2057 | 1.5546 | 0.0119 | 6.6919 | 0.1331 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 9.0490 | 0.2096 | 1.5535 | 0.0121 | 6.7221 | 0.1358 |
| IQ4_NL | 8.9948 | 0.2072 | 1.5600 | 0.0123 | 6.7484 | 0.1362 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 9.0487 | 0.2082 | 1.5611 | 0.0122 | 6.7317 | 0.1350 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 9.0419 | 0.2084 | 1.5615 | 0.0122 | 6.7602 | 0.1361 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 9.2104 | 0.2106 | 1.5598 | 0.0119 | 6.8261 | 0.1363 |

* gen = ppl_general, code = ppl_code, math = ppl_math

---

### Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
| ---------- | ------------ | --------- | --------- |
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q6_K-aq_Q6_K-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 0.0720 | 0.0388 | 0.0492 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 0.1328 | 0.0453 | 0.0402 |
| mxfp4_moe-akv_Q6_K-ao_Q5_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 0.2994 | 0.0259 | 0.1640 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_Q8_0-fd_IQ4_NL-fug_IQ4_NL | 1.4601 | 0.4978 | 0.2489 |
| mxfp4_moe-akv_BF16-ao_MXFP4-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_IQ4_NL | 1.8687 | 0.4267 | 0.2012 |
| IQ4_NL | 1.2586 | 0.8469 | 0.5933 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 1.8654 | 0.9180 | 0.3443 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_Q6_K-emb_Q6_K-fd_IQ4_NL-fug_IQ4_NL | 1.7888 | 0.9438 | 0.7692 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL | 3.6857 | 0.8339 | 1.7515 |

* loss_* values are absolute precision-loss % vs BF16 per domain.
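
For readers who want to check the loss columns against the PPL table, the following is a minimal sketch of how the per-domain and averaged precision loss can be recomputed. It assumes the loss is the absolute relative PPL difference vs BF16 expressed as a percentage, which matches the published rows (including variants whose PPL is slightly *below* BF16 but still show a positive loss). The helper name `precision_loss` is hypothetical and not part of the Magic Quant pipeline.

```python
# BF16 perplexities copied from the PPL table.
BASELINE_PPL = {"general": 8.8830, "code": 1.5469, "math": 6.7086}


def precision_loss(ppl: dict, baseline: dict = BASELINE_PPL) -> dict:
    """Return absolute precision-loss % per domain plus their unweighted mean.

    Assumed formula: |ppl - ppl_bf16| / ppl_bf16 * 100 for each domain.
    """
    losses = {
        domain: abs(ppl[domain] - base) / base * 100.0
        for domain, base in baseline.items()
    }
    avg = sum(losses.values()) / len(losses)
    return {**losses, "avg": avg}


# PPL values for the pure IQ4_NL baseline row.
iq4_nl = precision_loss({"general": 8.9948, "code": 1.5600, "math": 6.7484})
for name, value in iq4_nl.items():
    print(f"{name}: {value:.4f}")
# general: 1.2586
# code: 0.8469
# math: 0.5933
# avg: 0.8996
```

Because the absolute value is taken, this metric measures distance from the BF16 perplexity in either direction, so a small positive loss does not by itself indicate that the quantized variant scored worse.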