Quantization Drift

Qwen3.5-9B  ·  47 models  ·  @seed42  ·  -c 400

Each row shows the raw completion a quantized model produced alongside the BF16 baseline. The faded prefix is where both models agree token-for-token; the coloured suffix is where they part ways — style drift, loop onset, or full hallucination. The bar marks the percentage of the BF16 output length at which divergence first occurs.

Table columns: Prompt context · Model · Div. @token · Completion (truncated to 600 chars)

Legend (divergence onset):
- Identical to BF16
- Diverges after 50% of BF16 length
- Diverges after 25%
- Diverges within first 25%
Method: llama.cpp b8250 · CUDA · temp=0, seed=42 · -c 400 · models loaded into VRAM one at a time to ensure identical KV state. Divergence onset = first character position where the completion differs from the BF16 output.
Repos: bart = bartowski/Qwen_Qwen3.5-9B-GGUF  ·  unsl = unsloth/Qwen3.5-9B-GGUF  ·  lmst = lmstudio-community/Qwen3.5-9B-GGUF
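The divergence-onset metric above can be sketched in a few lines. This is a minimal illustration, not the actual script used for this page; the function names `divergence_onset` and `onset_pct` are hypothetical, and it assumes completions are compared as plain strings, character by character, exactly as the method note defines.

```python
def divergence_onset(baseline: str, quantized: str) -> int:
    """First character index where the quantized completion differs
    from the BF16 baseline; if no mismatch is found, the shorter
    output's length (divergence by truncation, or full agreement)."""
    for i, (a, b) in enumerate(zip(baseline, quantized)):
        if a != b:
            return i
    return min(len(baseline), len(quantized))


def onset_pct(baseline: str, quantized: str) -> float:
    """Divergence onset as a percentage of the BF16 output length,
    i.e. the quantity the bar in each row visualizes."""
    if not baseline:
        return 0.0
    return 100.0 * divergence_onset(baseline, quantized) / len(baseline)
```

For example, a quantized completion that matches the first half of the baseline and then drifts would score 50%, landing in the "Diverges after 50%" band.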