Quantization Drift

Qwen3.5-9B  ·  47 models  ·  @seed42  ·  -c 400

Each row shows the raw completion a quantized model produced alongside the BF16 baseline. The faded prefix is where both models agree token-for-token; the coloured suffix is where they part ways — style drift, loop onset, or full hallucination. The bar marks the percentage of the BF16 output length at which divergence first occurs.

Table columns: Prompt context · Model · Div. @token · Completion (truncated to 600 chars)

Legend (divergence onset):
- Identical to BF16
- Diverges after 50% of BF16 length
- Diverges after 25%
- Diverges within first 25%
Method: llama.cpp b8250 · CUDA · temp=0, seed=42 · -c 400 · models loaded into VRAM one at a time to ensure identical KV state. Divergence onset = first character position where the completion differs from the BF16 output.
Repos: bart = bartowski/Qwen_Qwen3.5-9B-GGUF  ·  unsl = unsloth/Qwen3.5-9B-GGUF  ·  lmst = lmstudio-community/Qwen3.5-9B-GGUF
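The divergence-onset metric above can be sketched in a few lines. This is a minimal illustration, not the actual script used for this page; the function names `divergence_onset` and `onset_pct` are hypothetical, and it assumes completions are compared as plain strings, character by character, exactly as the method note defines.

```python
def divergence_onset(baseline: str, quantized: str) -> int:
    """First character index where the quantized completion differs
    from the BF16 baseline; if no mismatch is found, the shorter
    output's length (divergence by truncation, or full agreement)."""
    for i, (a, b) in enumerate(zip(baseline, quantized)):
        if a != b:
            return i
    return min(len(baseline), len(quantized))


def onset_pct(baseline: str, quantized: str) -> float:
    """Divergence onset as a percentage of the BF16 output length,
    i.e. the quantity the bar in each row visualizes."""
    if not baseline:
        return 0.0
    return 100.0 * divergence_onset(baseline, quantized) / len(baseline)
```

For example, a quantized completion that matches the first half of the baseline and then drifts would score 50%, landing in the "Diverges after 50%" band.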