Getting MTP drafter download error when loading model

#47
by mazdadoost - opened

Hi everyone,

I’m getting the following error when trying to load yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF in Unsloth Studio:

“This model supports MTP, but its drafter file could not be downloaded, so MTP is off and it falls back to n-gram speculative decoding where the llama.cpp build supports it. Check your network connection or Hugging Face access, then reload the model to retry the drafter.”

My internet connection seems fine. Could someone help me understand:

1. What exactly is the “drafter file” and where should it be downloaded from?
2. Do I need special Hugging Face access or permissions?
3. How can I fix this so MTP works properly?

Thanks for any help!

@mazdadoost That message is harmless — let me take your three questions in order:

  1. What the "drafter" is: it's the MTP draft model for speculative decoding, a small companion model that guesses a few tokens ahead so the main model can verify them in one batch (roughly 1.2–1.3x faster, and lossless, meaning the output matches what the main model would produce on its own). In this repo it lives in the MTP/ subfolder: gemma-4-12B-it-MTP-Q8_0.gguf.

  2. HF access — none needed. The repo is fully public (Apache-2.0), no gating, no token. So it's not a permissions problem on your end — most likely Unsloth Studio's auto-downloader just isn't resolving the drafter inside the MTP/ subfolder correctly.

  3. How to fix:

  • If you only want to use the model: ignore the message. It just means the speculative speedup is off; the model itself runs completely fine (it fell back to n-gram).
  • If you want MTP actually working: download MTP/gemma-4-12B-it-MTP-Q8_0.gguf from the repo manually and run it as the draft model in llama.cpp's llama-server directly — that's the most reliable path. One caveat: Gemma 4 MTP needs a specific-ish llama.cpp build (it landed around b9553, and some newer builds have a regression that crashes the Gemma 4 drafter), so if it still won't load after the file is present, it's the build — not your network or HF access.
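
For the manual download, here is a minimal Python sketch using only the standard library. It assumes the drafter sits on the repo's default "main" branch and that the usual Hugging Face resolve URL pattern applies; drafter_url and download_drafter are hypothetical helper names made up for illustration, not part of any library.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Repo and file names are taken from this thread.
REPO_ID = "yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF"
DRAFTER_PATH = "MTP/gemma-4-12B-it-MTP-Q8_0.gguf"


def drafter_url(repo_id: str = REPO_ID,
                path: str = DRAFTER_PATH,
                revision: str = "main") -> str:
    """Build the direct-download URL for a file in a public Hub repo.

    Assumption: the file lives on the "main" revision.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(path)}"


def download_drafter(dest_dir: str = ".") -> Path:
    """Download the drafter GGUF into dest_dir and return the local path.

    No token is passed because the repo is public.
    """
    dest = Path(dest_dir) / Path(DRAFTER_PATH).name
    urlretrieve(drafter_url(), str(dest))
    return dest
```

Calling download_drafter("models") should leave gemma-4-12B-it-MTP-Q8_0.gguf in an existing models/ folder. Pasting the URL from drafter_url() into a browser is also a quick way to check that the file is reachable from your network at all.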

Happy to share the exact llama-server draft flags if you want them — just tell me your llama.cpp build.

Thanks a lot, that was very helpful.

My current llama.cpp version is b9789-mix-1f1aaa4.

Could you please share the exact llama-server flags for loading the Gemma 4 MTP drafter manually? Also, do you know if this build works reliably, or should I downgrade to something around b9553?
