model failing in claude code \ lm studio

#21
by agent20stv - opened

huihui-gemma-4-12b-it-qat-unquantized-abliterated@q4_k works fine
this model hallucinate in claude code

agent20stv changed discussion title from model failing in claude code to model failing in claude code \ lm studio

The single biggest cause of hallucination here is the tool results not making it back to the model β€” if the harness
doesn't feed real tool output in, a 12B fills the gap by guessing. So the serving path matters more than the weights:

Use llama-server --jinja (llama.cpp) β€” not LM Studio / minja, which can't render Gemma 4's custom chat template and
silently breaks tool calls. With --jinja, Gemma's native tool calls parse into proper structured tool_calls and
results round-trip correctly. For Claude Code, front that with a LiteLLM proxy.

When I tested v2 in Claude Code myself it worked β€” it drives tools and writes the correct files β€” but different setups
hit different failure modes, so I'd need your config to pin down yours.

Could you share:

  • How you're serving it β€” llama-server --jinja? LM Studio? Ollama? + the exact command/config
  • Which harness/client, and how it's wired (Claude Code via what proxy?)
  • Quant + sampling (temp / top_p / top_k / repeat_penalty)
  • What the hallucination actually looks like β€” fabricated file contents? invented command output? wrong tool args?

With that I can tell you whether it's the template / tool-routing or something else.

The issue is that even when you do use llama-server --jinja with the exact native template and correct environment constraints, the weights still suffer from catastrophic logit meltdowns and architectural loops.

I have just documented this with full reproduction flags, exact configs, and logs under clean baseline controls in Discussion #20. The stock Gemma 4 IT works perfectly under repeat_penalty 1.0 on llama-server, whereas v2 collapses into endless numerical loops unless heavily suppressed by samplers. Would love to hear your thoughts on the logit explosions over there, since the serving path argument doesn't apply to those isolated tests.

Sign up or log in to comment