How to use this model with Claude Code / Codex ?

#6
by bebahung - opened

I tried to use this model with LM Studio and connect via Claude Code.

However, it's getting error on jinja prompt.

Anyone have tested with Claude Code and give me good config to use ?

Thanks a lot!

H.

Hey H., thanks for trying it! I actually set this up and tested it end-to-end just now, so here's a config I canconfirm works.

First, the error itself: that jinja error is LM Studio's template engine ("minja") choking on this model's custom chat
template — it uses a thinking channel + Gemma 4's native tool-calling format, which relies on jinja features minja
doesn't fully support. So it fails before the model even runs. It's not your setup. The catch for Claude Code
specifically is that it leans entirely on tool-calling, which is exactly the part minja handles worst — so LM Studio
isn't the right server here.

What I tested and confirmed working: llama.cpp's llama-server with --jinja (its full jinja backend renders the native
template correctly) → a small proxy to translate Anthropic↔OpenAI → Claude Code CLI pointed at it.

  1. Serve the model:
    llama-server -m gemma4-v2-Q4_K_M.gguf --jinja -ngl 99 --no-mmap -fa on --ctx-size 32768 --temp 1.0 --top-p 0.95
    --top-k 64 --repeat-penalty 1.1 --host 127.0.0.1 --port 8080
  2. Bridge it (Claude Code speaks Anthropic's API, llama-server speaks OpenAI). I used LiteLLM with a one-line config
    (model_name: "*" → api_base http://127.0.0.1:8080/v1), run it on port 4000. claude-code-router works too. (Windows
    note: if LiteLLM crashes on startup with a UnicodeEncodeError, set PYTHONUTF8=1 / PYTHONIOENCODING=utf-8 first.)
  3. Point Claude Code at the proxy:
    ANTHROPIC_BASE_URL=http://127.0.0.1:4000
    ANTHROPIC_AUTH_TOKEN=dummy
    ANTHROPIC_MODEL=gemma4-v2

Results on my machine: a plain coding question (write is_prime) came back with correct, clean code. An agentic task
("create fizzbuzz.py and run it") worked too — it drove the tools, created the file, and the code it wrote was
correct.

One honest heads-up though: Claude Code is about the heaviest harness there is (huge system prompt + a dozen tools),
which is genuinely hard for any 12B. In longer agent loops this model can get shaky — e.g. in my test it wrote a
correct file but then mis-reported its own run output, so trust-but-verify what it tells you. For smoother local
agentic coding on a 12B, a lighter agent like opencode tends to behave better. But the setup above does work — give it
a go and let me know how it lands.

Hi, I noticed you mentioned that a lighter agent like opencode works better. I have been using LM-Studio as the backend and OpenCode as the agent and I keep getting this error:
"Error rendering prompt with jinja template: "Cannot call something that is not a function: got UndefinedValue".\n\nThis is usually an issue with the model's prompt template. If you are using a popular model, you can try to search the model under lmstudio-community, which will have fixed prompt templates. If you cannot find one, you are welcome to post this issue to our discord or issue tracker on GitHub. Alternatively, if you know how to write jinja templates, you can override the prompt template in My Models > model settings > Prompt Template."
I also saw you recommend llama.cpp, so I was wondering if there was gonna be any support for LM-Studio or Ollama.

Hi @TheZayBae — that UndefinedValue error is the same root cause as the earlier LM Studio jinja issue: LM Studio's
template engine (minja) doesn't fully implement the standard jinja2 features this model's template uses (the thinking
channel + Gemma 4's native tool-calling rely on .get(), namespace, etc., which minja doesn't support). It fails at
template-render time, before the model even runs — so it's not your config. It's also not something I can "fix" by
retraining: the model's template is standard for Gemma 4; the gap is on the client engine's side.

To your direct question — for an agent like OpenCode, which leans entirely on tool-calling, LM Studio is the wrong
backend, because tool rendering is exactly minja's weakest branch (even if you get past the chat error, tool calls
stay unreliable). The robust path — keep OpenCode, swap the backend to llama.cpp:

  1. Serve with llama.cpp (full jinja backend + a real native-tool parser → clean structured tool_calls):
    llama-server -m gemma4-v2-Q4_K_M.gguf --jinja -ngl 99 --no-mmap -fa on
    --ctx-size 65536 --temp 1.0 --top-p 0.95 --top-k 64 --repeat-penalty 1.1
    --host 127.0.0.1 --port 8080
    (OpenCode's system prompt is large — give it room; 32k floor, 64k comfortable.)

  2. Point OpenCode at it as an OpenAI-compatible provider (baseURL: http://127.0.0.1:8080/v1), and in opencode.json set
    the tool parser to raw-function-call + json — that's what makes OpenCode read Gemma 4's native tool format instead of
    expecting its own. Keep the tool set small and use clear parameter names; a 12B over-calls less that way.

On LM Studio / Ollama support specifically:

  • LM Studio: you can override the prompt template under My Models → model settings → Prompt Template (which is what
    the error suggests) — I can share a minja-safe chat-only version — but I'd be upfront that it only fixes plain chat;
    it won't make tool-calling reliable in minja, so it won't get OpenCode working.
  • Ollama: works if you hand it the official Gemma 4 Go template (ollama pull gemma4 then ollama show --modelfile
    gemma4, copy the TEMPLATE block) and pin num_ctx. But again, for a tool-driven agent, llama.cpp --jinja is the most
    reliable.

Bottom line: it's a client template-engine limitation, not something living in the weights — and llama.cpp --jinja is
the one backend I've confirmed returns clean structured tool calls for this model. Give that a go with OpenCode and
let me know how it lands.

Thank you so much for responding so fast! I look forward to using this model once I get everything all set up!

in opencode.json set the tool parser to raw-function-call + json

Neither I nor the duck can find any mention of this anywhere. Can you provide the exact syntax to use? Are you maybe using a fork or custom build of OpenCode?

@quaestor Fair challenge — and no, no fork or custom build, it's stock OpenCode. The reason you couldn't find it is on me: I gave it as shorthand ("raw-function-call + json") instead of the exact key. It's an option on the @ai-sdk/openai-compatible provider, nested under options.toolParser, and it's an array. Here's the verbatim block:

{
"$schema": "https://opencode.ai/config.json",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama.cpp (local)",
"options": {
"baseURL": "http://127.0.0.1:8080/v1",
"toolParser": [
{ "type": "raw-function-call" },
{ "type": "json" }
]
},
"models": {
"gemma4-v2": {
"name": "Gemma 4 12B v2",
"tool_call": true,
"limit": { "context": 65536, "output": 8192 }
}
}
}
},
"model": "llama/gemma4-v2"
}

What it does: raw-function-call rewrites the tools into the legacy function-call format Gemma 4 emits natively, and json recovers any tool calls that come back as plain text — together that's what makes OpenCode read Gemma 4's native tool format instead of expecting its own.

Since it's a provider option for @ai-sdk/openai-compatible it's only lightly documented, which is why it doesn't turn up easily — the clearest concrete reference is this Gemma‑4‑on‑llama.cpp gist that uses the exact block: https://gist.github.com/daniel-farina/87dc1c394b94e45bb700d27e9ea03193 (and OpenCode's config/providers docs for the surrounding structure). Sorry for the run‑around — that snippet is the precise thing.

@undefined

TYSM! JSON is just terrible for human-generated deeply-nested docs like this. It's basically just guesswork unless one has a concrete example...

Sign up or log in to comment