GLM-5.2-JANGTQ_K
JANGTQ (JANG TurboQuant) quantization of zai-org/GLM-5.2 (744B-parameter glm_moe_dsa MoE) for MLX on Apple silicon. TurboQuant applies a random-sign Hadamard rotation, a per-row FP16 norm, and a per-(dim,bits) Lloyd-Max codebook to the routed experts, keeping the backbone at higher precision.
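The three steps named above (random-sign Hadamard rotation, per-row FP16 norm, Lloyd-Max codebook) can be sketched in NumPy. This is an illustrative toy, not the JANGTQ implementation: the function names are hypothetical, the codebook is fitted on Gaussian samples with variance 1/dim as an assumed stand-in for the rotated-coordinate distribution, and codes are stored unpacked as one byte each.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def lloyd_max(samples: np.ndarray, bits: int, iters: int = 25) -> np.ndarray:
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid update."""
    levels = np.quantile(samples, (np.arange(2**bits) + 0.5) / 2**bits)
    for _ in range(iters):
        nearest = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(levels.size):
            members = samples[nearest == k]
            if members.size:
                levels[k] = members.mean()
    return levels

def quantize(w: np.ndarray, bits: int, seed: int = 0):
    """Quantize each row of w to `bits`-bit codes plus one FP16 norm per row."""
    rng = np.random.default_rng(seed)
    dim = w.shape[1]
    signs = rng.choice([-1.0, 1.0], size=dim)          # random sign flips
    rotated = (w * signs) @ hadamard(dim)              # rotation preserves row norms
    norms = np.linalg.norm(rotated, axis=1, keepdims=True).astype(np.float16)
    unit = rotated / np.maximum(norms.astype(np.float64), 1e-12)
    # Assumption: one codebook per (dim, bits), fitted on N(0, 1/dim) samples.
    codebook = lloyd_max(rng.normal(0.0, 1.0 / np.sqrt(dim), 20000), bits)
    codes = np.abs(unit[..., None] - codebook).argmin(axis=-1).astype(np.uint8)
    return codes, norms, codebook, signs

def dequantize(codes, norms, codebook, signs) -> np.ndarray:
    """Look up codebook levels, rescale by the row norm, undo the rotation."""
    rotated = codebook[codes] * norms.astype(np.float64)
    return (rotated @ hadamard(codes.shape[1]).T) * signs

w = np.random.default_rng(1).normal(size=(8, 64))
for bits in (2, 4):
    w_hat = dequantize(*quantize(w, bits))
    print(bits, "bits, relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

The rotation is there to make every row look statistically alike, which is what would let a single codebook per (dim, bits) pair serve all rows, with the FP16 norm restoring each row's scale.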
Profile: JANGTQ_K (max-quality mixed precision) — ~260 GB on disk.
| Component | Precision |
|---|---|
| Routed experts — gate_proj / up_proj | 2-bit MXTQ (codebook + Hadamard) |
| Routed experts — down_proj | 4-bit MXTQ (codebook + Hadamard) |
| Attention (MLA + DSA indexer) | FP16 |
| Shared experts | FP16 |
| Router / norms | FP16 |
| Embeddings / LM head | FP16 |
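As a rough reading of the routed-expert rows, the average code width per expert weight can be worked out directly. The sketch below assumes the three projections hold the same number of weights (as they would in a standard gated MLP where gate_proj and up_proj mirror down_proj's shape); that assumption is not stated in this card, and the figure excludes the per-row FP16 norms and the codebooks.

```python
# Code widths for the routed experts, taken from the table above.
EXPERT_BITS = {"gate_proj": 2, "up_proj": 2, "down_proj": 4}

def mean_code_bits(bits: dict, weights: dict) -> float:
    """Weighted average bits per weight across projections."""
    total_weights = sum(weights[name] for name in bits)
    total_bits = sum(bits[name] * weights[name] for name in bits)
    return total_bits / total_weights

# Assumption: equal weight counts per projection (relative units).
equal_share = {name: 1 for name in EXPERT_BITS}
print(round(mean_code_bits(EXPERT_BITS, equal_share), 3))  # 2.667
```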
The MTP (multi-token-prediction) head is dropped: it serves speculative decoding only and is unused by the MLX single-token decode path.
Requirements
- ~260 GB of unified memory. Whole-machine model; will not load alongside other large jobs. A 512 GB Mac (e.g. M3 Ultra) loads it comfortably.
- Load with the jang-tools package. Not supported by stock MLX, LM Studio, or Ollama.
- Requires an IndexShare-patched runtime (mandatory). GLM-5.2 introduces IndexShare: most sparse-attention layers are shared and carry no DSA indexer weights; they reuse the top-k token selections computed by the periodic full layers. Stock mlx_lm's glm_moe_dsa model runs an indexer on every layer, so it cannot load this bundle as-is.
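Before attempting a load, the memory requirement in the first bullet can be checked with a few lines of standard-library Python. This is a minimal sketch: the helper names and the headroom_gb parameter are hypothetical, GB is taken as 10^9 bytes, and the check looks at installed physical memory only, not at what is currently free.

```python
import os

REQUIRED_GB = 260  # approximate footprint stated in this card

def total_memory_bytes() -> int:
    """Installed physical memory via POSIX sysconf (works on macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

def can_hold(total_bytes: int, required_gb: float = REQUIRED_GB,
             headroom_gb: float = 0.0) -> bool:
    """True if the machine has at least required_gb + headroom_gb of memory."""
    return total_bytes >= (required_gb + headroom_gb) * 1e9

if __name__ == "__main__":
    total = total_memory_bytes()
    print(f"installed memory: {total / 1e9:.0f} GB")
    if not can_hold(total):
        print(f"fewer than ~{REQUIRED_GB} GB installed; the model will not fit")
```

Setting headroom_gb to a positive value leaves room for the KV cache and the rest of the system; how much is appropriate depends on context length and is not specified here.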
IndexShare patch
Stock mlx_lm (through 0.31.3) does not implement IndexShare. The patched module ships in the jang-tools runtime: it builds a DSA indexer only on full layers and reuses the most-recent full layer's indices on shared layers (with a matching make_cache). On stock mlx_lm you must apply an equivalent override to mlx_lm/models/glm_moe_dsa.py before loading. Note: pip install -U mlx-lm overwrites the patch — re-apply after any upgrade.
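The reuse rule described above can be illustrated with a toy selection loop. This is not the jang-tools patch or any mlx_lm code: the function names, the layer pattern, and the score arrays are hypothetical, and real indexer scores would come from the model rather than being passed in.

```python
import numpy as np

def topk_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Positions of the k highest-scoring tokens, in ascending order."""
    return np.sort(np.argsort(scores)[-k:])

def select_tokens(layer_kinds, indexer_scores, k):
    """Return the token indices each layer attends to.

    layer_kinds: sequence of "full" or "shared", one entry per layer.
    indexer_scores: dict mapping the position of each full layer to its scores.
    Only full layers run an indexer; shared layers reuse the latest result.
    """
    selections, latest = [], None
    for i, kind in enumerate(layer_kinds):
        if kind == "full":
            latest = topk_indices(indexer_scores[i], k)
        elif latest is None:
            raise ValueError("a shared layer needs a preceding full layer")
        selections.append(latest)
    return selections

# Hypothetical 5-layer pattern with full layers at positions 0 and 3.
kinds = ["full", "shared", "shared", "full", "shared"]
scores = {0: np.array([0.1, 0.9, 0.3, 0.8]), 3: np.array([0.7, 0.2, 0.6, 0.1])}
for kind, picked in zip(kinds, select_tokens(kinds, scores, k=2)):
    print(kind, picked.tolist())
```

A runtime that instead builds an indexer for every layer expects indexer weights on the shared layers too, which is why the bundle does not load without the override.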
Usage
from jang_tools.load_jangtq import load_jangtq_model as load
from mlx_lm import generate
model, tokenizer = load("bearzi/GLM-5.2-JANGTQ_K")
msgs = [{"role": "user", "content": "Write a Python function that reverses a string."}]
prompt = tokenizer.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)  # verbose=True already prints the output as it streams
License
MIT, inherited from zai-org/GLM-5.2; quantization does not change the upstream terms. MIT requires retaining the copyright and license notice in redistributions.