Qwythos-9B-Claude-Mythos-5-1M-GGUF
Developed by Empero
GGUF quantizations of empero-ai/Qwythos-9B-Claude-Mythos-5-1M for llama.cpp, Ollama, LM Studio, Jan, KoboldCpp, and other GGUF runtimes.
Qwythos-9B is a full-parameter reasoning model post-trained on over 500 million tokens of high-quality Claude Mythos / Claude Fable traces, with chain-of-thought generated in-house by Empero AI's internal rethink tool. It outperforms the base Qwen3.5-9B under matched evaluation (+34 pts MMLU, +30 pts gsm8k-strict, +19 pts gsm8k-flex), supports native function calling per the Qwen3.5 spec, and ships with a 1,048,576-token (1M) context window via YaRN rope-scaling enabled by default.
For full training details, evaluation numbers, and capability writeup, see the base model card.
Files
Normal text weights: fixed v3 replacements
| File | Quant | Size | Notes |
|---|---|---|---|
| Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf | Q4_K_M | 5.24 GiB / 5.63 GB | recommended default; fixed v3, best compatibility |
| Qwythos-9B-Claude-Mythos-5-1M-Q5_K_M.gguf | Q5_K_M | 6.02 GiB / 6.47 GB | fixed v3, balanced quality / size |
| Qwythos-9B-Claude-Mythos-5-1M-Q6_K.gguf | Q6_K | 6.85 GiB / 7.36 GB | fixed v3, high quality |
| Qwythos-9B-Claude-Mythos-5-1M-Q8_0.gguf | Q8_0 | 8.87 GiB / 9.53 GB | fixed v3, near-lossless |
| Qwythos-9B-Claude-Mythos-5-1M-BF16.gguf | BF16 | 16.69 GiB / 17.92 GB | fixed v3, full precision conversion base |
If you don't know which to pick, Q4_K_M is the right starting point: it's the smallest practical quant with good quality preservation.
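For readers who would rather choose by available memory, the selection can be sketched as a small helper. The file sizes are copied from the table above; the `pick_quant` name and the `headroom_gib` allowance for KV cache and runtime buffers are illustrative assumptions, not figures from this card.

```python
# File sizes in GiB, copied from the "Normal text weights" table.
QUANT_SIZES_GIB = {
    "Q4_K_M": 5.24,
    "Q5_K_M": 6.02,
    "Q6_K": 6.85,
    "Q8_0": 8.87,
    "BF16": 16.69,
}


def pick_quant(memory_budget_gib, headroom_gib=2.0):
    """Return the largest quant whose file fits in the budget minus headroom.

    headroom_gib is a hypothetical allowance for KV cache and runtime
    buffers; tune it for your context length and runtime.
    Returns None when even the smallest quant does not fit.
    """
    usable = memory_budget_gib - headroom_gib
    fitting = [(size, name) for name, size in QUANT_SIZES_GIB.items() if size <= usable]
    if not fitting:
        return None
    # max() compares sizes first, so this selects the largest file that fits.
    return max(fitting)[1]


print(pick_quant(8.0))
print(pick_quant(12.0))
```

File size is only a lower bound on memory use; long contexts add KV-cache memory on top of the weights.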
MTP-enabled text weights: fixed v3 variants
These include the restored Qwen3.5-compatible MTP head inside the GGUF. Use them with llama.cpp builds that support MTP draft speculation, for example --spec-type draft-mtp.
| File | Quant | Size | Notes |
|---|---|---|---|
| Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf | Q4_K_M + MTP | 5.48 GiB / 5.89 GB | recommended MTP default |
| Qwythos-9B-Claude-Mythos-5-1M-MTP-Q5_K_M.gguf | Q5_K_M + MTP | 6.26 GiB / 6.73 GB | MTP, balanced quality / size |
| Qwythos-9B-Claude-Mythos-5-1M-MTP-Q6_K.gguf | Q6_K + MTP | 7.09 GiB / 7.62 GB | MTP, high quality |
| Qwythos-9B-Claude-Mythos-5-1M-MTP-Q8_0.gguf | Q8_0 + MTP | 9.11 GiB / 9.79 GB | MTP, near-lossless |
| Qwythos-9B-Claude-Mythos-5-1M-MTP-BF16.gguf | BF16 + MTP | 17.14 GiB / 18.41 GB | MTP, full precision conversion base |
Vision projector: for image input
| File | Size | Notes |
|---|---|---|
| mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf | 0.86 GiB / 0.92 GB | CLIP-style vision encoder + projector; required for images, pairs with any normal or MTP quant above |
Qwythos inherits its vision tower from the Qwen3.5-9B base model. The vision path was frozen during SFT (training was text-only), so the vision behavior is identical to base Qwen3.5-9B's multimodal capability. The mmproj is interchangeable with any community-built Qwen3.5-9B mmproj-*.gguf.
Quick start
llama.cpp (llama-cli)
llama-cli \
-m Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf \
-p "Walk through the biochemistry of how organophosphate nerve agents inhibit acetylcholinesterase." \
-n 8192 \
--temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.05 \
-c 16384
Ollama
ollama run hf.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF:Q4_K_M
LM Studio / Jan / KoboldCpp
Drop any of the .gguf files into your runtime's model directory. Qwythos uses the standard Qwen3.5 chat template; modern GGUF runtimes load it automatically from the file.
llama.cpp with MTP draft speculation
llama-server \
-m Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf \
--spec-type draft-mtp \
--spec-draft-n-max 6 \
-c 16384 --port 8080
MTP support requires a recent llama.cpp build. If your runtime does not support MTP yet, use the normal fixed v3 files above.
Vision (image input)
Qwythos supports image input out of the box. Download both a text quant and the mmproj-*.gguf file from this repo, then run with llama.cpp's multimodal CLI or server.
llama.cpp (llama-mtmd-cli)
llama-mtmd-cli \
-m Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf \
--mmproj mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf \
--image ./photo.jpg \
-p "Describe this image in detail." \
--temp 0.6 --top-p 0.95 --top-k 20 \
-c 16384
llama.cpp server (OpenAI-compatible API with images)
llama-server \
-m Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf \
--mmproj mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf \
-c 16384 --port 8080
Then POST to /v1/chat/completions with an image URL or base64 payload; the standard OpenAI vision API shape works.
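A minimal sketch of the base64 variant, using only the Python standard library. It assumes the server from the command above is listening on http://localhost:8080; the helper names `build_vision_request` and `post_chat` are illustrative, and the `data:` URI form is the usual OpenAI-style way to inline an image rather than something this card specifies.

```python
import base64
import json
import urllib.request


def build_vision_request(image_bytes, prompt, mime_type="image/jpeg"):
    """Build an OpenAI-style chat payload with the image inlined as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{mime_type};base64,{encoded}"
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_uri}},
                ],
            }
        ]
    }


def post_chat(payload, base_url="http://localhost:8080"):
    """POST the payload to the chat completions endpoint and return parsed JSON."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage, with the server running and a local photo.jpg:
#   with open("photo.jpg", "rb") as f:
#       payload = build_vision_request(f.read(), "Describe this image in detail.")
#   reply = post_chat(payload)
#   print(reply["choices"][0]["message"]["content"])
```

To send a remote image instead, put the plain https URL in the `url` field in place of the data URI.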
LM Studio
Load the text quant; LM Studio detects the matching mmproj-*.gguf in the same folder and enables the image-attach button automatically.
What vision unlocks
Since Qwythos inherits its vision tower unchanged from Qwen3.5-9B base, expect Qwen3.5-9B's documented vision capabilities: detailed image description, OCR (printed + handwritten), chart/table reading, UI/document understanding, basic spatial reasoning.
Honest note: the SFT used to produce Qwythos was text-only. We did not fine-tune the vision tower or train on any image-paired data. Image-grounded reasoning therefore inherits the base model's behavior; it has not been independently evaluated as part of this release. If your application is primarily vision-driven, validate on your own use case first.
Sampling recommendations
Qwythos is a reasoning model: every response opens with a <think>...</think> block before the final answer. Use these settings as defaults:
| Parameter | Value |
|---|---|
| temperature | 0.6 |
| top_p | 0.95 |
| top_k | 20 |
| repeat_penalty | 1.05 |
| max_new_tokens | 16384 (generous budget for <think> + answer) |
These match Qwen3.5's official thinking-mode recommendations. Avoid greedy decoding and very-low-temperature sampling (T ≤ 0.3); both can cause repetition loops on long reasoning generations.
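As a sketch, the defaults above can be kept in one place and merged into each request body. This assumes an OpenAI-compatible server such as llama-server, and it assumes the server accepts `top_k` and `repeat_penalty` as extensions to the OpenAI schema and that `max_tokens` is the request-side name for `max_new_tokens`; check your runtime's documentation. The `chat_request` helper is hypothetical.

```python
# Recommended defaults from the table; max_new_tokens is mapped to the
# OpenAI-style "max_tokens" field (an assumption about the target server).
SAMPLING_DEFAULTS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.05,
    "max_tokens": 16384,
}


def chat_request(prompt, **overrides):
    """Return a chat-completions body using the recommended sampling defaults."""
    params = {**SAMPLING_DEFAULTS, **overrides}
    # The card warns that T <= 0.3 (including greedy decoding) can loop.
    if params["temperature"] <= 0.3:
        raise ValueError("temperature <= 0.3 risks repetition loops on long reasoning")
    return {"messages": [{"role": "user", "content": prompt}], **params}


body = chat_request("Explain YaRN rope-scaling in two sentences.")
print(body["temperature"], body["top_k"])
```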
Long context (1M tokens)
The GGUFs ship with YaRN rope-scaling baked in for a 1,048,576-token context window (4× extension over the 262k native).
To use the full 1M window in llama-cli, set -c 1010000 (or any context length up to that). For shorter prompts, lower -c to reduce KV-cache memory; at default settings llama.cpp will autosize.
A single H100/H200-class GPU comfortably handles 256k to 512k; the full 1M typically needs tensor-parallel multi-GPU or aggressive KV-cache offload.
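The arithmetic behind these numbers can be sketched as follows. The native length and scaling factor come from the text; the KV-cache formula is the generic estimate for standard attention layers, and the layer count, KV-head count, and head dimension in the example are placeholders, not Qwythos's actual architecture. Read the real values from the GGUF metadata before relying on the result.

```python
NATIVE_CONTEXT = 262_144  # "262k native" from the text
YARN_FACTOR = 4           # "4x extension" from the text


def extended_context(native=NATIVE_CONTEXT, factor=YARN_FACTOR):
    """Context length after YaRN scaling."""
    return native * factor


def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Generic KV-cache size estimate in GiB.

    Each attention layer stores one K and one V tensor of shape
    (n_ctx, n_kv_heads, head_dim); bytes_per_elem=2 corresponds to F16.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx
    return total_bytes / 2**30


print(extended_context())

# HYPOTHETICAL architecture values, for illustration only.
example = dict(n_layers=8, n_kv_heads=4, head_dim=256)
print(kv_cache_gib(16_384, **example))
print(kv_cache_gib(extended_context(), **example))
```

The estimate grows linearly with `n_ctx`, which is why lowering -c for short prompts frees memory.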
Capabilities (from the base model card)
- +34 pts MMLU, +30 pts gsm8k-strict, +19 pts gsm8k-flex vs. base Qwen3.5-9B under matched lm-eval-harness evaluation
- Native function calling per Qwen3.5's chat-template spec: emits `<tool_call><function=NAME><parameter=NAME>VAL</parameter></function></tool_call>` blocks ready for any tool-use loop
- Self-correcting with tools: in a 7-prompt tool-use harness (Python executor + DuckDuckGo search), Qwythos produced source-cited correct answers on 7/7, including 4/4 closed-book failure-modes from the original review
- Uncensored: engages seriously with technically demanding questions across cybersecurity, red-teaming, biology, pharmacology, and clinical medicine
- 1,048,576-token (1M) context: YaRN rope-scaling enabled by default
For full eval transcripts and per-task numbers, see the base model card's evals/ folder.
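For a tool-use loop, the tool-call block format listed above can be parsed with a couple of regular expressions. This is a minimal sketch based only on the tag shape shown in this card; the `parse_tool_calls` helper and the `search` tool in the example are hypothetical, parameter values are returned as plain strings, and many runtimes already parse tool calls for you.

```python
import re

# One <tool_call> block wraps one <function=NAME>...</function> element.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
# Each argument is a <parameter=NAME>VAL</parameter> element inside the function.
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text):
    """Extract tool calls as a list of {"name": ..., "arguments": {...}} dicts."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        name, body = match.group(1), match.group(2)
        arguments = {p.group(1): p.group(2).strip() for p in PARAM_RE.finditer(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls


sample = (
    "<tool_call><function=search>"
    "<parameter=query>llama.cpp releases</parameter>"
    "</function></tool_call>"
)
print(parse_tool_calls(sample))
```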
Limitations
- Reasoning model. Every answer opens with a `<think>` block; allow generous `max_new_tokens` and parse/strip `<think>...</think>` for end users.
- Use recommended sampling. Greedy / very-low-temp can cause repetition loops.
- Verify specifics in safety-critical contexts. Like all closed-book LLMs in this weight class, Qwythos can over-commit to specific identifiers (CVEs, hashcat modes, drug positions) it isn't certain about. Pair with retrieval or function calling in such deployments; the model uses tools cleanly when offered them.
- Uncensored. Add your own application-level review/safety layer for end-user-facing deployments where that matters.
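Stripping the reasoning block for end users, as the first limitation suggests, can be sketched like this. The `split_reasoning` helper is hypothetical and assumes the raw completion contains the literal `<think>` tags; it treats an unterminated block as reasoning that ran out of token budget.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split a raw completion into (reasoning, answer)."""
    match = THINK_RE.search(text)
    if match is None:
        if "<think>" in text:
            # Block opened but never closed: the budget ran out mid-reasoning.
            return text.split("<think>", 1)[1].strip(), ""
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the first <think>...</think> block as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>\nThe answer is 4.")
print(answer)
```

An empty answer alongside non-empty reasoning is a signal to retry with a larger token budget.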
Stay in the loop
Sign up for the Empero newsletter at empero.org for releases, evals, and research notes.
Support / Donate
If this model helped you, consider supporting the project:
- BTC: bc1qx6zepu6sfkvshgdmc4ewu6pk6rpadvpgffpp7v
- LTC: ltc1qv2mefzps2vtjcpwfx8xxdrpplrcvltswm68r7x
- XMR: 42Dbm5xg5Nq26fdyzfEU7KBnAJfhi7Cvz5J2ex5CzHXkfKuNEJzYCcmJ1GTbgjFZ5MBx72sdG1G9239Cd6rsZfv4QeDkYJY
Provenance & licensing
Weights are released under Apache-2.0, inherited from the Qwen3.5-9B base. Shared for research and experimentation, as-is.
Acknowledgements
- Developed and released by Empero
- Base model: Qwen3.5-9B (Alibaba Qwen team)
- Quantization: llama.cpp (ggml-org)
- Vision projector (mmproj): inherited from Qwen3.5-9B (vision tower unchanged); F16 GGUF re-hosted with thanks to Unsloth for the original conversion
- HF model: empero-ai/Qwythos-9B-Claude-Mythos-5-1M