How to use cagataydev/Qwen3.5-4B-cagatay-4bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("cagataydev/Qwen3.5-4B-cagatay-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
How to use cagataydev/Qwen3.5-4B-cagatay-4bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "cagataydev/Qwen3.5-4B-cagatay-4bit"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "cagataydev/Qwen3.5-4B-cagatay-4bit" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
How to use cagataydev/Qwen3.5-4B-cagatay-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "cagataydev/Qwen3.5-4B-cagatay-4bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default cagataydev/Qwen3.5-4B-cagatay-4bit
Run Hermes
hermes
How to use cagataydev/Qwen3.5-4B-cagatay-4bit with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "cagataydev/Qwen3.5-4B-cagatay-4bit"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "cagataydev/Qwen3.5-4B-cagatay-4bit"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cagataydev/Qwen3.5-4B-cagatay-4bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
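The same chat request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server listens on port 8080 (the base URL used in the Pi and Hermes configurations above) and returns the standard OpenAI chat-completions response shape. The helper names `build_chat_request` and `chat` are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally and reachable on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "cagataydev/Qwen3.5-4B-cagatay-4bit"


def build_chat_request(user_message, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_message)) as resp:
        body = json.load(resp)
    # Assumes the standard OpenAI chat-completions response shape.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello")` should return the model's reply as a string. Building the request separately from sending it makes the payload easy to inspect without a live server.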
🐤 Q-Tiny MLX — Qwen 3.5 4B Cagatay (4-bit)
A 4-bit quantized MLX model for Apple Silicon — fine-tuned for robotics reasoning and instruction following.
8.4 GB → 2.4 GB | 4.5 bits/weight | Runs on MacBook Air
What is this?
This is the merged and quantized version of the cagataydev/qwen3.5-4B-cagatay LoRA adapter, built for native Apple Silicon inference via MLX.
Pipeline: Qwen/Qwen3.5-4B + LoRA adapter → merged weights → MLX 4-bit quantization (group size 64)
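The final quantization step of a pipeline like this can be run with the `mlx_lm.convert` command. The sketch below only assembles the command line. It assumes the merged weights already sit in a local directory; both paths are hypothetical, and this is not necessarily the exact command used to produce this model.

```python
import shlex


def build_convert_command(hf_path, mlx_path, q_bits=4, q_group_size=64):
    """Assemble an mlx_lm.convert command for quantizing merged weights."""
    return [
        "mlx_lm.convert",
        "--hf-path", hf_path,      # merged Hugging Face weights (hypothetical path)
        "--mlx-path", mlx_path,    # output directory for the MLX model
        "-q",                      # enable quantization
        "--q-bits", str(q_bits),
        "--q-group-size", str(q_group_size),
    ]


# Hypothetical input and output directories, for illustration only.
cmd = build_convert_command("./merged-model", "./Qwen3.5-4B-cagatay-4bit")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine with mlx-lm installed.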
🚀 Use with Strands Agents + MLX
The recommended way to use this model is with strands-agents and strands-mlx:
pip install strands-agents strands-mlx
from strands import Agent
from strands_mlx import MLXModel
# Load the 4-bit quantized model
model = MLXModel(model_id="cagataydev/Qwen3.5-4B-cagatay-4bit")
# Create an agent
agent = Agent(model=model)
# Use it!
agent("Plan the steps to pick up a red cube and place it on the shelf")
With Custom Tools
from strands import Agent, tool
from strands_mlx import MLXModel
@tool
def get_robot_state() -> dict:
    """Get the current state of the robot."""
    return {"position": [0.5, 0.3, 0.1], "gripper": "open"}

model = MLXModel(
    model_id="cagataydev/Qwen3.5-4B-cagatay-4bit",
    params={"temperature": 0.7, "max_tokens": 1024}
)
agent = Agent(model=model, tools=[get_robot_state])
agent("What is the robot's current position? Then plan a pick-and-place task.")
With DevDuck
pip install devduck
export MODEL_PROVIDER=mlx
export STRANDS_MODEL_ID=cagataydev/Qwen3.5-4B-cagatay-4bit
devduck
📦 Use with mlx-lm (standalone)
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("cagataydev/Qwen3.5-4B-cagatay-4bit")
messages = [{"role": "user", "content": "Plan how to pick up a cup from the table"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=False)
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)
CLI
mlx_lm.generate --model cagataydev/Qwen3.5-4B-cagatay-4bit --prompt "Hello!"
📊 Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune | cagataydev/qwen3.5-4B-cagatay (LoRA) |
| Architecture | Qwen 3.5 (32 layers, 2560 hidden, 16 heads) |
| Parameters | 4B total |
| Quantization | 4-bit (4.503 bits/weight, group size 64) |
| Model Size | 2.4 GB (down from 8.4 GB fp16) |
| Format | MLX SafeTensors |
| Platform | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |
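The bits-per-weight and size figures in the table can be roughly reproduced with a little arithmetic. This sketch assumes each quantization group stores one 16-bit scale and one 16-bit bias alongside its 4-bit weights, and it infers the parameter count from the 8.4 GB fp16 size. The small gap between 4.5 and the reported 4.503 would then come from tensors left unquantized, which is also an assumption.

```python
def bits_per_weight(q_bits, group_size, scale_bits=16, bias_bits=16):
    """Effective bits per weight: payload plus per-group scale and bias."""
    return q_bits + (scale_bits + bias_bits) / group_size


def estimated_size_gb(n_params, bpw):
    """Model size in GB for a given parameter count and bits per weight."""
    return n_params * bpw / 8 / 1e9


# 8.4 GB at 2 bytes per fp16 weight implies about 4.2 billion parameters.
n_params = 8.4e9 / 2

ideal_bpw = bits_per_weight(q_bits=4, group_size=64)  # 4 + 32/64 = 4.5
size_gb = estimated_size_gb(n_params, 4.503)          # about 2.36 GB

print(f"bits/weight: {ideal_bpw}, estimated size: {size_gb:.2f} GB")
```

The estimate of about 2.36 GB rounds to the 2.4 GB listed above.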
🏋️ Training Provenance
The LoRA adapter was trained with:
| Parameter | Value |
|---|---|
| Method | LoRA + SFT (TRL) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Infrastructure | HuggingFace Jobs (cloud GPU) |
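To make the rank and alpha values concrete: in standard LoRA the adapter update is scaled by alpha / rank, and each adapted linear layer adds rank × (d_in + d_out) trainable parameters. The sketch below works through both. The 2560 × 2560 projection shape is a hypothetical example based on the hidden size, not the actual shape of every target module.

```python
def lora_scaling(alpha, rank):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the A (rank x d_in) and B (d_out x rank) matrices."""
    return rank * (d_in + d_out)


RANK = 32
ALPHA = 64

scaling = lora_scaling(ALPHA, RANK)

# Hypothetical square projection using the model's hidden size of 2560.
params_per_layer = lora_trainable_params(d_in=2560, d_out=2560, rank=RANK)
full_params = 2560 * 2560

print(f"scaling: {scaling}")
print(f"LoRA params: {params_per_layer} vs full matrix: {full_params}")
```

For this hypothetical layer the adapter trains 2.5% of the weights a full fine-tune would update.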
🤖 Use Cases
- Robotics task planning — Break down commands into step-by-step action plans
- Embodied reasoning — Spatial understanding and action sequencing
- Edge deployment — 2.4 GB fits comfortably on any Apple Silicon Mac
- Strands agent backbone — Local model for Strands Agents on Mac
- Neon VLA — Part of the Neon VLA vision-language-action stack
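For the task-planning use case, a downstream controller usually needs the plan as a list of steps rather than free text. The following is a minimal sketch, assuming the model answers with a numbered list. The sample reply is hand-written for illustration and is not real model output, and `parse_plan` is a hypothetical helper.

```python
import re

# Matches lines such as "1. Do something" or "2) Do something else".
STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_plan(text):
    """Extract the step descriptions from a numbered plan, in order."""
    steps = []
    for line in text.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            steps.append(match.group(2))
    return steps


# Hand-written example reply (not actual model output).
sample_reply = """Here is the plan:
1. Locate the red cube
2. Move the gripper above the cube
3) Close the gripper
Done."""

steps = parse_plan(sample_reply)
```

Lines that are not numbered steps, such as the introduction and the closing remark, are ignored.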
📦 Q-Model Family
| Model | Base | Size | Quantized | Use Case |
|---|---|---|---|---|
| 🌐 Q-Omni | Qwen 2.5 Omni 3B | 3B | — | Voice & multimodal |
| 🐤 Q-Tiny (this) | Qwen 3.5 4B | 4B | 2.4 GB 4-bit | Task planning on Mac |
| 🧠 Q-Brain | Qwen 3.5 35B MoE | 35B (3B active) | — | Complex reasoning |
Built with DevDuck 🦆 and Strands Agents 🧬