Libraries

How to use cagataydev/Qwen3.5-4B-cagatay-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("cagataydev/Qwen3.5-4B-cagatay-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use cagataydev/Qwen3.5-4B-cagatay-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "cagataydev/Qwen3.5-4B-cagatay-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "cagataydev/Qwen3.5-4B-cagatay-4bit"
        }
      ]
    }
  }
}
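
If ~/.pi/agent/models.json already lists other providers, pasting the JSON above by hand risks overwriting them. The following is a minimal sketch, not part of Pi itself, that merges the same provider entry into an existing file. It assumes the file follows the schema shown above; `add_provider` is a hypothetical helper name.

```python
import json
from pathlib import Path

PROVIDER_NAME = "mlx-lm"
# Same provider entry as in the snippet above.
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "cagataydev/Qwen3.5-4B-cagatay-4bit"}],
}


def add_provider(config_path: Path) -> dict:
    """Merge the mlx-lm provider into config_path, keeping other providers."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    # Create the "providers" map if missing, then add or replace our entry.
    config.setdefault("providers", {})[PROVIDER_NAME] = PROVIDER
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

To apply it to the real config, call `add_provider(Path.home() / ".pi" / "agent" / "models.json")`.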

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use cagataydev/Qwen3.5-4B-cagatay-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "cagataydev/Qwen3.5-4B-cagatay-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default cagataydev/Qwen3.5-4B-cagatay-4bit

Run Hermes

hermes

MLX LM

How to use cagataydev/Qwen3.5-4B-cagatay-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "cagataydev/Qwen3.5-4B-cagatay-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "cagataydev/Qwen3.5-4B-cagatay-4bit"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "cagataydev/Qwen3.5-4B-cagatay-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
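
The same request can be sent from Python with only the standard library. This is a sketch under two assumptions: the server is running on mlx_lm.server's default port 8080, and the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`). `build_chat_request` and `chat` are illustrative helper names, not part of mlx-lm.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: default mlx_lm.server host and port
MODEL_ID = "cagataydev/Qwen3.5-4B-cagatay-4bit"


def build_chat_request(user_text: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text: str) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply.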

🐤 Q-Tiny MLX — Qwen 3.5 4B Cagatay (4-bit)

A 4-bit quantized MLX model for Apple Silicon — fine-tuned for robotics reasoning and instruction following.

8.4 GB → 2.4 GB | 4.5 bits/weight | Runs on MacBook Air

What is this?

This repository contains the LoRA adapter cagataydev/qwen3.5-4B-cagatay merged into its base model and quantized to 4 bits, for native Apple Silicon inference via MLX.

Pipeline: Qwen/Qwen3.5-4B + LoRA adapter → merged weights → MLX 4-bit quantization (group size 64)
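
The 4.5 bits/weight figure follows from the group size. As a rough sketch, assume each group of 64 weights stores one 16-bit scale and one 16-bit bias alongside its 4-bit values, which is how MLX affine quantization is commonly described. The helper names below are illustrative, and the size estimate ignores any layers left unquantized.

```python
def effective_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Quantized bits plus the per-group scale/bias overhead, per weight."""
    return bits + (scale_bits + bias_bits) / group_size


def estimated_size_gb(fp16_size_gb: float, bits_per_weight: float) -> float:
    """Scale a 16-bit checkpoint size down to the quantized bit width."""
    return fp16_size_gb * bits_per_weight / 16


bpw = effective_bits_per_weight(bits=4, group_size=64)
print(bpw)                                    # 4.5
print(f"{estimated_size_gb(8.4, bpw):.2f}")   # 2.36
```

The estimate of roughly 2.36 GB is in line with the 2.4 GB reported for this model.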

🚀 Use with Strands Agents + MLX

The recommended way to use this model is with strands-agents and strands-mlx:

pip install strands-agents strands-mlx

from strands import Agent
from strands_mlx import MLXModel

# Load the 4-bit quantized model
model = MLXModel(model_id="cagataydev/Qwen3.5-4B-cagatay-4bit")

# Create an agent (no tools attached yet)
agent = Agent(model=model)

# Use it!
agent("Plan the steps to pick up a red cube and place it on the shelf")

With Custom Tools

from strands import Agent, tool
from strands_mlx import MLXModel

@tool
def get_robot_state() -> dict:
    """Get the current state of the robot."""
    return {"position": [0.5, 0.3, 0.1], "gripper": "open"}

model = MLXModel(
    model_id="cagataydev/Qwen3.5-4B-cagatay-4bit",
    params={"temperature": 0.7, "max_tokens": 1024}
)

agent = Agent(model=model, tools=[get_robot_state])
agent("What is the robot's current position? Then plan a pick-and-place task.")

With DevDuck

pip install devduck
export MODEL_PROVIDER=mlx
export STRANDS_MODEL_ID=cagataydev/Qwen3.5-4B-cagatay-4bit
devduck

📦 Use with mlx-lm (standalone)

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("cagataydev/Qwen3.5-4B-cagatay-4bit")

messages = [{"role": "user", "content": "Plan how to pick up a cup from the table"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=False)

response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)

CLI

mlx_lm.generate --model cagataydev/Qwen3.5-4B-cagatay-4bit --prompt "Hello!"

📊 Model Details

Property	Value
Base Model	Qwen/Qwen3.5-4B
Fine-tune	cagataydev/qwen3.5-4B-cagatay (LoRA)
Architecture	Qwen 3.5 (32 layers, 2560 hidden, 16 heads)
Parameters	4B total
Quantization	4-bit (4.503 bits/weight, group size 64)
Model Size	2.4 GB (down from 8.4 GB fp16)
Format	MLX SafeTensors
Platform	Apple Silicon (M1/M2/M3/M4)
License	Apache 2.0

🏋️ Training Provenance

The LoRA adapter was trained with:

Parameter	Value
Method	LoRA + SFT (TRL)
LoRA Rank	32
LoRA Alpha	64
Dropout	0.05
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Infrastructure	HuggingFace Jobs (cloud GPU)
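
To show what the rank and alpha values mean for the merged weights, here is a toy sketch of the standard LoRA merge, W' = W + (alpha / rank) * B @ A. It assumes the conventional alpha/rank scaling (not a variant such as rsLoRA) and uses tiny rank-1 matrices in plain Python for illustration. It is not the actual merge script used for this model.

```python
LORA_RANK = 32    # from the table above
LORA_ALPHA = 64   # from the table above


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update before it is added to W."""
    return alpha / rank


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, scaling):
    """Return W + scaling * (B @ A) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
print(scaling)  # 2.0

# Toy 2x2 weight with a rank-1 update (the real adapter uses rank 32).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0]]
print(merge_lora(W, B, A, scaling))  # [[2.0, 0.0], [2.0, 1.0]]
```

With rank 32 and alpha 64, the adapter update is added at twice its raw magnitude.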

🤖 Use Cases

Robotics task planning — Break down commands into step-by-step action plans
Embodied reasoning — Spatial understanding and action sequencing
Edge deployment — 2.4 GB fits comfortably on any Apple Silicon Mac
Strands agent backbone — Local model for Strands Agents on Mac
Neon VLA — Part of the Neon VLA vision-language-action stack

📦 Q-Model Family

Model	Base	Size	Quantized	Use Case
🌐 Q-Omni	Qwen 2.5 Omni 3B	3B	—	Voice & multimodal
🐤 Q-Tiny (this)	Qwen 3.5 4B	4B	2.4 GB 4-bit	Task planning on Mac
🧠 Q-Brain	Qwen 3.5 35B MoE	35B (3B active)	—	Complex reasoning

Built with DevDuck 🦆 and Strands Agents 🧬
