How to use MLXBits/krea-2-mlx-q4 with MLX:
# Download the model from the Hub (the extra is quoted so zsh does not glob the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir krea-2-mlx-q4 MLXBits/krea-2-mlx-q4
Krea-2-Turbo · MLX · 4-bit (mflux)
A 4-bit quantized MLX conversion of
krea/Krea-2-Turbo, saved with
mflux for fast local text-to-image
generation on Apple Silicon.
Krea 2 is a single-stream MMDiT text-to-image model built on the Qwen-Image stack: it reuses the Qwen-Image VAE and conditions on a 12-layer hidden-state tap from a Qwen3-VL-4B text encoder. The Turbo variant is distilled and produces high-quality images in 8 steps.
Details
| Base model | krea/Krea-2-Turbo |
| Format | MLX safetensors (sharded) |
| Quantization | 4-bit |
| Saved with | mflux 0.18.0 |
| Pipeline | Text-to-image |
| Hardware | Apple Silicon (Metal / MLX) |
This is a ready-to-run quantized snapshot, so it loads without re-quantizing at runtime. It contains the transformer, the Qwen3-VL-4B text encoder, the tokenizer, and the Qwen-Image VAE. At 4-bit it is the smallest snapshot (~7 GB), trading some fidelity for a lower memory footprint than the 8-bit build.
Usage
Install mflux:
pip install mflux
Generate from the local model directory:
mflux-generate-krea2 \
--model /path/to/krea2-q4 \
--prompt "a photograph of a red fox sitting in a sunlit forest clearing, sharp focus, bokeh" \
--width 1024 \
--height 1024 \
--seed 42 \
--steps 8
Turbo defaults: 8 steps, guidance 1.0 (CFG off), er_sde sampler. The
plain flow-matching Euler sampler — which matches the official diffusers
FlowMatchEulerDiscreteScheduler — is available via --scheduler euler.
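To illustrate what a plain flow-matching Euler sampler does, here is a minimal, dependency-free sketch. It is not mflux's or diffusers' implementation: the function name `flow_match_euler`, the list-based "latents", the constant toy velocity field, and the linear sigma schedule are all assumptions made for the example. Only the update rule, x ← x + (σ_next − σ)·v, reflects the Euler step itself.

```python
def flow_match_euler(velocity_fn, x, sigmas):
    """Integrate dx/dsigma = v(x, sigma) with explicit Euler steps.

    velocity_fn: hypothetical stand-in for the denoiser; returns a list like x.
    x:           starting sample (pure noise at sigma = 1), as a list of floats.
    sigmas:      decreasing noise levels, e.g. from 1.0 down to 0.0.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        dt = sigma_next - sigma  # negative: we move from noise toward data
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy problem: under the linear interpolation x_sigma = (1 - sigma) * data + sigma * noise,
# the ideal velocity is the constant (noise - data).
data = [2.0, 3.0]
noise = [0.5, -1.0]


def toy_velocity(x, sigma):
    return [n - d for n, d in zip(noise, data)]


# Placeholder schedule: 8 equal steps (the real Turbo schedule may differ).
sigmas = [1.0 - i / 8 for i in range(9)]
sample = flow_match_euler(toy_velocity, noise, sigmas)
```

Because the toy velocity field is constant, the integration recovers `data` regardless of the step count; a real model's velocity varies with the sample and the noise level, which is where the choice of sampler and step count matters.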
Standard mflux CLI options are supported (--metadata,
--stepwise-image-output-dir, multiple --seed values). Image conditioning
(edit / reference) is not yet implemented.
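For scripted runs, the command line above can be assembled in Python. The sketch below is only a convenience wrapper around the flags shown in this card; the helper name `build_krea2_command` is hypothetical, and passing several seeds as space-separated values after a single `--seed` is an assumption about how the CLI parses multiple seeds.

```python
import shlex


def build_krea2_command(model_dir, prompt, seeds=(42,), steps=8,
                        width=1024, height=1024, scheduler=None):
    """Return the argv list for mflux-generate-krea2 (hypothetical helper)."""
    argv = [
        "mflux-generate-krea2",
        "--model", str(model_dir),
        "--prompt", prompt,
        "--width", str(width),
        "--height", str(height),
        "--steps", str(steps),
    ]
    # Assumption: multiple seeds follow one --seed flag, separated by spaces.
    argv.append("--seed")
    argv.extend(str(seed) for seed in seeds)
    # Leave the scheduler unset to keep the Turbo default (er_sde).
    if scheduler is not None:
        argv.extend(["--scheduler", scheduler])
    return argv


if __name__ == "__main__":
    command = build_krea2_command(
        "/path/to/krea2-q4",
        "a photograph of a red fox sitting in a sunlit forest clearing",
        seeds=(42, 43),
        scheduler="euler",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)`; building a list rather than a string avoids shell-quoting problems with prompts that contain commas or quotes.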
Python API
from mflux.models.krea2 import Krea2

# Load the pre-quantized snapshot from the local model directory
model = Krea2(model_path="/path/to/krea2-q4")

# Turbo defaults: 8 steps, guidance 1.0 (CFG off)
image = model.generate_image(
    seed=42,
    prompt="a photograph of a red fox sitting in a sunlit forest clearing, sharp focus, bokeh",
    num_inference_steps=8,
    width=1024,
    height=1024,
    guidance=1.0,
)
image.save("krea2_fox.png")
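A common follow-up is to render the same prompt with several seeds. The sketch below assumes only what the snippet above shows: a model object with a `generate_image(...)` method taking those keyword arguments, and a returned image with a `save(path)` method. The helper name `generate_seed_sweep` and the output file naming are illustrative, not part of mflux.

```python
from pathlib import Path


def generate_seed_sweep(model, prompt, seeds, out_dir,
                        steps=8, width=1024, height=1024, guidance=1.0):
    """Generate one image per seed and return the saved paths (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for seed in seeds:
        # Same keyword arguments as the Python API example in this card.
        image = model.generate_image(
            seed=seed,
            prompt=prompt,
            num_inference_steps=steps,
            width=width,
            height=height,
            guidance=guidance,
        )
        path = out_dir / f"krea2_seed{seed}.png"
        image.save(str(path))
        saved.append(path)
    return saved
```

Typical use would be `generate_seed_sweep(model, "a red fox", seeds=[1, 2, 3], out_dir="outputs")` with `model` created as shown above. Loading the model once and reusing it across seeds avoids reloading the weights for every image.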
Architecture
- Transformer: 28-layer single-stream MMDiT with hidden size 6144, GQA (48 query / 12 KV heads, head_dim 128), SwiGLU, 3-axis Flux-style RoPE [32, 48, 48], per-head QK-norm + sigmoid-gated attention, AdaLN-single 6-way modulation, and a txtfusion adapter that fuses the 12 text-encoder hidden states.
- Text encoder: Qwen3-VL-4B, 12-layer tap [2, 5, …, 35] flattened layer-major; the chat-template prefix is stripped so only prompt tokens condition the DiT.
- VAE: Qwen-Image VAE (Wan2.1 16-channel latent).
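The transformer dimensions listed above are mutually consistent, which the short check below makes explicit. It uses only the numbers stated in this card; the function name `attention_geometry` is illustrative, and the K/V projection width assumes the standard GQA layout (KV heads × head_dim), which the card does not spell out.

```python
HIDDEN_SIZE = 6144
NUM_QUERY_HEADS = 48
NUM_KV_HEADS = 12
HEAD_DIM = 128
ROPE_AXES = (32, 48, 48)  # per-axis RoPE dimensions


def attention_geometry():
    """Derive and sanity-check the attention shapes implied by the card."""
    # Query heads times head_dim should tile the hidden size exactly.
    assert NUM_QUERY_HEADS * HEAD_DIM == HIDDEN_SIZE
    # GQA requires the query heads to split evenly across the KV heads.
    assert NUM_QUERY_HEADS % NUM_KV_HEADS == 0
    # The three RoPE axes together should cover one attention head.
    assert sum(ROPE_AXES) == HEAD_DIM
    return {
        "queries_per_kv_head": NUM_QUERY_HEADS // NUM_KV_HEADS,
        # Assumption: K and V each project to NUM_KV_HEADS * HEAD_DIM features.
        "kv_width": NUM_KV_HEADS * HEAD_DIM,
        "rope_total": sum(ROPE_AXES),
    }


geometry = attention_geometry()
```

Under these numbers each KV head is shared by four query heads, so the key and value projections are a quarter of the width of the query projection.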
License
This conversion inherits the license of the base model,
krea/Krea-2-Turbo. Review and accept
the original model's terms before use.
Acknowledgements
Quantized