Krea-2-Turbo · MLX · 4-bit (mflux)

A 4-bit quantized MLX conversion of krea/Krea-2-Turbo, saved with mflux for fast local text-to-image generation on Apple Silicon.

Krea 2 is a single-stream MMDiT text-to-image model built on the Qwen-Image stack: it reuses the Qwen-Image VAE and conditions on a 12-layer hidden-state tap from a Qwen3-VL-4B text encoder. The Turbo variant is distilled and produces high-quality images in 8 steps.

Details

Base model: krea/Krea-2-Turbo
Format: MLX safetensors (sharded)
Quantization: 4-bit
Saved with: mflux 0.18.0
Pipeline: Text-to-image
Hardware: Apple Silicon (Metal / MLX)

This is a ready-to-run quantized snapshot, so it loads without re-quantizing at runtime. It contains the transformer, the Qwen3-VL-4B text encoder, the tokenizer, and the Qwen-Image VAE. At 4-bit it is the smallest snapshot (~7 GB), trading some fidelity for a lower memory footprint than the 8-bit build.
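As a rough illustration of why the 4-bit build is smaller than the 8-bit one, the sketch below estimates storage for group-quantized weights. It assumes affine quantization with a group size of 64 and a 16-bit scale and bias per group (MLX's usual defaults); whether this snapshot uses exactly those settings is an assumption, and the weight count is a hypothetical round number, not this model's real size. Tensors that stay unquantized are ignored.

```python
def quantized_size_bytes(n_weights: int, bits: int = 4,
                         group_size: int = 64, scale_bits: int = 16) -> float:
    """Approximate storage for affine group-quantized weights.

    Every group of `group_size` weights carries one scale and one bias
    (assumed `scale_bits` each) alongside the packed `bits`-bit values.
    """
    bits_per_weight = bits + 2 * scale_bits / group_size
    return n_weights * bits_per_weight / 8


# Hypothetical weight count, for illustration only.
n = 1_000_000_000
q4 = quantized_size_bytes(n, bits=4)   # 4.5 effective bits per weight
q8 = quantized_size_bytes(n, bits=8)   # 8.5 effective bits per weight
bf16 = n * 2                           # 16 bits per weight, unquantized

print(f"4-bit: {q4 / 1e9:.3f} GB | 8-bit: {q8 / 1e9:.3f} GB | bf16: {bf16 / 1e9:.3f} GB")
```

Under these assumptions the per-group scale and bias add half a bit per weight, so a 4-bit build lands a little above half the size of an 8-bit one rather than exactly half.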

Usage

Install mflux:

pip install mflux

Generate from the local model directory:

mflux-generate-krea2 \
  --model /path/to/krea2-q4 \
  --prompt "a photograph of a red fox sitting in a sunlit forest clearing, sharp focus, bokeh" \
  --width 1024 \
  --height 1024 \
  --seed 42 \
  --steps 8

Turbo defaults are 8 steps, guidance 1.0 (classifier-free guidance off), and the er_sde sampler. The plain flow-matching Euler sampler, which matches the official diffusers FlowMatchEulerDiscreteScheduler, is available via --scheduler euler.
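To make the sampler and guidance settings concrete, here is a minimal pure-Python sketch of a flow-matching Euler loop and of the classifier-free guidance mix. It is not mflux's implementation: the function names are hypothetical, the noise schedule is a plain linear ramp (real schedulers typically shift it), and the "model" is a toy velocity field chosen so the result can be checked by hand.

```python
from typing import Callable, List

Vector = List[float]


def cfg_combine(cond: Vector, uncond: Vector, guidance: float) -> Vector:
    """Classifier-free guidance mix: uncond + g * (cond - uncond).

    With guidance == 1.0 this reduces to the conditional prediction
    (up to floating-point rounding), so no unconditional pass is needed.
    """
    return [u + guidance * (c - u) for c, u in zip(cond, uncond)]


def linear_sigmas(num_steps: int) -> List[float]:
    """num_steps + 1 noise levels from 1.0 (pure noise) down to 0.0 (clean)."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 noise: Vector, num_steps: int = 8) -> Vector:
    """Integrate dx/dsigma = velocity(x, sigma) from sigma=1 to sigma=0."""
    x = list(noise)
    sigmas = linear_sigmas(num_steps)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity(x, sigma)
        # Euler update: step the sample along the predicted velocity.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Toy setup: on a straight noise-to-data path the true velocity is (noise - data).
target = [0.25, -1.0, 3.0]
noise = [1.5, 0.5, -2.0]


def toy_velocity(x: Vector, sigma: float) -> Vector:
    return [n - t for n, t in zip(noise, target)]


result = euler_sample(toy_velocity, noise, num_steps=8)
print(result)
```

Because the toy velocity is constant, the 8 Euler steps land on `target` up to rounding; a real denoiser only approximates this, which is why the step count and sampler choice affect quality.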

Standard mflux CLI options are supported (--metadata, --stepwise-image-output-dir, multiple --seed values). Image conditioning (edit / reference) is not yet implemented.

Python API

from mflux.models.krea2 import Krea2

model = Krea2(model_path="/path/to/krea2-q4")
image = model.generate_image(
    seed=42,
    prompt="a photograph of a red fox sitting in a sunlit forest clearing, sharp focus, bokeh",
    num_inference_steps=8,
    width=1024,
    height=1024,
    guidance=1.0,
)
image.save("krea2_fox.png")
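For comparing several seeds from Python, a small wrapper around the call above might look like the following sketch. `generate_for_seeds` and the output filename pattern are hypothetical helpers, not part of mflux; the wrapper only assumes the `generate_image(...)` keyword arguments and the `image.save(path)` call shown in the example above.

```python
from pathlib import Path
from typing import Iterable, List


def generate_for_seeds(model, prompt: str, seeds: Iterable[int], out_dir: str,
                       steps: int = 8, width: int = 1024,
                       height: int = 1024) -> List[Path]:
    """Render one image per seed and return the saved file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for seed in seeds:
        image = model.generate_image(
            seed=seed,
            prompt=prompt,
            num_inference_steps=steps,
            width=width,
            height=height,
            guidance=1.0,  # Turbo default: CFG off
        )
        path = out / f"krea2_seed{seed}.png"
        image.save(str(path))
        paths.append(path)
    return paths


# Intended use (requires mflux and the local snapshot):
# from mflux.models.krea2 import Krea2
# model = Krea2(model_path="/path/to/krea2-q4")
# generate_for_seeds(model, "a red fox in a forest clearing", [1, 2, 3], "out")
```

Loading the model once and reusing it across seeds avoids repeating the weight load for every image.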

Architecture

  • Transformer: 28-layer single-stream MMDiT — hidden 6144, GQA (48 query / 12 KV heads, head_dim 128), SwiGLU, 3-axis Flux-style RoPE [32, 48, 48], per-head QK-norm + sigmoid-gated attention, AdaLN-single 6-way modulation, and a txtfusion adapter that fuses the 12 text-encoder hidden states.
  • Text encoder: Qwen3-VL-4B, 12-layer tap [2, 5, …, 35] flattened layer-major; the chat-template prefix is stripped so only prompt tokens condition the DiT.
  • VAE: Qwen-Image VAE (Wan2.1 16-channel latent).
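The transformer numbers above fit together arithmetically, which the following sketch checks. The dataclass and its field names are hypothetical and exist only for this illustration; they are not mflux's or Krea's configuration classes. The values are the ones listed in the bullets.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Krea2TransformerDims:
    """Hypothetical container for the dimensions listed above."""
    num_layers: int = 28
    hidden_size: int = 6144
    num_query_heads: int = 48
    num_kv_heads: int = 12
    head_dim: int = 128
    rope_axes: Tuple[int, int, int] = (32, 48, 48)
    num_text_taps: int = 12

    def check(self) -> None:
        # Query heads times head_dim must reproduce the hidden size.
        assert self.num_query_heads * self.head_dim == self.hidden_size
        # GQA needs the query heads to split evenly over the KV heads.
        assert self.num_query_heads % self.num_kv_heads == 0
        # The three RoPE axes partition each head's channels.
        assert sum(self.rope_axes) == self.head_dim

    @property
    def queries_per_kv_head(self) -> int:
        return self.num_query_heads // self.num_kv_heads

    @property
    def kv_width(self) -> int:
        """Width of the key (or value) projection under GQA."""
        return self.num_kv_heads * self.head_dim


dims = Krea2TransformerDims()
dims.check()
print(dims.queries_per_kv_head, dims.kv_width)
```

Each KV head is shared by four query heads, so the key and value projections are a quarter of the width of the query projection.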

License

This conversion inherits the license of the base model, krea/Krea-2-Turbo. Review and accept the original model's terms before use.

Acknowledgements

  • Krea for the original Krea-2-Turbo model
  • mflux for the MLX implementation and conversion tooling
  • MLX by Apple