JoyAI-Image-Edit (ComfyUI weights)

Single-file .safetensors checkpoints of JoyAI-Image-Edit, repackaged for native ComfyUI support (no custom node required).

JoyAI-Image-Edit is the single-image instruction-guided editing model of the JoyAI-Image family. It takes one reference image plus a text instruction and generates the edited result.

Files

| File | Size | Goes into | Component |
| --- | --- | --- | --- |
| `diffusion_models/joy_image_edit_bf16.safetensors` | ~31 GB | `ComfyUI/models/diffusion_models/` | `JoyImageEditTransformer3DModel` (bf16) |
| `text_encoders/qwen3vl_joyimage_bf16.safetensors` | ~17 GB | `ComfyUI/models/text_encoders/` | Qwen3-VL-8B text encoder (bf16) |
| `vae/joy_image_edit_vae.safetensors` | ~243 MB | `ComfyUI/models/vae/` | `AutoencoderKLWan` |

The repo's directory layout already matches ComfyUI/models/, so a single hf download into your models root drops every file where it needs to go.

Installation

The model runs in ComfyUI without custom nodes, but native support is still only proposed upstream in Comfy-Org/ComfyUI#14428. Until that PR is merged, install the fork branch:

git clone -b joyimage-edit-pr https://github.com/feice-huang/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Once the PR is merged upstream, the stock ComfyUI release will run these weights with no fork needed.

With ComfyUI installed, download the weights straight into ComfyUI/models/:

hf download jdopensource/JoyAI-Image-Edit-ComfyUI \
  --local-dir /path/to/ComfyUI/models

Restart ComfyUI.
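
If a loader dropdown stays empty after the restart, a quick check that all three files landed in the expected subfolders can help. The sketch below is only an illustration: the relative paths come from the Files table above, while the function name `missing_weights` and the example root path are made up here.

```python
from pathlib import Path

# Relative paths as listed in the Files table of this README.
EXPECTED_FILES = [
    "diffusion_models/joy_image_edit_bf16.safetensors",
    "text_encoders/qwen3vl_joyimage_bf16.safetensors",
    "vae/joy_image_edit_vae.safetensors",
]


def missing_weights(models_root):
    """Return the expected files that are not present under models_root."""
    root = Path(models_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Placeholder path; point this at your own ComfyUI/models directory.
    missing = missing_weights("/path/to/ComfyUI/models")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All three weight files are in place.")
```

An empty list means the download put every file where ComfyUI looks for it.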

Usage

Build the graph from these native nodes:

1. Load Diffusion Model (UNETLoader) → diffusion_models/joy_image_edit_bf16.safetensors
2. Load CLIP (CLIPLoader) → text_encoders/qwen3vl_joyimage_bf16.safetensors, type joyimage
3. Load VAE (VAELoader) → vae/joy_image_edit_vae.safetensors
4. Load Image (LoadImage) for the reference
5. TextEncodeJoyImageEdit: feed it clip, vae, the instruction, and the reference image. Wire one instance for the positive prompt and one (empty prompt, same image) for the negative. The node bucket-resizes the reference to the 1024-base buckets, VAE-encodes it, and appends the reference latent to the conditioning; its image output feeds VAEDecode / empty-latent sizing.
6. KSampler → VAEDecode → SaveImage
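
The bucket-resize step mentioned for TextEncodeJoyImageEdit can be pictured with a small sketch. This is not the node's actual code: the bucket table below is hypothetical (this README does not list the real buckets), and picking the bucket with the closest aspect ratio is a common approach assumed here for illustration.

```python
import math

# HYPOTHETICAL bucket table: (width, height) pairs of roughly 1024*1024 pixels.
# The real list used by TextEncodeJoyImageEdit is not given in this README.
BUCKETS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]


def pick_bucket(width, height, buckets=BUCKETS):
    """Pick the bucket whose aspect ratio is closest to the input image's.

    Ratios are compared in log space so that wide and tall images are
    treated symmetrically.
    """
    target = math.log(width / height)
    return min(buckets, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


if __name__ == "__main__":
    print(pick_bucket(1920, 1080))
```

With this assumed table, a 1920x1080 reference maps to the 1344x768 bucket, so the output keeps roughly the input's aspect ratio at about one megapixel.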

Example workflow: workflow_joyimage_edit.json
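
For scripted runs, the loader and text-encode part of the graph can also be written in ComfyUI's API format, where each node is an entry with a `class_type` and `inputs`, and a link is a `[node_id, output_index]` pair. The fragment below is a sketch under assumptions: the input names on `TextEncodeJoyImageEdit` (`clip`, `vae`, `prompt`, `image`) are guesses based on the description above, and the sampler and latent nodes are left out because this README does not spell out their wiring. Treat the example workflow file as the authoritative graph.

```python
def build_conditioning_fragment(instruction, image_name):
    """Return loader and text-encode nodes as an API-format fragment.

    This is not a complete, runnable workflow: KSampler, the latent input,
    VAEDecode and SaveImage are intentionally omitted.
    """
    nodes = {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "joy_image_edit_bf16.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "qwen3vl_joyimage_bf16.safetensors",
                         "type": "joyimage"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "joy_image_edit_vae.safetensors"}},
        "4": {"class_type": "LoadImage",
              "inputs": {"image": image_name}},
    }

    def encode(prompt):
        # ASSUMED input names; check the node in your ComfyUI build.
        return {"class_type": "TextEncodeJoyImageEdit",
                "inputs": {"clip": ["2", 0], "vae": ["3", 0],
                           "prompt": prompt, "image": ["4", 0]}}

    nodes["5"] = encode(instruction)  # positive: the edit instruction
    nodes["6"] = encode("")           # negative: empty prompt, same image
    return nodes


if __name__ == "__main__":
    import json
    fragment = build_conditioning_fragment("Make the sky overcast", "input.png")
    print(json.dumps(fragment, indent=2))
```

Both encoder instances share the same clip, vae, and reference image; only the prompt differs, which mirrors the positive/negative pairing described in the node list.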

Recommended parameters

| Parameter | Value |
| --- | --- |
| Steps | 40 |
| CFG | 4.0 |
| Sampler | `euler` |
| Scheduler | `simple` |
| dtype | bf16 |
| Resolution | auto (1024-base buckets) |
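
When driving ComfyUI from a script, the sampler settings above can be applied to an API-format workflow in one pass. The sketch below assumes the workflow was exported in API format (a dict of node id to `class_type` and `inputs`) and uses the stock KSampler input names `steps`, `cfg`, `sampler_name`, and `scheduler`. The helper name `apply_recommended` is made up for this example.

```python
import copy

# Values from the Recommended parameters table, keyed by KSampler input name.
RECOMMENDED = {
    "steps": 40,
    "cfg": 4.0,
    "sampler_name": "euler",
    "scheduler": "simple",
}


def apply_recommended(workflow):
    """Return a copy of an API-format workflow with KSampler nodes patched."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node.setdefault("inputs", {}).update(RECOMMENDED)
    return patched


if __name__ == "__main__":
    demo = {"7": {"class_type": "KSampler", "inputs": {"steps": 20, "seed": 0}}}
    print(apply_recommended(demo)["7"]["inputs"])
```

Other KSampler inputs, such as the seed and the model and conditioning links, are left untouched, and the original dict is not modified.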

GGUF quantizations

Lower-bit GGUF quants of the transformer and text encoder are available at huangfeice/JoyAI-Image-Edit-Diffusers-GGUF (community contribution). The GGUF quants cover only those two components, so the VAE in this repo is still the one to use with them.

Model tree for jdopensource/JoyAI-Image-Edit-ComfyUI

Base model

jdopensource/JoyAI-Image-Edit-Diffusers
