Recently, a new quantization technology received a significant update: Nunchaku (https://github.com/mit-han-lab/nunchaku). Developed by the MIT Han Lab team, it can quantize U-Net and DiT based models to INT4 (4-bit weights and activations) while maintaining image quality comparable to FP8 models, significantly improving generation speed and reducing GPU memory usage.
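While the ComfyUI nodes are the main way most people will use it, the project also ships a diffusers integration. Here is a minimal sketch of loading one of the SVDQuant INT4 Flux models, following the pattern in the project's README at the time of writing; the repo IDs and class names may change between versions, so treat them as assumptions and check the README:

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load the SVDQuant INT4 transformer (4-bit weights and activations).
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"
)

# Plug it into a standard diffusers Flux pipeline; everything else stays as usual.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipeline(
    "a photo of a cat holding a sign that says hello",
    num_inference_steps=20,
    guidance_scale=3.5,
).images[0]
image.save("flux-int4.png")
```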
Previous versions supported only a single LoRA during image generation, and installing the prebuilt .whl wheels was painful enough to make it barely usable.
However, the newly released version 0.2 adds several features:
- Multiple LoRAs can now be stacked together (a sketch of the Python-side API follows this list).
- ControlNet support has been added, currently limited to Flux Canny, Flux Depth, and ControlNet Union Pro (FP8 models and other ControlNet models are not supported).
- High-resolution images above 2K can now be generated.
- With the FP16 attention kernel and First-Block Cache integrated, performance has improved significantly (an illustrative sketch of the caching idea appears after the rename note below).
- GPU support has broadened: 20-series cards are now compatible, and 50-series cards are fully supported.
- Best of all, LoRA models no longer need to be converted; they can be used directly, saving a lot of disk space.
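On the Python side, stacking multiple LoRAs looks roughly like the sketch below. The names here (compose_lora, update_lora_params) follow the v0.2 examples as I understand them and should be treated as assumptions; the file paths and strengths are placeholders, so consult the repo's examples for the authoritative usage:

```python
from nunchaku.lora.flux.compose import compose_lora

# Compose several LoRAs into one parameter set, each with its own strength.
# The paths and strength values below are placeholders.
composed = compose_lora([
    ("loras/ghibli-style.safetensors", 0.8),
    ("loras/turbo-8step.safetensors", 1.0),
])

# Apply the composed LoRA weights to the INT4 transformer loaded earlier.
transformer.update_lora_params(composed)
```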
The plugin has been officially renamed to ComfyUI-nunchaku, and the node names have changed accordingly (SVDQuant XXX -> Nunchaku XXX), so existing workflows will need updating.
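For context on the First-Block Cache mentioned above: the idea is to run only the first transformer block at each denoising step, and if its output residual has barely changed since the previous step, skip the remaining blocks and reuse their cached result. Below is a purely illustrative sketch of that mechanism in plain PyTorch, not Nunchaku's actual implementation; all names are made up for illustration:

```python
import torch

def forward_with_first_block_cache(blocks, hidden, cache, threshold=0.1):
    """Illustrative First-Block Cache step (not Nunchaku's real code).

    `blocks` is a list of transformer blocks (tensor -> tensor), `cache`
    is a dict persisted across denoising steps.
    """
    first_out = blocks[0](hidden)
    residual = first_out - hidden

    if cache.get("first_residual") is not None:
        # Relative L1 change of the first block's residual across timesteps.
        diff = (residual - cache["first_residual"]).abs().mean()
        rel = diff / (cache["first_residual"].abs().mean() + 1e-8)
        if rel < threshold:
            # Barely changed: reuse the cached output of the remaining blocks.
            return first_out + cache["rest_residual"]

    # Otherwise run the remaining blocks and refresh the cache.
    out = first_out
    for block in blocks[1:]:
        out = block(out)
    cache["first_residual"] = residual
    cache["rest_residual"] = out - first_out
    return out
```

When consecutive steps are similar, most of the transformer is skipped entirely, which is where the speedup comes from; the threshold trades speed against fidelity.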
Compatibility gaps remain; the following are not yet supported:
- Batch sizes greater than 1.
- PuLID.
Below is a comparison from my laptop with an RTX 3060 (6 GB VRAM); all runs are 20 or 8 steps at 1024×1024:

| Run | Model | Steps | Extras | Time |
|-----|-------|-------|--------|------|
| 1 | Flux FP8 (fp8_e4m3fn) | 20 | none | 2 min 10 s |
| 2 | Flux FP8 (fp8_e4m3fn) | 20 | TeaCache | 1 min 06 s |
| 3 | Flux FP8 (fp8_e4m3fn) | 8 | Turbo LoRA | 52 s |
| 4 | SVDQuant INT4 Flux.1-dev | 20 | Turbo LoRA | 49 s |
| 5 | SVDQuant INT4 Flux.1-dev | 8 | Turbo LoRA | 18 s |
Run 5 is roughly seven times faster than run 1 (18 s vs. 2 min 10 s), a substantial improvement.
Personally, I think it’s really impressive!