ComfyUI – AI for Architecture Case Study 04: Render Style with Chroma

Task

Tutorial: Generating Architectural Images with the Chroma Model

This tutorial will guide you through a ComfyUI workflow designed to create high-quality architectural images using the Chroma model. The workflow is optimized to handle a detailed architectural prompt and generate a final image.

Chroma

Chroma is a text-to-image generative AI model that is based on the FLUX.1 architecture. It is known for its ability to produce high-quality images from text prompts and for its uncensored content generation capabilities.

Here are a few key points about the Chroma model:

  • Architecture: Chroma uses a rectified flow transformer architecture with 8.9 billion parameters, which is a significant reduction from the original 12 billion parameters of FLUX.1. This optimization improves its efficiency and speed while maintaining quality.
  • Uncensored: A key feature of Chroma is its uncensored approach, which gives users complete creative freedom and reintroduces anatomical concepts often removed from other commercial models.
  • Optimization: The developers made architectural changes to improve the model’s performance, including a drastic reduction in the modulation layer, which led to better adherence to prompts and reduced generative noise. It also uses a custom temporal distribution and Minibatch Optimal Transport to accelerate training and improve stability.
  • Accessibility: Chroma is an open-source model available in multiple formats, including standard checkpoints for ComfyUI, and optimized versions like FP8 Scaled Quantization for faster inference on less powerful hardware.

Step 1: Load the Models

The first part of the workflow involves loading the necessary models. These are the foundational components for image generation; a rough API-format sketch of these loader nodes follows the list.

  • UNETLoader: This node loads the diffusion model. The workflow specifies chroma-unlocked-v33.safetensors from the diffusion_models directory. You can find the latest version of the Chroma model on Hugging Face.
  • CLIPLoader: Two CLIPLoader nodes load the text encoders. The workflow uses t5xxl_fp8_e4m3fn_scaled.safetensors for the positive prompt and t5xxl_fp16.safetensors for the negative prompt; the fp8 version is recommended for systems with lower VRAM. These files are stored in the text_encoders directory.
  • VAELoader: This node loads the VAE model, specified as ae.safetensors, which is located in the vae directory.
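The loaders above can also be viewed as nodes in ComfyUI's API-format workflow JSON (the format produced by "Save (API Format)"). Below is a minimal sketch written as a Python dict: the node IDs and link indices are arbitrary, and the weight_dtype and type values are assumptions rather than values taken from the workflow file.

```python
# Sketch of the model-loading nodes in ComfyUI API format (illustrative IDs and settings).
loaders = {
    "1": {  # UNETLoader: the Chroma diffusion model
        "class_type": "UNETLoader",
        "inputs": {"unet_name": "chroma-unlocked-v33.safetensors", "weight_dtype": "default"},
    },
    "2": {  # CLIPLoader: T5-XXL text encoder used for the positive prompt
        "class_type": "CLIPLoader",
        "inputs": {"clip_name": "t5xxl_fp8_e4m3fn_scaled.safetensors", "type": "chroma"},
    },
    "3": {  # CLIPLoader: T5-XXL text encoder used for the negative prompt
        "class_type": "CLIPLoader",
        "inputs": {"clip_name": "t5xxl_fp16.safetensors", "type": "chroma"},
    },
    "4": {  # VAELoader: autoencoder used later to decode latents into pixels
        "class_type": "VAELoader",
        "inputs": {"vae_name": "ae.safetensors"},
    },
}
```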

Step 2: Define Image Size and Prompt

Next, you will set the dimensions of the final image and write your prompts; the corresponding nodes are sketched after the list.

  • FluxEmptyLatentSizePicker: This node allows you to set the image resolution and other parameters for the initial latent image. In the workflow, the resolution is set to 1344×768, roughly a 16:9 aspect ratio (exactly 7:4).

  • T5TokenizerOptions: This optional node provides a way to adjust the tokenizer settings, such as min_padding and min_length. The provided note mentions that min_padding 0 might give better results, even though min_padding 1 is the official way to run the model.
  • CLIPTextEncode: This is where you write your prompts.
    • Positive Prompt: The example prompt is a very detailed description of a "large contemporary building" with architectural features inspired by ocean waves. Using a specific, descriptive prompt like this helps the model generate a more precise image.
    • Negative Prompt: The negative prompt is configured to prevent common generation issues by using keywords like low quality, bad anatomy, extra digits, and missing limbs.
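Continuing the sketch from Step 1, the latent size and the two prompts could look like the fragment below. FluxEmptyLatentSizePicker and T5TokenizerOptions are custom/optional nodes, so the core EmptyLatentImage node stands in for the size picker here, and the positive prompt text is abbreviated; values like ["2", 0] are links to the loader nodes sketched earlier.

```python
# Sketch of the latent-size and prompt-encoding nodes (EmptyLatentImage stands in
# for FluxEmptyLatentSizePicker; link values like ["2", 0] point to earlier nodes).
size_and_prompts = {
    "5": {  # 1344x768 empty latent, roughly 16:9
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1344, "height": 768, "batch_size": 1},
    },
    "6": {  # positive prompt, encoded with the fp8 T5-XXL encoder (node "2")
        "class_type": "CLIPTextEncode",
        "inputs": {
            "clip": ["2", 0],
            "text": "large contemporary building, facade inspired by ocean waves, ...",
        },
    },
    "7": {  # negative prompt, encoded with the fp16 T5-XXL encoder (node "3")
        "class_type": "CLIPTextEncode",
        "inputs": {
            "clip": ["3", 0],
            "text": "low quality, bad anatomy, extra digits, missing limbs",
        },
    },
}
```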

Step 3: Sampling and Decoding

This is the core of the image generation process, where the model creates the image based on your prompts; the sampling and decoding nodes are sketched after the list.

  • FreSca: This is an optional node that improves the quality of anime-style images. The accompanying note suggests bypassing it (select the node and press Ctrl+B) if you are generating realistic images.
  • KSampler: This is the main sampling node that generates the latent image based on your model, prompts, and the latent image size.
  • VAEDecode: After the KSampler finishes, this node converts the latent image back into a viewable image.
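A matching sketch of the sampling and decoding stage is below, with the optional FreSca node omitted. The seed, step count, CFG, sampler, and scheduler values are illustrative defaults, not the exact settings saved in the workflow.

```python
# Sketch of the sampling and decoding nodes (illustrative sampler settings).
sampling = {
    "8": {  # KSampler: denoises the empty latent using the model and both prompts
        "class_type": "KSampler",
        "inputs": {
            "model": ["1", 0],
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
            "seed": 0,
            "steps": 30,
            "cfg": 4.0,  # illustrative; the Flux variant at the end of the post uses CFG 1
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1.0,
        },
    },
    "9": {  # VAEDecode: converts the sampled latent back into a viewable image
        "class_type": "VAEDecode",
        "inputs": {"samples": ["8", 0], "vae": ["4", 0]},
    },
}
```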

Step 4: Save the Final Image

  • SaveImage: The final node in the workflow saves the generated architectural image to your computer.
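To close out the sketch, the fragment below adds the SaveImage node and shows one way to queue the assembled graph through ComfyUI's HTTP API. It assumes a default local server at 127.0.0.1:8188 and reuses the loaders, size_and_prompts, and sampling dicts from the earlier sketches; treat it as an illustration, not the exported workflow.

```python
import json
import urllib.request

# SaveImage writes the decoded image to ComfyUI's output directory.
save = {
    "10": {
        "class_type": "SaveImage",
        "inputs": {"images": ["9", 0], "filename_prefix": "chroma_architecture"},
    },
}

# Combine the fragments from Steps 1-3 into one API-format graph.
workflow = {**loaders, **size_and_prompts, **sampling, **save}

# Queue the graph on a locally running ComfyUI instance (address is an assumption).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # the server responds with a prompt_id on success
```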

Same Workflow Using Flux

The same workflow can also be run with a Flux model in place of Chroma, using Flux-dev-fp8.safetensors or Flux-dev.safetensors as the diffusion model. When switching to Flux, put the CFG back to 1.
