A Coding Guide to High-Quality Image Production, Control, and Editing Using HuggingFace Diffusers

In this tutorial, we design an efficient image-generation workflow using the Hugging Face Diffusers library. We start by stabilizing the environment, then produce high-quality images from text prompts using Stable Diffusion with an advanced scheduler. We accelerate inference with an LCM-LoRA adapter, add structural control through ControlNet edge conditioning, and finally perform localized edits with mask-based inpainting. Throughout, we focus on practical techniques that balance image quality, speed, and control.
!pip -q uninstall -y pillow Pillow || true
!pip -q install --upgrade --force-reinstall "pillow<12.0"
!pip -q install --upgrade diffusers transformers accelerate safetensors huggingface_hub opencv-python
import os, math, random
import torch
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFilter
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionInpaintPipeline,
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
We ensure a clean and consistent runtime by resolving dependency conflicts and installing all required libraries. We keep image processing reliable by pinning a compatible Pillow version and installing the Diffusers ecosystem. We also import all the essential modules needed for generation, ControlNet conditioning, and the inpainting workflow.
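As a quick sanity check, the short snippet below (an optional addition) prints the versions that were actually imported, which helps confirm that the Pillow pin and the rest of the stack are in effect; exact version numbers will vary with your environment.
import PIL, diffusers, transformers
# Confirm the pinned Pillow (<12.0) and the Diffusers stack loaded correctly.
print("Pillow:", PIL.__version__)
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("OpenCV:", cv2.__version__)
With the environment verified, we define a few small helper utilities.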
def seed_everything(seed=42):
    # Seed Python, NumPy, and PyTorch (CPU + all GPUs) for reproducible runs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def to_grid(images, cols=2, bg=255):
    # Paste a list of same-sized PIL images into a simple grid for side-by-side viewing.
    if isinstance(images, Image.Image):
        images = [images]
    w, h = images[0].size
    rows = math.ceil(len(images) / cols)
    grid = Image.new("RGB", (cols*w, rows*h), (bg, bg, bg))
    for i, im in enumerate(images):
        grid.paste(im, ((i % cols)*w, (i // cols)*h))
    return grid
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print("device:", device, "| dtype:", dtype)
We define utility functions to ensure reproducibility and organize visual output. We set a global random seed so that our generations remain consistent across runs. We also detect the available hardware and adjust precision to improve performance on GPU or CPU.
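Beyond the global seed, Diffusers pipelines also accept a generator argument for per-call determinism. The helper below is a minimal sketch (the make_generator name is our own) that builds a seeded torch.Generator, which can be passed to any of the pipeline calls later in this tutorial.
def make_generator(seed=7):
    # A seeded generator makes an individual pipeline call reproducible,
    # independent of any other random state touched between calls.
    return torch.Generator(device=device).manual_seed(seed)
# Example usage once a pipeline is loaded:
# image = pipe(prompt, generator=make_generator(7)).images[0]
Next, we load the base text-to-image pipeline.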
seed_everything(7)
BASE_MODEL = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
# Swap in the UniPC scheduler and enable memory-saving slicing on GPU.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
if device == "cuda":
    pipe.enable_attention_slicing()
    pipe.enable_vae_slicing()
prompt = "a cinematic photo of a futuristic street market at dusk, ultra-detailed, 35mm, volumetric lighting"
negative_prompt = "blurry, low quality, deformed, watermark, text"
img_text = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    guidance_scale=6.5,
    width=768,
    height=512,
).images[0]
We load the base Stable Diffusion pipeline and switch to the more efficient UniPC scheduler. We then generate a high-quality image directly from a text prompt using carefully chosen prompt, negative prompt, and resolution settings. This establishes a solid foundation for the subsequent improvements in speed and control.
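Before moving on, it can be useful to compare how the same prompt renders under different seeds. The loop below is an optional sketch that reuses the to_grid helper and a seeded generator to build a small comparison grid.
seed_variants = []
for s in [7, 21, 42]:
    # Each seeded generator produces a distinct but reproducible variation.
    g = torch.Generator(device=device).manual_seed(s)
    seed_variants.append(
        pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=25,
            guidance_scale=6.5,
            width=768,
            height=512,
            generator=g,
        ).images[0]
    )
seed_grid = to_grid(seed_variants, cols=3)
Next, we attach an LCM-LoRA adapter to speed up sampling.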
LCM_LORA = "latent-consistency/lcm-lora-sdv1-5"
pipe.load_lora_weights(LCM_LORA)
try:
    # Fusing the LoRA weights into the base UNet avoids per-step LoRA overhead.
    pipe.fuse_lora()
    lora_fused = True
except Exception as e:
    lora_fused = False
    print("LoRA fuse skipped:", e)
fast_prompt = "a clean product photo of a minimal smartwatch on a reflective surface, studio lighting"
fast_images = []
for steps in [4, 6, 8]:
    # LCM-LoRA is designed for very few steps and low guidance scales.
    fast_images.append(
        pipe(
            prompt=fast_prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=steps,
            guidance_scale=1.5,
            width=768,
            height=512,
        ).images[0]
    )
grid_fast = to_grid(fast_images, cols=3)
print("LoRA fused:", lora_fused)
W, H = 768, 512
layout = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(layout)
draw.rectangle([40, 80, 340, 460], outline="black", width=6)
draw.ellipse([430, 110, 720, 400], outline="black", width=6)
draw.line([0, 420, W, 420], fill="black", width=5)
edges = cv2.Canny(np.array(layout), 80, 160)
edges = np.stack([edges]*3, axis=-1)
canny_image = Image.fromarray(edges)
CONTROLNET = "lllyasviel/sd-controlnet-canny"
controlnet = ControlNetModel.from_pretrained(
    CONTROLNET,
    torch_dtype=dtype,
).to(device)
cn_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    BASE_MODEL,
    controlnet=controlnet,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
cn_pipe.scheduler = UniPCMultistepScheduler.from_config(cn_pipe.scheduler.config)
if device == "cuda":
    cn_pipe.enable_attention_slicing()
    cn_pipe.enable_vae_slicing()
cn_prompt = "a modern cafe interior, architectural render, soft daylight, high detail"
img_controlnet = cn_pipe(
    prompt=cn_prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=25,
    guidance_scale=6.5,
    controlnet_conditioning_scale=1.0,
).images[0]
We accelerate inference by loading and fusing the LCM-LoRA adapter and demonstrate fast sampling in just a few denoising steps. We then draw a structural layout, extract its Canny edges, and use ControlNet to guide the composition of the generated scene. This lets us lock down structure while still benefiting from the creative direction of the text prompt.
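The controlnet_conditioning_scale argument controls how strictly the output follows the Canny edges. The short sweep below is an optional sketch for visualizing that trade-off with the same prompt and layout.
cn_strengths = []
for scale in [0.5, 1.0, 1.5]:
    # Lower scales follow the edges loosely; higher scales enforce them strictly.
    cn_strengths.append(
        cn_pipe(
            prompt=cn_prompt,
            negative_prompt=negative_prompt,
            image=canny_image,
            num_inference_steps=25,
            guidance_scale=6.5,
            controlnet_conditioning_scale=scale,
        ).images[0]
    )
cn_strength_grid = to_grid(cn_strengths, cols=3)
With the structure under control, we next isolate a region of the ControlNet result and repaint it.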
mask = Image.new("L", img_controlnet.size, 0)
mask_draw = ImageDraw.Draw(mask)
mask_draw.rectangle([60, 90, 320, 170], fill=255)
mask = mask.filter(ImageFilter.GaussianBlur(2))
inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
inpaint_pipe.scheduler = UniPCMultistepScheduler.from_config(inpaint_pipe.scheduler.config)
if device == "cuda":
    inpaint_pipe.enable_attention_slicing()
    inpaint_pipe.enable_vae_slicing()
inpaint_prompt = "a glowing neon sign that says 'CAFÉ', cyberpunk style, realistic lighting"
img_inpaint = inpaint_pipe(
    prompt=inpaint_prompt,
    negative_prompt=negative_prompt,
    image=img_controlnet,
    mask_image=mask,
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
os.makedirs("outputs", exist_ok=True)
img_text.save("outputs/text2img.png")
grid_fast.save("outputs/lora_fast_grid.png")
layout.save("outputs/layout.png")
canny_image.save("outputs/canny.png")
img_controlnet.save("outputs/controlnet.png")
mask.save("outputs/mask.png")
img_inpaint.save("outputs/inpaint.png")
print("Saved outputs:", sorted(os.listdir("outputs")))
print("Done.")
We create a mask to isolate a specific region and apply inpainting to modify only that part of the image. We refine the selected area with a targeted prompt while keeping everything else intact. Finally, we save all intermediate and final results to disk for inspection and reuse.
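For a quick visual check of the editing stage, the optional snippet below places the ControlNet render and its inpainted version side by side using the same to_grid helper.
# Compare the original ControlNet render with the locally edited result.
compare_grid = to_grid([img_controlnet, img_inpaint], cols=2)
compare_grid.save("outputs/controlnet_vs_inpaint.png")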
In conclusion, we have shown how a single Diffusers pipeline can evolve into a flexible, production-ready imaging system. We walked through going from pure text-to-image generation to fast sampling, layout control, and targeted image editing without changing frameworks or tools. The tutorial highlights how schedulers, LoRA adapters, ControlNet, and inpainting combine into controllable, efficient generation pipelines that are easy to extend for more advanced creative or applied scenarios.


