Stable Diffusion Prompt Builder
Build structured prompts for Stable Diffusion / Midjourney with subject, style, quality, lighting, camera. Negative prompt supported.
Engineering prompts for Stable Diffusion
Stable Diffusion is an openly released text-to-image latent diffusion model, published in August 2022 by Stability AI together with Robin Rombach, Andreas Blattmann and collaborators from the CompVis group (LMU Munich) and Runway. It combines a variational autoencoder that compresses images into a latent space, a U-Net that denoises that latent step by step, and a text encoder that turns the prompt into conditioning embeddings (CLIP for 1.x, OpenCLIP for 2.x, and CLIP-family encoders plus T5 for SD3). Because the weights and code are public, it triggered a large ecosystem of fine-tunes, LoRAs, ControlNets and front-ends that closed models do not offer.
Major versions: SD 1.5 is the popular legacy checkpoint with the largest LoRA library; SD 2.x moved to OpenCLIP and 768×768; SDXL (July 2023) added an optional refiner stage and 1024×1024 native output, with noticeably better anatomy and somewhat better text; SD3 (2024) replaced the U-Net with a Multimodal Diffusion Transformer; and Flux.1 (Black Forest Labs, August 2024), built by several of the original SD authors, is widely seen as the spiritual successor, with sharper photographic output and more reliable text rendering.
Prompt structure
A reliable formula is subject + descriptors + style + quality + composition + lighting + lens + colour palette. Example: a small dog wearing a top hat, oil painting, Studio Ghibli style, masterpiece, 4k, ultra detailed, rule of thirds, golden hour lighting, 50mm bokeh, warm pastel palette. Common quality modifiers include masterpiece, best quality, ultra detailed, 4k, 8k, photorealistic, cinematic and dramatic lighting; artist tags such as by Greg Rutkowski are style cues rather than quality modifiers. The negative prompt (low quality, blurry, bad anatomy, deformed, extra fingers, watermark, text) lists what to steer away from: under classifier-free guidance it takes the place of the empty unconditional prompt, so each step is pushed away from it. On SD 1.5-era checkpoints it can matter as much as the positive prompt.
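The formula above can be sketched as a small builder that joins the slots in a fixed order. This is a minimal illustration, not the tool's actual implementation: `PromptSpec`, `build_prompt` and the field names are hypothetical, and the case-insensitive de-duplication is an assumption about what is convenient.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the slots in the formula above."""
    subject: str
    descriptors: list[str] = field(default_factory=list)
    style: list[str] = field(default_factory=list)
    quality: list[str] = field(default_factory=list)
    composition: list[str] = field(default_factory=list)
    lighting: list[str] = field(default_factory=list)
    lens: list[str] = field(default_factory=list)
    palette: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)


def _join(parts):
    """Strip whitespace, drop empties and case-insensitive duplicates, keep order."""
    seen = set()
    out = []
    for p in parts:
        p = p.strip()
        key = p.lower()
        if p and key not in seen:
            seen.add(key)
            out.append(p)
    return ", ".join(out)


def build_prompt(spec):
    """Return (positive, negative) comma-separated prompt strings."""
    positive = _join(
        [spec.subject]
        + spec.descriptors
        + spec.style
        + spec.quality
        + spec.composition
        + spec.lighting
        + spec.lens
        + spec.palette
    )
    return positive, _join(spec.negative)


spec = PromptSpec(
    subject="a small dog wearing a top hat",
    style=["oil painting", "Studio Ghibli style"],
    quality=["masterpiece", "4k", "ultra detailed"],
    composition=["rule of thirds"],
    lighting=["golden hour lighting"],
    lens=["50mm bokeh"],
    palette=["warm pastel palette"],
    negative=["low quality", "blurry", "bad anatomy", "Blurry"],
)
positive, negative = build_prompt(spec)
print(positive)
print(negative)
```

Keeping the slots separate until the final join makes it easy to reorder them or drop one without editing a long string by hand.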
Weights, tokens and samplers
In AUTOMATIC1111 syntax, (word:1.3) multiplies attention on that text by 1.3; plain (word) multiplies it by 1.1 and [word] divides it by 1.1. The CLIP text encoder in SD 1.5 takes 77 tokens at a time, 75 of which are available for the prompt; the WebUI splits longer prompts into 75-token chunks, encodes each chunk separately and concatenates the results. The CFG scale (7-12 is typical for SD 1.5 and SDXL) controls how strictly the sampler follows the prompt, and very high values produce burnt, over-saturated outputs. The sampler matters too: Euler a is fast and varies more from step count to step count; DPM++ 2M Karras at 20-30 steps is a common default for clean photographic results.
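To make the weighting syntax concrete, here is a minimal sketch that splits a prompt into (text, weight) pairs. It handles only the explicit `(text:weight)` form; bare parentheses, square brackets, nesting and escapes are left out, so it is not a reimplementation of the WebUI's parser. The name `parse_weights` is hypothetical.

```python
import re

# Matches the explicit form "(text:1.3)". Nested parentheses, bare "(text)"
# and "[text]" are deliberately not handled in this sketch.
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt):
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    pieces = []
    pos = 0
    for m in _WEIGHTED.finditer(prompt):
        # Unweighted text between the previous match and this one.
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            pieces.append((before, 1.0))
        pieces.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    # Whatever follows the last weighted span.
    tail = prompt[pos:].strip(" ,")
    if tail:
        pieces.append((tail, 1.0))
    return pieces


print(parse_weights("a castle, (dramatic lighting:1.3), fog, (people:0.8)"))
# [('a castle', 1.0), ('dramatic lighting', 1.3), ('fog', 1.0), ('people', 0.8)]
```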
Ecosystem: ControlNet, LoRA, inpainting
Img2Img seeds generation from an existing image, typically with a denoising strength of 0.3-0.7; lower values stay closer to the source. ControlNet (February 2023, Lvmin Zhang and colleagues) conditions on pose skeletons, depth maps, Canny edges or scribbles, which makes composition predictable. LoRA (Low-Rank Adaptation) trains a 10-100 MB delta that injects a specific character or art style from only 10-100 reference images. Inpainting edits a masked region while preserving the rest. Front-ends include AUTOMATIC1111, ComfyUI (node graph), Forge, InvokeAI and Fooocus. Hosting options span a local NVIDIA GPU with 8 GB or more of VRAM and cloud services such as RunPod and Replicate; prompt galleries like lexica.art are useful for finding example prompts.
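The small size of a LoRA delta follows from the low-rank factorisation itself. The sketch below counts stored values for a single weight matrix; the 768 × 768 shape and rank 8 are illustrative assumptions, and real file sizes also depend on how many layers are adapted and on numeric precision.

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare a full weight update with its low-rank factorisation.

    A full update to a d_out x d_in matrix W has d_out * d_in values.
    LoRA instead learns B (d_out x rank) and A (rank x d_in) and applies
    W' = W + scale * (B @ A), so only rank * (d_out + d_in) values are stored.
    """
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank


# Illustrative sizes only: a 768 x 768 projection with rank 8.
full, low_rank = lora_param_counts(768, 768, 8)
print(full, low_rank, full / low_rank)
# 589824 12288 48.0
```

For this one matrix the low-rank update stores 48 times fewer values than a full update, which is why a LoRA file is a small fraction of the size of a checkpoint.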
FAQ
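Can I use Stable Diffusion outputs commercially? Often yes, but the licences are not CC0 or conventional open-source licences. SD 1.5 is released under CreativeML OpenRAIL-M and SDXL base under CreativeML Open RAIL++-M; both allow commercial use and claim no rights over the images you generate, but they attach use-based restrictions. SD3 ships under Stability AI's own community licence, which puts conditions on commercial use, so read the current terms. Always re-check the specific licence of each fine-tune, LoRA or ControlNet you load, because some impose non-commercial clauses.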
ControlNet vs LoRA — what is the difference? ControlNet enforces structure (pose, depth, edges, segmentation). LoRA enforces style or identity (a specific artist, character, outfit). They compose: a ControlNet pose plus a LoRA character is the standard recipe for consistent comic panels.
Which sampler and step count should I start with? DPM++ 2M Karras at 25 steps with CFG 7 is a solid default for SDXL. For quick exploration drop to Euler a at 15 steps; for final renders try DPM++ 3M SDE Karras at 30-40 steps.
Is there an ethical concern with training data? Yes. LAION-5B, the web-scraped collection of image-text pairs from which early SD training data was drawn, includes artwork gathered without explicit artist consent, which led to lawsuits such as Andersen v. Stability AI. Stability AI said it would honour artist opt-out requests, collected through Spawning's Have I Been Trained site, when training SD3. Deepfake misuse remains an unresolved policy issue that every operator should consider before deployment.