Chapter 5 · Prompt Engineering: Diffusion × ComfyUI in Practice

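I. The Text Encoder Decides Everything

The first step in understanding prompts: the model "reads" your text through a text encoder, not through magic. Text encoders differ widely between models, and the encoder determines what kind of text a model can understand.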

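Model	Text Encoder	Max tokens	Comprehension
SD 1.5	CLIP ViT-L/14	77	Simple concepts; long sentences fall apart
SDXL	CLIP-L + CLIP-G	77×2	Good at composition, still tag-oriented
Pony / Illustrious	CLIP-L + CLIP-G	77×2	Trained on danbooru-style tags; prompts must be tag-based
SD 3	CLIP-L + CLIP-G + T5-XXL	up to 512 (T5 part)	Long sentences, abstract concepts, and spatial relations all work
Flux	CLIP-L + T5-XXL	up to 512 (T5 part)	Long sentences, abstract concepts, and spatial relations all work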

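The key difference: CLIP works within a 77-token context window, while T5 can read complex sentences of up to 512 tokens. This is why Flux can draw a multi-object scene such as "a man in a red hat standing to the left of a table, with three yellow apples and an open book on the table", whereas SDXL often puts the red hat on the apples.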

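II. Prompt Principles for SDXL / SD 1.5

1. Order by importance, front to back

CLIP is more sensitive to earlier tokens, so the order subject → action → environment → style is close to an iron rule:

Good: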
a majestic siberian husky, wearing round glasses, sitting at a
wooden desk, warm afternoon light, bookshelf background,
photorealistic, shallow depth of field

Poor (style first, subject last):
photorealistic, shallow depth of field, bookshelf, afternoon light,
wooden desk, sitting, round glasses, a siberian husky

2. Keywords are more stable than natural language

SDXL responds more reliably to a keyword list than to complete English sentences:

SDXL-friendly:
1girl, long silver hair, red dress, standing, garden, cherry
blossoms, soft light, detailed, masterpiece

Works in SDXL but not optimal:
A girl with long silver hair wearing a red dress stands in a
garden with cherry blossoms under soft light.

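Flux is the opposite: it prefers complete natural sentences, and tag-style prompts do not suit it. The Flux section later in this chapter covers this in detail.

3. Weight syntax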

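Syntax	Meaning	Notes
`(word:1.3)`	Weight 1.3×	Recommended range 0.8-1.4
`(word:0.5)`	Weight 0.5×	Weakens; does not fully remove
`(((word)))`	≈ (word:1.33), i.e. 1.1^3	Legacy syntax; the nesting multiplies
`[word]`	≈ 0.9	Weakens (A1111 syntax; ComfyUI's stock parser reads only parentheses, so write `(word:0.9)` there)
`(red:1.4) hair`	Only "red" at 1.4	Precise control
`(red hair:1.4)`	"red hair" as a whole at 1.4	Joint emphasis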

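Weights above 1.5 almost always cause problems: the model starts force-feeding the color or element at the expense of anatomy. To strengthen an element, first add related descriptive words (multiple red flowers) rather than writing (red:2.0).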
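
To make the nesting arithmetic in the table concrete, here is a minimal sketch of a weight parser. It is a simplification for illustration, not the parser that ComfyUI or A1111 actually ships: it assumes a 1.1 multiplier per bare parenthesis, ignores escaped parentheses, and the function names are hypothetical.

```python
import re

# Matches "text:1.3" inside a parenthesis group (explicit weight at the end).
_EXPLICIT = re.compile(r"^(.*):\s*(\d+(?:\.\d+)?)\s*$", re.S)


def _matching_paren(s, start):
    """Index of the ')' that closes the '(' at s[start], or -1 if unbalanced."""
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def parse_weights(prompt, mult=1.0, step=1.1):
    """Return a list of (text, effective_weight) pairs."""
    pieces, buf, i = [], "", 0

    def flush():
        nonlocal buf
        text = buf.strip(" ,")
        if text:
            pieces.append((text, round(mult, 3)))
        buf = ""

    while i < len(prompt):
        if prompt[i] == "(":
            end = _matching_paren(prompt, i)
            if end != -1:
                flush()
                inner = prompt[i + 1:end]
                m = _EXPLICIT.match(inner)
                if m:  # (text:1.3) uses the explicit number
                    inner, factor = m.group(1), float(m.group(2))
                else:  # (text) multiplies by the assumed default step
                    factor = step
                pieces.extend(parse_weights(inner, mult * factor, step))
                i = end + 1
                continue
        buf += prompt[i]
        i += 1
    flush()
    return pieces


print(parse_weights("(red:1.4) hair, (((smile))), plain"))
```

The nested `(((smile)))` comes out at 1.1 × 1.1 × 1.1, which is why three bare parentheses are already close to the top of the recommended range.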

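4. BREAK syntax (resetting the CLIP context)

The CLIP context in SDXL / SD 1.5 is 77 tokens. A prompt may run longer than that; ComfyUI splits it into chunks automatically. BREAK explicitly says "cut here". The keyword comes from A1111; stock ComfyUI does not parse it, so in ComfyUI it takes a custom node that implements BREAK, or separate CLIP Text Encode nodes joined with ConditioningConcat: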

a beautiful woman, blue dress, standing on beach BREAK
sunset sky, dramatic clouds, ocean waves BREAK
hyperrealistic, golden hour, 8k

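Purpose: it keeps different semantics from "contaminating" each other. Otherwise "blue dress" and "sunset sky" may land in the same chunk, and the model may paint a blue sunset.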
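
As a rough illustration of what "cut here" means, the sketch below splits a prompt on the BREAK keyword so that each segment could be sent to its own encoder call. The function name is hypothetical, and real implementations additionally pad every segment to a full 77-token chunk.

```python
import re


def split_on_break(prompt):
    """Split a prompt into segments at the standalone keyword BREAK."""
    segments = []
    for part in re.split(r"\bBREAK\b", prompt):
        text = " ".join(part.split()).strip(" ,")  # collapse whitespace
        if text:
            segments.append(text)
    return segments


prompt = """a beautiful woman, blue dress, standing on beach BREAK
sunset sky, dramatic clouds, ocean waves BREAK
hyperrealistic, golden hour, 8k"""

for n, segment in enumerate(split_on_break(prompt), start=1):
    print(n, segment)
```

Each segment is encoded in isolation, so "blue" never shares an attention window with "sunset sky".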

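5. AND syntax (mixing multiple conditions)

AND is another form of semantic isolation: each sub-prompt is encoded separately, and the sampler mixes their noise predictions according to the weights: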

red car :1.0 AND blue car :1.0
# Result: tends toward a purple car (red and blue blended)

red car :1.5 AND blue car :0.5
# Result: a car that leans red

Practical use: generating an image "between two styles", for example studio ghibli style :1.0 AND makoto shinkai style :1.0.
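
The sketch below parses the AND syntax into (text, weight) pairs and then mixes toy "noise predictions" by weight. The mixing rule shown (unconditional prediction plus the weighted sum of each sub-prompt's difference from it) is one common formulation and is an assumption here; the two-element lists stand in for real latent tensors, and both function names are hypothetical.

```python
import re


def parse_and(prompt):
    """Split 'a :1.5 AND b :0.5' into [(text, weight), ...]; weight defaults to 1.0."""
    pairs = []
    for chunk in re.split(r"\bAND\b", prompt):
        chunk = chunk.strip()
        text, sep, tail = chunk.rpartition(":")
        try:
            weight = float(tail) if sep else 1.0
        except ValueError:  # the colon was part of the text, not a weight
            sep, weight = "", 1.0
        pairs.append(((text if sep else chunk).strip(), weight))
    return pairs


def combine(uncond, conds, weights, cfg):
    """Toy mix: uncond + cfg * sum(w_i * (cond_i - uncond)), element by element."""
    return [
        u + cfg * sum(w * (c[i] - u) for c, w in zip(conds, weights))
        for i, u in enumerate(uncond)
    ]


pairs = parse_and("red car :1.5 AND blue car :0.5")
print(pairs)

# Pretend axis 0 is "redness" and axis 1 is "blueness".
print(combine([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [w for _, w in pairs], cfg=1.0))
```

With weights 1.5 and 0.5 the mixed prediction sits three times further along the "red" axis than the "blue" one, which matches the "leans red" outcome described above.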

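III. Two CLIP Text Encode Techniques in ComfyUI

CLIP Set Last Layer (Clip Skip)

A debated technique: skip the last N layers of CLIP so that the prompt stays "closer to the literal words":

skip = 1 (default): use the last layer (stop_at_clip_layer = -1 in the ComfyUI node)
skip = 2: skip the last layer (stop_at_clip_layer = -2); common with Anything / NovelAI-family SD 1.5 models
On SDXL and Flux, Clip Skip has essentially no benefit, so do not use it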

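Automatic handling of prompts longer than 77 tokens

By default ComfyUI splits the prompt into 77-token chunks, encodes each chunk separately, and concatenates the results. A 200-token prompt therefore raises no error, but:

Semantic links that cross a chunk boundary are weakened
Setting the boundaries explicitly with BREAK is recommended
Put the core description in the first 77 tokens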
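
The chunking described above can be sketched in a few lines. This is an approximation under stated assumptions: whitespace-separated words stand in for CLIP's BPE tokens (real counts are usually higher), and each 77-token window is assumed to hold 75 content tokens plus a start and an end token. The function name is hypothetical.

```python
def chunk_tokens(tokens, size=75):
    """Cut a token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# A dummy 200-"token" prompt: tok0, tok1, ..., tok199.
tokens = [f"tok{i}" for i in range(200)]
chunks = chunk_tokens(tokens)

print([len(c) for c in chunks])
print("first token of chunk 2:", chunks[1][0])
```

Anything that belongs together, such as a subject and its color, is safest when both words fall inside the same window, which is the reason for the advice to keep the core description at the front.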

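IV. Counterintuitive Facts About Negative Prompts

Beginners assume that a longer negative prompt works better. It does not. The negative prompt is encoded on its own and goes through the same 77-token chunking, so an overlong one dilutes each of its own terms and pushes the image away from more things than intended, which tends to hurt the result of the positive prompt.

Short SDXL negative template (sufficient in most cases)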

blurry, lowres, bad anatomy, extra fingers, watermark, text,
jpeg artifacts, deformed, worst quality

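Common negatives that make things worse the more you add

Embeddings such as "easynegative": effective on SD 1.5, but may introduce color shifts on SDXL
Long runs of "anatomical impossibilities, extra limbs, floating limbs": SDXL already handles anatomy well, and piling these on only dilutes the negative terms that matter
"nsfw, nude, explicit": essential on Pony, unnecessary on clean models (such as Animagine)

Flux does not use negative prompts

Flux runs at CFG = 1, which makes the negative prompt effectively inert: even if a negative CLIP Text Encode node is wired into KSampler, it has no effect. Do not write negatives for Flux; it is wasted effort.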
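
The reason CFG = 1 silences the negative prompt follows from the standard classifier-free guidance formula, prediction = uncond + cfg × (cond − uncond). The toy sketch below applies it to short lists instead of latent tensors; the function name and the numbers are illustrative assumptions.

```python
def cfg_mix(cond, uncond, cfg):
    """Classifier-free guidance on toy vectors: uncond + cfg * (cond - uncond)."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [0.25, -0.5]          # prediction from the positive prompt
negative_a = [1.0, 0.125]    # prediction from one negative prompt
negative_b = [-2.0, 4.0]     # prediction from a very different negative prompt

# At cfg = 1 the negative term cancels, so both lines print the same vector.
print(cfg_mix(cond, negative_a, 1.0))
print(cfg_mix(cond, negative_b, 1.0))

# At cfg = 7 the choice of negative prompt clearly matters.
print(cfg_mix(cond, negative_a, 7.0))
print(cfg_mix(cond, negative_b, 7.0))
```

At cfg = 1 the expression reduces to `cond`, whatever the negative prediction is, so the negative branch contributes nothing to the output.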

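V. Tag Syntax for Pony / Illustrious

Pony V6 and Illustrious XL were trained almost entirely on danbooru-style tags, so they use a different syntax:

Mandatory "quality anchors"

Required positive prefix for Pony V6: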
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up,

Required positive tags for Illustrious:
masterpiece, best quality, amazing quality, very aesthetic,
absurdres

Content rating

rating_safe,                    # most common
rating_questionable,            # mildly suggestive
rating_explicit,                # explicit NSFW

Required in the negative prompt (for clean output):
rating_explicit, rating_questionable

Common composition / pose tags

1girl, solo, cowboy_shot, looking_at_viewer, standing,
outdoors, cherry_blossom_tree, detailed_background,
long_hair, silver_hair, red_dress, (smile:1.2)

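Lookup tools for the Pony ecosystem

Danbooru tag wiki: the official list of all valid tags
AI Booru: a secondary categorization for the Pony ecosystem
Every Pony LoRA page lists its "Trigger Tags": always read them

High-quality Pony prompt template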

score_9, score_8_up, score_7_up, source_anime, rating_safe,
1girl, solo,
long_hair, (silver_hair:1.2), blue_eyes,
red_dress, white_stockings,
standing, looking_at_viewer, cowboy_shot,
outdoors, cherry_blossom_tree, (falling_petals:1.1),
detailed_background, soft_light,
masterpiece, best_quality, absurdres
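
The template follows a fixed group order (quality anchors, source and rating, subject, appearance, pose, scene), which makes it easy to assemble programmatically. The helper below is a hypothetical convenience function, not part of any Pony or ComfyUI tooling, and it uses the shortened three-tag score prefix from the template.

```python
def build_pony_prompt(subject, appearance, pose, scene,
                      rating="rating_safe", source="source_anime"):
    """Join tag groups in the order used by the template above."""
    quality = ["score_9", "score_8_up", "score_7_up"]
    groups = [quality + [source, rating], subject, appearance, pose, scene]
    return ", ".join(tag for group in groups for tag in group)


prompt = build_pony_prompt(
    subject=["1girl", "solo"],
    appearance=["long_hair", "(silver_hair:1.2)", "blue_eyes", "red_dress"],
    pose=["standing", "looking_at_viewer", "cowboy_shot"],
    scene=["outdoors", "cherry_blossom_tree", "(falling_petals:1.1)"],
)
print(prompt)
```

Keeping the groups as separate lists makes it simple to swap only the scene or only the rating when running A/B comparisons.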

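VI. Writing Natural Language for Flux

Flux's T5-XXL is itself a language model and is very good at understanding complete, natural English sentences. Treat Flux like DALL-E 3, and drop the SDXL tag mindset.

Best prompt style for Flux (a full paragraph):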
A photograph of a majestic Siberian husky wearing round golden
wire-frame glasses, sitting upright at a polished walnut desk.
The desk has an open leather-bound book and a steaming cup of
coffee. Behind the husky, a wall of bookshelves filled with old
volumes. Warm afternoon sunlight streams in from the left through
a tall window, creating soft shadows and a golden-hour atmosphere.
Shallow depth of field, photorealistic, cinematic composition,
shot with an 85mm lens.

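Where Flux is especially strong:

Multi-object spatial relations: "three red apples on the left, a yellow book on the right"
Long, coherent paragraphs: camera, lighting, people, objects, and mood can all be described in one pass
Text rendering: "A sign that says 'Open 24h' hangs on the wall". Flux gets the lettering right; SDXL mostly fails
Abstract concepts: a description such as "a feeling of nostalgia", which SDXL cannot parse, is something Flux understands

VII. Advanced Prompt Nodes in ComfyUI

ConditioningCombine / ConditioningAverage

Merge multiple prompts by hand; roughly equivalent to AND, but more explicit:

ConditioningCombine: keeps both conditionings side by side at weight 1 each; the sampler evaluates both and blends the predictions
ConditioningAverage: interpolates the two embeddings according to conditioning_to_strength
ConditioningConcat: joins the embeddings end to end along the token axis, preserving positional information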
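
The difference between averaging and concatenating is easiest to see on toy data. In the sketch below a conditioning is a list of token vectors (real ones are tensors with hundreds of dimensions), and the two functions are simplified stand-ins for what the nodes do, written under that assumption rather than copied from ComfyUI.

```python
def average(cond_to, cond_from, strength):
    """Per-element interpolation: strength * cond_to + (1 - strength) * cond_from."""
    return [
        [strength * a + (1.0 - strength) * b for a, b in zip(tok_to, tok_from)]
        for tok_to, tok_from in zip(cond_to, cond_from)
    ]


def concat(cond_to, cond_from):
    """Append the second prompt's tokens after the first prompt's tokens."""
    return cond_to + cond_from


ghibli = [[1.0, 0.0], [1.0, 0.0]]    # two toy tokens pointing along axis 0
shinkai = [[0.0, 1.0], [0.0, 1.0]]   # two toy tokens pointing along axis 1

print(average(ghibli, shinkai, 0.75))  # same length, blended values
print(concat(ghibli, shinkai))         # twice the length, original values
```

Averaging keeps the sequence length and changes the values; concatenation keeps the values and extends the sequence, which is why it behaves like a hand-placed chunk boundary.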

ConditioningSetArea / ConditioningSetMask (regional prompts)

Apply different prompts to different regions of the image; a powerful composition tool:

CLIP Text Encode "red car" ─▶ ConditioningSetArea(x=0,   y=0, w=512, h=512) ─┐
CLIP Text Encode "blue car"─▶ ConditioningSetArea(x=512, y=0, w=512, h=512) ─┤
                                                                              ▼
                                                                     ConditioningCombine ──▶ KSampler

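Result: a red car in the left half of the image and a blue car in the right half. This is far more precise than trying to steer the layout with a negative prompt.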
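
For more than two regions, the rectangles are tedious to compute by hand. The sketch below splits a canvas into equal columns and wraps each rectangle in a dictionary shaped like a ConditioningSetArea entry in ComfyUI's API-format workflow JSON. The helper names and the node ids ("6", "7") are hypothetical, and the 1024×512 canvas is an assumption taken from the diagram's coordinates.

```python
def split_columns(canvas_w, canvas_h, n):
    """Return n equal-width column rectangles, widths rounded down to a multiple of 8."""
    col_w = (canvas_w // n) // 8 * 8
    return [
        {"x": i * col_w, "y": 0, "width": col_w, "height": canvas_h, "strength": 1.0}
        for i in range(n)
    ]


def area_node(text_node_id, area):
    """Wrap a rectangle as a ConditioningSetArea node fed by a CLIP Text Encode node."""
    return {
        "class_type": "ConditioningSetArea",
        "inputs": {"conditioning": [text_node_id, 0], **area},
    }


areas = split_columns(1024, 512, 2)
left = area_node("6", areas[0])    # "6": hypothetical id of the "red car" encoder
right = area_node("7", areas[1])   # "7": hypothetical id of the "blue car" encoder

print(left["inputs"])
print(right["inputs"])
```

The two node outputs would then be merged with ConditioningCombine before reaching KSampler, as in the diagram.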

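Dynamic prompts in ComfyUI-Custom-Scripts

A commonly used custom node pack. The {a|b|c} form is also handled by the stock ComfyUI front end, while wildcard and inline-LoRA support varies from pack to pack (ComfyUI-Impact-Pack, for example, offers both), so check the documentation of the pack you install. The syntax in question:

{red|blue|green} dress: picks one option at random
Wildcards: __color__ draws a random line from the file color.txt
Inline LoRA syntax: <lora:my_lora:0.8>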
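
The two random-substitution forms above can be imitated in a few lines, which helps when you want to preview what a dynamic prompt may expand to. This is a sketch of the idea only: the function name is hypothetical, wildcards come from an in-memory dictionary instead of text files, and nested braces are not handled.

```python
import random
import re


def expand(prompt, wildcards, seed=0):
    """Replace __name__ with a random wildcard entry and {a|b|c} with a random option."""
    rng = random.Random(seed)  # seeded so a given seed always expands the same way

    def pick_wildcard(match):
        return rng.choice(wildcards[match.group(1)])

    def pick_option(match):
        return rng.choice(match.group(1).split("|"))

    prompt = re.sub(r"__(\w+?)__", pick_wildcard, prompt)
    return re.sub(r"\{([^{}]*)\}", pick_option, prompt)


# Stand-in for the lines of a color.txt wildcard file.
wildcards = {"color": ["silver", "black", "auburn"]}

for seed in range(3):
    print(expand("{red|blue|green} dress, __color__ hair", wildcards, seed=seed))
```

Fixing the seed makes a variation reproducible, which is useful when a random combination turns out well and needs to be regenerated.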

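VIII. The Subtle Relationship Between Prompt Weighting and Sigma

An advanced piece of trivia: a prompt's influence varies across the sampling steps. Early steps determine composition; late steps determine detail. ComfyUI's ConditioningSetTimestepRange node lets you "use prompt A for the first 50% of steps and prompt B for the last 50%", a trick experienced SDXL users apply for style transitions.

First 50% of steps: a rough sketch of a castle
Last 50% of steps: a highly detailed castle, 8k, octane render
Result: the composition phase uses "sketch" to fix the structure (preventing premature detail), and the detail phase adds "high quality" words so the later steps sharpen the image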
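
The scheduling logic can be sketched as a lookup from sampling progress to the active prompt. This is a simplified model under stated assumptions: the real node takes start and end values between 0.0 and 1.0 and maps them onto the sampler's noise schedule rather than onto step counts, and the function name below is hypothetical.

```python
def active_prompt(step, total_steps, schedule):
    """Return the prompt whose [start, end) range contains this step's progress."""
    progress = step / total_steps
    for start, end, text in schedule:
        if start <= progress < end:
            return text
    return schedule[-1][2]  # fall back to the last entry


schedule = [
    (0.0, 0.5, "a rough sketch of a castle"),
    (0.5, 1.0, "a highly detailed castle, 8k, octane render"),
]

for step in (0, 9, 10, 19):
    print(step, "->", active_prompt(step, 20, schedule))
```

In a workflow this corresponds to two CLIP Text Encode nodes, each followed by a ConditioningSetTimestepRange node with the matching start and end values, merged with ConditioningCombine.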

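IX. Anti-patterns

Writing SDXL prompts as a full English essay: less stable than a keyword list.
Writing Flux prompts as a tag list: wastes T5's ability to understand long sentences.
Weights of 1.8 at every turn: anatomy breaks down; drop to 1.3 and add descriptive words instead.
Pasting a giant negative-prompt template: the terms dilute one another; short and precise works better.
Pony without score_X tags: the image looks as blurry as something from ten years ago.
Adding a negative prompt to Flux: it has no effect and only wastes time.
Clip Skip 2 on SDXL: no benefit on SDXL, and it may make results worse.
Chasing very long prompts: 200 tokens are less precise than 60.
Scattering BREAK everywhere: cutting too many chunks fragments the semantics.
Sharing one prompt across different models: a prompt that works on SDXL is mediocre on Flux, and vice versa. At the very least, A/B test it.

X. Chapter Summary

Remember:
① The text encoder decides everything: CLIP is good at tags, T5 is good at long sentences. Use natural language for Flux and keywords / tags for SDXL and Pony.
② The weight syntax (word:1.3) is enough; stay within 0.8-1.4. BREAK cuts chunks, AND mixes multiple prompts.
③ Pony / Illustrious require the danbooru tag system, and anchors such as score_9 make or break quality.
④ Flux takes no negative prompt; use CFG = 1 with FluxGuidance = 3.5, and natural-language paragraphs work best.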