Chapter 03

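Making the Words Appear One by One

About 80% of a chatbot's perceived quality comes down to how fast the first token shows up. Streaming lets the user see the first characters within roughly 300 ms, instead of waiting 10 seconds for a whole block to land at once. This chapter covers the SSE event format, what the SDK does for you, and when you have to handle the raw events yourself.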

Why stream

The SSE event protocol

Claude streams over standard SSE (Server-Sent Events): a long-lived HTTP connection on which each event looks like this:

event: message_start
data: {"type":"message_start","message":{"id":"msg_...","model":"...","usage":{...}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"你"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"好"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":64}}

event: message_stop
data: {"type":"message_stop"}

Event type quick reference

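message_start         Message metadata; usage.input_tokens is already known at this point
content_block_start   A block begins (text / tool_use / thinking); index says which block it is
content_block_delta   An increment: text_delta / input_json_delta (tool call arguments) / thinking_delta
content_block_stop    A block ends. If the next event is another content_block_start, more content follows
message_delta         Carries the final stop_reason; usage.output_tokens is reported here
message_stop          The whole message is finished
ping                  Keep-alive heartbeat; safe to ignore
error                 A mid-stream error; the payload carries error.type and error.message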
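
To make the table concrete, here is a minimal TypeScript sketch of a reducer that folds a sequence of already-parsed events into the accumulated text plus token usage. The `StreamEvent` type and the `foldEvents` name are illustrative assumptions that cover only the fields shown in the sample above; they are not the SDK's own types.

```typescript
// Hypothetical, trimmed-down event shapes; the real API carries more fields.
type StreamEvent =
  | { type: "message_start"; message: { usage: { input_tokens: number } } }
  | { type: "content_block_start"; index: number; content_block: { type: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; text?: string } }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: { stop_reason: string | null }; usage: { output_tokens: number } }
  | { type: "message_stop" }
  | { type: "ping" }
  | { type: "error"; error: { type: string; message: string } };

interface Folded {
  text: string;
  inputTokens: number;
  outputTokens: number;
  stopReason: string | null;
  done: boolean;
}

function foldEvents(events: Iterable<StreamEvent>): Folded {
  const out: Folded = { text: "", inputTokens: 0, outputTokens: 0, stopReason: null, done: false };
  for (const ev of events) {
    switch (ev.type) {
      case "message_start":
        out.inputTokens = ev.message.usage.input_tokens;
        break;
      case "content_block_delta":
        // Only text deltas contribute to the visible reply.
        if (ev.delta.type === "text_delta" && ev.delta.text !== undefined) out.text += ev.delta.text;
        break;
      case "message_delta":
        out.stopReason = ev.delta.stop_reason;
        out.outputTokens = ev.usage.output_tokens;
        break;
      case "message_stop":
        out.done = true;
        break;
      case "error":
        throw new Error(`${ev.error.type}: ${ev.error.message}`);
      default:
        break; // ping, content_block_start, content_block_stop: nothing to accumulate
    }
  }
  return out;
}
```

Keeping the reducer free of I/O means the same function can be fed events from the SDK iterator or from a hand-rolled SSE parser.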

Node.js: async iterator

The SDK lets you use a standard for await loop and parses the events for you:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 2048,
  messages: [{ role: "user", content: "Explain the TCP three-way handshake" }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

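// Alternative to the loop above: higher-level convenience events.
// Register these on a fresh stream instead of iterating it; events that were already consumed are not replayed.
stream.on("text", (text) => process.stdout.write(text));
stream.on("finalMessage", (msg) => console.log("\nFinal message:", msg));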

Python: the with statement

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Explain the TCP three-way handshake"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    final = stream.get_final_message()
    print("\nusage:", final.usage)

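Leaving the with block closes the underlying connection automatically. Always use with; otherwise it is easy to forget to close the stream and leak resources.

When the user hits stop: graceful cancellation

On the front end, cancellation is usually driven by an AbortController. The Node SDK accepts a native signal: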

const ac = new AbortController();

// The user clicks the stop button
cancelBtn.addEventListener("click", () => ac.abort());

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  messages,
}, { signal: ac.signal });

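try {
  for await (const ev of stream) { /* ... */ }
} catch (e) {
  // The SDK surfaces a cancelled request as APIUserAbortError (Anthropic is the
  // default import from the first snippet); the name check covers a raw AbortError.
  if (e instanceof Anthropic.APIUserAbortError || e?.name === "AbortError") {
    console.log("Cancelled by user");
  } else {
    throw e;
  }
}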

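Once cancelled, the underlying HTTP connection is closed immediately and Anthropic stops generating, so you do not pay for the remaining output tokens.

Parsing manually with fetch (no SDK)

Use this when the front end ships without the SDK, or when you are writing a custom proxy and want to see the raw format: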

const resp = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": key,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    stream: true,
    messages: [{ role: "user", content: "hi" }],
  }),
});

const reader = resp.body!.pipeThrough(new TextDecoderStream()).getReader();
let buf = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += value;
  let idx;
  while ((idx = buf.indexOf("\n\n")) !== -1) {
    const chunk = buf.slice(0, idx);
    buf = buf.slice(idx + 2);
    const m = chunk.match(/^data: (.*)$/m);
    if (!m) continue;
    const ev = JSON.parse(m[1]);
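    // Only text deltas carry a text field; tool and thinking deltas have other shapes.
    if (ev.type === "content_block_delta" && ev.delta.type === "text_delta") console.log(ev.delta.text);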
  }
}
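Remember anthropic-version
When calling with fetch directly, always send the anthropic-version: 2023-06-01 request header; leave it out and the request fails with a 400. The SDK adds it automatically.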
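
The loop above matches only the first data: line of each frame. As a slightly more defensive variant, the sketch below splits a buffer into complete frames and returns the unfinished tail, so it can be called on every chunk. `parseSse` and `SseFrame` are hypothetical names introduced for illustration, and the field handling follows the general SSE rules (comment lines start with a colon, multiple data lines are joined with a newline) rather than anything specific to Anthropic.

```typescript
interface SseFrame {
  event: string;
  data: string;
}

// Splits a buffer into complete SSE frames and returns the unfinished tail,
// which the caller should prepend to the next chunk.
function parseSse(buffer: string): { frames: SseFrame[]; rest: string } {
  const parts = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = parts.pop() ?? ""; // last piece is incomplete (or empty)
  const frames: SseFrame[] = [];
  for (const part of parts) {
    let event = "message"; // SSE default when no event: line is present
    const data: string[] = [];
    for (const line of part.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) frames.push({ event, data: data.join("\n") });
  }
  return { frames, rest };
}
```

In the read loop this would be used as `const { frames, rest } = parseSse(buf + value); buf = rest;`, followed by `JSON.parse(frame.data)` for each frame.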

Pass-through: front end → server → Claude

The browser must not call Anthropic directly (that would expose your API key). The typical architecture:

[Browser] → POST /api/chat (SSE) → [Your Server] → Anthropic (SSE)
                                           │
                                           └─ pass events through + inject your own business logic

Implementation notes:
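
As a rough illustration of the diagram, the sketch below re-serializes each upstream event as an SSE frame and hands it to a writer, with a hook for business logic. Everything here is an assumption for illustration: `relay`, `toSseFrame`, the `onEvent` hook, and the `proxy_error` error type are hypothetical names, and the upstream is modelled as any async iterable of parsed events.

```typescript
type UpstreamEvent = { type: string; [key: string]: unknown };

// Serializes one event in the same wire shape the upstream API uses.
function toSseFrame(ev: UpstreamEvent): string {
  return `event: ${ev.type}\ndata: ${JSON.stringify(ev)}\n\n`;
}

// Forwards every upstream event to the client; onEvent is the place for
// logging, metering, or other server-side logic.
async function relay(
  upstream: AsyncIterable<UpstreamEvent>,
  write: (frame: string) => void,
  onEvent?: (ev: UpstreamEvent) => void,
): Promise<void> {
  try {
    for await (const ev of upstream) {
      onEvent?.(ev);
      write(toSseFrame(ev));
    }
  } catch (err) {
    // Tell the browser the stream broke instead of just closing the socket.
    // "proxy_error" is a made-up type for this sketch.
    const message = err instanceof Error ? err.message : String(err);
    write(toSseFrame({ type: "error", error: { type: "proxy_error", message } }));
  }
}
```

In a Node HTTP handler, `write` would typically be `(frame) => res.write(frame)` after the response has been sent with `Content-Type: text/event-stream`.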

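Streaming with thinking

With Extended Thinking enabled (Chapter 7), the stream first emits a thinking-type block. This is Claude's internal reasoning, which is usually not shown to the user, but the API streams it anyway so the response feels more responsive.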

content_block_start { type: "thinking" }
content_block_delta { delta: { type: "thinking_delta", thinking: "..." } }
content_block_stop
content_block_start { type: "text" }
content_block_delta { delta: { type: "text_delta", text: "final answer" } }
...

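At the application layer, thinking is typically rendered as a "Thinking..." indicator, and only the actual text block goes into the chat bubble.

Streaming tool use

When Claude calls a tool, input_json_delta carries the tool arguments in fragments (string pieces that only form valid JSON once concatenated):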
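
One way to express that routing rule is a small state function: a thinking block toggles an indicator, and only text deltas reach the bubble. This is a sketch under assumptions; `UiState`, `BlockEvent`, and `applyToUi` are hypothetical names, and the event shapes include only the fields used here.

```typescript
type BlockEvent =
  | { type: "content_block_start"; index: number; content_block: { type: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; text?: string; thinking?: string } }
  | { type: "content_block_stop"; index: number };

interface UiState {
  thinking: boolean; // show the "Thinking..." indicator
  bubble: string;    // text rendered in the chat bubble
}

function applyToUi(state: UiState, ev: BlockEvent): UiState {
  switch (ev.type) {
    case "content_block_start":
      // The indicator is on only while a thinking block is open.
      return { ...state, thinking: ev.content_block.type === "thinking" };
    case "content_block_delta":
      if (ev.delta.type === "text_delta") {
        return { ...state, bubble: state.bubble + (ev.delta.text ?? "") };
      }
      return state; // thinking_delta and other deltas are not rendered
    case "content_block_stop":
      return { ...state, thinking: false };
  }
}
```

Because the function returns a new state object each time, it can be dropped into a reducer-style UI store without modification.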

content_block_start { type: "tool_use", id: "toolu_...", name: "get_weather", input: {} }
content_block_delta { delta: { type: "input_json_delta", partial_json: "{\"lo" } }
content_block_delta { delta: { type: "input_json_delta", partial_json: "c\":\"" } }
content_block_delta { delta: { type: "input_json_delta", partial_json: "Tokyo\"}" } }
content_block_stop

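The SDK does the accumulation for you: stream.on("inputJson", (partialJson, jsonSnapshot) => ...) hands you each fragment together with a snapshot of the input parsed so far.

Timeouts & reconnection

Production recommendations: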
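
Without the SDK, the accumulation described above could look roughly like the sketch below: fragments are concatenated per block index and parsed only when the block stops. `collectToolCalls`, `ToolEvent`, and `ToolCall` are hypothetical names for illustration, and treating an empty argument string as `{}` is an assumption of this sketch.

```typescript
type ToolEvent =
  | { type: "content_block_start"; index: number; content_block: { type: string; id?: string; name?: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; partial_json?: string } }
  | { type: "content_block_stop"; index: number };

interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

function collectToolCalls(events: Iterable<ToolEvent>): ToolCall[] {
  // Raw JSON text per open tool_use block, keyed by block index.
  const pending = new Map<number, { id: string; name: string; json: string }>();
  const calls: ToolCall[] = [];
  for (const ev of events) {
    if (ev.type === "content_block_start") {
      if (ev.content_block.type === "tool_use") {
        pending.set(ev.index, { id: ev.content_block.id ?? "", name: ev.content_block.name ?? "", json: "" });
      }
    } else if (ev.type === "content_block_delta") {
      const p = pending.get(ev.index);
      if (p && ev.delta.type === "input_json_delta") p.json += ev.delta.partial_json ?? "";
    } else {
      // content_block_stop: the fragments now form complete JSON.
      const p = pending.get(ev.index);
      if (p) {
        calls.push({ id: p.id, name: p.name, input: JSON.parse(p.json || "{}") });
        pending.delete(ev.index);
      }
    }
  }
  return calls;
}
```

Parsing only at content_block_stop avoids calling JSON.parse on a half-finished string, which would throw.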
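
The specific recommendations are not reproduced here. As one common pattern for the timeout half of this topic (an assumption of this sketch, not necessarily what the chapter prescribes), an idle watchdog can wrap any event iterator and fail when no event, not even a ping, arrives within a window. `withIdleTimeout` and the `IdleTimeoutError` name are hypothetical.

```typescript
// Wraps an async iterable and throws if the gap between two events exceeds idleMs.
async function* withIdleTimeout<T>(source: AsyncIterable<T>, idleMs: number): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => {
          const err = new Error(`no event for ${idleMs} ms`);
          err.name = "IdleTimeoutError";
          reject(err);
        }, idleMs);
      });
      let result: IteratorResult<T>;
      try {
        // Whichever settles first wins: the next event or the watchdog.
        result = await Promise.race([it.next(), timeout]);
      } finally {
        clearTimeout(timer); // stop the watchdog before handing the event to the consumer
      }
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Best-effort cleanup of the source; not awaited, because a stalled source may never answer.
    it.return?.()?.catch(() => {});
  }
}
```

A caller would catch the error by name and decide whether to retry; since a fresh request produces a new generation rather than resuming the old one, any text already shown needs to be handled by the application.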

Event timeline demo

[0.2s] message_start          input_tokens=18
[0.3s] content_block_start    index=0 type=text
[0.4s] content_block_delta    "The TCP"
[0.5s] content_block_delta    " three-way handshake"
[0.6s] content_block_delta    " is..."
       ... (one delta roughly every 20 ms)
[8.1s] content_block_stop     index=0
[8.1s] message_delta          stop_reason=end_turn, output_tokens=520
[8.2s] message_stop

Time to first token ≈ 200–400 ms (Sonnet); after that, throughput is roughly 80–200 tokens per second.

Chapter summary