第9章 Agent SDK 与 Computer Use — Anthropic API 完全指南

Claude Agent SDK 是什么

Anthropic 官方的 Agent 开发框架(TypeScript 和 Python 双版本),在 Messages API 之上封装了:

自动 agent loop(tool_use → 执行 → tool_result → 继续)
内置工具:Bash、File Edit、Read、Write、Web Search、Web Fetch
Skills 机制:把常用 prompt 打包成可复用技能
MCP 客户端:接入 Model Context Protocol 服务器
权限 & 审批钩子:危险操作前暂停
Claude Code / Claude Web 都是基于它

最小 Agent

import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "克隆这个仓库然后跑测试:https://github.com/foo/bar",
  options: {
    model: "claude-sonnet-4-6",
    allowed_tools: ["Bash", "Read", "Edit", "WebFetch"],
    permission_mode: "default",   // 或 "bypass" / "acceptEdits"
  },
})) {
  if (message.type === "assistant") {
    console.log(message.content);
  }
}

一条命令就跑完"clone → 分析 → 跑测试 → 看结果"——Agent SDK 把 loop、权限、工具全包了。

和 Messages API 的关系

你的代码
   ↓
Agent SDK(loop + tools + permission)
   ↓
Messages API(原始 HTTP)
   ↓
Claude 模型

底层还是 Messages API,Agent SDK 是"production-grade loop"——不想自己写,直接用。

内置工具清单

工具	用途
Bash	执行 shell 命令
Read	读文件(支持图片/PDF)
Write	写文件
Edit	精确字符串替换
Glob	按模式找文件
Grep	ripgrep 搜索
WebFetch	抓网页并总结
WebSearch	网络搜索
Task	派生子 agent(并行任务)

通过 allowed_tools 白名单控制可用范围——只给 Read / Grep 就能做"只读分析器"。

权限模式

default

危险操作(写文件、跑 Bash)逐个征求用户同意,通过 canUseTool 回调实现

acceptEdits

文件编辑自动通过,其他仍需审批

bypass

全通过——信任模式,适合沙箱内/可信任务

plan

只能读、不能写;agent 先制定计划给你审

canUseTool 自定义拦截

options: {
  canUseTool: async (toolName, input) => {
    if (toolName === "Bash" && input.command.includes("rm -rf")) {
      return { behavior: "deny", reason: "禁止递归删除" };
    }
    if (toolName === "WebFetch" && input.url.includes("internal.company")) {
      return { behavior: "allow", updatedInput: { ...input, headers: {...} } };
    }
    return { behavior: "allow", updatedInput: input };
  },
},

自定义工具

import { tool, createSdkMcpServer } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

const weatherTool = tool(
  "get_weather",
  "获取城市天气",
  { city: z.string() },
  async ({ city }) => {
    const data = await fetchWeather(city);
    return { content: [{ type: "text", text: JSON.stringify(data) }] };
  }
);

const mcpServer = createSdkMcpServer({
  name: "my-tools",
  tools: [weatherTool],
});

// 挂进 agent
const result = query({
  prompt: "...",
  options: { mcpServers: { myTools: mcpServer } },
});

Skills 机制

把常用的"角色+工具+指令"打包成 Skill,Claude 发现场景匹配时自动启用:

--- skills/pdf-tools/SKILL.md ---
---
name: pdf-tools
description: 处理 PDF 文件 - 提取文本、合并、加水印
---

# PDF Tools

This skill helps with PDF manipulation. Use it when the user asks about:
- Extracting text from PDFs
- Merging multiple PDFs
- Adding watermarks
- Splitting by pages

## Example
User: merge these 3 PDFs into one
You: Use `python merge.py ...`

Claude 看 description 决定要不要载入整个 SKILL.md——渐进加载,节省上下文。

MCP:Model Context Protocol

MCP 是 Anthropic 2024 末推出的开放协议,让 LLM 客户端(Claude Desktop、Claude Code、VS Code 扩展等)和外部工具服务对话。现在已经事实标准。

// ~/.claude/mcp.json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "ghp_..." }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"]
    }
  }
}

一个 MCP 服务器暴露 tools / resources / prompts 三类能力。社区已有 100+ 服务器:GitHub、Slack、PostgreSQL、Puppeteer、AWS……Agent SDK 直接挂。

Computer Use

让 Claude 操作屏幕——截屏看、动鼠标、敲键盘、点按钮。三个服务端工具:

tools: [
  { type: "computer_20250124", name: "computer", display_width_px: 1280, display_height_px: 800 },
  { type: "bash_20250124",     name: "bash" },
  { type: "text_editor_20250728", name: "str_replace_editor" },
],

Claude 会产出类似:

{
  "type": "tool_use",
  "name": "computer",
  "input": {
    "action": "screenshot"
  }
}

你截屏,base64 作为 image 回塞;Claude 看图后决定下一步:

{
  "input": { "action": "left_click", "coordinate": [512, 340] }
}

Computer Use 必须沙箱
让模型控制你的主机 = 随时有危险(删文件、发邮件、登陆密码管理器)。务必用 Docker / VM / 专用账户 / 网络隔离运行。生产场景推荐 Anthropic 的托管 sandbox。

Computer Use 支持的 action

screenshot(截屏)
mouse_move / left_click / right_click / double_click / middle_click
left_click_drag(拖拽)
type(敲字符)/ key(按键盘,支持组合键 ctrl+c)
cursor_position(查当前鼠标位置)

Agent 架构选择

场景	推荐
快速原型 / 简单 agent	手写 loop(第 4 章)
生产 coding agent / 运维助手	Agent SDK
需要状态图、分支、HITL	LangGraph(见 langgraph/ 教程)
多 agent 协作	OpenAI Agents / CrewAI / Mastra
屏幕自动化	Agent SDK + Computer Use

生产部署建议

Agent 跑在容器里(每个会话独立),有超时、资源限额
日志 trace 每一步的 tool_use / tool_result 和 usage —— 搭 OpenTelemetry(见 opentelemetry/ 教程第 10 章)
所有危险操作强制走 canUseTool 审批,敏感动作需要人审
Prompt injection 防御:工具返回内容用 XML 标签包裹、不可信内容单独标记
Computer Use 一律沙箱,没例外

本章小结

    Agent SDK = Messages API 之上的"production loop + 内置工具 + 权限"
内置 Bash / Read / Write / Edit / Grep / WebFetch 等,白名单控制
Skills 渐进加载,MCP 协议接入外部工具(GitHub / Slack / DB 等)
Computer Use = screenshot + mouse/key,一律沙箱运行
自己写 loop 是基础;Agent SDK 适合生产;LangGraph 适合复杂状态

  

从 API 到 Agent

Claude Agent SDK 是什么

最小 Agent

和 Messages API 的关系

内置工具清单

权限模式

canUseTool 自定义拦截

自定义工具

Skills 机制

MCP:Model Context Protocol

Computer Use

Computer Use 支持的 action

Agent 架构选择

生产部署建议

本章小结