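Chapter 7 Sampling: Letting the Server Call the LLM

What Sampling Is

Sampling is the most distinctive and most powerful capability in the MCP protocol: it lets an MCP Server send an LLM inference request back to the MCP Client. Calls normally flow Client → Server; Sampling adds the reverse direction, Server → Client, with the request ultimately reaching the LLM.

Sampling (inference request)

Through the sampling/createMessage method, an MCP Server asks the LLM on the Client side to generate a piece of text. This lets the Server call on AI for analysis, decisions, or content generation while it executes tool logic, without needing direct access to an LLM API.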

  Normal call flow (Client → Server):
  Host/LLM ──tools/call──► Server ──performs the operation──► result returned

  Sampling call flow (Server → Client → LLM):

  Host/LLM                                        Server
     │ ── 1. tools/call ───────────────────────────► │
     │                                               │ 2. needs AI help mid-execution
     │ ◄── 3. sampling/createMessage ─────────────── │
     │        { messages: [...], maxTokens: 200 }    │
     │ 4. LLM runs inference and generates text      │
     │ ── 5. CreateMessageResult ──────────────────► │
     │                                               │ 6. continues the tool logic with the AI result
     │ ◄── 7. CallToolResult (final result) ──────── │
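
On the wire, steps 3 and 5 are an ordinary JSON-RPC request/response pair that simply travels in the opposite direction from a tool call. The sketch below shows roughly what the two messages look like, written as TypeScript object literals. The `id`, the model name, and the message texts are illustrative placeholders, not values from a real session.

```typescript
// Step 3: the Server sends this request to the Client.
const samplingRequest = {
  jsonrpc: "2.0",
  id: 42, // illustrative request id
  method: "sampling/createMessage",
  params: {
    messages: [
      {
        role: "user",
        content: { type: "text", text: "Does this log line report an error?" },
      },
    ],
    maxTokens: 200,
  },
};

// Step 5: the Client answers with the same id once the LLM has responded.
const samplingResponse = {
  jsonrpc: "2.0",
  id: 42,
  result: {
    role: "assistant",
    content: { type: "text", text: "Yes, it reports a failed database connection." },
    model: "example-model", // placeholder for the model the Client actually used
    stopReason: "endTurn",
  },
};
```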

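The sampling/createMessage capability

A Server issues an inference request by calling server.request({ method: "sampling/createMessage", ... }) on the low-level Server instance. With the high-level McpServer wrapper, the same method is reached through its underlying instance: mcpServer.server.request(...).

Request parameters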

// Shape of the `params` object in a sampling/createMessage request
interface CreateMessageRequest {
  messages: SamplingMessage[];   // conversation messages
  modelPreferences?: {
    hints?: Array<{ name?: string }>;  // suggested models
    costPriority?: number;            // cost priority, 0-1
    speedPriority?: number;           // speed priority, 0-1
    intelligencePriority?: number;    // intelligence priority, 0-1
  };
  systemPrompt?: string;     // system prompt
  includeContext?: "none" | "thisServer" | "allServers";
  temperature?: number;      // sampling temperature
  maxTokens: number;         // maximum number of tokens (required)
  stopSequences?: string[];  // stop sequences
  metadata?: object;         // custom metadata
}
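
Only `messages` and `maxTokens` are required, and the three priorities are expected to lie between 0 and 1. A small guard that checks these constraints before sending can catch mistakes early. The helper below is a hypothetical sketch: `checkSamplingParams` and the local `SamplingParams` type are not SDK names, and the "messages must not be empty" rule is a sanity check added here rather than something stated above.

```typescript
// Local, simplified mirror of the request parameters (assumption: text-only content).
interface SamplingParams {
  messages: Array<{
    role: "user" | "assistant";
    content: { type: "text"; text: string };
  }>;
  modelPreferences?: {
    costPriority?: number;
    speedPriority?: number;
    intelligencePriority?: number;
  };
  maxTokens: number;
}

// Hypothetical helper: returns human-readable problems; an empty array means OK.
function checkSamplingParams(p: SamplingParams): string[] {
  const problems: string[] = [];
  if (p.messages.length === 0) {
    problems.push("messages must not be empty");
  }
  if (!Number.isInteger(p.maxTokens) || p.maxTokens <= 0) {
    problems.push("maxTokens must be a positive integer");
  }
  // Every priority that is present must fall within [0, 1].
  const prefs = p.modelPreferences ?? {};
  for (const [key, value] of Object.entries(prefs)) {
    if (typeof value === "number" && (value < 0 || value > 1)) {
      problems.push(`${key} must be between 0 and 1`);
    }
  }
  return problems;
}
```

Returning a list instead of throwing lets the caller decide whether to abort the tool call or to fall back to safer defaults.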

Complete implementation example

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { CallToolRequestSchema, CreateMessageResultSchema }
  from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "ai-assistant-server", version: "1.0.0" },
  {
    capabilities: {
      tools: {},
      // Note: sampling is a Client capability. The Server does not declare it;
      // it only needs to confirm during initialization that the Client supports it.
    }
  }
);

// A smart summarization tool that uses sampling
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== "smart_summarize") {
    throw new Error("Unknown tool");
  }

  const { text, style } = req.params.arguments as {
    text: string;
    style: "bullet" | "paragraph" | "tweet";
  };

  // Ask the LLM for a summary through sampling
  const result = await server.request(
    {
      method: "sampling/createMessage",
      params: {
        messages: [
          {
            role: "user",
            content: {
              type: "text",
              text: `Summarize the following article as ${
                style === "bullet" ? "a bullet list" :
                style === "paragraph" ? "a single paragraph" :
                "a single tweet (140 characters or fewer)"
              }:\n\n${text}`,
            },
          },
        ],
        systemPrompt: "You are a professional summarization assistant. Extract the key points and write concisely and clearly.",
        maxTokens: style === "tweet" ? 100 : 500,
        temperature: 0.3,
        // Prefer a fast model (cost and speed over intelligence)
        modelPreferences: {
          speedPriority: 0.8,
          costPriority: 0.7,
          intelligencePriority: 0.5,
        },
      },
    },
    CreateMessageResultSchema  // schema used to validate the response
  );

  return {
    content: [{
      type: "text",
      text: result.content.type === "text" ? result.content.text : "(could not generate a summary)",
    }],
  };
});
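
The handler above assumes the Sampling call always succeeds. In practice the Client may not support Sampling, or the user may reject the request, so it can be worth turning a failure into an error tool result instead of letting the exception escape. The sketch below illustrates that idea with an injected `sample` function standing in for the `server.request(...)` call. `SampleFn`, `summarizeSafely`, and the simplified result types are illustrative names, not SDK exports.

```typescript
// Simplified result shapes (assumption: only text content is useful here).
type SamplingContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface SamplingResult {
  content: SamplingContent;
  model: string;
}

interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

// Stand-in for the real sampling call, injected so it can be swapped out.
type SampleFn = (prompt: string, maxTokens: number) => Promise<SamplingResult>;

async function summarizeSafely(text: string, sample: SampleFn): Promise<ToolResult> {
  try {
    const result = await sample(`Summarize the following text:\n\n${text}`, 500);
    if (result.content.type !== "text") {
      return {
        content: [{ type: "text", text: "Sampling returned non-text content." }],
        isError: true,
      };
    }
    return { content: [{ type: "text", text: result.content.text }] };
  } catch (err) {
    // Covers a rejected request, a missing capability, or a transport failure.
    const reason = err instanceof Error ? err.message : String(err);
    return {
      content: [{ type: "text", text: `Sampling failed: ${reason}` }],
      isError: true,
    };
  }
}
```

Reporting the failure through `isError` keeps the calling model informed, so it can tell the user why no summary was produced.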

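Human-in-the-loop design

A key design principle of Sampling is human-in-the-loop: every Sampling request must be explicitly approved by the user. This is central to MCP's security model.

Human-in-the-loop

A design pattern in which an AI system leaves the decision to a human when it performs a critical operation. In MCP Sampling, before the Host forwards an inference request to the LLM, it must show the request to the user and obtain explicit permission. The user can review, modify, or reject the request.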

  MCP Server              MCP Host/Client             User

  sampling/createMessage ──►
                              show the request ──────► 👤 reviews
                              await approval ◄──────── ✓ approve / ✗ reject

                              (the user may edit the messages)

                              send to the LLM
                              ◄── LLM response

  ◄── CreateMessageResult
      (model: "claude-3-5-sonnet", content: "...")

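Security warning: Letting Sampling requests bypass human-in-the-loop approval is a serious security risk. A malicious MCP Server could use Sampling requests to induce the LLM to generate harmful content or leak sensitive information. A Host implementation must show the user the details of every Sampling request before sending it, and must never approve one automatically.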
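
On the Host side, the approval step can be modeled as a gate that sits between the incoming request and the LLM call. The following is a minimal sketch of such a gate, not a description of how any particular Host implements it. `Decision`, `gateSamplingRequest`, `askUser`, and `callLlm` are hypothetical names, and the real UI and LLM client are passed in as callbacks.

```typescript
interface SamplingRequest {
  messages: Array<{ role: "user" | "assistant"; text: string }>;
  maxTokens: number;
}

// What the user decided after reviewing the request.
type Decision =
  | { kind: "approve" }
  | { kind: "edit"; request: SamplingRequest } // the user changed the messages
  | { kind: "reject" };

async function gateSamplingRequest(
  request: SamplingRequest,
  askUser: (r: SamplingRequest) => Promise<Decision>,
  callLlm: (r: SamplingRequest) => Promise<string>,
): Promise<string> {
  // Always ask: there is deliberately no auto-approve path.
  const decision = await askUser(request);
  if (decision.kind === "reject") {
    throw new Error("Sampling request rejected by user");
  }
  // Forward the edited request if the user changed it, otherwise the original.
  const finalRequest = decision.kind === "edit" ? decision.request : request;
  return callLlm(finalRequest);
}
```

Because the LLM callback is only reachable after `askUser` resolves, a rejected request never reaches the model.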

Practical use cases

Scenario 1: AI-assisted code analysis

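// After reading the code, use AI to analyze it for potential problems.
// Assumes Server and CreateMessageResultSchema are imported as in the previous example.
async function analyzeCodeWithAI(
  code: string,
  server: Server
): Promise<string> {
  const result = await server.request({
    method: "sampling/createMessage",
    params: {
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze this code and identify possible security vulnerabilities (answer concisely):\n\`\`\`\n${code}\n\`\`\``,
        },
      }],
      maxTokens: 300,
      systemPrompt: "You are a security code auditor. Report only security issues that actually exist.",
    },
  }, CreateMessageResultSchema);

  // Non-text content yields an empty string
  return result.content.type === "text" ? result.content.text : "";
}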

Scenario 2: Multi-step intelligent task

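import fs from "node:fs/promises";

// Automated refactoring tool: read file → AI analysis → generate a refactoring proposal → user confirms → write
// Note: a Server keeps one handler per method, so a real server would dispatch on
// req.params.name inside a single CallToolRequestSchema handler.
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== "auto_refactor") {
    throw new Error("Unknown tool");
  }

  const { filePath } = req.params.arguments as { filePath: string };

  // Step 1: read the original code
  const original = await fs.readFile(filePath, "utf-8");

  // Step 2: use AI to generate the refactored code
  const refactored = await server.request({
    method: "sampling/createMessage",
    params: {
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Refactor the following code to improve readability and performance. Return only the refactored code, with no explanation:\n\n${original}`,
        },
      }],
      maxTokens: 2000,
    },
  }, CreateMessageResultSchema);

  // Step 3: return the proposal for the user to review (do not write it directly)
  const refactoredCode = refactored.content.type === "text"
    ? refactored.content.text
    : "(generation failed)";

  return {
    content: [{
      type: "text",
      text: `A refactoring proposal has been generated. If you accept it, use the write_file tool to write it:\n\n${refactoredCode}`,
    }],
  };
});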

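Setting safety boundaries

When implementing Sampling, set sensible safety boundaries:

maxTokens limits

Always set a reasonable maxTokens ceiling so that unexpectedly long output cannot run up costs. Use 200-500 for simple judgment tasks and 1000-2000 for content generation tasks.

systemPrompt constraints

Use systemPrompt to state explicitly what the AI may do, for example "answer technical questions only" or "produce nothing other than code", to reduce the chance of out-of-scope behavior.

includeContext control

Use includeContext: "allServers" with caution. It sends the context of every Server to the LLM, which can leak data across Servers. Default to "none".

Rate limiting

Implement request rate limiting inside the Server so that a tool cannot issue Sampling requests endlessly in a loop.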
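
One simple way to enforce such a limit is a sliding-window counter that every Sampling call must pass through. The class below is a sketch under that assumption. `SamplingRateLimiter` and its parameters are illustrative names, and the clock is injected so the behavior can be checked without waiting.

```typescript
class SamplingRateLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    maxRequests: number,
    windowMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the call if it fits in the window; false otherwise.
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have left the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(t);
    return true;
  }
}
```

A tool handler would call `tryAcquire()` before each Sampling request and return an error result when it yields false, which stops a runaway loop after `maxRequests` calls per window.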

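When to use Sampling: Sampling suits cases that need AI-assisted judgment but are not part of the main conversation flow, such as content summarization, code analysis, automatic classification, and format conversion. Do not use Sampling to make decisions on the user's behalf; doing so bypasses the protection that human-in-the-loop provides.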

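Chapter summary

Sampling is the most distinctive capability in the MCP protocol: a Server can send an LLM request back to the Host, giving server-side logic AI-in-the-loop intelligence. This breaks the traditional one-way pattern of "AI calls tools" and lets server-side logic draw on the reasoning ability of a language model as well.

Core safeguard: human-in-the-loop. Every Sampling request must be approved by the user through the Host before it is sent to the LLM, so the user always keeps control over what the AI does.

Typical uses: intelligent code analysis (calling the LLM from a commit hook to assess code quality), automatic content classification, and orchestration of multi-step AI workflows. Apply rate limiting so that Sampling is never triggered endlessly in a loop.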
