Chapter 5: Unifying Tool Calling Across Providers

What Tool Calling Does

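An LLM by itself can only generate text; it cannot query a database, send an email, or call an API. Tool calling is a protocol that works like this:

You send the model a list of available tools (function name + parameter schema) along with the conversation.
When the model decides it needs a tool, it returns a structured tool_call: a function name plus a JSON argument object.
Your code executes that function and feeds the result back to the model as a message with role=tool.
Having seen the result, the model either writes the final reply or decides to call another tool.

This is the basic loop of an Agent. The key point: the model never executes any code itself; execution is your responsibility. The model only decides which tool to call and what arguments to pass.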
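
To make the message flow concrete, here is a minimal sketch of one round trip that needs no network access. `fake_model` and `get_time` are hypothetical stand-ins invented for this illustration (a scripted function plays the model); only the message shapes follow the OpenAI-style format used throughout this chapter. Sending the tool list itself is omitted because the scripted model does not need it.

```python
import json

def get_time(city: str) -> dict:
    """Hypothetical tool; a real one might call an external API."""
    return {"city": city, "time": "09:30"}

def fake_model(messages):
    """Scripted stand-in for an LLM: requests the tool once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        return {"role": "assistant",
                "content": f"It is {data['time']} in {data['city']}."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_time",
                         "arguments": json.dumps({"city": "Beijing"})},
        }],
    }

messages = [{"role": "user", "content": "What time is it in Beijing?"}]

# The "model" returns a structured tool_call instead of text
reply = fake_model(messages)
messages.append(reply)

# Our code runs the function and feeds the result back as role=tool
for call in reply.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])
    result = get_time(**args)
    messages.append({"role": "tool",
                     "tool_call_id": call["id"],
                     "content": json.dumps(result)})

# The "model" sees the result and writes the final answer
final = fake_model(messages)
print(final["content"])
```

Note that the history ends up as user, assistant (with tool_calls), tool: the same ordering a real provider expects.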

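The OpenAI tools Format: The Common Currency

LiteLLM expects you to always write tool definitions in OpenAI's tools format. Internally it translates them into Claude's tool_use, Gemini's function_declarations, Bedrock Converse's toolConfig, and so on.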

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city":    {"type": "string", "description": "City name, e.g. Beijing"},
                    "unit":    {"type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "default": "celsius"},
                },
                "required": ["city"],
            },
        },
    }
]

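Four key points:

name

The function name. It must be a valid identifier (letters, digits, underscores; OpenAI also accepts hyphens, up to 64 characters). The model refers to the tool by this name.

description

A one-sentence description of what the tool does. This is the main thing the model relies on when deciding whether and when to call the tool, so make it as clear as you can.

parameters

Standard JSON Schema: type: object, with properties describing each field and required listing the mandatory ones.

field description

The model reads the description of every field. Ambiguities such as "English or Chinese city name?" or "with or without a unit?" are resolved here and nowhere else.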
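
The four points above can be checked mechanically before a definition ever reaches a model. The following is a minimal sketch of such a lint pass; `lint_tool` and its rules are illustrative assumptions for this chapter, not part of LiteLLM, and the name pattern reflects OpenAI's documented constraint.

```python
import re

# Letters, digits, underscores and hyphens, 1 to 64 characters
NAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def lint_tool(spec: dict) -> list[str]:
    """Return a list of human-readable problems found in one tool definition."""
    problems = []
    fn = spec.get("function", {})
    if not NAME_RE.match(fn.get("name", "")):
        problems.append("name must be 1-64 chars of letters, digits, '_' or '-'")
    if not fn.get("description", "").strip():
        problems.append("description is empty")
    params = fn.get("parameters", {})
    if params.get("type") != "object":
        problems.append("parameters.type must be 'object'")
    props = params.get("properties", {})
    for field in params.get("required", []):
        if field not in props:
            problems.append(f"required field '{field}' is not in properties")
    for field, schema in props.items():
        if not schema.get("description"):
            problems.append(f"field '{field}' has no description")
    return problems

good = {"type": "function", "function": {
    "name": "get_weather",
    "description": "Look up the current weather for a given city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string",
                                           "description": "City name, e.g. Beijing"}},
                   "required": ["city"]}}}

bad = {"type": "function", "function": {
    "name": "get weather",          # space is not allowed
    "description": "",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city", "unit"]}}}

print(lint_tool(good))
for problem in lint_tool(bad):
    print("-", problem)
```

Running a check like this in CI is cheap insurance: a missing field description does not fail at request time, it just quietly degrades tool selection.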

A Minimal Complete Example: One Tool-Call Round Trip

import json
from litellm import completion

def get_weather(city: str, unit: str = "celsius") -> dict:
    # In the real world this would probably be an HTTP request
    return {"city": city, "temp": 22, "unit": unit, "condition": "cloudy"}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Beijing right now? Use Celsius."}]

# === Step 1: let the model decide whether to call a tool ===
resp = completion(
    model="gpt-4o",
    messages=messages,
    tools=TOOLS,
    tool_choice="auto",     # the model decides on its own
)

msg = resp.choices[0].message
messages.append(msg.model_dump(exclude_none=True))   # append the assistant turn to the history

# === Step 2: execute every tool_call ===
for tc in msg.tool_calls or []:
    args = json.loads(tc.function.arguments)
    result = get_weather(**args)
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": json.dumps(result, ensure_ascii=False),
    })

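# === Step 3: hand the tool results back so the model can write the final answer ===
final = completion(model="gpt-4o", messages=messages, tools=TOOLS)
print(final.choices[0].message.content)
# Example output (wording varies between runs): It is currently cloudy in Beijing, 22°C.

This snippet is the skeleton of every tool-calling flow. A production Agent simply wraps it in a loop that runs until the model stops returning tool_calls.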

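Switching This Code to Claude?

Change the model string and set max_tokens:

resp = completion(
    model="anthropic/claude-sonnet-4-5",      # change this
    messages=messages,
    tools=TOOLS,
    tool_choice="auto",
    max_tokens=1024,                            # required by Anthropic's API; set it explicitly
)

Nothing else needs to change. Under the hood, LiteLLM performs these translations: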

Concept	OpenAI native	Anthropic native	Gemini native	LiteLLM surface
Tool definition	`tools=[{type:"function",function:{...}}]`	`tools=[{name,description,input_schema}]`	`tools=[{functionDeclarations:[...]}]`	OpenAI format
Tool call in the response	`message.tool_calls`	`content=[{type:"tool_use",...}]`	`parts=[{functionCall:{...}}]`	`message.tool_calls`
Tool call id	`tool_calls[i].id`	`tool_use.id`	no official id	`tool_calls[i].id`
Returning the tool result	`role:"tool",tool_call_id:...`	`role:"user",content=[{type:"tool_result",...}]`	`role:"user",parts=[{functionResponse:{...}}]`	`role:"tool",tool_call_id:...`
Forcing a specific tool	`tool_choice={type:"function",function:{name}}`	`tool_choice={type:"tool",name}`	`tool_config.function_calling_config={mode:"ANY",allowed_function_names:[name]}`	OpenAI format

This table is essentially the most hard-core work LiteLLM does at this layer. As long as you trust its translation, you rarely need the low-level details in day-to-day use.
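
To give a feel for what the first and fourth rows of the table involve, here is a rough sketch of converting an OpenAI-format tool definition and tool result into the Anthropic and Gemini shapes listed above. These helper functions are hypothetical illustrations written for this chapter; they are not LiteLLM's actual implementation, which handles many more edge cases.

```python
def to_anthropic_tool(openai_tool: dict) -> dict:
    """OpenAI {type, function:{...}} -> Anthropic {name, description, input_schema}."""
    fn = openai_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

def to_gemini_tools(openai_tools: list[dict]) -> list[dict]:
    """OpenAI tool list -> a single Gemini entry holding functionDeclarations."""
    declarations = [
        {
            "name": t["function"]["name"],
            "description": t["function"].get("description", ""),
            "parameters": t["function"].get("parameters", {}),
        }
        for t in openai_tools
    ]
    return [{"functionDeclarations": declarations}]

def to_anthropic_tool_result(tool_msg: dict) -> dict:
    """OpenAI role=tool message -> Anthropic user message carrying a tool_result."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_msg["tool_call_id"],
            "content": tool_msg["content"],
        }],
    }

openai_tool = {"type": "function", "function": {
    "name": "get_weather",
    "description": "Return the current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]}}}

anthropic_tool = to_anthropic_tool(openai_tool)
gemini_tools = to_gemini_tools([openai_tool])
anthropic_result = to_anthropic_tool_result(
    {"role": "tool", "tool_call_id": "toolu_01", "content": '{"temp": 22}'})
print(anthropic_tool["name"], list(gemini_tools[0]), anthropic_result["role"])
```

The schema itself passes through almost unchanged; what differs between providers is mostly the envelope around it and the role used for results.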

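tool_choice: Who Decides Whether a Tool Is Called

tool_choice = "auto"

The default. The model decides whether to call a tool and simply returns text if it sees no need.

tool_choice = "none"

Tool calls are forbidden, whatever the tools list contains. Useful for scenarios such as "just have the model summarize".

tool_choice = "required"

The model must call a tool (any one of them). Common in classifier-style scenarios.

tool_choice = {"type": "function", "function": {"name": "get_weather"}}

The model must call the named tool. Typical for structured extraction, where you force the output into a particular schema.

Compatibility notes
· "required" and forcing a specific tool are supported by OpenAI, Anthropic (as any / a named tool), and Gemini (ANY mode, optionally restricted to named functions).
· Older models (such as GPT-3.5) and some open-source models (small models served through Ollama) may not support "required", or only partially. Your tests must cover the models you actually use.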
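
As an illustration of the mapping described in the compatibility notes, the sketch below converts the four OpenAI-style tool_choice values into Anthropic's native form. The function name is hypothetical and this is not LiteLLM's code; in particular, the `{"type": "none"}` value is an assumption based on Anthropic's current API and should be checked against the API version you target.

```python
def to_anthropic_tool_choice(tool_choice):
    """Map an OpenAI-style tool_choice to Anthropic's native tool_choice dict."""
    if tool_choice == "auto":
        return {"type": "auto"}
    if tool_choice == "required":
        # Anthropic calls "must use some tool" any
        return {"type": "any"}
    if tool_choice == "none":
        # Assumption: supported by recent Anthropic API versions
        return {"type": "none"}
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
        return {"type": "tool", "name": tool_choice["function"]["name"]}
    raise ValueError(f"unsupported tool_choice: {tool_choice!r}")

for choice in ["auto", "required", "none",
               {"type": "function", "function": {"name": "get_weather"}}]:
    print(choice, "->", to_anthropic_tool_choice(choice))
```

Raising on unknown values, rather than falling back to auto, makes a typo such as "requried" fail loudly instead of silently changing behavior.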

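Parallel Tool Calls

GPT-4o, Claude 4, and Gemini 2 can all return several tool_calls in one response. If the user asks for "the weather in Beijing and Shanghai", for example, the model may return two get_weather calls at once.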

resp = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Beijing and Shanghai today?"}],
    tools=TOOLS,
    parallel_tool_calls=True,   # already the default on OpenAI
)

for tc in resp.choices[0].message.tool_calls or []:
    print(tc.function.name, tc.function.arguments)
# Typical output:
# get_weather {"city":"Beijing"}
# get_weather {"city":"Shanghai"}

The two calls can then be executed in parallel:

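import asyncio
import json

async def run_tool(tc):
    args = json.loads(tc.function.arguments)
    # get_weather is synchronous, so run it in a worker thread to avoid
    # blocking the event loop; a truly async tool (e.g. one built on httpx)
    # would be awaited directly instead
    result = await asyncio.to_thread(get_weather, **args)
    return {
        "role": "tool",
        "tool_call_id": tc.id,
        "content": json.dumps(result, ensure_ascii=False),
    }

async def run_all_tools(msg, messages):
    # gather preserves the order of msg.tool_calls in its results
    tool_msgs = await asyncio.gather(*(run_tool(tc) for tc in msg.tool_calls))
    messages.extend(tool_msgs)

asyncio.run(run_all_tools(msg, messages))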

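Parallel execution is the first thing to reach for when optimizing Agent latency. Three tools that each take about 1 second need roughly 3 seconds when run one after another, but only about 1 second in parallel.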
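
The latency claim is easy to verify with a self-contained timing sketch. `slow_tool` is a hypothetical stand-in that simulates I/O-bound tools with asyncio.sleep; the delays are shortened to 0.05 seconds so the demo runs quickly.

```python
import asyncio
import time

async def slow_tool(name: str, delay: float = 0.05) -> str:
    """Simulates an I/O-bound tool such as an HTTP call."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def run_sequential(names):
    # Each call waits for the previous one to finish
    return [await slow_tool(n) for n in names]

async def run_parallel(names):
    # All calls are in flight at the same time
    return list(await asyncio.gather(*(slow_tool(n) for n in names)))

def timed(coro_fn, names):
    start = time.perf_counter()
    results = asyncio.run(coro_fn(names))
    return results, time.perf_counter() - start

names = ["weather", "flights", "hotels"]
seq_results, seq_time = timed(run_sequential, names)
par_results, par_time = timed(run_parallel, names)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The speed-up only applies to tools that spend their time waiting (network, disk); CPU-bound tools gain nothing from asyncio.gather.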

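When to Turn Off parallel_tool_calls

Sometimes you do not want parallel calls, for example when the second tool depends on the result of the first (read from the database, then write to it). In that case: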

resp = completion(
    model="gpt-4o",
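    messages=messages,
    tools=TOOLS,
    parallel_tool_calls=False,    # return at most one tool_call per response
)

parallel_tool_calls is an OpenAI parameter. LiteLLM maps it to other providers where an equivalent exists (Anthropic, for example, exposes disable_parallel_tool_use), but for some providers the only option is to steer the model through the prompt, which is not guaranteed to be obeyed every time.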
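
Because the setting is not guaranteed everywhere, a client-side guard can enforce "one tool call per turn" regardless of provider. The sketch below is one possible approach, not a LiteLLM feature: `keep_first_tool_call` is a hypothetical helper that trims the assistant message before it is stored, so the history never contains a tool_call without a matching result.

```python
def keep_first_tool_call(assistant_msg: dict) -> dict:
    """Return a copy of the assistant message with at most one tool_call."""
    calls = assistant_msg.get("tool_calls") or []
    if len(calls) <= 1:
        return assistant_msg
    trimmed = dict(assistant_msg)      # shallow copy; the original is untouched
    trimmed["tool_calls"] = calls[:1]
    return trimmed

assistant_msg = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "read_row", "arguments": '{"id": 7}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "write_row", "arguments": '{"id": 7}'}},
    ],
}

guarded = keep_first_tool_call(assistant_msg)
print([c["id"] for c in guarded["tool_calls"]])
```

After executing the single remaining call and returning its result, the model gets another turn and can request the dependent call with the first result in view.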

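strict mode: Enforcing the JSON Schema

OpenAI's strict: true makes the model return JSON that conforms exactly to the schema: no extra fields and no type errors. The price is that only a subset of JSON Schema is supported (no default, partial restrictions on anyOf, and so on).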

TOOLS = [{
    "type": "function",
    "function": {
        "name": "classify_ticket",
        "description": "Classify a customer support ticket",
        "strict": True,         # strict mode
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string",
                          "enum": ["bug", "feature", "billing", "other"]},
                "urgency": {"type": "integer", "minimum": 1, "maximum": 5},
            },
            "required": ["label", "urgency"],
            "additionalProperties": False,  # mandatory in strict mode
        },
    },
}]
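
Since a schema that breaks the strict-mode rules is only rejected at request time, it can help to check definitions up front. The following is a minimal sketch of such a check covering three rules: additionalProperties must be False, every property must be listed in required, and default is not allowed. `strict_problems` is a hypothetical helper; it handles nested objects only (not arrays or anyOf) and is not an exhaustive statement of OpenAI's rules.

```python
def strict_problems(schema: dict, path: str = "parameters") -> list[str]:
    """List violations of a few strict-mode rules in a JSON Schema object."""
    problems = []
    if schema.get("type") != "object":
        return problems
    props = schema.get("properties", {})
    if schema.get("additionalProperties") is not False:
        problems.append(f"{path}: additionalProperties must be False")
    missing = sorted(set(props) - set(schema.get("required", [])))
    if missing:
        problems.append(f"{path}: not listed in required: {missing}")
    for name, sub in props.items():
        if "default" in sub:
            problems.append(f"{path}.{name}: 'default' is not allowed")
        # Recurse into nested objects
        problems.extend(strict_problems(sub, f"{path}.{name}"))
    return problems

strict_ok = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["bug", "feature", "billing", "other"]},
        "urgency": {"type": "integer"},
    },
    "required": ["label", "urgency"],
    "additionalProperties": False,
}

not_strict = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"],
                 "default": "celsius"},
    },
    "required": ["city"],
}

print(strict_problems(strict_ok))
for problem in strict_problems(not_strict):
    print("-", problem)
```

The second schema mirrors the get_weather definition from earlier in the chapter: perfectly fine in normal mode, but it trips all three rules once strict is switched on.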

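strict support by provider:

OpenAI GPT-4o / o1 / o3

Natively supported; a completed response follows the schema 100% of the time (a refusal or an output truncated by max_tokens can still leave you without valid JSON).

Anthropic Claude

The protocol has no strict field, but Claude already follows the JSON schema closely in tool-use scenarios. LiteLLM silently ignores strict.

Gemini

Implements strict behavior through responseSchema, with coverage close to OpenAI's.

Open-source models

It depends on the inference backend: vLLM with guided_json can be strict, whereas Ollama's json mode is only "JSON on a best-effort basis". Do not assume strict is available.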

The Multi-Turn Agent Loop

A real Agent has to loop, because the model may call tools over several consecutive rounds. The standard template:

def run_agent(user_input, max_iters=8):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": user_input},
    ]
    for i in range(max_iters):
        resp = completion(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
            max_tokens=1024,
        )
        msg = resp.choices[0].message
        messages.append(msg.model_dump(exclude_none=True))

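        # No tool calls means the model has produced its final answer
        if not msg.tool_calls:
            return msg.content

        # Execute every requested tool.
        # TOOL_REGISTRY maps tool names to callables (see the registration section below).
        for tc in msg.tool_calls:
            try:
                # Inside the try so that unknown tool names and malformed
                # JSON arguments are reported to the model as well
                fn     = TOOL_REGISTRY[tc.function.name]
                args   = json.loads(tc.function.arguments)
                result = fn(**args)
            except Exception as e:
                # Feed the error back so the model can decide whether to retry
                result = {"error": f"{type(e).__name__}: {e}"}
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result, ensure_ascii=False),
            })

    return "<max iterations reached>"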

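Five lessons for a robust Agent:

Set max_iters: a loop without an upper bound can burn money indefinitely. 8 to 10 is a safe value.
Feed tool errors back instead of raising: once the model sees the error message, it can retry with different arguments or apologize.
Return tool_call_id verbatim: if even one character differs, the provider rejects the request because the result no longer matches any call (Claude is especially strict).
After changing tool definitions, clear old conversation history: stored tool_calls that refer to tools no longer in the tools list can cause the request to be rejected or confuse the model.
An assistant message must not lack both tool_calls and content: it needs one or the other, otherwise some providers refuse it. msg.model_dump(exclude_none=True) strips the None fields and is recommended.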
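
The second and third lessons can be packaged into one small helper that never raises and always echoes the original id. This is a sketch under the assumption that tool calls expose `.id`, `.function.name`, and `.function.arguments` as in the examples above; `execute_tool_call` and `fake_call` are hypothetical names, and SimpleNamespace objects stand in for real response objects so the example runs offline.

```python
import json
from types import SimpleNamespace

def execute_tool_call(tc, registry: dict) -> dict:
    """Run one tool call and always return a well-formed role=tool message."""
    name = tc.function.name
    if name not in registry:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            args = json.loads(tc.function.arguments)
            result = registry[name](**args)
        except json.JSONDecodeError as e:
            result = {"error": f"arguments are not valid JSON: {e.msg}"}
        except Exception as e:
            # Wrong argument names, tool-internal failures, and so on
            result = {"error": f"{type(e).__name__}: {e}"}
    return {
        "role": "tool",
        "tool_call_id": tc.id,   # echoed verbatim
        "content": json.dumps(result, ensure_ascii=False),
    }

def fake_call(call_id, name, arguments):
    """Builds an object shaped like a tool_call entry, for demonstration."""
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=arguments))

registry = {"divide": lambda a, b: {"value": a / b}}

ok       = execute_tool_call(fake_call("c1", "divide", '{"a": 6, "b": 3}'), registry)
bad_json = execute_tool_call(fake_call("c2", "divide", '{"a": 6'), registry)
by_zero  = execute_tool_call(fake_call("c3", "divide", '{"a": 1, "b": 0}'), registry)
unknown  = execute_tool_call(fake_call("c4", "multiply", '{}'), registry)

for m in (ok, bad_json, by_zero, unknown):
    print(m["tool_call_id"], m["content"])
```

Every branch produces a message the provider will accept, so one misbehaving tool cannot break the conversation history.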

Engineering the Tool Registry

An Agent usually has dozens of tools, and a hand-written TOOLS list gets messy fast. A good approach is "a function is a tool":

import inspect

TOOL_REGISTRY = {}
TOOL_SPECS    = []

def tool(fn):
    """Turn a plain Python function into an LLM tool automatically."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, p in sig.parameters.items():
        props[name] = {"type": "string"}   # simplified; real code should inspect the type hint
        if p.default is inspect.Parameter.empty:
            required.append(name)

    spec = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": fn.__doc__ or "",
            "parameters": {
                "type": "object",
                "properties": props,
                "required": required,
            },
        },
    }
    TOOL_REGISTRY[fn.__name__] = fn
    TOOL_SPECS.append(spec)
    return fn

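@tool
def get_weather(city: str) -> dict:
    """Return the current weather for the given city"""
    return {"city": city, "temp": 22}

@tool
def search_flight(origin: str, dest: str) -> list:
    """Search for flights from the origin to the destination"""
    return [...]

# TOOL_SPECS is now the tools list to pass to completion

Production frameworks (OpenAI Agents SDK, Pydantic AI, LangChain) all ship more complete versions that read type hints automatically, handle Pydantic models, and generate accurate schemas. The code here is only an illustration.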
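
As a small step toward what those frameworks do, the sketch below reads type hints instead of hard-coding "string". `build_spec`, `JSON_TYPES`, and `search_hotel` are hypothetical names for this illustration; only a handful of primitive types are mapped, and anything unrecognized falls back to "string".

```python
import inspect
from typing import get_type_hints

# Minimal Python-type -> JSON Schema type mapping (deliberately incomplete)
JSON_TYPES = {str: "string", int: "integer", float: "number",
              bool: "boolean", list: "array", dict: "object"}

def build_spec(fn) -> dict:
    """Build an OpenAI-format tool definition from a function signature."""
    hints = get_type_hints(fn)
    props, required = {}, []
    for name, p in inspect.signature(fn).parameters.items():
        props[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if p.default is inspect.Parameter.empty:
            required.append(name)   # no default value means mandatory
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {"type": "object",
                           "properties": props,
                           "required": required},
        },
    }

def search_hotel(city: str, nights: int, max_price: float = 200.0,
                 pets: bool = False) -> list:
    """Search hotels in a city (hypothetical example tool)."""
    return []

spec = build_spec(search_hotel)
print(spec["function"]["parameters"])
```

Getting integer and boolean right matters more than it looks: with everything declared as "string", the model will send "3" instead of 3 and the tool has to coerce types itself.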

Streaming + Tool Calling

As Chapter 4 explained, streamed tool calls should be merged safely with stream_chunk_builder. Here is one more complete snippet:

from litellm import completion, stream_chunk_builder

resp = completion(
    model="anthropic/claude-sonnet-4-5",
    messages=messages,
    tools=TOOLS,
    max_tokens=1024,
    stream=True,
)

chunks = []
for chunk in resp:
    chunks.append(chunk)
    delta = chunk.choices[0].delta
    # Show reasoning / final text to the user in real time
    if delta.content:
        print(delta.content, end="", flush=True)

full = stream_chunk_builder(chunks)
msg = full.choices[0].message

if msg.tool_calls:
    for tc in msg.tool_calls:
        print(f"\n[calling {tc.function.name}]")
        args = json.loads(tc.function.arguments)
        # ... execute and feed the result back
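
To see why merging is needed at all, here is a rough sketch of what happens to tool calls in a stream: each chunk carries only a fragment of the arguments string, keyed by an index. `merge_tool_call_deltas` is a hypothetical illustration operating on plain dicts in the OpenAI delta shape; it is not how stream_chunk_builder is implemented, and in real code you should keep using the library function.

```python
import json

def merge_tool_call_deltas(deltas: list[dict]) -> list[dict]:
    """Reassemble streamed tool_call fragments, grouped by their index."""
    merged: dict[int, dict] = {}
    for d in deltas:
        slot = merged.setdefault(d["index"],
                                 {"id": None, "name": "", "arguments": ""})
        if d.get("id"):
            slot["id"] = d["id"]            # usually only on the first fragment
        fn = d.get("function", {})
        if fn.get("name"):
            slot["name"] = fn["name"]
        slot["arguments"] += fn.get("arguments") or ""
    return [merged[i] for i in sorted(merged)]

# Fragments as they might arrive; no single one is valid JSON on its own
deltas = [
    {"index": 0, "id": "call_a", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city": "Bei'}},
    {"index": 1, "id": "call_b", "function": {"name": "get_weather", "arguments": '{"city":'}},
    {"index": 0, "function": {"arguments": 'jing"}'}},
    {"index": 1, "function": {"arguments": ' "Shanghai"}'}},
]

calls = merge_tool_call_deltas(deltas)
for c in calls:
    print(c["id"], c["name"], json.loads(c["arguments"]))
```

Calling json.loads on any individual fragment would fail, which is why arguments must only be parsed after the stream has finished.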

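JSON mode vs Tool Calling: Two Routes to Structured Output

Many people confuse the two. Structured output has two mainstream implementations:

Aspect	JSON mode / response_format	Tool calling (tool_choice=forced)
Typical use	A single output that is itself JSON, no side effects	Agent flows, possibly multi-turn
Schema strictness	Native strict on recent OpenAI models and Gemini	Every provider accepts a schema; strictness varies
Shape of the result	The return value is the structured data itself	The return value is "function to call + arguments"
Streaming to the UI	Some providers support streaming partial JSON	content can stream, and so can tool_call arguments
Cross-provider consistency	Uneven; OpenAI is the best	Fairly uniform

Rules of thumb:

You only need structured data and no follow-up action (such as "extract the name and email from a resume") → use response_format (covered in Chapter 6).
You actually need to do something (query a database, send an email, call an API) → use tools.
A classifier that must follow exactly one path (such as "one of four labels + confidence") → use tool_choice={a specific tool}, which effectively uses the tool as a schema.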
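
For the third rule, the consuming side might look like the sketch below: since the call was forced, the code expects exactly one tool call and turns its arguments into a typed object. `Classification`, `parse_forced_call`, and the `classify` tool name are hypothetical; a SimpleNamespace object stands in for `resp.choices[0].message` so the example runs without a network call.

```python
import json
from dataclasses import dataclass
from types import SimpleNamespace

ALLOWED_LABELS = {"bug", "feature", "billing", "other"}

@dataclass
class Classification:
    label: str
    confidence: float

def parse_forced_call(msg, expected_tool: str) -> Classification:
    """Treat a forced tool call as structured output and validate it."""
    calls = msg.tool_calls or []
    if len(calls) != 1 or calls[0].function.name != expected_tool:
        raise ValueError(f"expected exactly one call to {expected_tool}")
    args = json.loads(calls[0].function.arguments)
    if args["label"] not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {args['label']}")
    return Classification(label=args["label"],
                          confidence=float(args["confidence"]))

# Stand-in for resp.choices[0].message after a forced call
msg = SimpleNamespace(
    content=None,
    tool_calls=[SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(
            name="classify",
            arguments='{"label": "billing", "confidence": 0.92}'))],
)

result = parse_forced_call(msg, "classify")
print(result)
```

In this pattern the tool is never executed; its schema only shapes the output, so no role=tool message needs to be sent back unless the conversation continues.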

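Common Pitfalls

Forgetting the role="tool" turn: if you record the assistant's tool_calls in messages but never append the results, the provider may reject the next request outright, or the model may call the same tool again (an infinite loop).
Hand-writing a fake tool_call_id: this does not work; you must return tc.id verbatim.
None/null inside tool arguments on Anthropic: Claude is particularly strict about types, and passing null for a required field yields a 400. Apply exclude_none=True yourself.
Tool descriptions are in English, the user speaks Chinese, and the model picks the wrong tool: write descriptions in the user's language where possible, or make them bilingual.
Omitting additionalProperties: in strict mode OpenAI requires an explicit additionalProperties: false, otherwise it reports an invalid schema.
A huge tools list (50+) makes the model choose badly: use "tool routing" to select the few relevant tools before passing them in, much like retrieval filtering in RAG.
Small models (such as Llama-3-8B) are simply unreliable: tool calling demands a capable model. As a rough guide, stable behavior starts at the level of Haiku / Flash / DeepSeek V3.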
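
The "tool routing" idea can be sketched with nothing more than keyword overlap between the user query and each tool's name and description. This is a deliberately naive illustration with hypothetical names (`route_tools`, `STOPWORDS`) and English-only tokenization; a production system would more likely rank tools by embedding similarity.

```python
import re

STOPWORDS = {"a", "an", "the", "to", "in", "for", "is", "and", "of", "are"}

def tokenize(text: str) -> set[str]:
    """Lower-case word set without stopwords; underscores split words too."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def route_tools(query: str, tools: list[dict], k: int = 3) -> list[dict]:
    """Keep at most k tools whose name/description share words with the query."""
    q = tokenize(query)
    scored = []
    for t in tools:
        fn = t["function"]
        score = len(q & tokenize(fn["name"] + " " + fn.get("description", "")))
        scored.append((score, t))
    # The sort is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]

def make_tool(name: str, description: str) -> dict:
    return {"type": "function",
            "function": {"name": name, "description": description,
                         "parameters": {"type": "object", "properties": {}}}}

ALL_TOOLS = [
    make_tool("get_weather", "Return the current weather for a city"),
    make_tool("search_flight", "Search flights between two cities"),
    make_tool("send_email", "Send an email to a recipient"),
    make_tool("query_orders", "Look up customer orders in the database"),
]

selected = route_tools(
    "What is the weather in Beijing and are there flights to Shanghai?",
    ALL_TOOLS)
print([t["function"]["name"] for t in selected])
```

The selected subset is what gets passed as tools for that turn; if routing returns nothing, falling back to a small default set is safer than sending all 50.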

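Chapter Summary

The OpenAI tools format is LiteLLM's "common currency"; underneath, it is translated into each provider's native protocol
The tool-calling round trip = declare tools → the model returns tool_calls → you execute and return role="tool" → the model continues
tool_choice: auto / none / required / a specific tool; these four settings cover most use cases
The model returns parallel calls by default where supported; set parallel_tool_calls=False to disable them, and use asyncio.gather to execute calls in parallel
strict: true enforces the schema, but restricts which schema features you can use, and support differs across providers
Merge streamed tool calls with stream_chunk_builder; do not stitch the JSON together yourself
An Agent loop must have: a max_iters cap / errors fed back / tool_call_id returned verbatim / None fields filtered out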