Chapter 09

PyTorch / CUDA / Platform Constraints

The AI developer's perennial nightmare: CPU builds on the Mac, CUDA builds on Linux, and CPU builds again in CI to run the tests

Why is this problem so hard?

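PyTorch does not publish all of its accelerator-specific builds on PyPI. PyPI carries a single build per platform, while the CPU-only and CUDA-version-specific wheels live on PyTorch's own dedicated indexes (such as https://download.pytorch.org/whl/cu121). The same version number, torch==2.3.0, therefore maps to different files on different indexes, distinguished on Linux by local version suffixes such as 2.3.0+cpu and 2.3.0+cu121.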

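Your team has all of these at once: ① developer Macs (CPU only), ② a Linux training server with H100s (needs CUDA 12.1), and ③ CI (runs unit tests on CPU). In the past this meant maintaining three requirements.txt files or burying ad hoc scripts in a Dockerfile. uv handles it in a single pyproject.toml using named indexes, forks, and markers.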

Configuring Named Indexes

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true   # only packages that explicitly reference this index are resolved from it

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

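explicit = true means "this index takes no part in default resolution": uv consults it only for packages pinned to it under [tool.uv.sources], so it never looks here for unrelated packages such as pandas.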

Selecting the PyTorch Variant by Platform

[project]
dependencies = [
  "torch>=2.3",
  "torchvision",
]

[tool.uv.sources]
torch = [
  { index = "pytorch-cpu",   marker = "sys_platform == 'darwin' or platform_machine != 'x86_64'" },
  { index = "pytorch-cu121", marker = "sys_platform == 'linux' and platform_machine == 'x86_64'" },
]
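# Use the same markers as torch so both packages always come from the same index
torchvision = [
  { index = "pytorch-cpu",   marker = "sys_platform == 'darwin' or platform_machine != 'x86_64'" },
  { index = "pytorch-cu121", marker = "sys_platform == 'linux' and platform_machine == 'x86_64'" },
]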

During resolution uv "forks" per platform, and a single uv.lock records several sets of solutions:

One uv.lock, two coexisting sets of solutions:

fork: sys_platform == "darwin"
├── torch 2.3.0 (from pytorch-cpu)
└── torchvision 0.18.0 (from pytorch-cpu)

fork: sys_platform == "linux"
├── torch 2.3.0+cu121 (from pytorch-cu121)
└── torchvision 0.18.0+cu121 (from pytorch-cu121)

On a Mac, uv sync installs from fork 1; on Linux, it installs from fork 2.
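To see which index a given machine would land on, the two torch markers from [tool.uv.sources] can be re-implemented by hand. The sketch below is plain Python that mirrors those marker expressions; it is not uv's resolver, and the function name and the environment labels are made up for illustration.

```python
from typing import Optional


def pick_torch_index(sys_platform: str, platform_machine: str) -> Optional[str]:
    """Return the index whose marker matches, or None if neither marker applies."""
    # Mirrors: sys_platform == 'darwin' or platform_machine != 'x86_64'
    if sys_platform == "darwin" or platform_machine != "x86_64":
        return "pytorch-cpu"
    # Mirrors: sys_platform == 'linux' and platform_machine == 'x86_64'
    if sys_platform == "linux" and platform_machine == "x86_64":
        return "pytorch-cu121"
    # No marker matched: the package would come from the default index instead.
    return None


# Hypothetical environments, written as (sys_platform, platform_machine) pairs.
environments = {
    "Mac (Apple Silicon)": ("darwin", "arm64"),
    "Linux training server": ("linux", "x86_64"),
    "Linux CI runner": ("linux", "x86_64"),
    "Linux on ARM": ("linux", "aarch64"),
    "Windows": ("win32", "AMD64"),
}

for label, (plat, machine) in environments.items():
    print(f"{label}: {pick_torch_index(plat, machine) or 'default index'}")
```

One thing this makes visible: a Linux x86_64 CI runner matches the same marker as the training server, so under these markers it also resolves to the cu121 build. That build still runs on CPU, but the download is larger than the CPU-only wheel.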

Switching CUDA Versions

Moving from CUDA 11.8 to 12.1 only requires changing the index URL:

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

# To switch CUDA versions, change only the url above, then run `uv lock` to re-resolve

The CUDA Version Must Match the Driver

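pytorch-cu121 requires a host NVIDIA driver that supports CUDA 12.1 or later (check with nvidia-smi). With a driver that is too old, the cu121 build of torch installs without complaint, but torch.cuda.is_available() returns False. That is not uv's fault; it is a driver problem on the host.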
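A quick way to tell "wrong wheel" apart from "driver problem" is to compare what the installed torch was built for with what it can actually use. The following is a minimal sketch; the function name cuda_report is hypothetical, and it only relies on torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
import importlib.util


def cuda_report() -> str:
    """Summarize the installed torch build and whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    built_for = torch.version.cuda  # None when the build has no CUDA support
    if built_for is None:
        return f"torch {torch.__version__}: build without CUDA support"
    if torch.cuda.is_available():
        return f"torch {torch.__version__}: built for CUDA {built_for}, GPU usable"
    # A CUDA build that cannot see a GPU usually points at the driver or hardware.
    return f"torch {torch.__version__}: built for CUDA {built_for}, but no usable GPU or driver"


print(cuda_report())
```

Run it inside the project environment, for example with uv run python, so that it inspects the torch that uv actually installed.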

A Real-World Case: Supporting Mac + Linux GPU + CI

[project]
name = "my-ml"
requires-python = ">=3.12"
dependencies = [
  "torch>=2.3",
  "transformers>=4.40",
  "datasets",
  "accelerate",
]

[dependency-groups]
dev = ["pytest", "ruff"]
train = ["wandb", "deepspeed; sys_platform == 'linux'"]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = [
  { index = "pytorch-cpu",   marker = "sys_platform == 'darwin'" },
  { index = "pytorch-cu121", marker = "sys_platform == 'linux'" },
]

[tool.uv]
conflicts = [
  [ { group = "train" }, { group = "dev" } ],  # example only: declares the two groups mutually exclusive
]

Private PyPI / Artifactory / Nexus

An internal corporate index:

[[tool.uv.index]]
name = "internal"
url = "https://pypi.internal.mycorp.com/simple"
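# Listed first and not marked default, so uv consults it before the fallback below

# Credentials come from environment variables:
# UV_INDEX_INTERNAL_USERNAME=...
# UV_INDEX_INTERNAL_PASSWORD=...

[[tool.uv.index]]
name = "pypi"
url = "https://pypi.org/simple"
default = true   # the default index has the lowest priority, so public PyPI acts as the fallback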

A full URL with embedded credentials (not recommended: the secret ends up in a file that is committed to the repository):

url = "https://<user>:<token>@pypi.internal.mycorp.com/simple"

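Recommended practice: keep credentials out of the URL and inject them through environment variables. uv looks them up under the names UV_INDEX_<NAME_UPPERCASE>_USERNAME and UV_INDEX_<NAME_UPPERCASE>_PASSWORD.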
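The naming rule can be checked with a few lines of Python before wiring secrets into CI. This sketch assumes that uv uppercases the index name and replaces every non-alphanumeric character with an underscore; that is my reading of the rule, so verify it against the uv documentation for index names containing dashes or dots. The helper name credential_env_vars is hypothetical.

```python
import re
from typing import Tuple


def credential_env_vars(index_name: str) -> Tuple[str, str]:
    """Return the (username, password) variable names for a named index."""
    # Assumption: non-alphanumeric characters become underscores, then uppercase.
    key = re.sub(r"[^A-Za-z0-9]", "_", index_name).upper()
    return f"UV_INDEX_{key}_USERNAME", f"UV_INDEX_{key}_PASSWORD"


for name in ("internal", "pytorch-cu121"):
    user_var, password_var = credential_env_vars(name)
    print(f"{name}: {user_var}, {password_var}")
```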

Mirrors in Mainland China

[[tool.uv.index]]
name = "tuna"
url = "https://pypi.tuna.tsinghua.edu.cn/simple"
default = true

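Alternatively, configure it globally in ~/.config/uv/uv.toml (a personal machine preference that stays out of the repository). In uv.toml the table is written as [[index]], without the tool.uv prefix.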

keyring Integration (Recommended for Enterprises)

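Let uv fetch private-repository credentials from the system keyring, so that no password has to be passed through environment variables: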

[tool.uv]
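keyring-provider = "subprocess"   # shell out to the keyring CLI

# Store the credential once (run in a shell)
keyring set https://pypi.internal.mycorp.com/simple myuser
# After that, uv sync fetches the password from keyring automatically.
# uv queries keyring by URL and username, so uv typically needs to know the
# username as well, for example via https://myuser@pypi.internal.mycorp.com/simple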

Platform Constraints for TensorFlow / JAX

[project]
dependencies = [
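  # TensorFlow: Apple Silicon Macs install tensorflow-macos, every other platform installs tensorflow
  # (applies to older releases; TensorFlow 2.13+ also ships Apple Silicon wheels as plain tensorflow)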
  "tensorflow; sys_platform != 'darwin' or platform_machine != 'arm64'",
  "tensorflow-macos; sys_platform == 'darwin' and platform_machine == 'arm64'",
  "tensorflow-metal; sys_platform == 'darwin' and platform_machine == 'arm64'",

  # JAX: the CPU build works on all platforms; the GPU build has to be installed separately
  "jax",
  "jaxlib",
]

Debugging Tips

# Show the resolution process (verbose)
uv lock -v

# Check which build of a package was installed (the version shows a suffix such as +cu121)
uv pip show torch

# Force cached data to be refreshed (bypass the cache)
uv sync --refresh

# Preview with dry-run, without actually touching the environment
uv sync --dry-run

Chapter Summary

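Multi-platform PyTorch dependencies in AI projects finally have a clean solution: named indexes plus marker-based dispatch under [tool.uv.sources]. For private indexes, inject credentials through environment variables or keyring, and never write a token into the repository. One pyproject.toml and one uv.lock keep every platform on the team in sync. The final chapter is hands-on: CI/CD and Docker.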