Chapter 05

聚合函数与分组

COUNT/SUM/AVG/MAX/MIN、GROUP BY、HAVING、窗口函数与 CTE

5.1 聚合函数详解

聚合函数对一组行计算并返回单一值,常与 GROUP BY 配合使用。

函数含义忽略 NULL?
COUNT(*)计算总行数(包括 NULL)
COUNT(列名)计算该列非 NULL 的行数
COUNT(DISTINCT 列名)计算该列不重复非 NULL 值的数量
SUM(列名)求和
AVG(列名)求平均值是(分母不含 NULL)
MAX(列名)最大值(也适用于字符串/日期)
MIN(列名)最小值
SQL-- 基本聚合示例(整张 orders 表的统计)
SELECT
  COUNT(*)                        AS total_orders,
  COUNT(DISTINCT user_id)         AS unique_customers,
  SUM(total_price)               AS revenue,
  AVG(total_price)               AS avg_order_value,
  MAX(total_price)               AS max_order,
  MIN(total_price)               AS min_order
FROM orders
WHERE status != 'cancelled';

-- COUNT 的 NULL 陷阱示例
SELECT
  COUNT(*),               -- 5(总行数)
  COUNT(published_at),   -- 2(只有 2 篇已发布,published_at 非 NULL)
  COUNT(id)              -- 5(id 没有 NULL)
FROM posts;

5.2 GROUP BY — 分组

GROUP BY 将表按某一列(或多列)的相同值分成若干组,然后对每组执行聚合。

⚠️

GROUP BY 规则:SELECT 中的非聚合列必须出现在 GROUP BY 中 SELECT user_id, username, COUNT(*) FROM orders GROUP BY user_id 这在 MySQL 的 ONLY_FULL_GROUP_BY 模式(默认)下会报错,因为 username 不在 GROUP BY 中。正确写法应该把 username 也加入 GROUP BY,或用 ANY_VALUE(username)

SQL-- 按用户分组统计订单
SELECT
  user_id,
  COUNT(*) AS order_count,
  SUM(total_price) AS total_spent
FROM orders
WHERE status != 'cancelled'   -- WHERE 在分组前过滤原始行
GROUP BY user_id
ORDER BY total_spent DESC;

-- 多列分组:按用户+状态分组
SELECT
  user_id,
  status,
  COUNT(*) AS cnt,
  SUM(total_price) AS subtotal
FROM orders
GROUP BY user_id, status
ORDER BY user_id, status;

5.3 HAVING — 对分组结果过滤

HAVING 相当于针对分组结果的 WHERE,在 GROUP BY 之后执行。

SQL-- 找出消费总额超过 500 元的用户
SELECT
  user_id,
  SUM(total_price) AS total_spent
FROM orders
WHERE status != 'cancelled'    -- ①先过滤行(WHERE)
GROUP BY user_id               -- ②再分组
HAVING SUM(total_price) > 500   -- ③最后过滤分组结果(HAVING)
ORDER BY total_spent DESC;

SQL 语句执行顺序

📋

SQL 逻辑执行顺序(非书写顺序):
FROM — 确定数据来源(包括 JOIN)
WHERE — 过滤原始行
GROUP BY — 分组
HAVING — 过滤分组结果
SELECT — 计算输出列(包括聚合函数)
DISTINCT — 去重
ORDER BY — 排序
LIMIT / OFFSET — 分页
理解执行顺序有助于理解为什么 WHERE 中不能用 SELECT 的别名(SELECT 还没执行),而 HAVING 和 ORDER BY 可以引用聚合结果。

5.4 ROLLUP — 多维汇总

SQL-- 按状态分组并自动添加总计行(MySQL)
SELECT
  COALESCE(status, '总计') AS status,
  COUNT(*) AS count,
  SUM(total_price) AS total
FROM orders
GROUP BY status WITH ROLLUP;  -- MySQL 语法

-- PostgreSQL 语法
SELECT
  COALESCE(status, '总计') AS status,
  COUNT(*), SUM(total_price)
FROM orders
GROUP BY ROLLUP(status);  -- PostgreSQL 语法

5.5 窗口函数(Window Functions)

窗口函数是 SQL 中最强大的特性之一。与 GROUP BY 不同,窗口函数不折叠行——每行都保留,同时可以获取跨行的聚合结果。

💡

直观理解:滑动窗口 想象你站在一排数据前,每次只"看"某个范围内的数据(分区、前几行、后几行),计算出一个值附加在当前行旁边——但不改变行的数量。

OVER() 子句语法

SQL窗口函数名() OVER (
  [PARTITION BY 分区列]      -- 类似 GROUP BY,但不折叠行
  [ORDER BY 排序列]           -- 窗口内的排序(影响排名和累计计算)
  [ROWS BETWEEN ... AND ...]  -- 窗口帧范围(默认:从分区开始到当前行)
)

排名函数:ROW_NUMBER、RANK、DENSE_RANK

SQL-- 示例:对每个用户的订单按金额排名
SELECT
  user_id,
  total_price,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total_price DESC) AS rn,
  RANK()       OVER (PARTITION BY user_id ORDER BY total_price DESC) AS rnk,
  DENSE_RANK() OVER (PARTITION BY user_id ORDER BY total_price DESC) AS drnk
FROM orders;
函数并列处理示例(分数:100,100,80)
ROW_NUMBER()连续编号,无并列1, 2, 3
RANK()并列相同,之后跳号1, 1, 3
DENSE_RANK()并列相同,之后连续1, 1, 2

使用 ROW_NUMBER 实现"每组取前 N"

SQL-- 取每个用户金额最高的前 2 笔订单
SELECT *
FROM (
  SELECT
    user_id,
    total_price,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total_price DESC) AS rn
  FROM orders
) ranked
WHERE rn <= 2;

偏移函数:LAG 和 LEAD

SQL-- LAG:取当前行的上一行值(用于环比计算)
-- LEAD:取当前行的下一行值
SELECT
  created_at,
  total_price,
  LAG(total_price, 1) OVER (ORDER BY created_at) AS prev_order_price,
  total_price - LAG(total_price, 1) OVER (ORDER BY created_at) AS price_diff
FROM orders
WHERE user_id = 1
ORDER BY created_at;
-- 第一行的 prev_order_price 为 NULL(没有上一行)

聚合窗口函数:累计统计

SQL-- 计算每笔订单的累计销售额(按时间排序)
SELECT
  id,
  created_at,
  total_price,
  SUM(total_price) OVER (
    ORDER BY created_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_revenue,     -- 从第一行到当前行的累计
  AVG(total_price) OVER (
    ORDER BY created_at
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS rolling_3_avg           -- 最近 3 笔订单的滚动平均
FROM orders
ORDER BY created_at;

5.6 CTE — 公共表表达式

CTE(Common Table Expression,公共表表达式)WITH ... AS (...) 语法定义,是命名的临时结果集,让复杂查询更易读。

SQL-- 用 CTE 拆解复杂查询(等价于嵌套子查询,但更清晰)
WITH user_stats AS (
  -- CTE 1:统计每个用户的订单数和总金额
  SELECT
    user_id,
    COUNT(*) AS order_count,
    SUM(total_price) AS total_spent
  FROM orders
  WHERE status != 'cancelled'
  GROUP BY user_id
),
top_customers AS (
  -- CTE 2:筛选高价值用户(可引用前面定义的 CTE)
  SELECT * FROM user_stats
  WHERE total_spent > 500
)
-- 最终查询:关联用户信息
SELECT
  u.username,
  tc.order_count,
  tc.total_spent
FROM top_customers tc
INNER JOIN users u ON tc.user_id = u.id
ORDER BY tc.total_spent DESC;

递归 CTE(简介)

SQL-- 递归 CTE:生成 1-10 的数列
WITH RECURSIVE numbers AS (
  SELECT 1 AS n             -- 基础情况(递归起点)
  UNION ALL
  SELECT n + 1 FROM numbers  -- 递归步骤
  WHERE n < 10              -- 终止条件
)
SELECT * FROM numbers;

-- 实用场景:遍历员工层级结构(找某人所有下属)
WITH RECURSIVE subordinates AS (
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE id = 1   -- 从 CEO(id=1) 开始
  UNION ALL
  SELECT e.id, e.name, e.manager_id, s.depth + 1
  FROM employees e
  INNER JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates ORDER BY depth;