5.1 聚合函数详解
聚合函数对一组行计算并返回单一值,常与 GROUP BY 配合使用。
| 函数 | 含义 | 忽略 NULL? |
|---|---|---|
COUNT(*) | 计算总行数(包括 NULL) | 否 |
COUNT(列名) | 计算该列非 NULL 的行数 | 是 |
COUNT(DISTINCT 列名) | 计算该列不重复非 NULL 值的数量 | 是 |
SUM(列名) | 求和 | 是 |
AVG(列名) | 求平均值 | 是(分母不含 NULL) |
MAX(列名) | 最大值(也适用于字符串/日期) | 是 |
MIN(列名) | 最小值 | 是 |
SQL-- 基本聚合示例(整张 orders 表的统计)
SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT user_id) AS unique_customers,
SUM(total_price) AS revenue,
AVG(total_price) AS avg_order_value,
MAX(total_price) AS max_order,
MIN(total_price) AS min_order
FROM orders
WHERE status != 'cancelled';
-- COUNT 的 NULL 陷阱示例
SELECT
COUNT(*), -- 5(总行数)
COUNT(published_at), -- 2(只有 2 篇已发布,published_at 非 NULL)
COUNT(id) -- 5(id 没有 NULL)
FROM posts;
5.2 GROUP BY — 分组
GROUP BY 将表按某一列(或多列)的相同值分成若干组,然后对每组执行聚合。
GROUP BY 规则:SELECT 中的非聚合列必须出现在 GROUP BY 中
SELECT user_id, username, COUNT(*) FROM orders GROUP BY user_id
这在 MySQL 的 ONLY_FULL_GROUP_BY 模式(默认)下会报错,因为 username 不在 GROUP BY 中。正确写法应该把 username 也加入 GROUP BY,或用 ANY_VALUE(username)。
SQL-- 按用户分组统计订单
SELECT
user_id,
COUNT(*) AS order_count,
SUM(total_price) AS total_spent
FROM orders
WHERE status != 'cancelled' -- WHERE 在分组前过滤原始行
GROUP BY user_id
ORDER BY total_spent DESC;
-- 多列分组:按用户+状态分组
SELECT
user_id,
status,
COUNT(*) AS cnt,
SUM(total_price) AS subtotal
FROM orders
GROUP BY user_id, status
ORDER BY user_id, status;
5.3 HAVING — 对分组结果过滤
HAVING 相当于针对分组结果的 WHERE,在 GROUP BY 之后执行。
SQL-- 找出消费总额超过 500 元的用户
SELECT
user_id,
SUM(total_price) AS total_spent
FROM orders
WHERE status != 'cancelled' -- ①先过滤行(WHERE)
GROUP BY user_id -- ②再分组
HAVING SUM(total_price) > 500 -- ③最后过滤分组结果(HAVING)
ORDER BY total_spent DESC;
SQL 语句执行顺序
SQL 逻辑执行顺序(非书写顺序):
① FROM — 确定数据来源(包括 JOIN)
② WHERE — 过滤原始行
③ GROUP BY — 分组
④ HAVING — 过滤分组结果
⑤ SELECT — 计算输出列(包括聚合函数)
⑥ DISTINCT — 去重
⑦ ORDER BY — 排序
⑧ LIMIT / OFFSET — 分页
理解执行顺序有助于理解为什么 WHERE 中不能用 SELECT 的别名(SELECT 还没执行),而 HAVING 和 ORDER BY 可以引用聚合结果。
5.4 ROLLUP — 多维汇总
SQL-- 按状态分组并自动添加总计行(MySQL)
SELECT
COALESCE(status, '总计') AS status,
COUNT(*) AS count,
SUM(total_price) AS total
FROM orders
GROUP BY status WITH ROLLUP; -- MySQL 语法
-- PostgreSQL 语法
SELECT
COALESCE(status, '总计') AS status,
COUNT(*), SUM(total_price)
FROM orders
GROUP BY ROLLUP(status); -- PostgreSQL 语法
5.5 窗口函数(Window Functions)
窗口函数是 SQL 中最强大的特性之一。与 GROUP BY 不同,窗口函数不折叠行——每行都保留,同时可以获取跨行的聚合结果。
直观理解:滑动窗口 想象你站在一排数据前,每次只"看"某个范围内的数据(分区、前几行、后几行),计算出一个值附加在当前行旁边——但不改变行的数量。
OVER() 子句语法
SQL窗口函数名() OVER (
[PARTITION BY 分区列] -- 类似 GROUP BY,但不折叠行
[ORDER BY 排序列] -- 窗口内的排序(影响排名和累计计算)
[ROWS BETWEEN ... AND ...] -- 窗口帧范围(默认:从分区开始到当前行)
)
排名函数:ROW_NUMBER、RANK、DENSE_RANK
SQL-- 示例:对每个用户的订单按金额排名
SELECT
user_id,
total_price,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total_price DESC) AS rn,
RANK() OVER (PARTITION BY user_id ORDER BY total_price DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY total_price DESC) AS drnk
FROM orders;
| 函数 | 并列处理 | 示例(分数:100,100,80) |
|---|---|---|
ROW_NUMBER() | 连续编号,无并列 | 1, 2, 3 |
RANK() | 并列相同,之后跳号 | 1, 1, 3 |
DENSE_RANK() | 并列相同,之后连续 | 1, 1, 2 |
使用 ROW_NUMBER 实现"每组取前 N"
SQL-- 取每个用户金额最高的前 2 笔订单
SELECT *
FROM (
SELECT
user_id,
total_price,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total_price DESC) AS rn
FROM orders
) ranked
WHERE rn <= 2;
偏移函数:LAG 和 LEAD
SQL-- LAG:取当前行的上一行值(用于环比计算)
-- LEAD:取当前行的下一行值
SELECT
created_at,
total_price,
LAG(total_price, 1) OVER (ORDER BY created_at) AS prev_order_price,
total_price - LAG(total_price, 1) OVER (ORDER BY created_at) AS price_diff
FROM orders
WHERE user_id = 1
ORDER BY created_at;
-- 第一行的 prev_order_price 为 NULL(没有上一行)
聚合窗口函数:累计统计
SQL-- 计算每笔订单的累计销售额(按时间排序)
SELECT
id,
created_at,
total_price,
SUM(total_price) OVER (
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue, -- 从第一行到当前行的累计
AVG(total_price) OVER (
ORDER BY created_at
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS rolling_3_avg -- 最近 3 笔订单的滚动平均
FROM orders
ORDER BY created_at;
5.6 CTE — 公共表表达式
CTE(Common Table Expression,公共表表达式)用 WITH ... AS (...) 语法定义,是命名的临时结果集,让复杂查询更易读。
SQL-- 用 CTE 拆解复杂查询(等价于嵌套子查询,但更清晰)
WITH user_stats AS (
-- CTE 1:统计每个用户的订单数和总金额
SELECT
user_id,
COUNT(*) AS order_count,
SUM(total_price) AS total_spent
FROM orders
WHERE status != 'cancelled'
GROUP BY user_id
),
top_customers AS (
-- CTE 2:筛选高价值用户(可引用前面定义的 CTE)
SELECT * FROM user_stats
WHERE total_spent > 500
)
-- 最终查询:关联用户信息
SELECT
u.username,
tc.order_count,
tc.total_spent
FROM top_customers tc
INNER JOIN users u ON tc.user_id = u.id
ORDER BY tc.total_spent DESC;
递归 CTE(简介)
SQL-- 递归 CTE:生成 1-10 的数列
WITH RECURSIVE numbers AS (
SELECT 1 AS n -- 基础情况(递归起点)
UNION ALL
SELECT n + 1 FROM numbers -- 递归步骤
WHERE n < 10 -- 终止条件
)
SELECT * FROM numbers;
-- 实用场景:遍历员工层级结构(找某人所有下属)
WITH RECURSIVE subordinates AS (
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE id = 1 -- 从 CEO(id=1) 开始
UNION ALL
SELECT e.id, e.name, e.manager_id, s.depth + 1
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates ORDER BY depth;