支持多种SQL标准特性,包括窗口函数
窗口函数允许在查询结果集中进行复杂的计算,如排序、聚合和排名等,而不需要将行分组或减少结果集的大小,下面具体说说
窗口排序函数
1. ROW_NUMBER()
•功能:为每一行分配一个唯一的行号,从1开始,按指定的排序规则递增。
•语法:
ROW_NUMBER() OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
2. RANK()
•功能:根据排序条件给每行分配一个排名,如果有相同的值,则它们会得到相同的排名,并且下一个排名会跳过相应数量。
•语法:
RANK() OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
3. DENSE_RANK()
•功能:类似于 RANK(),但不会跳过排名。即使有相同的值,它们也会获得相同的排名,接下来的排名会连续增加。
•语法:
DENSE_RANK() OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
4. PERCENT_RANK()
•功能:计算当前行在其分区中的相对位置(百分比排名),范围是0到1之间。
•语法:
PERCENT_RANK() OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
5. CUME_DIST()
•功能:计算累积分布,即当前行及其之前所有行占总行数的比例。
•语法:
CUME_DIST() OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
6. NTILE(n)
•功能:将结果集分成n个几乎相等的部分,并为每一部分分配一个桶编号(从1到n)。
•语法:
NTILE(n) OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
7. LAG() 和 LEAD()
•功能:LAG() 用于访问前一行的数据,LEAD() 用于访问下一行的数据。这两个函数虽然不是直接用于排序,但在排序基础上可以用来比较相邻行的数据。
•语法:
LAG(column_name, offset, default_value) OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
LEAD(column_name, offset, default_value) OVER ([PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC|DESC], ...)
使用举例
sql使用示例假设有一个销售记录表 sales,包含字段 sale_id, product_id, sale_date, amount。我们可以使用这些窗口函数来分析数据,例如:
SELECT
sale_id,
product_id,
sale_date,
amount,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sale_date DESC) AS row_num,
RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS dense_rank,
PERCENT_RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS percent_rank,
CUME_DIST() OVER (PARTITION BY product_id ORDER BY amount DESC) AS cume_dist,
NTILE(4) OVER (PARTITION BY product_id ORDER BY amount DESC) AS quartile,
LAG(amount, 1, 0) OVER (ORDER BY sale_date) AS prev_amount,
LEAD(amount, 1, 0) OVER (ORDER BY sale_date) AS next_amount
FROM
sales;
sql上述查询将返回每个产品的销售记录,并为其添加基于时间顺序和销售额排序的各种窗口函数计算的结果。
原创文章,作者:kirin,如若转载,请注明出处:https://blog.ytso.com/tech/bigdata/318077.html