目录
mysql语法
数据准备
create table emp (
empno numeric(4) not null,
ename varchar(10),
job varchar(9),
mgr numeric(4),
hiredate datetime,
sal numeric(7, 2),
comm numeric(7, 2),
deptno numeric(2)
);
insert into emp values (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, null, 20);
insert into emp values (7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30);
insert into emp values (7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30);
insert into emp values (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30);
insert into emp values (7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, null, 30);
insert into emp values (7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, null, 10);
insert into emp values (7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, null, 20);
insert into emp values (7839, 'KING', 'PRESIDENT', null, '1981-11-17', 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30);
insert into emp values (7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, null, 20);
insert into emp values (7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, null, 30);
insert into emp values (7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, null, 10);
1.聚合函数(分组函数)
1.聚合统计逻辑
聚合统计:
group by => 分组
xianyu,<1,a,xc,asd>
lxy,<as,zxf,zxf,qwr,ags>
聚合函数 => 指标
xianyu,4
lxy,5
2.函数使用
group by =》 分组
聚合函数 =》 指标统计 sun avg max min count
需求:
统计每个部门有多少个人?
查什么?
维度:部门
指标:人数
select
deptno,
count(1) as cnt
from emp
group by deptno;
解释:
count(1) 【1.代表 先放置一个假数,然后再查询】
【2.理解为按照第几个字段进行查数】
select
select + 函数 => 可以校验函数是否存在
2.开窗函数
1.语法
窗口函数:
窗口 + 函数
窗口:函数运行时 计算数据集的范围
函数:运行时的函数
1.聚合函数
sun avg max min count
2.内置窗口函数
语法结构:
函数 over([partition by xxx,...] [order by xxx,...])
over() 是以谁进行开窗【table or 数据集】
partition by:以谁进行分组 【group by column】
order by:以谁进行排序【column】
2.聚合函数:多行数据 按照一定规则 进行聚合 为一行
sum avg max...
理论上:聚合后的行数 <= 聚合前的行数 【主要是看维度选取 group by 里面的字段】
需求:
既要显示 聚合前的数据 又要显示 聚合后的数据 ?
id name sal dt sal_all
1 zs 1000 2022-4 1000
2 ls 2000 2022-4 2000
3 wu 3000 2022-4 3000
4 zs 1000 2022-5 2000
5 ls 2000 2022-5 4000
6 wu 3000 2022-5 6000
数据:
服务器 每天的启动 次数
linux01,2022-04-15,1
linux01,2022-04-16,5
linux01,2022-04-17,7
linux01,2022-04-18,2
linux01,2022-04-19,3
linux01,2022-04-20,10
linux01,2022-04-21,4
统计累计问题:
创建表
create table window01(
name varchar(50),
dt varchar(20),
cnt int
);
插入数据
insert into window01 values("linux01","2022-04-15",1);
insert into window01 values("linux01","2022-04-16",5);
insert into window01 values("linux01","2022-04-17",7);
insert into window01 values("linux01","2022-04-18",2);
insert into window01 values("linux01","2022-04-19",3);
insert into window01 values("linux01","2022-04-20",10);
insert into window01 values("linux01","2022-04-21",4);
insert into window01 values("linux02","2022-04-18",20);
insert into window01 values("linux02","2022-04-19",30);
insert into window01 values("linux02","2022-04-20",10);
insert into window01 values("linux02","2022-04-21",40);
需求:
每个服务器 每天 累积启动次数
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all
from window01;
+---------+------------+------+---------+
| name | dt | cnt | cut_all |
+---------+------------+------+---------+
| linux01 | 2022-04-15 | 1 | 1 |
| linux01 | 2022-04-16 | 5 | 6 |
| linux01 | 2022-04-17 | 7 | 13 |
| linux01 | 2022-04-18 | 2 | 15 |
| linux01 | 2022-04-19 | 3 | 18 |
| linux01 | 2022-04-20 | 10 | 28 |
| linux01 | 2022-04-21 | 4 | 32 |
| linux02 | 2022-04-18 | 20 | 20 |
| linux02 | 2022-04-19 | 30 | 50 |
| linux02 | 2022-04-20 | 10 | 60 |
| linux02 | 2022-04-21 | 40 | 100 |
+---------+------------+------+---------+
1 9 10 11 str 【字典序】
1 10 11 9
* 从1开始,1,2,3,4,5,6,7,8,9
* 从10开始,1,10…19,2,3,4,5,6,7,8,9
* 从20开始,1,10…19,2,20…29,3,4,5,6,7,8,9
* 以此类推,所有的10位数,都插入到与他们十位数位置上相等的个位数后面。
3.内置窗口函数
窗口大小 xxx between xxx and xxx
参数
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
-- 无边界
sum(cnt) over(partition by name order by dt rows between unbounded preceding and current row) as cut_all2,
-- 前三行 + 当前行
sum(cnt) over(partition by name order by dt rows between 3 preceding and current row) as cut_all3,
-- 前三行 + 当前行 + 下一行
sum(cnt) over(partition by name order by dt rows between 3 preceding and 1 following) as cut_all4,
-- 上面无边界 + 下面无边界
sum(cnt) over(partition by name order by dt rows between unbounded preceding and UNBOUNDED FOLLOWING) as cut_all5
from window01;
select
name,
dt,
cnt,
-- 常规分组排序求加和
sum(cnt) over(partition by name order by dt) as cut_all,
-- 整张表对时间排序,然后加和,作用到整张表,理解为18号并列有两条数据
sum(cnt) over(order by dt) as cut_all2,
-- 对整张表进行加和
sum(cnt) over() as cut_all3,
-- 直接按照名字进分组
sum(cnt) over(partition by name) as cut_all4
from window01
order by dt;
1.partition by 不加 => 作用整张表
数仓顺序
ods不动
union all + group by select ifnull case when
join
group by
grouping sets 【维度组合】
4.内置窗口函数
1.取值 串行
1.串行
LAG 【窗口内 向上 第n行的值 当前行向上取一行】
LAG(column [, N[, default]])
column => 列名
n => 取几行
default => 取不到给默认值
LEAD 【窗口内 向下 第n行的值 当前行向下取一行】
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
lead(dt,1,"9999-99-99") over(partition by name order by dt) as lead_alias,
lead(dt,1,"9999-99-99") over(partition by name order by dt) as lag_alias
from window01;
2.取值
FIRST_VALUE() : 取分组内排序后 截止到当前行 第一个值
LAST_VALUE():取分组内排序后 截止到当前行 最后一个值
select
name,
dt,
cnt,
first_value(cnt) over(partition by name order by dt) as f_value,
last_value(cnt) over(partition by name order by dt) as l_value
from window01;
2.排序
分组
ntile
需求:
把数据按照某个字段进行排序,把数据分成n份ntile(n)
如果不能平均分配 优先分配到编号小的里面
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
-- 平均分成n份,不能平均分,优先把多余的放到最小的里面
ntile(2) over(partition by name order by dt) as n2,
ntile(3) over(partition by name order by dt) as n3
from window01
order by dt;
排序
rank : 从1开始,按照排序 相同会重复,名次会留下空位 生成组内的记录编号
row_number: 从1开始,按照排序 生成组内的记录编号
dense_rank:从1开始,按照排序 相同会重复,名次不会留下空位 生成组内的记录编号
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
rank() over(partition by name order by cnt desc) as rk,
row_number() over(partition by name order by cnt desc) as rw,
dense_rank() over(partition by name order by cnt desc) as d_rk
from window01;
原创文章,作者:506227337,如若转载,请注明出处:https://blog.ytso.com/244908.html