Optimize a query by using Partial Index

今天通过查看数据库日志,有个核心生产库有条SQL执行在800 ms 以上,这引起了我的注意。

1 查看CSVLOG,有大量如下信息。

1
86314,0,LOG,00000,"duration: 819.900 ms execute <unnamed>: select count(recv_id) from skytf.my_invite_rec where recv_id=$1 and result=2","parameters: $1 = '125662840'",,,,,,,,""

2 sql 语句

1
select count(recv_id) from skytf.my_invite_rec where recv_id=$1 and result=2

3 表结构

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
skytf=> /d my_invite_rec  
Table "skytf.my_invite_rec"
Column | Type | Modifiers
-------------+-----------------------------+--------------------
invite_id | numeric(12,0) | not null
invite_time | timestamp without time zone |
user_id | numeric(10,0) | not null
recv_id | numeric(10,0) | not null
invite_msg | character varying(512) |
appinfo | character varying(32) |
extinfo | character varying(128) |
group_id | numeric(12,0) |
result | numeric(10,0) | not null default 2
Indexes:
"pk_my_invite_rec" PRIMARY KEY, btree (invite_id)
"idx_my_invite_rec_recvid" btree (recv_id)
"idx_my_invite_rec_userid" btree (user_id)

4 老的执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
skytf=> explain analyze select count(recv_id) from skytf.my_invite_rec where recv_id=125662840 and result=2;  
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
---------------------
Aggregate (cost=42015.40..42015.41 rows=1 width=10) (actual time=739.196..739.196 rows=1 loops=1)
-> Bitmap Heap Scan on my_invite_rec (cost=5039.49..41753.20 rows=104879 width=10) (actual time=739.190..739.190 rows=0 loops=1)
Recheck Cond: (recv_id = 125662840::numeric)
Filter: (result = 2::numeric)
-> Bitmap Index Scan on idx_my_invite_rec_recvid (cost=0.00..5013.27 rows=329714 width=0) (actual time=213.024..213.024 rows=330385 loops=1)
Index Cond: (recv_id = 125662840::numeric)
Total runtime: 739.243 ms
(7 rows)
Time: 739.800 ms

从执行计划来看,PLAN正常。唯独 rows=330385 比较高.

5 通过查看数据库 csvlog 日志发现有两个 recv_id的出现的频率比较高

1
2
3
4
[postgres@skytf1](mailto:postgres@skytf1)-> cat postgresql-2011-02-20_000000.csv | grep 125662840 | wc -l  
7749
[postgres@skytf1](mailto:postgres@skytf1)-> cat postgresql-2011-02-20_000000.csv | grep 148161000 | wc -l
6060

6 随机取些recv_id,测试下这条SQL的时间

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
skytf=> select recv_id from my_invite_rec where result=2 limit 10;  
recv_id
-----------
103103900
106460840
153989140
153099420
335932
132509748
110848636
101857296
107371328
135262800
(10 rows)
Time: 0.556 ms

skytf=> select count(*) from my_invite_rec where recv_id=103103900;
count
-------
5
(1 row)
Time: 0.681 ms

skytf=> select count(*) from my_invite_rec where recv_id=107371328;
count
-------
28
(1 row)
Time: 0.668 ms

skytf=> explain analyze select count(*) from my_invite_rec where recv_id=103103900;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
--------------
Aggregate (cost=95.78..95.79 rows=1 width=0) (actual time=0.058..0.058 rows=1 loops=1)
-> Index Scan using idx_my_invite_rec_recvid on my_invite_rec (cost=0.00..95.66 rows=48 width=0) (actual time=0.044..0.055 rows=5 loops=1)
Index Cond: (recv_id = 103103900::numeric)
Total runtime: 0.093 ms
(4 rows)
Time: 0.773 ms

从上面可以看出,SQL都在 1 ms 以下,非常的迅速啊,猜想可能与记录数有关。

7 猜想先前的两个recv_id的记录数比较多,查询如下

1
2
3
4
5
6
7
8
9
10
11
12
13
skytf=> select count(*) from my_invite_rec where recv_id=125662840;  
count
--------
332424
(1 row)
Time: 804.972 ms

skytf=> select count(*) from my_invite_rec where recv_id=148161000;
count
--------
416366
(1 row)
Time: 850.757 ms

果然如此,是因为这两个 recv_id 对应的数据比较多,PG花了很长的时间寻找符合条件记录。

8 后来在德哥的指导下,创建联合索引

1
2
skytf=> _create index idx_my_invite_rec_result on my_invite_rec (result,recv_id) where recv_id in (125662840,148161000);_CREATE INDEX  
Time: 4949.886 ms

9 再次查看PLAN

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
skytf=> explain analyze select count(recv_id) from skytf.my_invite_rec where recv_id=125662840 and result=2;  
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
------------
Aggregate (cost=39107.39..39107.40 rows=1 width=10) (actual time=0.032..0.032 rows=1 loops=1)
-> Bitmap Heap Scan on my_invite_rec (cost=1871.45..38847.11 rows=104111 width=10) (actual time=0.030..0.030 rows=0 loops=1)
Recheck Cond: ((result = 2::numeric) AND (recv_id = 125662840::numeric))
-> Bitmap Index Scan on idx_my_invite_rec_result (cost=0.00..1845.42 rows=104111 width=0) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((result = 2::numeric) AND (recv_id = 125662840::numeric))
Total runtime: 0.079 ms
(6 rows)
Time: 1.075 ms

skytf=> select count(recv_id) from skytf.my_invite_rec where recv_id=148161000 and result=2;
count
-------
0
(1 row)
Time: 0.625 ms

执行时间由 739 ms 优化成现在的 1 ms , 速度提高了近 739倍。

总结

  1. 联合索引在上述这种SQL语句下显得非常的有效。
  2. 查看执行计划,需要特别注意 PLAN中 rows的值,通常较大的 rows 值意味着查询较慢。

原创文章,作者:506227337,如若转载,请注明出处:https://blog.ytso.com/236366.html

(0)
上一篇 2022年1月24日
下一篇 2022年1月24日

相关推荐

发表回复

登录后才能评论