Optimize a query by using Partial Index

今天通过查看数据库日志，有个核心生产库有条SQL执行在800 ms 以上,这引起了我的注意。

1 查看CSVLOG，有大量如下信息。

1	86314,0,LOG,00000,"duration: 819.900 ms execute <unnamed>: select count(recv_id) from skytf.my_invite_rec where recv_id=$1 and result=2","parameters: $1 = '125662840'",,,,,,,,""

2 sql 语句

1	select count(recv_id) from skytf.my_invite_rec where recv_id=$1 and result=2

3 表结构

skytf=> /d my_invite_rec  
 Table "skytf.my_invite_rec"  
 Column | Type | Modifiers  
-------------+-----------------------------+--------------------  
invite_id | numeric(12,0) | not null  
invite_time | timestamp without time zone |  
user_id | numeric(10,0) | not null  
recv_id | numeric(10,0) | not null  
invite_msg | character varying(512) |  
appinfo | character varying(32) |  
extinfo | character varying(128) |  
group_id | numeric(12,0) |  
result | numeric(10,0) | not null default 2  
Indexes:  
 "pk_my_invite_rec" PRIMARY KEY, btree (invite_id)  
 "idx_my_invite_rec_recvid" btree (recv_id)  
 "idx_my_invite_rec_userid" btree (user_id)

4 老的执行计划

skytf=> explain analyze select count(recv_id) from skytf.my_invite_rec where recv_id=125662840 and result=2;  
 QUERY PLAN  
  
------------------------------------------------------------------------------------------------------------------------------------  
---------------------  
Aggregate (cost=42015.40..42015.41 rows=1 width=10) (actual time=739.196..739.196 rows=1 loops=1)  
 -> Bitmap Heap Scan on my_invite_rec (cost=5039.49..41753.20 rows=104879 width=10) (actual time=739.190..739.190 rows=0 loops=1)  
 Recheck Cond: (recv_id = 125662840::numeric)  
 Filter: (result = 2::numeric)  
 -> Bitmap Index Scan on idx_my_invite_rec_recvid (cost=0.00..5013.27 rows=329714 width=0) (actual time=213.024..213.024 rows=330385 loops=1)  
 Index Cond: (recv_id = 125662840::numeric)  
Total runtime: 739.243 ms  
(7 rows)
Time: 739.800 ms

从执行计划来看，PLAN正常。唯独 rows=330385 比较高.

5 通过查看数据库 csvlog 日志发现有两个 recv_id的出现的频率比较高

[postgres@skytf1](mailto:postgres@skytf1)-> cat postgresql-2011-02-20_000000.csv | grep 125662840 | wc -l  
7749  
[postgres@skytf1](mailto:postgres@skytf1)-> cat postgresql-2011-02-20_000000.csv | grep 148161000 | wc -l  
6060

6 随机取些recv_id,测试下这条SQL的时间

skytf=> select recv_id from my_invite_rec where result=2 limit 10;  
 recv_id  
-----------  
103103900  
106460840  
153989140  
153099420  
 335932  
132509748  
110848636  
101857296  
107371328  
135262800  
(10 rows)
Time: 0.556 ms  

skytf=> select count(*) from my_invite_rec where recv_id=103103900;  
count  
-------  
 5  
(1 row)  
Time: 0.681 ms  

skytf=> select count(*) from my_invite_rec where recv_id=107371328;  
count  
-------  
 28  
(1 row)
Time: 0.668 ms

skytf=> explain analyze select count(*) from my_invite_rec where recv_id=103103900;  
 QUERY PLAN  
  
------------------------------------------------------------------------------------------------------------------------------------  
--------------  
Aggregate (cost=95.78..95.79 rows=1 width=0) (actual time=0.058..0.058 rows=1 loops=1)  
 -> Index Scan using idx_my_invite_rec_recvid on my_invite_rec (cost=0.00..95.66 rows=48 width=0) (actual time=0.044..0.055 rows=5 loops=1)  
 Index Cond: (recv_id = 103103900::numeric)  
Total runtime: 0.093 ms  
(4 rows)
Time: 0.773 ms

从上面可以看出，SQL都在 1 ms 以下，非常的迅速啊，猜想可能与记录数有关。

7 猜想先前的两个recv_id的记录数比较多，查询如下

skytf=> select count(*) from my_invite_rec where recv_id=125662840;  
count  
--------  
332424  
(1 row)
Time: 804.972 ms  

skytf=> select count(*) from my_invite_rec where recv_id=148161000;  
count  
--------  
416366  
(1 row)
Time: 850.757 ms

果然如此，是因为这两个 recv_id 对应的数据比较多，PG花了很长的时间寻找符合条件记录。

8 后来在德哥的指导下，创建联合索引

1 2	skytf=> _create index idx_my_invite_rec_result on my_invite_rec (result,recv_id) where recv_id in (125662840,148161000);_CREATE INDEX Time: 4949.886 ms

9 再次查看PLAN

skytf=> explain analyze select count(recv_id) from skytf.my_invite_rec where recv_id=125662840 and result=2;  
 QUERY PLAN  
  
------------------------------------------------------------------------------------------------------------------------------------  
------------  
Aggregate (cost=39107.39..39107.40 rows=1 width=10) (actual time=0.032..0.032 rows=1 loops=1)  
 -> Bitmap Heap Scan on my_invite_rec (cost=1871.45..38847.11 rows=104111 width=10) (actual time=0.030..0.030 rows=0 loops=1)  
 Recheck Cond: ((result = 2::numeric) AND (recv_id = 125662840::numeric))  
 -> Bitmap Index Scan on idx_my_invite_rec_result (cost=0.00..1845.42 rows=104111 width=0) (actual time=0.028..0.028 rows=0 loops=1)  
 Index Cond: ((result = 2::numeric) AND (recv_id = 125662840::numeric))  
Total runtime: 0.079 ms  
(6 rows)
Time: 1.075 ms

skytf=> select count(recv_id) from skytf.my_invite_rec where recv_id=148161000 and result=2;  
count  
-------  
 0  
(1 row)
Time: 0.625 ms

执行时间由 739 ms 优化成现在的 1 ms , 速度提高了近 739倍。

总结

联合索引在上述这种SQL语句下显得非常的有效。
查看执行计划，需要特别注意 PLAN中 rows的值，通常较大的 rows 值意味着查询较慢。

原创文章，作者：506227337，如若转载，请注明出处：https://blog.ytso.com/236366.html

Optimize a query by using Partial Index

相关推荐

发表回复