优化一例:创建最精简的索引

今天查看数据库 CSVLOG, 有个慢查询语句, 这个优化有点意思,记录下来,下面是详细的步骤。

1 数据库CSVLOG

1
2
3
4
5
6
2011-03-01 17:24:49.442 CST,"skytf","skytf",21500,"192.168.164.37:39154",4d1d3830.53fc,7950,"SELECT",2010-12-31 09:56:00 CST,16/3143301,0,LOG,00000,"duration: 1701.007 ms execute <unnamed>:  
SELECT id, modifier, gmt_create as gmtCreate, gmt_modified as gmtModified, creator,
is_deleted as isDeleted, mpxyzdatas, skyid, wap_token as wapToken, app_token as appToken, source,
username, passwd, nickname, mpxyzdataf, mpxyzdatag, phone, phone2, mpxyzdatae, f FROM tmp_user
WHERE is_deleted='n' AND mpxyzdatas=$1
order by id desc ","parameters: $1 = ')dzyDlyyLdyyfjzy)vzy'",,,,,,,,""

根据 duration: 1701.007 ,这个语句执行时间在1.7 s左右,而且数据库日志抛出大量这个慢SQL语句。

2 慢查询 SQL

1
2
3
4
5
SELECT id, modifier, gmt_create as gmtCreate, gmt_modified as gmtModified, creator, is_deleted as isDeleted,  
mpxyzdatas, skyid, wap_token as wapToken, app_token as appToken, source, username, passwd, nickname,
mpxyzdataf, mpxyzdatag, phone, phone2, mpxyzdatae, f FROM tmp_user
WHERE is_deleted='n' AND mpxyzdatas=$1 order by id desc ;
parameters: $1 = ')dzyL5zyzpzy)rzy!Pzy'",,,,,,,,"

3 老的执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
skytf=> explain analyze SELECT id, modifier, gmt_create as gmtCreate, gmt_modified as gmtModified, creator, is_deleted as isDeleted,  
skytf-> mpxyzdatas, skyid, wap_token as wapToken, app_token as appToken, source, username, passwd, nickname,
skytf-> mpxyzdataf, mpxyzdatag, phone, phone2, mpxyzdatae, f FROM tmp_user WHERE is_deleted='n' AND
skytf-> mpxyzdatas=')dzyL5zyzpzy)rzy!Pzy' order by id desc ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Sort (cost=139298.73..139298.74 rows=1 width=226) (actual time=1552.541..1552.542 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on tmp_user (cost=0.00..139298.72 rows=1 width=226) (actual time=1352.316..1552.524 rows=1 loops=1)
Filter: ((is_deleted = 'n'::bpchar) AND ((mpxyzdatas)::text = ')dzyL5zyzpzy)rzy!Pzy'::text))Total runtime: 1552.583 ms
(6 rows)
Time: 1566.089 ms

这个PLAN比较简单,很明了,”Seq Scan”,整个消耗 1.5s

4 表的相关信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
skytf=> select pg_size_pretty(pg_relation_size('tmp_user'));  
pg_size_pretty
----------------
573 MB
(1 row)

skytf=> /d tmp_user
Table "skytf.tmp_user"
Column | Type | Modifiers
--------------+-----------------------------+------------------------------
id | integer | not null
modifier | integer |
gmt_create | timestamp without time zone |
gmt_modified | timestamp without time zone |
creator | integer |
is_deleted | character(1) | not null default 'n'::bpchar
mpxyzdatas | character varying |
skyid | bigint | not null
wap_token | character varying(100) |
app_token | character varying(100) |
source | character(10) | default 'applist'::bpchar
username | character varying |
passwd | character varying |
nickname | character varying |
mpxyzdataf | character varying |
mpxyzdatag | character varying |
phone | character varying(15) |
mpxyzdatae | character varying |
f | character varying |
phone2 | character varying(15) |
Indexes:
"tmp_user_new_pkey" PRIMARY KEY, btree (id)
"unique_skyid1" UNIQUE, btree (skyid)
"idx_mpxyzdatas" btree (mpxyzdatas)
"index_gmt_create1" btree (date(gmt_create))

从上面信息可以看出,表tmp_user不太大,才 573 MB , 但查询条件 is_deleted 和 mpxyzdatas 都没有建索引, 根据直觉,像这种查询,建个联合索引 (is_deleted,mpxyzdatas ) 最合适不过了,先别着急,下面深入分析下。

5 查询列的 distinct 情况

1
2
3
4
5
skytf=> select tablename,attname,n_distinct from pg_stats where tablename='tmp_user' and attname in ('is_deleted','mpxyzdatas');  
tablename | attname | n_distinct
-----------+------------+------------
tmp_user | is_deleted | 1
tmp_user | mpxyzdatas | -1

pg_stats.n_distinct值为正数时,表时distinct实际值,当 n_distinct为 “ -1 “ 时,表示此列具有非常好的 distinct属性,即为 unique. 也就是说 表 tmp_user的列 is_deleted 只有一个 distinct 值, 而 mpxyzdatas 字段却为唯一。

6 设想几种创建索引的方法

  1. create index idx_is_deleted_mpxyzdatas on skytf.tmp_user using btree (is_deleted,mpxyzdatas );
  2. create index idx_is_mpxyzdatas_id on skytf.tmp_user using btree (mpxyzdatas, id desc );
  3. create index idx_mpxyzdatas on skytf.tmp_user using btree (mpxyzdatas );

现在权衡一下上面三种方法,方法一创建联合索引通常情况下适合这种QUERY 场合,但由于列 is_deleted 只有一个 distinct 值,所以这个字段上不适合建索引。于是排除方法一。 而方法二建了个(mpxyzdatas, id desc )索引,第二个方法主要是考虑到查询条件中按ID 进行降序排序,但由于 id 已经是 pk,并且排序代价不大,这可以从步骤3的 plan来查看,Sort (cost=139298.73..139298.74 rows=1 width=226) (actual time=1552.541..1552.542 rows=1 loops=1) 根据 cost=139298.73..139298或者 actual time=1552.541..1552.542 这个结点的起动代价,完成代价,起动时间完成时间可以非常明显地看出这个排序步骤代价非常小。而方法三,仅在列 mpxyzdatas 上创建索引,看上去不可行,

下面准备在测试库上测试下,接着把生产的这张表倒到测试环境下去验证一下,我的测试库主机,配置和生产的几乎相同,导表的步骤这里就省略了,通过 pg_dump备份表,pg_restore 还原表就行。

在测试环境上测试

7 先取部分数据,做测试用。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
skytf=> select mpxyzdatas from tmp_user limit 10;  
mpxyzdatas
----------------------
)dzyK5zy$vzyGdzyLNzy
)dzyKBzyL1zy(~zy)dzy
)dzyD5zyP1yy%xzyL9zy
)dzyLlyy)5zz$NzyfFzy
)dzyDlzyb#zybvzy)^zy
)dzyD5yyD1zyC3zyP@zy
)dzyH5yyK9zyy)zzz1yy
)dzyL5yy~7zz(fzyL5yz
)dzyD5yyKJzye3zyG7zy
)dzyKlzy$jzyDtyy)Dzy
(10 rows)

8 创建单列索引

1
2
3
skytf=> create index idx_mpxyzdatas on skytf.tmp_user using btree (mpxyzdatas );  
CREATE INDEX
Time: 18191.498 ms

9 创建索引后的 plan

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
skytf=> explain analyze SELECT id, modifier, gmt_create as gmtCreate, gmt_modified as gmtModified, creator, is_deleted as isDeleted,  
skytf-> mpxyzdatas, skyid, wap_token as wapToken, app_token as appToken, source, username, passwd, nickname,
skytf-> mpxyzdataf, mpxyzdatag, phone, phone2, mpxyzdatae, f FROM tmp_user WHERE is_deleted='n' AND
skytf-> mpxyzdatas=')dzyDlzyb#zybvzy)^zy' order by id desc ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4.71..4.72 rows=1 width=226) (actual time=0.100..0.101 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_mpxyzdatas on tmp_user (cost=0.00..4.70 rows=1 width=226) (actual time=0.088..0.090 rows=1 loops=1)
Index Cond: ((mpxyzdatas)::text = ')dzyDlzyb#zybvzy)^zy'::text)
Filter: (is_deleted = 'n'::bpchar)
Total runtime: 0.131 ms
(7 rows)
Time: 0.459 ms

可以看出,plan 走了索引 idx_mpxyzdatas, 花费的 cost 大大降低,执行时间也仅为 0.459 ms。在测试环境上通过后,于是将方法三的索引加到生产环境下,慢查询消失。

总结

  1. 创建精简的索引对性能影响是很大的,一方面保证了查询的速度,另一方面降低了索引的维护成本,当表上有insert,update操作时,相比联合索引,维护代价更低。

  2. 引用一句德哥的话,在这节能俭排的年代,能够利用最小的代价实现最大的价值,何乐而不为呢。

原创文章,作者:3628473679,如若转载,请注明出处:https://blog.ytso.com/236373.html

(0)
上一篇 2022年1月24日
下一篇 2022年1月24日

相关推荐

发表回复

登录后才能评论