PostgreSQL 优化一例: 使用函数索引和联合索引

今天有个生产库上的语句比较慢,花费近 500 毫秒左右,为了便于测试,后来将这几个表导到测试环境下测试,并且优化后,时间下降到仅需要 0.104 ms,下面是优化过程。

表信息

表结构

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
francs=> /d test_count  
Table "francs.test_count"
Column | Type | Modifiers
-------------+-------------------+---------------
answerid | character varying |
skyid | bigint |
questid | character varying |
username | character varying |
id | integer | not null
create_time | character varying | default now()
flag | integer | default 0
Indexes:
"v_a_c_pk_id" PRIMARY KEY, btree (id)
"idx_test_count_ctime" btree (create_time)

francs=> /d test
Table "francs.test"
Column | Type | Modifiers
-------------+-----------------------------+---------------
answer | character varying |
create_time | timestamp without time zone | default now()
questid | character varying(20) |
id | integer | not null
answercount | bigint | default 0
Indexes:
"pk_ans_id" PRIMARY KEY, btree (id)
"idx_test_ctime" btree (create_time)

表大小和数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
francs=> /dt+ test_count  
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------------+-------+--------+-------+-------------
francs | test_count | table | francs | 56 MB |
(1 row)

francs=> /dt+ test
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------+-------+--------+---------+-------------
francs | test | table | francs | 8920 kB |
(1 row)

francs=> select count(*) from test_count;
count
--------
500455
(1 row)

francs=> select count(*) from test;
count
--------
101118
(1 row)

备注:两个表都不大。

优化前的SQL和执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
francs=> explain analyze SELECT vac.skyid, vac.username, va.answer  
FROM test_count vac, test va
WHERE vac.questid = '20110224123544'
and vac.answerid = va.id ::character varying
and vac.flag = 0
ORDER BY vac.create_time DESC LIMIT 3;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..494.32 rows=3 width=69) (actual time=81.274..420.706 rows=3 loops=1)
-> Nested Loop (cost=0.00..85352.00 rows=518 width=69) (actual time=81.273..420.702 rows=3 loops=1)
Join Filter: ((vac.answerid)::text = ((va.id)::character varying)::text)
-> Index Scan Backward using idx_test_count_ctime on test_count vac (cost=0.00..20208.38 rows=24 width=56)
(actual time=80.522..263.236 rows=3 loops=1)
Filter: (((questid)::text = '20110224123544'::text) AND (flag = 0))
-> Materialize (cost=0.00..3220.77 rows=101118 width=22) (actual time=0.026..31.220 rows=67673 loops=3)
-> Seq Scan on test va (cost=0.00..2122.18 rows=101118 width=22) (actual time=0.009..28.288 rows=101118 loops=1)
Total runtime: 537.650 ms

备注:这是原始的SQL,花费了537.650 ms,生产上的表导到测试库后,执行时间也这么多; 多表关联一般在关联字段上创建索引,而表 vac.answerid 字段上没有索引,计划在这个字段上创建索引。

查看 Answerid 字段的选择性

1
2
3
4
5
6
7
8
9
10
11
francs=> select count(distinct answerid),count(*) from test_count;  
count | count
-------+--------
32847 | 500455
(1 row)

francs=> create index idx_test_count_answeid on test_count using btree ( answerid );
CREATE INDEX

francs=> analyze test_count;
ANALYZE

备注:answerid 字段 选择性不错,可以创建索引。

查看PLAN

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
francs=> explain analyze SELECT vac.skyid, vac.username, va.answer  
FROM test_count vac, test va
WHERE vac.questid = '20110224123544'
and vac.answerid = va.id ::character varying
and vac.flag = 0
ORDER BY vac.create_time DESC LIMIT 3;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..489.59 rows=3 width=69) (actual time=81.897..422.056 rows=3 loops=1)
-> Nested Loop (cost=0.00..85352.00 rows=523 width=69) (actual time=81.895..422.052 rows=3 loops=1)
Join Filter: ((vac.answerid)::text = ((va.id)::character varying)::text)
-> Index Scan Backward using idx_test_count_ctime on test_count vac (cost=0.00..20208.38 rows=24 width=56) (actual time=81.131..264.123 rows=3 loops=1)
Filter: (((questid)::text = '20110224123544'::text) AND (flag = 0))
-> Materialize (cost=0.00..3220.77 rows=101118 width=22) (actual time=0.027..31.352 rows=67673 loops=3)
-> Seq Scan on test va (cost=0.00..2122.18 rows=101118 width=22) (actual time=0.009..28.355 rows=101118 loops=1)
Total runtime: 529.655 ms

备注:还是走了表 test_count 上的时间索引 idx_test_count_ctime,并且 test 走了全表扫描由于表关联字段类型不一致,va.id 为 integer ,vac.answerid 为 character varying,这时是用不到表 test 的主键的,这里需要在表 test 上创建函数索引。

创建函数索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
francs=> create index idx_test_id on test using btree (( id::character varying));  
CREATE INDEX
francs=> /d test
Table "francs.test"
Column | Type | Modifiers
-------------+-----------------------------+---------------
answer | character varying |
create_time | timestamp without time zone | default now()
questid | character varying(20) |
id | integer | not null
answercount | bigint | default 0
Indexes:
"pk_ans_id" PRIMARY KEY, btree (id)
"idx_test_ctime" btree (create_time)
"idx_test_id" btree ((id::character varying))

创建函数索引后再次查看PALN

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
francs=> explain analyze SELECT vac.skyid, vac.username, va.answer  
FROM test_count vac, test va
WHERE vac.questid = '20110224123544'
and vac.answerid = va.id ::character varying
and vac.flag = 0
ORDER BY vac.create_time DESC LIMIT 3;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..125.03 rows=3 width=69) (actual time=87.517..269.151 rows=3 loops=1)
-> Nested Loop (cost=0.00..21795.61 rows=523 width=69) (actual time=87.514..269.146 rows=3 loops=1)
-> Index Scan Backward using idx_test_count_ctime on test_count vac (cost=0.00..20208.38 rows=24 width=56) (actual time=87.478..269.049 rows=3 loops=1)
Filter: (((questid)::text = '20110224123544'::text) AND (flag = 0))
-> Index Scan using idx_test_id on test va (cost=0.00..57.28 rows=506 width=22) (actual time=0.019..0.019 rows=1 loops=3)
Index Cond: (((id)::character varying)::text = (vac.answerid)::text)
Total runtime: 269.228 ms
(7 rows)

备注:这时表 test 已经走索引了,此时语句已优化到了只需要 200多 毫秒,看看是否还有优化空间。根据上面 PLAN ,时间主要花在 idx_test_count_ctime 扫描上,花费了182 ( 269 -87 ) 毫秒 。

尝试删除索引

1
2
3
4
5
6
7
8
francs=> drop index idx_test_count_ctime;  
DROP INDEX

francs=> create index idx_test_count_questid_ctime on test_count using btree (questid,create_time desc);
CREATE INDEX

francs=> analyze test_count;
ANALYZE

备注:尝试删除 create_time 索引,并在字段 (questid,create_time ) 上创建联合索引。

再次查看 PALN

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
francs=> explain analyze SELECT vac.skyid, vac.username, va.answer  
FROM test_count vac, test va
WHERE vac.questid = '20110224123544'
and vac.answerid = va.id ::character varying
and vac.flag = 0
ORDER BY vac.create_time DESC LIMIT 3;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..9.36 rows=3 width=69) (actual time=0.038..0.057 rows=3 loops=1)
-> Nested Loop (cost=0.00..1609.55 rows=516 width=69) (actual time=0.037..0.056 rows=3 loops=1)
-> Index Scan using idx_test_count_questid_ctime on test_count vac (cost=0.00..22.32 rows=24 width=56) (actual time=0.028..0.033 rows=3 loops=1)
Index Cond: ((questid)::text = '20110224123544'::text)
Filter: (flag = 0)
-> Index Scan using idx_test_id on test va (cost=0.00..57.28 rows=506 width=22) (actual time=0.006..0.006 rows=1 loops=3)
Index Cond: (((id)::character varying)::text = (vac.answerid)::text)
Total runtime: 0.104 ms
(8 rows)
备注,现在仅花费了 0.104 ms,走了新建的索引 idx_test_count_questid_ctime,比最初的 500
多毫秒,优化了近 5000 倍!

总结

  1. 上面仅是在测试库上做的测试,优化后的 SQL从原来的 537.650 ms 优化到 0.104 ms ,优化了近 5000倍,
    但上生产前还需要和开发人员沟通,看看索引是否会影响到其它 SQL。
  2. 这里由于关联字段类型不一致,所以创建了一个函数索引,一般在应用中,关联字段的类型是一致的。
  3. 近期项目中低性能 SQL 语句较多( 500 毫秒以上),应该引起足够重视,争取每天都能做下优化。
  4. 在order by语句中,排序字段建议创建索引,但具体建法要视情况而定( 单列索引或者组合索引)。

原创文章,作者:kirin,如若转载,请注明出处:https://blog.ytso.com/tech/database/237871.html

(0)
上一篇 2022年1月29日 22:30
下一篇 2022年1月29日 22:30

相关推荐

发表回复

登录后才能评论