PostgreSQL:关于 ” Select MAX(id) From Table_name ” 的优化与分析

今天在一重要生产库上发现一个 SQL 执行较慢,严重影响数据库性能,以下是优化过程:

一 优化过程

1.1 出现性能问题的 SQL

1
SELECT max(duser2) FROM mpt_table;

1.2 老的 PLAN

1
2
3
4
5
6
7
ndroid_market=> explain analyze SELECT max(duser2) FROM mpt_table;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=45159.95..45159.96 rows=1 width=4) (actual time=1010.232..1010.233 rows=1 loops=1)
-> Seq Scan on mpt_table (cost=0.00..40891.56 rows=1707356 width=4) (actual time=0.019..413.291 rows=1707488 loops=1)
Total runtime:1010.278 ms
(3 rows)

备注:走了全表扫,执行时间为 1010.278 ms. 并且字段 duser2 上没加索引,经分析 duser2 的选择性较好,于是创建索引.

1.3 创建索引

1
create index concurrently idx_mpt_table_duser2 on mpt_table using btree (duser2);

1.4 新的 PLAN

1
2
3
4
5
6
7
8
9
10
android_market=> explain analyze SELECT max(duser2) FROM mpt_table;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.03..0.04 rows=1 width=0) (actual time=0.093..0.093 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.03 rows=1 width=4) (actual time=0.091..0.091 rows=1 loops=1)
->Index Scan Backwardusing idx_mpt_table_duser2 on mpt_table (cost=0.00..58384.34 rows=1707488 width=4) (actual time=0.090..0.090 rows=1 loops=1)
Index Cond: (duser2 IS NOT NULL)
Total runtime: 0.134 ms
(6 rows)

备注:在 duser2 字段上创建索引后,走了 Index Scan Backward 扫描,并且执行时间为 0.134 ms,非常的快,接着想测试下 oracle 对 max() 的处理。

二 Oracle 测试

2.1 创建测试表并导入数据

1
2
3
4
5
6
7
8
9
10
11
SQL> create table test_max_oracle(id integer);
Table created

SQL> begin
2 for i in 1..5000000 loop
3 insert into test_max_oracle (id) values (i);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed

2.2 创建索引并表分析

1
2
3
4
5
SQL> create index idx_test_max_id on test_max_oracle (id);
Index created

SQL> execute dbms_stats.gather_table_stats(ownname=>'skytf',tabname=>'test_max_oracle',cascade=> true);
PL/SQL procedure successfully completed

2.3 执行时间

1
2
3
4
5
6
7
8
9
10
11
12
13
SQL> set timi on;

SQL> select max(id) from test_max_oracle;
MAX(ID)
----------
5000000
Executed in 0.031 seconds

SQL> select max(id) from test_max_oracle;
MAX(ID)
----------
5000000
Executed in0.031 seconds

备注:最短执行时间为 31 ms 左右。

2.4 执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
15:34:24 SKYTF@skytf> select max(id) from test_max_oracle;
MAX(ID)
----------
5000000
已用时间: 00: 00: 00.01

执行计划
----------------------------------------------------------
Plan hash value: 1160081309

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 1705 (3)| 00:00:21 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | INDEX FULL SCAN(MIN/MAX)| IDX_TEST_MAX_ID | 5000K| 23M| | |
----------------------------------------------------------------------------------------------

统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
407 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

备注:走的是 INDEX FULL SCAN,全索引扫描会根据索引的顺序访问所有的索引 block, 这种扫描方式类似全表扫,效率不高。下面引用这段描述:

2.5 关于 INDEX FULL SCAN

In a full index scan, the database reads the entire index in order. A full index scan is available if a predicate (WHERE clause) in the SQL statement references a column in the index, and in some circumstances when no predicate is specified. A full scan can eliminate sorting because the data is ordered by index key.

三 PostgreSQL 测试

3.1 创建测试表并插入数据

1
2
3
4
5
6
7
8
9
10
11
francs=> create table test_max_pg(id int4);
CREATE TABLE

francs=> insert into test_max_pg select generate_series(1,5000000);
INSERT 0 5000000

francs=> create index idx_test_max_id on test_max_pg using btree (id);
CREATE INDEX

francs=> analyze test_max_pg;
ANALYZE

3.2 执行时间

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
francs=> select max(id) from test_max_pg;
max
---------
5000000
(1 row)

Time: 13.960 ms

francs=> select max(id) from test_max_pg;
max
---------
5000000
(1 row)

Time: 2.493 ms

francs=> select max(id) from test_max_pg;
max
---------
5000000
(1 row)

Time: 1.233 ms

备注:最短执行时间只花了 1.233 ms 左右,比 oracle 测试结果要好些,这里不是对比 oracle 和 pg 性能,只是争对 max() 函数应用场景分析两种数据库的处理方式。

3.3 执行计划

1
2
3
4
5
6
7
8
9
francs=> explain select max(id) from test_max_pg;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Result (cost=0.03..0.04 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.03 rows=1 width=4)
-> Index Only Scan Backwardusing idx_test_max_id on test_max_pg (cost=0.00..151011.64 rows=5000000 width=4)
Index Cond: (id IS NOT NULL)
(5 rows)

备注:这里走的是 Index Only Scan Backward,由于索引是顺序排序的,这种扫描方式直接后向读取索引项,而不需要扫描整个索引 page,因此效率比”INDEX FULL SCAN “要高些。

四 补充

  1. 这里的测试版本: Oracle 10.2.0.1 ; PostgreSQL 9.2。
  2. Oracle 的索引扫描方式很多,例如 index rang scanindex unique scanindex fast full scan等,比 PostgreSQL 的索引访问方式多些,这里不再描述。

原创文章,作者:奋斗,如若转载,请注明出处:https://blog.ytso.com/237987.html

(0)
上一篇 2022年1月29日
下一篇 2022年1月29日

相关推荐

发表回复

登录后才能评论