Using CONCURRENTLY without taking any locks when creating index

PostgreSQL 创建索引时通常会堵住 DML语句,如果给生产系统的大表加索引,可能加索引的过程会很长,那么在索引创建过程中,Application 的 insert,delete,update 语句会被 block,这对应用来说是非常沉痛的.幸运的是 PostgreSQL 在创建索引时,提供一个”CONCURRENTLY”选项,创建索引时, 使用这个选项则可以在线创建索引,同时又不会阻塞应用的 DML 语句,真是太棒了,下面这两段是介绍这个参数的,来自官网。

关于 CONCURRENTLY 选项

Creating an index can interfere with regular operation of a database. Normally PostgreSQL
locks the table to be indexed against writes and performs the entire index build with a
single scan of the table. Other transactions can still read the table, but if they try to
insert, update, or delete rows in the table they will block until the index build is finished.
This could have a severe effect if the system is a live production database. Very large tables
can take many hours to be indexed, and even for smaller tables, an index build can lock out
writers for periods that are unacceptably long for a production system.
PostgreSQL supports building indexes without locking out writes. This method is invoked by
specifying the CONCURRENTLY option of CREATE INDEX. When this option is used, PostgreSQL
must perform two scans of the table, and in addition it must wait for all existing transactions
that could potentially use the index to terminate. Thus this method requires more total work
than a standard index build and takes significantly longer to complete. However, since it allows
normal operations to continue while the index is built, this method is useful for adding new
indexes in a production environment. Of course, the extra CPU and I/O load imposed by the
index creation might slow other operations.

创建索引语法

1
2
3
4
5
6
7
8
Command: CREATE INDEX  
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
( { column | ( expression ) } [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace ]
[ WHERE predicate ]

下面来做下测试,分别测试不带 “CONCURRENTLY” 参数和加上这个参数时的场景,具体步骤如下。

场景一:创建索引时不带”CONCURRENTLY”参数

这里有张测试表,相关信息如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
skytf=> select pg_size_pretty(pg_relation_size('my_table_201102'));  
pg_size_pretty
----------------
1436 MB
(1 row)

skytf=> /d my_table_201102
Table "skytf.my_table_201102"
Column | Type | Modifiers
-------------+-----------------------------+-----------
create_time | timestamp without time zone | not null
log_time | timestamp without time zone |
user_id | character varying(32) |
action | character varying(128) |
app_id | character varying(32) |
result | numeric(10,0) |
server_id | character varying(32) |
arg1 | character varying(128) |
arg2 | character varying(128) |
arg3 | character varying(128) |
arg4 | character varying(128) |
username | character varying(64) |
clientip | character varying(32) |
Indexes:
"idx_my_table_201102_create_time" btree (create_time)
Inherits: my_table

session一: 执行创建索引语句

1
skytf=> create index idx_my_table_201102_action on my_table_201102 using btree (action);

备注:这张表有1G多, 这个SESION正在执行,还没跑会。

session二:向表中插入记录

1
skytf=> insert into my_table_201102 (create_time,action) values (now(),'test');

此时发现 session二正处于等侍状态,说明 insert 操作已经被 session一阻塞了。

场景二:创建索引时加上”CONCURRENTLY”参数

先删除之前创建的索引

1
2
skytf=> drop index idx_my_table_201102_action;  
DROP INDEX

session一: 执行创建索引操作

1
skytf=> create index CONCURRENTLY idx_my_table_201102_action on my_table_201102 using btree (action);

备注:表有1G多, 这个SESION正在执行,还没跑会

session二: 向这个表中插入记录

1
2
skytf=> insert into my_table_201102 (create_time,action) values (now(),'test_b');  
INSERT 0 1

备注,当session一还在创建索引的过程中,立即新开一个 session二向些表中插入记录,发现插入动作可以立即执行下去,说明创建索引时带参数 “CONCURRENTLY”不会阻塞 DML语句。(DELETE 操作就没测试了,有兴趣的朋友可以测试下。)

比较两种场景索引创建时间

不带 “CONCURRENTLY” 参数

1
2
3
skytf=> create index idx_my_table_201102_action on my_table_201102 using btree (action);  
CREATE INDEX
Time: 46094.907 ms

带 “CONCURRENTLY” 参数

1
2
3
skytf=> create index CONCURRENTLY idx_my_table_201102_action on my_table_201102 using btree (action);  
CREATE INDEX
Time: 55521.221 ms

由于带”CONCURRENTLY” 创建索引时需要做更多的维护工作,所以耗时稍微长些。

Reindex 用法

1
2
3
4
5
skytf=> /h reindex  
Command: REINDEX
Description: rebuild indexes
Syntax:
REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name [ FORCE ]

备注:但 Reindex 命令却不提供 “CONCURRENTLY” 参数。

总结

  1. 生产系统给表加索引时,建议使用”CONCURRENTLY”属性,对生产产生最小影响。
  2. 在使用”CONCURRENTLY”属性时,由于 PG需要做更多的内部操作,所以耗时稍微长些。
  3. 在使用”CONCURRENTLY”属性时,如果创建索引途中因为 “uniqueness violation” 或会话中断等原因创建失败,则这个索引需要删除重新创建。
  4. Reindex 命令不提供 “CONCURRENTLY” 参数,生产系统在执行 Reindex 时还需谨慎。

原创文章,作者:506227337,如若转载,请注明出处:https://blog.ytso.com/tech/database/236370.html

(0)
上一篇 2022年1月24日 21:27
下一篇 2022年1月24日 21:27

相关推荐

发表回复

登录后才能评论