PostgreSQL Lock 一例

今天开发人员发来SIR,需要给生产库上一张表加字段,SQL语句如下,表名用 table_test 代替 ALTER TABLE table_test ADD COLUMN column_a integer NOT NULL DEFAULT 0;当时偶查了一下这张表的记录数,才3000多条,这么小的表,接着查了下 pg_stata_activity,发现没有多少活动的会话,于是开始执行上面SQL,执行了大概10S之后,发现还没执行完,觉得奇怪,这么小的表,应该很快才对,估计是被HOLD住了, 详细操作过程如下。

表结构

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
testtf=# /d pg_class;  
Table "pg_catalog.pg_class"
Column | Type | Modifiers
-----------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
reltype | oid | not null
reloftype | oid | not null
relowner | oid | not null
relam | oid | not null
relfilenode | oid | not null
reltablespace | oid | not null
relpages | integer | not null
reltuples | real | not null
reltoastrelid | oid | not null
reltoastidxid | oid | not null
relhasindex | boolean | not null
relisshared | boolean | not null
relistemp | boolean | not null
relkind | "char" | not null
relnatts | smallint | not null
relchecks | smallint | not null
relhasoids | boolean | not null
relhaspkey | boolean | not null
relhasexclusion | boolean | not null
relhasrules | boolean | not null
relhastriggers | boolean | not null
relhassubclass | boolean | not null
relfrozenxid | xid | not null
relacl | aclitem[] |
reloptions | text[] |
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)

先查出表table_test 的oid

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
testtf=# select oid,relname from pg_class where relname='table_test';  
oid | relname
-------+---------
16678 | table_test
(1 row)

testtf=# /d pg_locks;
View "pg_catalog.pg_locks"
Column | Type | Modifiers
--------------------+----------+-----------
locktype | text |
database | oid |
relation | oid |
page | integer |
tuple | smallint |
virtualxid | text |
transactionid | xid |
classid | oid |
objid | oid |
objsubid | smallint |
virtualtransaction | text |
pid | integer |
mode | text |
granted | boolean |

查询表table_test上持有的锁

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
testtf=# select locktype,database,pid,relation ,mode from pg_locks where relation=16678;  
locktype | database | pid | relation | mode
----------+----------+-------+----------+---------------------
relation | 16400 | 9905 | 16678 | AccessShareLock
relation | 16400 | 9902 | 16678 | AccessShareLock
relation | 16400 | 9127 | 16678 | AccessShareLock
relation | 16400 | 9909 | 16678 | AccessShareLock
relation | 16400 | 6781 | 16678 | AccessShareLock
relation | 16400 | 9910 | 16678 | AccessShareLock
relation | 16400 | 9903 | 16678 | AccessShareLock
relation | 16400 | 6779 | 16678 | AccessShareLock
relation | 16400 | 9904 | 16678 | AccessShareLock
relation | 16400 | 9907 | 16678 | AccessShareLock
relation | 16400 | 9436 | 16678 | AccessExclusiveLock
relation | 16400 | 29393 | 16678 | AccessShareLock
relation | 16400 | 9908 | 16678 | AccessShareLock
relation | 16400 | 9912 | 16678 | AccessShareLock
relation | 16400 | 9911 | 16678 | AccessShareLock
relation | 16400 | 4615 | 16678 | AccessShareLock
relation | 16400 | 9128 | 16678 | AccessShareLock
(17 rows)

testtf=# select locktype,database,pid,relation ,mode from pg_locks where relation=16678;
locktype | database | pid | relation | mode
----------+----------+-------+----------+-----------------
relation | 16400 | 29393 | 16678 | AccessShareLock

testtf=# select locktype,database,pid,relation ,mode from pg_locks where relation=16678;
locktype | database | pid | relation | mode
----------+----------+-------+----------+-----------------
relation | 16400 | 29393 | 16678 | AccessShareLock

备注:此时表 table_test 只有一个会话(pid=29393)持有 AccessShareLock 锁,这应该是个Select查询;

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
testtf=# /d pg_stat_activity;  
View "pg_catalog.pg_stat_activity"
Column | Type | Modifiers
------------------+--------------------------+-----------
datid | oid |
datname | name |
procpid | integer |
usesysid | oid |
usename | name |
application_name | text |
client_addr | inet |
client_port | integer |
backend_start | timestamp with time zone |
xact_start | timestamp with time zone |
query_start | timestamp with time zone |
waiting | boolean |
current_query | text |

testtf=# select usename,current_query ,xact_start,procpid from pg_stat_activity where procpid=29393;
usename | current_query | xact_start | procpid
---------+-----------------------+-------------------------------+---------
wapnews | <IDLE> in transaction | 2010-07-08 13:53:07.906297+08 | 29393

备注:从上面可以看出 空闲事务 9436 持有共享锁”AccessShareLock”,这个事务从 2010-07-08 13:53:07 开始到现在还没有结束。

经和开发人员联系,可以KILL这个会话

1
2
3
4
5
testtf=# select pg_terminate_backend(29393);  
pg_terminate_backend
----------------------
t
(1 row)

会话消失

1
2
3
4
testtf=# select usename,current_query ,xact_start,procpid from pg_stat_activity where procpid=29393;  
usename | current_query | xact_start | procpid
---------+---------------+------------+---------
(0 rows)

再次执行DDL,执行成功

1
2
testtf=# ALTER TABLE table_test ADD COLUMN column_a integer NOT NULL DEFAULT 0;  
ALTER TABLE

总结

  1. 原因分析: 在生产库执行类似ALTER TABLE 的DDL时应该非常小心,因为此时加的是 Table 级的 ACCESS EXCLUSIVE 锁 ,ACCESS EXCLUSIVE和其它所有类型所有锁都冲突,包括SELECT,所以当执行Select这张表的事务还没有结束时,执行 Alter Table时,表table_test会因获取不到 ACCESS EXCLUSIVE 而发生等侍。
  2. 建议方法:当执行ALTER TABLE类似操作时,正确的方法是先查看 pg_locks 是否有这张表所持有的锁,如果有,等系统空闲的时候 再做这种操作
  3. 扩展:当在生产库执行DDL时,应该首先查看执行DDL对象当前所持有锁的情况,如果和执行DDL请求的锁冲突,应该等系统空闲的时候再执行DDL操作。
  4. 和开发人员沟通,程序代码及时提交事务。

原创文章,作者:254126420,如若转载,请注明出处:https://blog.ytso.com/236328.html

(0)
上一篇 2022年1月24日
下一篇 2022年1月24日

相关推荐

发表回复

登录后才能评论