记得原来一老大哥说他运维ORACLE,发现死锁的时候,总想着缓一缓,再拖一拖,没准再查看的时候。死锁就已经自己释放掉,不需要处理了(运维的无奈)。而在PostgreSQL中,事务可以按照任意顺序加锁。且PostgreSQL也有着其死锁处理机制。
当进程请求加锁的时候,如果失败,会进入等待队列。如果在队列中已经存在一些进程要求本进程中已经持有的锁,那么为了尽量避免死锁,可以把本进程插入到它们的前面。当一个锁被释放时,将会试图唤醒等待队列里的进程。这个行为预防了死锁的产生。
但是这种方法不能完全避免死锁的产生,PostgreSQL提供如下图的死锁检验机制。
且PostgreSQL使用等待图(WFG)来检验死锁,WFG为一个有向图,顶点ABC表示申请加锁的进程,XY有向边表示依赖关系。
图中的虚线为soft edge,实线为hard edge
当进程A和进程B都在某个锁的等待队列,且进程A在进程B的后边,两个进程的加锁要求冲突,进程A在等待进程B,则存在从A到B的有向边,名为soft edge;如果进程A的加锁要求和进程B已经持有的锁冲突,这时候从A指向B的为hard edge。
系统出现死锁当且仅当WFG出现环,如果WFG中有soft edge环,则可以通过拓扑排序对队列进行重排,尝试消除死锁。从顶点开始,沿着WFG有向边走,如果能回到顶点,说明出现死锁。如果路径没有出现soft edge,则直接终止此事务。如果存在soft edge,则记录所有的soft edge,并尝试对这个集合进行调整。
通过拓扑排序找到可行的方案,则采用此方案,消除死锁,(不一定是最优的),否则死锁清除失败,终止该事务。
以下为pg12.1有向边的数据结构
/*
* One edge in the waits-for graph.
*
* waiter and blocker may or may not be members of a lock group, but if either
* is, it will be the leader rather than any other member of the lock group.
* The group leaders act as representatives of the whole group even though
* those particular processes need not be waiting at all. There will be at
* least one member of the waiter's lock group on the wait queue for the given
* lock, maybe more.
*/
typedef struct
{
PGPROC *waiter; /* the leader of the waiting lock group */
PGPROC *blocker; /* the leader of the group it is waiting for */
LOCK *lock; /* the lock being waited for */
int pred; /* workspace for TopoSort */
int link; /* workspace for TopoSort */
} EDGE;
PostgreSQL对进程的检验过程为:
- 递归试图检验和消除死锁。
- 测试当前队列状态是否会发生死锁,如果不满足约束性检查,则死锁。对soft edge调整,并检验是否合法。
- 判断是否出现环,如果存在且不能调整,则死锁。
相关参数:
1.Postgresql中,有一个死锁等待事件的参数,默认是1s,也就是是说Postgresql后台进程会以1s的频率来检测是否存在死锁。
锁等待超时
2.Postgresql中同样可以设置锁等待的超时时间,意味着当前事务在请求一个锁的时候,一旦等待时长超出指定的时间,当前语句被中止。该参数的默认值为0,意味着发生锁等待的时候永远不超时,一直等待下去。默认情况下,锁超时之后,当前Session的任何语句都会被回滚,即便是执行一个commit。
以下为标准锁的锁模式以及对应的操作
锁手动处理:
1.查询阻塞:
postgres=# SELECT w.query as waiting_query,
postgres-# w.pid as w_pid,
postgres-# w.usename as w_user,
postgres-# l.query as locking_query,
postgres-# l.pid as l_pid,
postgres-# l.usename as l_user,
postgres-# t.schemaname || '.' || t.relname as tablename
postgres-# from pg_stat_activity w join pg_locks l1 on w.pid = l1.pid
postgres-# and not l1.granted join pg_locks l2 on l1.relation = l2.relation
postgres-# and l2.granted join pg_stat_activity l on l2.pid = l.pid join pg_stat_user_tables t on l1.relation = t.relid;
waiting_query | w_pid | w_user | locking_query | l_pid | l_user | tablename
--------------------+-------+----------+--------------------------------------+-------+----------+----------------
truncate tab_ysl ; | 5391 | postgres | update tab_ysl set id =9 where id=1; | 5524 | postgres | public.tab_ysl
(1 row)
//其中l_pid为阻塞者的pid,w_pid为被阻塞者的pid。
2.查询表持有的锁:
postgres=# select oid from pg_class where relname= 'tab_ysl';
oid
------
24580
(1 row)
使pg_class.oid=pg_locks.relation,则表tab_ysl上持有的锁为,RowExclusiveLock和
AccessExclusiveLock ,对应了update和truncate的操作。
postgres=# select * from pg_locks where pid in ('5524','5391');
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+---------------------+---------+----------
relation | 13593 | 2659 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 2658 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 1249 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 3455 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 2663 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 2662 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 2685 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 2684 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 2615 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
relation | 13593 | 1259 | | | | | | | | 7/11 | 5524 | AccessShareLock | t | t
virtualxid | | | | | 7/11 | | | | | 7/11 | 5524 | ExclusiveLock | t | t
virtualxid | | | | | 6/181 | | | | | 6/181 | 5391 | ExclusiveLock | t | t
transactionid | | | | | | 512 | | | | 7/11 | 5524 | ExclusiveLock | t | f
relation | 13593 | 24580 | | | | | | | | 7/11 | 5524 | RowExclusiveLock | t | f
transactionid | | | | | | 513 | | | | 6/181 | 5391 | ExclusiveLock | t | f
relation | 13593 | 24580 | | | | | | | | 6/181 | 5391 | AccessExclusiveLock | f | f
(16 rows)
杀掉阻塞者的进程,释放锁:
select pg_cancel_backend('上面查询到的阻塞着的pid');
##注意##pg_cancel_backend(‘阻塞者的pid值’);只能杀死select语句,对其他语句不生效,杀了之后查询发现还存在,考虑使用pg_terminate_backend(‘进程ID’);
原创文章,作者:奋斗,如若转载,请注明出处:https://blog.ytso.com/237222.html