不要使用kill -9 杀 PostgreSQL 进程

今天应用反映数据库很慢,有些SQL执行一天都没执行完,数据库版本为 8.3。

检查数据库,发现大量会话在更新同一张表 tbl_tmp, 产生大量行锁申请等侍。等于等侍状态的SQL (被堵住的SQL)。

查找等待状态的SQL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
anpstat=# select datname,procpid,query_start,current_query,waiting,client_addr from pg_stat_activity where waiting='t';  
datname | procpid |query_start |current_query | waiting | client_addr
----------------------------------------------------------------------------------------------------------------------+---------+-------------
sanpstat | 14044 | 2010-10-07 16:46:40.386904+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 27839 | 2010-10-07 16:49:18.022499+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 3539 | 2010-10-07 20:28:13.212075+08 |update tbl_tmp setfeedback =$1 where seq=$2 | t | 127.0.0.1
sanpstat | 3894 | 2010-10-07 20:53:43.466517+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 10130 | 2010-10-08 09:37:51.253871+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 9083 | 2010-10-08 08:08:59.221976+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 10038 | 2010-10-08 09:25:55.434459+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 10241 | 2010-10-08 09:51:13.990492+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1 | t | 127.0.0.1
sanpstat | 11147 | 2010-10-08 11:11:21.574665+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 11168 | 2010-10-08 11:17:17.158246+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 11926 | 2010-10-08 11:54:10.704641+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 11959 | 2010-10-08 11:56:07.021072+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1

查找正在执行的语句

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
sanpstat=# select datname,procpid,query_start,current_query,waiting,client_addr from pg_stat_activity where waiting='f'   
sanpstat-# and current_query like '%update tbl_tmp %';
datname | procpid |query_start |current_query | waiting |client_addr
----------+---------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+---------+---------------
sanpstat | 2012 | 2010-10-07 16:52:55.8228+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 14157 | 2010-10-07 16:41:37.308062+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 10177 | 2010-10-08 09:43:43.297872+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 2043 | 2010-10-07 19:25:46.931806+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 3298 | 2010-10-07 19:37:53.307125+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 3322 | 2010-10-07 19:40:39.373079+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 4114 | 2010-10-08 09:25:18.701269+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 9082 | 2010-10-08 09:18:16.153882+08 |update tbl_tmp set show=1 , gmt_modified=now(), modifier=$1 where seq=$2 | f | 127.0.0.1
sanpstat | 10042 | 2010-10-08 09:32:55.260732+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 10278 | 2010-10-08 11:08:55.334562+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 11810 | 2010-10-08 11:46:24.147652+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 11964 | 2010-10-08 11:58:32.831916+08 |update tbl_tmp set responsed=true ,feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 |

使用pg_cancel_backend()杀会话

1
2
3
4
5
sanpstat=# select pg_cancel_backend(2012);   
pg_cancel_backend
-------------------
t
(1 row)

pg_cancel_backend()杀进程,虽然显示为 ‘t’ ,但进程还在,文档上解释 Cancel a backend's current query,猜想pg_cancel_backend()只能 kill select 语句,而不能kill update语句,后来实验也证实了这一点,有兴趣的朋友可以自己做实验去验证下。

1
2
3
4
5
6
7
8
9
10
11
12
13
kill -9 杀进程  
kill -9 2012
kill -9 14157
kill -9 10177
kill -9 2043
kill -9 3298
kill -9 3322
kill -9 4114
kill -9 9082
kill -9 10042
kill -9 10278
kill -9 11810
kill -9 11964

这里将正在执行的update tbl_tmp 表的会话kill 掉.

数据库无法连接

1
2
[enterprisedb@sanp-rich-db1 ~]$ psql -h 192.168.3.27  -p 1921  -d postgres -U rmt_db_bak  
psql: FATAL:the database system is in recovery mode

备注:进程kill -9 以后,发现数据库无法连接,遭了,还好是个测试库。

数据库处于恢复模式

1
2
3
4
5
6
7
8
-bash-3.2$ ps -ef | grep post  
postgres 4627 1 0 Apr12 ? 00:02:24 /home/postgres/bin/postgres -D /mnt/data
postgres 4629 4627 0 Apr12 ? 00:00:04 postgres: logger process
root 14195 14168 0 15:37 pts/7 00:00:00 su - postgres
postgres 14196 14195 0 15:37 pts/7 00:00:00 -bash
postgres 14332 4627 44 15:41 ? 00:00:18 postgres: startup process recovering 00000001000000DA0000006E
postgres 21495 14196 0 15:41 pts/7 00:00:00 ps -ef
postgres 21496 14196 0 15:41 pts/7 00:00:00 grep post

发现Postgresql 处于 recovery 状态,奇怪,Postgresql 恢复过程中是不允许连接的,在这点上,Oracle似乎要友好些

结论

  1. postgresql 8.3.3的版本 只提供系统函数 pg_cancel_backend(pid int) 杀进程,但是,这个函数只能 kill Select 查询,而updae,delete DML不生效,感兴趣的朋友可以自己做实验验证下;
  2. postgresql 8.4 以后版本提供 函数 pg_terminate_backend(pid int), 这个函数功能比前者强大,可以kill 各种DML(SELECT,UPDATE,DELETE,DROP)操作;
  3. 对于Postgresql ,最好不要用kill -9 去杀用户进程,因为如果是一个很大的UPDATE,kill后,Postgresql需要很长的时间做Recovery,而在这个RECOVERY过程中,数据库是不可用的,在这点上,Oracle 要友好些,在数据库恢复的过程中数据库依然可用,不太需要因Kill -9 用户进程而造成数据库不可用的情况。

原创文章,作者:Maggie-Hunter,如若转载,请注明出处:https://blog.ytso.com/236331.html

(0)
上一篇 2022年1月24日
下一篇 2022年1月24日

相关推荐

发表回复

登录后才能评论