Killed a query failed When using pg_termniate_backend

今天生产库上有个会话(Select 操作) 引起了我的注意,都运行两天了,还没结束,这个SQL关联查询两张表的数据,具体信息如下

数据库和 OS 版本

PostgreSQL版本: 8.4.3
OS: Red Hat Enterprise Linux Server release 4.5

查询数据库活动会话

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
db_skytf=# select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ;  
procpid | usename | query_start | client_addr | client_port |
current_query | waiting
---------+----------+-------------------------------+---------------+-------------+------------------------------------
15583 | postgres | 2011-04-26 15:35:48.909562+08 | | -1 | select procpid, usename,query_start,client_addr,
client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; | f
1312 | skytf | 2011-04-24 13:59:49.318178+08 | 192.168.x.xxx | 34621 | SELECT
| f
: mrp.*,res.name as displayname
: FROM
: skytf.tbl_app_mrp as mrp,tbl_app_resource as res
: where mrp.appid=res.appid and
: mrp.status='1' and res.status='1'
: order by id desc
(2 rows)

备注:反复执行上述查询几次,正是会话 1312 一直在执行, 根据 query_start,可以看出这个SQL已经跑了两天了。下面看下相关表信息,并看下这个SQL的执行计划。

SQL 信息

查看表大小

1
2
3
4
5
6
7
8
9
10
11
12
13
db_skytf=> /dt+ tbl_app_mrp  
List of relations
Schema | Name | Type | Owner | Size | Description
---------+-------------+-------+---------+--------+-------------
skytf | tbl_app_mrp | table | skytf | 520 MB |
(1 row)

db_skytf=> /dt+ tbl_app_resource
List of relations
Schema | Name | Type | Owner | Size | Description
---------+------------------+-------+---------+-------+-------------
skytf | tbl_app_resource | table | skytf | 24 kB |
(1 row)

从上面看出表不大。

查看执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
db_skytf=> explain SELECT  
db_skytf-> mrp.*,res.name as displayname
db_skytf-> FROM skytf.tbl_app_mrp as mrp,tbl_app_resource as res
db_skytf-> where mrp.appid=res.appid and
db_skytf-> mrp.status='1' and res.status='1' ;
QUERY PLAN
-----------------------------------------------------------------------------------
Hash Join (cost=7.56..145797.09 rows=2839643 width=141)
Hash Cond: (mrp.appid = res.appid)
-> Seq Scan on tbl_app_mrp mrp (cost=0.00..106744.44 rows=2839643 width=127)
Filter: (status = '1'::bpchar)
-> Hash (cost=5.29..5.29 rows=182 width=18)
-> Seq Scan on tbl_app_resource res (cost=0.00..5.29 rows=182 width=18)
Filter: (status = '1'::bpchar)
(7 rows)
db_skytf=> explain analyze SELECT
db_skytf-> mrp.*,res.name as displayname
db_skytf-> FROM skytf.tbl_app_mrp as mrp,tbl_app_resource as res
db_skytf-> where mrp.appid=res.appid and
db_skytf-> mrp.status='1' and res.status='1' ;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=7.56..145797.09 rows=2839643 width=141) (actual time=0.270..3211.741 rows=2799308 loops=1)
Hash Cond: (mrp.appid = res.appid)
-> Seq Scan on tbl_app_mrp mrp (cost=0.00..106744.44 rows=2839643 width=127) (actual time=0.043..1896.213 rows=2799308 loops=1)
Filter: (status = '1'::bpchar)
-> Hash (cost=5.29..5.29 rows=182 width=18) (actual time=0.187..0.187 rows=184 loops=1)
-> Seq Scan on tbl_app_resource res (cost=0.00..5.29 rows=182 width=18) (actual time=0.017..0.112 rows=184 loops=1)
Filter: (status = '1'::bpchar)
Total runtime: 3455.818 ms
(8 rows)

备注:从PLAN执行情况来看,执行时间还快的,3 秒钟左右就跑完了,但为什么 1312 的查询都跑了两天了,还没跑完,猜测进程可能僵死,于是决定将它Kill。

杀会话

1
2
3
4
5
db_skytf=# select pg_cancel_backend(1312);  
pg_cancel_backend
-------------------
t
(1 row)

起初,使用 pg_cancel_backend, 返回结果为 true, 但发现 1312 会话依然还在下面接着使用 pg_terminate_backend看下效果。

1
2
3
4
5
db_skytf=# select pg_terminate_backend(1312);  
pg_terminate_backend
----------------------
t
(1 row)

再次查询数据库活动的会话

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
db_skytf=# select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ;  
procpid | usename | query_start | client_addr | client_port |
current_query | waiting
---------+----------+-------------------------------+---------------+-------------+-------------------------------------------------
--------------------------------------------------------------------------------------------+---------
15583 | postgres | 2011-04-26 15:35:48.909562+08 | | -1 | select procpid, usename,query_start,client_addr,
client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; | f
1312 | skytf | 2011-04-24 13:59:49.318178+08 | 192.168.x.xxx | 34621 | SELECT
| f
: mrp.*,res.name as displayname
: FROM
: skytf.tbl_app_mrp as mrp,tbl_app_resource as res
: where mrp.appid=res.appid and
: mrp.status='1' and res.status='1'
: order by id desc

使用 pg_terminate_backend 后,虽然结果也返回 t,但会话 1312 依然没被Kill, 好顽强的会话啊,要是这个库是 Oracle, 俺早使用 kill -9 结果它了,但这是 PostgreSQL 使用 kill -9 后果很严重。。。

从 OS 层面分析

1
2
3
[postgres@skytf-db](mailto:postgres@skytf-db)-> ps -ef | grep 1312;  
postgres 1312 3949 0 Jan01 ? 00:41:55 postgres: skytf db_skytf 192.168.x.xxx(34621) SELECT
postgres 15354 15105 0 15:16 pts/4 00:00:00 grep 1312

备注:怀疑 1312 进程可能已经僵死,于是让应用人员检查。

应用主机(192.168.x.xxx)查看进程

1
2
netstat -anp|grep 34621  
tcp 3216507 0 ::ffff:192.168.x.xxx:34621 ::ffff:192.168.1.102:1921 ESTABLISHED 4187/java

备注:应用人员发现些进程无异常,后来建议将这进程杀掉,在通过应用人员将这个进程杀掉后, 1312 会话终于被干掉了。

总结

有些SQL在通过 pg_cancel_backendpg_terminate_backen 也未能 Kill 的情况下,建议联系应用人员 kill 对应的应用进程。

原创文章,作者:745907710,如若转载,请注明出处:https://blog.ytso.com/tech/database/236393.html

(0)
上一篇 2022年1月24日 21:28
下一篇 2022年1月24日 21:28

相关推荐

发表回复

登录后才能评论