今天生产库上有个会话(Select 操作) 引起了我的注意，都运行两天了，还没结束，这个SQL关联查询两张表的数据，具体信息如下

数据库和 OS 版本

PostgreSQL版本: 8.4.3
OS: Red Hat Enterprise Linux Server release 4.5

查询数据库活动会话

db_skytf=# select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ;  
procpid | usename | query_start | client_addr | client_port |  
               current_query | waiting  
---------+----------+-------------------------------+---------------+-------------+------------------------------------   
 15583 | postgres | 2011-04-26 15:35:48.909562+08 | | -1 | select procpid, usename,query_start,client_addr,  
client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; | f  
 1312 | skytf | 2011-04-24 13:59:49.318178+08 | 192.168.x.xxx | 34621 | SELECT  
 | f  
 : mrp.*,res.name as displayname  
 : FROM  
 : skytf.tbl_app_mrp as mrp,tbl_app_resource as res  
 : where mrp.appid=res.appid and  
 : mrp.status='1' and res.status='1'  
 : order by id desc    
(2 rows)

备注：反复执行上述查询几次，正是会话 1312 一直在执行，根据 query_start，可以看出这个SQL已经跑了两天了。下面看下相关表信息，并看下这个SQL的执行计划。

SQL 信息

查看表大小

db_skytf=> /dt+ tbl_app_mrp  
 List of relations  
Schema | Name | Type | Owner | Size | Description  
---------+-------------+-------+---------+--------+-------------  
skytf | tbl_app_mrp | table | skytf | 520 MB |  
(1 row)

db_skytf=> /dt+ tbl_app_resource  
 List of relations  
Schema | Name | Type | Owner | Size | Description  
---------+------------------+-------+---------+-------+-------------  
skytf | tbl_app_resource | table | skytf | 24 kB |  
(1 row)

从上面看出表不大。

查看执行计划

db_skytf=> explain SELECT  
db_skytf-> mrp.*,res.name as displayname  
db_skytf-> FROM skytf.tbl_app_mrp as mrp,tbl_app_resource as res  
db_skytf-> where mrp.appid=res.appid and  
db_skytf-> mrp.status='1' and res.status='1' ;  
 QUERY PLAN  
-----------------------------------------------------------------------------------  
Hash Join (cost=7.56..145797.09 rows=2839643 width=141)  
 Hash Cond: (mrp.appid = res.appid)  
 -> Seq Scan on tbl_app_mrp mrp (cost=0.00..106744.44 rows=2839643 width=127)  
 Filter: (status = '1'::bpchar)  
 -> Hash (cost=5.29..5.29 rows=182 width=18)  
 -> Seq Scan on tbl_app_resource res (cost=0.00..5.29 rows=182 width=18)  
 Filter: (status = '1'::bpchar)  
(7 rows)
db_skytf=> explain analyze SELECT  
db_skytf-> mrp.*,res.name as displayname  
db_skytf-> FROM skytf.tbl_app_mrp as mrp,tbl_app_resource as res  
db_skytf-> where mrp.appid=res.appid and  
db_skytf-> mrp.status='1' and res.status='1' ;  
                      QUERY PLAN  
       
------------------------------------------------------------------------------------------------------------------------   
Hash Join (cost=7.56..145797.09 rows=2839643 width=141) (actual time=0.270..3211.741 rows=2799308 loops=1)  
 Hash Cond: (mrp.appid = res.appid)  
 -> Seq Scan on tbl_app_mrp mrp (cost=0.00..106744.44 rows=2839643 width=127) (actual time=0.043..1896.213 rows=2799308 loops=1)  
 Filter: (status = '1'::bpchar)  
 -> Hash (cost=5.29..5.29 rows=182 width=18) (actual time=0.187..0.187 rows=184 loops=1)  
 -> Seq Scan on tbl_app_resource res (cost=0.00..5.29 rows=182 width=18) (actual time=0.017..0.112 rows=184 loops=1)  
 Filter: (status = '1'::bpchar)  
Total runtime: 3455.818 ms  
(8 rows)

备注：从PLAN执行情况来看，执行时间还快的，3 秒钟左右就跑完了，但为什么 1312 的查询都跑了两天了，还没跑完，猜测进程可能僵死，于是决定将它Kill。

杀会话

db_skytf=# select pg_cancel_backend(1312);  
pg_cancel_backend  
-------------------  
t  
(1 row)

起初，使用 pg_cancel_backend, 返回结果为 true, 但发现 1312 会话依然还在下面接着使用 pg_terminate_backend看下效果。

db_skytf=# select pg_terminate_backend(1312);  
pg_terminate_backend  
----------------------  
t  
(1 row)

再次查询数据库活动的会话

db_skytf=# select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ;  
procpid | usename | query_start | client_addr | client_port |  
 current_query | waiting  
---------+----------+-------------------------------+---------------+-------------+-------------------------------------------------  
--------------------------------------------------------------------------------------------+---------  
 15583 | postgres | 2011-04-26 15:35:48.909562+08 | | -1 | select procpid, usename,query_start,client_addr,  
client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; | f  
 1312 | skytf | 2011-04-24 13:59:49.318178+08 | 192.168.x.xxx | 34621 | SELECT  
 | f  
 : mrp.*,res.name as displayname  
 : FROM  
 : skytf.tbl_app_mrp as mrp,tbl_app_resource as res  
 : where mrp.appid=res.appid and  
 : mrp.status='1' and res.status='1'  
 : order by id desc

使用 pg_terminate_backend 后，虽然结果也返回 t,但会话 1312 依然没被Kill, 好顽强的会话啊，要是这个库是 Oracle，俺早使用 kill -9 结果它了，但这是 PostgreSQL 使用 kill -9 后果很严重。。。

从 OS 层面分析

1
2
3

[postgres@skytf-db](mailto:postgres@skytf-db)-> ps -ef | grep 1312;  
postgres 1312 3949 0 Jan01 ? 00:41:55 postgres: skytf db_skytf 192.168.x.xxx(34621) SELECT  
postgres 15354 15105 0 15:16 pts/4 00:00:00 grep 1312

备注：怀疑 1312 进程可能已经僵死，于是让应用人员检查。

应用主机(192.168.x.xxx)查看进程

1 2	netstat -anp\|grep 34621 tcp 3216507 0 ::ffff:192.168.x.xxx:34621 ::ffff:192.168.1.102:1921 ESTABLISHED 4187/java

备注：应用人员发现些进程无异常，后来建议将这进程杀掉，在通过应用人员将这个进程杀掉后， 1312 会话终于被干掉了。

总结

有些SQL在通过 pg_cancel_backend 和 pg_terminate_backen 也未能 Kill 的情况下，建议联系应用人员 kill 对应的应用进程。

原创文章，作者：745907710，如若转载，请注明出处：https://blog.ytso.com/236393.html

Killed a query failed When using pg_termniate_backend

数据库和 OS 版本

查询数据库活动会话

SQL 信息

查看执行计划

杀会话

从 OS 层面分析

总结

发表回复

Killed a query failed When using pg_termniate_backend

数据库和 OS 版本

查询数据库活动会话

SQL 信息

查看执行计划

杀会话

从 OS 层面分析

总结

相关推荐

发表回复