今天生产库上有个会话(Select 操作) 引起了我的注意,都运行两天了,还没结束,这个SQL关联查询两张表的数据,具体信息如下
数据库和 OS 版本
PostgreSQL版本: 8.4.3
OS: Red Hat Enterprise Linux Server release 4.5
查询数据库活动会话
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
|
db_skytf=# select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; procpid | usename | query_start | client_addr | client_port | current_query | waiting ---------+----------+-------------------------------+---------------+-------------+------------------------------------ 15583 | postgres | 2011-04-26 15:35:48.909562+08 | | -1 | select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; | f 1312 | skytf | 2011-04-24 13:59:49.318178+08 | 192.168.x.xxx | 34621 | SELECT | f : mrp.*,res.name as displayname : FROM : skytf.tbl_app_mrp as mrp,tbl_app_resource as res : where mrp.appid=res.appid and : mrp.status='1' and res.status='1' : order by id desc (2 rows)
|
备注:反复执行上述查询几次,正是会话 1312 一直在执行, 根据 query_start,可以看出这个SQL已经跑了两天了。下面看下相关表信息,并看下这个SQL的执行计划。
SQL 信息
查看表大小
1 2 3 4 5 6 7 8 9 10 11 12 13
|
db_skytf=> /dt+ tbl_app_mrp List of relations Schema | Name | Type | Owner | Size | Description ---------+-------------+-------+---------+--------+------------- skytf | tbl_app_mrp | table | skytf | 520 MB | (1 row)
db_skytf=> /dt+ tbl_app_resource List of relations Schema | Name | Type | Owner | Size | Description ---------+------------------+-------+---------+-------+------------- skytf | tbl_app_resource | table | skytf | 24 kB | (1 row)
|
从上面看出表不大。
查看执行计划
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
|
db_skytf=> explain SELECT db_skytf-> mrp.*,res.name as displayname db_skytf-> FROM skytf.tbl_app_mrp as mrp,tbl_app_resource as res db_skytf-> where mrp.appid=res.appid and db_skytf-> mrp.status='1' and res.status='1' ; QUERY PLAN ----------------------------------------------------------------------------------- Hash Join (cost=7.56..145797.09 rows=2839643 width=141) Hash Cond: (mrp.appid = res.appid) -> Seq Scan on tbl_app_mrp mrp (cost=0.00..106744.44 rows=2839643 width=127) Filter: (status = '1'::bpchar) -> Hash (cost=5.29..5.29 rows=182 width=18) -> Seq Scan on tbl_app_resource res (cost=0.00..5.29 rows=182 width=18) Filter: (status = '1'::bpchar) (7 rows) db_skytf=> explain analyze SELECT db_skytf-> mrp.*,res.name as displayname db_skytf-> FROM skytf.tbl_app_mrp as mrp,tbl_app_resource as res db_skytf-> where mrp.appid=res.appid and db_skytf-> mrp.status='1' and res.status='1' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=7.56..145797.09 rows=2839643 width=141) (actual time=0.270..3211.741 rows=2799308 loops=1) Hash Cond: (mrp.appid = res.appid) -> Seq Scan on tbl_app_mrp mrp (cost=0.00..106744.44 rows=2839643 width=127) (actual time=0.043..1896.213 rows=2799308 loops=1) Filter: (status = '1'::bpchar) -> Hash (cost=5.29..5.29 rows=182 width=18) (actual time=0.187..0.187 rows=184 loops=1) -> Seq Scan on tbl_app_resource res (cost=0.00..5.29 rows=182 width=18) (actual time=0.017..0.112 rows=184 loops=1) Filter: (status = '1'::bpchar) Total runtime: 3455.818 ms (8 rows)
|
备注:从PLAN执行情况来看,执行时间还快的,3 秒钟左右就跑完了,但为什么 1312 的查询都跑了两天了,还没跑完,猜测进程可能僵死,于是决定将它Kill。
杀会话
1 2 3 4 5
|
db_skytf=# select pg_cancel_backend(1312) pg_cancel_backend ------------------- t (1 row)
|
起初,使用 pg_cancel_backend
, 返回结果为 true, 但发现 1312 会话依然还在下面接着使用 pg_terminate_backend
看下效果。
1 2 3 4 5
|
db_skytf=# select pg_terminate_backend(1312) pg_terminate_backend ---------------------- t (1 row)
|
再次查询数据库活动的会话
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
|
db_skytf=# select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; procpid | usename | query_start | client_addr | client_port | current_query | waiting ---------+----------+-------------------------------+---------------+-------------+------------------------------------------------- --------------------------------------------------------------------------------------------+--------- 15583 | postgres | 2011-04-26 15:35:48.909562+08 | | -1 | select procpid, usename,query_start,client_addr, client_port ,current_query ,waiting from pg_stat_activity where current_query !='<IDLE>' ; | f 1312 | skytf | 2011-04-24 13:59:49.318178+08 | 192.168.x.xxx | 34621 | SELECT | f : mrp.*,res.name as displayname : FROM : skytf.tbl_app_mrp as mrp,tbl_app_resource as res : where mrp.appid=res.appid and : mrp.status='1' and res.status='1' : order by id desc
|
使用 pg_terminate_backend
后,虽然结果也返回 t,但会话 1312 依然没被Kill, 好顽强的会话啊,要是这个库是 Oracle, 俺早使用 kill -9 结果它了,但这是 PostgreSQL 使用 kill -9 后果很严重。。。
从 OS 层面分析
1 2 3
|
[postgres@skytf-db](mailto:postgres@skytf-db)-> ps -ef | grep 1312; postgres 1312 3949 0 Jan01 ? 00:41:55 postgres: skytf db_skytf 192.168.x.xxx(34621) SELECT postgres 15354 15105 0 15:16 pts/4 00:00:00 grep 1312
|
备注:怀疑 1312 进程可能已经僵死,于是让应用人员检查。
应用主机(192.168.x.xxx)查看进程
1 2
|
netstat -anp|grep 34621 tcp 3216507 0 ::ffff:192.168.x.xxx:34621 ::ffff:192.168.1.102:1921 ESTABLISHED 4187/java
|
备注:应用人员发现些进程无异常,后来建议将这进程杀掉,在通过应用人员将这个进程杀掉后, 1312 会话终于被干掉了。
总结
有些SQL在通过 pg_cancel_backend
和 pg_terminate_backen
也未能 Kill 的情况下,建议联系应用人员 kill 对应的应用进程。
原创文章,作者:745907710,如若转载,请注明出处:https://blog.ytso.com/tech/database/236393.html