问题描述
【管控版本】Data Warehouse Service 821
Data Warehouse Service 821 DN节点有SQL残留导致truncate table卡住(GaussDB) 821 DN节点有SQL残留导致truncate table卡住
使用的ETL工具需要经常truncate table 后再写入数据
告警信息
无
处理过程
登录postgres数据库,查询锁冲突视图PGXC_LOCK_CONFLICTS。无锁冲突
select * from PGXC_LOCK_CONFLICTS;
https://doc.hcs.huawei.com/zh-cn/devg/dws/dws_04_0809.html
SELECT locktype, nodename, dbname, nspname, relname, partname, page, tuple, transactionid, username, gxid, xactstart, queryid, substr(query,1,150), pid, mode, granted FROM pgxc_lock_conflicts ORDER BY nodename,dbname,locktype,nspname,relname,partname limit 100;
查询pgxc_deadlock视图未发现死锁
select *from pgxc_deadlock;
查询执行truncate的query id
select coorname,usename,client_addr,sysdate-query_start as dur,query_id,substr(query,1,100) from pgxc_stat_activity where pid <>pg_backend_pid() and query ilike ‘%table_name%’;
根据query id查询等待事件,确定正在等待dn_6008_6010的DN实例
select * from pgxc_thread_wait_status where query_id ='<上面查询的query id>’ and wait_status !=’synchronize quit’;
根据query id查询等待事件
select * from pg_thread_wait_status where query_id =’xxxx’;
gstack 打印堆栈,没安装gdb,无法继续分析
gstack <上面查询的lwtid>
查询执行truncate命令的会话信息
select * from pgxc_stat_activity where state=’active’ and query ilike ‘truncate%’;
尝试kill会话:
execute direct on (<上面查到的cooranme>) ‘select pg_terminate_backend(<上面查到的PID>)’;
执行truncate table重试 还是卡住:
再次查询,怀疑dn_6009_6010上有残留:
查询执行truncate的query id
select coorname,usename,client_addr,sysdate-query_start as dur,query_id,substr(query,1,100) from pgxc_stat_activity where pid <>pg_backend_pid() and query ilike ‘%table_name%’;
根据query id查询等待事件,确定dn_6009正在等待锁
select * from pgxc_thread_wait_status where query_id ='<上面查询的query id>’ and wait_status !=’synchronize quit’;
应急处理,查询6009的DN进程ID
确定进程号为401345后kill DN进程,自动拉起
执行cm_ctl query -Cvd 查看集群状态正常,DN_6009已经拉起
再次执行truncate table命令成功
根因
DN_6009正在等待锁,有SQL残留,未安装gdb软件包,无gstack命令无法分析根因
解决方案
应急处理:kill DN_6009进程等待拉起后重试truncate table
建议与总结
需要先确定好产品是DWS(GaussDB) 还是GaussDB的问题
原创文章,作者:奋斗,如若转载,请注明出处:https://blog.ytso.com/tech/bigdata/316312.html