This post is just a note for the record.

Our project has been producing core dumps.

First, load the core file into GDB.

[app@hostA bin]$ gdb PROGRAM core.31018

Next comes a stream of GDB output.

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7

Copyright (C) 2013 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-redhat-linux-gnu".

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>…

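The banner just says GDB is free software you may use and redistribute (with no warranty). Nothing to do with our problem.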

Reading symbols from /bin/PROGRAM…done.

[New LWP 31018]

[New LWP 31027]

[New LWP 31022]

[New LWP 31036]

[New LWP 31038]

[New LWP 31041]

[New LWP 31044]

[New LWP 31047]

[New LWP 31042]

[New LWP 31032]

[New LWP 31033]

[New LWP 31034]

[New LWP 31035]

[New LWP 31037]

[New LWP 31020]

[New LWP 31026]

[New LWP 31031]

[New LWP 31030]

[New LWP 31040]

[New LWP 31039]

[New LWP 31046]

[New LWP 31045]

[New LWP 31043]

[New LWP 31019]

[New LWP 31025]

[New LWP 31024]

[New LWP 31023]

[New LWP 31021]

[New LWP 31029]

[New LWP 31028]

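The lines above are LWP numbers, i.e., what we usually call thread IDs. On Linux, a thread is an LWP. Some people object that an LWP isn't a thread but a process: after all, it's a light-weight *process*, not a "thread". Fair enough, that is its name. But on Linux there is nothing else underneath. With glibc's NPTL, every POSIX thread is backed one-to-one by a kernel task, and the LWP number is that task's thread ID (the value gettid() returns). Here is what Wikipedia says: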

In computer operating systems, a light-weight process (LWP) is a means of achieving multitasking. In the traditional meaning of the term, as used in Unix System V and Solaris, a LWP runs in user space on top of a single kernel thread and shares its address space and system resources with other LWPs within the same process. Multiple user level threads, managed by a thread library, can be placed on top of one or many LWPs – allowing multitasking to be done at the user level, which can have some performance benefits.

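After reading all that, you may still not be sure what the distinction is. That's fine, don't get hung up on it. Repeat after me: why fuss over definitions, this thing is a thread!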
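To see the LWP number from inside a program, here is a minimal sketch, assuming Linux with glibc (the helper names current_tid and collect_worker_tids are made up for illustration). It reads the kernel thread ID via syscall(SYS_gettid) and compares it with getpid():

```cpp
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // getpid, syscall
#include <cassert>
#include <thread>
#include <vector>

// Returns the kernel thread ID (the "LWP" number GDB prints) of the calling thread.
// glibc only gained a gettid() wrapper in 2.30, so call the raw syscall instead.
pid_t current_tid() {
    return static_cast<pid_t>(syscall(SYS_gettid));
}

// Starts n threads and records each thread's TID in its own slot.
std::vector<pid_t> collect_worker_tids(int n) {
    std::vector<pid_t> tids(n);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&tids, i] { tids[i] = current_tid(); });
    }
    for (auto& t : threads) t.join();  // join before reading the results
    return tids;
}
```

For the main thread the TID equals the PID, while every other thread gets its own number. That is consistent with this core: the crashing thread is LWP 31018, the same number as in the core file name core.31018.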

[Thread debugging using libthread_db enabled]

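Using host libthread_db library "/lib64/libthread_db.so.1".

This says which library GDB uses for thread debugging: libthread_db, which lets a debugger inspect the threads of a glibc program.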

Core was generated by `PROGRAM -g 1 -i 3006 -u VM_16_46_centos -U /data/app/log/LOG -m 0 -A'.

Program terminated with signal 6, Aborted.

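This shows how the core was produced: the process was terminated by signal 6, SIGABRT ("Aborted"). How many signals are there in total?

Roughly the following. In the Value column, where three values are given, the first applies to Alpha and SPARC, the middle one to x86 (including x86_64), ARM and most other architectures, and the last one to MIPS; "-" means the signal does not exist on that architecture. This is the convention of the Linux signal(7) man page. This machine is x86_64, so read the middle value.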

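| Signal | Value | Action | Description | Standard |
|---|---|---|---|---|
| SIGHUP | 1 | A | Hangup on the controlling terminal, or the controlling process terminated | POSIX.1 |
| SIGINT | 2 | A | Interrupt from keyboard (e.g., Ctrl+C) | POSIX.1 |
| SIGQUIT | 3 | C | Quit key pressed on the keyboard | POSIX.1 |
| SIGILL | 4 | C | Illegal instruction | POSIX.1 |
| SIGABRT | 6 | C | Abort signal from abort(3) | POSIX.1 |
| SIGFPE | 8 | C | Floating-point exception | POSIX.1 |
| SIGKILL | 9 | AEF | Kill signal | POSIX.1 |
| SIGSEGV | 11 | C | Invalid memory reference | POSIX.1 |
| SIGPIPE | 13 | A | Broken pipe: write to a pipe with no readers | POSIX.1 |
| SIGALRM | 14 | A | Timer signal from alarm(2) | POSIX.1 |
| SIGTERM | 15 | A | Termination signal | POSIX.1 |
| SIGUSR1 | 30,10,16 | A | User-defined signal 1 | POSIX.1 |
| SIGUSR2 | 31,12,17 | A | User-defined signal 2 | POSIX.1 |
| SIGCHLD | 20,17,18 | B | Child process stopped or terminated | POSIX.1 |
| SIGCONT | 19,18,25 | | Continue a stopped process | POSIX.1 |
| SIGSTOP | 17,19,23 | DEF | Stop (suspend) the process | POSIX.1 |
| SIGTSTP | 18,20,24 | D | Stop key pressed on the controlling terminal (tty) | POSIX.1 |
| SIGTTIN | 21,21,26 | D | Background process tried to read from the controlling terminal | POSIX.1 |
| SIGTTOU | 22,22,27 | D | Background process tried to write to the controlling terminal | POSIX.1 |
| SIGBUS | 10,7,10 | C | Bus error (bad memory access) | SUSv2 |
| SIGPOLL | | A | Pollable event (System V); synonym for SIGIO | SUSv2 |
| SIGPROF | 27,27,29 | A | Profiling timer expired | SUSv2 |
| SIGSYS | 12,-,12 | C | Bad system call (SVID) | SUSv2 |
| SIGTRAP | 5 | C | Trace/breakpoint trap | SUSv2 |
| SIGURG | 16,23,21 | B | Urgent condition on a socket (4.2BSD) | SUSv2 |
| SIGVTALRM | 26,26,28 | A | Virtual alarm clock (4.2BSD) | SUSv2 |
| SIGXCPU | 24,24,30 | C | CPU time limit exceeded (4.2BSD) | SUSv2 |
| SIGXFSZ | 25,25,31 | C | File size limit exceeded (4.2BSD) | SUSv2 |
| SIGIOT | 6 | C | IOT trap; synonym for SIGABRT | |
| SIGEMT | 7,-,7 | | | |
| SIGSTKFLT | -,16,- | A | Stack fault on coprocessor | |
| SIGIO | 23,29,22 | A | I/O is now possible (4.2BSD) | |
| SIGCLD | -,-,18 | B | Synonym for SIGCHLD | |
| SIGPWR | 29,30,19 | A | Power failure (System V) | |
| SIGINFO | 29,-,- | A | Synonym for SIGPWR | |
| SIGLOST | -,-,- | A | File lock lost | |
| SIGWINCH | 28,28,20 | B | Window size changed (4.3BSD, Sun) | |
| SIGUNUSED | -,31,- | A | Unused signal (will be SIGSYS) | |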

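So what do the letters in the Action column mean?

A  Default action: terminate the process

B  Default action: ignore the signal

C  Default action: terminate the process and dump core

D  Default action: stop (suspend) the process

E  The signal cannot be caught

F  The signal cannot be ignored

SIGCONT has no letter: its default action is to continue the process if it is stopped. SIGABRT is marked C, which is why this crash left a core file behind.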
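These letters can be observed directly. Below is a minimal sketch, assuming Linux and C++ (the helper names are hypothetical): a forked child that calls abort() is killed by SIGABRT (action C), and trying to install a handler for SIGKILL is refused (flag E).

```cpp
#include <sys/resource.h>  // setrlimit, RLIMIT_CORE
#include <sys/wait.h>      // waitpid, WIFSIGNALED, WTERMSIG
#include <unistd.h>        // fork
#include <cassert>
#include <cerrno>
#include <csignal>         // sigaction, SIGABRT, SIGKILL
#include <cstdlib>         // std::abort

// Forks a child that calls abort() and returns the signal that killed it
// (expected: SIGABRT, which is 6 on Linux), or -1 on error / normal exit.
int signal_that_kills_abort_child() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: ask for no core file so the demo normally leaves nothing behind.
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);
        std::abort();  // SIGABRT, default action "C": terminate (+ core dump)
    }
    int status = 0;
    if (pid < 0 || waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

void noop_handler(int) {}

// Tries to catch SIGKILL; returns 0 on success or errno on failure.
// SIGKILL is flagged "E" (cannot be caught), so sigaction() fails with EINVAL.
int try_to_catch_sigkill() {
    struct sigaction sa = {};
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGKILL, &sa, nullptr) == 0 ? 0 : errno;
}
```

The same waitpid() check is how a shell or a supervisor knows that a child "terminated with signal 6", which is exactly what GDB read from this core.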

#0  0x00007fa1fef385f7 in raise () from /lib64/libc.so.6

Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-19.2.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcurl-7.29.0-25.el7.centos.x86_64 libgcc-4.8.5-4.el7.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-10.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 openldap-2.4.40-8.el7.x86_64 openssl-libs-1.0.1e-42.el7.9.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64

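This line says GDB couldn't find separate debug info for these system libraries, which is why the libc frames below show only function names, with no arguments or line numbers. Running the suggested debuginfo-install command would fill those in. Also note that GDB reads the core together with the binary and shared libraries on the current host, so opening this core on a machine with different library versions could give a wrong or incomplete backtrace (I haven't actually bothered to try it on another machine).

Print the backtrace (call stack).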

(gdb) bt

#0  0x00007fa1fef385f7 in raise () from /lib64/libc.so.6

#1  0x00007fa1fef39ce8 in abort () from /lib64/libc.so.6

#2  0x00007fa1fef78317 in __libc_message () from /lib64/libc.so.6

#3  0x00007fa1fef7e184 in malloc_printerr () from /lib64/libc.so.6

#4  0x00007fa1fef818e7 in _int_malloc () from /lib64/libc.so.6

#5  0x00007fa1fef828dc in malloc () from /lib64/libc.so.6

#6  0x000000000043a147 in CMemPool::frealloc (ud=0x0, ptr=0x0, osize=0, nsize=64, p=0x1a8a450) at MemPool.h:266

#7  0x0000000000434898 in luaM_realloc_ (L=0x1b344e0, block=0x0, osize=0, nsize=64) at lmem.cpp:79

#8  0x000000000043b481 in luaH_new (L=0x1b344e0, narray=0, nhash=0) at ltable.cpp:359

#9  0x000000000042cbf8 in lua_createtable (L=0x1b344e0, narray=0, nrec=0) at lapi.cpp:582

#10 0x00007fa1fecf0f76 in getMessage (l=0x1b344e0, pMessage=0x7fa1bc0008c0) at message.h:218

#11 0x00007fa1fecf3af6 in getResponse (l=0x1b344e0, res=0x1b0d6d0) at service.cpp:28

#12 0x00007fa1fecf3d3b in sendM (l=0x1b344e0) at service.cpp:59

#13 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b247b0, nresults=2) at ldo.cpp:319

#14 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=1) at lvm.cpp:590

#15 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24740, nResults=-1) at ldo.cpp:377

#16 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9db20) at lapi.cpp:801

#17 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb <f_call(lua_State*, void*)>, ud=0x7ffeb1c9db20) at ldo.cpp:116

#18 0x00000000004314a3 in luaD_pcall (L=0x1b344e0, func=0x42d3eb <f_call(lua_State*, void*)>, u=0x7ffeb1c9db20, old_top=64, ef=0) at ldo.cpp:464

#19 0x000000000042d4c9 in lua_pcall (L=0x1b344e0, nargs=0, nresults=-1, errfunc=0) at lapi.cpp:822

#20 0x000000000044f074 in luaB_pcall (L=0x1b344e0) at lbaselib.cpp:466

#21 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b24730, nresults=2) at ldo.cpp:319

#22 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=2) at lvm.cpp:590

#23 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24710, nResults=-1) at ldo.cpp:377

#24 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9e230) at lapi.cpp:801

#25 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb <f_call(lua_State*, void*)>, ud=0x7ffeb1c9e230) at ldo.cpp:116

#26 0x00000000004314a3 in luaD_pcall (L=0x1b344e0, func=0x42d3eb <f_call(lua_State*, void*)>, u=0x7ffeb1c9e230, old_top=16, ef=0) at ldo.cpp:464

#27 0x000000000042d4c9 in lua_pcall (L=0x1b344e0, nargs=0, nresults=-1, errfunc=0) at lapi.cpp:822

#28 0x0000000000426951 in process () at srv.cpp:120

#29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107

#30 0x00000000004bad36 in _svcdsp ()

#31 0x00000000004a3b4c in _runserver ()

#32 0x00000000004a2a22 in _main ()

#33 0x00000000004265f0 in main ()

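The backtrace shows where the core was dumped. Each frame is called by the one below it, so read it bottom-up: from main() at #33 down to raise() at #0. Frames #0 to #5 are all inside glibc: malloc() found its internal heap metadata inconsistent, called malloc_printerr() to report it, and that path called abort(), which raised SIGABRT; hence signal 6. These low-level frames only show where the problem surfaced. Heap corruption like this is usually caused earlier by some other code (for example a buffer overrun or a double free), so what matters most is how the upper layers passed and used their data.

Just for fun, let's also see where every thread is currently stopped.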
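Frames #6 to #9 show Lua asking the host program for memory: Lua 5.1 routes every allocation through a single callback of type lua_Alloc, void *(*)(void *ud, void *ptr, size_t osize, size_t nsize). The project's CMemPool::frealloc is not shown in this post, so the sketch below is NOT it; it is the simple allocator described in the Lua 5.1 manual, written in C++ with a made-up name, to show what a call with block=0x0, osize=0, nsize=64 means:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Same shape as Lua 5.1's lua_Alloc. Illustrative only, not the project's CMemPool.
void* simple_lua_alloc(void* ud, void* ptr, std::size_t osize, std::size_t nsize) {
    (void)ud;     // user data pointer (unused in this sketch)
    (void)osize;  // old block size; not needed when delegating to realloc/free
    if (nsize == 0) {  // Lua is releasing the block
        std::free(ptr);
        return nullptr;
    }
    // ptr == nullptr with osize == 0 is a fresh allocation, and
    // realloc(nullptr, n) behaves like malloc(n). That is the case in
    // frames #6/#7, where luaH_new allocates a new table object.
    return std::realloc(ptr, nsize);
}
```

A fresh 64-byte request like this is perfectly legal. malloc merely noticed, during that call, damage that something else had done to the heap earlier.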

(gdb) info threads

  Id   Target Id         Frame

  30   Thread 0x7fa1f5365700 (LWP 31028) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  29   Thread 0x7fa1f4b64700 (LWP 31029) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  28   Thread 0x7fa1f8b6c700 (LWP 31021) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  27   Thread 0x7fa1f7b6a700 (LWP 31023) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  26   Thread 0x7fa1f7369700 (LWP 31024) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  25   Thread 0x7fa1f6b68700 (LWP 31025) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  24   Thread 0x7fa1f9b6e700 (LWP 31019) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  23   Thread 0x7fa1edb56700 (LWP 31043) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  22   Thread 0x7fa1ecb54700 (LWP 31045) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  21   Thread 0x7fa1ec353700 (LWP 31046) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  20   Thread 0x7fa1efb5a700 (LWP 31039) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  19   Thread 0x7fa1ef359700 (LWP 31040) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  18   Thread 0x7fa1f4363700 (LWP 31030) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  17   Thread 0x7fa1f3b62700 (LWP 31031) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  16   Thread 0x7fa1f6367700 (LWP 31026) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  15   Thread 0x7fa1f936d700 (LWP 31020) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  14   Thread 0x7fa1f0b5c700 (LWP 31037) 0x00007fa1feff09b3 in select () from /lib64/libc.so.6

  13   Thread 0x7fa1f1b5e700 (LWP 31035) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  12   Thread 0x7fa1f235f700 (LWP 31034) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  11   Thread 0x7fa1f2b60700 (LWP 31033) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  10   Thread 0x7fa1f3361700 (LWP 31032) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  9    Thread 0x7fa1ee357700 (LWP 31042) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  8    Thread 0x7fa1ebb52700 (LWP 31047) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  7    Thread 0x7fa1ed355700 (LWP 31044) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  6    Thread 0x7fa1eeb58700 (LWP 31041) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  5    Thread 0x7fa1f035b700 (LWP 31038) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  4    Thread 0x7fa1f135d700 (LWP 31036) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  3    Thread 0x7fa1f836b700 (LWP 31022) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  2    Thread 0x7fa1f5b66700 (LWP 31027) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

* 1    Thread 0x7fa2009b0740 (LWP 31018) 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6

(gdb)

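Almost every thread is blocked in pthread_cond_wait or pthread_cond_timedwait (thread 14 is in select()), so they are most likely idle and waiting for work; nothing looks wrong there. The only one that stands out is thread 1, marked with *: LWP 31018, the main thread, which is the one that called raise().

Let's try printing a variable.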
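Why do idle threads look like this? Below is a minimal sketch of a worker thread waiting for jobs, assuming C++11 std::condition_variable (libstdc++ on Linux implements it on top of pthread condition variables, so a worker idling here appears in pthread_cond_wait in "info threads"). The JobQueue class is hypothetical and not taken from the project:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-worker job queue, for illustration only.
class JobQueue {
public:
    JobQueue() : worker_([this] { run(); }) {}
    ~JobQueue() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker finishes queued jobs before exiting
    }
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                // An idle worker sits here; GDB shows it in pthread_cond_wait.
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run the job outside the lock
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the members above exist
};
```

A wait with a timeout (std::condition_variable::wait_for) would show up as pthread_cond_timedwait instead.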

(gdb) p req

No symbol "req" in current context.

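Why isn't the symbol found? It's not that the symbol table is missing. GDB looks names up in the currently selected frame, which is frame #0 (raise() inside libc), while req is a parameter of frame #29.

So switch to that frame.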

(gdb) frame 29

#29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107

(gdb) p req

$1 = (SVCINFO *) 0x7ffeb1c9e340

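Now we can see the variable's type and value. Someone might ask: it's just an address, so how do we see what's inside? Dereference it:

(gdb) p *req

This should print the fields of the SVCINFO struct. Variable values and types come from the debug information (compiled in with -g) stored in the binary, so this does not require the source files. The source files are only needed to show source code, for example with "list", and they aren't loaded here.

GDB looks for source files along its source path, which by default is $cdir:$cwd: the compilation directory recorded in the debug info, then the current directory. It found nothing in either.

The compiler does record where the sources were, but that path doesn't exist on this host, so GDB can't show them. If the sources are somewhere else, you can point GDB at them (the paths here are placeholders):

(gdb) directory /path/to/src

(gdb) set substitute-path /old/build/path /path/to/src

If you're interested, write a small program yourself, keep its source next to the binary, and see the difference.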
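Here is a small program to experiment with, a sketch under stated assumptions: SvcInfo is a hypothetical stand-in for the post's SVCINFO, and handle_request is made up. It aborts with SIGABRT on a bad request, like the crash above, so the resulting core has a real call stack with a struct parameter:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the post's SVCINFO request struct.
struct SvcInfo {
    int id;
    const char* name;
};

// Pretends to handle a request; aborts (SIGABRT) when the request is invalid,
// so you get a core whose call stack contains this frame and its req argument.
int handle_request(SvcInfo* req) {
    if (req == nullptr || req->id < 0) {
        std::fprintf(stderr, "bad request, aborting\n");
        std::abort();
    }
    return req->id * 2;
}
```

To try it: add int main() { SvcInfo bad{-1, "bad"}; return handle_request(&bad); }, build with g++ -g -O0 demo.cpp -o demo, run ulimit -c unlimited in the shell, then run ./demo. Where the core file lands and what it is called depends on /proc/sys/kernel/core_pattern. Open it with gdb ./demo plus the core file, run bt, use frame N on the handle_request frame, then p *req and list. Move demo.cpp somewhere else and repeat: list can no longer find the source, but p *req still works, because the types come from the debug info in the binary.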