2011-09-22

Investigating a hung server process

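A Python crawler running on the server had been working fine. Early this morning the process suddenly started using 100% CPU and crawling stopped making any progress. Because the program is multithreaded, strace and ltrace only showed threads waiting on a lock. Then it occurred to me to inspect the threads with gdb. I don't know gdb at all, so I borrowed this from poor man's profiler: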

(echo "set pagination 0";
echo "thread apply all bt";
echo "quit"; cat /dev/zero ) | gdb -p [processid]

which gave:
Thread 12 (Thread 0x41d61940 (LWP 2741)):
#0 0x000000333120cb21 in sem_wait () from /lib64/libpthread.so.0
#1 0x00000000004c06ad in PyThread_acquire_lock (lock=0x48bb110, waitflag=1) at Python/thread_pthread.h:349
#2 0x000000000048c6fa in PyEval_AcquireThread (tstate=0x48e8a10) at Python/ceval.c:253
#3 0x00002b31ff3b9121 in util_write_callback ...
#4 0x00002b31ff5d9bf8 in Curl_client_write ...
#5 0x00002b31ff5f030f in Curl_httpchunk_read ...
#6 0x00002b31ff5ee150 in readwrite_data (conn=0x2aaaac052b70, done=0x41d606c7) at transfer.c:530
#7 Curl_readwrite (conn=0x2aaaac052b70, done=0x41d606c7) at transfer.c:1600
#8 0x00002b31ff5ef016 in Transfer (data=0x2aaaac053b10) at transfer.c:1855
#9 Curl_perform (data=0x2aaaac053b10) at transfer.c:2451
#10 0x00002b31ff3b9503 in do_curl_perform (self=0x2aaaac047120) at src/pycurl.c:1024
#11 0x0000000000492f12 in call_function (f=0x4a70930, throwflag=) at Python/ceval.c:3690
#12 PyEval_EvalFrameEx (f=0x4a70930, throwflag=) at Python/ceval.c:2389
#13 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x484ddc8, globals=, locals=, args=0x4a706a8, argcount=1, kws=0x4a706b0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#14 0x0000000000492d71 in call_function (f=0x4a70510, throwflag=) at Python/ceval.c:3802
#15 PyEval_EvalFrameEx (f=0x4a70510, throwflag=) at Python/ceval.c:2389
#16 0x00000000004941c5 in call_function (f=0x4a70340, throwflag=) at Python/ceval.c:3792
#17 PyEval_EvalFrameEx (f=0x4a70340, throwflag=) at Python/ceval.c:2389
#18 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x475e6c0, globals=, locals=, args=0x4a12ee8, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#19 0x00000000004ea69d in function_call (func=0x4799cf8, arg=0x4a12ed0, kw=0x0) at Objects/funcobject.c:524
#20 0x0000000000417bbd in PyObject_Call (func=0x4799cf8, arg=0x4a12ed0, kw=0x0) at Objects/abstract.c:2492
#21 0x000000000041efcf in instancemethod_call (func=, arg=0x4a12ed0, kw=0x0) at Objects/classobject.c:2579
#22 0x0000000000417bbd in PyObject_Call (func=0x4858b40, arg=0x2b31f944d050, kw=0x0) at Objects/abstract.c:2492
#23 0x000000000048c516 in PyEval_CallObjectWithKeywords (func=0x4858b40, arg=0x2b31f944d050, kw=0x0) at Python/ceval.c:3575
#24 0x00000000004c39ad in t_bootstrap (boot_raw=0x4a63920) at ./Modules/threadmodule.c:425
#25 0x0000003331206617 in start_thread () from /lib64/libpthread.so.0
#26 0x0000003330ad3c2d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x427f5940 (LWP 2742)):
#0 0x000000333120cb21 in sem_wait () from /lib64/libpthread.so.0
#1 0x00000000004c06ad in PyThread_acquire_lock (lock=0x48bb110, waitflag=1) at Python/thread_pthread.h:349
#2 0x000000000048c6fa in PyEval_AcquireThread (tstate=0x48e18e0) at Python/ceval.c:253
#3 0x00002b31ff3b9121 in util_write_callback ...
#4 0x00002b31ff5d9bf8 in Curl_client_write ...
#5 0x00002b31ff5f030f in Curl_httpchunk_read ...
#6 0x00002b31ff5ee150 in readwrite_data (conn=0x2aaaac001170, done=0x427f46c7) at transfer.c:530
#7 Curl_readwrite (conn=0x2aaaac001170, done=0x427f46c7) at transfer.c:1600
#8 0x00002b31ff5ef016 in Transfer (data=0x2aaaac086660) at transfer.c:1855
#9 Curl_perform (data=0x2aaaac086660) at transfer.c:2451
#10 0x00002b31ff3b9503 in do_curl_perform (self=0x2aaaac048430) at src/pycurl.c:1024
#11 0x0000000000492f12 in call_function (f=0x4a72760, throwflag=) at Python/ceval.c:3690
#12 PyEval_EvalFrameEx (f=0x4a72760, throwflag=) at Python/ceval.c:2389
#13 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x484ddc8, globals=, locals=, args=0x4a71cb8, argcount=1, kws=0x4a71cc0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#14 0x0000000000492d71 in call_function (f=0x4a71b20, throwflag=) at Python/ceval.c:3802
#15 PyEval_EvalFrameEx (f=0x4a71b20, throwflag=) at Python/ceval.c:2389
#16 0x00000000004941c5 in call_function (f=0x4a71950, throwflag=) at Python/ceval.c:3792
#17 PyEval_EvalFrameEx (f=0x4a71950, throwflag=) at Python/ceval.c:2389
#18 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x475e6c0, globals=, locals=, args=0x4a2aea8, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#19 0x00000000004ea69d in function_call (func=0x4799cf8, arg=0x4a2ae90, kw=0x0) at Objects/funcobject.c:524
#20 0x0000000000417bbd in PyObject_Call (func=0x4799cf8, arg=0x4a2ae90, kw=0x0) at Objects/abstract.c:2492
#21 0x000000000041efcf in instancemethod_call (func=, arg=0x4a2ae90, kw=0x0) at Objects/classobject.c:2579
#22 0x0000000000417bbd in PyObject_Call (func=0x4858d20, arg=0x2b31f944d050, kw=0x0) at Objects/abstract.c:2492
#23 0x000000000048c516 in PyEval_CallObjectWithKeywords (func=0x4858d20, arg=0x2b31f944d050, kw=0x0) at Python/ceval.c:3575
#24 0x00000000004c39ad in t_bootstrap (boot_raw=0x49a2ba0) at ./Modules/threadmodule.c:425
#25 0x0000003331206617 in start_thread () from /lib64/libpthread.so.0
#26 0x0000003330ad3c2d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x459fa940 (LWP 2747)):
#0 sre_match (state=0x459f8880, pattern=0x0) at ./Modules/_sre.c:1137
#1 0x00000000004d2a13 in sre_search (state=0x459f8880, pattern=0x4c8f4ba) at ./Modules/_sre.c:1609
#2 0x00000000004d3fc3 in pattern_findall (self=0x4c8f350, args=, kw=) at ./Modules/_sre.c:2072
#3 0x0000000000493261 in call_function (f=0x4868950, throwflag=) at Python/ceval.c:3706
#4 PyEval_EvalFrameEx (f=0x4868950, throwflag=) at Python/ceval.c:2389
#5 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x490a198, globals=, locals=, args=0x4d8a410, argcount=2, kws=0x4d8a420, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#6 0x0000000000492d71 in call_function (f=0x4d8a280, throwflag=) at Python/ceval.c:3802
#7 PyEval_EvalFrameEx (f=0x4d8a280, throwflag=) at Python/ceval.c:2389
#8 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x4aff738, globals=, locals=, args=0x49fe318, argcount=1, kws=0x49fe320, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#9 0x0000000000492d71 in call_function (f=0x49fe170, throwflag=) at Python/ceval.c:3802
#10 PyEval_EvalFrameEx (f=0x49fe170, throwflag=) at Python/ceval.c:2389
#11 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x4852dc8, globals=, locals=, args=0x4a78380, argcount=3, kws=0x4a78398, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#12 0x0000000000492d71 in call_function (f=0x4a781a0, throwflag=) at Python/ceval.c:3802
#13 PyEval_EvalFrameEx (f=0x4a781a0, throwflag=) at Python/ceval.c:2389
#14 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x484ddc8, globals=, locals=, args=0x4a77818, argcount=1, kws=0x4a77820, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#15 0x0000000000492d71 in call_function (f=0x4a77680, throwflag=) at Python/ceval.c:3802
#16 PyEval_EvalFrameEx (f=0x4a77680, throwflag=) at Python/ceval.c:2389
#17 0x00000000004941c5 in call_function (f=0x4a774b0, throwflag=) at Python/ceval.c:3792
#18 PyEval_EvalFrameEx (f=0x4a774b0, throwflag=) at Python/ceval.c:2389
#19 0x0000000000494b7d in PyEval_EvalCodeEx (co=0x475e6c0, globals=, locals=, args=0x4a318e8, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#20 0x00000000004ea69d in function_call (func=0x4799cf8, arg=0x4a318d0, kw=0x0) at Objects/funcobject.c:524
#21 0x0000000000417bbd in PyObject_Call (func=0x4799cf8, arg=0x4a318d0, kw=0x0) at Objects/abstract.c:2492
#22 0x000000000041efcf in instancemethod_call (func=, arg=0x4a318d0, kw=0x0) at Objects/classobject.c:2579
#23 0x0000000000417bbd in PyObject_Call (func=0x48fbe10, arg=0x2b31f944d050, kw=0x0) at Objects/abstract.c:2492
#24 0x000000000048c516 in PyEval_CallObjectWithKeywords (func=0x48fbe10, arg=0x2b31f944d050, kw=0x0) at Python/ceval.c:3575
#25 0x00000000004c39ad in t_bootstrap (boot_raw=0x49efb60) at ./Modules/threadmodule.c:425
#26 0x0000003331206617 in start_thread () from /lib64/libpthread.so.0
#27 0x0000003330ad3c2d in clone () from /lib64/libc.so.6

...

Inferior 1 [process 2724] will be detached.

Quit anyway? (y or n) [answered Y; input not from terminal]
Detaching from program: /usr/local/python/bin/python2.6, process 2724
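
As an aside, the same dump can probably be collected without the `cat /dev/zero` trick by running gdb in batch mode. The following is only a sketch: the helper names `gdb_backtrace_command` and `dump_thread_backtraces` are made up for illustration, and it assumes a gdb that supports the `-batch` and `-ex` options and a user allowed to attach to the target process.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that prints every thread's backtrace and exits."""
    return [
        "gdb", "-p", str(pid),
        "-batch",                      # run the -ex commands, then exit (detaches)
        "-ex", "set pagination 0",     # never stop at a "--More--" prompt
        "-ex", "thread apply all bt",  # backtrace of every thread
    ]


def dump_thread_backtraces(pid):
    """Run gdb against a live process and return its combined output as text."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Calling `dump_thread_backtraces(2724)` would then return text like the output above; the target process is paused while gdb is attached.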

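Most of the threads are blocked in sem_wait, waiting on a semaphore; only one thread is inside sre_search, and that one is the culprit. The cause was a regular expression of the form "a.*?b.*?c.*?d". After today's redesign of the page it could no longer match, so the regex engine kept backtracking through every possible combination and the CPU stayed at 100%. The blocked threads are all in PyEval_AcquireThread, that is, waiting for the GIL, which the regex thread holds for the whole duration of the match. That is how this single thread hung the entire Python process.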
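
One way to avoid this kind of backtracking is sketched below, under strong assumptions: the real pattern is only described as "something like" `a.*?b.*?c.*?d`, so the literal markers, the `re.DOTALL` flag, and the helper `find_in_order` are all hypothetical. When the pieces between the `.*?` gaps are plain strings, they can be located one after another with `str.find`, which scans the text at most once and never retries. Note that it returns only the first hit, not every hit as `findall` does.

```python
import re

# Hypothetical stand-in for the real pattern from the crawler.
PATTERN = re.compile(r"a.*?b.*?c.*?d", re.DOTALL)


def find_in_order(text, markers=("a", "b", "c", "d")):
    """Return the text from the first marker through the last, or None.

    Each marker is searched for starting right after the previous one,
    so a missing marker is detected in a single pass with no backtracking.
    """
    start = text.find(markers[0])
    if start == -1:
        return None
    pos = start + len(markers[0])
    for marker in markers[1:]:
        idx = text.find(marker, pos)
        if idx == -1:
            return None
        pos = idx + len(marker)
    return text[start:pos]


page_ok = "xx a 1 b 2 c 3 d yy"
# After the "redesign": plenty of a, b and c, but the final d is gone.
# Kept small on purpose; the regex's work grows steeply with this length.
page_changed = "abc" * 30

first_ok = find_in_order(page_ok)
first_changed = find_in_order(page_changed)
```

On `page_ok` both approaches find the same span; on `page_changed` the regex has to try every start position and every b/c combination before giving up, while `find_in_order` gives up as soon as the search for `d` fails.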
