Bug Hunting in Python
Having given the blog a good four years to really settle in and get comfortable, I thought it was about time I wrote a post. I don’t want to strain myself, so it’ll be short and incomplete (don’t worry, I’ll pad it with lots of debugging output so you can give that page key a workout).
There are a myriad ways to debug Python applications. My first stop is typically pdb. It’s simple and ships with Python.
When the Python process in question makes pdb difficult to use like when it’s
being run by systemd rpdb makes it very easy
to remotely debug the program. rpdb even supports SIGTRAP which is very helpful
in situations where the program is hanging and you want to quickly find out
where: just install the signal handler with import rpdb; rpdb.handle_trap()
,
run the process until it seems to get stuck, send the process SIGTRAP (consult
man 7 signal
for the signal number on your platform, it’s likely 5) and
connect to the debugger with nc 127.0.0.1 4444
.
Occasionally, though, (r)pdb isn’t enough. This could be because the bug involves a C extension or Python itself. For example, during Fedora’s rebase to Python 3.9b1, a package failed to rebuild. Adam Williamson diagnosed the build failure and isolated the package test that resulted in the process hanging forever. Unfortunately, rpdb was insufficient in this case. Upon sending the process a TRAP signal, we’re not rewarded with a remote pdb session:
dev in packaging/python-stompest/stompest/src/twisted on master [?] via 🐍 v3.9.0b1 (stompest)
❯ python -m pytest -vvvvv -s --ignore stompest/twisted/tests/async_client_integration_test.py
platform linux -- Python 3.9.0b1, pytest-5.4.3, py-1.8.1, pluggy-0.13.1 -- /home/jcline/.virtualenvs/stompest/bin/python3
cachedir: .pytest_cache
rootdir: /home/jcline/packaging/python-stompest/stompest/src/twisted
collected 14 items
stompest/twisted/tests/async_client_test.py::AsyncClientConnectTimeoutTestCase::test_connected_timeout <- ../../../stompest-715f358b8503320ea42bd4de6682916339943edc/src/twisted/stompest/twisted/tests/async_client_test.py INFO:twisted:--> stompest.twisted.tests.async_client_test.AsyncClientConnectTimeoutTestCase.test_connected_timeout <--
INFO:twisted:Factory starting on 44893
INFO:twisted:Starting factory <twisted.internet.protocol.Factory object at 0x7f037f652130>
INFO:stompest.twisted.protocol:Connecting to localhost:44893 ...
INFO:twisted:Starting factory <stompest.twisted.protocol.StompFactory object at 0x7f037f669be0>
DEBUG:stompest.twisted.tests.broker_simulator:Connection made
DEBUG:stompest.twisted.protocol:Sending CONNECT frame [version=1.0]
DEBUG:stompest.twisted.tests.broker_simulator:Received CONNECT frame [version=1.0]
ERROR:stompest.twisted.listener:Disconnect because of failure: STOMP broker did not answer on time [timeout=1.0]
INFO:stompest.twisted.listener:Disconnecting ... [reason=STOMP broker did not answer on time [timeout=1.0]]
INFO:stompest.twisted.listener:Disconnected: Connection was closed cleanly.
DEBUG:stompest.twisted.listener:Calling disconnected errback: STOMP broker did not answer on time [timeout=1.0]
Trace/breakpoint trap (core dumped)
Not great. Never fear, though, for gdb is here to save us. The first thing we can do is take a look at that core dump:
# In this fantasy there's only one core dump
dev in packaging/python-stompest/stompest/src/twisted on master [?] via 🐍 v3.9.0b1 (stompest)
❯ coredumpctl list
TIME PID UID GID SIG COREFILE EXE
Tue 2020-06-09 09:55:44 EDT 37144 1000 1000 5 present /usr/bin/python3.9
dev in packaging/python-stompest/stompest/src/twisted on master [?] via 🐍 v3.9.0b1 (stompest)
❯ coredumpctl debug python3
PID: 37144 (python3)
UID: 1000 (jcline)
GID: 1000 (jcline)
Signal: 5 (TRAP)
Timestamp: Tue 2020-06-09 09:55:43 EDT (1h 0min ago)
Command Line: python3 -m pytest -vvvvv -s --ignore stompest/twisted/tests/async_client_integration_test.py
Executable: /usr/bin/python3.9
Storage: /var/lib/systemd/coredump/core.python3.1000.a5660049602e418c9ed5d7737a890e5c.37144.1591710943000000000000.lz4
Message: Process 37144 (python3) of user 1000 dumped core.
Stack trace of thread 37144:
#0 0x00007f038e37157e PyException_GetContext (libpython3.9.so.1.0 + 0x18d57e)
#1 0x00007f038e2ffa38 _PyErr_SetObject (libpython3.9.so.1.0 + 0x11ba38)
#2 0x00007f038e30895d gen_send_ex (libpython3.9.so.1.0 + 0x12495d)
#3 0x00007f038e3c4a81 _gen_throw (libpython3.9.so.1.0 + 0x1e0a81)
#4 0x00007f038e3c497a gen_throw (libpython3.9.so.1.0 + 0x1e097a)
#5 0x00007f038e30a180 method_vectorcall_VARARGS (libpython3.9.so.1.0 + 0x126180)
#6 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#7 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#8 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#9 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#10 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#11 0x00007f038e2ebf59 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107f59)
#12 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#13 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#14 0x00007f038e2eccc3 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108cc3)
#15 0x00007f038e2ebf59 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107f59)
#16 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#17 0x00007f038e2ef9fa _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10b9fa)
#18 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#19 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#20 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#21 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#22 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#23 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#24 0x00007f038e2eb9a1 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x1079a1)
#25 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#26 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#27 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#28 0x00007f038e2ebf59 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107f59)
#29 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#30 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#31 0x00007f038e2eccc3 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108cc3)
#32 0x00007f038e2ebf59 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107f59)
#33 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#34 0x00007f038e2ef9fa _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10b9fa)
#35 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#36 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#37 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#38 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#39 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#40 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#41 0x00007f038e2eb9a1 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x1079a1)
#42 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#43 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#44 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#45 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#46 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#47 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#48 0x00007f038e2ebf59 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107f59)
#49 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#50 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#51 0x00007f038e2eccc3 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108cc3)
#52 0x00007f038e3088da gen_send_ex (libpython3.9.so.1.0 + 0x1248da)
#53 0x00007f038e30157e method_vectorcall_O (libpython3.9.so.1.0 + 0x11d57e)
#54 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#55 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#56 0x00007f038e2ebf59 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107f59)
#57 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#58 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#59 0x00007f038e2eccc3 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108cc3)
#60 0x00007f038e2ebf59 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107f59)
#61 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#62 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#63 0x00007f038e2eccc3 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108cc3)
Stack trace of thread 37145:
#0 0x00007f038e1d4a24 futex_abstimed_wait_cancelable (libpthread.so.0 + 0x12a24)
#1 0x00007f038e1d4b28 __new_sem_wait_slow (libpthread.so.0 + 0x12b28)
#2 0x00007f038e35da5e PyThread_acquire_lock_timed (libpython3.9.so.1.0 + 0x179a5e)
#3 0x00007f038e36c1c1 acquire_timed (libpython3.9.so.1.0 + 0x1881c1)
#4 0x00007f038e36bfcb lock_PyThread_acquire_lock (libpython3.9.so.1.0 + 0x187fcb)
#5 0x00007f038e2fa892 method_vectorcall_VARARGS_KEYWORDS (libpython3.9.so.1.0 + 0x116892)
#6 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#7 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#8 0x00007f038e2eb9a1 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x1079a1)
#9 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#10 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#11 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#12 0x00007f038e2eb9a1 _PyEval_EvalCode (libpython3.9.so.1.0 + 0x1079a1)
#13 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#14 0x00007f038e302380 method_vectorcall (libpython3.9.so.1.0 + 0x11e380)
#15 0x00007f038e356d0c _PyObject_CallNoArg.lto_priv.17 (libpython3.9.so.1.0 + 0x172d0c)
#16 0x00007f038e3cf9e9 calliter_iternext (libpython3.9.so.1.0 + 0x1eb9e9)
#17 0x00007f038e2ed165 _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x109165)
#18 0x00007f038e2ebd6f _PyEval_EvalCode (libpython3.9.so.1.0 + 0x107d6f)
#19 0x00007f038e2f97b6 _PyFunction_Vectorcall (libpython3.9.so.1.0 + 0x1157b6)
#20 0x00007f038e2ef9fa _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x10b9fa)
#21 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#22 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#23 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#24 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#25 0x00007f038e2f44aa call_function (libpython3.9.so.1.0 + 0x1104aa)
#26 0x00007f038e2ece9e _PyEval_EvalFrameDefault (libpython3.9.so.1.0 + 0x108e9e)
#27 0x00007f038e2f9a6b function_code_fastcall (libpython3.9.so.1.0 + 0x115a6b)
#28 0x00007f038e302380 method_vectorcall (libpython3.9.so.1.0 + 0x11e380)
#29 0x00007f038e3b1346 t_bootstrap (libpython3.9.so.1.0 + 0x1cd346)
#30 0x00007f038e3b12f4 pythread_wrapper (libpython3.9.so.1.0 + 0x1cd2f4)
#31 0x00007f038e1cb432 start_thread (libpthread.so.0 + 0x9432)
#32 0x00007f038e63a9d3 __clone (libc.so.6 + 0x1019d3)
Hmm. If we wanted to work hard, we could perhaps do some more investigation on
the core dump, but let’s play with another one of gdb’s features. gdb is a
general-purpose debugger that can, among other things, attach to running
processes. Running gdb --pid=<pytest pid>
will attach us to the running
process where we can poke around. gdb’s default interface is rather…
minimal, but can be customized with projects like
gdb-dashboard and that’s what I’m
using in the output below.
If you’re running on Fedora, you’ll likely be presented with a warning about missing debuginfo when you attach to the process and should install it using the provided command before continuing. As you can see here, I’ve already installed the debuginfo and debugsource packages for Python 3.9.
─── Output/messages ────────────────────────────────────────────────────────────
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
PyException_GetContext (self=0x7fd61fe53580) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/exceptions.c:350
350 PyObject *context = _PyBaseExceptionObject_cast(self)->context;
─── Assembly ───────────────────────────────────────────────────────────────────
~
~
~
~
~
0x00007fd62eb87570 PyException_GetContext+0 mov 0x28(%rdi),%rax
0x00007fd62eb87574 PyException_GetContext+4 test %rax,%rax
0x00007fd62eb87577 PyException_GetContext+7 jne 0x7fd62eb8757a <PyException_GetContext+10>
0x00007fd62eb87579 PyException_GetContext+9 retq
0x00007fd62eb8757a PyException_GetContext+10 addq $0x1,(%rax)
─── Breakpoints ────────────────────────────────────────────────────────────────
─── Expressions ────────────────────────────────────────────────────────────────
─── History ────────────────────────────────────────────────────────────────────
─── Memory ─────────────────────────────────────────────────────────────────────
─── Registers ──────────────────────────────────────────────────────────────────
rax 0x00007fd61fe53580 rbx 0x00007fd61fe53580 rcx 0x0000000000000000
rdx 0x00007fd61fe53a00 rsi 0x00007fd62ed18a20 rdi 0x00007fd61fe53580
rbp 0x00007fd62ed18a20 rsp 0x00007ffdc884cd58 r8 0x0000000000000004
r9 0x00005609a47be5f0 r10 0x00007fd61fe53580 r11 0x0000000000000050
r12 0x00007fd61fe53a00 r13 0x00007fd61fe53580 r14 0x00005609a47bd420
r15 0x00007fd62ed25540 rip 0x00007fd62eb87570 eflags [ IF ]
cs 0x00000033 ss 0x0000002b ds 0x00000000
es 0x00000000 fs 0x00000000 gs 0x00000000
─── Source ─────────────────────────────────────────────────────────────────────
345 }
346
347 PyObject *
348 PyException_GetContext(PyObject *self)
349 {
350> PyObject *context = _PyBaseExceptionObject_cast(self)->context;
351 Py_XINCREF(context);
352 return context;
353 }
354
─── Stack ──────────────────────────────────────────────────────────────────────
[0] from 0x00007fd62eb87570 in PyException_GetContext+0 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/exceptions.c:350
[1] from 0x00007fd62eb15a38 in _PyErr_SetObject+456 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:145
[2] from 0x00007fd62eb1e95d in _PyErr_SetNone+17 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:199
[3] from 0x00007fd62eb1e95d in PyErr_SetNone+17 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:199
[4] from 0x00007fd62eb1e95d in gen_send_ex+509 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:241
[5] from 0x00007fd62ebdaa81 in _gen_throw+225 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:513
[6] from 0x00007fd62ebda97a in gen_throw+122 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:535
[7] from 0x00007fd62eb20180 in method_vectorcall_VARARGS+176 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/descrobject.c:313
[8] from 0x00007fd62eb0a4aa in _PyObject_VectorcallTstate+46 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Include/cpython/abstract.h:118
[9] from 0x00007fd62eb0a4aa in PyObject_Vectorcall+80 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Include/cpython/abstract.h:127
[+]
─── Threads ────────────────────────────────────────────────────────────────────
[2] id 37996 name python3 from 0x00007fd62e9eaa24 in futex_abstimed_wait_cancelable+42 at ../sysdeps/nptl/futex-internal.h:320
[1] id 37995 name python3 from 0x00007fd62eb87570 in PyException_GetContext+0 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/exceptions.c:350
─── Variables ──────────────────────────────────────────────────────────────────
arg self = 0x7fd61fe53580: {ob_refcnt = 19,ob_type = 0x5609a4c818d0}
loc context = <optimized out>
────────────────────────────────────────────────────────────────────────────────
Missing separate debuginfos, use: dnf debuginfo-install expat-2.2.8-2.fc32.x86_64
>>>
At a glance, we see two operating system threads are running. Let’s start by looking at what thread 2 is up to:
>>> thread 2
[Switching to thread 2 (Thread 0x7fd61fe19700 (LWP 37996))]
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7fd618001450) at ../sysdeps/nptl/futex-internal.h:320
320 int err = lll_futex_clock_wait_bitset (futex_word, expected,
>>> bt
0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7fd618001450) at ../sysdeps/nptl/futex-internal.h:320
#1 do_futex_wait (sem=sem@entry=0x7fd618001450, abstime=0x0, clockid=0) at sem_waitcommon.c:112
#2 0x00007fd62e9eab28 in __new_sem_wait_slow (sem=sem@entry=0x7fd618001450, abstime=0x0, clockid=0) at sem_waitcommon.c:184
#3 0x00007fd62e9eaba1 in __new_sem_wait (sem=sem@entry=0x7fd618001450) at sem_wait.c:42
#4 0x00007fd62eb73a5e in PyThread_acquire_lock_timed (lock=0x7fd618001450, microseconds=<optimized out>, intr_flag=1) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/thread_pthread.h:463
#5 0x00007fd62eb821c1 in acquire_timed (lock=0x7fd618001450, timeout=<optimized out>) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Modules/_threadmodule.c:63
...
This looks a bit intimidating if you’re not familiar with C. Fortunately, gdb knows about Python:
>>> py-bt
Traceback (most recent call first):
File "/usr/lib64/python3.9/threading.py", line 312, in wait
waiter.acquire()
File "/usr/lib64/python3.9/queue.py", line 171, in get
self.not_empty.wait()
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/_threads/_threadworker.py", line 45, in work
for task in iter(queue.get, _stop):
File "/usr/lib64/python3.9/threading.py", line 888, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.9/threading.py", line 950, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.9/threading.py", line 908, in _bootstrap
self._bootstrap_inner()
One thing to note here is that the call stack is “upside down” when compared to Python tracebacks you’re likely used to, with the most recent call at the top of the stack. Perhaps unsurprisingly given the C stack, we see this thread is waiting for something. This doesn’t seem particularly unusual, so lets see what the other thread is up to:
>>> thread 1
[Switching to thread 1 (Thread 0x7fd62e881740 (LWP 37995))]
#0 PyException_GetContext (self=<StompCancelledError at remote 0x7fd61fe53580>) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/exceptions.c:350
350 PyObject *context = _PyBaseExceptionObject_cast(self)->context;
>>> py-bt
Traceback (most recent call first):
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1672, in _inlineCallbacks
return succeed(False)
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1475, in gotResult
_inlineCallbacks(r, g, status)
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 910, in _runCallbacks
if iscoroutine(coro) or isinstance(coro, GeneratorType):
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
self._runCallbacks()
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 501, in errback
self._startRunCallbacks(fail)
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 2488, in _inlineCallbacks
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1475, in gotResult
_inlineCallbacks(r, g, status)
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 910, in _runCallbacks
if iscoroutine(coro) or isinstance(coro, GeneratorType):
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
self._runCallbacks()
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 501, in errback
self._startRunCallbacks(fail)
File "/home/jcline/packaging/python-stompest/stompest/src/twisted/stompest/twisted/listener.py", line 111, in onConnectionLost
connection.disconnected.errback(self._disconnectReason)
File "/home/jcline/packaging/python-stompest/stompest/src/twisted/stompest/twisted/client.py", line 347, in <lambda>
yield self._notify(lambda l: l.onConnectionLost(self, reason))
File "/home/jcline/packaging/python-stompest/stompest/src/twisted/stompest/twisted/client.py", line 337, in _notify
yield notify(listener)
File "/home/jcline/.virtualenvs/stompest/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1674, in _inlineCallbacks
...
This call stack is very deep due to pytest, pluggy, trail, and twisted so I have truncated it. The most interesting call is the top one in any case. It appears we’re in the process of throwing an exception, which as a general rule shouldn’t hang so this thread looks a lot more suspicious than the other thread. However, it’s not clear why it’s hanging, so it’s time to roll up our sleeves at what CPython is actually up to:
>>> bt
#0 PyException_GetContext (self=<StompCancelledError at remote 0x7fd61fe53580>) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/exceptions.c:350
#1 0x00007fd62eb15a38 in _PyErr_SetObject (tstate=0x5609a47bd420, exception=<type at remote 0x7fd62ed18a20>, value=StopIteration()) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:145
#2 0x00007fd62eb1e95d in _PyErr_SetNone (exception=<optimized out>, tstate=<optimized out>) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:199
#3 PyErr_SetNone (exception=<optimized out>) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:199
#4 gen_send_ex (gen=0x7fd61fe82c10, arg=<optimized out>, exc=<optimized out>, closing=<optimized out>) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:241
#5 0x00007fd62ebdaa81 in _gen_throw (gen=0x7fd61fe82c10, close_on_genexit=<optimized out>, typ=<optimized out>, val=<optimized out>, tb=<optimized out>) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:513
#6 0x00007fd62ebda97a in gen_throw (gen=0x7fd61fe82c10, args=args@entry=(<type at remote 0x5609a4c818d0>, <StompCancelledError at remote 0x7fd61fe53580>, <traceback at remote 0x7fd61f5f18c0>)) at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:535
Okay, so it looks like we’re in the exception portion of CPython (not shocking
given our look at the Python stack) in a function getting some “context”. It
doesn’t appear that anything is intentionally waiting on some event, so lets
step through the program a bit to see what we’re busy doing. It turns out
PyException_GetContext
doesn’t do much and returns to the caller just fine:
>>> n
>>> n
>>> n
─── Output/messages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
147 if (context == value) {
─── Assembly ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
0x00007fd62eb15a7e _PyErr_SetObject+526 mov %rax,%r15
0x00007fd62eb15a81 _PyErr_SetObject+529 jmp 0x7fd62eb15a0d <_PyErr_SetObject+413>
0x00007fd62eb15a83 _PyErr_SetObject+531 mov %rax,%rdi
0x00007fd62eb15a86 _PyErr_SetObject+534 mov %rax,0x8(%rsp)
0x00007fd62eb15a8b _PyErr_SetObject+539 callq 0x7fd62eaf6e20 <_Py_DECREF>
0x00007fd62eb15a90 _PyErr_SetObject+544 mov 0x8(%rsp),%r10
0x00007fd62eb15a95 _PyErr_SetObject+549 cmp %r10,%r12
0x00007fd62eb15a98 _PyErr_SetObject+552 jne 0x7fd62eb15a2d <_PyErr_SetObject+445>
0x00007fd62eb15a9a _PyErr_SetObject+554 jmpq 0x7fd62ea65487 <_PyErr_SetObject-721897>
0x00007fd62eb15a9f _PyErr_SetObject+559 xor %edx,%edx
─── Breakpoints ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── Expressions ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── History ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── Memory ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── Registers ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
rax 0x00007fd61fe53580 rbx 0x00007fd61fe53580 rcx 0x0000000000000000 rdx 0x00007fd61fe53a00 rsi 0x00007fd62ed18a20 rdi 0x00007fd61fe53580 rbp 0x00007fd62ed18a20
rsp 0x00007ffdc884cd60 r8 0x0000000000000004 r9 0x00005609a47be5f0 r10 0x00007fd61fe53580 r11 0x0000000000000050 r12 0x00007fd61fe53a00 r13 0x00007fd61fe53580
r14 0x00005609a47bd420 r15 0x00007fd62ed25540 rip 0x00007fd62eb15a90 eflags [ IF ] cs 0x00000033 ss 0x0000002b ds 0x00000000
es 0x00000000 fs 0x00000000 gs 0x00000000
─── Source ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
142 to inline the call to PyException_GetContext. */
143 if (exc_value != value) {
144 PyObject *o = exc_value, *context;
145 while ((context = PyException_GetContext(o))) {
146 Py_DECREF(context);
147> if (context == value) {
148 PyException_SetContext(o, NULL);
149 break;
150 }
151 o = context;
152 }
─── Stack ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[0] from 0x00007fd62eb15a90 in _PyErr_SetObject+544 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:147
[1] from 0x00007fd62eb1e95d in _PyErr_SetNone+17 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:199
[2] from 0x00007fd62eb1e95d in PyErr_SetNone+17 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:199
[3] from 0x00007fd62eb1e95d in gen_send_ex+509 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:241
[4] from 0x00007fd62ebdaa81 in _gen_throw+225 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:513
[5] from 0x00007fd62ebda97a in gen_throw+122 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/genobject.c:535
[6] from 0x00007fd62eb20180 in method_vectorcall_VARARGS+176 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Objects/descrobject.c:313
[7] from 0x00007fd62eb0a4aa in _PyObject_VectorcallTstate+46 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Include/cpython/abstract.h:118
[8] from 0x00007fd62eb0a4aa in PyObject_Vectorcall+80 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Include/cpython/abstract.h:127
[9] from 0x00007fd62eb0a4aa in call_function+154 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/ceval.c:5099
[+]
─── Threads ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[2] id 37996 name python3 from 0x00007fd62e9eaa24 in futex_abstimed_wait_cancelable+42 at ../sysdeps/nptl/futex-internal.h:320
[1] id 37995 name python3 from 0x00007fd62eb15a90 in _PyErr_SetObject+544 at /usr/src/debug/python39-3.9.0~b1-1.fc32.x86_64/Python/errors.c:147
─── Variables ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
arg tstate = 0x5609a47bd420: {prev = 0x5609a49fd300,next = 0x0,interp = 0x5609a47be370,frame = Frame 0…, exception = <type at remote 0x7fd62ed18a20>: {ob_refcnt = 7,ob_type = 0x7fd62ed237c0 <PyType_Type>}, value = StopIteration(): {ob_refcnt = 1,ob_type = 0x7fd62ed18a20 <_PyExc_StopIteration>}
loc o = <StompCancelledError at remote 0x7fd61fe53580>: {ob_refcnt = 19,ob_type = 0x5609a4c818d0}, context = <StompCancelledError at remote 0x7fd61fe53580>: {ob_refcnt = 19,ob_type = 0x5609a4c818d0}, exc_value = <StompCancelledError at remote 0x7fd61fe53580>: {ob_refcnt = 19,ob_type = 0x5609a4c818d0}, tb = 0x0: Cannot access memory at address 0x0
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>>>
Now here’s something a little more interesting. That while ((context =
PyException_GetContext(o)))
is just the sort of thing that could hang forever.
We can see that the loop retrieves a context
from o
, checks it against a
value
, and then updates o
to be context
. Those variables are visible in
the variables section above, but for clarity:
>>> p o
$1 = <StompCancelledError at remote 0x7fd61fe53580>
>>> p context
$2 = <StompCancelledError at remote 0x7fd61fe53580>
>>> p value
$3 = StopIteration()
>>>
Well, well, well. Since o
is the exact same as context
, the update at the
end of the loop that is presumably supposed to create an exit condition will
never occur.
Now that we know what’s wrong, it’s a very good time to do some hunting to determine if this bug has already been found. This is a process I can give very little advice on since it appears to be a rule of the universe that all bug reports are duplicates. What I did in this case was:
dev in ~/devel/c/cpython on 3.9 via 🐍 v3.8.3
❯ git log --grep="SetObject" # Maybe the function name is in a commit fixing it
commit 7f77ac463cff219e0c8afef2611cad5080cc9df1
Author: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Date: Fri May 22 14:35:22 2020 -0700
bpo-40696: Fix a hang that can arise after gen.throw() (GH-20287)
This updates _PyErr_ChainStackItem() to use _PyErr_SetObject()
instead of _PyErr_ChainExceptions(). This prevents a hang in
certain circumstances because _PyErr_SetObject() performs checks
to prevent cycles in the exception context chain while
_PyErr_ChainExceptions() doesn't.
(cherry picked from commit 7c30d12bd5359b0f66c4fbc98aa055398bcc8a7e)
Co-authored-by: Chris Jerdonek <chris.jerdonek@gmail.com>
Ding ding ding. We can check out the GitHub pull request and Python issue to see that all this debugging has happened before (and all this debugging will happen again).
The quality of commit messages vary from project to project, so this won’t always work, but it’s a good start. If nothing obvious shows up in the commit log, try testing with the development branch of whatever is causing you pain. Maybe it’s fixed. If that fails, skim through the issue tracker (if the project has an issue tracker) and hope for the best.