-
Notifications
You must be signed in to change notification settings - Fork 849
Description
In our environment, we started seeing crashes with this stack.
#0 0x00002af8ef890387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1 0x00002af8ef891a78 in __GI_abort () at abort.c:90
#2 0x00002af8ecdeb63b in ink_abort (message_format=message_format@entry=0x2af8ece0dc47 "%s:%d: failed assertion `%s`") at ../../../../../../_scm/trafficserver9.1/src/tscore/ink_error.cc:99
#3 0x00002af8ecde7bb5 in _ink_assert (expression=expression@entry=0x7e4b6a "0", file=file@entry=0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", line=line@entry=1418)
at ../../../../../../_scm/trafficserver9.1/src/tscore/ink_assert.cc:37
#4 0x000000000056d96d in HttpSM::state_common_wait_for_transform_read(HttpTransformInfo*, int (HttpSM::*)(int, void*), int, void*) () at ../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc:1418
#5 0x000000000056da74 in HttpSM::state_response_wait_for_transform_read(int, void*) () at ../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc:1329
#6 0x000000000057e73b in HttpSM::main_handler(int, void*) () at ../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc:2742
#7 0x000000000079840e in read_signal_and_update(int, UnixNetVConnection*) () at ../../../../../../_scm/trafficserver9.1/iocore/net/UnixNetVConnection.cc:83
#8 0x00000000007988bb in UnixNetVConnection::mainEvent(int, Event*) () at ../../../../../../_scm/trafficserver9.1/iocore/net/UnixNetVConnection.cc:1160
#9 0x000000000078d47b in InactivityCop::check_inactivity (this=0x2af90a8cb000, event=<optimized out>, e=<optimized out>) at ../../../../../../_scm/trafficserver9.1/iocore/net/UnixNet.cc:100
#10 0x00000000007da0dd in handleEvent (data=0x2af90a800060, event=2, this=0x2af90a8cb000) at ../../../../../../_scm/trafficserver9.1/iocore/eventsystem/I_Continuation.h:219
#11 handleEvent (data=0x2af90a800060, event=2, this=0x2af90a8cb000) at ../../../../../../_scm/trafficserver9.1/iocore/eventsystem/I_Continuation.h:215
#12 EThread::process_event(Event*, int) () at ../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:164
#13 0x00000000007dae04 in EThread::execute_regular (this=this@entry=0x2af8f53b0e00) at ../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:273
#14 0x00000000007db182 in execute (this=0x2af8f53b0e00) at ../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:364
#15 EThread::execute (this=0x2af8f53b0e00) at ../../../../../../_scm/trafficserver9.1/iocore/eventsystem/UnixEThread.cc:342
#16 0x00000000007d97ca in spawn_thread_internal (a=0x2af8f08e0d40) at ../../../../../../_scm/trafficserver9.1/iocore/eventsystem/Thread.cc:92
#17 0x00002af8eec22ea5 in start_thread (arg=0x2af8fc703700) at pthread_create.c:307
#18 0x00002af8ef9588dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
The end of the HttpSM history contains the following
{location = {file = 0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc",
func = 0x7fd680 <HttpSM::setup_server_transfer_to_transform()::__FUNCTION__> "setup_server_transfer_to_transform", line = 6666}, event = 34463, reentrancy = 6}, {location = {
file = 0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc",
func = 0x7fdaf0 <HttpSM::tunnel_handler_transform_write(int, HttpTunnelConsumer*)::__FUNCTION__> "tunnel_handler_transform_write", line = 3954}, event = 103, reentrancy = 6}, {location = {
file = 0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc", func = 0x7fdc80 <HttpSM::tunnel_handler_server(int, HttpTunnelProducer*)::__FUNCTION__> "tunnel_handler_server",
line = 3063}, event = 102, reentrancy = 6}, {location = {file = 0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc",
func = 0x7fe000 <HttpSM::state_response_wait_for_transform_read(int, void*)::__FUNCTION__> "state_response_wait_for_transform_read", line = 1314}, event = 2301, reentrancy = 7}, {location = {
file = 0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc",
func = 0x7fdfc0 <HttpSM::state_common_wait_for_transform_read(HttpTransformInfo*, int (HttpSM::*)(int, void*), int, void*)::__FUNCTION__> "state_common_wait_for_transform_read", line = 1344},
event = 2301, reentrancy = 7}, {location = {file = 0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc",
func = 0x7fe000 <HttpSM::state_response_wait_for_transform_read(int, void*)::__FUNCTION__> "state_response_wait_for_transform_read", line = 1314}, event = 106, reentrancy = 1}, {location = {
file = 0x7f9c50 "../../../../../../_scm/trafficserver9.1/proxy/http/HttpSM.cc",
func = 0x7fdfc0 <HttpSM::state_common_wait_for_transform_read(HttpTransformInfo*, int (HttpSM::*)(int, void*), int, void*)::__FUNCTION__> "state_common_wait_for_transform_read", line = 1344},
event = 106, reentrancy = 1},
You see that the tunnel was set up. The consumer got a write complete, the producer got a read complete and tunnel done was sent to the final handler. Then inactivity timeout (106) is sent to the default handler. In our environment the outbound active timeout is set to 14 minutes.
Digging into the core a bit more, we see that the problem requests were GET requests. The respond body is 0 bytes long (correctly reported by the content-length). In our environment a transform plugin (for most of the cases this was txn_box) was installed to adjust the response being sent to the user agent. But from the history, the TRANSFORM_READ_READY is never sent. Instead things timeout and the HttpSM is in a bad state for the particularly handler left. (I will file a separate issue about not handling a 0 length body for a write transform).
@SolidWallOfCode was able to reproduce this in a txn_box autest. We will try to get a test together to exercise this scenario using one of the transforms in the trafficserver source tree (null or buffererd null).
We were able to get around the assert crash by clearing the active timeout when ProxyTransaction::release is called on server_txn. I think this issue arose when PR #7849. We are running with that change in our 9.1.x branch. With that change, the server_txn is not immediately delivered to the HttpSessionManager when release() is called. Rather the actual transfer of control is delayed until the HttpSM is deleted.
The change is to apply the keepalive timeout and clear the active timeout when release() is called. The active timeout makes no sense when the server session is in the keep alive pool.