EXCESSIVE NDB KERNEL THREAD IS STUCK IN: PERFORMING SEND; WATCHDOG TERMINATE

It was possible to end up in a livelock on the send_buffer mutex if send buffers became a limiting resource. This could be due to too few send buffers being configured, a slow or failing communication network causing all send buffers to fill up, slow receivers which do not consume what is sent, and likely other reasons.

In this situation the worker threads (possibly all of them) will fail to allocate send buffer memory for their signals, and will attempt a ::forceSend() to free up space. At the same time the send thread will be very busy trying to send to the same node(s). All these threads compete for the send_buffer mutex, which results in a livelock on it. Send threads stalled by this livelock are reported by the watchdog as 'Stuck in Send'.

As the 'stuck' send threads also held the global send_thread mutex, they could even block worker threads trying to grab the same mutex while alerting send threads as part of a do_send request. Thus, even the worker threads got 'Stuck in Send'.

This patch does two things:

1) Code analysis revealed that the send thread does not need to hold the global send_thread mutex while grabbing the send_buffer mutex. By releasing this global mutex prior to locking the send_buffer mutex, the *worker threads* will no longer be 'Stuck in Send'.

2) Changed the send threads' locking of the send_buffer mutex to use a trylock. If the trylock fails, the node to be sent to is re-inserted last in the list of send nodes in order to be retried later. This removes the 'Stuck in Send' condition for the send threads as well.
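
The two changes can be illustrated with a minimal, self-contained sketch. It is not the actual NDB code: TryLock, NodeSendState, SendQueue, SendResult and send_one() are hypothetical names chosen for illustration, and a plain std::mutex stands in for the global send_thread mutex. The sketch only shows the locking order: the global mutex is dropped before the per-node lock is tried, and a node whose lock is busy goes to the back of the list.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical try-only lock standing in for the per-node send_buffer mutex.
class TryLock {
public:
    // Returns true if the lock was acquired, false if it is already held.
    bool try_lock() { return !held_.exchange(true, std::memory_order_acquire); }
    void unlock() { held_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> held_{false};
};

// Hypothetical per-node state; the real structures hold far more.
struct NodeSendState {
    TryLock send_buffer_lock;  // guards this node's send buffers
    int pending_signals = 0;   // stand-in for buffered data
};

enum class SendResult { Empty, Deferred, Sent };

class SendQueue {
public:
    explicit SendQueue(std::size_t node_count) : nodes_(node_count) {}

    NodeSendState& node(std::size_t id) { return nodes_[id]; }

    // Append a node to the list of nodes that have data to send.
    void enqueue(std::size_t id) {
        std::lock_guard<std::mutex> guard(send_thread_mutex_);
        ready_.push_back(id);
    }

    // One iteration of a send thread.
    SendResult send_one() {
        std::size_t id = 0;
        {
            // Change 1: hold the global mutex only while picking a node.
            std::lock_guard<std::mutex> guard(send_thread_mutex_);
            if (ready_.empty()) return SendResult::Empty;
            id = ready_.front();
            ready_.pop_front();
        }  // global mutex released before the per-node lock is touched

        NodeSendState& n = nodes_[id];
        // Change 2: never block on the send_buffer lock.
        if (!n.send_buffer_lock.try_lock()) {
            enqueue(id);  // re-insert last, retry on a later iteration
            return SendResult::Deferred;
        }
        n.pending_signals = 0;  // placeholder for the actual send
        n.send_buffer_lock.unlock();
        return SendResult::Sent;
    }

private:
    std::mutex send_thread_mutex_;  // stand-in for the global send_thread mutex
    std::deque<std::size_t> ready_; // nodes waiting to be sent to
    std::vector<NodeSendState> nodes_;
};
```

With two nodes queued and the first node's lock held by someone else, send_one() defers the first node, sends to the second, and reaches the first again once its lock has been released. The send thread never waits on a busy send_buffer lock, and it never holds the global mutex while trying to take one.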