Subject: Re: Asterisk deadlocks since Kernel 4.1
To: Stefan Priebe
Cc: Thomas Gleixner, netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
From: Florian Weimer
Date: Wed, 18 Nov 2015 22:18:04 +0100
Message-ID: <564CEB0C.40006@redhat.com>
In-Reply-To: <564CDE2F.8000201@profihost.ag>
References: <564B3D35.50004@profihost.ag> <564B7F9D.5060701@profihost.ag> <564CDE2F.8000201@profihost.ag>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/18/2015 09:23 PM, Stefan Priebe wrote:
>
> On 17.11.2015 at 20:43, Thomas Gleixner wrote:
>> On Tue, 17 Nov 2015, Stefan Priebe wrote:
>>> I've now also two gdb backtraces from two crashes:
>>> http://pastebin.com/raw.php?i=yih5jNt8
>>>
>>> http://pastebin.com/raw.php?i=kGEcvH4T
>>
>> They don't tell me anything as I have no idea of the inner workings of
>> asterisk. You might be better off to talk to the asterisk folks to help
>> you track down what that thing is waiting for, so we can actually look
>> at a well defined area.
>
> The asterisk guys told me it's a livelock asterisk is waiting for
> getaddrinfo / recvmsg.
>
> Thread 2 (Thread 0x7fbe989c6700 (LWP 12890)):
> #0 0x00007fbeb9eb487d in recvmsg () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007fbeb9ed4fcc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x00007fbeb9ed544a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3 0x00007fbeb9e92007 in getaddrinfo () from
> /lib/x86_64-linux-gnu/libc.so.6

Stefan, please try to get a backtrace with debugging information. It is
likely that this is the make_request/__check_pf functionality in glibc,
but it would be nice to get some certainty.

Which glibc version do you use? Has it got a fix for CVE-2013-7423?

So far, the only known cause for a hang in this place (that is, lack of
return from recvmsg) is incorrect file descriptor use. (CVE-2013-7423
is such an issue in glibc itself.) The kernel upgrade could change
scheduling behavior, and the actual bug might have been latent before.

Theoretically, recvmsg could also hang if the Netlink query was dropped
by the kernel, or the final packet in the response was dropped. We
never saw that happen, even under extreme load, but I didn't test with
recent kernels.

The glibc change Hannes mentioned won't detect the hang, but if there is
incorrect file descriptor reuse going on, it is possible that the new
assert catches it.

Florian
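
To illustrate what "incorrect file descriptor use" can look like, here is a
minimal Python sketch. It is not taken from glibc or asterisk; the function
name demonstrate_fd_reuse and the payload are made up for illustration. It
assumes a POSIX system where a newly created socket gets the lowest free
descriptor number, so a number that was closed too early is handed out again
and a stale reference ends up reading from an unrelated socket. In the
getaddrinfo case, the same kind of mix-up could leave recvmsg waiting on a
socket that never receives the Netlink reply.

```python
import os
import socket


def demonstrate_fd_reuse():
    """Show a stale descriptor number silently pointing at a new socket."""
    a, b = socket.socketpair()
    stale_fd = a.fileno()      # number remembered by some other code path
    a.close()                  # descriptor closed while still "in use"

    # The kernel normally hands out the lowest free number, so the next
    # socket usually receives the number that was just released.
    c, d = socket.socketpair()
    reused = c.fileno() == stale_fd

    stolen = b""
    if reused:
        d.sendall(b"unrelated")
        # A read through the stale number consumes data meant for c.
        stolen = os.read(stale_fd, 64)

    for s in (b, c, d):
        s.close()
    return reused, stolen
```

If the number is reused, the function returns the bytes that were meant for
the new socket; the reverse situation, where the stale reader blocks forever
because nothing is ever sent to the new owner, is the hang pattern discussed
above.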