From: Miklos Szeredi
Date: Tue, 30 Aug 2016 11:39:58 +0200
Subject: Re: kernel BUG at net/unix/garbage.c:149!
To: Nikolay Borisov
Cc: Hannes Frederic Sowa, "Linux-Kernel@Vger. Kernel. Org", netdev@vger.kernel.org

On Tue, Aug 30, 2016 at 11:31 AM, Nikolay Borisov wrote:
>
>
> On 08/30/2016 12:18 PM, Miklos Szeredi wrote:
>> On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi wrote:
>>> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi wrote:
>>
>>> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
>>> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
>>> 130
>>> crash> p unix_tot_inflight
>>> unix_tot_inflight = $2 = 135
>>>
>>> We've lost track of a total of five inflight sockets, so it's not a
>>> one-off thing. Really weird... Now off to sleep, maybe I'll dream of
>>> the solution.
>>
>> Okay, found one bug: gc assumes that in-flight sockets that don't have
>> an external ref can't gain one while unix_gc_lock is held. That is
>> true because unix_notinflight() will be called before detaching fds,
>> which takes unix_gc_lock. Only MSG_PEEK was somehow overlooked. That
>> one also clones the fds, also keeping them in the skb. But through
>> MSG_PEEK an external reference can definitely be gained without ever
>> touching unix_gc_lock.
>>
>> Not sure whether the reported bug can be explained by this. Can you
>> confirm the MSG_PEEK was used in the setup?
>>
>> Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK?
>>
>> Anyway, attaching a fix that works by acquiring unix_gc_lock in case
>> of MSG_PEEK also. It is trivially correct, but I haven't tested it.
>
> I have no way of being 100% sure but looking through nginx's source code
> it seems they do utilize MSG_PEEK on several occasions. This issue has
> been apparently very hard to reproduce since I have 100s of servers
> running a lot of NGINX processes and this has been triggered only once.
>
> On a different note - if I inspect a live node without this patch should
> the discrepancy between the gc_inflight_list and the unix_tot_inflight
> be present VS with this patch applied?

May well be, since in the vmcore 4 in-flight sockets were "lost"
before triggering the bug.

I guess the best way to check is with a systemtap script that walks
the list with the gc lock.

Thanks,
Miklos
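
For reference, a minimal userspace sketch of the MSG_PEEK behaviour
described in the quoted analysis, assuming Linux and Python 3. It is not
the requested stress test and does not try to reproduce the BUG; it only
shows that peeking a message carrying SCM_RIGHTS hands the receiver a new
descriptor while the message, with its fds, stays queued for a later
receive. The helper names (recv_one_fd, peek_then_receive) are made up for
this illustration.

```python
import array
import os
import socket

FD_SIZE = array.array("i").itemsize


def recv_one_fd(sock, flags):
    """Receive one datagram and return the single fd carried via SCM_RIGHTS."""
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(FD_SIZE), flags)
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            return array.array("i", data[:FD_SIZE])[0]
    raise RuntimeError("no SCM_RIGHTS control message received")


def peek_then_receive():
    """Send a unix socket over a unix socket, peek it, then receive it."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # The socket that gets put in flight (plus its peer, unused here).
    passed, peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    opened = []
    try:
        payload = array.array("i", [passed.fileno()]).tobytes()
        sender.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, payload)])

        # MSG_PEEK installs a new fd in this process but leaves the
        # message, and the fds attached to it, on the receive queue.
        peeked = recv_one_fd(receiver, socket.MSG_PEEK)
        opened.append(peeked)

        # The ordinary receive dequeues the message and installs another fd.
        real = recv_one_fd(receiver, 0)
        opened.append(real)

        ino = os.fstat(passed.fileno()).st_ino
        return {
            "distinct_fds": peeked != real,
            "peeked_same_socket": os.fstat(peeked).st_ino == ino,
            "real_same_socket": os.fstat(real).st_ino == ino,
        }
    finally:
        for fd in opened:
            os.close(fd)
        for s in (sender, receiver, passed, peer):
            s.close()


if __name__ == "__main__":
    print(peek_then_receive())
```

A stress test along the lines asked for above would presumably run many
such peek/receive/close sequences concurrently with activity that triggers
the unix gc; that part is not attempted here.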