From mboxrd@z Thu Jan 1 00:00:00 1970
From: Josef Bacik
Subject: Re: Soft lockup in inet_put_port on 4.6
Date: Thu, 8 Dec 2016 16:36:56 -0500
Message-ID: <1481233016.11849.1@smtp.office365.com>
References: <1481231024.1911284.813071977.72AF4DEE@webmail.messagingengine.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Cc: Tom Herbert, Linux Kernel Network Developers
To: Hannes Frederic Sowa
Return-path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:38259 "EHLO
	mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752186AbcLHVhz (ORCPT );
	Thu, 8 Dec 2016 16:37:55 -0500
In-Reply-To: <1481231024.1911284.813071977.72AF4DEE@webmail.messagingengine.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, Dec 8, 2016 at 4:03 PM, Hannes Frederic Sowa wrote:
> Hello Tom,
>
> On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote:
>> We are seeing a fair number of machines getting into softlockup in
>> 4.6 kernel. As near as I can tell this is happening on the spinlock
>> in bind hash bucket. When inet_csk_get_port exits and does
>> spin_unlock_bh the TCP timer runs and we hit lockup in inet_put_port
>> (presumably on same lock). It seems like the lock isn't properly
>> being unlocked somewhere but I don't readily see it.
>>
>> Any ideas?
>
> Likewise we received reports that pretty much look the same on our
> heavily patched kernel. Did you have a chance to investigate or
> reproduce the problem?
>
> I am wondering if you would be able to take a complete thread stack
> dump if you can reproduce this to check if one of the user space
> processes is looping inside finding a free port?

We can reproduce the problem at will; we're still trying to run it
down. I'll try to find one of the boxes that dumped a core and get a
bt of everybody. Thanks,

Josef