From mboxrd@z Thu Jan 1 00:00:00 1970 From: Neil Horman Subject: Re: [PATCH net-next] netpoll: allow cleanup to be synchronous Date: Thu, 18 Oct 2018 12:59:35 -0400 Message-ID: <20181018165935.GC2601@hmswarspite.think-freely.org> References: <20181012165929.20098-1-dbanerje@akamai.com> <20181017.214705.2139316496548022085.davem@davemloft.net> <20181018112929.GA2601@hmswarspite.think-freely.org> <5777390a08cb43709615237dcd4ada83@usma1ex-dag1mb2.msg.corp.akamai.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: David Miller , "netdev@vger.kernel.org" To: "Banerjee, Debabrata" Return-path: Received: from charlotte.tuxdriver.com ([70.61.120.58]:38297 "EHLO smtp.tuxdriver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728570AbeJSBBr (ORCPT ); Thu, 18 Oct 2018 21:01:47 -0400 Content-Disposition: inline In-Reply-To: <5777390a08cb43709615237dcd4ada83@usma1ex-dag1mb2.msg.corp.akamai.com> Sender: netdev-owner@vger.kernel.org List-ID: On Thu, Oct 18, 2018 at 03:17:06PM +0000, Banerjee, Debabrata wrote: > > From: Neil Horman > > > Agreed, this doesn't make sense. If you want a synchronous cleanup, create > > a wrapper function that creates a wait queue, calls __netpoll_free_async, > > and blocks on the wait queue completion. Modify the cleanup_work > > method(s) to complete the wait queue, and you've got what you want. > > > > Neil > > Actually the patch looks bad, but it seems to turn out the rtnl is always > held by the parent when it is called, and asynchronous cleanup doesn't > seem to be necessary at all. Perhaps you could share your deadlock > test case for 2cde6acd49da? > That commit doesn't fix a deadlock, it fixes a modification of data that must only be modified under the protection of the rtnl lock. From the commit log: __netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is already held. The mechanism is used to asynchronously call __netpoll_cleanup outside of the holding of the rtnl_lock, so as to avoid deadlock. Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the rtnl_lock must be held while calling it. Further, it cannot be held, because rcu callbacks may be issued in softirq contexts, which cannot sleep. "The mechanism" is referring to the use of rcu callbacks, which origionally were put in place to allow the freeing of data outside the holding of the rtnl, which did avoid deadlock. This patch fixes the fact that __netpoll_cleanup modifies the dev->np pointer outside of the rtnl_lock, which is disallowed because readers of dev->np assume it will be stable while rtnl is held. As for the workqueue not being needed anymore, it definately used to be, but it appears that most of the instances where a deadlock might occur have been modified such that it is no longer the case. The team driver still seems to be an outlier there though I think , in that it doesn't guarantee the holding of rtnl in its port add/delete paths. It might be worth cleaning that up and simply replacing all the calls to __netpoll_free_async with direct calls to __netpoll_cleanup. That would let you eliminate the former function alltogether. Neil > Patch v2 coming next. > > -Deb >