All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] netpoll: allow cleanup to be synchronous
@ 2018-10-12 16:59 Debabrata Banerjee
  2018-10-12 17:27 ` Banerjee, Debabrata
  2018-10-18  4:47 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Debabrata Banerjee @ 2018-10-12 16:59 UTC (permalink / raw)
  To: David S . Miller, netdev; +Cc: dbanerje, Neil Horman

This fixes a problem introduced by:
commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")

When using netconsole on a bond, __netpoll_cleanup can asynchronously
recurse multiple times, each __netpoll_free_async call can result in
more __netpoll_free_async's. This means there is now a race between
cleanup_work queues on multiple netpoll_info's on multiple devices and
the configuration of a new netpoll. For example if a netconsole is set
to enable 0, reconfigured, and enable 1 immediately, this netconsole
will likely not work.

Given the reason for __netpoll_free_async is it can be called when rtnl
is not locked, if it is locked, we should be able to execute
synchronously.

CC: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
---
 net/core/netpoll.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index de1d1ba92f2d..b899cbfbe639 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -826,7 +826,10 @@ static void netpoll_async_cleanup(struct work_struct *work)
 
 void __netpoll_free_async(struct netpoll *np)
 {
-	schedule_work(&np->cleanup_work);
+	if (rtnl_is_locked())
+		__netpoll_cleanup(np);
+	else
+		schedule_work(&np->cleanup_work);
 }
 EXPORT_SYMBOL_GPL(__netpoll_free_async);
 
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] netpoll: allow cleanup to be synchronous
  2018-10-12 16:59 [PATCH net-next] netpoll: allow cleanup to be synchronous Debabrata Banerjee
@ 2018-10-12 17:27 ` Banerjee, Debabrata
  2018-10-18  4:47 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: Banerjee, Debabrata @ 2018-10-12 17:27 UTC (permalink / raw)
  To: David S . Miller, netdev; +Cc: Neil Horman

Actually I realized this patch might be problematic, although someone might be holding rtnl, it might not be the current callstack. A different solution is needed I think. Input appreciated.

-Deb

    
    


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] netpoll: allow cleanup to be synchronous
  2018-10-12 16:59 [PATCH net-next] netpoll: allow cleanup to be synchronous Debabrata Banerjee
  2018-10-12 17:27 ` Banerjee, Debabrata
@ 2018-10-18  4:47 ` David Miller
  2018-10-18 11:29   ` Neil Horman
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2018-10-18  4:47 UTC (permalink / raw)
  To: dbanerje; +Cc: netdev, nhorman

From: Debabrata Banerjee <dbanerje@akamai.com>
Date: Fri, 12 Oct 2018 12:59:29 -0400

> @@ -826,7 +826,10 @@ static void netpoll_async_cleanup(struct work_struct *work)
>  
>  void __netpoll_free_async(struct netpoll *np)
>  {
> -	schedule_work(&np->cleanup_work);
> +	if (rtnl_is_locked())
> +		__netpoll_cleanup(np);
> +	else
> +		schedule_work(&np->cleanup_work);
>  }

rtnl_is_locked() says only that the RTNL mutex is held by someone.

It does not necessarily say that it is held by the current execution
context.

Which means you could erronesly run this synchronously when another
thread has the RTNL mutex held, not you.

I'm not applying this, sorry.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] netpoll: allow cleanup to be synchronous
  2018-10-18  4:47 ` David Miller
@ 2018-10-18 11:29   ` Neil Horman
  2018-10-18 15:17     ` Banerjee, Debabrata
  0 siblings, 1 reply; 7+ messages in thread
From: Neil Horman @ 2018-10-18 11:29 UTC (permalink / raw)
  To: David Miller; +Cc: dbanerje, netdev

On Wed, Oct 17, 2018 at 09:47:05PM -0700, David Miller wrote:
> From: Debabrata Banerjee <dbanerje@akamai.com>
> Date: Fri, 12 Oct 2018 12:59:29 -0400
> 
> > @@ -826,7 +826,10 @@ static void netpoll_async_cleanup(struct work_struct *work)
> >  
> >  void __netpoll_free_async(struct netpoll *np)
> >  {
> > -	schedule_work(&np->cleanup_work);
> > +	if (rtnl_is_locked())
> > +		__netpoll_cleanup(np);
> > +	else
> > +		schedule_work(&np->cleanup_work);
> >  }
> 
> rtnl_is_locked() says only that the RTNL mutex is held by someone.
> 
> It does not necessarily say that it is held by the current execution
> context.
> 
> Which means you could erronesly run this synchronously when another
> thread has the RTNL mutex held, not you.
> 
> I'm not applying this, sorry.
> 


Agreed, this doesn't make sense.  If you want a synchronous cleanup, create a
wrapper function that creates a wait queue, calls __netpoll_free_async, and
blocks on the wait queue completion.  Modify the cleanup_work method(s) to
complete the wait queue, and you've got what you want.

Neil

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH net-next] netpoll: allow cleanup to be synchronous
  2018-10-18 11:29   ` Neil Horman
@ 2018-10-18 15:17     ` Banerjee, Debabrata
  2018-10-18 16:59       ` Neil Horman
  0 siblings, 1 reply; 7+ messages in thread
From: Banerjee, Debabrata @ 2018-10-18 15:17 UTC (permalink / raw)
  To: 'Neil Horman', David Miller; +Cc: netdev

> From: Neil Horman <nhorman@tuxdriver.com>

> Agreed, this doesn't make sense.  If you want a synchronous cleanup, create
> a wrapper function that creates a wait queue, calls __netpoll_free_async,
> and blocks on the wait queue completion.  Modify the cleanup_work
> method(s) to complete the wait queue, and you've got what you want.
> 
> Neil

Actually the patch looks bad, but it seems to turn out the rtnl is always
held by the parent when it is called, and asynchronous cleanup doesn't
seem to be necessary at all. Perhaps you could share your deadlock
test case for 2cde6acd49da?

Patch v2 coming next.

-Deb

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] netpoll: allow cleanup to be synchronous
  2018-10-18 15:17     ` Banerjee, Debabrata
@ 2018-10-18 16:59       ` Neil Horman
  2018-10-18 17:47         ` Banerjee, Debabrata
  0 siblings, 1 reply; 7+ messages in thread
From: Neil Horman @ 2018-10-18 16:59 UTC (permalink / raw)
  To: Banerjee, Debabrata; +Cc: David Miller, netdev

On Thu, Oct 18, 2018 at 03:17:06PM +0000, Banerjee, Debabrata wrote:
> > From: Neil Horman <nhorman@tuxdriver.com>
> 
> > Agreed, this doesn't make sense.  If you want a synchronous cleanup, create
> > a wrapper function that creates a wait queue, calls __netpoll_free_async,
> > and blocks on the wait queue completion.  Modify the cleanup_work
> > method(s) to complete the wait queue, and you've got what you want.
> > 
> > Neil
> 
> Actually the patch looks bad, but it seems to turn out the rtnl is always
> held by the parent when it is called, and asynchronous cleanup doesn't
> seem to be necessary at all. Perhaps you could share your deadlock
> test case for 2cde6acd49da?
> 

That commit doesn't fix a deadlock, it fixes a modification of data that must
only be modified under the protection of the rtnl lock.  From the commit log:

__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
    already held.  The mechanism is used to asynchronously call __netpoll_cleanup
    outside of the holding of the rtnl_lock, so as to avoid deadlock.
    Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
    rtnl_lock must be held while calling it.  Further, it cannot be held, because
    rcu callbacks may be issued in softirq contexts, which cannot sleep.

"The mechanism" is referring to the use of rcu callbacks, which origionally were
put in place to allow the freeing of data outside the holding of the rtnl, which
did avoid deadlock.

This patch fixes the fact that __netpoll_cleanup modifies the dev->np pointer
outside of the rtnl_lock, which is disallowed because readers of dev->np assume
it will be stable while rtnl is held.

As for the workqueue not being needed anymore, it definately used to be, but it
appears that most of the instances where a deadlock might occur have been
modified such that it is no longer the case.  The team driver still seems to be
an outlier there though I think , in that it doesn't guarantee the holding of
rtnl in its port add/delete paths.  

It might be worth cleaning that up and simply replacing all the calls to
__netpoll_free_async with direct calls to __netpoll_cleanup.  That would let you
eliminate the former function alltogether.

Neil

> Patch v2 coming next.
> 
> -Deb
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH net-next] netpoll: allow cleanup to be synchronous
  2018-10-18 16:59       ` Neil Horman
@ 2018-10-18 17:47         ` Banerjee, Debabrata
  0 siblings, 0 replies; 7+ messages in thread
From: Banerjee, Debabrata @ 2018-10-18 17:47 UTC (permalink / raw)
  To: 'Neil Horman'; +Cc: David Miller, netdev

> From: Neil Horman <nhorman@tuxdriver.com> 

> The team driver still seems
> to be an outlier there though I think , in that it doesn't guarantee the holding
> of rtnl in its port add/delete paths.

Not seeing where this is the case in the team driver today? They were actually
calling __netpoll_cleanup synchronously already.

> It might be worth cleaning that up and simply replacing all the calls to
> __netpoll_free_async with direct calls to __netpoll_cleanup.  That would let
> you eliminate the former function alltogether.

I believe v2 accomplishes that, please review.

-Deb

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-10-19  1:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-12 16:59 [PATCH net-next] netpoll: allow cleanup to be synchronous Debabrata Banerjee
2018-10-12 17:27 ` Banerjee, Debabrata
2018-10-18  4:47 ` David Miller
2018-10-18 11:29   ` Neil Horman
2018-10-18 15:17     ` Banerjee, Debabrata
2018-10-18 16:59       ` Neil Horman
2018-10-18 17:47         ` Banerjee, Debabrata

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.