netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* IPv4/IPv6 sysctl unregistration deadlock
@ 2009-02-25  5:23 Patrick McHardy
  2009-02-25  6:19 ` Herbert Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2009-02-25  5:23 UTC (permalink / raw)
  To: Linux Netdev List; +Cc: Herbert Xu

Ben Greear reported sporadically hanging ip processes when adding or
removing MACVLAN devices, which seem to be caused by a race between
sysctl handling and device notifiers.

What is happening is:

Process 1:

- "ip" deletes a macvlan device (or any other kind of device gets
   removed for whatever reason)
- netdev notifier chain notifies IPv6 addrconf while holding the RTNL

Process 2:

- a different process writes something to /proc/sys/net/ipv6/forwarding
- sysctl table is marked as busy, addrconf_sysctl_forward() is invoked
- tries to take rtnl, waits for process 1

Process 1:

- IPv6 begins sysctl unregistration, which is deferred using a
   completion until the table is not busy anymore (=> waiting for
   process 2)

At this point both processes are deadlocked. Judging by a quick look,
IPv4 seems to have the exact same problem.

An easy fix would be to keep track of whether sysctl unregistration
is in progress in IPv4/IPv6 and ignore new requests from that point
on. Its not very elegant though, so I was wondering whether anyone
has a better suggestion.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-25  5:23 IPv4/IPv6 sysctl unregistration deadlock Patrick McHardy
@ 2009-02-25  6:19 ` Herbert Xu
  2009-02-25  6:23   ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2009-02-25  6:19 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Linux Netdev List

On Wed, Feb 25, 2009 at 06:23:33AM +0100, Patrick McHardy wrote:
>
> An easy fix would be to keep track of whether sysctl unregistration
> is in progress in IPv4/IPv6 and ignore new requests from that point
> on. Its not very elegant though, so I was wondering whether anyone
> has a better suggestion.

We could make the unregistration asynchronous and invoke a callback
when it's done.  Then we can simply hold a net_device refcount and
relinquish it in the callback.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-25  6:19 ` Herbert Xu
@ 2009-02-25  6:23   ` Patrick McHardy
  2009-02-25  7:18     ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2009-02-25  6:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Linux Netdev List

Herbert Xu wrote:
> On Wed, Feb 25, 2009 at 06:23:33AM +0100, Patrick McHardy wrote:
>> An easy fix would be to keep track of whether sysctl unregistration
>> is in progress in IPv4/IPv6 and ignore new requests from that point
>> on. Its not very elegant though, so I was wondering whether anyone
>> has a better suggestion.
> 
> We could make the unregistration asynchronous and invoke a callback
> when it's done.  Then we can simply hold a net_device refcount and
> relinquish it in the callback

That sounds simple enough. I'll see if I can come up with a patch, thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-25  6:23   ` Patrick McHardy
@ 2009-02-25  7:18     ` Patrick McHardy
  2009-02-25  8:43       ` Herbert Xu
  2009-02-26 16:55       ` Stephen Hemminger
  0 siblings, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2009-02-25  7:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Linux Netdev List

Patrick McHardy wrote:
> Herbert Xu wrote:
>> On Wed, Feb 25, 2009 at 06:23:33AM +0100, Patrick McHardy wrote:
>>> An easy fix would be to keep track of whether sysctl unregistration
>>> is in progress in IPv4/IPv6 and ignore new requests from that point
>>> on. Its not very elegant though, so I was wondering whether anyone
>>> has a better suggestion.
>>
>> We could make the unregistration asynchronous and invoke a callback
>> when it's done.  Then we can simply hold a net_device refcount and
>> relinquish it in the callback
> 
> That sounds simple enough. I'll see if I can come up with a patch, thanks.

Unfortunately its more complicated than I thought because of
device renames, where the sysctl pointer is reused after
unregistration and the rename/unregistration/re-registration
should be atomic. Deferring unregistration means we can't perform
the new registration immediately unless we allow multiple
registrations for a single device to be active simulaneously,
which introduces a whole new set of problems.

Simply ignoring the request during unregistration doesn't seem
so bad after all, the main problem is that it intoduces a different
race on renames where a write to the "forwarding" file returns
success, but the change doesn't take effect. We could return
-ENOENT, but that seems a bit strange after open() returned success.
Maybe -EBUSY, although I would prefer to make this transparent
to userspace.

Another alternative would be to simply not take the RTNL in
the sysctl handler since we're already taking dev_base_lock
before performing any forwaring changes. But in case of IPv4
we need it for disabling LRO.

I think I'm stuck. Will rethink it after some coffee :)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-25  7:18     ` Patrick McHardy
@ 2009-02-25  8:43       ` Herbert Xu
  2009-02-26  6:06         ` Eric W. Biederman
  2009-02-26  6:10         ` Eric W. Biederman
  2009-02-26 16:55       ` Stephen Hemminger
  1 sibling, 2 replies; 25+ messages in thread
From: Herbert Xu @ 2009-02-25  8:43 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Linux Netdev List

On Wed, Feb 25, 2009 at 08:18:47AM +0100, Patrick McHardy wrote:
>
> Unfortunately its more complicated than I thought because of
> device renames, where the sysctl pointer is reused after
> unregistration and the rename/unregistration/re-registration
> should be atomic. Deferring unregistration means we can't perform
> the new registration immediately unless we allow multiple
> registrations for a single device to be active simulaneously,
> which introduces a whole new set of problems.

Good point.

> Simply ignoring the request during unregistration doesn't seem
> so bad after all, the main problem is that it intoduces a different
> race on renames where a write to the "forwarding" file returns
> success, but the change doesn't take effect. We could return
> -ENOENT, but that seems a bit strange after open() returned success.
> Maybe -EBUSY, although I would prefer to make this transparent
> to userspace.

I'd like to avoid that for the rename case just because shell
scripts know how to deal with echo foo > /nonexist/file but not
necessarily a failed echo on write/close.

> I think I'm stuck. Will rethink it after some coffee :)

Yes we need more coffee :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-25  8:43       ` Herbert Xu
@ 2009-02-26  6:06         ` Eric W. Biederman
  2009-02-26  6:10         ` Eric W. Biederman
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2009-02-26  6:06 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick McHardy, Linux Netdev List



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-25  8:43       ` Herbert Xu
  2009-02-26  6:06         ` Eric W. Biederman
@ 2009-02-26  6:10         ` Eric W. Biederman
  2009-02-26  6:22           ` Herbert Xu
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2009-02-26  6:10 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick McHardy, Linux Netdev List

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Wed, Feb 25, 2009 at 08:18:47AM +0100, Patrick McHardy wrote:
>>
>> Unfortunately its more complicated than I thought because of
>> device renames, where the sysctl pointer is reused after
>> unregistration and the rename/unregistration/re-registration
>> should be atomic. Deferring unregistration means we can't perform
>> the new registration immediately unless we allow multiple
>> registrations for a single device to be active simulaneously,
>> which introduces a whole new set of problems.
>
> Good point.
>
>> Simply ignoring the request during unregistration doesn't seem
>> so bad after all, the main problem is that it intoduces a different
>> race on renames where a write to the "forwarding" file returns
>> success, but the change doesn't take effect. We could return
>> -ENOENT, but that seems a bit strange after open() returned success.
>> Maybe -EBUSY, although I would prefer to make this transparent
>> to userspace.
>
> I'd like to avoid that for the rename case just because shell
> scripts know how to deal with echo foo > /nonexist/file but not
> necessarily a failed echo on write/close.
>
>> I think I'm stuck. Will rethink it after some coffee :)
>
> Yes we need more coffee :)

How does adding a rename operation to sysctl sound?

I am a little concerned that if we have this issue with sysctl
we also have it with proc and sysfs as well.

Although I admit I don't understand it yet.

Eric



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-26  6:10         ` Eric W. Biederman
@ 2009-02-26  6:22           ` Herbert Xu
  2009-02-26  7:18             ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2009-02-26  6:22 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Patrick McHardy, Linux Netdev List, David S. Miller

On Wed, Feb 25, 2009 at 10:10:33PM -0800, Eric W. Biederman wrote:
>
> How does adding a rename operation to sysctl sound?

Yes that would definitely help.  Of course for the unregister case
we'd still need either an async removal or a no-op as Patrick
suggested.

> I am a little concerned that if we have this issue with sysctl
> we also have it with proc and sysfs as well.

Indeed.  /proc should be OK because it's all read-only so we don't
take the RTNL anywhere.  But sysfs may well have the same issue.

I think we should watch out for any more attempts at adding
netdev name-based sysfs nodes and nip them in the bud.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-26  6:22           ` Herbert Xu
@ 2009-02-26  7:18             ` Eric W. Biederman
  2009-02-26 16:49               ` Stephen Hemminger
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2009-02-26  7:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick McHardy, Linux Netdev List, David S. Miller

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Wed, Feb 25, 2009 at 10:10:33PM -0800, Eric W. Biederman wrote:
>>
>> How does adding a rename operation to sysctl sound?
>
> Yes that would definitely help.  Of course for the unregister case
> we'd still need either an async removal or a no-op as Patrick
> suggested.

After having reread the thread and looking at the code I think
I understand what is happening.

sysctl, proc, and sysfs all need to wait until there are no
more users before their unregister operation succeeds.  So that
we can guarantee that it is safe to remove a module that provides
the callback function.

Currently ndo_stop, NETDEV_DOWN, unlist_netdevice and I don't
know how much other code is run from unregister_netdevice
with the rtnl lock.  If we do an asynchronous unregister
we need to ensure that entire code path is safe without
rtnl_lock.  And we would need to run the unregister work
from rtnl_lock.

Ugh.  netdev_store() and a few other functions in net-sysfs.c
take rtnl_lock.  The instance in netdev_store appears to date
back to 21 May 2003 sometime during 2.5.

So this is an old problem that we are just noticing now. Ugh.

Currently rtnl_lock() protects the netdevice_notifier_chain.
So it appears we need to hold rtnl_lock().

Which leads me to conclude either we need to completely rewrite the
locking rules for the networking stack, or we need to teach the sysfs,
sysctl, and proc how to grab a subsystem lock around a callback.

We already do this for netlink with netlink_create_kernel.

So I guess we need a variants of:
register_sysctl_table, proc_create, and class_create_file.

What a pain, but at least it looks like it can work.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-26  7:18             ` Eric W. Biederman
@ 2009-02-26 16:49               ` Stephen Hemminger
  2009-02-26 19:01                 ` Eric W. Biederman
                                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Stephen Hemminger @ 2009-02-26 16:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Herbert Xu, Patrick McHardy, Linux Netdev List, David S. Miller

On Wed, 25 Feb 2009 23:18:42 -0800
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Herbert Xu <herbert@gondor.apana.org.au> writes:
> 
> > On Wed, Feb 25, 2009 at 10:10:33PM -0800, Eric W. Biederman wrote:
> >>
> >> How does adding a rename operation to sysctl sound?
> >
> > Yes that would definitely help.  Of course for the unregister case
> > we'd still need either an async removal or a no-op as Patrick
> > suggested.
> 
> After having reread the thread and looking at the code I think
> I understand what is happening.
> 
> sysctl, proc, and sysfs all need to wait until there are no
> more users before their unregister operation succeeds.  So that
> we can guarantee that it is safe to remove a module that provides
> the callback function.
> 
> Currently ndo_stop, NETDEV_DOWN, unlist_netdevice and I don't
> know how much other code is run from unregister_netdevice
> with the rtnl lock.  If we do an asynchronous unregister
> we need to ensure that entire code path is safe without
> rtnl_lock.  And we would need to run the unregister work
> from rtnl_lock.
> 
> Ugh.  netdev_store() and a few other functions in net-sysfs.c
> take rtnl_lock.  The instance in netdev_store appears to date
> back to 21 May 2003 sometime during 2.5.
> 
> So this is an old problem that we are just noticing now. Ugh.
> 
> Currently rtnl_lock() protects the netdevice_notifier_chain.
> So it appears we need to hold rtnl_lock().
> 
> Which leads me to conclude either we need to completely rewrite the
> locking rules for the networking stack, or we need to teach the sysfs,
> sysctl, and proc how to grab a subsystem lock around a callback.
> 
> We already do this for netlink with netlink_create_kernel.
> 
> So I guess we need a variants of:
> register_sysctl_table, proc_create, and class_create_file.
> 
> What a pain, but at least it looks like it can work.
> 
> Eric

What about something like this:

Subject: [PATCH] Avoid race between network down and sysfs


Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/core/net-sysfs.c	2009-02-26 08:36:18.000000000 -0800
+++ b/net/core/net-sysfs.c	2009-02-26 08:37:51.000000000 -0800
@@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic
 	if (endp == buf)
 		goto err;
 
-	rtnl_lock();
+	if (!rtnl_trylock())
+		return -ERESTARTSYS;
+
 	if (dev_isalive(net)) {
 		if ((ret = (*set)(net, new)) == 0)
 			ret = len;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-25  7:18     ` Patrick McHardy
  2009-02-25  8:43       ` Herbert Xu
@ 2009-02-26 16:55       ` Stephen Hemminger
  1 sibling, 0 replies; 25+ messages in thread
From: Stephen Hemminger @ 2009-02-26 16:55 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Herbert Xu, Linux Netdev List

On Wed, 25 Feb 2009 08:18:47 +0100
Patrick McHardy <kaber@trash.net> wrote:

> Patrick McHardy wrote:
> > Herbert Xu wrote:
> >> On Wed, Feb 25, 2009 at 06:23:33AM +0100, Patrick McHardy wrote:
> >>> An easy fix would be to keep track of whether sysctl unregistration
> >>> is in progress in IPv4/IPv6 and ignore new requests from that point
> >>> on. Its not very elegant though, so I was wondering whether anyone
> >>> has a better suggestion.
> >>
> >> We could make the unregistration asynchronous and invoke a callback
> >> when it's done.  Then we can simply hold a net_device refcount and
> >> relinquish it in the callback
> > 
> > That sounds simple enough. I'll see if I can come up with a patch, thanks.
> 
> Unfortunately its more complicated than I thought because of
> device renames, where the sysctl pointer is reused after
> unregistration and the rename/unregistration/re-registration
> should be atomic. Deferring unregistration means we can't perform
> the new registration immediately unless we allow multiple
> registrations for a single device to be active simulaneously,
> which introduces a whole new set of problems.
> 
> Simply ignoring the request during unregistration doesn't seem
> so bad after all, the main problem is that it intoduces a different
> race on renames where a write to the "forwarding" file returns
> success, but the change doesn't take effect. We could return
> -ENOENT, but that seems a bit strange after open() returned success.
> Maybe -EBUSY, although I would prefer to make this transparent
> to userspace.
> 
> Another alternative would be to simply not take the RTNL in
> the sysctl handler since we're already taking dev_base_lock
> before performing any forwaring changes. But in case of IPv4
> we need it for disabling LRO.
> 
> I think I'm stuck. Will rethink it after some coffee :)

Will the following help? It punts the problem back out to VFS which
will restart.

--- a/net/ipv6/addrconf.c	2009-02-26 08:51:09.000000000 -0800
+++ b/net/ipv6/addrconf.c	2009-02-26 08:54:08.000000000 -0800
@@ -493,15 +493,17 @@ static void addrconf_forward_change(stru
 	read_unlock(&dev_base_lock);
 }
 
-static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
+static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 {
 	struct net *net;
 
 	net = (struct net *)table->extra2;
 	if (p == &net->ipv6.devconf_dflt->forwarding)
-		return;
+		return 0;
+
+	if (!rtnl_trylock())
+		return -ERESTARTSYS;
 
-	rtnl_lock();
 	if (p == &net->ipv6.devconf_all->forwarding) {
 		__s32 newf = net->ipv6.devconf_all->forwarding;
 		net->ipv6.devconf_dflt->forwarding = newf;
@@ -512,6 +514,7 @@ static void addrconf_fixup_forwarding(st
 
 	if (*p)
 		rt6_purge_dflt_routers(net);
+	return 1;
 }
 #endif
 
@@ -3977,7 +3980,7 @@ int addrconf_sysctl_forward(ctl_table *c
 	ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
 
 	if (write)
-		addrconf_fixup_forwarding(ctl, valp, val);
+		ret = addrconf_fixup_forwarding(ctl, valp, val);
 	return ret;
 }
 
@@ -4013,8 +4016,7 @@ static int addrconf_sysctl_forward_strat
 	}
 
 	*valp = new;
-	addrconf_fixup_forwarding(table, valp, val);
-	return 1;
+	return addrconf_fixup_forwarding(table, valp, val);
 }
 
 static struct addrconf_sysctl_table

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-26 16:49               ` Stephen Hemminger
@ 2009-02-26 19:01                 ` Eric W. Biederman
  2009-02-26 20:24                   ` Stephen Hemminger
  2009-02-27  0:59                 ` Herbert Xu
  2009-02-27 18:26                 ` Ben Greear
  2 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2009-02-26 19:01 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Herbert Xu, Patrick McHardy, Linux Netdev List, David S. Miller

Stephen Hemminger <shemminger@vyatta.com> writes:

> What about something like this:
>
> Subject: [PATCH] Avoid race between network down and sysfs

As far as solutions go.  That looks like the easiest correct solution.
So.  
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>


Will -ERESTARTSYS trigger the in kernel restart logic in this case?

There are a lot more cases to cover, and I don't I like it long
term.  Spinning waiting for rtnl_lock feels wrong.  Plus it does
not help with discovering the problem in new sysfs, sysctl, or
proc files.

It has the major advantage that we can fix things now.

> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> --- a/net/core/net-sysfs.c	2009-02-26 08:36:18.000000000 -0800
> +++ b/net/core/net-sysfs.c	2009-02-26 08:37:51.000000000 -0800
> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic
>  	if (endp == buf)
>  		goto err;
>  
> -	rtnl_lock();
> +	if (!rtnl_trylock())
> +		return -ERESTARTSYS;
> +
>  	if (dev_isalive(net)) {
>  		if ((ret = (*set)(net, new)) == 0)
>  			ret = len;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-26 19:01                 ` Eric W. Biederman
@ 2009-02-26 20:24                   ` Stephen Hemminger
  0 siblings, 0 replies; 25+ messages in thread
From: Stephen Hemminger @ 2009-02-26 20:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Herbert Xu, Patrick McHardy, Linux Netdev List, David S. Miller

On Thu, 26 Feb 2009 11:01:41 -0800
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Stephen Hemminger <shemminger@vyatta.com> writes:
> 
> > What about something like this:
> >
> > Subject: [PATCH] Avoid race between network down and sysfs
> 
> As far as solutions go.  That looks like the easiest correct solution.
> So.  
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> 
> Will -ERESTARTSYS trigger the in kernel restart logic in this case?
> 
> There are a lot more cases to cover, and I don't I like it long
> term.  Spinning waiting for rtnl_lock feels wrong.  Plus it does
> not help with discovering the problem in new sysfs, sysctl, or
> proc files.
> 
> It has the major advantage that we can fix things now.
> 

I haven't tested it, but it should restart in VFS.
Spinning is not that big a deal, and it also handles the case
where the name changed or some other race occurred during processing.
When syscall is re-entered, the name will no longer be found and
-ENOENT will be returned.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-26 16:49               ` Stephen Hemminger
  2009-02-26 19:01                 ` Eric W. Biederman
@ 2009-02-27  0:59                 ` Herbert Xu
  2009-02-27  1:25                   ` Stephen Hemminger
  2009-02-27 18:26                 ` Ben Greear
  2 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2009-02-27  0:59 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric W. Biederman, Patrick McHardy, Linux Netdev List, David S. Miller

On Thu, Feb 26, 2009 at 08:49:24AM -0800, Stephen Hemminger wrote:
>
> -	rtnl_lock();
> +	if (!rtnl_trylock())
> +		return -ERESTARTSYS;

This is going to spin instead of sleep on contention, is that
intended?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-27  0:59                 ` Herbert Xu
@ 2009-02-27  1:25                   ` Stephen Hemminger
  0 siblings, 0 replies; 25+ messages in thread
From: Stephen Hemminger @ 2009-02-27  1:25 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric W. Biederman, Patrick McHardy, Linux Netdev List, David S. Miller

On Fri, 27 Feb 2009 08:59:45 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Thu, Feb 26, 2009 at 08:49:24AM -0800, Stephen Hemminger wrote:
> >
> > -	rtnl_lock();
> > +	if (!rtnl_trylock())
> > +		return -ERESTARTSYS;
> 
> This is going to spin instead of sleep on contention, is that
> intended?
> 
> Cheers,

It walks all the way back out to VFS, which is what you have to
do since the sysctl may move.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-26 16:49               ` Stephen Hemminger
  2009-02-26 19:01                 ` Eric W. Biederman
  2009-02-27  0:59                 ` Herbert Xu
@ 2009-02-27 18:26                 ` Ben Greear
  2009-02-27 18:38                   ` Stephen Hemminger
  2 siblings, 1 reply; 25+ messages in thread
From: Ben Greear @ 2009-02-27 18:26 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric W. Biederman, Herbert Xu, Patrick McHardy,
	Linux Netdev List, David S. Miller

Stephen Hemminger wrote:

> Subject: [PATCH] Avoid race between network down and sysfs
> 
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> --- a/net/core/net-sysfs.c	2009-02-26 08:36:18.000000000 -0800
> +++ b/net/core/net-sysfs.c	2009-02-26 08:37:51.000000000 -0800
> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic
>  	if (endp == buf)
>  		goto err;
>  
> -	rtnl_lock();
> +	if (!rtnl_trylock())
> +		return -ERESTARTSYS;
> +
>  	if (dev_isalive(net)) {
>  		if ((ret = (*set)(net, new)) == 0)
>  			ret = len;

I can test this to see if it fixes my problem.  Are the above lines the
entirety of the patch?

I think I'll add a printk if we do the ERESTARTSYS to make sure that I
hit the race (but still recover).

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-27 18:26                 ` Ben Greear
@ 2009-02-27 18:38                   ` Stephen Hemminger
  2009-03-02 11:07                     ` Patrick McHardy
  2009-03-02 22:11                     ` Ben Greear
  0 siblings, 2 replies; 25+ messages in thread
From: Stephen Hemminger @ 2009-02-27 18:38 UTC (permalink / raw)
  To: Ben Greear
  Cc: Eric W. Biederman, Herbert Xu, Patrick McHardy,
	Linux Netdev List, David S. Miller

On Fri, 27 Feb 2009 10:26:43 -0800
Ben Greear <greearb@candelatech.com> wrote:

> Stephen Hemminger wrote:
> 
> > Subject: [PATCH] Avoid race between network down and sysfs
> > 
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > --- a/net/core/net-sysfs.c	2009-02-26 08:36:18.000000000 -0800
> > +++ b/net/core/net-sysfs.c	2009-02-26 08:37:51.000000000 -0800
> > @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic
> >  	if (endp == buf)
> >  		goto err;
> >  
> > -	rtnl_lock();
> > +	if (!rtnl_trylock())
> > +		return -ERESTARTSYS;
> > +
> >  	if (dev_isalive(net)) {
> >  		if ((ret = (*set)(net, new)) == 0)
> >  			ret = len;
> 
> I can test this to see if it fixes my problem.  Are the above lines the
> entirety of the patch?

yes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-27 18:38                   ` Stephen Hemminger
@ 2009-03-02 11:07                     ` Patrick McHardy
  2009-03-02 11:21                       ` Patrick McHardy
  2009-03-02 22:11                     ` Ben Greear
  1 sibling, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2009-03-02 11:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ben Greear, Eric W. Biederman, Herbert Xu, Linux Netdev List,
	David S. Miller

Stephen Hemminger wrote:
> On Fri, 27 Feb 2009 10:26:43 -0800
> Ben Greear <greearb@candelatech.com> wrote:
> 
>> Stephen Hemminger wrote:
>>
>>> Subject: [PATCH] Avoid race between network down and sysfs
>>>
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>>
>>> --- a/net/core/net-sysfs.c	2009-02-26 08:36:18.000000000 -0800
>>> +++ b/net/core/net-sysfs.c	2009-02-26 08:37:51.000000000 -0800
>>> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic
>>>  	if (endp == buf)
>>>  		goto err;
>>>  
>>> -	rtnl_lock();
>>> +	if (!rtnl_trylock())
>>> +		return -ERESTARTSYS;
>>> +
>>>  	if (dev_isalive(net)) {
>>>  		if ((ret = (*set)(net, new)) == 0)
>>>  			ret = len;
>> I can test this to see if it fixes my problem.  Are the above lines the
>> entirety of the patch?
> 
> yes

We should be able to avoid the restart looping in most cases,
we only need to do this while unregistration is in progress.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-03-02 11:07                     ` Patrick McHardy
@ 2009-03-02 11:21                       ` Patrick McHardy
  0 siblings, 0 replies; 25+ messages in thread
From: Patrick McHardy @ 2009-03-02 11:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ben Greear, Eric W. Biederman, Herbert Xu, Linux Netdev List,
	David S. Miller

Patrick McHardy wrote:
> Stephen Hemminger wrote:
>>>> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic
>>>>      if (endp == buf)
>>>>          goto err;
>>>>  
>>>> -    rtnl_lock();
>>>> +    if (!rtnl_trylock())
>>>> +        return -ERESTARTSYS;
>>>> +
>>>>      if (dev_isalive(net)) {
>>>>          if ((ret = (*set)(net, new)) == 0)
>>>>              ret = len;
>>> I can test this to see if it fixes my problem.  Are the above lines the
>>> entirety of the patch?
>>
>> yes
> 
> We should be able to avoid the restart looping in most cases,
> we only need to do this while unregistration is in progress.

It doesn't seem to work. The idea was something like this:

static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, 
int old)
{
	struct inet6_dev *idev;
	struct net *net;

	net = (struct net *)table->extra2;
	if (p == &net->ipv6.devconf_dflt->forwarding)
		return 0;

	/* Unregistration in progress */
	idev = table->extra1;
	if (idev->cnf.sysctl == NULL)
		return -ERESTARTSYS;

	rtnl_lock();
...

but the process might have entered this function while the rtnl
is already held, but the unregistration hasn't been triggered yet.
So it would still deadlock.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-02-27 18:38                   ` Stephen Hemminger
  2009-03-02 11:07                     ` Patrick McHardy
@ 2009-03-02 22:11                     ` Ben Greear
  2009-03-02 22:20                       ` Patrick McHardy
  1 sibling, 1 reply; 25+ messages in thread
From: Ben Greear @ 2009-03-02 22:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric W. Biederman, Herbert Xu, Patrick McHardy,
	Linux Netdev List, David S. Miller

Stephen Hemminger wrote:
> On Fri, 27 Feb 2009 10:26:43 -0800
> Ben Greear <greearb@candelatech.com> wrote:
> 
>> Stephen Hemminger wrote:
>>
>>> Subject: [PATCH] Avoid race between network down and sysfs
>>>
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>>
>>> --- a/net/core/net-sysfs.c	2009-02-26 08:36:18.000000000 -0800
>>> +++ b/net/core/net-sysfs.c	2009-02-26 08:37:51.000000000 -0800
>>> @@ -77,7 +77,9 @@ static ssize_t netdev_store(struct devic
>>>  	if (endp == buf)
>>>  		goto err;
>>>  
>>> -	rtnl_lock();
>>> +	if (!rtnl_trylock())
>>> +		return -ERESTARTSYS;
>>> +
>>>  	if (dev_isalive(net)) {
>>>  		if ((ret = (*set)(net, new)) == 0)
>>>  			ret = len;
>> I can test this to see if it fixes my problem.  Are the above lines the
>> entirety of the patch?
> 
> yes

With both of Stephen's patches included in the latest -rc6 source,
I re-ran the test and it seems to be working (I added printks so
that I would know the new code was being exercised)

I had 2000 or so mac-vlans configured, with 10 of them being
re-configured concurrently, while also deleting groups of 20-100
mac-vlans in my test.  This was locking up reliably before,
and now it seems to be working fine.

Here's the kernel log showing the ERESTARTSYS in action.  I don't
have an easy way to check to see if the VFS (or whatever) retried
the call properly, but will let you all know if I see any indication
that isn't working.

I only saw the ipv6 fixup in my logs, but maybe my test case just
doesn't hit the other...

Mar  2 13:37:41 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:37:41 simech-1 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar  2 13:37:42 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:38:13 simech-1 kernel:last message repeated 42 times
Mar  2 13:39:20 simech-1 kernel:last message repeated 71 times
Mar  2 13:40:13 simech-1 kernel:last message repeated 68 times
Mar  2 13:40:13 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
Mar  2 13:40:13 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:40:21 simech-1 kernel:last message repeated 11 times
Mar  2 13:40:21 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
Mar  2 13:40:21 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:40:28 simech-1 kernel:last message repeated 16 times
Mar  2 13:40:28 simech-1 kernel: __ratelimit: 3352 callbacks suppressed
Mar  2 13:40:28 simech-1 kernel: Neighbour table overflow.
Mar  2 13:40:28 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:40:34 simech-1 kernel:last message repeated 8 times
Mar  2 13:40:34 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 20
Mar  2 13:40:34 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:40:54 simech-1 kernel:last message repeated 42 times
Mar  2 13:40:54 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7
Mar  2 13:40:54 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:01 simech-1 kernel:last message repeated 13 times
Mar  2 13:41:01 simech-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
Mar  2 13:41:01 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:14 simech-1 kernel:last message repeated 26 times
Mar  2 13:41:14 simech-1 dhclient: No DHCPOFFERS received.
Mar  2 13:41:14 simech-1 dhclient: No working leases in persistent database - sleeping.
Mar  2 13:41:15 simech-1 dhclient: receive_packet failed on eth1: Network is down
Mar  2 13:41:16 simech-1 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar  2 13:41:17 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:24 simech-1 kernel:last message repeated 16 times
Mar  2 13:41:24 simech-1 kernel: Neighbour table overflow.
Mar  2 13:41:25 simech-1 kernel:last message repeated 8 times
Mar  2 13:41:25 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:30 simech-1 kernel:last message repeated 11 times
Mar  2 13:41:30 simech-1 kernel: __ratelimit: 7059 callbacks suppressed
Mar  2 13:41:30 simech-1 kernel: Neighbour table overflow.
Mar  2 13:41:30 simech-1 kernel:last message repeated 9 times
Mar  2 13:41:30 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:35 simech-1 kernel:last message repeated 10 times
Mar  2 13:41:35 simech-1 kernel: __ratelimit: 7973 callbacks suppressed
Mar  2 13:41:35 simech-1 kernel: Neighbour table overflow.
Mar  2 13:41:36 simech-1 kernel:last message repeated 9 times
Mar  2 13:41:36 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:40 simech-1 kernel:last message repeated 7 times
Mar  2 13:41:40 simech-1 kernel: __ratelimit: 10890 callbacks suppressed
Mar  2 13:41:40 simech-1 kernel: Neighbour table overflow.
Mar  2 13:41:40 simech-1 kernel:last message repeated 9 times
Mar  2 13:41:40 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:46 simech-1 kernel:last message repeated 13 times
Mar  2 13:41:46 simech-1 kernel: __ratelimit: 6571 callbacks suppressed
Mar  2 13:41:46 simech-1 kernel: Neighbour table overflow.
Mar  2 13:41:46 simech-1 kernel:last message repeated 9 times
Mar  2 13:41:46 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:51 simech-1 kernel:last message repeated 7 times
Mar  2 13:41:51 simech-1 kernel: __ratelimit: 15491 callbacks suppressed
Mar  2 13:41:51 simech-1 kernel: Neighbour table overflow.
Mar  2 13:41:51 simech-1 kernel:last message repeated 9 times
Mar  2 13:41:51 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:41:56 simech-1 kernel:last message repeated 9 times
Mar  2 13:41:56 simech-1 kernel: __ratelimit: 9658 callbacks suppressed
Mar  2 13:41:56 simech-1 kernel: Neighbour table overflow.
Mar  2 13:41:56 simech-1 kernel:last message repeated 9 times
Mar  2 13:41:56 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:42:01 simech-1 kernel:last message repeated 10 times
Mar  2 13:42:01 simech-1 kernel: __ratelimit: 12678 callbacks suppressed
Mar  2 13:42:01 simech-1 kernel: Neighbour table overflow.
Mar  2 13:42:01 simech-1 kernel:last message repeated 9 times
Mar  2 13:42:01 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:42:06 simech-1 kernel:last message repeated 7 times
Mar  2 13:42:06 simech-1 kernel: __ratelimit: 13401 callbacks suppressed
Mar  2 13:42:06 simech-1 kernel: Neighbour table overflow.
Mar  2 13:42:06 simech-1 kernel:last message repeated 9 times
Mar  2 13:42:06 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:42:11 simech-1 kernel:last message repeated 10 times
Mar  2 13:42:11 simech-1 kernel: __ratelimit: 11875 callbacks suppressed
Mar  2 13:42:11 simech-1 kernel: Neighbour table overflow.
Mar  2 13:42:11 simech-1 kernel:last message repeated 9 times
Mar  2 13:42:11 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.
Mar  2 13:42:16 simech-1 kernel:last message repeated 9 times
Mar  2 13:42:16 simech-1 kernel: __ratelimit: 9362 callbacks suppressed
Mar  2 13:42:16 simech-1 kernel: Neighbour table overflow.
Mar  2 13:42:17 simech-1 kernel:last message repeated 9 times
Mar  2 13:42:17 simech-1 kernel: Returning ERESTARTSYS in ipv6-addrconf-fixup-forwarding to avoid deadlock.


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-03-02 22:11                     ` Ben Greear
@ 2009-03-02 22:20                       ` Patrick McHardy
  2009-03-02 22:47                         ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2009-03-02 22:20 UTC (permalink / raw)
  To: Ben Greear
  Cc: Stephen Hemminger, Eric W. Biederman, Herbert Xu,
	Linux Netdev List, David S. Miller

Ben Greear wrote:
> With both of Stephen's patches included in the latest -rc6 source,
> I re-ran the test and it seems to be working (I added printks so
> that I would know the new code was being exercise
> 
> I had 2000 or so mac-vlans configured, with 10 of them being
> re-configured concurrently, while also deleting groups of 20-100
> mac-vlans in my test.  This was locking up reliably before,
> and now it seems to be working fine.
> 
> Here's the kernel log showing the ERESTARTSYS in action.  I don't
> have an easy way to check to see if the VFS (or whatever) retried
> the call properly, but will let you all know if I see any indication
> that isn't working.
> 
> I only saw the ipv6 fixup in my logs, but maybe my test case just
> doesn't hit the other...

This looks like its working fine. Despite the non-desirable active
spinning, this seems like the best fix (actually much simpler than
I expected to be possible) at this time. If we just could avoid
the spinning when unnecessary, it would be perfect :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-03-02 22:20                       ` Patrick McHardy
@ 2009-03-02 22:47                         ` David Miller
  2009-03-02 23:03                           ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2009-03-02 22:47 UTC (permalink / raw)
  To: kaber; +Cc: greearb, shemminger, ebiederm, herbert, netdev

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 02 Mar 2009 23:20:49 +0100

> Ben Greear wrote:
> > With both of Stephen's patches included in the latest -rc6 source,
> > I re-ran the test and it seems to be working (I added printks so
> > that I would know the new code was being exercise
> > I had 2000 or so mac-vlans configured, with 10 of them being
> > re-configured concurrently, while also deleting groups of 20-100
> > mac-vlans in my test.  This was locking up reliably before,
> > and now it seems to be working fine.
> > Here's the kernel log showing the ERESTARTSYS in action.  I don't
> > have an easy way to check to see if the VFS (or whatever) retried
> > the call properly, but will let you all know if I see any indication
> > that isn't working.
> > I only saw the ipv6 fixup in my logs, but maybe my test case just
> > doesn't hit the other...
> 
> This looks like its working fine. Despite the non-desirable active
> spinning, this seems like the best fix (actually much simpler than
> I expected to be possible) at this time. If we just could avoid
> the spinning when unnecessary, it would be perfect :)

Could you give that "not actually in-progress" detection a shot?

I don't like the spinning either.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-03-02 22:47                         ` David Miller
@ 2009-03-02 23:03                           ` Patrick McHardy
  2009-03-03  8:48                             ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2009-03-02 23:03 UTC (permalink / raw)
  To: David Miller; +Cc: greearb, shemminger, ebiederm, herbert, netdev

David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Mon, 02 Mar 2009 23:20:49 +0100
> 
>> This looks like its working fine. Despite the non-desirable active
>> spinning, this seems like the best fix (actually much simpler than
>> I expected to be possible) at this time. If we just could avoid
>> the spinning when unnecessary, it would be perfect :)
> 
> Could you give that "not actually in-progress" detection a shot?
> 
> I don't like the spinning either.


I tried this morning, the problem is that its always the sysctl
handler which will run into the deadlock first, but there is no
reliable indication to avoid it other than that the RTNL is
already held. The problem is that the sysctl interface puts the
process holding the RTNL to sleep and allows a process requiring
it to run. Any different synchronization attempt will have the
same problem, it seems you simply can't hold any locks while
unregistering sysctls.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-03-02 23:03                           ` Patrick McHardy
@ 2009-03-03  8:48                             ` David Miller
  2009-03-08  3:36                               ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2009-03-03  8:48 UTC (permalink / raw)
  To: kaber; +Cc: greearb, shemminger, ebiederm, herbert, netdev

From: Patrick McHardy <kaber@trash.net>
Date: Tue, 03 Mar 2009 00:03:00 +0100

> David Miller wrote:
> > From: Patrick McHardy <kaber@trash.net>
> > Date: Mon, 02 Mar 2009 23:20:49 +0100
> > 
> >> This looks like its working fine. Despite the non-desirable active
> >> spinning, this seems like the best fix (actually much simpler than
> >> I expected to be possible) at this time. If we just could avoid
> >> the spinning when unnecessary, it would be perfect :)
> > Could you give that "not actually in-progress" detection a shot?
> > I don't like the spinning either.
> 
> I tried this morning, the problem is that its always the sysctl
> handler which will run into the deadlock first, but there is no
> reliable indication to avoid it other than that the RTNL is
> already held. The problem is that the sysctl interface puts the
> process holding the RTNL to sleep and allows a process requiring
> it to run. Any different synchronization attempt will have the
> same problem, it seems you simply can't hold any locks while
> unregistering sysctls.

Ok, I applied Stephen's patches then...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: IPv4/IPv6 sysctl unregistration deadlock
  2009-03-03  8:48                             ` David Miller
@ 2009-03-08  3:36                               ` Eric W. Biederman
  0 siblings, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2009-03-08  3:36 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, greearb, shemminger, herbert, netdev

David Miller <davem@davemloft.net> writes:

> From: Patrick McHardy <kaber@trash.net>
> Date: Tue, 03 Mar 2009 00:03:00 +0100
>
>> David Miller wrote:
>> > From: Patrick McHardy <kaber@trash.net>
>> > Date: Mon, 02 Mar 2009 23:20:49 +0100
>> > 
>> >> This looks like its working fine. Despite the non-desirable active
>> >> spinning, this seems like the best fix (actually much simpler than
>> >> I expected to be possible) at this time. If we just could avoid
>> >> the spinning when unnecessary, it would be perfect :)
>> > Could you give that "not actually in-progress" detection a shot?
>> > I don't like the spinning either.
>> 
>> I tried this morning, the problem is that its always the sysctl
>> handler which will run into the deadlock first, but there is no
>> reliable indication to avoid it other than that the RTNL is
>> already held. The problem is that the sysctl interface puts the
>> process holding the RTNL to sleep and allows a process requiring
>> it to run. Any different synchronization attempt will have the
>> same problem, it seems you simply can't hold any locks while
>> unregistering sysctls.
>
> Ok, I applied Stephen's patches then...

>From the further brainstorming department...

It appears we are using the wait for the completion
for two distinct purposes.

Waiting until it is safe to free storage.

Blocking module unregistration until the all of the
users of the code are gone.

I'm wondering if we could move the memory freeing.

And consolidate the waiting ultimately moving the
wait for completion into netdev_run_todo after
we drop the lock.

The reason I care is that it looks like to get hotplug
handling sane, I'm going to need to implement a generic
version of what sysfs, proc and sysctl are doing, and I
expect that generic version belongs in the vfs ultimately
giving us the capability to implement revoke.

So I am going to be revisiting how the code works, and if
I can come up with a cleaner solution to the networking
stack it would be a good time to implement it.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2009-03-08  3:37 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-25  5:23 IPv4/IPv6 sysctl unregistration deadlock Patrick McHardy
2009-02-25  6:19 ` Herbert Xu
2009-02-25  6:23   ` Patrick McHardy
2009-02-25  7:18     ` Patrick McHardy
2009-02-25  8:43       ` Herbert Xu
2009-02-26  6:06         ` Eric W. Biederman
2009-02-26  6:10         ` Eric W. Biederman
2009-02-26  6:22           ` Herbert Xu
2009-02-26  7:18             ` Eric W. Biederman
2009-02-26 16:49               ` Stephen Hemminger
2009-02-26 19:01                 ` Eric W. Biederman
2009-02-26 20:24                   ` Stephen Hemminger
2009-02-27  0:59                 ` Herbert Xu
2009-02-27  1:25                   ` Stephen Hemminger
2009-02-27 18:26                 ` Ben Greear
2009-02-27 18:38                   ` Stephen Hemminger
2009-03-02 11:07                     ` Patrick McHardy
2009-03-02 11:21                       ` Patrick McHardy
2009-03-02 22:11                     ` Ben Greear
2009-03-02 22:20                       ` Patrick McHardy
2009-03-02 22:47                         ` David Miller
2009-03-02 23:03                           ` Patrick McHardy
2009-03-03  8:48                             ` David Miller
2009-03-08  3:36                               ` Eric W. Biederman
2009-02-26 16:55       ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).