All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/1] net: race condition when removing virtual net_device
       [not found] <1379008796-2121-1-git-send-email-fruggeri@aristanetworks.com>
@ 2013-09-12 20:06 ` Eric W. Biederman
  2013-09-12 21:48   ` Francesco Ruggeri
  2013-09-14  0:16   ` [PATCH 1/1] net: race condition when removing virtual net_device David Miller
  0 siblings, 2 replies; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-12 20:06 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

Francesco Ruggeri <fruggeri@aristanetworks.com> writes:

> Resending.

To summarize:

In netdev_run_todo, netdev_wait_allrefs, call_netdevice_notifiers you
have observed a situation where dev_net(dev) was an invalid pointer.

My first impression is that we probably just want to remove the repeated
call to call_netdevice_notifiers, that happens after 1 second of
waiting.

Your suggested patch below doesn't have a prayer of working, as it only
decreases the device reference count after the loop to wait for the
reference count to go to zero.

Your patch modified to grab a count on the network namespace the device
references and not the device itself might make sense, but that runs the
risk of incrementing the network namespace counts after the network
namespace is down.

Simply not rerunning call_netdevice_notifiers seems like the proper
approach to fix this.

So I think the fix will probably just be:

David, Eric, do any of you know why we have this netdevice notfier
rebroadcast?

diff --git a/net/core/dev.c b/net/core/dev.c
index a3d8d44..e6bd828 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5556,42 +5556,15 @@ EXPORT_SYMBOL(netdev_refcnt_read);
  */
 static void netdev_wait_allrefs(struct net_device *dev)
 {
-       unsigned long rebroadcast_time, warning_time;
+       unsigned long warning_time;
        int refcnt;
 
        linkwatch_forget_dev(dev);
 
-       rebroadcast_time = warning_time = jiffies;
+       warning_time = jiffies;
        refcnt = netdev_refcnt_read(dev);
 
        while (refcnt != 0) {
-               if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
-                       rtnl_lock();
-
-                       /* Rebroadcast unregister notification */
-                       call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
-
-                       __rtnl_unlock();
-                       rcu_barrier();
-                       rtnl_lock();
-
-                       call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
-                       if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
-                                    &dev->state)) {
-                               /* We must not have linkwatch events
-                                * pending on unregister. If this
-                                * happens, we simply run the queue
-                                * unscheduled, resulting in a noop
-                                * for this device.
-                                */
-                               linkwatch_run_queue();
-                       }
-
-                       __rtnl_unlock();
-
-                       rebroadcast_time = jiffies;
-               }
-
                msleep(250);
 
                refcnt = netdev_refcnt_read(dev);

> There is a race condition when removing a net_device while its net namespace
> is also being removed.
> This can result in a crash or other unpredictable behavior.
> This is a sample scenario with veth, but the same applies to other virtual
> devices such as vlan and macvlan.
> veth pair v0-v1 is created, with v0 in namespace ns0 and v1 in ns1.
> Process p0 deletes v0. v1 is also unregistered (in veth_dellink), so when p0
> gets to netdev_run_todo both v0 and v1 are in net_todo_list, and they have both
> been unlisted from their respective namespaces. If all references to v1 have not
> already been dropped then netdev_run_todo/netdev_wait_allrefs will call netdev
> notifiers for v1, releasing the rtnl lock between calls.
> Now process p1 removes namespace ns1. v1 will not prevent this from happening
> (in default_device_exit_batch) since it was already unlisted by p0.
> Next time p0 invokes the notifiers for v1 any notifiers that use dev_net(v1)
> will get a pointer to a namespace that has been or is being removed.
> Similar scenarios apply with v1 as a vlan or macvlan interface and v0 as its
> real device.
> We hit this problem in 3.4 with sequence fib_netdev_event, fib_disable_ip,
> rt_cache_flush, rt_cache_invalidate, inetpeer_invalidate_tree. That sequence no
> longer applies in later kernels, but unless we can guarantee that no
> NETDEV_UNREGISTER or NETDEV_UNREGISTER_FINAL handler accesses a net_device's
> dev_net(dev) then there is a vulnerability (this happens for example with
> NETDEV_UNREGISTER_FINAL in dst_dev_event/dst_ifdown).
> Commit 0115e8e3 later made things better by reducing the chances of this
> happening, but the underlying problem still seems to be there.
> I would like to get some feedback on this patch.
> The idea is to take a reference to the loopback_dev of a net_device's
> namespace (which is always the last net_device to be removed when a namespace
> is destroyed) when the net_device is unlisted, and release it when the
> net_device is disposed of.
> To avoid deadlocks in cleanup_net all loopback_devs are queued at the end
> of net_todo_list.
>
> Tested on Linux 3.4.
>
> Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
> ---
>  net/core/dev.c |   30 +++++++++++++++++++++++++++++-
>  1 files changed, 29 insertions(+), 1 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 26755dd..da2fd78 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -225,10 +225,14 @@ static void list_netdevice(struct net_device *dev)
>  }
>  
>  /* Device list removal
> + * Take a reference to dev_net(dev)->loopback_dev, so dev_net(dev)
> + * will not be freed until we are done with dev.
>   * caller must respect a RCU grace period before freeing/reusing dev
>   */
>  static void unlist_netdevice(struct net_device *dev)
>  {
> +	struct net_device *loopback_dev = dev_net(dev)->loopback_dev;
> +
>  	ASSERT_RTNL();
>  
>  	/* Unlink dev from the device chain */
> @@ -238,9 +242,23 @@ static void unlist_netdevice(struct net_device *dev)
>  	hlist_del_rcu(&dev->index_hlist);
>  	write_unlock_bh(&dev_base_lock);
>  
> +	if (dev != loopback_dev)
> +		dev_hold(loopback_dev);
> +
>  	dev_base_seq_inc(dev_net(dev));
>  }
>  
> +/**
> + * Called when a net_device that has been previously unlisted from a net
> + * namespace is disposed of.
> + */
> +static inline void unlist_netdevice_done(struct net_device *dev)
> +{
> +	struct net_device *loopback_dev = dev_net(dev)->loopback_dev;
> +	if (dev != loopback_dev)
> +		dev_put(loopback_dev);
> +}
> +
>  /*
>   *	Our notifier list
>   */
> @@ -5009,10 +5027,17 @@ static int dev_new_index(struct net *net)
>  
>  /* Delayed registration/unregisteration */
>  static LIST_HEAD(net_todo_list);
> +static struct list_head *first_loopback_dev = &net_todo_list;
>  
>  static void net_set_todo(struct net_device *dev)
>  {
> -	list_add_tail(&dev->todo_list, &net_todo_list);
> +	/* All loopback_devs go to end of net_todo_list. */
> +	if (dev == dev_net(dev)->loopback_dev) {
> +		list_add_tail(&dev->todo_list, &net_todo_list);
> +		if (first_loopback_dev == &net_todo_list)
> +			first_loopback_dev = &dev->todo_list;
> +	} else
> +		list_add_tail(&dev->todo_list, first_loopback_dev);
>  }
>  
>  static void rollback_registered_many(struct list_head *head)
> @@ -5641,6 +5666,7 @@ void netdev_run_todo(void)
>  
>  	/* Snapshot list, allow later requests */
>  	list_replace_init(&net_todo_list, &list);
> +	first_loopback_dev = &net_todo_list;
>  
>  	__rtnl_unlock();
>  
> @@ -5670,6 +5696,7 @@ void netdev_run_todo(void)
>  		on_each_cpu(flush_backlog, dev, 1);
>  
>  		netdev_wait_allrefs(dev);
> +		unlist_netdevice_done(dev);
>  
>  		/* paranoia */
>  		BUG_ON(netdev_refcnt_read(dev));
> @@ -6076,6 +6103,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
>  	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
>  
>  	/* Actually switch the network namespace */
> +	unlist_netdevice_done(dev);
>  	dev_net_set(dev, net);
>  
>  	/* If there is an ifindex conflict assign a new one */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-12 20:06 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
@ 2013-09-12 21:48   ` Francesco Ruggeri
  2013-09-12 22:02     ` Francesco Ruggeri
  2013-09-13  5:50     ` Eric W. Biederman
  2013-09-14  0:16   ` [PATCH 1/1] net: race condition when removing virtual net_device David Miller
  1 sibling, 2 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-12 21:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Thu, Sep 12, 2013 at 1:06 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
>
>> Resending.
>
> To summarize:
>
> In netdev_run_todo, netdev_wait_allrefs, call_netdevice_notifiers you
> have observed a situation where dev_net(dev) was an invalid pointer.
>

Correct.


> My first impression is that we probably just want to remove the repeated
> call to call_netdevice_notifiers, that happens after 1 second of
> waiting.
>
> Your suggested patch below doesn't have a prayer of working, as it only
> decreases the device reference count after the loop to wait for the
> reference count to go to zero.
>

Actually it did work. In 3.4 it avoided the crash in
inetpeer_invalidate_tree and it did not cause any visible adverse
effects after running it several days on multiple servers that create
and destroy namespaces on a regular basis.
The extra refcount that is taken and released is on the loopback_dev
of the namespace that the net_device is in. The assumption is that
loopback_dev is always the last net_device to be destroyed in a
namespace. So each net_device in a namespace (except for the
loopback_dev itself) takes such reference when it is unlisted, and it
releases it after it is destroyed (after netdev_wait_allrefs).
The patch also makes sure that any loopback_devs are at the tail of
net_todo_list within every namespace in every instance of
netdev_run_todo. This guarantees that no deadlocks are introduced,
since the relative order of net_devices within each namespace is
preserved, and only non-loopback_devs take and release the extra
reference.

> Your patch modified to grab a count on the network namespace the device
> references and not the device itself might make sense, but that runs the
> risk of incrementing the network namespace counts after the network
> namespace is down.

Right.

>
> Simply not rerunning call_netdevice_notifiers seems like the proper
> approach to fix this.
>

That would be great. There would still be one scenario to take care of though:

- veth interfaces v0 and v1 are in namespaces ns0 and ns1.
- process p0 unregisters v0, which also causes v1 to be unregistered.
When p0 enters netdev_run_todo both v0 and v1 are in net_todo_list and
have been unlisted from their namespaces.
- then in p0's netdev_run_todo:

void netdev_run_todo(void)
{
        struct list_head list;

        /* Snapshot list, allow later requests */
        list_replace_init(&net_todo_list, &list);

        __rtnl_unlock();


        /* Wait for rcu callbacks to finish before next phase */
        if (!list_empty(&list))
                rcu_barrier();

********* Assume ns1 is destroyed by another process here ************

        while (!list_empty(&list)) {
                struct net_device *dev
                        = list_first_entry(&list, struct net_device, todo_list);
                list_del(&dev->todo_list);

                rtnl_lock();
                call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);

********* dst_dev_event/dst_ifdown is invoked here, and it tries to access
            dev_net(dev)->loopback_dev.
            In case of v1 this would be an invalid pointer. *****************

                __rtnl_unlock();

Similar scenarios apply if v1 is a vlan or macvlan interface, and v0
is its real device.

Francesco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-12 21:48   ` Francesco Ruggeri
@ 2013-09-12 22:02     ` Francesco Ruggeri
  2013-09-13  5:50     ` Eric W. Biederman
  1 sibling, 0 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-12 22:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Thu, Sep 12, 2013 at 2:48 PM, Francesco Ruggeri
<fruggeri@aristanetworks.com> wrote:
> On Thu, Sep 12, 2013 at 1:06 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
...
> netdev_run_todo. This guarantees that no deadlocks are introduced,
> since the relative order of net_devices within each namespace is
> preserved, and only non-loopback_devs take and release the extra
> reference.

I should have said "since in net_todo_list the relative order of
net_devices within each namespace is the same as before, ...".

Sorry for any confusion.

Francesco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-12 21:48   ` Francesco Ruggeri
  2013-09-12 22:02     ` Francesco Ruggeri
@ 2013-09-13  5:50     ` Eric W. Biederman
  2013-09-13 17:54       ` Francesco Ruggeri
  1 sibling, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-13  5:50 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

Francesco Ruggeri <fruggeri@aristanetworks.com> writes:

> That would be great. There would still be one scenario to take care of though:
>
> - veth interfaces v0 and v1 are in namespaces ns0 and ns1.
> - process p0 unregisters v0, which also causes v1 to be unregistered.
> When p0 enters netdev_run_todo both v0 and v1 are in net_todo_list and
> have been unlisted from their namespaces.
> - then in p0's netdev_run_todo:

So I looked at this a little more and this problem appears largely
specific to veth.  In the normal case the caller of dellink has to hold
a reference to the network namespace to find the device to delete.

So I think the solution is just to warp the interface of the second
device into the network namespace of the device we are actually
deleting.

I will buy that similar situations can happen with other virtual devices
that have one foot in two network namespaces, and I expect the same
solution will apply.

So the patch below looks like the solution.  If there is more than one
device that needs this treatment perhaps the code should be moved
into a helper function rather than expanded inline.

Does this look like it will fix your issue?

Eric


diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index da86652..5922066 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -423,6 +423,19 @@ static void veth_dellink(struct net_device *dev, struct list_head *head)
        unregister_netdevice_queue(dev, head);
 
        if (peer) {
+               struct net *net = dev_net(dev);
+               if (dev_net(peer) != net) {
+                       /* Move the peer to the same net to avoid teardown races */
+                       char peer_name[IFNAMSIZ];
+                       int err;
+                       snprintf(fb_name, IFNAMSIZ, "dev%d", peer->ifindex);
+                       err = dev_change_net_namespace(peer, net, peer_name);
+                       if (err) {
+                               pr_emerg("%s: failed to move %s to peers net: %d\n",
+                                        __func__, peer->name, err);
+                               BUG();
+                       }
+               }
                priv = netdev_priv(peer);
                RCU_INIT_POINTER(priv->peer, NULL);
                unregister_netdevice_queue(peer, head);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-13  5:50     ` Eric W. Biederman
@ 2013-09-13 17:54       ` Francesco Ruggeri
  2013-09-14  1:46         ` Eric W. Biederman
  0 siblings, 1 reply; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-13 17:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Thu, Sep 12, 2013 at 10:50 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
>
>> That would be great. There would still be one scenario to take care of though:
>>
>> - veth interfaces v0 and v1 are in namespaces ns0 and ns1.
>> - process p0 unregisters v0, which also causes v1 to be unregistered.
>> When p0 enters netdev_run_todo both v0 and v1 are in net_todo_list and
>> have been unlisted from their namespaces.
>> - then in p0's netdev_run_todo:
>
> So I looked at this a little more and this problem appears largely
> specific to veth.  In the normal case the caller of dellink has to hold
> a reference to the network namespace to find the device to delete.
>
> So I think the solution is just to warp the interface of the second
> device into the network namespace of the device we are actually
> deleting.
>
> I will buy that similar situations can happen with other virtual devices
> that have one foot in two network namespaces, and I expect the same
> solution will apply.
>
> So the patch below looks like the solution.  If there is more than one
> device that needs this treatment perhaps the code should be moved
> into a helper function rather than expanded inline.
>
> Does this look like it will fix your issue?
>
> Eric
>

To summarize your proposal:
1) Remove re-broadcasting of NETDEV_UNREGISTER and
NETDEV_UNREGISTER_FINAL from netdev_wait_allrefs.
2) If unregistering a net_device causes unregistering of other virtual
devices (eg veth, macvlan, vlan) then move the virtual devices to the
namespace of the original net_device.

I do have some concerns about both correctness and feasibility of this approach.

About 2), namespace dependent operations triggered by unregistering
the virtual devices (eg rt_flush_dev, dst_dev_event/dst_ifdown and
probably more) would not be done in the namespaces where they should.

About the feasibility of 1), consider just as an example dst_dev_event
in NETDEV_UNREGISTER_FINAL. dst_dev_event will not process dst->dev
unless the dst_entry is already in dst_garbage.list, ie unless it has
already been dst_free'd or dst_release'd. But since destroying
resources is often delayed to workqueues or to one of several garbage
collection lists, can we guarantee that one broadcast of
NETDEV_UNREGISTER_FINAL is always enough? There may be similar cases
with NETDEV_UNREGISTER.

I do urge you to take a second look at the approach that I proposed.
All it does is make sure that a namespace loopback_dev is not removed
until any devices unlisted from that namespace are also disposed of.
That in turn prevents non-device pernet subsystems from exiting that
namespace in ops_exit_list.
Logically it is enforcing the right behavior (namely non-device pernet
subsystems should not exit a namespace until all devices in that
namespace - listed or unlisted - have been properly disposed of) and
it correctly supports existing code, such as rt_flush_dev and
dst_ifdown, which relies on the correct loopback_dev to be there.

Francesco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-12 20:06 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
  2013-09-12 21:48   ` Francesco Ruggeri
@ 2013-09-14  0:16   ` David Miller
  2013-09-14  1:32     ` Eric W. Biederman
  1 sibling, 1 reply; 36+ messages in thread
From: David Miller @ 2013-09-14  0:16 UTC (permalink / raw)
  To: ebiederm; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 12 Sep 2013 13:06:53 -0700

> David, Eric, do any of you know why we have this netdevice notfier
> rebroadcast?

It's been there as long as I can remember, but I don't remember
exactly why we do that.

If I had to guess, it's to handle something that has to retry later
due to a "spin_trylock()" or similar type of situation.

Probably best to not remove it. :)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-14  0:16   ` [PATCH 1/1] net: race condition when removing virtual net_device David Miller
@ 2013-09-14  1:32     ` Eric W. Biederman
  0 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-14  1:32 UTC (permalink / raw)
  To: David Miller; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Thu, 12 Sep 2013 13:06:53 -0700
>
>> David, Eric, do any of you know why we have this netdevice notfier
>> rebroadcast?
>
> It's been there as long as I can remember, but I don't remember
> exactly why we do that.
>
> If I had to guess, it's to handle something that has to retry later
> due to a "spin_trylock()" or similar type of situation.
>
> Probably best to not remove it. :)

Since it doesn't completely solve the problem, it isn't worth it to
solve this problem.  Sigh

I have added this bit of code and the moving of references to the
loopback device in dst_ifdown to my hitlist of things to sort out some
day when I have plenty time.

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-13 17:54       ` Francesco Ruggeri
@ 2013-09-14  1:46         ` Eric W. Biederman
  2013-09-16  2:54           ` Francesco Ruggeri
  0 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-14  1:46 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

Francesco Ruggeri <fruggeri@aristanetworks.com> writes:

> On Thu, Sep 12, 2013 at 10:50 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:

> To summarize your proposal:
> 1) Remove re-broadcasting of NETDEV_UNREGISTER and
> NETDEV_UNREGISTER_FINAL from netdev_wait_allrefs.

Forget that it isn't needed.  It would be nice but since it doesn't
solve the entire problem let's not go there.

> 2) If unregistering a net_device causes unregistering of other virtual
> devices (eg veth, macvlan, vlan) then move the virtual devices to the
> namespace of the original net_device.
>
> I do have some concerns about both correctness and feasibility of this approach.
>
> About 2), namespace dependent operations triggered by unregistering
> the virtual devices (eg rt_flush_dev, dst_dev_event/dst_ifdown and
> probably more) would not be done in the namespaces where they should.

Yes they will.  That is what dev_change_net_namespace does.
dev_change_net_namespace would not be correct if it did not do that.

Now dev_change_net_namespace is comparitively expensive so it may not be the
best approach but it is an approach that will work.

> I do urge you to take a second look at the approach that I proposed.
> All it does is make sure that a namespace loopback_dev is not removed
> until any devices unlisted from that namespace are also disposed of.
> That in turn prevents non-device pernet subsystems from exiting that
> namespace in ops_exit_list.

It was worth a second look.  I can not find anything wrong with your
patch but I can not convince myself that it is correct either.  The
moving around the loopback device in the net dev todo list to prevent
deadlock I can't imagine why you are doing that.

I think I would prefer something more explicit and less likely to break.
Perhaps something that keeps a network namespace from exiting while we
delete devices.  Using the loopback device like that looks fragile.
However it is very true that we require the loopback device to stay
around until all of the dst_ifdown calls in a network namespace are
made.

At least my original concern that you could be incrementing the count on
the loop back device when it had already reached zero can't be the case.

It would be very nice to have something that I could think through
easily and was robust to change.

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-14  1:46         ` Eric W. Biederman
@ 2013-09-16  2:54           ` Francesco Ruggeri
  2013-09-16 10:45             ` Eric W. Biederman
  0 siblings, 1 reply; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-16  2:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Fri, Sep 13, 2013 at 6:46 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
>
>> On Thu, Sep 12, 2013 at 10:50 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>
>> I do have some concerns about both correctness and feasibility of this approach.
>>
>> About 2), namespace dependent operations triggered by unregistering
>> the virtual devices (eg rt_flush_dev, dst_dev_event/dst_ifdown and
>> probably more) would not be done in the namespaces where they should.
>
> Yes they will.  That is what dev_change_net_namespace does.

You are right, I don't know what I was thinking.

>
> It was worth a second look.  I can not find anything wrong with your
> patch but I can not convince myself that it is correct either.  The
> moving around the loopback device in the net dev todo list to prevent
> deadlock I can't imagine why you are doing that.
>

That is in order not to introduce a potential deadlock when multiple
namespaces are destroyed in default_device_exit_batch.
Take the same veth scenario as before:
v0 in namespace ns0 (loopback_dev lo0) and similarly for v1, ns1 and lo1.
Assume two processes destroy ns0 and ns1. cleanup_net is executed in a
workqueue and the two namespaces can end up being processed in the
same instance of cleanup_net/ops_exit_list/default_device_exit_batch.
default_device_exit_batch traverses each namespace's dev_base list and
unregister_netdevice_queue is executed in this order:
v0 v1 lo0 v1 v0 lo1.
unregister_netdevice_queue is executed twice on v0 and v1 but that is
ok because it uses list_move_tail and only the last one sticks.
The resulting list for unregister_netdevice_many is:
lo0 v1 v0 lo1.
If v0 holds a reference to lo0 there will be a deadlock in
netdev_run_todo if v0 does not come before lo0 in net_todo_list. By
pushing all loopback_devs to the end of net_todo_list this situation
is avoided.

This is the sequence with today's (actually 3.4) code:

unregister_netdevice_queue: v0 (ns ffff880037aecd00)
unregister_netdevice_queue: v1 (ns ffff880037aed800)
unregister_netdevice_queue: lo (ns ffff880037aecd00)
unregister_netdevice_queue: v1 (ns ffff880037aed800)
unregister_netdevice_queue: v0 (ns ffff880037aecd00)
unregister_netdevice_queue: lo (ns ffff880037aed800)
unregister_netdevice_many: lo (ns ffff880037aecd00) v1 (ns
ffff880037aed800) v0 (ns ffff880037aecd00) lo (ns ffff880037aed800)
netdev_run_todo: lo (ns ffff880037aecd00) v1 (ns ffff880037aed800) v0
(ns ffff880037aecd00) lo (ns ffff880037aed800)

and this is the sequence after pushing the loopback_devs to the tail
of net_todo_list:

unregister_netdevice_queue: v0 (ns ffff8800379f8000)
unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
unregister_netdevice_queue: lo (ns ffff8800379f8000)
unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
unregister_netdevice_queue: v0 (ns ffff8800379f8000)
unregister_netdevice_queue: lo (ns ffff8800378c0b00)
unregister_netdevice_many: lo (ns ffff8800379f8000) v1 (ns
ffff8800378c0b00) v0 (ns ffff8800379f8000) lo (ns ffff8800378c0b00)
netdev_run_todo: v1 (ns ffff8800378c0b00) v0 (ns ffff8800379f8000) lo
(ns ffff8800379f8000) lo (ns ffff8800378c0b00)

Should we take this discussion offline?
I do appreciate your spending time on this.

Thanks,
Francesco

>
> Eric
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-16  2:54           ` Francesco Ruggeri
@ 2013-09-16 10:45             ` Eric W. Biederman
  2013-09-16 20:30               ` Francesco Ruggeri
  0 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-16 10:45 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

Francesco Ruggeri <fruggeri@aristanetworks.com> writes:

> On Fri, Sep 13, 2013 at 6:46 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
>>
>>> On Thu, Sep 12, 2013 at 10:50 PM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>
>>> I do have some concerns about both correctness and feasibility of this approach.
>>>
>>> About 2), namespace dependent operations triggered by unregistering
>>> the virtual devices (eg rt_flush_dev, dst_dev_event/dst_ifdown and
>>> probably more) would not be done in the namespaces where they should.
>>
>> Yes they will.  That is what dev_change_net_namespace does.
>
> You are right, I don't know what I was thinking.

>> It was worth a second look.  I can not find anything wrong with your
>> patch but I can not convince myself that it is correct either.  The
>> moving around the loopback device in the net dev todo list to prevent
>> deadlock I can't imagine why you are doing that.
>>
>
> That is in order not to introduce a potential deadlock when multiple
> namespaces are destroyed in default_device_exit_batch.
> Take the same veth scenario as before:
> v0 in namespace ns0 (loopback_dev lo0) and similarly for v1, ns1 and lo1.
> Assume two processes destroy ns0 and ns1. cleanup_net is executed in a
> workqueue and the two namespaces can end up being processed in the
> same instance of cleanup_net/ops_exit_list/default_device_exit_batch.
> default_device_exit_batch traverses each namespace's dev_base list and
> unregister_netdevice_queue is executed in this order:
> v0 v1 lo0 v1 v0 lo1.
> unregister_netdevice_queue is executed twice on v0 and v1 but that is
> ok because it uses list_move_tail and only the last one sticks.
> The resulting list for unregister_netdevice_many is:
> lo0 v1 v0 lo1.
> If v0 holds a reference to lo0 there will be a deadlock in
> netdev_run_todo if v0 does not come before lo0 in net_todo_list. By
> pushing all loopback_devs to the end of net_todo_list this situation
> is avoided.


>
> This is the sequence with today's (actually 3.4) code:
>
> unregister_netdevice_queue: v0 (ns ffff880037aecd00)
> unregister_netdevice_queue: v1 (ns ffff880037aed800)
> unregister_netdevice_queue: lo (ns ffff880037aecd00)
> unregister_netdevice_queue: v1 (ns ffff880037aed800)
> unregister_netdevice_queue: v0 (ns ffff880037aecd00)
> unregister_netdevice_queue: lo (ns ffff880037aed800)
> unregister_netdevice_many: lo (ns ffff880037aecd00) v1 (ns
> ffff880037aed800) v0 (ns ffff880037aecd00) lo (ns ffff880037aed800)
> netdev_run_todo: lo (ns ffff880037aecd00) v1 (ns ffff880037aed800) v0
> (ns ffff880037aecd00) lo (ns ffff880037aed800)

Interesting.

So we have a very small chance of hillarity ensuing with dst_ifdown.
But we probably won't notice because even if lo has been freed
the memory likely hasn't been recycled.  So dst_hold likely does not
affect anyone.

So it seems real problems only happens when we send the rebroadcast.
Which explains why this has gone unnoticed for so long.

I really don't want to support concurrency during the network namespace
shutdown.  That quickly becomes too crazy to think about.

I believe the weird reordering of veth devices is solved or largely
solved by:

commit f45a5c267da35174e22cec955093a7513dc1623d
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Feb 8 20:10:49 2013 +0000

    veth: fix NULL dereference in veth_dellink()
    
    commit d0e2c55e7c940 (veth: avoid a NULL deref in veth_stats_one)
    added another NULL deref in veth_dellink().
    
    # ip link add name veth1 type veth peer name veth0
    # rmmod veth
    
    We crash because veth_dellink() is called twice, so we must
    take care of NULL peer.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Which unfortunately means you really need to test 3.11 or 3.12-rc1
to make headway on this issue.  If there is further veth specific
madness we can solve it by tweaking the veth driver.

> and this is the sequence after pushing the loopback_devs to the tail
> of net_todo_list:
>
> unregister_netdevice_queue: v0 (ns ffff8800379f8000)
> unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
> unregister_netdevice_queue: lo (ns ffff8800379f8000)
> unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
> unregister_netdevice_queue: v0 (ns ffff8800379f8000)
> unregister_netdevice_queue: lo (ns ffff8800378c0b00)
> unregister_netdevice_many: lo (ns ffff8800379f8000) v1 (ns
> ffff8800378c0b00) v0 (ns ffff8800379f8000) lo (ns ffff8800378c0b00)
> netdev_run_todo: v1 (ns ffff8800378c0b00) v0 (ns ffff8800379f8000) lo
> (ns ffff8800379f8000) lo (ns ffff8800378c0b00)

I am scratching my head.  That isn't what I would expect from putting
loopback devices at the end of the list.

Even more I am not comfortable with any situation where preserving the
order in which the devices are unregistered is not sufficient to make
things work correctly.  That quickly gets into fragile layering
problems, and I don't know how anyone would be able to maintain that.

I still don't see a better solution than dev_change_net_namespace,
for the network devices that have this problem.  Network devices are not
supposed to be findable after the dance at the start of cleanup_net.

It might be possible to actually build asynchronous network device
unregistration and opportunistick batching.  Aka cleanup network devices
much like we clean up network namespaces now, and by funnelling them all
into a single threaded work queue that would probably stop the races, as
well as improve performance.  I don't have the energy to do that right
now unfortunately :(

> Should we take this discussion offline?
> I do appreciate your spending time on this.

If anyone complains we can drop them off of the CC but it is useful to
leave a record of the thought process that went into this, so we should
at the minimum keep netdev in the loop.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-16 10:45             ` Eric W. Biederman
@ 2013-09-16 20:30               ` Francesco Ruggeri
  2013-09-16 23:52                 ` [PATCH net-next] net loopback: Set loopback_dev to NULL when freed Eric W. Biederman
                                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-16 20:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Mon, Sep 16, 2013 at 3:45 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:

> I believe the weird reordering of veth devices is solved or largely
> solved by:
>
> commit f45a5c267da35174e22cec955093a7513dc1623d

I looked at d0e2c55e and f45a5c26 in the past and I do not think they
are related to the problem I see, which I think is more related to the
infrastructure. See more below.

>
>> and this is the sequence after pushing the loopback_devs to the tail
>> of net_todo_list:
>>
>> unregister_netdevice_queue: v0 (ns ffff8800379f8000)
>> unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
>> unregister_netdevice_queue: lo (ns ffff8800379f8000)
>> unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
>> unregister_netdevice_queue: v0 (ns ffff8800379f8000)
>> unregister_netdevice_queue: lo (ns ffff8800378c0b00)
>> unregister_netdevice_many: lo (ns ffff8800379f8000) v1 (ns
>> ffff8800378c0b00) v0 (ns ffff8800379f8000) lo (ns ffff8800378c0b00)
>> netdev_run_todo: v1 (ns ffff8800378c0b00) v0 (ns ffff8800379f8000) lo
>> (ns ffff8800379f8000) lo (ns ffff8800378c0b00)
>
> I am scratching my head.  That isn't what I would expect from putting
> loopback devices at the end of the list.

The way to read it is as follows:
unregister_netdevice_many gets the sequence lo0 v1 v0 lo1 (lo0 is the
occurrence of "lo" with the same namespace as v0, etc.).
As the interfaces are unlisted the loopback_devs are queued at the end
of net_todo_list, so when netdev_run_todo executes net_todo_list is v1
v0 lo0 lo1.

>
> Even more I am not comfortable with any situation where preserving the
> order in which the devices are unregistered is not sufficient to make
> things work correctly.  That quickly gets into fragile layering
> problems, and I don't know how anyone would be able to maintain that.
>

I agree with your general statement, but unfortunately the order in
which the interfaces are unregistered is not preserved even in the
current code, since dev_close_many already rearranges them based on
whether they are UP or not.

If I delete a namespace with only lo and v0 this is the order in which
they are released depending on their state. If lo is DOWN then it will
be processed before v0 in netdev_run_todo.

====== lo down, v0 down
unregister_netdevice_queue: v0 (ns ffff880037bb0b00)
unregister_netdevice_queue: lo (ns ffff880037bb0b00)
unregister_netdevice_many: v0 (ns ffff880037bb0b00) lo (ns ffff880037bb0b00)
netdev_run_todo: lo (ns ffff880037bb0b00) v0 (ns ffff880037bb0b00)

====== lo up, v0 down
unregister_netdevice_queue: v0 (ns ffff8800378a9600)
unregister_netdevice_queue: lo (ns ffff8800378a9600)
unregister_netdevice_many: v0 (ns ffff8800378a9600) lo (ns ffff8800378a9600)
netdev_run_todo: v0 (ns ffff8800378a9600) lo (ns ffff8800378a9600)

====== lo down, v0 up
unregister_netdevice_queue: v0 (ns ffff880037bb0b00)
unregister_netdevice_queue: lo (ns ffff880037bb0b00)
unregister_netdevice_many: v0 (ns ffff880037bb0b00) lo (ns ffff880037bb0b00)
netdev_run_todo: lo (ns ffff880037bb0b00) v0 (ns ffff880037bb0b00)

====== lo up, v0 up
unregister_netdevice_queue: v0 (ns ffff8800378a9600)
unregister_netdevice_queue: lo (ns ffff8800378a9600)
unregister_netdevice_many: v0 (ns ffff8800378a9600) lo (ns ffff8800378a9600)
netdev_run_todo: v0 (ns ffff8800378a9600) lo (ns ffff8800378a9600)


> I still don't see a better solution than dev_change_net_namespace,
> for the network devices that have this problem.  Network devices are not
> supposed to be findable after the dance at the start of cleanup_net.
>

I think the problem we are seeing is an infrastructure one. If I can restate it:

1) When destroying a namespace all net_devices in it should be
disposed of before any non-device pernet subsystems exit. The existing
code tries to do this but it does not take into consideration
net_devices that may have been unlisted from the namespace by another
process which is still in the process of destroying them (specifically
in netdev_run_todo/netdev_wait_allrefs).

There is also another issue, separate but related (my diffs did not
try to address it):

2) dev_change_net_namespace does not have the equivalent of
netdev_wait_allrefs. If rebroadcasts are in fact needed when
destroying a net_device then the same should apply when the net_device
is moved out of a namespace. With existing code a net_device can be
moved to a different namespace before its refcount drops to 0. Any
later broadcasts will not trigger any action in the original namespace
where they should have occurred.

I suspect that this is not catastrophic but its effects could be
subtle. This is another reason why I would avoid using
dev_change_net_namespace as a driver level solution for 1) above,
besides the fact that we would have to apply it to all drivers
affected by this (veth, vlan, macvlan, maybe more).

> It might be possible to actually build asynchronous network device
> unregistration and opportunistick batching.  Aka cleanup network devices
> much like we clean up network namespaces now, and by funnelling them all
> into a single threaded work queue that would probably stop the races, as
> well as improve performance.  I don't have the energy to do that right
> now unfortunately :(
>

That would probably solve it, but as you suggest it will take a
considerable amount of work and code re-structuring.

Francesco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH net-next] net loopback: Set loopback_dev to NULL when freed
  2013-09-16 20:30               ` Francesco Ruggeri
@ 2013-09-16 23:52                 ` Eric W. Biederman
  2013-09-17  0:50                   ` Eric Dumazet
  2013-09-17  0:25                 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
  2013-09-17  3:49                 ` [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering Eric W. Biederman
  2 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-16 23:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Eric Dumazet, Jiri Pirko, Alexander Duyck, Cong Wang, netdev,
	Francesco Ruggeri


It has recently turned up that we have a number of long standing bugs
in the network stack cleanup code with use of the loopback device
after it has been freed that have not turned up because in most cases
the storage allocated to the loopback device is not reused, when those
accesses happen.

Set looback_dev to NULL to trigger oopses instead of silent data corrupt
when we hit this class of bug.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/net/loopback.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index fcbf680..a17d85a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -146,6 +146,7 @@ static int loopback_dev_init(struct net_device *dev)
 
 static void loopback_dev_free(struct net_device *dev)
 {
+	dev_net(dev)->loopback_dev = NULL;
 	free_percpu(dev->lstats);
 	free_netdev(dev);
 }
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-16 20:30               ` Francesco Ruggeri
  2013-09-16 23:52                 ` [PATCH net-next] net loopback: Set loopback_dev to NULL when freed Eric W. Biederman
@ 2013-09-17  0:25                 ` Eric W. Biederman
  2013-09-17  5:12                   ` Francesco Ruggeri
  2013-09-17  3:49                 ` [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering Eric W. Biederman
  2 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-17  0:25 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev


Francesco Ruggeri <fruggeri@aristanetworks.com> writes:

> On Mon, Sep 16, 2013 at 3:45 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>
>> I believe the weird reordering of veth devices is solved or largely
>> solved by:
>>
>> commit f45a5c267da35174e22cec955093a7513dc1623d
>
> I looked at d0e2c55e and f45a5c26 in the past and I do not think they
> are related to the problem I see, which I think is more related to the
> infrastructure. See more below.

If everything is in the same batch they definitely relate, as it stops
refiling one of the devices past the point where it's loopback device
is destroyed.

It is not the fundamental problem but it is related and those fixes.

>>> and this is the sequence after pushing the loopback_devs to the tail
>>> of net_todo_list:
>>>
>>> unregister_netdevice_queue: v0 (ns ffff8800379f8000)
>>> unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
>>> unregister_netdevice_queue: lo (ns ffff8800379f8000)
>>> unregister_netdevice_queue: v1 (ns ffff8800378c0b00)
>>> unregister_netdevice_queue: v0 (ns ffff8800379f8000)
>>> unregister_netdevice_queue: lo (ns ffff8800378c0b00)
>>> unregister_netdevice_many: lo (ns ffff8800379f8000) v1 (ns
>>> ffff8800378c0b00) v0 (ns ffff8800379f8000) lo (ns ffff8800378c0b00)
>>> netdev_run_todo: v1 (ns ffff8800378c0b00) v0 (ns ffff8800379f8000) lo
>>> (ns ffff8800379f8000) lo (ns ffff8800378c0b00)
>>
>> I am scratching my head.  That isn't what I would expect from putting
>> loopback devices at the end of the list.
>
> The way to read it is as follows:
> unregister_netdevice_many gets the sequence lo0 v1 v0 lo1 (lo0 is the
> occurrence of "lo" with the same namespace as v0, etc.).
> As the interfaces are unlisted the loopback_devs are queued at the end
> of net_todo_list, so when netdev_run_todo executes net_todo_list is v1
> v0 lo0 lo1.

My reading jumbled the unregister_netdevice_many line and the
netdev_run_todo line.  So I was seeing v0, lo, v1, v0, lo, lo.
But since both loop devices are now at the end your that part of your
patch looks ok.

>> Even more I am not comfortable with any situation where preserving the
>> order in which the devices are unregistered is not sufficient to make
>> things work correctly.  That quickly gets into fragile layering
>> problems, and I don't know how anyone would be able to maintain that.
>>
>
> I agree with your general statement, but unfortunately the order in
> which the interfaces are unregistered is not preserved even in the
> current code, since dev_close_many already rearranges them based on
> whether they are UP or not.

Ugh.  That is a bug.

I have sent a patch to solve this by simply not reordering the device
deletions as that is easier to think about.  And more robust if we
happen to care about anything besides the loopback device.  Reording
happens so seldom I can't imagine it being tested properly.

I have also sent a patch to set loopback_dev to NULL so that it is more
likely we will see your class of bugs.

I think I see bugs with the handling of rtmsg_ifinfo, and
netpoll_rx_enable/disable in dev_close_many as well. But those are
significantly less serious issues.

If you could verify that my patch to dev_close solves the ordering
issues you were seeing I would appreciate that.

>> I still don't see a better solution than dev_change_net_namespace,
>> for the network devices that have this problem.  Network devices are not
>> supposed to be findable after the dance at the start of cleanup_net.
>>
>
> I think the problem we are seeing is an infrastructure one. If I can restate it:
>
> 1) When destroying a namespace all net_devices in it should be
> disposed of before any non-device pernet subsystems exit. The existing
> code tries to do this but it does not take into consideration
> net_devices that may have been unlisted from the namespace by another
> process which is still in the process of destroying them (specifically
> in netdev_run_todo/netdev_wait_allrefs).

The existing code makes the assumption that is not possible to find
a network device and manipulate it outside the network namespace in
any way (outside of the network namespace cleanup methods) after the
network namespace reference count has dropped to 0 and the network
namespace is no longer in the network namespace list.

That assumption is clearly not true for veth pairs, vlans, macvlans and
the like, and it is an infrastructure problem.

The question is what is the most robust comprehensible and maintinable
fix?  (That doesn't penalize the fast path of course).

> There is also another issue, separate but related (my diffs did not
> try to address it):
>
> 2) dev_change_net_namespace does not have the equivalent of
> netdev_wait_allrefs. If rebroadcasts are in fact needed when
> destroying a net_device then the same should apply when the net_device
> is moved out of a namespace. With existing code a net_device can be
> moved to a different namespace before its refcount drops to 0. Any
> later broadcasts will not trigger any action in the original namespace
> where they should have occurred.

Not having the broadcast in dev_change_net_namespace is not a bug.  The
repeated broadcast is most definitely there for hysterical raisins, and
with a good audit of dev_hold/dev_put call sites can most definitely be
removed.  At a practical level the repeated broadcast is just acting as
a hammer and saying to subsystems you messed up now let this device go.
Let go! Let go! Let go! And there is a slight hope that something will
let go when told multiple times.

> I suspect that this is not catastrophic but its effects could be
> subtle. This is another reason why I would avoid using
> dev_change_net_namespace as a driver level solution for 1) above,
> besides the fact that we would have to apply it to all drivers
> affected by this (veth, vlan, macvlan, maybe more).

We don't avoid the rebroadcast and wait, with dev_change_net_namespace,
they are just performed in a different network namespace.  So if
something is wrong we will know.

>> It might be possible to actually build asynchronous network device
>> unregistration and opportunistick batching.  Aka cleanup network devices
>> much like we clean up network namespaces now, and by funnelling them all
>> into a single threaded work queue that would probably stop the races, as
>> well as improve performance.  I don't have the energy to do that right
>> now unfortunately :(
>>
>
> That would probably solve it, but as you suggest it will take a
> considerable amount of work and code re-structuring.

And it may be worth it for the speed up you get when destroying 384 mlag
ports.

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next] net loopback: Set loopback_dev to NULL when freed
  2013-09-16 23:52                 ` [PATCH net-next] net loopback: Set loopback_dev to NULL when freed Eric W. Biederman
@ 2013-09-17  0:50                   ` Eric Dumazet
  2013-09-17  1:34                     ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2013-09-17  0:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev, Francesco Ruggeri

On Mon, 2013-09-16 at 16:52 -0700, Eric W. Biederman wrote:
> It has recently turned up that we have a number of long standing bugs
> in the network stack cleanup code with use of the loopback device
> after it has been freed that have not turned up because in most cases
> the storage allocated to the loopback device is not reused, when those
> accesses happen.
> 
> Set looback_dev to NULL to trigger oopses instead of silent data corrupt
> when we hit this class of bug.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next] net loopback: Set loopback_dev to NULL when freed
  2013-09-17  0:50                   ` Eric Dumazet
@ 2013-09-17  1:34                     ` David Miller
  2013-09-17  1:41                       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2013-09-17  1:34 UTC (permalink / raw)
  To: eric.dumazet
  Cc: ebiederm, edumazet, jiri, alexander.h.duyck, amwang, netdev, fruggeri

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 16 Sep 2013 17:50:51 -0700

> On Mon, 2013-09-16 at 16:52 -0700, Eric W. Biederman wrote:
>> It has recently turned up that we have a number of long standing bugs
>> in the network stack cleanup code with use of the loopback device
>> after it has been freed that have not turned up because in most cases
>> the storage allocated to the loopback device is not reused, when those
>> accesses happen.
>> 
>> Set looback_dev to NULL to trigger oopses instead of silent data corrupt
>> when we hit this class of bug.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

I'd like to apply this to 'net', any objections?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next] net loopback: Set loopback_dev to NULL when freed
  2013-09-17  1:34                     ` David Miller
@ 2013-09-17  1:41                       ` Eric Dumazet
  2013-09-17  1:52                         ` Eric W. Biederman
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2013-09-17  1:41 UTC (permalink / raw)
  To: David Miller
  Cc: ebiederm, edumazet, jiri, alexander.h.duyck, amwang, netdev, fruggeri

On Mon, 2013-09-16 at 21:34 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 16 Sep 2013 17:50:51 -0700
> 
> > On Mon, 2013-09-16 at 16:52 -0700, Eric W. Biederman wrote:
> >> It has recently turned up that we have a number of long standing bugs
> >> in the network stack cleanup code with use of the loopback device
> >> after it has been freed that have not turned up because in most cases
> >> the storage allocated to the loopback device is not reused, when those
> >> accesses happen.
> >> 
> >> Set looback_dev to NULL to trigger oopses instead of silent data corrupt
> >> when we hit this class of bug.
> >> 
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> > 
> > Acked-by: Eric Dumazet <edumazet@google.com>
> 
> I'd like to apply this to 'net', any objections?

No objections from me.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next] net loopback: Set loopback_dev to NULL when freed
  2013-09-17  1:41                       ` Eric Dumazet
@ 2013-09-17  1:52                         ` Eric W. Biederman
  2013-09-17 23:05                           ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-17  1:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, edumazet, jiri, alexander.h.duyck, amwang, netdev,
	fruggeri

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Mon, 2013-09-16 at 21:34 -0400, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Mon, 16 Sep 2013 17:50:51 -0700
>> 
>> > On Mon, 2013-09-16 at 16:52 -0700, Eric W. Biederman wrote:
>> >> It has recently turned up that we have a number of long standing bugs
>> >> in the network stack cleanup code with use of the loopback device
>> >> after it has been freed that have not turned up because in most cases
>> >> the storage allocated to the loopback device is not reused, when those
>> >> accesses happen.
>> >> 
>> >> Set looback_dev to NULL to trigger oopses instead of silent data corrupt
>> >> when we hit this class of bug.
>> >> 
>> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> >> ---
>> > 
>> > Acked-by: Eric Dumazet <edumazet@google.com>
>> 
>> I'd like to apply this to 'net', any objections?
>
> No objections from me.

No objects from me I just hadn't seen it as a bug fix, but I guess it
sort of is.

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-16 20:30               ` Francesco Ruggeri
  2013-09-16 23:52                 ` [PATCH net-next] net loopback: Set loopback_dev to NULL when freed Eric W. Biederman
  2013-09-17  0:25                 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
@ 2013-09-17  3:49                 ` Eric W. Biederman
  2013-09-17  6:54                   ` Francesco Ruggeri
  2013-09-17 23:21                   ` David Miller
  2 siblings, 2 replies; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-17  3:49 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev


The implementation is a little rough but the logic should be right.

Device registration and unregistration is serialized with the rtnl_lock.
The final pieces of device unregistration do not happen under the
rtnl_lock resulting in the possibility that while we wait for the
refcount of a device to drop to zero the network namespace is
unregistered while no locks are held.

Prevent that by keeping a count of the network devices that are being
unregistered and before we make the final pass through a network
namespace to flush out all of the network devices, wait for the count of
network devices being unregistered to drop to zero.

Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Francesco could you take a look at this.  I am about 99% certain this is
right but I am starting to fade.  So it is entirely possible I missed
something.

 net/core/dev.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5d702fe..c25e6f3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5002,10 +5002,13 @@ static int dev_new_index(struct net *net)
 
 /* Delayed registration/unregisteration */
 static LIST_HEAD(net_todo_list);
+static atomic_t netdev_unregistering = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wait);
 
 static void net_set_todo(struct net_device *dev)
 {
 	list_add_tail(&dev->todo_list, &net_todo_list);
+	atomic_inc(&netdev_unregistering);
 }
 
 static void rollback_registered_many(struct list_head *head)
@@ -5673,6 +5676,9 @@ void netdev_run_todo(void)
 		if (dev->destructor)
 			dev->destructor(dev);
 
+		if (atomic_dec_and_test(&netdev_unregistering))
+			wake_up(&netdev_unregistering_wait);
+
 		/* Free network device */
 		kobject_put(&dev->dev.kobj);
 	}
@@ -6369,7 +6375,13 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 	struct net *net;
 	LIST_HEAD(dev_kill_list);
 
+retry:
+	wait_event(netdev_unregistering_wait, (atomic_read(&netdev_unregistering) == 0));
 	rtnl_lock();
+	if (atomic_read(&netdev_unregistering) != 0) {
+		__rtnl_unlock();
+		goto retry;
+	}
 	list_for_each_entry(net, net_list, exit_list) {
 		for_each_netdev_reverse(net, dev) {
 			if (dev->rtnl_link_ops)
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-17  0:25                 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
@ 2013-09-17  5:12                   ` Francesco Ruggeri
  0 siblings, 0 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-17  5:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Mon, Sep 16, 2013 at 5:25 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:

>
> If you could verify that my patch to dev_close solves the ordering
> issues you were seeing I would appreciate that.
>

I have not run extensive tests, but your patch fixes the reordering
issue I was seeing in dev_close_many. When I destroyed a namespace
with only v0 and lo the order was preserved from
unregister_netdevice_many to netdev_run_todo.

Francesco

====== lo down, v0 down
unregister_netdevice_queue: v0 (ns ffff880136bc8000)
unregister_netdevice_queue: lo (ns ffff880136bc8000)
unregister_netdevice_many: v0 (ns ffff880136bc8000) lo (ns ffff880136bc8000)
netdev_run_todo: v0 (ns ffff880136bc8000) lo (ns ffff880136bc8000)

====== lo up, v0 down
unregister_netdevice_queue: v0 (ns ffff880037ac8000)
unregister_netdevice_queue: lo (ns ffff880037ac8000)
unregister_netdevice_many: v0 (ns ffff880037ac8000) lo (ns ffff880037ac8000)
netdev_run_todo: v0 (ns ffff880037ac8000) lo (ns ffff880037ac8000)


====== lo down, v0 up
unregister_netdevice_queue: v0 (ns ffff880136bc8000)
unregister_netdevice_queue: lo (ns ffff880136bc8000)
unregister_netdevice_many: v0 (ns ffff880136bc8000) lo (ns ffff880136bc8000)
netdev_run_todo: v0 (ns ffff880136bc8000) lo (ns ffff880136bc8000)


====== lo up, v0 up
unregister_netdevice_queue: v0 (ns ffff880037ac8000)
unregister_netdevice_queue: lo (ns ffff880037ac8000)
unregister_netdevice_many: v0 (ns ffff880037ac8000) lo (ns ffff880037ac8000)
netdev_run_todo: v0 (ns ffff880037ac8000) lo (ns ffff880037ac8000)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-17  3:49                 ` [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering Eric W. Biederman
@ 2013-09-17  6:54                   ` Francesco Ruggeri
  2013-09-17  9:38                     ` Eric W. Biederman
  2013-09-17 23:21                   ` David Miller
  1 sibling, 1 reply; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-17  6:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Mon, Sep 16, 2013 at 8:49 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> The implementation is a little rough but the logic should be right.
>
> Device registration and unregistration is serialized with the rtnl_lock.
> The final pieces of device unregistration do not happen under the
> rtnl_lock resulting in the possibility that while we wait for the
> refcount of a device to drop to zero the network namespace is
> unregistered while no locks are held.
>
> Prevent that by keeping a count of the network devices that are being
> unregistered and before we make the final pass through a network
> namespace to flush out all of the network devices, wait for the count of
> network devices being unregistered to drop to zero.
>
> Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>
> Francesco could you take a look at this.  I am about 99% certain this is
> right but I am starting to fade.  So it is entirely possible I missed
> something.

Same here ...
The logic looks right to me and I think it should address the original
issue I ran into.
Would it make sense to have netdev_unregistering and
netdev_unregistering_wait be per-namespace, and have
default_device_exit_batch only wait for the namespaces in net_list? It
would require some extra loops and locking, but it may help avoid
unnecessary waits.

Francesco

>
>  net/core/dev.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5d702fe..c25e6f3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5002,10 +5002,13 @@ static int dev_new_index(struct net *net)
>
>  /* Delayed registration/unregisteration */
>  static LIST_HEAD(net_todo_list);
> +static atomic_t netdev_unregistering = ATOMIC_INIT(0);
> +static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wait);
>
>  static void net_set_todo(struct net_device *dev)
>  {
>         list_add_tail(&dev->todo_list, &net_todo_list);
> +       atomic_inc(&netdev_unregistering);
>  }
>
>  static void rollback_registered_many(struct list_head *head)
> @@ -5673,6 +5676,9 @@ void netdev_run_todo(void)
>                 if (dev->destructor)
>                         dev->destructor(dev);
>
> +               if (atomic_dec_and_test(&netdev_unregistering))
> +                       wake_up(&netdev_unregistering_wait);
> +
>                 /* Free network device */
>                 kobject_put(&dev->dev.kobj);
>         }
> @@ -6369,7 +6375,13 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
>         struct net *net;
>         LIST_HEAD(dev_kill_list);
>
> +retry:
> +       wait_event(netdev_unregistering_wait, (atomic_read(&netdev_unregistering) == 0));
>         rtnl_lock();
> +       if (atomic_read(&netdev_unregistering) != 0) {
> +               __rtnl_unlock();
> +               goto retry;
> +       }
>         list_for_each_entry(net, net_list, exit_list) {
>                 for_each_netdev_reverse(net, dev) {
>                         if (dev->rtnl_link_ops)
> --
> 1.7.5.4
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-17  6:54                   ` Francesco Ruggeri
@ 2013-09-17  9:38                     ` Eric W. Biederman
  2013-09-17 17:14                       ` Francesco Ruggeri
  0 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-17  9:38 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

Francesco Ruggeri <fruggeri@aristanetworks.com> writes:

> On Mon, Sep 16, 2013 at 8:49 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:

>> Francesco could you take a look at this.  I am about 99% certain this is
>> right but I am starting to fade.  So it is entirely possible I missed
>> something.
>
> Same here ...
> The logic looks right to me and I think it should address the original
> issue I ran into.
> Would it make sense to have netdev_unregistering and
> netdev_unregistering_wait be per-namespace, and have
> default_device_exit_batch only wait for the namespaces in net_list? It
> would require some extra loops and locking, but it may help avoid
> unnecessary waits.

It might be worth it, and it certainly would give a good progress
guarantee, as activity in an exiting network namespace is guaranteed to
wind down.  Progress guarantees are always good to have.

The downside of a per netns unregistering count is that we will have
extra wake ups which will could result in extra contention on the
rtnl_lock.

The wait queue definitely makes sense to be global as there only ever
can be a single waiter, because the netns_wq is single threaded and
everything we are doing is happening with the net_mutex held.

The count doesn't necessarily need to be atomic as it can be protected
by the rtnl_lock.

Like I said the logic is a little rough as I was tired. 

I am heading off to New Orleans for Linux Plumbers Conference in a
couple hours so I don't expect I will be particularly responsive
again until next week.

If you could test this patch perhaps refine it I think we are almost at
a final point of fixing this.

Just to be clear my reason for prefering this approach is that because
it adds no extra wait points (we already wait for the rtnl_lock), the
logic is unconditional and explicit and not hidden in the loopback
device's reference count.  Which should allow anyone reading the code
to discover and understand this guarantee.  Although a big fat comment
in default_device_exit_batch that we are guaranteeing we don't allow
the network namespace to exit while there are still network devices in
it (or something to that effect) is probably appropriate.

Eric

> Francesco
>
>>
>>  net/core/dev.c |   12 ++++++++++++
>>  1 files changed, 12 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 5d702fe..c25e6f3 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5002,10 +5002,13 @@ static int dev_new_index(struct net *net)
>>
>>  /* Delayed registration/unregisteration */
>>  static LIST_HEAD(net_todo_list);
>> +static atomic_t netdev_unregistering = ATOMIC_INIT(0);
>> +static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wait);
>>
>>  static void net_set_todo(struct net_device *dev)
>>  {
>>         list_add_tail(&dev->todo_list, &net_todo_list);
>> +       atomic_inc(&netdev_unregistering);
>>  }
>>
>>  static void rollback_registered_many(struct list_head *head)
>> @@ -5673,6 +5676,9 @@ void netdev_run_todo(void)
>>                 if (dev->destructor)
>>                         dev->destructor(dev);
>>
>> +               if (atomic_dec_and_test(&netdev_unregistering))
>> +                       wake_up(&netdev_unregistering_wait);
>> +
>>                 /* Free network device */
>>                 kobject_put(&dev->dev.kobj);
>>         }
>> @@ -6369,7 +6375,13 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
>>         struct net *net;
>>         LIST_HEAD(dev_kill_list);
>>
>> +retry:
>> +       wait_event(netdev_unregistering_wait, (atomic_read(&netdev_unregistering) == 0));
>>         rtnl_lock();
>> +       if (atomic_read(&netdev_unregistering) != 0) {
>> +               __rtnl_unlock();
>> +               goto retry;
>> +       }
>>         list_for_each_entry(net, net_list, exit_list) {
>>                 for_each_netdev_reverse(net, dev) {
>>                         if (dev->rtnl_link_ops)
>> --
>> 1.7.5.4
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-17  9:38                     ` Eric W. Biederman
@ 2013-09-17 17:14                       ` Francesco Ruggeri
  0 siblings, 0 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-17 17:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

>
> If you could test this patch perhaps refine it I think we are almost at
> a final point of fixing this.
>
I will.

> Just to be clear my reason for prefering this approach is that because
> it adds no extra wait points (we already wait for the rtnl_lock), the
> logic is unconditional and explicit and not hidden in the loopback
> device's reference count.  Which should allow anyone reading the code
> to discover and understand this guarantee.  Although a big fat comment
> in default_device_exit_batch that we are guaranteeing we don't allow
> the network namespace to exit while there are still network devices in
> it (or something to that effect) is probably appropriate.

I agree, this approach is cleaner than overloading loopback_dev's
refcount as in my original patch.
I will let you know how my tests go over the next few days.

Thanks,
Francesco


>
> Eric
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next] net loopback: Set loopback_dev to NULL when freed
  2013-09-17  1:52                         ` Eric W. Biederman
@ 2013-09-17 23:05                           ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2013-09-17 23:05 UTC (permalink / raw)
  To: ebiederm
  Cc: eric.dumazet, edumazet, jiri, alexander.h.duyck, amwang, netdev,
	fruggeri

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 16 Sep 2013 18:52:27 -0700

> Eric Dumazet <eric.dumazet@gmail.com> writes:
> 
>> On Mon, 2013-09-16 at 21:34 -0400, David Miller wrote:
>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>> Date: Mon, 16 Sep 2013 17:50:51 -0700
>>> 
>>> > On Mon, 2013-09-16 at 16:52 -0700, Eric W. Biederman wrote:
>>> >> It has recently turned up that we have a number of long standing bugs
>>> >> in the network stack cleanup code with use of the loopback device
>>> >> after it has been freed that have not turned up because in most cases
>>> >> the storage allocated to the loopback device is not reused, when those
>>> >> accesses happen.
>>> >> 
>>> >> Set looback_dev to NULL to trigger oopses instead of silent data corrupt
>>> >> when we hit this class of bug.
>>> >> 
>>> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> >> ---
>>> > 
>>> > Acked-by: Eric Dumazet <edumazet@google.com>
>>> 
>>> I'd like to apply this to 'net', any objections?
>>
>> No objections from me.
> 
> No objects from me I just hadn't seen it as a bug fix, but I guess it
> sort of is.

Ok, done.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-17  3:49                 ` [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering Eric W. Biederman
  2013-09-17  6:54                   ` Francesco Ruggeri
@ 2013-09-17 23:21                   ` David Miller
  2013-09-17 23:41                     ` Eric W. Biederman
  1 sibling, 1 reply; 36+ messages in thread
From: David Miller @ 2013-09-17 23:21 UTC (permalink / raw)
  To: ebiederm; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 16 Sep 2013 20:49:31 -0700

>  /* Delayed registration/unregisteration */
>  static LIST_HEAD(net_todo_list);
> +static atomic_t netdev_unregistering = ATOMIC_INIT(0);
> +static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wait);

I think you don't need this atomic.

Something like this should work and seems much simpler:

1) Still use the netdev_unregistering_wait queue, that's fine.

2) In netdev_run_todo(), unconditionally wake it up at the end of
   the while() loop.

3) In default_device_exit_batch() use the same retry logic but
   your tests are on list_empty(&net_todo_list() rather than the
   new atomic count.

I think with this simplification I'm fine to apply this patch after
it has been tested properly.

Long term I'd like to see a per-namespace todo list, as seems to
have been suggested and discussed already.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-17 23:21                   ` David Miller
@ 2013-09-17 23:41                     ` Eric W. Biederman
  2013-09-18  0:15                       ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-17 23:41 UTC (permalink / raw)
  To: David Miller; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev

David Miller <davem@davemloft.net> wrote:
>From: ebiederm@xmission.com (Eric W. Biederman)
>Date: Mon, 16 Sep 2013 20:49:31 -0700
>
>>  /* Delayed registration/unregisteration */
>>  static LIST_HEAD(net_todo_list);
>> +static atomic_t netdev_unregistering = ATOMIC_INIT(0);
>> +static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wait);
>
>I think you don't need this atomic.
>
>Something like this should work and seems much simpler:
>
>1) Still use the netdev_unregistering_wait queue, that's fine.
>
>2) In netdev_run_todo(), unconditionally wake it up at the end of
>   the while() loop.
>
>3) In default_device_exit_batch() use the same retry logic but
>   your tests are on list_empty(&net_todo_list() rather than the
>   new atomic count.

List/count I don't much care but currently we don't have a list of all of the devices that are unregistering.

The problem with this is that netdev_run_todo moves all of the devices to a local list, so they are only visible from a list_head on the stack.  Which makes sense as we run this all in the context of rtnl_unlock.

>I think with this simplification I'm fine to apply this patch after
>it has been tested properly.

I wish we could make that simplication.

>Long term I'd like to see a per-namespace todo list, as seems to
>have been suggested and discussed already.

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-17 23:41                     ` Eric W. Biederman
@ 2013-09-18  0:15                       ` David Miller
  2013-09-18  3:50                         ` Francesco Ruggeri
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2013-09-18  0:15 UTC (permalink / raw)
  To: ebiederm; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev

From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue, 17 Sep 2013 16:41:37 -0700

> List/count I don't much care but currently we don't have a list of
> all of the devices that are unregistering.
> 
> The problem with this is that netdev_run_todo moves all of the
> devices to a local list, so they are only visible from a list_head
> on the stack.  Which makes sense as we run this all in the context
> of rtnl_unlock.

And when that local list is processed (the while loop completes and
has iterated over the entire list), either the global todo list is
empty, or it is not empty.

And the waked up code will check for this.

I really don't see what the problem is.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-18  0:15                       ` David Miller
@ 2013-09-18  3:50                         ` Francesco Ruggeri
  2013-09-18  3:52                           ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-18  3:50 UTC (permalink / raw)
  To: David Miller
  Cc: Eric W. Biederman, Eric Dumazet, Jiří Pírko,
	Alexander Duyck, Cong Wang, netdev

>> List/count I don't much care but currently we don't have a list of
>> all of the devices that are unregistering.
>>
>> The problem with this is that netdev_run_todo moves all of the
>> devices to a local list, so they are only visible from a list_head
>> on the stack.  Which makes sense as we run this all in the context
>> of rtnl_unlock.
>
> And when that local list is processed (the while loop completes and
> has iterated over the entire list), either the global todo list is
> empty, or it is not empty.
>
> And the waked up code will check for this.
>
> I really don't see what the problem is.

I am not sure that would work.
list_empty(&net_todo_list) does not guarantee that there are no
unregistering devices still in flight.
Another process may have copied net_todo_run into its local list, left
net_todo_list empty and still be in the middle of processing
unregistering devices (without the rtnl lock) when
default_device_exit_batch starts executing.

> The count doesn't necessarily need to be atomic as it can be protected
> by the rtnl_lock.

If we use the rtnl_lock then we could have per-net wait queues. While
not strictly needed (since only one instance of
default_device_exit_batch can execute at a time) it would avoid
unnecessarily waking up the waiting code.

Francesco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-18  3:50                         ` Francesco Ruggeri
@ 2013-09-18  3:52                           ` David Miller
  2013-09-18  8:19                             ` Eric W. Biederman
  2013-09-24  4:19                             ` [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2 Eric W. Biederman
  0 siblings, 2 replies; 36+ messages in thread
From: David Miller @ 2013-09-18  3:52 UTC (permalink / raw)
  To: fruggeri; +Cc: ebiederm, edumazet, jiri, alexander.h.duyck, amwang, netdev

From: Francesco Ruggeri <fruggeri@aristanetworks.com>
Date: Tue, 17 Sep 2013 20:50:41 -0700

>>> List/count I don't much care but currently we don't have a list of
>>> all of the devices that are unregistering.
>>>
>>> The problem with this is that netdev_run_todo moves all of the
>>> devices to a local list, so they are only visible from a list_head
>>> on the stack.  Which makes sense as we run this all in the context
>>> of rtnl_unlock.
>>
>> And when that local list is processed (the while loop completes and
>> has iterated over the entire list), either the global todo list is
>> empty, or it is not empty.
>>
>> And the waked up code will check for this.
>>
>> I really don't see what the problem is.
> 
> I am not sure that would work.
> list_empty(&net_todo_list) does not guarantee that there are no
> unregistering devices still in flight.
> Another process may have copied net_todo_run into its local list, left
> net_todo_list empty and still be in the middle of processing
> unregistering devices (without the rtnl lock) when
> default_device_exit_batch starts executing.

Ok, now I understand.

Eric B., when you get a chance can you resubmit your patch and perhaps
elaborate on the situation in the commit message.

If I was confused I'm sure other people will be if they look into
this in the future.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-18  3:52                           ` David Miller
@ 2013-09-18  8:19                             ` Eric W. Biederman
  2013-09-20 16:34                               ` Francesco Ruggeri
  2013-09-24  4:19                             ` [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2 Eric W. Biederman
  1 sibling, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-18  8:19 UTC (permalink / raw)
  To: David Miller; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev

David Miller <davem@davemloft.net> writes:

> From: Francesco Ruggeri <fruggeri@aristanetworks.com>
> Date: Tue, 17 Sep 2013 20:50:41 -0700

>> I am not sure that would work.
>> list_empty(&net_todo_list) does not guarantee that there are no
>> unregistering devices still in flight.
>> Another process may have copied net_todo_run into its local list, left
>> net_todo_list empty and still be in the middle of processing
>> unregistering devices (without the rtnl lock) when
>> default_device_exit_batch starts executing.
>
> Ok, now I understand.
>
> Eric B., when you get a chance can you resubmit your patch and perhaps
> elaborate on the situation in the commit message.

The code is on my computer at home so I won't be able to get back to
this until Sunday or Monday.  This should give Francesco time to verify
I really have closed the holes he is seeing.

> If I was confused I'm sure other people will be if they look into
> this in the future.

Agreed.  It is easy to get confused with this issue.  And I was in a
rush so I expect my wording could be improved.



There is still the other part of this issue that dev_close reorders
devices in default_device_exit_batch.

With that reordering we can get the call chain:
  netdev_run_todo
    call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL,...)
      dst_dev_event
        dst_ifdown(..., unregister == 1)
          dst->dev = dev_net(dst->dev)->loopback_dev;
	  dev_hold(dst->dev);

With dev_net(dst->dev)->loopback_dev == NULL;

So my patch to split the close_list out of the unreg_list is also needed
(or something else to address that issue). 

Hopefully my exaspearation with dev_close_many didn't obscure what is
happening there.

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
  2013-09-18  8:19                             ` Eric W. Biederman
@ 2013-09-20 16:34                               ` Francesco Ruggeri
  0 siblings, 0 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-20 16:34 UTC (permalink / raw)
  To: ebiederm
  Cc: edumazet, jiri, alexander.h.duyck, amwang, netdev, Francesco Ruggeri

Just an update.
I have been testing this patch in 3.4 (that is the environment where I can
stress things a bit more) and I have also manually tested cases that were
causing problems before.
This is essentially Eric's patch except that it maintains per-namespace
counters and wait queues to avoid unnecessary wakeup calls.
Everything seems to be working fine.
We just started using lockdep and I did see a couple of warnings though.
They do not seem to be related to the patch to me, but I am attaching them
to see what you think.

Index: linux-3.4.x86_64/include/net/net_namespace.h
===================================================================
--- linux-3.4.x86_64.orig/include/net/net_namespace.h
+++ linux-3.4.x86_64/include/net/net_namespace.h
@@ -68,6 +68,8 @@ struct net {
 	struct hlist_head 	*dev_name_head;
 	struct hlist_head	*dev_index_head;
 	unsigned int		dev_base_seq;	/* protected by rtnl_mutex */
+	int			dev_unregistering; /* protected by rtnl_mutex */
+	wait_queue_head_t	dev_unregistering_wait;
 
 	/* core fib_rules */
 	struct list_head	rules_ops;
Index: linux-3.4.x86_64/net/core/net_namespace.c
===================================================================
--- linux-3.4.x86_64.orig/net/core/net_namespace.c
+++ linux-3.4.x86_64/net/core/net_namespace.c
@@ -153,6 +153,8 @@ static __net_init int setup_net(struct n
 	atomic_set(&net->count, 1);
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
+	net->dev_unregistering = 0;
+	init_waitqueue_head(&net->dev_unregistering_wait);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
Index: linux-3.4.x86_64/net/core/dev.c
===================================================================
--- linux-3.4.x86_64.orig/net/core/dev.c
+++ linux-3.4.x86_64/net/core/dev.c
@@ -5191,6 +5191,7 @@ static LIST_HEAD(net_todo_list);
 static void net_set_todo(struct net_device *dev)
 {
 	list_add_tail(&dev->todo_list, &net_todo_list);
+	dev_net(dev)->dev_unregistering++;
 }
 
 static void rollback_registered_many(struct list_head *head)
@@ -5815,6 +5816,12 @@ void netdev_run_todo(void)
 		if (dev->destructor)
 			dev->destructor(dev);
 
+		rtnl_lock();
+		BUG_ON(dev_net(dev)->dev_unregistering <= 0);
+		if (--dev_net(dev)->dev_unregistering == 0)
+			wake_up(&dev_net(dev)->dev_unregistering_wait);
+		__rtnl_unlock();
+
 		/* Free network device */
 		kobject_put(&dev->dev.kobj);
 	}
@@ -6478,6 +6485,26 @@ static void __net_exit default_device_ex
 	rtnl_unlock();
 }
 
+static void netdev_wait_unregistering_and_lock(struct list_head *net_list)
+{
+	struct net *net;
+
+retry:
+	/* Wait until any devices in namespaces in net_list that were
+	   unregistered by other processes are destroyed */
+	list_for_each_entry(net, net_list, exit_list) {
+		wait_event(net->dev_unregistering_wait,
+			   net->dev_unregistering == 0);
+	}
+	rtnl_lock();
+	list_for_each_entry(net, net_list, exit_list) {
+		if (net->dev_unregistering) {
+			__rtnl_unlock();
+			goto retry;
+		}
+	}
+}
+
 static void __net_exit default_device_exit_batch(struct list_head *net_list)
 {
 	/* At exit all network devices most be removed from a network
@@ -6489,7 +6516,7 @@ static void __net_exit default_device_ex
 	struct net *net;
 	LIST_HEAD(dev_kill_list);
 
-	rtnl_lock();
+	netdev_wait_unregistering_and_lock(net_list);
 	list_for_each_entry(net, net_list, exit_list) {
 		for_each_netdev_reverse(net, dev) {
 			if (dev->rtnl_link_ops)
================== LOCKDEP:
Sep 19 01:48:46 bs341 kernel: ===============================
Sep 19 01:48:46 bs341 kernel: [ INFO: suspicious RCU usage. ]
Sep 19 01:48:46 bs341 kernel: 3.4.43-1432884.2010AroraNoureddine.7.fc14.x86_64 #1 Not tainted
Sep 19 01:48:46 bs341 kernel: -------------------------------
Sep 19 01:48:46 bs341 kernel: drivers/net/macvtap.c:97 suspicious rcu_dereference_check() usage!
Sep 19 01:48:46 bs341 kernel:
Sep 19 01:48:46 bs341 kernel: other info that might help us debug this:
Sep 19 01:48:46 bs341 kernel:
Sep 19 01:48:46 bs341 kernel:
Sep 19 01:48:46 bs341 kernel: rcu_scheduler_active = 1, debug_locks = 1
Sep 19 01:48:46 bs341 kernel: 1 lock held by libvirtd/29791:
Sep 19 01:48:46 bs341 kernel: #0:  (macvtap_lock){+.+...}, at: [<ffffffffa033000a>] macvtap_open+0x157/0x1f3 [macvtap]
Sep 19 01:48:46 bs341 kernel:
Sep 19 01:48:46 bs341 kernel: stack backtrace:
Sep 19 01:48:46 bs341 kernel: Pid: 29791, comm: libvirtd Not tainted 3.4.43-1432884.2010AroraNoureddine.7.fc14.x86_64 #1
Sep 19 01:48:46 bs341 kernel: Call Trace:
Sep 19 01:48:46 bs341 kernel: [<ffffffff8106e3e4>] lockdep_rcu_suspicious+0xfc/0x105
Sep 19 01:48:46 bs341 kernel: [<ffffffffa032f51d>] get_slot+0x5f/0x7f [macvtap]
Sep 19 01:48:46 bs341 kernel: [<ffffffffa033001e>] macvtap_open+0x16b/0x1f3 [macvtap]
Sep 19 01:48:46 bs341 kernel: [<ffffffff81105f38>] chrdev_open+0x12b/0x154
Sep 19 01:48:46 bs341 kernel: [<ffffffff81105e0d>] ? cdev_put+0x23/0x23
Sep 19 01:48:46 bs341 kernel: [<ffffffff81100b58>] __dentry_open+0x176/0x29f
Sep 19 01:48:46 bs341 kernel: [<ffffffff8110191f>] nameidata_to_filp+0x63/0x6a
Sep 19 01:48:46 bs341 kernel: [<ffffffff8110e690>] do_last+0x569/0x596
Sep 19 01:48:46 bs341 kernel: [<ffffffff8110edee>] path_openat+0xce/0x348
Sep 19 01:48:46 bs341 kernel: [<ffffffff81008765>] ? native_sched_clock+0x35/0x37
Sep 19 01:48:46 bs341 kernel: [<ffffffff81008770>] ? sched_clock+0x9/0xd
Sep 19 01:48:46 bs341 kernel: [<ffffffff8110f156>] do_filp_open+0x38/0x84
Sep 19 01:48:46 bs341 kernel: [<ffffffff81488775>] ? _raw_spin_unlock+0x26/0x2a
Sep 19 01:48:46 bs341 kernel: [<ffffffff81119d82>] ? alloc_fd+0x173/0x185
Sep 19 01:48:46 bs341 kernel: [<ffffffff81101996>] do_sys_open+0x70/0x102
Sep 19 01:48:46 bs341 kernel: [<ffffffff81141c29>] compat_sys_open+0x16/0x18
Sep 19 01:48:46 bs341 kernel: [<ffffffff814908a6>] sysenter_dispatch+0x7/0x26
Sep 19 01:48:46 bs341 kernel: [<ffffffff812467de>] ? trace_hardirqs_on_thunk+0x3a/0x3f

Sep 19 02:00:40 bs341 kernel: ======================================================
Sep 19 02:00:40 bs341 kernel: [ INFO: possible circular locking dependency detected ]
Sep 19 02:00:40 bs341 kernel: 3.4.43-1432884.2010AroraNoureddine.7.fc14.x86_64 #1 Not tainted
Sep 19 02:00:40 bs341 kernel: -------------------------------------------------------
Sep 19 02:00:40 bs341 kernel: [manager]/6158 is trying to acquire lock:
Sep 19 02:00:40 bs341 kernel: (rtnl_mutex){+.+.+.}, at: [<ffffffff813bccfe>] rtnl_lock+0x12/0x14
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: but task is already holding lock:
Sep 19 02:00:40 bs341 kernel: (&mm->mmap_sem){++++++}, at: [<ffffffff810e3702>] vm_munmap+0x35/0x5c
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: which lock already depends on the new lock.
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: the existing dependency chain (in reverse order) is:
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: -> #1 (&mm->mmap_sem){++++++}:
Sep 19 02:00:40 bs341 kernel:       [<ffffffff8107222b>] lock_acquire+0xbe/0x101
Sep 19 02:00:40 bs341 kernel:       [<ffffffff810dcbef>] might_fault+0x68/0x8b
Sep 19 02:00:40 bs341 kernel:       [<ffffffff813abdbd>] copy_from_user+0x19/0x2c
Sep 19 02:00:40 bs341 kernel:       [<ffffffff813b1375>] dev_ioctl+0x57/0x683
Sep 19 02:00:40 bs341 kernel:       [<ffffffff81399ef5>] sock_do_ioctl+0x38/0x43
Sep 19 02:00:40 bs341 kernel:       [<ffffffff8139a320>] sock_ioctl+0x20e/0x21d
Sep 19 02:00:40 bs341 kernel:       [<ffffffff811115a2>] do_vfs_ioctl+0x488/0x4c9
Sep 19 02:00:40 bs341 kernel:       [<ffffffff81111634>] sys_ioctl+0x51/0x75
Sep 19 02:00:40 bs341 kernel:       [<ffffffff8148f2b9>] system_call_fastpath+0x16/0x1b
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: -> #0 (rtnl_mutex){+.+.+.}:
Sep 19 02:00:40 bs341 kernel:       [<ffffffff810719fa>] __lock_acquire+0xb41/0xe50
Sep 19 02:00:40 bs341 kernel:       [<ffffffff8107222b>] lock_acquire+0xbe/0x101
Sep 19 02:00:40 bs341 kernel:       [<ffffffff814862c2>] __mutex_lock_common+0x4c/0x392
Sep 19 02:00:40 bs341 kernel:       [<ffffffff81486667>] mutex_lock_nested+0x16/0x18
Sep 19 02:00:40 bs341 kernel:       [<ffffffff813bccfe>] rtnl_lock+0x12/0x14
Sep 19 02:00:40 bs341 kernel:       [<ffffffff814684f8>] packet_release+0xd7/0x267
Sep 19 02:00:40 bs341 kernel:       [<ffffffff8139c429>] sock_release+0x1a/0x77
Sep 19 02:00:40 bs341 kernel:       [<ffffffff8139c4a8>] sock_close+0x22/0x26
Sep 19 02:00:40 bs341 kernel:       [<ffffffff81103c65>] fput+0xff/0x1d4
Sep 19 02:00:40 bs341 kernel:       [<ffffffff810e26ee>] remove_vma+0x37/0x6c
Sep 19 02:00:40 bs341 kernel:       [<ffffffff810e36b4>] do_munmap+0x2ed/0x306
Sep 19 02:00:40 bs341 kernel:       [<ffffffff810e3710>] vm_munmap+0x43/0x5c
Sep 19 02:00:40 bs341 kernel:       [<ffffffff810e3851>] sys_munmap+0x21/0x2a
Sep 19 02:00:40 bs341 kernel:       [<ffffffff814908a6>] sysenter_dispatch+0x7/0x26
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: other info that might help us debug this:
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: Possible unsafe locking scenario:
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel:       CPU0                    CPU1
Sep 19 02:00:40 bs341 kernel:       ----                    ----
Sep 19 02:00:40 bs341 kernel:  lock(&mm->mmap_sem);
Sep 19 02:00:40 bs341 kernel:                               lock(rtnl_mutex);
Sep 19 02:00:40 bs341 kernel:                               lock(&mm->mmap_sem);
Sep 19 02:00:40 bs341 kernel:  lock(rtnl_mutex);
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: *** DEADLOCK ***
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: 1 lock held by [manager]/6158:
Sep 19 02:00:40 bs341 kernel: #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810e3702>] vm_munmap+0x35/0x5c
Sep 19 02:00:40 bs341 kernel:
Sep 19 02:00:40 bs341 kernel: stack backtrace:
Sep 19 02:00:40 bs341 kernel: Pid: 6158, comm: [manager] Not tainted 3.4.43-1432884.2010AroraNoureddine.7.fc14.x86_64 #1
Sep 19 02:00:40 bs341 kernel: Call Trace:
Sep 19 02:00:40 bs341 kernel: [<ffffffff8106e906>] print_circular_bug+0x1f8/0x209
Sep 19 02:00:40 bs341 kernel: [<ffffffff810719fa>] __lock_acquire+0xb41/0xe50
Sep 19 02:00:40 bs341 kernel: [<ffffffff8100d93c>] ? save_stack_trace+0x2a/0x47
Sep 19 02:00:40 bs341 kernel: [<ffffffff813bccfe>] ? rtnl_lock+0x12/0x14
Sep 19 02:00:40 bs341 kernel: [<ffffffff8107222b>] lock_acquire+0xbe/0x101
Sep 19 02:00:40 bs341 kernel: [<ffffffff813bccfe>] ? rtnl_lock+0x12/0x14
Sep 19 02:00:40 bs341 kernel: [<ffffffff814862c2>] __mutex_lock_common+0x4c/0x392
Sep 19 02:00:40 bs341 kernel: [<ffffffff813bccfe>] ? rtnl_lock+0x12/0x14
Sep 19 02:00:40 bs341 kernel: [<ffffffff8106da2a>] ? trace_hardirqs_off+0xd/0xf
Sep 19 02:00:40 bs341 kernel: [<ffffffff8105c972>] ? local_clock+0x36/0x4d
Sep 19 02:00:40 bs341 kernel: [<ffffffff813bccfe>] ? rtnl_lock+0x12/0x14
Sep 19 02:00:40 bs341 kernel: [<ffffffff81072144>] ? lock_release+0x1d8/0x201
Sep 19 02:00:40 bs341 kernel: [<ffffffff81486667>] mutex_lock_nested+0x16/0x18
Sep 19 02:00:40 bs341 kernel: [<ffffffff813bccfe>] rtnl_lock+0x12/0x14
Sep 19 02:00:40 bs341 kernel: [<ffffffff814684f8>] packet_release+0xd7/0x267
Sep 19 02:00:40 bs341 kernel: [<ffffffff8139c429>] sock_release+0x1a/0x77
Sep 19 02:00:40 bs341 kernel: [<ffffffff8139c4a8>] sock_close+0x22/0x26
Sep 19 02:00:40 bs341 kernel: [<ffffffff81103c65>] fput+0xff/0x1d4
Sep 19 02:00:40 bs341 kernel: [<ffffffff810e26ee>] remove_vma+0x37/0x6c
Sep 19 02:00:40 bs341 kernel: [<ffffffff810e36b4>] do_munmap+0x2ed/0x306
Sep 19 02:00:40 bs341 kernel: [<ffffffff810e3710>] vm_munmap+0x43/0x5c
Sep 19 02:00:40 bs341 kernel: [<ffffffff810e3851>] sys_munmap+0x21/0x2a
Sep 19 02:00:40 bs341 kernel: [<ffffffff814908a6>] sysenter_dispatch+0x7/0x26
Sep 19 02:00:40 bs341 kernel: [<ffffffff812467de>] ? trace_hardirqs_on_thunk+0x3a/0x3f

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2
  2013-09-18  3:52                           ` David Miller
  2013-09-18  8:19                             ` Eric W. Biederman
@ 2013-09-24  4:19                             ` Eric W. Biederman
  2013-09-24 17:54                               ` Francesco Ruggeri
  2013-09-28 22:14                               ` David Miller
  1 sibling, 2 replies; 36+ messages in thread
From: Eric W. Biederman @ 2013-09-24  4:19 UTC (permalink / raw)
  To: David Miller; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev


There is currently serialization network namespaces exiting and
network devices exiting as the final part of netdev_run_todo does not
happen under the rtnl_lock.  This is compounded by the fact that the
only list of devices unregistering in netdev_run_todo is local to the
netdev_run_todo.

This lack of serialization in extreme cases results in network devices
unregistering in netdev_run_todo after the loopback device of their
network namespace has been freed (making dst_ifdown unsafe), and after
the their network namespace has exited (making the NETDEV_UNREGISTER,
and NETDEV_UNREGISTER_FINAL callbacks unsafe).

Add the missing serialization by a per network namespace count of how
many network devices are unregistering and having a wait queue that is
woken up whenever the count is decreased.  The count and wait queue
allow default_device_exit_batch to wait until all of the unregistration
activity for a network namespace has finished before proceeding to
unregister the loopback device and then allowing the network namespace
to exit.

Only a single global wait queue is used because there is a single global
lock, and there is a single waiter, per network namespace wait queues
would be a waste of resources.

The per network namespace count of unregistering devices gives a
progress guarantee because the number of network devices unregistering
in an exiting network namespace must ultimately drop to zero (assuming
network device unregistration completes).

The basic logic remains the same as in v1.  This patch is now half
comment and half rtnl_lock_unregistering an expanded version of
wait_event performs no extra work in the common case where no network
devices are unregistering when we get to default_device_exit_batch.

Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/net/net_namespace.h |    1 +
 net/core/dev.c              |   49 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 84e37b1..be46311 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -74,6 +74,7 @@ struct net {
 	struct hlist_head	*dev_index_head;
 	unsigned int		dev_base_seq;	/* protected by rtnl_mutex */
 	int			ifindex;
+	unsigned int		dev_unreg_count;
 
 	/* core fib_rules */
 	struct list_head	rules_ops;
diff --git a/net/core/dev.c b/net/core/dev.c
index 5d702fe..7fa75e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5002,10 +5002,12 @@ static int dev_new_index(struct net *net)
 
 /* Delayed registration/unregisteration */
 static LIST_HEAD(net_todo_list);
+static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq);
 
 static void net_set_todo(struct net_device *dev)
 {
 	list_add_tail(&dev->todo_list, &net_todo_list);
+	dev_net(dev)->dev_unreg_count++;
 }
 
 static void rollback_registered_many(struct list_head *head)
@@ -5673,6 +5675,12 @@ void netdev_run_todo(void)
 		if (dev->destructor)
 			dev->destructor(dev);
 
+		/* Report a network device has been unregistered */
+		rtnl_lock();
+		dev_net(dev)->dev_unreg_count--;
+		__rtnl_unlock();
+		wake_up(&netdev_unregistering_wq);
+
 		/* Free network device */
 		kobject_put(&dev->dev.kobj);
 	}
@@ -6358,6 +6366,34 @@ static void __net_exit default_device_exit(struct net *net)
 	rtnl_unlock();
 }
 
+static void __net_exit rtnl_lock_unregistering(struct list_head *net_list)
+{
+	/* Return with the rtnl_lock held when there are no network
+	 * devices unregistering in any network namespace in net_list.
+	 */
+	struct net *net;
+	bool unregistering;
+	DEFINE_WAIT(wait);
+
+	for (;;) {
+		prepare_to_wait(&netdev_unregistering_wq, &wait,
+				TASK_UNINTERRUPTIBLE);
+		unregistering = false;
+		rtnl_lock();
+		list_for_each_entry(net, net_list, exit_list) {
+			if (net->dev_unreg_count > 0) {
+				unregistering = true;
+				break;
+			}
+		}
+		if (!unregistering)
+			break;
+		__rtnl_unlock();
+		schedule();
+	}
+	finish_wait(&netdev_unregistering_wq, &wait);
+}
+
 static void __net_exit default_device_exit_batch(struct list_head *net_list)
 {
 	/* At exit all network devices most be removed from a network
@@ -6369,7 +6405,18 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 	struct net *net;
 	LIST_HEAD(dev_kill_list);
 
-	rtnl_lock();
+	/* To prevent network device cleanup code from dereferencing
+	 * loopback devices or network devices that have been freed
+	 * wait here for all pending unregistrations to complete,
+	 * before unregistring the loopback device and allowing the
+	 * network namespace be freed.
+	 *
+	 * The netdev todo list containing all network devices
+	 * unregistrations that happen in default_device_exit_batch
+	 * will run in the rtnl_unlock() at the end of
+	 * default_device_exit_batch.
+	 */
+	rtnl_lock_unregistering(net_list);
 	list_for_each_entry(net, net_list, exit_list) {
 		for_each_netdev_reverse(net, dev) {
 			if (dev->rtnl_link_ops)
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2
  2013-09-24  4:19                             ` [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2 Eric W. Biederman
@ 2013-09-24 17:54                               ` Francesco Ruggeri
  2013-09-28 22:14                               ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-24 17:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, Eric Dumazet, Jiří Pírko,
	Alexander Duyck, Cong Wang, netdev

The patch looks fine to me.
Thanks again for looking into this.

Francesco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2
  2013-09-24  4:19                             ` [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2 Eric W. Biederman
  2013-09-24 17:54                               ` Francesco Ruggeri
@ 2013-09-28 22:14                               ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2013-09-28 22:14 UTC (permalink / raw)
  To: ebiederm; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 23 Sep 2013 21:19:49 -0700

> 
> There is currently serialization network namespaces exiting and
> network devices exiting as the final part of netdev_run_todo does not
> happen under the rtnl_lock.  This is compounded by the fact that the
> only list of devices unregistering in netdev_run_todo is local to the
> netdev_run_todo.
> 
> This lack of serialization in extreme cases results in network devices
> unregistering in netdev_run_todo after the loopback device of their
> network namespace has been freed (making dst_ifdown unsafe), and after
> the their network namespace has exited (making the NETDEV_UNREGISTER,
> and NETDEV_UNREGISTER_FINAL callbacks unsafe).
> 
> Add the missing serialization by a per network namespace count of how
> many network devices are unregistering and having a wait queue that is
> woken up whenever the count is decreased.  The count and wait queue
> allow default_device_exit_batch to wait until all of the unregistration
> activity for a network namespace has finished before proceeding to
> unregister the loopback device and then allowing the network namespace
> to exit.
> 
> Only a single global wait queue is used because there is a single global
> lock, and there is a single waiter, per network namespace wait queues
> would be a waste of resources.
> 
> The per network namespace count of unregistering devices gives a
> progress guarantee because the number of network devices unregistering
> in an exiting network namespace must ultimately drop to zero (assuming
> network device unregistration completes).
> 
> The basic logic remains the same as in v1.  This patch is now half
> comment and half rtnl_lock_unregistering an expanded version of
> wait_event performs no extra work in the common case where no network
> devices are unregistering when we get to default_device_exit_batch.
> 
> Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Applied, thanks for following up on this Eric.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-10  0:57 ` Stephen Hemminger
@ 2013-09-10  2:03   ` Francesco Ruggeri
  0 siblings, 0 replies; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-10  2:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Mon, Sep 9, 2013 at 5:57 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon,  9 Sep 2013 16:15:11 -0700
> Francesco Ruggeri <fruggeri@aristanetworks.com> wrote:
>
> > There is a race condition when removing a net_device while its net namespace
> > is also being removed.
> > This can result in a crash or other unpredictable behavior.
> > This is a sample scenario with veth, but the same applies to other virtual
> > devices such as vlan and macvlan.
> > veth pair v0-v1 is created, with v0 in namespace ns0 and v1 in ns1.
> > Process p0 deletes v0. v1 is also unregistered (in veth_dellink), so when p0
> > gets to netdev_run_todo both v0 and v1 are in net_todo_list, and they have both
> > been unlisted from their respective namespaces. If all references to v1 have not
> > already been dropped then netdev_run_todo/netdev_wait_allrefs will call netdev
> > notifiers for v1, releasing the rtnl lock between calls.
> > Now process p1 removes namespace ns1. v1 will not prevent this from happening
> > (in default_device_exit_batch) since it was already unlisted by p0.
> > Next time p0 invokes the notifiers for v1 any notifiers that use dev_net(v1)
> > will get a pointer to a namespace that has been or is being removed.
>
> Namespace's should be ref counted.


Namespaces are refcounted, but rollback_registered_many is also
invoked when a namespace is torn down. That is triggered in put_net
exactly by the namespace refcount dropping to 0, which then triggers
__put_net, cleanup_net, ops_exit_list, default_device_exit_batch and
unregister_netdevice_many. In that case using the namespace refcount
would not prevent the namespace from being deleted, since we already
passed the point (in put_net) where the refcount would have had that
effect. It would probably also not be straightforward to start using
again the namespace refcount in code that started under the premise
that the refcount had dropped to 0.
net_devices in a namespace do not seem to take references to the
namespace that they are in, and as far as I can tell the only thing
that prevents a namespace form being removed from under its
net_devices' feet is default_device_exit or default_device_exit_batch.
But unlist_netdevice creates a window where that logic fails.
In addition to this, in general we can have several instances of
netdev_run_todo running concurrently with any number of net_devices
from any number of net namespaces, possibly cross-referencing each
other. Some of these instances may be in the context of cleanup_net
and some may not.
Taking a refcount on the loopback_dev and making sure the looback_devs
are at the tail of net_todo_list in each of these instances was the
only way I could come up with to address all scenarios, but there may
be better ways to do it.

Thanks,
Francesco

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
  2013-09-09 23:15 Francesco Ruggeri
@ 2013-09-10  0:57 ` Stephen Hemminger
  2013-09-10  2:03   ` Francesco Ruggeri
  0 siblings, 1 reply; 36+ messages in thread
From: Stephen Hemminger @ 2013-09-10  0:57 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev

On Mon,  9 Sep 2013 16:15:11 -0700
Francesco Ruggeri <fruggeri@aristanetworks.com> wrote:

> There is a race condition when removing a net_device while its net namespace
> is also being removed.
> This can result in a crash or other unpredictable behavior.
> This is a sample scenario with veth, but the same applies to other virtual
> devices such as vlan and macvlan.
> veth pair v0-v1 is created, with v0 in namespace ns0 and v1 in ns1.
> Process p0 deletes v0. v1 is also unregistered (in veth_dellink), so when p0
> gets to netdev_run_todo both v0 and v1 are in net_todo_list, and they have both
> been unlisted from their respective namespaces. If all references to v1 have not
> already been dropped then netdev_run_todo/netdev_wait_allrefs will call netdev
> notifiers for v1, releasing the rtnl lock between calls.
> Now process p1 removes namespace ns1. v1 will not prevent this from happening
> (in default_device_exit_batch) since it was already unlisted by p0.
> Next time p0 invokes the notifiers for v1 any notifiers that use dev_net(v1)
> will get a pointer to a namespace that has been or is being removed.

Namespace's should be ref counted.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/1] net: race condition when removing virtual net_device
@ 2013-09-09 23:15 Francesco Ruggeri
  2013-09-10  0:57 ` Stephen Hemminger
  0 siblings, 1 reply; 36+ messages in thread
From: Francesco Ruggeri @ 2013-09-09 23:15 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev
  Cc: Francesco Ruggeri

There is a race condition when removing a net_device while its net namespace
is also being removed.
This can result in a crash or other unpredictable behavior.
This is a sample scenario with veth, but the same applies to other virtual
devices such as vlan and macvlan.
veth pair v0-v1 is created, with v0 in namespace ns0 and v1 in ns1.
Process p0 deletes v0. v1 is also unregistered (in veth_dellink), so when p0
gets to netdev_run_todo both v0 and v1 are in net_todo_list, and they have both
been unlisted from their respective namespaces. If all references to v1 have not
already been dropped then netdev_run_todo/netdev_wait_allrefs will call netdev
notifiers for v1, releasing the rtnl lock between calls.
Now process p1 removes namespace ns1. v1 will not prevent this from happening
(in default_device_exit_batch) since it was already unlisted by p0.
Next time p0 invokes the notifiers for v1 any notifiers that use dev_net(v1)
will get a pointer to a namespace that has been or is being removed.
Similar scenarios apply with v1 as a vlan or macvlan interface and v0 as its
real device.
We hit this problem in 3.4 with sequence fib_netdev_event, fib_disable_ip,
rt_cache_flush, rt_cache_invalidate, inetpeer_invalidate_tree. That sequence no
longer applies in later kernels, but unless we can guarantee that no
NETDEV_UNREGISTER or NETDEV_UNREGISTER_FINAL handler accesses a net_device's
dev_net(dev) then there is a vulnerability (this happens for example with
NETDEV_UNREGISTER_FINAL in dst_dev_event/dst_ifdown).
Commit 0115e8e3 later made things better by reducing the chances of this
happening, but the underlying problem still seems to be there.
I would like to get some feedback on this patch.
The idea is to take a reference to the loopback_dev of a net_device's
namespace (which is always the last net_device to be removed when a namespace
is destroyed) when the net_device is unlisted, and release it when the
net_device is disposed of.
To avoid deadlocks in cleanup_net all loopback_devs are queued at the end
of net_todo_list.

Tested on Linux 3.4.

Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
---
 net/core/dev.c |   30 +++++++++++++++++++++++++++++-
 1 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 26755dd..da2fd78 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -225,10 +225,14 @@ static void list_netdevice(struct net_device *dev)
 }
 
 /* Device list removal
+ * Take a reference to dev_net(dev)->loopback_dev, so dev_net(dev)
+ * will not be freed until we are done with dev.
  * caller must respect a RCU grace period before freeing/reusing dev
  */
 static void unlist_netdevice(struct net_device *dev)
 {
+	struct net_device *loopback_dev = dev_net(dev)->loopback_dev;
+
 	ASSERT_RTNL();
 
 	/* Unlink dev from the device chain */
@@ -238,9 +242,23 @@ static void unlist_netdevice(struct net_device *dev)
 	hlist_del_rcu(&dev->index_hlist);
 	write_unlock_bh(&dev_base_lock);
 
+	if (dev != loopback_dev)
+		dev_hold(loopback_dev);
+
 	dev_base_seq_inc(dev_net(dev));
 }
 
+/**
+ * Called when a net_device that has been previously unlisted from a net
+ * namespace is disposed of.
+ */
+static inline void unlist_netdevice_done(struct net_device *dev)
+{
+	struct net_device *loopback_dev = dev_net(dev)->loopback_dev;
+	if (dev != loopback_dev)
+		dev_put(loopback_dev);
+}
+
 /*
  *	Our notifier list
  */
@@ -5009,10 +5027,17 @@ static int dev_new_index(struct net *net)
 
 /* Delayed registration/unregisteration */
 static LIST_HEAD(net_todo_list);
+static struct list_head *first_loopback_dev = &net_todo_list;
 
 static void net_set_todo(struct net_device *dev)
 {
-	list_add_tail(&dev->todo_list, &net_todo_list);
+	/* All loopback_devs go to end of net_todo_list. */
+	if (dev == dev_net(dev)->loopback_dev) {
+		list_add_tail(&dev->todo_list, &net_todo_list);
+		if (first_loopback_dev == &net_todo_list)
+			first_loopback_dev = &dev->todo_list;
+	} else
+		list_add_tail(&dev->todo_list, first_loopback_dev);
 }
 
 static void rollback_registered_many(struct list_head *head)
@@ -5641,6 +5666,7 @@ void netdev_run_todo(void)
 
 	/* Snapshot list, allow later requests */
 	list_replace_init(&net_todo_list, &list);
+	first_loopback_dev = &net_todo_list;
 
 	__rtnl_unlock();
 
@@ -5670,6 +5696,7 @@ void netdev_run_todo(void)
 		on_each_cpu(flush_backlog, dev, 1);
 
 		netdev_wait_allrefs(dev);
+		unlist_netdevice_done(dev);
 
 		/* paranoia */
 		BUG_ON(netdev_refcnt_read(dev));
@@ -6076,6 +6103,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
 
 	/* Actually switch the network namespace */
+	unlist_netdevice_done(dev);
 	dev_net_set(dev, net);
 
 	/* If there is an ifindex conflict assign a new one */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2013-09-28 22:07 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1379008796-2121-1-git-send-email-fruggeri@aristanetworks.com>
2013-09-12 20:06 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
2013-09-12 21:48   ` Francesco Ruggeri
2013-09-12 22:02     ` Francesco Ruggeri
2013-09-13  5:50     ` Eric W. Biederman
2013-09-13 17:54       ` Francesco Ruggeri
2013-09-14  1:46         ` Eric W. Biederman
2013-09-16  2:54           ` Francesco Ruggeri
2013-09-16 10:45             ` Eric W. Biederman
2013-09-16 20:30               ` Francesco Ruggeri
2013-09-16 23:52                 ` [PATCH net-next] net loopback: Set loopback_dev to NULL when freed Eric W. Biederman
2013-09-17  0:50                   ` Eric Dumazet
2013-09-17  1:34                     ` David Miller
2013-09-17  1:41                       ` Eric Dumazet
2013-09-17  1:52                         ` Eric W. Biederman
2013-09-17 23:05                           ` David Miller
2013-09-17  0:25                 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
2013-09-17  5:12                   ` Francesco Ruggeri
2013-09-17  3:49                 ` [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering Eric W. Biederman
2013-09-17  6:54                   ` Francesco Ruggeri
2013-09-17  9:38                     ` Eric W. Biederman
2013-09-17 17:14                       ` Francesco Ruggeri
2013-09-17 23:21                   ` David Miller
2013-09-17 23:41                     ` Eric W. Biederman
2013-09-18  0:15                       ` David Miller
2013-09-18  3:50                         ` Francesco Ruggeri
2013-09-18  3:52                           ` David Miller
2013-09-18  8:19                             ` Eric W. Biederman
2013-09-20 16:34                               ` Francesco Ruggeri
2013-09-24  4:19                             ` [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2 Eric W. Biederman
2013-09-24 17:54                               ` Francesco Ruggeri
2013-09-28 22:14                               ` David Miller
2013-09-14  0:16   ` [PATCH 1/1] net: race condition when removing virtual net_device David Miller
2013-09-14  1:32     ` Eric W. Biederman
2013-09-09 23:15 Francesco Ruggeri
2013-09-10  0:57 ` Stephen Hemminger
2013-09-10  2:03   ` Francesco Ruggeri

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.