All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/20] Batch network namespace cleanup
@ 2009-11-30  1:46 Eric W. Biederman
  2009-11-30  8:07 ` Eric Dumazet
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  1:46 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jamal, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy



Recently Jamal and Daniel perform some experiments and found that
large numbers of network namespace exiting simultaneously is very
inefficient.  24+ minutes in some configurations.  The cpu overhead
was negligible but it results in long hold times of net_mutex, and
memory being consumed a long time after the last user has gone away.

I looked into it and discovered that by batching network namespace
cleanups I can reduce the time for 4k network namespaces exiting from
5-7 minutes in my configuration to 44 seconds.

This patch series is my set of changes to the network namespace core
and associated cleanups to allow for network namespace batching.

Eric


 drivers/net/bonding/bond_main.c         |   32 +---
 drivers/net/loopback.c                  |    8 -
 drivers/net/ppp_generic.c               |   30 +---
 drivers/net/pppoe.c                     |   38 +----
 drivers/net/pppol2tp.c                  |   36 +----
 include/linux/netdevice.h               |    2 +
 include/linux/notifier.h                |    2 +-
 include/net/net_namespace.h             |    8 +-
 include/net/netns/generic.h             |    8 +-
 include/net/route.h                     |    1 +
 net/8021q/vlan.c                        |   33 +---
 net/core/dev.c                          |   52 ++-----
 net/core/net_namespace.c                |  254 +++++++++++++++++++------------
 net/ipv4/fib_frontend.c                 |    4 +-
 net/ipv4/ip_gre.c                       |   24 +---
 net/ipv4/ipip.c                         |   24 +---
 net/ipv4/route.c                        |    6 +
 net/ipv6/ip6_tunnel.c                   |   25 +---
 net/ipv6/sit.c                          |   25 +---
 net/key/af_key.c                        |   25 +---
 net/netfilter/nf_conntrack_proto_dccp.c |   31 +---
 net/netfilter/nf_conntrack_proto_gre.c  |   20 +--
 net/phonet/pn_dev.c                     |   16 +--
 23 files changed, 284 insertions(+), 420 deletions(-)

Eric W. Biederman (20):
      net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCH
      net: Implement for_each_netdev_reverse.
      net: Batch network namespace destruction.
      net: Automatically allocate per namespace data.
      net: Simplify loopback and improve batching.
      net: Simplfy default_device_exit and improve batching.
      net: Simplify the bond drivers pernet operations.
      net: Simplify vlan pernet operations.
      net: Simplify af_key pernet operations.
      net: Simplify conntrack_proto_dccp pernet operations.
      net: Simplify conntrack_proto_gre pernet operations.
      net: Simplify ppp_generic pernet operations.
      net: Simplify pppoe pernet operations.
      net: Simplify pppol2tp pernet operations.
      net: Simplify phonet pernet operations.
      net: Simplify ip_gre pernet operations.
      net: Simplify ipip pernet operations.
      net: Simplify ip6_tunnel pernet operations.
      net: Simplify ipip6 aka sit pernet operations.
      net: remove [un]register_pernet_gen_... and update the docs.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
@ 2009-11-30  8:07 ` Eric Dumazet
  2009-11-30  8:09   ` David Miller
  2009-11-30  8:25 ` [PATCH 02/20] net: Implement for_each_netdev_reverse Eric W. Biederman
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2009-11-30  8:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, netdev, jamal, Daniel Lezcano, Alexey Dobriyan,
	Patrick McHardy

Eric W. Biederman a écrit :

> Eric W. Biederman (20):
>       net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCH
>       net: Implement for_each_netdev_reverse.
>       net: Batch network namespace destruction.
>       net: Automatically allocate per namespace data.
>       net: Simplify loopback and improve batching.
>       net: Simplfy default_device_exit and improve batching.
>       net: Simplify the bond drivers pernet operations.
>       net: Simplify vlan pernet operations.
>       net: Simplify af_key pernet operations.
>       net: Simplify conntrack_proto_dccp pernet operations.
>       net: Simplify conntrack_proto_gre pernet operations.
>       net: Simplify ppp_generic pernet operations.
>       net: Simplify pppoe pernet operations.
>       net: Simplify pppol2tp pernet operations.
>       net: Simplify phonet pernet operations.
>       net: Simplify ip_gre pernet operations.
>       net: Simplify ipip pernet operations.
>       net: Simplify ip6_tunnel pernet operations.
>       net: Simplify ipip6 aka sit pernet operations.
>       net: remove [un]register_pernet_gen_... and update the docs.
> 

It seems list received part of your patches Eric

01, 07-20 are OK,   02-06 are missing

Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-11-30  8:07 ` Eric Dumazet
@ 2009-11-30  8:09   ` David Miller
  2009-11-30  8:17     ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2009-11-30  8:09 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ebiederm, netdev, hadi, dlezcano, adobriyan, kaber

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 30 Nov 2009 09:07:39 +0100

> It seems list received part of your patches Eric
> 
> 01, 07-20 are OK,   02-06 are missing

Not just the list, I didn't get copies of the missing patches through
direct email even though I was on the CC: list.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-11-30  8:09   ` David Miller
@ 2009-11-30  8:17     ` Eric W. Biederman
  2009-11-30  8:48       ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  8:17 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, hadi, dlezcano, adobriyan, kaber

David Miller <davem@davemloft.net> writes:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 30 Nov 2009 09:07:39 +0100
>
>> It seems list received part of your patches Eric
>> 
>> 01, 07-20 are OK,   02-06 are missing
>
> Not just the list, I didn't get copies of the missing patches through
> direct email even though I was on the CC: list.

Thanks.  Weird.  I will take a look and resend.

Eric


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 02/20] net: Implement for_each_netdev_reverse.
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
  2009-11-30  8:07 ` Eric Dumazet
@ 2009-11-30  8:25 ` Eric W. Biederman
  2009-11-30  8:25 ` [PATCH 03/20] net: Batch network namespace destruction Eric W. Biederman
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  8:25 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jamal, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

I will need this shortly to implement network namespace shutdown
batching.  For sanity sake network devices should be removed in
the reverse order they were created in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/netdevice.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9428793..daf13d3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1112,6 +1112,8 @@ extern rwlock_t				dev_base_lock;		/* Device list lock */
 
 #define for_each_netdev(net, d)		\
 		list_for_each_entry(d, &(net)->dev_base_head, dev_list)
+#define for_each_netdev_reverse(net, d)	\
+		list_for_each_entry_reverse(d, &(net)->dev_base_head, dev_list)
 #define for_each_netdev_rcu(net, d)		\
 		list_for_each_entry_rcu(d, &(net)->dev_base_head, dev_list)
 #define for_each_netdev_safe(net, d, n)	\
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 03/20] net: Batch network namespace destruction.
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
  2009-11-30  8:07 ` Eric Dumazet
  2009-11-30  8:25 ` [PATCH 02/20] net: Implement for_each_netdev_reverse Eric W. Biederman
@ 2009-11-30  8:25 ` Eric W. Biederman
  2009-11-30  8:25 ` [PATCH 04/20] net: Automatically allocate per namespace data Eric W. Biederman
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  8:25 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jamal, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy,
	Eric W. Biederman, Eric W. Biederman

From: Eric W. Biederman <eric@conroxe.ebiederm.org>

It is fairly common to kill several network namespaces at once.  Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments.  As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.

To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch.  For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.

An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.

In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu).  The bulk of that 44s was spent in inet_twsk_purge.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/net_namespace.h |    2 +-
 net/core/net_namespace.c    |   66 +++++++++++++++++++++++++++++++++++++-----
 2 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0addd45..d69b479 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -42,7 +42,7 @@ struct net {
 						 */
 #endif
 	struct list_head	list;		/* list of network namespaces */
-	struct work_struct	work;		/* work struct for freeing */
+	struct list_head	cleanup_list;	/* namespaces on death row */
 
 	struct proc_dir_entry 	*proc_net;
 	struct proc_dir_entry 	*proc_net_stat;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 86ed7f4..a42caa2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -8,8 +8,10 @@
 #include <linux/idr.h>
 #include <linux/rculist.h>
 #include <linux/nsproxy.h>
+#include <linux/netdevice.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#include <net/rtnetlink.h>
 
 /*
  *	Our network namespace constructor/destructor lists
@@ -27,6 +29,20 @@ EXPORT_SYMBOL(init_net);
 
 #define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */
 
+static void unregister_netdevices(struct net *net, struct list_head *list)
+{
+	struct net_device *dev;
+	/* At exit all network devices most be removed from a network
+	 * namespace.  Do this in the reverse order of registeration.
+	 */
+	for_each_netdev_reverse(net, dev) {
+		if (dev->rtnl_link_ops)
+			dev->rtnl_link_ops->dellink(dev, list);
+		else
+			unregister_netdevice_queue(dev, list);
+	}
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -59,6 +75,13 @@ out_undo:
 	list_for_each_entry_continue_reverse(ops, &pernet_list, list) {
 		if (ops->exit)
 			ops->exit(net);
+		if (&ops->list == first_device) {
+			LIST_HEAD(dev_kill_list);
+			rtnl_lock();
+			unregister_netdevices(net, &dev_kill_list);
+			unregister_netdevice_many(&dev_kill_list);
+			rtnl_unlock();
+		}
 	}
 
 	rcu_barrier();
@@ -147,18 +170,26 @@ struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 	return net_create();
 }
 
+static DEFINE_SPINLOCK(cleanup_list_lock);
+static LIST_HEAD(cleanup_list);  /* Must hold cleanup_list_lock to touch */
+
 static void cleanup_net(struct work_struct *work)
 {
 	struct pernet_operations *ops;
-	struct net *net;
+	struct net *net, *tmp;
+	LIST_HEAD(net_kill_list);
 
-	net = container_of(work, struct net, work);
+	/* Atomically snapshot the list of namespaces to cleanup */
+	spin_lock_irq(&cleanup_list_lock);
+	list_replace_init(&cleanup_list, &net_kill_list);
+	spin_unlock_irq(&cleanup_list_lock);
 
 	mutex_lock(&net_mutex);
 
 	/* Don't let anyone else find us. */
 	rtnl_lock();
-	list_del_rcu(&net->list);
+	list_for_each_entry(net, &net_kill_list, cleanup_list)
+		list_del_rcu(&net->list);
 	rtnl_unlock();
 
 	/*
@@ -170,8 +201,18 @@ static void cleanup_net(struct work_struct *work)
 
 	/* Run all of the network namespace exit methods */
 	list_for_each_entry_reverse(ops, &pernet_list, list) {
-		if (ops->exit)
-			ops->exit(net);
+		if (ops->exit) {
+			list_for_each_entry(net, &net_kill_list, cleanup_list)
+				ops->exit(net);
+		}
+		if (&ops->list == first_device) {
+			LIST_HEAD(dev_kill_list);
+			rtnl_lock();
+			list_for_each_entry(net, &net_kill_list, cleanup_list)
+				unregister_netdevices(net, &dev_kill_list);
+			unregister_netdevice_many(&dev_kill_list);
+			rtnl_unlock();
+		}
 	}
 
 	mutex_unlock(&net_mutex);
@@ -182,14 +223,23 @@ static void cleanup_net(struct work_struct *work)
 	rcu_barrier();
 
 	/* Finally it is safe to free my network namespace structure */
-	net_free(net);
+	list_for_each_entry_safe(net, tmp, &net_kill_list, cleanup_list) {
+		list_del_init(&net->cleanup_list);
+		net_free(net);
+	}
 }
+static DECLARE_WORK(net_cleanup_work, cleanup_net);
 
 void __put_net(struct net *net)
 {
 	/* Cleanup the network namespace in process context */
-	INIT_WORK(&net->work, cleanup_net);
-	queue_work(netns_wq, &net->work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&cleanup_list_lock, flags);
+	list_add(&net->cleanup_list, &cleanup_list);
+	spin_unlock_irqrestore(&cleanup_list_lock, flags);
+
+	queue_work(netns_wq, &net_cleanup_work);
 }
 EXPORT_SYMBOL_GPL(__put_net);
 
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 04/20] net: Automatically allocate per namespace data.
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
                   ` (2 preceding siblings ...)
  2009-11-30  8:25 ` [PATCH 03/20] net: Batch network namespace destruction Eric W. Biederman
@ 2009-11-30  8:25 ` Eric W. Biederman
  2009-11-30  8:25 ` [PATCH 05/20] net: Simplify loopback and improve batching Eric W. Biederman
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  8:25 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jamal, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

To get the full benefit of batched network namespace cleanup netowrk
device deletion needs to be performed by the generic code.  When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has called
as the device uninit methods are no longer safe.

To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core.  The core now frees the data only after
all of the network namespace exit routines have run.

Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.

This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary.  For the moment
I have left them as compatibility wrappers in net_namespace.h
They will be removed once all of the users have been updated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/net_namespace.h |   28 ++++++-
 net/core/net_namespace.c    |  188 +++++++++++++++++++++++--------------------
 2 files changed, 126 insertions(+), 90 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index d69b479..080774b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -236,6 +236,8 @@ struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
 	void (*exit)(struct net *net);
+	int *id;
+	size_t size;
 };
 
 /*
@@ -259,12 +261,30 @@ struct pernet_operations {
  */
 extern int register_pernet_subsys(struct pernet_operations *);
 extern void unregister_pernet_subsys(struct pernet_operations *);
-extern int register_pernet_gen_subsys(int *id, struct pernet_operations *);
-extern void unregister_pernet_gen_subsys(int id, struct pernet_operations *);
 extern int register_pernet_device(struct pernet_operations *);
 extern void unregister_pernet_device(struct pernet_operations *);
-extern int register_pernet_gen_device(int *id, struct pernet_operations *);
-extern void unregister_pernet_gen_device(int id, struct pernet_operations *);
+
+static inline int register_pernet_gen_subsys(int *id, struct pernet_operations *ops)
+{
+	ops->id = id;
+	return register_pernet_subsys(ops);
+}
+
+static inline void unregister_pernet_gen_subsys(int id, struct pernet_operations *ops)
+{
+	return unregister_pernet_subsys(ops);
+}
+
+static inline int register_pernet_gen_device(int *id, struct pernet_operations *ops)
+{
+	ops->id = id;
+	return register_pernet_device(ops);
+}
+
+static inline void unregister_pernet_gen_device(int id, struct pernet_operations *ops)
+{
+	return unregister_pernet_device(ops);
+}
 
 struct ctl_path;
 struct ctl_table;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index a42caa2..9679ad2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -43,13 +43,40 @@ static void unregister_netdevices(struct net *net, struct list_head *list)
 	}
 }
 
+static int ops_init(const struct pernet_operations *ops, struct net *net)
+{
+	int err;
+	if (ops->id && ops->size) {
+		void *data = kzalloc(ops->size, GFP_KERNEL);
+		if (!data)
+			return -ENOMEM;
+
+		err = net_assign_generic(net, *ops->id, data);
+		if (err) {
+			kfree(data);
+			return err;
+		}
+	}
+	if (ops->init)
+		return ops->init(net);
+	return 0;
+}
+
+static void ops_free(const struct pernet_operations *ops, struct net *net)
+{
+	if (ops->id && ops->size) {
+		int id = *ops->id;
+		kfree(net_generic(net, id));
+	}
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
 static __net_init int setup_net(struct net *net)
 {
 	/* Must be called with net_mutex held */
-	struct pernet_operations *ops;
+	const struct pernet_operations *ops, *saved_ops;
 	int error = 0;
 
 	atomic_set(&net->count, 1);
@@ -59,11 +86,9 @@ static __net_init int setup_net(struct net *net)
 #endif
 
 	list_for_each_entry(ops, &pernet_list, list) {
-		if (ops->init) {
-			error = ops->init(net);
-			if (error < 0)
-				goto out_undo;
-		}
+		error = ops_init(ops, net);
+		if (error < 0)
+			goto out_undo;
 	}
 out:
 	return error;
@@ -72,6 +97,7 @@ out_undo:
 	/* Walk through the list backwards calling the exit functions
 	 * for the pernet modules whose init functions did not fail.
 	 */
+	saved_ops = ops;
 	list_for_each_entry_continue_reverse(ops, &pernet_list, list) {
 		if (ops->exit)
 			ops->exit(net);
@@ -83,6 +109,9 @@ out_undo:
 			rtnl_unlock();
 		}
 	}
+	ops = saved_ops;
+	list_for_each_entry_continue_reverse(ops, &pernet_list, list)
+		ops_free(ops, net);
 
 	rcu_barrier();
 	goto out;
@@ -175,7 +204,7 @@ static LIST_HEAD(cleanup_list);  /* Must hold cleanup_list_lock to touch */
 
 static void cleanup_net(struct work_struct *work)
 {
-	struct pernet_operations *ops;
+	const struct pernet_operations *ops;
 	struct net *net, *tmp;
 	LIST_HEAD(net_kill_list);
 
@@ -214,6 +243,13 @@ static void cleanup_net(struct work_struct *work)
 			rtnl_unlock();
 		}
 	}
+	/* Free the net generic variables */
+	list_for_each_entry_reverse(ops, &pernet_list, list) {
+		if (ops->size && ops->id) {
+			list_for_each_entry(net, &net_kill_list, cleanup_list)
+				ops_free(ops, net);
+		}
+	}
 
 	mutex_unlock(&net_mutex);
 
@@ -309,16 +345,16 @@ static int __init net_ns_init(void)
 pure_initcall(net_ns_init);
 
 #ifdef CONFIG_NET_NS
-static int register_pernet_operations(struct list_head *list,
-				      struct pernet_operations *ops)
+static int __register_pernet_operations(struct list_head *list,
+					struct pernet_operations *ops)
 {
 	struct net *net, *undo_net;
 	int error;
 
 	list_add_tail(&ops->list, list);
-	if (ops->init) {
+	if (ops->init || (ops->id && ops->size)) {
 		for_each_net(net) {
-			error = ops->init(net);
+			error = ops_init(ops, net);
 			if (error)
 				goto out_undo;
 		}
@@ -336,10 +372,18 @@ out_undo:
 		}
 	}
 undone:
+	if (ops->size && ops->id) {
+		for_each_net(undo_net) {
+			if (net_eq(undo_net, net))
+				goto freed;
+			ops_free(ops, undo_net);
+		}
+	}
+freed:
 	return error;
 }
 
-static void unregister_pernet_operations(struct pernet_operations *ops)
+static void __unregister_pernet_operations(struct pernet_operations *ops)
 {
 	struct net *net;
 
@@ -347,27 +391,66 @@ static void unregister_pernet_operations(struct pernet_operations *ops)
 	if (ops->exit)
 		for_each_net(net)
 			ops->exit(net);
+	if (ops->id && ops->size)
+		for_each_net(net)
+			ops_free(ops, net);
 }
 
 #else
 
-static int register_pernet_operations(struct list_head *list,
-				      struct pernet_operations *ops)
+static int __register_pernet_operations(struct list_head *list,
+					struct pernet_operations *ops)
 {
-	if (ops->init == NULL)
-		return 0;
-	return ops->init(&init_net);
+	int err = 0;
+	err = ops_init(ops, &init_net);
+	if (err)
+		ops_free(ops, &init_net);
+	return err;
+	
 }
 
-static void unregister_pernet_operations(struct pernet_operations *ops)
+static void __unregister_pernet_operations(struct pernet_operations *ops)
 {
 	if (ops->exit)
 		ops->exit(&init_net);
+	ops_free(ops, &init_net);
 }
-#endif
+
+#endif /* CONFIG_NET_NS */
 
 static DEFINE_IDA(net_generic_ids);
 
+static int register_pernet_operations(struct list_head *list,
+				      struct pernet_operations *ops)
+{
+	int error;
+
+	if (ops->id) {
+again:
+		error = ida_get_new_above(&net_generic_ids, 1, ops->id);
+		if (error < 0) {
+			if (error == -EAGAIN) {
+				ida_pre_get(&net_generic_ids, GFP_KERNEL);
+				goto again;
+			}
+			return error;
+		}
+	}
+	error = __register_pernet_operations(list, ops);
+	if (error && ops->id)
+		ida_remove(&net_generic_ids, *ops->id);
+
+	return error;
+}
+
+static void unregister_pernet_operations(struct pernet_operations *ops)
+{
+	
+	__unregister_pernet_operations(ops);
+	if (ops->id)
+		ida_remove(&net_generic_ids, *ops->id);
+}
+
 /**
  *      register_pernet_subsys - register a network namespace subsystem
  *	@ops:  pernet operations structure for the subsystem
@@ -414,38 +497,6 @@ void unregister_pernet_subsys(struct pernet_operations *module)
 }
 EXPORT_SYMBOL_GPL(unregister_pernet_subsys);
 
-int register_pernet_gen_subsys(int *id, struct pernet_operations *ops)
-{
-	int rv;
-
-	mutex_lock(&net_mutex);
-again:
-	rv = ida_get_new_above(&net_generic_ids, 1, id);
-	if (rv < 0) {
-		if (rv == -EAGAIN) {
-			ida_pre_get(&net_generic_ids, GFP_KERNEL);
-			goto again;
-		}
-		goto out;
-	}
-	rv = register_pernet_operations(first_device, ops);
-	if (rv < 0)
-		ida_remove(&net_generic_ids, *id);
-out:
-	mutex_unlock(&net_mutex);
-	return rv;
-}
-EXPORT_SYMBOL_GPL(register_pernet_gen_subsys);
-
-void unregister_pernet_gen_subsys(int id, struct pernet_operations *ops)
-{
-	mutex_lock(&net_mutex);
-	unregister_pernet_operations(ops);
-	ida_remove(&net_generic_ids, id);
-	mutex_unlock(&net_mutex);
-}
-EXPORT_SYMBOL_GPL(unregister_pernet_gen_subsys);
-
 /**
  *      register_pernet_device - register a network namespace device
  *	@ops:  pernet operations structure for the subsystem
@@ -477,30 +528,6 @@ int register_pernet_device(struct pernet_operations *ops)
 }
 EXPORT_SYMBOL_GPL(register_pernet_device);
 
-int register_pernet_gen_device(int *id, struct pernet_operations *ops)
-{
-	int error;
-	mutex_lock(&net_mutex);
-again:
-	error = ida_get_new_above(&net_generic_ids, 1, id);
-	if (error) {
-		if (error == -EAGAIN) {
-			ida_pre_get(&net_generic_ids, GFP_KERNEL);
-			goto again;
-		}
-		goto out;
-	}
-	error = register_pernet_operations(&pernet_list, ops);
-	if (error)
-		ida_remove(&net_generic_ids, *id);
-	else if (first_device == &pernet_list)
-		first_device = &ops->list;
-out:
-	mutex_unlock(&net_mutex);
-	return error;
-}
-EXPORT_SYMBOL_GPL(register_pernet_gen_device);
-
 /**
  *      unregister_pernet_device - unregister a network namespace netdevice
  *	@ops: pernet operations structure to manipulate
@@ -520,17 +547,6 @@ void unregister_pernet_device(struct pernet_operations *ops)
 }
 EXPORT_SYMBOL_GPL(unregister_pernet_device);
 
-void unregister_pernet_gen_device(int id, struct pernet_operations *ops)
-{
-	mutex_lock(&net_mutex);
-	if (&ops->list == first_device)
-		first_device = first_device->next;
-	unregister_pernet_operations(ops);
-	ida_remove(&net_generic_ids, id);
-	mutex_unlock(&net_mutex);
-}
-EXPORT_SYMBOL_GPL(unregister_pernet_gen_device);
-
 static void net_generic_release(struct rcu_head *rcu)
 {
 	struct net_generic *ng;
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 05/20] net: Simplify loopback and improve batching.
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
                   ` (3 preceding siblings ...)
  2009-11-30  8:25 ` [PATCH 04/20] net: Automatically allocate per namespace data Eric W. Biederman
@ 2009-11-30  8:25 ` Eric W. Biederman
  2009-11-30  8:25 ` [PATCH 06/20] net: Simplfy default_device_exit " Eric W. Biederman
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  8:25 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jamal, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Defer calling unregister_netdevice_queue to cleanup_net.  It's simpler
and it allows the loopback device to land in the same batch as other
network devices.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/loopback.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c9f6557..eae4ad7 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -212,15 +212,7 @@ out:
 	return err;
 }
 
-static __net_exit void loopback_net_exit(struct net *net)
-{
-	struct net_device *dev = net->loopback_dev;
-
-	unregister_netdev(dev);
-}
-
 /* Registered in net/core/dev.c */
 struct pernet_operations __net_initdata loopback_net_ops = {
        .init = loopback_net_init,
-       .exit = loopback_net_exit,
 };
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 06/20] net: Simplfy default_device_exit and improve batching.
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
                   ` (4 preceding siblings ...)
  2009-11-30  8:25 ` [PATCH 05/20] net: Simplify loopback and improve batching Eric W. Biederman
@ 2009-11-30  8:25 ` Eric W. Biederman
  2009-11-30 12:44 ` [PATCH 0/20] Batch network namespace cleanup jamal
  2009-12-01  0:34 ` David Miller
  7 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  8:25 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jamal, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

- Defer dellink to net_cleanup() allowing for batching.
- Fix comment.
- Use for_each_netdev_safe again as dev_change_net_namespace touches
  at most one network device (unlike veth dellink).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/dev.c |   16 ++++++----------
 1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 124029e..c3e3400 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5735,14 +5735,13 @@ static struct pernet_operations __net_initdata netdev_net_ops = {
 
 static void __net_exit default_device_exit(struct net *net)
 {
-	struct net_device *dev;
+	struct net_device *dev, *aux;
 	/*
-	 * Push all migratable of the network devices back to the
+	 * Push all migratable network devices back to the
 	 * initial network namespace
 	 */
 	rtnl_lock();
-restart:
-	for_each_netdev(net, dev) {
+	for_each_netdev_safe(net, dev, aux) {
 		int err;
 		char fb_name[IFNAMSIZ];
 
@@ -5750,11 +5749,9 @@ restart:
 		if (dev->features & NETIF_F_NETNS_LOCAL)
 			continue;
 
-		/* Delete virtual devices */
-		if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink) {
-			dev->rtnl_link_ops->dellink(dev, NULL);
-			goto restart;
-		}
+		/* Leave virtual devices for the generic cleanup */
+		if (dev->rtnl_link_ops)
+			continue;
 
 		/* Push remaing network devices to init_net */
 		snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
@@ -5764,7 +5761,6 @@ restart:
 				__func__, dev->name, err);
 			BUG();
 		}
-		goto restart;
 	}
 	rtnl_unlock();
 }
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-11-30  8:17     ` Eric W. Biederman
@ 2009-11-30  8:48       ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30  8:48 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, hadi, dlezcano, adobriyan, kaber

ebiederm@xmission.com (Eric W. Biederman) writes:

> David Miller <davem@davemloft.net> writes:
>
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Mon, 30 Nov 2009 09:07:39 +0100
>>
>>> It seems list received part of your patches Eric
>>> 
>>> 01, 07-20 are OK,   02-06 are missing
>>
>> Not just the list, I didn't get copies of the missing patches through
>> direct email even though I was on the CC: list.
>
> Thanks.  Weird.  I will take a look and resend.

I have resent the missing patches, I'm not certain what happened
the first time but clearly those didn't get sent.

02 and 06 have made it to netdev and looped back to me.  I'm not certain
where 03-05 are, but they should be in transit.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
                   ` (5 preceding siblings ...)
  2009-11-30  8:25 ` [PATCH 06/20] net: Simplfy default_device_exit " Eric W. Biederman
@ 2009-11-30 12:44 ` jamal
  2009-11-30 19:22   ` Eric W. Biederman
  2009-12-01  0:34 ` David Miller
  7 siblings, 1 reply; 14+ messages in thread
From: jamal @ 2009-11-30 12:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, netdev, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy

On Sun, 2009-11-29 at 17:46 -0800, Eric W. Biederman wrote:
> 
> Recently Jamal and Daniel perform some experiments and found that
> large numbers of network namespace exiting simultaneously is very
> inefficient.  24+ minutes in some configurations.  The cpu overhead
> was negligible but it results in long hold times of net_mutex, and
> memory being consumed a long time after the last user has gone away.
> 
> I looked into it and discovered that by batching network namespace
> cleanups I can reduce the time for 4k network namespaces exiting from
> 5-7 minutes in my configuration to 44 seconds.

Excellent work Eric. I have time today so i can test these patches
status quo vs.
This is against net-next?

cheers,
jamal




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-11-30 12:44 ` [PATCH 0/20] Batch network namespace cleanup jamal
@ 2009-11-30 19:22   ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-11-30 19:22 UTC (permalink / raw)
  To: hadi
  Cc: David Miller, netdev, Daniel Lezcano, Alexey Dobriyan, Patrick McHardy

jamal <hadi@cyberus.ca> writes:

> On Sun, 2009-11-29 at 17:46 -0800, Eric W. Biederman wrote:
>> 
>> Recently Jamal and Daniel perform some experiments and found that
>> large numbers of network namespace exiting simultaneously is very
>> inefficient.  24+ minutes in some configurations.  The cpu overhead
>> was negligible but it results in long hold times of net_mutex, and
>> memory being consumed a long time after the last user has gone away.
>> 
>> I looked into it and discovered that by batching network namespace
>> cleanups I can reduce the time for 4k network namespaces exiting from
>> 5-7 minutes in my configuration to 44 seconds.
>
> Excellent work Eric. I have time today so i can test these patches
> status quo vs.
> This is against net-next?

Yes.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
                   ` (6 preceding siblings ...)
  2009-11-30 12:44 ` [PATCH 0/20] Batch network namespace cleanup jamal
@ 2009-12-01  0:34 ` David Miller
  2009-12-01  0:55   ` Eric W. Biederman
  7 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2009-12-01  0:34 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, hadi, dlezcano, adobriyan, kaber

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sun, 29 Nov 2009 17:46:03 -0800

> Recently Jamal and Daniel perform some experiments and found that
> large numbers of network namespace exiting simultaneously is very
> inefficient.  24+ minutes in some configurations.  The cpu overhead
> was negligible but it results in long hold times of net_mutex, and
> memory being consumed a long time after the last user has gone away.
> 
> I looked into it and discovered that by batching network namespace
> cleanups I can reduce the time for 4k network namespaces exiting from
> 5-7 minutes in my configuration to 44 seconds.
> 
> This patch series is my set of changes to the network namespace core
> and associated cleanups to allow for network namespace batching.

All applied, and assuming all of the build checks pass I'll
push this out to net-next-2.6

I should look into that inet_twsk_purge performance issue you mention
when tearing down a namespace.  It walks the entire hash table and
takes a lock for every hash chain.

Eric, is it possible for us to at least slightly optimize this by
doing a peek at whether the head pointer of each chain is NULL and
bypass the spinlock and everything else in that case?  Or is this
not legal with sk_nulls?

Something like:

	if (hlist_nulls_empty(&head->twchain))
		continue;

right before the 'restart' label?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/20] Batch network namespace cleanup
  2009-12-01  0:34 ` David Miller
@ 2009-12-01  0:55   ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2009-12-01  0:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, hadi, dlezcano, adobriyan, kaber

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Sun, 29 Nov 2009 17:46:03 -0800
>
>> Recently Jamal and Daniel perform some experiments and found that
>> large numbers of network namespace exiting simultaneously is very
>> inefficient.  24+ minutes in some configurations.  The cpu overhead
>> was negligible but it results in long hold times of net_mutex, and
>> memory being consumed a long time after the last user has gone away.
>> 
>> I looked into it and discovered that by batching network namespace
>> cleanups I can reduce the time for 4k network namespaces exiting from
>> 5-7 minutes in my configuration to 44 seconds.
>> 
>> This patch series is my set of changes to the network namespace core
>> and associated cleanups to allow for network namespace batching.
>
> All applied, and assuming all of the build checks pass I'll
> push this out to net-next-2.6
>
> I should look into that inet_twsk_purge performance issue you mention
> when tearing down a namespace.  It walks the entire hash table and
> takes a lock for every hash chain.
>
> Eric, is it possible for us to at least slightly optimize this by
> doing a peek at whether the head pointer of each chain is NULL and
> bypass the spinlock and everything else in that case?  Or is this
> not legal with sk_nulls?
>
> Something like:
>
> 	if (hlist_nulls_empty(&head->twchain))
> 		continue;
>
> right before the 'restart' label?

I haven't had a chance to wrap my head around that case fully yet.

After playing with a few ideas I think what we want to do algorithmically
is to have a batched flush like we do for the routing table cache.  That
should get the cost down to only about 100ms.  Which is much better when
you have a lot of them but is still a lot of time.

>From my preliminary investigation I believe we don't need to take any
locks to traverse the hash table.  I think we can do the entire hash
table traversal under simple rcu protection, and only take the lock on
the individual time wait entries to delete them.

....

Also for best batching we have ipip, ipgre, ip6_tunnel, sit, vlan,
and bridging that need to be taught to use rtnl_link_ops and let
the generic code delete their devices.  

The changes for the vlan code are simple.  The rest I haven't finished
wrapping my head around the drivers individual requirements for.   Still
the changes should not be sweeping.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-12-01  0:55 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-30  1:46 [PATCH 0/20] Batch network namespace cleanup Eric W. Biederman
2009-11-30  8:07 ` Eric Dumazet
2009-11-30  8:09   ` David Miller
2009-11-30  8:17     ` Eric W. Biederman
2009-11-30  8:48       ` Eric W. Biederman
2009-11-30  8:25 ` [PATCH 02/20] net: Implement for_each_netdev_reverse Eric W. Biederman
2009-11-30  8:25 ` [PATCH 03/20] net: Batch network namespace destruction Eric W. Biederman
2009-11-30  8:25 ` [PATCH 04/20] net: Automatically allocate per namespace data Eric W. Biederman
2009-11-30  8:25 ` [PATCH 05/20] net: Simplify loopback and improve batching Eric W. Biederman
2009-11-30  8:25 ` [PATCH 06/20] net: Simplfy default_device_exit " Eric W. Biederman
2009-11-30 12:44 ` [PATCH 0/20] Batch network namespace cleanup jamal
2009-11-30 19:22   ` Eric W. Biederman
2009-12-01  0:34 ` David Miller
2009-12-01  0:55   ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.