All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
@ 2016-02-01 22:32 Salam Noureddine
  2016-02-01 22:32 ` [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Salam Noureddine @ 2016-02-01 22:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev
  Cc: Salam Noureddine

fib_flush walks the whole fib in a net_namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removal of a lot
of interfaces. These four patches deal with this issue by calling fib_flush
just once for each net namespace and introduce a new function arp_ifdown_all
that does a similar optimization for the neighbour table.

I got the following benchmark results on one of our switches.
Without this patch, deleting 1k interfaces with 100k routes in the fib held
the rtnl_lock for 13 seconds. With the patch, rtnl_lock hold time went down
to 5 seconds. The gain is even more pronounced with 512k routes in the FIB.
In this case, without the patch, rtnl_lock was held for 36 seconds and with
the patch it was held for 5.5 seconds.

Salam Noureddine (4):
  net: add event_list to struct net and provide utility functions
  net: dev: add batching to net_device notifiers
  net: core: introduce neigh_ifdown_all for all down interfaces
  net: fib: avoid calling fib_flush for each device when doing batch
    close     and unregister

 include/linux/netdevice.h   |  2 ++
 include/net/arp.h           |  1 +
 include/net/neighbour.h     |  1 +
 include/net/net_namespace.h | 22 ++++++++++++++++++++++
 include/net/netns/ipv4.h    |  1 +
 net/core/dev.c              | 39 ++++++++++++++++++++++++++++++++++++---
 net/core/neighbour.c        | 38 +++++++++++++++++++++++++++++++-------
 net/ipv4/arp.c              |  4 ++++
 net/ipv4/fib_frontend.c     | 16 ++++++++++++++--
 9 files changed, 112 insertions(+), 12 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions
  2016-02-01 22:32 [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
@ 2016-02-01 22:32 ` Salam Noureddine
  2016-02-02 20:01   ` Julian Anastasov
  2016-02-01 22:32 ` [PATCH net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Salam Noureddine @ 2016-02-01 22:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev
  Cc: Salam Noureddine


Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/net/net_namespace.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..4cf47de 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -58,6 +58,7 @@ struct net {
 	struct list_head	list;		/* list of network namespaces */
 	struct list_head	cleanup_list;	/* namespaces on death row */
 	struct list_head	exit_list;	/* Use only net_mutex */
+	struct list_head	event_list;     /* net_device notifier list */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
 	spinlock_t		nsid_lock;
@@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net)
 	atomic_inc(&net->fnhe_genid);
 }
 
+#ifdef CONFIG_NET_NS
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+	if (!list_empty(&net->event_list))
+		list_add_tail(&net->event_list, head);
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+	list_del_init(&net->event_list);
+}
+#else
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+}
+#endif
+
 #endif /* __NET_NET_NAMESPACE_H */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-01 22:32 [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
  2016-02-01 22:32 ` [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
@ 2016-02-01 22:32 ` Salam Noureddine
  2016-02-02 20:55   ` Julian Anastasov
  2016-02-01 22:32 ` [PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
  2016-02-01 22:32 ` [PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
  3 siblings, 1 reply; 15+ messages in thread
From: Salam Noureddine @ 2016-02-01 22:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev
  Cc: Salam Noureddine

This can be used to optimize bringing down and unregsitering
net_devices by running certain cleanup operations only on the
net namespace instead of on each net_device.

Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 39 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814..1b12269 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_BONDING_INFO	0x0019
 #define NETDEV_PRECHANGEUPPER	0x001A
 #define NETDEV_CHANGELOWERSTATE	0x001B
+#define NETDEV_UNREGISTER_BATCH	0x001C
+#define NETDEV_DOWN_BATCH	0x001D
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a2..77410a3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1439,11 +1439,16 @@ static int __dev_close(struct net_device *dev)
 int dev_close_many(struct list_head *head, bool unlink)
 {
 	struct net_device *dev, *tmp;
+	struct net *net, *net_tmp;
+	LIST_HEAD(net_head);
 
 	/* Remove the devices that don't need to be closed */
-	list_for_each_entry_safe(dev, tmp, head, close_list)
+	list_for_each_entry_safe(dev, tmp, head, close_list) {
 		if (!(dev->flags & IFF_UP))
 			list_del_init(&dev->close_list);
+		else
+			net_add_event_list(&net_head, dev_net(dev));
+	}
 
 	__dev_close_many(head);
 
@@ -1454,6 +1459,11 @@ int dev_close_many(struct list_head *head, bool unlink)
 			list_del_init(&dev->close_list);
 	}
 
+	list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+		call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev);
+		net_del_event_list(net);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(dev_close_many);
@@ -1572,8 +1582,12 @@ rollback:
 				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
 							dev);
 				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+							dev);
 			}
 			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
+			call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+						dev);
 		}
 	}
 
@@ -1614,8 +1628,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
 				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
 							dev);
 				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+							dev);
 			}
 			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
+			call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+						dev);
 		}
 	}
 unlock:
@@ -6187,10 +6205,12 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
 		rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
 
 	if (changes & IFF_UP) {
-		if (dev->flags & IFF_UP)
+		if (dev->flags & IFF_UP) {
 			call_netdevice_notifiers(NETDEV_UP, dev);
-		else
+		} else {
 			call_netdevice_notifiers(NETDEV_DOWN, dev);
+			call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev);
+		}
 	}
 
 	if (dev->flags & IFF_UP &&
@@ -6427,7 +6447,9 @@ static void net_set_todo(struct net_device *dev)
 static void rollback_registered_many(struct list_head *head)
 {
 	struct net_device *dev, *tmp;
+	struct net *net, *net_tmp;
 	LIST_HEAD(close_head);
+	LIST_HEAD(net_head);
 
 	BUG_ON(dev_boot_phase);
 	ASSERT_RTNL();
@@ -6504,6 +6526,15 @@ static void rollback_registered_many(struct list_head *head)
 #endif
 	}
 
+	list_for_each_entry(dev, head, unreg_list) {
+		net_add_event_list(&net_head, dev_net(dev));
+	}
+	list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+		call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
+					 net->loopback_dev);
+		net_del_event_list(net);
+	}
+
 	synchronize_net();
 
 	list_for_each_entry(dev, head, unreg_list)
@@ -7065,6 +7096,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
 
 			/* Rebroadcast unregister notification */
 			call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+			call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 
 			__rtnl_unlock();
 			rcu_barrier();
@@ -7581,6 +7613,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	   the device is just moving and can keep their slaves up.
 	*/
 	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 	rcu_barrier();
 	call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
 	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces
  2016-02-01 22:32 [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
  2016-02-01 22:32 ` [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
  2016-02-01 22:32 ` [PATCH net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
@ 2016-02-01 22:32 ` Salam Noureddine
  2016-02-01 22:32 ` [PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
  3 siblings, 0 replies; 15+ messages in thread
From: Salam Noureddine @ 2016-02-01 22:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev
  Cc: Salam Noureddine

This cleans up neighbour entries for all interfaces in the down
state, avoiding walking the whole neighbour table for each interface
being brought down.

Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/net/arp.h       |  1 +
 include/net/neighbour.h |  1 +
 net/core/neighbour.c    | 38 +++++++++++++++++++++++++++++++-------
 net/ipv4/arp.c          |  4 ++++
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..0efee66 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip,
 	      const unsigned char *src_hw, const unsigned char *th);
 int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir);
 void arp_ifdown(struct net_device *dev);
+void arp_ifdown_all(void);
 
 struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 			   struct net_device *dev, __be32 src_ip,
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8b68384..8785d7b 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags);
 void __neigh_set_probe_once(struct neighbour *neigh);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+int neigh_ifdown_all(struct neigh_table *tbl);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..bfbd97a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -54,7 +54,8 @@ do {						\
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+			 bool all_down);
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
@@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
 	}
 }
 
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+			    bool all_down)
 {
 	int i;
 	struct neigh_hash_table *nht;
@@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 				np = &n->next;
 				continue;
 			}
+			if (!dev && n->dev && all_down) {
+				if (n->dev->flags & IFF_UP) {
+					np = &n->next;
+					continue;
+				}
+			}
 			rcu_assign_pointer(*np,
 				   rcu_dereference_protected(n->next,
 						lockdep_is_held(&tbl->lock)));
@@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
 	write_lock_bh(&tbl->lock);
-	neigh_flush_dev(tbl, dev);
+	neigh_flush_dev(tbl, dev, false);
 	write_unlock_bh(&tbl->lock);
 }
 EXPORT_SYMBOL(neigh_changeaddr);
@@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
 	write_lock_bh(&tbl->lock);
-	neigh_flush_dev(tbl, dev);
-	pneigh_ifdown(tbl, dev);
+	neigh_flush_dev(tbl, dev, false);
+	pneigh_ifdown(tbl, dev, false);
 	write_unlock_bh(&tbl->lock);
 
 	del_timer_sync(&tbl->proxy_timer);
@@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 }
 EXPORT_SYMBOL(neigh_ifdown);
 
+int neigh_ifdown_all(struct neigh_table *tbl)
+{
+	write_lock_bh(&tbl->lock);
+	neigh_flush_dev(tbl, NULL, true);
+	pneigh_ifdown(tbl, NULL, true);
+	write_unlock_bh(&tbl->lock);
+
+	del_timer_sync(&tbl->proxy_timer);
+	pneigh_queue_purge(&tbl->proxy_queue);
+	return 0;
+}
+EXPORT_SYMBOL(neigh_ifdown_all);
+
 static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev)
 {
 	struct neighbour *n = NULL;
@@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *pkey,
 	return -ENOENT;
 }
 
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+			 bool all_down)
 {
 	struct pneigh_entry *n, **np;
 	u32 h;
@@ -653,7 +675,9 @@ static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 	for (h = 0; h <= PNEIGH_HASHMASK; h++) {
 		np = &tbl->phash_buckets[h];
 		while ((n = *np) != NULL) {
-			if (!dev || n->dev == dev) {
+			if ((!dev && !all_down) || (all_down && n->dev &&
+						    !(n->dev->flags & IFF_UP)) ||
+			    n->dev == dev) {
 				*np = n->next;
 				if (tbl->pdestructor)
 					tbl->pdestructor(n);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 59b3e0e..1328244 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1219,6 +1219,10 @@ void arp_ifdown(struct net_device *dev)
 	neigh_ifdown(&arp_tbl, dev);
 }
 
+void arp_ifdown_all(void)
+{
+	neigh_ifdown_all(&arp_tbl);
+}
 
 /*
  *	Called once on startup.
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
  2016-02-01 22:32 [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
                   ` (2 preceding siblings ...)
  2016-02-01 22:32 ` [PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
@ 2016-02-01 22:32 ` Salam Noureddine
  3 siblings, 0 replies; 15+ messages in thread
From: Salam Noureddine @ 2016-02-01 22:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev
  Cc: Salam Noureddine

Call fib_flush at the end when closing or unregistering multiple
devices. This can save walking the fib many times and greatly
reduce rtnl_lock hold time when unregistering many devices with
a fib having hundreds of thousands of routes.

Signed-off-by: Salam Noureddine <noureddine@arista.com>
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/fib_frontend.c  | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d75be32..d59a078 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,5 +111,6 @@ struct netns_ipv4 {
 #endif
 #endif
 	atomic_t	rt_genid;
+	bool		needs_fib_flush;
 };
 #endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4734475..808426e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	unsigned int flags;
 
 	if (event == NETDEV_UNREGISTER) {
-		fib_disable_ip(dev, event, true);
+		if (fib_sync_down_dev(dev, event, true))
+			net->ipv4.needs_fib_flush = true;
 		rt_flush_dev(dev);
 		return NOTIFY_DONE;
 	}
 
+	if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
+		if (net->ipv4.needs_fib_flush) {
+			fib_flush(net);
+			net->ipv4.needs_fib_flush = false;
+		}
+		rt_cache_flush(net);
+		arp_ifdown_all();
+		return NOTIFY_DONE;
+	}
+
 	in_dev = __in_dev_get_rtnl(dev);
 	if (!in_dev)
 		return NOTIFY_DONE;
@@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		rt_cache_flush(net);
 		break;
 	case NETDEV_DOWN:
-		fib_disable_ip(dev, event, false);
+		if (fib_sync_down_dev(dev, event, false))
+			net->ipv4.needs_fib_flush = true;
 		break;
 	case NETDEV_CHANGE:
 		flags = dev_get_flags(dev);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions
  2016-02-01 22:32 ` [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
@ 2016-02-02 20:01   ` Julian Anastasov
  2016-02-02 23:58     ` Salam Noureddine
  0 siblings, 1 reply; 15+ messages in thread
From: Julian Anastasov @ 2016-02-02 20:01 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev


	Hello,

On Mon, 1 Feb 2016, Salam Noureddine wrote:

> +#ifdef CONFIG_NET_NS
> +static inline void net_add_event_list(struct list_head *head, struct net *net)
> +{
> +	if (!list_empty(&net->event_list))

	Above check looks inverted, it works may be
because INIT_LIST_HEAD(&net->event_list) is missing.

> +		list_add_tail(&net->event_list, head);
> +}
> +

Regards

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-01 22:32 ` [PATCH net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
@ 2016-02-02 20:55   ` Julian Anastasov
  2016-02-03  6:31     ` Salam Noureddine
  0 siblings, 1 reply; 15+ messages in thread
From: Julian Anastasov @ 2016-02-02 20:55 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev


	Hello,

On Mon, 1 Feb 2016, Salam Noureddine wrote:

> @@ -1572,8 +1582,12 @@ rollback:
>  				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
>  							dev);
>  				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> +				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> +							dev);
>  			}
>  			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> +			call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> +						dev);

			If the rule is once per net, the above call...

>  		}

		should be here:

		call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
					net->loopback_dev);

	and also once after outroll label?:

	call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);

>  	}
>  
> @@ -1614,8 +1628,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
>  				call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
>  							dev);
>  				call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> +				call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> +							dev);
>  			}
>  			call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> +			call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> +						dev);

			Above call...

>  		}

		should be here, for net->loopback_dev?
Also, is it ok to call NETDEV_DOWN_BATCH many times, as result,
sometimes after NETDEV_UNREGISTER?

>  	}
>  unlock:

> @@ -6427,7 +6447,9 @@ static void net_set_todo(struct net_device *dev)
>  static void rollback_registered_many(struct list_head *head)
>  {
>  	struct net_device *dev, *tmp;
> +	struct net *net, *net_tmp;
>  	LIST_HEAD(close_head);
> +	LIST_HEAD(net_head);
>  
>  	BUG_ON(dev_boot_phase);
>  	ASSERT_RTNL();
> @@ -6504,6 +6526,15 @@ static void rollback_registered_many(struct list_head *head)
>  #endif
>  	}
>  
> +	list_for_each_entry(dev, head, unreg_list) {
> +		net_add_event_list(&net_head, dev_net(dev));
> +	}
> +	list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
> +		call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
> +					 net->loopback_dev);
> +		net_del_event_list(net);
> +	}
> +

	NETDEV_UNREGISTER* should not be called before
following synchronize_net and NETDEV_UNREGISTER. May be
we should split the loop: loop (dev_shutdown+NETDEV_UNREGISTER)
followed by above NETDEV_UNREGISTER_BATCH then again the
loop for all remaining calls.

>  	synchronize_net();
>  
>  	list_for_each_entry(dev, head, unreg_list)

Regards

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions
  2016-02-02 20:01   ` Julian Anastasov
@ 2016-02-02 23:58     ` Salam Noureddine
  0 siblings, 0 replies; 15+ messages in thread
From: Salam Noureddine @ 2016-02-02 23:58 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Network Development

On Tue, Feb 2, 2016 at 12:01 PM, Julian Anastasov <ja@ssi.bg> wrote:

>> +#ifdef CONFIG_NET_NS
>> +static inline void net_add_event_list(struct list_head *head, struct net *net)
>> +{
>> +     if (!list_empty(&net->event_list))
>
>         Above check looks inverted, it works may be
> because INIT_LIST_HEAD(&net->event_list) is missing.
>
>> +             list_add_tail(&net->event_list, head);
>> +}
>> +

Thanks for catching this! I ran my benchmark again with the corrected check
and I still get the same benefits with respect to rtnl_lock hold time.

Salam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-02 20:55   ` Julian Anastasov
@ 2016-02-03  6:31     ` Salam Noureddine
  2016-02-03  8:08       ` Julian Anastasov
  0 siblings, 1 reply; 15+ messages in thread
From: Salam Noureddine @ 2016-02-03  6:31 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Network Development

On Tue, Feb 2, 2016 at 12:55 PM, Julian Anastasov <ja@ssi.bg> wrote:
>
>         Hello,
>
> On Mon, 1 Feb 2016, Salam Noureddine wrote:
>
>> @@ -1572,8 +1582,12 @@ rollback:

>>                       call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>> +                     call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>> +                                             dev);
>
>                         If the rule is once per net, the above call...
>
>>               }
>
>                 should be here:
>
>                 call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>                                         net->loopback_dev);
>
>         and also once after outroll label?:

That's a good optimization to add. I was mostly focusing on the device
unregister path.

>
>         call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
>
>>       }
>>
>> @@ -1614,8 +1628,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb)

>>                       call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>> +                     call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>> +                                             dev);
>
>                         Above call...
>
>>               }
>
>                 should be here, for net->loopback_dev?
> Also, is it ok to call NETDEV_DOWN_BATCH many times, as result,
> sometimes after NETDEV_UNREGISTER?

Same here, I can add this optimization. I think it is fine to call the
BATCH notifiers
for every interface. It is just better to do it for many interfaces at
the same time.


>> +     list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
>> +             call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
>> +                                      net->loopback_dev);
>> +             net_del_event_list(net);
>> +     }
>> +
>
>         NETDEV_UNREGISTER* should not be called before
> following synchronize_net and NETDEV_UNREGISTER. May be
> we should split the loop: loop (dev_shutdown+NETDEV_UNREGISTER)
> followed by above NETDEV_UNREGISTER_BATCH then again the
> loop for all remaining calls
>
>>       synchronize_net();
>>
>>       list_for_each_entry(dev, head, unreg_list)
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

The call to NETDEV_UNREGISTER_BATCH is actually after NETDEV_UNREGISTER,
it seems the other way around in the patch because it is showing part
of netdev_wait_allrefs
and not rollback_registered_may.

Thanks,

Salam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-03  6:31     ` Salam Noureddine
@ 2016-02-03  8:08       ` Julian Anastasov
  2016-02-04  1:26         ` Salam Noureddine
  0 siblings, 1 reply; 15+ messages in thread
From: Julian Anastasov @ 2016-02-03  8:08 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Network Development


	Hello,

On Tue, 2 Feb 2016, Salam Noureddine wrote:

> On Tue, Feb 2, 2016 at 12:55 PM, Julian Anastasov <ja@ssi.bg> wrote:
> >
> >                         If the rule is once per net, the above call...
> >
> >>               }
> >
> >                 should be here:
> >
> >                 call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> >                                         net->loopback_dev);
> >
> >         and also once after outroll label?:
> 
> That's a good optimization to add. I was mostly focusing on the device
> unregister path.

	Yes, the idea is to avoid NETDEV_UNREGISTER_BATCH for
every dev. And not to forget to call it for every net.
In this case it is a cleanup path after failure.

> >>                       call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> >> +                     call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> >> +                                             dev);
> >
> >                         Above call...
> >
> >>               }
> >
> >                 should be here, for net->loopback_dev?
> > Also, is it ok to call NETDEV_DOWN_BATCH many times, as result,
> > sometimes after NETDEV_UNREGISTER?
> 
> Same here, I can add this optimization. I think it is fine to call the
> BATCH notifiers
> for every interface. It is just better to do it for many interfaces at
> the same time.

	Agreed

> >> +     list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
> >> +             call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
> >> +                                      net->loopback_dev);
> >> +             net_del_event_list(net);
> >> +     }
> >> +
> >
> >         NETDEV_UNREGISTER* should not be called before
> > following synchronize_net and NETDEV_UNREGISTER. May be
> > we should split the loop: loop (dev_shutdown+NETDEV_UNREGISTER)
> > followed by above NETDEV_UNREGISTER_BATCH then again the
> > loop for all remaining calls
> >
> >>       synchronize_net();
> 
> The call to NETDEV_UNREGISTER_BATCH is actually after NETDEV_UNREGISTER,
> it seems the other way around in the patch because it is showing part
> of netdev_wait_allrefs
> and not rollback_registered_may.

	Aha, I see, it is after NETDEV_UNREGISTER but may be
the above loop should be changed to two loops so that
NETDEV_UNREGISTER_BATCH is called exactly after all
NETDEV_UNREGISTER and before all dev_*_flush and
ndo_uninit calls to avoid any risks. I mean:

        synchronize_net();

	First part of loop:

        list_for_each_entry(dev, head, unreg_list) {
                /* Shutdown queueing discipline. */
                dev_shutdown(dev);

                /* Notify protocols, that we are about to destroy
                   this device. They should clean all the things.
                */
                call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
	}

	This is the same NETDEV_UNREGISTER_BATCH logic:

+       list_for_each_entry(dev, head, unreg_list) {
+               net_add_event_list(&net_head, dev_net(dev));
+       }
+       list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
+               call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
+                                        net->loopback_dev);
+               net_del_event_list(net);
+       }

	Second part of the loop:

        list_for_each_entry(dev, head, unreg_list) {
                struct sk_buff *skb = NULL;

		if (!dev->rtnl_link_ops ||
		...

Regards

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers
  2016-02-03  8:08       ` Julian Anastasov
@ 2016-02-04  1:26         ` Salam Noureddine
  0 siblings, 0 replies; 15+ messages in thread
From: Salam Noureddine @ 2016-02-04  1:26 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, Network Development

On Wed, Feb 3, 2016 at 12:08 AM, Julian Anastasov <ja@ssi.bg> wrote:


>         Aha, I see, it is after NETDEV_UNREGISTER but may be
> the above loop should be changed to two loops so that
> NETDEV_UNREGISTER_BATCH is called exactly after all
> NETDEV_UNREGISTER and before all dev_*_flush and
> ndo_uninit calls to avoid any risks. I mean:
>
>         synchronize_net();
>
>         First part of loop:
>
>         list_for_each_entry(dev, head, unreg_list) {
>                 /* Shutdown queueing discipline. */
>                 dev_shutdown(dev);
>
>                 /* Notify protocols, that we are about to destroy
>                    this device. They should clean all the things.
>                 */
>                 call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>         }
>
>         This is the same NETDEV_UNREGISTER_BATCH logic:
>
> +       list_for_each_entry(dev, head, unreg_list) {
> +               net_add_event_list(&net_head, dev_net(dev));
> +       }
> +       list_for_each_entry_safe(net, net_tmp, &net_head, event_list) {
> +               call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
> +                                        net->loopback_dev);
> +               net_del_event_list(net);
> +       }
>
>         Second part of the loop:
>
>         list_for_each_entry(dev, head, unreg_list) {
>                 struct sk_buff *skb = NULL;
>
>                 if (!dev->rtnl_link_ops ||
>                 ...
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

You're right. It is probably safer to organize the code the way you said.
Will change that, Thanks!

Salam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
  2016-01-05  1:12   ` Salam Noureddine
@ 2016-01-12  1:53     ` Salam Noureddine
  0 siblings, 0 replies; 15+ messages in thread
From: Salam Noureddine @ 2016-01-12  1:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Network Development

Any thoughts on the correctness of the patchset? Or a different way to
solve this issue leading to long rtnl_lock hold time?

Thanks,

Salam

On Mon, Jan 4, 2016 at 5:12 PM, Salam Noureddine <noureddine@arista.com> wrote:
> On Mon, Jan 4, 2016 at 4:35 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> Two things would be very valuable with this patchset.
>>
>> Some numbers on how much your changes have improved the code in the case
>> you care about.  I suspect the improvements are not subtle so this
>> should not be hard.
>>
>> Can you please provide a justification for event_list.  Just skimming
>> through it appears that event_list because a duplicate of the list of
>> batched network devices that are passed to dev_close_many and friends.
>> If the list is actually a duplicate it appears foolish to create it.
>>
>> Eric
>
> The performance test I ran tries to unregister 1000 dummy interfaces
> with 512K routes in the fib.
> Without the patch I could unregister 35 interfaces per second and with
> the patch it jumped to 620
> interfaces per second. 512K is a lot of routes but I am assuming we
> would get a good improvement
> even with 100K routes in the fib.
>
> I am using event_list to put all the net namespaces in the current
> net_device batch on a list and only
> call the NETDEV_UNREGISTER_BATCH on those namespaces. It would be
> possible to just call the
> notifier for NETDEV_DOWN/UNREGISTER_BATCH for all the devices on the
> list and rely on the
> needs_fib_flush flag to only call fib_flush once per namespace but it
> seems like a waste to me.
>
> Salam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
  2016-01-05  0:35 ` Eric W. Biederman
@ 2016-01-05  1:12   ` Salam Noureddine
  2016-01-12  1:53     ` Salam Noureddine
  0 siblings, 1 reply; 15+ messages in thread
From: Salam Noureddine @ 2016-01-05  1:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, netdev

On Mon, Jan 4, 2016 at 4:35 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:

> Two things would be very valuable with this patchset.
>
> Some numbers on how much your changes have improved the code in the case
> you care about.  I suspect the improvements are not subtle so this
> should not be hard.
>
> Can you please provide a justification for event_list.  Just skimming
> through it appears that event_list because a duplicate of the list of
> batched network devices that are passed to dev_close_many and friends.
> If the list is actually a duplicate it appears foolish to create it.
>
> Eric

The performance test I ran tries to unregister 1000 dummy interfaces
with 512K routes in the fib.
Without the patch I could unregister 35 interfaces per second and with
the patch it jumped to 620
interfaces per second. 512K is a lot of routes but I am assuming we
would get a good improvement
even with 100K routes in the fib.

I am using event_list to put all the net namespaces in the current
net_device batch on a list and only
call the NETDEV_UNREGISTER_BATCH on those namespaces. It would be
possible to just call the
notifier for NETDEV_DOWN/UNREGISTER_BATCH for all the devices on the
list and rely on the
needs_fib_flush flag to only call fib_flush once per namespace but it
seems like a waste to me.

Salam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
  2016-01-04 23:10 [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
@ 2016-01-05  0:35 ` Eric W. Biederman
  2016-01-05  1:12   ` Salam Noureddine
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2016-01-05  0:35 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, netdev

Salam Noureddine <noureddine@arista.com> writes:

> fib_flush walks the whole fib in a net_namespace and is called for
> each net_device being closed or unregistered. This can be very expensive
> when dealing with 100k or more routes in the fib and removal of a lot
> of interfaces. These four patches deal with this issue by calling fib_flush
> just once for each net namespace and introduce a new function arp_ifdown_all
> that does a similar optimization for the neighbour table.

Two things would be very valuable with this patchset.

Some numbers on how much your changes have improved the code in the case
you care about.  I suspect the improvements are not subtle so this
should not be hard.

Can you please provide a justification for event_list.  Just skimming
through it appears that event_list because a duplicate of the list of
batched network devices that are passed to dev_close_many and friends.
If the list is actually a duplicate it appears foolish to create it.

Eric

> Salam Noureddine (4):
>   net: add event_list to struct net and provide utility functions
>   net: dev: add batching to net_device notifiers
>   net: core: introduce neigh_ifdown_all for all down interfaces
>   net: fib: avoid calling fib_flush for each device when doing batch
>     close     and unregister
>
>  include/linux/netdevice.h   |  2 ++
>  include/net/arp.h           |  1 +
>  include/net/neighbour.h     |  1 +
>  include/net/net_namespace.h | 22 ++++++++++++++++++++++
>  include/net/netns/ipv4.h    |  1 +
>  net/core/dev.c              | 39 ++++++++++++++++++++++++++++++++++++---
>  net/core/neighbour.c        | 38 +++++++++++++++++++++++++++++++-------
>  net/ipv4/arp.c              |  4 ++++
>  net/ipv4/fib_frontend.c     | 16 ++++++++++++++--
>  9 files changed, 112 insertions(+), 12 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
@ 2016-01-04 23:10 Salam Noureddine
  2016-01-05  0:35 ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Salam Noureddine @ 2016-01-04 23:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Eric W. Biederman, netdev
  Cc: Salam Noureddine

fib_flush walks the whole fib in a net_namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removal of a lot
of interfaces. These four patches deal with this issue by calling fib_flush
just once for each net namespace and introduce a new function arp_ifdown_all
that does a similar optimization for the neighbour table.

Salam Noureddine (4):
  net: add event_list to struct net and provide utility functions
  net: dev: add batching to net_device notifiers
  net: core: introduce neigh_ifdown_all for all down interfaces
  net: fib: avoid calling fib_flush for each device when doing batch
    close     and unregister

 include/linux/netdevice.h   |  2 ++
 include/net/arp.h           |  1 +
 include/net/neighbour.h     |  1 +
 include/net/net_namespace.h | 22 ++++++++++++++++++++++
 include/net/netns/ipv4.h    |  1 +
 net/core/dev.c              | 39 ++++++++++++++++++++++++++++++++++++---
 net/core/neighbour.c        | 38 +++++++++++++++++++++++++++++++-------
 net/ipv4/arp.c              |  4 ++++
 net/ipv4/fib_frontend.c     | 16 ++++++++++++++--
 9 files changed, 112 insertions(+), 12 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-02-04  1:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-01 22:32 [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
2016-02-01 22:32 ` [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions Salam Noureddine
2016-02-02 20:01   ` Julian Anastasov
2016-02-02 23:58     ` Salam Noureddine
2016-02-01 22:32 ` [PATCH net-next 2/4] net: dev: add batching to net_device notifiers Salam Noureddine
2016-02-02 20:55   ` Julian Anastasov
2016-02-03  6:31     ` Salam Noureddine
2016-02-03  8:08       ` Julian Anastasov
2016-02-04  1:26         ` Salam Noureddine
2016-02-01 22:32 ` [PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces Salam Noureddine
2016-02-01 22:32 ` [PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister Salam Noureddine
  -- strict thread matches above, loose matches on Subject: below --
2016-01-04 23:10 [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown Salam Noureddine
2016-01-05  0:35 ` Eric W. Biederman
2016-01-05  1:12   ` Salam Noureddine
2016-01-12  1:53     ` Salam Noureddine

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.