netdev.vger.kernel.org archive mirror
* [PATCH v2 net-next 0/7] net: speedup netns create/delete time
@ 2017-09-19 23:27 Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 1/7] kobject: add kobject_uevent_net_broadcast() Eric Dumazet
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

When the rate of netns creation/deletion is high enough,
we observe softlockups in cleanup_net() caused by a huge list
of netns and far too many rcu_barrier() calls.

This patch series does some optimizations in kobject,
and adds batching to tunnels so that netns dismantles are
less costly.

IPv6 addrlabels also get a per-netns list, and tcp_metrics
also benefits from batch flushing.

This gives me an order of magnitude gain
(~50 ms -> ~5 ms for one netns create/delete pair).

Tested with the following add_del_unshare.sh script:

for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before the patch series:

$ time ./add_del_unshare.sh
net_namespace        116    258   5504    1    2 : tunables    8    4    0 : slabdata    116    258      0

real	3m24.910s
user	0m0.747s
sys	0m43.162s

After:
$ time ./add_del_unshare.sh
net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0

real	0m22.117s
user	0m0.728s
sys	0m35.328s


Eric Dumazet (7):
  kobject: add kobject_uevent_net_broadcast()
  kobject: copy env blob in one go
  kobject: factorize skb setup in kobject_uevent_net_broadcast()
  ipv6: addrlabel: per netns list
  tcp: batch tcp_net_metrics_exit
  ipv6: speedup ipv6 tunnels dismantle
  ipv4: speedup ipv6 tunnels dismantle

 include/net/ip_tunnels.h |  3 +-
 include/net/netns/ipv6.h |  5 +++
 lib/kobject_uevent.c     | 94 ++++++++++++++++++++++++++----------------------
 net/ipv4/ip_gre.c        | 22 +++++-------
 net/ipv4/ip_tunnel.c     | 12 +++++--
 net/ipv4/ip_vti.c        |  7 ++--
 net/ipv4/ipip.c          |  7 ++--
 net/ipv4/tcp_metrics.c   | 14 +++++---
 net/ipv6/addrlabel.c     | 81 ++++++++++++++++-------------------------
 net/ipv6/ip6_gre.c       |  8 +++--
 net/ipv6/ip6_tunnel.c    | 20 ++++++-----
 net/ipv6/ip6_vti.c       | 23 +++++++-----
 net/ipv6/sit.c           |  9 +++--
 13 files changed, 157 insertions(+), 148 deletions(-)

-- 
2.14.1.690.gbb1197296e-goog


* [PATCH v2 net-next 1/7] kobject: add kobject_uevent_net_broadcast()
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
@ 2017-09-19 23:27 ` Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 2/7] kobject: copy env blob in one go Eric Dumazet
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

This removes some #ifdef pollution and will ease follow-up patches.
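The refactoring pattern can be sketched in userspace C: all conditional code moves into one helper that simply returns 0 when the feature is compiled out, so the caller needs no #ifdef. CONFIG_DEMO_NET and the demo_* functions are hypothetical stand-ins for CONFIG_NET and the kobject code, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_NET; remove to build the "no net" variant. */
#define CONFIG_DEMO_NET 1

static int broadcasts_sent;	/* counts simulated netlink broadcasts */

/* The only function that knows about CONFIG_DEMO_NET. */
static int demo_uevent_net_broadcast(const char *action, const char *devpath)
{
	int retval = 0;
#if defined(CONFIG_DEMO_NET)
	assert(action && devpath);
	broadcasts_sent++;	/* a real version would build and send a message */
#else
	(void)action;
	(void)devpath;
#endif
	return retval;
}

/* The caller stays free of #ifdef blocks. */
int demo_uevent_env(const char *action, const char *devpath)
{
	/* ... environment setup would happen here ... */
	return demo_uevent_net_broadcast(action, devpath);
}
```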

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 lib/kobject_uevent.c | 96 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 53 insertions(+), 43 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index e590523ea4761425df5e112a2c2aab873dbaa90d..4f48cc3b11d566e44c4115cc7716bc3b1cdf96df 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -294,6 +294,57 @@ static void cleanup_uevent_env(struct subprocess_info *info)
 }
 #endif
 
+static int kobject_uevent_net_broadcast(struct kobject *kobj,
+					struct kobj_uevent_env *env,
+					const char *action_string,
+					const char *devpath)
+{
+	int retval = 0;
+#if defined(CONFIG_NET)
+	struct uevent_sock *ue_sk;
+
+	/* send netlink message */
+	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		struct sock *uevent_sock = ue_sk->sk;
+		struct sk_buff *skb;
+		size_t len;
+
+		if (!netlink_has_listeners(uevent_sock, 1))
+			continue;
+
+		/* allocate message with the maximum possible size */
+		len = strlen(action_string) + strlen(devpath) + 2;
+		skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+		if (skb) {
+			char *scratch;
+			int i;
+
+			/* add header */
+			scratch = skb_put(skb, len);
+			sprintf(scratch, "%s@%s", action_string, devpath);
+
+			/* copy keys to our continuous event payload buffer */
+			for (i = 0; i < env->envp_idx; i++) {
+				len = strlen(env->envp[i]) + 1;
+				scratch = skb_put(skb, len);
+				strcpy(scratch, env->envp[i]);
+			}
+
+			NETLINK_CB(skb).dst_group = 1;
+			retval = netlink_broadcast_filtered(uevent_sock, skb,
+							    0, 1, GFP_KERNEL,
+							    kobj_bcast_filter,
+							    kobj);
+			/* ENOBUFS should be handled in userspace */
+			if (retval == -ENOBUFS || retval == -ESRCH)
+				retval = 0;
+		} else
+			retval = -ENOMEM;
+	}
+#endif
+	return retval;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -316,9 +367,6 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	const struct kset_uevent_ops *uevent_ops;
 	int i = 0;
 	int retval = 0;
-#ifdef CONFIG_NET
-	struct uevent_sock *ue_sk;
-#endif
 
 	pr_debug("kobject: '%s' (%p): %s\n",
 		 kobject_name(kobj), kobj, __func__);
@@ -427,46 +475,8 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		mutex_unlock(&uevent_sock_mutex);
 		goto exit;
 	}
-
-#if defined(CONFIG_NET)
-	/* send netlink message */
-	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
-		struct sock *uevent_sock = ue_sk->sk;
-		struct sk_buff *skb;
-		size_t len;
-
-		if (!netlink_has_listeners(uevent_sock, 1))
-			continue;
-
-		/* allocate message with the maximum possible size */
-		len = strlen(action_string) + strlen(devpath) + 2;
-		skb = alloc_skb(len + env->buflen, GFP_KERNEL);
-		if (skb) {
-			char *scratch;
-
-			/* add header */
-			scratch = skb_put(skb, len);
-			sprintf(scratch, "%s@%s", action_string, devpath);
-
-			/* copy keys to our continuous event payload buffer */
-			for (i = 0; i < env->envp_idx; i++) {
-				len = strlen(env->envp[i]) + 1;
-				scratch = skb_put(skb, len);
-				strcpy(scratch, env->envp[i]);
-			}
-
-			NETLINK_CB(skb).dst_group = 1;
-			retval = netlink_broadcast_filtered(uevent_sock, skb,
-							    0, 1, GFP_KERNEL,
-							    kobj_bcast_filter,
-							    kobj);
-			/* ENOBUFS should be handled in userspace */
-			if (retval == -ENOBUFS || retval == -ESRCH)
-				retval = 0;
-		} else
-			retval = -ENOMEM;
-	}
-#endif
+	retval = kobject_uevent_net_broadcast(kobj, env, action_string,
+					      devpath);
 	mutex_unlock(&uevent_sock_mutex);
 
 #ifdef CONFIG_UEVENT_HELPER
-- 
2.14.1.690.gbb1197296e-goog


* [PATCH v2 net-next 2/7] kobject: copy env blob in one go
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 1/7] kobject: add kobject_uevent_net_broadcast() Eric Dumazet
@ 2017-09-19 23:27 ` Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 3/7] kobject: factorize skb setup in kobject_uevent_net_broadcast() Eric Dumazet
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

No need to iterate over the strings: the keys already sit contiguously
in env->buf, so just copy them with one efficient memcpy() call.
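A userspace sketch of why one copy is enough, assuming (as add_uevent_var() does in lib/kobject_uevent.c) that every key is appended NUL-terminated to env->buf, so env->buf[0..buflen) already holds the concatenated keys. struct demo_env and the demo_* helpers are hypothetical simplifications of struct kobj_uevent_env, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ENVP_MAX 8
#define DEMO_BUF_SIZE 256

/* Hypothetical, simplified stand-in for struct kobj_uevent_env. */
struct demo_env {
	char *envp[DEMO_ENVP_MAX];	/* pointers into buf */
	int envp_idx;
	char buf[DEMO_BUF_SIZE];	/* "KEY=value\0KEY=value\0..." */
	size_t buflen;
};

/* Append one NUL-terminated key right after the previous one. */
static int demo_add_var(struct demo_env *env, const char *var)
{
	size_t len = strlen(var) + 1;

	if (env->envp_idx >= DEMO_ENVP_MAX || env->buflen + len > DEMO_BUF_SIZE)
		return -1;
	env->envp[env->envp_idx++] = &env->buf[env->buflen];
	memcpy(&env->buf[env->buflen], var, len);
	env->buflen += len;
	return 0;
}

/* Old way: one strlen() + strcpy() per key. */
static size_t demo_copy_per_key(const struct demo_env *env, char *dst)
{
	size_t off = 0;

	for (int i = 0; i < env->envp_idx; i++) {
		size_t len = strlen(env->envp[i]) + 1;

		strcpy(dst + off, env->envp[i]);
		off += len;
	}
	return off;
}

/* New way: the keys are already contiguous, so one memcpy() suffices. */
static size_t demo_copy_blob(const struct demo_env *env, char *dst)
{
	/* a non-empty blob always ends with the last key's NUL */
	assert(env->buflen == 0 || env->buf[env->buflen - 1] == '\0');
	memcpy(dst, env->buf, env->buflen);
	return env->buflen;
}
```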

Tested:
time perf record "(for f in `seq 1 3000` ; do ip netns add tast$f; done)"
[ perf record: Woken up 10 times to write data ]
[ perf record: Captured and wrote 8.224 MB perf.data (~359301 samples) ]

real    0m52.554s  # instead of 1m7.492s
user    0m0.309s
sys     0m51.375s  # instead of 1m6.875s

     9.88%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
     8.86%       ip  [kernel.kallsyms]  [k] string
     7.37%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
     5.68%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     5.52%       ip  [kernel.kallsyms]  [k] memcpy_erms
     4.76%       ip  [kernel.kallsyms]  [k] __alloc_skb
     4.54%       ip  [kernel.kallsyms]  [k] vsnprintf
     3.94%       ip  [kernel.kallsyms]  [k] format_decode
     3.80%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node_trace
     3.71%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node
     3.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     3.38%       ip  [kernel.kallsyms]  [k] strlen
     2.65%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     2.20%       ip  [kernel.kallsyms]  [k] kfree
     2.09%       ip  [kernel.kallsyms]  [k] memset_erms
     2.07%       ip  [kernel.kallsyms]  [k] ___cache_free
     1.95%       ip  [kernel.kallsyms]  [k] kmem_cache_free
     1.91%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     1.45%       ip  [kernel.kallsyms]  [k] ksize
     1.25%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     1.00%       ip  [kernel.kallsyms]  [k] widen_string

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 lib/kobject_uevent.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 4f48cc3b11d566e44c4115cc7716bc3b1cdf96df..78b2a7e378c0deda3b32b1178d7f44203702c3f2 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -317,18 +317,12 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 		skb = alloc_skb(len + env->buflen, GFP_KERNEL);
 		if (skb) {
 			char *scratch;
-			int i;
 
 			/* add header */
 			scratch = skb_put(skb, len);
 			sprintf(scratch, "%s@%s", action_string, devpath);
 
-			/* copy keys to our continuous event payload buffer */
-			for (i = 0; i < env->envp_idx; i++) {
-				len = strlen(env->envp[i]) + 1;
-				scratch = skb_put(skb, len);
-				strcpy(scratch, env->envp[i]);
-			}
+			skb_put_data(skb, env->buf, env->buflen);
 
 			NETLINK_CB(skb).dst_group = 1;
 			retval = netlink_broadcast_filtered(uevent_sock, skb,
-- 
2.14.1.690.gbb1197296e-goog


* [PATCH v2 net-next 3/7] kobject: factorize skb setup in kobject_uevent_net_broadcast()
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 1/7] kobject: add kobject_uevent_net_broadcast() Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 2/7] kobject: copy env blob in one go Eric Dumazet
@ 2017-09-19 23:27 ` Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 4/7] ipv6: addrlabel: per netns list Eric Dumazet
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

We can build one skb and let netlink clone it as needed.

This is much faster, and uses less memory (all clones
share the same skb->head).
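A userspace sketch of the allocate-once, share-by-reference pattern, loosely modelled on skb_get()/consume_skb(): the message is built lazily for the first socket that has listeners, each send consumes one extra reference, and the caller drops its own reference at the end. In the kernel, netlink clones the skb so the clones share skb->head; here that sharing is simplified to plain references to a single buffer. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted message, loosely modelled on an skb. */
struct demo_msg {
	int users;
	char data[];
};

static int allocations;	/* payload buffers allocated */
static int frees;	/* payload buffers freed */

static struct demo_msg *demo_msg_alloc(const char *payload)
{
	size_t len = strlen(payload) + 1;
	struct demo_msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;
	m->users = 1;
	memcpy(m->data, payload, len);
	allocations++;
	return m;
}

static struct demo_msg *demo_msg_get(struct demo_msg *m)	/* like skb_get() */
{
	m->users++;
	return m;
}

static void demo_msg_put(struct demo_msg *m)	/* like consume_skb(), NULL-safe */
{
	if (m && --m->users == 0) {
		free(m);
		frees++;
	}
}

/* Stand-in for a broadcast that consumes the reference it is given. */
static void demo_send(struct demo_msg *m)
{
	assert(m->users >= 2);	/* the caller still holds its own reference */
	demo_msg_put(m);
}

/* Build the message only if some socket listens, then share it. */
static void demo_broadcast(const int *has_listeners, int nsocks,
			   const char *payload)
{
	struct demo_msg *m = NULL;

	for (int i = 0; i < nsocks; i++) {
		if (!has_listeners[i])
			continue;
		if (!m) {
			m = demo_msg_alloc(payload);
			if (!m)
				continue;
		}
		demo_send(demo_msg_get(m));
	}
	demo_msg_put(m);	/* drop our own reference */
}
```

With three listening sockets, the payload is allocated and freed exactly once, rather than once per socket.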

Tested:

time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.110 MB perf.data (~179584 samples) ]

real    0m24.227s  # instead of 0m52.554s
user    0m0.329s
sys     0m23.753s  # instead of 0m51.375s

    14.77%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
    14.56%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
    11.65%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     6.19%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     5.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     4.97%       ip  [kernel.kallsyms]  [k] memset_erms
     4.67%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
     4.41%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     3.59%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
     3.13%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     1.55%       ip  [kernel.kallsyms]  [k] __wake_up
     1.20%       ip  [kernel.kallsyms]  [k] strlen
     1.03%       ip  [kernel.kallsyms]  [k] __wake_up_common
     0.93%       ip  [kernel.kallsyms]  [k] consume_skb
     0.92%       ip  [kernel.kallsyms]  [k] netlink_trim
     0.87%       ip  [kernel.kallsyms]  [k] insert_header
     0.63%       ip  [kernel.kallsyms]  [k] unmap_page_range

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 lib/kobject_uevent.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 78b2a7e378c0deda3b32b1178d7f44203702c3f2..147db91c10d06485868ff56626a5a9b073a8a846 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -301,23 +301,26 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 {
 	int retval = 0;
 #if defined(CONFIG_NET)
+	struct sk_buff *skb = NULL;
 	struct uevent_sock *ue_sk;
 
 	/* send netlink message */
 	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
 		struct sock *uevent_sock = ue_sk->sk;
-		struct sk_buff *skb;
-		size_t len;
 
 		if (!netlink_has_listeners(uevent_sock, 1))
 			continue;
 
-		/* allocate message with the maximum possible size */
-		len = strlen(action_string) + strlen(devpath) + 2;
-		skb = alloc_skb(len + env->buflen, GFP_KERNEL);
-		if (skb) {
+		if (!skb) {
+			/* allocate message with the maximum possible size */
+			size_t len = strlen(action_string) + strlen(devpath) + 2;
 			char *scratch;
 
+			retval = -ENOMEM;
+			skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+			if (!skb)
+				continue;
+
 			/* add header */
 			scratch = skb_put(skb, len);
 			sprintf(scratch, "%s@%s", action_string, devpath);
@@ -325,16 +328,17 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 			skb_put_data(skb, env->buf, env->buflen);
 
 			NETLINK_CB(skb).dst_group = 1;
-			retval = netlink_broadcast_filtered(uevent_sock, skb,
-							    0, 1, GFP_KERNEL,
-							    kobj_bcast_filter,
-							    kobj);
-			/* ENOBUFS should be handled in userspace */
-			if (retval == -ENOBUFS || retval == -ESRCH)
-				retval = 0;
-		} else
-			retval = -ENOMEM;
+		}
+
+		retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
+						    0, 1, GFP_KERNEL,
+						    kobj_bcast_filter,
+						    kobj);
+		/* ENOBUFS should be handled in userspace */
+		if (retval == -ENOBUFS || retval == -ESRCH)
+			retval = 0;
 	}
+	consume_skb(skb);
 #endif
 	return retval;
 }
-- 
2.14.1.690.gbb1197296e-goog


* [PATCH v2 net-next 4/7] ipv6: addrlabel: per netns list
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
                   ` (2 preceding siblings ...)
  2017-09-19 23:27 ` [PATCH v2 net-next 3/7] kobject: factorize skb setup in kobject_uevent_net_broadcast() Eric Dumazet
@ 2017-09-19 23:27 ` Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 5/7] tcp: batch tcp_net_metrics_exit Eric Dumazet
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

Having a global list of labels does not scale to thousands of
netns in the cloud era. This causes quadratic behavior on
netns creation and deletion.

It is time to have a per-netns list of ~10 labels.
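A userspace sketch of why the global list hurts, counting how many entries a lookup visits. With a global list, each lookup (and each insertion, which scans for duplicates) walks the labels of every netns and filters by namespace, so creating N netns with ~10 labels each costs O(N^2) work overall; a per-netns list bounds each walk to that namespace's own labels. The types and demo_* functions are hypothetical simplifications of struct ip6addrlbl_entry and __ipv6_addr_label().

```c
#include <assert.h>
#include <stddef.h>

struct demo_net;

/* Hypothetical, simplified label entry. */
struct demo_label {
	const struct demo_net *net;	/* needed only with a global list */
	int prefixlen;
	struct demo_label *next;
};

struct demo_net {
	struct demo_label *labels;	/* per-netns list head */
};

/* Global list: walk everyone's labels, skipping other namespaces. */
static size_t demo_global_lookup(const struct demo_label *head,
				 const struct demo_net *net, int prefixlen,
				 const struct demo_label **found)
{
	size_t visited = 0;

	*found = NULL;
	for (const struct demo_label *p = head; p; p = p->next) {
		visited++;
		if (p->net == net && p->prefixlen == prefixlen) {
			*found = p;
			break;
		}
	}
	return visited;
}

/* Per-netns list: no namespace filter, only this netns's labels. */
static size_t demo_pernet_lookup(const struct demo_net *net, int prefixlen,
				 const struct demo_label **found)
{
	size_t visited = 0;

	assert(net != NULL);
	*found = NULL;
	for (const struct demo_label *p = net->labels; p; p = p->next) {
		visited++;
		if (p->prefixlen == prefixlen) {
			*found = p;
			break;
		}
	}
	return visited;
}
```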

Tested:

$ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]

real    0m20.837s # instead of 0m24.227s
user    0m0.328s
sys     0m20.338s # instead of 0m23.753s

    16.17%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
    12.30%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     6.76%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     5.78%       ip  [kernel.kallsyms]  [k] memset_erms
     5.77%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     5.18%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
     4.96%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     3.82%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
     3.33%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     2.11%       ip  [kernel.kallsyms]  [k] unmap_page_range
     1.77%       ip  [kernel.kallsyms]  [k] __wake_up
     1.69%       ip  [kernel.kallsyms]  [k] strlen
     1.17%       ip  [kernel.kallsyms]  [k] __wake_up_common
     1.09%       ip  [kernel.kallsyms]  [k] insert_header
     1.04%       ip  [kernel.kallsyms]  [k] page_remove_rmap
     1.01%       ip  [kernel.kallsyms]  [k] consume_skb
     0.98%       ip  [kernel.kallsyms]  [k] netlink_trim
     0.51%       ip  [kernel.kallsyms]  [k] kernfs_link_sibling
     0.51%       ip  [kernel.kallsyms]  [k] filemap_map_pages
     0.46%       ip  [kernel.kallsyms]  [k] memcpy_erms

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/netns/ipv6.h |  5 +++
 net/ipv6/addrlabel.c     | 81 ++++++++++++++++++------------------------------
 2 files changed, 35 insertions(+), 51 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 2544f9760a4263b7f1b8d622331ca63038586137..2ea1ed341ef81901b4fa271b0f7f4592e17c4f8a 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -89,6 +89,11 @@ struct netns_ipv6 {
 	atomic_t		fib6_sernum;
 	struct seg6_pernet_data *seg6_data;
 	struct fib_notifier_ops	*notifier_ops;
+	struct {
+		struct hlist_head head;
+		spinlock_t	lock;
+		u32		seq;
+	} ip6addrlbl_table;
 };
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index b055bc79f56d555c89684116c1580984950f77a8..c6311d7108f651c7385cd6316752ba4a86667dcc 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -30,7 +30,6 @@
  * Policy Table
  */
 struct ip6addrlbl_entry {
-	possible_net_t lbl_net;
 	struct in6_addr prefix;
 	int prefixlen;
 	int ifindex;
@@ -41,19 +40,6 @@ struct ip6addrlbl_entry {
 	struct rcu_head rcu;
 };
 
-static struct ip6addrlbl_table
-{
-	struct hlist_head head;
-	spinlock_t lock;
-	u32 seq;
-} ip6addrlbl_table;
-
-static inline
-struct net *ip6addrlbl_net(const struct ip6addrlbl_entry *lbl)
-{
-	return read_pnet(&lbl->lbl_net);
-}
-
 /*
  * Default policy table (RFC6724 + extensions)
  *
@@ -148,13 +134,10 @@ static inline void ip6addrlbl_put(struct ip6addrlbl_entry *p)
 }
 
 /* Find label */
-static bool __ip6addrlbl_match(struct net *net,
-			       const struct ip6addrlbl_entry *p,
+static bool __ip6addrlbl_match(const struct ip6addrlbl_entry *p,
 			       const struct in6_addr *addr,
 			       int addrtype, int ifindex)
 {
-	if (!net_eq(ip6addrlbl_net(p), net))
-		return false;
 	if (p->ifindex && p->ifindex != ifindex)
 		return false;
 	if (p->addrtype && p->addrtype != addrtype)
@@ -169,8 +152,9 @@ static struct ip6addrlbl_entry *__ipv6_addr_label(struct net *net,
 						  int type, int ifindex)
 {
 	struct ip6addrlbl_entry *p;
-	hlist_for_each_entry_rcu(p, &ip6addrlbl_table.head, list) {
-		if (__ip6addrlbl_match(net, p, addr, type, ifindex))
+
+	hlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) {
+		if (__ip6addrlbl_match(p, addr, type, ifindex))
 			return p;
 	}
 	return NULL;
@@ -196,8 +180,7 @@ u32 ipv6_addr_label(struct net *net,
 }
 
 /* allocate one entry */
-static struct ip6addrlbl_entry *ip6addrlbl_alloc(struct net *net,
-						 const struct in6_addr *prefix,
+static struct ip6addrlbl_entry *ip6addrlbl_alloc(const struct in6_addr *prefix,
 						 int prefixlen, int ifindex,
 						 u32 label)
 {
@@ -236,24 +219,23 @@ static struct ip6addrlbl_entry *ip6addrlbl_alloc(struct net *net,
 	newp->addrtype = addrtype;
 	newp->label = label;
 	INIT_HLIST_NODE(&newp->list);
-	write_pnet(&newp->lbl_net, net);
 	refcount_set(&newp->refcnt, 1);
 	return newp;
 }
 
 /* add a label */
-static int __ip6addrlbl_add(struct ip6addrlbl_entry *newp, int replace)
+static int __ip6addrlbl_add(struct net *net, struct ip6addrlbl_entry *newp,
+			    int replace)
 {
-	struct hlist_node *n;
 	struct ip6addrlbl_entry *last = NULL, *p = NULL;
+	struct hlist_node *n;
 	int ret = 0;
 
 	ADDRLABEL(KERN_DEBUG "%s(newp=%p, replace=%d)\n", __func__, newp,
 		  replace);
 
-	hlist_for_each_entry_safe(p, n,	&ip6addrlbl_table.head, list) {
+	hlist_for_each_entry_safe(p, n,	&net->ipv6.ip6addrlbl_table.head, list) {
 		if (p->prefixlen == newp->prefixlen &&
-		    net_eq(ip6addrlbl_net(p), ip6addrlbl_net(newp)) &&
 		    p->ifindex == newp->ifindex &&
 		    ipv6_addr_equal(&p->prefix, &newp->prefix)) {
 			if (!replace) {
@@ -273,10 +255,10 @@ static int __ip6addrlbl_add(struct ip6addrlbl_entry *newp, int replace)
 	if (last)
 		hlist_add_behind_rcu(&newp->list, &last->list);
 	else
-		hlist_add_head_rcu(&newp->list, &ip6addrlbl_table.head);
+		hlist_add_head_rcu(&newp->list, &net->ipv6.ip6addrlbl_table.head);
 out:
 	if (!ret)
-		ip6addrlbl_table.seq++;
+		net->ipv6.ip6addrlbl_table.seq++;
 	return ret;
 }
 
@@ -292,12 +274,12 @@ static int ip6addrlbl_add(struct net *net,
 		  __func__, prefix, prefixlen, ifindex, (unsigned int)label,
 		  replace);
 
-	newp = ip6addrlbl_alloc(net, prefix, prefixlen, ifindex, label);
+	newp = ip6addrlbl_alloc(prefix, prefixlen, ifindex, label);
 	if (IS_ERR(newp))
 		return PTR_ERR(newp);
-	spin_lock(&ip6addrlbl_table.lock);
-	ret = __ip6addrlbl_add(newp, replace);
-	spin_unlock(&ip6addrlbl_table.lock);
+	spin_lock(&net->ipv6.ip6addrlbl_table.lock);
+	ret = __ip6addrlbl_add(net, newp, replace);
+	spin_unlock(&net->ipv6.ip6addrlbl_table.lock);
 	if (ret)
 		ip6addrlbl_free(newp);
 	return ret;
@@ -315,9 +297,8 @@ static int __ip6addrlbl_del(struct net *net,
 	ADDRLABEL(KERN_DEBUG "%s(prefix=%pI6, prefixlen=%d, ifindex=%d)\n",
 		  __func__, prefix, prefixlen, ifindex);
 
-	hlist_for_each_entry_safe(p, n, &ip6addrlbl_table.head, list) {
+	hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) {
 		if (p->prefixlen == prefixlen &&
-		    net_eq(ip6addrlbl_net(p), net) &&
 		    p->ifindex == ifindex &&
 		    ipv6_addr_equal(&p->prefix, prefix)) {
 			hlist_del_rcu(&p->list);
@@ -340,9 +321,9 @@ static int ip6addrlbl_del(struct net *net,
 		  __func__, prefix, prefixlen, ifindex);
 
 	ipv6_addr_prefix(&prefix_buf, prefix, prefixlen);
-	spin_lock(&ip6addrlbl_table.lock);
+	spin_lock(&net->ipv6.ip6addrlbl_table.lock);
 	ret = __ip6addrlbl_del(net, &prefix_buf, prefixlen, ifindex);
-	spin_unlock(&ip6addrlbl_table.lock);
+	spin_unlock(&net->ipv6.ip6addrlbl_table.lock);
 	return ret;
 }
 
@@ -354,6 +335,9 @@ static int __net_init ip6addrlbl_net_init(struct net *net)
 
 	ADDRLABEL(KERN_DEBUG "%s\n", __func__);
 
+	spin_lock_init(&net->ipv6.ip6addrlbl_table.lock);
+	INIT_HLIST_HEAD(&net->ipv6.ip6addrlbl_table.head);
+
 	for (i = 0; i < ARRAY_SIZE(ip6addrlbl_init_table); i++) {
 		int ret = ip6addrlbl_add(net,
 					 ip6addrlbl_init_table[i].prefix,
@@ -373,14 +357,12 @@ static void __net_exit ip6addrlbl_net_exit(struct net *net)
 	struct hlist_node *n;
 
 	/* Remove all labels belonging to the exiting net */
-	spin_lock(&ip6addrlbl_table.lock);
-	hlist_for_each_entry_safe(p, n, &ip6addrlbl_table.head, list) {
-		if (net_eq(ip6addrlbl_net(p), net)) {
-			hlist_del_rcu(&p->list);
-			ip6addrlbl_put(p);
-		}
+	spin_lock(&net->ipv6.ip6addrlbl_table.lock);
+	hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) {
+		hlist_del_rcu(&p->list);
+		ip6addrlbl_put(p);
 	}
-	spin_unlock(&ip6addrlbl_table.lock);
+	spin_unlock(&net->ipv6.ip6addrlbl_table.lock);
 }
 
 static struct pernet_operations ipv6_addr_label_ops = {
@@ -390,8 +372,6 @@ static struct pernet_operations ipv6_addr_label_ops = {
 
 int __init ipv6_addr_label_init(void)
 {
-	spin_lock_init(&ip6addrlbl_table.lock);
-
 	return register_pernet_subsys(&ipv6_addr_label_ops);
 }
 
@@ -510,11 +490,10 @@ static int ip6addrlbl_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	int err;
 
 	rcu_read_lock();
-	hlist_for_each_entry_rcu(p, &ip6addrlbl_table.head, list) {
-		if (idx >= s_idx &&
-		    net_eq(ip6addrlbl_net(p), net)) {
+	hlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) {
+		if (idx >= s_idx) {
 			err = ip6addrlbl_fill(skb, p,
-					      ip6addrlbl_table.seq,
+					      net->ipv6.ip6addrlbl_table.seq,
 					      NETLINK_CB(cb->skb).portid,
 					      cb->nlh->nlmsg_seq,
 					      RTM_NEWADDRLABEL,
@@ -571,7 +550,7 @@ static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	p = __ipv6_addr_label(net, addr, ipv6_addr_type(addr), ifal->ifal_index);
 	if (p && !ip6addrlbl_hold(p))
 		p = NULL;
-	lseq = ip6addrlbl_table.seq;
+	lseq = net->ipv6.ip6addrlbl_table.seq;
 	rcu_read_unlock();
 
 	if (!p) {
-- 
2.14.1.690.gbb1197296e-goog


* [PATCH v2 net-next 5/7] tcp: batch tcp_net_metrics_exit
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
                   ` (3 preceding siblings ...)
  2017-09-19 23:27 ` [PATCH v2 net-next 4/7] ipv6: addrlabel: per netns list Eric Dumazet
@ 2017-09-19 23:27 ` Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 6/7] ipv6: speedup ipv6 tunnels dismantle Eric Dumazet
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

When dismantling a list of netns, we can scan tcp_metrics
once for all of them, saving CPU cycles.
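A userspace sketch of the single-pass batch flush. It assumes, as the patch does, that by the time ->exit_batch() runs every dismantling netns has a zero refcount, so one walk over the shared table with net == NULL removes the entries of all of them at once. struct demo_net, struct demo_metric and demo_flush_all() are hypothetical simplifications of struct net, struct tcp_metrics_block and tcp_metrics_flush_all().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct tcp_metrics_block. */
struct demo_net {
	int count;	/* 0 once the namespace is being dismantled */
};

struct demo_metric {
	struct demo_net *net;
	struct demo_metric *next;
};

/*
 * Walk every hash bucket once. With net != NULL, unlink that namespace's
 * entries; with net == NULL, unlink the entries of every namespace whose
 * refcount already dropped to zero. Returns the number of entries unlinked.
 */
static size_t demo_flush_all(struct demo_metric **buckets, size_t nbuckets,
			     const struct demo_net *net)
{
	size_t removed = 0;

	assert(buckets != NULL || nbuckets == 0);
	for (size_t row = 0; row < nbuckets; row++) {
		struct demo_metric **pp = &buckets[row];

		while (*pp) {
			struct demo_metric *tm = *pp;
			int match = net ? tm->net == net : tm->net->count == 0;

			if (match) {
				*pp = tm->next;	/* the kernel also kfree_rcu()s tm */
				removed++;
			} else {
				pp = &tm->next;
			}
		}
	}
	return removed;
}
```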

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_metrics.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 102b2c90bb807d3a88d31b59324baf72cf901cdf..0ab78abc811bef0388089befed672e3d4ee9d881 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -892,10 +892,14 @@ static void tcp_metrics_flush_all(struct net *net)
 
 	for (row = 0; row < max_rows; row++, hb++) {
 		struct tcp_metrics_block __rcu **pp;
+		bool match;
+
 		spin_lock_bh(&tcp_metrics_lock);
 		pp = &hb->chain;
 		for (tm = deref_locked(*pp); tm; tm = deref_locked(*pp)) {
-			if (net_eq(tm_net(tm), net)) {
+			match = net ? net_eq(tm_net(tm), net) :
+				!atomic_read(&tm_net(tm)->count);
+			if (match) {
 				*pp = tm->tcpm_next;
 				kfree_rcu(tm, rcu_head);
 			} else {
@@ -1018,14 +1022,14 @@ static int __net_init tcp_net_metrics_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit tcp_net_metrics_exit(struct net *net)
+static void __net_exit tcp_net_metrics_exit_batch(struct list_head *net_exit_list)
 {
-	tcp_metrics_flush_all(net);
+	tcp_metrics_flush_all(NULL);
 }
 
 static __net_initdata struct pernet_operations tcp_net_metrics_ops = {
-	.init	=	tcp_net_metrics_init,
-	.exit	=	tcp_net_metrics_exit,
+	.init		=	tcp_net_metrics_init,
+	.exit_batch	=	tcp_net_metrics_exit_batch,
 };
 
 void __init tcp_metrics_init(void)
-- 
2.14.1.690.gbb1197296e-goog


* [PATCH v2 net-next 6/7] ipv6: speedup ipv6 tunnels dismantle
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
                   ` (4 preceding siblings ...)
  2017-09-19 23:27 ` [PATCH v2 net-next 5/7] tcp: batch tcp_net_metrics_exit Eric Dumazet
@ 2017-09-19 23:27 ` Eric Dumazet
  2017-09-19 23:27 ` [PATCH v2 net-next 7/7] ipv4: " Eric Dumazet
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

Implement the exit_batch() method to dismantle more devices
per round: a single

(rtnl_lock() ...
 unregister_netdevice_many() ...
 rtnl_unlock())

sequence now covers a whole list of dismantling netns.
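A userspace sketch of why ->exit_batch() helps, counting lock/unregister rounds. Each rtnl_lock() ... unregister_netdevice_many() ... rtnl_unlock() round is costly in the real kernel (it typically involves RCU synchronization), so one round for the whole list of dismantling netns replaces one round per netns. The counters, struct demo_net and the demo_* functions are hypothetical stand-ins; the real code walks net_list with list_for_each_entry(net, net_list, exit_list).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for rtnl_lock() rounds and
 * unregister_netdevice_many() calls. */
static int lock_rounds, unregister_rounds, devices_unregistered;

struct demo_net {
	int ntunnels;			/* tunnel devices owned by this netns */
	const struct demo_net *next;	/* link in the exit list */
};

/* Like *_destroy_tunnels(): queue devices, do not unregister yet. */
static void demo_destroy_tunnels(const struct demo_net *net, int *queued)
{
	*queued += net->ntunnels;	/* like unregister_netdevice_queue() */
}

static void demo_unregister_many(int *queued)
{
	unregister_rounds++;
	devices_unregistered += *queued;
	*queued = 0;
}

/* Old ->exit(): one lock/unregister round per netns. */
static void demo_exit_net(const struct demo_net *net)
{
	int queued = 0;

	lock_rounds++;
	demo_destroy_tunnels(net, &queued);
	demo_unregister_many(&queued);
	assert(queued == 0);
}

/* New ->exit_batch(): one round for the whole list of dying netns. */
static void demo_exit_batch_net(const struct demo_net *net_list)
{
	int queued = 0;

	lock_rounds++;
	for (const struct demo_net *net = net_list; net; net = net->next)
		demo_destroy_tunnels(net, &queued);
	demo_unregister_many(&queued);
	assert(queued == 0);
}
```

For three netns, the batch version takes one round where the per-netns version takes three, while unregistering the same devices.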

Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before patch:
$ time ./add_del_unshare.sh
net_namespace        110    267   5504    1    2 : tunables    8    4    0 : slabdata    110    267      0

real    3m25.292s
user    0m0.644s
sys     0m40.153s

After patch:

$ time ./add_del_unshare.sh
net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0

real	1m38.965s
user	0m0.688s
sys	0m37.017s

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6_gre.c    |  8 +++++---
 net/ipv6/ip6_tunnel.c | 20 +++++++++++---------
 net/ipv6/ip6_vti.c    | 23 ++++++++++++++---------
 net/ipv6/sit.c        |  9 ++++++---
 4 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index b7a72d40933441f835708f55e2d8af371661a5fb..c82d41ef25e283ff92b1eed1f8b927c9d7b8f333 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1155,19 +1155,21 @@ static int __net_init ip6gre_init_net(struct net *net)
 	return err;
 }
 
-static void __net_exit ip6gre_exit_net(struct net *net)
+static void __net_exit ip6gre_exit_batch_net(struct list_head *net_list)
 {
+	struct net *net;
 	LIST_HEAD(list);
 
 	rtnl_lock();
-	ip6gre_destroy_tunnels(net, &list);
+	list_for_each_entry(net, net_list, exit_list)
+		ip6gre_destroy_tunnels(net, &list);
 	unregister_netdevice_many(&list);
 	rtnl_unlock();
 }
 
 static struct pernet_operations ip6gre_net_ops = {
 	.init = ip6gre_init_net,
-	.exit = ip6gre_exit_net,
+	.exit_batch = ip6gre_exit_batch_net,
 	.id   = &ip6gre_net_id,
 	.size = sizeof(struct ip6gre_net),
 };
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index ae73164559d5c4d7f2650ae63c56d76dc93b165c..3d6df489b39f00014f330340927c4d11a64911c2 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -2167,17 +2167,16 @@ static struct xfrm6_tunnel ip6ip6_handler __read_mostly = {
 	.priority	=	1,
 };
 
-static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
+static void __net_exit ip6_tnl_destroy_tunnels(struct net *net, struct list_head *list)
 {
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 	struct net_device *dev, *aux;
 	int h;
 	struct ip6_tnl *t;
-	LIST_HEAD(list);
 
 	for_each_netdev_safe(net, dev, aux)
 		if (dev->rtnl_link_ops == &ip6_link_ops)
-			unregister_netdevice_queue(dev, &list);
+			unregister_netdevice_queue(dev, list);
 
 	for (h = 0; h < IP6_TUNNEL_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
@@ -2186,12 +2185,10 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
 			 * been added to the list by the previous loop.
 			 */
 			if (!net_eq(dev_net(t->dev), net))
-				unregister_netdevice_queue(t->dev, &list);
+				unregister_netdevice_queue(t->dev, list);
 			t = rtnl_dereference(t->next);
 		}
 	}
-
-	unregister_netdevice_many(&list);
 }
 
 static int __net_init ip6_tnl_init_net(struct net *net)
@@ -2235,16 +2232,21 @@ static int __net_init ip6_tnl_init_net(struct net *net)
 	return err;
 }
 
-static void __net_exit ip6_tnl_exit_net(struct net *net)
+static void __net_exit ip6_tnl_exit_batch_net(struct list_head *net_list)
 {
+	struct net *net;
+	LIST_HEAD(list);
+
 	rtnl_lock();
-	ip6_tnl_destroy_tunnels(net);
+	list_for_each_entry(net, net_list, exit_list)
+		ip6_tnl_destroy_tunnels(net, &list);
+	unregister_netdevice_many(&list);
 	rtnl_unlock();
 }
 
 static struct pernet_operations ip6_tnl_net_ops = {
 	.init = ip6_tnl_init_net,
-	.exit = ip6_tnl_exit_net,
+	.exit_batch = ip6_tnl_exit_batch_net,
 	.id   = &ip6_tnl_net_id,
 	.size = sizeof(struct ip6_tnl_net),
 };
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 79444a4bfd6d245b66a7edcefe2b5b32801bf2c0..714914d1bb987c46cc98817903ec7bcc367a1b2d 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -1052,23 +1052,22 @@ static struct rtnl_link_ops vti6_link_ops __read_mostly = {
 	.get_link_net	= ip6_tnl_get_link_net,
 };
 
-static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
+static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n,
+					    struct list_head *list)
 {
 	int h;
 	struct ip6_tnl *t;
-	LIST_HEAD(list);
 
 	for (h = 0; h < IP6_VTI_HASH_SIZE; h++) {
 		t = rtnl_dereference(ip6n->tnls_r_l[h]);
 		while (t) {
-			unregister_netdevice_queue(t->dev, &list);
+			unregister_netdevice_queue(t->dev, list);
 			t = rtnl_dereference(t->next);
 		}
 	}
 
 	t = rtnl_dereference(ip6n->tnls_wc[0]);
-	unregister_netdevice_queue(t->dev, &list);
-	unregister_netdevice_many(&list);
+	unregister_netdevice_queue(t->dev, list);
 }
 
 static int __net_init vti6_init_net(struct net *net)
@@ -1108,18 +1107,24 @@ static int __net_init vti6_init_net(struct net *net)
 	return err;
 }
 
-static void __net_exit vti6_exit_net(struct net *net)
+static void __net_exit vti6_exit_batch_net(struct list_head *net_list)
 {
-	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	struct vti6_net *ip6n;
+	struct net *net;
+	LIST_HEAD(list);
 
 	rtnl_lock();
-	vti6_destroy_tunnels(ip6n);
+	list_for_each_entry(net, net_list, exit_list) {
+		ip6n = net_generic(net, vti6_net_id);
+		vti6_destroy_tunnels(ip6n, &list);
+	}
+	unregister_netdevice_many(&list);
 	rtnl_unlock();
 }
 
 static struct pernet_operations vti6_net_ops = {
 	.init = vti6_init_net,
-	.exit = vti6_exit_net,
+	.exit_batch = vti6_exit_batch_net,
 	.id   = &vti6_net_id,
 	.size = sizeof(struct vti6_net),
 };
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index ac912bb217471c048df3b76aa3d7b82886221dc1..a799f525861487ad5b822ab62cdc90f6ca06762f 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1848,19 +1848,22 @@ static int __net_init sit_init_net(struct net *net)
 	return err;
 }
 
-static void __net_exit sit_exit_net(struct net *net)
+static void __net_exit sit_exit_batch_net(struct list_head *net_list)
 {
 	LIST_HEAD(list);
+	struct net *net;
 
 	rtnl_lock();
-	sit_destroy_tunnels(net, &list);
+	list_for_each_entry(net, net_list, exit_list)
+		sit_destroy_tunnels(net, &list);
+
 	unregister_netdevice_many(&list);
 	rtnl_unlock();
 }
 
 static struct pernet_operations sit_net_ops = {
 	.init = sit_init_net,
-	.exit = sit_exit_net,
+	.exit_batch = sit_exit_batch_net,
 	.id   = &sit_net_id,
 	.size = sizeof(struct sit_net),
 };
-- 
2.14.1.690.gbb1197296e-goog

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 net-next 7/7] ipv4: speedup ipv6 tunnels dismantle
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
                   ` (5 preceding siblings ...)
  2017-09-19 23:27 ` [PATCH v2 net-next 6/7] ipv6: speedup ipv6 tunnels dismantle Eric Dumazet
@ 2017-09-19 23:27 ` Eric Dumazet
  2017-09-19 23:32 ` [PATCH v2 net-next 0/7] net: speedup netns create/delete time David Miller
  2017-09-26 11:21 ` Tariq Toukan
  8 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-19 23:27 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric W . Biederman, Eric Dumazet, Eric Dumazet

Implement the exit_batch() method to dismantle more devices
per round.

(rtnl_lock() ...
 unregister_netdevice_many() ...
 rtnl_unlock())
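The batching pattern shared by these exit_batch() conversions can be modeled in user space. This is a hedged sketch, not kernel code. struct fake_net, struct dev_queue and the helper names are hypothetical. One "round" stands for one rtnl_lock() / unregister_netdevice_many() / rtnl_unlock() sequence, and batching amortizes that round's mostly fixed synchronization cost across namespaces.

```c
#include <assert.h>

/* Hypothetical model of a dying namespace: it only records how many
 * tunnel devices it still owns. */
struct fake_net {
	int ndevs;
};

/* Hypothetical stand-in for the list passed to unregister_netdevice_queue(). */
struct dev_queue {
	int count;
};

/* Models a per-netns helper such as sit_destroy_tunnels(): it only
 * queues devices, which is cheap. */
static void queue_tunnels(const struct fake_net *net, struct dev_queue *q)
{
	q->count += net->ndevs;
}

/* Models one rtnl_lock(); unregister_netdevice_many(); rtnl_unlock()
 * round. Its cost is dominated by fixed synchronization work, so we
 * count rounds rather than devices. Returns 1 if a round ran. */
static int unregister_round(struct dev_queue *q)
{
	if (q->count == 0)
		return 0;
	q->count = 0;
	return 1;
}

/* Old ->exit() style: called once per namespace, one round each. */
int rounds_per_net(const struct fake_net *nets, int n)
{
	int rounds = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		struct dev_queue q = { 0 };

		queue_tunnels(&nets[i], &q);
		rounds += unregister_round(&q);
	}
	return rounds;
}

/* New ->exit_batch() style: all namespaces share one queue and one round. */
int rounds_batched(const struct fake_net *nets, int n)
{
	struct dev_queue q = { 0 };

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		queue_tunnels(&nets[i], &q);
	return unregister_round(&q);
}
```

Consider 100 namespaces that each own 3 tunnels. rounds_per_net() returns 100, while rounds_batched() returns 1. Both queue the same devices; batching only changes how many expensive rounds run.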

Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before patch:
$ time ./add_del_unshare.sh
net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0

real    1m38.965s
user    0m0.688s
sys     0m37.017s

After patch:
$ time ./add_del_unshare.sh
net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0

real	0m22.117s
user	0m0.728s
sys	0m35.328s
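The test script starts 40 background loops of 100 unshare -n calls, which is 4000 netns create/delete pairs per run. Dividing the wall-clock ("real") time by 4000 gives an average cost per pair:

- about 24.7 ms before this patch;
- about 5.5 ms after this patch;
- about 51 ms for the cover letter's 3m24.910s baseline, consistent with its "~50 ms -> ~5 ms" summary.

A trivial helper for that arithmetic (the function name is hypothetical):

```c
#include <assert.h>

/* Average wall-clock cost of one create/delete pair, in milliseconds:
 * loops * iterations pairs ran in wall_seconds of "real" time. */
double ms_per_pair(double wall_seconds, int loops, int iterations)
{
	assert(loops > 0 && iterations > 0);
	return wall_seconds * 1000.0 / ((double)loops * iterations);
}
```

Because the loops run in parallel, this is a throughput-based average rather than the latency of a single pair.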

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip_tunnels.h |  3 ++-
 net/ipv4/ip_gre.c        | 22 +++++++++-------------
 net/ipv4/ip_tunnel.c     | 12 +++++++++---
 net/ipv4/ip_vti.c        |  7 +++----
 net/ipv4/ipip.c          |  7 +++----
 5 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 992652856fe8c7c1032e0f5f92ce7ee5aa0119da..b41a1e057fcec9d6e4c5a0c1cafd1f1d537ccd53 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -258,7 +258,8 @@ int ip_tunnel_get_iflink(const struct net_device *dev);
 int ip_tunnel_init_net(struct net *net, unsigned int ip_tnl_net_id,
 		       struct rtnl_link_ops *ops, char *devname);
 
-void ip_tunnel_delete_net(struct ip_tunnel_net *itn, struct rtnl_link_ops *ops);
+void ip_tunnel_delete_nets(struct list_head *list_net, unsigned int id,
+			   struct rtnl_link_ops *ops);
 
 void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		    const struct iphdr *tnl_params, const u8 protocol);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 0162fb955b33abf18514cbfd482e72a0ebce6e48..9cee986ac6b8ed04ff95e193fe1e8e60e74d84a9 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1013,15 +1013,14 @@ static int __net_init ipgre_init_net(struct net *net)
 	return ip_tunnel_init_net(net, ipgre_net_id, &ipgre_link_ops, NULL);
 }
 
-static void __net_exit ipgre_exit_net(struct net *net)
+static void __net_exit ipgre_exit_batch_net(struct list_head *list_net)
 {
-	struct ip_tunnel_net *itn = net_generic(net, ipgre_net_id);
-	ip_tunnel_delete_net(itn, &ipgre_link_ops);
+	ip_tunnel_delete_nets(list_net, ipgre_net_id, &ipgre_link_ops);
 }
 
 static struct pernet_operations ipgre_net_ops = {
 	.init = ipgre_init_net,
-	.exit = ipgre_exit_net,
+	.exit_batch = ipgre_exit_batch_net,
 	.id   = &ipgre_net_id,
 	.size = sizeof(struct ip_tunnel_net),
 };
@@ -1540,15 +1539,14 @@ static int __net_init ipgre_tap_init_net(struct net *net)
 	return ip_tunnel_init_net(net, gre_tap_net_id, &ipgre_tap_ops, "gretap0");
 }
 
-static void __net_exit ipgre_tap_exit_net(struct net *net)
+static void __net_exit ipgre_tap_exit_batch_net(struct list_head *list_net)
 {
-	struct ip_tunnel_net *itn = net_generic(net, gre_tap_net_id);
-	ip_tunnel_delete_net(itn, &ipgre_tap_ops);
+	ip_tunnel_delete_nets(list_net, gre_tap_net_id, &ipgre_tap_ops);
 }
 
 static struct pernet_operations ipgre_tap_net_ops = {
 	.init = ipgre_tap_init_net,
-	.exit = ipgre_tap_exit_net,
+	.exit_batch = ipgre_tap_exit_batch_net,
 	.id   = &gre_tap_net_id,
 	.size = sizeof(struct ip_tunnel_net),
 };
@@ -1559,16 +1557,14 @@ static int __net_init erspan_init_net(struct net *net)
 				  &erspan_link_ops, "erspan0");
 }
 
-static void __net_exit erspan_exit_net(struct net *net)
+static void __net_exit erspan_exit_batch_net(struct list_head *net_list)
 {
-	struct ip_tunnel_net *itn = net_generic(net, erspan_net_id);
-
-	ip_tunnel_delete_net(itn, &erspan_link_ops);
+	ip_tunnel_delete_nets(net_list, erspan_net_id, &erspan_link_ops);
 }
 
 static struct pernet_operations erspan_net_ops = {
 	.init = erspan_init_net,
-	.exit = erspan_exit_net,
+	.exit_batch = erspan_exit_batch_net,
 	.id   = &erspan_net_id,
 	.size = sizeof(struct ip_tunnel_net),
 };
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e9805ad664ac24c3405ad015cfaab89dc1c95279..fe6fee728ce49d01b55aa478698e1a3bcf9a3bdb 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -1061,16 +1061,22 @@ static void ip_tunnel_destroy(struct ip_tunnel_net *itn, struct list_head *head,
 	}
 }
 
-void ip_tunnel_delete_net(struct ip_tunnel_net *itn, struct rtnl_link_ops *ops)
+void ip_tunnel_delete_nets(struct list_head *net_list, unsigned int id,
+			   struct rtnl_link_ops *ops)
 {
+	struct ip_tunnel_net *itn;
+	struct net *net;
 	LIST_HEAD(list);
 
 	rtnl_lock();
-	ip_tunnel_destroy(itn, &list, ops);
+	list_for_each_entry(net, net_list, exit_list) {
+		itn = net_generic(net, id);
+		ip_tunnel_destroy(itn, &list, ops);
+	}
 	unregister_netdevice_many(&list);
 	rtnl_unlock();
 }
-EXPORT_SYMBOL_GPL(ip_tunnel_delete_net);
+EXPORT_SYMBOL_GPL(ip_tunnel_delete_nets);
 
 int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
 		      struct ip_tunnel_parm *p, __u32 fwmark)
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 5ed63d25095062d44dacfd291e227290d24ea0ed..02d70ca99db16f2a50e3e179a05e74b535865f46 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -452,15 +452,14 @@ static int __net_init vti_init_net(struct net *net)
 	return 0;
 }
 
-static void __net_exit vti_exit_net(struct net *net)
+static void __net_exit vti_exit_batch_net(struct list_head *list_net)
 {
-	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
-	ip_tunnel_delete_net(itn, &vti_link_ops);
+	ip_tunnel_delete_nets(list_net, vti_net_id, &vti_link_ops);
 }
 
 static struct pernet_operations vti_net_ops = {
 	.init = vti_init_net,
-	.exit = vti_exit_net,
+	.exit_batch = vti_exit_batch_net,
 	.id   = &vti_net_id,
 	.size = sizeof(struct ip_tunnel_net),
 };
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index fb1ad22b5e292d5669c70b5640ad3207c353c6bb..1e47818e38c766a3dab63dfa6bfa9610fa9550ac 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -634,15 +634,14 @@ static int __net_init ipip_init_net(struct net *net)
 	return ip_tunnel_init_net(net, ipip_net_id, &ipip_link_ops, "tunl0");
 }
 
-static void __net_exit ipip_exit_net(struct net *net)
+static void __net_exit ipip_exit_batch_net(struct list_head *list_net)
 {
-	struct ip_tunnel_net *itn = net_generic(net, ipip_net_id);
-	ip_tunnel_delete_net(itn, &ipip_link_ops);
+	ip_tunnel_delete_nets(list_net, ipip_net_id, &ipip_link_ops);
 }
 
 static struct pernet_operations ipip_net_ops = {
 	.init = ipip_init_net,
-	.exit = ipip_exit_net,
+	.exit_batch = ipip_exit_batch_net,
 	.id   = &ipip_net_id,
 	.size = sizeof(struct ip_tunnel_net),
 };
-- 
2.14.1.690.gbb1197296e-goog

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
                   ` (6 preceding siblings ...)
  2017-09-19 23:27 ` [PATCH v2 net-next 7/7] ipv4: " Eric Dumazet
@ 2017-09-19 23:32 ` David Miller
  2017-09-26 11:21 ` Tariq Toukan
  8 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2017-09-19 23:32 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, ebiederm, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 19 Sep 2017 16:27:02 -0700

> When rate of netns creation/deletion is high enough,
> we observe softlockups in cleanup_net() caused by huge list
> of netns and way too many rcu_barrier() calls.
> 
> This patch series does some optimizations in kobject,
> and add batching to tunnels so that netns dismantles are
> less costly.
 ...

Series applied, thanks Eric.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
                   ` (7 preceding siblings ...)
  2017-09-19 23:32 ` [PATCH v2 net-next 0/7] net: speedup netns create/delete time David Miller
@ 2017-09-26 11:21 ` Tariq Toukan
  2017-09-26 12:51   ` Eric Dumazet
  8 siblings, 1 reply; 21+ messages in thread
From: Tariq Toukan @ 2017-09-26 11:21 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric W . Biederman, Eric Dumazet, Majd Dibbiny,
	Yonatan Cohen, Eran Ben Elisha


On 20/09/2017 2:27 AM, Eric Dumazet wrote:
> When rate of netns creation/deletion is high enough,
> we observe softlockups in cleanup_net() caused by huge list
> of netns and way too many rcu_barrier() calls.
> 
> This patch series does some optimizations in kobject,
> and add batching to tunnels so that netns dismantles are
> less costly.
> 
> IPv6 addrlabels also get a per netns list, and tcp_metrics
> also benefit from batch flushing.
> 
> This gives me one order of magnitude gain.
> (~50 ms -> ~5 ms for one netns create/delete pair)
> 
...
> 
> Eric Dumazet (7):
>    kobject: add kobject_uevent_net_broadcast()
>    kobject: copy env blob in one go
>    kobject: factorize skb setup in kobject_uevent_net_broadcast()
>    ipv6: addrlabel: per netns list
>    tcp: batch tcp_net_metrics_exit
>    ipv6: speedup ipv6 tunnels dismantle
>    ipv4: speedup ipv6 tunnels dismantle
> 
>   include/net/ip_tunnels.h |  3 +-
>   include/net/netns/ipv6.h |  5 +++
>   lib/kobject_uevent.c     | 94 ++++++++++++++++++++++++++----------------------
>   net/ipv4/ip_gre.c        | 22 +++++-------
>   net/ipv4/ip_tunnel.c     | 12 +++++--
>   net/ipv4/ip_vti.c        |  7 ++--
>   net/ipv4/ipip.c          |  7 ++--
>   net/ipv4/tcp_metrics.c   | 14 +++++---
>   net/ipv6/addrlabel.c     | 81 ++++++++++++++++-------------------------
>   net/ipv6/ip6_gre.c       |  8 +++--
>   net/ipv6/ip6_tunnel.c    | 20 ++++++-----
>   net/ipv6/ip6_vti.c       | 23 +++++++-----
>   net/ipv6/sit.c           |  9 +++--
>   13 files changed, 157 insertions(+), 148 deletions(-)
> 

Hi Eric,

We see a regression introduced in this series, specifically in the 
patches touching lib/kobject_uevent.c.
We tried to figure out what is wrong there, but couldn't point it out.

The bug is that restarting the mlx4 driver fails because mlx4_core is still in use.
According to the module dependencies, both mlx4_en and mlx4_ib should have
been unloaded at this point.
Please see the log below.

This looks like some kind of race, as the repro is not deterministic.
Probably the en/ib modules are now being mistakenly reloaded.

Any idea what could this be?

Regards,
Tariq


[root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
Unloading HCA driver:                                      [  OK  ]
[root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
Loading HCA driver and Access Layer:                       [  OK  ]
[root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
Unloading mlx4_core                                        [FAILED]
rmmod: ERROR: Module mlx4_core is in use

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-26 11:21 ` Tariq Toukan
@ 2017-09-26 12:51   ` Eric Dumazet
  2017-09-26 15:04     ` Tariq Toukan
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2017-09-26 12:51 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S . Miller, netdev, Eric W . Biederman, Eric Dumazet,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>
>
> Hi Eric,
>
> We see a regression introduced in this series, specifically in the patches
> touching lib/kobject_uevent.c.
> We tried to figure out what is wrong there, but couldn't point it out.
>
> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
> According to module dependencies, both mlx4_en and mlx4_ib should have been
> unloaded at this point
> Please see log below.
>
> This looks to be some kind of a race, as the repro is not deterministic.
> Probably the en/ib modules are now mistakenly reloaded.
>
> Any idea what could this be?
>
> Regards,
> Tariq
>
>
> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
> Unloading HCA driver:                                      [  OK  ]
> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
> Loading HCA driver and Access Layer:                       [  OK  ]
> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
> Unloading mlx4_core                                        [FAILED]
> rmmod: ERROR: Module mlx4_core is in use

I have absolutely no idea. Please bisect.

Are you really using netns in the first place?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-26 12:51   ` Eric Dumazet
@ 2017-09-26 15:04     ` Tariq Toukan
  2017-09-26 15:13       ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Tariq Toukan @ 2017-09-26 15:04 UTC (permalink / raw)
  To: Eric Dumazet, Dmitry Torokhov
  Cc: David S . Miller, netdev, Eric W . Biederman, Eric Dumazet,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha



On 26/09/2017 3:51 PM, Eric Dumazet wrote:
> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>
>> Hi Eric,
>>
>> We see a regression introduced in this series, specifically in the patches
>> touching lib/kobject_uevent.c.
>> We tried to figure out what is wrong there, but couldn't point it out.
>>
>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>> According to module dependencies, both mlx4_en and mlx4_ib should have been
>> unloaded at this point
>> Please see log below.
>>
>> This looks to be some kind of a race, as the repro is not deterministic.
>> Probably the en/ib modules are now mistakenly reloaded.
>>
>> Any idea what could this be?
>>
>> Regards,
>> Tariq
>>
>>
>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>> Unloading HCA driver:                                      [  OK  ]
>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>> Loading HCA driver and Access Layer:                       [  OK  ]
>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>> Unloading mlx4_core                                        [FAILED]
>> rmmod: ERROR: Module mlx4_core is in use
> I have absolutely no idea. Please bisect.
We previously saw a similar issue, which was reported on the mailing list.
Dmitry Torokhov suggested the following fix:
https://lkml.org/lkml/2017/9/12/523

And indeed, it solved the issue.

We kept the suggested patch in our internal branch and rebased.
The issue appeared again once your series was accepted.

Bisecting shows that the issue reappears with this patch:
4a336a23d619 kobject: copy env blob in one go

>
> Are you really using netns in the first place ?
No, but it still seems to affect module load/unload.

Regards,
Tariq

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-26 15:04     ` Tariq Toukan
@ 2017-09-26 15:13       ` Eric Dumazet
  2017-09-26 15:22         ` Dmitry Torokhov
  2017-09-26 15:26         ` Tariq Toukan
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2017-09-26 15:13 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Dmitry Torokhov, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>
>
> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>
>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>
>>>
>>> Hi Eric,
>>>
>>> We see a regression introduced in this series, specifically in the
>>> patches touching lib/kobject_uevent.c.
>>> We tried to figure out what is wrong there, but couldn't point it out.
>>>
>>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>>> According to module dependencies, both mlx4_en and mlx4_ib should have
>>> been unloaded at this point
>>> Please see log below.
>>>
>>> This looks to be some kind of a race, as the repro is not deterministic.
>>> Probably the en/ib modules are now mistakenly reloaded.
>>>
>>> Any idea what could this be?
>>>
>>> Regards,
>>> Tariq
>>>
>>>
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>> Unloading HCA driver:                                      [  OK  ]
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>> Unloading mlx4_core                                        [FAILED]
>>> rmmod: ERROR: Module mlx4_core is in use
>>
>> I have absolutely no idea. Please bisect.
>
> We previously saw a similar issue, that was reported in mailing list.
> Dmitry Torokhov suggested the following fix:
> https://lkml.org/lkml/2017/9/12/523
>
> And indeed, it solved the issue.
>
> We kept the suggested patch in our internal branch, and rebased.
> Issue appeared again once your series was accepted.
>
> By bisecting, we see that the issue re-appears in this patch:
> 4a336a23d619 kobject: copy env blob in one go
>
>>
>> Are you really using netns in the first place ?
>
> No. But seems like it still affects the modules load/unload.
>
> Regards,
> Tariq

Ah this makes sense now.

Dmitry Torokhov's hack breaks the assumption I used in my patch.

Since it is not upstream yet, I believe that it will need more work
before being in a proper state.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-26 15:13       ` Eric Dumazet
@ 2017-09-26 15:22         ` Dmitry Torokhov
  2017-09-26 15:30           ` Eric Dumazet
  2017-09-26 15:26         ` Tariq Toukan
  1 sibling, 1 reply; 21+ messages in thread
From: Dmitry Torokhov @ 2017-09-26 15:22 UTC (permalink / raw)
  To: Eric Dumazet, Tariq Toukan
  Cc: David S . Miller, netdev, Eric W . Biederman, Eric Dumazet,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

On September 26, 2017 8:13:21 AM PDT, Eric Dumazet <edumazet@google.com> wrote:
>On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>
>>
>> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>>
>>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>>
>>>>
>>>> Hi Eric,
>>>>
>>>> We see a regression introduced in this series, specifically in the
>>>> patches touching lib/kobject_uevent.c.
>>>> We tried to figure out what is wrong there, but couldn't point it out.
>>>>
>>>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>>>> According to module dependencies, both mlx4_en and mlx4_ib should have
>>>> been unloaded at this point
>>>> Please see log below.
>>>>
>>>> This looks to be some kind of a race, as the repro is not deterministic.
>>>> Probably the en/ib modules are now mistakenly reloaded.
>>>>
>>>> Any idea what could this be?
>>>>
>>>> Regards,
>>>> Tariq
>>>>
>>>>
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading HCA driver:                                      [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading mlx4_core                                        [FAILED]
>>>> rmmod: ERROR: Module mlx4_core is in use
>>>
>>> I have absolutely no idea. Please bisect.
>>
>> We previously saw a similar issue, that was reported in mailing list.
>> Dmitry Torokhov suggested the following fix:
>> https://lkml.org/lkml/2017/9/12/523
>>
>> And indeed, it solved the issue.
>>
>> We kept the suggested patch in our internal branch, and rebased.
>> Issue appeared again once your series was accepted.
>>
>> By bisecting, we see that the issue re-appears in this patch:
>> 4a336a23d619 kobject: copy env blob in one go
>>
>>>
>>> Are you really using netns in the first place ?
>>
>> No. But seems like it still affects the modules load/unload.
>>
>> Regards,
>> Tariq
>
>Ah this makes sense now.
>
>Dmitry Torokhov hack breaks the assumption I used in my patch.
>
>Since it is not upstream yet, I believe that it will need more work
>before being in a proper state.

It is in Greg's tree, which all kobject patches should go through, as far as I know.


Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-26 15:13       ` Eric Dumazet
  2017-09-26 15:22         ` Dmitry Torokhov
@ 2017-09-26 15:26         ` Tariq Toukan
  1 sibling, 0 replies; 21+ messages in thread
From: Tariq Toukan @ 2017-09-26 15:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Torokhov, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha



On 26/09/2017 6:13 PM, Eric Dumazet wrote:
> On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>
>> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> We see a regression introduced in this series, specifically in the
>>>> patches touching lib/kobject_uevent.c.
>>>> We tried to figure out what is wrong there, but couldn't point it out.
>>>>
>>>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>>>> According to module dependencies, both mlx4_en and mlx4_ib should have
>>>> been unloaded at this point
>>>> Please see log below.
>>>>
>>>> This looks to be some kind of a race, as the repro is not deterministic.
>>>> Probably the en/ib modules are now mistakenly reloaded.
>>>>
>>>> Any idea what could this be?
>>>>
>>>> Regards,
>>>> Tariq
>>>>
>>>>
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading HCA driver:                                      [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading mlx4_core                                        [FAILED]
>>>> rmmod: ERROR: Module mlx4_core is in use
>>> I have absolutely no idea. Please bisect.
>> We previously saw a similar issue, that was reported in mailing list.
>> Dmitry Torokhov suggested the following fix:
>> https://lkml.org/lkml/2017/9/12/523
>>
>> And indeed, it solved the issue.
>>
>> We kept the suggested patch in our internal branch, and rebased.
>> Issue appeared again once your series was accepted.
>>
>> By bisecting, we see that the issue re-appears in this patch:
>> 4a336a23d619 kobject: copy env blob in one go
>>
>>> Are you really using netns in the first place ?
>> No. But seems like it still affects the modules load/unload.
>>
>> Regards,
>> Tariq
> Ah this makes sense now.
>
> Dmitry Torokhov hack breaks the assumption I used in my patch.
>
> Since it is not upstream yet, I believe that it will need more work
> before being in a proper state.
>
> Thanks.
I see. Thanks for the clarification.
I guess we'll keep only one patch for now, until issues are resolved.

Regards.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-26 15:22         ` Dmitry Torokhov
@ 2017-09-26 15:30           ` Eric Dumazet
  2017-10-19 11:48             ` Tariq Toukan
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2017-09-26 15:30 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

On Tue, Sep 26, 2017 at 8:22 AM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:

> It is in Greg's tree where all kobject patches should go through as far as I know.

Yes, I will fix this by adding a second memmove().

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-09-26 15:30           ` Eric Dumazet
@ 2017-10-19 11:48             ` Tariq Toukan
  2017-10-19 14:11               ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Tariq Toukan @ 2017-10-19 11:48 UTC (permalink / raw)
  To: Eric Dumazet, Dmitry Torokhov
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha



On 26/09/2017 6:30 PM, Eric Dumazet wrote:
> On Tue, Sep 26, 2017 at 8:22 AM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
> 
>> It is in Greg's tree where all kobject patches should go through as far as I know.
> 
> Yes, I will fix this, adding a second memmove()
> 
Hi Eric,

I just wanted to check whether this has been solved already, as I don't want to
keep an unnecessary revert patch in our internal branches.
According to my check, the bug still exists.

Thanks,
Tariq

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-10-19 11:48             ` Tariq Toukan
@ 2017-10-19 14:11               ` Eric Dumazet
  2017-12-13 21:43                 ` Dmitry Torokhov
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2017-10-19 14:11 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Dmitry Torokhov, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

On Thu, Oct 19, 2017 at 4:48 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>
> Hi Eric,
>
> I just wanted to check if this is solved already, as I don't want to keep an
> unnecessary revert patch in our internal branches.
> According to my check bug still exists.
>
I will handle this today, thanks for the reminder.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-10-19 14:11               ` Eric Dumazet
@ 2017-12-13 21:43                 ` Dmitry Torokhov
  2017-12-13 21:52                   ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Torokhov @ 2017-12-13 21:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

Hi Eric,

On Thu, Oct 19, 2017 at 7:11 AM, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Oct 19, 2017 at 4:48 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
> >
> > Hi Eric,
> >
> > I just wanted to check if this is solved already, as I don't want to keep an
> > unnecessary revert patch in our internal branches.
> > According to my check bug still exists.
> >
> I will handle this today, thanks for the reminder.

Did you have a chance to do this? It looks like the original change
landed on mainline and causes modules to be autoloaded on KOBJ_UNBIND
again.

Thanks!

-- 
Dmitry

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-12-13 21:43                 ` Dmitry Torokhov
@ 2017-12-13 21:52                   ` Eric Dumazet
  2017-12-13 22:24                     ` Dmitry Torokhov
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2017-12-13 21:52 UTC (permalink / raw)
  To: Dmitry Torokhov, Eric Dumazet
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

On Wed, 2017-12-13 at 13:43 -0800, Dmitry Torokhov wrote:
> Hi Eric,
> 
> On Thu, Oct 19, 2017 at 7:11 AM, Eric Dumazet <edumazet@google.com> wrote:
> > 
> > On Thu, Oct 19, 2017 at 4:48 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
> > > 
> > > Hi Eric,
> > > 
> > > I just wanted to check if this is solved already, as I don't want to keep an
> > > unnecessary revert patch in our internal branches.
> > > According to my check bug still exists.
> > > 
> > 
> > I will handle this today, thanks for the reminder.
> 
> Did you have a chance to do this? It looks like the original change
> landed on mainline and causes modules to be autoloaded on KOBJ_UNBIND
> again.
> 
> Thanks!

I sent the following to Tariq, and he tested it successfully.

I will submit this formally.

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c3e84edc47c965d40199b652ba78876cdaa9c70c..0795482b15d5a8f1b65b570a071aa1419cb923d8 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -346,19 +346,25 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 static void zap_modalias_env(struct kobj_uevent_env *env)
 {
 	static const char modalias_prefix[] = "MODALIAS=";
+	size_t offset = 0, len;
 	int i;
 
 	for (i = 0; i < env->envp_idx;) {
+		len = strlen(env->envp[i]) + 1;
 		if (strncmp(env->envp[i], modalias_prefix,
 			    sizeof(modalias_prefix) - 1)) {
 			i++;
+			offset += len;
 			continue;
 		}
 
-		if (i != env->envp_idx - 1)
+		env->buflen -= len;
+		if (i != env->envp_idx - 1) {
+			memmove(env->envp[i], env->envp[i + 1],
+				env->buflen - offset);
 			memmove(&env->envp[i], &env->envp[i + 1],
 				sizeof(env->envp[i]) * env->envp_idx - 1);
-
+		}
 		env->envp_idx--;
 	}
 }

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
  2017-12-13 21:52                   ` Eric Dumazet
@ 2017-12-13 22:24                     ` Dmitry Torokhov
  0 siblings, 0 replies; 21+ messages in thread
From: Dmitry Torokhov @ 2017-12-13 22:24 UTC
  To: Eric Dumazet
  Cc: Eric Dumazet, Tariq Toukan, David S . Miller, netdev,
	Eric W . Biederman, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha

On Wed, Dec 13, 2017 at 1:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-12-13 at 13:43 -0800, Dmitry Torokhov wrote:
>> Hi Eric,
>>
>> On Thu, Oct 19, 2017 at 7:11 AM, Eric Dumazet <edumazet@google.com> wrote:
>> >
>> > On Thu, Oct 19, 2017 at 4:48 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>> > >
>> > > Hi Eric,
>> > >
>> > > I just wanted to check if this is solved already, as I don't want to keep an
>> > > unnecessary revert patch in our internal branches.
>> > > According to my check bug still exists.
>> > >
>> >
>> > I will handle this today, thanks for the reminder.
>>
>> Did you have a chance to do this? It looks like the original change
>> landed on mainline and causes modules to be autoloaded on KOBJ_UNBIND
>> again.
>>
>> Thanks!
>
> I sent the following to Tariq, and he tested it successfully.
>
> I will submit this formally.
>
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index c3e84edc47c965d40199b652ba78876cdaa9c70c..0795482b15d5a8f1b65b570a071aa1419cb923d8 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
> @@ -346,19 +346,25 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
>  static void zap_modalias_env(struct kobj_uevent_env *env)
>  {
>         static const char modalias_prefix[] = "MODALIAS=";
> +       size_t offset = 0, len;
>         int i;
>
>         for (i = 0; i < env->envp_idx;) {
> +               len = strlen(env->envp[i]) + 1;
>                 if (strncmp(env->envp[i], modalias_prefix,
>                             sizeof(modalias_prefix) - 1)) {
>                         i++;
> +                       offset += len;
>                         continue;
>                 }
>
> -               if (i != env->envp_idx - 1)
> +               env->buflen -= len;
> +               if (i != env->envp_idx - 1) {
> +                       memmove(env->envp[i], env->envp[i + 1],
> +                               env->buflen - offset);
>                         memmove(&env->envp[i], &env->envp[i + 1],
>                                 sizeof(env->envp[i]) * env->envp_idx - 1);
> -
> +               }
>                 env->envp_idx--;
>         }
>  }
>

As I mentioned in the other thread, that works for netlink, but breaks
if you are actually using the env->envp pointers, as they also need to be
adjusted. I have a patch that fixes it properly.
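The pointer problem described above can be illustrated with a small userspace sketch. This is not the patch Dmitry refers to, and it is not kernel code: `struct uevent_env`, `add_var()`, `zap_modalias()`, `NUM_ENVP` and `BUF_SIZE` are hypothetical stand-ins for `struct kobj_uevent_env` and its helpers. Like the diff above, it assumes the "KEY=value" strings are stored back to back in `buf`. After `memmove()` closes the gap in the blob, every pointer that followed the removed string still holds its old address, so besides shifting the pointer array, each shifted pointer is pulled back by `len` (the removed string's size including its NUL).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes standing in for UEVENT_NUM_ENVP / UEVENT_BUFFER_SIZE. */
#define NUM_ENVP 32
#define BUF_SIZE 2048

/* Hypothetical model of struct kobj_uevent_env: envp[] points into buf,
 * where the NUL-terminated "KEY=value" strings are stored back to back. */
struct uevent_env {
	char *envp[NUM_ENVP];
	int envp_idx;
	char buf[BUF_SIZE];
	int buflen;
};

/* Hypothetical helper: append one variable to the blob and record it. */
static int add_var(struct uevent_env *env, const char *var)
{
	int len = (int)strlen(var) + 1;

	if (env->envp_idx >= NUM_ENVP || env->buflen + len > BUF_SIZE)
		return -1;
	env->envp[env->envp_idx++] = &env->buf[env->buflen];
	memcpy(&env->buf[env->buflen], var, (size_t)len);
	env->buflen += len;
	return 0;
}

/* Drop every MODALIAS= variable: compact the blob (what netlink sends)
 * and keep the envp[] pointers valid for in-kernel users. */
static void zap_modalias(struct uevent_env *env)
{
	static const char prefix[] = "MODALIAS=";
	int offset = 0;	/* bytes of buf that precede envp[i] */
	int i, j, len;

	for (i = 0; i < env->envp_idx;) {
		len = (int)strlen(env->envp[i]) + 1;
		if (strncmp(env->envp[i], prefix, sizeof(prefix) - 1)) {
			offset += len;
			i++;
			continue;
		}

		env->buflen -= len;
		if (i != env->envp_idx - 1) {
			/* Contiguous storage: the next string starts len bytes later. */
			assert(env->envp[i + 1] == env->envp[i] + len);
			/* Close the gap: move every byte after the removed string. */
			memmove(env->envp[i], env->envp[i + 1],
				(size_t)(env->buflen - offset));
			/* Shift the pointer array, rebasing each pointer by len. */
			for (j = i; j < env->envp_idx - 1; j++)
				env->envp[j] = env->envp[j + 1] - len;
		}
		env->envp_idx--;
	}
}
```

With made-up variables `ACTION=unbind`, `MODALIAS=pci:v1`, `SUBSYSTEM=pci` and `SEQNUM=7`, the sketch leaves three entries whose pointers still name their own strings, and the last string ends exactly at `buf + buflen`. The pointer array is shifted with a loop bounded by `envp_idx - 1` rather than a byte count such as `sizeof(env->envp[i]) * env->envp_idx - 1`, which C parses as `(sizeof(env->envp[i]) * env->envp_idx) - 1`.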

Thanks!


-- 
Dmitry

end of thread

Thread overview: 21+ messages
2017-09-19 23:27 [PATCH v2 net-next 0/7] net: speedup netns create/delete time Eric Dumazet
2017-09-19 23:27 ` [PATCH v2 net-next 1/7] kobject: add kobject_uevent_net_broadcast() Eric Dumazet
2017-09-19 23:27 ` [PATCH v2 net-next 2/7] kobject: copy env blob in one go Eric Dumazet
2017-09-19 23:27 ` [PATCH v2 net-next 3/7] kobject: factorize skb setup in kobject_uevent_net_broadcast() Eric Dumazet
2017-09-19 23:27 ` [PATCH v2 net-next 4/7] ipv6: addrlabel: per netns list Eric Dumazet
2017-09-19 23:27 ` [PATCH v2 net-next 5/7] tcp: batch tcp_net_metrics_exit Eric Dumazet
2017-09-19 23:27 ` [PATCH v2 net-next 6/7] ipv6: speedup ipv6 tunnels dismantle Eric Dumazet
2017-09-19 23:27 ` [PATCH v2 net-next 7/7] ipv4: " Eric Dumazet
2017-09-19 23:32 ` [PATCH v2 net-next 0/7] net: speedup netns create/delete time David Miller
2017-09-26 11:21 ` Tariq Toukan
2017-09-26 12:51   ` Eric Dumazet
2017-09-26 15:04     ` Tariq Toukan
2017-09-26 15:13       ` Eric Dumazet
2017-09-26 15:22         ` Dmitry Torokhov
2017-09-26 15:30           ` Eric Dumazet
2017-10-19 11:48             ` Tariq Toukan
2017-10-19 14:11               ` Eric Dumazet
2017-12-13 21:43                 ` Dmitry Torokhov
2017-12-13 21:52                   ` Eric Dumazet
2017-12-13 22:24                     ` Dmitry Torokhov
2017-09-26 15:26         ` Tariq Toukan
