All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments
@ 2014-09-11 23:21 Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events Hannes Frederic Sowa
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-11 23:21 UTC (permalink / raw)
  To: netdev

Eric Dumazet noticed that rt6_nodes wich are neither RTF_NONEXTHOP nor
RTF_GATEWAY but DST_HOST ones cause major routing lookup churn because
their rt6_genid is never renewed, thus ip6_dst_check always considers
them outdated. This is a major problem, because these kind of routes
are normally used to in input handling.

This patchset tries to improve the situation by also updating the
fn_sernum in the routing tables during address deletion. The only
expensive operation left, which needs a walk over all routing tables,
are xfrm policy modifications.

I didn't annotate the patches with fixes-tags as it only solves a
performance issue. Please review carefully, thanks! I'll do some more
tests and will do a propoer submission if the xfrm slow paths looks ok
to everyone.

Hannes Frederic Sowa (6):
  ipv6: also increase fib6_node sernum on deletion events
  ipv6: no need to bump rt_genid_ipv6 on address change anymore
  ipv6: if no function for cleaner is specified only visit nodes
  ipv6: new function fib6_flush_trees and use it instead of bumping
    removed rt6_genid
  ipv6: keep rt_sernum per namespace to reduce number of flushes
  ipv6: switch rt_sernum to atomic_t and clean up types

 include/net/ip6_fib.h       |  2 +-
 include/net/net_namespace.h | 14 ++------
 include/net/netns/ipv6.h    |  2 +-
 net/ipv6/addrconf.c         |  1 -
 net/ipv6/addrconf_core.c    |  6 ++++
 net/ipv6/af_inet6.c         |  2 +-
 net/ipv6/ip6_fib.c          | 81 ++++++++++++++++++++++++++++++++-------------
 net/ipv6/route.c            |  4 ---
 8 files changed, 70 insertions(+), 42 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events
  2014-09-11 23:21 [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments Hannes Frederic Sowa
@ 2014-09-11 23:21 ` Hannes Frederic Sowa
  2014-09-16 16:18   ` Nicolas Dichtel
  2014-09-11 23:21 ` [PATCH RFC 2/6] ipv6: no need to bump rt_genid_ipv6 on address change anymore Hannes Frederic Sowa
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-11 23:21 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Vlad Yasevich, Nicolas Dichtel

fib6_add increases the fn_sernum of fib6_nodes while it traverses the
tree. This serial number is used by ip6_dst_check to judge whether a
relookup for the socket cache should be done (e.g. a better route is
available).

We didn't do so for fib6_del, so we missed relookups on ipv6 address
deletion events. Because this caused trouble in the SCTP stack, instead
the genid for ipv6 was bumped. Also TCP connections used old source
addresses, which were not available anymore.

Because we have static rt6_nodes in the tree (no RTF_GATEWAY,
RTF_NONEXTHOP nor RTF_CACHE nodes but still DST_HOST) flag, we ended up
in a situation where the genid of the routing node was always smaller
than the published genid in the namespace. That caused ip6_dst_check to
always discard the current dst_entry and a relookup happend.

This patch prepares for the removal of the ipv6 genid by also modifying
the fn_sernum on route deletion.

Thanks to Eric Dumazet who noticed this problem!

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6_fib.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 76b7f5e..101efab 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -60,6 +60,7 @@ struct fib6_cleaner_t {
 	struct fib6_walker_t w;
 	struct net *net;
 	int (*func)(struct rt6_info *, void *arg);
+	u32 sernum;
 	void *arg;
 };
 
@@ -71,7 +72,8 @@ static DEFINE_RWLOCK(fib6_walker_lock);
 #define FWS_INIT FWS_L
 #endif
 
-static void fib6_prune_clones(struct net *net, struct fib6_node *fn);
+static void fib6_prune_clones(struct net *net, struct fib6_node *fn,
+			      u32 sernum);
 static struct rt6_info *fib6_find_prefix(struct net *net, struct fib6_node *fn);
 static struct fib6_node *fib6_repair_tree(struct net *net, struct fib6_node *fn);
 static int fib6_walk(struct fib6_walker_t *w);
@@ -84,7 +86,7 @@ static int fib6_walk_continue(struct fib6_walker_t *w);
  *	result of redirects, path MTU changes, etc.
  */
 
-static __u32 rt_sernum;
+static u32 rt_sernum;
 
 static void fib6_gc_timer_cb(unsigned long arg);
 
@@ -423,14 +425,14 @@ out:
 static struct fib6_node *fib6_add_1(struct fib6_node *root,
 				     struct in6_addr *addr, int plen,
 				     int offset, int allow_create,
-				     int replace_required)
+				     int replace_required,
+				     u32 sernum)
 {
 	struct fib6_node *fn, *in, *ln;
 	struct fib6_node *pn = NULL;
 	struct rt6key *key;
 	int	bit;
 	__be32	dir = 0;
-	__u32	sernum = fib6_new_sernum();
 
 	RT6_TRACE("fib6_add_1\n");
 
@@ -848,6 +850,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
 	int err = -ENOMEM;
 	int allow_create = 1;
 	int replace_required = 0;
+	u32 sernum = fib6_new_sernum();
 
 	if (info->nlh) {
 		if (!(info->nlh->nlmsg_flags & NLM_F_CREATE))
@@ -860,7 +863,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
 
 	fn = fib6_add_1(root, &rt->rt6i_dst.addr, rt->rt6i_dst.plen,
 			offsetof(struct rt6_info, rt6i_dst), allow_create,
-			replace_required);
+			replace_required, sernum);
 	if (IS_ERR(fn)) {
 		err = PTR_ERR(fn);
 		fn = NULL;
@@ -894,14 +897,14 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
 			sfn->leaf = info->nl_net->ipv6.ip6_null_entry;
 			atomic_inc(&info->nl_net->ipv6.ip6_null_entry->rt6i_ref);
 			sfn->fn_flags = RTN_ROOT;
-			sfn->fn_sernum = fib6_new_sernum();
+			sfn->fn_sernum = sernum;
 
 			/* Now add the first leaf node to new subtree */
 
 			sn = fib6_add_1(sfn, &rt->rt6i_src.addr,
 					rt->rt6i_src.plen,
 					offsetof(struct rt6_info, rt6i_src),
-					allow_create, replace_required);
+					allow_create, replace_required, sernum);
 
 			if (IS_ERR(sn)) {
 				/* If it is failed, discard just allocated
@@ -920,7 +923,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
 			sn = fib6_add_1(fn->subtree, &rt->rt6i_src.addr,
 					rt->rt6i_src.plen,
 					offsetof(struct rt6_info, rt6i_src),
-					allow_create, replace_required);
+					allow_create, replace_required, sernum);
 
 			if (IS_ERR(sn)) {
 				err = PTR_ERR(sn);
@@ -940,7 +943,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
 	if (!err) {
 		fib6_start_gc(info->nl_net, rt);
 		if (!(rt->rt6i_flags & RTF_CACHE))
-			fib6_prune_clones(info->nl_net, pn);
+			fib6_prune_clones(info->nl_net, pn, sernum);
 	}
 
 out:
@@ -1352,6 +1355,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info)
 	struct net *net = info->nl_net;
 	struct fib6_node *fn = rt->rt6i_node;
 	struct rt6_info **rtp;
+	u32 sernum = fib6_new_sernum();
 
 #if RT6_DEBUG >= 2
 	if (rt->dst.obsolete > 0) {
@@ -1364,6 +1368,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info)
 
 	WARN_ON(!(fn->fn_flags & RTN_RTINFO));
 
+	fn->fn_sernum = sernum;
 	if (!(rt->rt6i_flags & RTF_CACHE)) {
 		struct fib6_node *pn = fn;
 #ifdef CONFIG_IPV6_SUBTREES
@@ -1374,7 +1379,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info)
 			pn = pn->parent;
 		}
 #endif
-		fib6_prune_clones(info->nl_net, pn);
+		fib6_prune_clones(info->nl_net, pn, sernum);
 	}
 
 	/*
@@ -1521,6 +1526,9 @@ static int fib6_clean_node(struct fib6_walker_t *w)
 		.nl_net = c->net,
 	};
 
+	if (c->sernum)
+		c->w.node->fn_sernum = c->sernum;
+
 	for (rt = w->leaf; rt; rt = rt->dst.rt6_next) {
 		res = c->func(rt, c->arg);
 		if (res < 0) {
@@ -1554,7 +1562,7 @@ static int fib6_clean_node(struct fib6_walker_t *w)
 
 static void fib6_clean_tree(struct net *net, struct fib6_node *root,
 			    int (*func)(struct rt6_info *, void *arg),
-			    int prune, void *arg)
+			    int prune, u32 sernum, void *arg)
 {
 	struct fib6_cleaner_t c;
 
@@ -1563,6 +1571,7 @@ static void fib6_clean_tree(struct net *net, struct fib6_node *root,
 	c.w.prune = prune;
 	c.w.count = 0;
 	c.w.skip = 0;
+	c.sernum = sernum;
 	c.func = func;
 	c.arg = arg;
 	c.net = net;
@@ -1583,7 +1592,7 @@ void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
 		hlist_for_each_entry_rcu(table, head, tb6_hlist) {
 			write_lock_bh(&table->tb6_lock);
 			fib6_clean_tree(net, &table->tb6_root,
-					func, 0, arg);
+					func, 0, 0, arg);
 			write_unlock_bh(&table->tb6_lock);
 		}
 	}
@@ -1600,9 +1609,10 @@ static int fib6_prune_clone(struct rt6_info *rt, void *arg)
 	return 0;
 }
 
-static void fib6_prune_clones(struct net *net, struct fib6_node *fn)
+static void fib6_prune_clones(struct net *net, struct fib6_node *fn,
+			      u32 sernum)
 {
-	fib6_clean_tree(net, fn, fib6_prune_clone, 1, NULL);
+	fib6_clean_tree(net, fn, fib6_prune_clone, 1, sernum, NULL);
 }
 
 /*
@@ -1811,7 +1821,7 @@ struct ipv6_route_iter {
 	struct fib6_walker_t w;
 	loff_t skip;
 	struct fib6_table *tbl;
-	__u32 sernum;
+	u32 sernum;
 };
 
 static int ipv6_route_seq_show(struct seq_file *seq, void *v)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 2/6] ipv6: no need to bump rt_genid_ipv6 on address change anymore
  2014-09-11 23:21 [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events Hannes Frederic Sowa
@ 2014-09-11 23:21 ` Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 3/6] ipv6: if no function for cleaner is specified only visit nodes Hannes Frederic Sowa
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-11 23:21 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Vlad Yasevich, Nicolas Dichtel

The fn_sernum is now modified on fib6_del events so we don't need to
bump the ipv6 genid anymore.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/addrconf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ad4598f..16ce390 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4781,7 +4781,6 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 		break;
 	}
 	atomic_inc(&net->ipv6.dev_addr_genid);
-	rt_genid_bump_ipv6(net);
 }
 
 static void ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 3/6] ipv6: if no function for cleaner is specified only visit nodes
  2014-09-11 23:21 [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 2/6] ipv6: no need to bump rt_genid_ipv6 on address change anymore Hannes Frederic Sowa
@ 2014-09-11 23:21 ` Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 4/6] ipv6: new function fib6_flush_trees and use it instead of bumping removed rt6_genid Hannes Frederic Sowa
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-11 23:21 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Vlad Yasevich, Nicolas Dichtel

We now allow NULL rt6_info walker functions to we only visit nodes.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6_fib.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 101efab..590c5d2 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1529,6 +1529,13 @@ static int fib6_clean_node(struct fib6_walker_t *w)
 	if (c->sernum)
 		c->w.node->fn_sernum = c->sernum;
 
+	/* if func is NULL only visit node to update sernum */
+	if (!c->func) {
+		WARN_ON_ONCE(!c->sernum);
+		w->leaf = NULL;
+		return 0;
+	}
+
 	for (rt = w->leaf; rt; rt = rt->dst.rt6_next) {
 		res = c->func(rt, c->arg);
 		if (res < 0) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 4/6] ipv6: new function fib6_flush_trees and use it instead of bumping removed rt6_genid
  2014-09-11 23:21 [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments Hannes Frederic Sowa
                   ` (2 preceding siblings ...)
  2014-09-11 23:21 ` [PATCH RFC 3/6] ipv6: if no function for cleaner is specified only visit nodes Hannes Frederic Sowa
@ 2014-09-11 23:21 ` Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 5/6] ipv6: keep rt_sernum per namespace to reduce number of flushes Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 6/6] ipv6: switch rt_sernum to atomic_t and clean up types Hannes Frederic Sowa
  5 siblings, 0 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-11 23:21 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Vlad Yasevich, Nicolas Dichtel

fib6_flush_trees is still a very costly operation but now is only called
by xfrm code when a policy changes.

fib6_flush_tree must walk all ipv6 routing tables and modify fn_sernum,
so all sockets relookup their dst_entries. Use a NULL callback, so we
only walk the nodes without looking at the rt6_infos.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/net_namespace.h | 14 +++-----------
 include/net/netns/ipv6.h    |  1 -
 net/ipv6/addrconf_core.c    |  6 ++++++
 net/ipv6/af_inet6.c         |  1 -
 net/ipv6/ip6_fib.c          | 23 ++++++++++++++++++++---
 net/ipv6/route.c            |  4 ----
 6 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 361d260..61aad36 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -353,21 +353,13 @@ static inline void rt_genid_bump_ipv4(struct net *net)
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static inline int rt_genid_ipv6(struct net *net)
-{
-	return atomic_read(&net->ipv6.rt_genid);
-}
-
+extern void (*__fib6_flush_trees)(struct net *);
 static inline void rt_genid_bump_ipv6(struct net *net)
 {
-	atomic_inc(&net->ipv6.rt_genid);
+	if (__fib6_flush_trees)
+		__fib6_flush_trees(net);
 }
 #else
-static inline int rt_genid_ipv6(struct net *net)
-{
-	return 0;
-}
-
 static inline void rt_genid_bump_ipv6(struct net *net)
 {
 }
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index eade27a..3291ba6 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -76,7 +76,6 @@ struct netns_ipv6 {
 #endif
 #endif
 	atomic_t		dev_addr_genid;
-	atomic_t		rt_genid;
 };
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index e696045..8b2d99a 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -10,6 +10,12 @@
 
 #define IPV6_ADDR_SCOPE_TYPE(scope)	((scope) << 16)
 
+/* if ipv6 module registers this function is used by xfrm to force
+ * all sockets to relookup their nodes - this is fairly expensive
+ */
+void (*__fib6_flush_trees)(struct net *);
+EXPORT_SYMBOL(__fib6_flush_trees);
+
 static inline unsigned int ipv6_addr_scope2type(unsigned int scope)
 {
 	switch (scope) {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e4865a3..2189d2d 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -766,7 +766,6 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.icmpv6_time = 1*HZ;
 	net->ipv6.sysctl.flowlabel_consistency = 1;
 	net->ipv6.sysctl.auto_flowlabels = 0;
-	atomic_set(&net->ipv6.rt_genid, 0);
 
 	err = ipv6_init_mibs(net);
 	if (err)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 590c5d2..cffee60 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1586,26 +1586,36 @@ static void fib6_clean_tree(struct net *net, struct fib6_node *root,
 	fib6_walk(&c.w);
 }
 
-void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
-		    void *arg)
+static void __fib6_clean_all(struct net *net,
+			     int (*func)(struct rt6_info *, void *arg),
+			     bool update_sernum, void *arg)
 {
 	struct fib6_table *table;
 	struct hlist_head *head;
 	unsigned int h;
+	u32 sernum = 0;
 
 	rcu_read_lock();
 	for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(table, head, tb6_hlist) {
 			write_lock_bh(&table->tb6_lock);
+			sernum = update_sernum && !sernum ?
+				 fib6_new_sernum() : 0;
 			fib6_clean_tree(net, &table->tb6_root,
-					func, 0, 0, arg);
+					func, 0, sernum, arg);
 			write_unlock_bh(&table->tb6_lock);
 		}
 	}
 	rcu_read_unlock();
 }
 
+void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
+		    void *arg)
+{
+	__fib6_clean_all(net, func, false, arg);
+}
+
 static int fib6_prune_clone(struct rt6_info *rt, void *arg)
 {
 	if (rt->rt6i_flags & RTF_CACHE) {
@@ -1622,6 +1632,11 @@ static void fib6_prune_clones(struct net *net, struct fib6_node *fn,
 	fib6_clean_tree(net, fn, fib6_prune_clone, 1, sernum, NULL);
 }
 
+static void fib6_flush_trees(struct net *net)
+{
+	__fib6_clean_all(net, NULL, true, NULL);
+}
+
 /*
  *	Garbage collection
  */
@@ -1805,6 +1820,8 @@ int __init fib6_init(void)
 			      NULL);
 	if (ret)
 		goto out_unregister_subsys;
+
+	__fib6_flush_trees = fib6_flush_trees;
 out:
 	return ret;
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f74b041..a318dd89 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -314,7 +314,6 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
 
 		memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
 		rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
-		rt->rt6i_genid = rt_genid_ipv6(net);
 		INIT_LIST_HEAD(&rt->rt6i_siblings);
 	}
 	return rt;
@@ -1096,9 +1095,6 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie)
 	 * DST_OBSOLETE_FORCE_CHK which forces validation calls down
 	 * into this function always.
 	 */
-	if (rt->rt6i_genid != rt_genid_ipv6(dev_net(rt->dst.dev)))
-		return NULL;
-
 	if (!rt->rt6i_node || (rt->rt6i_node->fn_sernum != cookie))
 		return NULL;
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 5/6] ipv6: keep rt_sernum per namespace to reduce number of flushes
  2014-09-11 23:21 [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments Hannes Frederic Sowa
                   ` (3 preceding siblings ...)
  2014-09-11 23:21 ` [PATCH RFC 4/6] ipv6: new function fib6_flush_trees and use it instead of bumping removed rt6_genid Hannes Frederic Sowa
@ 2014-09-11 23:21 ` Hannes Frederic Sowa
  2014-09-11 23:21 ` [PATCH RFC 6/6] ipv6: switch rt_sernum to atomic_t and clean up types Hannes Frederic Sowa
  5 siblings, 0 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-11 23:21 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Vlad Yasevich, Nicolas Dichtel

This patch moves rt_sernum into per-namespace storage. This reduces number
of flushes needed when a xfrm policy gets modified in one namespace.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/netns/ipv6.h |  1 +
 net/ipv6/af_inet6.c      |  1 +
 net/ipv6/ip6_fib.c       | 21 +++++++++++----------
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 3291ba6..2319949 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -76,6 +76,7 @@ struct netns_ipv6 {
 #endif
 #endif
 	atomic_t		dev_addr_genid;
+	u32			rt_sernum;
 };
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 2189d2d..7ff8996 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -766,6 +766,7 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.icmpv6_time = 1*HZ;
 	net->ipv6.sysctl.flowlabel_consistency = 1;
 	net->ipv6.sysctl.auto_flowlabels = 0;
+	net->ipv6.rt_sernum = 1;
 
 	err = ipv6_init_mibs(net);
 	if (err)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index cffee60..0ffabe2 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -86,8 +86,6 @@ static int fib6_walk_continue(struct fib6_walker_t *w);
  *	result of redirects, path MTU changes, etc.
  */
 
-static u32 rt_sernum;
-
 static void fib6_gc_timer_cb(unsigned long arg);
 
 static LIST_HEAD(fib6_walkers);
@@ -106,12 +104,15 @@ static inline void fib6_walker_unlink(struct fib6_walker_t *w)
 	list_del(&w->lh);
 	write_unlock_bh(&fib6_walker_lock);
 }
-static __inline__ u32 fib6_new_sernum(void)
+
+static u32 fib6_new_sernum(struct net *net)
 {
-	u32 n = ++rt_sernum;
-	if ((__s32)n <= 0)
-		rt_sernum = n = 1;
-	return n;
+	int *n = &net->ipv6.rt_sernum;
+
+	(*n)++;
+	if ((s32)*n <= 0)
+		*n = 1;
+	return *n;
 }
 
 /*
@@ -850,7 +851,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
 	int err = -ENOMEM;
 	int allow_create = 1;
 	int replace_required = 0;
-	u32 sernum = fib6_new_sernum();
+	u32 sernum = fib6_new_sernum(dev_net(rt->dst.dev));
 
 	if (info->nlh) {
 		if (!(info->nlh->nlmsg_flags & NLM_F_CREATE))
@@ -1355,7 +1356,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info)
 	struct net *net = info->nl_net;
 	struct fib6_node *fn = rt->rt6i_node;
 	struct rt6_info **rtp;
-	u32 sernum = fib6_new_sernum();
+	u32 sernum = fib6_new_sernum(dev_net(rt->dst.dev));
 
 #if RT6_DEBUG >= 2
 	if (rt->dst.obsolete > 0) {
@@ -1601,7 +1602,7 @@ static void __fib6_clean_all(struct net *net,
 		hlist_for_each_entry_rcu(table, head, tb6_hlist) {
 			write_lock_bh(&table->tb6_lock);
 			sernum = update_sernum && !sernum ?
-				 fib6_new_sernum() : 0;
+				 fib6_new_sernum(net) : 0;
 			fib6_clean_tree(net, &table->tb6_root,
 					func, 0, sernum, arg);
 			write_unlock_bh(&table->tb6_lock);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 6/6] ipv6: switch rt_sernum to atomic_t and clean up types
  2014-09-11 23:21 [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments Hannes Frederic Sowa
                   ` (4 preceding siblings ...)
  2014-09-11 23:21 ` [PATCH RFC 5/6] ipv6: keep rt_sernum per namespace to reduce number of flushes Hannes Frederic Sowa
@ 2014-09-11 23:21 ` Hannes Frederic Sowa
  5 siblings, 0 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-11 23:21 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Vlad Yasevich, Nicolas Dichtel

Switch rt_sernum to atomic_t, make it concurrency safe (the old scheme
looked broken to me) and switch from u32 to int types for the fn_sernum.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/ip6_fib.h    |  2 +-
 include/net/netns/ipv6.h |  2 +-
 net/ipv6/af_inet6.c      |  2 +-
 net/ipv6/ip6_fib.c       | 40 ++++++++++++++++++++--------------------
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 9bcb220..59c391f 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -64,7 +64,7 @@ struct fib6_node {
 
 	__u16			fn_bit;		/* bit key */
 	__u16			fn_flags;
-	__u32			fn_sernum;
+	int			fn_sernum;
 	struct rt6_info		*rr_ptr;
 };
 
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 2319949..7dee21b 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -76,7 +76,7 @@ struct netns_ipv6 {
 #endif
 #endif
 	atomic_t		dev_addr_genid;
-	u32			rt_sernum;
+	atomic_t		rt_sernum;
 };
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 7ff8996..6cde9b4 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -766,7 +766,7 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.icmpv6_time = 1*HZ;
 	net->ipv6.sysctl.flowlabel_consistency = 1;
 	net->ipv6.sysctl.auto_flowlabels = 0;
-	net->ipv6.rt_sernum = 1;
+	atomic_set(&net->ipv6.rt_sernum, 1);
 
 	err = ipv6_init_mibs(net);
 	if (err)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 0ffabe2..03234bb 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -60,7 +60,7 @@ struct fib6_cleaner_t {
 	struct fib6_walker_t w;
 	struct net *net;
 	int (*func)(struct rt6_info *, void *arg);
-	u32 sernum;
+	int sernum;
 	void *arg;
 };
 
@@ -73,7 +73,7 @@ static DEFINE_RWLOCK(fib6_walker_lock);
 #endif
 
 static void fib6_prune_clones(struct net *net, struct fib6_node *fn,
-			      u32 sernum);
+			      int sernum);
 static struct rt6_info *fib6_find_prefix(struct net *net, struct fib6_node *fn);
 static struct fib6_node *fib6_repair_tree(struct net *net, struct fib6_node *fn);
 static int fib6_walk(struct fib6_walker_t *w);
@@ -105,14 +105,17 @@ static inline void fib6_walker_unlink(struct fib6_walker_t *w)
 	write_unlock_bh(&fib6_walker_lock);
 }
 
-static u32 fib6_new_sernum(struct net *net)
+static int fib6_new_sernum(struct net *net)
 {
-	int *n = &net->ipv6.rt_sernum;
+	int old, new;
 
-	(*n)++;
-	if ((s32)*n <= 0)
-		*n = 1;
-	return *n;
+	do {
+		old = atomic_read(&net->ipv6.rt_sernum);
+		new = old + 1;
+		if (new <= 0)
+			new = 1;
+	} while (atomic_cmpxchg(&net->ipv6.rt_sernum, old, new) != old);
+	return new;
 }
 
 /*
@@ -427,7 +430,7 @@ static struct fib6_node *fib6_add_1(struct fib6_node *root,
 				     struct in6_addr *addr, int plen,
 				     int offset, int allow_create,
 				     int replace_required,
-				     u32 sernum)
+				     int sernum)
 {
 	struct fib6_node *fn, *in, *ln;
 	struct fib6_node *pn = NULL;
@@ -851,7 +854,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
 	int err = -ENOMEM;
 	int allow_create = 1;
 	int replace_required = 0;
-	u32 sernum = fib6_new_sernum(dev_net(rt->dst.dev));
+	int sernum = fib6_new_sernum(dev_net(rt->dst.dev));
 
 	if (info->nlh) {
 		if (!(info->nlh->nlmsg_flags & NLM_F_CREATE))
@@ -1356,7 +1359,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info)
 	struct net *net = info->nl_net;
 	struct fib6_node *fn = rt->rt6i_node;
 	struct rt6_info **rtp;
-	u32 sernum = fib6_new_sernum(dev_net(rt->dst.dev));
+	int sernum = fib6_new_sernum(dev_net(rt->dst.dev));
 
 #if RT6_DEBUG >= 2
 	if (rt->dst.obsolete > 0) {
@@ -1570,7 +1573,7 @@ static int fib6_clean_node(struct fib6_walker_t *w)
 
 static void fib6_clean_tree(struct net *net, struct fib6_node *root,
 			    int (*func)(struct rt6_info *, void *arg),
-			    int prune, u32 sernum, void *arg)
+			    int prune, int sernum, void *arg)
 {
 	struct fib6_cleaner_t c;
 
@@ -1589,20 +1592,17 @@ static void fib6_clean_tree(struct net *net, struct fib6_node *root,
 
 static void __fib6_clean_all(struct net *net,
 			     int (*func)(struct rt6_info *, void *arg),
-			     bool update_sernum, void *arg)
+			     int sernum, void *arg)
 {
 	struct fib6_table *table;
 	struct hlist_head *head;
 	unsigned int h;
-	u32 sernum = 0;
 
 	rcu_read_lock();
 	for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(table, head, tb6_hlist) {
 			write_lock_bh(&table->tb6_lock);
-			sernum = update_sernum && !sernum ?
-				 fib6_new_sernum(net) : 0;
 			fib6_clean_tree(net, &table->tb6_root,
 					func, 0, sernum, arg);
 			write_unlock_bh(&table->tb6_lock);
@@ -1614,7 +1614,7 @@ static void __fib6_clean_all(struct net *net,
 void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
 		    void *arg)
 {
-	__fib6_clean_all(net, func, false, arg);
+	__fib6_clean_all(net, func, fib6_new_sernum(net), arg);
 }
 
 static int fib6_prune_clone(struct rt6_info *rt, void *arg)
@@ -1628,14 +1628,14 @@ static int fib6_prune_clone(struct rt6_info *rt, void *arg)
 }
 
 static void fib6_prune_clones(struct net *net, struct fib6_node *fn,
-			      u32 sernum)
+			      int sernum)
 {
 	fib6_clean_tree(net, fn, fib6_prune_clone, 1, sernum, NULL);
 }
 
 static void fib6_flush_trees(struct net *net)
 {
-	__fib6_clean_all(net, NULL, true, NULL);
+	__fib6_clean_all(net, NULL, fib6_new_sernum(net), NULL);
 }
 
 /*
@@ -1846,7 +1846,7 @@ struct ipv6_route_iter {
 	struct fib6_walker_t w;
 	loff_t skip;
 	struct fib6_table *tbl;
-	u32 sernum;
+	int sernum;
 };
 
 static int ipv6_route_seq_show(struct seq_file *seq, void *v)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events
  2014-09-11 23:21 ` [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events Hannes Frederic Sowa
@ 2014-09-16 16:18   ` Nicolas Dichtel
  2014-09-16 21:05     ` Hannes Frederic Sowa
  2014-09-17 22:48     ` Hannes Frederic Sowa
  0 siblings, 2 replies; 10+ messages in thread
From: Nicolas Dichtel @ 2014-09-16 16:18 UTC (permalink / raw)
  To: Hannes Frederic Sowa, netdev; +Cc: Eric Dumazet, Vlad Yasevich

Le 12/09/2014 01:21, Hannes Frederic Sowa a écrit :
> fib6_add increases the fn_sernum of fib6_nodes while it traverses the
> tree. This serial number is used by ip6_dst_check to judge whether a
> relookup for the socket cache should be done (e.g. a better route is
> available).
>
> We didn't do so for fib6_del, so we missed relookups on ipv6 address
> deletion events. Because this caused trouble in the SCTP stack, instead
> the genid for ipv6 was bumped. Also TCP connections used old source
> addresses, which were not available anymore.
>
> Because we have static rt6_nodes in the tree (no RTF_GATEWAY,
> RTF_NONEXTHOP nor RTF_CACHE nodes but still DST_HOST) flag, we ended up
> in a situation where the genid of the routing node was always smaller
> than the published genid in the namespace. That caused ip6_dst_check to
> always discard the current dst_entry and a relookup happend.
>
> This patch prepares for the removal of the ipv6 genid by also modifying
> the fn_sernum on route deletion.
>
> Thanks to Eric Dumazet who noticed this problem!
>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
This serie looks good to me. Thank you for working on this topic!


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events
  2014-09-16 16:18   ` Nicolas Dichtel
@ 2014-09-16 21:05     ` Hannes Frederic Sowa
  2014-09-17 22:48     ` Hannes Frederic Sowa
  1 sibling, 0 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-16 21:05 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, Eric Dumazet, Vlad Yasevich

On Di, 2014-09-16 at 18:18 +0200, Nicolas Dichtel wrote:
> Le 12/09/2014 01:21, Hannes Frederic Sowa a écrit :
> > fib6_add increases the fn_sernum of fib6_nodes while it traverses the
> > tree. This serial number is used by ip6_dst_check to judge whether a
> > relookup for the socket cache should be done (e.g. a better route is
> > available).
> >
> > We didn't do so for fib6_del, so we missed relookups on ipv6 address
> > deletion events. Because this caused trouble in the SCTP stack, instead
> > the genid for ipv6 was bumped. Also TCP connections used old source
> > addresses, which were not available anymore.
> >
> > Because we have static rt6_nodes in the tree (no RTF_GATEWAY,
> > RTF_NONEXTHOP nor RTF_CACHE nodes but still DST_HOST) flag, we ended up
> > in a situation where the genid of the routing node was always smaller
> > than the published genid in the namespace. That caused ip6_dst_check to
> > always discard the current dst_entry and a relookup happend.
> >
> > This patch prepares for the removal of the ipv6 genid by also modifying
> > the fn_sernum on route deletion.
> >
> > Thanks to Eric Dumazet who noticed this problem!
> >
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> This serie looks good to me. Thank you for working on this topic!

Thanks a lot for your review, Nicolas! I'll do some more build tests I
got all the module handling correct and will send the formal submission.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events
  2014-09-16 16:18   ` Nicolas Dichtel
  2014-09-16 21:05     ` Hannes Frederic Sowa
@ 2014-09-17 22:48     ` Hannes Frederic Sowa
  1 sibling, 0 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-17 22:48 UTC (permalink / raw)
  To: Nicolas Dichtel, netdev; +Cc: Eric Dumazet, Vlad Yasevich

On Tue, Sep 16, 2014, at 18:18, Nicolas Dichtel wrote:
> Le 12/09/2014 01:21, Hannes Frederic Sowa a écrit :
> > fib6_add increases the fn_sernum of fib6_nodes while it traverses the
> > tree. This serial number is used by ip6_dst_check to judge whether a
> > relookup for the socket cache should be done (e.g. a better route is
> > available).
> >
> > We didn't do so for fib6_del, so we missed relookups on ipv6 address
> > deletion events. Because this caused trouble in the SCTP stack, instead
> > the genid for ipv6 was bumped. Also TCP connections used old source
> > addresses, which were not available anymore.
> >
> > Because we have static rt6_nodes in the tree (no RTF_GATEWAY,
> > RTF_NONEXTHOP nor RTF_CACHE nodes but still DST_HOST) flag, we ended up
> > in a situation where the genid of the routing node was always smaller
> > than the published genid in the namespace. That caused ip6_dst_check to
> > always discard the current dst_entry and a relookup happend.
> >
> > This patch prepares for the removal of the ipv6 genid by also modifying
> > the fn_sernum on route deletion.
> >
> > Thanks to Eric Dumazet who noticed this problem!
> >
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> This serie looks good to me. Thank you for working on this topic!

I fear we still have to bump all fn_sernums on ipv6 address deletion
events, because the dst_check also implicitly checked for source address
notifications.

I have two approaches to solve this for the moment: we store the
ipv6.dev_addr_genid in the ipv6_pinfo or walk the whole tree or walk the
whole tree on device removal. Both approaches work here, one is faster
but ipv6 sockets need to be increased, other one needs to walk all nodes
on address removal.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-09-17 22:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-11 23:21 [PATCH RFC 0/6] ipv6: dst_entry socket caching improvments Hannes Frederic Sowa
2014-09-11 23:21 ` [PATCH RFC 1/6] ipv6: also increase fib6_node sernum on deletion events Hannes Frederic Sowa
2014-09-16 16:18   ` Nicolas Dichtel
2014-09-16 21:05     ` Hannes Frederic Sowa
2014-09-17 22:48     ` Hannes Frederic Sowa
2014-09-11 23:21 ` [PATCH RFC 2/6] ipv6: no need to bump rt_genid_ipv6 on address change anymore Hannes Frederic Sowa
2014-09-11 23:21 ` [PATCH RFC 3/6] ipv6: if no function for cleaner is specified only visit nodes Hannes Frederic Sowa
2014-09-11 23:21 ` [PATCH RFC 4/6] ipv6: new function fib6_flush_trees and use it instead of bumping removed rt6_genid Hannes Frederic Sowa
2014-09-11 23:21 ` [PATCH RFC 5/6] ipv6: keep rt_sernum per namespace to reduce number of flushes Hannes Frederic Sowa
2014-09-11 23:21 ` [PATCH RFC 6/6] ipv6: switch rt_sernum to atomic_t and clean up types Hannes Frederic Sowa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.