netdev.vger.kernel.org archive mirror
* [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes.
@ 2017-10-31 14:10 David S. Miller
  2017-10-31 14:10 ` [RFC v2 PATCH 01/11] net: dst->rt_next is unused David S. Miller
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC
  To: netdev; +Cc: David S. Miller

Through a combination of several things, our route structures are
larger than they need to be.

Mostly this stems from having members in dst_entry which are only used
by one class of routes.  So the majority of the work in this series is
about "un-commoning" these members and pushing them into the
type-specific structures.
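
To illustrate the idea outside the kernel, here is a minimal
user-space sketch.  Every name in it (base_before, v6_next, and so
on) is made up for the example and none of it is kernel code.  It
only shows that moving a member out of the shared base object shrinks
the types that never used it, while the one type that does use it
stays the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the common base carries a link only one route class uses. */
struct base_before {
	void *dev;
	void *v6_next;		/* hypothetical: used by the v6-like type only */
};
struct v4_before { struct base_before base; int v4_data; };
struct v6_before { struct base_before base; int v6_data; };

/* After: the link lives in the one type that needs it. */
struct base_after {
	void *dev;
};
struct v4_after { struct base_after base; int v4_data; };
struct v6_after { struct base_after base; void *v6_next; int v6_data; };

/* Bytes saved per object of the type that never used the link. */
size_t v4_bytes_saved(void)
{
	return sizeof(struct v4_before) - sizeof(struct v4_after);
}

/* The type that does use the link keeps its size. */
int v6_size_unchanged(void)
{
	return sizeof(struct v6_before) == sizeof(struct v6_after);
}
```

The exact savings depend on the ABI's padding rules, so the sketch
deliberately compares sizes instead of hard-coding byte counts.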

Unfortunately, IPSEC needed the most surgery.  The majority of the
changes here had to do with bundle creation and management.

The other issue is the refcount alignment in dst_entry.  Once we get
rid of the not-so-common members, it really opens the door to removing
that alignment entirely.

I think the new layout looks really nice, so I'll reproduce it here:

	struct net_device       *dev;
	struct  dst_ops	        *ops;
	unsigned long		_metrics;
	unsigned long           expires;
	struct xfrm_state	*xfrm;
	int			(*input)(struct sk_buff *);
	int			(*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
	unsigned short		flags;
	short			obsolete;
	unsigned short		header_len;
	unsigned short		trailer_len;
	atomic_t		__refcnt;
	int			__use;
	unsigned long		lastuse;
	struct lwtunnel_state   *lwtstate;
	struct rcu_head		rcu_head;
	short			error;
	short			__pad;
	__u32			tclassid;

This is a rough draft, so there are still some problems to resolve.
In particular, the refcount alignment is only sorted out on 64-bit at
this time.  It shouldn't be too hard to either fix 32-bit, decide
that we don't care so much these days, or lower the target alignment
there to 32 bytes rather than 64 bytes.
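
As a rough cross-check of the layout above, the following user-space
sketch mirrors the member list with stand-in types.  These are
assumptions: pointers replace the kernel structure pointers, a plain
int replaces atomic_t, a two-pointer struct replaces rcu_head, and
the double-underscore names are renamed to avoid reserved
identifiers.  On an LP64 ABI it checks that the refcount lands on a
64-byte boundary and that the total size is 112 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct rcu_head: two pointer-sized members. */
struct mock_rcu_head {
	struct mock_rcu_head *next;
	void (*func)(struct mock_rcu_head *head);
};

/* User-space mirror of the proposed layout; not the kernel struct. */
struct mock_dst_entry {
	void *dev;
	void *ops;
	unsigned long _metrics;
	unsigned long expires;
	void *xfrm;
	int (*input)(void *skb);
	int (*output)(void *net, void *sk, void *skb);
	unsigned short flags;
	short obsolete;
	unsigned short header_len;
	unsigned short trailer_len;
	int refcnt;		/* stand-in for atomic_t __refcnt */
	int use;		/* stand-in for __use */
	unsigned long lastuse;
	void *lwtstate;
	struct mock_rcu_head rcu_head;
	short error;
	short pad;		/* stand-in for __pad */
	unsigned int tclassid;
};

/* Returns 1 if the layout matches, or if the check does not apply. */
int mock_layout_ok(void)
{
	/* Only meaningful where long and pointers are 8 bytes (LP64). */
	if (sizeof(long) != 8 || sizeof(void *) != 8)
		return 1;
	return offsetof(struct mock_dst_entry, refcnt) == 64 &&
	       sizeof(struct mock_dst_entry) == 112;
}
```

On a 32-bit ABI the members ahead of the refcount add up to less than
64 bytes, which is the unresolved case described above; the sketch
simply skips the check there.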

So, the good news:

1) struct dst_entry shrinks from 160 to 112 bytes.

2) struct rtable shrinks from 216 to 168 bytes.

3) struct rt6_info shrinks from 384 to 320 bytes.

Enjoy.

v2:
	Collapse some patches logically based upon feedback.
	Fix the strange patch #7.

Signed-off-by: David S. Miller <davem@davemloft.net>


* [RFC v2 PATCH 01/11] net: dst->rt_next is unused.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:36   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 02/11] decnet: Move dn_next into decnet route structure David S. Miller
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC
  To: netdev; +Cc: David S. Miller

Delete it.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 2f53ecc2c296..1551fdeadc7a 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -100,7 +100,6 @@ struct dst_entry {
 	struct lwtunnel_state   *lwtstate;
 	union {
 		struct dst_entry	*next;
-		struct rtable __rcu	*rt_next;
 		struct rt6_info __rcu	*rt6_next;
 		struct dn_route __rcu	*dn_next;
 	};
-- 
2.13.6


* [RFC v2 PATCH 02/11] decnet: Move dn_next into decnet route structure.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
  2017-10-31 14:10 ` [RFC v2 PATCH 01/11] net: dst->rt_next is unused David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:36   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 03/11] ipv6: Move rt6_next from dst_entry into ipv6 " David S. Miller
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC
  To: netdev; +Cc: David S. Miller

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dn_route.h |  1 +
 include/net/dst.h      |  1 -
 net/decnet/dn_route.c  | 34 ++++++++++++++++++----------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/net/dn_route.h b/include/net/dn_route.h
index 55df9939bca2..342d2503cba5 100644
--- a/include/net/dn_route.h
+++ b/include/net/dn_route.h
@@ -69,6 +69,7 @@ int dn_route_rcv(struct sk_buff *skb, struct net_device *dev,
  */
 struct dn_route {
 	struct dst_entry dst;
+	struct dn_route __rcu *dn_next;
 
 	struct neighbour *n;
 
diff --git a/include/net/dst.h b/include/net/dst.h
index 1551fdeadc7a..6948217e4d37 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -101,7 +101,6 @@ struct dst_entry {
 	union {
 		struct dst_entry	*next;
 		struct rt6_info __rcu	*rt6_next;
-		struct dn_route __rcu	*dn_next;
 	};
 };
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index bff5ab88cdbb..fd43c442ab52 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -199,11 +199,11 @@ static void dn_dst_check_expire(unsigned long dummy)
 						lockdep_is_held(&dn_rt_hash_table[i].lock))) != NULL) {
 			if (atomic_read(&rt->dst.__refcnt) > 1 ||
 			    (now - rt->dst.lastuse) < expire) {
-				rtp = &rt->dst.dn_next;
+				rtp = &rt->dn_next;
 				continue;
 			}
-			*rtp = rt->dst.dn_next;
-			rt->dst.dn_next = NULL;
+			*rtp = rt->dn_next;
+			rt->dn_next = NULL;
 			dst_dev_put(&rt->dst);
 			dst_release(&rt->dst);
 		}
@@ -233,11 +233,11 @@ static int dn_dst_gc(struct dst_ops *ops)
 						lockdep_is_held(&dn_rt_hash_table[i].lock))) != NULL) {
 			if (atomic_read(&rt->dst.__refcnt) > 1 ||
 			    (now - rt->dst.lastuse) < expire) {
-				rtp = &rt->dst.dn_next;
+				rtp = &rt->dn_next;
 				continue;
 			}
-			*rtp = rt->dst.dn_next;
-			rt->dst.dn_next = NULL;
+			*rtp = rt->dn_next;
+			rt->dn_next = NULL;
 			dst_dev_put(&rt->dst);
 			dst_release(&rt->dst);
 			break;
@@ -333,8 +333,8 @@ static int dn_insert_route(struct dn_route *rt, unsigned int hash, struct dn_rou
 						lockdep_is_held(&dn_rt_hash_table[hash].lock))) != NULL) {
 		if (compare_keys(&rth->fld, &rt->fld)) {
 			/* Put it first */
-			*rthp = rth->dst.dn_next;
-			rcu_assign_pointer(rth->dst.dn_next,
+			*rthp = rth->dn_next;
+			rcu_assign_pointer(rth->dn_next,
 					   dn_rt_hash_table[hash].chain);
 			rcu_assign_pointer(dn_rt_hash_table[hash].chain, rth);
 
@@ -345,10 +345,10 @@ static int dn_insert_route(struct dn_route *rt, unsigned int hash, struct dn_rou
 			*rp = rth;
 			return 0;
 		}
-		rthp = &rth->dst.dn_next;
+		rthp = &rth->dn_next;
 	}
 
-	rcu_assign_pointer(rt->dst.dn_next, dn_rt_hash_table[hash].chain);
+	rcu_assign_pointer(rt->dn_next, dn_rt_hash_table[hash].chain);
 	rcu_assign_pointer(dn_rt_hash_table[hash].chain, rt);
 
 	dst_hold_and_use(&rt->dst, now);
@@ -369,8 +369,8 @@ static void dn_run_flush(unsigned long dummy)
 			goto nothing_to_declare;
 
 		for(; rt; rt = next) {
-			next = rcu_dereference_raw(rt->dst.dn_next);
-			RCU_INIT_POINTER(rt->dst.dn_next, NULL);
+			next = rcu_dereference_raw(rt->dn_next);
+			RCU_INIT_POINTER(rt->dn_next, NULL);
 			dst_dev_put(&rt->dst);
 			dst_release(&rt->dst);
 		}
@@ -1183,6 +1183,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, const struct flowidn *o
 	if (rt == NULL)
 		goto e_nobufs;
 
+	rt->dn_next = NULL;
 	memset(&rt->fld, 0, sizeof(rt->fld));
 	rt->fld.saddr        = oldflp->saddr;
 	rt->fld.daddr        = oldflp->daddr;
@@ -1252,7 +1253,7 @@ static int __dn_route_output_key(struct dst_entry **pprt, const struct flowidn *
 	if (!(flags & MSG_TRYHARD)) {
 		rcu_read_lock_bh();
 		for (rt = rcu_dereference_bh(dn_rt_hash_table[hash].chain); rt;
-			rt = rcu_dereference_bh(rt->dst.dn_next)) {
+			rt = rcu_dereference_bh(rt->dn_next)) {
 			if ((flp->daddr == rt->fld.daddr) &&
 			    (flp->saddr == rt->fld.saddr) &&
 			    (flp->flowidn_mark == rt->fld.flowidn_mark) &&
@@ -1448,6 +1449,7 @@ static int dn_route_input_slow(struct sk_buff *skb)
 	if (rt == NULL)
 		goto e_nobufs;
 
+	rt->dn_next = NULL;
 	memset(&rt->fld, 0, sizeof(rt->fld));
 	rt->rt_saddr      = fld.saddr;
 	rt->rt_daddr      = fld.daddr;
@@ -1529,7 +1531,7 @@ static int dn_route_input(struct sk_buff *skb)
 
 	rcu_read_lock();
 	for(rt = rcu_dereference(dn_rt_hash_table[hash].chain); rt != NULL;
-	    rt = rcu_dereference(rt->dst.dn_next)) {
+	    rt = rcu_dereference(rt->dn_next)) {
 		if ((rt->fld.saddr == cb->src) &&
 		    (rt->fld.daddr == cb->dst) &&
 		    (rt->fld.flowidn_oif == 0) &&
@@ -1749,7 +1751,7 @@ int dn_cache_dump(struct sk_buff *skb, struct netlink_callback *cb)
 		rcu_read_lock_bh();
 		for(rt = rcu_dereference_bh(dn_rt_hash_table[h].chain), idx = 0;
 			rt;
-			rt = rcu_dereference_bh(rt->dst.dn_next), idx++) {
+			rt = rcu_dereference_bh(rt->dn_next), idx++) {
 			if (idx < s_idx)
 				continue;
 			skb_dst_set(skb, dst_clone(&rt->dst));
@@ -1795,7 +1797,7 @@ static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_rou
 {
 	struct dn_rt_cache_iter_state *s = seq->private;
 
-	rt = rcu_dereference_bh(rt->dst.dn_next);
+	rt = rcu_dereference_bh(rt->dn_next);
 	while (!rt) {
 		rcu_read_unlock_bh();
 		if (--s->bucket < 0)
-- 
2.13.6


* [RFC v2 PATCH 03/11] ipv6: Move rt6_next from dst_entry into ipv6 route structure.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
  2017-10-31 14:10 ` [RFC v2 PATCH 01/11] net: dst->rt_next is unused David S. Miller
  2017-10-31 14:10 ` [RFC v2 PATCH 02/11] decnet: Move dn_next into decnet route structure David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:37   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 04/11] net: Create and use new helper xfrm_dst_child() David S. Miller
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC
  To: netdev; +Cc: David S. Miller

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h     |  1 -
 include/net/ip6_fib.h |  5 +++--
 net/ipv6/ip6_fib.c    | 26 +++++++++++++-------------
 net/ipv6/route.c      | 10 +++++-----
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 6948217e4d37..83a790b16007 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -100,7 +100,6 @@ struct dst_entry {
 	struct lwtunnel_state   *lwtstate;
 	union {
 		struct dst_entry	*next;
-		struct rt6_info __rcu	*rt6_next;
 	};
 };
 
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 10c913816032..281a922f0c62 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -129,6 +129,7 @@ struct rt6_exception {
 
 struct rt6_info {
 	struct dst_entry		dst;
+	struct rt6_info __rcu		*rt6_next;
 
 	/*
 	 * Tail elements of dst_entry (__refcnt etc.)
@@ -176,11 +177,11 @@ struct rt6_info {
 
 #define for_each_fib6_node_rt_rcu(fn)					\
 	for (rt = rcu_dereference((fn)->leaf); rt;			\
-	     rt = rcu_dereference(rt->dst.rt6_next))
+	     rt = rcu_dereference(rt->rt6_next))
 
 #define for_each_fib6_walker_rt(w)					\
 	for (rt = (w)->leaf; rt;					\
-	     rt = rcu_dereference_protected(rt->dst.rt6_next, 1))
+	     rt = rcu_dereference_protected(rt->rt6_next, 1))
 
 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
 {
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 1ada9672d198..0b2f7fd3e876 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -890,7 +890,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 	ins = &fn->leaf;
 
 	for (iter = leaf; iter;
-	     iter = rcu_dereference_protected(iter->dst.rt6_next,
+	     iter = rcu_dereference_protected(iter->rt6_next,
 				lockdep_is_held(&rt->rt6i_table->tb6_lock))) {
 		/*
 		 *	Search for duplicates
@@ -947,7 +947,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 			break;
 
 next_iter:
-		ins = &iter->dst.rt6_next;
+		ins = &iter->rt6_next;
 	}
 
 	if (fallback_ins && !found) {
@@ -976,7 +976,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 					      &sibling->rt6i_siblings);
 				break;
 			}
-			sibling = rcu_dereference_protected(sibling->dst.rt6_next,
+			sibling = rcu_dereference_protected(sibling->rt6_next,
 				    lockdep_is_held(&rt->rt6i_table->tb6_lock));
 		}
 		/* For each sibling in the list, increment the counter of
@@ -1006,7 +1006,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		if (err)
 			return err;
 
-		rcu_assign_pointer(rt->dst.rt6_next, iter);
+		rcu_assign_pointer(rt->rt6_next, iter);
 		atomic_inc(&rt->rt6i_ref);
 		rcu_assign_pointer(rt->rt6i_node, fn);
 		rcu_assign_pointer(*ins, rt);
@@ -1037,7 +1037,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 
 		atomic_inc(&rt->rt6i_ref);
 		rcu_assign_pointer(rt->rt6i_node, fn);
-		rt->dst.rt6_next = iter->dst.rt6_next;
+		rt->rt6_next = iter->rt6_next;
 		rcu_assign_pointer(*ins, rt);
 		call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_REPLACE,
 					  rt);
@@ -1056,14 +1056,14 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 
 		if (nsiblings) {
 			/* Replacing an ECMP route, remove all siblings */
-			ins = &rt->dst.rt6_next;
+			ins = &rt->rt6_next;
 			iter = rcu_dereference_protected(*ins,
 				    lockdep_is_held(&rt->rt6i_table->tb6_lock));
 			while (iter) {
 				if (iter->rt6i_metric > rt->rt6i_metric)
 					break;
 				if (rt6_qualify_for_ecmp(iter)) {
-					*ins = iter->dst.rt6_next;
+					*ins = iter->rt6_next;
 					iter->rt6i_node = NULL;
 					fib6_purge_rt(iter, fn, info->nl_net);
 					if (rcu_access_pointer(fn->rr_ptr) == iter)
@@ -1072,7 +1072,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 					nsiblings--;
 					info->nl_net->ipv6.rt6_stats->fib_rt_entries--;
 				} else {
-					ins = &iter->dst.rt6_next;
+					ins = &iter->rt6_next;
 				}
 				iter = rcu_dereference_protected(*ins,
 					lockdep_is_held(&rt->rt6i_table->tb6_lock));
@@ -1641,7 +1641,7 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn,
 	WARN_ON_ONCE(rt->rt6i_flags & RTF_CACHE);
 
 	/* Unlink it */
-	*rtp = rt->dst.rt6_next;
+	*rtp = rt->rt6_next;
 	rt->rt6i_node = NULL;
 	net->ipv6.rt6_stats->fib_rt_entries--;
 	net->ipv6.rt6_stats->fib_discarded_routes++;
@@ -1669,7 +1669,7 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn,
 	FOR_WALKERS(net, w) {
 		if (w->state == FWS_C && w->leaf == rt) {
 			RT6_TRACE("walker %p adjusted by delroute\n", w);
-			w->leaf = rcu_dereference_protected(rt->dst.rt6_next,
+			w->leaf = rcu_dereference_protected(rt->rt6_next,
 					    lockdep_is_held(&table->tb6_lock));
 			if (!w->leaf)
 				w->state = FWS_U;
@@ -1728,7 +1728,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info)
 			fib6_del_route(table, fn, rtp, info);
 			return 0;
 		}
-		rtp_next = &cur->dst.rt6_next;
+		rtp_next = &cur->rt6_next;
 	}
 	return -ENOENT;
 }
@@ -2203,7 +2203,7 @@ static int ipv6_route_yield(struct fib6_walker *w)
 
 	do {
 		iter->w.leaf = rcu_dereference_protected(
-				iter->w.leaf->dst.rt6_next,
+				iter->w.leaf->rt6_next,
 				lockdep_is_held(&iter->tbl->tb6_lock));
 		iter->skip--;
 		if (!iter->skip && iter->w.leaf)
@@ -2269,7 +2269,7 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	if (!v)
 		goto iter_table;
 
-	n = rcu_dereference_bh(((struct rt6_info *)v)->dst.rt6_next);
+	n = rcu_dereference_bh(((struct rt6_info *)v)->rt6_next);
 	if (n) {
 		++*pos;
 		return n;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 70d9659fc1e9..0e98bfab3462 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -497,7 +497,7 @@ static inline struct rt6_info *rt6_device_match(struct net *net,
 	if (!oif && ipv6_addr_any(saddr))
 		goto out;
 
-	for (sprt = rt; sprt; sprt = rcu_dereference(sprt->dst.rt6_next)) {
+	for (sprt = rt; sprt; sprt = rcu_dereference(sprt->rt6_next)) {
 		struct net_device *dev = sprt->dst.dev;
 
 		if (oif) {
@@ -716,7 +716,7 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn,
 
 	match = NULL;
 	cont = NULL;
-	for (rt = rr_head; rt; rt = rcu_dereference(rt->dst.rt6_next)) {
+	for (rt = rr_head; rt; rt = rcu_dereference(rt->rt6_next)) {
 		if (rt->rt6i_metric != metric) {
 			cont = rt;
 			break;
@@ -726,7 +726,7 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn,
 	}
 
 	for (rt = leaf; rt && rt != rr_head;
-	     rt = rcu_dereference(rt->dst.rt6_next)) {
+	     rt = rcu_dereference(rt->rt6_next)) {
 		if (rt->rt6i_metric != metric) {
 			cont = rt;
 			break;
@@ -738,7 +738,7 @@ static struct rt6_info *find_rr_leaf(struct fib6_node *fn,
 	if (match || !cont)
 		return match;
 
-	for (rt = cont; rt; rt = rcu_dereference(rt->dst.rt6_next))
+	for (rt = cont; rt; rt = rcu_dereference(rt->rt6_next))
 		match = find_match(rt, oif, strict, &mpri, match, do_rr);
 
 	return match;
@@ -776,7 +776,7 @@ static struct rt6_info *rt6_select(struct net *net, struct fib6_node *fn,
 			     &do_rr);
 
 	if (do_rr) {
-		struct rt6_info *next = rcu_dereference(rt0->dst.rt6_next);
+		struct rt6_info *next = rcu_dereference(rt0->rt6_next);
 
 		/* no entries matched; do round-robin */
 		if (!next || next->rt6i_metric != rt0->rt6i_metric)
-- 
2.13.6


* [RFC v2 PATCH 04/11] net: Create and use new helper xfrm_dst_child().
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (2 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 03/11] ipv6: Move rt6_next from dst_entry into ipv6 " David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:39   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 05/11] ipsec: Create and use new helpers for dst child access David S. Miller
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC
  To: netdev; +Cc: David S. Miller

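Only IPSEC routes have a non-NULL dst->child pointer, and IPSEC
routes are identified by a non-NULL dst->xfrm pointer.  So add a
helper, xfrm_dst_child(), which returns dst->child when dst->xfrm is
set and NULL otherwise, and convert the dst->child readers to use it.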
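
The behaviour of such a guarded accessor can be sketched in user
space as follows.  The demo_dst type and both function names are
hypothetical stand-ins, not the kernel definitions; the point is only
that a route without an xfrm pointer reports no child, whatever its
child field happens to contain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two dst_entry members involved. */
struct demo_dst {
	void *xfrm;			/* non-NULL marks an IPSEC route */
	struct demo_dst *child;
};

/* Only routes with an xfrm pointer are allowed to report a child. */
struct demo_dst *demo_dst_child(const struct demo_dst *dst)
{
	if (dst->xfrm)
		return dst->child;
	return NULL;
}

/* Returns 1 when the accessor behaves as described above. */
int demo_child_ok(void)
{
	int state = 0;			/* stands in for an xfrm state */
	struct demo_dst path = { NULL, NULL };
	struct demo_dst ipsec = { &state, &path };
	struct demo_dst plain = { NULL, &path };	/* child is ignored */

	return demo_dst_child(&ipsec) == &path &&
	       demo_dst_child(&plain) == NULL;
}
```

Routing all reads through one accessor like this is what lets the
field move later without touching every caller again.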

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/xfrm.h           |  9 +++++++++
 net/core/dst.c               |  8 +++++---
 net/ipv4/xfrm4_mode_tunnel.c |  2 +-
 net/ipv6/xfrm6_mode_tunnel.c |  2 +-
 net/ipv6/xfrm6_policy.c      |  2 +-
 net/xfrm/xfrm_output.c       |  2 +-
 net/xfrm/xfrm_policy.c       | 12 ++++++------
 security/selinux/xfrm.c      |  2 +-
 8 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index f002a2c5e33c..be599f9bb60d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -993,6 +993,15 @@ struct xfrm_dst {
 	u32 path_cookie;
 };
 
+static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
+{
+#ifdef CONFIG_XFRM
+	if (dst->xfrm)
+		return dst->child;
+#endif
+	return NULL;
+}
+
 #ifdef CONFIG_XFRM
 static inline void xfrm_dst_destroy(struct xfrm_dst *xdst)
 {
diff --git a/net/core/dst.c b/net/core/dst.c
index 662a2d4a3d19..6a3c21b8fc8d 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -116,12 +116,14 @@ EXPORT_SYMBOL(dst_alloc);
 
 struct dst_entry *dst_destroy(struct dst_entry * dst)
 {
-	struct dst_entry *child;
+	struct dst_entry *child = NULL;
 
 	smp_rmb();
 
-	child = dst->child;
-
+#ifdef CONFIG_XFRM
+	if (dst->xfrm)
+		child = dst->child;
+#endif
 	if (!(dst->flags & DST_NOCOUNT))
 		dst_entries_add(dst->ops, -1);
 
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index e6265e2c274e..7d885a44dc9d 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -62,7 +62,7 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 	top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ?
 		0 : (XFRM_MODE_SKB_CB(skb)->frag_off & htons(IP_DF));
 
-	top_iph->ttl = ip4_dst_hoplimit(dst->child);
+	top_iph->ttl = ip4_dst_hoplimit(xfrm_dst_child(dst));
 
 	top_iph->saddr = x->props.saddr.a4;
 	top_iph->daddr = x->id.daddr.a4;
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 02556e356f87..e66b94f46532 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -59,7 +59,7 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 	if (x->props.flags & XFRM_STATE_NOECN)
 		dsfield &= ~INET_ECN_MASK;
 	ipv6_change_dsfield(top_iph, 0, dsfield);
-	top_iph->hop_limit = ip6_dst_hoplimit(dst->child);
+	top_iph->hop_limit = ip6_dst_hoplimit(xfrm_dst_child(dst));
 	top_iph->saddr = *(struct in6_addr *)&x->props.saddr;
 	top_iph->daddr = *(struct in6_addr *)&x->id.daddr;
 	return 0;
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 4ed9f8cc3b6a..e2e6cceef288 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -264,7 +264,7 @@ static void xfrm6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 			in6_dev_put(xdst->u.rt6.rt6i_idev);
 			xdst->u.rt6.rt6i_idev = loopback_idev;
 			in6_dev_hold(loopback_idev);
-			xdst = (struct xfrm_dst *)xdst->u.dst.child;
+			xdst = (struct xfrm_dst *)xfrm_dst_child(&xdst->u.dst);
 		} while (xdst->u.dst.xfrm);
 
 		__in6_dev_put(loopback_idev);
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 31a2e6d34dba..7fc0932d61ff 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -44,7 +44,7 @@ static int xfrm_skb_check_space(struct sk_buff *skb)
 
 static struct dst_entry *skb_dst_pop(struct sk_buff *skb)
 {
-	struct dst_entry *child = dst_clone(skb_dst(skb)->child);
+	struct dst_entry *child = dst_clone(xfrm_dst_child(skb_dst(skb)));
 
 	skb_dst_drop(skb);
 	return child;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b669c624a1ec..1ecc8dbce2e2 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1635,7 +1635,7 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 	xfrm_init_path((struct xfrm_dst *)dst0, dst, nfheader_len);
 	xfrm_init_pmtu(dst_prev);
 
-	for (dst_prev = dst0; dst_prev != dst; dst_prev = dst_prev->child) {
+	for (dst_prev = dst0; dst_prev != dst; dst_prev = xfrm_dst_child(dst_prev)) {
 		struct xfrm_dst *xdst = (struct xfrm_dst *)dst_prev;
 
 		err = xfrm_fill_dst(xdst, dev, fl);
@@ -2570,7 +2570,7 @@ static int stale_bundle(struct dst_entry *dst)
 
 void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev)
 {
-	while ((dst = dst->child) && dst->xfrm && dst->dev == dev) {
+	while ((dst = xfrm_dst_child(dst)) && dst->xfrm && dst->dev == dev) {
 		dst->dev = dev_net(dev)->loopback_dev;
 		dev_hold(dst->dev);
 		dev_put(dev);
@@ -2600,7 +2600,7 @@ static void xfrm_init_pmtu(struct dst_entry *dst)
 		struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 		u32 pmtu, route_mtu_cached;
 
-		pmtu = dst_mtu(dst->child);
+		pmtu = dst_mtu(xfrm_dst_child(dst));
 		xdst->child_mtu_cached = pmtu;
 
 		pmtu = xfrm_state_mtu(dst->xfrm, pmtu);
@@ -2645,7 +2645,7 @@ static int xfrm_bundle_ok(struct xfrm_dst *first)
 		    xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
 			return 0;
 
-		mtu = dst_mtu(dst->child);
+		mtu = dst_mtu(xfrm_dst_child(dst));
 		if (xdst->child_mtu_cached != mtu) {
 			last = xdst;
 			xdst->child_mtu_cached = mtu;
@@ -2659,7 +2659,7 @@ static int xfrm_bundle_ok(struct xfrm_dst *first)
 			xdst->route_mtu_cached = mtu;
 		}
 
-		dst = dst->child;
+		dst = xfrm_dst_child(dst);
 	} while (dst->xfrm);
 
 	if (likely(!last))
@@ -2701,7 +2701,7 @@ static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst,
 {
 	const struct dst_entry *path = dst->path;
 
-	for (; dst != path; dst = dst->child) {
+	for (; dst != path; dst = xfrm_dst_child(dst)) {
 		const struct xfrm_state *xfrm = dst->xfrm;
 
 		if (xfrm->props.mode == XFRM_MODE_TRANSPORT)
diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index 56e354fcdfc6..928188902901 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -452,7 +452,7 @@ int selinux_xfrm_postroute_last(u32 sk_sid, struct sk_buff *skb,
 	if (dst) {
 		struct dst_entry *iter;
 
-		for (iter = dst; iter != NULL; iter = iter->child) {
+		for (iter = dst; iter != NULL; iter = xfrm_dst_child(iter)) {
 			struct xfrm_state *x = iter->xfrm;
 
 			if (x && selinux_authorizable_xfrm(x))
-- 
2.13.6


* [RFC v2 PATCH 05/11] ipsec: Create and use new helpers for dst child access.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (3 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 04/11] net: Create and use new helper xfrm_dst_child() David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:40   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 06/11] xfrm: Move child route linkage into xfrm_dst David S. Miller
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC
  To: netdev; +Cc: David S. Miller

This will make a future change moving the dst->child pointer less
invasive.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/xfrm.h     |  5 +++++
 net/xfrm/xfrm_policy.c | 47 +++++++++++++++++++++++------------------------
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index be599f9bb60d..32267e099638 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1003,6 +1003,11 @@ static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
 }
 
 #ifdef CONFIG_XFRM
+static inline void xfrm_dst_set_child(struct xfrm_dst *xdst, struct dst_entry *child)
+{
+	xdst->u.dst.child = child;
+}
+
 static inline void xfrm_dst_destroy(struct xfrm_dst *xdst)
 {
 	xfrm_pols_put(xdst->pols, xdst->num_pols);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 1ecc8dbce2e2..206ac90ff4f0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1545,8 +1545,8 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 	unsigned long now = jiffies;
 	struct net_device *dev;
 	struct xfrm_mode *inner_mode;
-	struct dst_entry *dst_prev = NULL;
-	struct dst_entry *dst0 = NULL;
+	struct xfrm_dst *xdst_prev = NULL;
+	struct xfrm_dst *xdst0 = NULL;
 	int i = 0;
 	int err;
 	int header_len = 0;
@@ -1572,13 +1572,13 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 			goto put_states;
 		}
 
-		if (!dst_prev)
-			dst0 = dst1;
+		if (!xdst_prev)
+			xdst0 = xdst;
 		else
 			/* Ref count is taken during xfrm_alloc_dst()
 			 * No need to do dst_clone() on dst1
 			 */
-			dst_prev->child = dst1;
+			xfrm_dst_set_child(xdst_prev, &xdst->u.dst);
 
 		if (xfrm[i]->sel.family == AF_UNSPEC) {
 			inner_mode = xfrm_ip2inner_mode(xfrm[i],
@@ -1615,8 +1615,8 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		dst1->input = dst_discard;
 		dst1->output = inner_mode->afinfo->output;
 
-		dst1->next = dst_prev;
-		dst_prev = dst1;
+		dst1->next = &xdst_prev->u.dst;
+		xdst_prev = xdst;
 
 		header_len += xfrm[i]->props.header_len;
 		if (xfrm[i]->type->flags & XFRM_TYPE_NON_FRAGMENT)
@@ -1624,40 +1624,39 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		trailer_len += xfrm[i]->props.trailer_len;
 	}
 
-	dst_prev->child = dst;
-	dst0->path = dst;
+	xfrm_dst_set_child(xdst_prev, dst);
+	xdst0->u.dst.path = dst;
 
 	err = -ENODEV;
 	dev = dst->dev;
 	if (!dev)
 		goto free_dst;
 
-	xfrm_init_path((struct xfrm_dst *)dst0, dst, nfheader_len);
-	xfrm_init_pmtu(dst_prev);
+	xfrm_init_path(xdst0, dst, nfheader_len);
+	xfrm_init_pmtu(&xdst_prev->u.dst);
 
-	for (dst_prev = dst0; dst_prev != dst; dst_prev = xfrm_dst_child(dst_prev)) {
-		struct xfrm_dst *xdst = (struct xfrm_dst *)dst_prev;
-
-		err = xfrm_fill_dst(xdst, dev, fl);
+	for (xdst_prev = xdst0; xdst_prev != (struct xfrm_dst *)dst;
+	     xdst_prev = (struct xfrm_dst *) xfrm_dst_child(&xdst_prev->u.dst)) {
+		err = xfrm_fill_dst(xdst_prev, dev, fl);
 		if (err)
 			goto free_dst;
 
-		dst_prev->header_len = header_len;
-		dst_prev->trailer_len = trailer_len;
-		header_len -= xdst->u.dst.xfrm->props.header_len;
-		trailer_len -= xdst->u.dst.xfrm->props.trailer_len;
+		xdst_prev->u.dst.header_len = header_len;
+		xdst_prev->u.dst.trailer_len = trailer_len;
+		header_len -= xdst_prev->u.dst.xfrm->props.header_len;
+		trailer_len -= xdst_prev->u.dst.xfrm->props.trailer_len;
 	}
 
 out:
-	return dst0;
+	return &xdst0->u.dst;
 
 put_states:
 	for (; i < nx; i++)
 		xfrm_state_put(xfrm[i]);
 free_dst:
-	if (dst0)
-		dst_release_immediate(dst0);
-	dst0 = ERR_PTR(err);
+	if (xdst0)
+		dst_release_immediate(&xdst0->u.dst);
+	xdst0 = ERR_PTR(err);
 	goto out;
 }
 
@@ -2005,7 +2004,7 @@ static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net,
 	dst1->output = xdst_queue_output;
 
 	dst_hold(dst);
-	dst1->child = dst;
+	xfrm_dst_set_child(xdst, dst);
 	dst1->path = dst;
 
 	xfrm_init_path((struct xfrm_dst *)dst1, dst, 0);
-- 
2.13.6


* [RFC v2 PATCH 06/11] xfrm: Move child route linkage into xfrm_dst.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (4 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 05/11] ipsec: Create and use new helpers for dst child access David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:42   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 07/11] ipv6: Move dst->from into struct rt6_info David S. Miller
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC
  To: netdev; +Cc: David S. Miller

XFRM bundle child chains look like this:

	xdst1 --> xdst2 --> xdst3 --> path_dst

All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
The final child pointer in the chain, here called 'path_dst', is some
other kind of route such as an ipv4 or ipv6 one.

The xfrm output path pops routes, one at a time, via the child
pointer, until we hit one whose dst->xfrm pointer is NULL.

We can easily preserve the above mechanisms with child sitting
only in the xfrm_dst structure.  All children in the chain
before we break out of the xfrm_output() loop have dst->xfrm
non-NULL and are therefore xfrm_dst objects.

Since we break out of the loop when we find dst->xfrm NULL, we
will not try to dereference 'dst' as if it were an xfrm_dst.
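
A small user-space model of that argument, with made-up chain_dst and
chain_xdst types standing in for dst_entry and xfrm_dst, might look
like this.  It assumes, as xfrm_dst does, that the base object is the
first member of the wrapper, so the cast is valid whenever xfrm is
non-NULL.  The walk never casts the final path route because it stops
as soon as xfrm is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base route: only the member the walk inspects. */
struct chain_dst {
	void *xfrm;
};

/* Hypothetical wrapper: the child link lives here, not in the base. */
struct chain_xdst {
	struct chain_dst dst;		/* must stay the first member */
	struct chain_dst *child;
};

/* Cast to the wrapper only when xfrm says the object really is one. */
struct chain_dst *chain_child(const struct chain_dst *dst)
{
	if (dst->xfrm)
		return ((const struct chain_xdst *)dst)->child;
	return NULL;
}

/* Pop routes until one without an xfrm pointer is reached. */
int chain_count_transforms(const struct chain_dst *dst)
{
	int n = 0;

	while (dst && dst->xfrm) {
		n++;
		dst = chain_child(dst);
	}
	return n;
}

/* Builds xdst1 --> xdst2 --> xdst3 --> path_dst and walks it. */
int chain_demo(void)
{
	int state = 0;			/* stands in for an xfrm state */
	struct chain_dst path_dst = { NULL };
	struct chain_xdst xdst3 = { { &state }, &path_dst };
	struct chain_xdst xdst2 = { { &state }, &xdst3.dst };
	struct chain_xdst xdst1 = { { &state }, &xdst2.dst };

	return chain_count_transforms(&xdst1.dst);
}
```

The loop condition is the safety property: path_dst is only ever
read through its base member, never through the wrapper type.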

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h         |  3 +--
 include/net/xfrm.h        | 15 ++++++++++-----
 net/core/dst.c            |  9 ++++++---
 net/core/pktgen.c         | 12 ++++++------
 net/netfilter/xt_policy.c |  3 ++-
 net/xfrm/xfrm_device.c    |  2 +-
 6 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 83a790b16007..45720cc779f8 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -34,7 +34,6 @@ struct sk_buff;
 struct dst_entry {
 	struct net_device       *dev;
 	struct rcu_head		rcu_head;
-	struct dst_entry	*child;
 	struct  dst_ops	        *ops;
 	unsigned long		_metrics;
 	unsigned long           expires;
@@ -88,7 +87,7 @@ struct dst_entry {
 	 * Align __refcnt to a 64 bytes alignment
 	 * (L1_CACHE_SIZE would be too much)
 	 */
-	long			__pad_to_align_refcnt[2];
+	long			__pad_to_align_refcnt[3];
 #endif
 	/*
 	 * __refcnt wants to be on a different cache line from
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 32267e099638..725c3d656c62 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -967,7 +967,7 @@ static inline bool xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_c
 
 /* A struct encoding bundle of transformations to apply to some set of flow.
  *
- * dst->child points to the next element of bundle.
+ * xdst->child points to the next element of bundle.
  * dst->xfrm  points to an instanse of transformer.
  *
  * Due to unfortunate limitations of current routing cache, which we
@@ -983,6 +983,7 @@ struct xfrm_dst {
 		struct rt6_info		rt6;
 	} u;
 	struct dst_entry *route;
+	struct dst_entry *child;
 	struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX];
 	int num_pols, num_xfrms;
 	u32 xfrm_genid;
@@ -996,8 +997,10 @@ struct xfrm_dst {
 static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
 {
 #ifdef CONFIG_XFRM
-	if (dst->xfrm)
-		return dst->child;
+	if (dst->xfrm) {
+		struct xfrm_dst *xdst = (struct xfrm_dst *) dst;
+		return xdst->child;
+	}
 #endif
 	return NULL;
 }
@@ -1005,7 +1008,7 @@ static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
 #ifdef CONFIG_XFRM
 static inline void xfrm_dst_set_child(struct xfrm_dst *xdst, struct dst_entry *child)
 {
-	xdst->u.dst.child = child;
+	xdst->child = child;
 }
 
 static inline void xfrm_dst_destroy(struct xfrm_dst *xdst)
@@ -1879,12 +1882,14 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
 static inline bool xfrm_dst_offload_ok(struct dst_entry *dst)
 {
 	struct xfrm_state *x = dst->xfrm;
+	struct xfrm_dst *xdst;
 
 	if (!x || !x->type_offload)
 		return false;
 
+	xdst = (struct xfrm_dst *) dst;
 	if (x->xso.offload_handle && (x->xso.dev == dst->path->dev) &&
-	    !dst->child->xfrm)
+	    !xdst->child->xfrm)
 		return true;
 
 	return false;
diff --git a/net/core/dst.c b/net/core/dst.c
index 6a3c21b8fc8d..5cf96179e8e0 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -21,6 +21,7 @@
 #include <linux/sched.h>
 #include <linux/prefetch.h>
 #include <net/lwtunnel.h>
+#include <net/xfrm.h>
 
 #include <net/dst.h>
 #include <net/dst_metadata.h>
@@ -62,7 +63,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	      struct net_device *dev, int initial_ref, int initial_obsolete,
 	      unsigned short flags)
 {
-	dst->child = NULL;
 	dst->dev = dev;
 	if (dev)
 		dev_hold(dev);
@@ -121,8 +121,11 @@ struct dst_entry *dst_destroy(struct dst_entry * dst)
 	smp_rmb();
 
 #ifdef CONFIG_XFRM
-	if (dst->xfrm)
-		child = dst->child;
+	if (dst->xfrm) {
+		struct xfrm_dst *xdst = (struct xfrm_dst *) dst;
+
+		child = xdst->child;
+	}
 #endif
 	if (!(dst->flags & DST_NOCOUNT))
 		dst_entries_add(dst->ops, -1);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 6e1e10ff433a..099b0a2f6bb2 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -399,7 +399,7 @@ struct pktgen_dev {
 	__u8	ipsmode;		/* IPSEC mode (config) */
 	__u8	ipsproto;		/* IPSEC type (config) */
 	__u32	spi;
-	struct dst_entry dst;
+	struct xfrm_dst xdst;
 	struct dst_ops dstops;
 #endif
 	char result[512];
@@ -2609,7 +2609,7 @@ static int pktgen_output_ipsec(struct sk_buff *skb, struct pktgen_dev *pkt_dev)
 	 * supports both transport/tunnel mode + ESP/AH type.
 	 */
 	if ((x->props.mode == XFRM_MODE_TUNNEL) && (pkt_dev->spi != 0))
-		skb->_skb_refdst = (unsigned long)&pkt_dev->dst | SKB_DST_NOREF;
+		skb->_skb_refdst = (unsigned long)&pkt_dev->xdst.u.dst | SKB_DST_NOREF;
 
 	rcu_read_lock_bh();
 	err = x->outer_mode->output(x, skb);
@@ -3734,10 +3734,10 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	 * performance under such circumstance.
 	 */
 	pkt_dev->dstops.family = AF_INET;
-	pkt_dev->dst.dev = pkt_dev->odev;
-	dst_init_metrics(&pkt_dev->dst, pktgen_dst_metrics, false);
-	pkt_dev->dst.child = &pkt_dev->dst;
-	pkt_dev->dst.ops = &pkt_dev->dstops;
+	pkt_dev->xdst.u.dst.dev = pkt_dev->odev;
+	dst_init_metrics(&pkt_dev->xdst.u.dst, pktgen_dst_metrics, false);
+	pkt_dev->xdst.child = &pkt_dev->xdst.u.dst;
+	pkt_dev->xdst.u.dst.ops = &pkt_dev->dstops;
 #endif
 
 	return add_dev_to_thread(t, pkt_dev);
diff --git a/net/netfilter/xt_policy.c b/net/netfilter/xt_policy.c
index 2b4ab189bba7..5639fb03bdd9 100644
--- a/net/netfilter/xt_policy.c
+++ b/net/netfilter/xt_policy.c
@@ -93,7 +93,8 @@ match_policy_out(const struct sk_buff *skb, const struct xt_policy_info *info,
 	if (dst->xfrm == NULL)
 		return -1;
 
-	for (i = 0; dst && dst->xfrm; dst = dst->child, i++) {
+	for (i = 0; dst && dst->xfrm;
+	     dst = ((struct xfrm_dst *)dst)->child, i++) {
 		pos = strict ? i : 0;
 		if (pos >= info->len)
 			return 0;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 30e5746085b8..c5851ddddd2a 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -121,7 +121,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 		return false;
 
 	if ((x->xso.offload_handle && (dev == dst->path->dev)) &&
-	     !dst->child->xfrm && x->type->get_mtu) {
+	     !xdst->child->xfrm && x->type->get_mtu) {
 		mtu = x->type->get_mtu(x, xdst->child_mtu_cached);
 
 		if (skb->len <= mtu)
-- 
2.13.6

* [RFC v2 PATCH 07/11] ipv6: Move dst->from into struct rt6_info.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (5 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 06/11] xfrm: Move child route linkage into xfrm_dst David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:47   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 08/11] xfrm: Move dst->path into struct xfrm_dst David S. Miller
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

The dst->from value is only used by ipv6 routes to track where
a route "came from".

Any time we clone or copy a core ipv6 route in the ipv6 routing
tables, we have the copy/clone's ->from point to the base route.

This is used to handle route expiration properly.

Only ipv6 uses this mechanism, and only ipv6 code references
it.  So it is safe to move it into rt6_info.
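
A minimal sketch of how the ->from link is followed to find the entry
that actually carries an expiry, assuming simplified stand-in types.
The sk_* names and the flag value are hypothetical and exist only for
this example; the real logic lives in rt6_update_expires() and
rt6_check_expired().

```c
#include <assert.h>
#include <stddef.h>

#define SK_RTF_EXPIRES 0x1	/* hypothetical flag value for this sketch */

struct sk_dst { unsigned long expires; };

struct sk_rt6 {
	struct sk_dst dst;
	struct sk_rt6 *from;	/* base route this entry was cloned from, or NULL */
	unsigned int flags;
};

/* Follow ->from until an entry carrying its own expiry is found. */
static const struct sk_rt6 *sk_expiry_source(const struct sk_rt6 *rt)
{
	assert(rt != NULL);
	while (rt && !(rt->flags & SK_RTF_EXPIRES))
		rt = rt->from;
	return rt;	/* NULL when nothing in the chain expires */
}
```

With ->from typed as the ipv6 route structure itself, the walk needs
no casts, which is what most of the hunks below clean up.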

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h     |  3 +--
 include/net/ip6_fib.h |  9 ++++-----
 net/core/dst.c        |  1 -
 net/ipv6/route.c      | 34 +++++++++++++++++-----------------
 4 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 45720cc779f8..19f24f3b6c06 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -38,7 +38,6 @@ struct dst_entry {
 	unsigned long		_metrics;
 	unsigned long           expires;
 	struct dst_entry	*path;
-	struct dst_entry	*from;
 #ifdef CONFIG_XFRM
 	struct xfrm_state	*xfrm;
 #else
@@ -87,7 +86,7 @@ struct dst_entry {
 	 * Align __refcnt to a 64 bytes alignment
 	 * (L1_CACHE_SIZE would be too much)
 	 */
-	long			__pad_to_align_refcnt[3];
+	long			__pad_to_align_refcnt[4];
 #endif
 	/*
 	 * __refcnt wants to be on a different cache line from
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 281a922f0c62..44d96a91e745 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -130,6 +130,7 @@ struct rt6_exception {
 struct rt6_info {
 	struct dst_entry		dst;
 	struct rt6_info __rcu		*rt6_next;
+	struct rt6_info			*from;
 
 	/*
 	 * Tail elements of dst_entry (__refcnt etc.)
@@ -204,11 +205,9 @@ static inline void rt6_update_expires(struct rt6_info *rt0, int timeout)
 {
 	struct rt6_info *rt;
 
-	for (rt = rt0; rt && !(rt->rt6i_flags & RTF_EXPIRES);
-	     rt = (struct rt6_info *)rt->dst.from);
+	for (rt = rt0; rt && !(rt->rt6i_flags & RTF_EXPIRES); rt = rt->from);
 	if (rt && rt != rt0)
 		rt0->dst.expires = rt->dst.expires;
-
 	dst_set_expires(&rt0->dst, timeout);
 	rt0->rt6i_flags |= RTF_EXPIRES;
 }
@@ -243,8 +242,8 @@ static inline u32 rt6_get_cookie(const struct rt6_info *rt)
 	u32 cookie = 0;
 
 	if (rt->rt6i_flags & RTF_PCPU ||
-	    (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->dst.from))
-		rt = (struct rt6_info *)(rt->dst.from);
+	    (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->from))
+		rt = rt->from;
 
 	rt6_get_cookie_safe(rt, &cookie);
 
diff --git a/net/core/dst.c b/net/core/dst.c
index 5cf96179e8e0..cf2076c0eb22 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -70,7 +70,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	dst_init_metrics(dst, dst_default_metrics.metrics, true);
 	dst->expires = 0UL;
 	dst->path = dst;
-	dst->from = NULL;
 #ifdef CONFIG_XFRM
 	dst->xfrm = NULL;
 #endif
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0e98bfab3462..5b55072d6e31 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -186,7 +186,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
 
 static u32 *rt6_pcpu_cow_metrics(struct rt6_info *rt)
 {
-	return dst_metrics_write_ptr(rt->dst.from);
+	return dst_metrics_write_ptr(&rt->from->dst);
 }
 
 static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
@@ -391,7 +391,7 @@ static void ip6_dst_destroy(struct dst_entry *dst)
 {
 	struct rt6_info *rt = (struct rt6_info *)dst;
 	struct rt6_exception_bucket *bucket;
-	struct dst_entry *from = dst->from;
+	struct rt6_info *from = rt->from;
 	struct inet6_dev *idev;
 
 	dst_destroy_metrics_generic(dst);
@@ -409,8 +409,8 @@ static void ip6_dst_destroy(struct dst_entry *dst)
 		kfree(bucket);
 	}
 
-	dst->from = NULL;
-	dst_release(from);
+	rt->from = NULL;
+	dst_release(&from->dst);
 }
 
 static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
@@ -443,9 +443,9 @@ static bool rt6_check_expired(const struct rt6_info *rt)
 	if (rt->rt6i_flags & RTF_EXPIRES) {
 		if (time_after(jiffies, rt->dst.expires))
 			return true;
-	} else if (rt->dst.from) {
+	} else if (rt->from) {
 		return rt->dst.obsolete != DST_OBSOLETE_FORCE_CHK ||
-		       rt6_check_expired((struct rt6_info *)rt->dst.from);
+			rt6_check_expired(rt->from);
 	}
 	return false;
 }
@@ -1049,7 +1049,7 @@ static struct rt6_info *ip6_rt_cache_alloc(struct rt6_info *ort,
 	 */
 
 	if (ort->rt6i_flags & (RTF_CACHE | RTF_PCPU))
-		ort = (struct rt6_info *)ort->dst.from;
+		ort = ort->from;
 
 	rcu_read_lock();
 	dev = ip6_rt_get_dev_rcu(ort);
@@ -1269,7 +1269,7 @@ static int rt6_insert_exception(struct rt6_info *nrt,
 
 	/* ort can't be a cache or pcpu route */
 	if (ort->rt6i_flags & (RTF_CACHE | RTF_PCPU))
-		ort = (struct rt6_info *)ort->dst.from;
+		ort = ort->from;
 	WARN_ON_ONCE(ort->rt6i_flags & (RTF_CACHE | RTF_PCPU));
 
 	spin_lock_bh(&rt6_exception_lock);
@@ -1410,8 +1410,8 @@ static struct rt6_info *rt6_find_cached_rt(struct rt6_info *rt,
 /* Remove the passed in cached rt from the hash table that contains it */
 int rt6_remove_exception_rt(struct rt6_info *rt)
 {
-	struct rt6_info *from = (struct rt6_info *)rt->dst.from;
 	struct rt6_exception_bucket *bucket;
+	struct rt6_info *from = rt->from;
 	struct in6_addr *src_key = NULL;
 	struct rt6_exception *rt6_ex;
 	int err;
@@ -1455,8 +1455,8 @@ int rt6_remove_exception_rt(struct rt6_info *rt)
  */
 static void rt6_update_exception_stamp_rt(struct rt6_info *rt)
 {
-	struct rt6_info *from = (struct rt6_info *)rt->dst.from;
 	struct rt6_exception_bucket *bucket;
+	struct rt6_info *from = rt->from;
 	struct in6_addr *src_key = NULL;
 	struct rt6_exception *rt6_ex;
 
@@ -1924,9 +1924,9 @@ struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_ori
 
 static void rt6_dst_from_metrics_check(struct rt6_info *rt)
 {
-	if (rt->dst.from &&
-	    dst_metrics_ptr(&rt->dst) != dst_metrics_ptr(rt->dst.from))
-		dst_init_metrics(&rt->dst, dst_metrics_ptr(rt->dst.from), true);
+	if (rt->from &&
+	    dst_metrics_ptr(&rt->dst) != dst_metrics_ptr(&rt->from->dst))
+		dst_init_metrics(&rt->dst, dst_metrics_ptr(&rt->from->dst), true);
 }
 
 static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
@@ -1946,7 +1946,7 @@ static struct dst_entry *rt6_dst_from_check(struct rt6_info *rt, u32 cookie)
 {
 	if (!__rt6_check_expired(rt) &&
 	    rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
-	    rt6_check((struct rt6_info *)(rt->dst.from), cookie))
+	    rt6_check(rt->from, cookie))
 		return &rt->dst;
 	else
 		return NULL;
@@ -1966,7 +1966,7 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie)
 	rt6_dst_from_metrics_check(rt);
 
 	if (rt->rt6i_flags & RTF_PCPU ||
-	    (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->dst.from))
+	    (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->from))
 		return rt6_dst_from_check(rt, cookie);
 	else
 		return rt6_check(rt, cookie);
@@ -3049,11 +3049,11 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
 
 static void rt6_set_from(struct rt6_info *rt, struct rt6_info *from)
 {
-	BUG_ON(from->dst.from);
+	BUG_ON(from->from);
 
 	rt->rt6i_flags &= ~RTF_EXPIRES;
 	dst_hold(&from->dst);
-	rt->dst.from = &from->dst;
+	rt->from = from;
 	dst_init_metrics(&rt->dst, dst_metrics_ptr(&from->dst), true);
 }
 
-- 
2.13.6

* [RFC v2 PATCH 08/11] xfrm: Move dst->path into struct xfrm_dst
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (6 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 07/11] ipv6: Move dst->from into struct rt6_info David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:49   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 09/11] net: Rearrange dst_entry layout to avoid useless padding David S. Miller
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

The first member of an IPSEC route bundle chain sets its dst->path to
the underlying ipv4/ipv6 route that carries the bundle.

Stated another way, if one were to follow the xfrm_dst->child chain of
the bundle, the final non-NULL pointer would be the path and point to
either an ipv4 or an ipv6 route.

This is largely used to make sure that PMTU events propagate down to
the correct ipv4 or ipv6 route.

When we don't have the top of an IPSEC bundle, 'dst->path == dst'.

Move it down into xfrm_dst and key off of dst->xfrm.
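
The "key off of dst->xfrm" rule can be sketched in isolation as
follows.  The p_* types are simplified stand-ins invented for this
example and do not match the kernel layouts; the point is only that a
route without an xfrm state acts as its own path.

```c
#include <assert.h>
#include <stddef.h>

struct p_state { int id; };

struct p_dst { struct p_state *xfrm; };

struct p_xdst {
	struct p_dst dst;	/* first member, so a p_dst pointer can be cast */
	struct p_dst *path;	/* underlying ipv4/ipv6 route */
};

/* Return the stored path for bundle entries, else the route itself. */
static struct p_dst *p_dst_path(const struct p_dst *dst)
{
	assert(dst != NULL);
	if (dst->xfrm)
		return ((const struct p_xdst *)dst)->path;
	return (struct p_dst *)dst;
}
```

This is why the explicit 'dst->path = dst' initializations can simply
be deleted: the non-xfrm case falls out of the helper.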

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h       |  3 +--
 include/net/xfrm.h      | 15 ++++++++++++++-
 net/bridge/br_nf_core.c |  1 -
 net/core/dst.c          |  1 -
 net/ipv4/route.c        |  2 +-
 net/ipv6/ip6_output.c   |  4 ++--
 net/ipv6/route.c        |  6 ------
 net/xfrm/xfrm_device.c  |  2 +-
 net/xfrm/xfrm_policy.c  | 28 ++++++++++++++--------------
 9 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 19f24f3b6c06..e860c3b11322 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -37,7 +37,6 @@ struct dst_entry {
 	struct  dst_ops	        *ops;
 	unsigned long		_metrics;
 	unsigned long           expires;
-	struct dst_entry	*path;
 #ifdef CONFIG_XFRM
 	struct xfrm_state	*xfrm;
 #else
@@ -86,7 +85,7 @@ struct dst_entry {
 	 * Align __refcnt to a 64 bytes alignment
 	 * (L1_CACHE_SIZE would be too much)
 	 */
-	long			__pad_to_align_refcnt[4];
+	long			__pad_to_align_refcnt[5];
 #endif
 	/*
 	 * __refcnt wants to be on a different cache line from
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 725c3d656c62..c9e77d4362b1 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -984,6 +984,7 @@ struct xfrm_dst {
 	} u;
 	struct dst_entry *route;
 	struct dst_entry *child;
+	struct dst_entry *path;
 	struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX];
 	int num_pols, num_xfrms;
 	u32 xfrm_genid;
@@ -994,6 +995,18 @@ struct xfrm_dst {
 	u32 path_cookie;
 };
 
+static inline struct dst_entry *xfrm_dst_path(const struct dst_entry *dst)
+{
+#ifdef CONFIG_XFRM
+	if (dst->xfrm) {
+		const struct xfrm_dst *xdst = (const struct xfrm_dst *) dst;
+
+		return xdst->path;
+	}
+#endif
+	return (struct dst_entry *) dst;
+}
+
 static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
 {
 #ifdef CONFIG_XFRM
@@ -1888,7 +1901,7 @@ static inline bool xfrm_dst_offload_ok(struct dst_entry *dst)
 		return false;
 
 	xdst = (struct xfrm_dst *) dst;
-	if (x->xso.offload_handle && (x->xso.dev == dst->path->dev) &&
+	if (x->xso.offload_handle && (x->xso.dev == xfrm_dst_path(dst)->dev) &&
 	    !xdst->child->xfrm)
 		return true;
 
diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c
index 20cbb727df4d..8e2d7cfa4e16 100644
--- a/net/bridge/br_nf_core.c
+++ b/net/bridge/br_nf_core.c
@@ -78,7 +78,6 @@ void br_netfilter_rtable_init(struct net_bridge *br)
 
 	atomic_set(&rt->dst.__refcnt, 1);
 	rt->dst.dev = br->dev;
-	rt->dst.path = &rt->dst;
 	dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
 	rt->dst.flags	= DST_NOXFRM | DST_FAKE_RTABLE;
 	rt->dst.ops = &fake_dst_ops;
diff --git a/net/core/dst.c b/net/core/dst.c
index cf2076c0eb22..9bc3bb6e94ef 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -69,7 +69,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	dst->ops = ops;
 	dst_init_metrics(dst, dst_default_metrics.metrics, true);
 	dst->expires = 0UL;
-	dst->path = dst;
 #ifdef CONFIG_XFRM
 	dst->xfrm = NULL;
 #endif
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index bc40bd411196..5ff6e3edcd3e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1102,7 +1102,7 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 		new = true;
 	}
 
-	__ip_rt_update_pmtu((struct rtable *) rt->dst.path, &fl4, mtu);
+	__ip_rt_update_pmtu((struct rtable *) xfrm_dst_path(&rt->dst), &fl4, mtu);
 
 	if (!dst_check(&rt->dst, 0)) {
 		if (new)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 5110a418cc4d..176d74fb3b4d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1201,13 +1201,13 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
 	else
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      rt->dst.dev->mtu : dst_mtu(xfrm_dst_path(&rt->dst));
 	if (np->frag_size < mtu) {
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
 	cork->base.fragsize = mtu;
-	if (dst_allfrag(rt->dst.path))
+	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
 		cork->base.flags |= IPCORK_ALLFRAG;
 	cork->base.length = 0;
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5b55072d6e31..73e57c7bd951 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4590,8 +4590,6 @@ static int __net_init ip6_route_net_init(struct net *net)
 					   GFP_KERNEL);
 	if (!net->ipv6.ip6_null_entry)
 		goto out_ip6_dst_entries;
-	net->ipv6.ip6_null_entry->dst.path =
-		(struct dst_entry *)net->ipv6.ip6_null_entry;
 	net->ipv6.ip6_null_entry->dst.ops = &net->ipv6.ip6_dst_ops;
 	dst_init_metrics(&net->ipv6.ip6_null_entry->dst,
 			 ip6_template_metrics, true);
@@ -4603,8 +4601,6 @@ static int __net_init ip6_route_net_init(struct net *net)
 					       GFP_KERNEL);
 	if (!net->ipv6.ip6_prohibit_entry)
 		goto out_ip6_null_entry;
-	net->ipv6.ip6_prohibit_entry->dst.path =
-		(struct dst_entry *)net->ipv6.ip6_prohibit_entry;
 	net->ipv6.ip6_prohibit_entry->dst.ops = &net->ipv6.ip6_dst_ops;
 	dst_init_metrics(&net->ipv6.ip6_prohibit_entry->dst,
 			 ip6_template_metrics, true);
@@ -4614,8 +4610,6 @@ static int __net_init ip6_route_net_init(struct net *net)
 					       GFP_KERNEL);
 	if (!net->ipv6.ip6_blk_hole_entry)
 		goto out_ip6_prohibit_entry;
-	net->ipv6.ip6_blk_hole_entry->dst.path =
-		(struct dst_entry *)net->ipv6.ip6_blk_hole_entry;
 	net->ipv6.ip6_blk_hole_entry->dst.ops = &net->ipv6.ip6_dst_ops;
 	dst_init_metrics(&net->ipv6.ip6_blk_hole_entry->dst,
 			 ip6_template_metrics, true);
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index c5851ddddd2a..c61a7d46b412 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -120,7 +120,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 	if (!x->type_offload || x->encap)
 		return false;
 
-	if ((x->xso.offload_handle && (dev == dst->path->dev)) &&
+	if ((x->xso.offload_handle && (dev == xfrm_dst_path(dst)->dev)) &&
 	     !xdst->child->xfrm && x->type->get_mtu) {
 		mtu = x->type->get_mtu(x, xdst->child_mtu_cached);
 
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 206ac90ff4f0..7b80ee7486db 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1625,7 +1625,7 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 	}
 
 	xfrm_dst_set_child(xdst_prev, dst);
-	xdst0->u.dst.path = dst;
+	xdst0->path = dst;
 
 	err = -ENODEV;
 	dev = dst->dev;
@@ -1872,8 +1872,8 @@ static void xfrm_policy_queue_process(struct timer_list *t)
 	xfrm_decode_session(skb, &fl, dst->ops->family);
 	spin_unlock(&pq->hold_queue.lock);
 
-	dst_hold(dst->path);
-	dst = xfrm_lookup(net, dst->path, &fl, sk, 0);
+	dst_hold(xfrm_dst_path(dst));
+	dst = xfrm_lookup(net, xfrm_dst_path(dst), &fl, sk, 0);
 	if (IS_ERR(dst))
 		goto purge_queue;
 
@@ -1902,8 +1902,8 @@ static void xfrm_policy_queue_process(struct timer_list *t)
 		skb = __skb_dequeue(&list);
 
 		xfrm_decode_session(skb, &fl, skb_dst(skb)->ops->family);
-		dst_hold(skb_dst(skb)->path);
-		dst = xfrm_lookup(net, skb_dst(skb)->path, &fl, skb->sk, 0);
+		dst_hold(xfrm_dst_path(skb_dst(skb)));
+		dst = xfrm_lookup(net, xfrm_dst_path(skb_dst(skb)), &fl, skb->sk, 0);
 		if (IS_ERR(dst)) {
 			kfree_skb(skb);
 			continue;
@@ -2005,7 +2005,7 @@ static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net,
 
 	dst_hold(dst);
 	xfrm_dst_set_child(xdst, dst);
-	dst1->path = dst;
+	xdst->path = dst;
 
 	xfrm_init_path((struct xfrm_dst *)dst1, dst, 0);
 
@@ -2624,7 +2624,7 @@ static int xfrm_bundle_ok(struct xfrm_dst *first)
 	struct xfrm_dst *last;
 	u32 mtu;
 
-	if (!dst_check(dst->path, ((struct xfrm_dst *)dst)->path_cookie) ||
+	if (!dst_check(xfrm_dst_path(dst), ((struct xfrm_dst *)dst)->path_cookie) ||
 	    (dst->dev && !netif_running(dst->dev)))
 		return 0;
 
@@ -2685,22 +2685,20 @@ static int xfrm_bundle_ok(struct xfrm_dst *first)
 
 static unsigned int xfrm_default_advmss(const struct dst_entry *dst)
 {
-	return dst_metric_advmss(dst->path);
+	return dst_metric_advmss(xfrm_dst_path(dst));
 }
 
 static unsigned int xfrm_mtu(const struct dst_entry *dst)
 {
 	unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);
 
-	return mtu ? : dst_mtu(dst->path);
+	return mtu ? : dst_mtu(xfrm_dst_path(dst));
 }
 
 static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst,
 					const void *daddr)
 {
-	const struct dst_entry *path = dst->path;
-
-	for (; dst != path; dst = xfrm_dst_child(dst)) {
+	while (dst->xfrm) {
 		const struct xfrm_state *xfrm = dst->xfrm;
 
 		if (xfrm->props.mode == XFRM_MODE_TRANSPORT)
@@ -2709,6 +2707,8 @@ static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst,
 			daddr = xfrm->coaddr;
 		else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR))
 			daddr = &xfrm->id.daddr;
+
+		dst = xfrm_dst_child(dst);
 	}
 	return daddr;
 }
@@ -2717,7 +2717,7 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
 					   struct sk_buff *skb,
 					   const void *daddr)
 {
-	const struct dst_entry *path = dst->path;
+	const struct dst_entry *path = xfrm_dst_path(dst);
 
 	if (!skb)
 		daddr = xfrm_get_dst_nexthop(dst, daddr);
@@ -2726,7 +2726,7 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
 
 static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr)
 {
-	const struct dst_entry *path = dst->path;
+	const struct dst_entry *path = xfrm_dst_path(dst);
 
 	daddr = xfrm_get_dst_nexthop(dst, daddr);
 	path->ops->confirm_neigh(path, daddr);
-- 
2.13.6

* [RFC v2 PATCH 09/11] net: Rearrange dst_entry layout to avoid useless padding.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (7 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 08/11] xfrm: Move dst->path into struct xfrm_dst David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:49   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 10/11] xfrm: Stop using dst->next in bundle construction David S. Miller
  2017-10-31 14:10 ` [RFC v2 PATCH 11/11] net: Remove dst->next David S. Miller
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

We have padding to try and align the refcount on a separate cache
line.  But after several simplifications the padding has increased
substantially.

So now it's easy to change the layout to get rid of the padding
entirely.

We group the write-heavy __refcnt and __use with less often used
items such as the rcu_head and the error code.
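
One way to sanity check such a layout outside the kernel is with
offsetof().  The struct below is a userspace approximation of the
first part of the new layout: pointer types are replaced by void *,
the refcount by a plain int, and all lay_* names are made up for this
sketch.  The expectation that the refcount lands at byte 64 only holds
on an LP64 target.

```c
#include <assert.h>
#include <stddef.h>

/* Approximation of the leading dst_entry members after this patch. */
struct lay_dst {
	void *dev;
	void *ops;
	unsigned long _metrics;
	unsigned long expires;
	void *xfrm;
	int (*input)(void *);
	int (*output)(void *, void *, void *);
	unsigned short flags;
	short obsolete;
	unsigned short header_len;
	unsigned short trailer_len;
	int refcnt;		/* stands in for atomic_t __refcnt */
	int use;
	unsigned long lastuse;
};

static int lay_is_lp64(void)
{
	return sizeof(long) == 8 && sizeof(void *) == 8;
}

static size_t lay_refcnt_offset(void)
{
	return offsetof(struct lay_dst, refcnt);
}

/* On LP64, seven 8-byte members plus four shorts fill exactly 64 bytes. */
static void lay_check(void)
{
	assert(!lay_is_lp64() || lay_refcnt_offset() == 64);
}
```

In the kernel itself, pahole or a BUILD_BUG_ON() on the real structure
would be the more usual way to verify this.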

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index e860c3b11322..ffd0d81f861f 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -33,7 +33,6 @@ struct sk_buff;
 
 struct dst_entry {
 	struct net_device       *dev;
-	struct rcu_head		rcu_head;
 	struct  dst_ops	        *ops;
 	unsigned long		_metrics;
 	unsigned long           expires;
@@ -55,8 +54,6 @@ struct dst_entry {
 #define DST_XFRM_QUEUE		0x0040
 #define DST_METADATA		0x0080
 
-	short			error;
-
 	/* A non-zero value of dst->obsolete forces by-hand validation
 	 * of the route entry.  Positive values are set by the generic
 	 * dst layer to indicate that the entry has been forcefully
@@ -72,21 +69,7 @@ struct dst_entry {
 #define DST_OBSOLETE_KILL	-2
 	unsigned short		header_len;	/* more space at head required */
 	unsigned short		trailer_len;	/* space to reserve at tail */
-	unsigned short		__pad3;
-
-#ifdef CONFIG_IP_ROUTE_CLASSID
-	__u32			tclassid;
-#else
-	__u32			__pad2;
-#endif
 
-#ifdef CONFIG_64BIT
-	/*
-	 * Align __refcnt to a 64 bytes alignment
-	 * (L1_CACHE_SIZE would be too much)
-	 */
-	long			__pad_to_align_refcnt[5];
-#endif
 	/*
 	 * __refcnt wants to be on a different cache line from
 	 * input/output/ops or performance tanks badly
@@ -95,6 +78,11 @@ struct dst_entry {
 	int			__use;
 	unsigned long		lastuse;
 	struct lwtunnel_state   *lwtstate;
+	struct rcu_head		rcu_head;
+	short			error;
+	short			__pad;
+	__u32			tclassid;
+
 	union {
 		struct dst_entry	*next;
 	};
-- 
2.13.6

* [RFC v2 PATCH 10/11] xfrm: Stop using dst->next in bundle construction.
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (8 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 09/11] net: Rearrange dst_entry layout to avoid useless padding David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:52   ` Eric Dumazet
  2017-10-31 14:10 ` [RFC v2 PATCH 11/11] net: Remove dst->next David S. Miller
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

While building ipsec bundles, blocks of xfrm dsts are linked together
using dst->next from the bottom to the top.

The only thing this is used for is initializing the pmtu values of the
xfrm stack, and for updating the mtu values at xfrm_bundle_ok() time.

The bundle pmtu entries must be processed in this order so that pmtu
values lower in the stack of routes can propagate up to the higher
ones.

Avoid using dst->next by simply maintaining an array of dst pointers
as we already do for the xfrm_state objects when building the bundle.
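
The ordering requirement can be illustrated with a small userspace
sketch.  Everything here is an assumption for the sake of the example:
the m_* names, the 'overhead' field (a crude stand-in for what
xfrm_state_mtu() computes), and the fact that the running value is
carried in a local instead of being read back from the child route.

```c
#include <assert.h>
#include <stddef.h>

struct m_xdst {
	unsigned int overhead;		/* hypothetical per-transform header cost */
	unsigned int route_mtu;		/* MTU of this entry's own route */
	unsigned int child_mtu_cached;
	unsigned int mtu;		/* resulting MTU for this level */
};

/* bundle[0] is the top; bundle[nr - 1] sits right above the path route. */
static void m_init_pmtu(struct m_xdst **bundle, int nr, unsigned int path_mtu)
{
	unsigned int pmtu = path_mtu;

	assert(bundle != NULL && nr >= 0);
	while (nr--) {
		struct m_xdst *xdst = bundle[nr];

		xdst->child_mtu_cached = pmtu;
		/* Subtract the transform overhead without wrapping below zero. */
		pmtu = pmtu > xdst->overhead ? pmtu - xdst->overhead : 0;
		if (pmtu > xdst->route_mtu)
			pmtu = xdst->route_mtu;
		xdst->mtu = pmtu;
	}
}
```

Iterating the array from the last index down to zero visits the
entries in the same bottom-to-top order that the dst->next links used
to provide.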

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/xfrm/xfrm_policy.c | 56 ++++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7b80ee7486db..b815884deb16 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -54,7 +54,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 static struct kmem_cache *xfrm_dst_cache __read_mostly;
 static __read_mostly seqcount_t xfrm_policy_hash_generation;
 
-static void xfrm_init_pmtu(struct dst_entry *dst);
+static void xfrm_init_pmtu(struct xfrm_dst **bundle, int nr);
 static int stale_bundle(struct dst_entry *dst);
 static int xfrm_bundle_ok(struct xfrm_dst *xdst);
 static void xfrm_policy_queue_process(struct timer_list *t);
@@ -1537,7 +1537,9 @@ static inline int xfrm_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
  */
 
 static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
-					    struct xfrm_state **xfrm, int nx,
+					    struct xfrm_state **xfrm,
+					    struct xfrm_dst **bundle,
+					    int nx,
 					    const struct flowi *fl,
 					    struct dst_entry *dst)
 {
@@ -1572,6 +1574,7 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 			goto put_states;
 		}
 
+		bundle[i] = xdst;
 		if (!xdst_prev)
 			xdst0 = xdst;
 		else
@@ -1615,7 +1618,6 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		dst1->input = dst_discard;
 		dst1->output = inner_mode->afinfo->output;
 
-		dst1->next = &xdst_prev->u.dst;
 		xdst_prev = xdst;
 
 		header_len += xfrm[i]->props.header_len;
@@ -1633,7 +1635,7 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		goto free_dst;
 
 	xfrm_init_path(xdst0, dst, nfheader_len);
-	xfrm_init_pmtu(&xdst_prev->u.dst);
+	xfrm_init_pmtu(bundle, nx);
 
 	for (xdst_prev = xdst0; xdst_prev != (struct xfrm_dst *)dst;
 	     xdst_prev = (struct xfrm_dst *) xfrm_dst_child(&xdst_prev->u.dst)) {
@@ -1807,6 +1809,7 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 {
 	struct net *net = xp_net(pols[0]);
 	struct xfrm_state *xfrm[XFRM_MAX_DEPTH];
+	struct xfrm_dst *bundle[XFRM_MAX_DEPTH];
 	struct xfrm_dst *xdst, *old;
 	struct dst_entry *dst;
 	int err;
@@ -1832,7 +1835,7 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 		return ERR_PTR(err);
 	}
 
-	dst = xfrm_bundle_create(pols[0], xfrm, err, fl, dst_orig);
+	dst = xfrm_bundle_create(pols[0], xfrm, bundle, err, fl, dst_orig);
 	if (IS_ERR(dst)) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTBUNDLEGENERROR);
 		return ERR_CAST(dst);
@@ -2593,12 +2596,14 @@ static struct dst_entry *xfrm_negative_advice(struct dst_entry *dst)
 	return dst;
 }
 
-static void xfrm_init_pmtu(struct dst_entry *dst)
+static void xfrm_init_pmtu(struct xfrm_dst **bundle, int nr)
 {
-	do {
-		struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
+	while (nr--) {
+		struct xfrm_dst *xdst = bundle[nr];
 		u32 pmtu, route_mtu_cached;
+		struct dst_entry *dst;
 
+		dst = &xdst->u.dst;
 		pmtu = dst_mtu(xfrm_dst_child(dst));
 		xdst->child_mtu_cached = pmtu;
 
@@ -2611,7 +2616,7 @@ static void xfrm_init_pmtu(struct dst_entry *dst)
 			pmtu = route_mtu_cached;
 
 		dst_metric_set(dst, RTAX_MTU, pmtu);
-	} while ((dst = dst->next));
+	}
 }
 
 /* Check that the bundle accepts the flow and its components are
@@ -2620,8 +2625,10 @@ static void xfrm_init_pmtu(struct dst_entry *dst)
 
 static int xfrm_bundle_ok(struct xfrm_dst *first)
 {
+	struct xfrm_dst *bundle[XFRM_MAX_DEPTH];
 	struct dst_entry *dst = &first->u.dst;
-	struct xfrm_dst *last;
+	struct xfrm_dst *xdst;
+	int start_from, nr;
 	u32 mtu;
 
 	if (!dst_check(xfrm_dst_path(dst), ((struct xfrm_dst *)dst)->path_cookie) ||
@@ -2631,8 +2638,7 @@ static int xfrm_bundle_ok(struct xfrm_dst *first)
 	if (dst->flags & DST_XFRM_QUEUE)
 		return 1;
 
-	last = NULL;
-
+	start_from = nr = 0;
 	do {
 		struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 
@@ -2644,9 +2650,11 @@ static int xfrm_bundle_ok(struct xfrm_dst *first)
 		    xdst->policy_genid != atomic_read(&xdst->pols[0]->genid))
 			return 0;
 
+		bundle[nr++] = xdst;
+
 		mtu = dst_mtu(xfrm_dst_child(dst));
 		if (xdst->child_mtu_cached != mtu) {
-			last = xdst;
+			start_from = nr;
 			xdst->child_mtu_cached = mtu;
 		}
 
@@ -2654,30 +2662,30 @@ static int xfrm_bundle_ok(struct xfrm_dst *first)
 			return 0;
 		mtu = dst_mtu(xdst->route);
 		if (xdst->route_mtu_cached != mtu) {
-			last = xdst;
+			start_from = nr;
 			xdst->route_mtu_cached = mtu;
 		}
 
 		dst = xfrm_dst_child(dst);
 	} while (dst->xfrm);
 
-	if (likely(!last))
+	if (likely(!start_from))
 		return 1;
 
-	mtu = last->child_mtu_cached;
-	for (;;) {
-		dst = &last->u.dst;
+	xdst = bundle[start_from - 1];
+	mtu = xdst->child_mtu_cached;
+	while (start_from--) {
+		dst = &xdst->u.dst;
 
 		mtu = xfrm_state_mtu(dst->xfrm, mtu);
-		if (mtu > last->route_mtu_cached)
-			mtu = last->route_mtu_cached;
+		if (mtu > xdst->route_mtu_cached)
+			mtu = xdst->route_mtu_cached;
 		dst_metric_set(dst, RTAX_MTU, mtu);
-
-		if (last == first)
+		if (!start_from)
 			break;
 
-		last = (struct xfrm_dst *)last->u.dst.next;
-		last->child_mtu_cached = mtu;
+		xdst = bundle[start_from - 1];
+		xdst->child_mtu_cached = mtu;
 	}
 
 	return 1;
-- 
2.13.6

* [RFC v2 PATCH 11/11] net: Remove dst->next
  2017-10-31 14:10 [RFC v2 PATCH 00/11] net: Significantly shrink the size of routes David S. Miller
                   ` (9 preceding siblings ...)
  2017-10-31 14:10 ` [RFC v2 PATCH 10/11] xfrm: Stop using dst->next in bundle construction David S. Miller
@ 2017-10-31 14:10 ` David S. Miller
  2017-10-31 18:52   ` Eric Dumazet
  10 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2017-10-31 14:10 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

There are no more users.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h | 4 ----
 net/core/dst.c    | 1 -
 2 files changed, 5 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index ffd0d81f861f..b0e71091d159 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -82,10 +82,6 @@ struct dst_entry {
 	short			error;
 	short			__pad;
 	__u32			tclassid;
-
-	union {
-		struct dst_entry	*next;
-	};
 };
 
 struct dst_metrics {
diff --git a/net/core/dst.c b/net/core/dst.c
index 9bc3bb6e94ef..007aa0b08291 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -86,7 +86,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
-	dst->next = NULL;
 	if (!(flags & DST_NOCOUNT))
 		dst_entries_add(ops, 1);
 }
-- 
2.13.6

* Re: [RFC v2 PATCH 01/11] net: dst->rt_next is unused.
  2017-10-31 14:10 ` [RFC v2 PATCH 01/11] net: dst->rt_next is unused David S. Miller
@ 2017-10-31 18:36   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> Delete it.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

* Re: [RFC v2 PATCH 02/11] decnet: Move dn_next into decnet route structure.
  2017-10-31 14:10 ` [RFC v2 PATCH 02/11] decnet: Move dn_next into decnet route structure David S. Miller
@ 2017-10-31 18:36   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> Signed-off-by: David S. Miller <davem@davemloft.net>

Reviewed-by: Eric Dumazet <edumazet@google.com>

* Re: [RFC v2 PATCH 03/11] ipv6: Move rt6_next from dst_entry into ipv6 route structure.
  2017-10-31 14:10 ` [RFC v2 PATCH 03/11] ipv6: Move rt6_next from dst_entry into ipv6 " David S. Miller
@ 2017-10-31 18:37   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
Reviewed-by: Eric Dumazet <edumazet@google.com>

* Re: [RFC v2 PATCH 04/11] net: Create and use new helper xfrm_dst_child().
  2017-10-31 14:10 ` [RFC v2 PATCH 04/11] net: Create and use new helper xfrm_dst_child() David S. Miller
@ 2017-10-31 18:39   ` Eric Dumazet
  2017-11-01  2:07     ` David Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:39 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> Only IPSEC routes have a non-NULL dst->child pointer.  And IPSEC
> routes are identified by a non-NULL dst->xfrm pointer.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  include/net/xfrm.h           |  9 +++++++++
>  net/core/dst.c               |  8 +++++---
>  net/ipv4/xfrm4_mode_tunnel.c |  2 +-
>  net/ipv6/xfrm6_mode_tunnel.c |  2 +-
>  net/ipv6/xfrm6_policy.c      |  2 +-
>  net/xfrm/xfrm_output.c       |  2 +-
>  net/xfrm/xfrm_policy.c       | 12 ++++++------
>  security/selinux/xfrm.c      |  2 +-
>  8 files changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index f002a2c5e33c..be599f9bb60d 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -993,6 +993,15 @@ struct xfrm_dst {
>  	u32 path_cookie;
>  };
>  
> +static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
> +{
> +#ifdef CONFIG_XFRM
> +	if (dst->xfrm)
> +		return dst->child;
> +#endif
> +	return NULL;
> +}
> +
>  #ifdef CONFIG_XFRM
>  static inline void xfrm_dst_destroy(struct xfrm_dst *xdst)
>  {
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 662a2d4a3d19..6a3c21b8fc8d 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -116,12 +116,14 @@ EXPORT_SYMBOL(dst_alloc);
>  
>  struct dst_entry *dst_destroy(struct dst_entry * dst)
>  {
> -	struct dst_entry *child;
> +	struct dst_entry *child = NULL;
>  
>  	smp_rmb();
>  
> -	child = dst->child;
> -
> +#ifdef CONFIG_XFRM
> +	if (dst->xfrm)
> +		child = dst->child;
> +#endif


Why not use the helper here:

	child = xfrm_dst_child(dst);

This avoids the #ifdef and makes good use of the new helper.
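
To illustrate the suggestion, here is a minimal user-space sketch of why routing the access through the helper removes the #ifdef from the caller. Everything prefixed with toy_ or TOY_ is a hypothetical stand-in for illustration, not kernel code; TOY_CONFIG_XFRM plays the role of CONFIG_XFRM, and toy_destroy_pick_child() only mirrors the first step of dst_destroy().

```c
#include <stddef.h>

#define TOY_CONFIG_XFRM 1   /* hypothetical switch standing in for CONFIG_XFRM */

/* Toy stand-in for struct dst_entry; only the fields the example needs. */
struct toy_dst {
#ifdef TOY_CONFIG_XFRM
	void *xfrm;             /* non-NULL marks an IPsec route */
#endif
	struct toy_dst *child;
};

/* The only place that needs to know about the config option. */
static struct toy_dst *toy_dst_child(const struct toy_dst *dst)
{
#ifdef TOY_CONFIG_XFRM
	if (dst->xfrm)
		return dst->child;
#endif
	return NULL;
}

/* Caller modelled loosely on the start of dst_destroy(): no #ifdef needed. */
static struct toy_dst *toy_destroy_pick_child(const struct toy_dst *dst)
{
	return toy_dst_child(dst);
}

/* Returns 1 when only the IPsec route reports a child. */
static int toy_demo_child(void)
{
	int state;                                /* any non-NULL marker will do */
	struct toy_dst path = { NULL, NULL };
	struct toy_dst plain = { NULL, &path };   /* child set, but not an xfrm route */
	struct toy_dst ipsec = { &state, &path };

	return toy_destroy_pick_child(&plain) == NULL &&
	       toy_destroy_pick_child(&ipsec) == &path;
}
```

With the conditional compiled into the helper, every caller gets the same behaviour whether or not the option is enabled, which is the point of the review comment.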

* Re: [RFC v2 PATCH 05/11] ipsec: Create and use new helpers for dst child access.
  2017-10-31 14:10 ` [RFC v2 PATCH 05/11] ipsec: Create and use new helpers for dst child access David S. Miller
@ 2017-10-31 18:40   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> This will make a future change moving the dst->child pointer less
> invasive.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

* Re: [RFC v2 PATCH 06/11] xfrm: Move child route linkage into xfrm_dst.
  2017-10-31 14:10 ` [RFC v2 PATCH 06/11] xfrm: Move child route linkage into xfrm_dst David S. Miller
@ 2017-10-31 18:42   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> XFRM bundle child chains look like this:
> 
> 	xdst1 --> xdst2 --> xdst3 --> path_dst
> 
> All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
> The final child pointer in the chain, here called 'path_dst', is some
> other kind of route such as an ipv4 or ipv6 one.
> 
> The xfrm output path pops routes, one at a time, via the child
> pointer, until we hit one whose dst->xfrm pointer is NULL.
> 
> We can easily preserve the above mechanisms with child sitting
> only in the xfrm_dst structure.  All children in the chain
> before we break out of the xfrm_output() loop have dst->xfrm
> non-NULL and are therefore xfrm_dst objects.
> 
> Since we break out of the loop when we find dst->xfrm NULL, we
> will not try to dereference 'dst' as if it were an xfrm_dst.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---


Reviewed-by: Eric Dumazet <edumazet@google.com>
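
The chain walk described in the quoted commit message can be sketched in user space as follows. This is only an illustration under stated assumptions: the toy_ types are hypothetical stand-ins for dst_entry and xfrm_dst, with the child pointer living in the xfrm-specific structure as the patch proposes. The cast is only performed on entries whose xfrm pointer is non-NULL, so the final path route is never treated as an xfrm_dst.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct dst_entry. */
struct toy_dst {
	void *xfrm;             /* non-NULL only for xfrm (IPsec) routes */
};

/* Toy stand-in for struct xfrm_dst. */
struct toy_xfrm_dst {
	struct toy_dst dst;     /* must be first so the cast below is valid */
	struct toy_dst *child;  /* linkage kept in the xfrm-specific structure */
};

/* Return the child, or NULL if 'dst' is not an xfrm route. */
static struct toy_dst *toy_dst_child(const struct toy_dst *dst)
{
	if (dst->xfrm)
		return ((const struct toy_xfrm_dst *)dst)->child;
	return NULL;
}

/* Pop xfrm routes until the path route is reached; report the depth. */
static struct toy_dst *toy_walk_to_path(struct toy_dst *dst, int *depth)
{
	int n = 0;

	while (dst->xfrm) {
		dst = toy_dst_child(dst);
		assert(dst != NULL);  /* a well-formed bundle ends in a path route */
		n++;
	}
	if (depth)
		*depth = n;
	return dst;
}

/* Builds xdst1 --> xdst2 --> xdst3 --> path_dst and returns the depth. */
static int toy_demo_depth(void)
{
	int state;              /* any non-NULL marker will do */
	struct toy_dst path = { NULL };
	struct toy_xfrm_dst x3 = { { &state }, &path };
	struct toy_xfrm_dst x2 = { { &state }, &x3.dst };
	struct toy_xfrm_dst x1 = { { &state }, &x2.dst };
	int depth = 0;

	if (toy_walk_to_path(&x1.dst, &depth) != &path)
		return -1;
	return depth;
}
```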

* Re: [RFC v2 PATCH 07/11] ipv6: Move dst->from into struct rt6_info.
  2017-10-31 14:10 ` [RFC v2 PATCH 07/11] ipv6: Move dst->from into struct rt6_info David S. Miller
@ 2017-10-31 18:47   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> The dst->from value is only used by ipv6 routes to track where
> a route "came from".
> 
> Any time we clone or copy a core ipv6 route in the ipv6 routing
> tables, we have the copy/clone's ->from point to the base route.
> 
> This is used to handle route expiration properly.
> 
> Only ipv6 uses this mechanism, and only ipv6 code references
> it.  So it is safe to move it into rt6_info.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

* Re: [RFC v2 PATCH 08/11] xfrm: Move dst->path into struct xfrm_dst
  2017-10-31 14:10 ` [RFC v2 PATCH 08/11] xfrm: Move dst->path into struct xfrm_dst David S. Miller
@ 2017-10-31 18:49   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> The first member of an IPSEC route bundle chain sets its dst->path to
> the underlying ipv4/ipv6 route that carries the bundle.
> 
> Stated another way, if one were to follow the xfrm_dst->child chain of
> the bundle, the final non-NULL pointer would be the path and point to
> either an ipv4 or an ipv6 route.
> 
> This is largely used to make sure that PMTU events propagate down to
> the correct ipv4 or ipv6 route.
> 
> For a dst that is not the top of an IPSEC bundle, 'dst->path == dst'.
> 
> Move it down into xfrm_dst and key off of dst->xfrm.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>
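
A rough sketch of what "move it down into xfrm_dst and key off of dst->xfrm" could look like, using hypothetical toy_ types rather than the real kernel structures. The accessor returns the stored path for an xfrm route and the route itself otherwise, which preserves the old 'dst->path == dst' behaviour without keeping a pointer in every route.

```c
#include <stddef.h>

/* Toy stand-in for struct dst_entry: no path member any more. */
struct toy_dst {
	void *xfrm;             /* non-NULL only for xfrm (IPsec) routes */
};

/* Toy stand-in for struct xfrm_dst, which now owns the path pointer. */
struct toy_xfrm_dst {
	struct toy_dst dst;     /* must be first so the cast below is valid */
	struct toy_dst *path;   /* underlying ipv4/ipv6 route of the bundle */
};

/* Accessor keyed off dst->xfrm, in the spirit of the series' helpers. */
static struct toy_dst *toy_dst_path(struct toy_dst *dst)
{
	if (dst->xfrm)
		return ((struct toy_xfrm_dst *)dst)->path;
	return dst;             /* non-xfrm routes are their own path */
}

/* Returns 1 when both cases behave as described. */
static int toy_demo_path(void)
{
	int state;              /* any non-NULL marker will do */
	struct toy_dst route = { NULL };
	struct toy_xfrm_dst top = { { &state }, &route };

	return toy_dst_path(&top.dst) == &route &&
	       toy_dst_path(&route) == &route;
}
```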

* Re: [RFC v2 PATCH 09/11] net: Rearrange dst_entry layout to avoid useless padding.
  2017-10-31 14:10 ` [RFC v2 PATCH 09/11] net: Rearrange dst_entry layout to avoid useless padding David S. Miller
@ 2017-10-31 18:49   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> We have padding to try and align the refcount on a separate cache
> line.  But after several simplifications the padding has increased
> substantially.
> 
> So now it's easy to change the layout to get rid of the padding
> entirely.
> 
> We group the write-heavy __refcnt and __use with less often used
> items such as the rcu_head and the error code.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>
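
As a sanity check of the layout claim, the member order from the cover letter can be mirrored in a toy structure and measured with offsetof(). This is a sketch under assumptions: the kernel types are replaced by plain C stand-ins of the same size on a typical LP64 target (atomic_t as int, struct rcu_head as two pointers), and the 64-byte result only holds where pointers and long are 8 bytes.

```c
#include <stddef.h>

/* Toy mirror of the proposed dst_entry member order. */
struct toy_dst_entry {
	/* read-mostly members: 8 x 8 bytes on LP64 */
	void           *dev;
	void           *ops;
	unsigned long   _metrics;
	unsigned long   expires;
	void           *xfrm;
	int           (*input)(void *skb);
	int           (*output)(void *net, void *sk, void *skb);
	unsigned short  flags;
	short           obsolete;
	unsigned short  header_len;
	unsigned short  trailer_len;
	/* write-heavy members, grouped with rarely used ones */
	int             refcnt;       /* atomic_t in the kernel */
	int             use;
	unsigned long   lastuse;
	void           *lwtstate;
	void           *rcu_head[2];  /* stand-in for struct rcu_head */
	short           error;
	short           pad;
	unsigned int    tclassid;
};

/* Offset of the refcount; 64 on LP64 means it starts a new cache line
 * on machines with 64-byte lines, with no explicit padding needed. */
static size_t toy_refcnt_offset(void)
{
	return offsetof(struct toy_dst_entry, refcnt);
}
```

On a 32-bit target the pointer and long members shrink, so the same order no longer lands the refcount on a 64-byte boundary, which matches the cover letter's note that only 64-bit is sorted out so far.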

* Re: [RFC v2 PATCH 10/11] xfrm: Stop using dst->next in bundle construction.
  2017-10-31 14:10 ` [RFC v2 PATCH 10/11] xfrm: Stop using dst->next in bundle construction David S. Miller
@ 2017-10-31 18:52   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> While building ipsec bundles, blocks of xfrm dsts are linked together
> using dst->next from bottom to the top.
> 
> The only thing this is used for is initializing the pmtu values of the
> xfrm stack, and for updating the mtu values at xfrm_bundle_ok() time.
> 
> The bundle pmtu entries must be processed in this order so that pmtu
> values lower in the stack of routes can propagate up to the higher
> ones.
> 
> Avoid using dst->next by simply maintaining an array of dst pointers
> as we already do for the xfrm_state objects when building the bundle.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>
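
The bottom-to-top ordering described above can be sketched with a plain array in place of dst->next. The code below is a simplified user-space model, not the kernel implementation: the toy_ names are hypothetical, a fixed per-transform overhead stands in for xfrm_state_mtu(), and TOY_MAX_DEPTH is an illustrative bound playing the role of XFRM_MAX_DEPTH. The array is filled top first, so iterating from the last index down visits the entry nearest the path route first.

```c
#include <assert.h>

#define TOY_MAX_DEPTH 6   /* illustrative bound, stand-in for XFRM_MAX_DEPTH */

/* Toy stand-in for struct xfrm_dst. */
struct toy_xdst {
	unsigned int overhead;          /* bytes the transform adds */
	unsigned int route_mtu_cached;
	unsigned int child_mtu_cached;
	unsigned int mtu;               /* stand-in for the RTAX_MTU metric */
};

/* bundle[0] is the top of the stack; bundle[nr - 1] sits directly above
 * the path route.  Walk from the bottom so lower values propagate up. */
static void toy_propagate_mtu(struct toy_xdst **bundle, int nr,
			      unsigned int path_mtu)
{
	unsigned int mtu = path_mtu;

	assert(nr >= 0 && nr <= TOY_MAX_DEPTH);
	while (nr--) {
		struct toy_xdst *xdst = bundle[nr];

		xdst->child_mtu_cached = mtu;
		mtu = mtu > xdst->overhead ? mtu - xdst->overhead : 0;
		if (mtu > xdst->route_mtu_cached)
			mtu = xdst->route_mtu_cached;
		xdst->mtu = mtu;
	}
}

/* Two stacked transforms over a 1500-byte path; returns the top MTU. */
static unsigned int toy_demo_top_mtu(void)
{
	struct toy_xdst inner = { 50, 1500, 0, 0 };   /* bottom of the stack */
	struct toy_xdst outer = { 30, 1400, 0, 0 };   /* top of the stack */
	struct toy_xdst *bundle[TOY_MAX_DEPTH] = { &outer, &inner };

	toy_propagate_mtu(bundle, 2, 1500);
	return outer.mtu;
}
```

In the demo the inner entry yields 1450, and the outer entry computes 1420 before being clamped to its cached route MTU of 1400.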

* Re: [RFC v2 PATCH 11/11] net: Remove dst->next
  2017-10-31 14:10 ` [RFC v2 PATCH 11/11] net: Remove dst->next David S. Miller
@ 2017-10-31 18:52   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-10-31 18:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
> There are no more users.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  include/net/dst.h | 4 ----
>  net/core/dst.c    | 1 -
>  2 files changed, 5 deletions(-)


Reviewed-by: Eric Dumazet <edumazet@google.com>

* Re: [RFC v2 PATCH 04/11] net: Create and use new helper xfrm_dst_child().
  2017-10-31 18:39   ` Eric Dumazet
@ 2017-11-01  2:07     ` David Miller
  0 siblings, 0 replies; 24+ messages in thread
From: David Miller @ 2017-11-01  2:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 31 Oct 2017 11:39:22 -0700

> On Tue, 2017-10-31 at 23:10 +0900, David S. Miller wrote:
>> @@ -116,12 +116,14 @@ EXPORT_SYMBOL(dst_alloc);
>>  
>>  struct dst_entry *dst_destroy(struct dst_entry * dst)
>>  {
>> -	struct dst_entry *child;
>> +	struct dst_entry *child = NULL;
>>  
>>  	smp_rmb();
>>  
>> -	child = dst->child;
>> -
>> +#ifdef CONFIG_XFRM
>> +	if (dst->xfrm)
>> +		child = dst->child;
>> +#endif
> 
> 
> Why not use the helper here:
> 
> 	child = xfrm_dst_child(dst);
> 
> This avoids the #ifdef and makes good use of the new helper.

Yep, that makes a lot of sense, thanks for the review(s).
