netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] ipv4: fib_select_default changes
@ 2015-07-22  7:43 Julian Anastasov
  2015-07-22  7:43 ` [PATCH net 1/2] ipv4: fib_select_default should match the prefix Julian Anastasov
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Julian Anastasov @ 2015-07-22  7:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Florian Westphal, Hagen Paul Pfeifer

This patchset contains 2 changes for the alternative routes,
one to add tb_id/fa_slen check needed after the recent
fib_trie optimizations for fib aliases and the second
change attempts to support alternative routes with TOS
requirement.

	Sorry that I don't have access to the original
report from Hagen Paul Pfeifer. I hope he will see this
change.

	The second change adds fa_default field to the
fib aliases (which can be many) and if the feature to
filter the alternative routes by TOS is not worth it,
this second patch can be scrapped.

Julian Anastasov (2):
  ipv4: fib_select_default should match the prefix
  ipv4: consider TOS in fib_select_default

 include/net/ip_fib.h     |  3 +--
 net/ipv4/fib_lookup.h    |  1 +
 net/ipv4/fib_semantics.c | 41 ++++++++++++++++++++++++++++++-----------
 net/ipv4/fib_trie.c      |  3 ++-
 net/ipv4/route.c         |  2 +-
 5 files changed, 35 insertions(+), 15 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH net 1/2] ipv4: fib_select_default should match the prefix
  2015-07-22  7:43 [PATCH net 0/2] ipv4: fib_select_default changes Julian Anastasov
@ 2015-07-22  7:43 ` Julian Anastasov
  2015-07-22  7:43 ` [PATCH net 2/2] ipv4: consider TOS in fib_select_default Julian Anastasov
  2015-07-25  5:46 ` [PATCH net 0/2] ipv4: fib_select_default changes David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Julian Anastasov @ 2015-07-22  7:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Florian Westphal, Hagen Paul Pfeifer

fib_trie starting from 4.1 can link fib aliases from
different prefixes in same list. Make sure the alternative
gateways are in same table and for same prefix (0) by
checking tb_id and fa_slen.

Fixes: 79e5ad2ceb00 ("fib_trie: Remove leaf_info")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/fib_semantics.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c7358ea..e107958 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1207,12 +1207,17 @@ void fib_select_default(struct fib_result *res)
 	struct fib_info *fi = NULL, *last_resort = NULL;
 	struct hlist_head *fa_head = res->fa_head;
 	struct fib_table *tb = res->table;
+	u8 slen = 32 - res->prefixlen;
 	int order = -1, last_idx = -1;
 	struct fib_alias *fa;
 
 	hlist_for_each_entry_rcu(fa, fa_head, fa_list) {
 		struct fib_info *next_fi = fa->fa_info;
 
+		if (fa->fa_slen != slen)
+			continue;
+		if (fa->tb_id != tb->tb_id)
+			continue;
 		if (next_fi->fib_scope != res->scope ||
 		    fa->fa_type != RTN_UNICAST)
 			continue;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH net 2/2] ipv4: consider TOS in fib_select_default
  2015-07-22  7:43 [PATCH net 0/2] ipv4: fib_select_default changes Julian Anastasov
  2015-07-22  7:43 ` [PATCH net 1/2] ipv4: fib_select_default should match the prefix Julian Anastasov
@ 2015-07-22  7:43 ` Julian Anastasov
  2015-07-25  5:46 ` [PATCH net 0/2] ipv4: fib_select_default changes David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Julian Anastasov @ 2015-07-22  7:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Florian Westphal, Hagen Paul Pfeifer

fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.

This patch solves the problem as follows:

- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.

- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.

For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:

tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1

Reported-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/ip_fib.h     |  3 +--
 net/ipv4/fib_lookup.h    |  1 +
 net/ipv4/fib_semantics.c | 36 +++++++++++++++++++++++++-----------
 net/ipv4/fib_trie.c      |  3 ++-
 net/ipv4/route.c         |  2 +-
 5 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 49c142b..5fa643b 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -183,7 +183,6 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
 struct fib_table {
 	struct hlist_node	tb_hlist;
 	u32			tb_id;
-	int			tb_default;
 	int			tb_num_default;
 	struct rcu_head		rcu;
 	unsigned long 		*tb_data;
@@ -290,7 +289,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb);
 int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 			u8 tos, int oif, struct net_device *dev,
 			struct in_device *idev, u32 *itag);
-void fib_select_default(struct fib_result *res);
+void fib_select_default(const struct flowi4 *flp, struct fib_result *res);
 #ifdef CONFIG_IP_ROUTE_CLASSID
 static inline int fib_num_tclassid_users(struct net *net)
 {
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index c6211ed..9c02920 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -13,6 +13,7 @@ struct fib_alias {
 	u8			fa_state;
 	u8			fa_slen;
 	u32			tb_id;
+	s16			fa_default;
 	struct rcu_head		rcu;
 };
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index e107958..3a06586 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1202,28 +1202,40 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
 }
 
 /* Must be invoked inside of an RCU protected region.  */
-void fib_select_default(struct fib_result *res)
+void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
 {
 	struct fib_info *fi = NULL, *last_resort = NULL;
 	struct hlist_head *fa_head = res->fa_head;
 	struct fib_table *tb = res->table;
 	u8 slen = 32 - res->prefixlen;
 	int order = -1, last_idx = -1;
-	struct fib_alias *fa;
+	struct fib_alias *fa, *fa1 = NULL;
+	u32 last_prio = res->fi->fib_priority;
+	u8 last_tos = 0;
 
 	hlist_for_each_entry_rcu(fa, fa_head, fa_list) {
 		struct fib_info *next_fi = fa->fa_info;
 
 		if (fa->fa_slen != slen)
 			continue;
+		if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos)
+			continue;
 		if (fa->tb_id != tb->tb_id)
 			continue;
+		if (next_fi->fib_priority > last_prio &&
+		    fa->fa_tos == last_tos) {
+			if (last_tos)
+				continue;
+			break;
+		}
+		if (next_fi->fib_flags & RTNH_F_DEAD)
+			continue;
+		last_tos = fa->fa_tos;
+		last_prio = next_fi->fib_priority;
+
 		if (next_fi->fib_scope != res->scope ||
 		    fa->fa_type != RTN_UNICAST)
 			continue;
-
-		if (next_fi->fib_priority > res->fi->fib_priority)
-			break;
 		if (!next_fi->fib_nh[0].nh_gw ||
 		    next_fi->fib_nh[0].nh_scope != RT_SCOPE_LINK)
 			continue;
@@ -1233,10 +1245,11 @@ void fib_select_default(struct fib_result *res)
 		if (!fi) {
 			if (next_fi != res->fi)
 				break;
+			fa1 = fa;
 		} else if (!fib_detect_death(fi, order, &last_resort,
-					     &last_idx, tb->tb_default)) {
+					     &last_idx, fa1->fa_default)) {
 			fib_result_assign(res, fi);
-			tb->tb_default = order;
+			fa1->fa_default = order;
 			goto out;
 		}
 		fi = next_fi;
@@ -1244,20 +1257,21 @@ void fib_select_default(struct fib_result *res)
 	}
 
 	if (order <= 0 || !fi) {
-		tb->tb_default = -1;
+		if (fa1)
+			fa1->fa_default = -1;
 		goto out;
 	}
 
 	if (!fib_detect_death(fi, order, &last_resort, &last_idx,
-				tb->tb_default)) {
+			      fa1->fa_default)) {
 		fib_result_assign(res, fi);
-		tb->tb_default = order;
+		fa1->fa_default = order;
 		goto out;
 	}
 
 	if (last_idx >= 0)
 		fib_result_assign(res, last_resort);
-	tb->tb_default = last_idx;
+	fa1->fa_default = last_idx;
 out:
 	return;
 }
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 15d3261..81797e0 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1171,6 +1171,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			new_fa->fa_state = state & ~FA_S_ACCESSED;
 			new_fa->fa_slen = fa->fa_slen;
 			new_fa->tb_id = tb->tb_id;
+			new_fa->fa_default = -1;
 
 			err = switchdev_fib_ipv4_add(key, plen, fi,
 						     new_fa->fa_tos,
@@ -1222,6 +1223,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_state = 0;
 	new_fa->fa_slen = slen;
 	new_fa->tb_id = tb->tb_id;
+	new_fa->fa_default = -1;
 
 	/* (Optionally) offload fib entry to switch hardware. */
 	err = switchdev_fib_ipv4_add(key, plen, fi, tos, cfg->fc_type,
@@ -1990,7 +1992,6 @@ struct fib_table *fib_trie_table(u32 id, struct fib_table *alias)
 		return NULL;
 
 	tb->tb_id = id;
-	tb->tb_default = -1;
 	tb->tb_num_default = 0;
 	tb->tb_data = (alias ? alias->__data : tb->__data);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d0362a2..e681b85 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2176,7 +2176,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 	if (!res.prefixlen &&
 	    res.table->tb_num_default > 1 &&
 	    res.type == RTN_UNICAST && !fl4->flowi4_oif)
-		fib_select_default(&res);
+		fib_select_default(fl4, &res);
 
 	if (!fl4->saddr)
 		fl4->saddr = FIB_RES_PREFSRC(net, res);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net 0/2] ipv4: fib_select_default changes
  2015-07-22  7:43 [PATCH net 0/2] ipv4: fib_select_default changes Julian Anastasov
  2015-07-22  7:43 ` [PATCH net 1/2] ipv4: fib_select_default should match the prefix Julian Anastasov
  2015-07-22  7:43 ` [PATCH net 2/2] ipv4: consider TOS in fib_select_default Julian Anastasov
@ 2015-07-25  5:46 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2015-07-25  5:46 UTC (permalink / raw)
  To: ja; +Cc: netdev, fw, hagen

From: Julian Anastasov <ja@ssi.bg>
Date: Wed, 22 Jul 2015 10:43:21 +0300

> This patchset contains 2 changes for the alternative routes,
> one to add tb_id/fa_slen check needed after the recent
> fib_trie optimizations for fib aliases and the second
> change attempts to support alternative routes with TOS
> requirement.
> 
> 	Sorry that I don't have access to the original
> report from Hagen Paul Pfeifer. I hope he will see this
> change.
> 
> 	The second change adds fa_default field to the
> fib aliases (which can be many) and if the feature to
> filter the alternative routes by TOS is not worth it,
> this second patch can be scrapped.

Great work, series applied, thanks Julian!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-07-25  5:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-22  7:43 [PATCH net 0/2] ipv4: fib_select_default changes Julian Anastasov
2015-07-22  7:43 ` [PATCH net 1/2] ipv4: fib_select_default should match the prefix Julian Anastasov
2015-07-22  7:43 ` [PATCH net 2/2] ipv4: consider TOS in fib_select_default Julian Anastasov
2015-07-25  5:46 ` [PATCH net 0/2] ipv4: fib_select_default changes David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).