All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 net-next 0/2] ipv4: Hash-based multipath routing
@ 2015-09-15 20:29 Peter Nørlund
  2015-09-15 20:29 ` [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath Peter Nørlund
  2015-09-15 20:29 ` [PATCH v3 net-next 2/2] ipv4: ICMP packet inspection for multipath Peter Nørlund
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Nørlund @ 2015-09-15 20:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed
from more or less being destination-based into being quasi-random per-packet
scheduling. This increases the risk of out-of-order packets and makes it
impossible to use multipath together with anycast services.

This patch series replaces the old implementation with flow-based load
balancing based on a hash over the source and destination addresses.

Distribution of the hash is done with thresholds as described in RFC 2992.
This reduces the disruption when a path is added/remove when having more than
two paths.

To futher the chance of successful usage in conjuction with anycast, ICMP
error packets are hashed over the inner IP addresses. This ensures that PMTU
will work together with anycast or load-balancers such as IPVS.

Port numbers are not considered since fragments could cause problems with
anycast and IPVS. Relying on the DF-flag for TCP packets is also insufficient,
since ICMP inspection effectively extracts information from the opposite
flow which might have a different state of the DF-flag. This is also why the
RSS hash is not used. These are typically based on the NDIS RSS spec which
mandates TCP support.

Benchmarking on a Xeon X3550 (4 cores, 2.66GHz) showed that it was desireable
to move the ICMP handling to a separate method. The reason for this being that
the standard hash function can work without using the stack, and the ICMP
function cannot (due to skb_header_pointer), causing 4 additional hits on the
cache. By separating the two, the fast path (non-ICMP) only requires three
reads from cache.

Two-path benchmarks (ip_mkroute_input excl. __mkroute_input):

Original per-packet:  ~394 cycles/packet
L3 hash w/o noinline: ~128 cycles/packet
L3 hash w/ noinline:   ~97 cycles/packet

Changes in v3:
- Multipath algorithm is no longer configurable (always L3)
- Added random seed to hash
- Moved ICMP inspection to isolated function
- Ignore source quench packets (deprecated as per RFC 6633)

Changes in v2:
- Replaced 8-bit xor hash with 31-bit jenkins hash
- Don't scale weights (since 31-bit)
- Avoided unnecesary renaming of variables
- Rely on DF-bit instead of fragment offset when checking for fragmentation
- upper_bound is now inclusive to avoid overflow
- Use a callback to postpone extracting flow information until necessary
- Skipped ICMP inspection entirely with L4 hashing
- Handle newly added sysctl ignore_routes_with_linkdown

Best Regards
 Peter Nørlund


Peter Nørlund (2):
      ipv4: L3 hash-based multipath
      ipv4: ICMP packet inspection for multipath

 include/net/ip_fib.h     |  11 +++-
 include/net/route.h      |  12 +++-
 net/ipv4/fib_semantics.c | 137 +++++++++++++++++++++++--------------------
 net/ipv4/icmp.c          |  16 +++++
 net/ipv4/route.c         |  73 +++++++++++++++++++++--
 5 files changed, 177 insertions(+), 72 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath
  2015-09-15 20:29 [PATCH v3 net-next 0/2] ipv4: Hash-based multipath routing Peter Nørlund
@ 2015-09-15 20:29 ` Peter Nørlund
  2015-09-15 21:40   ` Alexander Duyck
  2015-09-21 16:52   ` David Ahern
  2015-09-15 20:29 ` [PATCH v3 net-next 2/2] ipv4: ICMP packet inspection for multipath Peter Nørlund
  1 sibling, 2 replies; 7+ messages in thread
From: Peter Nørlund @ 2015-09-15 20:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Peter Nørlund

Replaces the per-packet multipath with a hash-based multipath using
source and destination address.

Signed-off-by: Peter Nørlund <pch@ordbogen.com>
---
 include/net/ip_fib.h     |  11 ++--
 net/ipv4/fib_semantics.c | 137 +++++++++++++++++++++++++----------------------
 net/ipv4/route.c         |  23 +++++++-
 3 files changed, 102 insertions(+), 69 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index a37d043..c335dd4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -79,7 +79,7 @@ struct fib_nh {
 	unsigned char		nh_scope;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			nh_weight;
-	int			nh_power;
+	atomic_t		nh_upper_bound;
 #endif
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	__u32			nh_tclassid;
@@ -118,7 +118,7 @@ struct fib_info {
 #define fib_advmss fib_metrics[RTAX_ADVMSS-1]
 	int			fib_nhs;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-	int			fib_power;
+	int			fib_weight;
 #endif
 	struct rcu_head		rcu;
 	struct fib_nh		fib_nh[0];
@@ -312,7 +312,12 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev);
 int fib_sync_down_dev(struct net_device *dev, unsigned long event);
 int fib_sync_down_addr(struct net *net, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
-void fib_select_multipath(struct fib_result *res);
+
+extern u32 fib_multipath_secret __read_mostly;
+
+typedef int (*multipath_hash_func_t)(void *ctx);
+void fib_select_multipath(struct fib_result *res,
+			  multipath_hash_func_t hash_func, void *ctx);
 
 /* Exported by fib_trie.c */
 void fib_trie_init(void);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 064bd3c..64d3e0e 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -57,8 +57,7 @@ static unsigned int fib_info_cnt;
 static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-
-static DEFINE_SPINLOCK(fib_multipath_lock);
+u32 fib_multipath_secret __read_mostly;
 
 #define for_nexthops(fi) {						\
 	int nhsel; const struct fib_nh *nh;				\
@@ -468,6 +467,55 @@ static int fib_count_nexthops(struct rtnexthop *rtnh, int remaining)
 	return remaining > 0 ? 0 : nhs;
 }
 
+static void fib_rebalance(struct fib_info *fi)
+{
+	int total;
+	int w;
+	struct in_device *in_dev;
+
+	if (fi->fib_nhs < 2)
+		return;
+
+	total = 0;
+	for_nexthops(fi) {
+		if (nh->nh_flags & RTNH_F_DEAD)
+			continue;
+
+		in_dev = __in_dev_get_rcu(nh->nh_dev);
+
+		if (in_dev &&
+		    IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
+		    nh->nh_flags & RTNH_F_LINKDOWN)
+			continue;
+
+		total += nh->nh_weight;
+	} endfor_nexthops(fi);
+
+	w = 0;
+	change_nexthops(fi) {
+		int upper_bound;
+
+		in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
+
+		if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
+			upper_bound = -1;
+		} else if (in_dev &&
+			   IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
+			   nexthop_nh->nh_flags & RTNH_F_LINKDOWN) {
+			upper_bound = -1;
+		} else {
+			w += nexthop_nh->nh_weight;
+			upper_bound = DIV_ROUND_CLOSEST(2147483648LL * w,
+							total) - 1;
+		}
+
+		atomic_set(&nexthop_nh->nh_upper_bound, upper_bound);
+	} endfor_nexthops(fi);
+
+	net_get_random_once(&fib_multipath_secret,
+			    sizeof(fib_multipath_secret));
+}
+
 static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
 		       int remaining, struct fib_config *cfg)
 {
@@ -1094,8 +1142,15 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 
 	change_nexthops(fi) {
 		fib_info_update_nh_saddr(net, nexthop_nh);
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+		fi->fib_weight += nexthop_nh->nh_weight;
+#endif
 	} endfor_nexthops(fi)
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+	fib_rebalance(fi);
+#endif
+
 link_it:
 	ofi = fib_find_info(fi);
 	if (ofi) {
@@ -1317,12 +1372,6 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
 					nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
 					break;
 				}
-#ifdef CONFIG_IP_ROUTE_MULTIPATH
-				spin_lock_bh(&fib_multipath_lock);
-				fi->fib_power -= nexthop_nh->nh_power;
-				nexthop_nh->nh_power = 0;
-				spin_unlock_bh(&fib_multipath_lock);
-#endif
 				dead++;
 			}
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
@@ -1345,6 +1394,10 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
 			}
 			ret++;
 		}
+
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+		fib_rebalance(fi);
+#endif
 	}
 
 	return ret;
@@ -1467,20 +1520,17 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
 			    !__in_dev_get_rtnl(dev))
 				continue;
 			alive++;
-#ifdef CONFIG_IP_ROUTE_MULTIPATH
-			spin_lock_bh(&fib_multipath_lock);
-			nexthop_nh->nh_power = 0;
 			nexthop_nh->nh_flags &= ~nh_flags;
-			spin_unlock_bh(&fib_multipath_lock);
-#else
-			nexthop_nh->nh_flags &= ~nh_flags;
-#endif
 		} endfor_nexthops(fi)
 
 		if (alive > 0) {
 			fi->fib_flags &= ~nh_flags;
 			ret++;
 		}
+
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+		fib_rebalance(fi);
+#endif
 	}
 
 	return ret;
@@ -1488,62 +1538,21 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 
-/*
- * The algorithm is suboptimal, but it provides really
- * fair weighted route distribution.
- */
-void fib_select_multipath(struct fib_result *res)
+void fib_select_multipath(struct fib_result *res,
+			  multipath_hash_func_t hash_func, void *ctx)
 {
 	struct fib_info *fi = res->fi;
-	struct in_device *in_dev;
-	int w;
-
-	spin_lock_bh(&fib_multipath_lock);
-	if (fi->fib_power <= 0) {
-		int power = 0;
-		change_nexthops(fi) {
-			in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
-			if (nexthop_nh->nh_flags & RTNH_F_DEAD)
-				continue;
-			if (in_dev &&
-			    IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
-			    nexthop_nh->nh_flags & RTNH_F_LINKDOWN)
-				continue;
-			power += nexthop_nh->nh_weight;
-			nexthop_nh->nh_power = nexthop_nh->nh_weight;
-		} endfor_nexthops(fi);
-		fi->fib_power = power;
-		if (power <= 0) {
-			spin_unlock_bh(&fib_multipath_lock);
-			/* Race condition: route has just become dead. */
-			res->nh_sel = 0;
-			return;
-		}
-	}
-
+	int h = hash_func(ctx) & 0x7FFFFFFF;
 
-	/* w should be random number [0..fi->fib_power-1],
-	 * it is pretty bad approximation.
-	 */
-
-	w = jiffies % fi->fib_power;
+	for_nexthops(fi) {
+		if (h > atomic_read(&nh->nh_upper_bound))
+			continue;
 
-	change_nexthops(fi) {
-		if (!(nexthop_nh->nh_flags & RTNH_F_DEAD) &&
-		    nexthop_nh->nh_power) {
-			w -= nexthop_nh->nh_power;
-			if (w <= 0) {
-				nexthop_nh->nh_power--;
-				fi->fib_power--;
-				res->nh_sel = nhsel;
-				spin_unlock_bh(&fib_multipath_lock);
-				return;
-			}
-		}
+		res->nh_sel = nhsel;
+		return;
 	} endfor_nexthops(fi);
 
 	/* Race condition: route has just become dead. */
 	res->nh_sel = 0;
-	spin_unlock_bh(&fib_multipath_lock);
 }
 #endif
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..41d977c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1646,6 +1646,25 @@ out:
 	return err;
 }
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+
+static int ip_multipath_hash_skb(void *ctx)
+{
+	const struct sk_buff *skb = (struct sk_buff *)ctx;
+	const struct iphdr *iph = ip_hdr(skb);
+
+	return jhash_2words(iph->saddr, iph->daddr, fib_multipath_secret);
+}
+
+static int ip_multipath_hash_fl4(void *ctx)
+{
+	const struct flowi4 *fl4 = (const struct flowi4 *)ctx;
+
+	return jhash_2words(fl4->saddr, fl4->daddr, fib_multipath_secret);
+}
+
+#endif /* CONFIG_IP_ROUTE_MULTIPATH */
+
 static int ip_mkroute_input(struct sk_buff *skb,
 			    struct fib_result *res,
 			    const struct flowi4 *fl4,
@@ -1654,7 +1673,7 @@ static int ip_mkroute_input(struct sk_buff *skb,
 {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	if (res->fi && res->fi->fib_nhs > 1)
-		fib_select_multipath(res);
+		fib_select_multipath(res, ip_multipath_hash_skb, skb);
 #endif
 
 	/* create a routing cache entry */
@@ -2200,7 +2219,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	if (res.fi->fib_nhs > 1 && fl4->flowi4_oif == 0)
-		fib_select_multipath(&res);
+		fib_select_multipath(&res, ip_multipath_hash_fl4, fl4);
 	else
 #endif
 	if (!res.prefixlen &&
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 net-next 2/2] ipv4: ICMP packet inspection for multipath
  2015-09-15 20:29 [PATCH v3 net-next 0/2] ipv4: Hash-based multipath routing Peter Nørlund
  2015-09-15 20:29 ` [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath Peter Nørlund
@ 2015-09-15 20:29 ` Peter Nørlund
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Nørlund @ 2015-09-15 20:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Peter Nørlund

ICMP packets are inspected to let them route together with the flow they
belong to, minimizing the chance that a problematic path will affect flows
on other paths, and so that anycast environments can work with ECMP.

Signed-off-by: Peter Nørlund <pch@ordbogen.com>
---
 include/net/route.h | 12 +++++++++++-
 net/ipv4/icmp.c     | 16 ++++++++++++++++
 net/ipv4/route.c    | 54 +++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index cc61cb9..bbbae2c 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include <net/inetpeer.h>
 #include <net/flow.h>
 #include <net/inet_sock.h>
+#include <net/ip_fib.h>
 #include <linux/in_route.h>
 #include <linux/rtnetlink.h>
 #include <linux/rcupdate.h>
@@ -110,7 +111,16 @@ struct in_device;
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
+struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
+					  multipath_hash_func_t hash_func,
+					  void *ctx);
+
+static inline struct rtable *__ip_route_output_key(struct net *net,
+						   struct flowi4 *flp)
+{
+	return __ip_route_output_key_hash(net, flp, NULL, NULL);
+}
+
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
 				    struct sock *sk);
 struct dst_entry *ipv4_blackhole_route(struct net *net,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 79fe05b..9d7c97c 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -440,6 +440,17 @@ out_unlock:
 	icmp_xmit_unlock(sk);
 }
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+/* Source and destination is swapped. See ip_multipath_hash_skb */
+static int icmp_multipath_hash_skb(void *ctx)
+{
+	const struct sk_buff *skb = (const struct sk_buff *)ctx;
+	const struct iphdr *iph = ip_hdr(skb);
+
+	return jhash_2words(iph->daddr, iph->saddr, fib_multipath_secret);
+}
+#endif
+
 static struct rtable *icmp_route_lookup(struct net *net,
 					struct flowi4 *fl4,
 					struct sk_buff *skb_in,
@@ -464,7 +475,12 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	fl4->flowi4_oif = vrf_master_ifindex(skb_in->dev) ? : skb_in->dev->ifindex;
 
 	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+	rt = __ip_route_output_key_hash(net, fl4, icmp_multipath_hash_skb,
+					skb_in);
+#else
 	rt = __ip_route_output_key(net, fl4);
+#endif
 	if (IS_ERR(rt))
 		return rt;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 41d977c..b472d8c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1648,11 +1648,51 @@ out:
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 
+static noinline int ip_multipath_icmp_hash_skb(struct sk_buff *skb)
+{
+	const struct iphdr *outer_iph = ip_hdr(skb);
+	struct icmphdr _icmph;
+	const struct icmphdr *icmph;
+	struct iphdr _inner_iph;
+	const struct iphdr *inner_iph;
+
+	if (unlikely((outer_iph->frag_off & htons(IP_OFFSET)) != 0))
+		goto standard_hash;
+
+	icmph = skb_header_pointer(skb, outer_iph->ihl * 4, sizeof(_icmph),
+				   &_icmph);
+	if (!icmph)
+		goto standard_hash;
+
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_REDIRECT &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB) {
+		goto standard_hash;
+	}
+
+	inner_iph = skb_header_pointer(skb,
+				       outer_iph->ihl * 4 + sizeof(_icmph),
+				       sizeof(_inner_iph), &_inner_iph);
+	if (!inner_iph)
+		goto standard_hash;
+
+	return jhash_2words(inner_iph->daddr, inner_iph->saddr,
+			    fib_multipath_secret);
+
+standard_hash:
+	return jhash_2words(outer_iph->saddr, outer_iph->daddr,
+			    fib_multipath_secret);
+}
+
 static int ip_multipath_hash_skb(void *ctx)
 {
-	const struct sk_buff *skb = (struct sk_buff *)ctx;
+	struct sk_buff *skb = (struct sk_buff *)ctx;
 	const struct iphdr *iph = ip_hdr(skb);
 
+	if (unlikely(iph->protocol == IPPROTO_ICMP))
+		return ip_multipath_icmp_hash_skb(skb);
+
 	return jhash_2words(iph->saddr, iph->daddr, fib_multipath_secret);
 }
 
@@ -2056,7 +2096,9 @@ add:
  * Major route resolver routine.
  */
 
-struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
+struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
+					  multipath_hash_func_t hash_func,
+					  void *ctx)
 {
 	struct net_device *dev_out = NULL;
 	__u8 tos = RT_FL_TOS(fl4);
@@ -2218,8 +2260,12 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 	}
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-	if (res.fi->fib_nhs > 1 && fl4->flowi4_oif == 0)
-		fib_select_multipath(&res, ip_multipath_hash_fl4, fl4);
+	if (res.fi->fib_nhs > 1 && fl4->flowi4_oif == 0) {
+		if (hash_func)
+			fib_select_multipath(&res, hash_func, ctx);
+		else
+			fib_select_multipath(&res, ip_multipath_hash_fl4, fl4);
+	}
 	else
 #endif
 	if (!res.prefixlen &&
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath
  2015-09-15 20:29 ` [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath Peter Nørlund
@ 2015-09-15 21:40   ` Alexander Duyck
  2015-09-15 22:32     ` Peter Nørlund
  2015-09-21 16:52   ` David Ahern
  1 sibling, 1 reply; 7+ messages in thread
From: Alexander Duyck @ 2015-09-15 21:40 UTC (permalink / raw)
  To: Peter Nørlund, netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 09/15/2015 01:29 PM, Peter Nørlund wrote:
> Replaces the per-packet multipath with a hash-based multipath using
> source and destination address.
>
> Signed-off-by: Peter Nørlund <pch@ordbogen.com>
> ---
>   include/net/ip_fib.h     |  11 ++--
>   net/ipv4/fib_semantics.c | 137 +++++++++++++++++++++++++----------------------
>   net/ipv4/route.c         |  23 +++++++-
>   3 files changed, 102 insertions(+), 69 deletions(-)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index a37d043..c335dd4 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -79,7 +79,7 @@ struct fib_nh {
>   	unsigned char		nh_scope;
>   #ifdef CONFIG_IP_ROUTE_MULTIPATH
>   	int			nh_weight;
> -	int			nh_power;
> +	atomic_t		nh_upper_bound;
>   #endif
>   #ifdef CONFIG_IP_ROUTE_CLASSID
>   	__u32			nh_tclassid;
> @@ -118,7 +118,7 @@ struct fib_info {
>   #define fib_advmss fib_metrics[RTAX_ADVMSS-1]
>   	int			fib_nhs;
>   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> -	int			fib_power;
> +	int			fib_weight;
>   #endif
>   	struct rcu_head		rcu;
>   	struct fib_nh		fib_nh[0];
> @@ -312,7 +312,12 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev);
>   int fib_sync_down_dev(struct net_device *dev, unsigned long event);
>   int fib_sync_down_addr(struct net *net, __be32 local);
>   int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
> -void fib_select_multipath(struct fib_result *res);
> +
> +extern u32 fib_multipath_secret __read_mostly;
> +
> +typedef int (*multipath_hash_func_t)(void *ctx);
> +void fib_select_multipath(struct fib_result *res,
> +			  multipath_hash_func_t hash_func, void *ctx);
>
>   /* Exported by fib_trie.c */
>   void fib_trie_init(void);
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 064bd3c..64d3e0e 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -57,8 +57,7 @@ static unsigned int fib_info_cnt;
>   static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];
>
>   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> -
> -static DEFINE_SPINLOCK(fib_multipath_lock);
> +u32 fib_multipath_secret __read_mostly;
>
>   #define for_nexthops(fi) {						\
>   	int nhsel; const struct fib_nh *nh;				\
> @@ -468,6 +467,55 @@ static int fib_count_nexthops(struct rtnexthop *rtnh, int remaining)
>   	return remaining > 0 ? 0 : nhs;
>   }
>
> +static void fib_rebalance(struct fib_info *fi)
> +{
> +	int total;
> +	int w;
> +	struct in_device *in_dev;
> +
> +	if (fi->fib_nhs < 2)
> +		return;
> +
> +	total = 0;
> +	for_nexthops(fi) {
> +		if (nh->nh_flags & RTNH_F_DEAD)
> +			continue;
> +
> +		in_dev = __in_dev_get_rcu(nh->nh_dev);
> +
> +		if (in_dev &&
> +		    IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
> +		    nh->nh_flags & RTNH_F_LINKDOWN)
> +			continue;
> +
> +		total += nh->nh_weight;
> +	} endfor_nexthops(fi);
> +
> +	w = 0;
> +	change_nexthops(fi) {
> +		int upper_bound;
> +
> +		in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
> +
> +		if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
> +			upper_bound = -1;
> +		} else if (in_dev &&
> +			   IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
> +			   nexthop_nh->nh_flags & RTNH_F_LINKDOWN) {
> +			upper_bound = -1;
> +		} else {
> +			w += nexthop_nh->nh_weight;
> +			upper_bound = DIV_ROUND_CLOSEST(2147483648LL * w,
> +							total) - 1;
> +		}
> +
> +		atomic_set(&nexthop_nh->nh_upper_bound, upper_bound);
> +	} endfor_nexthops(fi);
> +
> +	net_get_random_once(&fib_multipath_secret,
> +			    sizeof(fib_multipath_secret));
> +}
> +
>   static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
>   		       int remaining, struct fib_config *cfg)
>   {
> @@ -1094,8 +1142,15 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
>
>   	change_nexthops(fi) {
>   		fib_info_update_nh_saddr(net, nexthop_nh);
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +		fi->fib_weight += nexthop_nh->nh_weight;
> +#endif
>   	} endfor_nexthops(fi)
>
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +	fib_rebalance(fi);
> +#endif
> +
>   link_it:
>   	ofi = fib_find_info(fi);
>   	if (ofi) {
> @@ -1317,12 +1372,6 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
>   					nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
>   					break;
>   				}
> -#ifdef CONFIG_IP_ROUTE_MULTIPATH
> -				spin_lock_bh(&fib_multipath_lock);
> -				fi->fib_power -= nexthop_nh->nh_power;
> -				nexthop_nh->nh_power = 0;
> -				spin_unlock_bh(&fib_multipath_lock);
> -#endif
>   				dead++;
>   			}
>   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> @@ -1345,6 +1394,10 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
>   			}
>   			ret++;
>   		}
> +
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +		fib_rebalance(fi);
> +#endif
>   	}
>
>   	return ret;
> @@ -1467,20 +1520,17 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
>   			    !__in_dev_get_rtnl(dev))
>   				continue;
>   			alive++;
> -#ifdef CONFIG_IP_ROUTE_MULTIPATH
> -			spin_lock_bh(&fib_multipath_lock);
> -			nexthop_nh->nh_power = 0;
>   			nexthop_nh->nh_flags &= ~nh_flags;
> -			spin_unlock_bh(&fib_multipath_lock);
> -#else
> -			nexthop_nh->nh_flags &= ~nh_flags;
> -#endif
>   		} endfor_nexthops(fi)
>
>   		if (alive > 0) {
>   			fi->fib_flags &= ~nh_flags;
>   			ret++;
>   		}
> +
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +		fib_rebalance(fi);
> +#endif
>   	}
>
>   	return ret;
> @@ -1488,62 +1538,21 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
>
>   #ifdef CONFIG_IP_ROUTE_MULTIPATH
>
> -/*
> - * The algorithm is suboptimal, but it provides really
> - * fair weighted route distribution.
> - */
> -void fib_select_multipath(struct fib_result *res)
> +void fib_select_multipath(struct fib_result *res,
> +			  multipath_hash_func_t hash_func, void *ctx)
>   {
>   	struct fib_info *fi = res->fi;
> -	struct in_device *in_dev;
> -	int w;
> -
> -	spin_lock_bh(&fib_multipath_lock);
> -	if (fi->fib_power <= 0) {
> -		int power = 0;
> -		change_nexthops(fi) {
> -			in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
> -			if (nexthop_nh->nh_flags & RTNH_F_DEAD)
> -				continue;
> -			if (in_dev &&
> -			    IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
> -			    nexthop_nh->nh_flags & RTNH_F_LINKDOWN)
> -				continue;
> -			power += nexthop_nh->nh_weight;
> -			nexthop_nh->nh_power = nexthop_nh->nh_weight;
> -		} endfor_nexthops(fi);
> -		fi->fib_power = power;
> -		if (power <= 0) {
> -			spin_unlock_bh(&fib_multipath_lock);
> -			/* Race condition: route has just become dead. */
> -			res->nh_sel = 0;
> -			return;
> -		}
> -	}
> -
> +	int h = hash_func(ctx) & 0x7FFFFFFF;
>

So if I understand this correctly you use the ctx and hash function at 
the very start of the fib_select_multipath function call, do I have that 
right?

> -	/* w should be random number [0..fi->fib_power-1],
> -	 * it is pretty bad approximation.
> -	 */
> -
> -	w = jiffies % fi->fib_power;
> +	for_nexthops(fi) {
> +		if (h > atomic_read(&nh->nh_upper_bound))
> +			continue;
>
> -	change_nexthops(fi) {
> -		if (!(nexthop_nh->nh_flags & RTNH_F_DEAD) &&
> -		    nexthop_nh->nh_power) {
> -			w -= nexthop_nh->nh_power;
> -			if (w <= 0) {
> -				nexthop_nh->nh_power--;
> -				fi->fib_power--;
> -				res->nh_sel = nhsel;
> -				spin_unlock_bh(&fib_multipath_lock);
> -				return;
> -			}
> -		}
> +		res->nh_sel = nhsel;
> +		return;
>   	} endfor_nexthops(fi);
>
>   	/* Race condition: route has just become dead. */
>   	res->nh_sel = 0;
> -	spin_unlock_bh(&fib_multipath_lock);
>   }
>   #endif
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 5f4a556..41d977c 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1646,6 +1646,25 @@ out:
>   	return err;
>   }
>
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +
> +static int ip_multipath_hash_skb(void *ctx)
> +{
> +	const struct sk_buff *skb = (struct sk_buff *)ctx;
> +	const struct iphdr *iph = ip_hdr(skb);
> +
> +	return jhash_2words(iph->saddr, iph->daddr, fib_multipath_secret);
> +}
> +
> +static int ip_multipath_hash_fl4(void *ctx)
> +{
> +	const struct flowi4 *fl4 = (const struct flowi4 *)ctx;
> +
> +	return jhash_2words(fl4->saddr, fl4->daddr, fib_multipath_secret);
> +}
> +
> +#endif /* CONFIG_IP_ROUTE_MULTIPATH */
> +
>   static int ip_mkroute_input(struct sk_buff *skb,
>   			    struct fib_result *res,
>   			    const struct flowi4 *fl4,
> @@ -1654,7 +1673,7 @@ static int ip_mkroute_input(struct sk_buff *skb,
>   {
>   #ifdef CONFIG_IP_ROUTE_MULTIPATH
>   	if (res->fi && res->fi->fib_nhs > 1)
> -		fib_select_multipath(res);
> +		fib_select_multipath(res, ip_multipath_hash_skb, skb);
>   #endif
>
>   	/* create a routing cache entry */
> @@ -2200,7 +2219,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
>
>   #ifdef CONFIG_IP_ROUTE_MULTIPATH
>   	if (res.fi->fib_nhs > 1 && fl4->flowi4_oif == 0)
> -		fib_select_multipath(&res);
> +		fib_select_multipath(&res, ip_multipath_hash_fl4, fl4);
>   	else
>   #endif
>   	if (!res.prefixlen &&
>

If you are using the hash right at the start of the function anyway why 
not just call jhash_2words first, then pass the resultant hash to 
fib_select_multipath instead of passing the function pointer and the 
void pointer.  You can probably save yourself quite a few cycles this way.

- Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath
  2015-09-15 21:40   ` Alexander Duyck
@ 2015-09-15 22:32     ` Peter Nørlund
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Nørlund @ 2015-09-15 22:32 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 15 Sep 2015 14:40:48 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On 09/15/2015 01:29 PM, Peter Nørlund wrote:
> > Replaces the per-packet multipath with a hash-based multipath using
> > source and destination address.
> >
> > Signed-off-by: Peter Nørlund <pch@ordbogen.com>
> > ---
> >   include/net/ip_fib.h     |  11 ++--
> >   net/ipv4/fib_semantics.c | 137
> > +++++++++++++++++++++++++----------------------
> > net/ipv4/route.c         |  23 +++++++- 3 files changed, 102
> > insertions(+), 69 deletions(-)
> >
> > diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> > index a37d043..c335dd4 100644
> > --- a/include/net/ip_fib.h
> > +++ b/include/net/ip_fib.h
> > @@ -79,7 +79,7 @@ struct fib_nh {
> >   	unsigned char		nh_scope;
> >   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> >   	int			nh_weight;
> > -	int			nh_power;
> > +	atomic_t		nh_upper_bound;
> >   #endif
> >   #ifdef CONFIG_IP_ROUTE_CLASSID
> >   	__u32			nh_tclassid;
> > @@ -118,7 +118,7 @@ struct fib_info {
> >   #define fib_advmss fib_metrics[RTAX_ADVMSS-1]
> >   	int			fib_nhs;
> >   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> > -	int			fib_power;
> > +	int			fib_weight;
> >   #endif
> >   	struct rcu_head		rcu;
> >   	struct fib_nh		fib_nh[0];
> > @@ -312,7 +312,12 @@ int ip_fib_check_default(__be32 gw, struct
> > net_device *dev); int fib_sync_down_dev(struct net_device *dev,
> > unsigned long event); int fib_sync_down_addr(struct net *net,
> > __be32 local); int fib_sync_up(struct net_device *dev, unsigned int
> > nh_flags); -void fib_select_multipath(struct fib_result *res);
> > +
> > +extern u32 fib_multipath_secret __read_mostly;
> > +
> > +typedef int (*multipath_hash_func_t)(void *ctx);
> > +void fib_select_multipath(struct fib_result *res,
> > +			  multipath_hash_func_t hash_func, void
> > *ctx);
> >
> >   /* Exported by fib_trie.c */
> >   void fib_trie_init(void);
> > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > index 064bd3c..64d3e0e 100644
> > --- a/net/ipv4/fib_semantics.c
> > +++ b/net/ipv4/fib_semantics.c
> > @@ -57,8 +57,7 @@ static unsigned int fib_info_cnt;
> >   static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];
> >
> >   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> > -
> > -static DEFINE_SPINLOCK(fib_multipath_lock);
> > +u32 fib_multipath_secret __read_mostly;
> >
> >   #define for_nexthops(fi)
> > {						\ int nhsel; const
> > struct fib_nh *nh;				\ @@ -468,6
> > +467,55 @@ static int fib_count_nexthops(struct rtnexthop *rtnh,
> > int remaining) return remaining > 0 ? 0 : nhs; }
> >
> > +static void fib_rebalance(struct fib_info *fi)
> > +{
> > +	int total;
> > +	int w;
> > +	struct in_device *in_dev;
> > +
> > +	if (fi->fib_nhs < 2)
> > +		return;
> > +
> > +	total = 0;
> > +	for_nexthops(fi) {
> > +		if (nh->nh_flags & RTNH_F_DEAD)
> > +			continue;
> > +
> > +		in_dev = __in_dev_get_rcu(nh->nh_dev);
> > +
> > +		if (in_dev &&
> > +		    IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
> > +		    nh->nh_flags & RTNH_F_LINKDOWN)
> > +			continue;
> > +
> > +		total += nh->nh_weight;
> > +	} endfor_nexthops(fi);
> > +
> > +	w = 0;
> > +	change_nexthops(fi) {
> > +		int upper_bound;
> > +
> > +		in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
> > +
> > +		if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
> > +			upper_bound = -1;
> > +		} else if (in_dev &&
> > +
> > IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
> > +			   nexthop_nh->nh_flags & RTNH_F_LINKDOWN)
> > {
> > +			upper_bound = -1;
> > +		} else {
> > +			w += nexthop_nh->nh_weight;
> > +			upper_bound =
> > DIV_ROUND_CLOSEST(2147483648LL * w,
> > +							total) - 1;
> > +		}
> > +
> > +		atomic_set(&nexthop_nh->nh_upper_bound,
> > upper_bound);
> > +	} endfor_nexthops(fi);
> > +
> > +	net_get_random_once(&fib_multipath_secret,
> > +			    sizeof(fib_multipath_secret));
> > +}
> > +
> >   static int fib_get_nhs(struct fib_info *fi, struct rtnexthop
> > *rtnh, int remaining, struct fib_config *cfg)
> >   {
> > @@ -1094,8 +1142,15 @@ struct fib_info *fib_create_info(struct
> > fib_config *cfg)
> >
> >   	change_nexthops(fi) {
> >   		fib_info_update_nh_saddr(net, nexthop_nh);
> > +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +		fi->fib_weight += nexthop_nh->nh_weight;
> > +#endif
> >   	} endfor_nexthops(fi)
> >
> > +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +	fib_rebalance(fi);
> > +#endif
> > +
> >   link_it:
> >   	ofi = fib_find_info(fi);
> >   	if (ofi) {
> > @@ -1317,12 +1372,6 @@ int fib_sync_down_dev(struct net_device
> > *dev, unsigned long event) nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
> >   					break;
> >   				}
> > -#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > -				spin_lock_bh(&fib_multipath_lock);
> > -				fi->fib_power -=
> > nexthop_nh->nh_power;
> > -				nexthop_nh->nh_power = 0;
> > -
> > spin_unlock_bh(&fib_multipath_lock); -#endif
> >   				dead++;
> >   			}
> >   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> > @@ -1345,6 +1394,10 @@ int fib_sync_down_dev(struct net_device
> > *dev, unsigned long event) }
> >   			ret++;
> >   		}
> > +
> > +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +		fib_rebalance(fi);
> > +#endif
> >   	}
> >
> >   	return ret;
> > @@ -1467,20 +1520,17 @@ int fib_sync_up(struct net_device *dev,
> > unsigned int nh_flags) !__in_dev_get_rtnl(dev))
> >   				continue;
> >   			alive++;
> > -#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > -			spin_lock_bh(&fib_multipath_lock);
> > -			nexthop_nh->nh_power = 0;
> >   			nexthop_nh->nh_flags &= ~nh_flags;
> > -			spin_unlock_bh(&fib_multipath_lock);
> > -#else
> > -			nexthop_nh->nh_flags &= ~nh_flags;
> > -#endif
> >   		} endfor_nexthops(fi)
> >
> >   		if (alive > 0) {
> >   			fi->fib_flags &= ~nh_flags;
> >   			ret++;
> >   		}
> > +
> > +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +		fib_rebalance(fi);
> > +#endif
> >   	}
> >
> >   	return ret;
> > @@ -1488,62 +1538,21 @@ int fib_sync_up(struct net_device *dev,
> > unsigned int nh_flags)
> >
> >   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> >
> > -/*
> > - * The algorithm is suboptimal, but it provides really
> > - * fair weighted route distribution.
> > - */
> > -void fib_select_multipath(struct fib_result *res)
> > +void fib_select_multipath(struct fib_result *res,
> > +			  multipath_hash_func_t hash_func, void
> > *ctx) {
> >   	struct fib_info *fi = res->fi;
> > -	struct in_device *in_dev;
> > -	int w;
> > -
> > -	spin_lock_bh(&fib_multipath_lock);
> > -	if (fi->fib_power <= 0) {
> > -		int power = 0;
> > -		change_nexthops(fi) {
> > -			in_dev =
> > __in_dev_get_rcu(nexthop_nh->nh_dev);
> > -			if (nexthop_nh->nh_flags & RTNH_F_DEAD)
> > -				continue;
> > -			if (in_dev &&
> > -
> > IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
> > -			    nexthop_nh->nh_flags & RTNH_F_LINKDOWN)
> > -				continue;
> > -			power += nexthop_nh->nh_weight;
> > -			nexthop_nh->nh_power =
> > nexthop_nh->nh_weight;
> > -		} endfor_nexthops(fi);
> > -		fi->fib_power = power;
> > -		if (power <= 0) {
> > -			spin_unlock_bh(&fib_multipath_lock);
> > -			/* Race condition: route has just become
> > dead. */
> > -			res->nh_sel = 0;
> > -			return;
> > -		}
> > -	}
> > -
> > +	int h = hash_func(ctx) & 0x7FFFFFFF;
> >
> 
> So if I understand this correctly you use the ctx and hash function
> at the very start of the fib_select_multipath function call, do I
> have that right?

Correct

> 
> > -	/* w should be random number [0..fi->fib_power-1],
> > -	 * it is pretty bad approximation.
> > -	 */
> > -
> > -	w = jiffies % fi->fib_power;
> > +	for_nexthops(fi) {
> > +		if (h > atomic_read(&nh->nh_upper_bound))
> > +			continue;
> >
> > -	change_nexthops(fi) {
> > -		if (!(nexthop_nh->nh_flags & RTNH_F_DEAD) &&
> > -		    nexthop_nh->nh_power) {
> > -			w -= nexthop_nh->nh_power;
> > -			if (w <= 0) {
> > -				nexthop_nh->nh_power--;
> > -				fi->fib_power--;
> > -				res->nh_sel = nhsel;
> > -
> > spin_unlock_bh(&fib_multipath_lock);
> > -				return;
> > -			}
> > -		}
> > +		res->nh_sel = nhsel;
> > +		return;
> >   	} endfor_nexthops(fi);
> >
> >   	/* Race condition: route has just become dead. */
> >   	res->nh_sel = 0;
> > -	spin_unlock_bh(&fib_multipath_lock);
> >   }
> >   #endif
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 5f4a556..41d977c 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1646,6 +1646,25 @@ out:
> >   	return err;
> >   }
> >
> > +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +
> > +static int ip_multipath_hash_skb(void *ctx)
> > +{
> > +	const struct sk_buff *skb = (struct sk_buff *)ctx;
> > +	const struct iphdr *iph = ip_hdr(skb);
> > +
> > +	return jhash_2words(iph->saddr, iph->daddr,
> > fib_multipath_secret); +}
> > +
> > +static int ip_multipath_hash_fl4(void *ctx)
> > +{
> > +	const struct flowi4 *fl4 = (const struct flowi4 *)ctx;
> > +
> > +	return jhash_2words(fl4->saddr, fl4->daddr,
> > fib_multipath_secret); +}
> > +
> > +#endif /* CONFIG_IP_ROUTE_MULTIPATH */
> > +
> >   static int ip_mkroute_input(struct sk_buff *skb,
> >   			    struct fib_result *res,
> >   			    const struct flowi4 *fl4,
> > @@ -1654,7 +1673,7 @@ static int ip_mkroute_input(struct sk_buff
> > *skb, {
> >   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> >   	if (res->fi && res->fi->fib_nhs > 1)
> > -		fib_select_multipath(res);
> > +		fib_select_multipath(res, ip_multipath_hash_skb,
> > skb); #endif
> >
> >   	/* create a routing cache entry */
> > @@ -2200,7 +2219,7 @@ struct rtable *__ip_route_output_key(struct
> > net *net, struct flowi4 *fl4)
> >
> >   #ifdef CONFIG_IP_ROUTE_MULTIPATH
> >   	if (res.fi->fib_nhs > 1 && fl4->flowi4_oif == 0)
> > -		fib_select_multipath(&res);
> > +		fib_select_multipath(&res, ip_multipath_hash_fl4,
> > fl4); else
> >   #endif
> >   	if (!res.prefixlen &&
> >
> 
> If you are using the hash right at the start of the function anyway
> why not just call jhash_2words first, then pass the resultant hash to 
> fib_select_multipath instead of passing the function pointer and the 
> void pointer.  You can probably save yourself quite a few cycles this
> way.

Good point! A quick proof-of-concept showed I saved 17 cycles (from 97
to 80 cycles/packet).

Thanks

> 
> - Alex
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath
  2015-09-15 20:29 ` [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath Peter Nørlund
  2015-09-15 21:40   ` Alexander Duyck
@ 2015-09-21 16:52   ` David Ahern
  2015-09-23 19:22     ` Peter Nørlund
  1 sibling, 1 reply; 7+ messages in thread
From: David Ahern @ 2015-09-21 16:52 UTC (permalink / raw)
  To: Peter Nørlund, netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 9/15/15 2:29 PM, Peter Nørlund wrote:
> @@ -1094,8 +1142,15 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
>
>   	change_nexthops(fi) {
>   		fib_info_update_nh_saddr(net, nexthop_nh);
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +		fi->fib_weight += nexthop_nh->nh_weight;
> +#endif
>   	} endfor_nexthops(fi)
>
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +	fib_rebalance(fi);
> +#endif

The ifdefs everywhere make this harder to read. Just have a second 
definition above for fib_rebalance(fi) when CONFIG_IP_ROUTE_MULTIPATH is 
not defined.

Similarly an inlined macro of fi->fib_weight that compiles out if 
CONFIG_IP_ROUTE_MULTIPATH is not defined.

David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath
  2015-09-21 16:52   ` David Ahern
@ 2015-09-23 19:22     ` Peter Nørlund
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Nørlund @ 2015-09-23 19:22 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Mon, 21 Sep 2015 10:52:13 -0600
David Ahern <dsa@cumulusnetworks.com> wrote:

> On 9/15/15 2:29 PM, Peter Nørlund wrote:
> > @@ -1094,8 +1142,15 @@ struct fib_info *fib_create_info(struct
> > fib_config *cfg)
> >
> >   	change_nexthops(fi) {
> >   		fib_info_update_nh_saddr(net, nexthop_nh);
> > +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +		fi->fib_weight += nexthop_nh->nh_weight;
> > +#endif
> >   	} endfor_nexthops(fi)
> >
> > +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> > +	fib_rebalance(fi);
> > +#endif
> 
> The ifdefs everywhere make this harder to read. Just have a second 
> definition above for fib_rebalance(fi) when CONFIG_IP_ROUTE_MULTIPATH
> is not defined.
> 
> Similarly an inlined macro of fi->fib_weight that compiles out if 
> CONFIG_IP_ROUTE_MULTIPATH is not defined.
> 
> David

Okay, will do.

Regards,
 Peter Nørlund

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-09-23 19:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-15 20:29 [PATCH v3 net-next 0/2] ipv4: Hash-based multipath routing Peter Nørlund
2015-09-15 20:29 ` [PATCH v3 net-next 1/2] ipv4: L3 hash-based multipath Peter Nørlund
2015-09-15 21:40   ` Alexander Duyck
2015-09-15 22:32     ` Peter Nørlund
2015-09-21 16:52   ` David Ahern
2015-09-23 19:22     ` Peter Nørlund
2015-09-15 20:29 ` [PATCH v3 net-next 2/2] ipv4: ICMP packet inspection for multipath Peter Nørlund

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.