All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/2] bonding support for IPv6 transmit hashing
@ 2012-07-01  7:01 John Eaglesham
  2012-07-01  7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: John Eaglesham @ 2012-07-01  7:01 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver.

The algorithm chosen (xor'ing the bottom three quads and then xor'ing
the bottom three bytes of that) was chosen after testing almost 400,000
unique IPv6 addresses harvested from server logs. This algorithm had the
most even distribution for both big- and little-endian architectures while
still using few instructions.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

This is an update to a prior patch I submitted. This version includes
a clarified description, thorough bounds checking, updates functions to
call bond_xmit_hash_policy_l2 rather than re-implement the same logic,
incorporates Jay's style suggestions, and patches against net-next. Patch
has been tested and performs as expected.

John Eaglesham (2):
  Add support for IPv6 and bounds checking to transmit hashing
    functions.
  Update bonding driver documentation to include IPv6 transmit hashing
    algorithm.

 Documentation/networking/bonding.txt | 31 ++++++++++--
 drivers/net/bonding/bond_main.c      | 91 +++++++++++++++++++++++++-----------
 2 files changed, 90 insertions(+), 32 deletions(-)

-- 
1.7.11

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions.
  2012-07-01  7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham
@ 2012-07-01  7:01 ` John Eaglesham
  2012-07-01  7:33   ` David Miller
  2012-07-01  7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham
  2012-07-01  8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham
  2 siblings, 1 reply; 25+ messages in thread
From: John Eaglesham @ 2012-07-01  7:01 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham

---
 drivers/net/bonding/bond_main.c | 91 +++++++++++++++++++++++++++++------------
 1 file changed, 64 insertions(+), 27 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f5a40b9..b138d84 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3345,56 +3345,93 @@ static struct notifier_block bond_netdev_notifier = {
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 data
+ */
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+
+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+
+	return 0;
+}
+
+/*
  * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	u32 v6hash;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
+	if (skb->protocol == htons(ETH_P_IP) &&
+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
+		iph = ip_hdr(skb);
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
 			(data->h_dest[5] ^ data->h_source[5])) % count;
-	}
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
+		ipv6h = ipv6_hdr(skb);
+		v6hash =
+			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
+			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
+			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);
+		v6hash = (v6hash >> 16) ^ (v6hash >> 8) ^ v6hash;
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
+	}
+
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, mimic bond_xmit_hash_policy_l2()
+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
+	u32 layer4_xor = 0;
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
 
 	if (skb->protocol == htons(ETH_P_IP)) {
+		iph = ip_hdr(skb);
 		if (!ip_is_fragment(iph) &&
-		    (iph->protocol == IPPROTO_TCP ||
-		     iph->protocol == IPPROTO_UDP)) {
+			(iph->protocol == IPPROTO_TCP ||
+			iph->protocol == IPPROTO_UDP)) {
+			__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
+			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
+		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
+			goto short_header;
 		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
+		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		ipv6h = ipv6_hdr(skb);
+		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
+			__be16 *layer4hdrv6 = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));
+			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
+			layer4_xor = (*layer4hdrv6 ^ *(layer4hdrv6 + 1));
+		} else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) {
+			goto short_header;
+		}
+		layer4_xor ^=
+			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
+			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
+			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);
+		return ((layer4_xor >> 16) ^ (layer4_xor >> 8) ^ layer4_xor) % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
-}
-
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
-{
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+short_header:
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.7.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm.
  2012-07-01  7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham
  2012-07-01  7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham
@ 2012-07-01  7:01 ` John Eaglesham
  2012-07-01  7:34   ` David Miller
  2012-07-01  8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham
  2 siblings, 1 reply; 25+ messages in thread
From: John Eaglesham @ 2012-07-01  7:01 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham

---
 Documentation/networking/bonding.txt | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index bfea8a3..5db14fe 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,22 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is
 
 		(((source IP XOR dest IP) AND 0xffff) XOR
 			( source MAC XOR destination MAC ))
 				modulo slave count
 
+		The IPv6 forumla is
+
+		iphash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		((iphash >> 16) XOR (iphash >> 8) XOR iphash)
+			modulo slave count
+
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
 		the formula is the same as for the layer2 transmit
@@ -778,19 +788,30 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is
 
 		((source port XOR dest port) XOR
 			 ((source IP XOR dest IP) AND 0xffff)
 				modulo slave count
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		iphash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		((source port XOR dest port) XOR
+			(iphash >> 16) XOR (iphash >> 8) XOR iphash)
+				modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted.  For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
 		certain switches, notably Cisco switches with PFC2 as
 		well as some Foundry and IBM products.
 
-- 
1.7.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions.
  2012-07-01  7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham
@ 2012-07-01  7:33   ` David Miller
  0 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2012-07-01  7:33 UTC (permalink / raw)
  To: linux; +Cc: netdev

From: John Eaglesham <linux@8192.net>
Date: Sun,  1 Jul 2012 00:01:38 -0700

> -	if (skb->protocol == htons(ETH_P_IP)) {
> +	if (skb->protocol == htons(ETH_P_IP) &&
> +		skb_network_header_len(skb) >= sizeof(struct iphdr)) {

This is not indented properly, the goal isn't to use only TAB
characters to indent, the goal it to line things up right after
the openning parenthesis on the previous line, so this should
be:

	if (skb->protocol == htons(ETH_P_IP) &&
	    skb_network_header_len(skb) >= sizeof(struct iphdr)) {

> +	} else if (skb->protocol == htons(ETH_P_IPV6) &&
> +		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {

Likewise, in this case you even under-indented it.

> +		v6hash =
> +			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
> +			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
> +			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);

This is rediculous, just put &ipv6h->saddr into a local pointer named
's' and then you won't have use such gymnastics to indent the code.

>  		if (!ip_is_fragment(iph) &&
> -		    (iph->protocol == IPPROTO_TCP ||
> -		     iph->protocol == IPPROTO_UDP)) {
> +			(iph->protocol == IPPROTO_TCP ||
> +			iph->protocol == IPPROTO_UDP)) {

This is what _REALLY_ bothers me.  You took an existing conditional
which _WAS_ indented properly and you made erroneously re-indented.

In fact your only change here is to break the indentation.

Just remove these changes entirely.

> +			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
> +				skb_headlen(skb) - skb_network_offset(skb))

Similarly, this is indented improperly.

> +		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
> +			goto short_header;
>  		}

Single line basic blocks do not use openning and closing braces.

> -		return (layer4_xor ^
> -			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
> -
> +		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;

Don't reindent code unless you can do it properly, now this line is way
over 80 columns.  The previous code was perfectly fine, leave it alone.

> +		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {

Line is too long, write it as:

		if (ipv6h->nexthdr == IPPROTO_TCP ||
		    ipv6h->nexthdr == IPPROTO_UDP) {

> +			__be16 *layer4hdrv6 = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));

Likewise, line is too long.  Just do the variable declaration seperate from the
assignment:
			__be16 *layer4hdrv6;

			layer4hdrv6 = BLAH BLAH BLAH;

> +			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
> +				skb_headlen(skb) - skb_network_offset(skb))

Improperly indented, fix.

> +		} else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) {
> +			goto short_header;
> +		}

Since line basic block, no braces.

> +		layer4_xor ^=
> +			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
> +			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
> +			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);

This is gross, do as I said above using a local pointer variable.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm.
  2012-07-01  7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham
@ 2012-07-01  7:34   ` David Miller
  2012-07-01  7:42     ` John Eaglesham
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2012-07-01  7:34 UTC (permalink / raw)
  To: linux; +Cc: netdev


I think you should combine this into the first patch, there is no
reason to separate these two changes.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm.
  2012-07-01  7:34   ` David Miller
@ 2012-07-01  7:42     ` John Eaglesham
  0 siblings, 0 replies; 25+ messages in thread
From: John Eaglesham @ 2012-07-01  7:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On 7/1/2012 12:34 AM, David Miller wrote:
>
> I think you should combine this into the first patch, there is no
> reason to separate these two changes.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Thanks, I will do that.

John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5] bonding support for IPv6 transmit hashing
  2012-07-01  7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham
  2012-07-01  7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham
  2012-07-01  7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham
@ 2012-07-01  8:07 ` John Eaglesham
  2012-07-01 10:33   ` David Miller
  2012-07-01 19:13   ` [PATCH v6] " John Eaglesham
  2 siblings, 2 replies; 25+ messages in thread
From: John Eaglesham @ 2012-07-01  8:07 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver.

The algorithm chosen (xor'ing the bottom three quads and then xor'ing
the bottom three bytes of that) was chosen after testing almost 400,000
unique IPv6 addresses harvested from server logs. This algorithm had the
most even distribution for both big- and little-endian architectures while
still using few instructions.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

This is an update to a prior patch I submitted. This version includes
a clarified description, thorough bounds checking, updates functions to
call bond_xmit_hash_policy_l2 rather than re-implement the same logic,
incorporates Jay's style suggestions, patches against net-next, and
squashes the documentation and code patch into one. Patch has been tested
and performs as expected.

John Eaglesham

---
 Documentation/networking/bonding.txt | 31 ++++++++++--
 drivers/net/bonding/bond_main.c      | 91 +++++++++++++++++++++++++-----------
 2 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index bfea8a3..5db14fe 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,22 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is
 
 		(((source IP XOR dest IP) AND 0xffff) XOR
 			( source MAC XOR destination MAC ))
 				modulo slave count
 
+		The IPv6 forumla is
+
+		iphash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		((iphash >> 16) XOR (iphash >> 8) XOR iphash)
+			modulo slave count
+
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
 		the formula is the same as for the layer2 transmit
@@ -778,19 +788,30 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is
 
 		((source port XOR dest port) XOR
 			 ((source IP XOR dest IP) AND 0xffff)
 				modulo slave count
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		iphash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		((source port XOR dest port) XOR
+			(iphash >> 16) XOR (iphash >> 8) XOR iphash)
+				modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted.  For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
 		certain switches, notably Cisco switches with PFC2 as
 		well as some Foundry and IBM products.
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f5a40b9..b138d84 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3345,56 +3345,93 @@ static struct notifier_block bond_netdev_notifier = {
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 data
+ */
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+
+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+
+	return 0;
+}
+
+/*
  * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	u32 v6hash;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
+	if (skb->protocol == htons(ETH_P_IP) &&
+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
+		iph = ip_hdr(skb);
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
 			(data->h_dest[5] ^ data->h_source[5])) % count;
-	}
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
+		ipv6h = ipv6_hdr(skb);
+		v6hash =
+			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
+			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
+			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);
+		v6hash = (v6hash >> 16) ^ (v6hash >> 8) ^ v6hash;
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
+	}
+
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, mimic bond_xmit_hash_policy_l2()
+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
+	u32 layer4_xor = 0;
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
 
 	if (skb->protocol == htons(ETH_P_IP)) {
+		iph = ip_hdr(skb);
 		if (!ip_is_fragment(iph) &&
-		    (iph->protocol == IPPROTO_TCP ||
-		     iph->protocol == IPPROTO_UDP)) {
+			(iph->protocol == IPPROTO_TCP ||
+			iph->protocol == IPPROTO_UDP)) {
+			__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
+			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
+		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
+			goto short_header;
 		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
+		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		ipv6h = ipv6_hdr(skb);
+		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
+			__be16 *layer4hdrv6 = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));
+			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
+			layer4_xor = (*layer4hdrv6 ^ *(layer4hdrv6 + 1));
+		} else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) {
+			goto short_header;
+		}
+		layer4_xor ^=
+			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
+			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
+			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);
+		return ((layer4_xor >> 16) ^ (layer4_xor >> 8) ^ layer4_xor) % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
-}
-
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
-{
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+short_header:
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.7.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v5] bonding support for IPv6 transmit hashing
  2012-07-01  8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham
@ 2012-07-01 10:33   ` David Miller
  2012-07-01 19:01     ` John Eaglesham
  2012-07-01 19:13   ` [PATCH v6] " John Eaglesham
  1 sibling, 1 reply; 25+ messages in thread
From: David Miller @ 2012-07-01 10:33 UTC (permalink / raw)
  To: linux; +Cc: netdev

From: John Eaglesham <linux@8192.net>
Date: Sun,  1 Jul 2012 01:07:43 -0700

> +		v6hash =
> +			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
> +			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
> +			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);

You completely ignored my feedback:

	This is rediculous, just put &ipv6h->saddr into a local
	pointer named 's' and then you won't have use such gymnastics
	to indent the code.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5] bonding support for IPv6 transmit hashing
  2012-07-01 10:33   ` David Miller
@ 2012-07-01 19:01     ` John Eaglesham
  0 siblings, 0 replies; 25+ messages in thread
From: John Eaglesham @ 2012-07-01 19:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On 7/1/2012 3:33 AM, David Miller wrote:
> From: John Eaglesham <linux@8192.net>
> Date: Sun,  1 Jul 2012 01:07:43 -0700
>
>> +		v6hash =
>> +			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
>> +			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
>> +			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);
>
> You completely ignored my feedback:
>
> 	This is rediculous, just put &ipv6h->saddr into a local
> 	pointer named 's' and then you won't have use such gymnastics
> 	to indent the code.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

My apologies, I completely misunderstood your feedback. I will make the 
correction and re-submit.

Thanks.

John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v6] bonding support for IPv6 transmit hashing
  2012-07-01  8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham
  2012-07-01 10:33   ` David Miller
@ 2012-07-01 19:13   ` John Eaglesham
  2012-07-02 23:33     ` Jay Vosburgh
  2012-08-22  5:12     ` [PATCH v7] bonding: " John Eaglesham
  1 sibling, 2 replies; 25+ messages in thread
From: John Eaglesham @ 2012-07-01 19:13 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

This is an update to prior patches I submitted. This version includes:
* Updated and clarified description
* IPv6 algorithm more closely matches that of IPv4
* Thorough bounds checking on all xmit functions
* Consolidate layer 2 hashing logic into one function
* Update style as per Jay Vosburgh and David Miller
* Patches against net-next as one patch

Patch has been tested and performs as expected.

John Eaglesham

---
 Documentation/networking/bonding.txt | 32 +++++++++++--
 drivers/net/bonding/bond_main.c      | 91 +++++++++++++++++++++++++-----------
 2 files changed, 92 insertions(+), 31 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index bfea8a3..3851dad 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,23 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is
 
 		(((source IP XOR dest IP) AND 0xffff) XOR
 			( source MAC XOR destination MAC ))
 				modulo slave count
 
+		The IPv6 formula is
+
+		hash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+			(source MAC XOR destination MAC))
+				modulo slave count
+
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
 		the formula is the same as for the layer2 transmit
@@ -778,19 +789,30 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is
 
 		((source port XOR dest port) XOR
 			 ((source IP XOR dest IP) AND 0xffff)
 				modulo slave count
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		hash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		((source port XOR dest port) XOR
+			(hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+				modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted.  For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
 		certain switches, notably Cisco switches with PFC2 as
 		well as some Foundry and IBM products.
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f5a40b9..c733d55 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3345,56 +3345,95 @@ static struct notifier_block bond_netdev_notifier = {
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 data
+ */
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+
+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+
+	return 0;
+}
+
+/*
  * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-
-	if (skb->protocol == htons(ETH_P_IP)) {
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	u32 v6hash;
+	__be32 *s, *d;
+
+	if (skb->protocol == htons(ETH_P_IP) &&
+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
+		iph = ip_hdr(skb);
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
 			(data->h_dest[5] ^ data->h_source[5])) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
+		ipv6h = ipv6_hdr(skb);
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, mimic bond_xmit_hash_policy_l2()
+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
+	u32 layer4_xor = 0;
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	__be32 *s, *d;
+	__be16 *layer4hdr;
 
 	if (skb->protocol == htons(ETH_P_IP)) {
+		iph = ip_hdr(skb);
 		if (!ip_is_fragment(iph) &&
-		    (iph->protocol == IPPROTO_TCP ||
-		     iph->protocol == IPPROTO_UDP)) {
+			(iph->protocol == IPPROTO_TCP ||
+			iph->protocol == IPPROTO_UDP)) {
+			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
+			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
+		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
+			goto short_header;
 		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
+		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		ipv6h = ipv6_hdr(skb);
+		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
+			layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));
+			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
+			layer4_xor = (*layer4hdr ^ *(layer4hdr + 1));
+		} else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) {
+			goto short_header;
+		}
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ (layer4_xor >> 8);
+		return layer4_xor % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
-}
-
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
-{
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+short_header:
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.7.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-07-01 19:13   ` [PATCH v6] " John Eaglesham
@ 2012-07-02 23:33     ` Jay Vosburgh
  2012-07-03  5:01       ` John Eaglesham
  2012-08-22  5:12     ` [PATCH v7] bonding: " John Eaglesham
  1 sibling, 1 reply; 25+ messages in thread
From: Jay Vosburgh @ 2012-07-02 23:33 UTC (permalink / raw)
  To: John Eaglesham; +Cc: netdev

John Eaglesham <linux@8192.net> wrote:

>Currently the "bonding" driver does not support load balancing outgoing
>traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
>are currently supported; this patch adds transmit hashing for IPv6 (and
>TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
>bonding driver.
>
>The algorithm chosen (xor'ing the bottom three quads of the source and
>destination addresses together, then xor'ing each byte of that result into
>the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
>was selected after testing almost 400,000 unique IPv6 addresses harvested
>from server logs. This algorithm had the most even distribution for both
>big- and little-endian architectures while still using few instructions. Its
>behavior also attempts to closely match that of the IPv4 algorithm.
>
>The IPv6 flow label was intentionally not included in the hash as it appears
>to be unset in the vast majority of IPv6 traffic sampled, and the current
>algorithm not using the flow label already offers a very even distribution.
>
>Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
>ie, they are not balanced based on layer 4 information. Additionally,
>IPv6 packets with intermediate headers are not balanced based on layer
>4 information. In practice these intermediate headers are not common and
>this should not cause any problems, and the alternative (a packet-parsing
>loop and look-up table) seemed slow and complicated for little gain.
>
>This is an update to prior patches I submitted. This version includes:
>* Updated and clarified description
>* IPv6 algorithm more closely matches that of IPv4
>* Thorough bounds checking on all xmit functions
>* Consolidate layer 2 hashing logic into one function
>* Update style as per Jay Vosburgh and David Miller
>* Patches against net-next as one patch
>
>Patch has been tested and performs as expected.
>
>John Eaglesham
>
>---
> Documentation/networking/bonding.txt | 32 +++++++++++--
> drivers/net/bonding/bond_main.c      | 91 +++++++++++++++++++++++++-----------
> 2 files changed, 92 insertions(+), 31 deletions(-)
>
>diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>index bfea8a3..3851dad 100644
>--- a/Documentation/networking/bonding.txt
>+++ b/Documentation/networking/bonding.txt
>@@ -752,12 +752,23 @@ xmit_hash_policy
> 		protocol information to generate the hash.
>
> 		Uses XOR of hardware MAC addresses and IP addresses to
>-		generate the hash.  The formula is
>+		generate the hash.  The IPv4 formula is
>
> 		(((source IP XOR dest IP) AND 0xffff) XOR
> 			( source MAC XOR destination MAC ))
> 				modulo slave count
>
>+		The IPv6 formula is
>+
>+		hash =
>+			(source ip quad 2 XOR dest IP quad 2) XOR
>+			(source ip quad 3 XOR dest IP quad 3) XOR
>+			(source ip quad 4 XOR dest IP quad 4)
>+
>+		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
>+			(source MAC XOR destination MAC))
>+				modulo slave count

	This seems to be missing an XOR, between the end of "XOR hash)"
and the start of "(source MAC".

>+
> 		This algorithm will place all traffic to a particular
> 		network peer on the same slave.  For non-IP traffic,
> 		the formula is the same as for the layer2 transmit
>@@ -778,19 +789,30 @@ xmit_hash_policy
> 		slaves, although a single connection will not span
> 		multiple slaves.
>
>-		The formula for unfragmented TCP and UDP packets is
>+		The formula for unfragmented IPv4 TCP and UDP packets is
>
> 		((source port XOR dest port) XOR
> 			 ((source IP XOR dest IP) AND 0xffff)
> 				modulo slave count
>
>-		For fragmented TCP or UDP packets and all other IP
>-		protocol traffic, the source and destination port
>+		The formula for unfragmented IPv6 TCP and UDP packets is
>+
>+		hash =
>+			(source ip quad 2 XOR dest IP quad 2) XOR
>+			(source ip quad 3 XOR dest IP quad 3) XOR
>+			(source ip quad 4 XOR dest IP quad 4)
>+
>+		((source port XOR dest port) XOR
>+			(hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
>+				modulo slave count
>+
>+		For fragmented TCP or UDP packets and all other IPv4 and
>+		IPv6 protocol traffic, the source and destination port
> 		information is omitted.  For non-IP traffic, the
> 		formula is the same as for the layer2 transmit hash
> 		policy.
>
>-		This policy is intended to mimic the behavior of
>+		The IPv4 policy is intended to mimic the behavior of
> 		certain switches, notably Cisco switches with PFC2 as
> 		well as some Foundry and IBM products.
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index f5a40b9..c733d55 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3345,56 +3345,95 @@ static struct notifier_block bond_netdev_notifier = {
> /*---------------------------- Hashing Policies -----------------------------*/
>
> /*
>+ * Hash for the output device based upon layer 2 data
>+ */
>+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>+{
>+	struct ethhdr *data = (struct ethhdr *)skb->data;
>+
>+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
>+		return (data->h_dest[5] ^ data->h_source[5]) % count;
>+
>+	return 0;
>+}
>+
>+/*
>  * Hash for the output device based upon layer 2 and layer 3 data. If
>- * the packet is not IP mimic bond_xmit_hash_policy_l2()
>+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>  */
> static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>-	struct iphdr *iph = ip_hdr(skb);
>-
>-	if (skb->protocol == htons(ETH_P_IP)) {
>+	struct iphdr *iph;
>+	struct ipv6hdr *ipv6h;
>+	u32 v6hash;
>+	__be32 *s, *d;
>+
>+	if (skb->protocol == htons(ETH_P_IP) &&
>+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
>+		iph = ip_hdr(skb);
> 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
> 			(data->h_dest[5] ^ data->h_source[5])) % count;
>+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
>+		ipv6h = ipv6_hdr(skb);
>+		s = &ipv6h->saddr.s6_addr32[0];
>+		d = &ipv6h->daddr.s6_addr32[0];
>+		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>+		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
>+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
> 	}
>
>-	return (data->h_dest[5] ^ data->h_source[5]) % count;
>+	return bond_xmit_hash_policy_l2(skb, count);
> }
>
> /*
>  * Hash for the output device based upon layer 3 and layer 4 data. If
>  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>- * altogether not IP, mimic bond_xmit_hash_policy_l2()
>+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>  */
> static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
> {
>-	struct ethhdr *data = (struct ethhdr *)skb->data;
>-	struct iphdr *iph = ip_hdr(skb);
>-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
>-	int layer4_xor = 0;
>+	u32 layer4_xor = 0;
>+	struct iphdr *iph;
>+	struct ipv6hdr *ipv6h;
>+	__be32 *s, *d;
>+	__be16 *layer4hdr;
>
> 	if (skb->protocol == htons(ETH_P_IP)) {
>+		iph = ip_hdr(skb);
> 		if (!ip_is_fragment(iph) &&
>-		    (iph->protocol == IPPROTO_TCP ||
>-		     iph->protocol == IPPROTO_UDP)) {
>+			(iph->protocol == IPPROTO_TCP ||
>+			iph->protocol == IPPROTO_UDP)) {

	Why did these two lines change?

>+			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
>+			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
>+				skb_headlen(skb) - skb_network_offset(skb))
>+				goto short_header;
> 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
>+		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
>+			goto short_header;
> 		}
>-		return (layer4_xor ^
>-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>-
>+		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;

	This line runs past 80 columns.  There are a few more of these
further down.

>+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>+		ipv6h = ipv6_hdr(skb);
>+		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
>+			layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));

	Could this be written as

			layer4hdr = (__be16 *)(ipv6h + 1);

	instead?

	-J

>+			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
>+				skb_headlen(skb) - skb_network_offset(skb))
>+				goto short_header;
>+			layer4_xor = (*layer4hdr ^ *(layer4hdr + 1));
>+		} else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) {
>+			goto short_header;
>+		}
>+		s = &ipv6h->saddr.s6_addr32[0];
>+		d = &ipv6h->daddr.s6_addr32[0];
>+		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>+		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ (layer4_xor >> 8);
>+		return layer4_xor % count;
> 	}
>
>-	return (data->h_dest[5] ^ data->h_source[5]) % count;
>-}
>-
>-/*
>- * Hash for the output device based upon layer 2 data
>- */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>-{
>-	struct ethhdr *data = (struct ethhdr *)skb->data;
>-
>-	return (data->h_dest[5] ^ data->h_source[5]) % count;
>+short_header:
>+	return bond_xmit_hash_policy_l2(skb, count);
> }
>
> /*-------------------------- Device entry points ----------------------------*/
>-- 
>1.7.11

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-07-02 23:33     ` Jay Vosburgh
@ 2012-07-03  5:01       ` John Eaglesham
  2012-07-03  5:14         ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: John Eaglesham @ 2012-07-03  5:01 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

On 7/2/2012 4:33 PM, Jay Vosburgh wrote:
>> +
>> +		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
>> +			(source MAC XOR destination MAC))
>> +				modulo slave count
>
> 	This seems to be missing an XOR, between the end of "XOR hash)"
> and the start of "(source MAC".
>

You're correct.

>> 	if (skb->protocol == htons(ETH_P_IP)) {
>> +		iph = ip_hdr(skb);
>> 		if (!ip_is_fragment(iph) &&
>> -		    (iph->protocol == IPPROTO_TCP ||
>> -		     iph->protocol == IPPROTO_UDP)) {
>> +			(iph->protocol == IPPROTO_TCP ||
>> +			iph->protocol == IPPROTO_UDP)) {
>
> 	Why did these two lines change?
>

I replaced the mixed tabs and spaces with all tabs when I updated that 
function, but in retrospect the tabs and spaces were likely intentional. 
I will revert.

>> +			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
>> +			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
>> +				skb_headlen(skb) - skb_network_offset(skb))
>> +				goto short_header;
>> 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
>> +		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
>> +			goto short_header;
>> 		}
>> -		return (layer4_xor ^
>> -			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>> -
>> +		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>
> 	This line runs past 80 columns.  There are a few more of these
> further down.
>

I will double-check this.

>> +	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>> +		ipv6h = ipv6_hdr(skb);
>> +		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
>> +			layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));
>
> 	Could this be written as
>
> 			layer4hdr = (__be16 *)(ipv6h + 1);
>
> 	instead?
>
> 	-J
>

Yes, I can make that change. Thanks.

John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-07-03  5:01       ` John Eaglesham
@ 2012-07-03  5:14         ` David Miller
  2012-07-03  5:38           ` John Eaglesham
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2012-07-03  5:14 UTC (permalink / raw)
  To: linux; +Cc: fubar, netdev

From: John Eaglesham <linux@8192.net>
Date: Mon, 02 Jul 2012 22:01:20 -0700

> I replaced the mixed tabs and spaces with all tabs when I updated that
> function, but in retrospect the tabs and spaces were likely
> intentional. I will revert.

They were.  Don't change existing coding style unless you are certain
it is wrong.

Besides any such changes are completely outside of the scope of
the changes you are making, so you should have left them out in
any event.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-07-03  5:14         ` David Miller
@ 2012-07-03  5:38           ` John Eaglesham
  2012-07-03  5:43             ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: John Eaglesham @ 2012-07-03  5:38 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, netdev

On 7/2/2012 10:14 PM, David Miller wrote:
> From: John Eaglesham <linux@8192.net>
> Date: Mon, 02 Jul 2012 22:01:20 -0700
>
>> I replaced the mixed tabs and spaces with all tabs when I updated that
>> function, but in retrospect the tabs and spaces were likely
>> intentional. I will revert.
>
> They were.  Don't change existing coding style unless you are certain
> it is wrong.
>
> Besides any such changes are completely outside of the scope of
> the changes you are making, so you should have left them out in
> any event.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Should the indents in the code I am adding follow the style guide and 
use only tabs, or follow the style already present and mix spaces and 
tabs for indentation?

Thanks,
John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-07-03  5:38           ` John Eaglesham
@ 2012-07-03  5:43             ` David Miller
  2012-08-21 18:11               ` Jeremy Brookman
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2012-07-03  5:43 UTC (permalink / raw)
  To: linux; +Cc: fubar, netdev

From: John Eaglesham <linux@8192.net>
Date: Mon, 02 Jul 2012 22:38:05 -0700

> On 7/2/2012 10:14 PM, David Miller wrote:
>> From: John Eaglesham <linux@8192.net>
>> Date: Mon, 02 Jul 2012 22:01:20 -0700
>>
>>> I replaced the mixed tabs and spaces with all tabs when I updated that
>>> function, but in retrospect the tabs and spaces were likely
>>> intentional. I will revert.
>>
>> They were.  Don't change existing coding style unless you are certain
>> it is wrong.
>>
>> Besides any such changes are completely outside of the scope of
>> the changes you are making, so you should have left them out in
>> any event.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> Should the indents in the code I am adding follow the style guide and
> use only tabs, or follow the style already present and mix spaces and
> tabs for indentation?

You should use a mix of tabs, as necessary, to get things to line up
how I told you they need to line up.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-07-03  5:43             ` David Miller
@ 2012-08-21 18:11               ` Jeremy Brookman
  2012-08-21 19:19                 ` Jay Vosburgh
  0 siblings, 1 reply; 25+ messages in thread
From: Jeremy Brookman @ 2012-08-21 18:11 UTC (permalink / raw)
  To: linux, netdev

> You should use a mix of tabs, as necessary, to get things to line up
> how I told you they need to line up.

Unless I'm missing something, this change doesn't seem to have made it
through to the kernel tip, but we could really use this bugfix. Is it
in a repository I didn't notice, or not yet through the review?  If
it's not through the review, is any help needed to get it there?

Regards,

Jeremy Brookman

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-08-21 18:11               ` Jeremy Brookman
@ 2012-08-21 19:19                 ` Jay Vosburgh
  2012-08-21 22:21                   ` John Eaglesham
  2012-08-23 10:42                   ` Jeremy Brookman
  0 siblings, 2 replies; 25+ messages in thread
From: Jay Vosburgh @ 2012-08-21 19:19 UTC (permalink / raw)
  To: Jeremy Brookman; +Cc: linux, netdev

Jeremy Brookman <jeremy.brookman@gmail.com> wrote:

>> You should use a mix of tabs, as necessary, to get things to line up
>> how I told you they need to line up.
>
>Unless I'm missing something, this change doesn't seem to have made it
>through to the kernel tip, but we could really use this bugfix. Is it
>in a repository I didn't notice, or not yet through the review?  If
>it's not through the review, is any help needed to get it there?

	The submitter (John Eaglesham) never posted an updated version
that addressed the various comments, nor did his original patch
submission include a Signed-off-by.

	I went ahead and updated the patch to address the comments; I've
only compile tested this.  Are you (Jeremy or John) able to test this to
confirm that it will hash ipv6 traffic as expected (I can test it, but
it won't be today)?

	John, can you post a Signed-off-by for your patch (really, this
updated version of your patch)?

	If John signs off and somebody tests this, I'll post a formal
submssion with the full commit message.

	-J

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 6b1c711..834c919 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,23 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is
 
 		(((source IP XOR dest IP) AND 0xffff) XOR
 			( source MAC XOR destination MAC ))
 				modulo slave count
 
+		The IPv6 formula is
+
+		hash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+			XOR (source MAC XOR destination MAC))
+				modulo slave count
+
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
 		the formula is the same as for the layer2 transmit
@@ -778,19 +789,30 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is
 
 		((source port XOR dest port) XOR
 			 ((source IP XOR dest IP) AND 0xffff)
 				modulo slave count
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		hash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		((source port XOR dest port) XOR
+			(hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+				modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted.  For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
 		certain switches, notably Cisco switches with PFC2 as
 		well as some Foundry and IBM products.
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d95fbc3..4dfe7e3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3354,56 +3354,99 @@ static struct notifier_block bond_netdev_notifier = {
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 data
+ */
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+
+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+
+	return 0;
+}
+
+/*
  * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-
-	if (skb->protocol == htons(ETH_P_IP)) {
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	u32 v6hash;
+	__be32 *s, *d;
+
+	if (skb->protocol == htons(ETH_P_IP) &&
+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
+		iph = ip_hdr(skb);
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
 			(data->h_dest[5] ^ data->h_source[5])) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
+		ipv6h = ipv6_hdr(skb);
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, mimic bond_xmit_hash_policy_l2()
+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
+	u32 layer4_xor = 0;
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	__be32 *s, *d;
+	__be16 *layer4hdr;
 
 	if (skb->protocol == htons(ETH_P_IP)) {
+		iph = ip_hdr(skb);
 		if (!ip_is_fragment(iph) &&
 		    (iph->protocol == IPPROTO_TCP ||
 		     iph->protocol == IPPROTO_UDP)) {
+			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
+			    skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
+			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
+		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
+			goto short_header;
 		}
 		return (layer4_xor ^
 			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		ipv6h = ipv6_hdr(skb);
+		if (ipv6h->nexthdr == IPPROTO_TCP ||
+		    ipv6h->nexthdr == IPPROTO_UDP) {
+			layer4hdr = (__be16 *)(ipv6h + 1);
+			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
+			layer4_xor = (*layer4hdr ^ *(layer4hdr + 1));
+		} else if (skb_network_header_len(skb) <
+			   sizeof(struct ipv6hdr)) {
+			goto short_header;
+		}
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
+			       (layer4_xor >> 8);
+		return layer4_xor % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
-}
-
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
-{
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+short_header:
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*-------------------------- Device entry points ----------------------------*/


---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-08-21 19:19                 ` Jay Vosburgh
@ 2012-08-21 22:21                   ` John Eaglesham
  2012-08-22 12:06                     ` Jeremy Brookman
  2012-08-23 10:42                   ` Jeremy Brookman
  1 sibling, 1 reply; 25+ messages in thread
From: John Eaglesham @ 2012-08-21 22:21 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Jeremy Brookman, netdev

On 8/21/2012 12:19 PM, Jay Vosburgh wrote:
> Jeremy Brookman <jeremy.brookman@gmail.com> wrote:
>
>>> You should use a mix of tabs, as necessary, to get things to line up
>>> how I told you they need to line up.
>>
>> Unless I'm missing something, this change doesn't seem to have made it
>> through to the kernel tip, but we could really use this bugfix. Is it
>> in a repository I didn't notice, or not yet through the review?  If
>> it's not through the review, is any help needed to get it there?
>
> 	The submitter (John Eaglesham) never posted an updated version
> that addressed the various comments, nor did his original patch
> submission include a Signed-off-by.
>
> 	I went ahead and updated the patch to address the comments; I've
> only compile tested this.  Are you (Jeremy or John) able to test this to
> confirm that it will hash ipv6 traffic as expected (I can test it, but
> it won't be today)?
>
> 	John, can you post a Signed-off-by for your patch (really, this
> updated version of your patch)?
>
> 	If John signs off and somebody tests this, I'll post a formal
> submssion with the full commit message.
>
> 	-J
>

Since my last submission I ended up making some changes on my end to 
streamline the logic. I can fold together my patch with yours and test 
them later tonight. If everything looks good I'll post the changes back 
to the list.

Thanks!
John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v7] bonding: support for IPv6 transmit hashing
  2012-07-01 19:13   ` [PATCH v6] " John Eaglesham
  2012-07-02 23:33     ` Jay Vosburgh
@ 2012-08-22  5:12     ` John Eaglesham
  2012-08-22  5:29       ` David Miller
  2012-08-22  6:43       ` [PATCH v8] " John Eaglesham
  1 sibling, 2 replies; 25+ messages in thread
From: John Eaglesham @ 2012-08-22  5:12 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham

From: John Eaglesham <linux@8192.net>

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

Changes:
v2)
	* Clarify description
	* Add bounds checking to more functions
	* All functions call bond_xmit_hash_policy_l2 rather than re-
          implement the same logic.
v3)
	* Patch against net-next.
	* Style corrections.
v4)
	* Correct indenting.
v5)
	* Squash documentation and code patches into one.
v6)
	* Modify IPv6 hash to behave more like the IPv4 hash, update
	  documentation with modified algorithm.
	* Clean up formatting.
	* Move all variable declaration to the top of the function.
	* Minor change to IPv6 layer 4 hash to match IPv4 hash behavior
	  (mix all hashed address bits together rather than just the
	  bottom 24 bits).
v7)
	* Improve bounds checking code (handle truncated IPv6 header,
	  removed goto, fewer if statements).
	* Re-write pseudocode in documentation to match actual code more
	  closely.
	* Correct indenting, align parentheses, wrap code at <= 80 columns
	  (based on Jay's changes).

Patch has been tested and performs as expected.

Signed-off-by: John Eaglesham

---
 Documentation/networking/bonding.txt | 30 ++++++++++--
 drivers/net/bonding/bond_main.c      | 89 +++++++++++++++++++++++++-----------
 2 files changed, 88 insertions(+), 31 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 6b1c711..10a015c 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,22 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is
 
 		(((source IP XOR dest IP) AND 0xffff) XOR
 			( source MAC XOR destination MAC ))
 				modulo slave count
 
+		The IPv6 formula is
+
+		hash = (source ip quad 2 XOR dest IP quad 2) XOR
+		       (source ip quad 3 XOR dest IP quad 3) XOR
+		       (source ip quad 4 XOR dest IP quad 4)
+
+		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+			XOR (source MAC XOR destination MAC))
+				modulo slave count
+
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
 		the formula is the same as for the layer2 transmit
@@ -778,19 +788,29 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is
 
 		((source port XOR dest port) XOR
 			 ((source IP XOR dest IP) AND 0xffff)
 				modulo slave count
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		hash = (source port XOR dest port) XOR
+		       ((source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4))
+
+		((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+			modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted.  For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
 		certain switches, notably Cisco switches with PFC2 as
 		well as some Foundry and IBM products.
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d95fbc3..4221e57 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3354,56 +3354,93 @@ static struct notifier_block bond_netdev_notifier = {
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 data
+ */
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+
+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+
+	return 0;
+}
+
+/*
  * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-
-	if (skb->protocol == htons(ETH_P_IP)) {
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	u32 v6hash;
+	__be32 *s, *d;
+
+	if (skb->protocol == htons(ETH_P_IP) &&
+	    skb_network_header_len(skb) >= sizeof(*iph)) {
+		iph = ip_hdr(skb);
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
 			(data->h_dest[5] ^ data->h_source[5])) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		   skb_network_header_len(skb) >= sizeof(*ipv6h)) {
+		ipv6h = ipv6_hdr(skb);
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, mimic bond_xmit_hash_policy_l2()
+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
-
-	if (skb->protocol == htons(ETH_P_IP)) {
+	u32 layer4_xor = 0;
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	__be32 *s, *d;
+	__be16 *layer4hdr;
+
+	if (skb->protocol == htons(ETH_P_IP) &&
+	    skb_network_header_len(skb) >= sizeof(*iph)) {
+		iph = ip_hdr(skb);
 		if (!ip_is_fragment(iph) &&
 		    (iph->protocol == IPPROTO_TCP ||
-		     iph->protocol == IPPROTO_UDP)) {
-			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
+		     iph->protocol == IPPROTO_UDP) &&
+		    (skb_headlen(skb) - skb_network_offset(skb) >=
+		     iph->ihl * sizeof(u32) + sizeof(*layer4hdr) * 2)) {
+			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
+			layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
 		}
 		return (layer4_xor ^
 			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		   skb_network_header_len(skb) >= sizeof(*ipv6h)) {
+		ipv6h = ipv6_hdr(skb);
+		if ((ipv6h->nexthdr == IPPROTO_TCP ||
+		     ipv6h->nexthdr == IPPROTO_UDP) &&
+		    (skb_headlen(skb) - skb_network_offset(skb) >=
+		     sizeof(*ipv6h) + sizeof(*layer4hdr) * 2)) {
+			layer4hdr = (__be16 *)(ipv6h + 1);
+			layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
+		}
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
+			       (layer4_xor >> 8);
+		return layer4_xor % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
-}
-
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
-{
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.7.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v7] bonding: support for IPv6 transmit hashing
  2012-08-22  5:12     ` [PATCH v7] bonding: " John Eaglesham
@ 2012-08-22  5:29       ` David Miller
  2012-08-22  6:43       ` [PATCH v8] " John Eaglesham
  1 sibling, 0 replies; 25+ messages in thread
From: David Miller @ 2012-08-22  5:29 UTC (permalink / raw)
  To: linux; +Cc: netdev

From: John Eaglesham <linux@8192.net>
Date: Tue, 21 Aug 2012 22:12:33 -0700

> Signed-off-by: John Eaglesham

This is not a proper signoff, you must put an email address on this line.

Also, the changelog between the versions doesn't belong in the commit
log message itself.  It gets placed in commentary after the "---" line.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8] bonding: support for IPv6 transmit hashing
  2012-08-22  5:12     ` [PATCH v7] bonding: " John Eaglesham
  2012-08-22  5:29       ` David Miller
@ 2012-08-22  6:43       ` John Eaglesham
  2012-08-23  5:49         ` David Miller
  2012-08-23 12:23         ` Jeremy Brookman
  1 sibling, 2 replies; 25+ messages in thread
From: John Eaglesham @ 2012-08-22  6:43 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham

From: John Eaglesham <linux@8192.net>

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

Tested-by: John Eaglesham <linux@8192.net>
Signed-off-by: John Eaglesham <linux@8192.net>

---

Changes:
v2)
	* Clarify description
	* Add bounds checking to more functions
	* All functions call bond_xmit_hash_policy_l2 rather than re-
          implement the same logic.
v3)
	* Patch against net-next.
	* Style corrections.
v4)
	* Correct indenting.
v5)
	* Squash documentation and code patches into one.
v6)
	* Modify IPv6 hash to behave more like the IPv4 hash, update
	  documentation with modified algorithm.
	* Clean up formatting.
	* Move all variable declaration to the top of the function.
	* Minor change to IPv6 layer 4 hash to match IPv4 hash behavior
	  (mix all hashed address bits together rather than just the
	  bottom 24 bits).
v7)
	* Improve bounds checking code (handle truncated IPv6 header,
	  removed goto, fewer if statements).
	* Re-write pseudocode in documentation to match actual code more
	  closely.
	* Correct indenting, align parentheses, wrap code at <= 80 columns
	  (based on Jay's changes).
v8)
	* Correct patch submission format.

 Documentation/networking/bonding.txt | 30 ++++++++++--
 drivers/net/bonding/bond_main.c      | 89 +++++++++++++++++++++++++-----------
 2 files changed, 88 insertions(+), 31 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 6b1c711..10a015c 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,22 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is
 
 		(((source IP XOR dest IP) AND 0xffff) XOR
 			( source MAC XOR destination MAC ))
 				modulo slave count
 
+		The IPv6 formula is
+
+		hash = (source ip quad 2 XOR dest IP quad 2) XOR
+		       (source ip quad 3 XOR dest IP quad 3) XOR
+		       (source ip quad 4 XOR dest IP quad 4)
+
+		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+			XOR (source MAC XOR destination MAC))
+				modulo slave count
+
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
 		the formula is the same as for the layer2 transmit
@@ -778,19 +788,29 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is
 
 		((source port XOR dest port) XOR
 			 ((source IP XOR dest IP) AND 0xffff)
 				modulo slave count
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		hash = (source port XOR dest port) XOR
+		       ((source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4))
+
+		((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+			modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted.  For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
 		certain switches, notably Cisco switches with PFC2 as
 		well as some Foundry and IBM products.
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d95fbc3..4221e57 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3354,56 +3354,93 @@ static struct notifier_block bond_netdev_notifier = {
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 data
+ */
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+
+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+
+	return 0;
+}
+
+/*
  * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-
-	if (skb->protocol == htons(ETH_P_IP)) {
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	u32 v6hash;
+	__be32 *s, *d;
+
+	if (skb->protocol == htons(ETH_P_IP) &&
+	    skb_network_header_len(skb) >= sizeof(*iph)) {
+		iph = ip_hdr(skb);
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
 			(data->h_dest[5] ^ data->h_source[5])) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		   skb_network_header_len(skb) >= sizeof(*ipv6h)) {
+		ipv6h = ipv6_hdr(skb);
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, mimic bond_xmit_hash_policy_l2()
+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
-
-	if (skb->protocol == htons(ETH_P_IP)) {
+	u32 layer4_xor = 0;
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	__be32 *s, *d;
+	__be16 *layer4hdr;
+
+	if (skb->protocol == htons(ETH_P_IP) &&
+	    skb_network_header_len(skb) >= sizeof(*iph)) {
+		iph = ip_hdr(skb);
 		if (!ip_is_fragment(iph) &&
 		    (iph->protocol == IPPROTO_TCP ||
-		     iph->protocol == IPPROTO_UDP)) {
-			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
+		     iph->protocol == IPPROTO_UDP) &&
+		    (skb_headlen(skb) - skb_network_offset(skb) >=
+		     iph->ihl * sizeof(u32) + sizeof(*layer4hdr) * 2)) {
+			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
+			layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
 		}
 		return (layer4_xor ^
 			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		   skb_network_header_len(skb) >= sizeof(*ipv6h)) {
+		ipv6h = ipv6_hdr(skb);
+		if ((ipv6h->nexthdr == IPPROTO_TCP ||
+		     ipv6h->nexthdr == IPPROTO_UDP) &&
+		    (skb_headlen(skb) - skb_network_offset(skb) >=
+		     sizeof(*ipv6h) + sizeof(*layer4hdr) * 2)) {
+			layer4hdr = (__be16 *)(ipv6h + 1);
+			layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
+		}
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
+			       (layer4_xor >> 8);
+		return layer4_xor % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
-}
-
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
-{
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.7.11

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-08-21 22:21                   ` John Eaglesham
@ 2012-08-22 12:06                     ` Jeremy Brookman
  0 siblings, 0 replies; 25+ messages in thread
From: Jeremy Brookman @ 2012-08-22 12:06 UTC (permalink / raw)
  To: John Eaglesham; +Cc: Jay Vosburgh, netdev

>>         If John signs off and somebody tests this, I'll post a formal
>> submssion with the full commit message.
>
> Since my last submission I ended up making some changes on my end to
> streamline the logic. I can fold together my patch with yours and test them
> later tonight. If everything looks good I'll post the changes back to the
> list.

Great - thanks for that Jay/John; will look forward to the latest
patch later. I'm actually looking at the 2.6.32 branch so have tested
a backport of Jay's patch (which only took a couple of very minor
modifications); a quick test on an 8-port bond with layer3+4 hashing
looked fine.  (Will hold off until the final patch before doing more
testing.)

Regards,

Jeremy

On Tue, Aug 21, 2012 at 11:21 PM, John Eaglesham <linux@8192.net> wrote:
> On 8/21/2012 12:19 PM, Jay Vosburgh wrote:
>>
>> Jeremy Brookman <jeremy.brookman@gmail.com> wrote:
>>
>>>> You should use a mix of tabs, as necessary, to get things to line up
>>>> how I told you they need to line up.
>>>
>>>
>>> Unless I'm missing something, this change doesn't seem to have made it
>>> through to the kernel tip, but we could really use this bugfix. Is it
>>> in a repository I didn't notice, or not yet through the review?  If
>>> it's not through the review, is any help needed to get it there?
>>
>>
>>         The submitter (John Eaglesham) never posted an updated version
>> that addressed the various comments, nor did his original patch
>> submission include a Signed-off-by.
>>
>>         I went ahead and updated the patch to address the comments; I've
>> only compile tested this.  Are you (Jeremy or John) able to test this to
>> confirm that it will hash ipv6 traffic as expected (I can test it, but
>> it won't be today)?
>>
>>         John, can you post a Signed-off-by for your patch (really, this
>> updated version of your patch)?
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8] bonding: support for IPv6 transmit hashing
  2012-08-22  6:43       ` [PATCH v8] " John Eaglesham
@ 2012-08-23  5:49         ` David Miller
  2012-08-23 12:23         ` Jeremy Brookman
  1 sibling, 0 replies; 25+ messages in thread
From: David Miller @ 2012-08-23  5:49 UTC (permalink / raw)
  To: linux; +Cc: netdev

From: John Eaglesham <linux@8192.net>
Date: Tue, 21 Aug 2012 23:43:35 -0700

> From: John Eaglesham <linux@8192.net>
> 
> Currently the "bonding" driver does not support load balancing outgoing
> traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
> are currently supported; this patch adds transmit hashing for IPv6 (and
> TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
> bonding driver. In addition, bounds checking has been added to all
> transmit hashing functions.
> 
> The algorithm chosen (xor'ing the bottom three quads of the source and
> destination addresses together, then xor'ing each byte of that result into
> the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
> was selected after testing almost 400,000 unique IPv6 addresses harvested
> from server logs. This algorithm had the most even distribution for both
> big- and little-endian architectures while still using few instructions. Its
> behavior also attempts to closely match that of the IPv4 algorithm.
> 
> The IPv6 flow label was intentionally not included in the hash as it appears
> to be unset in the vast majority of IPv6 traffic sampled, and the current
> algorithm not using the flow label already offers a very even distribution.
> 
> Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
> ie, they are not balanced based on layer 4 information. Additionally,
> IPv6 packets with intermediate headers are not balanced based on layer
> 4 information. In practice these intermediate headers are not common and
> this should not cause any problems, and the alternative (a packet-parsing
> loop and look-up table) seemed slow and complicated for little gain.
> 
> Tested-by: John Eaglesham <linux@8192.net>
> Signed-off-by: John Eaglesham <linux@8192.net>

Applied, thanks a lot.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
  2012-08-21 19:19                 ` Jay Vosburgh
  2012-08-21 22:21                   ` John Eaglesham
@ 2012-08-23 10:42                   ` Jeremy Brookman
  1 sibling, 0 replies; 25+ messages in thread
From: Jeremy Brookman @ 2012-08-23 10:42 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: linux, netdev

Hi,

A few questions on the actual patch inline now I've had a bit more time...

>  static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)

...

> +       if (skb->protocol == htons(ETH_P_IP) &&
> +               skb_network_header_len(skb) >= sizeof(struct iphdr)) {
> +               iph = ip_hdr(skb);
>                 return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>                         (data->h_dest[5] ^ data->h_source[5])) % count;
> +       } else if (skb->protocol == htons(ETH_P_IPV6) &&
> +               skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
> +               ipv6h = ipv6_hdr(skb);
> +               s = &ipv6h->saddr.s6_addr32[0];
> +               d = &ipv6h->daddr.s6_addr32[0];
> +               v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
> +               v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
> +               return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
>         }
>

If the IPv4 case needs an ntohl, does the IPv6 case (if we're
interpreting the address as 4 32-bits)?  If IPv4 hashing algorithm is
consistent across different endiannesses, then maybe IPv6 should be
too?

>  /*
>   * Hash for the output device based upon layer 3 and layer 4 data. If
>   * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
> - * altogether not IP, mimic bond_xmit_hash_policy_l2()
> + * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>   */

Looking at the code below, we only check the first value of next_hdr
in the chain; however RFC 2460 lists the following possible extension
headers, all of which will therefore cause fallback to L3 hashing:

           Hop-by-Hop Options
           Routing (Type 0)
           Fragment
           Destination Options
           Authentication
           Encapsulating Security Payload

Clearly with some (eg ESP and fragment) we do need to drop out of
using L4 header info in the hash.  And anyone using Routing (Type 0)
probably deserves anything they get.  But should we at least comment
on the limitation that the existence of the other headers also causes
fallback to L3 hashing only? (And possibly even include in
documentation?)  Or of course, fix?

>         if (skb->protocol == htons(ETH_P_IP)) {
> +               iph = ip_hdr(skb);
>                 if (!ip_is_fragment(iph) &&
>                     (iph->protocol == IPPROTO_TCP ||
>                      iph->protocol == IPPROTO_UDP)) {
> +                       if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
> +                           skb_headlen(skb) - skb_network_offset(skb))
> +                               goto short_header;
> +                       layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
>                         layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
> +               } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
> +                       goto short_header;
>                 }

I don't know the assertions we can make about
skb_network_header_len(skb), but it looks odd doing a length check
against sizeof(iphdr) after iph->protocol has already been
dereferenced. Is this really right?  (The pattern recurs a few times.)

Regards,

Jeremy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8] bonding: support for IPv6 transmit hashing
  2012-08-22  6:43       ` [PATCH v8] " John Eaglesham
  2012-08-23  5:49         ` David Miller
@ 2012-08-23 12:23         ` Jeremy Brookman
  1 sibling, 0 replies; 25+ messages in thread
From: Jeremy Brookman @ 2012-08-23 12:23 UTC (permalink / raw)
  To: John Eaglesham; +Cc: netdev

Thanks for getting this in John.  Apologies for my earlier reply,
where I hadn't spotted this revision of the patch; it looks like the
comments I made have been addressed, and all is well.

Thanks again,

Jeremy

On Wed, Aug 22, 2012 at 7:43 AM, John Eaglesham <linux@8192.net> wrote:
> From: John Eaglesham <linux@8192.net>
>
> Currently the "bonding" driver does not support load balancing outgoing
> traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
> are currently supported; this patch adds transmit hashing for IPv6 (and
> TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
> bonding driver. In addition, bounds checking has been added to all
> transmit hashing functions.
>
> The algorithm chosen (xor'ing the bottom three quads of the source and
> destination addresses together, then xor'ing each byte of that result into
> the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
> was selected after testing almost 400,000 unique IPv6 addresses harvested
> from server logs. This algorithm had the most even distribution for both
> big- and little-endian architectures while still using few instructions. Its
> behavior also attempts to closely match that of the IPv4 algorithm.
>
> The IPv6 flow label was intentionally not included in the hash as it appears
> to be unset in the vast majority of IPv6 traffic sampled, and the current
> algorithm not using the flow label already offers a very even distribution.
>
> Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
> ie, they are not balanced based on layer 4 information. Additionally,
> IPv6 packets with intermediate headers are not balanced based on layer
> 4 information. In practice these intermediate headers are not common and
> this should not cause any problems, and the alternative (a packet-parsing
> loop and look-up table) seemed slow and complicated for little gain.
>
> Tested-by: John Eaglesham <linux@8192.net>
> Signed-off-by: John Eaglesham <linux@8192.net>
>
> ---
>
> Changes:
> v2)
>         * Clarify description
>         * Add bounds checking to more functions
>         * All functions call bond_xmit_hash_policy_l2 rather than re-
>           implement the same logic.
> v3)
>         * Patch against net-next.
>         * Style corrections.
> v4)
>         * Correct indenting.
> v5)
>         * Squash documentation and code patches into one.
> v6)
>         * Modify IPv6 hash to behave more like the IPv4 hash, update
>           documentation with modified algorithm.
>         * Clean up formatting.
>         * Move all variable declaration to the top of the function.
>         * Minor change to IPv6 layer 4 hash to match IPv4 hash behavior
>           (mix all hashed address bits together rather than just the
>           bottom 24 bits).
> v7)
>         * Improve bounds checking code (handle truncated IPv6 header,
>           removed goto, fewer if statements).
>         * Re-write pseudocode in documentation to match actual code more
>           closely.
>         * Correct indenting, align parentheses, wrap code at <= 80 columns
>           (based on Jay's changes).
> v8)
>         * Correct patch submission format.
>
>  Documentation/networking/bonding.txt | 30 ++++++++++--
>  drivers/net/bonding/bond_main.c      | 89 +++++++++++++++++++++++++-----------
>  2 files changed, 88 insertions(+), 31 deletions(-)
>
> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
> index 6b1c711..10a015c 100644
> --- a/Documentation/networking/bonding.txt
> +++ b/Documentation/networking/bonding.txt
> @@ -752,12 +752,22 @@ xmit_hash_policy
>                 protocol information to generate the hash.
>
>                 Uses XOR of hardware MAC addresses and IP addresses to
> -               generate the hash.  The formula is
> +               generate the hash.  The IPv4 formula is
>
>                 (((source IP XOR dest IP) AND 0xffff) XOR
>                         ( source MAC XOR destination MAC ))
>                                 modulo slave count
>
> +               The IPv6 formula is
> +
> +               hash = (source ip quad 2 XOR dest IP quad 2) XOR
> +                      (source ip quad 3 XOR dest IP quad 3) XOR
> +                      (source ip quad 4 XOR dest IP quad 4)
> +
> +               (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
> +                       XOR (source MAC XOR destination MAC))
> +                               modulo slave count
> +
>                 This algorithm will place all traffic to a particular
>                 network peer on the same slave.  For non-IP traffic,
>                 the formula is the same as for the layer2 transmit
> @@ -778,19 +788,29 @@ xmit_hash_policy
>                 slaves, although a single connection will not span
>                 multiple slaves.
>
> -               The formula for unfragmented TCP and UDP packets is
> +               The formula for unfragmented IPv4 TCP and UDP packets is
>
>                 ((source port XOR dest port) XOR
>                          ((source IP XOR dest IP) AND 0xffff)
>                                 modulo slave count
>
> -               For fragmented TCP or UDP packets and all other IP
> -               protocol traffic, the source and destination port
> +               The formula for unfragmented IPv6 TCP and UDP packets is
> +
> +               hash = (source port XOR dest port) XOR
> +                      ((source ip quad 2 XOR dest IP quad 2) XOR
> +                       (source ip quad 3 XOR dest IP quad 3) XOR
> +                       (source ip quad 4 XOR dest IP quad 4))
> +
> +               ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
> +                       modulo slave count
> +
> +               For fragmented TCP or UDP packets and all other IPv4 and
> +               IPv6 protocol traffic, the source and destination port
>                 information is omitted.  For non-IP traffic, the
>                 formula is the same as for the layer2 transmit hash
>                 policy.
>
> -               This policy is intended to mimic the behavior of
> +               The IPv4 policy is intended to mimic the behavior of
>                 certain switches, notably Cisco switches with PFC2 as
>                 well as some Foundry and IBM products.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index d95fbc3..4221e57 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3354,56 +3354,93 @@ static struct notifier_block bond_netdev_notifier = {
>  /*---------------------------- Hashing Policies -----------------------------*/
>
>  /*
> + * Hash for the output device based upon layer 2 data
> + */
> +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
> +{
> +       struct ethhdr *data = (struct ethhdr *)skb->data;
> +
> +       if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
> +               return (data->h_dest[5] ^ data->h_source[5]) % count;
> +
> +       return 0;
> +}
> +
> +/*
>   * Hash for the output device based upon layer 2 and layer 3 data. If
> - * the packet is not IP mimic bond_xmit_hash_policy_l2()
> + * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>   */
>  static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
>  {
>         struct ethhdr *data = (struct ethhdr *)skb->data;
> -       struct iphdr *iph = ip_hdr(skb);
> -
> -       if (skb->protocol == htons(ETH_P_IP)) {
> +       struct iphdr *iph;
> +       struct ipv6hdr *ipv6h;
> +       u32 v6hash;
> +       __be32 *s, *d;
> +
> +       if (skb->protocol == htons(ETH_P_IP) &&
> +           skb_network_header_len(skb) >= sizeof(*iph)) {
> +               iph = ip_hdr(skb);
>                 return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>                         (data->h_dest[5] ^ data->h_source[5])) % count;
> +       } else if (skb->protocol == htons(ETH_P_IPV6) &&
> +                  skb_network_header_len(skb) >= sizeof(*ipv6h)) {
> +               ipv6h = ipv6_hdr(skb);
> +               s = &ipv6h->saddr.s6_addr32[0];
> +               d = &ipv6h->daddr.s6_addr32[0];
> +               v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
> +               v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
> +               return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
>         }
>
> -       return (data->h_dest[5] ^ data->h_source[5]) % count;
> +       return bond_xmit_hash_policy_l2(skb, count);
>  }
>
>  /*
>   * Hash for the output device based upon layer 3 and layer 4 data. If
>   * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
> - * altogether not IP, mimic bond_xmit_hash_policy_l2()
> + * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>   */
>  static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>  {
> -       struct ethhdr *data = (struct ethhdr *)skb->data;
> -       struct iphdr *iph = ip_hdr(skb);
> -       __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
> -       int layer4_xor = 0;
> -
> -       if (skb->protocol == htons(ETH_P_IP)) {
> +       u32 layer4_xor = 0;
> +       struct iphdr *iph;
> +       struct ipv6hdr *ipv6h;
> +       __be32 *s, *d;
> +       __be16 *layer4hdr;
> +
> +       if (skb->protocol == htons(ETH_P_IP) &&
> +           skb_network_header_len(skb) >= sizeof(*iph)) {
> +               iph = ip_hdr(skb);
>                 if (!ip_is_fragment(iph) &&
>                     (iph->protocol == IPPROTO_TCP ||
> -                    iph->protocol == IPPROTO_UDP)) {
> -                       layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
> +                    iph->protocol == IPPROTO_UDP) &&
> +                   (skb_headlen(skb) - skb_network_offset(skb) >=
> +                    iph->ihl * sizeof(u32) + sizeof(*layer4hdr) * 2)) {
> +                       layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
> +                       layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
>                 }
>                 return (layer4_xor ^
>                         ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
> -
> +       } else if (skb->protocol == htons(ETH_P_IPV6) &&
> +                  skb_network_header_len(skb) >= sizeof(*ipv6h)) {
> +               ipv6h = ipv6_hdr(skb);
> +               if ((ipv6h->nexthdr == IPPROTO_TCP ||
> +                    ipv6h->nexthdr == IPPROTO_UDP) &&
> +                   (skb_headlen(skb) - skb_network_offset(skb) >=
> +                    sizeof(*ipv6h) + sizeof(*layer4hdr) * 2)) {
> +                       layer4hdr = (__be16 *)(ipv6h + 1);
> +                       layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
> +               }
> +               s = &ipv6h->saddr.s6_addr32[0];
> +               d = &ipv6h->daddr.s6_addr32[0];
> +               layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
> +               layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
> +                              (layer4_xor >> 8);
> +               return layer4_xor % count;
>         }
>
> -       return (data->h_dest[5] ^ data->h_source[5]) % count;
> -}
> -
> -/*
> - * Hash for the output device based upon layer 2 data
> - */
> -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
> -{
> -       struct ethhdr *data = (struct ethhdr *)skb->data;
> -
> -       return (data->h_dest[5] ^ data->h_source[5]) % count;
> +       return bond_xmit_hash_policy_l2(skb, count);
>  }
>
>  /*-------------------------- Device entry points ----------------------------*/
> --
> 1.7.11
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-08-23 12:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-01  7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham
2012-07-01  7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham
2012-07-01  7:33   ` David Miller
2012-07-01  7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham
2012-07-01  7:34   ` David Miller
2012-07-01  7:42     ` John Eaglesham
2012-07-01  8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham
2012-07-01 10:33   ` David Miller
2012-07-01 19:01     ` John Eaglesham
2012-07-01 19:13   ` [PATCH v6] " John Eaglesham
2012-07-02 23:33     ` Jay Vosburgh
2012-07-03  5:01       ` John Eaglesham
2012-07-03  5:14         ` David Miller
2012-07-03  5:38           ` John Eaglesham
2012-07-03  5:43             ` David Miller
2012-08-21 18:11               ` Jeremy Brookman
2012-08-21 19:19                 ` Jay Vosburgh
2012-08-21 22:21                   ` John Eaglesham
2012-08-22 12:06                     ` Jeremy Brookman
2012-08-23 10:42                   ` Jeremy Brookman
2012-08-22  5:12     ` [PATCH v7] bonding: " John Eaglesham
2012-08-22  5:29       ` David Miller
2012-08-22  6:43       ` [PATCH v8] " John Eaglesham
2012-08-23  5:49         ` David Miller
2012-08-23 12:23         ` Jeremy Brookman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.