* [PATCH v4 0/2] bonding support for IPv6 transmit hashing @ 2012-07-01 7:01 John Eaglesham 2012-07-01 7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: John Eaglesham @ 2012-07-01 7:01 UTC (permalink / raw) To: netdev; +Cc: John Eaglesham Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. The algorithm chosen (xor'ing the bottom three quads and then xor'ing the bottom three bytes of that) was chosen after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. This is an update to a prior patch I submitted. This version includes a clarified description, thorough bounds checking, updates functions to call bond_xmit_hash_policy_l2 rather than re-implement the same logic, incorporates Jay's style suggestions, and patches against net-next. Patch has been tested and performs as expected. John Eaglesham (2): Add support for IPv6 and bounds checking to transmit hashing functions. Update bonding driver documentation to include IPv6 transmit hashing algorithm. Documentation/networking/bonding.txt | 31 ++++++++++-- drivers/net/bonding/bond_main.c | 91 +++++++++++++++++++++++++----------- 2 files changed, 90 insertions(+), 32 deletions(-) -- 1.7.11 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions. 2012-07-01 7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham @ 2012-07-01 7:01 ` John Eaglesham 2012-07-01 7:33 ` David Miller 2012-07-01 7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham 2012-07-01 8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham 2 siblings, 1 reply; 25+ messages in thread From: John Eaglesham @ 2012-07-01 7:01 UTC (permalink / raw) To: netdev; +Cc: John Eaglesham --- drivers/net/bonding/bond_main.c | 91 +++++++++++++++++++++++++++++------------ 1 file changed, 64 insertions(+), 27 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index f5a40b9..b138d84 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3345,56 +3345,93 @@ static struct notifier_block bond_netdev_notifier = { /*---------------------------- Hashing Policies -----------------------------*/ /* + * Hash for the output device based upon layer 2 data + */ +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) +{ + struct ethhdr *data = (struct ethhdr *)skb->data; + + if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) + return (data->h_dest[5] ^ data->h_source[5]) % count; + + return 0; +} + +/* * Hash for the output device based upon layer 2 and layer 3 data. If - * the packet is not IP mimic bond_xmit_hash_policy_l2() + * the packet is not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) { struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); + struct iphdr *iph; + struct ipv6hdr *ipv6h; + u32 v6hash; - if (skb->protocol == htons(ETH_P_IP)) { + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(struct iphdr)) { + iph = ip_hdr(skb); return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ (data->h_dest[5] ^ data->h_source[5])) % count; - } - - return (data->h_dest[5] ^ data->h_source[5]) % count; + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) { + ipv6h = ipv6_hdr(skb); + v6hash = + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); + v6hash = (v6hash >> 16) ^ (v6hash >> 8) ^ v6hash; + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; + } + + return bond_xmit_hash_policy_l2(skb, count); } /* * Hash for the output device based upon layer 3 and layer 4 data. If * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is - * altogether not IP, mimic bond_xmit_hash_policy_l2() + * altogether not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) { - struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); - int layer4_xor = 0; + u32 layer4_xor = 0; + struct iphdr *iph; + struct ipv6hdr *ipv6h; if (skb->protocol == htons(ETH_P_IP)) { + iph = ip_hdr(skb); if (!ip_is_fragment(iph) && - (iph->protocol == IPPROTO_TCP || - iph->protocol == IPPROTO_UDP)) { + (iph->protocol == IPPROTO_TCP || + iph->protocol == IPPROTO_UDP)) { + __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); + if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); + } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { + goto short_header; } - return (layer4_xor ^ - ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; - + return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; + } else if (skb->protocol == htons(ETH_P_IPV6)) { + ipv6h = ipv6_hdr(skb); + if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) { + __be16 *layer4hdrv6 = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr)); + if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; + layer4_xor = (*layer4hdrv6 ^ *(layer4hdrv6 + 1)); + } else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) { + goto short_header; + } + layer4_xor ^= + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); + return ((layer4_xor >> 16) ^ (layer4_xor >> 8) ^ layer4_xor) % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; -} - -/* - * Hash for the output device based upon layer 2 data - */ -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) -{ - struct ethhdr *data = (struct ethhdr *)skb->data; - - return (data->h_dest[5] ^ data->h_source[5]) % count; +short_header: + return bond_xmit_hash_policy_l2(skb, count); } /*-------------------------- Device entry points ----------------------------*/ -- 1.7.11 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions. 2012-07-01 7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham @ 2012-07-01 7:33 ` David Miller 0 siblings, 0 replies; 25+ messages in thread From: David Miller @ 2012-07-01 7:33 UTC (permalink / raw) To: linux; +Cc: netdev From: John Eaglesham <linux@8192.net> Date: Sun, 1 Jul 2012 00:01:38 -0700 > - if (skb->protocol == htons(ETH_P_IP)) { > + if (skb->protocol == htons(ETH_P_IP) && > + skb_network_header_len(skb) >= sizeof(struct iphdr)) { This is not indented properly, the goal isn't to use only TAB characters to indent, the goal it to line things up right after the openning parenthesis on the previous line, so this should be: if (skb->protocol == htons(ETH_P_IP) && skb_network_header_len(skb) >= sizeof(struct iphdr)) { > + } else if (skb->protocol == htons(ETH_P_IPV6) && > + skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) { Likewise, in this case you even under-indented it. > + v6hash = > + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ > + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ > + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); This is rediculous, just put &ipv6h->saddr into a local pointer named 's' and then you won't have use such gymnastics to indent the code. > if (!ip_is_fragment(iph) && > - (iph->protocol == IPPROTO_TCP || > - iph->protocol == IPPROTO_UDP)) { > + (iph->protocol == IPPROTO_TCP || > + iph->protocol == IPPROTO_UDP)) { This is what _REALLY_ bothers me. You took an existing conditional which _WAS_ indented properly and you made erroneously re-indented. In fact your only change here is to break the indentation. Just remove these changes entirely. > + if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > > + skb_headlen(skb) - skb_network_offset(skb)) Similarly, this is indented improperly. > + } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { > + goto short_header; > } Single line basic blocks do not use openning and closing braces. > - return (layer4_xor ^ > - ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; > - > + return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; Don't reindent code unless you can do it properly, now this line is way over 80 columns. The previous code was perfectly fine, leave it alone. > + if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) { Line is too long, write it as: if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) { > + __be16 *layer4hdrv6 = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr)); Likewise, line is too long. Just do the variable declaration seperate from the assignment: __be16 *layer4hdrv6; layer4hdrv6 = BLAH BLAH BLAH; > + if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 > > + skb_headlen(skb) - skb_network_offset(skb)) Improperly indented, fix. > + } else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) { > + goto short_header; > + } Since line basic block, no braces. > + layer4_xor ^= > + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ > + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ > + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); This is gross, do as I said above using a local pointer variable. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm. 2012-07-01 7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham 2012-07-01 7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham @ 2012-07-01 7:01 ` John Eaglesham 2012-07-01 7:34 ` David Miller 2012-07-01 8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham 2 siblings, 1 reply; 25+ messages in thread From: John Eaglesham @ 2012-07-01 7:01 UTC (permalink / raw) To: netdev; +Cc: John Eaglesham --- Documentation/networking/bonding.txt | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index bfea8a3..5db14fe 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -752,12 +752,22 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The formula is + generate the hash. The IPv4 formula is (((source IP XOR dest IP) AND 0xffff) XOR ( source MAC XOR destination MAC )) modulo slave count + The IPv6 forumla is + + iphash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + ((iphash >> 16) XOR (iphash >> 8) XOR iphash) + modulo slave count + This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit @@ -778,19 +788,30 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. - The formula for unfragmented TCP and UDP packets is + The formula for unfragmented IPv4 TCP and UDP packets is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff) modulo slave count - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + The formula for unfragmented IPv6 TCP and UDP packets is + + iphash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + ((source port XOR dest port) XOR + (iphash >> 16) XOR (iphash >> 8) XOR iphash) + modulo slave count + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of + The IPv4 policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products. -- 1.7.11 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm. 2012-07-01 7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham @ 2012-07-01 7:34 ` David Miller 2012-07-01 7:42 ` John Eaglesham 0 siblings, 1 reply; 25+ messages in thread From: David Miller @ 2012-07-01 7:34 UTC (permalink / raw) To: linux; +Cc: netdev I think you should combine this into the first patch, there is no reason to separate these two changes. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm. 2012-07-01 7:34 ` David Miller @ 2012-07-01 7:42 ` John Eaglesham 0 siblings, 0 replies; 25+ messages in thread From: John Eaglesham @ 2012-07-01 7:42 UTC (permalink / raw) To: David Miller; +Cc: netdev On 7/1/2012 12:34 AM, David Miller wrote: > > I think you should combine this into the first patch, there is no > reason to separate these two changes. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Thanks, I will do that. John ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v5] bonding support for IPv6 transmit hashing 2012-07-01 7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham 2012-07-01 7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham 2012-07-01 7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham @ 2012-07-01 8:07 ` John Eaglesham 2012-07-01 10:33 ` David Miller 2012-07-01 19:13 ` [PATCH v6] " John Eaglesham 2 siblings, 2 replies; 25+ messages in thread From: John Eaglesham @ 2012-07-01 8:07 UTC (permalink / raw) To: netdev; +Cc: John Eaglesham Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. The algorithm chosen (xor'ing the bottom three quads and then xor'ing the bottom three bytes of that) was chosen after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. This is an update to a prior patch I submitted. This version includes a clarified description, thorough bounds checking, updates functions to call bond_xmit_hash_policy_l2 rather than re-implement the same logic, incorporates Jay's style suggestions, patches against net-next, and squashes the documentation and code patch into one. Patch has been tested and performs as expected. John Eaglesham --- Documentation/networking/bonding.txt | 31 ++++++++++-- drivers/net/bonding/bond_main.c | 91 +++++++++++++++++++++++++----------- 2 files changed, 90 insertions(+), 32 deletions(-) diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index bfea8a3..5db14fe 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -752,12 +752,22 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The formula is + generate the hash. The IPv4 formula is (((source IP XOR dest IP) AND 0xffff) XOR ( source MAC XOR destination MAC )) modulo slave count + The IPv6 forumla is + + iphash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + ((iphash >> 16) XOR (iphash >> 8) XOR iphash) + modulo slave count + This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit @@ -778,19 +788,30 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. - The formula for unfragmented TCP and UDP packets is + The formula for unfragmented IPv4 TCP and UDP packets is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff) modulo slave count - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + The formula for unfragmented IPv6 TCP and UDP packets is + + iphash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + ((source port XOR dest port) XOR + (iphash >> 16) XOR (iphash >> 8) XOR iphash) + modulo slave count + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of + The IPv4 policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index f5a40b9..b138d84 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3345,56 +3345,93 @@ static struct notifier_block bond_netdev_notifier = { /*---------------------------- Hashing Policies -----------------------------*/ /* + * Hash for the output device based upon layer 2 data + */ +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) +{ + struct ethhdr *data = (struct ethhdr *)skb->data; + + if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) + return (data->h_dest[5] ^ data->h_source[5]) % count; + + return 0; +} + +/* * Hash for the output device based upon layer 2 and layer 3 data. If - * the packet is not IP mimic bond_xmit_hash_policy_l2() + * the packet is not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) { struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); + struct iphdr *iph; + struct ipv6hdr *ipv6h; + u32 v6hash; - if (skb->protocol == htons(ETH_P_IP)) { + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(struct iphdr)) { + iph = ip_hdr(skb); return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ (data->h_dest[5] ^ data->h_source[5])) % count; - } - - return (data->h_dest[5] ^ data->h_source[5]) % count; + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) { + ipv6h = ipv6_hdr(skb); + v6hash = + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); + v6hash = (v6hash >> 16) ^ (v6hash >> 8) ^ v6hash; + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; + } + + return bond_xmit_hash_policy_l2(skb, count); } /* * Hash for the output device based upon layer 3 and layer 4 data. If * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is - * altogether not IP, mimic bond_xmit_hash_policy_l2() + * altogether not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) { - struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); - int layer4_xor = 0; + u32 layer4_xor = 0; + struct iphdr *iph; + struct ipv6hdr *ipv6h; if (skb->protocol == htons(ETH_P_IP)) { + iph = ip_hdr(skb); if (!ip_is_fragment(iph) && - (iph->protocol == IPPROTO_TCP || - iph->protocol == IPPROTO_UDP)) { + (iph->protocol == IPPROTO_TCP || + iph->protocol == IPPROTO_UDP)) { + __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); + if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); + } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { + goto short_header; } - return (layer4_xor ^ - ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; - + return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; + } else if (skb->protocol == htons(ETH_P_IPV6)) { + ipv6h = ipv6_hdr(skb); + if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) { + __be16 *layer4hdrv6 = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr)); + if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; + layer4_xor = (*layer4hdrv6 ^ *(layer4hdrv6 + 1)); + } else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) { + goto short_header; + } + layer4_xor ^= + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); + return ((layer4_xor >> 16) ^ (layer4_xor >> 8) ^ layer4_xor) % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; -} - -/* - * Hash for the output device based upon layer 2 data - */ -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) -{ - struct ethhdr *data = (struct ethhdr *)skb->data; - - return (data->h_dest[5] ^ data->h_source[5]) % count; +short_header: + return bond_xmit_hash_policy_l2(skb, count); } /*-------------------------- Device entry points ----------------------------*/ -- 1.7.11 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v5] bonding support for IPv6 transmit hashing 2012-07-01 8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham @ 2012-07-01 10:33 ` David Miller 2012-07-01 19:01 ` John Eaglesham 2012-07-01 19:13 ` [PATCH v6] " John Eaglesham 1 sibling, 1 reply; 25+ messages in thread From: David Miller @ 2012-07-01 10:33 UTC (permalink / raw) To: linux; +Cc: netdev From: John Eaglesham <linux@8192.net> Date: Sun, 1 Jul 2012 01:07:43 -0700 > + v6hash = > + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ > + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ > + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); You completely ignored my feedback: This is rediculous, just put &ipv6h->saddr into a local pointer named 's' and then you won't have use such gymnastics to indent the code. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v5] bonding support for IPv6 transmit hashing 2012-07-01 10:33 ` David Miller @ 2012-07-01 19:01 ` John Eaglesham 0 siblings, 0 replies; 25+ messages in thread From: John Eaglesham @ 2012-07-01 19:01 UTC (permalink / raw) To: David Miller; +Cc: netdev On 7/1/2012 3:33 AM, David Miller wrote: > From: John Eaglesham <linux@8192.net> > Date: Sun, 1 Jul 2012 01:07:43 -0700 > >> + v6hash = >> + (ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^ >> + (ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^ >> + (ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]); > > You completely ignored my feedback: > > This is rediculous, just put &ipv6h->saddr into a local > pointer named 's' and then you won't have use such gymnastics > to indent the code. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > My apologies, I completely misunderstood your feedback. I will make the correction and re-submit. Thanks. John ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v6] bonding support for IPv6 transmit hashing 2012-07-01 8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham 2012-07-01 10:33 ` David Miller @ 2012-07-01 19:13 ` John Eaglesham 2012-07-02 23:33 ` Jay Vosburgh 2012-08-22 5:12 ` [PATCH v7] bonding: " John Eaglesham 1 sibling, 2 replies; 25+ messages in thread From: John Eaglesham @ 2012-07-01 19:13 UTC (permalink / raw) To: netdev; +Cc: John Eaglesham Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. This is an update to prior patches I submitted. This version includes: * Updated and clarified description * IPv6 algorithm more closely matches that of IPv4 * Thorough bounds checking on all xmit functions * Consolidate layer 2 hashing logic into one function * Update style as per Jay Vosburgh and David Miller * Patches against net-next as one patch Patch has been tested and performs as expected. John Eaglesham --- Documentation/networking/bonding.txt | 32 +++++++++++-- drivers/net/bonding/bond_main.c | 91 +++++++++++++++++++++++++----------- 2 files changed, 92 insertions(+), 31 deletions(-) diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index bfea8a3..3851dad 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -752,12 +752,23 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The formula is + generate the hash. The IPv4 formula is (((source IP XOR dest IP) AND 0xffff) XOR ( source MAC XOR destination MAC )) modulo slave count + The IPv6 formula is + + hash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + (source MAC XOR destination MAC)) + modulo slave count + This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit @@ -778,19 +789,30 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. - The formula for unfragmented TCP and UDP packets is + The formula for unfragmented IPv4 TCP and UDP packets is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff) modulo slave count - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + The formula for unfragmented IPv6 TCP and UDP packets is + + hash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + ((source port XOR dest port) XOR + (hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + modulo slave count + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of + The IPv4 policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index f5a40b9..c733d55 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3345,56 +3345,95 @@ static struct notifier_block bond_netdev_notifier = { /*---------------------------- Hashing Policies -----------------------------*/ /* + * Hash for the output device based upon layer 2 data + */ +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) +{ + struct ethhdr *data = (struct ethhdr *)skb->data; + + if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) + return (data->h_dest[5] ^ data->h_source[5]) % count; + + return 0; +} + +/* * Hash for the output device based upon layer 2 and layer 3 data. If - * the packet is not IP mimic bond_xmit_hash_policy_l2() + * the packet is not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) { struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - - if (skb->protocol == htons(ETH_P_IP)) { + struct iphdr *iph; + struct ipv6hdr *ipv6h; + u32 v6hash; + __be32 *s, *d; + + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(struct iphdr)) { + iph = ip_hdr(skb); return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ (data->h_dest[5] ^ data->h_source[5])) % count; + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) { + ipv6h = ipv6_hdr(skb); + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8); + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; + return bond_xmit_hash_policy_l2(skb, count); } /* * Hash for the output device based upon layer 3 and layer 4 data. If * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is - * altogether not IP, mimic bond_xmit_hash_policy_l2() + * altogether not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) { - struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); - int layer4_xor = 0; + u32 layer4_xor = 0; + struct iphdr *iph; + struct ipv6hdr *ipv6h; + __be32 *s, *d; + __be16 *layer4hdr; if (skb->protocol == htons(ETH_P_IP)) { + iph = ip_hdr(skb); if (!ip_is_fragment(iph) && - (iph->protocol == IPPROTO_TCP || - iph->protocol == IPPROTO_UDP)) { + (iph->protocol == IPPROTO_TCP || + iph->protocol == IPPROTO_UDP)) { + layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); + if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); + } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { + goto short_header; } - return (layer4_xor ^ - ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; - + return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; + } else if (skb->protocol == htons(ETH_P_IPV6)) { + ipv6h = ipv6_hdr(skb); + if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) { + layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr)); + if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; + layer4_xor = (*layer4hdr ^ *(layer4hdr + 1)); + } else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) { + goto short_header; + } + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ (layer4_xor >> 8); + return layer4_xor % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; -} - -/* - * Hash for the output device based upon layer 2 data - */ -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) -{ - struct ethhdr *data = (struct ethhdr *)skb->data; - - return (data->h_dest[5] ^ data->h_source[5]) % count; +short_header: + return bond_xmit_hash_policy_l2(skb, count); } /*-------------------------- Device entry points ----------------------------*/ -- 1.7.11 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-07-01 19:13 ` [PATCH v6] " John Eaglesham @ 2012-07-02 23:33 ` Jay Vosburgh 2012-07-03 5:01 ` John Eaglesham 2012-08-22 5:12 ` [PATCH v7] bonding: " John Eaglesham 1 sibling, 1 reply; 25+ messages in thread From: Jay Vosburgh @ 2012-07-02 23:33 UTC (permalink / raw) To: John Eaglesham; +Cc: netdev John Eaglesham <linux@8192.net> wrote: >Currently the "bonding" driver does not support load balancing outgoing >traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) >are currently supported; this patch adds transmit hashing for IPv6 (and >TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the >bonding driver. > >The algorithm chosen (xor'ing the bottom three quads of the source and >destination addresses together, then xor'ing each byte of that result into >the bottom byte, finally xor'ing with the last bytes of the MAC addresses) >was selected after testing almost 400,000 unique IPv6 addresses harvested >from server logs. This algorithm had the most even distribution for both >big- and little-endian architectures while still using few instructions. Its >behavior also attempts to closely match that of the IPv4 algorithm. > >The IPv6 flow label was intentionally not included in the hash as it appears >to be unset in the vast majority of IPv6 traffic sampled, and the current >algorithm not using the flow label already offers a very even distribution. > >Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, >ie, they are not balanced based on layer 4 information. Additionally, >IPv6 packets with intermediate headers are not balanced based on layer >4 information. In practice these intermediate headers are not common and >this should not cause any problems, and the alternative (a packet-parsing >loop and look-up table) seemed slow and complicated for little gain. > >This is an update to prior patches I submitted. This version includes: >* Updated and clarified description >* IPv6 algorithm more closely matches that of IPv4 >* Thorough bounds checking on all xmit functions >* Consolidate layer 2 hashing logic into one function >* Update style as per Jay Vosburgh and David Miller >* Patches against net-next as one patch > >Patch has been tested and performs as expected. > >John Eaglesham > >--- > Documentation/networking/bonding.txt | 32 +++++++++++-- > drivers/net/bonding/bond_main.c | 91 +++++++++++++++++++++++++----------- > 2 files changed, 92 insertions(+), 31 deletions(-) > >diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt >index bfea8a3..3851dad 100644 >--- a/Documentation/networking/bonding.txt >+++ b/Documentation/networking/bonding.txt >@@ -752,12 +752,23 @@ xmit_hash_policy > protocol information to generate the hash. > > Uses XOR of hardware MAC addresses and IP addresses to >- generate the hash. The formula is >+ generate the hash. The IPv4 formula is > > (((source IP XOR dest IP) AND 0xffff) XOR > ( source MAC XOR destination MAC )) > modulo slave count > >+ The IPv6 formula is >+ >+ hash = >+ (source ip quad 2 XOR dest IP quad 2) XOR >+ (source ip quad 3 XOR dest IP quad 3) XOR >+ (source ip quad 4 XOR dest IP quad 4) >+ >+ (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) >+ (source MAC XOR destination MAC)) >+ modulo slave count This seems to be missing an XOR, between the end of "XOR hash)" and the start of "(source MAC". >+ > This algorithm will place all traffic to a particular > network peer on the same slave. For non-IP traffic, > the formula is the same as for the layer2 transmit >@@ -778,19 +789,30 @@ xmit_hash_policy > slaves, although a single connection will not span > multiple slaves. > >- The formula for unfragmented TCP and UDP packets is >+ The formula for unfragmented IPv4 TCP and UDP packets is > > ((source port XOR dest port) XOR > ((source IP XOR dest IP) AND 0xffff) > modulo slave count > >- For fragmented TCP or UDP packets and all other IP >- protocol traffic, the source and destination port >+ The formula for unfragmented IPv6 TCP and UDP packets is >+ >+ hash = >+ (source ip quad 2 XOR dest IP quad 2) XOR >+ (source ip quad 3 XOR dest IP quad 3) XOR >+ (source ip quad 4 XOR dest IP quad 4) >+ >+ ((source port XOR dest port) XOR >+ (hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) >+ modulo slave count >+ >+ For fragmented TCP or UDP packets and all other IPv4 and >+ IPv6 protocol traffic, the source and destination port > information is omitted. For non-IP traffic, the > formula is the same as for the layer2 transmit hash > policy. > >- This policy is intended to mimic the behavior of >+ The IPv4 policy is intended to mimic the behavior of > certain switches, notably Cisco switches with PFC2 as > well as some Foundry and IBM products. > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index f5a40b9..c733d55 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -3345,56 +3345,95 @@ static struct notifier_block bond_netdev_notifier = { > /*---------------------------- Hashing Policies -----------------------------*/ > > /* >+ * Hash for the output device based upon layer 2 data >+ */ >+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) >+{ >+ struct ethhdr *data = (struct ethhdr *)skb->data; >+ >+ if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) >+ return (data->h_dest[5] ^ data->h_source[5]) % count; >+ >+ return 0; >+} >+ >+/* > * Hash for the output device based upon layer 2 and layer 3 data. If >- * the packet is not IP mimic bond_xmit_hash_policy_l2() >+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2() > */ > static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) > { > struct ethhdr *data = (struct ethhdr *)skb->data; >- struct iphdr *iph = ip_hdr(skb); >- >- if (skb->protocol == htons(ETH_P_IP)) { >+ struct iphdr *iph; >+ struct ipv6hdr *ipv6h; >+ u32 v6hash; >+ __be32 *s, *d; >+ >+ if (skb->protocol == htons(ETH_P_IP) && >+ skb_network_header_len(skb) >= sizeof(struct iphdr)) { >+ iph = ip_hdr(skb); > return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ > (data->h_dest[5] ^ data->h_source[5])) % count; >+ } else if (skb->protocol == htons(ETH_P_IPV6) && >+ skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) { >+ ipv6h = ipv6_hdr(skb); >+ s = &ipv6h->saddr.s6_addr32[0]; >+ d = &ipv6h->daddr.s6_addr32[0]; >+ v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); >+ v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8); >+ return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; > } > >- return (data->h_dest[5] ^ data->h_source[5]) % count; >+ return bond_xmit_hash_policy_l2(skb, count); > } > > /* > * Hash for the output device based upon layer 3 and layer 4 data. If > * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is >- * altogether not IP, mimic bond_xmit_hash_policy_l2() >+ * altogether not IP, fall back on bond_xmit_hash_policy_l2() > */ > static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) > { >- struct ethhdr *data = (struct ethhdr *)skb->data; >- struct iphdr *iph = ip_hdr(skb); >- __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); >- int layer4_xor = 0; >+ u32 layer4_xor = 0; >+ struct iphdr *iph; >+ struct ipv6hdr *ipv6h; >+ __be32 *s, *d; >+ __be16 *layer4hdr; > > if (skb->protocol == htons(ETH_P_IP)) { >+ iph = ip_hdr(skb); > if (!ip_is_fragment(iph) && >- (iph->protocol == IPPROTO_TCP || >- iph->protocol == IPPROTO_UDP)) { >+ (iph->protocol == IPPROTO_TCP || >+ iph->protocol == IPPROTO_UDP)) { Why did these two lines change? >+ layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); >+ if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > >+ skb_headlen(skb) - skb_network_offset(skb)) >+ goto short_header; > layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); >+ } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { >+ goto short_header; > } >- return (layer4_xor ^ >- ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; >- >+ return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; This line runs past 80 columns. There are a few more of these further down. >+ } else if (skb->protocol == htons(ETH_P_IPV6)) { >+ ipv6h = ipv6_hdr(skb); >+ if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) { >+ layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr)); Could this be written as layer4hdr = (__be16 *)(ipv6h + 1); instead? -J >+ if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 > >+ skb_headlen(skb) - skb_network_offset(skb)) >+ goto short_header; >+ layer4_xor = (*layer4hdr ^ *(layer4hdr + 1)); >+ } else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) { >+ goto short_header; >+ } >+ s = &ipv6h->saddr.s6_addr32[0]; >+ d = &ipv6h->daddr.s6_addr32[0]; >+ layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); >+ layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ (layer4_xor >> 8); >+ return layer4_xor % count; > } > >- return (data->h_dest[5] ^ data->h_source[5]) % count; >-} >- >-/* >- * Hash for the output device based upon layer 2 data >- */ >-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) >-{ >- struct ethhdr *data = (struct ethhdr *)skb->data; >- >- return (data->h_dest[5] ^ data->h_source[5]) % count; >+short_header: >+ return bond_xmit_hash_policy_l2(skb, count); > } > > /*-------------------------- Device entry points ----------------------------*/ >-- >1.7.11 --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-07-02 23:33 ` Jay Vosburgh @ 2012-07-03 5:01 ` John Eaglesham 2012-07-03 5:14 ` David Miller 0 siblings, 1 reply; 25+ messages in thread From: John Eaglesham @ 2012-07-03 5:01 UTC (permalink / raw) To: Jay Vosburgh; +Cc: netdev On 7/2/2012 4:33 PM, Jay Vosburgh wrote: >> + >> + (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) >> + (source MAC XOR destination MAC)) >> + modulo slave count > > This seems to be missing an XOR, between the end of "XOR hash)" > and the start of "(source MAC". > You're correct. >> if (skb->protocol == htons(ETH_P_IP)) { >> + iph = ip_hdr(skb); >> if (!ip_is_fragment(iph) && >> - (iph->protocol == IPPROTO_TCP || >> - iph->protocol == IPPROTO_UDP)) { >> + (iph->protocol == IPPROTO_TCP || >> + iph->protocol == IPPROTO_UDP)) { > > Why did these two lines change? > I replaced the mixed tabs and spaces with all tabs when I updated that function, but in retrospect the tabs and spaces were likely intentional. I will revert. >> + layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); >> + if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > >> + skb_headlen(skb) - skb_network_offset(skb)) >> + goto short_header; >> layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); >> + } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { >> + goto short_header; >> } >> - return (layer4_xor ^ >> - ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; >> - >> + return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; > > This line runs past 80 columns. There are a few more of these > further down. > I will double-check this. >> + } else if (skb->protocol == htons(ETH_P_IPV6)) { >> + ipv6h = ipv6_hdr(skb); >> + if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) { >> + layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr)); > > Could this be written as > > layer4hdr = (__be16 *)(ipv6h + 1); > > instead? > > -J > Yes, I can make that change. Thanks. John ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-07-03 5:01 ` John Eaglesham @ 2012-07-03 5:14 ` David Miller 2012-07-03 5:38 ` John Eaglesham 0 siblings, 1 reply; 25+ messages in thread From: David Miller @ 2012-07-03 5:14 UTC (permalink / raw) To: linux; +Cc: fubar, netdev From: John Eaglesham <linux@8192.net> Date: Mon, 02 Jul 2012 22:01:20 -0700 > I replaced the mixed tabs and spaces with all tabs when I updated that > function, but in retrospect the tabs and spaces were likely > intentional. I will revert. They were. Don't change existing coding style unless you are certain it is wrong. Besides any such changes are completely outside of the scope of the changes you are making, so you should have left them out in any event. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-07-03 5:14 ` David Miller @ 2012-07-03 5:38 ` John Eaglesham 2012-07-03 5:43 ` David Miller 0 siblings, 1 reply; 25+ messages in thread From: John Eaglesham @ 2012-07-03 5:38 UTC (permalink / raw) To: David Miller; +Cc: fubar, netdev On 7/2/2012 10:14 PM, David Miller wrote: > From: John Eaglesham <linux@8192.net> > Date: Mon, 02 Jul 2012 22:01:20 -0700 > >> I replaced the mixed tabs and spaces with all tabs when I updated that >> function, but in retrospect the tabs and spaces were likely >> intentional. I will revert. > > They were. Don't change existing coding style unless you are certain > it is wrong. > > Besides any such changes are completely outside of the scope of > the changes you are making, so you should have left them out in > any event. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Should the indents in the code I am adding follow the style guide and use only tabs, or follow the style already present and mix spaces and tabs for indentation? Thanks, John ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-07-03 5:38 ` John Eaglesham @ 2012-07-03 5:43 ` David Miller 2012-08-21 18:11 ` Jeremy Brookman 0 siblings, 1 reply; 25+ messages in thread From: David Miller @ 2012-07-03 5:43 UTC (permalink / raw) To: linux; +Cc: fubar, netdev From: John Eaglesham <linux@8192.net> Date: Mon, 02 Jul 2012 22:38:05 -0700 > On 7/2/2012 10:14 PM, David Miller wrote: >> From: John Eaglesham <linux@8192.net> >> Date: Mon, 02 Jul 2012 22:01:20 -0700 >> >>> I replaced the mixed tabs and spaces with all tabs when I updated that >>> function, but in retrospect the tabs and spaces were likely >>> intentional. I will revert. >> >> They were. Don't change existing coding style unless you are certain >> it is wrong. >> >> Besides any such changes are completely outside of the scope of >> the changes you are making, so you should have left them out in >> any event. >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > Should the indents in the code I am adding follow the style guide and > use only tabs, or follow the style already present and mix spaces and > tabs for indentation? You should use a mix of tabs, as necessary, to get things to line up how I told you they need to line up. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-07-03 5:43 ` David Miller @ 2012-08-21 18:11 ` Jeremy Brookman 2012-08-21 19:19 ` Jay Vosburgh 0 siblings, 1 reply; 25+ messages in thread From: Jeremy Brookman @ 2012-08-21 18:11 UTC (permalink / raw) To: linux, netdev > You should use a mix of tabs, as necessary, to get things to line up > how I told you they need to line up. Unless I'm missing something, this change doesn't seem to have made it through to the kernel tip, but we could really use this bugfix. Is it in a repository I didn't notice, or not yet through the review? If it's not through the review, is any help needed to get it there? Regards, Jeremy Brookman ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-08-21 18:11 ` Jeremy Brookman @ 2012-08-21 19:19 ` Jay Vosburgh 2012-08-21 22:21 ` John Eaglesham 2012-08-23 10:42 ` Jeremy Brookman 0 siblings, 2 replies; 25+ messages in thread From: Jay Vosburgh @ 2012-08-21 19:19 UTC (permalink / raw) To: Jeremy Brookman; +Cc: linux, netdev Jeremy Brookman <jeremy.brookman@gmail.com> wrote: >> You should use a mix of tabs, as necessary, to get things to line up >> how I told you they need to line up. > >Unless I'm missing something, this change doesn't seem to have made it >through to the kernel tip, but we could really use this bugfix. Is it >in a repository I didn't notice, or not yet through the review? If >it's not through the review, is any help needed to get it there? The submitter (John Eaglesham) never posted an updated version that addressed the various comments, nor did his original patch submission include a Signed-off-by. I went ahead and updated the patch to address the comments; I've only compile tested this. Are you (Jeremy or John) able to test this to confirm that it will hash ipv6 traffic as expected (I can test it, but it won't be today)? John, can you post a Signed-off-by for your patch (really, this updated version of your patch)? If John signs off and somebody tests this, I'll post a formal submssion with the full commit message. -J diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 6b1c711..834c919 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -752,12 +752,23 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The formula is + generate the hash. The IPv4 formula is (((source IP XOR dest IP) AND 0xffff) XOR ( source MAC XOR destination MAC )) modulo slave count + The IPv6 formula is + + hash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + XOR (source MAC XOR destination MAC)) + modulo slave count + This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit @@ -778,19 +789,30 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. - The formula for unfragmented TCP and UDP packets is + The formula for unfragmented IPv4 TCP and UDP packets is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff) modulo slave count - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + The formula for unfragmented IPv6 TCP and UDP packets is + + hash = + (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + ((source port XOR dest port) XOR + (hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + modulo slave count + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of + The IPv4 policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index d95fbc3..4dfe7e3 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3354,56 +3354,99 @@ static struct notifier_block bond_netdev_notifier = { /*---------------------------- Hashing Policies -----------------------------*/ /* + * Hash for the output device based upon layer 2 data + */ +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) +{ + struct ethhdr *data = (struct ethhdr *)skb->data; + + if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) + return (data->h_dest[5] ^ data->h_source[5]) % count; + + return 0; +} + +/* * Hash for the output device based upon layer 2 and layer 3 data. If - * the packet is not IP mimic bond_xmit_hash_policy_l2() + * the packet is not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) { struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - - if (skb->protocol == htons(ETH_P_IP)) { + struct iphdr *iph; + struct ipv6hdr *ipv6h; + u32 v6hash; + __be32 *s, *d; + + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(struct iphdr)) { + iph = ip_hdr(skb); return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ (data->h_dest[5] ^ data->h_source[5])) % count; + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) { + ipv6h = ipv6_hdr(skb); + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8); + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; + return bond_xmit_hash_policy_l2(skb, count); } /* * Hash for the output device based upon layer 3 and layer 4 data. If * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is - * altogether not IP, mimic bond_xmit_hash_policy_l2() + * altogether not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) { - struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); - int layer4_xor = 0; + u32 layer4_xor = 0; + struct iphdr *iph; + struct ipv6hdr *ipv6h; + __be32 *s, *d; + __be16 *layer4hdr; if (skb->protocol == htons(ETH_P_IP)) { + iph = ip_hdr(skb); if (!ip_is_fragment(iph) && (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP)) { + if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; + layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); + } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { + goto short_header; } return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; - + } else if (skb->protocol == htons(ETH_P_IPV6)) { + ipv6h = ipv6_hdr(skb); + if (ipv6h->nexthdr == IPPROTO_TCP || + ipv6h->nexthdr == IPPROTO_UDP) { + layer4hdr = (__be16 *)(ipv6h + 1); + if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 > + skb_headlen(skb) - skb_network_offset(skb)) + goto short_header; + layer4_xor = (*layer4hdr ^ *(layer4hdr + 1)); + } else if (skb_network_header_len(skb) < + sizeof(struct ipv6hdr)) { + goto short_header; + } + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ + (layer4_xor >> 8); + return layer4_xor % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; -} - -/* - * Hash for the output device based upon layer 2 data - */ -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) -{ - struct ethhdr *data = (struct ethhdr *)skb->data; - - return (data->h_dest[5] ^ data->h_source[5]) % count; +short_header: + return bond_xmit_hash_policy_l2(skb, count); } /*-------------------------- Device entry points ----------------------------*/ --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-08-21 19:19 ` Jay Vosburgh @ 2012-08-21 22:21 ` John Eaglesham 2012-08-22 12:06 ` Jeremy Brookman 2012-08-23 10:42 ` Jeremy Brookman 1 sibling, 1 reply; 25+ messages in thread From: John Eaglesham @ 2012-08-21 22:21 UTC (permalink / raw) To: Jay Vosburgh; +Cc: Jeremy Brookman, netdev On 8/21/2012 12:19 PM, Jay Vosburgh wrote: > Jeremy Brookman <jeremy.brookman@gmail.com> wrote: > >>> You should use a mix of tabs, as necessary, to get things to line up >>> how I told you they need to line up. >> >> Unless I'm missing something, this change doesn't seem to have made it >> through to the kernel tip, but we could really use this bugfix. Is it >> in a repository I didn't notice, or not yet through the review? If >> it's not through the review, is any help needed to get it there? > > The submitter (John Eaglesham) never posted an updated version > that addressed the various comments, nor did his original patch > submission include a Signed-off-by. > > I went ahead and updated the patch to address the comments; I've > only compile tested this. Are you (Jeremy or John) able to test this to > confirm that it will hash ipv6 traffic as expected (I can test it, but > it won't be today)? > > John, can you post a Signed-off-by for your patch (really, this > updated version of your patch)? > > If John signs off and somebody tests this, I'll post a formal > submssion with the full commit message. > > -J > Since my last submission I ended up making some changes on my end to streamline the logic. I can fold together my patch with yours and test them later tonight. If everything looks good I'll post the changes back to the list. Thanks! John ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-08-21 22:21 ` John Eaglesham @ 2012-08-22 12:06 ` Jeremy Brookman 0 siblings, 0 replies; 25+ messages in thread From: Jeremy Brookman @ 2012-08-22 12:06 UTC (permalink / raw) To: John Eaglesham; +Cc: Jay Vosburgh, netdev >> If John signs off and somebody tests this, I'll post a formal >> submssion with the full commit message. > > Since my last submission I ended up making some changes on my end to > streamline the logic. I can fold together my patch with yours and test them > later tonight. If everything looks good I'll post the changes back to the > list. Great - thanks for that Jay/John; will look forward to the latest patch later. I'm actually looking at the 2.6.32 branch so have tested a backport of Jay's patch (which only took a couple of very minor modifications); a quick test on an 8-port bond with layer3+4 hashing looked fine. (Will hold off until the final patch before doing more testing.) Regards, Jeremy On Tue, Aug 21, 2012 at 11:21 PM, John Eaglesham <linux@8192.net> wrote: > On 8/21/2012 12:19 PM, Jay Vosburgh wrote: >> >> Jeremy Brookman <jeremy.brookman@gmail.com> wrote: >> >>>> You should use a mix of tabs, as necessary, to get things to line up >>>> how I told you they need to line up. >>> >>> >>> Unless I'm missing something, this change doesn't seem to have made it >>> through to the kernel tip, but we could really use this bugfix. Is it >>> in a repository I didn't notice, or not yet through the review? If >>> it's not through the review, is any help needed to get it there? >> >> >> The submitter (John Eaglesham) never posted an updated version >> that addressed the various comments, nor did his original patch >> submission include a Signed-off-by. >> >> I went ahead and updated the patch to address the comments; I've >> only compile tested this. Are you (Jeremy or John) able to test this to >> confirm that it will hash ipv6 traffic as expected (I can test it, but >> it won't be today)? >> >> John, can you post a Signed-off-by for your patch (really, this >> updated version of your patch)? >> ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6] bonding support for IPv6 transmit hashing 2012-08-21 19:19 ` Jay Vosburgh 2012-08-21 22:21 ` John Eaglesham @ 2012-08-23 10:42 ` Jeremy Brookman 1 sibling, 0 replies; 25+ messages in thread From: Jeremy Brookman @ 2012-08-23 10:42 UTC (permalink / raw) To: Jay Vosburgh; +Cc: linux, netdev Hi, A few questions on the actual patch inline now I've had a bit more time... > static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) ... > + if (skb->protocol == htons(ETH_P_IP) && > + skb_network_header_len(skb) >= sizeof(struct iphdr)) { > + iph = ip_hdr(skb); > return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ > (data->h_dest[5] ^ data->h_source[5])) % count; > + } else if (skb->protocol == htons(ETH_P_IPV6) && > + skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) { > + ipv6h = ipv6_hdr(skb); > + s = &ipv6h->saddr.s6_addr32[0]; > + d = &ipv6h->daddr.s6_addr32[0]; > + v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); > + v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8); > + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; > } > If the IPv4 case needs an ntohl, does the IPv6 case (if we're interpreting the address as 4 32-bits)? If IPv4 hashing algorithm is consistent across different endiannesses, then maybe IPv6 should be too? > /* > * Hash for the output device based upon layer 3 and layer 4 data. If > * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is > - * altogether not IP, mimic bond_xmit_hash_policy_l2() > + * altogether not IP, fall back on bond_xmit_hash_policy_l2() > */ Looking at the code below, we only check the first value of next_hdr in the chain; however RFC 2460 lists the following possible extension headers, all of which will therefore cause fallback to L3 hashing: Hop-by-Hop Options Routing (Type 0) Fragment Destination Options Authentication Encapsulating Security Payload Clearly with some (eg ESP and fragment) we do need to drop out of using L4 header info in the hash. And anyone using Routing (Type 0) probably deserves anything they get. But should we at least comment on the limitation that the existence of the other headers also causes fallback to L3 hashing only? (And possibly even include in documentation?) Or of course, fix? > if (skb->protocol == htons(ETH_P_IP)) { > + iph = ip_hdr(skb); > if (!ip_is_fragment(iph) && > (iph->protocol == IPPROTO_TCP || > iph->protocol == IPPROTO_UDP)) { > + if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 > > + skb_headlen(skb) - skb_network_offset(skb)) > + goto short_header; > + layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); > layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); > + } else if (skb_network_header_len(skb) < sizeof(struct iphdr)) { > + goto short_header; > } I don't know the assertions we can make about skb_network_header_len(skb), but it looks odd doing a length check against sizeof(iphdr) after iph->protocol has already been dereferenced. Is this really right? (The pattern recurs a few times.) Regards, Jeremy ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v7] bonding: support for IPv6 transmit hashing 2012-07-01 19:13 ` [PATCH v6] " John Eaglesham 2012-07-02 23:33 ` Jay Vosburgh @ 2012-08-22 5:12 ` John Eaglesham 2012-08-22 5:29 ` David Miller 2012-08-22 6:43 ` [PATCH v8] " John Eaglesham 1 sibling, 2 replies; 25+ messages in thread From: John Eaglesham @ 2012-08-22 5:12 UTC (permalink / raw) To: netdev; +Cc: John Eaglesham From: John Eaglesham <linux@8192.net> Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Changes: v2) * Clarify description * Add bounds checking to more functions * All functions call bond_xmit_hash_policy_l2 rather than re- implement the same logic. v3) * Patch against net-next. * Style corrections. v4) * Correct indenting. v5) * Squash documentation and code patches into one. v6) * Modify IPv6 hash to behave more like the IPv4 hash, update documentation with modified algorithm. * Clean up formatting. * Move all variable declaration to the top of the function. * Minor change to IPv6 layer 4 hash to match IPv4 hash behavior (mix all hashed address bits together rather than just the bottom 24 bits). v7) * Improve bounds checking code (handle truncated IPv6 header, removed goto, fewer if statements). * Re-write pseudocode in documentation to match actual code more closely. * Correct indenting, align parentheses, wrap code at <= 80 columns (based on Jay's changes). Patch has been tested and performs as expected. Signed-off-by: John Eaglesham --- Documentation/networking/bonding.txt | 30 ++++++++++-- drivers/net/bonding/bond_main.c | 89 +++++++++++++++++++++++++----------- 2 files changed, 88 insertions(+), 31 deletions(-) diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 6b1c711..10a015c 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -752,12 +752,22 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The formula is + generate the hash. The IPv4 formula is (((source IP XOR dest IP) AND 0xffff) XOR ( source MAC XOR destination MAC )) modulo slave count + The IPv6 formula is + + hash = (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + XOR (source MAC XOR destination MAC)) + modulo slave count + This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit @@ -778,19 +788,29 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. - The formula for unfragmented TCP and UDP packets is + The formula for unfragmented IPv4 TCP and UDP packets is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff) modulo slave count - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + The formula for unfragmented IPv6 TCP and UDP packets is + + hash = (source port XOR dest port) XOR + ((source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4)) + + ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + modulo slave count + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of + The IPv4 policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index d95fbc3..4221e57 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3354,56 +3354,93 @@ static struct notifier_block bond_netdev_notifier = { /*---------------------------- Hashing Policies -----------------------------*/ /* + * Hash for the output device based upon layer 2 data + */ +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) +{ + struct ethhdr *data = (struct ethhdr *)skb->data; + + if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) + return (data->h_dest[5] ^ data->h_source[5]) % count; + + return 0; +} + +/* * Hash for the output device based upon layer 2 and layer 3 data. If - * the packet is not IP mimic bond_xmit_hash_policy_l2() + * the packet is not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) { struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - - if (skb->protocol == htons(ETH_P_IP)) { + struct iphdr *iph; + struct ipv6hdr *ipv6h; + u32 v6hash; + __be32 *s, *d; + + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(*iph)) { + iph = ip_hdr(skb); return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ (data->h_dest[5] ^ data->h_source[5])) % count; + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(*ipv6h)) { + ipv6h = ipv6_hdr(skb); + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8); + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; + return bond_xmit_hash_policy_l2(skb, count); } /* * Hash for the output device based upon layer 3 and layer 4 data. If * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is - * altogether not IP, mimic bond_xmit_hash_policy_l2() + * altogether not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) { - struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); - int layer4_xor = 0; - - if (skb->protocol == htons(ETH_P_IP)) { + u32 layer4_xor = 0; + struct iphdr *iph; + struct ipv6hdr *ipv6h; + __be32 *s, *d; + __be16 *layer4hdr; + + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(*iph)) { + iph = ip_hdr(skb); if (!ip_is_fragment(iph) && (iph->protocol == IPPROTO_TCP || - iph->protocol == IPPROTO_UDP)) { - layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); + iph->protocol == IPPROTO_UDP) && + (skb_headlen(skb) - skb_network_offset(skb) >= + iph->ihl * sizeof(u32) + sizeof(*layer4hdr) * 2)) { + layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); + layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1)); } return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; - + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(*ipv6h)) { + ipv6h = ipv6_hdr(skb); + if ((ipv6h->nexthdr == IPPROTO_TCP || + ipv6h->nexthdr == IPPROTO_UDP) && + (skb_headlen(skb) - skb_network_offset(skb) >= + sizeof(*ipv6h) + sizeof(*layer4hdr) * 2)) { + layer4hdr = (__be16 *)(ipv6h + 1); + layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1)); + } + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ + (layer4_xor >> 8); + return layer4_xor % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; -} - -/* - * Hash for the output device based upon layer 2 data - */ -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) -{ - struct ethhdr *data = (struct ethhdr *)skb->data; - - return (data->h_dest[5] ^ data->h_source[5]) % count; + return bond_xmit_hash_policy_l2(skb, count); } /*-------------------------- Device entry points ----------------------------*/ -- 1.7.11 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v7] bonding: support for IPv6 transmit hashing 2012-08-22 5:12 ` [PATCH v7] bonding: " John Eaglesham @ 2012-08-22 5:29 ` David Miller 2012-08-22 6:43 ` [PATCH v8] " John Eaglesham 1 sibling, 0 replies; 25+ messages in thread From: David Miller @ 2012-08-22 5:29 UTC (permalink / raw) To: linux; +Cc: netdev From: John Eaglesham <linux@8192.net> Date: Tue, 21 Aug 2012 22:12:33 -0700 > Signed-off-by: John Eaglesham This is not a proper signoff, you must put an email address on this line. Also, the changelog between the versions doesn't belong in the commit log message itself. It gets placed in commentary after the "---" line. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v8] bonding: support for IPv6 transmit hashing 2012-08-22 5:12 ` [PATCH v7] bonding: " John Eaglesham 2012-08-22 5:29 ` David Miller @ 2012-08-22 6:43 ` John Eaglesham 2012-08-23 5:49 ` David Miller 2012-08-23 12:23 ` Jeremy Brookman 1 sibling, 2 replies; 25+ messages in thread From: John Eaglesham @ 2012-08-22 6:43 UTC (permalink / raw) To: netdev; +Cc: John Eaglesham From: John Eaglesham <linux@8192.net> Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> --- Changes: v2) * Clarify description * Add bounds checking to more functions * All functions call bond_xmit_hash_policy_l2 rather than re- implement the same logic. v3) * Patch against net-next. * Style corrections. v4) * Correct indenting. v5) * Squash documentation and code patches into one. v6) * Modify IPv6 hash to behave more like the IPv4 hash, update documentation with modified algorithm. * Clean up formatting. * Move all variable declaration to the top of the function. * Minor change to IPv6 layer 4 hash to match IPv4 hash behavior (mix all hashed address bits together rather than just the bottom 24 bits). v7) * Improve bounds checking code (handle truncated IPv6 header, removed goto, fewer if statements). * Re-write pseudocode in documentation to match actual code more closely. * Correct indenting, align parentheses, wrap code at <= 80 columns (based on Jay's changes). v8) * Correct patch submission format. Documentation/networking/bonding.txt | 30 ++++++++++-- drivers/net/bonding/bond_main.c | 89 +++++++++++++++++++++++++----------- 2 files changed, 88 insertions(+), 31 deletions(-) diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 6b1c711..10a015c 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -752,12 +752,22 @@ xmit_hash_policy protocol information to generate the hash. Uses XOR of hardware MAC addresses and IP addresses to - generate the hash. The formula is + generate the hash. The IPv4 formula is (((source IP XOR dest IP) AND 0xffff) XOR ( source MAC XOR destination MAC )) modulo slave count + The IPv6 formula is + + hash = (source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4) + + (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + XOR (source MAC XOR destination MAC)) + modulo slave count + This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit @@ -778,19 +788,29 @@ xmit_hash_policy slaves, although a single connection will not span multiple slaves. - The formula for unfragmented TCP and UDP packets is + The formula for unfragmented IPv4 TCP and UDP packets is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff) modulo slave count - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + The formula for unfragmented IPv6 TCP and UDP packets is + + hash = (source port XOR dest port) XOR + ((source ip quad 2 XOR dest IP quad 2) XOR + (source ip quad 3 XOR dest IP quad 3) XOR + (source ip quad 4 XOR dest IP quad 4)) + + ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) + modulo slave count + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of + The IPv4 policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index d95fbc3..4221e57 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3354,56 +3354,93 @@ static struct notifier_block bond_netdev_notifier = { /*---------------------------- Hashing Policies -----------------------------*/ /* + * Hash for the output device based upon layer 2 data + */ +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) +{ + struct ethhdr *data = (struct ethhdr *)skb->data; + + if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) + return (data->h_dest[5] ^ data->h_source[5]) % count; + + return 0; +} + +/* * Hash for the output device based upon layer 2 and layer 3 data. If - * the packet is not IP mimic bond_xmit_hash_policy_l2() + * the packet is not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) { struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - - if (skb->protocol == htons(ETH_P_IP)) { + struct iphdr *iph; + struct ipv6hdr *ipv6h; + u32 v6hash; + __be32 *s, *d; + + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(*iph)) { + iph = ip_hdr(skb); return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ (data->h_dest[5] ^ data->h_source[5])) % count; + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(*ipv6h)) { + ipv6h = ipv6_hdr(skb); + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8); + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; + return bond_xmit_hash_policy_l2(skb, count); } /* * Hash for the output device based upon layer 3 and layer 4 data. If * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is - * altogether not IP, mimic bond_xmit_hash_policy_l2() + * altogether not IP, fall back on bond_xmit_hash_policy_l2() */ static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) { - struct ethhdr *data = (struct ethhdr *)skb->data; - struct iphdr *iph = ip_hdr(skb); - __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); - int layer4_xor = 0; - - if (skb->protocol == htons(ETH_P_IP)) { + u32 layer4_xor = 0; + struct iphdr *iph; + struct ipv6hdr *ipv6h; + __be32 *s, *d; + __be16 *layer4hdr; + + if (skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(*iph)) { + iph = ip_hdr(skb); if (!ip_is_fragment(iph) && (iph->protocol == IPPROTO_TCP || - iph->protocol == IPPROTO_UDP)) { - layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); + iph->protocol == IPPROTO_UDP) && + (skb_headlen(skb) - skb_network_offset(skb) >= + iph->ihl * sizeof(u32) + sizeof(*layer4hdr) * 2)) { + layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); + layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1)); } return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; - + } else if (skb->protocol == htons(ETH_P_IPV6) && + skb_network_header_len(skb) >= sizeof(*ipv6h)) { + ipv6h = ipv6_hdr(skb); + if ((ipv6h->nexthdr == IPPROTO_TCP || + ipv6h->nexthdr == IPPROTO_UDP) && + (skb_headlen(skb) - skb_network_offset(skb) >= + sizeof(*ipv6h) + sizeof(*layer4hdr) * 2)) { + layer4hdr = (__be16 *)(ipv6h + 1); + layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1)); + } + s = &ipv6h->saddr.s6_addr32[0]; + d = &ipv6h->daddr.s6_addr32[0]; + layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); + layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ + (layer4_xor >> 8); + return layer4_xor % count; } - return (data->h_dest[5] ^ data->h_source[5]) % count; -} - -/* - * Hash for the output device based upon layer 2 data - */ -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) -{ - struct ethhdr *data = (struct ethhdr *)skb->data; - - return (data->h_dest[5] ^ data->h_source[5]) % count; + return bond_xmit_hash_policy_l2(skb, count); } /*-------------------------- Device entry points ----------------------------*/ -- 1.7.11 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v8] bonding: support for IPv6 transmit hashing 2012-08-22 6:43 ` [PATCH v8] " John Eaglesham @ 2012-08-23 5:49 ` David Miller 2012-08-23 12:23 ` Jeremy Brookman 1 sibling, 0 replies; 25+ messages in thread From: David Miller @ 2012-08-23 5:49 UTC (permalink / raw) To: linux; +Cc: netdev From: John Eaglesham <linux@8192.net> Date: Tue, 21 Aug 2012 23:43:35 -0700 > From: John Eaglesham <linux@8192.net> > > Currently the "bonding" driver does not support load balancing outgoing > traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) > are currently supported; this patch adds transmit hashing for IPv6 (and > TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the > bonding driver. In addition, bounds checking has been added to all > transmit hashing functions. > > The algorithm chosen (xor'ing the bottom three quads of the source and > destination addresses together, then xor'ing each byte of that result into > the bottom byte, finally xor'ing with the last bytes of the MAC addresses) > was selected after testing almost 400,000 unique IPv6 addresses harvested > from server logs. This algorithm had the most even distribution for both > big- and little-endian architectures while still using few instructions. Its > behavior also attempts to closely match that of the IPv4 algorithm. > > The IPv6 flow label was intentionally not included in the hash as it appears > to be unset in the vast majority of IPv6 traffic sampled, and the current > algorithm not using the flow label already offers a very even distribution. > > Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, > ie, they are not balanced based on layer 4 information. Additionally, > IPv6 packets with intermediate headers are not balanced based on layer > 4 information. In practice these intermediate headers are not common and > this should not cause any problems, and the alternative (a packet-parsing > loop and look-up table) seemed slow and complicated for little gain. > > Tested-by: John Eaglesham <linux@8192.net> > Signed-off-by: John Eaglesham <linux@8192.net> Applied, thanks a lot. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v8] bonding: support for IPv6 transmit hashing 2012-08-22 6:43 ` [PATCH v8] " John Eaglesham 2012-08-23 5:49 ` David Miller @ 2012-08-23 12:23 ` Jeremy Brookman 1 sibling, 0 replies; 25+ messages in thread From: Jeremy Brookman @ 2012-08-23 12:23 UTC (permalink / raw) To: John Eaglesham; +Cc: netdev Thanks for getting this in John. Apologies for my earlier reply, where I hadn't spotted this revision of the patch; it looks like the comments I made have been addressed, and all is well. Thanks again, Jeremy On Wed, Aug 22, 2012 at 7:43 AM, John Eaglesham <linux@8192.net> wrote: > From: John Eaglesham <linux@8192.net> > > Currently the "bonding" driver does not support load balancing outgoing > traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) > are currently supported; this patch adds transmit hashing for IPv6 (and > TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the > bonding driver. In addition, bounds checking has been added to all > transmit hashing functions. > > The algorithm chosen (xor'ing the bottom three quads of the source and > destination addresses together, then xor'ing each byte of that result into > the bottom byte, finally xor'ing with the last bytes of the MAC addresses) > was selected after testing almost 400,000 unique IPv6 addresses harvested > from server logs. This algorithm had the most even distribution for both > big- and little-endian architectures while still using few instructions. Its > behavior also attempts to closely match that of the IPv4 algorithm. > > The IPv6 flow label was intentionally not included in the hash as it appears > to be unset in the vast majority of IPv6 traffic sampled, and the current > algorithm not using the flow label already offers a very even distribution. > > Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, > ie, they are not balanced based on layer 4 information. Additionally, > IPv6 packets with intermediate headers are not balanced based on layer > 4 information. In practice these intermediate headers are not common and > this should not cause any problems, and the alternative (a packet-parsing > loop and look-up table) seemed slow and complicated for little gain. > > Tested-by: John Eaglesham <linux@8192.net> > Signed-off-by: John Eaglesham <linux@8192.net> > > --- > > Changes: > v2) > * Clarify description > * Add bounds checking to more functions > * All functions call bond_xmit_hash_policy_l2 rather than re- > implement the same logic. > v3) > * Patch against net-next. > * Style corrections. > v4) > * Correct indenting. > v5) > * Squash documentation and code patches into one. > v6) > * Modify IPv6 hash to behave more like the IPv4 hash, update > documentation with modified algorithm. > * Clean up formatting. > * Move all variable declaration to the top of the function. > * Minor change to IPv6 layer 4 hash to match IPv4 hash behavior > (mix all hashed address bits together rather than just the > bottom 24 bits). > v7) > * Improve bounds checking code (handle truncated IPv6 header, > removed goto, fewer if statements). > * Re-write pseudocode in documentation to match actual code more > closely. > * Correct indenting, align parentheses, wrap code at <= 80 columns > (based on Jay's changes). > v8) > * Correct patch submission format. > > Documentation/networking/bonding.txt | 30 ++++++++++-- > drivers/net/bonding/bond_main.c | 89 +++++++++++++++++++++++++----------- > 2 files changed, 88 insertions(+), 31 deletions(-) > > diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt > index 6b1c711..10a015c 100644 > --- a/Documentation/networking/bonding.txt > +++ b/Documentation/networking/bonding.txt > @@ -752,12 +752,22 @@ xmit_hash_policy > protocol information to generate the hash. > > Uses XOR of hardware MAC addresses and IP addresses to > - generate the hash. The formula is > + generate the hash. The IPv4 formula is > > (((source IP XOR dest IP) AND 0xffff) XOR > ( source MAC XOR destination MAC )) > modulo slave count > > + The IPv6 formula is > + > + hash = (source ip quad 2 XOR dest IP quad 2) XOR > + (source ip quad 3 XOR dest IP quad 3) XOR > + (source ip quad 4 XOR dest IP quad 4) > + > + (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) > + XOR (source MAC XOR destination MAC)) > + modulo slave count > + > This algorithm will place all traffic to a particular > network peer on the same slave. For non-IP traffic, > the formula is the same as for the layer2 transmit > @@ -778,19 +788,29 @@ xmit_hash_policy > slaves, although a single connection will not span > multiple slaves. > > - The formula for unfragmented TCP and UDP packets is > + The formula for unfragmented IPv4 TCP and UDP packets is > > ((source port XOR dest port) XOR > ((source IP XOR dest IP) AND 0xffff) > modulo slave count > > - For fragmented TCP or UDP packets and all other IP > - protocol traffic, the source and destination port > + The formula for unfragmented IPv6 TCP and UDP packets is > + > + hash = (source port XOR dest port) XOR > + ((source ip quad 2 XOR dest IP quad 2) XOR > + (source ip quad 3 XOR dest IP quad 3) XOR > + (source ip quad 4 XOR dest IP quad 4)) > + > + ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash) > + modulo slave count > + > + For fragmented TCP or UDP packets and all other IPv4 and > + IPv6 protocol traffic, the source and destination port > information is omitted. For non-IP traffic, the > formula is the same as for the layer2 transmit hash > policy. > > - This policy is intended to mimic the behavior of > + The IPv4 policy is intended to mimic the behavior of > certain switches, notably Cisco switches with PFC2 as > well as some Foundry and IBM products. > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index d95fbc3..4221e57 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -3354,56 +3354,93 @@ static struct notifier_block bond_netdev_notifier = { > /*---------------------------- Hashing Policies -----------------------------*/ > > /* > + * Hash for the output device based upon layer 2 data > + */ > +static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) > +{ > + struct ethhdr *data = (struct ethhdr *)skb->data; > + > + if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto)) > + return (data->h_dest[5] ^ data->h_source[5]) % count; > + > + return 0; > +} > + > +/* > * Hash for the output device based upon layer 2 and layer 3 data. If > - * the packet is not IP mimic bond_xmit_hash_policy_l2() > + * the packet is not IP, fall back on bond_xmit_hash_policy_l2() > */ > static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) > { > struct ethhdr *data = (struct ethhdr *)skb->data; > - struct iphdr *iph = ip_hdr(skb); > - > - if (skb->protocol == htons(ETH_P_IP)) { > + struct iphdr *iph; > + struct ipv6hdr *ipv6h; > + u32 v6hash; > + __be32 *s, *d; > + > + if (skb->protocol == htons(ETH_P_IP) && > + skb_network_header_len(skb) >= sizeof(*iph)) { > + iph = ip_hdr(skb); > return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^ > (data->h_dest[5] ^ data->h_source[5])) % count; > + } else if (skb->protocol == htons(ETH_P_IPV6) && > + skb_network_header_len(skb) >= sizeof(*ipv6h)) { > + ipv6h = ipv6_hdr(skb); > + s = &ipv6h->saddr.s6_addr32[0]; > + d = &ipv6h->daddr.s6_addr32[0]; > + v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); > + v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8); > + return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count; > } > > - return (data->h_dest[5] ^ data->h_source[5]) % count; > + return bond_xmit_hash_policy_l2(skb, count); > } > > /* > * Hash for the output device based upon layer 3 and layer 4 data. If > * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is > - * altogether not IP, mimic bond_xmit_hash_policy_l2() > + * altogether not IP, fall back on bond_xmit_hash_policy_l2() > */ > static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count) > { > - struct ethhdr *data = (struct ethhdr *)skb->data; > - struct iphdr *iph = ip_hdr(skb); > - __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); > - int layer4_xor = 0; > - > - if (skb->protocol == htons(ETH_P_IP)) { > + u32 layer4_xor = 0; > + struct iphdr *iph; > + struct ipv6hdr *ipv6h; > + __be32 *s, *d; > + __be16 *layer4hdr; > + > + if (skb->protocol == htons(ETH_P_IP) && > + skb_network_header_len(skb) >= sizeof(*iph)) { > + iph = ip_hdr(skb); > if (!ip_is_fragment(iph) && > (iph->protocol == IPPROTO_TCP || > - iph->protocol == IPPROTO_UDP)) { > - layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1))); > + iph->protocol == IPPROTO_UDP) && > + (skb_headlen(skb) - skb_network_offset(skb) >= > + iph->ihl * sizeof(u32) + sizeof(*layer4hdr) * 2)) { > + layer4hdr = (__be16 *)((u32 *)iph + iph->ihl); > + layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1)); > } > return (layer4_xor ^ > ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count; > - > + } else if (skb->protocol == htons(ETH_P_IPV6) && > + skb_network_header_len(skb) >= sizeof(*ipv6h)) { > + ipv6h = ipv6_hdr(skb); > + if ((ipv6h->nexthdr == IPPROTO_TCP || > + ipv6h->nexthdr == IPPROTO_UDP) && > + (skb_headlen(skb) - skb_network_offset(skb) >= > + sizeof(*ipv6h) + sizeof(*layer4hdr) * 2)) { > + layer4hdr = (__be16 *)(ipv6h + 1); > + layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1)); > + } > + s = &ipv6h->saddr.s6_addr32[0]; > + d = &ipv6h->daddr.s6_addr32[0]; > + layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]); > + layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ > + (layer4_xor >> 8); > + return layer4_xor % count; > } > > - return (data->h_dest[5] ^ data->h_source[5]) % count; > -} > - > -/* > - * Hash for the output device based upon layer 2 data > - */ > -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count) > -{ > - struct ethhdr *data = (struct ethhdr *)skb->data; > - > - return (data->h_dest[5] ^ data->h_source[5]) % count; > + return bond_xmit_hash_policy_l2(skb, count); > } > > /*-------------------------- Device entry points ----------------------------*/ > -- > 1.7.11 > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2012-08-23 12:23 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-07-01 7:01 [PATCH v4 0/2] bonding support for IPv6 transmit hashing John Eaglesham 2012-07-01 7:01 ` [PATCH v4 1/2] Add support for IPv6 and bounds checking to transmit hashing functions John Eaglesham 2012-07-01 7:33 ` David Miller 2012-07-01 7:01 ` [PATCH v4 2/2] Update bonding driver documentation to include IPv6 transmit hashing algorithm John Eaglesham 2012-07-01 7:34 ` David Miller 2012-07-01 7:42 ` John Eaglesham 2012-07-01 8:07 ` [PATCH v5] bonding support for IPv6 transmit hashing John Eaglesham 2012-07-01 10:33 ` David Miller 2012-07-01 19:01 ` John Eaglesham 2012-07-01 19:13 ` [PATCH v6] " John Eaglesham 2012-07-02 23:33 ` Jay Vosburgh 2012-07-03 5:01 ` John Eaglesham 2012-07-03 5:14 ` David Miller 2012-07-03 5:38 ` John Eaglesham 2012-07-03 5:43 ` David Miller 2012-08-21 18:11 ` Jeremy Brookman 2012-08-21 19:19 ` Jay Vosburgh 2012-08-21 22:21 ` John Eaglesham 2012-08-22 12:06 ` Jeremy Brookman 2012-08-23 10:42 ` Jeremy Brookman 2012-08-22 5:12 ` [PATCH v7] bonding: " John Eaglesham 2012-08-22 5:29 ` David Miller 2012-08-22 6:43 ` [PATCH v8] " John Eaglesham 2012-08-23 5:49 ` David Miller 2012-08-23 12:23 ` Jeremy Brookman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.