* [PATCH 0/4] bond_alb: support VMs behind bridges better @ 2021-05-18 21:08 Jarod Wilson 2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson ` (4 more replies) 0 siblings, 5 replies; 20+ messages in thread From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw) To: linux-kernel; +Cc: Jarod Wilson I've been further educated on a use case, where a bridge sits on top of a bond, with multiple vnetX interfaces attached to virtual machines, also acting as ports of the bridge. Each leg of the bond goes to a different switch, but there is NO mlag/vpc in play, the bonding driver has to handle traffic that loops back appropriately to avoid breaking transmission. Rather than adding some sort of mac filtering to balance-xor mode, we switched to using balance-alb, which already does some of this, and with the tweaks provided in this series, empirically seems to behave as desired in actual operation. Jarod Wilson (4): bonding: add pure source-mac-based tx hashing option bond_alb: don't rewrite bridged non-local MACs bond_alb: don't tx balance multicast traffic either bond_alb: put all slaves into promisc Documentation/networking/bonding.rst | 13 ++++++++++++ drivers/net/bonding/bond_alb.c | 27 ++++++++++++++++++++++-- drivers/net/bonding/bond_main.c | 31 ++++++++++++++++++---------- drivers/net/bonding/bond_options.c | 1 + include/linux/netdevice.h | 1 + include/uapi/linux/if_bonding.h | 1 + 6 files changed, 61 insertions(+), 13 deletions(-) -- 2.30.2 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/4] bonding: add pure source-mac-based tx hashing option 2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson @ 2021-05-18 21:08 ` Jarod Wilson 2021-05-19 9:01 ` Nikolay Aleksandrov 2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson ` (3 subsequent siblings) 4 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev As it turns out, a pure source-mac only tx hash has a place for some VM setups. The previously added vlan+srcmac hash doesn't work as well for a VM with a single MAC and multiple vlans -- these types of setups path traffic more efficiently if the load is split by source mac alone. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- Documentation/networking/bonding.rst | 13 +++++++++++++ drivers/net/bonding/bond_main.c | 26 +++++++++++++++++--------- drivers/net/bonding/bond_options.c | 1 + include/linux/netdevice.h | 1 + include/uapi/linux/if_bonding.h | 1 + 5 files changed, 33 insertions(+), 9 deletions(-) diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst index 62f2aab8eaec..66c3fa3a9040 100644 --- a/Documentation/networking/bonding.rst +++ b/Documentation/networking/bonding.rst @@ -964,6 +964,19 @@ xmit_hash_policy hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev) + srcmac + + This policy uses a very rudimentary source mac hash to + load-balance traffic per-source-mac, with failover should + one leg fail. The intended use case is for a bond shared + by multiple virtual machines, each with their own virtual + mac address, keeping the VMs traffic all limited to the + same outbound interface. + + The formula for the hash is simply + + hash = (source MAC vendor) XOR (source MAC dev) + The default value is layer2. This option was added in bonding version 2.6.3. In earlier versions of bonding, this parameter does not exist, and the layer2 policy is the only policy. The diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 20bbda1b36e1..d71e398642fb 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -167,7 +167,8 @@ module_param(xmit_hash_policy, charp, 0); MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; " "0 for layer 2 (default), 1 for layer 3+4, " "2 for layer 2+3, 3 for encap layer 2+3, " - "4 for encap layer 3+4, 5 for vlan+srcmac"); + "4 for encap layer 3+4, 5 for vlan+srcmac, " + "6 for srcmac"); module_param(arp_interval, int, 0); MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); module_param_array(arp_ip_target, charp, NULL, 0); @@ -1459,6 +1460,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond, return NETDEV_LAG_HASH_E34; case BOND_XMIT_POLICY_VLAN_SRCMAC: return NETDEV_LAG_HASH_VLAN_SRCMAC; + case BOND_XMIT_POLICY_SRCMAC: + return NETDEV_LAG_HASH_SRCMAC; default: return NETDEV_LAG_HASH_UNKNOWN; } @@ -3521,11 +3524,11 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, return true; } -static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) +static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, bool with_vlan) { - struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb); + struct ethhdr *mac_hdr = eth_hdr(skb); u32 srcmac_vendor = 0, srcmac_dev = 0; - u16 vlan; + u32 hash; int i; for (i = 0; i < 3; i++) @@ -3534,12 +3537,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) for (i = 3; i < ETH_ALEN; i++) srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i]; - if (!skb_vlan_tag_present(skb)) - return srcmac_vendor ^ srcmac_dev; + hash = srcmac_vendor ^ srcmac_dev; + + if (!with_vlan || !skb_vlan_tag_present(skb)) + return hash; - vlan = skb_vlan_tag_get(skb); + hash ^= skb_vlan_tag_get(skb); - return vlan ^ srcmac_vendor ^ srcmac_dev; + return hash; } /* Extract the appropriate headers based on bond's xmit policy */ @@ -3618,8 +3623,11 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb) skb->l4_hash) return skb->hash; + if (bond->params.xmit_policy == BOND_XMIT_POLICY_SRCMAC) + return bond_vlan_srcmac_hash(skb, false); + if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC) - return bond_vlan_srcmac_hash(skb); + return bond_vlan_srcmac_hash(skb, true); if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 || !bond_flow_dissect(bond, skb, &flow)) diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index c9d3604ae129..ff68ad2589f0 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -102,6 +102,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = { { "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0}, { "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0}, { "vlan+srcmac", BOND_XMIT_POLICY_VLAN_SRCMAC, 0}, + { "srcmac", BOND_XMIT_POLICY_SRCMAC, 0}, { NULL, -1, 0}, }; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5cbc950b34df..d88319fca1d3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2732,6 +2732,7 @@ enum netdev_lag_hash { NETDEV_LAG_HASH_E23, NETDEV_LAG_HASH_E34, NETDEV_LAG_HASH_VLAN_SRCMAC, + NETDEV_LAG_HASH_SRCMAC, NETDEV_LAG_HASH_UNKNOWN, }; diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h index d174914a837d..f3b4d412a73f 100644 --- a/include/uapi/linux/if_bonding.h +++ b/include/uapi/linux/if_bonding.h @@ -95,6 +95,7 @@ #define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */ #define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */ #define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */ +#define BOND_XMIT_POLICY_SRCMAC 6 /* source MAC only */ /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */ #define LACP_STATE_LACP_ACTIVITY 0x1 -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] bonding: add pure source-mac-based tx hashing option 2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson @ 2021-05-19 9:01 ` Nikolay Aleksandrov 0 siblings, 0 replies; 20+ messages in thread From: Nikolay Aleksandrov @ 2021-05-19 9:01 UTC (permalink / raw) To: Jarod Wilson, linux-kernel Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev On 19/05/2021 00:08, Jarod Wilson wrote: > As it turns out, a pure source-mac only tx hash has a place for some VM > setups. The previously added vlan+srcmac hash doesn't work as well for a > VM with a single MAC and multiple vlans -- these types of setups path > traffic more efficiently if the load is split by source mac alone. > > Cc: Jay Vosburgh <j.vosburgh@gmail.com> > Cc: Veaceslav Falico <vfalico@gmail.com> > Cc: Andy Gospodarek <andy@greyhouse.net> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Thomas Davis <tadavis@lbl.gov> > Cc: netdev@vger.kernel.org > Signed-off-by: Jarod Wilson <jarod@redhat.com> > --- > Documentation/networking/bonding.rst | 13 +++++++++++++ > drivers/net/bonding/bond_main.c | 26 +++++++++++++++++--------- > drivers/net/bonding/bond_options.c | 1 + > include/linux/netdevice.h | 1 + > include/uapi/linux/if_bonding.h | 1 + > 5 files changed, 33 insertions(+), 9 deletions(-) > Hi, It would seem you keep adding modes for each field, that seems unnecessary to me and it also affects the fast path - each new mode you add is another 1+ tests in bond's fast path. You could instead just add 1 new mode which has configurable hash fields, take the "hit" for it once in the fast-path (if chosen) and use that. I'd like to avoid tomorrow getting another "dstmac" mode or something like that. In fact both of these new modes are unnecessary in most cases, you could use any available method (e.g. ebpf) to compute and set the skb queue mapping on Tx to choose any slave and that would override any hash or bond mode. Check __bond_start_xmit() -> bond_slave_override() Cheers, Nik > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst > index 62f2aab8eaec..66c3fa3a9040 100644 > --- a/Documentation/networking/bonding.rst > +++ b/Documentation/networking/bonding.rst > @@ -964,6 +964,19 @@ xmit_hash_policy > > hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev) > > + srcmac > + > + This policy uses a very rudimentary source mac hash to > + load-balance traffic per-source-mac, with failover should > + one leg fail. The intended use case is for a bond shared > + by multiple virtual machines, each with their own virtual > + mac address, keeping the VMs traffic all limited to the > + same outbound interface. > + > + The formula for the hash is simply > + > + hash = (source MAC vendor) XOR (source MAC dev) > + > The default value is layer2. This option was added in bonding > version 2.6.3. In earlier versions of bonding, this parameter > does not exist, and the layer2 policy is the only policy. The > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 20bbda1b36e1..d71e398642fb 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -167,7 +167,8 @@ module_param(xmit_hash_policy, charp, 0); > MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; " > "0 for layer 2 (default), 1 for layer 3+4, " > "2 for layer 2+3, 3 for encap layer 2+3, " > - "4 for encap layer 3+4, 5 for vlan+srcmac"); > + "4 for encap layer 3+4, 5 for vlan+srcmac, " > + "6 for srcmac"); > module_param(arp_interval, int, 0); > MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); > module_param_array(arp_ip_target, charp, NULL, 0); > @@ -1459,6 +1460,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond, > return NETDEV_LAG_HASH_E34; > case BOND_XMIT_POLICY_VLAN_SRCMAC: > return NETDEV_LAG_HASH_VLAN_SRCMAC; > + case BOND_XMIT_POLICY_SRCMAC: > + return NETDEV_LAG_HASH_SRCMAC; > default: > return NETDEV_LAG_HASH_UNKNOWN; > } > @@ -3521,11 +3524,11 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, > return true; > } > > -static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) > +static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, bool with_vlan) > { > - struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb); > + struct ethhdr *mac_hdr = eth_hdr(skb); > u32 srcmac_vendor = 0, srcmac_dev = 0; > - u16 vlan; > + u32 hash; > int i; > > for (i = 0; i < 3; i++) > @@ -3534,12 +3537,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) > for (i = 3; i < ETH_ALEN; i++) > srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i]; > > - if (!skb_vlan_tag_present(skb)) > - return srcmac_vendor ^ srcmac_dev; > + hash = srcmac_vendor ^ srcmac_dev; > + > + if (!with_vlan || !skb_vlan_tag_present(skb)) > + return hash; > > - vlan = skb_vlan_tag_get(skb); > + hash ^= skb_vlan_tag_get(skb); > > - return vlan ^ srcmac_vendor ^ srcmac_dev; > + return hash; > } > > /* Extract the appropriate headers based on bond's xmit policy */ > @@ -3618,8 +3623,11 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb) > skb->l4_hash) > return skb->hash; > > + if (bond->params.xmit_policy == BOND_XMIT_POLICY_SRCMAC) > + return bond_vlan_srcmac_hash(skb, false); > + > if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC) > - return bond_vlan_srcmac_hash(skb); > + return bond_vlan_srcmac_hash(skb, true); > > if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 || > !bond_flow_dissect(bond, skb, &flow)) > diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c > index c9d3604ae129..ff68ad2589f0 100644 > --- a/drivers/net/bonding/bond_options.c > +++ b/drivers/net/bonding/bond_options.c > @@ -102,6 +102,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = { > { "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0}, > { "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0}, > { "vlan+srcmac", BOND_XMIT_POLICY_VLAN_SRCMAC, 0}, > + { "srcmac", BOND_XMIT_POLICY_SRCMAC, 0}, > { NULL, -1, 0}, > }; > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 5cbc950b34df..d88319fca1d3 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -2732,6 +2732,7 @@ enum netdev_lag_hash { > NETDEV_LAG_HASH_E23, > NETDEV_LAG_HASH_E34, > NETDEV_LAG_HASH_VLAN_SRCMAC, > + NETDEV_LAG_HASH_SRCMAC, > NETDEV_LAG_HASH_UNKNOWN, > }; > > diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h > index d174914a837d..f3b4d412a73f 100644 > --- a/include/uapi/linux/if_bonding.h > +++ b/include/uapi/linux/if_bonding.h > @@ -95,6 +95,7 @@ > #define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */ > #define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */ > #define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */ > +#define BOND_XMIT_POLICY_SRCMAC 6 /* source MAC only */ > > /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */ > #define LACP_STATE_LACP_ACTIVITY 0x1 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs 2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson 2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson @ 2021-05-18 21:08 ` Jarod Wilson 2021-05-19 22:31 ` Jay Vosburgh 2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson ` (2 subsequent siblings) 4 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev With a virtual machine behind a bridge on top of a bond, outgoing traffic should retain the VM's source MAC. That works fine most of the time, until doing a failover, and then the MAC gets rewritten to the bond slave's MAC, and the return traffic gets dropped. If we don't rewrite the MAC there, we don't lose any traffic. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/bonding/bond_alb.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 3455f2cc13f2..ce8257c7cbea 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1302,6 +1302,26 @@ void bond_alb_deinitialize(struct bonding *bond) rlb_deinitialize(bond); } +static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data) +{ + struct list_head *iter; + struct slave *slave; + + if (BOND_MODE(bond) != BOND_MODE_ALB) + return false; + + /* Don't modify source MACs that do not originate locally + * (e.g.,arrive via a bridge). + */ + if (!netif_is_bridge_port(bond->dev)) + return false; + + if (bond_slave_has_mac_rx(bond, eth_data->h_source)) + return false; + + return true; +} + static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, struct slave *tx_slave) { @@ -1316,7 +1336,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, } if (tx_slave && bond_slave_can_tx(tx_slave)) { - if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) { + if (tx_slave != rcu_access_pointer(bond->curr_active_slave) && + !bond_alb_bridged_mac(bond, eth_data)) { ether_addr_copy(eth_data->h_source, tx_slave->dev->dev_addr); } -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs 2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson @ 2021-05-19 22:31 ` Jay Vosburgh 0 siblings, 0 replies; 20+ messages in thread From: Jay Vosburgh @ 2021-05-19 22:31 UTC (permalink / raw) To: Jarod Wilson Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Jarod Wilson <jarod@redhat.com> wrote: >With a virtual machine behind a bridge on top of a bond, outgoing traffic >should retain the VM's source MAC. That works fine most of the time, until >doing a failover, and then the MAC gets rewritten to the bond slave's MAC, >and the return traffic gets dropped. If we don't rewrite the MAC there, we >don't lose any traffic. Please have the log message here specify that this applies only to balance-alb mode, and, the usual nomenclature for bonding patches is "[PATCH] bonding:"; for this case, I'd suggest "balance-alb:" right afterwards to be clear that it's only for alb mode. I didn't really spot the "bond_alb" tag for what it was on first read, and it is the only indication that this change is specific to alb mode other than the patch itself. >Cc: Jay Vosburgh <j.vosburgh@gmail.com> >Cc: Veaceslav Falico <vfalico@gmail.com> >Cc: Andy Gospodarek <andy@greyhouse.net> >Cc: "David S. Miller" <davem@davemloft.net> >Cc: Jakub Kicinski <kuba@kernel.org> >Cc: Thomas Davis <tadavis@lbl.gov> >Cc: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson <jarod@redhat.com> >--- > drivers/net/bonding/bond_alb.c | 23 ++++++++++++++++++++++- > 1 file changed, 22 insertions(+), 1 deletion(-) > >diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c >index 3455f2cc13f2..ce8257c7cbea 100644 >--- a/drivers/net/bonding/bond_alb.c >+++ b/drivers/net/bonding/bond_alb.c >@@ -1302,6 +1302,26 @@ void bond_alb_deinitialize(struct bonding *bond) > rlb_deinitialize(bond); > } > >+static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data) >+{ >+ struct list_head *iter; >+ struct slave *slave; >+ >+ if (BOND_MODE(bond) != BOND_MODE_ALB) >+ return false; >+ >+ /* Don't modify source MACs that do not originate locally >+ * (e.g.,arrive via a bridge). >+ */ >+ if (!netif_is_bridge_port(bond->dev)) >+ return false; I believe this logic will fail if the plumbing is, e.g., bond -> vlan -> bridge, as netif_is_bridge_port() would not return true for the bond in that case. Making this reliable is tricky at best, and may be impossible to be correct for all possible cases. As such, I think the comment above should reflect the limited scope of what is actually being checked here (i.e., the bond itself is directly a bridge port). -J >+ >+ if (bond_slave_has_mac_rx(bond, eth_data->h_source)) >+ return false; >+ >+ return true; >+} >+ > static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, > struct slave *tx_slave) > { >@@ -1316,7 +1336,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, > } > > if (tx_slave && bond_slave_can_tx(tx_slave)) { >- if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) { >+ if (tx_slave != rcu_access_pointer(bond->curr_active_slave) && >+ !bond_alb_bridged_mac(bond, eth_data)) { > ether_addr_copy(eth_data->h_source, > tx_slave->dev->dev_addr); > } >-- >2.30.2 > --- -Jay Vosburgh, jay.vosburgh@canonical.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/4] bond_alb: don't tx balance multicast traffic either 2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson 2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson 2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson @ 2021-05-18 21:08 ` Jarod Wilson 2021-05-19 18:45 ` Jay Vosburgh 2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson 2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson 4 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Multicast traffic going out the non-primary interface can come back in through the primary interface in alb mode. When there's a bridge sitting on top of the bond, with virtual machines behind it, attached to vnetX interfaces also acting as bridge ports, this can cause problems. The multicast traffic ends up rewriting the bridge forwarding database entries, replacing a vnetX entry in the fdb with the bond instead, at which point, we lose traffic. If we don't tx balance multicast traffic, we don't break connectivity. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/bonding/bond_alb.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index ce8257c7cbea..4df661b77252 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1422,6 +1422,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, const struct iphdr *iph; if (is_broadcast_ether_addr(eth_data->h_dest) || + is_multicast_ether_addr(eth_data->h_dest) || !pskb_network_may_pull(skb, sizeof(*iph))) { do_tx_balance = false; break; @@ -1441,7 +1442,8 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, /* IPv6 doesn't really use broadcast mac address, but leave * that here just in case. */ - if (is_broadcast_ether_addr(eth_data->h_dest)) { + if (is_broadcast_ether_addr(eth_data->h_dest) || + is_multicast_ether_addr(eth_data->h_dest)) { do_tx_balance = false; break; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] bond_alb: don't tx balance multicast traffic either 2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson @ 2021-05-19 18:45 ` Jay Vosburgh 0 siblings, 0 replies; 20+ messages in thread From: Jay Vosburgh @ 2021-05-19 18:45 UTC (permalink / raw) To: Jarod Wilson Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Jarod Wilson <jarod@redhat.com> wrote: >Multicast traffic going out the non-primary interface can come back in >through the primary interface in alb mode. When there's a bridge sitting >on top of the bond, with virtual machines behind it, attached to vnetX >interfaces also acting as bridge ports, this can cause problems. The >multicast traffic ends up rewriting the bridge forwarding database >entries, replacing a vnetX entry in the fdb with the bond instead, at >which point, we lose traffic. If we don't tx balance multicast traffic, we >don't break connectivity. Just so I'm clear, the rewrite happens because the "looped" frame bears the source MAC of the VM behind the bridge, but is arriving at the bridge via the bond, correct? If so this change seems reasonable, with one minor nit, below. >Cc: Jay Vosburgh <j.vosburgh@gmail.com> >Cc: Veaceslav Falico <vfalico@gmail.com> >Cc: Andy Gospodarek <andy@greyhouse.net> >Cc: "David S. Miller" <davem@davemloft.net> >Cc: Jakub Kicinski <kuba@kernel.org> >Cc: Thomas Davis <tadavis@lbl.gov> >Cc: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson <jarod@redhat.com> >--- > drivers/net/bonding/bond_alb.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > >diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c >index ce8257c7cbea..4df661b77252 100644 >--- a/drivers/net/bonding/bond_alb.c >+++ b/drivers/net/bonding/bond_alb.c >@@ -1422,6 +1422,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, > const struct iphdr *iph; > > if (is_broadcast_ether_addr(eth_data->h_dest) || >+ is_multicast_ether_addr(eth_data->h_dest) || Note that is_multicast_ is a superset of is_broadcast_, so in this case (and the one below) is_broadcast_ can simply be replaced by is_multicast_. Granted, is_broadcast_ is cheap, but this is in the TX path for every packet. -J > !pskb_network_may_pull(skb, sizeof(*iph))) { > do_tx_balance = false; > break; >@@ -1441,7 +1442,8 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, > /* IPv6 doesn't really use broadcast mac address, but leave > * that here just in case. > */ >- if (is_broadcast_ether_addr(eth_data->h_dest)) { >+ if (is_broadcast_ether_addr(eth_data->h_dest) || >+ is_multicast_ether_addr(eth_data->h_dest)) { > do_tx_balance = false; > break; > } >-- >2.30.2 > --- -Jay Vosburgh, jay.vosburgh@canonical.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/4] bond_alb: put all slaves into promisc 2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson ` (2 preceding siblings ...) 2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson @ 2021-05-18 21:08 ` Jarod Wilson 2021-05-19 16:47 ` Jay Vosburgh 2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson 4 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev ALB mode bonding can receive on all slaves, so it would seem to make sense that they're all in promisc, unlike other modes that have a primary interface and can only receive on that interface. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/bonding/bond_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index d71e398642fb..93f57ff1c552 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -644,9 +644,10 @@ static int bond_check_dev_link(struct bonding *bond, static int bond_set_promiscuity(struct bonding *bond, int inc) { struct list_head *iter; - int err = 0; + int mode, err = 0; - if (bond_uses_primary(bond)) { + mode = BOND_MODE(bond); + if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) { struct slave *curr_active = rtnl_dereference(bond->curr_active_slave); if (curr_active) -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] bond_alb: put all slaves into promisc 2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson @ 2021-05-19 16:47 ` Jay Vosburgh 0 siblings, 0 replies; 20+ messages in thread From: Jay Vosburgh @ 2021-05-19 16:47 UTC (permalink / raw) To: Jarod Wilson Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Jarod Wilson <jarod@redhat.com> wrote: >ALB mode bonding can receive on all slaves, so it would seem to make sense >that they're all in promisc, unlike other modes that have a primary >interface and can only receive on that interface. When I first read the subject and this description, I thought you meant that all of the alb slaves would be in promisc mode all the time, not that enabling promisc on an alb bond would set promisc for all slaves (instead of just the primary, the current behavior). Regardless, since in alb mode the expectation is that all of the slaves will be on the same subnet (Ethernet broadcast domain), setting all of the bonded interfaces to promisc instead of just one sounds like a recipe for duplicate packets, as each would deliver multiple copies of all "no longer filtered" traffic. The bond_should_deliver_exact_match() logic will still suppress broadcast and multicast frames to the non-primary interface(s), but, e.g., unicast frames flooded to all switch ports will be delivered once for each member of the bond. Unless you have a specific use case that this will improve, I think this will cause more confusion that it will capture frames that would have otherwise been missed. -J >Cc: Jay Vosburgh <j.vosburgh@gmail.com> >Cc: Veaceslav Falico <vfalico@gmail.com> >Cc: Andy Gospodarek <andy@greyhouse.net> >Cc: "David S. Miller" <davem@davemloft.net> >Cc: Jakub Kicinski <kuba@kernel.org> >Cc: Thomas Davis <tadavis@lbl.gov> >Cc: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson <jarod@redhat.com> >--- > drivers/net/bonding/bond_main.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index d71e398642fb..93f57ff1c552 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -644,9 +644,10 @@ static int bond_check_dev_link(struct bonding *bond, > static int bond_set_promiscuity(struct bonding *bond, int inc) > { > struct list_head *iter; >- int err = 0; >+ int mode, err = 0; > >- if (bond_uses_primary(bond)) { >+ mode = BOND_MODE(bond); >+ if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) { > struct slave *curr_active = rtnl_dereference(bond->curr_active_slave); > > if (curr_active) >-- >2.30.2 > --- -Jay Vosburgh, jay.vosburgh@canonical.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better 2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson ` (3 preceding siblings ...) 2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson @ 2021-05-21 13:27 ` Jarod Wilson 2021-05-21 13:27 ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson ` (3 more replies) 4 siblings, 4 replies; 20+ messages in thread From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw) To: linux-kernel; +Cc: Jarod Wilson I've been further educated on a use case, where a bridge sits on top of a bond, with multiple vnetX interfaces attached to virtual machines, also acting as ports of the bridge. Each leg of the bond goes to a different switch, but there is NO mlag/vpc in play, the bonding driver has to handle traffic that loops back appropriately to avoid breaking transmission. Rather than adding some sort of mac filtering to balance-xor mode, we switched to using balance-alb, which already does some of this, and with the tweaks provided in this series, empirically seems to behave as desired in actual operation. v2 attempts to support srcmac-only hashing via a modparam instead of by adding yet another hashing mode, as well as cleaning up and clarifying commit messages. Jarod Wilson (4): bonding: add pure source-mac-based tx hashing option bonding/balance-lb: don't rewrite bridged non-local MACs bonding/balance-alb: don't tx balance multicast traffic either bonding/balance-alb: put all slaves into promisc Documentation/networking/bonding.rst | 13 +++++++++++++ drivers/net/bonding/bond_alb.c | 24 +++++++++++++++++++++--- drivers/net/bonding/bond_main.c | 23 +++++++++++++++-------- 3 files changed, 49 insertions(+), 11 deletions(-) -- 2.30.2 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option 2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson @ 2021-05-21 13:27 ` Jarod Wilson 2021-05-21 13:39 ` Nikolay Aleksandrov 2021-05-21 13:27 ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson ` (2 subsequent siblings) 3 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, Nikolay Aleksandrov, netdev As it turns out, a pure source-mac only tx hash has a place for some VM setups. The previously added vlan+srcmac hash doesn't work as well for a VM with a single MAC and multiple vlans -- these types of setups path traffic more efficiently if the load is split by source mac alone. Enable this by way of an extra module parameter, rather than adding yet another hashing method in the tx fast path. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- Documentation/networking/bonding.rst | 13 +++++++++++++ drivers/net/bonding/bond_main.c | 18 ++++++++++++------ 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst index 62f2aab8eaec..767dbb49018b 100644 --- a/Documentation/networking/bonding.rst +++ b/Documentation/networking/bonding.rst @@ -707,6 +707,13 @@ mode swapped with the new curr_active_slave that was chosen. +novlan_srcmac + + When using the vlan+srcmac xmit_hash_policy, there may be cases where + omitting the vlan from the hash is beneficial. This can be done with + an extra module parameter here. The default value is 0 to include + vlan ID in the transmit hash. + num_grat_arp, num_unsol_na @@ -964,6 +971,12 @@ xmit_hash_policy hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev) + Optionally, if the module parameter novlan_srcmac=1 is set, + the vlan ID is omitted from the hash and only the source MAC + address is used, reducing the hash to + + hash = (source MAC vendor) XOR (source MAC dev) + The default value is layer2. This option was added in bonding version 2.6.3. In earlier versions of bonding, this parameter does not exist, and the layer2 policy is the only policy. The diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 20bbda1b36e1..32785e9d0295 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -107,6 +107,7 @@ static char *lacp_rate; static int min_links; static char *ad_select; static char *xmit_hash_policy; +static int novlan_srcmac; static int arp_interval; static char *arp_ip_target[BOND_MAX_ARP_TARGETS]; static char *arp_validate; @@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3 "0 for layer 2 (default), 1 for layer 3+4, " "2 for layer 2+3, 3 for encap layer 2+3, " "4 for encap layer 3+4, 5 for vlan+srcmac"); +module_param(novlan_srcmac, int, 0); +MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; " + "0 to include it (default), 1 to exclude it"); module_param(arp_interval, int, 0); MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); module_param_array(arp_ip_target, charp, NULL, 0); @@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) { - struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb); + struct ethhdr *mac_hdr = eth_hdr(skb); u32 srcmac_vendor = 0, srcmac_dev = 0; - u16 vlan; + u32 hash; int i; for (i = 0; i < 3; i++) @@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) for (i = 3; i < ETH_ALEN; i++) srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i]; - if (!skb_vlan_tag_present(skb)) - return srcmac_vendor ^ srcmac_dev; + hash = srcmac_vendor ^ srcmac_dev; + + if (novlan_srcmac || !skb_vlan_tag_present(skb)) + return hash; - vlan = skb_vlan_tag_get(skb); + hash ^= skb_vlan_tag_get(skb); - return vlan ^ srcmac_vendor ^ srcmac_dev; + return hash; } /* Extract the appropriate headers based on bond's xmit policy */ -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option 2021-05-21 13:27 ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson @ 2021-05-21 13:39 ` Nikolay Aleksandrov 2021-05-21 13:41 ` Nikolay Aleksandrov 2021-05-21 18:01 ` Jay Vosburgh 0 siblings, 2 replies; 20+ messages in thread From: Nikolay Aleksandrov @ 2021-05-21 13:39 UTC (permalink / raw) To: Jarod Wilson, linux-kernel Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev On 21/05/2021 16:27, Jarod Wilson wrote: > As it turns out, a pure source-mac only tx hash has a place for some VM > setups. The previously added vlan+srcmac hash doesn't work as well for a > VM with a single MAC and multiple vlans -- these types of setups path > traffic more efficiently if the load is split by source mac alone. Enable > this by way of an extra module parameter, rather than adding yet another > hashing method in the tx fast path. > > Cc: Jay Vosburgh <j.vosburgh@gmail.com> > Cc: Veaceslav Falico <vfalico@gmail.com> > Cc: Andy Gospodarek <andy@greyhouse.net> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Thomas Davis <tadavis@lbl.gov> > Cc: Nikolay Aleksandrov <nikolay@nvidia.com> > Cc: netdev@vger.kernel.org > Signed-off-by: Jarod Wilson <jarod@redhat.com> > --- > Documentation/networking/bonding.rst | 13 +++++++++++++ > drivers/net/bonding/bond_main.c | 18 ++++++++++++------ > 2 files changed, 25 insertions(+), 6 deletions(-) > Hi, You seem to be missing netlink support for the new option. Code-wise the rest seems fine, my personal preference is still to make a configurable hash option and perhaps default to srcmac+vlan, i.e. it can be aliased with this hash option. I don't mind either way, but please add netlink support if it will be a new option as it's the preferred way for configuring. Thanks, Nik > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst > index 62f2aab8eaec..767dbb49018b 100644 > --- a/Documentation/networking/bonding.rst > +++ b/Documentation/networking/bonding.rst > @@ -707,6 +707,13 @@ mode > swapped with the new curr_active_slave that was > chosen. > > +novlan_srcmac > + > + When using the vlan+srcmac xmit_hash_policy, there may be cases where > + omitting the vlan from the hash is beneficial. This can be done with > + an extra module parameter here. The default value is 0 to include > + vlan ID in the transmit hash. > + > num_grat_arp, > num_unsol_na > > @@ -964,6 +971,12 @@ xmit_hash_policy > > hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev) > > + Optionally, if the module parameter novlan_srcmac=1 is set, > + the vlan ID is omitted from the hash and only the source MAC > + address is used, reducing the hash to > + > + hash = (source MAC vendor) XOR (source MAC dev) > + > The default value is layer2. This option was added in bonding > version 2.6.3. In earlier versions of bonding, this parameter > does not exist, and the layer2 policy is the only policy. The > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 20bbda1b36e1..32785e9d0295 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -107,6 +107,7 @@ static char *lacp_rate; > static int min_links; > static char *ad_select; > static char *xmit_hash_policy; > +static int novlan_srcmac; > static int arp_interval; > static char *arp_ip_target[BOND_MAX_ARP_TARGETS]; > static char *arp_validate; > @@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3 > "0 for layer 2 (default), 1 for layer 3+4, " > "2 for layer 2+3, 3 for encap layer 2+3, " > "4 for encap layer 3+4, 5 for vlan+srcmac"); > +module_param(novlan_srcmac, int, 0); > +MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; " > + "0 to include it (default), 1 to exclude it"); > module_param(arp_interval, int, 0); > MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); > module_param_array(arp_ip_target, charp, NULL, 0); > @@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, > > static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) > { > - struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb); > + struct ethhdr *mac_hdr = eth_hdr(skb); > u32 srcmac_vendor = 0, srcmac_dev = 0; > - u16 vlan; > + u32 hash; > int i; > > for (i = 0; i < 3; i++) > @@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) > for (i = 3; i < ETH_ALEN; i++) > srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i]; > > - if (!skb_vlan_tag_present(skb)) > - return srcmac_vendor ^ srcmac_dev; > + hash = srcmac_vendor ^ srcmac_dev; > + > + if (novlan_srcmac || !skb_vlan_tag_present(skb)) > + return hash; > > - vlan = skb_vlan_tag_get(skb); > + hash ^= skb_vlan_tag_get(skb); > > - return vlan ^ srcmac_vendor ^ srcmac_dev; > + return hash; > } > > /* Extract the appropriate headers based on bond's xmit policy */ > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option 2021-05-21 13:39 ` Nikolay Aleksandrov @ 2021-05-21 13:41 ` Nikolay Aleksandrov 2021-05-21 18:01 ` Jay Vosburgh 1 sibling, 0 replies; 20+ messages in thread From: Nikolay Aleksandrov @ 2021-05-21 13:41 UTC (permalink / raw) To: Jarod Wilson, linux-kernel Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev On 21/05/2021 16:39, Nikolay Aleksandrov wrote: > On 21/05/2021 16:27, Jarod Wilson wrote: >> As it turns out, a pure source-mac only tx hash has a place for some VM >> setups. The previously added vlan+srcmac hash doesn't work as well for a >> VM with a single MAC and multiple vlans -- these types of setups path >> traffic more efficiently if the load is split by source mac alone. Enable >> this by way of an extra module parameter, rather than adding yet another >> hashing method in the tx fast path. >> >> Cc: Jay Vosburgh <j.vosburgh@gmail.com> >> Cc: Veaceslav Falico <vfalico@gmail.com> >> Cc: Andy Gospodarek <andy@greyhouse.net> >> Cc: "David S. Miller" <davem@davemloft.net> >> Cc: Jakub Kicinski <kuba@kernel.org> >> Cc: Thomas Davis <tadavis@lbl.gov> >> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> >> Cc: netdev@vger.kernel.org >> Signed-off-by: Jarod Wilson <jarod@redhat.com> >> --- >> Documentation/networking/bonding.rst | 13 +++++++++++++ >> drivers/net/bonding/bond_main.c | 18 ++++++++++++------ >> 2 files changed, 25 insertions(+), 6 deletions(-) >> > > Hi, > You seem to be missing netlink support for the new option. Code-wise the rest seems fine, > my personal preference is still to make a configurable hash option and perhaps default to > srcmac+vlan, i.e. it can be aliased with this hash option. I don't mind either way, but > please add netlink support if it will be a new option as it's the preferred way for > configuring. > > Thanks, > Nik > Almost forgot - please include a cover letter with overview and motivation of the changes. Also I guess these should be targeted at net-next (or at least the new option definitely should go there), and please add changelog between patchset versions. Cheers, Nik >> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst >> index 62f2aab8eaec..767dbb49018b 100644 >> --- a/Documentation/networking/bonding.rst >> +++ b/Documentation/networking/bonding.rst >> @@ -707,6 +707,13 @@ mode >> swapped with the new curr_active_slave that was >> chosen. >> >> +novlan_srcmac >> + >> + When using the vlan+srcmac xmit_hash_policy, there may be cases where >> + omitting the vlan from the hash is beneficial. This can be done with >> + an extra module parameter here. The default value is 0 to include >> + vlan ID in the transmit hash. >> + >> num_grat_arp, >> num_unsol_na >> >> @@ -964,6 +971,12 @@ xmit_hash_policy >> >> hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev) >> >> + Optionally, if the module parameter novlan_srcmac=1 is set, >> + the vlan ID is omitted from the hash and only the source MAC >> + address is used, reducing the hash to >> + >> + hash = (source MAC vendor) XOR (source MAC dev) >> + >> The default value is layer2. This option was added in bonding >> version 2.6.3. In earlier versions of bonding, this parameter >> does not exist, and the layer2 policy is the only policy. The >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >> index 20bbda1b36e1..32785e9d0295 100644 >> --- a/drivers/net/bonding/bond_main.c >> +++ b/drivers/net/bonding/bond_main.c >> @@ -107,6 +107,7 @@ static char *lacp_rate; >> static int min_links; >> static char *ad_select; >> static char *xmit_hash_policy; >> +static int novlan_srcmac; >> static int arp_interval; >> static char *arp_ip_target[BOND_MAX_ARP_TARGETS]; >> static char *arp_validate; >> @@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3 >> "0 for layer 2 (default), 1 for layer 3+4, " >> "2 for layer 2+3, 3 for encap layer 2+3, " >> "4 for encap layer 3+4, 5 for vlan+srcmac"); >> +module_param(novlan_srcmac, int, 0); >> +MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; " >> + "0 to include it (default), 1 to exclude it"); >> module_param(arp_interval, int, 0); >> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); >> module_param_array(arp_ip_target, charp, NULL, 0); >> @@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, >> >> static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) >> { >> - struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb); >> + struct ethhdr *mac_hdr = eth_hdr(skb); >> u32 srcmac_vendor = 0, srcmac_dev = 0; >> - u16 vlan; >> + u32 hash; >> int i; >> >> for (i = 0; i < 3; i++) >> @@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) >> for (i = 3; i < ETH_ALEN; i++) >> srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i]; >> >> - if (!skb_vlan_tag_present(skb)) >> - return srcmac_vendor ^ srcmac_dev; >> + hash = srcmac_vendor ^ srcmac_dev; >> + >> + if (novlan_srcmac || !skb_vlan_tag_present(skb)) >> + return hash; >> >> - vlan = skb_vlan_tag_get(skb); >> + hash ^= skb_vlan_tag_get(skb); >> >> - return vlan ^ srcmac_vendor ^ srcmac_dev; >> + return hash; >> } >> >> /* Extract the appropriate headers based on bond's xmit policy */ >> > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option 2021-05-21 13:39 ` Nikolay Aleksandrov 2021-05-21 13:41 ` Nikolay Aleksandrov @ 2021-05-21 18:01 ` Jay Vosburgh 1 sibling, 0 replies; 20+ messages in thread From: Jay Vosburgh @ 2021-05-21 18:01 UTC (permalink / raw) To: Nikolay Aleksandrov Cc: Jarod Wilson, linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Nikolay Aleksandrov <nikolay@nvidia.com> wrote: >On 21/05/2021 16:27, Jarod Wilson wrote: >> As it turns out, a pure source-mac only tx hash has a place for some VM >> setups. The previously added vlan+srcmac hash doesn't work as well for a >> VM with a single MAC and multiple vlans -- these types of setups path >> traffic more efficiently if the load is split by source mac alone. Enable >> this by way of an extra module parameter, rather than adding yet another >> hashing method in the tx fast path. >> >> Cc: Jay Vosburgh <j.vosburgh@gmail.com> >> Cc: Veaceslav Falico <vfalico@gmail.com> >> Cc: Andy Gospodarek <andy@greyhouse.net> >> Cc: "David S. Miller" <davem@davemloft.net> >> Cc: Jakub Kicinski <kuba@kernel.org> >> Cc: Thomas Davis <tadavis@lbl.gov> >> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> >> Cc: netdev@vger.kernel.org >> Signed-off-by: Jarod Wilson <jarod@redhat.com> >> --- >> Documentation/networking/bonding.rst | 13 +++++++++++++ >> drivers/net/bonding/bond_main.c | 18 ++++++++++++------ >> 2 files changed, 25 insertions(+), 6 deletions(-) >> > >Hi, >You seem to be missing netlink support for the new option. Code-wise the rest seems fine, >my personal preference is still to make a configurable hash option and perhaps default to >srcmac+vlan, i.e. it can be aliased with this hash option. I don't mind either way, but >please add netlink support if it will be a new option as it's the preferred way for >configuring. FWIW, I'm in agreement with Nik here; netlink support is mandatory for any new options. -J >Thanks, > Nik > >> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst >> index 62f2aab8eaec..767dbb49018b 100644 >> --- a/Documentation/networking/bonding.rst >> +++ b/Documentation/networking/bonding.rst >> @@ -707,6 +707,13 @@ mode >> swapped with the new curr_active_slave that was >> chosen. >> >> +novlan_srcmac >> + >> + When using the vlan+srcmac xmit_hash_policy, there may be cases where >> + omitting the vlan from the hash is beneficial. This can be done with >> + an extra module parameter here. The default value is 0 to include >> + vlan ID in the transmit hash. >> + >> num_grat_arp, >> num_unsol_na >> >> @@ -964,6 +971,12 @@ xmit_hash_policy >> >> hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev) >> >> + Optionally, if the module parameter novlan_srcmac=1 is set, >> + the vlan ID is omitted from the hash and only the source MAC >> + address is used, reducing the hash to >> + >> + hash = (source MAC vendor) XOR (source MAC dev) >> + >> The default value is layer2. This option was added in bonding >> version 2.6.3. In earlier versions of bonding, this parameter >> does not exist, and the layer2 policy is the only policy. The >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >> index 20bbda1b36e1..32785e9d0295 100644 >> --- a/drivers/net/bonding/bond_main.c >> +++ b/drivers/net/bonding/bond_main.c >> @@ -107,6 +107,7 @@ static char *lacp_rate; >> static int min_links; >> static char *ad_select; >> static char *xmit_hash_policy; >> +static int novlan_srcmac; >> static int arp_interval; >> static char *arp_ip_target[BOND_MAX_ARP_TARGETS]; >> static char *arp_validate; >> @@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3 >> "0 for layer 2 (default), 1 for layer 3+4, " >> "2 for layer 2+3, 3 for encap layer 2+3, " >> "4 for encap layer 3+4, 5 for vlan+srcmac"); >> +module_param(novlan_srcmac, int, 0); >> +MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; " >> + "0 to include it (default), 1 to exclude it"); >> module_param(arp_interval, int, 0); >> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds"); >> module_param_array(arp_ip_target, charp, NULL, 0); >> @@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, >> >> static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) >> { >> - struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb); >> + struct ethhdr *mac_hdr = eth_hdr(skb); >> u32 srcmac_vendor = 0, srcmac_dev = 0; >> - u16 vlan; >> + u32 hash; >> int i; >> >> for (i = 0; i < 3; i++) >> @@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb) >> for (i = 3; i < ETH_ALEN; i++) >> srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i]; >> >> - if (!skb_vlan_tag_present(skb)) >> - return srcmac_vendor ^ srcmac_dev; >> + hash = srcmac_vendor ^ srcmac_dev; >> + >> + if (novlan_srcmac || !skb_vlan_tag_present(skb)) >> + return hash; >> >> - vlan = skb_vlan_tag_get(skb); >> + hash ^= skb_vlan_tag_get(skb); >> >> - return vlan ^ srcmac_vendor ^ srcmac_dev; >> + return hash; >> } >> >> /* Extract the appropriate headers based on bond's xmit policy */ --- -Jay Vosburgh, jay.vosburgh@canonical.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs 2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson 2021-05-21 13:27 ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson @ 2021-05-21 13:27 ` Jarod Wilson 2021-05-21 17:58 ` Jay Vosburgh 2021-05-21 13:27 ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson 2021-05-21 13:27 ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson 3 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev With a virtual machine behind a bridge that directly incorporates a balance-alb bond as one of its ports, outgoing traffic should retain the VM's source MAC. That works fine most of the time, until doing a failover, and then the MAC gets rewritten to the bond slave's MAC, and the return traffic gets dropped. If we don't rewrite the MAC there, we don't lose the traffic. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/bonding/bond_alb.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 3455f2cc13f2..c57f62e43328 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1302,6 +1302,23 @@ void bond_alb_deinitialize(struct bonding *bond) rlb_deinitialize(bond); } +static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data) +{ + if (BOND_MODE(bond) != BOND_MODE_ALB) + return false; + + /* Don't modify source MACs that do not originate locally + * (e.g.,arrive via a bridge). + */ + if (!netif_is_bridge_port(bond->dev)) + return false; + + if (bond_slave_has_mac_rx(bond, eth_data->h_source)) + return false; + + return true; +} + static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, struct slave *tx_slave) { @@ -1316,7 +1333,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, } if (tx_slave && bond_slave_can_tx(tx_slave)) { - if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) { + if (tx_slave != rcu_access_pointer(bond->curr_active_slave) && + !bond_alb_bridged_mac(bond, eth_data)) { ether_addr_copy(eth_data->h_source, tx_slave->dev->dev_addr); } -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs 2021-05-21 13:27 ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson @ 2021-05-21 17:58 ` Jay Vosburgh 0 siblings, 0 replies; 20+ messages in thread From: Jay Vosburgh @ 2021-05-21 17:58 UTC (permalink / raw) To: Jarod Wilson Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Jarod Wilson <jarod@redhat.com> wrote: >With a virtual machine behind a bridge that directly incorporates a >balance-alb bond as one of its ports, outgoing traffic should retain the >VM's source MAC. That works fine most of the time, until doing a failover, >and then the MAC gets rewritten to the bond slave's MAC, and the return >traffic gets dropped. If we don't rewrite the MAC there, we don't lose the >traffic. Comparing the description above to the patch, is the erroneous behavior really related to failover (i.e., bond slave goes down, bond reshuffles various things as it is wont to do), or is it related to either a TX side rebalance or even simply that specific traffic is being sent on a slave that isn't the curr_active_slave? One more comment, below. >Cc: Jay Vosburgh <j.vosburgh@gmail.com> >Cc: Veaceslav Falico <vfalico@gmail.com> >Cc: Andy Gospodarek <andy@greyhouse.net> >Cc: "David S. Miller" <davem@davemloft.net> >Cc: Jakub Kicinski <kuba@kernel.org> >Cc: Thomas Davis <tadavis@lbl.gov> >Cc: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson <jarod@redhat.com> >--- > drivers/net/bonding/bond_alb.c | 20 +++++++++++++++++++- > 1 file changed, 19 insertions(+), 1 deletion(-) > >diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c >index 3455f2cc13f2..c57f62e43328 100644 >--- a/drivers/net/bonding/bond_alb.c >+++ b/drivers/net/bonding/bond_alb.c >@@ -1302,6 +1302,23 @@ void bond_alb_deinitialize(struct bonding *bond) > rlb_deinitialize(bond); > } > >+static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data) >+{ >+ if (BOND_MODE(bond) != BOND_MODE_ALB) >+ return false; >+ >+ /* Don't modify source MACs that do not originate locally >+ * (e.g.,arrive via a bridge). >+ */ >+ if (!netif_is_bridge_port(bond->dev)) >+ return false; Repeating my comment (from my response to the v1 patch) that hasn't been addressed: I believe this logic will fail if the plumbing is, e.g., bond -> vlan -> bridge, as netif_is_bridge_port() would not return true for the bond in that case. Making this reliable is tricky at best, and may be impossible to be correct for all possible cases. As such, I think the comment above should reflect the limited scope of what is actually being checked here (i.e., the bond itself is directly a bridge port). Ideally, the bonding.rst documentation should describe the special behaviors of alb mode when configured as a bridge port. -J >+ >+ if (bond_slave_has_mac_rx(bond, eth_data->h_source)) >+ return false; >+ >+ return true; >+} >+ > static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, > struct slave *tx_slave) > { >@@ -1316,7 +1333,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond, > } > > if (tx_slave && bond_slave_can_tx(tx_slave)) { >- if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) { >+ if (tx_slave != rcu_access_pointer(bond->curr_active_slave) && >+ !bond_alb_bridged_mac(bond, eth_data)) { > ether_addr_copy(eth_data->h_source, > tx_slave->dev->dev_addr); > } >-- >2.30.2 --- -Jay Vosburgh, jay.vosburgh@canonical.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either 2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson 2021-05-21 13:27 ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson 2021-05-21 13:27 ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson @ 2021-05-21 13:27 ` Jarod Wilson 2021-05-21 17:02 ` Jay Vosburgh 2021-05-21 13:27 ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson 3 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Multicast traffic going out the non-primary interface can come back in through the primary interface in alb mode. When there's a bridge sitting on top of the bond, with virtual machines behind it, attached to vnetX interfaces also acting as bridge ports, this can cause problems. The looped frame has the source MAC of the VM behind the bridge, and ends up rewriting the bridge forwarding database entries, replacing a vnetX entry in the fdb with the bond instead, at which point, we lose traffic. If we don't tx balance multicast traffic, we don't break connectivity. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/bonding/bond_alb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index c57f62e43328..cddc4d8b2519 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1418,7 +1418,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, case ETH_P_IP: { const struct iphdr *iph; - if (is_broadcast_ether_addr(eth_data->h_dest) || + if (is_multicast_ether_addr(eth_data->h_dest) || !pskb_network_may_pull(skb, sizeof(*iph))) { do_tx_balance = false; break; @@ -1438,7 +1438,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, /* IPv6 doesn't really use broadcast mac address, but leave * that here just in case. */ - if (is_broadcast_ether_addr(eth_data->h_dest)) { + if (is_multicast_ether_addr(eth_data->h_dest)) { do_tx_balance = false; break; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either 2021-05-21 13:27 ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson @ 2021-05-21 17:02 ` Jay Vosburgh 0 siblings, 0 replies; 20+ messages in thread From: Jay Vosburgh @ 2021-05-21 17:02 UTC (permalink / raw) To: Jarod Wilson Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Jarod Wilson <jarod@redhat.com> wrote: >Multicast traffic going out the non-primary interface can come back in >through the primary interface in alb mode. When there's a bridge sitting >on top of the bond, with virtual machines behind it, attached to vnetX >interfaces also acting as bridge ports, this can cause problems. The >looped frame has the source MAC of the VM behind the bridge, and ends up >rewriting the bridge forwarding database entries, replacing a vnetX entry >in the fdb with the bond instead, at which point, we lose traffic. If we >don't tx balance multicast traffic, we don't break connectivity. > >Cc: Jay Vosburgh <j.vosburgh@gmail.com> >Cc: Veaceslav Falico <vfalico@gmail.com> >Cc: Andy Gospodarek <andy@greyhouse.net> >Cc: "David S. Miller" <davem@davemloft.net> >Cc: Jakub Kicinski <kuba@kernel.org> >Cc: Thomas Davis <tadavis@lbl.gov> >Cc: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> >--- > drivers/net/bonding/bond_alb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c >index c57f62e43328..cddc4d8b2519 100644 >--- a/drivers/net/bonding/bond_alb.c >+++ b/drivers/net/bonding/bond_alb.c >@@ -1418,7 +1418,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, > case ETH_P_IP: { > const struct iphdr *iph; > >- if (is_broadcast_ether_addr(eth_data->h_dest) || >+ if (is_multicast_ether_addr(eth_data->h_dest) || > !pskb_network_may_pull(skb, sizeof(*iph))) { > do_tx_balance = false; > break; >@@ -1438,7 +1438,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond, > /* IPv6 doesn't really use broadcast mac address, but leave > * that here just in case. > */ >- if (is_broadcast_ether_addr(eth_data->h_dest)) { >+ if (is_multicast_ether_addr(eth_data->h_dest)) { > do_tx_balance = false; > break; > } >-- >2.30.2 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc 2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson ` (2 preceding siblings ...) 2021-05-21 13:27 ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson @ 2021-05-21 13:27 ` Jarod Wilson 2021-05-21 17:01 ` Jay Vosburgh 3 siblings, 1 reply; 20+ messages in thread From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw) To: linux-kernel Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Unlike most other modes with a primary interface, ALB mode bonding can receive on all slaves. That includes traffic destined for a non-local MAC behind a bridge on top of the bond. Such traffic gets dropped if the interface isn't in promiscuous mode. Therefore, it would seem to make sense to put all slaves into promisc. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/bonding/bond_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 32785e9d0295..6d95f9e46059 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -647,9 +647,10 @@ static int bond_check_dev_link(struct bonding *bond, static int bond_set_promiscuity(struct bonding *bond, int inc) { struct list_head *iter; - int err = 0; + int mode, err = 0; - if (bond_uses_primary(bond)) { + mode = BOND_MODE(bond); + if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) { struct slave *curr_active = rtnl_dereference(bond->curr_active_slave); if (curr_active) -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc 2021-05-21 13:27 ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson @ 2021-05-21 17:01 ` Jay Vosburgh 0 siblings, 0 replies; 20+ messages in thread From: Jay Vosburgh @ 2021-05-21 17:01 UTC (permalink / raw) To: Jarod Wilson Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, Thomas Davis, netdev Jarod Wilson <jarod@redhat.com> wrote: >Unlike most other modes with a primary interface, ALB mode bonding can >receive on all slaves. That includes traffic destined for a non-local MAC >behind a bridge on top of the bond. Such traffic gets dropped if the >interface isn't in promiscuous mode. Therefore, it would seem to make >sense to put all slaves into promisc. I'm still confused by this description and the actual changes of the patch; the description above reads to me as you're intending that all slaves of an alb bond should be promisc all the time, but the patch below doesn't do that (it causes all alb mode slaves to be set to promisc when the bond itself is set to promisc mode). Could you clarify? Also, your phrasing that "it would seem to make sense" suggests to me that this change is speculative. Do you have a specific example of when the prior behavior causes an issue? -J >Cc: Jay Vosburgh <j.vosburgh@gmail.com> >Cc: Veaceslav Falico <vfalico@gmail.com> >Cc: Andy Gospodarek <andy@greyhouse.net> >Cc: "David S. Miller" <davem@davemloft.net> >Cc: Jakub Kicinski <kuba@kernel.org> >Cc: Thomas Davis <tadavis@lbl.gov> >Cc: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson <jarod@redhat.com> >--- > drivers/net/bonding/bond_main.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 32785e9d0295..6d95f9e46059 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -647,9 +647,10 @@ static int bond_check_dev_link(struct bonding *bond, > static int bond_set_promiscuity(struct bonding *bond, int inc) > { > struct list_head *iter; >- int err = 0; >+ int mode, err = 0; > >- if (bond_uses_primary(bond)) { >+ mode = BOND_MODE(bond); >+ if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) { > struct slave *curr_active = rtnl_dereference(bond->curr_active_slave); > > if (curr_active) >-- >2.30.2 --- -Jay Vosburgh, jay.vosburgh@canonical.com ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2021-05-21 18:01 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson 2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson 2021-05-19 9:01 ` Nikolay Aleksandrov 2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson 2021-05-19 22:31 ` Jay Vosburgh 2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson 2021-05-19 18:45 ` Jay Vosburgh 2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson 2021-05-19 16:47 ` Jay Vosburgh 2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson 2021-05-21 13:27 ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson 2021-05-21 13:39 ` Nikolay Aleksandrov 2021-05-21 13:41 ` Nikolay Aleksandrov 2021-05-21 18:01 ` Jay Vosburgh 2021-05-21 13:27 ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson 2021-05-21 17:58 ` Jay Vosburgh 2021-05-21 13:27 ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson 2021-05-21 17:02 ` Jay Vosburgh 2021-05-21 13:27 ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson 2021-05-21 17:01 ` Jay Vosburgh
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.