All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] bond_alb: support VMs behind bridges better
@ 2021-05-18 21:08 Jarod Wilson
  2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson

I've been further educated on a use case, where a bridge sits on top of
a bond, with multiple vnetX interfaces attached to virtual machines,
also acting as ports of the bridge. Each leg of the bond goes to a
different switch, but there is NO mlag/vpc in play, the bonding driver
has to handle traffic that loops back appropriately to avoid breaking
transmission. Rather than adding some sort of mac filtering to
balance-xor mode, we switched to using balance-alb, which already does
some of this, and with the tweaks provided in this series, empirically
seems to behave as desired in actual operation.

Jarod Wilson (4):
  bonding: add pure source-mac-based tx hashing option
  bond_alb: don't rewrite bridged non-local MACs
  bond_alb: don't tx balance multicast traffic either
  bond_alb: put all slaves into promisc

 Documentation/networking/bonding.rst | 13 ++++++++++++
 drivers/net/bonding/bond_alb.c       | 27 ++++++++++++++++++++++--
 drivers/net/bonding/bond_main.c      | 31 ++++++++++++++++++----------
 drivers/net/bonding/bond_options.c   |  1 +
 include/linux/netdevice.h            |  1 +
 include/uapi/linux/if_bonding.h      |  1 +
 6 files changed, 61 insertions(+), 13 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] bonding: add pure source-mac-based tx hashing option
  2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson
@ 2021-05-18 21:08 ` Jarod Wilson
  2021-05-19  9:01   ` Nikolay Aleksandrov
  2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

As it turns out, a pure source-mac only tx hash has a place for some VM
setups. The previously added vlan+srcmac hash doesn't work as well for a
VM with a single MAC and multiple vlans -- these types of setups path
traffic more efficiently if the load is split by source mac alone.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 Documentation/networking/bonding.rst | 13 +++++++++++++
 drivers/net/bonding/bond_main.c      | 26 +++++++++++++++++---------
 drivers/net/bonding/bond_options.c   |  1 +
 include/linux/netdevice.h            |  1 +
 include/uapi/linux/if_bonding.h      |  1 +
 5 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index 62f2aab8eaec..66c3fa3a9040 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -964,6 +964,19 @@ xmit_hash_policy
 
 		hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
 
+	srcmac
+
+		This policy uses a very rudimentary source mac hash to
+		load-balance traffic per-source-mac, with failover should
+		one leg fail. The intended use case is for a bond shared
+		by multiple virtual machines, each with their own virtual
+		mac address, keeping the VMs traffic all limited to the
+		same outbound interface.
+
+		The formula for the hash is simply
+
+		hash = (source MAC vendor) XOR (source MAC dev)
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 20bbda1b36e1..d71e398642fb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -167,7 +167,8 @@ module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
 				   "2 for layer 2+3, 3 for encap layer 2+3, "
-				   "4 for encap layer 3+4, 5 for vlan+srcmac");
+				   "4 for encap layer 3+4, 5 for vlan+srcmac, "
+				   "6 for srcmac");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1459,6 +1460,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
 		return NETDEV_LAG_HASH_E34;
 	case BOND_XMIT_POLICY_VLAN_SRCMAC:
 		return NETDEV_LAG_HASH_VLAN_SRCMAC;
+	case BOND_XMIT_POLICY_SRCMAC:
+		return NETDEV_LAG_HASH_SRCMAC;
 	default:
 		return NETDEV_LAG_HASH_UNKNOWN;
 	}
@@ -3521,11 +3524,11 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, bool with_vlan)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr = eth_hdr(skb);
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
-	u16 vlan;
+	u32 hash;
 	int i;
 
 	for (i = 0; i < 3; i++)
@@ -3534,12 +3537,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 	for (i = 3; i < ETH_ALEN; i++)
 		srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
 
-	if (!skb_vlan_tag_present(skb))
-		return srcmac_vendor ^ srcmac_dev;
+	hash = srcmac_vendor ^ srcmac_dev;
+
+	if (!with_vlan || !skb_vlan_tag_present(skb))
+		return hash;
 
-	vlan = skb_vlan_tag_get(skb);
+	hash ^= skb_vlan_tag_get(skb);
 
-	return vlan ^ srcmac_vendor ^ srcmac_dev;
+	return hash;
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
@@ -3618,8 +3623,11 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	    skb->l4_hash)
 		return skb->hash;
 
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_SRCMAC)
+		return bond_vlan_srcmac_hash(skb, false);
+
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, true);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
 	    !bond_flow_dissect(bond, skb, &flow))
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index c9d3604ae129..ff68ad2589f0 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -102,6 +102,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
 	{ "encap2+3",    BOND_XMIT_POLICY_ENCAP23,     0},
 	{ "encap3+4",    BOND_XMIT_POLICY_ENCAP34,     0},
 	{ "vlan+srcmac", BOND_XMIT_POLICY_VLAN_SRCMAC, 0},
+	{ "srcmac",      BOND_XMIT_POLICY_SRCMAC,      0},
 	{ NULL,          -1,                           0},
 };
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbc950b34df..d88319fca1d3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2732,6 +2732,7 @@ enum netdev_lag_hash {
 	NETDEV_LAG_HASH_E23,
 	NETDEV_LAG_HASH_E34,
 	NETDEV_LAG_HASH_VLAN_SRCMAC,
+	NETDEV_LAG_HASH_SRCMAC,
 	NETDEV_LAG_HASH_UNKNOWN,
 };
 
diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index d174914a837d..f3b4d412a73f 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -95,6 +95,7 @@
 #define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
 #define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
 #define BOND_XMIT_POLICY_VLAN_SRCMAC	5 /* vlan + source MAC */
+#define BOND_XMIT_POLICY_SRCMAC		6 /* source MAC only */
 
 /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
 #define LACP_STATE_LACP_ACTIVITY   0x1
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs
  2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson
  2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
@ 2021-05-18 21:08 ` Jarod Wilson
  2021-05-19 22:31   ` Jay Vosburgh
  2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

With a virtual machine behind a bridge on top of a bond, outgoing traffic
should retain the VM's source MAC. That works fine most of the time, until
doing a failover, and then the MAC gets rewritten to the bond slave's MAC,
and the return traffic gets dropped. If we don't rewrite the MAC there, we
don't lose any traffic.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_alb.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 3455f2cc13f2..ce8257c7cbea 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1302,6 +1302,26 @@ void bond_alb_deinitialize(struct bonding *bond)
 		rlb_deinitialize(bond);
 }
 
+static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data)
+{
+	struct list_head *iter;
+	struct slave *slave;
+
+	if (BOND_MODE(bond) != BOND_MODE_ALB)
+		return false;
+
+	/* Don't modify source MACs that do not originate locally
+	 * (e.g.,arrive via a bridge).
+	 */
+	if (!netif_is_bridge_port(bond->dev))
+		return false;
+
+	if (bond_slave_has_mac_rx(bond, eth_data->h_source))
+		return false;
+
+	return true;
+}
+
 static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
 				    struct slave *tx_slave)
 {
@@ -1316,7 +1336,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
 	}
 
 	if (tx_slave && bond_slave_can_tx(tx_slave)) {
-		if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) {
+		if (tx_slave != rcu_access_pointer(bond->curr_active_slave) &&
+		    !bond_alb_bridged_mac(bond, eth_data)) {
 			ether_addr_copy(eth_data->h_source,
 					tx_slave->dev->dev_addr);
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/4] bond_alb: don't tx balance multicast traffic either
  2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson
  2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
  2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson
@ 2021-05-18 21:08 ` Jarod Wilson
  2021-05-19 18:45   ` Jay Vosburgh
  2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson
  2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson
  4 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

Multicast traffic going out the non-primary interface can come back in
through the primary interface in alb mode. When there's a bridge sitting
on top of the bond, with virtual machines behind it, attached to vnetX
interfaces also acting as bridge ports, this can cause problems. The
multicast traffic ends up rewriting the bridge forwarding database
entries, replacing a vnetX entry in the fdb with the bond instead, at
which point, we lose traffic. If we don't tx balance multicast traffic, we
don't break connectivity.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_alb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index ce8257c7cbea..4df661b77252 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1422,6 +1422,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
 		const struct iphdr *iph;
 
 		if (is_broadcast_ether_addr(eth_data->h_dest) ||
+		    is_multicast_ether_addr(eth_data->h_dest) ||
 		    !pskb_network_may_pull(skb, sizeof(*iph))) {
 			do_tx_balance = false;
 			break;
@@ -1441,7 +1442,8 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
 		/* IPv6 doesn't really use broadcast mac address, but leave
 		 * that here just in case.
 		 */
-		if (is_broadcast_ether_addr(eth_data->h_dest)) {
+		if (is_broadcast_ether_addr(eth_data->h_dest) ||
+		    is_multicast_ether_addr(eth_data->h_dest)) {
 			do_tx_balance = false;
 			break;
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/4] bond_alb: put all slaves into promisc
  2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson
                   ` (2 preceding siblings ...)
  2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson
@ 2021-05-18 21:08 ` Jarod Wilson
  2021-05-19 16:47   ` Jay Vosburgh
  2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson
  4 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-18 21:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

ALB mode bonding can receive on all slaves, so it would seem to make sense
that they're all in promisc, unlike other modes that have a primary
interface and can only receive on that interface.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d71e398642fb..93f57ff1c552 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -644,9 +644,10 @@ static int bond_check_dev_link(struct bonding *bond,
 static int bond_set_promiscuity(struct bonding *bond, int inc)
 {
 	struct list_head *iter;
-	int err = 0;
+	int mode, err = 0;
 
-	if (bond_uses_primary(bond)) {
+	mode = BOND_MODE(bond);
+	if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) {
 		struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
 
 		if (curr_active)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] bonding: add pure source-mac-based tx hashing option
  2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
@ 2021-05-19  9:01   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2021-05-19  9:01 UTC (permalink / raw)
  To: Jarod Wilson, linux-kernel
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

On 19/05/2021 00:08, Jarod Wilson wrote:
> As it turns out, a pure source-mac only tx hash has a place for some VM
> setups. The previously added vlan+srcmac hash doesn't work as well for a
> VM with a single MAC and multiple vlans -- these types of setups path
> traffic more efficiently if the load is split by source mac alone.
> 
> Cc: Jay Vosburgh <j.vosburgh@gmail.com>
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <andy@greyhouse.net>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Thomas Davis <tadavis@lbl.gov>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
>  Documentation/networking/bonding.rst | 13 +++++++++++++
>  drivers/net/bonding/bond_main.c      | 26 +++++++++++++++++---------
>  drivers/net/bonding/bond_options.c   |  1 +
>  include/linux/netdevice.h            |  1 +
>  include/uapi/linux/if_bonding.h      |  1 +
>  5 files changed, 33 insertions(+), 9 deletions(-)
> 

Hi,
It would seem you keep adding modes for each field, that seems unnecessary to me and
it also affects the fast path - each new mode you add is another 1+ tests in bond's
fast path. You could instead just add 1 new mode which has configurable hash fields,
take the "hit" for it once in the fast-path (if chosen) and use that.
I'd like to avoid tomorrow getting another "dstmac" mode or something like that.

In fact both of these new modes are unnecessary in most cases, you could use any available method
(e.g. ebpf) to compute and set the skb queue mapping on Tx to choose any slave and that would
override any hash or bond mode. Check __bond_start_xmit() -> bond_slave_override()

Cheers,
 Nik

> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> index 62f2aab8eaec..66c3fa3a9040 100644
> --- a/Documentation/networking/bonding.rst
> +++ b/Documentation/networking/bonding.rst
> @@ -964,6 +964,19 @@ xmit_hash_policy
>  
>  		hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
>  
> +	srcmac
> +
> +		This policy uses a very rudimentary source mac hash to
> +		load-balance traffic per-source-mac, with failover should
> +		one leg fail. The intended use case is for a bond shared
> +		by multiple virtual machines, each with their own virtual
> +		mac address, keeping the VMs traffic all limited to the
> +		same outbound interface.
> +
> +		The formula for the hash is simply
> +
> +		hash = (source MAC vendor) XOR (source MAC dev)
> +
>  	The default value is layer2.  This option was added in bonding
>  	version 2.6.3.  In earlier versions of bonding, this parameter
>  	does not exist, and the layer2 policy is the only policy.  The
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 20bbda1b36e1..d71e398642fb 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -167,7 +167,8 @@ module_param(xmit_hash_policy, charp, 0);
>  MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
>  				   "0 for layer 2 (default), 1 for layer 3+4, "
>  				   "2 for layer 2+3, 3 for encap layer 2+3, "
> -				   "4 for encap layer 3+4, 5 for vlan+srcmac");
> +				   "4 for encap layer 3+4, 5 for vlan+srcmac, "
> +				   "6 for srcmac");
>  module_param(arp_interval, int, 0);
>  MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
>  module_param_array(arp_ip_target, charp, NULL, 0);
> @@ -1459,6 +1460,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
>  		return NETDEV_LAG_HASH_E34;
>  	case BOND_XMIT_POLICY_VLAN_SRCMAC:
>  		return NETDEV_LAG_HASH_VLAN_SRCMAC;
> +	case BOND_XMIT_POLICY_SRCMAC:
> +		return NETDEV_LAG_HASH_SRCMAC;
>  	default:
>  		return NETDEV_LAG_HASH_UNKNOWN;
>  	}
> @@ -3521,11 +3524,11 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
>  	return true;
>  }
>  
> -static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> +static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, bool with_vlan)
>  {
> -	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> +	struct ethhdr *mac_hdr = eth_hdr(skb);
>  	u32 srcmac_vendor = 0, srcmac_dev = 0;
> -	u16 vlan;
> +	u32 hash;
>  	int i;
>  
>  	for (i = 0; i < 3; i++)
> @@ -3534,12 +3537,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>  	for (i = 3; i < ETH_ALEN; i++)
>  		srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
>  
> -	if (!skb_vlan_tag_present(skb))
> -		return srcmac_vendor ^ srcmac_dev;
> +	hash = srcmac_vendor ^ srcmac_dev;
> +
> +	if (!with_vlan || !skb_vlan_tag_present(skb))
> +		return hash;
>  
> -	vlan = skb_vlan_tag_get(skb);
> +	hash ^= skb_vlan_tag_get(skb);
>  
> -	return vlan ^ srcmac_vendor ^ srcmac_dev;
> +	return hash;
>  }
>  
>  /* Extract the appropriate headers based on bond's xmit policy */
> @@ -3618,8 +3623,11 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
>  	    skb->l4_hash)
>  		return skb->hash;
>  
> +	if (bond->params.xmit_policy == BOND_XMIT_POLICY_SRCMAC)
> +		return bond_vlan_srcmac_hash(skb, false);
> +
>  	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
> -		return bond_vlan_srcmac_hash(skb);
> +		return bond_vlan_srcmac_hash(skb, true);
>  
>  	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
>  	    !bond_flow_dissect(bond, skb, &flow))
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index c9d3604ae129..ff68ad2589f0 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -102,6 +102,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
>  	{ "encap2+3",    BOND_XMIT_POLICY_ENCAP23,     0},
>  	{ "encap3+4",    BOND_XMIT_POLICY_ENCAP34,     0},
>  	{ "vlan+srcmac", BOND_XMIT_POLICY_VLAN_SRCMAC, 0},
> +	{ "srcmac",      BOND_XMIT_POLICY_SRCMAC,      0},
>  	{ NULL,          -1,                           0},
>  };
>  
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5cbc950b34df..d88319fca1d3 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2732,6 +2732,7 @@ enum netdev_lag_hash {
>  	NETDEV_LAG_HASH_E23,
>  	NETDEV_LAG_HASH_E34,
>  	NETDEV_LAG_HASH_VLAN_SRCMAC,
> +	NETDEV_LAG_HASH_SRCMAC,
>  	NETDEV_LAG_HASH_UNKNOWN,
>  };
>  
> diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
> index d174914a837d..f3b4d412a73f 100644
> --- a/include/uapi/linux/if_bonding.h
> +++ b/include/uapi/linux/if_bonding.h
> @@ -95,6 +95,7 @@
>  #define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
>  #define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
>  #define BOND_XMIT_POLICY_VLAN_SRCMAC	5 /* vlan + source MAC */
> +#define BOND_XMIT_POLICY_SRCMAC		6 /* source MAC only */
>  
>  /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
>  #define LACP_STATE_LACP_ACTIVITY   0x1
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] bond_alb: put all slaves into promisc
  2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson
@ 2021-05-19 16:47   ` Jay Vosburgh
  0 siblings, 0 replies; 20+ messages in thread
From: Jay Vosburgh @ 2021-05-19 16:47 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

Jarod Wilson <jarod@redhat.com> wrote:

>ALB mode bonding can receive on all slaves, so it would seem to make sense
>that they're all in promisc, unlike other modes that have a primary
>interface and can only receive on that interface.

	When I first read the subject and this description, I thought
you meant that all of the alb slaves would be in promisc mode all the
time, not that enabling promisc on an alb bond would set promisc for all
slaves (instead of just the primary, the current behavior).

	Regardless, since in alb mode the expectation is that all of the
slaves will be on the same subnet (Ethernet broadcast domain), setting
all of the bonded interfaces to promisc instead of just one sounds like
a recipe for duplicate packets, as each would deliver multiple copies of
all "no longer filtered" traffic.  The bond_should_deliver_exact_match()
logic will still suppress broadcast and multicast frames to the
non-primary interface(s), but, e.g., unicast frames flooded to all
switch ports will be delivered once for each member of the bond.

	Unless you have a specific use case that this will improve, I
think this will cause more confusion that it will capture frames that
would have otherwise been missed.

	-J

>Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>Cc: Veaceslav Falico <vfalico@gmail.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Thomas Davis <tadavis@lbl.gov>
>Cc: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> drivers/net/bonding/bond_main.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index d71e398642fb..93f57ff1c552 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -644,9 +644,10 @@ static int bond_check_dev_link(struct bonding *bond,
> static int bond_set_promiscuity(struct bonding *bond, int inc)
> {
> 	struct list_head *iter;
>-	int err = 0;
>+	int mode, err = 0;
> 
>-	if (bond_uses_primary(bond)) {
>+	mode = BOND_MODE(bond);
>+	if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) {
> 		struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
> 
> 		if (curr_active)
>-- 
>2.30.2
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] bond_alb: don't tx balance multicast traffic either
  2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson
@ 2021-05-19 18:45   ` Jay Vosburgh
  0 siblings, 0 replies; 20+ messages in thread
From: Jay Vosburgh @ 2021-05-19 18:45 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

Jarod Wilson <jarod@redhat.com> wrote:

>Multicast traffic going out the non-primary interface can come back in
>through the primary interface in alb mode. When there's a bridge sitting
>on top of the bond, with virtual machines behind it, attached to vnetX
>interfaces also acting as bridge ports, this can cause problems. The
>multicast traffic ends up rewriting the bridge forwarding database
>entries, replacing a vnetX entry in the fdb with the bond instead, at
>which point, we lose traffic. If we don't tx balance multicast traffic, we
>don't break connectivity.

	Just so I'm clear, the rewrite happens because the "looped"
frame bears the source MAC of the VM behind the bridge, but is arriving
at the bridge via the bond, correct?

	If so this change seems reasonable, with one minor nit, below.

>Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>Cc: Veaceslav Falico <vfalico@gmail.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Thomas Davis <tadavis@lbl.gov>
>Cc: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> drivers/net/bonding/bond_alb.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index ce8257c7cbea..4df661b77252 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1422,6 +1422,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
> 		const struct iphdr *iph;
> 
> 		if (is_broadcast_ether_addr(eth_data->h_dest) ||
>+		    is_multicast_ether_addr(eth_data->h_dest) ||

	Note that is_multicast_ is a superset of is_broadcast_, so in
this case (and the one below) is_broadcast_ can simply be replaced by
is_multicast_.  Granted, is_broadcast_ is cheap, but this is in the TX
path for every packet.

	-J

> 		    !pskb_network_may_pull(skb, sizeof(*iph))) {
> 			do_tx_balance = false;
> 			break;
>@@ -1441,7 +1442,8 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
> 		/* IPv6 doesn't really use broadcast mac address, but leave
> 		 * that here just in case.
> 		 */
>-		if (is_broadcast_ether_addr(eth_data->h_dest)) {
>+		if (is_broadcast_ether_addr(eth_data->h_dest) ||
>+		    is_multicast_ether_addr(eth_data->h_dest)) {
> 			do_tx_balance = false;
> 			break;
> 		}
>-- 
>2.30.2
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs
  2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson
@ 2021-05-19 22:31   ` Jay Vosburgh
  0 siblings, 0 replies; 20+ messages in thread
From: Jay Vosburgh @ 2021-05-19 22:31 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

Jarod Wilson <jarod@redhat.com> wrote:

>With a virtual machine behind a bridge on top of a bond, outgoing traffic
>should retain the VM's source MAC. That works fine most of the time, until
>doing a failover, and then the MAC gets rewritten to the bond slave's MAC,
>and the return traffic gets dropped. If we don't rewrite the MAC there, we
>don't lose any traffic.

	Please have the log message here specify that this applies only
to balance-alb mode, and, the usual nomenclature for bonding patches is
"[PATCH] bonding:"; for this case, I'd suggest "balance-alb:" right
afterwards to be clear that it's only for alb mode.  I didn't really
spot the "bond_alb" tag for what it was on first read, and it is the
only indication that this change is specific to alb mode other than the
patch itself.

>Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>Cc: Veaceslav Falico <vfalico@gmail.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Thomas Davis <tadavis@lbl.gov>
>Cc: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> drivers/net/bonding/bond_alb.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 3455f2cc13f2..ce8257c7cbea 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1302,6 +1302,26 @@ void bond_alb_deinitialize(struct bonding *bond)
> 		rlb_deinitialize(bond);
> }
> 
>+static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data)
>+{
>+	struct list_head *iter;
>+	struct slave *slave;
>+
>+	if (BOND_MODE(bond) != BOND_MODE_ALB)
>+		return false;
>+
>+	/* Don't modify source MACs that do not originate locally
>+	 * (e.g.,arrive via a bridge).
>+	 */
>+	if (!netif_is_bridge_port(bond->dev))
>+		return false;

	I believe this logic will fail if the plumbing is, e.g., bond ->
vlan -> bridge, as netif_is_bridge_port() would not return true for the
bond in that case.

	Making this reliable is tricky at best, and may be impossible to
be correct for all possible cases.  As such, I think the comment above
should reflect the limited scope of what is actually being checked here
(i.e., the bond itself is directly a bridge port).

	-J

>+
>+	if (bond_slave_has_mac_rx(bond, eth_data->h_source))
>+		return false;
>+
>+	return true;
>+}
>+
> static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
> 				    struct slave *tx_slave)
> {
>@@ -1316,7 +1336,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
> 	}
> 
> 	if (tx_slave && bond_slave_can_tx(tx_slave)) {
>-		if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) {
>+		if (tx_slave != rcu_access_pointer(bond->curr_active_slave) &&
>+		    !bond_alb_bridged_mac(bond, eth_data)) {
> 			ether_addr_copy(eth_data->h_source,
> 					tx_slave->dev->dev_addr);
> 		}
>-- 
>2.30.2
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better
  2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson
                   ` (3 preceding siblings ...)
  2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson
@ 2021-05-21 13:27 ` Jarod Wilson
  2021-05-21 13:27   ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
                     ` (3 more replies)
  4 siblings, 4 replies; 20+ messages in thread
From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson

I've been further educated on a use case, where a bridge sits on top of
a bond, with multiple vnetX interfaces attached to virtual machines,
also acting as ports of the bridge. Each leg of the bond goes to a
different switch, but there is NO mlag/vpc in play, the bonding driver
has to handle traffic that loops back appropriately to avoid breaking
transmission. Rather than adding some sort of mac filtering to
balance-xor mode, we switched to using balance-alb, which already does
some of this, and with the tweaks provided in this series, empirically
seems to behave as desired in actual operation.

v2 attempts to support srcmac-only hashing via a modparam instead of by
adding yet another hashing mode, as well as cleaning up and clarifying
commit messages.

Jarod Wilson (4):
  bonding: add pure source-mac-based tx hashing option
  bonding/balance-lb: don't rewrite bridged non-local MACs
  bonding/balance-alb: don't tx balance multicast traffic either
  bonding/balance-alb: put all slaves into promisc

 Documentation/networking/bonding.rst | 13 +++++++++++++
 drivers/net/bonding/bond_alb.c       | 24 +++++++++++++++++++++---
 drivers/net/bonding/bond_main.c      | 23 +++++++++++++++--------
 3 files changed, 49 insertions(+), 11 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option
  2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson
@ 2021-05-21 13:27   ` Jarod Wilson
  2021-05-21 13:39     ` Nikolay Aleksandrov
  2021-05-21 13:27   ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis,
	Nikolay Aleksandrov, netdev

As it turns out, a pure source-mac only tx hash has a place for some VM
setups. The previously added vlan+srcmac hash doesn't work as well for a
VM with a single MAC and multiple vlans -- these types of setups path
traffic more efficiently if the load is split by source mac alone. Enable
this by way of an extra module parameter, rather than adding yet another
hashing method in the tx fast path.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 Documentation/networking/bonding.rst | 13 +++++++++++++
 drivers/net/bonding/bond_main.c      | 18 ++++++++++++------
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index 62f2aab8eaec..767dbb49018b 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -707,6 +707,13 @@ mode
 		swapped with the new curr_active_slave that was
 		chosen.
 
+novlan_srcmac
+
+	When using the vlan+srcmac xmit_hash_policy, there may be cases where
+	omitting the vlan from the hash is beneficial. This can be done with
+	an extra module parameter here. The default value is 0 to include
+	vlan ID in the transmit hash.
+
 num_grat_arp,
 num_unsol_na
 
@@ -964,6 +971,12 @@ xmit_hash_policy
 
 		hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
 
+		Optionally, if the module parameter novlan_srcmac=1 is set,
+		the vlan ID is omitted from the hash and only the source MAC
+		address is used, reducing the hash to
+
+		hash = (source MAC vendor) XOR (source MAC dev)
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 20bbda1b36e1..32785e9d0295 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -107,6 +107,7 @@ static char *lacp_rate;
 static int min_links;
 static char *ad_select;
 static char *xmit_hash_policy;
+static int novlan_srcmac;
 static int arp_interval;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS];
 static char *arp_validate;
@@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3
 				   "0 for layer 2 (default), 1 for layer 3+4, "
 				   "2 for layer 2+3, 3 for encap layer 2+3, "
 				   "4 for encap layer 3+4, 5 for vlan+srcmac");
+module_param(novlan_srcmac, int, 0);
+MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; "
+			      "0 to include it (default), 1 to exclude it");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
 
 static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr = eth_hdr(skb);
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
-	u16 vlan;
+	u32 hash;
 	int i;
 
 	for (i = 0; i < 3; i++)
@@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 	for (i = 3; i < ETH_ALEN; i++)
 		srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
 
-	if (!skb_vlan_tag_present(skb))
-		return srcmac_vendor ^ srcmac_dev;
+	hash = srcmac_vendor ^ srcmac_dev;
+
+	if (novlan_srcmac || !skb_vlan_tag_present(skb))
+		return hash;
 
-	vlan = skb_vlan_tag_get(skb);
+	hash ^= skb_vlan_tag_get(skb);
 
-	return vlan ^ srcmac_vendor ^ srcmac_dev;
+	return hash;
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs
  2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson
  2021-05-21 13:27   ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
@ 2021-05-21 13:27   ` Jarod Wilson
  2021-05-21 17:58     ` Jay Vosburgh
  2021-05-21 13:27   ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson
  2021-05-21 13:27   ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson
  3 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

With a virtual machine behind a bridge that directly incorporates a
balance-alb bond as one of its ports, outgoing traffic should retain the
VM's source MAC. That works fine most of the time, until doing a failover,
and then the MAC gets rewritten to the bond slave's MAC, and the return
traffic gets dropped. If we don't rewrite the MAC there, we don't lose the
traffic.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_alb.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 3455f2cc13f2..c57f62e43328 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1302,6 +1302,23 @@ void bond_alb_deinitialize(struct bonding *bond)
 		rlb_deinitialize(bond);
 }
 
+static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data)
+{
+	if (BOND_MODE(bond) != BOND_MODE_ALB)
+		return false;
+
+	/* Don't modify source MACs that do not originate locally
+	 * (e.g.,arrive via a bridge).
+	 */
+	if (!netif_is_bridge_port(bond->dev))
+		return false;
+
+	if (bond_slave_has_mac_rx(bond, eth_data->h_source))
+		return false;
+
+	return true;
+}
+
 static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
 				    struct slave *tx_slave)
 {
@@ -1316,7 +1333,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
 	}
 
 	if (tx_slave && bond_slave_can_tx(tx_slave)) {
-		if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) {
+		if (tx_slave != rcu_access_pointer(bond->curr_active_slave) &&
+		    !bond_alb_bridged_mac(bond, eth_data)) {
 			ether_addr_copy(eth_data->h_source,
 					tx_slave->dev->dev_addr);
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either
  2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson
  2021-05-21 13:27   ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
  2021-05-21 13:27   ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson
@ 2021-05-21 13:27   ` Jarod Wilson
  2021-05-21 17:02     ` Jay Vosburgh
  2021-05-21 13:27   ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson
  3 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

Multicast traffic going out the non-primary interface can come back in
through the primary interface in alb mode. When there's a bridge sitting
on top of the bond, with virtual machines behind it, attached to vnetX
interfaces also acting as bridge ports, this can cause problems. The
looped frame has the source MAC of the VM behind the bridge, and ends up
rewriting the bridge forwarding database entries, replacing a vnetX entry
in the fdb with the bond instead, at which point, we lose traffic. If we
don't tx balance multicast traffic, we don't break connectivity.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_alb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c57f62e43328..cddc4d8b2519 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1418,7 +1418,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
 	case ETH_P_IP: {
 		const struct iphdr *iph;
 
-		if (is_broadcast_ether_addr(eth_data->h_dest) ||
+		if (is_multicast_ether_addr(eth_data->h_dest) ||
 		    !pskb_network_may_pull(skb, sizeof(*iph))) {
 			do_tx_balance = false;
 			break;
@@ -1438,7 +1438,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
 		/* IPv6 doesn't really use broadcast mac address, but leave
 		 * that here just in case.
 		 */
-		if (is_broadcast_ether_addr(eth_data->h_dest)) {
+		if (is_multicast_ether_addr(eth_data->h_dest)) {
 			do_tx_balance = false;
 			break;
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc
  2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson
                     ` (2 preceding siblings ...)
  2021-05-21 13:27   ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson
@ 2021-05-21 13:27   ` Jarod Wilson
  2021-05-21 17:01     ` Jay Vosburgh
  3 siblings, 1 reply; 20+ messages in thread
From: Jarod Wilson @ 2021-05-21 13:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

Unlike most other modes with a primary interface, ALB mode bonding can
receive on all slaves. That includes traffic destined for a non-local MAC
behind a bridge on top of the bond. Such traffic gets dropped if the
interface isn't in promiscuous mode. Therefore, it would seem to make
sense to put all slaves into promisc.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 32785e9d0295..6d95f9e46059 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -647,9 +647,10 @@ static int bond_check_dev_link(struct bonding *bond,
 static int bond_set_promiscuity(struct bonding *bond, int inc)
 {
 	struct list_head *iter;
-	int err = 0;
+	int mode, err = 0;
 
-	if (bond_uses_primary(bond)) {
+	mode = BOND_MODE(bond);
+	if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) {
 		struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
 
 		if (curr_active)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option
  2021-05-21 13:27   ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
@ 2021-05-21 13:39     ` Nikolay Aleksandrov
  2021-05-21 13:41       ` Nikolay Aleksandrov
  2021-05-21 18:01       ` Jay Vosburgh
  0 siblings, 2 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2021-05-21 13:39 UTC (permalink / raw)
  To: Jarod Wilson, linux-kernel
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

On 21/05/2021 16:27, Jarod Wilson wrote:
> As it turns out, a pure source-mac only tx hash has a place for some VM
> setups. The previously added vlan+srcmac hash doesn't work as well for a
> VM with a single MAC and multiple vlans -- these types of setups path
> traffic more efficiently if the load is split by source mac alone. Enable
> this by way of an extra module parameter, rather than adding yet another
> hashing method in the tx fast path.
> 
> Cc: Jay Vosburgh <j.vosburgh@gmail.com>
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <andy@greyhouse.net>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Thomas Davis <tadavis@lbl.gov>
> Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
>  Documentation/networking/bonding.rst | 13 +++++++++++++
>  drivers/net/bonding/bond_main.c      | 18 ++++++++++++------
>  2 files changed, 25 insertions(+), 6 deletions(-)
> 

Hi,
You seem to be missing netlink support for the new option. Code-wise the rest seems fine,
my personal preference is still to make a configurable hash option and perhaps default to
srcmac+vlan, i.e. it can be aliased with this hash option. I don't mind either way, but
please add netlink support if it will be a new option as it's the preferred way for
configuring.

Thanks,
 Nik

> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> index 62f2aab8eaec..767dbb49018b 100644
> --- a/Documentation/networking/bonding.rst
> +++ b/Documentation/networking/bonding.rst
> @@ -707,6 +707,13 @@ mode
>  		swapped with the new curr_active_slave that was
>  		chosen.
>  
> +novlan_srcmac
> +
> +	When using the vlan+srcmac xmit_hash_policy, there may be cases where
> +	omitting the vlan from the hash is beneficial. This can be done with
> +	an extra module parameter here. The default value is 0 to include
> +	vlan ID in the transmit hash.
> +
>  num_grat_arp,
>  num_unsol_na
>  
> @@ -964,6 +971,12 @@ xmit_hash_policy
>  
>  		hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
>  
> +		Optionally, if the module parameter novlan_srcmac=1 is set,
> +		the vlan ID is omitted from the hash and only the source MAC
> +		address is used, reducing the hash to
> +
> +		hash = (source MAC vendor) XOR (source MAC dev)
> +
>  	The default value is layer2.  This option was added in bonding
>  	version 2.6.3.  In earlier versions of bonding, this parameter
>  	does not exist, and the layer2 policy is the only policy.  The
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 20bbda1b36e1..32785e9d0295 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -107,6 +107,7 @@ static char *lacp_rate;
>  static int min_links;
>  static char *ad_select;
>  static char *xmit_hash_policy;
> +static int novlan_srcmac;
>  static int arp_interval;
>  static char *arp_ip_target[BOND_MAX_ARP_TARGETS];
>  static char *arp_validate;
> @@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3
>  				   "0 for layer 2 (default), 1 for layer 3+4, "
>  				   "2 for layer 2+3, 3 for encap layer 2+3, "
>  				   "4 for encap layer 3+4, 5 for vlan+srcmac");
> +module_param(novlan_srcmac, int, 0);
> +MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; "
> +			      "0 to include it (default), 1 to exclude it");
>  module_param(arp_interval, int, 0);
>  MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
>  module_param_array(arp_ip_target, charp, NULL, 0);
> @@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
>  
>  static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>  {
> -	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> +	struct ethhdr *mac_hdr = eth_hdr(skb);
>  	u32 srcmac_vendor = 0, srcmac_dev = 0;
> -	u16 vlan;
> +	u32 hash;
>  	int i;
>  
>  	for (i = 0; i < 3; i++)
> @@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>  	for (i = 3; i < ETH_ALEN; i++)
>  		srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
>  
> -	if (!skb_vlan_tag_present(skb))
> -		return srcmac_vendor ^ srcmac_dev;
> +	hash = srcmac_vendor ^ srcmac_dev;
> +
> +	if (novlan_srcmac || !skb_vlan_tag_present(skb))
> +		return hash;
>  
> -	vlan = skb_vlan_tag_get(skb);
> +	hash ^= skb_vlan_tag_get(skb);
>  
> -	return vlan ^ srcmac_vendor ^ srcmac_dev;
> +	return hash;
>  }
>  
>  /* Extract the appropriate headers based on bond's xmit policy */
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option
  2021-05-21 13:39     ` Nikolay Aleksandrov
@ 2021-05-21 13:41       ` Nikolay Aleksandrov
  2021-05-21 18:01       ` Jay Vosburgh
  1 sibling, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2021-05-21 13:41 UTC (permalink / raw)
  To: Jarod Wilson, linux-kernel
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

On 21/05/2021 16:39, Nikolay Aleksandrov wrote:
> On 21/05/2021 16:27, Jarod Wilson wrote:
>> As it turns out, a pure source-mac only tx hash has a place for some VM
>> setups. The previously added vlan+srcmac hash doesn't work as well for a
>> VM with a single MAC and multiple vlans -- these types of setups path
>> traffic more efficiently if the load is split by source mac alone. Enable
>> this by way of an extra module parameter, rather than adding yet another
>> hashing method in the tx fast path.
>>
>> Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>> Cc: Veaceslav Falico <vfalico@gmail.com>
>> Cc: Andy Gospodarek <andy@greyhouse.net>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Thomas Davis <tadavis@lbl.gov>
>> Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Jarod Wilson <jarod@redhat.com>
>> ---
>>  Documentation/networking/bonding.rst | 13 +++++++++++++
>>  drivers/net/bonding/bond_main.c      | 18 ++++++++++++------
>>  2 files changed, 25 insertions(+), 6 deletions(-)
>>
> 
> Hi,
> You seem to be missing netlink support for the new option. Code-wise the rest seems fine,
> my personal preference is still to make a configurable hash option and perhaps default to
> srcmac+vlan, i.e. it can be aliased with this hash option. I don't mind either way, but
> please add netlink support if it will be a new option as it's the preferred way for
> configuring.
> 
> Thanks,
>  Nik
> 

Almost forgot - please include a cover letter with overview and motivation of the changes.
Also I guess these should be targeted at net-next (or at least the new option definitely should go
there), and please add changelog between patchset versions.

Cheers,
 Nik

>> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>> index 62f2aab8eaec..767dbb49018b 100644
>> --- a/Documentation/networking/bonding.rst
>> +++ b/Documentation/networking/bonding.rst
>> @@ -707,6 +707,13 @@ mode
>>  		swapped with the new curr_active_slave that was
>>  		chosen.
>>  
>> +novlan_srcmac
>> +
>> +	When using the vlan+srcmac xmit_hash_policy, there may be cases where
>> +	omitting the vlan from the hash is beneficial. This can be done with
>> +	an extra module parameter here. The default value is 0 to include
>> +	vlan ID in the transmit hash.
>> +
>>  num_grat_arp,
>>  num_unsol_na
>>  
>> @@ -964,6 +971,12 @@ xmit_hash_policy
>>  
>>  		hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
>>  
>> +		Optionally, if the module parameter novlan_srcmac=1 is set,
>> +		the vlan ID is omitted from the hash and only the source MAC
>> +		address is used, reducing the hash to
>> +
>> +		hash = (source MAC vendor) XOR (source MAC dev)
>> +
>>  	The default value is layer2.  This option was added in bonding
>>  	version 2.6.3.  In earlier versions of bonding, this parameter
>>  	does not exist, and the layer2 policy is the only policy.  The
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 20bbda1b36e1..32785e9d0295 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -107,6 +107,7 @@ static char *lacp_rate;
>>  static int min_links;
>>  static char *ad_select;
>>  static char *xmit_hash_policy;
>> +static int novlan_srcmac;
>>  static int arp_interval;
>>  static char *arp_ip_target[BOND_MAX_ARP_TARGETS];
>>  static char *arp_validate;
>> @@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3
>>  				   "0 for layer 2 (default), 1 for layer 3+4, "
>>  				   "2 for layer 2+3, 3 for encap layer 2+3, "
>>  				   "4 for encap layer 3+4, 5 for vlan+srcmac");
>> +module_param(novlan_srcmac, int, 0);
>> +MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; "
>> +			      "0 to include it (default), 1 to exclude it");
>>  module_param(arp_interval, int, 0);
>>  MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
>>  module_param_array(arp_ip_target, charp, NULL, 0);
>> @@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
>>  
>>  static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>>  {
>> -	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
>> +	struct ethhdr *mac_hdr = eth_hdr(skb);
>>  	u32 srcmac_vendor = 0, srcmac_dev = 0;
>> -	u16 vlan;
>> +	u32 hash;
>>  	int i;
>>  
>>  	for (i = 0; i < 3; i++)
>> @@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>>  	for (i = 3; i < ETH_ALEN; i++)
>>  		srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
>>  
>> -	if (!skb_vlan_tag_present(skb))
>> -		return srcmac_vendor ^ srcmac_dev;
>> +	hash = srcmac_vendor ^ srcmac_dev;
>> +
>> +	if (novlan_srcmac || !skb_vlan_tag_present(skb))
>> +		return hash;
>>  
>> -	vlan = skb_vlan_tag_get(skb);
>> +	hash ^= skb_vlan_tag_get(skb);
>>  
>> -	return vlan ^ srcmac_vendor ^ srcmac_dev;
>> +	return hash;
>>  }
>>  
>>  /* Extract the appropriate headers based on bond's xmit policy */
>>
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc
  2021-05-21 13:27   ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson
@ 2021-05-21 17:01     ` Jay Vosburgh
  0 siblings, 0 replies; 20+ messages in thread
From: Jay Vosburgh @ 2021-05-21 17:01 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

Jarod Wilson <jarod@redhat.com> wrote:

>Unlike most other modes with a primary interface, ALB mode bonding can
>receive on all slaves. That includes traffic destined for a non-local MAC
>behind a bridge on top of the bond. Such traffic gets dropped if the
>interface isn't in promiscuous mode. Therefore, it would seem to make
>sense to put all slaves into promisc.

	I'm still confused by this description and the actual changes of
the patch; the description above reads to me as you're intending that
all slaves of an alb bond should be promisc all the time, but the patch
below doesn't do that (it causes all alb mode slaves to be set to
promisc when the bond itself is set to promisc mode).  Could you
clarify?

	Also, your phrasing that "it would seem to make sense" suggests
to me that this change is speculative.  Do you have a specific example
of when the prior behavior causes an issue?

	-J

>Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>Cc: Veaceslav Falico <vfalico@gmail.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Thomas Davis <tadavis@lbl.gov>
>Cc: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> drivers/net/bonding/bond_main.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 32785e9d0295..6d95f9e46059 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -647,9 +647,10 @@ static int bond_check_dev_link(struct bonding *bond,
> static int bond_set_promiscuity(struct bonding *bond, int inc)
> {
> 	struct list_head *iter;
>-	int err = 0;
>+	int mode, err = 0;
> 
>-	if (bond_uses_primary(bond)) {
>+	mode = BOND_MODE(bond);
>+	if (mode == BOND_MODE_ACTIVEBACKUP || mode == BOND_MODE_TLB) {
> 		struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
> 
> 		if (curr_active)
>-- 
>2.30.2


---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either
  2021-05-21 13:27   ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson
@ 2021-05-21 17:02     ` Jay Vosburgh
  0 siblings, 0 replies; 20+ messages in thread
From: Jay Vosburgh @ 2021-05-21 17:02 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

Jarod Wilson <jarod@redhat.com> wrote:

>Multicast traffic going out the non-primary interface can come back in
>through the primary interface in alb mode. When there's a bridge sitting
>on top of the bond, with virtual machines behind it, attached to vnetX
>interfaces also acting as bridge ports, this can cause problems. The
>looped frame has the source MAC of the VM behind the bridge, and ends up
>rewriting the bridge forwarding database entries, replacing a vnetX entry
>in the fdb with the bond instead, at which point, we lose traffic. If we
>don't tx balance multicast traffic, we don't break connectivity.
>
>Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>Cc: Veaceslav Falico <vfalico@gmail.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Thomas Davis <tadavis@lbl.gov>
>Cc: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>


>---
> drivers/net/bonding/bond_alb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index c57f62e43328..cddc4d8b2519 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1418,7 +1418,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
> 	case ETH_P_IP: {
> 		const struct iphdr *iph;
> 
>-		if (is_broadcast_ether_addr(eth_data->h_dest) ||
>+		if (is_multicast_ether_addr(eth_data->h_dest) ||
> 		    !pskb_network_may_pull(skb, sizeof(*iph))) {
> 			do_tx_balance = false;
> 			break;
>@@ -1438,7 +1438,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
> 		/* IPv6 doesn't really use broadcast mac address, but leave
> 		 * that here just in case.
> 		 */
>-		if (is_broadcast_ether_addr(eth_data->h_dest)) {
>+		if (is_multicast_ether_addr(eth_data->h_dest)) {
> 			do_tx_balance = false;
> 			break;
> 		}
>-- 
>2.30.2
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs
  2021-05-21 13:27   ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson
@ 2021-05-21 17:58     ` Jay Vosburgh
  0 siblings, 0 replies; 20+ messages in thread
From: Jay Vosburgh @ 2021-05-21 17:58 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Jakub Kicinski, Thomas Davis, netdev

Jarod Wilson <jarod@redhat.com> wrote:

>With a virtual machine behind a bridge that directly incorporates a
>balance-alb bond as one of its ports, outgoing traffic should retain the
>VM's source MAC. That works fine most of the time, until doing a failover,
>and then the MAC gets rewritten to the bond slave's MAC, and the return
>traffic gets dropped. If we don't rewrite the MAC there, we don't lose the
>traffic.

	Comparing the description above to the patch, is the erroneous
behavior really related to failover (i.e., bond slave goes down, bond
reshuffles various things as it is wont to do), or is it related to
either a TX side rebalance or even simply that specific traffic is being
sent on a slave that isn't the curr_active_slave?

	One more comment, below.

>Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>Cc: Veaceslav Falico <vfalico@gmail.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Thomas Davis <tadavis@lbl.gov>
>Cc: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> drivers/net/bonding/bond_alb.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 3455f2cc13f2..c57f62e43328 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1302,6 +1302,23 @@ void bond_alb_deinitialize(struct bonding *bond)
> 		rlb_deinitialize(bond);
> }
> 
>+static bool bond_alb_bridged_mac(struct bonding *bond, struct ethhdr *eth_data)
>+{
>+	if (BOND_MODE(bond) != BOND_MODE_ALB)
>+		return false;
>+
>+	/* Don't modify source MACs that do not originate locally
>+	 * (e.g.,arrive via a bridge).
>+	 */
>+	if (!netif_is_bridge_port(bond->dev))
>+		return false;

	Repeating my comment (from my response to the v1 patch) that
hasn't been addressed:

	I believe this logic will fail if the plumbing is, e.g., bond ->
vlan -> bridge, as netif_is_bridge_port() would not return true for the
bond in that case.

	Making this reliable is tricky at best, and may be impossible to
be correct for all possible cases.  As such, I think the comment above
should reflect the limited scope of what is actually being checked here
(i.e., the bond itself is directly a bridge port).

	Ideally, the bonding.rst documentation should describe the
special behaviors of alb mode when configured as a bridge port.

	-J

>+
>+	if (bond_slave_has_mac_rx(bond, eth_data->h_source))
>+		return false;
>+
>+	return true;
>+}
>+
> static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
> 				    struct slave *tx_slave)
> {
>@@ -1316,7 +1333,8 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
> 	}
> 
> 	if (tx_slave && bond_slave_can_tx(tx_slave)) {
>-		if (tx_slave != rcu_access_pointer(bond->curr_active_slave)) {
>+		if (tx_slave != rcu_access_pointer(bond->curr_active_slave) &&
>+		    !bond_alb_bridged_mac(bond, eth_data)) {
> 			ether_addr_copy(eth_data->h_source,
> 					tx_slave->dev->dev_addr);
> 		}
>-- 
>2.30.2

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option
  2021-05-21 13:39     ` Nikolay Aleksandrov
  2021-05-21 13:41       ` Nikolay Aleksandrov
@ 2021-05-21 18:01       ` Jay Vosburgh
  1 sibling, 0 replies; 20+ messages in thread
From: Jay Vosburgh @ 2021-05-21 18:01 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jarod Wilson, linux-kernel, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Jakub Kicinski, Thomas Davis, netdev

Nikolay Aleksandrov <nikolay@nvidia.com> wrote:

>On 21/05/2021 16:27, Jarod Wilson wrote:
>> As it turns out, a pure source-mac only tx hash has a place for some VM
>> setups. The previously added vlan+srcmac hash doesn't work as well for a
>> VM with a single MAC and multiple vlans -- these types of setups path
>> traffic more efficiently if the load is split by source mac alone. Enable
>> this by way of an extra module parameter, rather than adding yet another
>> hashing method in the tx fast path.
>> 
>> Cc: Jay Vosburgh <j.vosburgh@gmail.com>
>> Cc: Veaceslav Falico <vfalico@gmail.com>
>> Cc: Andy Gospodarek <andy@greyhouse.net>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Thomas Davis <tadavis@lbl.gov>
>> Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Jarod Wilson <jarod@redhat.com>
>> ---
>>  Documentation/networking/bonding.rst | 13 +++++++++++++
>>  drivers/net/bonding/bond_main.c      | 18 ++++++++++++------
>>  2 files changed, 25 insertions(+), 6 deletions(-)
>> 
>
>Hi,
>You seem to be missing netlink support for the new option. Code-wise the rest seems fine,
>my personal preference is still to make a configurable hash option and perhaps default to
>srcmac+vlan, i.e. it can be aliased with this hash option. I don't mind either way, but
>please add netlink support if it will be a new option as it's the preferred way for
>configuring.

	FWIW, I'm in agreement with Nik here; netlink support is
mandatory for any new options.

	-J

>Thanks,
> Nik
>
>> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>> index 62f2aab8eaec..767dbb49018b 100644
>> --- a/Documentation/networking/bonding.rst
>> +++ b/Documentation/networking/bonding.rst
>> @@ -707,6 +707,13 @@ mode
>>  		swapped with the new curr_active_slave that was
>>  		chosen.
>>  
>> +novlan_srcmac
>> +
>> +	When using the vlan+srcmac xmit_hash_policy, there may be cases where
>> +	omitting the vlan from the hash is beneficial. This can be done with
>> +	an extra module parameter here. The default value is 0 to include
>> +	vlan ID in the transmit hash.
>> +
>>  num_grat_arp,
>>  num_unsol_na
>>  
>> @@ -964,6 +971,12 @@ xmit_hash_policy
>>  
>>  		hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
>>  
>> +		Optionally, if the module parameter novlan_srcmac=1 is set,
>> +		the vlan ID is omitted from the hash and only the source MAC
>> +		address is used, reducing the hash to
>> +
>> +		hash = (source MAC vendor) XOR (source MAC dev)
>> +
>>  	The default value is layer2.  This option was added in bonding
>>  	version 2.6.3.  In earlier versions of bonding, this parameter
>>  	does not exist, and the layer2 policy is the only policy.  The
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 20bbda1b36e1..32785e9d0295 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -107,6 +107,7 @@ static char *lacp_rate;
>>  static int min_links;
>>  static char *ad_select;
>>  static char *xmit_hash_policy;
>> +static int novlan_srcmac;
>>  static int arp_interval;
>>  static char *arp_ip_target[BOND_MAX_ARP_TARGETS];
>>  static char *arp_validate;
>> @@ -168,6 +169,9 @@ MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3
>>  				   "0 for layer 2 (default), 1 for layer 3+4, "
>>  				   "2 for layer 2+3, 3 for encap layer 2+3, "
>>  				   "4 for encap layer 3+4, 5 for vlan+srcmac");
>> +module_param(novlan_srcmac, int, 0);
>> +MODULE_PARM_DESC(novlan_srcmac, "include vlan ID in vlan+srcmac xmit_hash_policy or not; "
>> +			      "0 to include it (default), 1 to exclude it");
>>  module_param(arp_interval, int, 0);
>>  MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
>>  module_param_array(arp_ip_target, charp, NULL, 0);
>> @@ -3523,9 +3527,9 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
>>  
>>  static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>>  {
>> -	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
>> +	struct ethhdr *mac_hdr = eth_hdr(skb);
>>  	u32 srcmac_vendor = 0, srcmac_dev = 0;
>> -	u16 vlan;
>> +	u32 hash;
>>  	int i;
>>  
>>  	for (i = 0; i < 3; i++)
>> @@ -3534,12 +3538,14 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>>  	for (i = 3; i < ETH_ALEN; i++)
>>  		srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
>>  
>> -	if (!skb_vlan_tag_present(skb))
>> -		return srcmac_vendor ^ srcmac_dev;
>> +	hash = srcmac_vendor ^ srcmac_dev;
>> +
>> +	if (novlan_srcmac || !skb_vlan_tag_present(skb))
>> +		return hash;
>>  
>> -	vlan = skb_vlan_tag_get(skb);
>> +	hash ^= skb_vlan_tag_get(skb);
>>  
>> -	return vlan ^ srcmac_vendor ^ srcmac_dev;
>> +	return hash;
>>  }
>>  
>>  /* Extract the appropriate headers based on bond's xmit policy */

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-05-21 18:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-18 21:08 [PATCH 0/4] bond_alb: support VMs behind bridges better Jarod Wilson
2021-05-18 21:08 ` [PATCH 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
2021-05-19  9:01   ` Nikolay Aleksandrov
2021-05-18 21:08 ` [PATCH 2/4] bond_alb: don't rewrite bridged non-local MACs Jarod Wilson
2021-05-19 22:31   ` Jay Vosburgh
2021-05-18 21:08 ` [PATCH 3/4] bond_alb: don't tx balance multicast traffic either Jarod Wilson
2021-05-19 18:45   ` Jay Vosburgh
2021-05-18 21:08 ` [PATCH 4/4] bond_alb: put all slaves into promisc Jarod Wilson
2021-05-19 16:47   ` Jay Vosburgh
2021-05-21 13:27 ` [PATCH net-next v2 0/4] bonding/balance-alb: support VMs behind bridges better Jarod Wilson
2021-05-21 13:27   ` [PATCH net-next v2 1/4] bonding: add pure source-mac-based tx hashing option Jarod Wilson
2021-05-21 13:39     ` Nikolay Aleksandrov
2021-05-21 13:41       ` Nikolay Aleksandrov
2021-05-21 18:01       ` Jay Vosburgh
2021-05-21 13:27   ` [PATCH net-next v2 2/4] bonding/balance-lb: don't rewrite bridged non-local MACs Jarod Wilson
2021-05-21 17:58     ` Jay Vosburgh
2021-05-21 13:27   ` [PATCH net-next v2 3/4] bonding/balance-alb: don't tx balance multicast traffic either Jarod Wilson
2021-05-21 17:02     ` Jay Vosburgh
2021-05-21 13:27   ` [PATCH net-next v2 4/4] bonding/balance-alb: put all slaves into promisc Jarod Wilson
2021-05-21 17:01     ` Jay Vosburgh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.