netdev.vger.kernel.org archive mirror
* [PATCH RFC net-next 00/13] RX filtering for DSA switches
@ 2020-05-21 21:10 Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 01/13] net: core: dev_addr_lists: add VID to device address Vladimir Oltean
                   ` (15 more replies)
  0 siblings, 16 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This is a WIP series whose stated goal is to allow DSA and switchdev
drivers to flood less traffic to the CPU while keeping the same level of
functionality.

The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
that the operating system has expressed its interest in, either due to
those being the MAC addresses of one of the switch ports, or addresses
added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
Then, the traffic which is not explicitly whitelisted is not sent by the
hardware to the CPU, under the assumption that the CPU didn't ask for it
and would have dropped it anyway.
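
As a rough illustration of the whitelist idea (this is not code from the
series; the struct and function names are made up for the example, and a
real switch would do this lookup in its FDB hardware), the decision can
be modelled as a lookup keyed by the {DMAC, VID} pair:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN	6

/* Hypothetical filter entry: one {DMAC, VID} pair the CPU asked for. */
struct sketch_rx_filter_entry {
	uint8_t dmac[SKETCH_ETH_ALEN];
	uint16_t vid;
};

/* Return true if the frame's {DMAC, VID} is whitelisted towards the CPU. */
bool sketch_rx_filter_match(const struct sketch_rx_filter_entry *tbl,
			    size_t n, const uint8_t *dmac, uint16_t vid)
{
	size_t i;

	assert(tbl || n == 0);

	for (i = 0; i < n; i++)
		if (tbl[i].vid == vid &&
		    !memcmp(tbl[i].dmac, dmac, SKETCH_ETH_ALEN))
			return true;

	/* Not explicitly whitelisted: the hardware would not send it up. */
	return false;
}
```

The same MAC address can be present for one VID and absent for another,
which is why the key has to include both fields.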

These patches grew out of the discussions surrounding RX filtering
with switchdev in general, as well as with DSA in particular:

"[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
https://www.spinics.net/lists/netdev/msg651922.html
"[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
https://www.spinics.net/lists/netdev/msg634859.html
"[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
https://lkml.org/lkml/2019/8/29/255
LPC2019 - SwitchDev offload optimizations:
https://www.youtube.com/watch?v=B1HhxEcU7Jg

Unicast filtering seems to me the most important part, and this includes
termination of MAC addresses corresponding to the network interfaces in
the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
network interface addresses with a Virtual ID (typically VLAN ID). This
matches DSA switches perfectly because their FDB already contains keys
of the {DMAC, VID} form.

Multicast filtering was taken and reworked from Florian Fainelli's
previous attempts, according to my own understanding of multicast
forwarding requirements of an IGMP snooping switch. This is the part
that needs the most extra work, not only in the DSA core but also in
drivers. For this reason, I've left out of this patchset anything that
has to do with driver-level configuration (since the audience is a bit
larger than usual), as I'm trying to focus more on policy for now, and
the series is already pretty huge.

Florian Fainelli (3):
  net: bridge: multicast: propagate br_mc_disabled_update() return
  net: dsa: add ability to program unicast and multicast filters for CPU
    port
  net: dsa: wire up multicast IGMP snooping attribute notification

Ivan Khoronzhuk (4):
  net: core: dev_addr_lists: add VID to device address
  net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
  net: 8021q: vlan_dev: add vid tag for vlan device own address
  ethernet: eth: add default vid len for all ethernet kind devices

Vladimir Oltean (6):
  net: core: dev_addr_lists: export some raw __hw_addr helpers
  net: dsa: don't use switchdev_notifier_fdb_info in
    dsa_switchdev_event_work
  net: dsa: mroute: don't panic the kernel if called without the prepare
    phase
  net: bridge: add port flags for host flooding
  net: dsa: deal with new flooding port attributes from bridge
  net: dsa: treat switchdev notifications for multicast router connected
    to port

 include/linux/if_bridge.h |   3 +
 include/linux/if_vlan.h   |   2 +
 include/linux/netdevice.h |  11 ++
 include/net/dsa.h         |  17 +++
 net/8021q/Kconfig         |  12 ++
 net/8021q/vlan.c          |   3 +
 net/8021q/vlan.h          |   2 +
 net/8021q/vlan_core.c     |  25 ++++
 net/8021q/vlan_dev.c      | 102 +++++++++++---
 net/bridge/br_if.c        |  40 ++++++
 net/bridge/br_multicast.c |  21 ++-
 net/bridge/br_switchdev.c |   4 +-
 net/core/dev_addr_lists.c | 144 +++++++++++++++----
 net/dsa/Kconfig           |   1 +
 net/dsa/dsa2.c            |   6 +
 net/dsa/dsa_priv.h        |  27 +++-
 net/dsa/port.c            | 155 ++++++++++++++++----
 net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
 net/dsa/switch.c          |  36 +++++
 net/ethernet/eth.c        |  12 +-
 20 files changed, 780 insertions(+), 131 deletions(-)

-- 
2.25.1



* [PATCH RFC net-next 01/13] net: core: dev_addr_lists: add VID to device address
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 02/13] net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists Vladimir Oltean
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Although this is intended for Ethernet VLANs, non-Ethernet addresses
with space for a VID can reuse it too, so the VID is considered a
virtual ID extension that does not belong strictly to Ethernet VLAN VIDs,
and the overall change can be named individual virtual device filtering
(IVDF).

This patch adds a VID tag at the end of each address. The actual
reserved address size is 32 bytes. For 6-byte Ethernet addresses the tag
can be added without increasing the address size: each such address
has 32 - 6 = 26 bytes left to hold additional info, say the VID for
virtual device addresses.
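
To make the layout concrete, here is a small userspace sketch (not part
of the patch; all SKETCH_/sketch_ names are hypothetical, and the 2-byte
tag size is an assumption taken from the later patches in this series).
The MAC goes first and the tag bytes follow immediately in the same
32-byte buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ADDR_LEN	32	/* same value as the kernel's MAX_ADDR_LEN */
#define SKETCH_ETH_ALEN		6
#define SKETCH_VID_LEN		2	/* assumed tag size for this example */

/*
 * Build a tagged address in a MAX_ADDR_LEN-sized buffer. Returns the
 * number of leading bytes that are significant for comparisons.
 */
size_t sketch_build_tagged_addr(uint8_t naddr[SKETCH_MAX_ADDR_LEN],
				const uint8_t mac[SKETCH_ETH_ALEN],
				const uint8_t tag[SKETCH_VID_LEN])
{
	/* The tag must fit in the space left after the MAC. */
	assert(SKETCH_ETH_ALEN + SKETCH_VID_LEN <= SKETCH_MAX_ADDR_LEN);

	memset(naddr, 0, SKETCH_MAX_ADDR_LEN);
	memcpy(naddr, mac, SKETCH_ETH_ALEN);
	memcpy(naddr + SKETCH_ETH_ALEN, tag, SKETCH_VID_LEN);

	return SKETCH_ETH_ALEN + SKETCH_VID_LEN;
}
```

With a 6-byte MAC and a 2-byte tag only 8 of the 32 reserved bytes are
used, so the buffer size itself does not have to change.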

Therefore, when addresses are synced to the address list of the parent
device, the latter's address list can contain separate addresses for
virtual devices. This makes it possible to track separate address tables
for virtual devices if they are present, and the virtual device can be
placed anywhere in the device tree, because the address is propagated to
the end real device through the *_sync()/ndo_set_rx_mode() APIs. It also
simplifies handling of VID addresses at the real device when it supports
IVDF.

If the parent device doesn't want virtual addresses in its address
space, its vid_len has to be 0, so that its address space is "shrunk" to
the state before this patch. For now it's 0 for every device. This
allows two devices, one with and one without IVDF, to be part of the
same bond device, for instance.
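
The "shrinking" can be pictured as nothing more than a shorter compare
length. The sketch below is a userspace illustration only (the function
name is hypothetical): two stored addresses that differ just in the tag
bytes look identical to a device with vid_len 0 and distinct to a device
with vid_len 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two stored addresses the way a device with the given
 * addr_len/vid_len would: only the first addr_len + vid_len bytes count.
 */
bool sketch_addr_equal(const uint8_t *a, const uint8_t *b,
		       size_t addr_len, size_t vid_len)
{
	assert(a && b);

	return memcmp(a, b, addr_len + vid_len) == 0;
}
```

For example, with a = MAC + tag 0 and b = the same MAC + tag 100,
sketch_addr_equal(a, b, 6, 0) is true while sketch_addr_equal(a, b, 6, 2)
is false.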

The end real device supporting IVDF can retrieve the VID tag from an
address and set the filter for the given virtual device only. By default,
VID 0 is used for real devices to distinguish their addresses from
virtual ones.

See the next patches for how it's used.

Note that adding the vid_len member to struct net_device is not intended
to change the structure layout. Here is the output of pahole:

For ARM 32, the holes shrink by 1 byte:
---------------------------------------

before (https://pastebin.com/DG1SVpFR):

/* size: 1344, cachelines: 21, members: 123 */
/* sum members: 1304, holes: 5, sum holes: 28 */
/* padding: 12 */
/* bit_padding: 31 bits */

after (https://pastebin.com/ZUMhxGkA):

/* size: 1344, cachelines: 21, members: 124 */
/* sum members: 1305, holes: 5, sum holes: 27 */
/* padding: 12 */
/* bit_padding: 31 bits */

For ARM 64, the holes shrink by 1 byte:
---------------------------------------

before (https://pastebin.com/5CdTQWkc):

/* size: 2048, cachelines: 32, members: 120 */
/* sum members: 1972, holes: 7, sum holes: 48 */
/* padding: 28 */
/* bit_padding: 31 bits */

after (https://pastebin.com/32ktb1iV):

/* size: 2048, cachelines: 32, members: 121 */
/* sum members: 1973, holes: 7, sum holes: 47 */
/* padding: 28 */
/* bit_padding: 31 bits */

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/netdevice.h |   4 ++
 net/core/dev_addr_lists.c | 127 ++++++++++++++++++++++++++++++++------
 2 files changed, 111 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a18f8fdf4260..2d11b93f3af4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1698,6 +1698,7 @@ enum netdev_priv_flags {
  * 	@perm_addr:		Permanent hw address
  * 	@addr_assign_type:	Hw address assignment type
  * 	@addr_len:		Hardware address length
+ *	@vid_len:		Virtual ID length, set in case of IVDF
  *	@upper_level:		Maximum depth level of upper devices.
  *	@lower_level:		Maximum depth level of lower devices.
  *	@neigh_priv_len:	Used in neigh_alloc()
@@ -1950,6 +1951,7 @@ struct net_device {
 	unsigned char		perm_addr[MAX_ADDR_LEN];
 	unsigned char		addr_assign_type;
 	unsigned char		addr_len;
+	unsigned char		vid_len;
 	unsigned char		upper_level;
 	unsigned char		lower_level;
 	unsigned short		neigh_priv_len;
@@ -4316,8 +4318,10 @@ int dev_addr_init(struct net_device *dev);
 
 /* Functions used for unicast addresses handling */
 int dev_uc_add(struct net_device *dev, const unsigned char *addr);
+int dev_vid_uc_add(struct net_device *dev, const unsigned char *addr);
 int dev_uc_add_excl(struct net_device *dev, const unsigned char *addr);
 int dev_uc_del(struct net_device *dev, const unsigned char *addr);
+int dev_vid_uc_del(struct net_device *dev, const unsigned char *addr);
 int dev_uc_sync(struct net_device *to, struct net_device *from);
 int dev_uc_sync_multiple(struct net_device *to, struct net_device *from);
 void dev_uc_unsync(struct net_device *to, struct net_device *from);
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 2f949b5a1eb9..90eaa99b19e5 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -541,6 +541,35 @@ int dev_addr_del(struct net_device *dev, const unsigned char *addr,
 }
 EXPORT_SYMBOL(dev_addr_del);
 
+static int get_addr_len(struct net_device *dev)
+{
+	return dev->addr_len + dev->vid_len;
+}
+
+/**
+ *	set_vid_addr - Copy a device address into a new address with IVDF.
+ *	@dev: device
+ *	@addr: address to copy
+ *	@naddr: location of new address
+ *
+ *	Transform a regular device address into one with IVDF (Individual
+ *	Virtual Device Filtering). If the device does not support IVDF, the
+ *	original device address length is returned and no copying is done.
+ *	Otherwise, the length of the IVDF address is returned.
+ *	The VID is set to zero which denotes the address of a real device.
+ */
+static int set_vid_addr(struct net_device *dev, const unsigned char *addr,
+			unsigned char *naddr)
+{
+	if (!dev->vid_len)
+		return dev->addr_len;
+
+	memcpy(naddr, addr, dev->addr_len);
+	memset(naddr + dev->addr_len, 0, dev->vid_len);
+
+	return get_addr_len(dev);
+}
+
 /*
  * Unicast list handling functions
  */
@@ -552,18 +581,22 @@ EXPORT_SYMBOL(dev_addr_del);
  */
 int dev_uc_add_excl(struct net_device *dev, const unsigned char *addr)
 {
+	unsigned char naddr[MAX_ADDR_LEN];
 	struct netdev_hw_addr *ha;
-	int err;
+	int addr_len, err;
+
+	addr_len = set_vid_addr(dev, addr, naddr);
+	addr = dev->vid_len ? naddr : addr;
 
 	netif_addr_lock_bh(dev);
 	list_for_each_entry(ha, &dev->uc.list, list) {
-		if (!memcmp(ha->addr, addr, dev->addr_len) &&
+		if (!memcmp(ha->addr, addr, addr_len) &&
 		    ha->type == NETDEV_HW_ADDR_T_UNICAST) {
 			err = -EEXIST;
 			goto out;
 		}
 	}
-	err = __hw_addr_create_ex(&dev->uc, addr, dev->addr_len,
+	err = __hw_addr_create_ex(&dev->uc, addr, addr_len,
 				  NETDEV_HW_ADDR_T_UNICAST, true, false);
 	if (!err)
 		__dev_set_rx_mode(dev);
@@ -574,47 +607,89 @@ int dev_uc_add_excl(struct net_device *dev, const unsigned char *addr)
 EXPORT_SYMBOL(dev_uc_add_excl);
 
 /**
- *	dev_uc_add - Add a secondary unicast address
+ *	dev_vid_uc_add - Add a secondary unicast address with tag
  *	@dev: device
- *	@addr: address to add
+ *	@addr: address to add, includes vid tag already
  *
  *	Add a secondary unicast address to the device or increase
  *	the reference count if it already exists.
  */
-int dev_uc_add(struct net_device *dev, const unsigned char *addr)
+int dev_vid_uc_add(struct net_device *dev, const unsigned char *addr)
 {
 	int err;
 
 	netif_addr_lock_bh(dev);
-	err = __hw_addr_add(&dev->uc, addr, dev->addr_len,
+	err = __hw_addr_add(&dev->uc, addr, get_addr_len(dev),
 			    NETDEV_HW_ADDR_T_UNICAST);
 	if (!err)
 		__dev_set_rx_mode(dev);
 	netif_addr_unlock_bh(dev);
 	return err;
 }
+EXPORT_SYMBOL(dev_vid_uc_add);
+
+/**
+ *	dev_uc_add - Add a secondary unicast address
+ *	@dev: device
+ *	@addr: address to add
+ *
+ *	Add a secondary unicast address to the device or increase
+ *	the reference count if it already exists.
+ */
+int dev_uc_add(struct net_device *dev, const unsigned char *addr)
+{
+	unsigned char naddr[MAX_ADDR_LEN];
+	int err;
+
+	set_vid_addr(dev, addr, naddr);
+	addr = dev->vid_len ? naddr : addr;
+
+	err = dev_vid_uc_add(dev, addr);
+	return err;
+}
 EXPORT_SYMBOL(dev_uc_add);
 
 /**
  *	dev_uc_del - Release secondary unicast address.
  *	@dev: device
- *	@addr: address to delete
+ *	@addr: address to delete, includes vid tag already
  *
  *	Release reference to a secondary unicast address and remove it
  *	from the device if the reference count drops to zero.
  */
-int dev_uc_del(struct net_device *dev, const unsigned char *addr)
+int dev_vid_uc_del(struct net_device *dev, const unsigned char *addr)
 {
 	int err;
 
 	netif_addr_lock_bh(dev);
-	err = __hw_addr_del(&dev->uc, addr, dev->addr_len,
+	err = __hw_addr_del(&dev->uc, addr, get_addr_len(dev),
 			    NETDEV_HW_ADDR_T_UNICAST);
 	if (!err)
 		__dev_set_rx_mode(dev);
 	netif_addr_unlock_bh(dev);
 	return err;
 }
+EXPORT_SYMBOL(dev_vid_uc_del);
+
+/**
+ *	dev_uc_del - Release secondary unicast address.
+ *	@dev: device
+ *	@addr: address to delete
+ *
+ *	Release reference to a secondary unicast address and remove it
+ *	from the device if the reference count drops to zero.
+ */
+int dev_uc_del(struct net_device *dev, const unsigned char *addr)
+{
+	unsigned char naddr[MAX_ADDR_LEN];
+	int err;
+
+	set_vid_addr(dev, addr, naddr);
+	addr = dev->vid_len ? naddr : addr;
+
+	err = dev_vid_uc_del(dev, addr);
+	return err;
+}
 EXPORT_SYMBOL(dev_uc_del);
 
 /**
@@ -638,7 +713,7 @@ int dev_uc_sync(struct net_device *to, struct net_device *from)
 		return -EINVAL;
 
 	netif_addr_lock(to);
-	err = __hw_addr_sync(&to->uc, &from->uc, to->addr_len);
+	err = __hw_addr_sync(&to->uc, &from->uc, get_addr_len(to));
 	if (!err)
 		__dev_set_rx_mode(to);
 	netif_addr_unlock(to);
@@ -668,7 +743,7 @@ int dev_uc_sync_multiple(struct net_device *to, struct net_device *from)
 		return -EINVAL;
 
 	netif_addr_lock(to);
-	err = __hw_addr_sync_multiple(&to->uc, &from->uc, to->addr_len);
+	err = __hw_addr_sync_multiple(&to->uc, &from->uc, get_addr_len(to));
 	if (!err)
 		__dev_set_rx_mode(to);
 	netif_addr_unlock(to);
@@ -692,7 +767,7 @@ void dev_uc_unsync(struct net_device *to, struct net_device *from)
 
 	netif_addr_lock_bh(from);
 	netif_addr_lock(to);
-	__hw_addr_unsync(&to->uc, &from->uc, to->addr_len);
+	__hw_addr_unsync(&to->uc, &from->uc, get_addr_len(to));
 	__dev_set_rx_mode(to);
 	netif_addr_unlock(to);
 	netif_addr_unlock_bh(from);
@@ -736,18 +811,22 @@ EXPORT_SYMBOL(dev_uc_init);
  */
 int dev_mc_add_excl(struct net_device *dev, const unsigned char *addr)
 {
+	unsigned char naddr[MAX_ADDR_LEN];
 	struct netdev_hw_addr *ha;
-	int err;
+	int addr_len, err;
+
+	addr_len = set_vid_addr(dev, addr, naddr);
+	addr = dev->vid_len ? naddr : addr;
 
 	netif_addr_lock_bh(dev);
 	list_for_each_entry(ha, &dev->mc.list, list) {
-		if (!memcmp(ha->addr, addr, dev->addr_len) &&
+		if (!memcmp(ha->addr, addr, addr_len) &&
 		    ha->type == NETDEV_HW_ADDR_T_MULTICAST) {
 			err = -EEXIST;
 			goto out;
 		}
 	}
-	err = __hw_addr_create_ex(&dev->mc, addr, dev->addr_len,
+	err = __hw_addr_create_ex(&dev->mc, addr, addr_len,
 				  NETDEV_HW_ADDR_T_MULTICAST, true, false);
 	if (!err)
 		__dev_set_rx_mode(dev);
@@ -760,10 +839,14 @@ EXPORT_SYMBOL(dev_mc_add_excl);
 static int __dev_mc_add(struct net_device *dev, const unsigned char *addr,
 			bool global)
 {
-	int err;
+	unsigned char naddr[MAX_ADDR_LEN];
+	int addr_len, err;
+
+	addr_len = set_vid_addr(dev, addr, naddr);
+	addr = dev->vid_len ? naddr : addr;
 
 	netif_addr_lock_bh(dev);
-	err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
+	err = __hw_addr_add_ex(&dev->mc, addr, addr_len,
 			       NETDEV_HW_ADDR_T_MULTICAST, global, false, 0);
 	if (!err)
 		__dev_set_rx_mode(dev);
@@ -800,10 +883,14 @@ EXPORT_SYMBOL(dev_mc_add_global);
 static int __dev_mc_del(struct net_device *dev, const unsigned char *addr,
 			bool global)
 {
-	int err;
+	unsigned char naddr[MAX_ADDR_LEN];
+	int addr_len, err;
+
+	addr_len = set_vid_addr(dev, addr, naddr);
+	addr = dev->vid_len ? naddr : addr;
 
 	netif_addr_lock_bh(dev);
-	err = __hw_addr_del_ex(&dev->mc, addr, dev->addr_len,
+	err = __hw_addr_del_ex(&dev->mc, addr, addr_len,
 			       NETDEV_HW_ADDR_T_MULTICAST, global, false);
 	if (!err)
 		__dev_set_rx_mode(dev);
-- 
2.25.1



* [PATCH RFC net-next 02/13] net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 01/13] net: core: dev_addr_lists: add VID to device address Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 03/13] net: 8021q: vlan_dev: add vid tag for vlan device own address Vladimir Oltean
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Tag the VLAN device's mc and uc addresses with its VID while propagating
them to lower devices; do this only for addresses that are not synced
yet. This allows the end driver to distinguish addresses belonging to
VLAN devices.
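
For reference, the tag encoding can be exercised in userspace with the
sketch below. It follows the byte layout of the helpers in this patch
(low 8 bits of the VID in the first tag byte, high 4 bits in the second),
but the sketch_ function names and the explicit addr_len parameter are
mine, chosen so that the example builds outside the kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 12-bit VID in the two bytes that follow the address proper. */
void sketch_set_addr_vid(uint8_t *addr, size_t addr_len, uint16_t vid)
{
	assert(vid < 4096);	/* a VLAN ID is 12 bits wide */

	addr[addr_len] = vid & 0xff;
	addr[addr_len + 1] = (vid >> 8) & 0xf;
}

/* Read the VID back from the two tag bytes. */
uint16_t sketch_get_addr_vid(const uint8_t *addr, size_t addr_len)
{
	uint16_t vid = addr[addr_len];

	vid |= (uint16_t)((addr[addr_len + 1] & 0xf) << 8);
	return vid;
}
```

The layout does not depend on host endianness, since each byte is
written and read individually.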

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_vlan.h |  1 +
 net/8021q/vlan.h        |  2 ++
 net/8021q/vlan_core.c   | 13 +++++++++++++
 net/8021q/vlan_dev.c    | 26 ++++++++++++++++++++++++++
 4 files changed, 42 insertions(+)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index b05e855f1ddd..20407f73cfee 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -131,6 +131,7 @@ extern struct net_device *__vlan_find_dev_deep_rcu(struct net_device *real_dev,
 extern int vlan_for_each(struct net_device *dev,
 			 int (*action)(struct net_device *dev, int vid,
 				       void *arg), void *arg);
+extern u16 vlan_dev_get_addr_vid(struct net_device *dev, const u8 *addr);
 extern struct net_device *vlan_dev_real_dev(const struct net_device *dev);
 extern u16 vlan_dev_vlan_id(const struct net_device *dev);
 extern __be16 vlan_dev_vlan_proto(const struct net_device *dev);
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index bb7ec1a3915d..e7f43d7fcc9a 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -6,6 +6,8 @@
 #include <linux/u64_stats_sync.h>
 #include <linux/list.h>
 
+#define NET_8021Q_VID_TSIZE	2
+
 /* if this changes, algorithm will have to be reworked because this
  * depends on completely exhausting the VLAN identifier space.  Thus
  * it gives constant time look-up, but in many cases it wastes memory.
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 78ec2e1b14d1..b528f09be9a3 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -453,6 +453,19 @@ bool vlan_uses_dev(const struct net_device *dev)
 }
 EXPORT_SYMBOL(vlan_uses_dev);
 
+u16 vlan_dev_get_addr_vid(struct net_device *dev, const u8 *addr)
+{
+	u16 vid = 0;
+
+	if (dev->vid_len != NET_8021Q_VID_TSIZE)
+		return vid;
+
+	vid = addr[dev->addr_len];
+	vid |= (addr[dev->addr_len + 1] & 0xf) << 8;
+	return vid;
+}
+EXPORT_SYMBOL(vlan_dev_get_addr_vid);
+
 static struct sk_buff *vlan_gro_receive(struct list_head *head,
 					struct sk_buff *skb)
 {
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f00bb57f0f60..c2c3e5ae535c 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -244,6 +244,14 @@ void vlan_dev_get_realdev_name(const struct net_device *dev, char *result)
 	strncpy(result, vlan_dev_priv(dev)->real_dev->name, 23);
 }
 
+static void vlan_dev_set_addr_vid(struct net_device *vlan_dev, u8 *addr)
+{
+	u16 vid = vlan_dev_vlan_id(vlan_dev);
+
+	addr[vlan_dev->addr_len] = vid & 0xff;
+	addr[vlan_dev->addr_len + 1] = (vid >> 8) & 0xf;
+}
+
 bool vlan_dev_inherit_address(struct net_device *dev,
 			      struct net_device *real_dev)
 {
@@ -482,8 +490,26 @@ static void vlan_dev_change_rx_flags(struct net_device *dev, int change)
 	}
 }
 
+static void vlan_dev_align_addr_vid(struct net_device *vlan_dev)
+{
+	struct net_device *real_dev = vlan_dev_real_dev(vlan_dev);
+	struct netdev_hw_addr *ha;
+
+	if (!real_dev->vid_len)
+		return;
+
+	netdev_for_each_mc_addr(ha, vlan_dev)
+		if (!ha->sync_cnt)
+			vlan_dev_set_addr_vid(vlan_dev, ha->addr);
+
+	netdev_for_each_uc_addr(ha, vlan_dev)
+		if (!ha->sync_cnt)
+			vlan_dev_set_addr_vid(vlan_dev, ha->addr);
+}
+
 static void vlan_dev_set_rx_mode(struct net_device *vlan_dev)
 {
+	vlan_dev_align_addr_vid(vlan_dev);
 	dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev);
 	dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev);
 }
-- 
2.25.1



* [PATCH RFC net-next 03/13] net: 8021q: vlan_dev: add vid tag for vlan device own address
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 01/13] net: core: dev_addr_lists: add VID to device address Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 02/13] net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 04/13] ethernet: eth: add default vid len for all ethernet kind devices Vladimir Oltean
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

The VLAN device's own address is held separately from the uc/mc lists
and handled differently. It is bound to the real device's address only
if it was inherited at init time; in all other cases it is a separate
entry in the real device's uc list. With a VID set, the address still
stops being inherited once it is set manually, as before, but it is
always part of the uc list, with the appropriate VID tag set. If vid_len
of the real device is 0, the behaviour is the same as before this
change, so there should be no impact on systems without individual
virtual device filtering (IVDF) enabled. This makes it possible to
control and sync the VLAN device address, and to disable ingress of a
particular VLAN's packets when the VLAN interface is down.
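
The address substitution used here follows an "add the new entry first,
then drop the old one" order, so that a failed add leaves the old address
in place. A minimal userspace sketch of that ordering follows; it is an
illustration under simplifying assumptions (a fixed-size table without
reference counts, all names hypothetical), not the kernel's list code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SLOTS	4
#define SKETCH_KEY_LEN	8	/* 6-byte MAC followed by a 2-byte VID tag */

/* Hypothetical stand-in for the real device's uc list (no refcounts). */
struct sketch_uc_list {
	uint8_t key[SKETCH_SLOTS][SKETCH_KEY_LEN];
	bool used[SKETCH_SLOTS];
};

bool sketch_uc_has(const struct sketch_uc_list *l, const uint8_t *key)
{
	size_t i;

	for (i = 0; i < SKETCH_SLOTS; i++)
		if (l->used[i] && !memcmp(l->key[i], key, SKETCH_KEY_LEN))
			return true;
	return false;
}

int sketch_uc_add(struct sketch_uc_list *l, const uint8_t *key)
{
	size_t i;

	for (i = 0; i < SKETCH_SLOTS; i++) {
		if (l->used[i])
			continue;
		memcpy(l->key[i], key, SKETCH_KEY_LEN);
		l->used[i] = true;
		return 0;
	}
	return -1;	/* table full */
}

void sketch_uc_del(struct sketch_uc_list *l, const uint8_t *key)
{
	size_t i;

	for (i = 0; i < SKETCH_SLOTS; i++) {
		if (l->used[i] && !memcmp(l->key[i], key, SKETCH_KEY_LEN)) {
			l->used[i] = false;
			return;
		}
	}
}

/* Replace old_key with new_key; on failure the old entry is untouched. */
int sketch_uc_subs(struct sketch_uc_list *l, const uint8_t *old_key,
		   const uint8_t *new_key)
{
	int err;

	assert(l && old_key && new_key);

	err = sketch_uc_add(l, new_key);
	if (err < 0)
		return err;

	sketch_uc_del(l, old_key);
	return 0;
}
```

Doing the delete first would open a window in which neither address is
filtered in, and would lose the old address if the add then failed.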

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/8021q/vlan.c     |  3 ++
 net/8021q/vlan_dev.c | 75 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 18 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index d4bcfd8f95bf..4cc341c191a4 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -298,6 +298,9 @@ static void vlan_sync_address(struct net_device *dev,
 	if (vlan_dev_inherit_address(vlandev, dev))
 		goto out;
 
+	if (dev->vid_len)
+		goto out;
+
 	/* vlan address was different from the old address and is equal to
 	 * the new address */
 	if (!ether_addr_equal(vlandev->dev_addr, vlan->real_dev_addr) &&
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index c2c3e5ae535c..f3f570a12ffd 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -252,12 +252,61 @@ static void vlan_dev_set_addr_vid(struct net_device *vlan_dev, u8 *addr)
 	addr[vlan_dev->addr_len + 1] = (vid >> 8) & 0xf;
 }
 
+static int vlan_dev_add_addr(struct net_device *dev, u8 *addr)
+{
+	struct net_device *real_dev = vlan_dev_real_dev(dev);
+	unsigned char naddr[ETH_ALEN + NET_8021Q_VID_TSIZE];
+
+	if (real_dev->vid_len) {
+		memcpy(naddr, addr, dev->addr_len);
+		vlan_dev_set_addr_vid(dev, naddr);
+		return dev_vid_uc_add(real_dev, naddr);
+	}
+
+	if (ether_addr_equal(addr, real_dev->dev_addr))
+		return 0;
+
+	return dev_uc_add(real_dev, addr);
+}
+
+static void vlan_dev_del_addr(struct net_device *dev, u8 *addr)
+{
+	struct net_device *real_dev = vlan_dev_real_dev(dev);
+	unsigned char naddr[ETH_ALEN + NET_8021Q_VID_TSIZE];
+
+	if (real_dev->vid_len) {
+		memcpy(naddr, addr, dev->addr_len);
+		vlan_dev_set_addr_vid(dev, naddr);
+		dev_vid_uc_del(real_dev, naddr);
+		return;
+	}
+
+	if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr))
+		dev_uc_del(real_dev, addr);
+}
+
+static int vlan_dev_subs_addr(struct net_device *dev, u8 *addr)
+{
+	int err;
+
+	err = vlan_dev_add_addr(dev, addr);
+	if (err < 0)
+		return err;
+
+	vlan_dev_del_addr(dev, dev->dev_addr);
+	return err;
+}
+
 bool vlan_dev_inherit_address(struct net_device *dev,
 			      struct net_device *real_dev)
 {
 	if (dev->addr_assign_type != NET_ADDR_STOLEN)
 		return false;
 
+	if (real_dev->vid_len)
+		if (vlan_dev_subs_addr(dev, real_dev->dev_addr))
+			return false;
+
 	ether_addr_copy(dev->dev_addr, real_dev->dev_addr);
 	call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
 	return true;
@@ -273,9 +322,10 @@ static int vlan_dev_open(struct net_device *dev)
 	    !(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
 		return -ENETDOWN;
 
-	if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr) &&
-	    !vlan_dev_inherit_address(dev, real_dev)) {
-		err = dev_uc_add(real_dev, dev->dev_addr);
+	if (ether_addr_equal(dev->dev_addr, real_dev->dev_addr) ||
+	    (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr) &&
+	     !vlan_dev_inherit_address(dev, real_dev))) {
+		err = vlan_dev_add_addr(dev, dev->dev_addr);
 		if (err < 0)
 			goto out;
 	}
@@ -308,8 +358,7 @@ static int vlan_dev_open(struct net_device *dev)
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(real_dev, -1);
 del_unicast:
-	if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr))
-		dev_uc_del(real_dev, dev->dev_addr);
+	vlan_dev_del_addr(dev, dev->dev_addr);
 out:
 	netif_carrier_off(dev);
 	return err;
@@ -327,8 +376,7 @@ static int vlan_dev_stop(struct net_device *dev)
 	if (dev->flags & IFF_PROMISC)
 		dev_set_promiscuity(real_dev, -1);
 
-	if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr))
-		dev_uc_del(real_dev, dev->dev_addr);
+	vlan_dev_del_addr(dev, dev->dev_addr);
 
 	if (!(vlan->flags & VLAN_FLAG_BRIDGE_BINDING))
 		netif_carrier_off(dev);
@@ -337,9 +385,7 @@ static int vlan_dev_stop(struct net_device *dev)
 
 static int vlan_dev_set_mac_address(struct net_device *dev, void *p)
 {
-	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
 	struct sockaddr *addr = p;
-	int err;
 
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;
@@ -347,15 +393,8 @@ static int vlan_dev_set_mac_address(struct net_device *dev, void *p)
 	if (!(dev->flags & IFF_UP))
 		goto out;
 
-	if (!ether_addr_equal(addr->sa_data, real_dev->dev_addr)) {
-		err = dev_uc_add(real_dev, addr->sa_data);
-		if (err < 0)
-			return err;
-	}
-
-	if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr))
-		dev_uc_del(real_dev, dev->dev_addr);
-
+	if (vlan_dev_subs_addr(dev, addr->sa_data))
+		return true;
 out:
 	ether_addr_copy(dev->dev_addr, addr->sa_data);
 	return 0;
-- 
2.25.1



* [PATCH RFC net-next 04/13] ethernet: eth: add default vid len for all ethernet kind devices
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (2 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 03/13] net: 8021q: vlan_dev: add vid tag for vlan device own address Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 05/13] net: bridge: multicast: propagate br_mc_disabled_update() return Vladimir Oltean
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

IVDF stands for individual virtual device filtering. It allows setting
per-VLAN L2 address filters on the end real network device (for unicast
and for multicast) and dropping redundant, unexpected ingress packets.

If CONFIG_VLAN_8021Q_IVDF is enabled the following changes are
applied, and only for ethernet network devices.

By default every ethernet netdev needs vid_len = 2 bytes to be able to
hold up to 4096 VIDs. So set it for every eth device, except VLAN
devices.

In order to shrink all addresses of devices above a VLAN device, vid_len
for the VLAN device is 0. As a result, all upper devices sync their
addresses to a common base without taking the VID part into account
(only the vid_len of the "to" device matters). Only the VLAN device is a
source of addresses that actually carry its VID, propagating them to
parent devices during rx_mode().

Also, don't bother those ethernet devices that have not yet moved to the
VLAN addressing scheme: when an end ethernet device is created, set
vid_len to 0, so that while syncing its address space collapses to the
usual one-dimensional one. Whoever needs IVDF sets it to
NET_8021Q_VID_TSIZE.

An alternative would be to inherit vid_len or some feature flag from the
end root device, so that all upper devices have the VLAN-extended
address space only if the end real device has that capability. I didn't
do that because it requires more changes and I'm probably not familiar
with all the places where it would have to be inherited. I would
appreciate guidance on where it's applicable; then the scope could
become a little bit more limited.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_vlan.h |  1 +
 net/8021q/Kconfig       | 12 ++++++++++++
 net/8021q/vlan_core.c   | 12 ++++++++++++
 net/8021q/vlan_dev.c    |  1 +
 net/ethernet/eth.c      | 12 ++++++++++--
 5 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 20407f73cfee..b3f7e92cd645 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -132,6 +132,7 @@ extern int vlan_for_each(struct net_device *dev,
 			 int (*action)(struct net_device *dev, int vid,
 				       void *arg), void *arg);
 extern u16 vlan_dev_get_addr_vid(struct net_device *dev, const u8 *addr);
+extern void vlan_dev_ivdf_set(struct net_device *dev, bool enable);
 extern struct net_device *vlan_dev_real_dev(const struct net_device *dev);
 extern u16 vlan_dev_vlan_id(const struct net_device *dev);
 extern __be16 vlan_dev_vlan_proto(const struct net_device *dev);
diff --git a/net/8021q/Kconfig b/net/8021q/Kconfig
index 5510b4b90ff0..aaae09068ab8 100644
--- a/net/8021q/Kconfig
+++ b/net/8021q/Kconfig
@@ -39,3 +39,15 @@ config VLAN_8021Q_MVRP
 	  supersedes GVRP and is not backwards-compatible.
 
 	  If unsure, say N.
+
+config VLAN_8021Q_IVDF
+	bool "IVDF (Individual Virtual Device Filtering) support"
+	depends on VLAN_8021Q
+	help
+	  Select this to enable support for the IVDF addressing scheme. IVDF
+	  is used for automatic propagation of registered VLAN addresses to
+	  real end devices. If no device supports IVDF, disable this, as it
+	  can consume some memory to hold VLAN addresses in configurations
+	  with complex network device structures.
+
+	  If unsure, say N.
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index b528f09be9a3..d21492f7f557 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -453,6 +453,18 @@ bool vlan_uses_dev(const struct net_device *dev)
 }
 EXPORT_SYMBOL(vlan_uses_dev);
 
+void vlan_dev_ivdf_set(struct net_device *dev, bool enable)
+{
+#ifdef CONFIG_VLAN_8021Q_IVDF
+	if (enable) {
+		dev->vid_len = NET_8021Q_VID_TSIZE;
+		return;
+	}
+#endif
+	dev->vid_len = 0;
+}
+EXPORT_SYMBOL(vlan_dev_ivdf_set);
+
 u16 vlan_dev_get_addr_vid(struct net_device *dev, const u8 *addr)
 {
 	u16 vid = 0;
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f3f570a12ffd..22ce9f9f666d 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -894,5 +894,6 @@ void vlan_setup(struct net_device *dev)
 	dev->min_mtu		= 0;
 	dev->max_mtu		= ETH_MAX_MTU;
 
+	vlan_dev_ivdf_set(dev, true);
 	eth_zero_addr(dev->broadcast);
 }
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index c8b903302ff2..c40fae6df46b 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -372,6 +372,7 @@ void ether_setup(struct net_device *dev)
 	dev->flags		= IFF_BROADCAST|IFF_MULTICAST;
 	dev->priv_flags		|= IFF_TX_SKB_SHARING;
 
+	vlan_dev_ivdf_set(dev, false);
 	eth_broadcast_addr(dev->broadcast);
 
 }
@@ -395,8 +396,15 @@ EXPORT_SYMBOL(ether_setup);
 struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
 				      unsigned int rxqs)
 {
-	return alloc_netdev_mqs(sizeof_priv, "eth%d", NET_NAME_UNKNOWN,
-				ether_setup, txqs, rxqs);
+	struct net_device *dev;
+
+	dev = alloc_netdev_mqs(sizeof_priv, "eth%d", NET_NAME_UNKNOWN,
+			       ether_setup, txqs, rxqs);
+	if (!dev)
+		return NULL;
+
+	vlan_dev_ivdf_set(dev, false);
+	return dev;
 }
 EXPORT_SYMBOL(alloc_etherdev_mqs);
 
-- 
2.25.1



* [PATCH RFC net-next 05/13] net: bridge: multicast: propagate br_mc_disabled_update() return
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (3 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 04/13] ethernet: eth: add default vid len for all ethernet kind devices Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 06/13] net: core: dev_addr_lists: export some raw __hw_addr helpers Vladimir Oltean
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Florian Fainelli <f.fainelli@gmail.com>

Some Ethernet switches might not be able to support disabling multicast
flooding globally, e.g. when several bridges span the same physical
device. Propagate the return value of br_mc_disabled_update() so that
such a failure is reported correctly to user-space.
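
The error-handling pattern can be illustrated with a small user-space
sketch (not the bridge code). The names bridge_model,
hw_set_mc_disabled and mc_toggle are hypothetical, and the choice to
clear -EOPNOTSUPP rather than return it is an assumption of this
sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bridge state: the multicast-enabled flag plus the
 * result that the simulated hardware will return.
 */
struct bridge_model {
	bool mc_enabled;
	int hw_result;
};

/* Stand-in for the hardware/switchdev call; returns 0 or -errno. */
static int hw_set_mc_disabled(const struct bridge_model *br, bool disabled)
{
	(void)disabled;
	return br->hw_result;
}

/* Toggle snooping and report hardware refusals to the caller. */
static int mc_toggle(struct bridge_model *br, bool enable)
{
	int err;

	if (br->mc_enabled == enable)
		return 0;

	err = hw_set_mc_disabled(br, !enable);
	if (err == -EOPNOTSUPP)
		err = 0; /* nothing offloads this setting: not a failure */
	if (err)
		return err; /* hardware refused: leave state unchanged */

	br->mc_enabled = enable;
	return 0;
}
```

Whether a tolerated -EOPNOTSUPP should be cleared or handed back to the
caller is a policy choice; the sketch clears it so that only real
refusals reach user-space.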

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/bridge/br_multicast.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index ad12fe3fca8c..9e93035b1483 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -809,7 +809,7 @@ static void br_ip6_multicast_port_query_expired(struct timer_list *t)
 }
 #endif
 
-static void br_mc_disabled_update(struct net_device *dev, bool value)
+static int br_mc_disabled_update(struct net_device *dev, bool value)
 {
 	struct switchdev_attr attr = {
 		.orig_dev = dev,
@@ -818,11 +818,13 @@ static void br_mc_disabled_update(struct net_device *dev, bool value)
 		.u.mc_disabled = !value,
 	};
 
-	switchdev_port_attr_set(dev, &attr);
+	return switchdev_port_attr_set(dev, &attr);
 }
 
 int br_multicast_add_port(struct net_bridge_port *port)
 {
+	int ret;
+
 	port->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
 
 	timer_setup(&port->multicast_router_timer,
@@ -833,8 +835,11 @@ int br_multicast_add_port(struct net_bridge_port *port)
 	timer_setup(&port->ip6_own_query.timer,
 		    br_ip6_multicast_port_query_expired, 0);
 #endif
-	br_mc_disabled_update(port->dev,
-			      br_opt_get(port->br, BROPT_MULTICAST_ENABLED));
+	ret = br_mc_disabled_update(port->dev,
+				    br_opt_get(port->br,
+					       BROPT_MULTICAST_ENABLED));
+	if (ret)
+		return ret;
 
 	port->mcast_stats = netdev_alloc_pcpu_stats(struct bridge_mcast_stats);
 	if (!port->mcast_stats)
@@ -2049,12 +2054,16 @@ static void br_multicast_start_querier(struct net_bridge *br,
 int br_multicast_toggle(struct net_bridge *br, unsigned long val)
 {
 	struct net_bridge_port *port;
+	int err = 0;
 
 	spin_lock_bh(&br->multicast_lock);
 	if (!!br_opt_get(br, BROPT_MULTICAST_ENABLED) == !!val)
 		goto unlock;
 
-	br_mc_disabled_update(br->dev, val);
+	err = br_mc_disabled_update(br->dev, val);
+	if (err && err != -EOPNOTSUPP)
+		goto unlock;
+
 	br_opt_toggle(br, BROPT_MULTICAST_ENABLED, !!val);
 	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED)) {
 		br_multicast_leave_snoopers(br);
@@ -2071,7 +2080,7 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val)
 unlock:
 	spin_unlock_bh(&br->multicast_lock);
 
-	return 0;
+	return err;
 }
 
 bool br_multicast_enabled(const struct net_device *dev)
-- 
2.25.1



* [PATCH RFC net-next 06/13] net: core: dev_addr_lists: export some raw __hw_addr helpers
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (4 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 05/13] net: bridge: multicast: propagate br_mc_disabled_update() return Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 07/13] net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work Vladimir Oltean
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Vladimir Oltean <vladimir.oltean@nxp.com>

DSA switches need to keep the list of addresses which are filtered
towards the CPU port. One DSA switch can have 1 CPU port and many
front-panel (user) ports, each user port having its own MAC address
(potentially all the same MAC address). Filtering towards the CPU port
means installing, for each user port MAC address, an FDB entry that
sends frames destined to that address to the CPU. There is no net_device
associated with the CPU port, so the DSA switches need to keep their own
reference counting of the MAC addresses for which an FDB entry is
installed on or removed from the CPU port. Permit that by exporting the
raw helpers, which operate on a struct netdev_hw_addr_list instead of on
a struct net_device.
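
The reference counting described above can be sketched in user space
as follows. This is only a model of the idea, not the __hw_addr_*
implementation; cpu_filter, filter_add, filter_del and the fixed-size
table are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAC_LEN  6
#define MAX_ADDR 8

struct ref_addr {
	unsigned char mac[MAC_LEN];
	int refcount; /* 0 means the slot is free */
};

struct cpu_filter {
	struct ref_addr slot[MAX_ADDR];
	int hw_entries; /* simulated FDB entries on the CPU port */
};

static struct ref_addr *filter_find(struct cpu_filter *f,
				    const unsigned char *mac)
{
	for (int i = 0; i < MAX_ADDR; i++)
		if (f->slot[i].refcount &&
		    !memcmp(f->slot[i].mac, mac, MAC_LEN))
			return &f->slot[i];
	return NULL;
}

/* Take a reference; install the hardware entry only on the first one. */
static int filter_add(struct cpu_filter *f, const unsigned char *mac)
{
	struct ref_addr *ra = filter_find(f, mac);

	if (ra) {
		ra->refcount++;
		return 0;
	}
	for (int i = 0; i < MAX_ADDR; i++) {
		if (!f->slot[i].refcount) {
			memcpy(f->slot[i].mac, mac, MAC_LEN);
			f->slot[i].refcount = 1;
			f->hw_entries++;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Drop a reference; remove the hardware entry only on the last one. */
static int filter_del(struct cpu_filter *f, const unsigned char *mac)
{
	struct ref_addr *ra = filter_find(f, mac);

	if (!ra)
		return -1; /* address was never added */
	if (--ra->refcount == 0)
		f->hw_entries--;
	return 0;
}
```

Two user ports sharing one MAC address would call filter_add() twice
but cause a single hardware entry, which goes away only after the
second filter_del().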

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/netdevice.h |  7 +++++++
 net/core/dev_addr_lists.c | 17 ++++++++++-------
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2d11b93f3af4..239efd209c33 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4307,6 +4307,13 @@ void __hw_addr_unsync_dev(struct netdev_hw_addr_list *list,
 			  int (*unsync)(struct net_device *,
 					const unsigned char *));
 void __hw_addr_init(struct netdev_hw_addr_list *list);
+void __hw_addr_flush(struct netdev_hw_addr_list *list);
+int __hw_addr_add(struct netdev_hw_addr_list *list,
+		  const unsigned char *addr, int addr_len,
+		  unsigned char addr_type);
+int __hw_addr_del(struct netdev_hw_addr_list *list,
+		  const unsigned char *addr, int addr_len,
+		  unsigned char addr_type);
 
 /* Functions used for device addresses handling */
 int dev_addr_add(struct net_device *dev, const unsigned char *addr,
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 90eaa99b19e5..e307ae7d2a44 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -77,13 +77,14 @@ static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
 				   sync);
 }
 
-static int __hw_addr_add(struct netdev_hw_addr_list *list,
-			 const unsigned char *addr, int addr_len,
-			 unsigned char addr_type)
+int __hw_addr_add(struct netdev_hw_addr_list *list,
+		  const unsigned char *addr, int addr_len,
+		  unsigned char addr_type)
 {
 	return __hw_addr_add_ex(list, addr, addr_len, addr_type, false, false,
 				0);
 }
+EXPORT_SYMBOL(__hw_addr_add);
 
 static int __hw_addr_del_entry(struct netdev_hw_addr_list *list,
 			       struct netdev_hw_addr *ha, bool global,
@@ -123,12 +124,13 @@ static int __hw_addr_del_ex(struct netdev_hw_addr_list *list,
 	return -ENOENT;
 }
 
-static int __hw_addr_del(struct netdev_hw_addr_list *list,
-			 const unsigned char *addr, int addr_len,
-			 unsigned char addr_type)
+int __hw_addr_del(struct netdev_hw_addr_list *list,
+		  const unsigned char *addr, int addr_len,
+		  unsigned char addr_type)
 {
 	return __hw_addr_del_ex(list, addr, addr_len, addr_type, false, false);
 }
+EXPORT_SYMBOL(__hw_addr_del);
 
 static int __hw_addr_sync_one(struct netdev_hw_addr_list *to_list,
 			       struct netdev_hw_addr *ha,
@@ -403,7 +405,7 @@ void __hw_addr_unsync_dev(struct netdev_hw_addr_list *list,
 }
 EXPORT_SYMBOL(__hw_addr_unsync_dev);
 
-static void __hw_addr_flush(struct netdev_hw_addr_list *list)
+void __hw_addr_flush(struct netdev_hw_addr_list *list)
 {
 	struct netdev_hw_addr *ha, *tmp;
 
@@ -413,6 +415,7 @@ static void __hw_addr_flush(struct netdev_hw_addr_list *list)
 	}
 	list->count = 0;
 }
+EXPORT_SYMBOL(__hw_addr_flush);
 
 void __hw_addr_init(struct netdev_hw_addr_list *list)
 {
-- 
2.25.1



* [PATCH RFC net-next 07/13] net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (5 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 06/13] net: core: dev_addr_lists: export some raw __hw_addr helpers Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 08/13] net: dsa: add ability to program unicast and multicast filters for CPU port Vladimir Oltean
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Currently DSA doesn't add FDB entries on the CPU port, because it only
does so through switchdev, which is associated with a net_device, and
there are none of those for the CPU port.

But actually FDB addresses on the CPU port can be associated with RX
filtering, so we can initiate switchdev operations from within the DSA
layer. We need the deferred work because .ndo_set_rx_mode runs in atomic
context. There is just one problem with the existing code: it passes a
structure in dsa_switchdev_event_work which was retrieved directly from
switchdev, so it contains a net_device. We need to generalize the
contents to something that covers the CPU port as well: the "ds, port"
tuple is fine for that.

Note that the new procedure for notifying the successful FDB offload is
inspired by the rocker model.

Also, nothing was being done if added_by_user was false. Let's check for
that a lot earlier, and not bother scheduling the work item for
nothing.
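
A user-space sketch of the resulting work item shape follows: the item
embeds its work member and owns a copy of the address, and the handler
recovers the enclosing structure with container_of(). The names
work_model, fdb_work, fdb_work_init and fdb_work_handler are
hypothetical stand-ins, and an integer switch_id replaces the "ds"
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* User-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct work_model {
	void (*func)(struct work_model *work);
};

/* Work item that owns its data: no pointer into the notifier's memory. */
struct fdb_work {
	int switch_id;               /* stands in for the "ds" pointer */
	int port;
	struct work_model work;
	unsigned char addr[MAC_LEN]; /* copied by value */
	unsigned short vid;
};

static int last_port = -1;               /* port seen by the handler */
static unsigned char last_addr[MAC_LEN]; /* address seen by the handler */

static void fdb_work_handler(struct work_model *work)
{
	struct fdb_work *fw = container_of(work, struct fdb_work, work);

	last_port = fw->port;
	memcpy(last_addr, fw->addr, MAC_LEN);
}

static void fdb_work_init(struct fdb_work *fw, int switch_id, int port,
			  const unsigned char *addr, unsigned short vid)
{
	fw->switch_id = switch_id;
	fw->port = port;
	fw->vid = vid;
	memcpy(fw->addr, addr, MAC_LEN);
	fw->work.func = fdb_work_handler;
}
```

Because the address is copied into the item, the handler still sees the
right bytes after the caller's buffer has been reused, and nothing in
the item requires a net_device.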

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa_priv.h | 12 ++++++
 net/dsa/slave.c    | 98 +++++++++++++++++++++++-----------------------
 2 files changed, 60 insertions(+), 50 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index adecf73bd608..001668007efd 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -72,6 +72,18 @@ struct dsa_notifier_mtu_info {
 	int mtu;
 };
 
+struct dsa_switchdev_event_work {
+	struct dsa_switch *ds;
+	int port;
+	struct work_struct work;
+	unsigned long event;
+	/* Specific for SWITCHDEV_FDB_ADD_TO_DEVICE and
+	 * SWITCHDEV_FDB_DEL_TO_DEVICE
+	 */
+	unsigned char addr[ETH_ALEN];
+	u16 vid;
+};
+
 struct dsa_slave_priv {
 	/* Copy of CPU port xmit for faster access in slave transmit hot path */
 	struct sk_buff *	(*xmit)(struct sk_buff *skb,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 886490fb203d..d2072fbd22fe 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1914,72 +1914,60 @@ static int dsa_slave_netdevice_event(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
-struct dsa_switchdev_event_work {
-	struct work_struct work;
-	struct switchdev_notifier_fdb_info fdb_info;
-	struct net_device *dev;
-	unsigned long event;
-};
+static void
+dsa_fdb_offload_notify(struct dsa_switchdev_event_work *switchdev_work)
+{
+	struct dsa_switch *ds = switchdev_work->ds;
+	struct dsa_port *dp = dsa_to_port(ds, switchdev_work->port);
+	struct switchdev_notifier_fdb_info info;
+
+	if (!dsa_is_user_port(ds, dp->index))
+		return;
+
+	info.addr = switchdev_work->addr;
+	info.vid = switchdev_work->vid;
+	info.offloaded = true;
+	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
+				 dp->slave, &info.info, NULL);
+}
 
 static void dsa_slave_switchdev_event_work(struct work_struct *work)
 {
 	struct dsa_switchdev_event_work *switchdev_work =
 		container_of(work, struct dsa_switchdev_event_work, work);
-	struct net_device *dev = switchdev_work->dev;
-	struct switchdev_notifier_fdb_info *fdb_info;
-	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = switchdev_work->ds;
+	struct dsa_port *dp = dsa_to_port(ds, switchdev_work->port);
 	int err;
 
 	rtnl_lock();
 	switch (switchdev_work->event) {
 	case SWITCHDEV_FDB_ADD_TO_DEVICE:
-		fdb_info = &switchdev_work->fdb_info;
-		if (!fdb_info->added_by_user)
-			break;
-
-		err = dsa_port_fdb_add(dp, fdb_info->addr, fdb_info->vid);
+		err = dsa_port_fdb_add(dp, switchdev_work->addr,
+				       switchdev_work->vid);
 		if (err) {
-			netdev_dbg(dev, "fdb add failed err=%d\n", err);
+			dev_dbg(ds->dev, "port %d fdb add failed err=%d\n",
+				dp->index, err);
 			break;
 		}
-		fdb_info->offloaded = true;
-		call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED, dev,
-					 &fdb_info->info, NULL);
+		dsa_fdb_offload_notify(switchdev_work);
 		break;
 
 	case SWITCHDEV_FDB_DEL_TO_DEVICE:
-		fdb_info = &switchdev_work->fdb_info;
-		if (!fdb_info->added_by_user)
-			break;
-
-		err = dsa_port_fdb_del(dp, fdb_info->addr, fdb_info->vid);
+		err = dsa_port_fdb_del(dp, switchdev_work->addr,
+				       switchdev_work->vid);
 		if (err) {
-			netdev_dbg(dev, "fdb del failed err=%d\n", err);
-			dev_close(dev);
+			dev_dbg(ds->dev, "port %d fdb del failed err=%d\n",
+				dp->index, err);
+			if (dsa_is_user_port(ds, dp->index))
+				dev_close(dp->slave);
 		}
 		break;
 	}
 	rtnl_unlock();
 
-	kfree(switchdev_work->fdb_info.addr);
 	kfree(switchdev_work);
-	dev_put(dev);
-}
-
-static int
-dsa_slave_switchdev_fdb_work_init(struct dsa_switchdev_event_work *
-				  switchdev_work,
-				  const struct switchdev_notifier_fdb_info *
-				  fdb_info)
-{
-	memcpy(&switchdev_work->fdb_info, fdb_info,
-	       sizeof(switchdev_work->fdb_info));
-	switchdev_work->fdb_info.addr = kzalloc(ETH_ALEN, GFP_ATOMIC);
-	if (!switchdev_work->fdb_info.addr)
-		return -ENOMEM;
-	ether_addr_copy((u8 *)switchdev_work->fdb_info.addr,
-			fdb_info->addr);
-	return 0;
+	if (dsa_is_user_port(ds, dp->index))
+		dev_put(dp->slave);
 }
 
 /* Called under rcu_read_lock() */
@@ -1987,7 +1975,9 @@ static int dsa_slave_switchdev_event(struct notifier_block *unused,
 				     unsigned long event, void *ptr)
 {
 	struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
+	const struct switchdev_notifier_fdb_info *fdb_info;
 	struct dsa_switchdev_event_work *switchdev_work;
+	struct dsa_port *dp;
 	int err;
 
 	if (event == SWITCHDEV_PORT_ATTR_SET) {
@@ -2000,20 +1990,32 @@ static int dsa_slave_switchdev_event(struct notifier_block *unused,
 	if (!dsa_slave_dev_check(dev))
 		return NOTIFY_DONE;
 
+	dp = dsa_slave_to_port(dev);
+
 	switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
 	if (!switchdev_work)
 		return NOTIFY_BAD;
 
 	INIT_WORK(&switchdev_work->work,
 		  dsa_slave_switchdev_event_work);
-	switchdev_work->dev = dev;
+	switchdev_work->ds = dp->ds;
+	switchdev_work->port = dp->index;
 	switchdev_work->event = event;
 
 	switch (event) {
 	case SWITCHDEV_FDB_ADD_TO_DEVICE: /* fall through */
 	case SWITCHDEV_FDB_DEL_TO_DEVICE:
-		if (dsa_slave_switchdev_fdb_work_init(switchdev_work, ptr))
-			goto err_fdb_work_init;
+		fdb_info = ptr;
+
+		if (!fdb_info->added_by_user) {
+			kfree(switchdev_work);
+			return NOTIFY_OK;
+		}
+
+		ether_addr_copy(switchdev_work->addr,
+				fdb_info->addr);
+		switchdev_work->vid = fdb_info->vid;
+
 		dev_hold(dev);
 		break;
 	default:
@@ -2023,10 +2025,6 @@ static int dsa_slave_switchdev_event(struct notifier_block *unused,
 
 	dsa_schedule_work(&switchdev_work->work);
 	return NOTIFY_OK;
-
-err_fdb_work_init:
-	kfree(switchdev_work);
-	return NOTIFY_BAD;
 }
 
 static int dsa_slave_switchdev_blocking_event(struct notifier_block *unused,
-- 
2.25.1



* [PATCH RFC net-next 08/13] net: dsa: add ability to program unicast and multicast filters for CPU port
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (6 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 07/13] net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 09/13] net: dsa: mroute: don't panic the kernel if called without the prepare phase Vladimir Oltean
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Florian Fainelli <f.fainelli@gmail.com>

When the switch ports operate as individual network devices, the switch
driver might have configured the switch to flood multicast all the way
to the CPU port. This is really undesirable as it can lead to receiving
a lot of unwanted traffic that the network stack needs to filter in
software.

Program each valid multicast address into the switch's MDB only when
the host is interested in receiving such traffic, e.g. when running a
multicast application.

For unicast filtering, consider that termination can only be done
through the primary MAC address of each net device virtually
corresponding to a switch port, as well as through upper interfaces
(VLAN, bridge) that add their MAC address to the list of secondary
unicast addresses of the switch net devices. For each such unicast
address, install a reference-counted FDB entry towards the CPU port.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h |   6 ++
 net/dsa/Kconfig   |   1 +
 net/dsa/dsa2.c    |   6 ++
 net/dsa/slave.c   | 182 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 195 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 50389772c597..7aa78884a5f2 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -261,6 +261,12 @@ struct dsa_switch {
 	 */
 	const struct dsa_switch_ops	*ops;
 
+	/*
+	 * {MAC, VLAN} addresses that are copied to the CPU.
+	 */
+	struct netdev_hw_addr_list	uc;
+	struct netdev_hw_addr_list	mc;
+
 	/*
 	 * Slave mii_bus and devices for the individual ports.
 	 */
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 739613070d07..d4644afdbdd7 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -9,6 +9,7 @@ menuconfig NET_DSA
 	tristate "Distributed Switch Architecture"
 	depends on HAVE_NET_DSA
 	depends on BRIDGE || BRIDGE=n
+	depends on VLAN_8021Q_IVDF || VLAN_8021Q_IVDF=n
 	select GRO_CELLS
 	select NET_SWITCHDEV
 	select PHYLINK
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 076908fdd29b..cd17554a912b 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -429,6 +429,9 @@ static int dsa_switch_setup(struct dsa_switch *ds)
 			goto unregister_notifier;
 	}
 
+	__hw_addr_init(&ds->mc);
+	__hw_addr_init(&ds->uc);
+
 	ds->setup = true;
 
 	return 0;
@@ -449,6 +452,9 @@ static void dsa_switch_teardown(struct dsa_switch *ds)
 	if (!ds->setup)
 		return;
 
+	__hw_addr_flush(&ds->mc);
+	__hw_addr_flush(&ds->uc);
+
 	if (ds->slave_mii_bus && ds->ops->phy_read)
 		mdiobus_unregister(ds->slave_mii_bus);
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index d2072fbd22fe..2743d689f6b1 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -62,6 +62,158 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
 	return dsa_slave_to_master(dev)->ifindex;
 }
 
+/* Add a static host MDB entry, corresponding to a slave multicast MAC address,
+ * to the CPU port. The MDB entry is reference-counted (4 slave ports listening
+ * on the same multicast MAC address will only call this function once).
+ */
+static int dsa_upstream_sync_mdb_addr(struct net_device *dev,
+				      const unsigned char *addr)
+{
+	struct switchdev_obj_port_mdb mdb;
+
+	memset(&mdb, 0, sizeof(mdb));
+	mdb.obj.id = SWITCHDEV_OBJ_ID_HOST_MDB;
+	mdb.obj.flags = SWITCHDEV_F_DEFER;
+	mdb.vid = vlan_dev_get_addr_vid(dev, addr);
+	ether_addr_copy(mdb.addr, addr);
+
+	return switchdev_port_obj_add(dev, &mdb.obj, NULL);
+}
+
+/* Delete a static host MDB entry, corresponding to a slave multicast MAC
+ * address, from the CPU port. The MDB entry is reference-counted (4 slave
+ * ports listening on the same multicast MAC address will only call this
+ * function once).
+ */
+static int dsa_upstream_unsync_mdb_addr(struct net_device *dev,
+				        const unsigned char *addr)
+{
+	struct switchdev_obj_port_mdb mdb;
+
+	memset(&mdb, 0, sizeof(mdb));
+	mdb.obj.id = SWITCHDEV_OBJ_ID_HOST_MDB;
+	mdb.obj.flags = SWITCHDEV_F_DEFER;
+	mdb.vid = vlan_dev_get_addr_vid(dev, addr);
+	ether_addr_copy(mdb.addr, addr);
+
+	return switchdev_port_obj_del(dev, &mdb.obj);
+}
+
+static int dsa_slave_sync_mdb_addr(struct net_device *dev,
+				   const unsigned char *addr)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+	int err;
+
+	err = __hw_addr_add(&ds->mc, addr, dev->addr_len + dev->vid_len,
+			    NETDEV_HW_ADDR_T_MULTICAST);
+	if (err)
+		return err;
+
+	return __hw_addr_sync_dev(&ds->mc, dev, dsa_upstream_sync_mdb_addr,
+				  dsa_upstream_unsync_mdb_addr);
+}
+
+static int dsa_slave_unsync_mdb_addr(struct net_device *dev,
+				     const unsigned char *addr)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+	int err;
+
+	err = __hw_addr_del(&ds->mc, addr, dev->addr_len + dev->vid_len,
+			    NETDEV_HW_ADDR_T_MULTICAST);
+	if (err)
+		return err;
+
+	return __hw_addr_sync_dev(&ds->mc, dev, dsa_upstream_sync_mdb_addr,
+				  dsa_upstream_unsync_mdb_addr);
+}
+
+static void dsa_slave_switchdev_event_work(struct work_struct *work);
+
+static int dsa_upstream_fdb_addr(struct net_device *slave_dev,
+				 const unsigned char *addr,
+				 unsigned long event)
+{
+	int addr_len = slave_dev->addr_len + slave_dev->vid_len;
+	struct dsa_port *dp = dsa_slave_to_port(slave_dev);
+	u16 vid = vlan_dev_get_addr_vid(slave_dev, addr);
+	struct dsa_switchdev_event_work *switchdev_work;
+
+	switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
+	if (!switchdev_work)
+		return -ENOMEM;
+
+	INIT_WORK(&switchdev_work->work, dsa_slave_switchdev_event_work);
+	switchdev_work->ds = dp->ds;
+	switchdev_work->port = dsa_upstream_port(dp->ds, dp->index);
+	switchdev_work->event = event;
+
+	memcpy(switchdev_work->addr, addr, addr_len);
+	switchdev_work->vid = vid;
+
+	dev_hold(slave_dev);
+	dsa_schedule_work(&switchdev_work->work);
+
+	return 0;
+}
+
+/* Add a static FDB entry, corresponding to a slave unicast MAC address,
+ * to the CPU port. The FDB entry is reference-counted (4 slave ports having
+ * the same MAC address will only call this function once).
+ */
+static int dsa_upstream_sync_fdb_addr(struct net_device *slave_dev,
+				      const unsigned char *addr)
+{
+	return dsa_upstream_fdb_addr(slave_dev, addr,
+				     SWITCHDEV_FDB_ADD_TO_DEVICE);
+}
+
+/* Remove a static FDB entry, corresponding to a slave unicast MAC address,
+ * from the CPU port. The FDB entry is reference-counted (the MAC address is
+ * only removed when there is no remaining slave port that uses it).
+ */
+static int dsa_upstream_unsync_fdb_addr(struct net_device *slave_dev,
+					const unsigned char *addr)
+{
+	return dsa_upstream_fdb_addr(slave_dev, addr,
+				     SWITCHDEV_FDB_DEL_TO_DEVICE);
+}
+
+static int dsa_slave_sync_fdb_addr(struct net_device *dev,
+				   const unsigned char *addr)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+	int err;
+
+	err = __hw_addr_add(&ds->uc, addr, dev->addr_len + dev->vid_len,
+			    NETDEV_HW_ADDR_T_UNICAST);
+	if (err)
+		return err;
+
+	return __hw_addr_sync_dev(&ds->uc, dev, dsa_upstream_sync_fdb_addr,
+				  dsa_upstream_unsync_fdb_addr);
+}
+
+static int dsa_slave_unsync_fdb_addr(struct net_device *dev,
+				     const unsigned char *addr)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+	int err;
+
+	err = __hw_addr_del(&ds->uc, addr, dev->addr_len + dev->vid_len,
+			    NETDEV_HW_ADDR_T_UNICAST);
+	if (err)
+		return err;
+
+	return __hw_addr_sync_dev(&ds->uc, dev, dsa_upstream_sync_fdb_addr,
+				  dsa_upstream_unsync_fdb_addr);
+}
+
 static int dsa_slave_open(struct net_device *dev)
 {
 	struct net_device *master = dsa_slave_to_master(dev);
@@ -76,6 +228,9 @@ static int dsa_slave_open(struct net_device *dev)
 		if (err < 0)
 			goto out;
 	}
+	err = dsa_slave_sync_fdb_addr(dev, dev->dev_addr);
+	if (err < 0)
+		goto out;
 
 	if (dev->flags & IFF_ALLMULTI) {
 		err = dev_set_allmulti(master, 1);
@@ -103,6 +258,7 @@ static int dsa_slave_open(struct net_device *dev)
 del_unicast:
 	if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
 		dev_uc_del(master, dev->dev_addr);
+	dsa_slave_unsync_fdb_addr(dev, dev->dev_addr);
 out:
 	return err;
 }
@@ -116,6 +272,9 @@ static int dsa_slave_close(struct net_device *dev)
 
 	dev_mc_unsync(master, dev);
 	dev_uc_unsync(master, dev);
+	__dev_mc_unsync(dev, dsa_slave_unsync_mdb_addr);
+	__dev_uc_unsync(dev, dsa_slave_unsync_fdb_addr);
+
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(master, -1);
 	if (dev->flags & IFF_PROMISC)
@@ -143,7 +302,17 @@ static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
 static void dsa_slave_set_rx_mode(struct net_device *dev)
 {
 	struct net_device *master = dsa_slave_to_master(dev);
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+
+	/* If the port is bridged, the bridge takes care of sending
+	 * SWITCHDEV_OBJ_ID_HOST_MDB to program the host's MC filter
+	 */
+	if (netdev_mc_empty(dev) || dp->bridge_dev)
+		goto out;
 
+	__dev_mc_sync(dev, dsa_slave_sync_mdb_addr, dsa_slave_unsync_mdb_addr);
+out:
+	__dev_uc_sync(dev, dsa_slave_sync_fdb_addr, dsa_slave_unsync_fdb_addr);
 	dev_mc_sync(master, dev);
 	dev_uc_sync(master, dev);
 }
@@ -165,9 +334,15 @@ static int dsa_slave_set_mac_address(struct net_device *dev, void *a)
 		if (err < 0)
 			return err;
 	}
+	err = dsa_slave_sync_fdb_addr(dev, addr->sa_data);
+	if (err < 0)
+		goto out;
 
 	if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
 		dev_uc_del(master, dev->dev_addr);
+	err = dsa_slave_unsync_fdb_addr(dev, dev->dev_addr);
+	if (err < 0)
+		goto out;
 
 out:
 	ether_addr_copy(dev->dev_addr, addr->sa_data);
@@ -1752,6 +1927,8 @@ int dsa_slave_create(struct dsa_port *port)
 	else
 		eth_hw_addr_inherit(slave_dev, master);
 	slave_dev->priv_flags |= IFF_NO_QUEUE;
+	if (ds->ops->port_fdb_add && ds->ops->port_egress_floods)
+		slave_dev->priv_flags |= IFF_UNICAST_FLT;
 	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
 	slave_dev->min_mtu = 0;
 	if (ds->ops->port_max_mtu)
@@ -1759,6 +1936,7 @@ int dsa_slave_create(struct dsa_port *port)
 	else
 		slave_dev->max_mtu = ETH_MAX_MTU;
 	SET_NETDEV_DEVTYPE(slave_dev, &dsa_type);
+	vlan_dev_ivdf_set(slave_dev, true);
 
 	netdev_for_each_tx_queue(slave_dev, dsa_slave_set_lockdep_class_one,
 				 NULL);
@@ -1854,6 +2032,10 @@ static int dsa_slave_changeupper(struct net_device *dev,
 
 	if (netif_is_bridge_master(info->upper_dev)) {
 		if (info->linking) {
+			/* Remove existing MC addresses that might have been
+			 * programmed
+			 */
+			__dev_mc_unsync(dev, dsa_slave_unsync_mdb_addr);
 			err = dsa_port_bridge_join(dp, info->upper_dev);
 			if (!err)
 				dsa_bridge_mtu_normalization(dp);
-- 
2.25.1



* [PATCH RFC net-next 09/13] net: dsa: mroute: don't panic the kernel if called without the prepare phase
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (7 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 08/13] net: dsa: add ability to program unicast and multicast filters for CPU port Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding Vladimir Oltean
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Currently, dsa_port_mrouter() checks the port_egress_floods pointer only
in the preparation phase, the assumption being that the caller won't
proceed to the commit phase once the prepare call has returned
-EOPNOTSUPP. If the function were called a second time anyway, the
port_egress_floods pointer would not be checked, and the function would
call through a NULL pointer.
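
The fix amounts to validating the optional callback before branching on
the phase. A minimal user-space sketch of that ordering, with
hypothetical names (ops_model, example_floods, port_mrouter) and a
plain bool instead of the switchdev transaction object:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver ops table; the callback is optional. */
struct ops_model {
	int (*egress_floods)(int port, bool unicast, bool multicast);
};

/* Example callback that always succeeds. */
static int example_floods(int port, bool unicast, bool multicast)
{
	(void)port;
	(void)unicast;
	(void)multicast;
	return 0;
}

/* Check the optional callback first, so that neither phase can reach
 * the call with a NULL pointer.
 */
static int port_mrouter(const struct ops_model *ops, int port,
			bool mrouter, bool prepare_phase)
{
	if (!ops->egress_floods)
		return -EOPNOTSUPP;

	if (prepare_phase)
		return 0; /* nothing to program yet */

	return ops->egress_floods(port, true, mrouter);
}
```

With this ordering a caller that skips the prepare phase gets
-EOPNOTSUPP instead of a crash.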

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/port.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index e23ece229c7e..c4032f79225a 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -324,8 +324,11 @@ int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
 	struct dsa_switch *ds = dp->ds;
 	int port = dp->index;
 
+	if (!ds->ops->port_egress_floods)
+		return -EOPNOTSUPP;
+
 	if (switchdev_trans_ph_prepare(trans))
-		return ds->ops->port_egress_floods ? 0 : -EOPNOTSUPP;
+		return 0;
 
 	return ds->ops->port_egress_floods(ds, port, true, mrouter);
 }
-- 
2.25.1



* [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (8 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 09/13] net: dsa: mroute: don't panic the kernel if called without the prepare phase Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-22 12:38   ` Nikolay Aleksandrov
  2020-05-24 14:26   ` Ido Schimmel
  2020-05-21 21:10 ` [PATCH RFC net-next 11/13] net: dsa: deal with new flooding port attributes from bridge Vladimir Oltean
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Vladimir Oltean <vladimir.oltean@nxp.com>

In cases where the bridge is offloaded by a switchdev, there are
situations where we can optimize RX filtering towards the host. To be
precise, the host only needs to do termination, which it can do by
responding on the MAC addresses of the slave ports and of the bridge
interface itself. Most notably, it doesn't need to do forwarding, so
there is no need for it to see packets with an unknown destination
address.

There are, however, cases where a switchdev does need to flood to the
CPU. One example is when the switchdev is bridged with a foreign
interface: since there is no offloaded datapath, packets need to pass
through the CPU. Currently this is the only identified case, but the
list can be extended at any time.

So far, switchdev implementers made driver-level assumptions, such as:
this chip is never integrated in SoCs where it can be bridged with a
foreign interface, so I'll just disable host flooding and save some CPU
cycles. Or: I can never know what else can be bridged with this
switchdev port, so I must leave host flooding enabled in any case.

Let the bridge drive the host flooding decision, and pass it to
switchdev via the same mechanism as the external flooding flags.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |  3 +++
 net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
 net/bridge/br_switchdev.c |  4 +++-
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b3a8d3054af0..6891a432862d 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -49,6 +49,9 @@ struct br_ip_list {
 #define BR_ISOLATED		BIT(16)
 #define BR_MRP_AWARE		BIT(17)
 #define BR_MRP_LOST_CONT	BIT(18)
+#define BR_HOST_FLOOD		BIT(19)
+#define BR_HOST_MCAST_FLOOD	BIT(20)
+#define BR_HOST_BCAST_FLOOD	BIT(21)
 
 #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index a0e9a7937412..aae59d1e619b 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
 	}
 }
 
+static int br_manage_host_flood(struct net_bridge *br)
+{
+	const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
+				   BR_HOST_BCAST_FLOOD;
+	struct net_bridge_port *p, *q;
+
+	list_for_each_entry(p, &br->port_list, list) {
+		unsigned long flags = p->flags;
+		bool sw_bridging = false;
+		int err;
+
+		list_for_each_entry(q, &br->port_list, list) {
+			if (p == q)
+				continue;
+
+			if (!netdev_port_same_parent_id(p->dev, q->dev)) {
+				sw_bridging = true;
+				break;
+			}
+		}
+
+		if (sw_bridging)
+			flags |= mask;
+		else
+			flags &= ~mask;
+
+		if (flags == p->flags)
+			continue;
+
+		err = br_switchdev_set_port_flag(p, flags, mask);
+		if (err)
+			return err;
+
+		p->flags = flags;
+	}
+
+	return 0;
+}
+
 int nbp_backup_change(struct net_bridge_port *p,
 		      struct net_device *backup_dev)
 {
@@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
 		br->auto_cnt = cnt;
 		br_manage_promisc(br);
 	}
+	br_manage_host_flood(br);
 }
 
 static void nbp_delete_promisc(struct net_bridge_port *p)
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index 015209bf44aa..360806ac7463 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
 
 /* Flags that can be offloaded to hardware */
 #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
-				  BR_MCAST_FLOOD | BR_BCAST_FLOOD)
+				  BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
+				  BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
+				  BR_HOST_BCAST_FLOOD)
 
 int br_switchdev_set_port_flag(struct net_bridge_port *p,
 			       unsigned long flags,
-- 
2.25.1
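
For illustration only (not part of the patch): a userspace sketch of the
decision made by br_manage_host_flood() above, assuming each bridge port
can be reduced to an integer identifying the switch it belongs to. The
sketch_port structure and the convention that a negative parent_id means
"no switchdev parent" are assumptions made for this example; the kernel
uses netdev_port_same_parent_id() on real net devices, and the three
host flooding flags are collapsed into one boolean here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge port. */
struct sketch_port {
	int parent_id;		/* negative: no switchdev parent (foreign) */
	bool host_flood;	/* stands in for the three BR_HOST_* flags */
};

/* Two ports share a parent only if both have one and the IDs match. */
static bool sketch_same_parent(const struct sketch_port *a,
			       const struct sketch_port *b)
{
	return a->parent_id >= 0 && a->parent_id == b->parent_id;
}

/* A port needs host flooding as soon as any other port in the bridge
 * belongs to a different parent, because forwarding between the two
 * then has to happen in software.
 */
void sketch_manage_host_flood(struct sketch_port *ports, size_t n)
{
	assert(ports != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		bool sw_bridging = false;

		for (size_t j = 0; j < n; j++) {
			if (i == j)
				continue;
			if (!sketch_same_parent(&ports[i], &ports[j])) {
				sw_bridging = true;
				break;
			}
		}

		ports[i].host_flood = sw_bridging;
	}
}
```

A bridge made only of ports from one switch ends up with host flooding
off everywhere; adding a single foreign port turns it on for all ports.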


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC net-next 11/13] net: dsa: deal with new flooding port attributes from bridge
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (9 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 12/13] net: dsa: treat switchdev notifications for multicast router connected to port Vladimir Oltean
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This refactors the DSA core handling of flooding attributes, since three
more have been introduced (related to host flooding). In DSA, host
flooding is actually the same as egress flooding of the CPU port.

Note that there are some switches where flooding is a decision taken per
{source port, destination port}. In DSA, it is only per egress port. For
now, let's keep it that way, which means that we need to implement a
"flood count" for the CPU port (keep it in flooding while there is at
least one user port with the BR_HOST_FLOOD flag set).

With this patch, RX filtering can be done for switch ports operating in
standalone mode and in bridge mode with no foreign interfaces. When
bridging with other net devices in the system, all unknown destinations
are allowed to go to the CPU, where they continue to be forwarded in
software.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |   8 ++++
 net/dsa/dsa_priv.h |   2 +-
 net/dsa/port.c     | 113 +++++++++++++++++++++++++++++++++------------
 3 files changed, 93 insertions(+), 30 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7aa78884a5f2..c256467f1f4a 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -198,6 +198,14 @@ struct dsa_port {
 	struct devlink_port	devlink_port;
 	struct phylink		*pl;
 	struct phylink_config	pl_config;
+	/* Operational state of flooding */
+	int			uc_flood_count;
+	int			mc_flood_count;
+	bool			uc_flood;
+	bool			mc_flood;
+	/* Knobs from bridge */
+	unsigned long		br_flags;
+	bool			mrouter;
 
 	struct list_head list;
 
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 001668007efd..91cbaefc56b3 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -167,7 +167,7 @@ int dsa_port_mdb_del(const struct dsa_port *dp,
 		     const struct switchdev_obj_port_mdb *mdb);
 int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags,
 			      struct switchdev_trans *trans);
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags,
+int dsa_port_bridge_flags(struct dsa_port *dp, unsigned long flags,
 			  struct switchdev_trans *trans);
 int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
 		     struct switchdev_trans *trans);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index c4032f79225a..b527740d03a8 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -144,10 +144,7 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 	};
 	int err;
 
-	/* Set the flooding mode before joining the port in the switch */
-	err = dsa_port_bridge_flags(dp, BR_FLOOD | BR_MCAST_FLOOD, NULL);
-	if (err)
-		return err;
+	dp->cpu_dp->mrouter = br_multicast_router(br);
 
 	/* Here the interface is already bridged. Reflect the current
 	 * configuration so that drivers can program their chips accordingly.
@@ -156,12 +153,6 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 
 	err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info);
 
-	/* The bridging is rolled back on error */
-	if (err) {
-		dsa_port_bridge_flags(dp, 0, NULL);
-		dp->bridge_dev = NULL;
-	}
-
 	return err;
 }
 
@@ -184,8 +175,12 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	if (err)
 		pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
 
-	/* Port is leaving the bridge, disable flooding */
-	dsa_port_bridge_flags(dp, 0, NULL);
+	dp->cpu_dp->mrouter = false;
+
+	/* Port is leaving the bridge, disable host flooding and enable
+	 * egress flooding
+	 */
+	dsa_port_bridge_flags(dp, BR_FLOOD | BR_MCAST_FLOOD, NULL);
 
 	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
 	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
@@ -289,48 +284,108 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
 	return dsa_port_notify(dp, DSA_NOTIFIER_AGEING_TIME, &info);
 }
 
+static int dsa_port_update_flooding(struct dsa_port *dp, int uc_flood_count,
+				    int mc_flood_count)
+{
+	struct dsa_switch *ds = dp->ds;
+	bool uc_flood_changed;
+	bool mc_flood_changed;
+	int port = dp->index;
+	bool uc_flood;
+	bool mc_flood;
+	int err;
+
+	if (!ds->ops->port_egress_floods)
+		return 0;
+
+	uc_flood = !!uc_flood_count;
+	mc_flood = dp->mrouter;
+
+	uc_flood_changed = dp->uc_flood ^ uc_flood;
+	mc_flood_changed = dp->mc_flood ^ mc_flood;
+
+	if (uc_flood_changed || mc_flood_changed) {
+		err = ds->ops->port_egress_floods(ds, port, uc_flood, mc_flood);
+		if (err)
+			return err;
+	}
+
+	dp->uc_flood_count = uc_flood_count;
+	dp->mc_flood_count = mc_flood_count;
+	dp->uc_flood = uc_flood;
+	dp->mc_flood = mc_flood;
+
+	return 0;
+}
+
 int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags,
 			      struct switchdev_trans *trans)
 {
+	const unsigned long mask = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD |
+				   BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
+				   BR_HOST_BCAST_FLOOD;
 	struct dsa_switch *ds = dp->ds;
 
-	if (!ds->ops->port_egress_floods ||
-	    (flags & ~(BR_FLOOD | BR_MCAST_FLOOD)))
-		return -EINVAL;
+	if (!ds->ops->port_egress_floods || (flags & ~mask))
+		return -EOPNOTSUPP;
 
 	return 0;
 }
 
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags,
+int dsa_port_bridge_flags(struct dsa_port *dp, unsigned long flags,
 			  struct switchdev_trans *trans)
 {
-	struct dsa_switch *ds = dp->ds;
-	int port = dp->index;
+	struct dsa_port *cpu_dp = dp->cpu_dp;
+	int cpu_uc_flood_count;
+	int cpu_mc_flood_count;
+	unsigned long changed;
+	int uc_flood_count;
+	int mc_flood_count;
 	int err = 0;
 
 	if (switchdev_trans_ph_prepare(trans))
 		return 0;
 
-	if (ds->ops->port_egress_floods)
-		err = ds->ops->port_egress_floods(ds, port, flags & BR_FLOOD,
-						  flags & BR_MCAST_FLOOD);
+	uc_flood_count = dp->uc_flood_count;
+	mc_flood_count = dp->mc_flood_count;
+	cpu_uc_flood_count = cpu_dp->uc_flood_count;
+	cpu_mc_flood_count = cpu_dp->mc_flood_count;
 
-	return err;
+	changed = dp->br_flags ^ flags;
+
+	if (changed & BR_FLOOD)
+		uc_flood_count += (flags & BR_FLOOD) ? 1 : -1;
+	if (changed & BR_MCAST_FLOOD)
+		mc_flood_count += (flags & BR_MCAST_FLOOD) ? 1 : -1;
+	if (changed & BR_HOST_FLOOD)
+		cpu_uc_flood_count += (flags & BR_HOST_FLOOD) ? 1 : -1;
+	if (changed & BR_HOST_MCAST_FLOOD)
+		cpu_mc_flood_count += (flags & BR_HOST_MCAST_FLOOD) ? 1 : -1;
+
+	err = dsa_port_update_flooding(dp, uc_flood_count, mc_flood_count);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = dsa_port_update_flooding(cpu_dp, cpu_uc_flood_count,
+				       cpu_mc_flood_count);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	dp->br_flags = flags;
+
+	return 0;
 }
 
 int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
 		     struct switchdev_trans *trans)
 {
-	struct dsa_switch *ds = dp->ds;
-	int port = dp->index;
-
-	if (!ds->ops->port_egress_floods)
-		return -EOPNOTSUPP;
-
 	if (switchdev_trans_ph_prepare(trans))
 		return 0;
 
-	return ds->ops->port_egress_floods(ds, port, true, mrouter);
+	dp->mrouter = mrouter;
+
+	return dsa_port_update_flooding(dp, dp->uc_flood_count,
+					dp->mc_flood_count);
 }
 
 int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
-- 
2.25.1
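
For illustration only (not part of the patch): a userspace sketch of the
"flood count" idea from the commit message above, reduced to unicast
host flooding. SKETCH_HOST_FLOOD, sketch_user_port and sketch_cpu_port
are hypothetical stand-ins for BR_HOST_FLOOD and the relevant dsa_port
fields. The CPU port stays in flooding mode while at least one user port
still has the flag set.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_HOST_FLOOD	(1UL << 0)	/* stand-in for BR_HOST_FLOOD */

/* Hypothetical model of the CPU port state. */
struct sketch_cpu_port {
	int uc_flood_count;	/* user ports currently asking for flooding */
	bool uc_flood;		/* value that would be programmed in hardware */
};

/* Hypothetical model of a user port: only the last flags seen. */
struct sketch_user_port {
	unsigned long br_flags;
};

/* Adjust the CPU port count only when this port's flag actually changes,
 * then derive the hardware state from the count.
 */
void sketch_bridge_flags(struct sketch_user_port *up,
			 struct sketch_cpu_port *cpu, unsigned long flags)
{
	unsigned long changed = up->br_flags ^ flags;

	if (changed & SKETCH_HOST_FLOOD)
		cpu->uc_flood_count += (flags & SKETCH_HOST_FLOOD) ? 1 : -1;

	assert(cpu->uc_flood_count >= 0);

	cpu->uc_flood = cpu->uc_flood_count > 0;
	up->br_flags = flags;
}
```

Because the count moves only on a change of the flag, repeating the same
flags for a port is a no-op, and flooding is turned off only when the
last user port clears the flag.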


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC net-next 12/13] net: dsa: treat switchdev notifications for multicast router connected to port
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (10 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 11/13] net: dsa: deal with new flooding port attributes from bridge Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-21 21:10 ` [PATCH RFC net-next 13/13] net: dsa: wire up multicast IGMP snooping attribute notification Vladimir Oltean
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Similar to the "bridge is multicast router" case, unknown multicast
should be flooded by this bridge to the ports where a multicast router
is connected.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/slave.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2743d689f6b1..c023f1120736 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -467,7 +467,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
 		ret = dsa_port_bridge_flags(dp, attr->u.brport_flags, trans);
 		break;
+	case SWITCHDEV_ATTR_ID_PORT_MROUTER:
+		/* A multicast router is connected to this external port */
+		ret = dsa_port_mrouter(dp, attr->u.mrouter, trans);
+		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_MROUTER:
+		/* The local bridge is a multicast router */
 		ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter, trans);
 		break;
 	default:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH RFC net-next 13/13] net: dsa: wire up multicast IGMP snooping attribute notification
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (11 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 12/13] net: dsa: treat switchdev notifications for multicast router connected to port Vladimir Oltean
@ 2020-05-21 21:10 ` Vladimir Oltean
  2020-05-22 18:42 ` [PATCH RFC net-next 00/13] RX filtering for DSA switches Allan W. Nielsen
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-21 21:10 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

From: Florian Fainelli <f.fainelli@gmail.com>

The bridge can be configured at runtime with or without IGMP snooping
enabled, but we were not processing the switchdev attribute that
notifies about that toggle. Do this now.

Drivers that support frame parsing up to IGMP/MLD should enable trapping
of those frames towards the CPU, while pure L2 switches should trap the
entire range of 01:00:5E:00:00:01 to 01:00:5E:00:00:FF.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |  3 +++
 net/dsa/dsa_priv.h | 13 +++++++++++++
 net/dsa/port.c     | 47 +++++++++++++++++++++++++++++++++++++++++++++-
 net/dsa/slave.c    |  3 +++
 net/dsa/switch.c   | 36 +++++++++++++++++++++++++++++++++++
 5 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index c256467f1f4a..3f7c1f56908c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -205,6 +205,7 @@ struct dsa_port {
 	bool			mc_flood;
 	/* Knobs from bridge */
 	unsigned long		br_flags;
+	bool			mc_disabled;
 	bool			mrouter;
 
 	struct list_head list;
@@ -564,6 +565,8 @@ struct dsa_switch_ops {
 			     const struct switchdev_obj_port_mdb *mdb);
 	int	(*port_mdb_del)(struct dsa_switch *ds, int port,
 				const struct switchdev_obj_port_mdb *mdb);
+	int	(*port_igmp_mld_snoop)(struct dsa_switch *ds, int port,
+				       bool enable);
 	/*
 	 * RXNFC
 	 */
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 91cbaefc56b3..0761f2fff994 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -24,6 +24,7 @@ enum {
 	DSA_NOTIFIER_VLAN_ADD,
 	DSA_NOTIFIER_VLAN_DEL,
 	DSA_NOTIFIER_MTU,
+	DSA_NOTIFIER_MC_DISABLED,
 };
 
 /* DSA_NOTIFIER_AGEING_TIME */
@@ -72,6 +73,14 @@ struct dsa_notifier_mtu_info {
 	int mtu;
 };
 
+/* DSA_NOTIFIER_MC_DISABLED */
+struct dsa_notifier_mc_disabled_info {
+	int tree_index;
+	int sw_index;
+	struct net_device *br;
+	bool mc_disabled;
+};
+
 struct dsa_switchdev_event_work {
 	struct dsa_switch *ds;
 	int port;
@@ -150,6 +159,10 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
 			    struct switchdev_trans *trans);
+int dsa_port_multicast_toggle(struct dsa_switch *ds, int port,
+			      bool mc_disabled);
+int dsa_port_mc_disabled(struct dsa_port *dp, bool mc_disabled,
+			 struct switchdev_trans *trans);
 bool dsa_port_skip_vlan_configuration(struct dsa_port *dp);
 int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
 			 struct switchdev_trans *trans);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index b527740d03a8..962f25ee8cf2 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -144,6 +144,7 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 	};
 	int err;
 
+	dp->cpu_dp->mc_disabled = !br_multicast_enabled(br);
 	dp->cpu_dp->mrouter = br_multicast_router(br);
 
 	/* Here the interface is already bridged. Reflect the current
@@ -175,6 +176,7 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	if (err)
 		pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
 
+	dp->cpu_dp->mc_disabled = true;
 	dp->cpu_dp->mrouter = false;
 
 	/* Port is leaving the bridge, disable host flooding and enable
@@ -299,7 +301,17 @@ static int dsa_port_update_flooding(struct dsa_port *dp, int uc_flood_count,
 		return 0;
 
 	uc_flood = !!uc_flood_count;
-	mc_flood = dp->mrouter;
+	/* As explained in commit 8ecd4591e761 ("mlxsw: spectrum: Add an option
+	 * to flood mc by mc_router_port"), the decision whether to flood a
+	 * multicast packet to a port depends on 3 flags: mc_disabled,
+	 * mc_router_port, mc_flood.
+	 * If mc_disabled is on, the port will be flooded according to
+	 * mc_flood, otherwise, according to mc_router_port.
+	 */
+	if (dp->mc_disabled)
+		mc_flood = !!mc_flood_count;
+	else
+		mc_flood = dp->mrouter;
 
 	uc_flood_changed = dp->uc_flood ^ uc_flood;
 	mc_flood_changed = dp->mc_flood ^ mc_flood;
@@ -388,6 +400,39 @@ int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
 					dp->mc_flood_count);
 }
 
+int dsa_port_multicast_toggle(struct dsa_switch *ds, int port, bool mc_disabled)
+{
+	struct dsa_port *dp = dsa_to_port(ds, port);
+	int err;
+
+	if (ds->ops->port_igmp_mld_snoop) {
+		err = ds->ops->port_igmp_mld_snoop(ds, port, !mc_disabled);
+		if (err)
+			return err;
+	}
+
+	dp->mc_disabled = mc_disabled;
+
+	return dsa_port_update_flooding(dp, dp->uc_flood_count,
+					dp->mc_flood_count);
+}
+
+int dsa_port_mc_disabled(struct dsa_port *dp, bool mc_disabled,
+			 struct switchdev_trans *trans)
+{
+	struct dsa_notifier_mc_disabled_info info = {
+		.tree_index = dp->ds->dst->index,
+		.sw_index = dp->ds->index,
+		.br = dp->bridge_dev,
+		.mc_disabled = mc_disabled,
+	};
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	return dsa_broadcast(DSA_NOTIFIER_MC_DISABLED, &info);
+}
+
 int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
 			bool propagate_upstream)
 {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index c023f1120736..c0929613f1b4 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -475,6 +475,9 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 		/* The local bridge is a multicast router */
 		ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter, trans);
 		break;
+	case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED:
+		ret = dsa_port_mc_disabled(dp, attr->u.mc_disabled, trans);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 86c8dc5c32a0..9d4f8fd9cf10 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -337,6 +337,39 @@ static int dsa_switch_vlan_del(struct dsa_switch *ds,
 	return 0;
 }
 
+static bool
+dsa_switch_mc_disabled_match(struct dsa_switch *ds, int port,
+			     struct dsa_notifier_mc_disabled_info *info)
+{
+	struct dsa_port *dp = dsa_to_port(ds, port);
+	struct dsa_switch_tree *dst = ds->dst;
+
+	if (dp->bridge_dev == info->br)
+		return true;
+
+	if (dst->index == info->tree_index && ds->index == info->sw_index)
+		return dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port);
+
+	return false;
+}
+
+static int dsa_switch_mc_disabled(struct dsa_switch *ds,
+				  struct dsa_notifier_mc_disabled_info *info)
+{
+	bool mc_disabled = info->mc_disabled;
+	int port, err;
+
+	for (port = 0; port < ds->num_ports; port++) {
+		if (dsa_switch_mc_disabled_match(ds, port, info)) {
+			err = dsa_port_multicast_toggle(ds, port, mc_disabled);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static int dsa_switch_event(struct notifier_block *nb,
 			    unsigned long event, void *info)
 {
@@ -374,6 +407,9 @@ static int dsa_switch_event(struct notifier_block *nb,
 	case DSA_NOTIFIER_MTU:
 		err = dsa_switch_mtu(ds, info);
 		break;
+	case DSA_NOTIFIER_MC_DISABLED:
+		err = dsa_switch_mc_disabled(ds, info);
+		break;
 	default:
 		err = -EOPNOTSUPP;
 		break;
-- 
2.25.1
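
For illustration only (not part of the patch): a userspace sketch of two
rules stated above. The first function mirrors the multicast flooding
decision from the comment added to dsa_port_update_flooding(): with
snooping disabled the port follows its mc_flood setting, otherwise it
follows the multicast router state. The second checks whether a
destination MAC falls in the 01:00:5E:00:00:01 to 01:00:5E:00:00:FF
range that the commit message says pure L2 switches should trap. Both
function names are hypothetical and exist only in this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flooding decision: mc_disabled selects which input is authoritative. */
bool sketch_mc_flood(bool mc_disabled, int mc_flood_count, bool mrouter)
{
	if (mc_disabled)
		return mc_flood_count > 0;

	return mrouter;
}

/* True if mac lies in 01:00:5E:00:00:01 .. 01:00:5E:00:00:FF. */
bool sketch_is_trapped_mc_mac(const uint8_t mac[6])
{
	static const uint8_t prefix[5] = { 0x01, 0x00, 0x5e, 0x00, 0x00 };

	assert(mac != NULL);

	/* The last octet can never exceed 0xff, so only the lower bound
	 * needs an explicit check.
	 */
	return memcmp(mac, prefix, sizeof(prefix)) == 0 && mac[5] >= 0x01;
}
```

With snooping enabled, a port without a multicast router is not flooded
regardless of its mc_flood count, which matches the behaviour described
in the mlxsw commit referenced by the patch.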


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-21 21:10 ` [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding Vladimir Oltean
@ 2020-05-22 12:38   ` Nikolay Aleksandrov
  2020-05-22 13:13     ` Vladimir Oltean
  2020-05-24 14:26   ` Ido Schimmel
  1 sibling, 1 reply; 46+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-22 12:38 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, f.fainelli, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, roopa

On 22/05/2020 00:10, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> In cases where the bridge is offloaded by a switchdev, there are
> situations where we can optimize RX filtering towards the host. To be
> precise, the host only needs to do termination, which it can do by
> responding at the MAC addresses of the slave ports and of the bridge
> interface itself. But most notably, it doesn't need to do forwarding,
> so there is no need to see packets with unknown destination address.
> 
> But there are, however, cases when a switchdev does need to flood to the
> CPU. Such an example is when the switchdev is bridged with a foreign
> interface, and since there is no offloaded datapath, packets need to
> pass through the CPU. Currently this is the only identified case, but it
> can be extended at any time.
> 
> So far, switchdev implementers made driver-level assumptions, such as:
> this chip is never integrated in SoCs where it can be bridged with a
> foreign interface, so I'll just disable host flooding and save some CPU
> cycles. Or: I can never know what else can be bridged with this
> switchdev port, so I must leave host flooding enabled in any case.
> 
> Let the bridge drive the host flooding decision, and pass it to
> switchdev via the same mechanism as the external flooding flags.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |  3 +++
>  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
>  net/bridge/br_switchdev.c |  4 +++-
>  3 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index b3a8d3054af0..6891a432862d 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -49,6 +49,9 @@ struct br_ip_list {
>  #define BR_ISOLATED		BIT(16)
>  #define BR_MRP_AWARE		BIT(17)
>  #define BR_MRP_LOST_CONT	BIT(18)
> +#define BR_HOST_FLOOD		BIT(19)
> +#define BR_HOST_MCAST_FLOOD	BIT(20)
> +#define BR_HOST_BCAST_FLOOD	BIT(21)
>  
>  #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
>  
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index a0e9a7937412..aae59d1e619b 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
>  	}
>  }
>  
> +static int br_manage_host_flood(struct net_bridge *br)
> +{
> +	const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> +				   BR_HOST_BCAST_FLOOD;
> +	struct net_bridge_port *p, *q;
> +
> +	list_for_each_entry(p, &br->port_list, list) {
> +		unsigned long flags = p->flags;
> +		bool sw_bridging = false;
> +		int err;
> +
> +		list_for_each_entry(q, &br->port_list, list) {
> +			if (p == q)
> +				continue;
> +
> +			if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> +				sw_bridging = true;
> +				break;
> +			}
> +		}
> +
> +		if (sw_bridging)
> +			flags |= mask;
> +		else
> +			flags &= ~mask;
> +
> +		if (flags == p->flags)
> +			continue;
> +
> +		err = br_switchdev_set_port_flag(p, flags, mask);
> +		if (err)
> +			return err;
> +
> +		p->flags = flags;
> +	}
> +
> +	return 0;
> +}
> +
>  int nbp_backup_change(struct net_bridge_port *p,
>  		      struct net_device *backup_dev)
>  {
> @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
>  		br->auto_cnt = cnt;
>  		br_manage_promisc(br);
>  	}
> +	br_manage_host_flood(br);
>  }
>  

Can we do this only at port add/del?
Right now it will also be invoked by br_port_flags_change() upon a BR_AUTO_MASK flag change.

>  static void nbp_delete_promisc(struct net_bridge_port *p)
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index 015209bf44aa..360806ac7463 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
>  
>  /* Flags that can be offloaded to hardware */
>  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> -				  BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> +				  BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> +				  BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> +				  BR_HOST_BCAST_FLOOD)
>  
>  int br_switchdev_set_port_flag(struct net_bridge_port *p,
>  			       unsigned long flags,
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-22 12:38   ` Nikolay Aleksandrov
@ 2020-05-22 13:13     ` Vladimir Oltean
  2020-05-22 18:45       ` Allan W. Nielsen
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-22 13:13 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Ido Schimmel, Jakub Kicinski, Ivan Vecera, netdev,
	Horatiu Vultur, Allan W. Nielsen, Roopa Prabhu

On Fri, 22 May 2020 at 15:38, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
>
> On 22/05/2020 00:10, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > In cases where the bridge is offloaded by a switchdev, there are
> > situations where we can optimize RX filtering towards the host. To be
> > precise, the host only needs to do termination, which it can do by
> > responding at the MAC addresses of the slave ports and of the bridge
> > interface itself. But most notably, it doesn't need to do forwarding,
> > so there is no need to see packets with unknown destination address.
> >
> > But there are, however, cases when a switchdev does need to flood to the
> > CPU. Such an example is when the switchdev is bridged with a foreign
> > interface, and since there is no offloaded datapath, packets need to
> > pass through the CPU. Currently this is the only identified case, but it
> > can be extended at any time.
> >
> > So far, switchdev implementers made driver-level assumptions, such as:
> > this chip is never integrated in SoCs where it can be bridged with a
> > foreign interface, so I'll just disable host flooding and save some CPU
> > cycles. Or: I can never know what else can be bridged with this
> > switchdev port, so I must leave host flooding enabled in any case.
> >
> > Let the bridge drive the host flooding decision, and pass it to
> > switchdev via the same mechanism as the external flooding flags.
> >
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  include/linux/if_bridge.h |  3 +++
> >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> >  net/bridge/br_switchdev.c |  4 +++-
> >  3 files changed, 46 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > index b3a8d3054af0..6891a432862d 100644
> > --- a/include/linux/if_bridge.h
> > +++ b/include/linux/if_bridge.h
> > @@ -49,6 +49,9 @@ struct br_ip_list {
> >  #define BR_ISOLATED          BIT(16)
> >  #define BR_MRP_AWARE         BIT(17)
> >  #define BR_MRP_LOST_CONT     BIT(18)
> > +#define BR_HOST_FLOOD                BIT(19)
> > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> >
> >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> >
> > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > index a0e9a7937412..aae59d1e619b 100644
> > --- a/net/bridge/br_if.c
> > +++ b/net/bridge/br_if.c
> > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> >       }
> >  }
> >
> > +static int br_manage_host_flood(struct net_bridge *br)
> > +{
> > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > +                                BR_HOST_BCAST_FLOOD;
> > +     struct net_bridge_port *p, *q;
> > +
> > +     list_for_each_entry(p, &br->port_list, list) {
> > +             unsigned long flags = p->flags;
> > +             bool sw_bridging = false;
> > +             int err;
> > +
> > +             list_for_each_entry(q, &br->port_list, list) {
> > +                     if (p == q)
> > +                             continue;
> > +
> > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > +                             sw_bridging = true;
> > +                             break;
> > +                     }
> > +             }
> > +
> > +             if (sw_bridging)
> > +                     flags |= mask;
> > +             else
> > +                     flags &= ~mask;
> > +
> > +             if (flags == p->flags)
> > +                     continue;
> > +
> > +             err = br_switchdev_set_port_flag(p, flags, mask);
> > +             if (err)
> > +                     return err;
> > +
> > +             p->flags = flags;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >  int nbp_backup_change(struct net_bridge_port *p,
> >                     struct net_device *backup_dev)
> >  {
> > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
> >               br->auto_cnt = cnt;
> >               br_manage_promisc(br);
> >       }
> > +     br_manage_host_flood(br);
> >  }
> >
>
> Can we do this only at port add/del ?
> Right now it will be invoked also by br_port_flags_change() upon BR_AUTO_MASK flag change.
>

Yes, we can do that.
Actually I have some doubts about BR_HOST_BCAST_FLOOD. We can't
disable that in the no-foreign-interface case, can we? For IPv6, it
looks like the stack does take care of installing dev_mc addresses for
the neighbor discovery protocol, but for IPv4 I guess the assumption
is that broadcast ARP should always be processed?

> >  static void nbp_delete_promisc(struct net_bridge_port *p)
> > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> > index 015209bf44aa..360806ac7463 100644
> > --- a/net/bridge/br_switchdev.c
> > +++ b/net/bridge/br_switchdev.c
> > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> >
> >  /* Flags that can be offloaded to hardware */
> >  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> > -                               BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> > +                               BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> > +                               BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> > +                               BR_HOST_BCAST_FLOOD)
> >
> >  int br_switchdev_set_port_flag(struct net_bridge_port *p,
> >                              unsigned long flags,
> >
>

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (12 preceding siblings ...)
  2020-05-21 21:10 ` [PATCH RFC net-next 13/13] net: dsa: wire up multicast IGMP snooping attribute notification Vladimir Oltean
@ 2020-05-22 18:42 ` Allan W. Nielsen
  2020-05-24 14:06 ` Ido Schimmel
  2020-05-24 16:13 ` Florian Fainelli
  15 siblings, 0 replies; 46+ messages in thread
From: Allan W. Nielsen @ 2020-05-22 18:42 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: andrew, f.fainelli, vivien.didelot, davem, jiri, idosch, kuba,
	ivecera, netdev, horatiu.vultur, nikolay, roopa

Hi Vladimir,

I'm very happy to see that you started working on this. Let me know if
you need help to update the Ocelot/Felix driver to support this.

/Allan

On 22.05.2020 00:10, Vladimir Oltean wrote:
>From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
>This is a WIP series whose stated goal is to allow DSA and switchdev
>drivers to flood less traffic to the CPU while keeping the same level of
>functionality.
>
>The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
>that the operating system has expressed its interest in, either due to
>those being the MAC addresses of one of the switch ports, or addresses
>added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
>Then, the traffic which is not explicitly whitelisted is not sent by the
>hardware to the CPU, under the assumption that the CPU didn't ask for it
>and would have dropped it anyway.
>
>The ground for these patches were the discussions surrounding RX
>filtering with switchdev in general, as well as with DSA in particular:
>
>"[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
>https://www.spinics.net/lists/netdev/msg651922.html
>"[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
>https://www.spinics.net/lists/netdev/msg634859.html
>"[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
>https://lkml.org/lkml/2019/8/29/255
>LPC2019 - SwitchDev offload optimizations:
>https://www.youtube.com/watch?v=B1HhxEcU7Jg
>
>Unicast filtering comes to me as most important, and this includes
>termination of MAC addresses corresponding to the network interfaces in
>the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
>The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
>network interface addresses with a Virtual ID (typically VLAN ID). This
>matches DSA switches perfectly because their FDB already contains keys
>of the {DMAC, VID} form.
>
>Multicast filtering was taken and reworked from Florian Fainelli's
>previous attempts, according to my own understanding of multicast
>forwarding requirements of an IGMP snooping switch. This is the part
>that needs the most extra work, not only in the DSA core but also in
>drivers. For this reason, I've left out of this patchset anything that
>has to do with driver-level configuration (since the audience is a bit
>larger than usual), as I'm trying to focus more on policy for now, and
>the series is already pretty huge.
>
>Florian Fainelli (3):
>  net: bridge: multicast: propagate br_mc_disabled_update() return
>  net: dsa: add ability to program unicast and multicast filters for CPU
>    port
>  net: dsa: wire up multicast IGMP snooping attribute notification
>
>Ivan Khoronzhuk (4):
>  net: core: dev_addr_lists: add VID to device address
>  net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
>  net: 8021q: vlan_dev: add vid tag for vlan device own address
>  ethernet: eth: add default vid len for all ethernet kind devices
>
>Vladimir Oltean (6):
>  net: core: dev_addr_lists: export some raw __hw_addr helpers
>  net: dsa: don't use switchdev_notifier_fdb_info in
>    dsa_switchdev_event_work
>  net: dsa: mroute: don't panic the kernel if called without the prepare
>    phase
>  net: bridge: add port flags for host flooding
>  net: dsa: deal with new flooding port attributes from bridge
>  net: dsa: treat switchdev notifications for multicast router connected
>    to port
>
> include/linux/if_bridge.h |   3 +
> include/linux/if_vlan.h   |   2 +
> include/linux/netdevice.h |  11 ++
> include/net/dsa.h         |  17 +++
> net/8021q/Kconfig         |  12 ++
> net/8021q/vlan.c          |   3 +
> net/8021q/vlan.h          |   2 +
> net/8021q/vlan_core.c     |  25 ++++
> net/8021q/vlan_dev.c      | 102 +++++++++++---
> net/bridge/br_if.c        |  40 ++++++
> net/bridge/br_multicast.c |  21 ++-
> net/bridge/br_switchdev.c |   4 +-
> net/core/dev_addr_lists.c | 144 +++++++++++++++----
> net/dsa/Kconfig           |   1 +
> net/dsa/dsa2.c            |   6 +
> net/dsa/dsa_priv.h        |  27 +++-
> net/dsa/port.c            | 155 ++++++++++++++++----
> net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> net/dsa/switch.c          |  36 +++++
> net/ethernet/eth.c        |  12 +-
> 20 files changed, 780 insertions(+), 131 deletions(-)
>
>--
>2.25.1
>
/Allan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-22 13:13     ` Vladimir Oltean
@ 2020-05-22 18:45       ` Allan W. Nielsen
  2020-07-20 11:08         ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Allan W. Nielsen @ 2020-05-22 18:45 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Nikolay Aleksandrov, Andrew Lunn, Florian Fainelli,
	Vivien Didelot, David S. Miller, Jiri Pirko, Ido Schimmel,
	Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Roopa Prabhu

On 22.05.2020 16:13, Vladimir Oltean wrote:
>On Fri, 22 May 2020 at 15:38, Nikolay Aleksandrov
><nikolay@cumulusnetworks.com> wrote:
>>
>> On 22/05/2020 00:10, Vladimir Oltean wrote:
>> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
>> >
>> > In cases where the bridge is offloaded by a switchdev, there are
>> > situations where we can optimize RX filtering towards the host. To be
>> > precise, the host only needs to do termination, which it can do by
>> > responding at the MAC addresses of the slave ports and of the bridge
>> > interface itself. But most notably, it doesn't need to do forwarding,
>> > so there is no need to see packets with unknown destination address.
>> >
>> > But there are, however, cases when a switchdev does need to flood to the
>> > CPU. Such an example is when the switchdev is bridged with a foreign
>> > interface, and since there is no offloaded datapath, packets need to
>> > pass through the CPU. Currently this is the only identified case, but it
>> > can be extended at any time.
>> >
>> > So far, switchdev implementers made driver-level assumptions, such as:
>> > this chip is never integrated in SoCs where it can be bridged with a
>> > foreign interface, so I'll just disable host flooding and save some CPU
>> > cycles. Or: I can never know what else can be bridged with this
>> > switchdev port, so I must leave host flooding enabled in any case.
>> >
>> > Let the bridge drive the host flooding decision, and pass it to
>> > switchdev via the same mechanism as the external flooding flags.
>> >
>> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>> > ---
>> >  include/linux/if_bridge.h |  3 +++
>> >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
>> >  net/bridge/br_switchdev.c |  4 +++-
>> >  3 files changed, 46 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
>> > index b3a8d3054af0..6891a432862d 100644
>> > --- a/include/linux/if_bridge.h
>> > +++ b/include/linux/if_bridge.h
>> > @@ -49,6 +49,9 @@ struct br_ip_list {
>> >  #define BR_ISOLATED          BIT(16)
>> >  #define BR_MRP_AWARE         BIT(17)
>> >  #define BR_MRP_LOST_CONT     BIT(18)
>> > +#define BR_HOST_FLOOD                BIT(19)
>> > +#define BR_HOST_MCAST_FLOOD  BIT(20)
>> > +#define BR_HOST_BCAST_FLOOD  BIT(21)
>> >
>> >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
>> >
>> > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>> > index a0e9a7937412..aae59d1e619b 100644
>> > --- a/net/bridge/br_if.c
>> > +++ b/net/bridge/br_if.c
>> > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
>> >       }
>> >  }
>> >
>> > +static int br_manage_host_flood(struct net_bridge *br)
>> > +{
>> > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
>> > +                                BR_HOST_BCAST_FLOOD;
>> > +     struct net_bridge_port *p, *q;
>> > +
>> > +     list_for_each_entry(p, &br->port_list, list) {
>> > +             unsigned long flags = p->flags;
>> > +             bool sw_bridging = false;
>> > +             int err;
>> > +
>> > +             list_for_each_entry(q, &br->port_list, list) {
>> > +                     if (p == q)
>> > +                             continue;
>> > +
>> > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
>> > +                             sw_bridging = true;
>> > +                             break;
>> > +                     }
>> > +             }
>> > +
>> > +             if (sw_bridging)
>> > +                     flags |= mask;
>> > +             else
>> > +                     flags &= ~mask;
>> > +
>> > +             if (flags == p->flags)
>> > +                     continue;
>> > +
>> > +             err = br_switchdev_set_port_flag(p, flags, mask);
>> > +             if (err)
>> > +                     return err;
>> > +
>> > +             p->flags = flags;
>> > +     }
>> > +
>> > +     return 0;
>> > +}
>> > +
>> >  int nbp_backup_change(struct net_bridge_port *p,
>> >                     struct net_device *backup_dev)
>> >  {
>> > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
>> >               br->auto_cnt = cnt;
>> >               br_manage_promisc(br);
>> >       }
>> > +     br_manage_host_flood(br);
>> >  }
>> >
>>
>> Can we do this only at port add/del ?
>> Right now it will be invoked also by br_port_flags_change() upon BR_AUTO_MASK flag change.
>>
>
>Yes, we can do that.
>Actually I have some doubts about BR_HOST_BCAST_FLOOD. We can't
>disable that in the no-foreign-interface case, can we? For IPv6, it
>looks like the stack does take care of installing dev_mc addresses for
>the neighbor discovery protocol, but for IPv4 I guess the assumption
>is that broadcast ARP should always be processed?

Ideally this should be per VLAN. In case of IPv4, you only need to be
part of the broadcast domain on VLANs with an associated vlan-interface.

>> >  static void nbp_delete_promisc(struct net_bridge_port *p)
>> > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
>> > index 015209bf44aa..360806ac7463 100644
>> > --- a/net/bridge/br_switchdev.c
>> > +++ b/net/bridge/br_switchdev.c
>> > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
>> >
>> >  /* Flags that can be offloaded to hardware */
>> >  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
>> > -                               BR_MCAST_FLOOD | BR_BCAST_FLOOD)
>> > +                               BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
>> > +                               BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
>> > +                               BR_HOST_BCAST_FLOOD)
>> >
>> >  int br_switchdev_set_port_flag(struct net_bridge_port *p,
>> >                              unsigned long flags,
>> >
>>
>
>Thanks,
>-Vladimir
/Allan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (13 preceding siblings ...)
  2020-05-22 18:42 ` [PATCH RFC net-next 00/13] RX filtering for DSA switches Allan W. Nielsen
@ 2020-05-24 14:06 ` Ido Schimmel
  2020-05-24 16:24   ` Vladimir Oltean
  2020-05-24 16:13 ` Florian Fainelli
  15 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-05-24 14:06 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: andrew, f.fainelli, vivien.didelot, davem, jiri, kuba, ivecera,
	netdev, horatiu.vultur, allan.nielsen, nikolay, roopa

On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> This is a WIP series whose stated goal is to allow DSA and switchdev
> drivers to flood less traffic to the CPU while keeping the same level of
> functionality.
> 
> The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> that the operating system has expressed its interest in, either due to
> those being the MAC addresses of one of the switch ports, or addresses
> added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> Then, the traffic which is not explicitly whitelisted is not sent by the
> hardware to the CPU, under the assumption that the CPU didn't ask for it
> and would have dropped it anyway.
> 
> The ground for these patches were the discussions surrounding RX
> filtering with switchdev in general, as well as with DSA in particular:
> 
> "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> https://www.spinics.net/lists/netdev/msg651922.html
> "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> https://www.spinics.net/lists/netdev/msg634859.html
> "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> https://lkml.org/lkml/2019/8/29/255
> LPC2019 - SwitchDev offload optimizations:
> https://www.youtube.com/watch?v=B1HhxEcU7Jg
> 
> Unicast filtering comes to me as most important, and this includes
> termination of MAC addresses corresponding to the network interfaces in
> the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> network interface addresses with a Virtual ID (typically VLAN ID). This
> matches DSA switches perfectly because their FDB already contains keys
> of the {DMAC, VID} form.

Hi,

I read through the series and I'm not sure how unicast filtering works.
Instead of writing a very long mail I just created a script with
comments. I think it's clearer that way. Note that this is not a made-up
configuration. It is used in setups involving VRRP / VXLAN, for example.

```
#!/bin/bash

ip netns add ns1

ip -n ns1 link add name br0 type bridge vlan_filtering 1
ip -n ns1 link add name dummy10 up type dummy

ip -n ns1 link set dev dummy10 master br0
ip -n ns1 link set dev br0 up

ip -n ns1 link add link br0 name vlan10 up type vlan id 10
bridge -n ns1 vlan add vid 10 dev br0 self

echo "Before adding macvlan:"
echo "======================"

echo -n "Promiscuous mode: "
ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]

echo -e "\nvlan10's MAC is in br0's FDB:"
bridge -n ns1 fdb show br0 vlan 10

echo
echo "After adding macvlan:"
echo "====================="

ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
        type macvlan mode private

echo -n "Promiscuous mode: "
ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]

echo -e "\nvlan10-v's MAC is not in br0's FDB:"
bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
```

This is the output on my laptop (kernel 5.6.8):

```
Before adding macvlan:
======================
Promiscuous mode: 0

vlan10's MAC is in br0's FDB:
42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent

After adding macvlan:
=====================
Promiscuous mode: 1

vlan10-v's MAC is not in br0's FDB:
```

Basically, if the MAC of the VLAN device is not inherited from the
bridge or you stack macvlans on top, then the bridge will go into
promiscuous mode and it will locally receive all frames passing through
it. It's not ideal, but it's a very old and simple behavior. It does not
require you to track the VLAN associated with the MAC addresses, for
example.
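
To illustrate why the promiscuity counter jumps from 0 to 1 in the output above, here is a minimal userspace model of the fallback the core applies when a device cannot filter secondary unicast addresses. It is only a sketch: `struct model_dev` and the `model_*` helpers are made-up names, and the assumption (not verified here against the exact tree) is that the bridge netdev does not advertise IFF_UNICAST_FLT, so the macvlan's MAC synced down from vlan10 triggers this path.

```c
#include <stdbool.h>

/* Simplified stand-in for struct net_device; all names are illustrative. */
struct model_dev {
	bool has_unicast_flt;	/* models IFF_UNICAST_FLT in priv_flags */
	int uc_count;		/* number of secondary unicast addresses */
	bool uc_promisc;	/* promiscuity was taken because of the uc list */
	int promiscuity;	/* promiscuity reference count */
};

/* Re-evaluate the RX mode after the unicast address list changed. */
static void model_set_rx_mode(struct model_dev *dev)
{
	if (dev->has_unicast_flt)
		return;	/* the driver filters the addresses itself */

	if (dev->uc_count > 0 && !dev->uc_promisc) {
		/* No unicast filtering: fall back to promiscuous mode. */
		dev->promiscuity++;
		dev->uc_promisc = true;
	} else if (dev->uc_count == 0 && dev->uc_promisc) {
		/* Last secondary address is gone: drop the reference. */
		dev->promiscuity--;
		dev->uc_promisc = false;
	}
}

/* Add a secondary address, e.g. a macvlan MAC synced from an upper device. */
static void model_uc_add(struct model_dev *dev)
{
	dev->uc_count++;
	model_set_rx_mode(dev);
}

/* Remove a secondary address. */
static void model_uc_del(struct model_dev *dev)
{
	if (dev->uc_count > 0)
		dev->uc_count--;
	model_set_rx_mode(dev);
}
```

In this model a second secondary address does not bump the counter again, and the counter only drops back once the last one is removed. Note that the VLAN of the address plays no role anywhere.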

When you are offloading the Linux data path to hardware this behavior is
not ideal as your hardware can handle much higher packet rates than the
CPU.

In mlxsw we handle this by tracking the upper devices of the bridge. I
was hoping that with Ivan's patches we could add support for unicast
filtering in the bridge driver and program the MAC addresses to its FDB
with 'local' flag. Then the FDB entries would be notified via switchdev
to device drivers.

> 
> Multicast filtering was taken and reworked from Florian Fainelli's
> previous attempts, according to my own understanding of multicast
> forwarding requirements of an IGMP snooping switch. This is the part
> that needs the most extra work, not only in the DSA core but also in
> drivers. For this reason, I've left out of this patchset anything that
> has to do with driver-level configuration (since the audience is a bit
> larger than usual), as I'm trying to focus more on policy for now, and
> the series is already pretty huge.

From what I remember, this is the logic in the Linux bridge:

* Broadcast is always locally received
* Multicast is locally received if:
	* Snooping disabled
	* Snooping enabled:
		* Bridge netdev is mrouter port
		or
		* Matches MDB entry with 'host_joined' indication
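
Written out as a predicate, the list above looks roughly like the following. This is only a sketch of the rules as I listed them from memory, not the bridge code; the struct, the enum and the function name are invented for the example.

```c
#include <stdbool.h>

enum frame_kind { FRAME_BROADCAST, FRAME_MULTICAST };

/* Hypothetical summary of the bridge state relevant to local reception. */
struct host_rx_state {
	bool snooping_enabled;	/* multicast snooping is on */
	bool bridge_is_mrouter;	/* bridge netdev is an mrouter port */
	bool mdb_host_joined;	/* matching MDB entry has 'host_joined' */
};

/* Should a broadcast/multicast frame be received locally by the host? */
static bool host_receives(enum frame_kind kind, const struct host_rx_state *st)
{
	if (kind == FRAME_BROADCAST)
		return true;	/* broadcast is always locally received */

	if (!st->snooping_enabled)
		return true;	/* no snooping: all multicast goes up */

	/* Snooping enabled: need mrouter or a host-joined MDB entry. */
	return st->bridge_is_mrouter || st->mdb_host_joined;
}
```

A host flood policy pushed to hardware would presumably need to track these same three inputs rather than only the set of bridge ports.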

> 
> Florian Fainelli (3):
>   net: bridge: multicast: propagate br_mc_disabled_update() return
>   net: dsa: add ability to program unicast and multicast filters for CPU
>     port
>   net: dsa: wire up multicast IGMP snooping attribute notification
> 
> Ivan Khoronzhuk (4):
>   net: core: dev_addr_lists: add VID to device address
>   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
>   net: 8021q: vlan_dev: add vid tag for vlan device own address
>   ethernet: eth: add default vid len for all ethernet kind devices
> 
> Vladimir Oltean (6):
>   net: core: dev_addr_lists: export some raw __hw_addr helpers
>   net: dsa: don't use switchdev_notifier_fdb_info in
>     dsa_switchdev_event_work
>   net: dsa: mroute: don't panic the kernel if called without the prepare
>     phase
>   net: bridge: add port flags for host flooding
>   net: dsa: deal with new flooding port attributes from bridge
>   net: dsa: treat switchdev notifications for multicast router connected
>     to port
> 
>  include/linux/if_bridge.h |   3 +
>  include/linux/if_vlan.h   |   2 +
>  include/linux/netdevice.h |  11 ++
>  include/net/dsa.h         |  17 +++
>  net/8021q/Kconfig         |  12 ++
>  net/8021q/vlan.c          |   3 +
>  net/8021q/vlan.h          |   2 +
>  net/8021q/vlan_core.c     |  25 ++++
>  net/8021q/vlan_dev.c      | 102 +++++++++++---
>  net/bridge/br_if.c        |  40 ++++++
>  net/bridge/br_multicast.c |  21 ++-
>  net/bridge/br_switchdev.c |   4 +-
>  net/core/dev_addr_lists.c | 144 +++++++++++++++----
>  net/dsa/Kconfig           |   1 +
>  net/dsa/dsa2.c            |   6 +
>  net/dsa/dsa_priv.h        |  27 +++-
>  net/dsa/port.c            | 155 ++++++++++++++++----
>  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
>  net/dsa/switch.c          |  36 +++++
>  net/ethernet/eth.c        |  12 +-
>  20 files changed, 780 insertions(+), 131 deletions(-)
> 
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-21 21:10 ` [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding Vladimir Oltean
  2020-05-22 12:38   ` Nikolay Aleksandrov
@ 2020-05-24 14:26   ` Ido Schimmel
  2020-05-24 16:13     ` Vladimir Oltean
  1 sibling, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-05-24 14:26 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: andrew, f.fainelli, vivien.didelot, davem, jiri, kuba, ivecera,
	netdev, horatiu.vultur, allan.nielsen, nikolay, roopa

On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> In cases where the bridge is offloaded by a switchdev, there are
> situations where we can optimize RX filtering towards the host. To be
> precise, the host only needs to do termination, which it can do by
> responding at the MAC addresses of the slave ports and of the bridge
> interface itself. But most notably, it doesn't need to do forwarding,
> so there is no need to see packets with unknown destination address.
> 
> But there are, however, cases when a switchdev does need to flood to the
> CPU. Such an example is when the switchdev is bridged with a foreign
> interface, and since there is no offloaded datapath, packets need to
> pass through the CPU. Currently this is the only identified case, but it
> can be extended at any time.
> 
> So far, switchdev implementers made driver-level assumptions, such as:
> this chip is never integrated in SoCs where it can be bridged with a
> foreign interface, so I'll just disable host flooding and save some CPU
> cycles. Or: I can never know what else can be bridged with this
> switchdev port, so I must leave host flooding enabled in any case.
> 
> Let the bridge drive the host flooding decision, and pass it to
> switchdev via the same mechanism as the external flooding flags.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |  3 +++
>  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
>  net/bridge/br_switchdev.c |  4 +++-
>  3 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index b3a8d3054af0..6891a432862d 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -49,6 +49,9 @@ struct br_ip_list {
>  #define BR_ISOLATED		BIT(16)
>  #define BR_MRP_AWARE		BIT(17)
>  #define BR_MRP_LOST_CONT	BIT(18)
> +#define BR_HOST_FLOOD		BIT(19)
> +#define BR_HOST_MCAST_FLOOD	BIT(20)
> +#define BR_HOST_BCAST_FLOOD	BIT(21)
>  
>  #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
>  
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index a0e9a7937412..aae59d1e619b 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
>  	}
>  }
>  
> +static int br_manage_host_flood(struct net_bridge *br)
> +{
> +	const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> +				   BR_HOST_BCAST_FLOOD;
> +	struct net_bridge_port *p, *q;
> +
> +	list_for_each_entry(p, &br->port_list, list) {
> +		unsigned long flags = p->flags;
> +		bool sw_bridging = false;
> +		int err;
> +
> +		list_for_each_entry(q, &br->port_list, list) {
> +			if (p == q)
> +				continue;
> +
> +			if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> +				sw_bridging = true;

It's not that simple. There are cases where not all bridge slaves have
the same parent ID and still there is no reason to flood traffic to the
CPU. VXLAN, for example.

You could argue that the VXLAN device needs to have the same parent ID
as the physical netdevs member in the bridge, but it will break your
data path. For example, let's assume your hardware decided to flood a
packet in L2. The packet will egress all the local ports, but will also
perform VXLAN encapsulation. The packet continues with the IP of the
remote VTEP(s) to the underlay router and then encounters a neighbour
miss exception, which sends it to the CPU for resolution.

Since this exception was encountered in the router the driver would mark
the packet with 'offload_fwd_mark', as it already performed L2
forwarding. If the VXLAN device has the same parent ID as the physical
netdevs, then the Linux bridge will never let it egress, nothing will
trigger neighbour resolution and the packet will be discarded.
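
A small model of the egress check I am referring to, in case it helps. It mirrors the idea behind nbp_switchdev_allowed_egress() (a marked packet is not forwarded again to a port carrying the same mark), but the structs and the `hwdom` / `src_hwdom` fields are simplified stand-ins I made up, with the assumption that ports sharing a parent ID end up with the same mark.

```c
#include <stdbool.h>

/* Illustrative stand-ins; these are not the kernel structures. */
struct model_skb {
	bool offload_fwd_mark;	/* hardware already did L2 forwarding */
	int src_hwdom;		/* mark of the port the packet came in on */
};

struct model_port {
	int hwdom;		/* mark shared by ports with the same parent ID */
};

/* May the software bridge forward this packet out of port p? */
static bool model_allowed_egress(const struct model_port *p,
				 const struct model_skb *skb)
{
	return !skb->offload_fwd_mark || skb->src_hwdom != p->hwdom;
}
```

With the VXLAN device in its own domain the trapped packet can still egress it and trigger neighbour resolution; give it the same mark as the physical ports and the check above suppresses it.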

> +				break;
> +			}
> +		}
> +
> +		if (sw_bridging)
> +			flags |= mask;
> +		else
> +			flags &= ~mask;
> +
> +		if (flags == p->flags)
> +			continue;
> +
> +		err = br_switchdev_set_port_flag(p, flags, mask);
> +		if (err)
> +			return err;
> +
> +		p->flags = flags;
> +	}
> +
> +	return 0;
> +}
> +
>  int nbp_backup_change(struct net_bridge_port *p,
>  		      struct net_device *backup_dev)
>  {
> @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
>  		br->auto_cnt = cnt;
>  		br_manage_promisc(br);
>  	}
> +	br_manage_host_flood(br);
>  }
>  
>  static void nbp_delete_promisc(struct net_bridge_port *p)
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index 015209bf44aa..360806ac7463 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
>  
>  /* Flags that can be offloaded to hardware */
>  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> -				  BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> +				  BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> +				  BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> +				  BR_HOST_BCAST_FLOOD)
>  
>  int br_switchdev_set_port_flag(struct net_bridge_port *p,
>  			       unsigned long flags,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
                   ` (14 preceding siblings ...)
  2020-05-24 14:06 ` Ido Schimmel
@ 2020-05-24 16:13 ` Florian Fainelli
  2020-05-24 16:34   ` Vladimir Oltean
  15 siblings, 1 reply; 46+ messages in thread
From: Florian Fainelli @ 2020-05-24 16:13 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, vivien.didelot, davem
  Cc: jiri, idosch, kuba, ivecera, netdev, horatiu.vultur,
	allan.nielsen, nikolay, roopa

Hi Vladimir,

On 5/21/2020 2:10 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> This is a WIP series whose stated goal is to allow DSA and switchdev
> drivers to flood less traffic to the CPU while keeping the same level of
> functionality.
> 
> The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> that the operating system has expressed its interest in, either due to
> those being the MAC addresses of one of the switch ports, or addresses
> added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> Then, the traffic which is not explicitly whitelisted is not sent by the
> hardware to the CPU, under the assumption that the CPU didn't ask for it
> and would have dropped it anyway.
> 
> The ground for these patches were the discussions surrounding RX
> filtering with switchdev in general, as well as with DSA in particular:
> 
> "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> https://www.spinics.net/lists/netdev/msg651922.html
> "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> https://www.spinics.net/lists/netdev/msg634859.html
> "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> https://lkml.org/lkml/2019/8/29/255
> LPC2019 - SwitchDev offload optimizations:
> https://www.youtube.com/watch?v=B1HhxEcU7Jg
> 
> Unicast filtering comes to me as most important, and this includes
> termination of MAC addresses corresponding to the network interfaces in
> the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> network interface addresses with a Virtual ID (typically VLAN ID). This
> matches DSA switches perfectly because their FDB already contains keys
> of the {DMAC, VID} form.
> 
> Multicast filtering was taken and reworked from Florian Fainelli's
> previous attempts, according to my own understanding of multicast
> forwarding requirements of an IGMP snooping switch. This is the part
> that needs the most extra work, not only in the DSA core but also in
> drivers. For this reason, I've left out of this patchset anything that
> has to do with driver-level configuration (since the audience is a bit
> larger than usual), as I'm trying to focus more on policy for now, and
> the series is already pretty huge.


First off, thank you very much for collecting the various patches and
bringing them up to date. So far I have only had a cursory look at your
patches, and they do look good to me in principle. I plan on testing this
next week with the b53/bcm_sf2 switches and giving you some more detailed
feedback.

Which of UC or MC filtering do you value the most for your use cases?
For me it would be MC filtering, because the environment is usually
set-top boxes and streaming devices.
-- 
Florian

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-24 14:26   ` Ido Schimmel
@ 2020-05-24 16:13     ` Vladimir Oltean
  2020-05-25 20:11       ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-24 16:13 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

Hi Ido,

On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > In cases where the bridge is offloaded by a switchdev, there are
> > situations where we can optimize RX filtering towards the host. To be
> > precise, the host only needs to do termination, which it can do by
> > responding at the MAC addresses of the slave ports and of the bridge
> > interface itself. But most notably, it doesn't need to do forwarding,
> > so there is no need to see packets with unknown destination address.
> >
> > But there are, however, cases when a switchdev does need to flood to the
> > CPU. Such an example is when the switchdev is bridged with a foreign
> > interface, and since there is no offloaded datapath, packets need to
> > pass through the CPU. Currently this is the only identified case, but it
> > can be extended at any time.
> >
> > So far, switchdev implementers made driver-level assumptions, such as:
> > this chip is never integrated in SoCs where it can be bridged with a
> > foreign interface, so I'll just disable host flooding and save some CPU
> > cycles. Or: I can never know what else can be bridged with this
> > switchdev port, so I must leave host flooding enabled in any case.
> >
> > Let the bridge drive the host flooding decision, and pass it to
> > switchdev via the same mechanism as the external flooding flags.
> >
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  include/linux/if_bridge.h |  3 +++
> >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> >  net/bridge/br_switchdev.c |  4 +++-
> >  3 files changed, 46 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > index b3a8d3054af0..6891a432862d 100644
> > --- a/include/linux/if_bridge.h
> > +++ b/include/linux/if_bridge.h
> > @@ -49,6 +49,9 @@ struct br_ip_list {
> >  #define BR_ISOLATED          BIT(16)
> >  #define BR_MRP_AWARE         BIT(17)
> >  #define BR_MRP_LOST_CONT     BIT(18)
> > +#define BR_HOST_FLOOD                BIT(19)
> > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> >
> >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> >
> > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > index a0e9a7937412..aae59d1e619b 100644
> > --- a/net/bridge/br_if.c
> > +++ b/net/bridge/br_if.c
> > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> >       }
> >  }
> >
> > +static int br_manage_host_flood(struct net_bridge *br)
> > +{
> > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > +                                BR_HOST_BCAST_FLOOD;
> > +     struct net_bridge_port *p, *q;
> > +
> > +     list_for_each_entry(p, &br->port_list, list) {
> > +             unsigned long flags = p->flags;
> > +             bool sw_bridging = false;
> > +             int err;
> > +
> > +             list_for_each_entry(q, &br->port_list, list) {
> > +                     if (p == q)
> > +                             continue;
> > +
> > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > +                             sw_bridging = true;
>
> It's not that simple. There are cases where not all bridge slaves have
> the same parent ID and still there is no reason to flood traffic to the
> CPU. VXLAN, for example.
>
> You could argue that the VXLAN device needs to have the same parent ID
> as the physical netdevs member in the bridge, but it will break your
> data path. For example, let's assume your hardware decided to flood a
> packet in L2. The packet will egress all the local ports, but will also
> perform VXLAN encapsulation. The packet continues with the IP of the
> remote VTEP(s) to the underlay router and then encounters a neighbour
> miss exception, which sends it to the CPU for resolution.
>
> Since this exception was encountered in the router the driver would mark
> the packet with 'offload_fwd_mark', as it already performed L2
> forwarding. If the VXLAN device has the same parent ID as the physical
> netdevs, then the Linux bridge will never let it egress, nothing will
> trigger neighbour resolution and the packet will be discarded.
>

I wasn't going to argue that.
Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
to multicast IPs should be flooded to the CPU.
Actually, Allan's example was a bit simpler: he said that host flooding
can be made a per-VLAN flag. I'm glad that you raised this. So maybe
we should try to define some mechanism by which virtual interfaces can
specify to the bridge that they don't need to see all traffic? Do you
have any ideas?
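
One way to picture such a mechanism is the user-space model below. It is
only a sketch under stated assumptions: struct model_port and the
wants_all_traffic hint are hypothetical names that exist neither in the
bridge nor in this series, and parent_id merely stands in for what
netdev_port_same_parent_id() compares. A port floods to the host only if
a foreign bridge member has actually asked to see all traffic:

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space model, not kernel code. All names are hypothetical. */
struct model_port {
	int parent_id;          /* stands in for the switchdev parent ID */
	bool wants_all_traffic; /* hypothetical hint; false for e.g. an
				 * offloaded VXLAN device
				 */
};

/* Return true if port p must flood towards the host, i.e. if some other
 * bridge member with a different parent ID relies on software forwarding.
 */
bool model_port_needs_host_flood(const struct model_port *ports, size_t n,
				 size_t p)
{
	for (size_t q = 0; q < n; q++) {
		if (q == p)
			continue;
		/* Same parent ID: forwarding between p and q is offloaded */
		if (ports[q].parent_id == ports[p].parent_id)
			continue;
		/* Foreign member: flood only if it asked for everything */
		if (ports[q].wants_all_traffic)
			return true;
	}
	return false;
}
```

With a hint of this kind, a foreign member that leaves wants_all_traffic
unset would no longer force host flooding on the physical ports.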

> > +                             break;
> > +                     }
> > +             }
> > +
> > +             if (sw_bridging)
> > +                     flags |= mask;
> > +             else
> > +                     flags &= ~mask;
> > +
> > +             if (flags == p->flags)
> > +                     continue;
> > +
> > +             err = br_switchdev_set_port_flag(p, flags, mask);
> > +             if (err)
> > +                     return err;
> > +
> > +             p->flags = flags;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >  int nbp_backup_change(struct net_bridge_port *p,
> >                     struct net_device *backup_dev)
> >  {
> > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
> >               br->auto_cnt = cnt;
> >               br_manage_promisc(br);
> >       }
> > +     br_manage_host_flood(br);
> >  }
> >
> >  static void nbp_delete_promisc(struct net_bridge_port *p)
> > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> > index 015209bf44aa..360806ac7463 100644
> > --- a/net/bridge/br_switchdev.c
> > +++ b/net/bridge/br_switchdev.c
> > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> >
> >  /* Flags that can be offloaded to hardware */
> >  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> > -                               BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> > +                               BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> > +                               BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> > +                               BR_HOST_BCAST_FLOOD)
> >
> >  int br_switchdev_set_port_flag(struct net_bridge_port *p,
> >                              unsigned long flags,
> > --
> > 2.25.1
> >

Thanks,
-Vladimir

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-24 14:06 ` Ido Schimmel
@ 2020-05-24 16:24   ` Vladimir Oltean
  2020-05-25 19:48     ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-24 16:24 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > This is a WIP series whose stated goal is to allow DSA and switchdev
> > drivers to flood less traffic to the CPU while keeping the same level of
> > functionality.
> >
> > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > that the operating system has expressed its interest in, either due to
> > those being the MAC addresses of one of the switch ports, or addresses
> > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > Then, the traffic which is not explicitly whitelisted is not sent by the
> > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > and would have dropped it anyway.
> >
> > The grounds for these patches were the discussions surrounding RX
> > filtering with switchdev in general, as well as with DSA in particular:
> >
> > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > https://www.spinics.net/lists/netdev/msg651922.html
> > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > https://www.spinics.net/lists/netdev/msg634859.html
> > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > https://lkml.org/lkml/2019/8/29/255
> > LPC2019 - SwitchDev offload optimizations:
> > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> >
> > Unicast filtering comes to me as most important, and this includes
> > termination of MAC addresses corresponding to the network interfaces in
> > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > network interface addresses with a Virtual ID (typically VLAN ID). This
> > matches DSA switches perfectly because their FDB already contains keys
> > of the {DMAC, VID} form.
>
> Hi,
>
> I read through the series and I'm not sure how unicast filtering works.
> Instead of writing a very long mail I just created a script with
> comments. I think it's clearer that way. Note that this is not a made up
> configuration. It is used in setups involving VRRP / VXLAN, for example.
>
> ```
> #!/bin/bash
>
> ip netns add ns1
>
> ip -n ns1 link add name br0 type bridge vlan_filtering 1
> ip -n ns1 link add name dummy10 up type dummy
>
> ip -n ns1 link set dev dummy10 master br0
> ip -n ns1 link set dev br0 up
>
> ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> bridge -n ns1 vlan add vid 10 dev br0 self
>
> echo "Before adding macvlan:"
> echo "======================"
>
> echo -n "Promiscuous mode: "
> ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
>
> echo -e "\nvlan10's MAC is in br0's FDB:"
> bridge -n ns1 fdb show br0 vlan 10
>
> echo
> echo "After adding macvlan:"
> echo "====================="
>
> ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
>         type macvlan mode private
>
> echo -n "Promiscuous mode: "
> ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
>
> echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> ```
>
> This is the output on my laptop (kernel 5.6.8):
>
> ```
> Before adding macvlan:
> ======================
> Promiscuous mode: 0
>
> vlan10's MAC is in br0's FDB:
> 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
>
> After adding macvlan:
> =====================
> Promiscuous mode: 1
>
> vlan10-v's MAC is not in br0's FDB:
> ```
>
> Basically, if the MAC of the VLAN device is not inherited from the
> bridge or you stack macvlans on top, then the bridge will go into
> promiscuous mode and it will locally receive all frames passing through
> it. It's not ideal, but it's a very old and simple behavior. It does not
> require you to track the VLAN associated with the MAC addresses, for
> example.
>

This is a good point. I wasn't aware that the bridge 'gives up' with
macvlan upper devices, but if I understand correctly, we do have the
necessary tools to improve that.
But actually, I'm wondering if this simple behavior from the bridge is
correct. As you, Jiri and Ivan pointed out in last summer's email
thread about the Linux bridge and promiscuous mode, putting the
interface in IFF_PROMISC is only going to guarantee acceptance through
the net device's RX filter, but not that the packets will go to the
CPU. So from that perspective, the current series would break things,
so we should definitely fix that and keep the {MAC, VLAN} pairs in the
bridge's local FDB.
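
The idea of keeping {MAC, VLAN} pairs in the local FDB can be modelled in
user space roughly as follows. This is a sketch, not bridge code: the
model_fdb types and helpers are hypothetical, and the table is a
fixed-size array for brevity. A frame is terminated locally only when
its {DMAC, VID} pair was explicitly added:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_FDB_SIZE 16 /* arbitrary size for the sketch */

/* Hypothetical model of a local FDB entry keyed by {MAC, VID} */
struct model_fdb_entry {
	uint8_t mac[6];
	uint16_t vid;
	bool used;
};

struct model_fdb {
	struct model_fdb_entry e[MODEL_FDB_SIZE];
};

/* Add a local {MAC, VID} entry. Returns false if the table is full.
 * Duplicates are not merged in this sketch.
 */
bool model_fdb_add_local(struct model_fdb *fdb, const uint8_t mac[6],
			 uint16_t vid)
{
	for (int i = 0; i < MODEL_FDB_SIZE; i++) {
		if (!fdb->e[i].used) {
			memcpy(fdb->e[i].mac, mac, 6);
			fdb->e[i].vid = vid;
			fdb->e[i].used = true;
			return true;
		}
	}
	return false;
}

/* A frame is locally terminated only if both DMAC and VID match */
bool model_fdb_is_local(const struct model_fdb *fdb, const uint8_t mac[6],
			uint16_t vid)
{
	for (int i = 0; i < MODEL_FDB_SIZE; i++) {
		const struct model_fdb_entry *e = &fdb->e[i];

		if (e->used && e->vid == vid && !memcmp(e->mac, mac, 6))
			return true;
	}
	return false;
}
```

In terms of the script above, vlan10-v's address 00:00:5e:00:01:01 would
correspond to one model_fdb_add_local() call with VID 10.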

> When you are offloading the Linux data path to hardware this behavior is
> not ideal as your hardware can handle much higher packet rates than the
> CPU.
>
> In mlxsw we handle this by tracking the upper devices of the bridge. I
> was hoping that with Ivan's patches we could add support for unicast
> filtering in the bridge driver and program the MAC addresses to its FDB
> with 'local' flag. Then the FDB entries would be notified via switchdev
> to device drivers.
>

Yes, it should be possible to do that. I'll try and see how far I get.

> >
> > Multicast filtering was taken and reworked from Florian Fainelli's
> > previous attempts, according to my own understanding of multicast
> > forwarding requirements of an IGMP snooping switch. This is the part
> > that needs the most extra work, not only in the DSA core but also in
> > drivers. For this reason, I've left out of this patchset anything that
> > has to do with driver-level configuration (since the audience is a bit
> > larger than usual), as I'm trying to focus more on policy for now, and
> > the series is already pretty huge.
>
> From what I remember, this is the logic in the Linux bridge:
>
> * Broadcast is always locally received
> * Multicast is locally received if:
>         * Snooping disabled
>         * Snooping enabled:
>                 * Bridge netdev is mrouter port
>                 or
>                 * Matches MDB entry with 'host_joined' indication
>
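
The rules listed above can be written down as a small user-space
predicate. This is a sketch under stated assumptions, with hypothetical
names throughout; it only restates the list and is not the bridge's
actual code:

```c
#include <stdbool.h>

enum model_dst {
	MODEL_DST_BROADCAST,
	MODEL_DST_MULTICAST,
};

/* Hypothetical summary of the bridge state the rules depend on */
struct model_bridge_state {
	bool snooping_enabled;
	bool bridge_is_mrouter; /* bridge netdev is an mrouter port */
};

/* Return true if the frame should be received locally by the bridge.
 * mdb_host_joined: the group matched an MDB entry with 'host_joined'.
 */
bool model_local_receive(enum model_dst dst,
			 const struct model_bridge_state *br,
			 bool mdb_host_joined)
{
	/* Broadcast is always locally received */
	if (dst == MODEL_DST_BROADCAST)
		return true;

	/* Multicast with snooping disabled is always locally received */
	if (!br->snooping_enabled)
		return true;

	/* Snooping enabled: mrouter port or host-joined MDB entry */
	return br->bridge_is_mrouter || mdb_host_joined;
}
```
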
> >
> > Florian Fainelli (3):
> >   net: bridge: multicast: propagate br_mc_disabled_update() return
> >   net: dsa: add ability to program unicast and multicast filters for CPU
> >     port
> >   net: dsa: wire up multicast IGMP snooping attribute notification
> >
> > Ivan Khoronzhuk (4):
> >   net: core: dev_addr_lists: add VID to device address
> >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> >   ethernet: eth: add default vid len for all ethernet kind devices
> >
> > Vladimir Oltean (6):
> >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> >   net: dsa: don't use switchdev_notifier_fdb_info in
> >     dsa_switchdev_event_work
> >   net: dsa: mroute: don't panic the kernel if called without the prepare
> >     phase
> >   net: bridge: add port flags for host flooding
> >   net: dsa: deal with new flooding port attributes from bridge
> >   net: dsa: treat switchdev notifications for multicast router connected
> >     to port
> >
> >  include/linux/if_bridge.h |   3 +
> >  include/linux/if_vlan.h   |   2 +
> >  include/linux/netdevice.h |  11 ++
> >  include/net/dsa.h         |  17 +++
> >  net/8021q/Kconfig         |  12 ++
> >  net/8021q/vlan.c          |   3 +
> >  net/8021q/vlan.h          |   2 +
> >  net/8021q/vlan_core.c     |  25 ++++
> >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> >  net/bridge/br_if.c        |  40 ++++++
> >  net/bridge/br_multicast.c |  21 ++-
> >  net/bridge/br_switchdev.c |   4 +-
> >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> >  net/dsa/Kconfig           |   1 +
> >  net/dsa/dsa2.c            |   6 +
> >  net/dsa/dsa_priv.h        |  27 +++-
> >  net/dsa/port.c            | 155 ++++++++++++++++----
> >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> >  net/dsa/switch.c          |  36 +++++
> >  net/ethernet/eth.c        |  12 +-
> >  20 files changed, 780 insertions(+), 131 deletions(-)
> >
> > --
> > 2.25.1
> >

Thanks,
-Vladimir

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-24 16:13 ` Florian Fainelli
@ 2020-05-24 16:34   ` Vladimir Oltean
  0 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-24 16:34 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Jiri Pirko,
	Ido Schimmel, Jakub Kicinski, Ivan Vecera, netdev,
	Horatiu Vultur, Allan W. Nielsen, Nikolay Aleksandrov,
	Roopa Prabhu

Hi Florian,

On Sun, 24 May 2020 at 19:13, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> Hi Vladimir,
>
> On 5/21/2020 2:10 PM, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > This is a WIP series whose stated goal is to allow DSA and switchdev
> > drivers to flood less traffic to the CPU while keeping the same level of
> > functionality.
> >
> > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > that the operating system has expressed its interest in, either due to
> > those being the MAC addresses of one of the switch ports, or addresses
> > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > Then, the traffic which is not explicitly whitelisted is not sent by the
> > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > and would have dropped it anyway.
> >
> > The grounds for these patches were the discussions surrounding RX
> > filtering with switchdev in general, as well as with DSA in particular:
> >
> > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > https://www.spinics.net/lists/netdev/msg651922.html
> > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > https://www.spinics.net/lists/netdev/msg634859.html
> > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > https://lkml.org/lkml/2019/8/29/255
> > LPC2019 - SwitchDev offload optimizations:
> > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> >
> > Unicast filtering comes to me as most important, and this includes
> > termination of MAC addresses corresponding to the network interfaces in
> > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > network interface addresses with a Virtual ID (typically VLAN ID). This
> > matches DSA switches perfectly because their FDB already contains keys
> > of the {DMAC, VID} form.
> >
> > Multicast filtering was taken and reworked from Florian Fainelli's
> > previous attempts, according to my own understanding of multicast
> > forwarding requirements of an IGMP snooping switch. This is the part
> > that needs the most extra work, not only in the DSA core but also in
> > drivers. For this reason, I've left out of this patchset anything that
> > has to do with driver-level configuration (since the audience is a bit
> > larger than usual), as I'm trying to focus more on policy for now, and
> > the series is already pretty huge.
>
>
> First off, thank you very much for collecting the various patches and
> bringing them up to date, so far I only had a cursory look at your
> patches and they do look good to me in principle. I plan on testing this
> next week with the b53/bcm_sf2 switches and give you some more detailed
> feedback.
>
> Which of UC or MC filtering do you value the most for your use cases?
> For me it would be MC filtering because the environment is usually
> Set-top-box and streaming devices.
> --
> Florian

Actually one of my main motivations has to do with the fact that with
sja1105, I can only deliver up to 32 unique VLANs to the CPU. But I do
want to be able to use the other ~2000 VLANs in an
autonomous-forwarding manner. So I need to do very strict bookkeeping
of {DMAC, VLAN} addresses that the operating system needs to see,
because the CPU port will not be a member of the
autonomously-forwarded VLANs.
So it's not that I value unicast filtering more than multicast
filtering - I need to do both before I can achieve this goal, but at
the moment I have some trouble setting up IGMP snooping to work
properly on a device that doesn't look beyond L2 headers. With
Ocelot/Felix that is easier, but it has some challenges of its own.

Thanks,
-Vladimir

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-24 16:24   ` Vladimir Oltean
@ 2020-05-25 19:48     ` Ido Schimmel
  2020-05-25 20:23       ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-05-25 19:48 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Sun, May 24, 2020 at 07:24:27PM +0300, Vladimir Oltean wrote:
> On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > >
> > > This is a WIP series whose stated goal is to allow DSA and switchdev
> > > drivers to flood less traffic to the CPU while keeping the same level of
> > > functionality.
> > >
> > > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > > that the operating system has expressed its interest in, either due to
> > > those being the MAC addresses of one of the switch ports, or addresses
> > > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > > Then, the traffic which is not explicitly whitelisted is not sent by the
> > > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > > and would have dropped it anyway.
> > >
> > > > The grounds for these patches were the discussions surrounding RX
> > > filtering with switchdev in general, as well as with DSA in particular:
> > >
> > > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > > https://www.spinics.net/lists/netdev/msg651922.html
> > > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > > https://www.spinics.net/lists/netdev/msg634859.html
> > > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > > https://lkml.org/lkml/2019/8/29/255
> > > LPC2019 - SwitchDev offload optimizations:
> > > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> > >
> > > Unicast filtering comes to me as most important, and this includes
> > > termination of MAC addresses corresponding to the network interfaces in
> > > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > > network interface addresses with a Virtual ID (typically VLAN ID). This
> > > matches DSA switches perfectly because their FDB already contains keys
> > > of the {DMAC, VID} form.
> >
> > Hi,
> >
> > I read through the series and I'm not sure how unicast filtering works.
> > Instead of writing a very long mail I just created a script with
> > comments. I think it's clearer that way. Note that this is not a made up
> > configuration. It is used in setups involving VRRP / VXLAN, for example.
> >
> > ```
> > #!/bin/bash
> >
> > ip netns add ns1
> >
> > ip -n ns1 link add name br0 type bridge vlan_filtering 1
> > ip -n ns1 link add name dummy10 up type dummy
> >
> > ip -n ns1 link set dev dummy10 master br0
> > ip -n ns1 link set dev br0 up
> >
> > ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> > bridge -n ns1 vlan add vid 10 dev br0 self
> >
> > echo "Before adding macvlan:"
> > echo "======================"
> >
> > echo -n "Promiscuous mode: "
> > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> >
> > echo -e "\nvlan10's MAC is in br0's FDB:"
> > bridge -n ns1 fdb show br0 vlan 10
> >
> > echo
> > echo "After adding macvlan:"
> > echo "====================="
> >
> > ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
> >         type macvlan mode private
> >
> > echo -n "Promiscuous mode: "
> > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> >
> > echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> > bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> > ```
> >
> > This is the output on my laptop (kernel 5.6.8):
> >
> > ```
> > Before adding macvlan:
> > ======================
> > Promiscuous mode: 0
> >
> > vlan10's MAC is in br0's FDB:
> > 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
> >
> > After adding macvlan:
> > =====================
> > Promiscuous mode: 1
> >
> > vlan10-v's MAC is not in br0's FDB:
> > ```
> >
> > Basically, if the MAC of the VLAN device is not inherited from the
> > bridge or you stack macvlans on top, then the bridge will go into
> > promiscuous mode and it will locally receive all frames passing through
> > it. It's not ideal, but it's a very old and simple behavior. It does not
> > require you to track the VLAN associated with the MAC addresses, for
> > example.
> >
> 
> This is a good point. I wasn't aware that the bridge 'gives up' with
> macvlan upper devices, but if I understand correctly, we do have the
> necessary tools to improve that.
> But actually, I'm wondering if this simple behavior from the bridge is
> correct. 

Why would it be incorrect?

> As you, Jiri and Ivan pointed out in last summer's email
> thread about the Linux bridge and promiscuous mode, putting the
> interface in IFF_PROMISC is only going to guarantee acceptance through
> the net device's RX filter, but not that the packets will go to the
> CPU.

IFF_PROMISC has no bearing on whether a packet should go to the CPU or
not. It only influences the device's RX filter, like you said. If you
only look at the software data path, the bridge being in promiscuous
mode means that all received packets will be injected to the kernel's Rx
path as if they were received through the bridge device. This includes,
for example, an IPv4 packet with an unknown unicast MAC (does not
correspond to your MAC). Such a packet will be later dropped by the IPv4
code since it's not addressed to you:

vi net/ipv4/ip_input.c +443

We maintain the same behavior in the hardware data path. We don't have
MAC filtering in the router like the software data path, so we only send
to the router unicast MACs that correspond to the bridge's MAC and its
uppers. If such packets later hit a local route (for example), then they
will be trapped to the CPU, but the more common case is to simply route
them through a different device due to a prefix / gateway route. These
never reach the CPU.
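
The two stages of the software path described above can be modelled in
user space as below. This is a sketch with hypothetical names, for
unicast only; it approximates the pkt_type check in ip_input.c by a plain
MAC comparison and is not kernel code. Promiscuous mode only changes
which stage drops the foreign frame:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum model_verdict {
	MODEL_DROP_RX_FILTER, /* rejected by the device's RX filter */
	MODEL_DROP_OTHERHOST, /* accepted by the filter, dropped by IPv4 */
	MODEL_DELIVER,        /* handed to the local IPv4 stack */
};

struct model_netdev {
	uint8_t mac[6];
	bool promisc;
};

/* Verdict for a unicast IPv4 frame with destination MAC dmac */
enum model_verdict model_receive_ipv4(const struct model_netdev *dev,
				      const uint8_t dmac[6])
{
	bool for_us = memcmp(dev->mac, dmac, 6) == 0;

	/* Stage 1: RX filter. Promiscuous mode lets everything through. */
	if (!for_us && !dev->promisc)
		return MODEL_DROP_RX_FILTER;

	/* Stage 2: IPv4 input drops frames not addressed to this host */
	if (!for_us)
		return MODEL_DROP_OTHERHOST;

	return MODEL_DELIVER;
}
```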

> So from that perspective, the current series would break things, so we
> should definitely fix that and keep the {MAC, VLAN} pairs in the
> bridge's local FDB.

Not sure I follow. Can you explain what will break and why?

> 
> > When you are offloading the Linux data path to hardware this behavior is
> > not ideal as your hardware can handle much higher packet rates than the
> > CPU.
> >
> > In mlxsw we handle this by tracking the upper devices of the bridge. I
> > was hoping that with Ivan's patches we could add support for unicast
> > filtering in the bridge driver and program the MAC addresses to its FDB
> > with 'local' flag. Then the FDB entries would be notified via switchdev
> > to device drivers.
> >
> 
> Yes, it should be possible to do that. I'll try and see how far I get.
> 
> > >
> > > Multicast filtering was taken and reworked from Florian Fainelli's
> > > previous attempts, according to my own understanding of multicast
> > > forwarding requirements of an IGMP snooping switch. This is the part
> > > that needs the most extra work, not only in the DSA core but also in
> > > drivers. For this reason, I've left out of this patchset anything that
> > > has to do with driver-level configuration (since the audience is a bit
> > > larger than usual), as I'm trying to focus more on policy for now, and
> > > the series is already pretty huge.
> >
> > From what I remember, this is the logic in the Linux bridge:
> >
> > * Broadcast is always locally received
> > * Multicast is locally received if:
> >         * Snooping disabled
> >         * Snooping enabled:
> >                 * Bridge netdev is mrouter port
> >                 or
> >                 * Matches MDB entry with 'host_joined' indication
> >
> > >
> > > Florian Fainelli (3):
> > >   net: bridge: multicast: propagate br_mc_disabled_update() return
> > >   net: dsa: add ability to program unicast and multicast filters for CPU
> > >     port
> > >   net: dsa: wire up multicast IGMP snooping attribute notification
> > >
> > > Ivan Khoronzhuk (4):
> > >   net: core: dev_addr_lists: add VID to device address
> > >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> > >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> > >   ethernet: eth: add default vid len for all ethernet kind devices
> > >
> > > Vladimir Oltean (6):
> > >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> > >   net: dsa: don't use switchdev_notifier_fdb_info in
> > >     dsa_switchdev_event_work
> > >   net: dsa: mroute: don't panic the kernel if called without the prepare
> > >     phase
> > >   net: bridge: add port flags for host flooding
> > >   net: dsa: deal with new flooding port attributes from bridge
> > >   net: dsa: treat switchdev notifications for multicast router connected
> > >     to port
> > >
> > >  include/linux/if_bridge.h |   3 +
> > >  include/linux/if_vlan.h   |   2 +
> > >  include/linux/netdevice.h |  11 ++
> > >  include/net/dsa.h         |  17 +++
> > >  net/8021q/Kconfig         |  12 ++
> > >  net/8021q/vlan.c          |   3 +
> > >  net/8021q/vlan.h          |   2 +
> > >  net/8021q/vlan_core.c     |  25 ++++
> > >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> > >  net/bridge/br_if.c        |  40 ++++++
> > >  net/bridge/br_multicast.c |  21 ++-
> > >  net/bridge/br_switchdev.c |   4 +-
> > >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> > >  net/dsa/Kconfig           |   1 +
> > >  net/dsa/dsa2.c            |   6 +
> > >  net/dsa/dsa_priv.h        |  27 +++-
> > >  net/dsa/port.c            | 155 ++++++++++++++++----
> > >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> > >  net/dsa/switch.c          |  36 +++++
> > >  net/ethernet/eth.c        |  12 +-
> > >  20 files changed, 780 insertions(+), 131 deletions(-)
> > >
> > > --
> > > 2.25.1
> > >
> 
> Thanks,
> -Vladimir

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-24 16:13     ` Vladimir Oltean
@ 2020-05-25 20:11       ` Ido Schimmel
  2020-05-25 20:32         ` Vladimir Oltean
  2020-07-23 22:35         ` Vladimir Oltean
  0 siblings, 2 replies; 46+ messages in thread
From: Ido Schimmel @ 2020-05-25 20:11 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Sun, May 24, 2020 at 07:13:46PM +0300, Vladimir Oltean wrote:
> Hi Ido,
> 
> On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > >
> > > In cases where the bridge is offloaded by a switchdev, there are
> > > situations where we can optimize RX filtering towards the host. To be
> > > precise, the host only needs to do termination, which it can do by
> > > responding at the MAC addresses of the slave ports and of the bridge
> > > interface itself. But most notably, it doesn't need to do forwarding,
> > > so there is no need to see packets with unknown destination address.
> > >
> > > But there are, however, cases when a switchdev does need to flood to the
> > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > interface, and since there is no offloaded datapath, packets need to
> > > pass through the CPU. Currently this is the only identified case, but it
> > > can be extended at any time.
> > >
> > > So far, switchdev implementers made driver-level assumptions, such as:
> > > this chip is never integrated in SoCs where it can be bridged with a
> > > foreign interface, so I'll just disable host flooding and save some CPU
> > > cycles. Or: I can never know what else can be bridged with this
> > > switchdev port, so I must leave host flooding enabled in any case.
> > >
> > > Let the bridge drive the host flooding decision, and pass it to
> > > switchdev via the same mechanism as the external flooding flags.
> > >
> > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > ---
> > >  include/linux/if_bridge.h |  3 +++
> > >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> > >  net/bridge/br_switchdev.c |  4 +++-
> > >  3 files changed, 46 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > index b3a8d3054af0..6891a432862d 100644
> > > --- a/include/linux/if_bridge.h
> > > +++ b/include/linux/if_bridge.h
> > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > >  #define BR_ISOLATED          BIT(16)
> > >  #define BR_MRP_AWARE         BIT(17)
> > >  #define BR_MRP_LOST_CONT     BIT(18)
> > > +#define BR_HOST_FLOOD                BIT(19)
> > > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> > >
> > >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> > >
> > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > index a0e9a7937412..aae59d1e619b 100644
> > > --- a/net/bridge/br_if.c
> > > +++ b/net/bridge/br_if.c
> > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > >       }
> > >  }
> > >
> > > +static int br_manage_host_flood(struct net_bridge *br)
> > > +{
> > > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > +                                BR_HOST_BCAST_FLOOD;
> > > +     struct net_bridge_port *p, *q;
> > > +
> > > +     list_for_each_entry(p, &br->port_list, list) {
> > > +             unsigned long flags = p->flags;
> > > +             bool sw_bridging = false;
> > > +             int err;
> > > +
> > > +             list_for_each_entry(q, &br->port_list, list) {
> > > +                     if (p == q)
> > > +                             continue;
> > > +
> > > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > +                             sw_bridging = true;
> >
> > It's not that simple. There are cases where not all bridge slaves have
> > the same parent ID and still there is no reason to flood traffic to the
> > CPU. VXLAN, for example.
> >
> > You could argue that the VXLAN device needs to have the same parent ID
> > as the physical netdevs member in the bridge, but it will break your
> > data path. For example, lets assume your hardware decided to flood a
> > packet in L2. The packet will egress all the local ports, but will also
> > perform VXLAN encapsulation. The packet continues with the IP of the
> > remote VTEP(s) to the underlay router and then encounters a neighbour
> > miss exception, which sends it to the CPU for resolution.
> >
> > Since this exception was encountered in the router the driver would mark
> > the packet with 'offload_fwd_mark', as it already performed L2
> > forwarding. If the VXLAN device has the same parent ID as the physical
> > netdevs, then the Linux bridge will never let it egress, nothing will
> > trigger neighbour resolution and the packet will be discarded.
> >
> 
> I wasn't going to argue that.
> Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
> to multicast IPs should be flooded to the CPU.
> Actually Allan's example was a bit simpler, he said that host flooding
> can be made a per-VLAN flag. I'm glad that you raised this. So maybe
> we should try to define some mechanism by which virtual interfaces can
> specify to the bridge that they don't need to see all traffic? Do you
> have any ideas?

Maybe, when a port joins a bridge, query member ports if they can
forward traffic to it in hardware and based on the answer determine the
flooding towards the CPU?
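
To make the idea a bit more concrete, here is a rough user-space sketch,
not kernel code. struct port, port_can_hw_forward() and
port_needs_host_flood() are made-up names, and the 'offloaded_by' field
is only an assumption about how a device such as VXLAN could report
which ASIC offloads it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge member, not struct net_bridge_port. */
struct port {
	int parent_id;    /* ASIC the port belongs to, -1 for a software device */
	int offloaded_by; /* ASIC that offloads this device (e.g. VXLAN), or -1 */
};

/* Hypothetical query: can 'from' reach 'to' without going through the CPU? */
static bool port_can_hw_forward(const struct port *from, const struct port *to)
{
	if (from->parent_id < 0)
		return false; /* 'from' has no hardware data path at all */
	if (from->parent_id == to->parent_id)
		return true;  /* both ports sit on the same ASIC */
	return to->offloaded_by == from->parent_id;
}

/* Port 'p' needs host flooding if any other member is unreachable in hardware. */
static bool port_needs_host_flood(const struct port *ports, size_t n, size_t p)
{
	assert(p < n);
	for (size_t q = 0; q < n; q++) {
		if (q != p && !port_can_hw_forward(&ports[p], &ports[q]))
			return true;
	}
	return false;
}
```

With two ports of ASIC 1 and a VXLAN device offloaded by ASIC 1, neither
switch port asks for host flooding. Replace the VXLAN device with a
foreign NIC and both members ask for it.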

> 
> > > +                             break;
> > > +                     }
> > > +             }
> > > +
> > > +             if (sw_bridging)
> > > +                     flags |= mask;
> > > +             else
> > > +                     flags &= ~mask;
> > > +
> > > +             if (flags == p->flags)
> > > +                     continue;
> > > +
> > > +             err = br_switchdev_set_port_flag(p, flags, mask);
> > > +             if (err)
> > > +                     return err;
> > > +
> > > +             p->flags = flags;
> > > +     }
> > > +
> > > +     return 0;
> > > +}
> > > +
> > >  int nbp_backup_change(struct net_bridge_port *p,
> > >                     struct net_device *backup_dev)
> > >  {
> > > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
> > >               br->auto_cnt = cnt;
> > >               br_manage_promisc(br);
> > >       }
> > > +     br_manage_host_flood(br);
> > >  }
> > >
> > >  static void nbp_delete_promisc(struct net_bridge_port *p)
> > > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> > > index 015209bf44aa..360806ac7463 100644
> > > --- a/net/bridge/br_switchdev.c
> > > +++ b/net/bridge/br_switchdev.c
> > > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> > >
> > >  /* Flags that can be offloaded to hardware */
> > >  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> > > -                               BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> > > +                               BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> > > +                               BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> > > +                               BR_HOST_BCAST_FLOOD)
> > >
> > >  int br_switchdev_set_port_flag(struct net_bridge_port *p,
> > >                              unsigned long flags,
> > > --
> > > 2.25.1
> > >
> 
> Thanks,
> -Vladimir


* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-25 19:48     ` Ido Schimmel
@ 2020-05-25 20:23       ` Vladimir Oltean
  2020-05-26 14:01         ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-25 20:23 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

Hi Ido,

On Mon, 25 May 2020 at 22:48, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Sun, May 24, 2020 at 07:24:27PM +0300, Vladimir Oltean wrote:
> > On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > >
> > > > This is a WIP series whose stated goal is to allow DSA and switchdev
> > > > drivers to flood less traffic to the CPU while keeping the same level of
> > > > functionality.
> > > >
> > > > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > > > that the operating system has expressed its interest in, either due to
> > > > those being the MAC addresses of one of the switch ports, or addresses
> > > > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > > > Then, the traffic which is not explicitly whitelisted is not sent by the
> > > > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > > > and would have dropped it anyway.
> > > >
> > > > The ground for these patches were the discussions surrounding RX
> > > > filtering with switchdev in general, as well as with DSA in particular:
> > > >
> > > > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > > > https://www.spinics.net/lists/netdev/msg651922.html
> > > > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > > > https://www.spinics.net/lists/netdev/msg634859.html
> > > > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > > > https://lkml.org/lkml/2019/8/29/255
> > > > LPC2019 - SwitchDev offload optimizations:
> > > > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> > > >
> > > > Unicast filtering comes to me as most important, and this includes
> > > > termination of MAC addresses corresponding to the network interfaces in
> > > > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > > > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > > > network interface addresses with a Virtual ID (typically VLAN ID). This
> > > > matches DSA switches perfectly because their FDB already contains keys
> > > > of the {DMAC, VID} form.
> > >
> > > Hi,
> > >
> > > I read through the series and I'm not sure how unicast filtering works.
> > > Instead of writing a very long mail I just created a script with
> > > comments. I think it's clearer that way. Note that this is not a made up
> > > configuration. It is used in setups involving VRRP / VXLAN, for example.
> > >
> > > ```
> > > #!/bin/bash
> > >
> > > ip netns add ns1
> > >
> > > ip -n ns1 link add name br0 type bridge vlan_filtering 1
> > > ip -n ns1 link add name dummy10 up type dummy
> > >
> > > ip -n ns1 link set dev dummy10 master br0
> > > ip -n ns1 link set dev br0 up
> > >
> > > ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> > > bridge -n ns1 vlan add vid 10 dev br0 self
> > >
> > > echo "Before adding macvlan:"
> > > echo "======================"
> > >
> > > echo -n "Promiscuous mode: "
> > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > >
> > > echo -e "\nvlan10's MAC is in br0's FDB:"
> > > bridge -n ns1 fdb show br0 vlan 10
> > >
> > > echo
> > > echo "After adding macvlan:"
> > > echo "====================="
> > >
> > > ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
> > >         type macvlan mode private
> > >
> > > echo -n "Promiscuous mode: "
> > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > >
> > > echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> > > bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> > > ```
> > >
> > > This is the output on my laptop (kernel 5.6.8):
> > >
> > > ```
> > > Before adding macvlan:
> > > ======================
> > > Promiscuous mode: 0
> > >
> > > vlan10's MAC is in br0's FDB:
> > > 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
> > >
> > > After adding macvlan:
> > > =====================
> > > Promiscuous mode: 1
> > >
> > > vlan10-v's MAC is not in br0's FDB:
> > > ```
> > >
> > > Basically, if the MAC of the VLAN device is not inherited from the
> > > bridge or you stack macvlans on top, then the bridge will go into
> > > promiscuous mode and it will locally receive all frames passing through
> > > it. It's not ideal, but it's a very old and simple behavior. It does not
> > > require you to track the VLAN associated with the MAC addresses, for
> > > example.
> > >
> >
> > This is a good point. I wasn't aware that the bridge 'gives up' with
> > macvlan upper devices, but if I understand correctly, we do have the
> > necessary tools to improve that.
> > But actually, I'm wondering if this simple behavior from the bridge is
> > correct.
>
> Why would it be incorrect?
>
> > As you, Jiri and Ivan pointed out in last summer's email
> > thread about the Linux bridge and promiscuous mode, putting the
> > interface in IFF_PROMISC is only going to guarantee acceptance through
> > the net device's RX filter, but not that the packets will go to the
> > CPU.
>
> IFF_PROMISC has no bearing on whether a packet should go to the CPU or
> not. It only influences the device's RX filter, like you said. If you
> only look at the software data path, the bridge being in promiscuous
> mode means that all received packets will be injected to the kernel's Rx
> path as if they were received through the bridge device. This includes,
> for example, an IPv4 packet with an unknown unicast MAC (does not
> correspond to your MAC). Such a packet will be later dropped by the IPv4
> code since it's not addressed to you:
>
> vi net/ipv4/ip_input.c +443
>
> We maintain the same behavior in the hardware data path. We don't have
> MAC filtering in the router like the software data path, so we only send
> to the router unicast MACs that correspond to the bridge's MAC and its
> uppers. If such packets later hit a local route (for example), then they
> will be trapped to the CPU, but the more common case is to simply route
> them through a different device due to a prefix / gateway route. These
> never reach the CPU.
>
> > So from that perspective, the current series would break things, so we
> > should definitely fix that and keep the {MAC, VLAN} pairs in the
> > bridge's local FDB.
>
> Not sure I follow. Can you explain what will break and why?
>

I haven't done any further testing since yesterday, so my level of
(mis)understanding is the same. Let's hope at least I can explain
better this time.

I guess what I didn't understand from your "macvlan upper whose MAC
address isn't inherited from bridge" is why does the bridge go in
promiscuous mode. You said that it's so that the slave ports won't
drop packets with that DMAC, I said ok, yes the packets would get
dropped without promisc, but also promisc still doesn't mean the
packets will land on the CPU. This is one of the cases where the
bridge puts an interface in promisc mode with the intention of making
the CPU see some frames, something which has been argued, in the
context of switchdev, that was never the case. You said that's all
true, and that in mlxsw you're giving the bridge a helping hand, by
tracking the bridge's uppers in order to keep something that works by
accident in software working with switchdev too. I said that this is a
weird layering violation, because the bridge's job is to notify the
driver of addresses it needs to see, not for the driver to fish for
them.
As for "what will break and why": my current patch proposal is
basically to send to the CPU only the addresses added via dev_uc_add
and dev_mc_add. The MAC address of the bridge's macvlan upper would not
be part of that list. My rhetorical question then becomes: whose fault
is it that macvlan breaks? Mine for not tracking the bridge upper, or
the bridge's for not notifying me and just pretending that 'promisc'
means 'the CPU will see all packets, including the ones I need'? Of
course I think it's the bridge.
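
A minimal user-space sketch of that policy, just to illustrate the
breakage. struct host_addr and host_filter_match() are hypothetical
names, not anything from the series, and the list stands in for
whatever the driver collected from dev_uc_add/dev_mc_add:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical whitelist entry: a {DMAC, VID} pair the host asked for. */
struct host_addr {
	uint8_t mac[ETH_ALEN];
	uint16_t vid;
};

/* Would the hardware send this frame to the CPU under the proposed policy? */
static bool host_filter_match(const struct host_addr *list, size_t n,
			      const uint8_t *dmac, uint16_t vid)
{
	assert(dmac != NULL);
	for (size_t i = 0; i < n; i++) {
		if (list[i].vid == vid &&
		    memcmp(list[i].mac, dmac, ETH_ALEN) == 0)
			return true;
	}
	return false; /* not whitelisted: never reaches the CPU */
}
```

If the list only holds vlan10's own MAC in VID 10, a frame addressed to
the macvlan's 00:00:5e:00:01:01 does not match and is never delivered,
no matter what the bridge's promiscuity counter says.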

> >
> > > When you are offloading the Linux data path to hardware this behavior is
> > > not ideal as your hardware can handle much higher packet rates than the
> > > CPU.
> > >
> > > In mlxsw we handle this by tracking the upper devices of the bridge. I
> > > was hoping that with Ivan's patches we could add support for unicast
> > > filtering in the bridge driver and program the MAC addresses to its FDB
> > > with 'local' flag. Then the FDB entries would be notified via switchdev
> > > to device drivers.
> > >
> >
> > Yes, it should be possible to do that. I'll try and see how far I get.
> >
> > > >
> > > > Multicast filtering was taken and reworked from Florian Fainelli's
> > > > previous attempts, according to my own understanding of multicast
> > > > forwarding requirements of an IGMP snooping switch. This is the part
> > > > that needs the most extra work, not only in the DSA core but also in
> > > > drivers. For this reason, I've left out of this patchset anything that
> > > > has to do with driver-level configuration (since the audience is a bit
> > > > larger than usual), as I'm trying to focus more on policy for now, and
> > > > the series is already pretty huge.
> > >
> > > From what I remember, this is the logic in the Linux bridge:
> > >
> > > * Broadcast is always locally received
> > > * Multicast is locally received if:
> > >         * Snooping disabled
> > >         * Snooping enabled:
> > >                 * Bridge netdev is mrouter port
> > >                 or
> > >                 * Matches MDB entry with 'host_joined' indication
> > >
> > > >
> > > > Florian Fainelli (3):
> > > >   net: bridge: multicast: propagate br_mc_disabled_update() return
> > > >   net: dsa: add ability to program unicast and multicast filters for CPU
> > > >     port
> > > >   net: dsa: wire up multicast IGMP snooping attribute notification
> > > >
> > > > Ivan Khoronzhuk (4):
> > > >   net: core: dev_addr_lists: add VID to device address
> > > >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> > > >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> > > >   ethernet: eth: add default vid len for all ethernet kind devices
> > > >
> > > > Vladimir Oltean (6):
> > > >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> > > >   net: dsa: don't use switchdev_notifier_fdb_info in
> > > >     dsa_switchdev_event_work
> > > >   net: dsa: mroute: don't panic the kernel if called without the prepare
> > > >     phase
> > > >   net: bridge: add port flags for host flooding
> > > >   net: dsa: deal with new flooding port attributes from bridge
> > > >   net: dsa: treat switchdev notifications for multicast router connected
> > > >     to port
> > > >
> > > >  include/linux/if_bridge.h |   3 +
> > > >  include/linux/if_vlan.h   |   2 +
> > > >  include/linux/netdevice.h |  11 ++
> > > >  include/net/dsa.h         |  17 +++
> > > >  net/8021q/Kconfig         |  12 ++
> > > >  net/8021q/vlan.c          |   3 +
> > > >  net/8021q/vlan.h          |   2 +
> > > >  net/8021q/vlan_core.c     |  25 ++++
> > > >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> > > >  net/bridge/br_if.c        |  40 ++++++
> > > >  net/bridge/br_multicast.c |  21 ++-
> > > >  net/bridge/br_switchdev.c |   4 +-
> > > >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> > > >  net/dsa/Kconfig           |   1 +
> > > >  net/dsa/dsa2.c            |   6 +
> > > >  net/dsa/dsa_priv.h        |  27 +++-
> > > >  net/dsa/port.c            | 155 ++++++++++++++++----
> > > >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> > > >  net/dsa/switch.c          |  36 +++++
> > > >  net/ethernet/eth.c        |  12 +-
> > > >  20 files changed, 780 insertions(+), 131 deletions(-)
> > > >
> > > > --
> > > > 2.25.1
> > > >
> >
> > Thanks,
> > -Vladimir

-Vladimir


* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-25 20:11       ` Ido Schimmel
@ 2020-05-25 20:32         ` Vladimir Oltean
  2020-07-23 22:35         ` Vladimir Oltean
  1 sibling, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-25 20:32 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Mon, 25 May 2020 at 23:11, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Sun, May 24, 2020 at 07:13:46PM +0300, Vladimir Oltean wrote:
> > Hi Ido,
> >
> > On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > >
> > > > In cases where the bridge is offloaded by a switchdev, there are
> > > > situations where we can optimize RX filtering towards the host. To be
> > > > precise, the host only needs to do termination, which it can do by
> > > > responding at the MAC addresses of the slave ports and of the bridge
> > > > interface itself. But most notably, it doesn't need to do forwarding,
> > > > so there is no need to see packets with unknown destination address.
> > > >
> > > > But there are, however, cases when a switchdev does need to flood to the
> > > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > > interface, and since there is no offloaded datapath, packets need to
> > > > pass through the CPU. Currently this is the only identified case, but it
> > > > can be extended at any time.
> > > >
> > > > So far, switchdev implementers made driver-level assumptions, such as:
> > > > this chip is never integrated in SoCs where it can be bridged with a
> > > > foreign interface, so I'll just disable host flooding and save some CPU
> > > > cycles. Or: I can never know what else can be bridged with this
> > > > switchdev port, so I must leave host flooding enabled in any case.
> > > >
> > > > Let the bridge drive the host flooding decision, and pass it to
> > > > switchdev via the same mechanism as the external flooding flags.
> > > >
> > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > ---
> > > >  include/linux/if_bridge.h |  3 +++
> > > >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> > > >  net/bridge/br_switchdev.c |  4 +++-
> > > >  3 files changed, 46 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > > index b3a8d3054af0..6891a432862d 100644
> > > > --- a/include/linux/if_bridge.h
> > > > +++ b/include/linux/if_bridge.h
> > > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > > >  #define BR_ISOLATED          BIT(16)
> > > >  #define BR_MRP_AWARE         BIT(17)
> > > >  #define BR_MRP_LOST_CONT     BIT(18)
> > > > +#define BR_HOST_FLOOD                BIT(19)
> > > > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > > > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> > > >
> > > >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> > > >
> > > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > > index a0e9a7937412..aae59d1e619b 100644
> > > > --- a/net/bridge/br_if.c
> > > > +++ b/net/bridge/br_if.c
> > > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > > >       }
> > > >  }
> > > >
> > > > +static int br_manage_host_flood(struct net_bridge *br)
> > > > +{
> > > > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > > +                                BR_HOST_BCAST_FLOOD;
> > > > +     struct net_bridge_port *p, *q;
> > > > +
> > > > +     list_for_each_entry(p, &br->port_list, list) {
> > > > +             unsigned long flags = p->flags;
> > > > +             bool sw_bridging = false;
> > > > +             int err;
> > > > +
> > > > +             list_for_each_entry(q, &br->port_list, list) {
> > > > +                     if (p == q)
> > > > +                             continue;
> > > > +
> > > > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > > +                             sw_bridging = true;
> > >
> > > It's not that simple. There are cases where not all bridge slaves have
> > > the same parent ID and still there is no reason to flood traffic to the
> > > CPU. VXLAN, for example.
> > >
> > > You could argue that the VXLAN device needs to have the same parent ID
> > > as the physical netdevs member in the bridge, but it will break your
> > > data path. For example, lets assume your hardware decided to flood a
> > > packet in L2. The packet will egress all the local ports, but will also
> > > perform VXLAN encapsulation. The packet continues with the IP of the
> > > remote VTEP(s) to the underlay router and then encounters a neighbour
> > > miss exception, which sends it to the CPU for resolution.
> > >
> > > Since this exception was encountered in the router the driver would mark
> > > the packet with 'offload_fwd_mark', as it already performed L2
> > > forwarding. If the VXLAN device has the same parent ID as the physical
> > > netdevs, then the Linux bridge will never let it egress, nothing will
> > > trigger neighbour resolution and the packet will be discarded.
> > >
> >
> > I wasn't going to argue that.
> > Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
> > to multicast IPs should be flooded to the CPU.
> > Actually Allan's example was a bit simpler, he said that host flooding
> > can be made a per-VLAN flag. I'm glad that you raised this. So maybe
> > we should try to define some mechanism by which virtual interfaces can
> > specify to the bridge that they don't need to see all traffic? Do you
> > have any ideas?
>
> Maybe, when a port joins a bridge, query member ports if they can
> forward traffic to it in hardware and based on the answer determine the
> flooding towards the CPU?
>

Ok, should this be a new ndo or some already existing mechanism? In
what level of detail does the bridge need to know which filters the
virtual interface is going to apply? Just a binary yes/no? In that
case, could it simply check for the netdev ops?

> >
> > > > +                             break;
> > > > +                     }
> > > > +             }
> > > > +
> > > > +             if (sw_bridging)
> > > > +                     flags |= mask;
> > > > +             else
> > > > +                     flags &= ~mask;
> > > > +
> > > > +             if (flags == p->flags)
> > > > +                     continue;
> > > > +
> > > > +             err = br_switchdev_set_port_flag(p, flags, mask);
> > > > +             if (err)
> > > > +                     return err;
> > > > +
> > > > +             p->flags = flags;
> > > > +     }
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +
> > > >  int nbp_backup_change(struct net_bridge_port *p,
> > > >                     struct net_device *backup_dev)
> > > >  {
> > > > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
> > > >               br->auto_cnt = cnt;
> > > >               br_manage_promisc(br);
> > > >       }
> > > > +     br_manage_host_flood(br);
> > > >  }
> > > >
> > > >  static void nbp_delete_promisc(struct net_bridge_port *p)
> > > > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> > > > index 015209bf44aa..360806ac7463 100644
> > > > --- a/net/bridge/br_switchdev.c
> > > > +++ b/net/bridge/br_switchdev.c
> > > > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> > > >
> > > >  /* Flags that can be offloaded to hardware */
> > > >  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> > > > -                               BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> > > > +                               BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> > > > +                               BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> > > > +                               BR_HOST_BCAST_FLOOD)
> > > >
> > > >  int br_switchdev_set_port_flag(struct net_bridge_port *p,
> > > >                              unsigned long flags,
> > > > --
> > > > 2.25.1
> > > >
> >
> > Thanks,
> > -Vladimir

Thanks,
-Vladimir


* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-25 20:23       ` Vladimir Oltean
@ 2020-05-26 14:01         ` Ido Schimmel
  2020-05-27 11:36           ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-05-26 14:01 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Mon, May 25, 2020 at 11:23:34PM +0300, Vladimir Oltean wrote:
> Hi Ido,
> 
> On Mon, 25 May 2020 at 22:48, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Sun, May 24, 2020 at 07:24:27PM +0300, Vladimir Oltean wrote:
> > > On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
> > > >
> > > > On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > >
> > > > > This is a WIP series whose stated goal is to allow DSA and switchdev
> > > > > drivers to flood less traffic to the CPU while keeping the same level of
> > > > > functionality.
> > > > >
> > > > > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > > > > that the operating system has expressed its interest in, either due to
> > > > > those being the MAC addresses of one of the switch ports, or addresses
> > > > > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > > > > Then, the traffic which is not explicitly whitelisted is not sent by the
> > > > > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > > > > and would have dropped it anyway.
> > > > >
> > > > > The ground for these patches were the discussions surrounding RX
> > > > > filtering with switchdev in general, as well as with DSA in particular:
> > > > >
> > > > > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > > > > https://www.spinics.net/lists/netdev/msg651922.html
> > > > > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > > > > https://www.spinics.net/lists/netdev/msg634859.html
> > > > > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > > > > https://lkml.org/lkml/2019/8/29/255
> > > > > LPC2019 - SwitchDev offload optimizations:
> > > > > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> > > > >
> > > > > Unicast filtering comes to me as most important, and this includes
> > > > > termination of MAC addresses corresponding to the network interfaces in
> > > > > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > > > > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > > > > network interface addresses with a Virtual ID (typically VLAN ID). This
> > > > > matches DSA switches perfectly because their FDB already contains keys
> > > > > of the {DMAC, VID} form.
> > > >
> > > > Hi,
> > > >
> > > > I read through the series and I'm not sure how unicast filtering works.
> > > > Instead of writing a very long mail I just created a script with
> > > > comments. I think it's clearer that way. Note that this is not a made up
> > > > configuration. It is used in setups involving VRRP / VXLAN, for example.
> > > >
> > > > ```
> > > > #!/bin/bash
> > > >
> > > > ip netns add ns1
> > > >
> > > > ip -n ns1 link add name br0 type bridge vlan_filtering 1
> > > > ip -n ns1 link add name dummy10 up type dummy
> > > >
> > > > ip -n ns1 link set dev dummy10 master br0
> > > > ip -n ns1 link set dev br0 up
> > > >
> > > > ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> > > > bridge -n ns1 vlan add vid 10 dev br0 self
> > > >
> > > > echo "Before adding macvlan:"
> > > > echo "======================"
> > > >
> > > > echo -n "Promiscuous mode: "
> > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > >
> > > > echo -e "\nvlan10's MAC is in br0's FDB:"
> > > > bridge -n ns1 fdb show br0 vlan 10
> > > >
> > > > echo
> > > > echo "After adding macvlan:"
> > > > echo "====================="
> > > >
> > > > ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
> > > >         type macvlan mode private
> > > >
> > > > echo -n "Promiscuous mode: "
> > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > >
> > > > echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> > > > bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> > > > ```
> > > >
> > > > This is the output on my laptop (kernel 5.6.8):
> > > >
> > > > ```
> > > > Before adding macvlan:
> > > > ======================
> > > > Promiscuous mode: 0
> > > >
> > > > vlan10's MAC is in br0's FDB:
> > > > 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
> > > >
> > > > After adding macvlan:
> > > > =====================
> > > > Promiscuous mode: 1
> > > >
> > > > vlan10-v's MAC is not in br0's FDB:
> > > > ```
> > > >
> > > > Basically, if the MAC of the VLAN device is not inherited from the
> > > > bridge or you stack macvlans on top, then the bridge will go into
> > > > promiscuous mode and it will locally receive all frames passing through
> > > > it. It's not ideal, but it's a very old and simple behavior. It does not
> > > > require you to track the VLAN associated with the MAC addresses, for
> > > > example.
> > > >
> > >
> > > This is a good point. I wasn't aware that the bridge 'gives up' with
> > > macvlan upper devices, but if I understand correctly, we do have the
> > > necessary tools to improve that.
> > > But actually, I'm wondering if this simple behavior from the bridge is
> > > correct.
> >
> > Why would it be incorrect?
> >
> > > As you, Jiri and Ivan pointed out in last summer's email
> > > thread about the Linux bridge and promiscuous mode, putting the
> > > interface in IFF_PROMISC is only going to guarantee acceptance through
> > > the net device's RX filter, but not that the packets will go to the
> > > CPU.
> >
> > IFF_PROMISC has no bearing on whether a packet should go to the CPU or
> > not. It only influences the device's RX filter, like you said. If you
> > only look at the software data path, the bridge being in promiscuous
> > mode means that all received packets will be injected to the kernel's Rx
> > path as if they were received through the bridge device. This includes,
> > for example, an IPv4 packet with an unknown unicast MAC (does not
> > correspond to your MAC). Such a packet will be later dropped by the IPv4
> > code since it's not addressed to you:
> >
> > vi net/ipv4/ip_input.c +443
> >
> > We maintain the same behavior in the hardware data path. We don't have
> > MAC filtering in the router like the software data path, so we only send
> > to the router unicast MACs that correspond to the bridge's MAC and its
> > uppers. If such packets later hit a local route (for example), then they
> > will be trapped to the CPU, but the more common case is to simply route
> > them through a different device due to a prefix / gateway route. These
> > never reach the CPU.
> >
> > > So from that perspective, the current series would break things, so we
> > > should definitely fix that and keep the {MAC, VLAN} pairs in the
> > > bridge's local FDB.
> >
> > Not sure I follow. Can you explain what will break and why?
> >
> 
> I haven't done any further testing since yesterday, so my level of
> (mis)understanding is the same. Let's hope at least I can explain
> better this time.
> 
> I guess what I didn't understand from your "macvlan upper whose MAC
> address isn't inherited from bridge" is why does the bridge go in
> promiscuous mode.

Packets received from bridge slaves with a DMAC equal to the MAC of an
active bridge upper (e.g., macvlan) should be received by this upper.
When a packet is received from a bridge slave, the bridge performs an
FDB lookup. Since {VID, MAC} entries are not programmed for bridge
uppers, packets addressed to such addresses will incur an FDB miss and
be flooded. If the bridge is not in promiscuous mode, these packets will
not be received via the bridge interface and will not reach the relevant
upper device.
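
The receive decision described above can be sketched as a toy model. This
is only an illustration in Python, not kernel code: the function name
bridge_rx_unicast and the local_fdb set are hypothetical, and the model
assumes that the only FDB entries are local ones (learned entries pointing
at other ports are ignored).

```python
# Toy model of the software bridge's unicast receive decision.
# Not kernel code; every name here is hypothetical.

def bridge_rx_unicast(local_fdb, vid, dmac, bridge_promisc):
    """Return (delivered_locally, flooded) for a frame from a bridge slave."""
    if (vid, dmac) in local_fdb:
        # FDB hit on a local entry: the frame terminates on the bridge device.
        return True, False
    # FDB miss: unknown unicast is flooded to the other ports. The bridge
    # device itself only receives a copy if it is in promiscuous mode.
    return bridge_promisc, True


# The {VID, MAC} of vlan10 is in the FDB; the macvlan's MAC is not.
local_fdb = {(10, "42:bd:b1:cc:67:15")}
macvlan_mac = "00:00:5e:00:01:01"

print(bridge_rx_unicast(local_fdb, 10, "42:bd:b1:cc:67:15", False))
print(bridge_rx_unicast(local_fdb, 10, macvlan_mac, False))
print(bridge_rx_unicast(local_fdb, 10, macvlan_mac, True))
```

With the macvlan's MAC and promiscuous mode off, the frame is flooded but
never handed to the bridge device, so the upper does not see it.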

> You said that it's so that the slave ports won't drop packets with
> that DMAC,

I did not say that. I explained above that if promiscuous mode is not
enabled on the bridge interface itself (a soft device), the packet will
not be received via the bridge interface and will not reach the upper
device.

> I said ok, yes the packets would get dropped without promisc, but also
> promisc still doesn't mean the packets will land on the CPU. This is
> one of the cases where the bridge puts an interface in promisc mode
> with the intention of making the CPU see some frames,

The statement "the bridge puts an interface in promisc mode with the
intention of making the CPU see some frames" is incorrect. The bridge
puts an interface in promiscuous mode so that the bridge will see all
the frames received by this interface. If the bridge is offloaded,
bridging happens in hardware and there is no reason to send all the
frames to the CPU.

> something which has been argued, in the context of switchdev, that was
> never the case. You said that's all true, and that in mlxsw you're
> giving the bridge a helping hand, by tracking the bridge's uppers in
> order to keep something that works by accident in software working
> with switchdev too.

I never said that the software bridge works by accident. I explained
why, to my understanding, the bridge works the way it does and what can
be done in order to prevent the bridge from going into promiscuous mode.
It involves very careful (and error-prone?) tracking of the upper
devices and their VLANs.

Also, please differentiate between the bridge interface itself going
into promiscuous mode and bridge slaves going into promiscuous mode.

> I said that this is a
> weird layering violation, because the bridge's job is to notify the
> driver of addresses it needs to see, not for the driver to fish for
> them.
> As for "what will break and why". My current patch proposal is to only
> send to the CPU the addresses added via dev_uc_add and dev_mc_add,
> basically. The macvlan upper of the bridge would not be part of that
> list. My rhetorical question then becomes: whose fault is it that
> macvlan breaks? Mine for not tracking the bridge upper, or the bridge
> for not notifying me and just pretending that 'promisc' means 'the CPU
> will see all packets, including the ones I need'? Of course I think
> it's the bridge.
> 
> > >
> > > > When you are offloading the Linux data path to hardware this behavior is
> > > > not ideal as your hardware can handle much higher packet rates than the
> > > > CPU.
> > > >
> > > > In mlxsw we handle this by tracking the upper devices of the bridge. I
> > > > was hoping that with Ivan's patches we could add support for unicast
> > > > filtering in the bridge driver and program the MAC addresses to its FDB
> > > > with 'local' flag. Then the FDB entries would be notified via switchdev
> > > > to device drivers.
> > > >
> > >
> > > Yes, it should be possible to do that. I'll try and see how far I get.
> > >
> > > > >
> > > > > Multicast filtering was taken and reworked from Florian Fainelli's
> > > > > previous attempts, according to my own understanding of multicast
> > > > > forwarding requirements of an IGMP snooping switch. This is the part
> > > > > that needs the most extra work, not only in the DSA core but also in
> > > > > drivers. For this reason, I've left out of this patchset anything that
> > > > > has to do with driver-level configuration (since the audience is a bit
> > > > > larger than usual), as I'm trying to focus more on policy for now, and
> > > > > the series is already pretty huge.
> > > >
> > > > From what I remember, this is the logic in the Linux bridge:
> > > >
> > > > * Broadcast is always locally received
> > > > * Multicast is locally received if:
> > > >         * Snooping disabled
> > > >         * Snooping enabled:
> > > >                 * Bridge netdev is mrouter port
> > > >                 or
> > > >                 * Matches MDB entry with 'host_joined' indication
> > > >
> > > > >
> > > > > Florian Fainelli (3):
> > > > >   net: bridge: multicast: propagate br_mc_disabled_update() return
> > > > >   net: dsa: add ability to program unicast and multicast filters for CPU
> > > > >     port
> > > > >   net: dsa: wire up multicast IGMP snooping attribute notification
> > > > >
> > > > > Ivan Khoronzhuk (4):
> > > > >   net: core: dev_addr_lists: add VID to device address
> > > > >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> > > > >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> > > > >   ethernet: eth: add default vid len for all ethernet kind devices
> > > > >
> > > > > Vladimir Oltean (6):
> > > > >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> > > > >   net: dsa: don't use switchdev_notifier_fdb_info in
> > > > >     dsa_switchdev_event_work
> > > > >   net: dsa: mroute: don't panic the kernel if called without the prepare
> > > > >     phase
> > > > >   net: bridge: add port flags for host flooding
> > > > >   net: dsa: deal with new flooding port attributes from bridge
> > > > >   net: dsa: treat switchdev notifications for multicast router connected
> > > > >     to port
> > > > >
> > > > >  include/linux/if_bridge.h |   3 +
> > > > >  include/linux/if_vlan.h   |   2 +
> > > > >  include/linux/netdevice.h |  11 ++
> > > > >  include/net/dsa.h         |  17 +++
> > > > >  net/8021q/Kconfig         |  12 ++
> > > > >  net/8021q/vlan.c          |   3 +
> > > > >  net/8021q/vlan.h          |   2 +
> > > > >  net/8021q/vlan_core.c     |  25 ++++
> > > > >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> > > > >  net/bridge/br_if.c        |  40 ++++++
> > > > >  net/bridge/br_multicast.c |  21 ++-
> > > > >  net/bridge/br_switchdev.c |   4 +-
> > > > >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> > > > >  net/dsa/Kconfig           |   1 +
> > > > >  net/dsa/dsa2.c            |   6 +
> > > > >  net/dsa/dsa_priv.h        |  27 +++-
> > > > >  net/dsa/port.c            | 155 ++++++++++++++++----
> > > > >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> > > > >  net/dsa/switch.c          |  36 +++++
> > > > >  net/ethernet/eth.c        |  12 +-
> > > > >  20 files changed, 780 insertions(+), 131 deletions(-)
> > > > >
> > > > > --
> > > > > 2.25.1
> > > > >
> > >
> > > Thanks,
> > > -Vladimir
> 
> -Vladimir


* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-26 14:01         ` Ido Schimmel
@ 2020-05-27 11:36           ` Vladimir Oltean
  2020-05-28 14:37             ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-05-27 11:36 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Tue, 26 May 2020 at 17:02, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Mon, May 25, 2020 at 11:23:34PM +0300, Vladimir Oltean wrote:
> > Hi Ido,
> >
> > On Mon, 25 May 2020 at 22:48, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > On Sun, May 24, 2020 at 07:24:27PM +0300, Vladimir Oltean wrote:
> > > > On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
> > > > >
> > > > > On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > > > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > > >
> > > > > > This is a WIP series whose stated goal is to allow DSA and switchdev
> > > > > > drivers to flood less traffic to the CPU while keeping the same level of
> > > > > > functionality.
> > > > > >
> > > > > > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > > > > > that the operating system has expressed its interest in, either due to
> > > > > > those being the MAC addresses of one of the switch ports, or addresses
> > > > > > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > > > > > Then, the traffic which is not explicitly whitelisted is not sent by the
> > > > > > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > > > > > and would have dropped it anyway.
> > > > > >
> > > > > > The ground for these patches were the discussions surrounding RX
> > > > > > filtering with switchdev in general, as well as with DSA in particular:
> > > > > >
> > > > > > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > > > > > https://www.spinics.net/lists/netdev/msg651922.html
> > > > > > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > > > > > https://www.spinics.net/lists/netdev/msg634859.html
> > > > > > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > > > > > https://lkml.org/lkml/2019/8/29/255
> > > > > > LPC2019 - SwitchDev offload optimizations:
> > > > > > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> > > > > >
> > > > > > Unicast filtering comes to me as most important, and this includes
> > > > > > termination of MAC addresses corresponding to the network interfaces in
> > > > > > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > > > > > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > > > > > network interface addresses with a Virtual ID (typically VLAN ID). This
> > > > > > matches DSA switches perfectly because their FDB already contains keys
> > > > > > of the {DMAC, VID} form.
> > > > >
> > > > > Hi,
> > > > >
> > > > > I read through the series and I'm not sure how unicast filtering works.
> > > > > Instead of writing a very long mail I just created a script with
> > > > > comments. I think it's clearer that way. Note that this is not a made up
> > > > > configuration. It is used in setups involving VRRP / VXLAN, for example.
> > > > >
> > > > > ```
> > > > > #!/bin/bash
> > > > >
> > > > > ip netns add ns1
> > > > >
> > > > > ip -n ns1 link add name br0 type bridge vlan_filtering 1
> > > > > ip -n ns1 link add name dummy10 up type dummy
> > > > >
> > > > > ip -n ns1 link set dev dummy10 master br0
> > > > > ip -n ns1 link set dev br0 up
> > > > >
> > > > > ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> > > > > bridge -n ns1 vlan add vid 10 dev br0 self
> > > > >
> > > > > echo "Before adding macvlan:"
> > > > > echo "======================"
> > > > >
> > > > > echo -n "Promiscuous mode: "
> > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > >
> > > > > echo -e "\nvlan10's MAC is in br0's FDB:"
> > > > > bridge -n ns1 fdb show br0 vlan 10
> > > > >
> > > > > echo
> > > > > echo "After adding macvlan:"
> > > > > echo "====================="
> > > > >
> > > > > ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
> > > > >         type macvlan mode private
> > > > >
> > > > > echo -n "Promiscuous mode: "
> > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > >
> > > > > echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> > > > > bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> > > > > ```
> > > > >
> > > > > This is the output on my laptop (kernel 5.6.8):
> > > > >
> > > > > ```
> > > > > Before adding macvlan:
> > > > > ======================
> > > > > Promiscuous mode: 0
> > > > >
> > > > > vlan10's MAC is in br0's FDB:
> > > > > 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
> > > > >
> > > > > After adding macvlan:
> > > > > =====================
> > > > > Promiscuous mode: 1
> > > > >
> > > > > vlan10-v's MAC is not in br0's FDB:
> > > > > ```
> > > > >
> > > > > Basically, if the MAC of the VLAN device is not inherited from the
> > > > > bridge or you stack macvlans on top, then the bridge will go into
> > > > > promiscuous mode and it will locally receive all frames passing through
> > > > > it. It's not ideal, but it's a very old and simple behavior. It does not
> > > > > require you to track the VLAN associated with the MAC addresses, for
> > > > > example.
> > > > >
> > > >
> > > > This is a good point. I wasn't aware that the bridge 'gives up' with
> > > > macvlan upper devices, but if I understand correctly, we do have the
> > > > necessary tools to improve that.
> > > > But actually, I'm wondering if this simple behavior from the bridge is
> > > > correct.
> > >
> > > Why would it be incorrect?
> > >
> > > > As you, Jiri and Ivan pointed out in last summer's email
> > > > thread about the Linux bridge and promiscuous mode, putting the
> > > > interface in IFF_PROMISC is only going to guarantee acceptance through
> > > > the net device's RX filter, but not that the packets will go to the
> > > > CPU.
> > >
> > > IFF_PROMISC has no bearing on whether a packet should go to the CPU or
> > > not. It only influences the device's RX filter, like you said. If you
> > > only look at the software data path, the bridge being in promiscuous
> > > mode means that all received packets will be injected to the kernel's Rx
> > > path as if they were received through the bridge device. This includes,
> > > for example, an IPv4 packet with an unknown unicast MAC (does not
> > > correspond to your MAC). Such a packet will be later dropped by the IPv4
> > > code since it's not addressed to you:
> > >
> > > vi net/ipv4/ip_input.c +443
> > >
> > > We maintain the same behavior in the hardware data path. We don't have
> > > MAC filtering in the router like the software data path, so we only send
> > > to the router unicast MACs that correspond to the bridge's MAC and its
> > > uppers. If such packets later hit a local route (for example), then they
> > > will be trapped to the CPU, but the more common case is to simply route
> > > them through a different device due to a prefix / gateway route. These
> > > never reach the CPU.
> > >
> > > > So from that perspective, the current series would break things, so we
> > > > should definitely fix that and keep the {MAC, VLAN} pairs in the
> > > > bridge's local FDB.
> > >
> > > Not sure I follow. Can you explain what will break and why?
> > >
> >
> > I haven't done any further testing since yesterday, so my level of
> > (mis)understanding is the same. Let's hope at least I can explain
> > better this time.
> >
> > I guess what I didn't understand from your "macvlan upper whose MAC
> > address isn't inherited from bridge" is why does the bridge go in
> > promiscuous mode.
>
> Packets received from bridge slaves with DMAC equal to an active bridge
> upper (e.g., macvlan) should be received by this upper. When a packet is
> received from a bridge slave it performs FDB lookup. Since {VID, MAC}
> entries are not programmed for bridge uppers, packets addressed to such
> addresses will incur an FDB miss and be flooded. If the bridge is not in
> promiscuous mode, these packets will not be received via the bridge
> interface and will not reach the relevant upper device.
>
> > You said that it's so that the slave ports won't drop packets with
> > that DMAC,
>
> I did not say that. I explained above that if promiscuous mode is not
> enabled on the bridge interface itself (a soft device), the packet will
> not be received via the bridge interface and will not reach the upper
> device.
>
> > I said ok, yes the packets would get dropped without promisc, but also
> > promisc still doesn't mean the packets will land on the CPU. This is
> > one of the cases where the bridge puts an interface in promisc mode
> > with the intention of making the CPU see some frames,
>
> The statement "the bridge puts an interface in promisc mode with the
> intention of making the CPU see some frames" is incorrect. The bridge
> puts an interface in promiscuous mode so that the bridge will see all
> the frames received by this interface. If the bridge is offloaded,
> bridging happens in hardware and there is no reason to send all the
> frames to the CPU.
>
> > something which has been argued, in the context of switchdev, that was
> > never the case. You said that's all true, and that in mlxsw you're
> > giving the bridge a helping hand, by tracking the bridge's uppers in
> > order to keep something that works by accident in software working
> > with switchdev too.
>
> I never said that the software bridge works by accident. I explained
> why, to my understanding, the bridge works the way it's working and what
> can be done in order to prevent the bridge from going into promiscuous
> mode. It involves very careful (and error-prone?) tracking of the upper
> devices and their VLANs.
>
> Also, please differentiate between the bridge interface itself going
> into promiscuous mode and bridge slaves going into promiscuous mode.
>
> > I said that this is a
> > weird layering violation, because the bridge's job is to notify the
> > driver of addresses it needs to see, not for the driver to fish for
> > them.
> > As for "what will break and why". My current patch proposal is to only
> > send to the CPU the addresses added via dev_uc_add and dev_mc_add,
> > basically. The macvlan upper of the bridge would not be part of that
> > list. My rhetorical question then becomes: whose fault is it that
> > macvlan breaks? Mine for not tracking the bridge upper, or the bridge
> > for not notifying me and just pretending that 'promisc' means 'the CPU
> > will see all packets, including the ones I need'? Of course I think
> > it's the bridge.
> >

Ok, bridge promisc vs slave promisc is not a distinction I explicitly
made, but my point actually goes beyond that.
The bridge going into promiscuous mode will only help if the packets are
sent to the CPU in the first place, and it does nothing to ensure that
this will happen. So the bridge code works by accident.

I also have an additional question, only partially related.
Doesn't the SWITCHDEV_OBJ_ID_HOST_MDB mechanism conceptually overlap
with what we're trying to do here? If there are no objections, I would
replace it with dev_mc_sync_multiple, to be symmetric with what I'm
going to be changing for unicast. Only DSA and cpsw are using
SWITCHDEV_OBJ_ID_HOST_MDB anyway, and it looks like cpsw is using it
incorrectly.
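
To illustrate the first point, here is a toy model of the proposed
CPU-port filter combined with the bridge's promiscuity. It is only a
sketch in Python, not DSA code: SwitchCpuFilter, uc_add, reaches_cpu and
macvlan_receives are hypothetical names, and it assumes that a DMAC
missing the bridge FDB reaches an upper only if the hardware trapped it
and the bridge is promiscuous.

```python
# Toy model of the CPU-port whitelist proposed in this series.
# Not DSA code; the class and function names are hypothetical.

class SwitchCpuFilter:
    def __init__(self):
        # {(vid, mac)} pairs whitelisted towards the CPU.
        self.trapped = set()

    def uc_add(self, vid, mac):
        # Stands in for an address reaching the port via dev_uc_add().
        self.trapped.add((vid, mac))

    def reaches_cpu(self, vid, dmac):
        # Anything not whitelisted is handled in hardware only.
        return (vid, dmac) in self.trapped


def macvlan_receives(hw, vid, dmac, bridge_promisc):
    # Bridge promiscuity only matters for frames the CPU gets to see.
    return hw.reaches_cpu(vid, dmac) and bridge_promisc


hw = SwitchCpuFilter()
hw.uc_add(10, "42:bd:b1:cc:67:15")   # an address the stack told us about
macvlan_mac = "00:00:5e:00:01:01"    # never propagated to the switch port

# The bridge is promiscuous, yet the macvlan still sees nothing.
print(macvlan_receives(hw, 10, macvlan_mac, bridge_promisc=True))

# Once the address is notified to the driver, the frame is trapped.
hw.uc_add(10, macvlan_mac)
print(macvlan_receives(hw, 10, macvlan_mac, bridge_promisc=True))
```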

> > > >
> > > > > When you are offloading the Linux data path to hardware this behavior is
> > > > > not ideal as your hardware can handle much higher packet rates than the
> > > > > CPU.
> > > > >
> > > > > In mlxsw we handle this by tracking the upper devices of the bridge. I
> > > > > was hoping that with Ivan's patches we could add support for unicast
> > > > > filtering in the bridge driver and program the MAC addresses to its FDB
> > > > > with 'local' flag. Then the FDB entries would be notified via switchdev
> > > > > to device drivers.
> > > > >
> > > >
> > > > Yes, it should be possible to do that. I'll try and see how far I get.
> > > >
> > > > > >
> > > > > > Multicast filtering was taken and reworked from Florian Fainelli's
> > > > > > previous attempts, according to my own understanding of multicast
> > > > > > forwarding requirements of an IGMP snooping switch. This is the part
> > > > > > that needs the most extra work, not only in the DSA core but also in
> > > > > > drivers. For this reason, I've left out of this patchset anything that
> > > > > > has to do with driver-level configuration (since the audience is a bit
> > > > > > larger than usual), as I'm trying to focus more on policy for now, and
> > > > > > the series is already pretty huge.
> > > > >
> > > > > From what I remember, this is the logic in the Linux bridge:
> > > > >
> > > > > * Broadcast is always locally received
> > > > > * Multicast is locally received if:
> > > > >         * Snooping disabled
> > > > >         * Snooping enabled:
> > > > >                 * Bridge netdev is mrouter port
> > > > >                 or
> > > > >                 * Matches MDB entry with 'host_joined' indication
> > > > >
> > > > > >
> > > > > > Florian Fainelli (3):
> > > > > >   net: bridge: multicast: propagate br_mc_disabled_update() return
> > > > > >   net: dsa: add ability to program unicast and multicast filters for CPU
> > > > > >     port
> > > > > >   net: dsa: wire up multicast IGMP snooping attribute notification
> > > > > >
> > > > > > Ivan Khoronzhuk (4):
> > > > > >   net: core: dev_addr_lists: add VID to device address
> > > > > >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> > > > > >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> > > > > >   ethernet: eth: add default vid len for all ethernet kind devices
> > > > > >
> > > > > > Vladimir Oltean (6):
> > > > > >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> > > > > >   net: dsa: don't use switchdev_notifier_fdb_info in
> > > > > >     dsa_switchdev_event_work
> > > > > >   net: dsa: mroute: don't panic the kernel if called without the prepare
> > > > > >     phase
> > > > > >   net: bridge: add port flags for host flooding
> > > > > >   net: dsa: deal with new flooding port attributes from bridge
> > > > > >   net: dsa: treat switchdev notifications for multicast router connected
> > > > > >     to port
> > > > > >
> > > > > >  include/linux/if_bridge.h |   3 +
> > > > > >  include/linux/if_vlan.h   |   2 +
> > > > > >  include/linux/netdevice.h |  11 ++
> > > > > >  include/net/dsa.h         |  17 +++
> > > > > >  net/8021q/Kconfig         |  12 ++
> > > > > >  net/8021q/vlan.c          |   3 +
> > > > > >  net/8021q/vlan.h          |   2 +
> > > > > >  net/8021q/vlan_core.c     |  25 ++++
> > > > > >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> > > > > >  net/bridge/br_if.c        |  40 ++++++
> > > > > >  net/bridge/br_multicast.c |  21 ++-
> > > > > >  net/bridge/br_switchdev.c |   4 +-
> > > > > >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> > > > > >  net/dsa/Kconfig           |   1 +
> > > > > >  net/dsa/dsa2.c            |   6 +
> > > > > >  net/dsa/dsa_priv.h        |  27 +++-
> > > > > >  net/dsa/port.c            | 155 ++++++++++++++++----
> > > > > >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> > > > > >  net/dsa/switch.c          |  36 +++++
> > > > > >  net/ethernet/eth.c        |  12 +-
> > > > > >  20 files changed, 780 insertions(+), 131 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 2.25.1
> > > > > >
> > > >
> > > > Thanks,
> > > > -Vladimir
> >
> > -Vladimir

Thanks,
-Vladimir


* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-27 11:36           ` Vladimir Oltean
@ 2020-05-28 14:37             ` Ido Schimmel
  2020-07-20 10:00               ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-05-28 14:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Wed, May 27, 2020 at 02:36:53PM +0300, Vladimir Oltean wrote:
> On Tue, 26 May 2020 at 17:02, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Mon, May 25, 2020 at 11:23:34PM +0300, Vladimir Oltean wrote:
> > > Hi Ido,
> > >
> > > On Mon, 25 May 2020 at 22:48, Ido Schimmel <idosch@idosch.org> wrote:
> > > >
> > > > On Sun, May 24, 2020 at 07:24:27PM +0300, Vladimir Oltean wrote:
> > > > > On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
> > > > > >
> > > > > > On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > > > > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > > > >
> > > > > > > This is a WIP series whose stated goal is to allow DSA and switchdev
> > > > > > > drivers to flood less traffic to the CPU while keeping the same level of
> > > > > > > functionality.
> > > > > > >
> > > > > > > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > > > > > > that the operating system has expressed its interest in, either due to
> > > > > > > those being the MAC addresses of one of the switch ports, or addresses
> > > > > > > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > > > > > > Then, the traffic which is not explicitly whitelisted is not sent by the
> > > > > > > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > > > > > > and would have dropped it anyway.
> > > > > > >
> > > > > > > The ground for these patches were the discussions surrounding RX
> > > > > > > filtering with switchdev in general, as well as with DSA in particular:
> > > > > > >
> > > > > > > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > > > > > > https://www.spinics.net/lists/netdev/msg651922.html
> > > > > > > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > > > > > > https://www.spinics.net/lists/netdev/msg634859.html
> > > > > > > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > > > > > > https://lkml.org/lkml/2019/8/29/255
> > > > > > > LPC2019 - SwitchDev offload optimizations:
> > > > > > > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> > > > > > >
> > > > > > > Unicast filtering comes to me as most important, and this includes
> > > > > > > termination of MAC addresses corresponding to the network interfaces in
> > > > > > > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > > > > > > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > > > > > > network interface addresses with a Virtual ID (typically VLAN ID). This
> > > > > > > matches DSA switches perfectly because their FDB already contains keys
> > > > > > > of the {DMAC, VID} form.
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I read through the series and I'm not sure how unicast filtering works.
> > > > > > Instead of writing a very long mail I just created a script with
> > > > > > comments. I think it's clearer that way. Note that this is not a made up
> > > > > > configuration. It is used in setups involving VRRP / VXLAN, for example.
> > > > > >
> > > > > > ```
> > > > > > #!/bin/bash
> > > > > >
> > > > > > ip netns add ns1
> > > > > >
> > > > > > ip -n ns1 link add name br0 type bridge vlan_filtering 1
> > > > > > ip -n ns1 link add name dummy10 up type dummy
> > > > > >
> > > > > > ip -n ns1 link set dev dummy10 master br0
> > > > > > ip -n ns1 link set dev br0 up
> > > > > >
> > > > > > ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> > > > > > bridge -n ns1 vlan add vid 10 dev br0 self
> > > > > >
> > > > > > echo "Before adding macvlan:"
> > > > > > echo "======================"
> > > > > >
> > > > > > echo -n "Promiscuous mode: "
> > > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > > >
> > > > > > echo -e "\nvlan10's MAC is in br0's FDB:"
> > > > > > bridge -n ns1 fdb show br0 vlan 10
> > > > > >
> > > > > > echo
> > > > > > echo "After adding macvlan:"
> > > > > > echo "====================="
> > > > > >
> > > > > > ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
> > > > > >         type macvlan mode private
> > > > > >
> > > > > > echo -n "Promiscuous mode: "
> > > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > > >
> > > > > > echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> > > > > > bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> > > > > > ```
> > > > > >
> > > > > > This is the output on my laptop (kernel 5.6.8):
> > > > > >
> > > > > > ```
> > > > > > Before adding macvlan:
> > > > > > ======================
> > > > > > Promiscuous mode: 0
> > > > > >
> > > > > > vlan10's MAC is in br0's FDB:
> > > > > > 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
> > > > > >
> > > > > > After adding macvlan:
> > > > > > =====================
> > > > > > Promiscuous mode: 1
> > > > > >
> > > > > > vlan10-v's MAC is not in br0's FDB:
> > > > > > ```
> > > > > >
> > > > > > Basically, if the MAC of the VLAN device is not inherited from the
> > > > > > bridge or you stack macvlans on top, then the bridge will go into
> > > > > > promiscuous mode and it will locally receive all frames passing through
> > > > > > it. It's not ideal, but it's a very old and simple behavior. It does not
> > > > > > require you to track the VLAN associated with the MAC addresses, for
> > > > > > example.
> > > > > >
> > > > >
> > > > > This is a good point. I wasn't aware that the bridge 'gives up' with
> > > > > macvlan upper devices, but if I understand correctly, we do have the
> > > > > necessary tools to improve that.
> > > > > But actually, I'm wondering if this simple behavior from the bridge is
> > > > > correct.
> > > >
> > > > Why would it be incorrect?
> > > >
> > > > > As you, Jiri and Ivan pointed out in last summer's email
> > > > > thread about the Linux bridge and promiscuous mode, putting the
> > > > > interface in IFF_PROMISC is only going to guarantee acceptance through
> > > > > the net device's RX filter, but not that the packets will go to the
> > > > > CPU.
> > > >
> > > > IFF_PROMISC has no bearing on whether a packet should go to the CPU or
> > > > not. It only influences the device's RX filter, like you said. If you
> > > > only look at the software data path, the bridge being in promiscuous
> > > > mode means that all received packets will be injected to the kernel's Rx
> > > > path as if they were received through the bridge device. This includes,
> > > > for example, an IPv4 packet with an unknown unicast MAC (does not
> > > > correspond to your MAC). Such a packet will be later dropped by the IPv4
> > > > code since it's not addressed to you:
> > > >
> > > > vi net/ipv4/ip_input.c +443
> > > >
> > > > We maintain the same behavior in the hardware data path. We don't have
> > > > MAC filtering in the router like the software data path, so we only send
> > > > to the router unicast MACs that correspond to the bridge's MAC and its
> > > > uppers. If such packets later hit a local route (for example), then they
> > > > will be trapped to the CPU, but the more common case is to simply route
> > > > them through a different device due to a prefix / gateway route. These
> > > > never reach the CPU.
> > > >
> > > > > So from that perspective, the current series would break things, so we
> > > > > should definitely fix that and keep the {MAC, VLAN} pairs in the
> > > > > bridge's local FDB.
> > > >
> > > > Not sure I follow. Can you explain what will break and why?
> > > >
> > >
> > > I haven't done any further testing since yesterday, so my level of
> > > (mis)understanding is the same. Let's hope at least I can explain
> > > better this time.
> > >
> > > I guess what I didn't understand from your "macvlan upper whose MAC
> > > address isn't inherited from bridge" is why does the bridge go in
> > > promiscuous mode.
> >
> > Packets received from bridge slaves with DMAC equal to an active bridge
> > upper (e.g., macvlan) should be received by this upper. When a packet is
> > received from a bridge slave it performs FDB lookup. Since {VID, MAC}
> > entries are not programmed for bridge uppers, packets addressed to such
> > addresses will incur an FDB miss and be flooded. If the bridge is not in
> > promiscuous mode, these packets will not be received via the bridge
> > interface and will not reach the relevant upper device.
> >
> > > You said that it's so that the slave ports won't drop packets with
> > > that DMAC,
> >
> > I did not say that. I explained above that if promiscuous mode is not
> > enabled on the bridge interface itself (a soft device), the packet will
> > not be received via the bridge interface and will not reach the upper
> > device.
> >
> > > I said ok, yes the packets would get dropped without promisc, but also
> > > promisc still doesn't mean the packets will land on the CPU. This is
> > > one of the cases where the bridge puts an interface in promisc mode
> > > with the intention of making the CPU see some frames,
> >
> > The statement "the bridge puts an interface in promisc mode with the
> > intention of making the CPU see some frames" is incorrect. The bridge
> > puts an interface in promiscuous mode so that the bridge will see all
> > the frames received by this interface. If the bridge is offloaded,
> > bridging happens in hardware and there is no reason to send all the
> > frames to the CPU.
> >
> > > something which has been argued, in the context of switchdev, that was
> > > never the case. You said that's all true, and that in mlxsw you're
> > > giving the bridge a helping hand, by tracking the bridge's uppers in
> > > order to keep something that works by accident in software working
> > > with switchdev too.
> >
> > I never said that the software bridge works by accident. I explained
> > why, to my understanding, the bridge works the way it's working and what
> > can be done in order to prevent the bridge from going into promiscuous
> > mode. It involves very careful (and error-prone?) tracking of the upper
> > devices and their VLANs.
> >
> > Also, please differentiate between the bridge interface itself going
> > into promiscuous mode and bridge slaves going into promiscuous mode.
> >
> > > I said that this is a
> > > weird layering violation, because the bridge's job is to notify the
> > > driver of addresses it needs to see, not for the driver to fish for
> > > them.
> > > As for "what will break and why". My current patch proposal is to only
> > > send to the CPU the addresses added via dev_uc_add and dev_mc_add,
> > > basically. The macvlan upper of the bridge would not be part of that
> > > list. My rhetorical question then becomes: whose fault is it that
> > > macvlan breaks? Mine for not tracking the bridge upper, or the bridge
> > > for not notifying me and just pretending that 'promisc' means 'the CPU
> > > will see all packets, including the ones I need'? Of course I think
> > > it's the bridge.
> > >
> 
> Ok, bridge promisc vs slave promisc is not a difference I explicitly
> made, but my point is actually beyond that.
> The bridge going in promisc will only help if the packets are sent to
> the CPU in the first place. And it does nothing to ensure that that
> will happen. So the bridge code works by accident.

It's not beyond your point and the bridge code does not work by
accident. When the bridge interface is in promiscuous mode every packet
is injected to the kernel's Rx path as if it was accepted by the bridge
interface. Packets then reach the protocol handlers. In the case of
IPv4/IPv6, the packets go to ip_rcv() / ipv6_rcv() and get routed.
Packets with a unicast destination MAC that does not correspond to that
of the receiving interface are dropped.
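
As a rough userspace model of the software path described above (this is
not the kernel code; the enum and function names are made up for
illustration): the Ethernet layer classifies a frame by its destination
MAC relative to the receiving device, and the IP receive path drops
anything classified as addressed to another host.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical names, loosely mirroring the kernel's PACKET_* types. */
enum pkt_type { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/* Classify a frame by destination MAC, relative to the receiving device. */
static enum pkt_type classify_dmac(const uint8_t dmac[ETH_ALEN],
				   const uint8_t dev_addr[ETH_ALEN])
{
	static const uint8_t bcast[ETH_ALEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	if (!memcmp(dmac, bcast, ETH_ALEN))
		return PKT_BROADCAST;
	if (dmac[0] & 0x01)		/* group bit set */
		return PKT_MULTICAST;
	if (!memcmp(dmac, dev_addr, ETH_ALEN))
		return PKT_HOST;
	return PKT_OTHERHOST;
}

/* The IP receive path refuses frames that were addressed to another host. */
static bool ip_layer_accepts(enum pkt_type type)
{
	return type != PKT_OTHERHOST;
}
```

So promiscuous mode only gets the frame as far as the protocol handler;
the drop still happens, just one layer up.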

When you look at it from a hardware offload perspective, not every packet
received by the bridge interface should reach the CPU. Actually, most
should not reach it. Otherwise it would mean that every routed packet
would need to go to the CPU, which is not feasible. If you can't perform
routing in hardware, then yes, you need to send such packets to the CPU.

In mlxsw we can't perform MAC filtering in the router like in the
software data path, so in order not to route packets we should not, we
only send to the router packets with destination MACs that correspond to
that of the bridge or one of its uppers. We don't flood all unknown
unicast packets there.

In the case of hardware offload it's relatively easy to do this sort of
tracking because only a limited set of upper device topologies is
actually supported. I'm not sure how feasible it is with every
combination of upper devices supported by the kernel. It seems easiest
to just put the bridge interface in promiscuous mode and let upper
layers perform the filtering. Like it is today.

> I also have an additional question, only partially related.
> Doesn't the SWITCHDEV_OBJ_ID_HOST_MDB mechanism conceptually overlap
> with what we're trying to do here?

Yes, it is similar, but it's easier with multicast because you need to
send IGMP / MLD Membership reports if you are interested in a certain
address. So what the bridge does is intercept such packets when they
are sent through it and then call br_multicast_rcv() with a NULL port,
which indicates that the host itself is interested in the address.
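
A rough sketch of how that host-joined indication feeds the bridge's
local-receive decision for multicast, following the rules quoted further
down in this mail (broadcast aside). The struct and function names are
made up for illustration; this is not the bridge code.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the bridge state that matters here. */
struct br_mc_state {
	bool snooping_enabled;
	bool bridge_is_mrouter;	/* bridge netdev acts as an mrouter port */
};

/*
 * Should a multicast frame be received locally by the bridge device?
 * host_joined is true when the group matches an MDB entry that carries
 * the 'host_joined' indication.
 */
static bool mc_locally_received(const struct br_mc_state *br, bool host_joined)
{
	if (!br->snooping_enabled)
		return true;

	return br->bridge_is_mrouter || host_joined;
}
```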

> If there are no objections I would replace it with
> dev_mc_sync_multiple, to be symmetric with what I'm going to be
> changing for unicast. Only DSA and cpsw are using
> SWITCHDEV_OBJ_ID_HOST_MDB anyway, and looks like cpsw is using it
> wrong.

Not sure what you plan for unicast, but I'm quite happy with the work
Andrew did with SWITCHDEV_OBJ_ID_HOST_MDB.

> 
> > > > >
> > > > > > When you are offloading the Linux data path to hardware this behavior is
> > > > > > not ideal as your hardware can handle much higher packet rates than the
> > > > > > CPU.
> > > > > >
> > > > > > In mlxsw we handle this by tracking the upper devices of the bridge. I
> > > > > > was hoping that with Ivan's patches we could add support for unicast
> > > > > > filtering in the bridge driver and program the MAC addresses to its FDB
> > > > > > with 'local' flag. Then the FDB entries would be notified via switchdev
> > > > > > to device drivers.
> > > > > >
> > > > >
> > > > > Yes, it should be possible to do that. I'll try and see how far I get.
> > > > >
> > > > > > >
> > > > > > > Multicast filtering was taken and reworked from Florian Fainelli's
> > > > > > > previous attempts, according to my own understanding of multicast
> > > > > > > forwarding requirements of an IGMP snooping switch. This is the part
> > > > > > > that needs the most extra work, not only in the DSA core but also in
> > > > > > > drivers. For this reason, I've left out of this patchset anything that
> > > > > > > has to do with driver-level configuration (since the audience is a bit
> > > > > > > larger than usual), as I'm trying to focus more on policy for now, and
> > > > > > > the series is already pretty huge.
> > > > > >
> > > > > > From what I remember, this is the logic in the Linux bridge:
> > > > > >
> > > > > > * Broadcast is always locally received
> > > > > > * Multicast is locally received if:
> > > > > >         * Snooping disabled
> > > > > >         * Snooping enabled:
> > > > > >                 * Bridge netdev is mrouter port
> > > > > >                 or
> > > > > >                 * Matches MDB entry with 'host_joined' indication
> > > > > >
> > > > > > >
> > > > > > > Florian Fainelli (3):
> > > > > > >   net: bridge: multicast: propagate br_mc_disabled_update() return
> > > > > > >   net: dsa: add ability to program unicast and multicast filters for CPU
> > > > > > >     port
> > > > > > >   net: dsa: wire up multicast IGMP snooping attribute notification
> > > > > > >
> > > > > > > Ivan Khoronzhuk (4):
> > > > > > >   net: core: dev_addr_lists: add VID to device address
> > > > > > >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> > > > > > >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> > > > > > >   ethernet: eth: add default vid len for all ethernet kind devices
> > > > > > >
> > > > > > > Vladimir Oltean (6):
> > > > > > >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> > > > > > >   net: dsa: don't use switchdev_notifier_fdb_info in
> > > > > > >     dsa_switchdev_event_work
> > > > > > >   net: dsa: mroute: don't panic the kernel if called without the prepare
> > > > > > >     phase
> > > > > > >   net: bridge: add port flags for host flooding
> > > > > > >   net: dsa: deal with new flooding port attributes from bridge
> > > > > > >   net: dsa: treat switchdev notifications for multicast router connected
> > > > > > >     to port
> > > > > > >
> > > > > > >  include/linux/if_bridge.h |   3 +
> > > > > > >  include/linux/if_vlan.h   |   2 +
> > > > > > >  include/linux/netdevice.h |  11 ++
> > > > > > >  include/net/dsa.h         |  17 +++
> > > > > > >  net/8021q/Kconfig         |  12 ++
> > > > > > >  net/8021q/vlan.c          |   3 +
> > > > > > >  net/8021q/vlan.h          |   2 +
> > > > > > >  net/8021q/vlan_core.c     |  25 ++++
> > > > > > >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> > > > > > >  net/bridge/br_if.c        |  40 ++++++
> > > > > > >  net/bridge/br_multicast.c |  21 ++-
> > > > > > >  net/bridge/br_switchdev.c |   4 +-
> > > > > > >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> > > > > > >  net/dsa/Kconfig           |   1 +
> > > > > > >  net/dsa/dsa2.c            |   6 +
> > > > > > >  net/dsa/dsa_priv.h        |  27 +++-
> > > > > > >  net/dsa/port.c            | 155 ++++++++++++++++----
> > > > > > >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> > > > > > >  net/dsa/switch.c          |  36 +++++
> > > > > > >  net/ethernet/eth.c        |  12 +-
> > > > > > >  20 files changed, 780 insertions(+), 131 deletions(-)
> > > > > > >
> > > > > > > --
> > > > > > > 2.25.1
> > > > > > >
> > > > >
> > > > > Thanks,
> > > > > -Vladimir
> > >
> > > -Vladimir
> 
> Thanks,
> -Vladimir


* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-05-28 14:37             ` Ido Schimmel
@ 2020-07-20 10:00               ` Vladimir Oltean
  2020-07-27 16:56                 ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-07-20 10:00 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Thu, May 28, 2020 at 05:37:18PM +0300, Ido Schimmel wrote:
> On Wed, May 27, 2020 at 02:36:53PM +0300, Vladimir Oltean wrote:
> > On Tue, 26 May 2020 at 17:02, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > On Mon, May 25, 2020 at 11:23:34PM +0300, Vladimir Oltean wrote:
> > > > Hi Ido,
> > > >
> > > > On Mon, 25 May 2020 at 22:48, Ido Schimmel <idosch@idosch.org> wrote:
> > > > >
> > > > > On Sun, May 24, 2020 at 07:24:27PM +0300, Vladimir Oltean wrote:
> > > > > > On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
> > > > > > >
> > > > > > > On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > > > > > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > > > > >
> > > > > > > > This is a WIP series whose stated goal is to allow DSA and switchdev
> > > > > > > > drivers to flood less traffic to the CPU while keeping the same level of
> > > > > > > > functionality.
> > > > > > > >
> > > > > > > > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > > > > > > > that the operating system has expressed its interest in, either due to
> > > > > > > > those being the MAC addresses of one of the switch ports, or addresses
> > > > > > > > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > > > > > > > Then, the traffic which is not explicitly whitelisted is not sent by the
> > > > > > > > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > > > > > > > and would have dropped it anyway.
> > > > > > > >
> > > > > > > > The ground for these patches were the discussions surrounding RX
> > > > > > > > filtering with switchdev in general, as well as with DSA in particular:
> > > > > > > >
> > > > > > > > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > > > > > > > https://www.spinics.net/lists/netdev/msg651922.html
> > > > > > > > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > > > > > > > https://www.spinics.net/lists/netdev/msg634859.html
> > > > > > > > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > > > > > > > https://lkml.org/lkml/2019/8/29/255
> > > > > > > > LPC2019 - SwitchDev offload optimizations:
> > > > > > > > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> > > > > > > >
> > > > > > > > Unicast filtering comes to me as most important, and this includes
> > > > > > > > termination of MAC addresses corresponding to the network interfaces in
> > > > > > > > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > > > > > > > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > > > > > > > network interface addresses with a Virtual ID (typically VLAN ID). This
> > > > > > > > matches DSA switches perfectly because their FDB already contains keys
> > > > > > > > of the {DMAC, VID} form.
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I read through the series and I'm not sure how unicast filtering works.
> > > > > > > Instead of writing a very long mail I just created a script with
> > > > > > > comments. I think it's clearer that way. Note that this is not a made up
> > > > > > > configuration. It is used in setups involving VRRP / VXLAN, for example.
> > > > > > >
> > > > > > > ```
> > > > > > > #!/bin/bash
> > > > > > >
> > > > > > > ip netns add ns1
> > > > > > >
> > > > > > > ip -n ns1 link add name br0 type bridge vlan_filtering 1
> > > > > > > ip -n ns1 link add name dummy10 up type dummy
> > > > > > >
> > > > > > > ip -n ns1 link set dev dummy10 master br0
> > > > > > > ip -n ns1 link set dev br0 up
> > > > > > >
> > > > > > > ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> > > > > > > bridge -n ns1 vlan add vid 10 dev br0 self
> > > > > > >
> > > > > > > echo "Before adding macvlan:"
> > > > > > > echo "======================"
> > > > > > >
> > > > > > > echo -n "Promiscuous mode: "
> > > > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > > > >
> > > > > > > echo -e "\nvlan10's MAC is in br0's FDB:"
> > > > > > > bridge -n ns1 fdb show br0 vlan 10
> > > > > > >
> > > > > > > echo
> > > > > > > echo "After adding macvlan:"
> > > > > > > echo "====================="
> > > > > > >
> > > > > > > ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
> > > > > > >         type macvlan mode private
> > > > > > >
> > > > > > > echo -n "Promiscuous mode: "
> > > > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > > > >
> > > > > > > echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> > > > > > > bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> > > > > > > ```
> > > > > > >
> > > > > > > This is the output on my laptop (kernel 5.6.8):
> > > > > > >
> > > > > > > ```
> > > > > > > Before adding macvlan:
> > > > > > > ======================
> > > > > > > Promiscuous mode: 0
> > > > > > >
> > > > > > > vlan10's MAC is in br0's FDB:
> > > > > > > 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
> > > > > > >
> > > > > > > After adding macvlan:
> > > > > > > =====================
> > > > > > > Promiscuous mode: 1
> > > > > > >
> > > > > > > vlan10-v's MAC is not in br0's FDB:
> > > > > > > ```
> > > > > > >
> > > > > > > Basically, if the MAC of the VLAN device is not inherited from the
> > > > > > > bridge or you stack macvlans on top, then the bridge will go into
> > > > > > > promiscuous mode and it will locally receive all frames passing through
> > > > > > > it. It's not ideal, but it's a very old and simple behavior. It does not
> > > > > > > require you to track the VLAN associated with the MAC addresses, for
> > > > > > > example.
> > > > > > >
> > > > > >
> > > > > > This is a good point. I wasn't aware that the bridge 'gives up' with
> > > > > > macvlan upper devices, but if I understand correctly, we do have the
> > > > > > necessary tools to improve that.
> > > > > > But actually, I'm wondering if this simple behavior from the bridge is
> > > > > > correct.
> > > > >
> > > > > Why would it be incorrect?
> > > > >
> > > > > > As you, Jiri and Ivan pointed out in last summer's email
> > > > > > thread about the Linux bridge and promiscuous mode, putting the
> > > > > > interface in IFF_PROMISC is only going to guarantee acceptance through
> > > > > > the net device's RX filter, but not that the packets will go to the
> > > > > > CPU.
> > > > >
> > > > > IFF_PROMISC has no bearing on whether a packet should go to the CPU or
> > > > > not. It only influences the device's RX filter, like you said. If you
> > > > > only look at the software data path, the bridge being in promiscuous
> > > > > mode means that all received packets will be injected to the kernel's Rx
> > > > > path as if they were received through the bridge device. This includes,
> > > > > for example, an IPv4 packet with an unknown unicast MAC (does not
> > > > > correspond to your MAC). Such a packet will be later dropped by the IPv4
> > > > > code since it's not addressed to you:
> > > > >
> > > > > vi net/ipv4/ip_input.c +443
> > > > >
> > > > > We maintain the same behavior in the hardware data path. We don't have
> > > > > MAC filtering in the router like the software data path, so we only send
> > > > > to the router unicast MACs that correspond to the bridge's MAC and its
> > > > > uppers. If such packets later hit a local route (for example), then they
> > > > > will be trapped to the CPU, but the more common case is to simply route
> > > > > them through a different device due to a prefix / gateway route. These
> > > > > never reach the CPU.
> > > > >
> > > > > > So from that perspective, the current series would break things, so we
> > > > > > should definitely fix that and keep the {MAC, VLAN} pairs in the
> > > > > > bridge's local FDB.
> > > > >
> > > > > Not sure I follow. Can you explain what will break and why?
> > > > >
> > > >
> > > > I haven't done any further testing since yesterday, so my level of
> > > > (mis)understanding is the same. Let's hope at least I can explain
> > > > better this time.
> > > >
> > > > I guess what I didn't understand from your "macvlan upper whose MAC
> > > > address isn't inherited from bridge" is why does the bridge go in
> > > > promiscuous mode.
> > >
> > > Packets received from bridge slaves with DMAC equal to an active bridge
> > > upper (e.g., macvlan) should be received by this upper. When a packet is
> > > received from a bridge slave it performs FDB lookup. Since {VID, MAC}
> > > entries are not programmed for bridge uppers, packets addressed to such
> > > addresses will incur an FDB miss and be flooded. If the bridge is not in
> > > promiscuous mode, these packets will not be received via the bridge
> > > interface and will not reach the relevant upper device.
> > >
> > > > You said that it's so that the slave ports won't drop packets with
> > > > that DMAC,
> > >
> > > I did not say that. I explained above that if promiscuous mode is not
> > > enabled on the bridge interface itself (a soft device), the packet will
> > > not be received via the bridge interface and will not reach the upper
> > > device.
> > >
> > > > I said ok, yes the packets would get dropped without promisc, but also
> > > > promisc still doesn't mean the packets will land on the CPU. This is
> > > > one of the cases where the bridge puts an interface in promisc mode
> > > > with the intention of making the CPU see some frames,
> > >
> > > The statement "the bridge puts an interface in promisc mode with the
> > > intention of making the CPU see some frames" is incorrect. The bridge
> > > puts an interface in promiscuous mode so that the bridge will see all
> > > the frames received by this interface. If the bridge is offloaded,
> > > bridging happens in hardware and there is no reason to send all the
> > > frames to the CPU.
> > >
> > > > something which has been argued, in the context of switchdev, that was
> > > > never the case. You said that's all true, and that in mlxsw you're
> > > > giving the bridge a helping hand, by tracking the bridge's uppers in
> > > > order to keep something that works by accident in software working
> > > > with switchdev too.
> > >
> > > I never said that the software bridge works by accident. I explained
> > > why, to my understanding, the bridge works the way it's working and what
> > > can be done in order to prevent the bridge from going into promiscuous
> > > mode. It involves very careful (and error-prone?) tracking of the upper
> > > devices and their VLANs.
> > >
> > > Also, please differentiate between the bridge interface itself going
> > > into promiscuous mode and bridge slaves going into promiscuous mode.
> > >
> > > > I said that this is a
> > > > weird layering violation, because the bridge's job is to notify the
> > > > driver of addresses it needs to see, not for the driver to fish for
> > > > them.
> > > > As for "what will break and why". My current patch proposal is to only
> > > > send to the CPU the addresses added via dev_uc_add and dev_mc_add,
> > > > basically. The macvlan upper of the bridge would not be part of that
> > > > list. My rhetorical question then becomes: whose fault is it that
> > > > macvlan breaks? Mine for not tracking the bridge upper, or the bridge
> > > > for not notifying me and just pretending that 'promisc' means 'the CPU
> > > > will see all packets, including the ones I need'? Of course I think
> > > > it's the bridge.
> > > >
> > 
> > Ok, bridge promisc vs slave promisc is not a difference I explicitly
> > made, but my point is actually beyond that.
> > The bridge going in promisc will only help if the packets are sent to
> > the CPU in the first place. And it does nothing to ensure that that
> > will happen. So the bridge code works by accident.
> 
> It's not beyond your point and the bridge code does not work by
> accident. When the bridge interface is in promiscuous mode every packet
> is injected to the kernel's Rx path as if it was accepted by the bridge
> interface. Packets then reach the protocol handlers. In the case of
> IPv4/IPv6, the packets go to ip_rcv() / ipv6_rcv() and get routed.
> Packets with a unicast destination MAC that does not correspond to that
> of the receiving interface are dropped.
> 

Hi Ido,

I still maintain that the bridge and the network stack in general don't
have a proper design for managing a switchdev's filter of what goes to
the CPU. Let me explain.

The whole purpose of my patch series is to remove the CPU port from the
flood domain of all switchdev net_devices. That means, when an unknown
unicast packet ingresses, it will be flooded but not to the CPU. For
frames that the CPU wants to see, there should be a universal mechanism
for it to whitelist them, by {DMAC, VID}. Otherwise, things don't scale.
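
To make the intended forwarding behavior concrete, here is a minimal
userspace sketch. It is only a model: the port numbering, the struct
layout and the function name are assumptions for illustration, and real
hardware tables obviously look different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define PORT(n)		(1u << (n))
#define CPU_PORT	PORT(0)		/* assumption: port 0 faces the CPU */

/* One FDB entry, keyed by {DMAC, VID}. */
struct fdb_entry {
	uint8_t mac[ETH_ALEN];
	uint16_t vid;
	uint32_t dest_mask;
};

/*
 * Return the set of egress ports for a unicast frame. Known addresses go
 * where the FDB says (possibly the CPU); unknown ones are flooded to every
 * user port except the ingress port, and never to the CPU.
 */
static uint32_t forward_unicast(const struct fdb_entry *fdb, size_t n,
				const uint8_t dmac[ETH_ALEN], uint16_t vid,
				uint32_t ingress, uint32_t user_ports)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (fdb[i].vid == vid && !memcmp(fdb[i].mac, dmac, ETH_ALEN))
			return fdb[i].dest_mask;

	return user_ports & ~ingress & ~CPU_PORT;
}
```

With a single entry {MAC, VID 10} -> CPU_PORT, a frame for that MAC in
VLAN 10 is terminated, while the same MAC in another VLAN, or any other
unknown unicast, is flooded to the user ports only.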

There is one such mechanism already, and that is dev_uc_add(). It used
to install an address into a device's RX filter using DMAC only, and
Ivan Khoronzhuk's patches have added a new dev_vid_uc_add() that allows
additional filtering by VLAN.

That mechanism used to have a meaning. Its meaning was: for a
non-promisc net_device, don't drop unicast frames having a MAC address
equal to the argument passed to dev_uc_add().

For a promiscuous net_device, that is not needed, because promisc means
that no frames should be dropped for failing to match a MAC/VID filter.
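
In other words, the classic end-station semantics look roughly like the
model below. This is a sketch only; struct nic_filter and
nic_accepts_unicast() are hypothetical names, and uc_list stands in for
the addresses installed through dev_uc_add().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical model of an end-station NIC's unicast RX filter. */
struct nic_filter {
	bool promisc;
	uint8_t dev_addr[ETH_ALEN];		/* primary MAC */
	const uint8_t (*uc_list)[ETH_ALEN];	/* secondary MACs */
	size_t uc_count;
};

static bool nic_accepts_unicast(const struct nic_filter *f,
				const uint8_t dmac[ETH_ALEN])
{
	size_t i;

	if (f->promisc)
		return true;	/* nothing is dropped by the filter */
	if (!memcmp(dmac, f->dev_addr, ETH_ALEN))
		return true;
	for (i = 0; i < f->uc_count; i++)
		if (!memcmp(dmac, f->uc_list[i], ETH_ALEN))
			return true;

	return false;
}
```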

For a switchdev, promisc vs non-promisc doesn't mean a thing. A switch
is a switch, it's promiscuous by definition. It is _supposed_ to accept
traffic regardless of destination. There isn't even a mechanism in the
switchdevs that I know of to install an RX filter in the ingress MAC (or
if there is, it filters by source MAC address, and it's done for
different reasons). This is fundamentally because the destination MAC
address is parsed by a network card for _termination_ purposes. And
because a switch doesn't do _termination_, there is no reason to filter
by destination MAC (ignore ACL and such). But a Linux switchdev is
capable of termination. In the case of switchdev, termination means
sending to the CPU.

My interpretation of the meaning of dev_uc_add() for switchdev (and
therefore, of its opposite - promiscuous mode) is at odds with previous
work done for non-switchdev. Take Vlad Yasevich's work "[Bridge] [PATCH
net-next 0/8] Non-promisc bidge ports support" for example:

https://lists.linuxfoundation.org/pipermail/bridge/2014-May/008940.html

He is arguing that a bridge port without flood&learn doesn't need
promiscuous mode, because all addresses can be statically known, and
therefore, he added code to the bridge that does the following:

- syncs the bridge MAC address to all non-promisc bridge slaves, via
  dev_uc_add()
- syncs the MAC addresses of all static FDB entries on all ingress
  non-promisc bridge slave ports, via dev_uc_add()

with the obvious goal that "the bridge slave shouldn't drop these
packets".

In my interpretation of dev_uc_add(), I would have expected that:
- the bridge MAC address, as well as any other secondary unicast
  addresses that the bridge has, by means of its uppers (like macvlan,
  802.1q, etc) calling dev_uc_add() on it, would be synced to the bridge
  slaves anyway, regardless of whether they're promisc or not
- the static FDB entries are synced to the bridge ports only in the
  non-switchdev case. This is because for switchdev, I am treating a
  dev_uc_add() as a FDB entry towards the CPU, and therefore this would
  overwrite the FDB entry towards the external port.
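
The overwrite in the second point can be shown with a toy table. The
sketch assumes an FDB that holds exactly one destination per {MAC, VID}
key, which is a simplification, and all names in it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define FDB_SIZE	8
#define CPU_PORT	0	/* assumption: port 0 faces the CPU */

struct hw_fdb_entry {
	bool used;
	uint8_t mac[ETH_ALEN];
	uint16_t vid;
	int port;		/* exactly one destination per key */
};

struct hw_fdb {
	struct hw_fdb_entry e[FDB_SIZE];
};

static struct hw_fdb_entry *hw_fdb_find(struct hw_fdb *t,
					const uint8_t mac[ETH_ALEN],
					uint16_t vid)
{
	size_t i;

	for (i = 0; i < FDB_SIZE; i++)
		if (t->e[i].used && t->e[i].vid == vid &&
		    !memcmp(t->e[i].mac, mac, ETH_ALEN))
			return &t->e[i];

	return NULL;
}

/* Install {mac, vid} -> port, replacing any existing entry for that key. */
static int hw_fdb_set(struct hw_fdb *t, const uint8_t mac[ETH_ALEN],
		      uint16_t vid, int port)
{
	struct hw_fdb_entry *ent = hw_fdb_find(t, mac, vid);
	size_t i;

	/* No entry for this key yet: take the first free slot. */
	for (i = 0; !ent && i < FDB_SIZE; i++)
		if (!t->e[i].used)
			ent = &t->e[i];
	if (!ent)
		return -1;	/* table full */

	ent->used = true;
	memcpy(ent->mac, mac, ETH_ALEN);
	ent->vid = vid;
	ent->port = port;
	return 0;
}

/* Return the destination port for {mac, vid}, or -1 on a miss. */
static int hw_fdb_lookup(struct hw_fdb *t, const uint8_t mac[ETH_ALEN],
			 uint16_t vid)
{
	struct hw_fdb_entry *ent = hw_fdb_find(t, mac, vid);

	return ent ? ent->port : -1;
}
```

If a static entry {MAC, VID 10} -> port 3 exists and the same address is
then synced as a host address, the lookup returns CPU_PORT and the
station behind port 3 stops receiving its traffic.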

In my interpretation, things would have worked neatly for the most part,
not only for unicast but also for multicast. For example, an application
wants to see a multicast stream, so it calls setsockopt(SOL_PACKET,
PACKET_ADD_MEMBERSHIP) with a PACKET_MR_MULTICAST request carrying the
multicast address it wants to see. This is translated by the kernel into
a dev_mc_add()
and sent to the network device. For a non-switchdev, this would have
been enough. For a switchdev, if I also installed the address in the
CPU's filter, it would have also been enough. Things 'just work' and
everybody's happy.
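
For reference, the userspace side of that looks roughly like this.
PACKET_ADD_MEMBERSHIP is a packet-socket option, so the level passed to
setsockopt() is SOL_PACKET and the address travels in a struct
packet_mreq. The two helper names are mine, not an existing API, and the
socket has to be an AF_PACKET one, which needs CAP_NET_RAW.

```c
#include <linux/if_ether.h>	/* ETH_ALEN */
#include <linux/if_packet.h>	/* struct packet_mreq, PACKET_* */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill a membership request for one multicast MAC on one interface. */
static void fill_mc_mreq(struct packet_mreq *mreq, int ifindex,
			 const uint8_t mac[ETH_ALEN])
{
	memset(mreq, 0, sizeof(*mreq));
	mreq->mr_ifindex = ifindex;
	mreq->mr_type = PACKET_MR_MULTICAST;
	mreq->mr_alen = ETH_ALEN;
	memcpy(mreq->mr_address, mac, ETH_ALEN);
}

/*
 * Ask the kernel to add 'mac' to the interface's multicast filter.
 * fd must be an AF_PACKET socket. Returns 0, or -1 with errno set.
 */
static int join_mc_mac(int fd, int ifindex, const uint8_t mac[ETH_ALEN])
{
	struct packet_mreq mreq;

	fill_mc_mreq(&mreq, ifindex, mac);
	return setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
			  &mreq, sizeof(mreq));
}
```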

> When you look at it from hardware offload perspective, not every packet
> received by the bridge interface should reach the CPU. Actually, most
> should not reach it. Otherwise it would mean that every routed packet
> would need to go to the CPU, which is not feasible. If you can't perform
> routing in hardware, then yes, you need to send such packets to the CPU.
> 
> In mlxsw we can't perform MAC filtering in the router like in the
> software data path, so in order not to route packets we should not, we
> only send to the router packets with destination MACs that correspond to
> that of the bridge or one of its uppers. We don't flood all unknown
> unicast packets there.
> 
> In the case of hardware offload it's relatively easy to do this sort of
> tracking because only a limited set of upper devices topologies are
> actually supported. I'm not sure how feasible it is with every
> combination of upper devices supported by the kernel. It seems easiest
> to just put the bridge interface in promiscuous mode and let upper
> layers perform the filtering. Like it is today.
> 

Are you suggesting that tracking the uppers is the only way to do what I
want? (I didn't even find that piece of code in mlxsw, btw).

I am a bit reluctant to do such management at driver level. It is not a
driver problem, it is a switchdev design question. I shouldn't need to
care if there's a macvlan or an 802.1q or a bridge upper or whatnot, and
how many addresses those are listening to, as long as those network
interfaces can tell me what addresses they want to see, and as long as I
can interpret that information as the list of addresses I should be
delivering to the CPU. It is clear that right now, some of the uses of
dev_uc_add() are simply there to prevent drops, and not because the
bridge has a particular interest in seeing those frames. So either the
meaning of dev_uc_add() changes, and a meaning is standardized for
promisc on a switchdev port, or we add a new set of functions, such as
dev_cpu_filter_uc_add() specifically for switchdev and spray them
throughout the network stack, mostly in the places where dev_uc_add() is
currently also used.

Bridging with a non-switchdev interface is also a situation that should
be dealt with generically, as in that case, the CPU filter should
obviously become larger, as termination is no longer done just on this
CPU, and plain unicast filtering is no longer enough.

I hope it's a bit clearer now what problem I'm trying to address.
Needless to say, I would prefer that a new API is not introduced,
because upper layers shouldn't necessarily care about switchdev, unless
they can leverage it for offloading.

> > 
> > > > > >
> > > > > > > When you are offloading the Linux data path to hardware this behavior is
> > > > > > > not ideal as your hardware can handle much higher packet rates than the
> > > > > > > CPU.
> > > > > > >
> > > > > > > In mlxsw we handle this by tracking the upper devices of the bridge. I
> > > > > > > was hoping that with Ivan's patches we could add support for unicast
> > > > > > > filtering in the bridge driver and program the MAC addresses to its FDB
> > > > > > > with 'local' flag. Then the FDB entries would be notified via switchdev
> > > > > > > to device drivers.
> > > > > > >
> > > > > >
> > > > > > Yes, it should be possible to do that. I'll try and see how far I get.
> > > > > >
> > > > > > > >
> > > > > > > > Multicast filtering was taken and reworked from Florian Fainelli's
> > > > > > > > previous attempts, according to my own understanding of multicast
> > > > > > > > forwarding requirements of an IGMP snooping switch. This is the part
> > > > > > > > that needs the most extra work, not only in the DSA core but also in
> > > > > > > > drivers. For this reason, I've left out of this patchset anything that
> > > > > > > > has to do with driver-level configuration (since the audience is a bit
> > > > > > > > larger than usual), as I'm trying to focus more on policy for now, and
> > > > > > > > the series is already pretty huge.
> > > > > > >
> > > > > > > From what I remember, this is the logic in the Linux bridge:
> > > > > > >
> > > > > > > * Broadcast is always locally received
> > > > > > > * Multicast is locally received if:
> > > > > > >         * Snooping disabled
> > > > > > >         * Snooping enabled:
> > > > > > >                 * Bridge netdev is mrouter port
> > > > > > >                 or
> > > > > > >                 * Matches MDB entry with 'host_joined' indication
> > > > > > >
> > > > > > > >
> > > > > > > > Florian Fainelli (3):
> > > > > > > >   net: bridge: multicast: propagate br_mc_disabled_update() return
> > > > > > > >   net: dsa: add ability to program unicast and multicast filters for CPU
> > > > > > > >     port
> > > > > > > >   net: dsa: wire up multicast IGMP snooping attribute notification
> > > > > > > >
> > > > > > > > Ivan Khoronzhuk (4):
> > > > > > > >   net: core: dev_addr_lists: add VID to device address
> > > > > > > >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> > > > > > > >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> > > > > > > >   ethernet: eth: add default vid len for all ethernet kind devices
> > > > > > > >
> > > > > > > > Vladimir Oltean (6):
> > > > > > > >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> > > > > > > >   net: dsa: don't use switchdev_notifier_fdb_info in
> > > > > > > >     dsa_switchdev_event_work
> > > > > > > >   net: dsa: mroute: don't panic the kernel if called without the prepare
> > > > > > > >     phase
> > > > > > > >   net: bridge: add port flags for host flooding
> > > > > > > >   net: dsa: deal with new flooding port attributes from bridge
> > > > > > > >   net: dsa: treat switchdev notifications for multicast router connected
> > > > > > > >     to port
> > > > > > > >
> > > > > > > >  include/linux/if_bridge.h |   3 +
> > > > > > > >  include/linux/if_vlan.h   |   2 +
> > > > > > > >  include/linux/netdevice.h |  11 ++
> > > > > > > >  include/net/dsa.h         |  17 +++
> > > > > > > >  net/8021q/Kconfig         |  12 ++
> > > > > > > >  net/8021q/vlan.c          |   3 +
> > > > > > > >  net/8021q/vlan.h          |   2 +
> > > > > > > >  net/8021q/vlan_core.c     |  25 ++++
> > > > > > > >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> > > > > > > >  net/bridge/br_if.c        |  40 ++++++
> > > > > > > >  net/bridge/br_multicast.c |  21 ++-
> > > > > > > >  net/bridge/br_switchdev.c |   4 +-
> > > > > > > >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> > > > > > > >  net/dsa/Kconfig           |   1 +
> > > > > > > >  net/dsa/dsa2.c            |   6 +
> > > > > > > >  net/dsa/dsa_priv.h        |  27 +++-
> > > > > > > >  net/dsa/port.c            | 155 ++++++++++++++++----
> > > > > > > >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> > > > > > > >  net/dsa/switch.c          |  36 +++++
> > > > > > > >  net/ethernet/eth.c        |  12 +-
> > > > > > > >  20 files changed, 780 insertions(+), 131 deletions(-)
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > -Vladimir
> > > >
> > > > -Vladimir
> > 
> > Thanks,
> > -Vladimir

Thanks,
-Vladimir

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-22 18:45       ` Allan W. Nielsen
@ 2020-07-20 11:08         ` Vladimir Oltean
  0 siblings, 0 replies; 46+ messages in thread
From: Vladimir Oltean @ 2020-07-20 11:08 UTC (permalink / raw)
  To: UNGLinuxDriver
  Cc: Nikolay Aleksandrov, Andrew Lunn, Florian Fainelli,
	Vivien Didelot, David S. Miller, Jiri Pirko, Ido Schimmel,
	Jakub Kicinski, Ivan Vecera, netdev, Roopa Prabhu

On Fri, May 22, 2020 at 08:45:34PM +0200, Allan W. Nielsen wrote:
> On 22.05.2020 16:13, Vladimir Oltean wrote:
> > On Fri, 22 May 2020 at 15:38, Nikolay Aleksandrov
> > <nikolay@cumulusnetworks.com> wrote:
> > > 
> > > On 22/05/2020 00:10, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > >
> > > > In cases where the bridge is offloaded by a switchdev, there are
> > > > situations where we can optimize RX filtering towards the host. To be
> > > > precise, the host only needs to do termination, which it can do by
> > > > responding at the MAC addresses of the slave ports and of the bridge
> > > > interface itself. But most notably, it doesn't need to do forwarding,
> > > > so there is no need to see packets with unknown destination address.
> > > >
> > > > But there are, however, cases when a switchdev does need to flood to the
> > > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > > interface, and since there is no offloaded datapath, packets need to
> > > > pass through the CPU. Currently this is the only identified case, but it
> > > > can be extended at any time.
> > > >
> > > > So far, switchdev implementers made driver-level assumptions, such as:
> > > > this chip is never integrated in SoCs where it can be bridged with a
> > > > foreign interface, so I'll just disable host flooding and save some CPU
> > > > cycles. Or: I can never know what else can be bridged with this
> > > > switchdev port, so I must leave host flooding enabled in any case.
> > > >
> > > > Let the bridge drive the host flooding decision, and pass it to
> > > > switchdev via the same mechanism as the external flooding flags.
> > > >
> > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > ---
> > > >  include/linux/if_bridge.h |  3 +++
> > > >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> > > >  net/bridge/br_switchdev.c |  4 +++-
> > > >  3 files changed, 46 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > > index b3a8d3054af0..6891a432862d 100644
> > > > --- a/include/linux/if_bridge.h
> > > > +++ b/include/linux/if_bridge.h
> > > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > > >  #define BR_ISOLATED          BIT(16)
> > > >  #define BR_MRP_AWARE         BIT(17)
> > > >  #define BR_MRP_LOST_CONT     BIT(18)
> > > > +#define BR_HOST_FLOOD                BIT(19)
> > > > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > > > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> > > >
> > > >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> > > >
> > > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > > index a0e9a7937412..aae59d1e619b 100644
> > > > --- a/net/bridge/br_if.c
> > > > +++ b/net/bridge/br_if.c
> > > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > > >       }
> > > >  }
> > > >
> > > > +static int br_manage_host_flood(struct net_bridge *br)
> > > > +{
> > > > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > > +                                BR_HOST_BCAST_FLOOD;
> > > > +     struct net_bridge_port *p, *q;
> > > > +
> > > > +     list_for_each_entry(p, &br->port_list, list) {
> > > > +             unsigned long flags = p->flags;
> > > > +             bool sw_bridging = false;
> > > > +             int err;
> > > > +
> > > > +             list_for_each_entry(q, &br->port_list, list) {
> > > > +                     if (p == q)
> > > > +                             continue;
> > > > +
> > > > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > > +                             sw_bridging = true;
> > > > +                             break;
> > > > +                     }
> > > > +             }
> > > > +
> > > > +             if (sw_bridging)
> > > > +                     flags |= mask;
> > > > +             else
> > > > +                     flags &= ~mask;
> > > > +
> > > > +             if (flags == p->flags)
> > > > +                     continue;
> > > > +
> > > > +             err = br_switchdev_set_port_flag(p, flags, mask);
> > > > +             if (err)
> > > > +                     return err;
> > > > +
> > > > +             p->flags = flags;
> > > > +     }
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +
> > > >  int nbp_backup_change(struct net_bridge_port *p,
> > > >                     struct net_device *backup_dev)
> > > >  {
> > > > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
> > > >               br->auto_cnt = cnt;
> > > >               br_manage_promisc(br);
> > > >       }
> > > > +     br_manage_host_flood(br);
> > > >  }
> > > >
> > > 
> > > Can we do this only at port add/del ?
> > > Right now it will be invoked also by br_port_flags_change() upon BR_AUTO_MASK flag change.
> > > 
> > 
> > Yes, we can do that.
> > Actually I have some doubts about BR_HOST_BCAST_FLOOD. We can't
> > disable that in the no-foreign-interface case, can we? For IPv6, it
> > looks like the stack does take care of installing dev_mc addresses for
> > the neighbor discovery protocol, but for IPv4 I guess the assumption
> > is that broadcast ARP should always be processed?
> 
> Ideally this should be per VLAN. In case of IPv4, you only need to be
> part of the broadcast domain on VLANs with an associated vlan-interface.
> 

In Ocelot, what is the mechanism to remove the CPU from the flood domain
of a particular VLAN?

I thought of 2 approaches:

- VLAN_FLOOD_DIS in ANA:ANA_TABLES:VLANTIDX. But this disables flooding
  at the level of the entire VLAN, regardless of source and destination
  ports. So it cannot be used, as it interferes with the VLANs from the
  bridge.

- Removing the CPU from VLAN_PORT_MASK. But the documentation for this
  field says:

      Frames classified to this VLAN can only be
      sent to ports in this mask. Note that the CPU
      port module is always member of all VLANs
      and its VLAN membership can therefore not
      be configured through this mask.

      Default: 0x3F

So I don't think either of them works.

> > > >  static void nbp_delete_promisc(struct net_bridge_port *p)
> > > > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> > > > index 015209bf44aa..360806ac7463 100644
> > > > --- a/net/bridge/br_switchdev.c
> > > > +++ b/net/bridge/br_switchdev.c
> > > > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> > > >
> > > >  /* Flags that can be offloaded to hardware */
> > > >  #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> > > > -                               BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> > > > +                               BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> > > > +                               BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> > > > +                               BR_HOST_BCAST_FLOOD)
> > > >
> > > >  int br_switchdev_set_port_flag(struct net_bridge_port *p,
> > > >                              unsigned long flags,
> > > >
> > > 
> > 
> > Thanks,
> > -Vladimir
> /Allan

Thanks,
-Vladimir

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-05-25 20:11       ` Ido Schimmel
  2020-05-25 20:32         ` Vladimir Oltean
@ 2020-07-23 22:35         ` Vladimir Oltean
  2020-07-27 17:15           ` Ido Schimmel
  1 sibling, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-07-23 22:35 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Mon, May 25, 2020 at 11:11:11PM +0300, Ido Schimmel wrote:
> On Sun, May 24, 2020 at 07:13:46PM +0300, Vladimir Oltean wrote:
> > Hi Ido,
> > 
> > On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > >
> > > > In cases where the bridge is offloaded by a switchdev, there are
> > > > situations where we can optimize RX filtering towards the host. To be
> > > > precise, the host only needs to do termination, which it can do by
> > > > responding at the MAC addresses of the slave ports and of the bridge
> > > > interface itself. But most notably, it doesn't need to do forwarding,
> > > > so there is no need to see packets with unknown destination address.
> > > >
> > > > But there are, however, cases when a switchdev does need to flood to the
> > > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > > interface, and since there is no offloaded datapath, packets need to
> > > > pass through the CPU. Currently this is the only identified case, but it
> > > > can be extended at any time.
> > > >
> > > > So far, switchdev implementers made driver-level assumptions, such as:
> > > > this chip is never integrated in SoCs where it can be bridged with a
> > > > foreign interface, so I'll just disable host flooding and save some CPU
> > > > cycles. Or: I can never know what else can be bridged with this
> > > > switchdev port, so I must leave host flooding enabled in any case.
> > > >
> > > > Let the bridge drive the host flooding decision, and pass it to
> > > > switchdev via the same mechanism as the external flooding flags.
> > > >
> > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > ---
> > > >  include/linux/if_bridge.h |  3 +++
> > > >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> > > >  net/bridge/br_switchdev.c |  4 +++-
> > > >  3 files changed, 46 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > > index b3a8d3054af0..6891a432862d 100644
> > > > --- a/include/linux/if_bridge.h
> > > > +++ b/include/linux/if_bridge.h
> > > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > > >  #define BR_ISOLATED          BIT(16)
> > > >  #define BR_MRP_AWARE         BIT(17)
> > > >  #define BR_MRP_LOST_CONT     BIT(18)
> > > > +#define BR_HOST_FLOOD                BIT(19)
> > > > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > > > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> > > >
> > > >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> > > >
> > > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > > index a0e9a7937412..aae59d1e619b 100644
> > > > --- a/net/bridge/br_if.c
> > > > +++ b/net/bridge/br_if.c
> > > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > > >       }
> > > >  }
> > > >
> > > > +static int br_manage_host_flood(struct net_bridge *br)
> > > > +{
> > > > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > > +                                BR_HOST_BCAST_FLOOD;
> > > > +     struct net_bridge_port *p, *q;
> > > > +
> > > > +     list_for_each_entry(p, &br->port_list, list) {
> > > > +             unsigned long flags = p->flags;
> > > > +             bool sw_bridging = false;
> > > > +             int err;
> > > > +
> > > > +             list_for_each_entry(q, &br->port_list, list) {
> > > > +                     if (p == q)
> > > > +                             continue;
> > > > +
> > > > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > > +                             sw_bridging = true;
> > >
> > > It's not that simple. There are cases where not all bridge slaves have
> > > the same parent ID and still there is no reason to flood traffic to the
> > > CPU. VXLAN, for example.
> > >
> > > You could argue that the VXLAN device needs to have the same parent ID
> > > as the physical netdevs member in the bridge, but it will break your
> > > > data path. For example, let's assume your hardware decided to flood a
> > > packet in L2. The packet will egress all the local ports, but will also
> > > perform VXLAN encapsulation. The packet continues with the IP of the
> > > remote VTEP(s) to the underlay router and then encounters a neighbour
> > > miss exception, which sends it to the CPU for resolution.
> > >
> > > Since this exception was encountered in the router the driver would mark
> > > the packet with 'offload_fwd_mark', as it already performed L2
> > > forwarding. If the VXLAN device has the same parent ID as the physical
> > > netdevs, then the Linux bridge will never let it egress, nothing will
> > > trigger neighbour resolution and the packet will be discarded.
> > >
> > 
> > I wasn't going to argue that.
> > Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
> > to multicast IPs should be flooded to the CPU.
> > Actually Allan's example was a bit simpler, he said that host flooding
> > can be made a per-VLAN flag. I'm glad that you raised this. So maybe
> > we should try to define some mechanism by which virtual interfaces can
> > specify to the bridge that they don't need to see all traffic? Do you
> > have any ideas?
> 
> Maybe, when a port joins a bridge, query member ports if they can
> forward traffic to it in hardware and based on the answer determine the
> flooding towards the CPU?
> 

Hi Ido, Allan,

I understand less and less of this. What I don't really understand is,
if you have a switchdev bridged with a vtep like this:

 +-------------------------+
 |           br0           |
 +-------------------------+
     |                |
     |           +--------+
     |           | vxlan0 |
     |           +--------+
     |                |
 +--------+      +--------+
 |  swp0  |      |  eth0  |
 +--------+      +--------+

why the swp0 interface would care about the remote_ip at all. To the
traffic seen by swp0, the VXLAN segment doesn't exist. Encapsulation and
decapsulation all happen outside of the switchdev interface. All that
switchdev sees is that, from the CPU side, it's talking to a bunch of
MAC addresses.

The same comment also applies to 8021q, in fact. I did try this
experiment, to bridge a switchdev with a VLAN sub-interface of another
port. I don't know why, but I used to have the misconception that the goal
of doing that would be to somehow only extract one VLAN ID from the
switchdev, and the rest could be kept outside of the CPU's flooding
domain. But that isn't the case at all. When bridging, I'm bridging the
_entire_ traffic of swp0 with, say, eth0.100. And, as in the case of
vxlan, encap/decap all happens outside of switchdev. So, contrary to my
initial expectation, if I'm receiving on swp0 a packet tagged with VLAN
100, it would end up exiting the bridge, on eth0, with 2 VLAN tags with
ID 100.

Simply put, I think my change is fine the way it is. Either that, or I
just don't understand your comment about querying bridge members as to whether
they can forward in hardware. How are you dealing with this today?
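
For reference, the decision taken by br_manage_host_flood() in the patch quoted in this thread can be modelled in user space roughly as follows. This is only a sketch: `model_port` and `model_manage_host_flood()` are hypothetical names, the three BR_HOST_*FLOOD flags are collapsed into one boolean, and the parent ID comparison is reduced to an integer compare, which assumes every port has an ID and gives each foreign interface its own distinct value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bridge port. parent_id models what
 * netdev_port_same_parent_id() compares in the patch. */
struct model_port {
	int parent_id;
	bool host_flood; /* models the three BR_HOST_*FLOOD flags as one bit */
};

/* For each port, enable host flooding if and only if at least one other
 * port in the same bridge has a different parent ID. */
static void model_manage_host_flood(struct model_port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bool sw_bridging = false;

		for (size_t j = 0; j < n; j++) {
			if (i == j)
				continue;

			if (ports[i].parent_id != ports[j].parent_id) {
				sw_bridging = true;
				break;
			}
		}

		ports[i].host_flood = sw_bridging;
	}
}
```

In this model a bridge made only of ports of one switch ends up with host flooding disabled everywhere, while adding a single foreign interface (vxlan0 or eth0.100 in the examples above) enables it on every port, which is the behaviour under discussion.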

Thanks,
-Vladimir

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-07-20 10:00               ` Vladimir Oltean
@ 2020-07-27 16:56                 ` Ido Schimmel
  2020-10-27 11:52                   ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-07-27 16:56 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Mon, Jul 20, 2020 at 01:00:37PM +0300, Vladimir Oltean wrote:
> On Thu, May 28, 2020 at 05:37:18PM +0300, Ido Schimmel wrote:
> > On Wed, May 27, 2020 at 02:36:53PM +0300, Vladimir Oltean wrote:
> > > On Tue, 26 May 2020 at 17:02, Ido Schimmel <idosch@idosch.org> wrote:
> > > >
> > > > On Mon, May 25, 2020 at 11:23:34PM +0300, Vladimir Oltean wrote:
> > > > > Hi Ido,
> > > > >
> > > > > On Mon, 25 May 2020 at 22:48, Ido Schimmel <idosch@idosch.org> wrote:
> > > > > >
> > > > > > On Sun, May 24, 2020 at 07:24:27PM +0300, Vladimir Oltean wrote:
> > > > > > > On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@idosch.org> wrote:
> > > > > > > >
> > > > > > > > On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > > > > > > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > > > > > >
> > > > > > > > > This is a WIP series whose stated goal is to allow DSA and switchdev
> > > > > > > > > drivers to flood less traffic to the CPU while keeping the same level of
> > > > > > > > > functionality.
> > > > > > > > >
> > > > > > > > > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > > > > > > > > that the operating system has expressed its interest in, either due to
> > > > > > > > > those being the MAC addresses of one of the switch ports, or addresses
> > > > > > > > > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > > > > > > > > Then, the traffic which is not explicitly whitelisted is not sent by the
> > > > > > > > > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > > > > > > > > and would have dropped it anyway.
> > > > > > > > >
> > > > > > > > > The ground for these patches were the discussions surrounding RX
> > > > > > > > > filtering with switchdev in general, as well as with DSA in particular:
> > > > > > > > >
> > > > > > > > > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > > > > > > > > https://www.spinics.net/lists/netdev/msg651922.html
> > > > > > > > > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > > > > > > > > https://www.spinics.net/lists/netdev/msg634859.html
> > > > > > > > > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > > > > > > > > https://lkml.org/lkml/2019/8/29/255
> > > > > > > > > LPC2019 - SwitchDev offload optimizations:
> > > > > > > > > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> > > > > > > > >
> > > > > > > > > Unicast filtering comes to me as most important, and this includes
> > > > > > > > > termination of MAC addresses corresponding to the network interfaces in
> > > > > > > > > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > > > > > > > > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > > > > > > > > network interface addresses with a Virtual ID (typically VLAN ID). This
> > > > > > > > > matches DSA switches perfectly because their FDB already contains keys
> > > > > > > > > of the {DMAC, VID} form.
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I read through the series and I'm not sure how unicast filtering works.
> > > > > > > > Instead of writing a very long mail I just created a script with
> > > > > > > > comments. I think it's clearer that way. Note that this is not a made up
> > > > > > > > configuration. It is used in setups involving VRRP / VXLAN, for example.
> > > > > > > >
> > > > > > > > ```
> > > > > > > > #!/bin/bash
> > > > > > > >
> > > > > > > > ip netns add ns1
> > > > > > > >
> > > > > > > > ip -n ns1 link add name br0 type bridge vlan_filtering 1
> > > > > > > > ip -n ns1 link add name dummy10 up type dummy
> > > > > > > >
> > > > > > > > ip -n ns1 link set dev dummy10 master br0
> > > > > > > > ip -n ns1 link set dev br0 up
> > > > > > > >
> > > > > > > > ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> > > > > > > > bridge -n ns1 vlan add vid 10 dev br0 self
> > > > > > > >
> > > > > > > > echo "Before adding macvlan:"
> > > > > > > > echo "======================"
> > > > > > > >
> > > > > > > > echo -n "Promiscuous mode: "
> > > > > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > > > > >
> > > > > > > > echo -e "\nvlan10's MAC is in br0's FDB:"
> > > > > > > > bridge -n ns1 fdb show br0 vlan 10
> > > > > > > >
> > > > > > > > echo
> > > > > > > > echo "After adding macvlan:"
> > > > > > > > echo "====================="
> > > > > > > >
> > > > > > > > ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
> > > > > > > >         type macvlan mode private
> > > > > > > >
> > > > > > > > echo -n "Promiscuous mode: "
> > > > > > > > ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
> > > > > > > >
> > > > > > > > echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> > > > > > > > bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> > > > > > > > ```
> > > > > > > >
> > > > > > > > This is the output on my laptop (kernel 5.6.8):
> > > > > > > >
> > > > > > > > ```
> > > > > > > > Before adding macvlan:
> > > > > > > > ======================
> > > > > > > > Promiscuous mode: 0
> > > > > > > >
> > > > > > > > vlan10's MAC is in br0's FDB:
> > > > > > > > 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
> > > > > > > >
> > > > > > > > After adding macvlan:
> > > > > > > > =====================
> > > > > > > > Promiscuous mode: 1
> > > > > > > >
> > > > > > > > vlan10-v's MAC is not in br0's FDB:
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Basically, if the MAC of the VLAN device is not inherited from the
> > > > > > > > bridge or you stack macvlans on top, then the bridge will go into
> > > > > > > > promiscuous mode and it will locally receive all frames passing through
> > > > > > > > it. It's not ideal, but it's a very old and simple behavior. It does not
> > > > > > > > require you to track the VLAN associated with the MAC addresses, for
> > > > > > > > example.
> > > > > > > >
> > > > > > >
> > > > > > > This is a good point. I wasn't aware that the bridge 'gives up' with
> > > > > > > macvlan upper devices, but if I understand correctly, we do have the
> > > > > > > necessary tools to improve that.
> > > > > > > But actually, I'm wondering if this simple behavior from the bridge is
> > > > > > > correct.
> > > > > >
> > > > > > Why would it be incorrect?
> > > > > >
> > > > > > > As you, Jiri and Ivan pointed out in last summer's email
> > > > > > > thread about the Linux bridge and promiscuous mode, putting the
> > > > > > > interface in IFF_PROMISC is only going to guarantee acceptance through
> > > > > > > the net device's RX filter, but not that the packets will go to the
> > > > > > > CPU.
> > > > > >
> > > > > > IFF_PROMISC has no bearing on whether a packet should go to the CPU or
> > > > > > not. It only influences the device's RX filter, like you said. If you
> > > > > > only look at the software data path, the bridge being in promiscuous
> > > > > > mode means that all received packets will be injected to the kernel's Rx
> > > > > > path as if they were received through the bridge device. This includes,
> > > > > > for example, an IPv4 packet with an unknown unicast MAC (does not
> > > > > > correspond to your MAC). Such a packet will be later dropped by the IPv4
> > > > > > code since it's not addressed to you:
> > > > > >
> > > > > > vi net/ipv4/ip_input.c +443
> > > > > >
> > > > > > We maintain the same behavior in the hardware data path. We don't have
> > > > > > MAC filtering in the router like the software data path, so we only send
> > > > > > to the router unicast MACs that correspond to the bridge's MAC and its
> > > > > > uppers. If such packets later hit a local route (for example), then they
> > > > > > will be trapped to the CPU, but the more common case is to simply route
> > > > > > them through a different device due to a prefix / gateway route. These
> > > > > > never reach the CPU.
> > > > > >
> > > > > > > So from that perspective, the current series would break things, so we
> > > > > > > should definitely fix that and keep the {MAC, VLAN} pairs in the
> > > > > > > bridge's local FDB.
> > > > > >
> > > > > > Not sure I follow. Can you explain what will break and why?
> > > > > >
> > > > >
> > > > > I haven't done any further testing since yesterday, so my level of
> > > > > (mis)understanding is the same. Let's hope at least I can explain
> > > > > better this time.
> > > > >
> > > > > I guess what I didn't understand from your "macvlan upper whose MAC
> > > > > address isn't inherited from bridge" is why does the bridge go in
> > > > > promiscuous mode.
> > > >
> > > > Packets received from bridge slaves with DMAC equal to an active bridge
> > > > upper (e.g., macvlan) should be received by this upper. When a packet is
> > > > received from a bridge slave it performs FDB lookup. Since {VID, MAC}
> > > > entries are not programmed for bridge uppers, packets addressed to such
> > > > addresses will incur an FDB miss and be flooded. If the bridge is not in
> > > > promiscuous mode, these packets will not be received via the bridge
> > > > interface and will not reach the relevant upper device.
> > > >
> > > > > You said that it's so that the slave ports won't drop packets with
> > > > > that DMAC,
> > > >
> > > > I did not say that. I explained above that if promiscuous mode is not
> > > > enabled on the bridge interface itself (a soft device), the packet will
> > > > not be received via the bridge interface and will not reach the upper
> > > > device.
> > > >
> > > > > I said ok, yes the packets would get dropped without promisc, but also
> > > > > promisc still doesn't mean the packets will land on the CPU. This is
> > > > > one of the cases where the bridge puts an interface in promisc mode
> > > > > with the intention of making the CPU see some frames,
> > > >
> > > > The statement "the bridge puts an interface in promisc mode with the
> > > > intention of making the CPU see some frames" is incorrect. The bridge
> > > > puts an interface in promiscuous mode so that the bridge will see all
> > > > the frames received by this interface. If the bridge is offloaded,
> > > > bridging happens in hardware and there is no reason to send all the
> > > > frames to the CPU.
> > > >
> > > > > something which has been argued, in the context of switchdev, that was
> > > > > never the case. You said that's all true, and that in mlxsw you're
> > > > > giving the bridge a helping hand, by tracking the bridge's uppers in
> > > > > order to keep something that works by accident in software working
> > > > > with switchdev too.
> > > >
> > > > I never said that the software bridge works by accident. I explained
> > > > why, to my understanding, the bridge works the way it's working and what
> > > > can be done in order to prevent the bridge from going into promiscuous
> > > > mode. It involves very careful (and error-prone?) tracking of the upper
> > > > devices and their VLANs.
> > > >
> > > > Also, please differentiate between the bridge interface itself going
> > > > into promiscuous mode and bridge slaves going into promiscuous mode.
> > > >
> > > > > I said that this is a
> > > > > weird layering violation, because the bridge's job is to notify the
> > > > > driver of addresses it needs to see, not for the driver to fish for
> > > > > them.
> > > > > As for "what will break and why". My current patch proposal is to only
> > > > > send to the CPU the addresses added via dev_uc_add and dev_mc_add,
> > > > > basically. The macvlan upper of the bridge would not be part of that
> > > > > list. My rhetorical question then becomes: whose fault is it that
> > > > > macvlan breaks? Mine for not tracking the bridge upper, or the bridge
> > > > > for not notifying me and just pretending that 'promisc' means 'the CPU
> > > > > will see all packets, including the ones I need'? Of course I think
> > > > > it's the bridge.
> > > > >
> > > 
> > > Ok, bridge promisc vs slave promisc is not a difference I explicitly
> > > made, but my point is actually beyond that.
> > > The bridge going in promisc will only help if the packets are sent to
> > > the CPU in the first place. And it does nothing to ensure that that
> > > will happen. So the bridge code works by accident.
> > 
> > It's not beyond your point and the bridge code does not work by
> > accident. When the bridge interface is in promiscuous mode every packet
> > is injected to the kernel's Rx path as if it was accepted by the bridge
> > interface. Packets then reach the protocol handlers. In the case of
> > > IPv4/IPv6, the packets go to ip_rcv() / ipv6_rcv() and perform routing.
> > Packets with a unicast destination MAC that does not correspond to that
> > of the receiving interface are dropped.
> > 
> 
> Hi Ido,
> 
> I still maintain that the bridge and the network stack in general don't
> have a proper design for managing a switchdev's filter of what goes to
> the CPU, let me explain.

OK, but it's very different from your previous claim that "the bridge
code works by accident".

> 
> The whole purpose of my patch series is to remove the CPU port from the
> flood domain of all switchdev net_devices. That means, when an unknown
> unicast packet ingresses, it will be flooded but not to the CPU. 

Good. This is what happens in mlxsw today.

> For frames that the CPU wants to see, there should be a universal
> mechanism for it to whitelist them, by {DMAC, VID}. Otherwise, things
> don't scale.
> 
> There is one such mechanism already, and that is dev_uc_add(). It used
> to install an address into a device's RX filter using DMAC only, and
> Ivan Khoronzhuk's patches have added a new dev_vid_uc_add() that allows
> additional filtering by VLAN.

Yes, but please note that when you are talking about packets the CPU
cares about, then the device is the bridge device. Not its slaves which
are "promiscuous by definition".

> 
> That mechanism used to have a meaning. Its meaning was: for a
> non-promisc net_device, don't drop unicast frames having a MAC address
> equal to the argument passed to dev_uc_add().

Yes.

> 
> For a promiscuous net_device, that is not needed, because promisc means
> that no frames should be dropped due to MAC/VID match.

Yes.

> 
> For a switchdev, promisc vs non-promisc doesn't mean a thing.

Correct.

> A switch is a switch, it's promiscuous by definition. It is _supposed_
> to accept traffic regardless of destination. There isn't even a
> mechanism in the switchdevs that I know of to install an RX filter in
> the ingress MAC (or if there is, it filters by source MAC address, and
> it's done for different reasons).

Yes. It doesn't exist in mlxsw either.

> This is fundamentally because the destination MAC address is parsed by
> a network card for _termination_ purposes. And because a switch
> doesn't do _termination_, there is no reason to filter by destination
> MAC (ignore ACL and such). But a Linux switchdev is capable of
> termination. In the case of switchdev, termination means sending to
> the CPU.

You keep saying "CPU", but it's because you are most likely only
concerned with switches that are not capable of L3 forwarding. In mlxsw
we never send packets from the FDB to the CPU, but to the "router port".
There the packets (whether unicast or multicast) are routed and either
forwarded to a different port or locally received.

> 
> My interpretation of the meaning of dev_uc_add() for switchdev (and
> therefore, of its opposite - promiscuous mode) is at odds with previous
> work done for non-switchdev. Take Vlad Yasevich's work "[Bridge] [PATCH
> net-next 0/8] Non-promisc bidge ports support" for example:
> 
> https://lists.linuxfoundation.org/pipermail/bridge/2014-May/008940.html
> 
> He is arguing that a bridge port without flood&learn doesn't need
> promiscuous mode, because all addresses can be statically known, and
> therefore, he added code to the bridge that does the following:
> 
> - syncs the bridge MAC address to all non-promisc bridge slaves, via
>   dev_uc_add()
> - syncs the MAC addresses of all static FDB entries on all ingress
>   non-promisc bridge slave ports, via dev_uc_add()
> 
> with the obvious goal that "the bridge slave shouldn't drop these
> packets".

Let's say all the ports are not automatic (using Vlad's terminology),
then packets can only be forwarded based on FDB entries. Any packets
with a destination MAC not in the FDB will be dropped by the bridge.
Agree?

Now, if this is the case, then you know in advance which MACs will not
be dropped by the bridge. Therefore, you can program only these MACs to
the Rx filters of the bridge slaves (simple NICs). That way, instead of
having the bridge (the CPU) waste cycles on dropping packets you can
drop them in hardware using the NIC's Rx filters.
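
To make the Rx filter semantics concrete, here is a minimal userspace
sketch of the accept/drop decision a simple NIC makes. It is only a
model: struct nic_rx_filter, rx_filter_add() and rx_filter_accept() are
hypothetical names invented for illustration, not kernel or driver APIs.
rx_filter_add() stands in for what a dev_uc_add() call ultimately asks
the hardware to remember.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN   6
#define MAX_FILTER 8

/* Hypothetical model of a simple NIC's unicast Rx filter. */
struct nic_rx_filter {
	bool promisc;                           /* accept everything when set */
	size_t n;                               /* number of programmed MACs */
	unsigned char addr[MAX_FILTER][ETH_ALEN];
};

/* Program one more address; returns false when the table is full. */
static bool rx_filter_add(struct nic_rx_filter *f, const unsigned char *mac)
{
	assert(f && mac);
	if (f->n == MAX_FILTER)
		return false;
	memcpy(f->addr[f->n++], mac, ETH_ALEN);
	return true;
}

/* Accept when promiscuous, or when the destination MAC is programmed. */
static bool rx_filter_accept(const struct nic_rx_filter *f,
			     const unsigned char *dmac)
{
	size_t i;

	assert(f && dmac);
	if (f->promisc)
		return true;
	for (i = 0; i < f->n; i++)
		if (!memcmp(f->addr[i], dmac, ETH_ALEN))
			return true;
	return false;
}
```

With only the FDB MACs programmed, a frame for any other destination is
dropped by rx_filter_accept() before the bridge ever sees it. Setting
promisc makes the table irrelevant.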

> 
> In my interpretation of dev_uc_add(), I would have expected that:
> - the bridge MAC address, as well as any other secondary unicast
>   addresses that the bridge has, by means of its uppers (like macvlan,
>   802.1q, etc) calling dev_uc_add() on it, would be synced to the bridge
>   slaves anyway, regardless of whether they're promisc or not

Is this supposed to be related to the previous paragraph about Vlad's work?
I don't really follow. Anyway, he specifically wrote that "There are
some other cases when promiscuous mode has to be turned back on. One is
when the bridge itself if placed in promiscuous mode".

When you start adding bridge uppers with different MACs then the bridge
will enter promiscuous mode and all unknown unicast packets will be
flooded to it. In this case packets without a matching FDB will no
longer be dropped by the bridge and therefore the NIC can't drop them in
hardware using its Rx filters anymore.

> - the static FDB entries are synced to the bridge ports only in the
>   non-switchdev case. This is because for switchdev, I am treating a
>   dev_uc_add() as a FDB entry towards the CPU, and therefore this would
>   overwrite the FDB entry towards the external port.

OK, so this interpretation of "treating a dev_uc_add() as a FDB entry
towards the CPU" is wrong.

You already wrote that "For a switchdev, promisc vs non-promisc doesn't
mean a thing" and that "[dev_uc_add() is] used to install an address
into a device's RX filter".

You can't tell me that switches do not perform Rx filtering and then
decide to re-purpose a mechanism that is used for Rx filtering...

> 
> In my interpretation, things would have worked neatly for the most part,
> not only for unicast but also for multicast. For example, an application
> wants to see a multicast stream, so it calls setsockopt(SOL_PACKET,
> PACKET_ADD_MEMBERSHIP, PACKET_MR_MULTICAST) with the multicast address
> it wants to see. This is translated by the kernel into a dev_mc_add()
> and sent to the network device. For a non-switchdev, this would have
> been enough. For a switchdev, if I also installed the address in the
> CPU's filter, it would have also been enough. Things 'just work' and
> everybody's happy.
> 
> > When you look at it from hardware offload perspective, not every packet
> > received by the bridge interface should reach the CPU. Actually, most
> > should not reach it. Otherwise it would mean that every routed packet
> > would need to go to the CPU, which is not feasible. If you can't perform
> > routing in hardware, then yes, you need to send such packets to the CPU.
> > 
> > In mlxsw we can't perform MAC filtering in the router like in the
> > software data path, so in order not to route packets we should not, we
> > only send to the router packets with destination MACs that correspond to
> > that of the bridge or one of its uppers. We don't flood all unknown
> > unicast packets there.
> > 
> > In the case of hardware offload it's relatively easy to do this sort of
> > tracking because only a limited set of upper device topologies is
> > actually supported. I'm not sure how feasible it is with every
> > combination of upper devices supported by the kernel. It seems easiest
> > to just put the bridge interface in promiscuous mode and let upper
> > layers perform the filtering. Like it is today.
> > 
> 
> Are you suggesting that tracking the uppers is the only way to do what I
> want?

I don't see a different way. Your goal is to prevent flooding of unknown
unicast packets to the CPU. If the bridge is not in promiscuous mode,
then unknown unicast packets are not flooded to it. Only FDB entries
pointing to the bridge device should go to the CPU.

The problem starts when the bridge enters promiscuous mode. When does it
happen? When you start adding uppers that do not inherit the bridge's
MAC. Why? Because the bridge does not support unicast filtering. It is
not an easy thing to do when you have multiple levels of stacked
devices.

> (I didn't even find that piece of code in mlxsw, btw).

See the calls to mlxsw_sp_rif_fdb_op(). They program unicast FDBs
towards the "router port" (what you call CPU port). I mean to sync these
with the bridge by calling SWITCHDEV_FDB_ADD_TO_DEVICE. That way
hardware and software are in sync.

> 
> I am a bit reluctant to do such management at driver level. It is not a
> driver problem, it is a switchdev design question. I shouldn't need to
> care if there's a macvlan or an 802.1q or a bridge upper or whatnot, and
> how many addresses those are listening to, as long as those network
> interfaces can tell me what addresses they want to see, and as long as I
> can interpret that information as the list of addresses I should be
> delivering to the CPU.

I agree that it would be better not to duplicate this logic between
multiple drivers. The question is where this code should live: in
switchdev? In the bridge?

> It is clear that right now, some of the uses of dev_uc_add() are
> simply there to prevent drops, and not because the bridge has a
> particular interest in seeing those frames. So either the meaning of
> dev_uc_add() changes, and a meaning is standardized for promisc on a
> switchdev port, or we add a new set of functions, such as
> dev_cpu_filter_uc_add() specifically for switchdev and spray them
> throughout the network stack, mostly in the places where dev_uc_add()
> is currently also used.

Again, this is not related to Rx filters. The only unicast packets you
need to send to the CPU are those matching FDB entries pointing to the
bridge device. How is the list of these FDB entries composed? By
tracking the bridge uppers. How/where this is done is the question.
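
As a rough illustration of composing that list by tracking uppers, the
userspace sketch below walks a stack of devices and collects every
distinct MAC. struct fake_netdev and collect_host_macs() are hypothetical
stand-ins; the real kernel would walk the upper devices of a struct
net_device and would also need to handle VLANs and notifier events, none
of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified stand-in for a netdev with stacked uppers. */
struct fake_netdev {
	unsigned char mac[ETH_ALEN];
	const struct fake_netdev *const *uppers; /* NULL-terminated, or NULL */
};

/* Returns 1 when mac is already present in the first n entries of list. */
static int mac_listed(unsigned char (*list)[ETH_ALEN], size_t n,
		      const unsigned char *mac)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!memcmp(list[i], mac, ETH_ALEN))
			return 1;
	return 0;
}

/*
 * Depth-first walk: append the MAC of dev and of all its uppers to out,
 * skipping duplicates (uppers that inherit the lower device's MAC).
 * Returns the new number of entries, never more than max.
 */
static size_t collect_host_macs(const struct fake_netdev *dev,
				unsigned char (*out)[ETH_ALEN],
				size_t n, size_t max)
{
	size_t i;

	assert(dev && out);
	if (n < max && !mac_listed(out, n, dev->mac))
		memcpy(out[n++], dev->mac, ETH_ALEN);
	for (i = 0; dev->uppers && dev->uppers[i]; i++)
		n = collect_host_macs(dev->uppers[i], out, n, max);
	return n;
}
```

Called on the bridge device, the result is the set of addresses that
would need FDB entries pointing to the bridge. An upper that inherits
the bridge's MAC adds nothing; a macvlan with its own MAC adds one
entry.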

> 
> Bridging with a non-switchdev interface is also a situation that should
> be dealt with generically, as in that case, the CPU filter should
> obviously become larger, as termination is no longer done just on this
> CPU, and plain unicast filtering is no longer enough.
> 
> I hope it's a bit clearer now what is the problem I'm trying to address.
> Needless to say, I would prefer that a new API is not introduced,
> because upper layers shouldn't necessarily care about switchdev, unless
> they can leverage it for offloading.
> 
> > > 
> > > > > > >
> > > > > > > > When you are offloading the Linux data path to hardware this behavior is
> > > > > > > > not ideal as your hardware can handle much higher packet rates than the
> > > > > > > > CPU.
> > > > > > > >
> > > > > > > > In mlxsw we handle this by tracking the upper devices of the bridge. I
> > > > > > > > was hoping that with Ivan's patches we could add support for unicast
> > > > > > > > filtering in the bridge driver and program the MAC addresses to its FDB
> > > > > > > > with 'local' flag. Then the FDB entries would be notified via switchdev
> > > > > > > > to device drivers.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, it should be possible to do that. I'll try and see how far I get.
> > > > > > >
> > > > > > > > >
> > > > > > > > > Multicast filtering was taken and reworked from Florian Fainelli's
> > > > > > > > > previous attempts, according to my own understanding of multicast
> > > > > > > > > forwarding requirements of an IGMP snooping switch. This is the part
> > > > > > > > > that needs the most extra work, not only in the DSA core but also in
> > > > > > > > > drivers. For this reason, I've left out of this patchset anything that
> > > > > > > > > has to do with driver-level configuration (since the audience is a bit
> > > > > > > > > larger than usual), as I'm trying to focus more on policy for now, and
> > > > > > > > > the series is already pretty huge.
> > > > > > > >
> > > > > > > > From what I remember, this is the logic in the Linux bridge:
> > > > > > > >
> > > > > > > > * Broadcast is always locally received
> > > > > > > > * Multicast is locally received if:
> > > > > > > >         * Snooping disabled
> > > > > > > >         * Snooping enabled:
> > > > > > > >                 * Bridge netdev is mrouter port
> > > > > > > >                 or
> > > > > > > >                 * Matches MDB entry with 'host_joined' indication
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Florian Fainelli (3):
> > > > > > > > >   net: bridge: multicast: propagate br_mc_disabled_update() return
> > > > > > > > >   net: dsa: add ability to program unicast and multicast filters for CPU
> > > > > > > > >     port
> > > > > > > > >   net: dsa: wire up multicast IGMP snooping attribute notification
> > > > > > > > >
> > > > > > > > > Ivan Khoronzhuk (4):
> > > > > > > > >   net: core: dev_addr_lists: add VID to device address
> > > > > > > > >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> > > > > > > > >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> > > > > > > > >   ethernet: eth: add default vid len for all ethernet kind devices
> > > > > > > > >
> > > > > > > > > Vladimir Oltean (6):
> > > > > > > > >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> > > > > > > > >   net: dsa: don't use switchdev_notifier_fdb_info in
> > > > > > > > >     dsa_switchdev_event_work
> > > > > > > > >   net: dsa: mroute: don't panic the kernel if called without the prepare
> > > > > > > > >     phase
> > > > > > > > >   net: bridge: add port flags for host flooding
> > > > > > > > >   net: dsa: deal with new flooding port attributes from bridge
> > > > > > > > >   net: dsa: treat switchdev notifications for multicast router connected
> > > > > > > > >     to port
> > > > > > > > >
> > > > > > > > >  include/linux/if_bridge.h |   3 +
> > > > > > > > >  include/linux/if_vlan.h   |   2 +
> > > > > > > > >  include/linux/netdevice.h |  11 ++
> > > > > > > > >  include/net/dsa.h         |  17 +++
> > > > > > > > >  net/8021q/Kconfig         |  12 ++
> > > > > > > > >  net/8021q/vlan.c          |   3 +
> > > > > > > > >  net/8021q/vlan.h          |   2 +
> > > > > > > > >  net/8021q/vlan_core.c     |  25 ++++
> > > > > > > > >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> > > > > > > > >  net/bridge/br_if.c        |  40 ++++++
> > > > > > > > >  net/bridge/br_multicast.c |  21 ++-
> > > > > > > > >  net/bridge/br_switchdev.c |   4 +-
> > > > > > > > >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> > > > > > > > >  net/dsa/Kconfig           |   1 +
> > > > > > > > >  net/dsa/dsa2.c            |   6 +
> > > > > > > > >  net/dsa/dsa_priv.h        |  27 +++-
> > > > > > > > >  net/dsa/port.c            | 155 ++++++++++++++++----
> > > > > > > > >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> > > > > > > > >  net/dsa/switch.c          |  36 +++++
> > > > > > > > >  net/ethernet/eth.c        |  12 +-
> > > > > > > > >  20 files changed, 780 insertions(+), 131 deletions(-)
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -Vladimir
> > > > >
> > > > > -Vladimir
> > > 
> > > Thanks,
> > > -Vladimir
> 
> Thanks,
> -Vladimir

* Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
  2020-07-23 22:35         ` Vladimir Oltean
@ 2020-07-27 17:15           ` Ido Schimmel
  0 siblings, 0 replies; 46+ messages in thread
From: Ido Schimmel @ 2020-07-27 17:15 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, netdev, Horatiu Vultur,
	Allan W. Nielsen, Nikolay Aleksandrov, Roopa Prabhu

On Fri, Jul 24, 2020 at 01:35:51AM +0300, Vladimir Oltean wrote:
> On Mon, May 25, 2020 at 11:11:11PM +0300, Ido Schimmel wrote:
> > On Sun, May 24, 2020 at 07:13:46PM +0300, Vladimir Oltean wrote:
> > > Hi Ido,
> > > 
> > > On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@idosch.org> wrote:
> > > >
> > > > On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > >
> > > > > In cases where the bridge is offloaded by a switchdev, there are
> > > > > situations where we can optimize RX filtering towards the host. To be
> > > > > precise, the host only needs to do termination, which it can do by
> > > > > responding at the MAC addresses of the slave ports and of the bridge
> > > > > interface itself. But most notably, it doesn't need to do forwarding,
> > > > > so there is no need to see packets with unknown destination address.
> > > > >
> > > > > But there are, however, cases when a switchdev does need to flood to the
> > > > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > > > interface, and since there is no offloaded datapath, packets need to
> > > > > pass through the CPU. Currently this is the only identified case, but it
> > > > > can be extended at any time.
> > > > >
> > > > > So far, switchdev implementers made driver-level assumptions, such as:
> > > > > this chip is never integrated in SoCs where it can be bridged with a
> > > > > foreign interface, so I'll just disable host flooding and save some CPU
> > > > > cycles. Or: I can never know what else can be bridged with this
> > > > > switchdev port, so I must leave host flooding enabled in any case.
> > > > >
> > > > > Let the bridge drive the host flooding decision, and pass it to
> > > > > switchdev via the same mechanism as the external flooding flags.
> > > > >
> > > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > > ---
> > > > >  include/linux/if_bridge.h |  3 +++
> > > > >  net/bridge/br_if.c        | 40 +++++++++++++++++++++++++++++++++++++++
> > > > >  net/bridge/br_switchdev.c |  4 +++-
> > > > >  3 files changed, 46 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > > > index b3a8d3054af0..6891a432862d 100644
> > > > > --- a/include/linux/if_bridge.h
> > > > > +++ b/include/linux/if_bridge.h
> > > > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > > > >  #define BR_ISOLATED          BIT(16)
> > > > >  #define BR_MRP_AWARE         BIT(17)
> > > > >  #define BR_MRP_LOST_CONT     BIT(18)
> > > > > +#define BR_HOST_FLOOD                BIT(19)
> > > > > +#define BR_HOST_MCAST_FLOOD  BIT(20)
> > > > > +#define BR_HOST_BCAST_FLOOD  BIT(21)
> > > > >
> > > > >  #define BR_DEFAULT_AGEING_TIME       (300 * HZ)
> > > > >
> > > > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > > > index a0e9a7937412..aae59d1e619b 100644
> > > > > --- a/net/bridge/br_if.c
> > > > > +++ b/net/bridge/br_if.c
> > > > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > > > >       }
> > > > >  }
> > > > >
> > > > > +static int br_manage_host_flood(struct net_bridge *br)
> > > > > +{
> > > > > +     const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > > > +                                BR_HOST_BCAST_FLOOD;
> > > > > +     struct net_bridge_port *p, *q;
> > > > > +
> > > > > +     list_for_each_entry(p, &br->port_list, list) {
> > > > > +             unsigned long flags = p->flags;
> > > > > +             bool sw_bridging = false;
> > > > > +             int err;
> > > > > +
> > > > > +             list_for_each_entry(q, &br->port_list, list) {
> > > > > +                     if (p == q)
> > > > > +                             continue;
> > > > > +
> > > > > +                     if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > > > +                             sw_bridging = true;
> > > >
> > > > It's not that simple. There are cases where not all bridge slaves have
> > > > the same parent ID and still there is no reason to flood traffic to the
> > > > CPU. VXLAN, for example.
> > > >
> > > > You could argue that the VXLAN device needs to have the same parent ID
> > > > as the physical netdevs member in the bridge, but it will break your
> > > > data path. For example, let's assume your hardware decided to flood a
> > > > packet in L2. The packet will egress all the local ports, but will also
> > > > perform VXLAN encapsulation. The packet continues with the IP of the
> > > > remote VTEP(s) to the underlay router and then encounters a neighbour
> > > > miss exception, which sends it to the CPU for resolution.
> > > >
> > > > Since this exception was encountered in the router the driver would mark
> > > > the packet with 'offload_fwd_mark', as it already performed L2
> > > > forwarding. If the VXLAN device has the same parent ID as the physical
> > > > netdevs, then the Linux bridge will never let it egress, nothing will
> > > > trigger neighbour resolution and the packet will be discarded.
> > > >
> > > 
> > > I wasn't going to argue that.
> > > Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
> > > to multicast IPs should be flooded to the CPU.
> > > Actually Allan's example was a bit simpler, he said that host flooding
> > > can be made a per-VLAN flag. I'm glad that you raised this. So maybe
> > > we should try to define some mechanism by which virtual interfaces can
> > > specify to the bridge that they don't need to see all traffic? Do you
> > > have any ideas?
> > 
> > Maybe, when a port joins a bridge, query member ports if they can
> > forward traffic to it in hardware and based on the answer determine the
> > flooding towards the CPU?
> > 
> 
> Hi Ido, Allan,
> 
> I understand less and less of this. What I don't really understand is,
> if you have a switchdev bridged with a vtep like this:
> 
>  +-------------------------+
>  |           br0           |
>  +-------------------------+
>      |                |
>      |           +--------+
>      |           | vxlan0 |
>      |           +--------+
>      |                |
>  +--------+      +--------+
>  |  swp0  |      |  eth0  |
>  +--------+      +--------+
> 
> why would the swp0 interface care about the remote_ip at all. To the
> traffic seen by swp0, the VXLAN segment doesn't exist. Encapsulation and
> decapsulation all happen outside of the switchdev interface. All that
> switchdev sees is that, from the CPU side, it's talking to a bunch of
> MAC addresses.

I don't understand "Encapsulation and decapsulation all happen outside
of the switchdev interface". What does it mean? Encapsulation and
decapsulation happen in hardware... Frame is received by swp0, forwarded
to hardware VTEP, encapsulated and routed towards its destination. The
CPU does not see the packet. Same with decapsulation.

Your patch instructs drivers to flood traffic to the CPU if netdevs with
different parent IDs are members of the bridge. I explained that this
breaks with VXLAN.

swp0 and vxlan0 do not have the same parent ID yet this does not mean
packets should be flooded to the CPU. I also explained why they should
not have the same parent ID:

1. Packet is received by swp0
2. Forwarded in hardware to hardware VTEP
3. VTEP performs encapsulation
4. VXLAN packet is routed in hardware
5. VXLAN packet encounters a neighbour miss in router
6. Original packet is trapped to CPU from swp0 because of an unresolved
neighbour
7. Driver marks packet with 'offload_fwd_mark' because it was already L2
forwarded in hardware
8. Packet reaches software bridge
9. bridge does not forward it to swpX netdevs because they share the
same parent ID as swp0 and packet is marked
10. Packet is forwarded to vxlan0
11. Packet is routed in software and neighbour is resolved

Since the neighbour is now resolved the next packet will be completely
forwarded in hardware.

If vxlan0 and swp0 had the same parent ID, then step 10 would never
happen and the neighbour would never be resolved.
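
Steps 9 and 10 hinge on one egress check, which can be sketched in
userspace as follows. This is a simplified model under stated
assumptions: struct fake_port, struct fake_pkt and may_egress() are
hypothetical names, and the parent ID is reduced to a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bridge port; parent_id identifies the offloading switch. */
struct fake_port {
	const char *name;
	int parent_id;
};

/* Hypothetical packet as seen by the software bridge. */
struct fake_pkt {
	const struct fake_port *in;  /* ingress port */
	bool offload_fwd_mark;       /* set: hardware already L2-forwarded it */
};

/*
 * A marked packet must not be forwarded again to ports that belong to
 * the same hardware domain as its ingress port, because the hardware
 * has already delivered it there. Ports with another parent ID are
 * still eligible.
 */
static bool may_egress(const struct fake_pkt *pkt, const struct fake_port *out)
{
	assert(pkt && pkt->in && out);
	if (out == pkt->in)
		return false;
	if (!pkt->offload_fwd_mark)
		return true;
	return out->parent_id != pkt->in->parent_id;
}
```

With swp0 and swp1 sharing a parent ID and vxlan0 carrying a different
one, a marked packet from swp0 is refused for swp1 and allowed for
vxlan0. Give vxlan0 the same parent ID as swp0 and may_egress() refuses
it too, which is the failure described above.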

> 
> The same comment also applies for 8021q, in fact. I did try this
> experiment, to bridge a switchdev with a VLAN sub-interface of another
> port. I don't know why, I used to have the misconception that the desire
> in doing that would be to somehow only extract one VLAN ID from the
> switchdev, and the rest could be kept outside of the CPU's flooding
> domain. But that isn't the case at all. When bridging, I'm bridging the
> _entire_ traffic of swp0 with, say, eth0.100. And, as in the case of
> vxlan, encap/decap all happens outside of switchdev. So, contrary to my
> initial expectation, if I'm receiving on swp0 a packet tagged with VLAN
> 100, it would end up exiting the bridge, on eth0, with 2 VLAN tags with
> ID 100.
> 
> Simply put, I think my change is fine the way it is. Either that, or I
> just don't understand your comment about querying bridge members whether
> they can forward in hardware. How are you dealing with this today?
> 
> Thanks,
> -Vladimir

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-07-27 16:56                 ` Ido Schimmel
@ 2020-10-27 11:52                   ` Vladimir Oltean
  2020-10-28 14:43                     ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-10-27 11:52 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

Hi Ido,

On Mon, Jul 27, 2020 at 07:56:38PM +0300, Ido Schimmel wrote:
> > The whole purpose of my patch series is to remove the CPU port from the
> > flood domain of all switchdev net_devices. That means, when an unknown
> > unicast packet ingresses, it will be flooded but not to the CPU. 
> 
> Good. This is what happens in mlxsw today.
> 
> > For frames that the CPU wants to see, there should be a universal
> > mechanism for it to whitelist them, by {DMAC, VID}. Otherwise, things
> > don't scale.
> > 
> > There is one such mechanism already, and that is dev_uc_add(). It used
> > to install an address into a device's RX filter using DMAC only, and
> > Ivan Khoronzhuk's patches have added a new dev_vid_uc_add() that allows
> > additional filtering by VLAN.
> 
> Yes, but please note that when you are talking about packets the CPU
> cares about, then the device is the bridge device. Not its slaves which
> are "promiscuous by definition".

This is not completely true. A switchdev port can have a bridge upper or
not. You are only concentrating on the traffic that the bridge would be
interested in seeing, but I am also thinking of what traffic the CPU
should receive from this port in "standalone" mode. All traffic? Not
so compelling.

> > This is fundamentally because the destination MAC address is parsed by
> > a network card for _termination_ purposes. And because a switch
> > doesn't do _termination_, there is no reason to filter by destination
> > MAC (ignore ACL and such). But a Linux switchdev is capable of
> > termination. In the case of switchdev, termination means sending to
> > the CPU.
> 
> You keep saying "CPU", but it's because you are most likely only
> concerned with switches that are not capable of L3 forwarding. In mlxsw
> we never send packets from the FDB to the CPU, but to the "router port".
> There the packets (whether unicast or multicast) are routed and either
> forwarded to a different port or locally received.

I don't know nearly enough about IP forwarding offload to make a
relevant comment here.

> > 
> > My interpretation of the meaning of dev_uc_add() for switchdev (and
> > therefore, of its opposite - promiscuous mode) is at odds with previous
> > work done for non-switchdev. Take Vlad Yasevich's work "[Bridge] [PATCH
> > net-next 0/8] Non-promisc bidge ports support" for example:
> > 
> > https://lists.linuxfoundation.org/pipermail/bridge/2014-May/008940.html
> > 
> > He is arguing that a bridge port without flood&learn doesn't need
> > promiscuous mode, because all addresses can be statically known, and
> > therefore, he added code to the bridge that does the following:
> > 
> > - syncs the bridge MAC address to all non-promisc bridge slaves, via
> >   dev_uc_add()
> > - syncs the MAC addresses of all static FDB entries on all ingress
> >   non-promisc bridge slave ports, via dev_uc_add()
> > 
> > with the obvious goal that "the bridge slave shouldn't drop these
> > packets".
> 
> Let's say all the ports are not automatic (using Vlad's terminology),
> then packets can only be forwarded based on FDB entries. Any packets
> with a destination MAC not in the FDB will be dropped by the bridge.
> Agree?
> 
> Now, if this is the case, then you know in advance which MACs will not
> be dropped by the bridge. Therefore, you can program only these MACs to
> the Rx filters of the bridge slaves (simple NICs). That way, instead of
> having the bridge (the CPU) waste cycles on dropping packets you can
> drop them in hardware using the NIC's Rx filters.

_if_ there is a bridge.

> > 
> > In my interpretation of dev_uc_add(), I would have expected that:
> > - the bridge MAC address, as well as any other secondary unicast
> >   addresses that the bridge has, by means of its uppers (like macvlan,
> >   802.1q, etc) calling dev_uc_add() on it, would be synced to the bridge
> >   slaves anyway, regardless of whether they're promisc or not
> 
> Is this supposed to be related to the previous paragraph about Vlad's work?
> I don't really follow.

Yes, of course.

> Anyway, he specifically wrote that "There are some other cases when
> promiscuous mode has to be turned back on. One is when the bridge
> itself if placed in promiscuous mode".
> 
> When you start adding bridge uppers with different MACs then the bridge
> will enter promiscuous mode and all unknown unicast packets will be
> flooded to it. In this case packets without a matching FDB will no
> longer be dropped by the bridge and therefore the NIC can't drop them in
> hardware using its Rx filters anymore.

All would be fine if the bridge declared IFF_UNICAST_FLT and
propagated its address lists to its slave ports somehow, either through
dev_uc_add/dev_mc_add or through SWITCHDEV_OBJ_ID_HOST_MDB [ and a new
SWITCHDEV_OBJ_ID_HOST_FDB, I can only assume ].

But if we are to introduce a new SWITCHDEV_OBJ_ID_HOST_FDB, then we
would be working around the problem, and the non-bridged switchdev
interfaces would still have no proper way of doing RX filtering.

> > - the static FDB entries are synced to the bridge ports only in the
> >   non-switchdev case. This is because for switchdev, I am treating a
> >   dev_uc_add() as a FDB entry towards the CPU, and therefore this would
> >   overwrite the FDB entry towards the external port.
> 
> OK, so this interpretation of "treating a dev_uc_add() as a FDB entry
> towards the CPU" is wrong.
> 
> You already wrote that "For a switchdev, promisc vs non-promisc doesn't
> mean a thing" and that "[dev_uc_add() is] used to install an address
> into a device's RX filter".
> 
> You can't tell me that switches do not perform Rx filtering and then
> decide to re-purpose a mechanism that is used for Rx filtering...

No, that's exactly what I'm trying to tell you...

> > 
> > In my interpretation, things would have worked neatly for the most part,
> > not only for unicast but also for multicast. For example, an application
> > wants to see a multicast stream, so it calls setsockopt(SOL_PACKET,
> > PACKET_ADD_MEMBERSHIP, PACKET_MR_MULTICAST) with the multicast address
> > it wants to see. This is translated by the kernel into a dev_mc_add()
> > and sent to the network device. For a non-switchdev, this would have
> > been enough. For a switchdev, if I also installed the address in the
> > CPU's filter, it would have also been enough. Things 'just work' and
> > everybody's happy.
> > 
> > > When you look at it from hardware offload perspective, not every packet
> > > received by the bridge interface should reach the CPU. Actually, most
> > > should not reach it. Otherwise it would mean that every routed packet
> > > would need to go to the CPU, which is not feasible. If you can't perform
> > > routing in hardware, then yes, you need to send such packets to the CPU.
> > > 
> > > In mlxsw we can't perform MAC filtering in the router like in the
> > > software data path, so in order not to route packets we should not, we
> > > only send to the router packets with destination MACs that correspond to
> > > that of the bridge or one of its uppers. We don't flood all unknown
> > > unicast packets there.
> > > 
> > > In the case of hardware offload it's relatively easy to do this sort of
> > > tracking because only a limited set of upper device topologies is
> > > actually supported. I'm not sure how feasible it is with every
> > > combination of upper devices supported by the kernel. It seems easiest
> > > to just put the bridge interface in promiscuous mode and let upper
> > > layers perform the filtering. Like it is today.
> > > 
> > 
> > Are you suggesting that tracking the uppers is the only way to do what I
> > want?
> 
> I don't see a different way. Your goal is to prevent flooding of unknown
> unicast packets to the CPU. If the bridge is not in promiscuous mode,
> then unknown unicast packets are not flooded to it. Only FDB entries
> pointing to the bridge device should go to the CPU.
>
> The problem starts when the bridge enters promiscuous mode. When does it
> happen? When you start adding uppers that do not inherit the bridge's
> MAC. Why? Because the bridge does not support unicast filtering. It is
> not an easy thing to do when you have multiple levels of stacked
> devices.

No, the problem doesn't start there, or end there. I just think that the
proposed solution would be incomplete if it just relied on tracking
uppers.

Take the case of IEEE 1588 packets. They should be trapped to the CPU
and not forwarded. But the destination address at which PTP packets are
sent is not set in stone, it is something that the profile decides.

How to ensure these packets are trapped to the CPU?
You're probably going to say "devlink trap", but:
- I don't want the PTP packets to be unconditionally trapped. I see it
  as a perfectly valid use case for a switch to be PTP-unaware and just
  let somebody else terminate those packets. But "devlink trap" only
  gives you an option to see what the traps are, not to turn them off.
- The hardware I'm working with doesn't even trap PTP to the CPU by
  default. I would need to hardcode trapping rules in the driver, to
  some multicast addresses I can just guess, then I would report them as
  non-disableable devlink traps.

Applications do call setsockopt with IP_ADD_MEMBERSHIP, IPV6_ADD_MEMBERSHIP
or PACKET_ADD_MEMBERSHIP. However I don't see how that is turning into a
notification that the driver can use, except through dev_mc_add.

Therefore, it simply looks easier to me to stub out the extraneous calls
to dev_uc_add and dev_mc_add, rather than add parallel plumbing into
net/ipv4/igmp.c, for ports that are "promiscuous by default".

What do you think about this example? Isn't it something that should be
supported by design?
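
For reference, a minimal user-space sketch of the PACKET_ADD_MEMBERSHIP
call mentioned above. The helper names are made up for illustration, and
the caller is assumed to pass an already-open AF_PACKET socket. Which
multicast MAC to join is up to the PTP profile; 01:1B:19:00:00:00 would
be just one example.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/if.h>

/* Hypothetical helper: describe one multicast MAC on one interface. */
void fill_packet_mreq(struct packet_mreq *mreq, int ifindex,
		      const unsigned char mac[6])
{
	memset(mreq, 0, sizeof(*mreq));
	mreq->mr_ifindex = ifindex;
	mreq->mr_type = PACKET_MR_MULTICAST;
	mreq->mr_alen = 6;
	memcpy(mreq->mr_address, mac, 6);
}

/*
 * Hypothetical helper: join the group on an already-open AF_PACKET
 * socket. Returns 0 on success, -1 otherwise.
 */
int join_l2_group(int fd, const char *ifname, const unsigned char mac[6])
{
	struct packet_mreq mreq;
	unsigned int ifindex = if_nametoindex(ifname);

	if (!ifindex)
		return -1;

	fill_packet_mreq(&mreq, (int)ifindex, mac);
	return setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
			  &mreq, sizeof(mreq));
}
```

From user space this is the whole request; nothing in it says whether
the interface is a plain NIC or a switchdev port.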

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-10-27 11:52                   ` Vladimir Oltean
@ 2020-10-28 14:43                     ` Ido Schimmel
  2020-10-28 18:46                       ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-10-28 14:43 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Tue, Oct 27, 2020 at 01:52:49PM +0200, Vladimir Oltean wrote:
> Hi Ido,

Hello,

> 
> On Mon, Jul 27, 2020 at 07:56:38PM +0300, Ido Schimmel wrote:
> > > The whole purpose of my patch series is to remove the CPU port from the
> > > flood domain of all switchdev net_devices. That means, when an unknown
> > > unicast packet ingresses, it will be flooded but not to the CPU. 
> > 
> > Good. This is what happens in mlxsw today.
> > 
> > > For frames that the CPU wants to see, there should be a universal
> > > mechanism for it to whitelist them, by {DMAC, VID}. Otherwise, things
> > > don't scale.
> > > 
> > > There is one such mechanism already, and that is dev_uc_add(). It used
> > > to install an address into a device's RX filter using DMAC only, and
> > > Ivan Khoronzhuk's patches have added a new dev_vid_uc_add() that allows
> > > additional filtering by VLAN.
> > 
> > Yes, but please note that when you are talking about packets the CPU
> > cares about, then the device is the bridge device. Not its slaves which
> > are "promiscuous by definition".
> 
> This is not completely true. A switchdev port can have a bridge upper or
> not. You are only concentrating on the traffic that the bridge would be
> interested in seeing, but I am also thinking of what traffic the CPU
> should receive from this port in "standalone" mode. All traffic? Not
> so compelling.

In "standalone" mode your netdev is like any other netdev and if it does
not support Rx filtering, then pass everything to the CPU and let it
filter what it does not want to see. I don't see the problem.  This is
exactly what mlxsw did before L3 forwarding was introduced. As soon as
you removed a netdev from a bridge we created an internal bridge between
the port and the CPU port and flooded everything to the CPU.

> 
> > > This is fundamentally because the destination MAC address is parsed by
> > > a network card for _termination_ purposes. And because a switch
> > > doesn't do _termination_, there is no reason to filter by destination
> > > MAC (ignore ACL and such). But a Linux switchdev is capable of
> > > termination. In the case of switchdev, termination means sending to
> > > the CPU.
> > 
> > You keep saying "CPU", but it's because you are most likely only
> > concerned with switches that are not capable of L3 forwarding. In mlxsw
> > we never send packets from the FDB to the CPU, but to the "router port".
> > There the packets (whether unicast or multicast) are routed and either
> > forwarded to a different port or locally received.
> 
> I don't know nearly enough about IP forwarding offload to make a
> relevant comment here.
> 
> > > 
> > > My interpretation of the meaning of dev_uc_add() for switchdev (and
> > > therefore, of its opposite - promiscuous mode) is at odds with previous
> > > work done for non-switchdev. Take Vlad Yasevich's work "[Bridge] [PATCH
> > > net-next 0/8] Non-promisc bidge ports support" for example:
> > > 
> > > https://lists.linuxfoundation.org/pipermail/bridge/2014-May/008940.html
> > > 
> > > He is arguing that a bridge port without flood&learn doesn't need
> > > promiscuous mode, because all addresses can be statically known, and
> > > therefore, he added code to the bridge that does the following:
> > > 
> > > - syncs the bridge MAC address to all non-promisc bridge slaves, via
> > >   dev_uc_add()
> > > - syncs the MAC addresses of all static FDB entries on all ingress
> > >   non-promisc bridge slave ports, via dev_uc_add()
> > > 
> > > with the obvious goal that "the bridge slave shouldn't drop these
> > > packets".
> > 
> > > Let's say all the ports are not automatic (using Vlad's terminology),
> > then packets can only be forwarded based on FDB entries. Any packets
> > with a destination MAC not in the FDB will be dropped by the bridge.
> > Agree?
> > 
> > Now, if this is the case, then you know in advance which MACs will not
> > be dropped by the bridge. Therefore, you can program only these MACs to
> > the Rx filters of the bridge slaves (simple NICs). That way, instead of
> > having the bridge (the CPU) waste cycles on dropping packets you can
> > drop them in hardware using the NIC's Rx filters.
> 
> _if_ there is a bridge.

But he is talking about a bridge... I don't follow. You even wrote "He
is arguing that a bridge port". So how come there is no bridge?

> 
> > > 
> > > In my interpretation of dev_uc_add(), I would have expected that:
> > > - the bridge MAC address, as well as any other secondary unicast
> > >   addresses that the bridge has, by means of its uppers (like macvlan,
> > >   802.1q, etc) calling dev_uc_add() on it, would be synced to the bridge
> > >   slaves anyway, regardless of whether they're promisc or not
> > 
> > > Is this supposed to be related to the previous paragraph about Vlad's work?
> > I don't really follow.
> 
> Yes, of course.
> 
> > Anyway, he specifically wrote that "There are some other cases when
> > promiscuous mode has to be turned back on. One is when the bridge
> > itself if placed in promiscuous mode".
> > 
> > When you start adding bridge uppers with different MACs then the bridge
> > will enter promiscuous mode and all unknown unicast packets will be
> > flooded to it. In this case packets without a matching FDB will no
> > longer be dropped by the bridge and therefore the NIC can't drop them in
> > hardware using its Rx filters anymore.
> 
> All would be fine if the bridge would declare IFF_UNICAST_FLT and
> propagate its address lists to its slave ports somehow, either through
> dev_uc_add/dev_mc_add or through SWITCHDEV_OBJ_ID_HOST_MDB [ and a new
> SWITCHDEV_OBJ_ID_HOST_FDB, I can only assume ].
> 
> But if we are to introduce a new SWITCHDEV_OBJ_ID_HOST_FDB, then we
> would be working around the problem, and the non-bridged switchdev
> interfaces would still have no proper way of doing RX filtering.

What prevents you from implementing ndo_set_rx_mode() in your driver?

Let me re-iterate my point again. Rx filtering determines which packets
can be received by the port. In "standalone" mode where you do not
support L3 forwarding I agree that the Rx filter determines which
packets the CPU should see.

However, in the "non-standalone" mode where your netdevs are enslaved to
a bridge that you offload, then the bridge's FDB determines which
packets the CPU should see. The ports themselves are in promiscuous mode
because the bridge (either SW one or HW one) wants to see all the
received packets.

See more below.

> 
> > > - the static FDB entries are synced to the bridge ports only in the
> > >   non-switchdev case. This is because for switchdev, I am treating a
> > >   dev_uc_add() as a FDB entry towards the CPU, and therefore this would
> > >   overwrite the FDB entry towards the external port.
> > 
> > OK, so this interpretation of "treating a dev_uc_add() as a FDB entry
> > towards the CPU" is wrong.
> > 
> > You already wrote that "For a switchdev, promisc vs non-promisc doesn't
> > mean a thing" and that "[dev_uc_add() is] used to install an address
> > into a device's RX filter".
> > 
> > You can't tell me that switches do not perform Rx filtering and then
> > decide to re-purpose a mechanism that is used for Rx filtering...
> 
> No, that's exactly what I'm trying to tell you...
> 
> > > 
> > > In my interpretation, things would have worked neatly for the most part,
> > > not only for unicast but also for multicast. For example, an application
> > > > wants to see a multicast stream, so it calls setsockopt(SOL_PACKET,
> > > PACKET_ADD_MEMBERSHIP, PACKET_MR_MULTICAST) with the multicast address
> > > it wants to see. This is translated by the kernel into a dev_mc_add()
> > > and sent to the network device. For a non-switchdev, this would have
> > > been enough. For a switchdev, if I also installed the address in the
> > > CPU's filter, it would have also been enough. Things 'just work' and
> > > everybody's happy.
> > > 
> > > > When you look at it from hardware offload perspective, not every packet
> > > > received by the bridge interface should reach the CPU. Actually, most
> > > > should not reach it. Otherwise it would mean that every routed packet
> > > > would need to go to the CPU, which is not feasible. If you can't perform
> > > > routing in hardware, then yes, you need to send such packets to the CPU.
> > > > 
> > > > In mlxsw we can't perform MAC filtering in the router like in the
> > > > software data path, so in order not to route packets we should not, we
> > > > only send to the router packets with destination MACs that correspond to
> > > > that of the bridge or one of its uppers. We don't flood all unknown
> > > > unicast packets there.
> > > > 
> > > > In the case of hardware offload it's relatively easy to do this sort of
> > > > > tracking because only a limited set of upper device topologies are
> > > > actually supported. I'm not sure how feasible it is with every
> > > > combination of upper devices supported by the kernel. It seems easiest
> > > > to just put the bridge interface in promiscuous mode and let upper
> > > > layers perform the filtering. Like it is today.
> > > > 
> > > 
> > > Are you suggesting that tracking the uppers is the only way to do what I
> > > want?
> > 
> > I don't see a different way. Your goal is to prevent flooding of unknown
> > unicast packets to the CPU. If the bridge is not in promiscuous mode,
> > then unknown unicast packets are not flooded to it. Only FDB entries
> > pointing to the bridge device should go to the CPU.
> >
> > The problem starts when the bridge enters promiscuous mode. When does it
> > happen? When you start adding uppers that do not inherit the bridge's
> > MAC. Why? Because the bridge does not support unicast filtering. It is
> > not an easy thing to do when you have multiple levels of stacked
> > devices.
> 
> No, the problem doesn't start there, or end there. I just think that the
> proposed solution would be incomplete if it just relied on tracking
> uppers.
> 
> Take the case of IEEE 1588 packets. They should be trapped to the CPU
> and not forwarded. But the destination address at which PTP packets are
> sent is not set in stone, it is something that the profile decides.
> 
> How to ensure these packets are trapped to the CPU?
> You're probably going to say "devlink trap", but:

I would say that it is up to the driver to configure this among all the
rest of the PTP configuration that it needs to do. mlxsw registers the
PTP trap during init because it is easy, but I assume we could also do
it when PTP is enabled.

> - I don't want the PTP packets to be unconditionally trapped. I see it
>   as a perfectly valid use case for a switch to be PTP-unaware and just
>   let somebody else terminate those packets. But "devlink trap" only
>   gives you an option to see what the traps are, not to turn them off.
> - The hardware I'm working with doesn't even trap PTP to the CPU by
>   default. I would need to hardcode trapping rules in the driver, to
>   some multicast addresses I can just guess, then I would report them as
>   non-disableable devlink traps.
> 
> Applications do call setsockopt with IP_ADD_MEMBERSHIP, IPV6_ADD_MEMBERSHIP
> or PACKET_ADD_MEMBERSHIP. However I don't see how that is turning into a
> notification that the driver can use, except through dev_mc_add.
> 
> Therefore, it simply looks easier to me to stub out the extraneous calls
> to dev_uc_add and dev_mc_add, rather than add parallel plumbing into
> net/ipv4/igmp.c, for ports that are "promiscuous by default".
> 
> What do you think about this example? Isn't it something that should be
> supported by design?

I believe it's already supported. Let's look at the "standalone" and
"non-standalone" cases:

1. Standalone: Your ndo_set_rx_mode() will be called and if you support
Rx filtering, you can program your filters accordingly. If not, then you
need to send everything to the CPU.

2. Non-standalone and bridge is multicast aware: An IGMP membership
report is supposed to be sent via the bridge device (I assume you are
calling IP_ADD_MEMBERSHIP on the bridge device). This will cause the
bridge to create an MDB entry indicating that packets to this multicast
IP should be locally received. Drivers get it via the switchdev
operation Andrew added.

3. Non-standalone and bridge is not multicast aware: Incoming packets to
this multicast group are treated as broadcast and should be locally
received via the bridge device (CPU port in your case).
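
The three cases above can be summarized in a small user-space model.
This is only a sketch to make the decision explicit; the enum and
function names are invented for illustration and do not exist in the
kernel.

```c
#include <stdbool.h>

/* Illustrative names only; none of these are kernel identifiers. */
enum host_mc_path {
	HOST_MC_RX_MODE,	/* case 1: ndo_set_rx_mode() on the port */
	HOST_MC_BRIDGE_MDB,	/* case 2: host MDB entry via switchdev */
	HOST_MC_FLOOD,		/* case 3: treated like broadcast */
};

/* How does the driver learn that the host wants a multicast group? */
enum host_mc_path classify_host_mc(bool bridged, bool bridge_mc_aware)
{
	if (!bridged)
		return HOST_MC_RX_MODE;

	return bridge_mc_aware ? HOST_MC_BRIDGE_MDB : HOST_MC_FLOOD;
}
```

Note that the model assumes the membership is requested on the bridge
device in the bridged cases, as stated in case 2.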

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-10-28 14:43                     ` Ido Schimmel
@ 2020-10-28 18:46                       ` Vladimir Oltean
  2020-11-01 11:27                         ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-10-28 18:46 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Wed, Oct 28, 2020 at 04:43:38PM +0200, Ido Schimmel wrote:
> In "standalone" mode your netdev is like any other netdev and if it does
> not support Rx filtering, then pass everything to the CPU and let it
> filter what it does not want to see. I don't see the problem.  This is
> exactly what mlxsw did before L3 forwarding was introduced. As soon as
> you removed a netdev from a bridge we created an internal bridge between
> the port and the CPU port and flooded everything to the CPU.

Of course I was thinking about the better case where the netdev would
declare IFF_UNICAST_FLT. If it supported filtering when
bridged, it would seem natural to me to also support it when not bridged.

> > > > My interpretation of the meaning of dev_uc_add() for switchdev (and
> > > > therefore, of its opposite - promiscuous mode) is at odds with previous
> > > > work done for non-switchdev. Take Vlad Yasevich's work "[Bridge] [PATCH
> > > > net-next 0/8] Non-promisc bidge ports support" for example:
> > > > 
> > > > https://lists.linuxfoundation.org/pipermail/bridge/2014-May/008940.html
> > > > 
> > > > He is arguing that a bridge port without flood&learn doesn't need
> > > > promiscuous mode, because all addresses can be statically known, and
> > > > therefore, he added code to the bridge that does the following:
> > > > 
> > > > - syncs the bridge MAC address to all non-promisc bridge slaves, via
> > > >   dev_uc_add()
> > > > - syncs the MAC addresses of all static FDB entries on all ingress
> > > >   non-promisc bridge slave ports, via dev_uc_add()
> > > > 
> > > > with the obvious goal that "the bridge slave shouldn't drop these
> > > > packets".
> > > 
> > > > Let's say all the ports are not automatic (using Vlad's terminology),
> > > then packets can only be forwarded based on FDB entries. Any packets
> > > with a destination MAC not in the FDB will be dropped by the bridge.
> > > Agree?
> > > 
> > > Now, if this is the case, then you know in advance which MACs will not
> > > be dropped by the bridge. Therefore, you can program only these MACs to
> > > the Rx filters of the bridge slaves (simple NICs). That way, instead of
> > > having the bridge (the CPU) waste cycles on dropping packets you can
> > > drop them in hardware using the NIC's Rx filters.
> > 
> > _if_ there is a bridge.
> 
> But he is talking about a bridge... I don't follow. You even wrote "He
> is arguing that a bridge port". So how come there is no bridge?

Well, my problem is with the bridge's use of dev_uc_add; I'm sure you
got that by now. I would be forced to treat dev_uc_add differently
depending on whether or not I am bridged, and I don't particularly like
that.

> > But if we are to introduce a new SWITCHDEV_OBJ_ID_HOST_FDB, then we
> > would be working around the problem, and the non-bridged switchdev
> > interfaces would still have no proper way of doing RX filtering.
> 
> What prevents you from implementing ndo_set_rx_mode() in your driver?

Nothing, that's exactly what I did here...

> Let me re-iterate my point again. Rx filtering determines which packets
> can be received by the port. In "standalone" mode where you do not
> support L3 forwarding I agree that the Rx filter determines which
> packets the CPU should see.
> 
> However, in the "non-standalone" mode where your netdevs are enslaved to
> a bridge that you offload, then the bridge's FDB determines which
> packets the CPU should see. The ports themselves are in promiscuous mode
> because the bridge (either SW one or HW one) wants to see all the
> received packets.

Agree. We all agree on this. However, the specifics are a bit fuzzy.

> > Take the case of IEEE 1588 packets. They should be trapped to the CPU
> > and not forwarded. But the destination address at which PTP packets are
> > sent is not set in stone, it is something that the profile decides.
> > 
> > How to ensure these packets are trapped to the CPU?
> > You're probably going to say "devlink trap", but:
> 
> I would say that it is up to the driver to configure this among all the
> rest of the PTP configuration that it needs to do. mlxsw registers the
> PTP trap during init because it is easy, but I assume we could also do
> it when PTP is enabled.

So based on the 

> > - I don't want the PTP packets to be unconditionally trapped. I see it
> >   as a perfectly valid use case for a switch to be PTP-unaware and just
> >   let somebody else terminate those packets. But "devlink trap" only
> >   gives you an option to see what the traps are, not to turn them off.
> > - The hardware I'm working with doesn't even trap PTP to the CPU by
> >   default. I would need to hardcode trapping rules in the driver, to
> >   some multicast addresses I can just guess, then I would report them as
> >   non-disableable devlink traps.
> > 
> > Applications do call setsockopt with IP_ADD_MEMBERSHIP, IPV6_ADD_MEMBERSHIP
> > or PACKET_ADD_MEMBERSHIP. However I don't see how that is turning into a
> > notification that the driver can use, except through dev_mc_add.
> > 
> > Therefore, it simply looks easier to me to stub out the extraneous calls
> > to dev_uc_add and dev_mc_add, rather than add parallel plumbing into
> > net/ipv4/igmp.c, for ports that are "promiscuous by default".
> > 
> > What do you think about this example? Isn't it something that should be
> > supported by design?
> 
> I believe it's already supported. Let's look at the "standalone" and
> "non-standalone" cases:
> 
> 1. Standalone: Your ndo_set_rx_mode() will be called and if you support
> Rx filtering, you can program your filters accordingly. If not, then you
> need to send everything to the CPU.

Right, this is kind of what the patch set that we're commenting on is
doing.

> 2. Non-standalone and bridge is multicast aware: An IGMP membership
> report is supposed to be sent via the bridge device (I assume you are
> calling IP_ADD_MEMBERSHIP on the bridge device). This will cause the
> bridge to create an MDB entry indicating that packets to this multicast
> IP should be locally received. Drivers get it via the switchdev
> operation Andrew added.

I am not calling *_ADD_MEMBERSHIP on the bridge device, but on the slave
ports.

For PACKET_ADD_MEMBERSHIP, this should work as-is on swpN even if it's
bridged.

For IP_ADD_MEMBERSHIP, you would need to add some ebtables rules in
order for the bridge data path to not steal traffic on UDP ports 319 and
320 from the slave's data path.

But nonetheless, you get my point. Who will notify me of these multicast
addresses if I'm bridged and I need to terminate L2 or L4 PTP through
the data path of the slave interfaces and not of the bridge?
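
To be concrete about the L4 case, this is roughly what the application
side of IP_ADD_MEMBERSHIP on a specific interface looks like. It is a
sketch with made-up helper names; the group address is passed in by the
caller (224.0.1.129 is the usual IEEE 1588 IPv4 example), and the socket
is assumed to be an already-open UDP socket.

```c
#define _GNU_SOURCE	/* for struct ip_mreqn on glibc */
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build the membership request for one group on
 * one interface. Returns 0 on success, -1 if 'group' does not parse as
 * an IPv4 address.
 */
int fill_ip_mreqn(struct ip_mreqn *mreq, const char *group, int ifindex)
{
	memset(mreq, 0, sizeof(*mreq));
	if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
		return -1;

	mreq->imr_address.s_addr = htonl(INADDR_ANY);
	mreq->imr_ifindex = ifindex;	/* e.g. the ifindex of swpN */
	return 0;
}

/* Hypothetical helper: join the group on an already-open UDP socket. */
int join_ip_group(int fd, const char *group, int ifindex)
{
	struct ip_mreqn mreq;

	if (fill_ip_mreqn(&mreq, group, ifindex))
		return -1;

	return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
			  &mreq, sizeof(mreq));
}
```

The request names the slave interface by ifindex and nothing more, which
is why the question of who notifies the driver matters.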

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-10-28 18:46                       ` Vladimir Oltean
@ 2020-11-01 11:27                         ` Ido Schimmel
  2020-11-01 12:06                           ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-11-01 11:27 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Wed, Oct 28, 2020 at 08:46:44PM +0200, Vladimir Oltean wrote:
> On Wed, Oct 28, 2020 at 04:43:38PM +0200, Ido Schimmel wrote:
> > In "standalone" mode your netdev is like any other netdev and if it does
> > not support Rx filtering, then pass everything to the CPU and let it
> > filter what it does not want to see. I don't see the problem.  This is
> > exactly what mlxsw did before L3 forwarding was introduced. As soon as
> > you removed a netdev from a bridge we created an internal bridge between
> > the port and the CPU port and flooded everything to the CPU.
> 
> Of course I was thinking about the better case where the netdev would
> declare IFF_UNICAST_FLT. If it supported filtering when
> bridged, it would seem natural to me to also support it when not bridged.

Please see below

> 
> > > > > My interpretation of the meaning of dev_uc_add() for switchdev (and
> > > > > therefore, of its opposite - promiscuous mode) is at odds with previous
> > > > > work done for non-switchdev. Take Vlad Yasevich's work "[Bridge] [PATCH
> > > > > net-next 0/8] Non-promisc bidge ports support" for example:
> > > > > 
> > > > > https://lists.linuxfoundation.org/pipermail/bridge/2014-May/008940.html
> > > > > 
> > > > > He is arguing that a bridge port without flood&learn doesn't need
> > > > > promiscuous mode, because all addresses can be statically known, and
> > > > > therefore, he added code to the bridge that does the following:
> > > > > 
> > > > > - syncs the bridge MAC address to all non-promisc bridge slaves, via
> > > > >   dev_uc_add()
> > > > > - syncs the MAC addresses of all static FDB entries on all ingress
> > > > >   non-promisc bridge slave ports, via dev_uc_add()
> > > > > 
> > > > > with the obvious goal that "the bridge slave shouldn't drop these
> > > > > packets".
> > > > 
> > > > > Let's say all the ports are not automatic (using Vlad's terminology),
> > > > then packets can only be forwarded based on FDB entries. Any packets
> > > > with a destination MAC not in the FDB will be dropped by the bridge.
> > > > Agree?
> > > > 
> > > > Now, if this is the case, then you know in advance which MACs will not
> > > > be dropped by the bridge. Therefore, you can program only these MACs to
> > > > the Rx filters of the bridge slaves (simple NICs). That way, instead of
> > > > having the bridge (the CPU) waste cycles on dropping packets you can
> > > > drop them in hardware using the NIC's Rx filters.
> > > 
> > > _if_ there is a bridge.
> > 
> > But he is talking about a bridge... I don't follow. You even wrote "He
> > is arguing that a bridge port". So how come there is no bridge?
> 
> Well, my problem is with the bridge's use of dev_uc_add; I'm sure you
> got that by now. I would be forced to treat dev_uc_add differently
> depending on whether or not I am bridged, and I don't particularly like
> that.

See below

> 
> > > But if we are to introduce a new SWITCHDEV_OBJ_ID_HOST_FDB, then we
> > > would be working around the problem, and the non-bridged switchdev
> > > interfaces would still have no proper way of doing RX filtering.
> > 
> > What prevents you from implementing ndo_set_rx_mode() in your driver?
> 
> Nothing, that's exactly what I did here...
> 
> > Let me re-iterate my point again. Rx filtering determines which packets
> > can be received by the port. In "standalone" mode where you do not
> > support L3 forwarding I agree that the Rx filter determines which
> > packets the CPU should see.
> > 
> > However, in the "non-standalone" mode where your netdevs are enslaved to
> > a bridge that you offload, then the bridge's FDB determines which
> > packets the CPU should see. The ports themselves are in promiscuous mode
> > because the bridge (either SW one or HW one) wants to see all the
> > received packets.
> 
> Agree. We all agree on this. However, the specifics are a bit fuzzy.

Vladimir,

The fundamental issue here is that you try to overload Rx filtering with
filtering towards the CPU port. These are two different things that are
only the same when the netdev is in "standalone" mode. If every received
packet is flooded to the CPU, then yes, Rx filtering means CPU
filtering. However, if received packets are forwarded to other ports,
then Rx filtering is not equivalent to CPU filtering.

You can implement filtering towards the CPU via ndo_set_rx_mode() if you
expose netdevs for the CPU port:

+---------+                           +----------+
|         +------+ PCI / Eth   +------+          |
|   CPU   | cpu1 +-------------+ cpu0 |   ASIC   |
|         +------+             +------+          |
+---------+                           +----------+

And implement ndo_set_rx_mode() on 'cpu1'. Personally, I wouldn't go in
this direction.
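
To illustrate the difference between Rx filtering and CPU filtering,
here is a toy user-space model of the two modes. It is a sketch under
simplifying assumptions (unicast only, no VLANs, no L3), and every name
in it is invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

struct fdb_entry {
	uint8_t mac[MAC_LEN];
	bool to_cpu;	/* true: entry points at the bridge device */
};

struct port_model {
	bool bridged;
	bool bridge_promisc;			/* only used when bridged */
	const uint8_t (*rx_filter)[MAC_LEN];	/* standalone Rx filter */
	size_t n_rx_filter;
	const struct fdb_entry *fdb;		/* bridge FDB */
	size_t n_fdb;
};

/* Would a packet with this destination MAC reach the CPU? */
bool cpu_receives(const struct port_model *p, const uint8_t dmac[MAC_LEN])
{
	size_t i;

	if (!p->bridged) {
		/* Standalone: the Rx filter is the CPU filter. */
		for (i = 0; i < p->n_rx_filter; i++)
			if (!memcmp(p->rx_filter[i], dmac, MAC_LEN))
				return true;
		return false;
	}

	/* Bridged: the FDB decides, the port's Rx filter is not consulted. */
	for (i = 0; i < p->n_fdb; i++)
		if (!memcmp(p->fdb[i].mac, dmac, MAC_LEN))
			return p->fdb[i].to_cpu;

	/* Unknown unicast reaches the bridge only if it is promiscuous. */
	return p->bridge_promisc;
}
```

In the bridged branch the port's own Rx filter never enters the
decision, which is the point being made above.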

> 
> > > Take the case of IEEE 1588 packets. They should be trapped to the CPU
> > > and not forwarded. But the destination address at which PTP packets are
> > > sent is not set in stone, it is something that the profile decides.
> > > 
> > > How to ensure these packets are trapped to the CPU?
> > > You're probably going to say "devlink trap", but:
> > 
> > I would say that it is up to the driver to configure this among all the
> > rest of the PTP configuration that it needs to do. mlxsw registers the
> > PTP trap during init because it is easy, but I assume we could also do
> > it when PTP is enabled.
> 
> So based on the 
> 
> > > - I don't want the PTP packets to be unconditionally trapped. I see it
> > >   as a perfectly valid use case for a switch to be PTP-unaware and just
> > >   let somebody else terminate those packets. But "devlink trap" only
> > >   gives you an option to see what the traps are, not to turn them off.
> > > - The hardware I'm working with doesn't even trap PTP to the CPU by
> > >   default. I would need to hardcode trapping rules in the driver, to
> > >   some multicast addresses I can just guess, then I would report them as
> > >   non-disableable devlink traps.
> > > 
> > > Applications do call setsockopt with IP_ADD_MEMBERSHIP, IPV6_ADD_MEMBERSHIP
> > > or PACKET_ADD_MEMBERSHIP. However I don't see how that is turning into a
> > > notification that the driver can use, except through dev_mc_add.
> > > 
> > > Therefore, it simply looks easier to me to stub out the extraneous calls
> > > to dev_uc_add and dev_mc_add, rather than add parallel plumbing into
> > > net/ipv4/igmp.c, for ports that are "promiscuous by default".
> > > 
> > > What do you think about this example? Isn't it something that should be
> > > supported by design?
> > 
> > I believe it's already supported. Let's look at the "standalone" and
> > "non-standalone" cases:
> > 
> > 1. Standalone: Your ndo_set_rx_mode() will be called and if you support
> > Rx filtering, you can program your filters accordingly. If not, then you
> > need to send everything to the CPU.
> 
> Right, this is kind of what the patch set that we're commenting on is
> doing.
> 
> > 2. Non-standalone and bridge is multicast aware: An IGMP membership
> > report is supposed to be sent via the bridge device (I assume you are
> > calling IP_ADD_MEMBERSHIP on the bridge device). This will cause the
> > bridge to create an MDB entry indicating that packets to this multicast
> > IP should be locally received. Drivers get it via the switchdev
> > operation Andrew added.
> 
> I am not calling *_ADD_MEMBERSHIP on the bridge device, but on the slave
> ports.
> 
> For PACKET_ADD_MEMBERSHIP, this should work as-is on swpN even if it's
> bridged.
> 
> For IP_ADD_MEMBERSHIP, you would need to add some ebtables rules in
> order for the bridge data path to not steal traffic on UDP ports 319 and
> 320 from the slave's data path.
> 
> But nonetheless, you get my point. Who will notify me of these multicast
> addresses if I'm bridged and I need to terminate L2 or L4 PTP through
> the data path of the slave interfaces and not of the bridge?

IIRC, getting PTP to work on bridged interfaces is tricky and this is
something that is not currently supported by mlxsw or Cumulus:
https://github.com/Mellanox/mlxsw/wiki/Precision-Time-Protocol#configuring-ptp
https://docs.cumulusnetworks.com/cumulus-linux-42/System-Configuration/Setting-Date-and-Time/#configure-the-ptp-boundary-clock

If the purpose of this discussion is to get PTP working in this
scenario, then let's have a separate discussion about that. This is
something we looked at in the past, but didn't make any progress (mainly
because we only got requirements for PTP over routed ports).

Anyway, opening packet sockets on interfaces (bridged or not) that pass
offloaded traffic will not get you this traffic to the packet sockets.
There was already a discussion about this last year (I think Microchip
guys started it) in the context of tcpdump.

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-11-01 11:27                         ` Ido Schimmel
@ 2020-11-01 12:06                           ` Vladimir Oltean
  2020-11-01 14:42                             ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-11-01 12:06 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Sun, Nov 01, 2020 at 01:27:31PM +0200, Ido Schimmel wrote:
> IIRC, getting PTP to work on bridged interfaces is tricky and this is
> something that is not currently supported by mlxsw or Cumulus:
> https://github.com/Mellanox/mlxsw/wiki/Precision-Time-Protocol#configuring-ptp
> https://docs.cumulusnetworks.com/cumulus-linux-42/System-Configuration/Setting-Date-and-Time/#configure-the-ptp-boundary-clock
> 
> If the purpose of this discussion is to get PTP working in this
> scenario, then let's have a separate discussion about that. This is
> something we looked at in the past, but didn't make any progress (mainly
> because we only got requirements for PTP over routed ports).
> 
> Anyway, opening packet sockets on interfaces (bridged or not) that pass
> offloaded traffic will not get you this traffic to the packet sockets.

I don't think it's a different discussion, I think my issues with what
you're proposing are coming exactly from there. I think that user space
today is expecting that when it uses the *_ADD_MEMBERSHIP API, it is
sufficient in order to see that traffic over a socket. Switchdev and DSA
are kernel-only concepts, they have no user-facing API. I am not sure
that it is desirable to change that. I hope you aren't telling me that
we should add a --please argument to the PACKET_ADD_MEMBERSHIP /
IP_ADD_MEMBERSHIP UAPI just in case the network interface is a switchdev
port...
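
For reference, a minimal sketch of the user space side for L2 PTP,
assuming the group of interest is the IEEE 1588 address
01:1b:19:00:00:00. The helper names (ptp_fill_mreq, ptp_join) are
made up for illustration. For PACKET_MR_MULTICAST the kernel turns
this request into a dev_mc_add() on the bound device, and nothing
here knows or cares whether that device is a switchdev port:

```c
#include <string.h>
#include <sys/socket.h>        /* setsockopt(), SOL_PACKET */
#include <linux/if_packet.h>   /* struct packet_mreq, PACKET_* */
#include <net/ethernet.h>      /* ETH_ALEN */

/* Assumption: L2 PTP multicast address 01:1b:19:00:00:00 */
static const unsigned char ptp_l2_mcast[ETH_ALEN] = {
	0x01, 0x1b, 0x19, 0x00, 0x00, 0x00
};

/* Hypothetical helper: describe one multicast MAC on a given ifindex. */
static void ptp_fill_mreq(struct packet_mreq *mreq, int ifindex)
{
	memset(mreq, 0, sizeof(*mreq));
	mreq->mr_ifindex = ifindex;
	mreq->mr_type = PACKET_MR_MULTICAST;
	mreq->mr_alen = ETH_ALEN;
	memcpy(mreq->mr_address, ptp_l2_mcast, ETH_ALEN);
}

/*
 * Hypothetical helper: join the group on an already opened AF_PACKET
 * socket. Returns the setsockopt() result (0 on success, -1 on error).
 */
static int ptp_join(int fd, int ifindex)
{
	struct packet_mreq mreq;

	ptp_fill_mreq(&mreq, ifindex);
	return setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
			  &mreq, sizeof(mreq));
}
```

A caller would typically obtain fd from socket(AF_PACKET, SOCK_RAW, ...)
and ifindex from if_nametoindex("swp0"), both of which need
CAP_NET_RAW and an existing interface.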

> There was already a discussion about this last year (I think Microchip
> guys started it) in the context of tcpdump.

The discussion with Microchip people was slightly different, as it was
tackling the notion of promiscuity on switchdev interfaces.

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-11-01 12:06                           ` Vladimir Oltean
@ 2020-11-01 14:42                             ` Ido Schimmel
  2020-11-01 15:04                               ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-11-01 14:42 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Sun, Nov 01, 2020 at 02:06:44PM +0200, Vladimir Oltean wrote:
> On Sun, Nov 01, 2020 at 01:27:31PM +0200, Ido Schimmel wrote:
> > IIRC, getting PTP to work on bridged interfaces is tricky and this is
> > something that is not currently supported by mlxsw or Cumulus:
> > https://github.com/Mellanox/mlxsw/wiki/Precision-Time-Protocol#configuring-ptp
> > https://docs.cumulusnetworks.com/cumulus-linux-42/System-Configuration/Setting-Date-and-Time/#configure-the-ptp-boundary-clock
> > 
> > If the purpose of this discussion is to get PTP working in this
> > scenario, then let's have a separate discussion about that. This is
> > something we looked at in the past, but didn't make any progress (mainly
> > because we only got requirements for PTP over routed ports).
> > 
> > Anyway, opening packet sockets on interfaces (bridged or not) that pass
> > offloaded traffic will not get you this traffic to the packet sockets.
> 
> I don't think it's a different discussion, I think my issues with what
> you're proposing are coming exactly from there. I think that user space
> today is expecting that when it uses the *_ADD_MEMBERSHIP API, it is
> sufficient in order to see that traffic over a socket. Switchdev and DSA
> are kernel-only concepts, they have no user-facing API. I am not sure
> that it is desirable to change that. I hope you aren't telling me that
> we should add a --please argument to the PACKET_ADD_MEMBERSHIP /
> IP_ADD_MEMBERSHIP UAPI just in case the network interface is a switchdev
> port...

If the goal of this thread is to get packet sockets to work with
offloaded traffic, then I think you need to teach these sockets to
instruct the bound device to trap / mirror incoming traffic to the CPU.
Maybe via a new ndo.

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-11-01 14:42                             ` Ido Schimmel
@ 2020-11-01 15:04                               ` Vladimir Oltean
  2020-11-01 15:39                                 ` Ido Schimmel
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-11-01 15:04 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Sun, Nov 01, 2020 at 04:42:17PM +0200, Ido Schimmel wrote:
> If the goal of this thread is to get packet sockets to work with
> offloaded traffic, then I think you need to teach these sockets to
> instruct the bound device to trap / mirror incoming traffic to the CPU.
> Maybe via a new ndo.

A new ndo that does what? It would be exclusively called by sockets?
We have packet traps with tc, packet traps with devlink, a mechanism for
switchdev host MDBs, and from the discussion with you I also gather that
there should be an equivalent switchdev object for host FDBs, that the
bridge would use. So we would need yet another mechanism to extract
packets from the hardware data path? I am simply lacking the clarity
about what the new ndo you're talking about should do.

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-11-01 15:04                               ` Vladimir Oltean
@ 2020-11-01 15:39                                 ` Ido Schimmel
  2020-11-01 16:13                                   ` Vladimir Oltean
  0 siblings, 1 reply; 46+ messages in thread
From: Ido Schimmel @ 2020-11-01 15:39 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Sun, Nov 01, 2020 at 05:04:42PM +0200, Vladimir Oltean wrote:
> On Sun, Nov 01, 2020 at 04:42:17PM +0200, Ido Schimmel wrote:
> > If the goal of this thread is to get packet sockets to work with
> > offloaded traffic, then I think you need to teach these sockets to
> > instruct the bound device to trap / mirror incoming traffic to the CPU.
> > Maybe via a new ndo.
> 
> A new ndo that does what? It would be exclusively called by sockets?
> We have packet traps with tc, packet traps with devlink, a mechanism for
> switchdev host MDBs, and from the discussion with you I also gather that
> there should be an equivalent switchdev object for host FDBs, that the
> bridge would use. So we would need yet another mechanism to extract
> packets from the hardware data path? I am simply lacking the clarity
> about what the new ndo you're talking about should do.

You indicated that you want packet sockets to work without any user
space changes:

"I think that user space today is expecting that when it uses the
*_ADD_MEMBERSHIP API, it is sufficient in order to see that traffic over
a socket. Switchdev and DSA are kernel-only concepts, they have no
user-facing API. I am not sure that it is desirable to change that."

So tc is irrelevant. And it should work regardless of whether the socket
is bound to an interface that is bridged:

"For PACKET_ADD_MEMBERSHIP, this should work as-is on swpN even if it's
bridged."

So anything related to the bridge is irrelevant as well.

You also wondered which indication you would get down to the driver that
eventually needs to program the hardware to get the packets:

"Who will notify me of these multicast addresses if I'm bridged and I
need to terminate L2 or L4 PTP through the data path of the slave
interfaces and not of the bridge."

Which kernel entity do you want to get the notification from? The packet
socket wants the packets, so it should notify you. The kernel is aware
that traffic is offloaded and can do whatever it needs (e.g., calling
the ndo) in order to extract packets from the hardware data path to the
CPU and to the socket.

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-11-01 15:39                                 ` Ido Schimmel
@ 2020-11-01 16:13                                   ` Vladimir Oltean
  2020-11-11  4:12                                     ` Florian Fainelli
  0 siblings, 1 reply; 46+ messages in thread
From: Vladimir Oltean @ 2020-11-01 16:13 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jiri Pirko, Jakub Kicinski, Ivan Vecera, vyasevich, netdev,
	UNGLinuxDriver, Nikolay Aleksandrov, Roopa Prabhu

On Sun, Nov 01, 2020 at 05:39:06PM +0200, Ido Schimmel wrote:
> You also wondered which indication you would get down to the driver that
> eventually needs to program the hardware to get the packets:
> 
> "Who will notify me of these multicast addresses if I'm bridged and I
> need to terminate L2 or L4 PTP through the data path of the slave
> interfaces and not of the bridge."
> 
> Which kernel entity do you want to get the notification from? The packet
> socket wants the packets, so it should notify you. The kernel is aware
> that traffic is offloaded and can do whatever it needs (e.g., calling
> the ndo) in order to extract packets from the hardware data path to the
> CPU and to the socket.

Honestly, just as I was saying, I was thinking about using the
dev_mc_add call that is emitted today, and simply auditing the
dev_mc_add and dev_uc_add calls which are unnecessary (like in the case
of non-automatic bridge interfaces), for example like this:

if (!(dev->features & NETIF_F_PROMISC_BY_DEFAULT))
	dev_uc_add(dev, static bridge fdb entry);

To me this would be the least painful way forward.
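
A toy user space model of that idea, not kernel code. It assumes a
per-device RX filter that behaves like the dev_uc_add/dev_mc_add
lists (an address is added once and reference counted after that)
and a device that only passes whitelisted DMACs to the CPU unless it
is promiscuous. All names (rx_filter, filter_add, filter_accepts,
sync_static_fdb) are hypothetical, and promisc_by_default stands in
for the NETIF_F_PROMISC_BY_DEFAULT flag, which is only proposed above:

```c
#include <stdbool.h>
#include <string.h>

#define MAC_LEN    6
#define FILTER_MAX 8

/* Hypothetical model of a device's RX filter list. */
struct rx_filter {
	unsigned char addr[FILTER_MAX][MAC_LEN];
	int refs[FILTER_MAX];
	int count;
	bool promisc;
};

/* Add an address, or take another reference if it is already there. */
static int filter_add(struct rx_filter *f, const unsigned char *mac)
{
	int i;

	for (i = 0; i < f->count; i++) {
		if (!memcmp(f->addr[i], mac, MAC_LEN)) {
			f->refs[i]++;
			return 0;
		}
	}
	if (f->count == FILTER_MAX)
		return -1;	/* table full */
	memcpy(f->addr[f->count], mac, MAC_LEN);
	f->refs[f->count] = 1;
	f->count++;
	return 0;
}

/* Would a frame with this DMAC be delivered to the CPU? */
static bool filter_accepts(const struct rx_filter *f,
			   const unsigned char *dmac)
{
	int i;

	if (f->promisc)
		return true;
	for (i = 0; i < f->count; i++)
		if (!memcmp(f->addr[i], dmac, MAC_LEN))
			return true;
	return false;
}

/*
 * Mirror of the conditional above: only devices that are not
 * promiscuous by default get the static FDB entry in their filter.
 */
static int sync_static_fdb(struct rx_filter *f, bool promisc_by_default,
			   const unsigned char *mac)
{
	if (!promisc_by_default)
		return filter_add(f, mac);
	return 0;
}
```

With promisc_by_default set, sync_static_fdb() leaves the list
untouched, which is the "unnecessary call" being audited away.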

* Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches
  2020-11-01 16:13                                   ` Vladimir Oltean
@ 2020-11-11  4:12                                     ` Florian Fainelli
  0 siblings, 0 replies; 46+ messages in thread
From: Florian Fainelli @ 2020-11-11  4:12 UTC (permalink / raw)
  To: Vladimir Oltean, Ido Schimmel
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Jiri Pirko,
	Jakub Kicinski, Ivan Vecera, vyasevich, netdev, UNGLinuxDriver,
	Nikolay Aleksandrov, Roopa Prabhu



On 11/1/2020 8:13 AM, Vladimir Oltean wrote:
> On Sun, Nov 01, 2020 at 05:39:06PM +0200, Ido Schimmel wrote:
>> You also wondered which indication you would get down to the driver that
>> eventually needs to program the hardware to get the packets:
>>
>> "Who will notify me of these multicast addresses if I'm bridged and I
>> need to terminate L2 or L4 PTP through the data path of the slave
>> interfaces and not of the bridge."
>>
>> Which kernel entity do you want to get the notification from? The packet
>> socket wants the packets, so it should notify you. The kernel is aware
>> that traffic is offloaded and can do whatever it needs (e.g., calling
>> the ndo) in order to extract packets from the hardware data path to the
>> CPU and to the socket.
> 
> Honestly, just as I was saying, I was thinking about using the
> dev_mc_add call that is emitted today, and simply auditing the
> dev_mc_add and dev_uc_add calls which are unnecessary (like in the case
> of non-automatic bridge interfaces), for example like this:
> 
> if (!(dev->features & NETIF_F_PROMISC_BY_DEFAULT))
> 	dev_uc_add(dev, static bridge fdb entry);
> 
> To me this would be the least painful way forward.

Vladimir, what do you think about re-posting this series with the DSA
ports operating in standalone or bridge mode with the bridge being
multicast aware, and tackle the termination of PTP frames on DSA ports
being bridged separately?

From what I could read there does not appear to be a problem with doing
RX filtering for standalone ports since we all agree that these
net_devices should look like regular NIC ports with RX filtering capability.
-- 
Florian

Thread overview: 46+ messages
2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 01/13] net: core: dev_addr_lists: add VID to device address Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 02/13] net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 03/13] net: 8021q: vlan_dev: add vid tag for vlan device own address Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 04/13] ethernet: eth: add default vid len for all ethernet kind devices Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 05/13] net: bridge: multicast: propagate br_mc_disabled_update() return Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 06/13] net: core: dev_addr_lists: export some raw __hw_addr helpers Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 07/13] net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 08/13] net: dsa: add ability to program unicast and multicast filters for CPU port Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 09/13] net: dsa: mroute: don't panic the kernel if called without the prepare phase Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding Vladimir Oltean
2020-05-22 12:38   ` Nikolay Aleksandrov
2020-05-22 13:13     ` Vladimir Oltean
2020-05-22 18:45       ` Allan W. Nielsen
2020-07-20 11:08         ` Vladimir Oltean
2020-05-24 14:26   ` Ido Schimmel
2020-05-24 16:13     ` Vladimir Oltean
2020-05-25 20:11       ` Ido Schimmel
2020-05-25 20:32         ` Vladimir Oltean
2020-07-23 22:35         ` Vladimir Oltean
2020-07-27 17:15           ` Ido Schimmel
2020-05-21 21:10 ` [PATCH RFC net-next 11/13] net: dsa: deal with new flooding port attributes from bridge Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 12/13] net: dsa: treat switchdev notifications for multicast router connected to port Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 13/13] net: dsa: wire up multicast IGMP snooping attribute notification Vladimir Oltean
2020-05-22 18:42 ` [PATCH RFC net-next 00/13] RX filtering for DSA switches Allan W. Nielsen
2020-05-24 14:06 ` Ido Schimmel
2020-05-24 16:24   ` Vladimir Oltean
2020-05-25 19:48     ` Ido Schimmel
2020-05-25 20:23       ` Vladimir Oltean
2020-05-26 14:01         ` Ido Schimmel
2020-05-27 11:36           ` Vladimir Oltean
2020-05-28 14:37             ` Ido Schimmel
2020-07-20 10:00               ` Vladimir Oltean
2020-07-27 16:56                 ` Ido Schimmel
2020-10-27 11:52                   ` Vladimir Oltean
2020-10-28 14:43                     ` Ido Schimmel
2020-10-28 18:46                       ` Vladimir Oltean
2020-11-01 11:27                         ` Ido Schimmel
2020-11-01 12:06                           ` Vladimir Oltean
2020-11-01 14:42                             ` Ido Schimmel
2020-11-01 15:04                               ` Vladimir Oltean
2020-11-01 15:39                                 ` Ido Schimmel
2020-11-01 16:13                                   ` Vladimir Oltean
2020-11-11  4:12                                     ` Florian Fainelli
2020-05-24 16:13 ` Florian Fainelli
2020-05-24 16:34   ` Vladimir Oltean
