All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
@ 2022-04-08 20:03 Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 1/6] net: refactor all NETDEV_CHANGE notifier calls to a single function Vladimir Oltean
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:03 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

For this patch series to make more sense, it should be reviewed from the
last patch to the first. Changes were made in the order that they were
just to preserve patch-with-patch functionality.

A little while ago, some DSA switch drivers gained support for
IFF_UNICAST_FLT, a mechanism through which they are notified of the
MAC addresses required for local standalone termination.
A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
bridge FDB entries, which are the MAC addresses required for local
termination when under a bridge.

So we have come one step closer to removing the CPU from the list of
destinations for packets with unknown MAC DA. What remains is to check
whether any software L2 forwarding is enabled, and that is accomplished
by monitoring the neighbor bridge ports that DSA switches have.

With these changes, DSA drivers that fulfill the requirements for
dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
will keep flooding towards the CPU disabled for as long as no port is
promiscuous. The bridge won't attempt to make its ports promiscuous
anymore either if said ports are offloaded by switchdev (this series
changes that behavior). Instead, DSA will fall back by its own will to
promiscuous mode on bridge ports when the bridge itself becomes
promiscuous, or a foreign interface is detected under the same bridge.

Vladimir Oltean (6):
  net: refactor all NETDEV_CHANGE notifier calls to a single function
  net: emit NETDEV_CHANGE for changes to IFF_PROMISC | IFF_ALLMULTI
  net: dsa: walk through all changeupper notifier functions
  net: dsa: track whether bridges have foreign interfaces in them
  net: dsa: monitor changes to bridge promiscuity
  net: bridge: avoid uselessly making offloaded ports promiscuous

 include/net/dsa.h  |   4 +-
 net/bridge/br_if.c |  63 +++++++++++--------
 net/core/dev.c     |  34 +++++-----
 net/dsa/dsa_priv.h |   2 +
 net/dsa/port.c     |  12 ++++
 net/dsa/slave.c    | 150 ++++++++++++++++++++++++++++++++++++++++++---
 6 files changed, 215 insertions(+), 50 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net-next 1/6] net: refactor all NETDEV_CHANGE notifier calls to a single function
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
@ 2022-04-08 20:03 ` Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 2/6] net: emit NETDEV_CHANGE for changes to IFF_PROMISC | IFF_ALLMULTI Vladimir Oltean
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:03 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

Create a __netdev_state_change() helper function which emits a
NETDEV_CHANGE notifier with the given changed flags as argument.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/core/dev.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e027410e861b..433f006a796b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1314,6 +1314,19 @@ void netdev_features_change(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_features_change);
 
+static void __netdev_state_change(struct net_device *dev,
+				  unsigned int flags_changed)
+{
+	struct netdev_notifier_change_info change_info = {
+		.info = {
+			.dev = dev,
+		},
+		.flags_changed = flags_changed,
+	};
+
+	call_netdevice_notifiers_info(NETDEV_CHANGE, &change_info.info);
+}
+
 /**
  *	netdev_state_change - device changes state
  *	@dev: device to cause notification
@@ -1325,12 +1338,7 @@ EXPORT_SYMBOL(netdev_features_change);
 void netdev_state_change(struct net_device *dev)
 {
 	if (dev->flags & IFF_UP) {
-		struct netdev_notifier_change_info change_info = {
-			.info.dev = dev,
-		};
-
-		call_netdevice_notifiers_info(NETDEV_CHANGE,
-					      &change_info.info);
+		__netdev_state_change(dev, 0);
 		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
 	}
 }
@@ -8479,16 +8487,8 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
 	}
 
 	if (dev->flags & IFF_UP &&
-	    (changes & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI | IFF_VOLATILE))) {
-		struct netdev_notifier_change_info change_info = {
-			.info = {
-				.dev = dev,
-			},
-			.flags_changed = changes,
-		};
-
-		call_netdevice_notifiers_info(NETDEV_CHANGE, &change_info.info);
-	}
+	    (changes & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI | IFF_VOLATILE)))
+		__netdev_state_change(dev, changes);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 2/6] net: emit NETDEV_CHANGE for changes to IFF_PROMISC | IFF_ALLMULTI
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 1/6] net: refactor all NETDEV_CHANGE notifier calls to a single function Vladimir Oltean
@ 2022-04-08 20:03 ` Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 3/6] net: dsa: walk through all changeupper notifier functions Vladimir Oltean
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:03 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

Some drivers may be interested in the change in promiscuity on other
devices, like for example switchdev drivers may be interested in a
promiscuity change on the bridge. In fact this was also suggested as a
valid thing to do a while ago by Jiri Pirko:
https://lore.kernel.org/all/20190829182957.GA17530@lunn.ch/t/#m6de7937f694ab1375723bab7c53a7fb2d3595332

yet it doesn't work. This is because, for probably legacy reasons,
__dev_notify_flags() omits changes to IFF_PROMISC | IFF_ALLMULTI
(RX flags), as well as a bunch of other flags (maybe simply because it
wasn't needed, maybe because of other reasons).

It may be tempting to hook this into (actually remove the restriction
from) __dev_notify_flags(), but that is an unreliable place to put it,
since __dev_set_promiscuity() may be called with "notify=false" and we'd
still like to emit the NETDEV_CHANGE anyway.

So put the netdev notifier call right next to the dev_change_rx_flags()
call, for both IFF_PROMISC and IFF_ALLMULTI.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 433f006a796b..2fc754018a2e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8252,6 +8252,7 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify)
 		}
 
 		dev_change_rx_flags(dev, IFF_PROMISC);
+		__netdev_state_change(dev, IFF_PROMISC);
 	}
 	if (notify)
 		__dev_notify_flags(dev, old_flags, IFF_PROMISC);
@@ -8306,6 +8307,7 @@ static int __dev_set_allmulti(struct net_device *dev, int inc, bool notify)
 	}
 	if (dev->flags ^ old_flags) {
 		dev_change_rx_flags(dev, IFF_ALLMULTI);
+		__netdev_state_change(dev, IFF_ALLMULTI);
 		dev_set_rx_mode(dev);
 		if (notify)
 			__dev_notify_flags(dev, old_flags,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 3/6] net: dsa: walk through all changeupper notifier functions
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 1/6] net: refactor all NETDEV_CHANGE notifier calls to a single function Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 2/6] net: emit NETDEV_CHANGE for changes to IFF_PROMISC | IFF_ALLMULTI Vladimir Oltean
@ 2022-04-08 20:03 ` Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 4/6] net: dsa: track whether bridges have foreign interfaces in them Vladimir Oltean
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:03 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

Traditionally, DSA has had a single netdev notifier handling function
for each device type.

For the sake of code cleanliness, we would like to introduce more
handling functions which do one thing, but the conditions for entering
these functions start to overlap. Example: a handling function which
tracks whether any bridges contain both DSA and non-DSA interfaces.
Either this is placed before dsa_slave_changeupper(), case in which it
will prevent that function from executing, or we place it after
dsa_slave_changeupper(), case in which we will prevent it from
executing. The other alternative is to ignore errors from the new
handling function (not ideal).

To support this usage, we need to change the pattern. In the new model,
we enter all notifier handling sub-functions, and exit with NOTIFY_DONE
if there is nothing to do. This allows the sub-functions to be
relatively free-form and independent from each other.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/slave.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index f47048a624fb..f87109e7696d 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2463,6 +2463,9 @@ static int dsa_slave_changeupper(struct net_device *dev,
 	struct netlink_ext_ack *extack;
 	int err = NOTIFY_DONE;
 
+	if (!dsa_slave_dev_check(dev))
+		return err;
+
 	extack = netdev_notifier_info_to_extack(&info->info);
 
 	if (netif_is_bridge_master(info->upper_dev)) {
@@ -2517,6 +2520,9 @@ static int dsa_slave_prechangeupper(struct net_device *dev,
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
 
+	if (!dsa_slave_dev_check(dev))
+		return NOTIFY_DONE;
+
 	if (netif_is_bridge_master(info->upper_dev) && !info->linking)
 		dsa_port_pre_bridge_leave(dp, info->upper_dev);
 	else if (netif_is_lag_master(info->upper_dev) && !info->linking)
@@ -2537,6 +2543,9 @@ dsa_slave_lag_changeupper(struct net_device *dev,
 	int err = NOTIFY_DONE;
 	struct dsa_port *dp;
 
+	if (!netif_is_lag_master(dev))
+		return err;
+
 	netdev_for_each_lower_dev(dev, lower, iter) {
 		if (!dsa_slave_dev_check(lower))
 			continue;
@@ -2566,6 +2575,9 @@ dsa_slave_lag_prechangeupper(struct net_device *dev,
 	int err = NOTIFY_DONE;
 	struct dsa_port *dp;
 
+	if (!netif_is_lag_master(dev))
+		return err;
+
 	netdev_for_each_lower_dev(dev, lower, iter) {
 		if (!dsa_slave_dev_check(lower))
 			continue;
@@ -2687,22 +2699,29 @@ static int dsa_slave_netdevice_event(struct notifier_block *nb,
 		if (err != NOTIFY_DONE)
 			return err;
 
-		if (dsa_slave_dev_check(dev))
-			return dsa_slave_prechangeupper(dev, ptr);
+		err = dsa_slave_prechangeupper(dev, ptr);
+		if (notifier_to_errno(err))
+			return err;
 
-		if (netif_is_lag_master(dev))
-			return dsa_slave_lag_prechangeupper(dev, ptr);
+		err = dsa_slave_lag_prechangeupper(dev, ptr);
+		if (notifier_to_errno(err))
+			return err;
 
 		break;
 	}
-	case NETDEV_CHANGEUPPER:
-		if (dsa_slave_dev_check(dev))
-			return dsa_slave_changeupper(dev, ptr);
+	case NETDEV_CHANGEUPPER: {
+		int err;
+
+		err = dsa_slave_changeupper(dev, ptr);
+		if (notifier_to_errno(err))
+			return err;
 
-		if (netif_is_lag_master(dev))
-			return dsa_slave_lag_changeupper(dev, ptr);
+		err = dsa_slave_lag_changeupper(dev, ptr);
+		if (notifier_to_errno(err))
+			return err;
 
 		break;
+	}
 	case NETDEV_CHANGELOWERSTATE: {
 		struct netdev_notifier_changelowerstate_info *info = ptr;
 		struct dsa_port *dp;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 4/6] net: dsa: track whether bridges have foreign interfaces in them
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
                   ` (2 preceding siblings ...)
  2022-04-08 20:03 ` [PATCH net-next 3/6] net: dsa: walk through all changeupper notifier functions Vladimir Oltean
@ 2022-04-08 20:03 ` Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 5/6] net: dsa: monitor changes to bridge promiscuity Vladimir Oltean
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:03 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

DSA switches send packets to the CPU for 2 reasons, either for local
termination or for software forwarding.

If the switch driver satisfies the requirements for IFF_UNICAST_FLT and
also offloads the bridge local FDB entries, then there is no reason to
send unknown traffic to the CPU for the purpose of local termination.

This leaves software forwarding as the sole concern, and a place where
certain optimizations can be made. In the case of a bridge where all
bridge ports are offloaded DSA interfaces, there isn't any need to do
software forwarding (and therefore, to enable host flooding).

We approximate the need for a DSA port to enable host flooding by managing
IFF_PROMISC, which ends up triggering dsa_port_manage_cpu_flood().
This isn't ideal, because IFF_PROMISC/dev->uc/dev->mc deal only with the
standalone address database of a port, and not with the bridging
database, but right now DSA doesn't have the poster-child hardware where
flooding is a per-FID setting rather than per-port. So in current
practice, there is no reason to make dsa_port_manage_cpu_flood() more
complex by looking at anything other than IFF_PROMISC.

To enable those optimizations, create a function called
dsa_bridge_foreign_dev_update() which updates a new boolean of struct
dsa_bridge called "have_foreign" whenever a DSA port joins/leaves a
bridge, or a LAG that is already in a bridge, or any other interface
joins/leaves a bridge in which a DSA port or LAG offloaded by a DSA port
exists.

Note that when dsa_port_bridge_create() is first called,
dsa_bridge_foreign_dev_update() is not called right away. It is called
slightly later (still under rtnl_mutex), leading to some DSA switch
callbacks (->port_bridge_join) being called with a potentially not
up-to-date "have_foreign" property. This can be changed if necessary,
I deem it as "not a problem" for now.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |  3 +-
 net/dsa/dsa_priv.h |  1 +
 net/dsa/port.c     |  7 ++++
 net/dsa/slave.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index f2352d82e37b..0ea45a4acc80 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -240,8 +240,9 @@ struct dsa_mall_tc_entry {
 struct dsa_bridge {
 	struct net_device *dev;
 	unsigned int num;
-	bool tx_fwd_offload;
 	refcount_t refcount;
+	u8 tx_fwd_offload:1;
+	u8 have_foreign:1;
 };
 
 struct dsa_port {
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 5d3f4a67dce1..d610776ecd76 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -320,6 +320,7 @@ void dsa_slave_setup_tagger(struct net_device *slave);
 int dsa_slave_change_mtu(struct net_device *dev, int new_mtu);
 int dsa_slave_manage_vlan_filtering(struct net_device *dev,
 				    bool vlan_filtering);
+int dsa_bridge_foreign_dev_update(struct net_device *bridge_dev);
 
 static inline struct dsa_port *dsa_slave_to_port(const struct net_device *dev)
 {
diff --git a/net/dsa/port.c b/net/dsa/port.c
index af9a815c2639..cbee564e1c22 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -656,8 +656,15 @@ int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag_dev,
 	if (err)
 		goto err_bridge_join;
 
+	err = dsa_bridge_foreign_dev_update(bridge_dev);
+	if (err)
+		goto err_foreign_update;
+
 	return 0;
 
+err_foreign_update:
+	dsa_port_pre_bridge_leave(dp, bridge_dev);
+	dsa_port_bridge_leave(dp, bridge_dev);
 err_bridge_join:
 	dsa_port_notify(dp, DSA_NOTIFIER_LAG_LEAVE, &info);
 err_lag_join:
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index f87109e7696d..1bc8d830fb46 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2595,6 +2595,18 @@ dsa_slave_lag_prechangeupper(struct net_device *dev,
 	return err;
 }
 
+static int dsa_bridge_changelower(struct net_device *dev,
+				  struct netdev_notifier_changeupper_info *info)
+{
+	int err;
+
+	if (!netif_is_bridge_master(info->upper_dev))
+		return NOTIFY_DONE;
+
+	err = dsa_bridge_foreign_dev_update(info->upper_dev);
+	return notifier_from_errno(err);
+}
+
 static int
 dsa_prevent_bridging_8021q_upper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
@@ -2720,6 +2732,10 @@ static int dsa_slave_netdevice_event(struct notifier_block *nb,
 		if (notifier_to_errno(err))
 			return err;
 
+		err = dsa_bridge_changelower(dev, ptr);
+		if (notifier_to_errno(err))
+			return err;
+
 		break;
 	}
 	case NETDEV_CHANGELOWERSTATE: {
@@ -2874,6 +2890,70 @@ static bool dsa_foreign_dev_check(const struct net_device *dev,
 	return true;
 }
 
+/* We need to keep flooding towards the CPU enabled as long as software
+ * forwarding is required.
+ */
+static void dsa_bridge_host_flood_change(struct dsa_bridge *bridge,
+					 bool have_foreign)
+{
+	bool host_flood_was_enabled = bridge->have_foreign;
+	bool host_flood_enabled = have_foreign;
+	int inc = host_flood_enabled ? 1 : -1;
+	struct net_device *brport_dev;
+	struct dsa_switch_tree *dst;
+	struct dsa_port *dp;
+
+	if (host_flood_was_enabled == host_flood_enabled)
+		goto out;
+
+	list_for_each_entry(dst, &dsa_tree_list, list) {
+		dsa_tree_for_each_user_port(dp, dst) {
+			if (dsa_port_offloads_bridge(dp, bridge)) {
+				brport_dev = dsa_port_to_bridge_port(dp);
+				dev_set_promiscuity(brport_dev, inc);
+			}
+		}
+	}
+
+out:
+	bridge->have_foreign = have_foreign;
+}
+
+int dsa_bridge_foreign_dev_update(struct net_device *bridge_dev)
+{
+	struct net_device *first_slave, *lower;
+	struct dsa_bridge *bridge = NULL;
+	struct dsa_switch_tree *dst;
+	bool have_foreign = false;
+	struct list_head *iter;
+	struct dsa_port *dp;
+
+	list_for_each_entry(dst, &dsa_tree_list, list) {
+		dsa_tree_for_each_user_port(dp, dst) {
+			if (dsa_port_offloads_bridge_dev(dp, bridge_dev)) {
+				bridge = dp->bridge;
+				first_slave = dp->slave;
+				break;
+			}
+		}
+	}
+
+	/* Bridge with no DSA interface in it */
+	if (!bridge)
+		return 0;
+
+	netdev_for_each_lower_dev(bridge_dev, lower, iter) {
+		if (dsa_foreign_dev_check(first_slave, lower)) {
+			have_foreign = true;
+			break;
+		}
+	}
+
+	dsa_bridge_host_flood_change(bridge, have_foreign);
+
+	return 0;
+}
+
 static int dsa_slave_fdb_event(struct net_device *dev,
 			       struct net_device *orig_dev,
 			       unsigned long event, const void *ctx,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 5/6] net: dsa: monitor changes to bridge promiscuity
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
                   ` (3 preceding siblings ...)
  2022-04-08 20:03 ` [PATCH net-next 4/6] net: dsa: track whether bridges have foreign interfaces in them Vladimir Oltean
@ 2022-04-08 20:03 ` Vladimir Oltean
  2022-04-08 20:03 ` [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous Vladimir Oltean
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:03 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

In preparation of changing the bridge such that it stops managing the
IFF_PROMISC flag on offloaded bridge ports, we need to ensure that DSA
preserves behavior in the circumstances that matter.

The bridge software data path implementation (br_handle_frame_finish)
passes a unicast frame up if a BR_FDB_LOCAL entry exists, or if the MAC
DA is unknown, if local_rcv is true. In turn, local_rcv is true when
the bridge device itself is promiscuous.

The analogous behavior in the offloaded plane is to enable flooding of
packets with unknown destination towards the CPU when the bridge device
itself is promiscuous. This change achieves that by monitoring
IFF_PROMISC changes on bridge devices, and calling
dsa_bridge_host_flood_change -> dsa_port_manage_cpu_flood on such changes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |  1 +
 net/dsa/dsa_priv.h |  1 +
 net/dsa/port.c     |  5 +++++
 net/dsa/slave.c    | 43 ++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 0ea45a4acc80..e8e30be4cde8 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -243,6 +243,7 @@ struct dsa_bridge {
 	refcount_t refcount;
 	u8 tx_fwd_offload:1;
 	u8 have_foreign:1;
+	u8 promisc:1;
 };
 
 struct dsa_port {
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index d610776ecd76..9b868a7c3459 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -321,6 +321,7 @@ int dsa_slave_change_mtu(struct net_device *dev, int new_mtu);
 int dsa_slave_manage_vlan_filtering(struct net_device *dev,
 				    bool vlan_filtering);
 int dsa_bridge_foreign_dev_update(struct net_device *bridge_dev);
+int dsa_bridge_promisc_update(struct net_device *bridge_dev);
 
 static inline struct dsa_port *dsa_slave_to_port(const struct net_device *dev)
 {
diff --git a/net/dsa/port.c b/net/dsa/port.c
index cbee564e1c22..bbcc9c92af5f 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -660,8 +660,13 @@ int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag_dev,
 	if (err)
 		goto err_foreign_update;
 
+	err = dsa_bridge_promisc_update(bridge_dev);
+	if (err)
+		goto err_promisc_update;
+
 	return 0;
 
+err_promisc_update:
 err_foreign_update:
 	dsa_port_pre_bridge_leave(dp, bridge_dev);
 	dsa_port_bridge_leave(dp, bridge_dev);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1bc8d830fb46..59ebc4a520e7 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2753,6 +2753,13 @@ static int dsa_slave_netdevice_event(struct notifier_block *nb,
 	}
 	case NETDEV_CHANGE:
 	case NETDEV_UP: {
+		int err;
+
+		if (netif_is_bridge_master(dev)) {
+			err = dsa_bridge_promisc_update(dev);
+			return notifier_from_errno(err);
+		}
+
 		/* Track state of master port.
 		 * DSA driver may require the master port (and indirectly
 		 * the tagger) to be available for some special operation.
@@ -2891,13 +2898,13 @@ static bool dsa_foreign_dev_check(const struct net_device *dev,
 }
 
 /* We need to keep flooding towards the CPU enabled as long as software
- * forwarding is required.
+ * forwarding is required, or the bridge device is promiscuous.
  */
 static void dsa_bridge_host_flood_change(struct dsa_bridge *bridge,
-					 bool have_foreign)
+					 bool have_foreign, bool promisc)
 {
-	bool host_flood_was_enabled = bridge->have_foreign;
-	bool host_flood_enabled = have_foreign;
+	bool host_flood_was_enabled = bridge->have_foreign || bridge->promisc;
+	bool host_flood_enabled = have_foreign || promisc;
 	int inc = host_flood_enabled ? 1 : -1;
 	struct net_device *brport_dev;
 	struct dsa_switch_tree *dst;
@@ -2917,6 +2924,7 @@ static void dsa_bridge_host_flood_change(struct dsa_bridge *bridge,
 
 out:
 	bridge->have_foreign = have_foreign;
+	bridge->promisc = promisc;
 }
 
 int dsa_bridge_foreign_dev_update(struct net_device *bridge_dev)
@@ -2949,7 +2957,32 @@ int dsa_bridge_foreign_dev_update(struct net_device *bridge_dev)
 		}
 	}
 
-	dsa_bridge_host_flood_change(bridge, have_foreign);
+	dsa_bridge_host_flood_change(bridge, have_foreign, bridge->promisc);
+
+	return 0;
+}
+
+int dsa_bridge_promisc_update(struct net_device *bridge_dev)
+{
+	struct dsa_bridge *bridge = NULL;
+	struct dsa_switch_tree *dst;
+	struct dsa_port *dp;
+
+	list_for_each_entry(dst, &dsa_tree_list, list) {
+		dsa_tree_for_each_user_port(dp, dst) {
+			if (dsa_port_offloads_bridge_dev(dp, bridge_dev)) {
+				bridge = dp->bridge;
+				break;
+			}
+		}
+	}
+
+	/* Bridge with no DSA interface in it */
+	if (!bridge)
+		return 0;
+
+	dsa_bridge_host_flood_change(bridge, bridge->have_foreign,
+				     bridge_dev->flags & IFF_PROMISC);
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
                   ` (4 preceding siblings ...)
  2022-04-08 20:03 ` [PATCH net-next 5/6] net: dsa: monitor changes to bridge promiscuity Vladimir Oltean
@ 2022-04-08 20:03 ` Vladimir Oltean
  2022-04-09 10:17   ` kernel test robot
  2022-04-14 13:53   ` kernel test robot
  2022-04-08 20:14 ` [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
  2022-04-09 19:46 ` Tobias Waldekranz
  7 siblings, 2 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:03 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

The bridge driver's intention by making ports promiscuous is to turn off
their RX filters such that these ports receive packets with any MAC DA.

A quick survey of the kernel drivers that call
switchdev_bridge_port_offload() shows that either these do not implement
ndo_change_rx_flags() at all, or they explicitly ignore changes to
IFF_PROMISC (am65_cpsw_slave_set_promisc, cpsw_set_promiscious,
ocelot_set_rx_mode).

This makes sense, because hardware that is purpose-built to do L2
forwarding generally already knows it should accept any MAC DA on its
ports.

That is not to say that IFF_PROMISC makes no sense for switchdev drivers.
For example, DSA has the concept of multiple address databases (this is
achieved by effectively partitioning the FDB: reserve a database - FID -
for each port operating as standalone, a FID for each VLAN-unaware
bridge, a FID for each bridge VLAN). The address database of a
standalone port is managed through the standard dev->uc and dev->uc
lists and is used to filter towards the hosts the addresses required for
local termination. The bridge-related address databases are managed
using switchdev (SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE).

IFF_PROMISC is intrinsically connected to dev->uc and dev->mc (see the
implementation of __dev_set_rx_mode which puts the interface in
promiscuous mode if the unicast list isn't empty but the device doesn't
support IFF_UNICAST_FLT), and therefore to what DSA implements as the
standalone port address database (there, an entry in dev->uc means
"forward it to CPU", the absence of it means "drop it", and promiscuity
means "put the CPU in the flood mask of packets with unknown MAC DA").

Whereas there is no IFF_PROMISC equivalent to the FDB entries notified
through switchdev (therefore to the bridge-related address databases),
because none is needed.

In this model, the bridge driver, which is only trying to secure its
reception of packets, is in fact overstepping, because it manages
something which is outside of its competence: the host flooding of the
standalone port database, when in fact that database will not be the one
used by packets handled by the bridging service.

In turn, this prevents further optimizations from being applied in
particular to DSA, and in general to any switchdev driver. A desirable
goal is to eliminate host flooding of packets which are known to be
unnecessary and only dropped later in software [1].

In an ideal world with ideal hardware:
(a) flooding would be controlled per FID rather than per port
(b) egress flooding towards a certain port can be controlled
    independently depending on the actual port ingress port, rather than
    globally, regardless of ingress port

When (a) does not hold true, the bridge will force the port to keep host
flooding enabled, even if this is not otherwise needed (there is no
station behind a "foreign interface" that requires software forwarding;
the only packets sent by the accelerator to the CPU are for termination
purposes).

When (b) does not hold true, it means that a 4-port switch where 1 port
is standalone and 3 are bridged (again with no foreign interface) will
have host flooding enabled for all 4 ports (including the standalone
port, because the bridge is keeping host flooding enabled, and all ports
are serviced by the same CPU port).

Since DSA is a framework and not just a driver for a single device,
these nonidealities do hold true, and the bridge unnecessarily setting
IFF_PROMISC on its ports is a real roadblock towards disabling host
flooding in practical scenarios.

The proposed solution is to make the bridge driver stop touching port
promiscuity for offloaded switchdev ports, and let them manage
promiscuity by themselves as they see fit. It can achieve this by
looking at net_bridge_port :: offload_count, which is updated
voluntarily by switchdev drivers using switchdev_bridge_port_offload().

br_manage_promisc() is already called by nbp_update_port_count() on a
port join/leave, and the implicit assumption is that
switchdev_bridge_port_offload() has already been called by that time
(from netdev_master_upper_dev_link).

[1] https://www.youtube.com/watch?v=B1HhxEcU7Jg

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/bridge/br_if.c | 63 ++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 55f47cadb114..6ac5313e1cb8 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -135,34 +135,49 @@ static void br_port_clear_promisc(struct net_bridge_port *p)
 void br_manage_promisc(struct net_bridge *br)
 {
 	struct net_bridge_port *p;
-	bool set_all = false;
-
-	/* If vlan filtering is disabled or bridge interface is placed
-	 * into promiscuous mode, place all ports in promiscuous mode.
-	 */
-	if ((br->dev->flags & IFF_PROMISC) || !br_vlan_enabled(br->dev))
-		set_all = true;
 
 	list_for_each_entry(p, &br->port_list, list) {
-		if (set_all) {
+		/* Offloaded ports have a separate address database for
+		 * forwarding, which is managed through switchdev and not
+		 * through dev_uc_add(), so the promiscuous concept makes no
+		 * sense for them. Avoid updating promiscuity in that case.
+		 */
+		if (p->offload_count) {
+			br_port_clear_promisc(p);
+			continue;
+		}
+
+		/* If bridge is promiscuous, unconditionally place all ports
+		 * in promiscuous mode too. This allows the bridge device to
+		 * locally receive all unknown traffic.
+		 */
+		if (br->dev->flags & IFF_PROMISC) {
+			br_port_set_promisc(p);
+			continue;
+		}
+
+		/* If vlan filtering is disabled, place all ports in
+		 * promiscuous mode.
+		 */
+		if (!br_vlan_enabled(br->dev)) {
 			br_port_set_promisc(p);
-		} else {
-			/* If the number of auto-ports is <= 1, then all other
-			 * ports will have their output configuration
-			 * statically specified through fdbs.  Since ingress
-			 * on the auto-port becomes forwarding/egress to other
-			 * ports and egress configuration is statically known,
-			 * we can say that ingress configuration of the
-			 * auto-port is also statically known.
-			 * This lets us disable promiscuous mode and write
-			 * this config to hw.
-			 */
-			if (br->auto_cnt == 0 ||
-			    (br->auto_cnt == 1 && br_auto_port(p)))
-				br_port_clear_promisc(p);
-			else
-				br_port_set_promisc(p);
+			continue;
 		}
+
+		/* If the number of auto-ports is <= 1, then all other ports
+		 * will have their output configuration statically specified
+		 * through fdbs. Since ingress on the auto-port becomes
+		 * forwarding/egress to other ports and egress configuration is
+		 * statically known, we can say that ingress configuration of
+		 * the auto-port is also statically known.
+		 * This lets us disable promiscuous mode and write this config
+		 * to hw.
+		 */
+		if (br->auto_cnt == 0 ||
+		    (br->auto_cnt == 1 && br_auto_port(p)))
+			br_port_clear_promisc(p);
+		else
+			br_port_set_promisc(p);
 	}
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
                   ` (5 preceding siblings ...)
  2022-04-08 20:03 ` [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous Vladimir Oltean
@ 2022-04-08 20:14 ` Vladimir Oltean
  2022-04-08 20:52   ` Nikolay Aleksandrov
  2022-04-09 19:46 ` Tobias Waldekranz
  7 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-08 20:14 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

On Fri, Apr 08, 2022 at 11:03:31PM +0300, Vladimir Oltean wrote:
> For this patch series to make more sense, it should be reviewed from the
> last patch to the first. Changes were made in the order that they were
> just to preserve patch-with-patch functionality.
> 
> A little while ago, some DSA switch drivers gained support for
> IFF_UNICAST_FLT, a mechanism through which they are notified of the
> MAC addresses required for local standalone termination.
> A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
> bridge FDB entries, which are the MAC addresses required for local
> termination when under a bridge.
> 
> So we have come one step closer to removing the CPU from the list of
> destinations for packets with unknown MAC DA. What remains is to check
> whether any software L2 forwarding is enabled, and that is accomplished
> by monitoring the neighbor bridge ports that DSA switches have.
> 
> With these changes, DSA drivers that fulfill the requirements for
> dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
> will keep flooding towards the CPU disabled for as long as no port is
> promiscuous. The bridge won't attempt to make its ports promiscuous
> anymore either if said ports are offloaded by switchdev (this series
> changes that behavior). Instead, DSA will fall back by its own will to
> promiscuous mode on bridge ports when the bridge itself becomes
> promiscuous, or a foreign interface is detected under the same bridge.
> 
> Vladimir Oltean (6):
>   net: refactor all NETDEV_CHANGE notifier calls to a single function
>   net: emit NETDEV_CHANGE for changes to IFF_PROMISC | IFF_ALLMULTI
>   net: dsa: walk through all changeupper notifier functions
>   net: dsa: track whether bridges have foreign interfaces in them
>   net: dsa: monitor changes to bridge promiscuity
>   net: bridge: avoid uselessly making offloaded ports promiscuous
> 
>  include/net/dsa.h  |   4 +-
>  net/bridge/br_if.c |  63 +++++++++++--------
>  net/core/dev.c     |  34 +++++-----
>  net/dsa/dsa_priv.h |   2 +
>  net/dsa/port.c     |  12 ++++
>  net/dsa/slave.c    | 150 ++++++++++++++++++++++++++++++++++++++++++---
>  6 files changed, 215 insertions(+), 50 deletions(-)
> 
> -- 
> 2.25.1
>

Hmm, Nikolay's address bounced back and I didn't notice the MAINTAINERS
change. Updated the CC list with his new address.

Nikolay, if you want to take a look the patches are here, I hope it's
fine if I don't resend:
https://patchwork.kernel.org/project/netdevbpf/cover/20220408200337.718067-1-vladimir.oltean@nxp.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-08 20:14 ` [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
@ 2022-04-08 20:52   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 22+ messages in thread
From: Nikolay Aleksandrov @ 2022-04-08 20:52 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Jiri Pirko, Ido Schimmel, Tobias Waldekranz,
	Mattias Forsblad

On 08/04/2022 23:14, Vladimir Oltean wrote:
> On Fri, Apr 08, 2022 at 11:03:31PM +0300, Vladimir Oltean wrote:
>> For this patch series to make more sense, it should be reviewed from the
>> last patch to the first. Changes were made in the order that they were
>> just to preserve patch-with-patch functionality.
>>
>> A little while ago, some DSA switch drivers gained support for
>> IFF_UNICAST_FLT, a mechanism through which they are notified of the
>> MAC addresses required for local standalone termination.
>> A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
>> bridge FDB entries, which are the MAC addresses required for local
>> termination when under a bridge.
>>
>> So we have come one step closer to removing the CPU from the list of
>> destinations for packets with unknown MAC DA. What remains is to check
>> whether any software L2 forwarding is enabled, and that is accomplished
>> by monitoring the neighbor bridge ports that DSA switches have.
>>
>> With these changes, DSA drivers that fulfill the requirements for
>> dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
>> will keep flooding towards the CPU disabled for as long as no port is
>> promiscuous. The bridge won't attempt to make its ports promiscuous
>> anymore either if said ports are offloaded by switchdev (this series
>> changes that behavior). Instead, DSA will fall back by its own will to
>> promiscuous mode on bridge ports when the bridge itself becomes
>> promiscuous, or a foreign interface is detected under the same bridge.
>>
>> Vladimir Oltean (6):
>>   net: refactor all NETDEV_CHANGE notifier calls to a single function
>>   net: emit NETDEV_CHANGE for changes to IFF_PROMISC | IFF_ALLMULTI
>>   net: dsa: walk through all changeupper notifier functions
>>   net: dsa: track whether bridges have foreign interfaces in them
>>   net: dsa: monitor changes to bridge promiscuity
>>   net: bridge: avoid uselessly making offloaded ports promiscuous
>>
>>  include/net/dsa.h  |   4 +-
>>  net/bridge/br_if.c |  63 +++++++++++--------
>>  net/core/dev.c     |  34 +++++-----
>>  net/dsa/dsa_priv.h |   2 +
>>  net/dsa/port.c     |  12 ++++
>>  net/dsa/slave.c    | 150 ++++++++++++++++++++++++++++++++++++++++++---
>>  6 files changed, 215 insertions(+), 50 deletions(-)
>>
>> -- 
>> 2.25.1
>>
> 
> Hmm, Nikolay's address bounced back and I didn't notice the MAINTAINERS
> change. Updated the CC list with his new address.
> 
> Nikolay, if you want to take a look the patches are here, I hope it's
> fine if I don't resend:
> https://patchwork.kernel.org/project/netdevbpf/cover/20220408200337.718067-1-vladimir.oltean@nxp.com/

That's ok, I'll check them out tomorrow. Thanks!


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous
  2022-04-08 20:03 ` [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous Vladimir Oltean
@ 2022-04-09 10:17   ` kernel test robot
  2022-04-09 11:04       ` Vladimir Oltean
  2022-04-14 13:53   ` kernel test robot
  1 sibling, 1 reply; 22+ messages in thread
From: kernel test robot @ 2022-04-09 10:17 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: llvm, kbuild-all, Jakub Kicinski, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

Hi Vladimir,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Vladimir-Oltean/Disable-host-flooding-for-DSA-ports-under-a-bridge/20220409-041556
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git e8bd70250a821edb541c3abe1eacdad9f8dc7adf
config: x86_64-randconfig-a003 (https://download.01.org/0day-ci/archive/20220409/202204091856.0PBgeBSa-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 893e1c18b98d8bbc7b8d7d22cc2c348f65c72ad9)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/2b24e24c704fa129c753dbc8fb764c95f4a3562c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vladimir-Oltean/Disable-host-flooding-for-DSA-ports-under-a-bridge/20220409-041556
        git checkout 2b24e24c704fa129c753dbc8fb764c95f4a3562c
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/bridge/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> net/bridge/br_if.c:145:10: error: no member named 'offload_count' in 'struct net_bridge_port'
                   if (p->offload_count) {
                       ~  ^
   1 error generated.


vim +145 net/bridge/br_if.c

   129	
   130	/* When a port is added or removed or when certain port flags
   131	 * change, this function is called to automatically manage
   132	 * promiscuity setting of all the bridge ports.  We are always called
   133	 * under RTNL so can skip using rcu primitives.
   134	 */
   135	void br_manage_promisc(struct net_bridge *br)
   136	{
   137		struct net_bridge_port *p;
   138	
   139		list_for_each_entry(p, &br->port_list, list) {
   140			/* Offloaded ports have a separate address database for
   141			 * forwarding, which is managed through switchdev and not
   142			 * through dev_uc_add(), so the promiscuous concept makes no
   143			 * sense for them. Avoid updating promiscuity in that case.
   144			 */
 > 145			if (p->offload_count) {
   146				br_port_clear_promisc(p);
   147				continue;
   148			}
   149	
   150			/* If bridge is promiscuous, unconditionally place all ports
   151			 * in promiscuous mode too. This allows the bridge device to
   152			 * locally receive all unknown traffic.
   153			 */
   154			if (br->dev->flags & IFF_PROMISC) {
   155				br_port_set_promisc(p);
   156				continue;
   157			}
   158	
   159			/* If vlan filtering is disabled, place all ports in
   160			 * promiscuous mode.
   161			 */
   162			if (!br_vlan_enabled(br->dev)) {
   163				br_port_set_promisc(p);
   164				continue;
   165			}
   166	
   167			/* If the number of auto-ports is <= 1, then all other ports
   168			 * will have their output configuration statically specified
   169			 * through fdbs. Since ingress on the auto-port becomes
   170			 * forwarding/egress to other ports and egress configuration is
   171			 * statically known, we can say that ingress configuration of
   172			 * the auto-port is also statically known.
   173			 * This lets us disable promiscuous mode and write this config
   174			 * to hw.
   175			 */
   176			if (br->auto_cnt == 0 ||
   177			    (br->auto_cnt == 1 && br_auto_port(p)))
   178				br_port_clear_promisc(p);
   179			else
   180				br_port_set_promisc(p);
   181		}
   182	}
   183	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous
  2022-04-09 10:17   ` kernel test robot
@ 2022-04-09 11:04       ` Vladimir Oltean
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-09 11:04 UTC (permalink / raw)
  To: kernel test robot
  Cc: netdev, llvm, kbuild-all, Jakub Kicinski, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Tobias Waldekranz, Mattias Forsblad

On Sat, Apr 09, 2022 at 06:17:15PM +0800, kernel test robot wrote:
> >> net/bridge/br_if.c:145:10: error: no member named 'offload_count' in 'struct net_bridge_port'
>                    if (p->offload_count) {
>                        ~  ^
>    130	/* When a port is added or removed or when certain port flags
>    131	 * change, this function is called to automatically manage
>    132	 * promiscuity setting of all the bridge ports.  We are always called
>    133	 * under RTNL so can skip using rcu primitives.
>    134	 */
>    135	void br_manage_promisc(struct net_bridge *br)
>    136	{
>    137		struct net_bridge_port *p;
>    138	
>    139		list_for_each_entry(p, &br->port_list, list) {
>    140			/* Offloaded ports have a separate address database for
>    141			 * forwarding, which is managed through switchdev and not
>    142			 * through dev_uc_add(), so the promiscuous concept makes no
>    143			 * sense for them. Avoid updating promiscuity in that case.
>    144			 */
>  > 145			if (p->offload_count) {

Good point. Please imagine there's a static inline nbp_get_offload_count()
that returns 0 when CONFIG_NET_SWITCHDEV=n. I'll address this for v2.

>    146				br_port_clear_promisc(p);
>    147				continue;
>    148			}
>    149	
>    150			/* If bridge is promiscuous, unconditionally place all ports
>    151			 * in promiscuous mode too. This allows the bridge device to
>    152			 * locally receive all unknown traffic.
>    153			 */
>    154			if (br->dev->flags & IFF_PROMISC) {
>    155				br_port_set_promisc(p);
>    156				continue;
>    157			}
>    158	
>    159			/* If vlan filtering is disabled, place all ports in
>    160			 * promiscuous mode.
>    161			 */
>    162			if (!br_vlan_enabled(br->dev)) {
>    163				br_port_set_promisc(p);
>    164				continue;
>    165			}
>    166	
>    167			/* If the number of auto-ports is <= 1, then all other ports
>    168			 * will have their output configuration statically specified
>    169			 * through fdbs. Since ingress on the auto-port becomes
>    170			 * forwarding/egress to other ports and egress configuration is
>    171			 * statically known, we can say that ingress configuration of
>    172			 * the auto-port is also statically known.
>    173			 * This lets us disable promiscuous mode and write this config
>    174			 * to hw.
>    175			 */
>    176			if (br->auto_cnt == 0 ||
>    177			    (br->auto_cnt == 1 && br_auto_port(p)))
>    178				br_port_clear_promisc(p);
>    179			else
>    180				br_port_set_promisc(p);
>    181		}
>    182	}
>    183	

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous
@ 2022-04-09 11:04       ` Vladimir Oltean
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-09 11:04 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2629 bytes --]

On Sat, Apr 09, 2022 at 06:17:15PM +0800, kernel test robot wrote:
> >> net/bridge/br_if.c:145:10: error: no member named 'offload_count' in 'struct net_bridge_port'
>                    if (p->offload_count) {
>                        ~  ^
>    130	/* When a port is added or removed or when certain port flags
>    131	 * change, this function is called to automatically manage
>    132	 * promiscuity setting of all the bridge ports.  We are always called
>    133	 * under RTNL so can skip using rcu primitives.
>    134	 */
>    135	void br_manage_promisc(struct net_bridge *br)
>    136	{
>    137		struct net_bridge_port *p;
>    138	
>    139		list_for_each_entry(p, &br->port_list, list) {
>    140			/* Offloaded ports have a separate address database for
>    141			 * forwarding, which is managed through switchdev and not
>    142			 * through dev_uc_add(), so the promiscuous concept makes no
>    143			 * sense for them. Avoid updating promiscuity in that case.
>    144			 */
>  > 145			if (p->offload_count) {

Good point. Please imagine there's a static inline nbp_get_offload_count()
that returns 0 when CONFIG_NET_SWITCHDEV=n. I'll address this for v2.

>    146				br_port_clear_promisc(p);
>    147				continue;
>    148			}
>    149	
>    150			/* If bridge is promiscuous, unconditionally place all ports
>    151			 * in promiscuous mode too. This allows the bridge device to
>    152			 * locally receive all unknown traffic.
>    153			 */
>    154			if (br->dev->flags & IFF_PROMISC) {
>    155				br_port_set_promisc(p);
>    156				continue;
>    157			}
>    158	
>    159			/* If vlan filtering is disabled, place all ports in
>    160			 * promiscuous mode.
>    161			 */
>    162			if (!br_vlan_enabled(br->dev)) {
>    163				br_port_set_promisc(p);
>    164				continue;
>    165			}
>    166	
>    167			/* If the number of auto-ports is <= 1, then all other ports
>    168			 * will have their output configuration statically specified
>    169			 * through fdbs. Since ingress on the auto-port becomes
>    170			 * forwarding/egress to other ports and egress configuration is
>    171			 * statically known, we can say that ingress configuration of
>    172			 * the auto-port is also statically known.
>    173			 * This lets us disable promiscuous mode and write this config
>    174			 * to hw.
>    175			 */
>    176			if (br->auto_cnt == 0 ||
>    177			    (br->auto_cnt == 1 && br_auto_port(p)))
>    178				br_port_clear_promisc(p);
>    179			else
>    180				br_port_set_promisc(p);
>    181		}
>    182	}
>    183	

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
                   ` (6 preceding siblings ...)
  2022-04-08 20:14 ` [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
@ 2022-04-09 19:46 ` Tobias Waldekranz
  2022-04-09 20:45   ` Vladimir Oltean
  7 siblings, 1 reply; 22+ messages in thread
From: Tobias Waldekranz @ 2022-04-09 19:46 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Mattias Forsblad, Joachim Wiberg

On Fri, Apr 08, 2022 at 23:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> For this patch series to make more sense, it should be reviewed from the
> last patch to the first. Changes were made in the order that they were
> just to preserve patch-with-patch functionality.
>
> A little while ago, some DSA switch drivers gained support for
> IFF_UNICAST_FLT, a mechanism through which they are notified of the
> MAC addresses required for local standalone termination.
> A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
> bridge FDB entries, which are the MAC addresses required for local
> termination when under a bridge.
>
> So we have come one step closer to removing the CPU from the list of
> destinations for packets with unknown MAC DA.What remains is to check
> whether any software L2 forwarding is enabled, and that is accomplished
> by monitoring the neighbor bridge ports that DSA switches have.
>
> With these changes, DSA drivers that fulfill the requirements for
> dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
> will keep flooding towards the CPU disabled for as long as no port is
> promiscuous. The bridge won't attempt to make its ports promiscuous
> anymore either if said ports are offloaded by switchdev (this series
> changes that behavior). Instead, DSA will fall back by its own will to
> promiscuous mode on bridge ports when the bridge itself becomes
> promiscuous, or a foreign interface is detected under the same bridge.

Hi Vladimir,

Great stuff! I've added Joachim to Cc. He has been working on a series
to add support for configuring the equivalent of BR_FLOOD,
BR_MCAST_FLOOD, and BR_BCAST_FLOOD on the bridge itself. I.e. allowing
the user to specify how local_rcv is managed in br_handle_frame_finish.

For switchdev drivers, being able to query whether a bridge will ingress
unknown unicast to the host or not seems like the missing piece that
makes this bullet proof. I.e. if you have...

- No foreign interfaces
- No promisc
_and_
- No BR_FLOOD on the bridge itself

..._then_ you can safely disable unicast flooding towards the CPU
port. The same would hold for multicast and BR_MCAST_FLOOD of course.

Not sure how close Joachim is to publishing his work. But I just thought
you two should know about the other one's work :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-09 19:46 ` Tobias Waldekranz
@ 2022-04-09 20:45   ` Vladimir Oltean
  2022-04-10 18:02     ` Tobias Waldekranz
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-09 20:45 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

Hi Tobias,

On Sat, Apr 09, 2022 at 09:46:54PM +0200, Tobias Waldekranz wrote:
> On Fri, Apr 08, 2022 at 23:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > For this patch series to make more sense, it should be reviewed from the
> > last patch to the first. Changes were made in the order that they were
> > just to preserve patch-with-patch functionality.
> >
> > A little while ago, some DSA switch drivers gained support for
> > IFF_UNICAST_FLT, a mechanism through which they are notified of the
> > MAC addresses required for local standalone termination.
> > A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
> > bridge FDB entries, which are the MAC addresses required for local
> > termination when under a bridge.
> >
> > So we have come one step closer to removing the CPU from the list of
> > destinations for packets with unknown MAC DA.What remains is to check
> > whether any software L2 forwarding is enabled, and that is accomplished
> > by monitoring the neighbor bridge ports that DSA switches have.
> >
> > With these changes, DSA drivers that fulfill the requirements for
> > dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
> > will keep flooding towards the CPU disabled for as long as no port is
> > promiscuous. The bridge won't attempt to make its ports promiscuous
> > anymore either if said ports are offloaded by switchdev (this series
> > changes that behavior). Instead, DSA will fall back by its own will to
> > promiscuous mode on bridge ports when the bridge itself becomes
> > promiscuous, or a foreign interface is detected under the same bridge.
> 
> Hi Vladimir,
> 
> Great stuff! I've added Joachim to Cc. He has been working on a series
> to add support for configuring the equivalent of BR_FLOOD,
> BR_MCAST_FLOOD, and BR_BCAST_FLOOD on the bridge itself. I.e. allowing
> the user to specify how local_rcv is managed in br_handle_frame_finish.
> 
> For switchdev drivers, being able to query whether a bridge will ingress
> unknown unicast to the host or not seems like the missing piece that
> makes this bullet proof. I.e. if you have...
> 
> - No foreign interfaces
> - No promisc
> _and_
> - No BR_FLOOD on the bridge itself
> 
> ..._then_ you can safely disable unicast flooding towards the CPU
> port. The same would hold for multicast and BR_MCAST_FLOOD of course.
> 
> Not sure how close Joachim is to publishing his work. But I just thought
> you two should know about the other one's work :)

I haven't seen Joachim's work and I sure hope he can clarify. It seems
like there is some overlap that I don't currently know what to make of.
The way I see things, BR_FLOOD and BR_MCAST_FLOOD are egress settings,
so I'm not sure how to interpret them when applied to the bridge device
itself. On the other hand, treating IFF_PROMISC/IFF_ALLMULTI on the
bridge device as the knob that decides whether the software bridge wants
to ingress unknown MAC DA packets seems the more appropriate thing to do.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-09 20:45   ` Vladimir Oltean
@ 2022-04-10 18:02     ` Tobias Waldekranz
  2022-04-10 22:03       ` Vladimir Oltean
  0 siblings, 1 reply; 22+ messages in thread
From: Tobias Waldekranz @ 2022-04-10 18:02 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

On Sat, Apr 09, 2022 at 20:45, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> Hi Tobias,
>
> On Sat, Apr 09, 2022 at 09:46:54PM +0200, Tobias Waldekranz wrote:
>> On Fri, Apr 08, 2022 at 23:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> > For this patch series to make more sense, it should be reviewed from the
>> > last patch to the first. Changes were made in the order that they were
>> > just to preserve patch-with-patch functionality.
>> >
>> > A little while ago, some DSA switch drivers gained support for
>> > IFF_UNICAST_FLT, a mechanism through which they are notified of the
>> > MAC addresses required for local standalone termination.
>> > A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
>> > bridge FDB entries, which are the MAC addresses required for local
>> > termination when under a bridge.
>> >
>> > So we have come one step closer to removing the CPU from the list of
>> > destinations for packets with unknown MAC DA.What remains is to check
>> > whether any software L2 forwarding is enabled, and that is accomplished
>> > by monitoring the neighbor bridge ports that DSA switches have.
>> >
>> > With these changes, DSA drivers that fulfill the requirements for
>> > dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
>> > will keep flooding towards the CPU disabled for as long as no port is
>> > promiscuous. The bridge won't attempt to make its ports promiscuous
>> > anymore either if said ports are offloaded by switchdev (this series
>> > changes that behavior). Instead, DSA will fall back by its own will to
>> > promiscuous mode on bridge ports when the bridge itself becomes
>> > promiscuous, or a foreign interface is detected under the same bridge.
>> 
>> Hi Vladimir,
>> 
>> Great stuff! I've added Joachim to Cc. He has been working on a series
>> to add support for configuring the equivalent of BR_FLOOD,
>> BR_MCAST_FLOOD, and BR_BCAST_FLOOD on the bridge itself. I.e. allowing
>> the user to specify how local_rcv is managed in br_handle_frame_finish.
>> 
>> For switchdev drivers, being able to query whether a bridge will ingress
>> unknown unicast to the host or not seems like the missing piece that
>> makes this bullet proof. I.e. if you have...
>> 
>> - No foreign interfaces
>> - No promisc
>> _and_
>> - No BR_FLOOD on the bridge itself
>> 
>> ..._then_ you can safely disable unicast flooding towards the CPU
>> port. The same would hold for multicast and BR_MCAST_FLOOD of course.
>> 
>> Not sure how close Joachim is to publishing his work. But I just thought
>> you two should know about the other one's work :)
>
> I haven't seen Joachim's work and I sure hope he can clarify.

If you want to get a feel for it, it is available here (the branch name
is just where he started out I think :))

https://github.com/westermo/linux/tree/bridge-always-flood-unknown-mcast

> It seems
> like there is some overlap that I don't currently know what to make of.
> The way I see things, BR_FLOOD and BR_MCAST_FLOOD are egress settings,
> so I'm not sure how to interpret them when applied to the bridge device
> itself.

They are egress settings, yes. But from the view of the forwarding
mechanism in the bridge, I would argue that the host interface is (or at
least should be) as much an egress port as any of the lower devices.

- It can be the target of an FDB entry
- It can be a member of an MDB entry

I.e. it can be chosen as a _destination_ => egress. This is analogous to
the CPU port, which from the ASICs point of view is an egress port, but
from a system POV it is receiving frames.

> On the other hand, treating IFF_PROMISC/IFF_ALLMULTI on the
> bridge device as the knob that decides whether the software bridge wants
> to ingress unknown MAC DA packets seems the more appropriate thing to do.

Maybe. I think it depends on how exact we want to be in our
classification. Fundementally, I think the problem is that a bridge
deals with one thing that other netdevs do not:

  Whether the destination in known/registered or not.

A NIC's unicast/multicast filters are not quite the same thing, because
a they only deal with a single endpoint. I.e. if an address isn't in a
NIC's list of known DA's, then it is "not mine". But in a bridge
scenario, although it is not associated with the host (i.e. "not mine"),
it can still be "known" (i.e. associated with some lower port).

AFAIK, promisc means "receive all the things!", whereas BR_FLOOD would
just select the subset of frames for which the destination is unknown.

Likewise for multicast, IFF_ALLMULTI means "receive _all_ multicast" -
it does not discriminate between registered and unregistered
flows. BR_MCAST_FLOOD OTOH would only target unregistered flows.

Here is my understanding of how the two solutions would differ in the
types of flows that they would affect:

                    .--------------------.-----------------.
                    |       IFF_*        |    BR_*FLOOD    |
.-------------------|---------.----------|-----.-----.-----|
| Type              | Promisc | Allmulti | BC  | UC  | MC  |
|-------------------|---------|----------|-----|-----|-----|
| Broadcast         | Yes     | No       | Yes | No  | No  |
| Unknown unicast   | Yes     | No       | No  | Yes | No  |
| Unknown multicast | Yes     | Yes      | No  | No  | Yes |
| Known unicast     | Yes     | No       | No  | No  | No  |
| Known multicast   | Yes     | Yes      | No  | No  | No  |
'-------------------'--------------------'-----------------'


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-10 18:02     ` Tobias Waldekranz
@ 2022-04-10 22:03       ` Vladimir Oltean
  2022-04-11  9:33         ` Tobias Waldekranz
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-10 22:03 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

On Sun, Apr 10, 2022 at 08:02:13PM +0200, Tobias Waldekranz wrote:
> On Sat, Apr 09, 2022 at 20:45, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > Hi Tobias,
> >
> > On Sat, Apr 09, 2022 at 09:46:54PM +0200, Tobias Waldekranz wrote:
> >> On Fri, Apr 08, 2022 at 23:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> >> > For this patch series to make more sense, it should be reviewed from the
> >> > last patch to the first. Changes were made in the order that they were
> >> > just to preserve patch-with-patch functionality.
> >> >
> >> > A little while ago, some DSA switch drivers gained support for
> >> > IFF_UNICAST_FLT, a mechanism through which they are notified of the
> >> > MAC addresses required for local standalone termination.
> >> > A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
> >> > bridge FDB entries, which are the MAC addresses required for local
> >> > termination when under a bridge.
> >> >
> >> > So we have come one step closer to removing the CPU from the list of
> >> > destinations for packets with unknown MAC DA.What remains is to check
> >> > whether any software L2 forwarding is enabled, and that is accomplished
> >> > by monitoring the neighbor bridge ports that DSA switches have.
> >> >
> >> > With these changes, DSA drivers that fulfill the requirements for
> >> > dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
> >> > will keep flooding towards the CPU disabled for as long as no port is
> >> > promiscuous. The bridge won't attempt to make its ports promiscuous
> >> > anymore either if said ports are offloaded by switchdev (this series
> >> > changes that behavior). Instead, DSA will fall back by its own will to
> >> > promiscuous mode on bridge ports when the bridge itself becomes
> >> > promiscuous, or a foreign interface is detected under the same bridge.
> >> 
> >> Hi Vladimir,
> >> 
> >> Great stuff! I've added Joachim to Cc. He has been working on a series
> >> to add support for configuring the equivalent of BR_FLOOD,
> >> BR_MCAST_FLOOD, and BR_BCAST_FLOOD on the bridge itself. I.e. allowing
> >> the user to specify how local_rcv is managed in br_handle_frame_finish.
> >> 
> >> For switchdev drivers, being able to query whether a bridge will ingress
> >> unknown unicast to the host or not seems like the missing piece that
> >> makes this bullet proof. I.e. if you have...
> >> 
> >> - No foreign interfaces
> >> - No promisc
> >> _and_
> >> - No BR_FLOOD on the bridge itself
> >> 
> >> ..._then_ you can safely disable unicast flooding towards the CPU
> >> port. The same would hold for multicast and BR_MCAST_FLOOD of course.
> >> 
> >> Not sure how close Joachim is to publishing his work. But I just thought
> >> you two should know about the other one's work :)
> >
> > I haven't seen Joachim's work and I sure hope he can clarify.
> 
> If you want to get a feel for it, it is available here (the branch name
> is just where he started out I think :))
> 
> https://github.com/westermo/linux/tree/bridge-always-flood-unknown-mcast
> 
> > It seems
> > like there is some overlap that I don't currently know what to make of.
> > The way I see things, BR_FLOOD and BR_MCAST_FLOOD are egress settings,
> > so I'm not sure how to interpret them when applied to the bridge device
> > itself.
> 
> They are egress settings, yes. But from the view of the forwarding
> mechanism in the bridge, I would argue that the host interface is (or at
> least should be) as much an egress port as any of the lower devices.
> 
> - It can be the target of an FDB entry
> - It can be a member of an MDB entry
> 
> I.e. it can be chosen as a _destination_ => egress. This is analogous to
> the CPU port, which from the ASICs point of view is an egress port, but
> from a system POV it is receiving frames.
> 
> > On the other hand, treating IFF_PROMISC/IFF_ALLMULTI on the
> > bridge device as the knob that decides whether the software bridge wants
> > to ingress unknown MAC DA packets seems the more appropriate thing to do.
> 
> Maybe. I think it depends on how exact we want to be in our
> classification. Fundementally, I think the problem is that a bridge
> deals with one thing that other netdevs do not:
> 
>   Whether the destination in known/registered or not.
> 
> A NIC's unicast/multicast filters are not quite the same thing, because
> a they only deal with a single endpoint. I.e. if an address isn't in a
> NIC's list of known DA's, then it is "not mine". But in a bridge
> scenario, although it is not associated with the host (i.e. "not mine"),
> it can still be "known" (i.e. associated with some lower port).
> 
> AFAIK, promisc means "receive all the things!", whereas BR_FLOOD would
> just select the subset of frames for which the destination is unknown.
> 
> Likewise for multicast, IFF_ALLMULTI means "receive _all_ multicast" -
> it does not discriminate between registered and unregistered
> flows. BR_MCAST_FLOOD OTOH would only target unregistered flows.
> 
> Here is my understanding of how the two solutions would differ in the
> types of flows that they would affect:
> 
>                     .--------------------.-----------------.
>                     |       IFF_*        |    BR_*FLOOD    |
> .-------------------|---------.----------|-----.-----.-----|
> | Type              | Promisc | Allmulti | BC  | UC  | MC  |
> |-------------------|---------|----------|-----|-----|-----|
> | Broadcast         | Yes     | No       | Yes | No  | No  |
> | Unknown unicast   | Yes     | No       | No  | Yes | No  |
> | Unknown multicast | Yes     | Yes      | No  | No  | Yes |
> | Known unicast     | Yes     | No       | No  | No  | No  |
                        ~~~
                        To what degree does IFF_PROMISC affect known
                        unicast traffic?

> | Known multicast   | Yes     | Yes      | No  | No  | No  |
> '-------------------'--------------------'-----------------'

So to summarize what you're trying to say. You see two planes back to back.

 +-------------------------------+
 |    User space, other uppers   |
 +-------------------------------+
 |   Termination plane governed  |
 |     by dev->uc and dev->mc    |
 |            +-----+            |
 |            | br0 |            |
 +------------+-----+------------+
 |            | br0 |            |
 |            +-----+            |
 |  Forwarding plane governed    |
 |        by FDB and MDB         |
 | +------+------+------+------+ |
 | | swp0 | swp1 | swp2 | swp3 | |
 +-+------+------+------+------+-+

For a packet to be locally received on the br0 interface, it needs to
pass the filters from both the forwarding plane and the termination
plane.

Considering a unicast packet:
- if a local/permanent entry exists in the FDB, it is known to the
  forwarding plane, and its destination is the bridge device seen as a
  bridge port
- if the FDB doesn't contain this MAC DA, it is unknown to the
  forwarding plane, but the bridge device is still a destination for it,
  since the bridge seen as a bridge port has a non-configurable BR_FLOOD
  port flag which allows unknown unicast to exit the forwarding plane
  towards the termination plane implicitly on
- if the MAC DA exists in the RX filter of the bridge device
  (dev->dev_addr or dev->uc), the packet is known to the termination
  plane, so it is not filtered out.
- if the MAC DA doesn't exist in the RX filter, the packet is filtered
  out, unless the RX filter is disabled via IFF_PROMISC, case in which
  it is still accepted

Considering a multicast packet, things are mostly the same, except for
the fact that having a local/permanent MDB entry towards the bridge
device does not necessarily 'steal' it from the forwarding plane as it
does in case of unicast, since multicast traffic can have multiple
destinations which don't exclude each other.

Needless to say that what we have today is a very limited piece of the
bigger puzzle you've presented above, and perhaps does not even behave
the same as the larger puzzle would, restricted to the same operating
conditions as the current code.

Here are some practical questions so I can make sure I understand the
model you're proposing.

1. You or Joachim add support for BR_FLOOD on the bridge device itself.
   The bridge device is promiscuous, and a packet with a MAC DA unknown
   to the forwarding plane is received. Will this packet be seen in
   tcpdump or not? I assume not, because it never even reached the RX
   filtering lists of the bridge device, it didn't exit the forwarding
   plane.

2. Somebody comes later and adds support for IFF_ALLMULTI. This means,
   the RX filtering for unknown multicast can be toggled on or off.
   Be there a bridge with mcast_router enabled. Should the bridge device
   receive unknown multicast traffic or not, when IFF_ALLMULTI isn't
   enabled? Right now, I think IFF_ALLMULTI isn't necessary.

3. The bridge device does not implement IFF_UNICAST_FLT today, so any
   addition to the bridge_dev->uc list will turn on bridge_dev->flags &
   IFF_PROMISC. Do you see this as a problem on the RX filtering side of
   things, or from a system administration PoV, would you just like to
   disable BR_FLOOD on the forwarding side of the bridge device and call
   it a day, expect the applications to just continue working?

4. Let's say we implement IFF_UNICAST_FLT for the bridge device.
   How do we do it? Add one more lookup in br_handle_frame_finish()
   which didn't exist before, to search for this MAC DA in bridge_dev->uc?

5. Should the implementation of IFF_UNICAST_FLT influence the forwarding
   plane in any way? Think of a user space application emitting a
   PACKET_MR_UNICAST request to ensure it sees the packets with the MAC
   DA it needs. Should said application have awareness of the fact that
   the interface it's speaking to is a bridge device? If not, three
   things may happen. The admin (you) has turned off BR_FLOOD towards
   the bridge device, effectively cutting it off from the (unknown to
   the forwarding plane) packets it wants to see. Or the admin hasn't
   turned off BR_FLOOD, but the MAC DA, while present in the RX
   filtering lists, is still absent from the FDB, so while it is
   received locally by the application, is also flooded to all other
   bridge ports. Or the kernel may automatically add a BR_FDB_LOCAL entry,
   considering that it knows that if there's a destination for this
   MAC DA in this broadcast domain, for certain there isn't more than
   one, so it could just as well guide the forwarding plane towards it.
   Or you may choose completely the other side, "hey, the bridge device
   really is that special, and user space should put up with it and know
   it has to configure both its forwarding and termination plane before
   things work in a reasonable manner on it". But if this is the path
   you choose, what about the uppers a bridge may have, like a VLAN with
   a MAC address different from the bridge's? Should the 8021q driver
   also be aware of the fact that dev_uc_add() may not be sufficient
   when applied to the bridge as a real_dev?

6. For a switchdev driver like DSA, what condition do you propose it
   should monitor for deciding whether to enable flooding towards the
   host? IFF_PROMISC, BR_FLOOD towards the bridge, or both? You may say
   "hey, switchdev offloads just the forwarding plane, I don't really
   care if the bridge software path is going to accept the packet or
   not". Or you may say that unicast flooding is to be enabled if
   IFF_PROMISC || BR_FLOOD, and multicast flooding if IFF_PROMISC ||
   IFF_ALLMULTI || BR_MCAST_FLOOD, and broadcast flooding if
   BR_BCAST_FLOOD. Now take into consideration the additional checks for
   foreign interfaces. Does this complexity scale to the number of
   switchdev drivers we have? If not, can we do something about it?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-10 22:03       ` Vladimir Oltean
@ 2022-04-11  9:33         ` Tobias Waldekranz
  2022-04-11 10:55           ` Vladimir Oltean
  0 siblings, 1 reply; 22+ messages in thread
From: Tobias Waldekranz @ 2022-04-11  9:33 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

On Sun, Apr 10, 2022 at 22:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> On Sun, Apr 10, 2022 at 08:02:13PM +0200, Tobias Waldekranz wrote:
>> On Sat, Apr 09, 2022 at 20:45, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> > Hi Tobias,
>> >
>> > On Sat, Apr 09, 2022 at 09:46:54PM +0200, Tobias Waldekranz wrote:
>> >> On Fri, Apr 08, 2022 at 23:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> >> > For this patch series to make more sense, it should be reviewed from the
>> >> > last patch to the first. Changes were made in the order that they were
>> >> > just to preserve patch-with-patch functionality.
>> >> >
>> >> > A little while ago, some DSA switch drivers gained support for
>> >> > IFF_UNICAST_FLT, a mechanism through which they are notified of the
>> >> > MAC addresses required for local standalone termination.
>> >> > A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
>> >> > bridge FDB entries, which are the MAC addresses required for local
>> >> > termination when under a bridge.
>> >> >
>> >> > So we have come one step closer to removing the CPU from the list of
>> >> > destinations for packets with unknown MAC DA.What remains is to check
>> >> > whether any software L2 forwarding is enabled, and that is accomplished
>> >> > by monitoring the neighbor bridge ports that DSA switches have.
>> >> >
>> >> > With these changes, DSA drivers that fulfill the requirements for
>> >> > dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
>> >> > will keep flooding towards the CPU disabled for as long as no port is
>> >> > promiscuous. The bridge won't attempt to make its ports promiscuous
>> >> > anymore either if said ports are offloaded by switchdev (this series
>> >> > changes that behavior). Instead, DSA will fall back by its own will to
>> >> > promiscuous mode on bridge ports when the bridge itself becomes
>> >> > promiscuous, or a foreign interface is detected under the same bridge.
>> >> 
>> >> Hi Vladimir,
>> >> 
>> >> Great stuff! I've added Joachim to Cc. He has been working on a series
>> >> to add support for configuring the equivalent of BR_FLOOD,
>> >> BR_MCAST_FLOOD, and BR_BCAST_FLOOD on the bridge itself. I.e. allowing
>> >> the user to specify how local_rcv is managed in br_handle_frame_finish.
>> >> 
>> >> For switchdev drivers, being able to query whether a bridge will ingress
>> >> unknown unicast to the host or not seems like the missing piece that
>> >> makes this bullet proof. I.e. if you have...
>> >> 
>> >> - No foreign interfaces
>> >> - No promisc
>> >> _and_
>> >> - No BR_FLOOD on the bridge itself
>> >> 
>> >> ..._then_ you can safely disable unicast flooding towards the CPU
>> >> port. The same would hold for multicast and BR_MCAST_FLOOD of course.
>> >> 
>> >> Not sure how close Joachim is to publishing his work. But I just thought
>> >> you two should know about the other one's work :)
>> >
>> > I haven't seen Joachim's work and I sure hope he can clarify.
>> 
>> If you want to get a feel for it, it is available here (the branch name
>> is just where he started out I think :))
>> 
>> https://github.com/westermo/linux/tree/bridge-always-flood-unknown-mcast
>> 
>> > It seems
>> > like there is some overlap that I don't currently know what to make of.
>> > The way I see things, BR_FLOOD and BR_MCAST_FLOOD are egress settings,
>> > so I'm not sure how to interpret them when applied to the bridge device
>> > itself.
>> 
>> They are egress settings, yes. But from the view of the forwarding
>> mechanism in the bridge, I would argue that the host interface is (or at
>> least should be) as much an egress port as any of the lower devices.
>> 
>> - It can be the target of an FDB entry
>> - It can be a member of an MDB entry
>> 
>> I.e. it can be chosen as a _destination_ => egress. This is analogous to
>> the CPU port, which from the ASICs point of view is an egress port, but
>> from a system POV it is receiving frames.
>> 
>> > On the other hand, treating IFF_PROMISC/IFF_ALLMULTI on the
>> > bridge device as the knob that decides whether the software bridge wants
>> > to ingress unknown MAC DA packets seems the more appropriate thing to do.
>> 
>> Maybe. I think it depends on how exact we want to be in our
>> classification. Fundementally, I think the problem is that a bridge
>> deals with one thing that other netdevs do not:
>> 
>>   Whether the destination in known/registered or not.
>> 
>> A NIC's unicast/multicast filters are not quite the same thing, because
>> a they only deal with a single endpoint. I.e. if an address isn't in a
>> NIC's list of known DA's, then it is "not mine". But in a bridge
>> scenario, although it is not associated with the host (i.e. "not mine"),
>> it can still be "known" (i.e. associated with some lower port).
>> 
>> AFAIK, promisc means "receive all the things!", whereas BR_FLOOD would
>> just select the subset of frames for which the destination is unknown.
>> 
>> Likewise for multicast, IFF_ALLMULTI means "receive _all_ multicast" -
>> it does not discriminate between registered and unregistered
>> flows. BR_MCAST_FLOOD OTOH would only target unregistered flows.
>> 
>> Here is my understanding of how the two solutions would differ in the
>> types of flows that they would affect:
>> 
>>                     .--------------------.-----------------.
>>                     |       IFF_*        |    BR_*FLOOD    |
>> .-------------------|---------.----------|-----.-----.-----|
>> | Type              | Promisc | Allmulti | BC  | UC  | MC  |
>> |-------------------|---------|----------|-----|-----|-----|
>> | Broadcast         | Yes     | No       | Yes | No  | No  |
>> | Unknown unicast   | Yes     | No       | No  | Yes | No  |
>> | Unknown multicast | Yes     | Yes      | No  | No  | Yes |
>> | Known unicast     | Yes     | No       | No  | No  | No  |
>                         ~~~
>                         To what degree does IFF_PROMISC affect known
>                         unicast traffic?

When a bridge interface has this flag set, local_rcv is unconditionally
set to true in br_handle_frame_finish:

    local_rcv = !!(br->dev->flags & IFF_PROMISC);

Which means we will always end up in br_pass_frame_up.

>> | Known multicast   | Yes     | Yes      | No  | No  | No  |
>> '-------------------'--------------------'-----------------'
>
> So to summarize what you're trying to say. You see two planes back to back.
>
>  +-------------------------------+
>  |    User space, other uppers   |
>  +-------------------------------+
>  |   Termination plane governed  |
>  |     by dev->uc and dev->mc    |
>  |            +-----+            |
>  |            | br0 |            |
>  +------------+-----+------------+
>  |            | br0 |            |
>  |            +-----+            |
>  |  Forwarding plane governed    |
>  |        by FDB and MDB         |
>  | +------+------+------+------+ |
>  | | swp0 | swp1 | swp2 | swp3 | |
>  +-+------+------+------+------+-+

Yes, this is pretty much my mental model of it.

> For a packet to be locally received on the br0 interface, it needs to
> pass the filters from both the forwarding plane and the termination
> plane.

I see it more as the forwarding plane having a superset of the
information available to the termination plane.

I.e. all of the information in dev->uc and dev->mc can easily be
represented in the FDB (as a "local" entry) and MDB (by setting
`host_joined`) respectively, but there's loads of information in the
{F,M}DB for which there is no representation in a netdev's address
lists.

> Considering a unicast packet:
> - if a local/permanent entry exists in the FDB, it is known to the
>   forwarding plane, and its destination is the bridge device seen as a
>   bridge port
> - if the FDB doesn't contain this MAC DA, it is unknown to the
>   forwarding plane, but the bridge device is still a destination for it,
>   since the bridge seen as a bridge port has a non-configurable BR_FLOOD
>   port flag which allows unknown unicast to exit the forwarding plane
>   towards the termination plane implicitly on
> - if the MAC DA exists in the RX filter of the bridge device
>   (dev->dev_addr or dev->uc), the packet is known to the termination
>   plane, so it is not filtered out.

It's more like the aggregate of all local entries _is_ the termination
plane's Rx filter. I.e. in the bridge's .ndo_set_rx_mode, we should sync
all UC/MC addresses into the FDB/MDB.

There's a lot of stuff to deal with here around VLANs, since that
concept is missing from the kernel's address lists.

> - if the MAC DA doesn't exist in the RX filter, the packet is filtered
>   out, unless the RX filter is disabled via IFF_PROMISC, case in which
>   it is still accepted
>
> Considering a multicast packet, things are mostly the same, except for
> the fact that having a local/permanent MDB entry towards the bridge
> device does not necessarily 'steal' it from the forwarding plane as it
> does in case of unicast, since multicast traffic can have multiple
> destinations which don't exclude each other.

Not _necessarily_, but you are reclassifying the group as registered,
which means you go from flooding to forwarding. This in turn, means only
other registered subscribers (and routers) will see it.

> Needless to say that what we have today is a very limited piece of the
> bigger puzzle you've presented above, and perhaps does not even behave
> the same as the larger puzzle would, restricted to the same operating
> conditions as the current code.
>
> Here are some practical questions so I can make sure I understand the
> model you're proposing.
>
> 1. You or Joachim add support for BR_FLOOD on the bridge device itself.
>    The bridge device is promiscuous, and a packet with a MAC DA unknown
>    to the forwarding plane is received. Will this packet be seen in
>    tcpdump or not? I assume not, because it never even reached the RX
>    filtering lists of the bridge device, it didn't exit the forwarding
>    plane.

We don't propose any changes to how IFF_PROMISC is interpreted at
all. Joachim's changes will simply allow a user more fine grained
control over which flows are recieved by the termination plane when
IFF_PROMISC is _not_ set.

> 2. Somebody comes later and adds support for IFF_ALLMULTI. This means,
>    the RX filtering for unknown multicast can be toggled on or off.
>    Be there a bridge with mcast_router enabled. Should the bridge device
>    receive unknown multicast traffic or not, when IFF_ALLMULTI isn't
>    enabled? Right now, I think IFF_ALLMULTI isn't necessary.

I think that setting IFF_ALLMULTI on a bridge interface is really just
another way of saying that the bridge should be configured as a static
multicast router. I.e. ...

    ip link set dev br0 allmulticast on

...should be equivalent to...

    ip link set dev br0 type bridge mcast_router 2

...since that flag is an indication that _all_ multicast should be
received, both registered and unregistered.

We propose to add support for all three classes of _unknown_ traffic
(BUM) from the get-go:

- BR_FLOOD (Unknown unicast)
- BR_MCAST_FLOOD (Unregistered multicast)
- BR_BCAST_FLOOD (Broadcast)

So if BR_MCAST_FLOOD is not set, then the termination plane will not
receive any unregistered multicast. If it is marked as a multicast
router, then it will receive all registered _and_ unregistered
multicast. Same as with any other bridge port.

> 3. The bridge device does not implement IFF_UNICAST_FLT today, so any
>    addition to the bridge_dev->uc list will turn on bridge_dev->flags &
>    IFF_PROMISC. Do you see this as a problem on the RX filtering side of
>    things, or from a system administration PoV, would you just like to
>    disable BR_FLOOD on the forwarding side of the bridge device and call
>    it a day, expect the applications to just continue working?
>
> 4. Let's say we implement IFF_UNICAST_FLT for the bridge device.
>    How do we do it? Add one more lookup in br_handle_frame_finish()
>    which didn't exist before, to search for this MAC DA in bridge_dev->uc?

I think the most natural way would be to add them to the FDB as local
entries. We might need an extra flag to track its origin or something,
but it should be doable, I think. Again I think this is analogous to how
host addresses are added as FDB entries pointing towards the CPU port in
an ASIC.

> 5. Should the implementation of IFF_UNICAST_FLT influence the forwarding
>    plane in any way?

Since I propose we keep it in the FDB, yes, it will affect forwarding.

>    Think of a user space application emitting a
>    PACKET_MR_UNICAST request to ensure it sees the packets with the MAC
>    DA it needs. Should said application have awareness of the fact that
>    the interface it's speaking to is a bridge device? If not, three
>    things may happen. The admin (you) has turned off BR_FLOOD towards
>    the bridge device, effectively cutting it off from the (unknown to
>    the forwarding plane) packets it wants to see. Or the admin hasn't
>    turned off BR_FLOOD, but the MAC DA, while present in the RX
>    filtering lists, is still absent from the FDB, so while it is
>    received locally by the application, is also flooded to all other
>    bridge ports. Or the kernel may automatically add a BR_FDB_LOCAL entry,
>    considering that it knows that if there's a destination for this
>    MAC DA in this broadcast domain, for certain there isn't more than
>    one, so it could just as well guide the forwarding plane towards it.
>    Or you may choose completely the other side, "hey, the bridge device
>    really is that special, and user space should put up with it and know
>    it has to configure both its forwarding and termination plane before
>    things work in a reasonable manner on it". But if this is the path
>    you choose, what about the uppers a bridge may have, like a VLAN with
>    a MAC address different from the bridge's? Should the 8021q driver
>    also be aware of the fact that dev_uc_add() may not be sufficient
>    when applied to the bridge as a real_dev?

I might be repeating myself, but just to be clear: I think the end goal
should be to have a proper .ndo_set_rx_mode implementation that
translates the device address lists into the forwarding plane, making
most of this transparent to users.

> 6. For a switchdev driver like DSA, what condition do you propose it
>    should monitor for deciding whether to enable flooding towards the
>    host? IFF_PROMISC, BR_FLOOD towards the bridge, or both? You may say
>    "hey, switchdev offloads just the forwarding plane, I don't really
>    care if the bridge software path is going to accept the packet or
>    not". Or you may say that unicast flooding is to be enabled if
>    IFF_PROMISC || BR_FLOOD, and multicast flooding if IFF_PROMISC ||
>    IFF_ALLMULTI || BR_MCAST_FLOOD, and broadcast flooding if
>    BR_BCAST_FLOOD. Now take into consideration the additional checks for
>    foreign interfaces. Does this complexity scale to the number of
>    switchdev drivers we have? If not, can we do something about it?

Rule #1 is, If IFF_PROMISC is enabled on our bridge, we must flood
everything.

Then for each class:

- Broadcast: If BR_BCAST_FLOOD is enabled on the bridge or on any
  foreign interface, we must flood it to the CPU.

- Unicast: If BR_FLOOD is enabled on the bridge or on any foreign
  interface, we must flood it to the CPU.

- Multicast: If BR_MCAST_FLOOD is enabled on the bridge or on any
  foreign interface, we must flood it to the CPU.

For multicast, we also have to take the router port status of the bridge
for foreign interfaces into consideration, but I believe that should be
managed separately.

Additionally, if you _truly_ want to model the bridge's behavior, then
IFF_PROMISC on the bridge would mean that we either disable offloading
completely and forward everything via the CPU or setup some kind of port
mirror on all ports. Although that seems kind of silly, it might
actually be useful for debugging on occasion. You could have an ethtool
flag or something that would allow the user to enable it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-11  9:33         ` Tobias Waldekranz
@ 2022-04-11 10:55           ` Vladimir Oltean
  2022-04-11 13:14             ` Tobias Waldekranz
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-11 10:55 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

On Mon, Apr 11, 2022 at 11:33:40AM +0200, Tobias Waldekranz wrote:
> On Sun, Apr 10, 2022 at 22:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > On Sun, Apr 10, 2022 at 08:02:13PM +0200, Tobias Waldekranz wrote:
> >> On Sat, Apr 09, 2022 at 20:45, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> >> > Hi Tobias,
> >> >
> >> > On Sat, Apr 09, 2022 at 09:46:54PM +0200, Tobias Waldekranz wrote:
> >> >> On Fri, Apr 08, 2022 at 23:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> >> >> > For this patch series to make more sense, it should be reviewed from the
> >> >> > last patch to the first. Changes were made in the order that they were
> >> >> > just to preserve patch-with-patch functionality.
> >> >> >
> >> >> > A little while ago, some DSA switch drivers gained support for
> >> >> > IFF_UNICAST_FLT, a mechanism through which they are notified of the
> >> >> > MAC addresses required for local standalone termination.
> >> >> > A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
> >> >> > bridge FDB entries, which are the MAC addresses required for local
> >> >> > termination when under a bridge.
> >> >> >
> >> >> > So we have come one step closer to removing the CPU from the list of
> >> >> > destinations for packets with unknown MAC DA.What remains is to check
> >> >> > whether any software L2 forwarding is enabled, and that is accomplished
> >> >> > by monitoring the neighbor bridge ports that DSA switches have.
> >> >> >
> >> >> > With these changes, DSA drivers that fulfill the requirements for
> >> >> > dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
> >> >> > will keep flooding towards the CPU disabled for as long as no port is
> >> >> > promiscuous. The bridge won't attempt to make its ports promiscuous
> >> >> > anymore either if said ports are offloaded by switchdev (this series
> >> >> > changes that behavior). Instead, DSA will fall back by its own will to
> >> >> > promiscuous mode on bridge ports when the bridge itself becomes
> >> >> > promiscuous, or a foreign interface is detected under the same bridge.
> >> >> 
> >> >> Hi Vladimir,
> >> >> 
> >> >> Great stuff! I've added Joachim to Cc. He has been working on a series
> >> >> to add support for configuring the equivalent of BR_FLOOD,
> >> >> BR_MCAST_FLOOD, and BR_BCAST_FLOOD on the bridge itself. I.e. allowing
> >> >> the user to specify how local_rcv is managed in br_handle_frame_finish.
> >> >> 
> >> >> For switchdev drivers, being able to query whether a bridge will ingress
> >> >> unknown unicast to the host or not seems like the missing piece that
> >> >> makes this bullet proof. I.e. if you have...
> >> >> 
> >> >> - No foreign interfaces
> >> >> - No promisc
> >> >> _and_
> >> >> - No BR_FLOOD on the bridge itself
> >> >> 
> >> >> ..._then_ you can safely disable unicast flooding towards the CPU
> >> >> port. The same would hold for multicast and BR_MCAST_FLOOD of course.
> >> >> 
> >> >> Not sure how close Joachim is to publishing his work. But I just thought
> >> >> you two should know about the other one's work :)
> >> >
> >> > I haven't seen Joachim's work and I sure hope he can clarify.
> >> 
> >> If you want to get a feel for it, it is available here (the branch name
> >> is just where he started out I think :))
> >> 
> >> https://github.com/westermo/linux/tree/bridge-always-flood-unknown-mcast
> >> 
> >> > It seems
> >> > like there is some overlap that I don't currently know what to make of.
> >> > The way I see things, BR_FLOOD and BR_MCAST_FLOOD are egress settings,
> >> > so I'm not sure how to interpret them when applied to the bridge device
> >> > itself.
> >> 
> >> They are egress settings, yes. But from the view of the forwarding
> >> mechanism in the bridge, I would argue that the host interface is (or at
> >> least should be) as much an egress port as any of the lower devices.
> >> 
> >> - It can be the target of an FDB entry
> >> - It can be a member of an MDB entry
> >> 
> >> I.e. it can be chosen as a _destination_ => egress. This is analogous to
> >> the CPU port, which from the ASICs point of view is an egress port, but
> >> from a system POV it is receiving frames.
> >> 
> >> > On the other hand, treating IFF_PROMISC/IFF_ALLMULTI on the
> >> > bridge device as the knob that decides whether the software bridge wants
> >> > to ingress unknown MAC DA packets seems the more appropriate thing to do.
> >> 
> >> Maybe. I think it depends on how exact we want to be in our
> >> classification. Fundementally, I think the problem is that a bridge
> >> deals with one thing that other netdevs do not:
> >> 
> >>   Whether the destination in known/registered or not.
> >> 
> >> A NIC's unicast/multicast filters are not quite the same thing, because
> >> a they only deal with a single endpoint. I.e. if an address isn't in a
> >> NIC's list of known DA's, then it is "not mine". But in a bridge
> >> scenario, although it is not associated with the host (i.e. "not mine"),
> >> it can still be "known" (i.e. associated with some lower port).
> >> 
> >> AFAIK, promisc means "receive all the things!", whereas BR_FLOOD would
> >> just select the subset of frames for which the destination is unknown.
> >> 
> >> Likewise for multicast, IFF_ALLMULTI means "receive _all_ multicast" -
> >> it does not discriminate between registered and unregistered
> >> flows. BR_MCAST_FLOOD OTOH would only target unregistered flows.
> >> 
> >> Here is my understanding of how the two solutions would differ in the
> >> types of flows that they would affect:
> >> 
> >>                     .--------------------.-----------------.
> >>                     |       IFF_*        |    BR_*FLOOD    |
> >> .-------------------|---------.----------|-----.-----.-----|
> >> | Type              | Promisc | Allmulti | BC  | UC  | MC  |
> >> |-------------------|---------|----------|-----|-----|-----|
> >> | Broadcast         | Yes     | No       | Yes | No  | No  |
> >> | Unknown unicast   | Yes     | No       | No  | Yes | No  |
> >> | Unknown multicast | Yes     | Yes      | No  | No  | Yes |
> >> | Known unicast     | Yes     | No       | No  | No  | No  |
> >                         ~~~
> >                         To what degree does IFF_PROMISC affect known
> >                         unicast traffic?
> 
> When a bridge interface has this flag set, local_rcv is unconditionally
> set to true in br_handle_frame_finish:
> 
>     local_rcv = !!(br->dev->flags & IFF_PROMISC);
> 
> Which means we will always end up in br_pass_frame_up.
> 
> >> | Known multicast   | Yes     | Yes      | No  | No  | No  |
> >> '-------------------'--------------------'-----------------'
> >
> > So to summarize what you're trying to say. You see two planes back to back.
> >
> >  +-------------------------------+
> >  |    User space, other uppers   |
> >  +-------------------------------+
> >  |   Termination plane governed  |
> >  |     by dev->uc and dev->mc    |
> >  |            +-----+            |
> >  |            | br0 |            |
> >  +------------+-----+------------+
> >  |            | br0 |            |
> >  |            +-----+            |
> >  |  Forwarding plane governed    |
> >  |        by FDB and MDB         |
> >  | +------+------+------+------+ |
> >  | | swp0 | swp1 | swp2 | swp3 | |
> >  +-+------+------+------+------+-+
> 
> Yes, this is pretty much my mental model of it.
> 
> > For a packet to be locally received on the br0 interface, it needs to
> > pass the filters from both the forwarding plane and the termination
> > plane.
> 
> I see it more as the forwarding plane having a superset of the
> information available to the termination plane.
> 
> I.e. all of the information in dev->uc and dev->mc can easily be
> represented in the FDB (as a "local" entry) and MDB (by setting
> `host_joined`) respectively, but there's loads of information in the
> {F,M}DB for which there is no representation in a netdev's address
> lists.
> 
> > Considering a unicast packet:
> > - if a local/permanent entry exists in the FDB, it is known to the
> >   forwarding plane, and its destination is the bridge device seen as a
> >   bridge port
> > - if the FDB doesn't contain this MAC DA, it is unknown to the
> >   forwarding plane, but the bridge device is still a destination for it,
> >   since the bridge seen as a bridge port has a non-configurable BR_FLOOD
> >   port flag which allows unknown unicast to exit the forwarding plane
> >   towards the termination plane implicitly on
> > - if the MAC DA exists in the RX filter of the bridge device
> >   (dev->dev_addr or dev->uc), the packet is known to the termination
> >   plane, so it is not filtered out.
> 
> It's more like the aggregate of all local entries _is_ the termination
> plane's Rx filter. I.e. in the bridge's .ndo_set_rx_mode, we should sync
> all UC/MC addresses into the FDB/MDB.
> 
> There's a lot of stuff to deal with here around VLANs, since that
> concept is missing from the kernel's address lists.
> 
> > - if the MAC DA doesn't exist in the RX filter, the packet is filtered
> >   out, unless the RX filter is disabled via IFF_PROMISC, case in which
> >   it is still accepted
> >
> > Considering a multicast packet, things are mostly the same, except for
> > the fact that having a local/permanent MDB entry towards the bridge
> > device does not necessarily 'steal' it from the forwarding plane as it
> > does in case of unicast, since multicast traffic can have multiple
> > destinations which don't exclude each other.
> 
> Not _necessarily_, but you are reclassifying the group as registered,
> which means you go from flooding to forwarding. This in turn, means only
> other registered subscribers (and routers) will see it.
> 
> > Needless to say that what we have today is a very limited piece of the
> > bigger puzzle you've presented above, and perhaps does not even behave
> > the same as the larger puzzle would, restricted to the same operating
> > conditions as the current code.
> >
> > Here are some practical questions so I can make sure I understand the
> > model you're proposing.
> >
> > 1. You or Joachim add support for BR_FLOOD on the bridge device itself.
> >    The bridge device is promiscuous, and a packet with a MAC DA unknown
> >    to the forwarding plane is received. Will this packet be seen in
> >    tcpdump or not? I assume not, because it never even reached the RX
> >    filtering lists of the bridge device, it didn't exit the forwarding
> >    plane.
> 
> We don't propose any changes to how IFF_PROMISC is interpreted at
> all. Joachim's changes will simply allow a user more fine grained
> control over which flows are recieved by the termination plane when
> IFF_PROMISC is _not_ set.
> 
> > 2. Somebody comes later and adds support for IFF_ALLMULTI. This means,
> >    the RX filtering for unknown multicast can be toggled on or off.
> >    Be there a bridge with mcast_router enabled. Should the bridge device
> >    receive unknown multicast traffic or not, when IFF_ALLMULTI isn't
> >    enabled? Right now, I think IFF_ALLMULTI isn't necessary.
> 
> I think that setting IFF_ALLMULTI on a bridge interface is really just
> another way of saying that the bridge should be configured as a static
> multicast router. I.e. ...
> 
>     ip link set dev br0 allmulticast on
> 
> ...should be equivalent to...
> 
>     ip link set dev br0 type bridge mcast_router 2
> 
> ...since that flag is an indication that _all_ multicast should be
> received, both registered and unregistered.
> 
> We propose to add support for all three classes of _unknown_ traffic
> (BUM) from the get-go:
> 
> - BR_FLOOD (Unknown unicast)
> - BR_MCAST_FLOOD (Unregistered multicast)
> - BR_BCAST_FLOOD (Broadcast)
> 
> So if BR_MCAST_FLOOD is not set, then the termination plane will not
> receive any unregistered multicast. If it is marked as a multicast
> router, then it will receive all registered _and_ unregistered
> multicast. Same as with any other bridge port.

A bridge port does not *receive* said flooded traffic, but *sends* it.
The station attached to that bridge port *receives* it. You as a bridge
cannot control whether that station will actually *receive* it.
That is the analogy with the bridge device as a bridge port.
BR_MCAST_FLOOD may decide whether the forwarding plane sends the flow to
the bridge, but the bridge's RX filter still needs to decide whether it
accepts it.

> > 3. The bridge device does not implement IFF_UNICAST_FLT today, so any
> >    addition to the bridge_dev->uc list will turn on bridge_dev->flags &
> >    IFF_PROMISC. Do you see this as a problem on the RX filtering side of
> >    things, or from a system administration PoV, would you just like to
> >    disable BR_FLOOD on the forwarding side of the bridge device and call
> >    it a day, expect the applications to just continue working?
> >
> > 4. Let's say we implement IFF_UNICAST_FLT for the bridge device.
> >    How do we do it? Add one more lookup in br_handle_frame_finish()
> >    which didn't exist before, to search for this MAC DA in bridge_dev->uc?
> 
> I think the most natural way would be to add them to the FDB as local
> entries. We might need an extra flag to track its origin or something,
> but it should be doable, I think. Again I think this is analogous to how
> host addresses are added as FDB entries pointing towards the CPU port in
> an ASIC.
> 
> > 5. Should the implementation of IFF_UNICAST_FLT influence the forwarding
> >    plane in any way?
> 
> Since I propose we keep it in the FDB, yes, it will affect forwarding.
> 
> >    Think of a user space application emitting a
> >    PACKET_MR_UNICAST request to ensure it sees the packets with the MAC
> >    DA it needs. Should said application have awareness of the fact that
> >    the interface it's speaking to is a bridge device? If not, three
> >    things may happen. The admin (you) has turned off BR_FLOOD towards
> >    the bridge device, effectively cutting it off from the (unknown to
> >    the forwarding plane) packets it wants to see. Or the admin hasn't
> >    turned off BR_FLOOD, but the MAC DA, while present in the RX
> >    filtering lists, is still absent from the FDB, so while it is
> >    received locally by the application, is also flooded to all other
> >    bridge ports. Or the kernel may automatically add a BR_FDB_LOCAL entry,
> >    considering that it knows that if there's a destination for this
> >    MAC DA in this broadcast domain, for certain there isn't more than
> >    one, so it could just as well guide the forwarding plane towards it.
> >    Or you may choose completely the other side, "hey, the bridge device
> >    really is that special, and user space should put up with it and know
> >    it has to configure both its forwarding and termination plane before
> >    things work in a reasonable manner on it". But if this is the path
> >    you choose, what about the uppers a bridge may have, like a VLAN with
> >    a MAC address different from the bridge's? Should the 8021q driver
> >    also be aware of the fact that dev_uc_add() may not be sufficient
> >    when applied to the bridge as a real_dev?
> 
> I might be repeating myself, but just to be clear: I think the end goal
> should be to have a proper .ndo_set_rx_mode implementation that
> translates the device address lists into the forwarding plane, making
> most of this transparent to users.
> 
> > 6. For a switchdev driver like DSA, what condition do you propose it
> >    should monitor for deciding whether to enable flooding towards the
> >    host? IFF_PROMISC, BR_FLOOD towards the bridge, or both? You may say
> >    "hey, switchdev offloads just the forwarding plane, I don't really
> >    care if the bridge software path is going to accept the packet or
> >    not". Or you may say that unicast flooding is to be enabled if
> >    IFF_PROMISC || BR_FLOOD, and multicast flooding if IFF_PROMISC ||
> >    IFF_ALLMULTI || BR_MCAST_FLOOD, and broadcast flooding if
> >    BR_BCAST_FLOOD. Now take into consideration the additional checks for
> >    foreign interfaces. Does this complexity scale to the number of
> >    switchdev drivers we have? If not, can we do something about it?
> 
> Rule #1 is, If IFF_PROMISC is enabled on our bridge, we must flood
> everything.
> 
> Then for each class:
> 
> - Broadcast: If BR_BCAST_FLOOD is enabled on the bridge or on any
>   foreign interface, we must flood it to the CPU.
> 
> - Unicast: If BR_FLOOD is enabled on the bridge or on any foreign
>   interface, we must flood it to the CPU.
> 
> - Multicast: If BR_MCAST_FLOOD is enabled on the bridge or on any
>   foreign interface, we must flood it to the CPU.
> 
> For multicast, we also have to take the router port status of the bridge
> for foreign interfaces into consideration, but I believe that should be
> managed separately.
> 
> Additionally, if you _truly_ want to model the bridge's behavior, then
> IFF_PROMISC on the bridge would mean that we either disable offloading
> completely and forward everything via the CPU or setup some kind of port
> mirror on all ports. Although that seems kind of silly, it might
> actually be useful for debugging on occasion. You could have an ethtool
> flag or something that would allow the user to enable it.

So to summarize your second message.

You're saying "hey, I see that IFF_PROMISC and IFF_ALLMULTI on the
bridge are these beasts that defy all conventional wisdom about what an
RX filter even is. A unicast packet forwarded by the bridge shouldn't be
seen in tcpdump if there is an FDB entry that points to some other port
than the bridge device itself, according to the mental model we agreed
on, yet that is what happens today.

So transposed to an offloading plane, setting IFF_PROMISC would mean
that all traffic should be mirrored to the CPU, which is obviously
something that no driver does, for more than one single reason.
Consequently, the presence of an address in dev->uc would mean to mirror
that single address to the CPU.

For practical reasons (unicast traffic has a single destination), we may
mirror that address by installing it to the FDB as a local entry. And
maybe we could get away with that.

But for multicast RX filters, when an entry would get added to dev->mc,
but that same multicast MAC address is absent from the MDB, we can't
really do much about adding it to the MDB either, because this would
transform the flow from being unregistered to being registered towards
the CPU, so stations that used to see this flow via flooding will no
longer see it.

So much for our implementation of a mirror via the FDB/MDB, we'd instead
have to add actual mirroring logic of individual addresses, without
touching the FDB/MDB. Then have a way to offload those mirrors. And that
would be for the sake of correctness.

In terms of usability in the context of dev_uc_add() and dev_mc_add()
though, that isn't really sane and just caters to the naive view that
"tcpdump should see everything after it sets IFF_PROMISC". Because when
a macvlan becomes an upper of the bridge, dev_uc_add() is what it'll
call, and this won't stop the bridge from flooding packets towards the
macvlan's MAC DA towards the network. It will just ensure that the
macvlan is copied to the traffic destined towards it, too.

I maybe agree that the aggregate of all local entries is the termination
plane's Rx filter, in other words not more than that, i.e. the termination
plane shouldn't treat the RX filters as mirrors from the forwarding plane.
And in that view, yes maybe I agree that IFF_PROMISC would implicitly
correspond to BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD, and
IFF_ALLMULTI would implicitly correspond to BR_MCAST_FLOOD (ok, also
adjust for mcast_router ports).

But I'd rather turn a blind eye to that, on the practical basis that
making any changes at all to this mess that isn't correct even now is
going to break something. I'm going to invent some new knobs which
broadly mean what IFF_PROMISC | IFF_ALLMULTI should have meant, and I'm
going to insist on the selling point that these allow me more
fine-grained control over which class of MAC DA is flooded. In any case
I'm not making things worse, because the BR_FLOOD flags towards the
bridge device still make some amount of sense even in the context of a
more normal interpretation of the bridge's RX filtering flags, if
someone from the future really gets infuriated by the bridge device's
behavior in the presence of uppers with different MAC address and has
the guts to change/break something. So maybe I can get away with this."

Am I getting this right? Because maybe I can hop onboard :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-11 10:55           ` Vladimir Oltean
@ 2022-04-11 13:14             ` Tobias Waldekranz
  2022-04-11 13:55               ` Vladimir Oltean
  0 siblings, 1 reply; 22+ messages in thread
From: Tobias Waldekranz @ 2022-04-11 13:14 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

On Mon, Apr 11, 2022 at 10:55, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> On Mon, Apr 11, 2022 at 11:33:40AM +0200, Tobias Waldekranz wrote:
>> On Sun, Apr 10, 2022 at 22:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> > On Sun, Apr 10, 2022 at 08:02:13PM +0200, Tobias Waldekranz wrote:
>> >> On Sat, Apr 09, 2022 at 20:45, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> >> > Hi Tobias,
>> >> >
>> >> > On Sat, Apr 09, 2022 at 09:46:54PM +0200, Tobias Waldekranz wrote:
>> >> >> On Fri, Apr 08, 2022 at 23:03, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>> >> >> > For this patch series to make more sense, it should be reviewed from the
>> >> >> > last patch to the first. Changes were made in the order that they were
>> >> >> > just to preserve patch-with-patch functionality.
>> >> >> >
>> >> >> > A little while ago, some DSA switch drivers gained support for
>> >> >> > IFF_UNICAST_FLT, a mechanism through which they are notified of the
>> >> >> > MAC addresses required for local standalone termination.
>> >> >> > A bit longer ago, DSA also gained support for offloading BR_FDB_LOCAL
>> >> >> > bridge FDB entries, which are the MAC addresses required for local
>> >> >> > termination when under a bridge.
>> >> >> >
>> >> >> > So we have come one step closer to removing the CPU from the list of
>> >> >> > destinations for packets with unknown MAC DA.What remains is to check
>> >> >> > whether any software L2 forwarding is enabled, and that is accomplished
>> >> >> > by monitoring the neighbor bridge ports that DSA switches have.
>> >> >> >
>> >> >> > With these changes, DSA drivers that fulfill the requirements for
>> >> >> > dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering()
>> >> >> > will keep flooding towards the CPU disabled for as long as no port is
>> >> >> > promiscuous. The bridge won't attempt to make its ports promiscuous
>> >> >> > anymore either if said ports are offloaded by switchdev (this series
>> >> >> > changes that behavior). Instead, DSA will fall back by its own will to
>> >> >> > promiscuous mode on bridge ports when the bridge itself becomes
>> >> >> > promiscuous, or a foreign interface is detected under the same bridge.
>> >> >> 
>> >> >> Hi Vladimir,
>> >> >> 
>> >> >> Great stuff! I've added Joachim to Cc. He has been working on a series
>> >> >> to add support for configuring the equivalent of BR_FLOOD,
>> >> >> BR_MCAST_FLOOD, and BR_BCAST_FLOOD on the bridge itself. I.e. allowing
>> >> >> the user to specify how local_rcv is managed in br_handle_frame_finish.
>> >> >> 
>> >> >> For switchdev drivers, being able to query whether a bridge will ingress
>> >> >> unknown unicast to the host or not seems like the missing piece that
>> >> >> makes this bullet proof. I.e. if you have...
>> >> >> 
>> >> >> - No foreign interfaces
>> >> >> - No promisc
>> >> >> _and_
>> >> >> - No BR_FLOOD on the bridge itself
>> >> >> 
>> >> >> ..._then_ you can safely disable unicast flooding towards the CPU
>> >> >> port. The same would hold for multicast and BR_MCAST_FLOOD of course.
>> >> >> 
>> >> >> Not sure how close Joachim is to publishing his work. But I just thought
>> >> >> you two should know about the other one's work :)
>> >> >
>> >> > I haven't seen Joachim's work and I sure hope he can clarify.
>> >> 
>> >> If you want to get a feel for it, it is available here (the branch name
>> >> is just where he started out I think :))
>> >> 
>> >> https://github.com/westermo/linux/tree/bridge-always-flood-unknown-mcast
>> >> 
>> >> > It seems
>> >> > like there is some overlap that I don't currently know what to make of.
>> >> > The way I see things, BR_FLOOD and BR_MCAST_FLOOD are egress settings,
>> >> > so I'm not sure how to interpret them when applied to the bridge device
>> >> > itself.
>> >> 
>> >> They are egress settings, yes. But from the view of the forwarding
>> >> mechanism in the bridge, I would argue that the host interface is (or at
>> >> least should be) as much an egress port as any of the lower devices.
>> >> 
>> >> - It can be the target of an FDB entry
>> >> - It can be a member of an MDB entry
>> >> 
>> >> I.e. it can be chosen as a _destination_ => egress. This is analogous to
>> >> the CPU port, which from the ASICs point of view is an egress port, but
>> >> from a system POV it is receiving frames.
>> >> 
>> >> > On the other hand, treating IFF_PROMISC/IFF_ALLMULTI on the
>> >> > bridge device as the knob that decides whether the software bridge wants
>> >> > to ingress unknown MAC DA packets seems the more appropriate thing to do.
>> >> 
>> >> Maybe. I think it depends on how exact we want to be in our
>> >> classification. Fundementally, I think the problem is that a bridge
>> >> deals with one thing that other netdevs do not:
>> >> 
>> >>   Whether the destination in known/registered or not.
>> >> 
>> >> A NIC's unicast/multicast filters are not quite the same thing, because
>> >> a they only deal with a single endpoint. I.e. if an address isn't in a
>> >> NIC's list of known DA's, then it is "not mine". But in a bridge
>> >> scenario, although it is not associated with the host (i.e. "not mine"),
>> >> it can still be "known" (i.e. associated with some lower port).
>> >> 
>> >> AFAIK, promisc means "receive all the things!", whereas BR_FLOOD would
>> >> just select the subset of frames for which the destination is unknown.
>> >> 
>> >> Likewise for multicast, IFF_ALLMULTI means "receive _all_ multicast" -
>> >> it does not discriminate between registered and unregistered
>> >> flows. BR_MCAST_FLOOD OTOH would only target unregistered flows.
>> >> 
>> >> Here is my understanding of how the two solutions would differ in the
>> >> types of flows that they would affect:
>> >> 
>> >>                     .--------------------.-----------------.
>> >>                     |       IFF_*        |    BR_*FLOOD    |
>> >> .-------------------|---------.----------|-----.-----.-----|
>> >> | Type              | Promisc | Allmulti | BC  | UC  | MC  |
>> >> |-------------------|---------|----------|-----|-----|-----|
>> >> | Broadcast         | Yes     | No       | Yes | No  | No  |
>> >> | Unknown unicast   | Yes     | No       | No  | Yes | No  |
>> >> | Unknown multicast | Yes     | Yes      | No  | No  | Yes |
>> >> | Known unicast     | Yes     | No       | No  | No  | No  |
>> >                         ~~~
>> >                         To what degree does IFF_PROMISC affect known
>> >                         unicast traffic?
>> 
>> When a bridge interface has this flag set, local_rcv is unconditionally
>> set to true in br_handle_frame_finish:
>> 
>>     local_rcv = !!(br->dev->flags & IFF_PROMISC);
>> 
>> Which means we will always end up in br_pass_frame_up.
>> 
>> >> | Known multicast   | Yes     | Yes      | No  | No  | No  |
>> >> '-------------------'--------------------'-----------------'
>> >
>> > So to summarize what you're trying to say. You see two planes back to back.
>> >
>> >  +-------------------------------+
>> >  |    User space, other uppers   |
>> >  +-------------------------------+
>> >  |   Termination plane governed  |
>> >  |     by dev->uc and dev->mc    |
>> >  |            +-----+            |
>> >  |            | br0 |            |
>> >  +------------+-----+------------+
>> >  |            | br0 |            |
>> >  |            +-----+            |
>> >  |  Forwarding plane governed    |
>> >  |        by FDB and MDB         |
>> >  | +------+------+------+------+ |
>> >  | | swp0 | swp1 | swp2 | swp3 | |
>> >  +-+------+------+------+------+-+
>> 
>> Yes, this is pretty much my mental model of it.
>> 
>> > For a packet to be locally received on the br0 interface, it needs to
>> > pass the filters from both the forwarding plane and the termination
>> > plane.
>> 
>> I see it more as the forwarding plane having a superset of the
>> information available to the termination plane.
>> 
>> I.e. all of the information in dev->uc and dev->mc can easily be
>> represented in the FDB (as a "local" entry) and MDB (by setting
>> `host_joined`) respectively, but there's loads of information in the
>> {F,M}DB for which there is no representation in a netdev's address
>> lists.
>> 
>> > Considering a unicast packet:
>> > - if a local/permanent entry exists in the FDB, it is known to the
>> >   forwarding plane, and its destination is the bridge device seen as a
>> >   bridge port
>> > - if the FDB doesn't contain this MAC DA, it is unknown to the
>> >   forwarding plane, but the bridge device is still a destination for it,
>> >   since the bridge seen as a bridge port has a non-configurable BR_FLOOD
>> >   port flag which allows unknown unicast to exit the forwarding plane
>> >   towards the termination plane implicitly on
>> > - if the MAC DA exists in the RX filter of the bridge device
>> >   (dev->dev_addr or dev->uc), the packet is known to the termination
>> >   plane, so it is not filtered out.
>> 
>> It's more like the aggregate of all local entries _is_ the termination
>> plane's Rx filter. I.e. in the bridge's .ndo_set_rx_mode, we should sync
>> all UC/MC addresses into the FDB/MDB.
>> 
>> There's a lot of stuff to deal with here around VLANs, since that
>> concept is missing from the kernel's address lists.
>> 
>> > - if the MAC DA doesn't exist in the RX filter, the packet is filtered
>> >   out, unless the RX filter is disabled via IFF_PROMISC, case in which
>> >   it is still accepted
>> >
>> > Considering a multicast packet, things are mostly the same, except for
>> > the fact that having a local/permanent MDB entry towards the bridge
>> > device does not necessarily 'steal' it from the forwarding plane as it
>> > does in case of unicast, since multicast traffic can have multiple
>> > destinations which don't exclude each other.
>> 
>> Not _necessarily_, but you are reclassifying the group as registered,
>> which means you go from flooding to forwarding. This in turn, means only
>> other registered subscribers (and routers) will see it.
>> 
>> > Needless to say that what we have today is a very limited piece of the
>> > bigger puzzle you've presented above, and perhaps does not even behave
>> > the same as the larger puzzle would, restricted to the same operating
>> > conditions as the current code.
>> >
>> > Here are some practical questions so I can make sure I understand the
>> > model you're proposing.
>> >
>> > 1. You or Joachim add support for BR_FLOOD on the bridge device itself.
>> >    The bridge device is promiscuous, and a packet with a MAC DA unknown
>> >    to the forwarding plane is received. Will this packet be seen in
>> >    tcpdump or not? I assume not, because it never even reached the RX
>> >    filtering lists of the bridge device, it didn't exit the forwarding
>> >    plane.
>> 
>> We don't propose any changes to how IFF_PROMISC is interpreted at
>> all. Joachim's changes will simply allow a user more fine grained
>> control over which flows are recieved by the termination plane when
>> IFF_PROMISC is _not_ set.
>> 
>> > 2. Somebody comes later and adds support for IFF_ALLMULTI. This means,
>> >    the RX filtering for unknown multicast can be toggled on or off.
>> >    Be there a bridge with mcast_router enabled. Should the bridge device
>> >    receive unknown multicast traffic or not, when IFF_ALLMULTI isn't
>> >    enabled? Right now, I think IFF_ALLMULTI isn't necessary.
>> 
>> I think that setting IFF_ALLMULTI on a bridge interface is really just
>> another way of saying that the bridge should be configured as a static
>> multicast router. I.e. ...
>> 
>>     ip link set dev br0 allmulticast on
>> 
>> ...should be equivalent to...
>> 
>>     ip link set dev br0 type bridge mcast_router 2
>> 
>> ...since that flag is an indication that _all_ multicast should be
>> received, both registered and unregistered.
>> 
>> We propose to add support for all three classes of _unknown_ traffic
>> (BUM) from the get-go:
>> 
>> - BR_FLOOD (Unknown unicast)
>> - BR_MCAST_FLOOD (Unregistered multicast)
>> - BR_BCAST_FLOOD (Broadcast)
>> 
>> So if BR_MCAST_FLOOD is not set, then the termination plane will not
>> receive any unregistered multicast. If it is marked as a multicast
>> router, then it will receive all registered _and_ unregistered
>> multicast. Same as with any other bridge port.
>
> A bridge port does not *receive* said flooded traffic, but *sends* it.

Right, that's why I tried to use your "termination plane" terminology.

> The station attached to that bridge port *receives* it. You as a bridge
> cannot control whether that station will actually *receive* it.
> That is the analogy with the bridge device as a bridge port.
> BR_MCAST_FLOOD may decide whether the forwarding plane sends the flow to
> the bridge, but the bridge's RX filter still needs to decide whether it
> accepts it.

So in br_pass_frame_up we would have a separate validation of whether
the DA matches the termination plane's Rx filter or not? That makes
sense.

>> > 3. The bridge device does not implement IFF_UNICAST_FLT today, so any
>> >    addition to the bridge_dev->uc list will turn on bridge_dev->flags &
>> >    IFF_PROMISC. Do you see this as a problem on the RX filtering side of
>> >    things, or from a system administration PoV, would you just like to
>> >    disable BR_FLOOD on the forwarding side of the bridge device and call
>> >    it a day, expect the applications to just continue working?
>> >
>> > 4. Let's say we implement IFF_UNICAST_FLT for the bridge device.
>> >    How do we do it? Add one more lookup in br_handle_frame_finish()
>> >    which didn't exist before, to search for this MAC DA in bridge_dev->uc?
>> 
>> I think the most natural way would be to add them to the FDB as local
>> entries. We might need an extra flag to track its origin or something,
>> but it should be doable, I think. Again I think this is analogous to how
>> host addresses are added as FDB entries pointing towards the CPU port in
>> an ASIC.
>> 
>> > 5. Should the implementation of IFF_UNICAST_FLT influence the forwarding
>> >    plane in any way?
>> 
>> Since I propose we keep it in the FDB, yes, it will affect forwarding.
>> 
>> >    Think of a user space application emitting a
>> >    PACKET_MR_UNICAST request to ensure it sees the packets with the MAC
>> >    DA it needs. Should said application have awareness of the fact that
>> >    the interface it's speaking to is a bridge device? If not, three
>> >    things may happen. The admin (you) has turned off BR_FLOOD towards
>> >    the bridge device, effectively cutting it off from the (unknown to
>> >    the forwarding plane) packets it wants to see. Or the admin hasn't
>> >    turned off BR_FLOOD, but the MAC DA, while present in the RX
>> >    filtering lists, is still absent from the FDB, so while it is
>> >    received locally by the application, is also flooded to all other
>> >    bridge ports. Or the kernel may automatically add a BR_FDB_LOCAL entry,
>> >    considering that it knows that if there's a destination for this
>> >    MAC DA in this broadcast domain, for certain there isn't more than
>> >    one, so it could just as well guide the forwarding plane towards it.
>> >    Or you may choose completely the other side, "hey, the bridge device
>> >    really is that special, and user space should put up with it and know
>> >    it has to configure both its forwarding and termination plane before
>> >    things work in a reasonable manner on it". But if this is the path
>> >    you choose, what about the uppers a bridge may have, like a VLAN with
>> >    a MAC address different from the bridge's? Should the 8021q driver
>> >    also be aware of the fact that dev_uc_add() may not be sufficient
>> >    when applied to the bridge as a real_dev?
>> 
>> I might be repeating myself, but just to be clear: I think the end goal
>> should be to have a proper .ndo_set_rx_mode implementation that
>> translates the device address lists into the forwarding plane, making
>> most of this transparent to users.
>> 
>> > 6. For a switchdev driver like DSA, what condition do you propose it
>> >    should monitor for deciding whether to enable flooding towards the
>> >    host? IFF_PROMISC, BR_FLOOD towards the bridge, or both? You may say
>> >    "hey, switchdev offloads just the forwarding plane, I don't really
>> >    care if the bridge software path is going to accept the packet or
>> >    not". Or you may say that unicast flooding is to be enabled if
>> >    IFF_PROMISC || BR_FLOOD, and multicast flooding if IFF_PROMISC ||
>> >    IFF_ALLMULTI || BR_MCAST_FLOOD, and broadcast flooding if
>> >    BR_BCAST_FLOOD. Now take into consideration the additional checks for
>> >    foreign interfaces. Does this complexity scale to the number of
>> >    switchdev drivers we have? If not, can we do something about it?
>> 
>> Rule #1 is, If IFF_PROMISC is enabled on our bridge, we must flood
>> everything.
>> 
>> Then for each class:
>> 
>> - Broadcast: If BR_BCAST_FLOOD is enabled on the bridge or on any
>>   foreign interface, we must flood it to the CPU.
>> 
>> - Unicast: If BR_FLOOD is enabled on the bridge or on any foreign
>>   interface, we must flood it to the CPU.
>> 
>> - Multicast: If BR_MCAST_FLOOD is enabled on the bridge or on any
>>   foreign interface, we must flood it to the CPU.
>> 
>> For multicast, we also have to take the router port status of the bridge
>> for foreign interfaces into consideration, but I believe that should be
>> managed separately.
>> 
>> Additionally, if you _truly_ want to model the bridge's behavior, then
>> IFF_PROMISC on the bridge would mean that we either disable offloading
>> completely and forward everything via the CPU or setup some kind of port
>> mirror on all ports. Although that seems kind of silly, it might
>> actually be useful for debugging on occasion. You could have an ethtool
>> flag or something that would allow the user to enable it.
>
> So to summarize your second message.
>
> You're saying "hey, I see that IFF_PROMISC and IFF_ALLMULTI on the
> bridge are these beasts that defy all conventional wisdom about what an
> RX filter even is. A unicast packet forwarded by the bridge shouldn't be
> seen in tcpdump if there is an FDB entry that points to some other port
> than the bridge device itself, according to the mental model we agreed
> on, yet that is what happens today.

I think IFF_PROMISC and IFF_ALLMULTI serve their purpose as Rx filters
for NICs. It's just that I don't think they map that well to the flood
controls normally available on a switch ASIC (due to the fact that a
switch must differentiate between known and unknown traffic).

> So transposed to an offloading plane, setting IFF_PROMISC would mean
> that all traffic should be mirrored to the CPU, which is obviously
> something that no driver does, for more than one single reason.
> Consequently, the presence of an address in dev->uc would mean to mirror
> that single address to the CPU.
>
> For practical reasons (unicast traffic has a single destination), we may
> mirror that address by installing it to the FDB as a local entry. And
> maybe we could get away with that.

More than getting away with it, I would argue that it's the Right
Thing™. Station addresses on an Ethernet segment must be unique, so if
we know a station's location (by any means) then by definition it can
only be in that location.

> But for multicast RX filters, when an entry would get added to dev->mc,
> but that same multicast MAC address is absent from the MDB, we can't
> really do much about adding it to the MDB either, because this would
> transform the flow from being unregistered to being registered towards
> the CPU, so stations that used to see this flow via flooding will no
> longer see it.

Yes, multicast is much more tricky. For IP multicast, I think this
already works because an IP_ADD_MEMBERSHIP will cause the kernel to
generate an IGMP/MLD report which can be snooped by the bridge and an
entry installed in the MDB.

We can do the same with L2 multicast by having a PACKET_ADD_MEMBERSHIP
trigger an MMRP message, and then implement MMRP in the bridge.

So as an administrator, for L2 and L3 multicast, you have two choices:

1. Enable the relevant control protocol (MMRP, or IGMP/MLD). In this
   case you can get better filtering because we can load the multicast
   address list into the MDB. The trade-off is that all group listeners
   must speak the control protocol.

2. Disable the control protocol. The address lists will be ignored by
   the bridge, and you rely on flooding to distribute the traffic to
   where it needs to get. This will work for stations that can't take
   part in the control protocol.

There are also variations of (1) where you can help legacy devices by
either marking the ports as permanent router ports or by adding static
entries to the MDB.

> So much for our implementation of a mirror via the FDB/MDB, we'd instead
> have to add actual mirroring logic of individual addresses, without
> touching the FDB/MDB. Then have a way to offload those mirrors. And that
> would be for the sake of correctness.
>
> In terms of usability in the context of dev_uc_add() and dev_mc_add()
> though, that isn't really sane and just caters to the naive view that
> "tcpdump should see everything after it sets IFF_PROMISC". Because when
> a macvlan becomes an upper of the bridge, dev_uc_add() is what it'll
> call, and this won't stop the bridge from flooding packets towards the
> macvlan's MAC DA towards the network. It will just ensure that the
> macvlan is copied to the traffic destined towards it, too.
>
> I maybe agree that the aggregate of all local entries is the termination
> plane's Rx filter, in other words not more than that, i.e. the termination
> plane shouldn't treat the RX filters as mirrors from the forwarding plane.
> And in that view, yes maybe I agree that IFF_PROMISC would implicitly
> correspond to BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD, and

I would phrase it as: IFF_PROMISC means that switchdevs should forward
as much crap as possible towards the CPU. But at the DSA layer, I think
calling a driver specific .bridge_set_promisc op would be better than
assuming that driver's can't do any better.

> IFF_ALLMULTI would implicitly correspond to BR_MCAST_FLOOD (ok, also
> adjust for mcast_router ports).

IFF flags shouldn't correspond to any bridge flags. They describe higher
level features. Therefore, they should be passed down to the drivers,
who in many cases may decide to use hardware resources that are shared
with bridge flags (i.e. flood controls), but in some cases may be able
to do something better.

As an example of "something better": some ASICs have separate flooding
controls for IP and non-IP multicast.

So, I think

    ip link set dev br0 allmulticast on

Should be one way for userspace to tell the bridge to mark the host port
as a permanent multicast router port. This in turn would trigger a

    switchdev_port_attr_set(dev,
    	&{ .id = SWITCHDEV_ATTR_ID_BRIDGE_MROUTER ... }, extack);

At the DSA layer this info would be passed to the driver, which will
decide if that means the same thing as BR_MCAST_FLOOD or something
else.

> But I'd rather turn a blind eye to that, on the practical basis that
> making any changes at all to this mess that isn't correct even now is
> going to break something. I'm going to invent some new knobs which
> broadly mean what IFF_PROMISC | IFF_ALLMULTI should have meant, and I'm
> going to insist on the selling point that these allow me more
> fine-grained control over which class of MAC DA is flooded. In any case
> I'm not making things worse, because the BR_FLOOD flags towards the
> bridge device still make some amount of sense even in the context of a
> more normal interpretation of the bridge's RX filtering flags, if
> someone from the future really gets infuriated by the bridge device's
> behavior in the presence of uppers with different MAC address and has
> the guts to change/break something. So maybe I can get away with this."
>
> Am I getting this right? Because maybe I can hop onboard :)

I think we're getting there. Would love to have you on deck! :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-11 13:14             ` Tobias Waldekranz
@ 2022-04-11 13:55               ` Vladimir Oltean
  2022-04-11 14:24                 ` Vladimir Oltean
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-11 13:55 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Vladimir Oltean, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

On Mon, Apr 11, 2022 at 03:14:25PM +0200, Tobias Waldekranz wrote:
> > The station attached to that bridge port *receives* it. You as a bridge
> > cannot control whether that station will actually *receive* it.
> > That is the analogy with the bridge device as a bridge port.
> > BR_MCAST_FLOOD may decide whether the forwarding plane sends the flow to
> > the bridge, but the bridge's RX filter still needs to decide whether it
> > accepts it.
> 
> So in br_pass_frame_up we would have a separate validation of whether
> the DA matches the termination plane's Rx filter or not? That makes
> sense.

Yes, I know it would make sense. It's also not possible without
introducing regressions (the bridge currently being able to receive
unknown multicast without IFF_ALLMULTI sounds like a big roadblock).
It also introduces questions about whether user space and upper
interfaces need to be aware that the termination plane is not the only
thing they need to configure.

> > For practical reasons (unicast traffic has a single destination), we may
> > mirror that address by installing it to the FDB as a local entry. And
> > maybe we could get away with that.
> 
> More than getting away with it, I would argue that it's the Right
> Thing™. Station addresses on an Ethernet segment must be unique, so if
> we know a station's location (by any means) then by definition it can
> only be in that location.

Actually, you said it yourself that IFF_PROMISC creates a second copy of
a forwarded known unicast packet, on which it calls br_pass_frame_up().
Think of dev_uc_add() as the IFF_PROMISC behavior applied to a single
unicast MAC DA, as opposed to all MAC DAs. You can't add a local FDB
entry *and* be consistent with bridge IFF_PROMISC behavior.

> > But for multicast RX filters, when an entry would get added to dev->mc,
> > but that same multicast MAC address is absent from the MDB, we can't
> > really do much about adding it to the MDB either, because this would
> > transform the flow from being unregistered to being registered towards
> > the CPU, so stations that used to see this flow via flooding will no
> > longer see it.
> 
> Yes, multicast is much more tricky. For IP multicast, I think this
> already works because an IP_ADD_MEMBERSHIP will cause the kernel to
> generate an IGMP/MLD report which can be snooped by the bridge and an
> entry installed in the MDB.

Happy coincidence, because IP_ADD_MEMBERSHIP will eventually call
ip_mc_filter_add() -> dev_mc_add() and this will update the RX filter of
the bridge. And the multicast snooping code will update the forwarding
plane. I wonder if there's something similar to take away from this
w.r.t. unicast behavior. Like for example the bridge driver could do
dynamic address learning for locally originated traffic, and the
forwarding and termination planes would be once again in sync with no
need for higher layers to know about the distinction anyway. At least
for a while, until the entry ages out.

> We can do the same with L2 multicast by having a PACKET_ADD_MEMBERSHIP
> trigger an MMRP message, and then implement MMRP in the bridge.
> 
> So as an administrator, for L2 and L3 multicast, you have two choices:
> 
> 1. Enable the relevant control protocol (MMRP, or IGMP/MLD). In this
>    case you can get better filtering because we can load the multicast
>    address list into the MDB. The trade-off is that all group listeners
>    must speak the control protocol.
> 
> 2. Disable the control protocol. The address lists will be ignored by
>    the bridge, and you rely on flooding to distribute the traffic to
>    where it needs to get. This will work for stations that can't take
>    part in the control protocol.
> 
> There are also variations of (1) where you can help legacy devices by
> either marking the ports as permanent router ports or by adding static
> entries to the MDB.
> 
> > So much for our implementation of a mirror via the FDB/MDB, we'd instead
> > have to add actual mirroring logic of individual addresses, without
> > touching the FDB/MDB. Then have a way to offload those mirrors. And that
> > would be for the sake of correctness.
> >
> > In terms of usability in the context of dev_uc_add() and dev_mc_add()
> > though, that isn't really sane and just caters to the naive view that
> > "tcpdump should see everything after it sets IFF_PROMISC". Because when
> > a macvlan becomes an upper of the bridge, dev_uc_add() is what it'll
> > call, and this won't stop the bridge from flooding packets towards the
> > macvlan's MAC DA towards the network. It will just ensure that the
> > macvlan is copied to the traffic destined towards it, too.
> >
> > I maybe agree that the aggregate of all local entries is the termination
> > plane's Rx filter, in other words not more than that, i.e. the termination
> > plane shouldn't treat the RX filters as mirrors from the forwarding plane.
> > And in that view, yes maybe I agree that IFF_PROMISC would implicitly
> > correspond to BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD, and
> 
> I would phrase it as: IFF_PROMISC means that switchdevs should forward
> as much crap as possible towards the CPU. But at the DSA layer, I think
> calling a driver specific .bridge_set_promisc op would be better than
> assuming that driver's can't do any better.

I think letting individual DSA sub-drivers decide how generous they'd
like to be with a promiscuous bridge is pure insanity. Today a
promiscuous bridge wants to see all traffic being mirrored, and we don't
do that. That is an inconvenient truth we need to accept and have a
common position on. Supposedly a position driven centrally by the bridge
driver itself.

> > IFF_ALLMULTI would implicitly correspond to BR_MCAST_FLOOD (ok, also
> > adjust for mcast_router ports).
> 
> IFF flags shouldn't correspond to any bridge flags. They describe higher
> level features. Therefore, they should be passed down to the drivers,
> who in many cases may decide to use hardware resources that are shared
> with bridge flags (i.e. flood controls), but in some cases may be able
> to do something better.
> 
> As an example of "something better": some ASICs have separate flooding
> controls for IP and non-IP multicast.
> 
> So, I think
> 
>     ip link set dev br0 allmulticast on
> 
> Should be one way for userspace to tell the bridge to mark the host port
> as a permanent multicast router port. This in turn would trigger a
> 
>     switchdev_port_attr_set(dev,
>     	&{ .id = SWITCHDEV_ATTR_ID_BRIDGE_MROUTER ... }, extack);
> 
> At the DSA layer this info would be passed to the driver, which will
> decide if that means the same thing as BR_MCAST_FLOOD or something
> else.

Yeah, I don't think so. Doing nothing at all is way better than
entangling the RX filtering logic even more with the forwarding logic,
IMHO.

> > But I'd rather turn a blind eye to that, on the practical basis that
> > making any changes at all to this mess that isn't correct even now is
> > going to break something. I'm going to invent some new knobs which
> > broadly mean what IFF_PROMISC | IFF_ALLMULTI should have meant, and I'm
> > going to insist on the selling point that these allow me more
> > fine-grained control over which class of MAC DA is flooded. In any case
> > I'm not making things worse, because the BR_FLOOD flags towards the
> > bridge device still make some amount of sense even in the context of a
> > more normal interpretation of the bridge's RX filtering flags, if
> > someone from the future really gets infuriated by the bridge device's
> > behavior in the presence of uppers with different MAC address and has
> > the guts to change/break something. So maybe I can get away with this."
> >
> > Am I getting this right? Because maybe I can hop onboard :)
> 
> I think we're getting there. Would love to have you on deck! :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge
  2022-04-11 13:55               ` Vladimir Oltean
@ 2022-04-11 14:24                 ` Vladimir Oltean
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2022-04-11 14:24 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Vladimir Oltean, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, UNGLinuxDriver,
	Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Mattias Forsblad, Joachim Wiberg

On Mon, Apr 11, 2022 at 04:55:30PM +0300, Vladimir Oltean wrote:
> > > IFF_ALLMULTI would implicitly correspond to BR_MCAST_FLOOD (ok, also
> > > adjust for mcast_router ports).
> > 
> > IFF flags shouldn't correspond to any bridge flags. They describe higher
> > level features. Therefore, they should be passed down to the drivers,
> > who in many cases may decide to use hardware resources that are shared
> > with bridge flags (i.e. flood controls), but in some cases may be able
> > to do something better.
> > 
> > As an example of "something better": some ASICs have separate flooding
> > controls for IP and non-IP multicast.
> > 
> > So, I think
> > 
> >     ip link set dev br0 allmulticast on
> > 
> > Should be one way for userspace to tell the bridge to mark the host port
> > as a permanent multicast router port. This in turn would trigger a
> > 
> >     switchdev_port_attr_set(dev,
> >     	&{ .id = SWITCHDEV_ATTR_ID_BRIDGE_MROUTER ... }, extack);
> > 
> > At the DSA layer this info would be passed to the driver, which will
> > decide if that means the same thing as BR_MCAST_FLOOD or something
> > else.
> 
> Yeah, I don't think so. Doing nothing at all is way better than
> entangling the RX filtering logic even more with the forwarding logic,
> IMHO.

Thinking out loud.
I still maintain it is weird and uncalled for if "ip link set dev br0 allmulticast on"
marks the bridge as a multicast router (maybe I'm wrong, but I don't see why it is helpful).
But the other way around may not be so weird. That is, as long as the
bridge is a multicast router port or needs to receive unknown multicast
for any reason at all, it auto-enables IFF_ALLMULTI on its RX flags.

Then, we just need to take care of that "RX filter is a mirror" thing.
Not ideal, but maybe if we actually introduce a bridge flag "rx_filter_mirror"
where the default is 1 to keep backwards compatibility, we could actually
turn the bridge into something sane?
The bad part is that we can't make switchdev drivers reject a bridge
with rx_filter_mirror=1, unless we're ready to deal with the breakage
(although even that is tempting...).

What this effort would achieve is that the bridge would no longer become
promiscuous in the presence of uppers with different MAC address.
It would do this by implementing the filtering you talked about in
br_pass_frame_up().

As for offloading, the nice part is that the bridge RX filtering logic
could be left as invisible to switchdev, and we could concentrate only
on flooding towards the host via the logic that Joachim is working on.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous
  2022-04-08 20:03 ` [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous Vladimir Oltean
  2022-04-09 10:17   ` kernel test robot
@ 2022-04-14 13:53   ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2022-04-14 13:53 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: kbuild-all, Jakub Kicinski, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, UNGLinuxDriver, Paolo Abeni,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Tobias Waldekranz, Mattias Forsblad

Hi Vladimir,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Vladimir-Oltean/Disable-host-flooding-for-DSA-ports-under-a-bridge/20220409-041556
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git e8bd70250a821edb541c3abe1eacdad9f8dc7adf
config: i386-debian-10.3 (https://download.01.org/0day-ci/archive/20220414/202204142133.6pDDI4xD-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-19) 11.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/2b24e24c704fa129c753dbc8fb764c95f4a3562c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vladimir-Oltean/Disable-host-flooding-for-DSA-ports-under-a-bridge/20220409-041556
        git checkout 2b24e24c704fa129c753dbc8fb764c95f4a3562c
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash net/bridge/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/bridge/br_if.c: In function 'br_manage_promisc':
>> net/bridge/br_if.c:145:22: error: 'struct net_bridge_port' has no member named 'offload_count'
     145 |                 if (p->offload_count) {
         |                      ^~


vim +145 net/bridge/br_if.c

   129	
   130	/* When a port is added or removed or when certain port flags
   131	 * change, this function is called to automatically manage
   132	 * promiscuity setting of all the bridge ports.  We are always called
   133	 * under RTNL so can skip using rcu primitives.
   134	 */
   135	void br_manage_promisc(struct net_bridge *br)
   136	{
   137		struct net_bridge_port *p;
   138	
   139		list_for_each_entry(p, &br->port_list, list) {
   140			/* Offloaded ports have a separate address database for
   141			 * forwarding, which is managed through switchdev and not
   142			 * through dev_uc_add(), so the promiscuous concept makes no
   143			 * sense for them. Avoid updating promiscuity in that case.
   144			 */
 > 145			if (p->offload_count) {
   146				br_port_clear_promisc(p);
   147				continue;
   148			}
   149	
   150			/* If bridge is promiscuous, unconditionally place all ports
   151			 * in promiscuous mode too. This allows the bridge device to
   152			 * locally receive all unknown traffic.
   153			 */
   154			if (br->dev->flags & IFF_PROMISC) {
   155				br_port_set_promisc(p);
   156				continue;
   157			}
   158	
   159			/* If vlan filtering is disabled, place all ports in
   160			 * promiscuous mode.
   161			 */
   162			if (!br_vlan_enabled(br->dev)) {
   163				br_port_set_promisc(p);
   164				continue;
   165			}
   166	
   167			/* If the number of auto-ports is <= 1, then all other ports
   168			 * will have their output configuration statically specified
   169			 * through fdbs. Since ingress on the auto-port becomes
   170			 * forwarding/egress to other ports and egress configuration is
   171			 * statically known, we can say that ingress configuration of
   172			 * the auto-port is also statically known.
   173			 * This lets us disable promiscuous mode and write this config
   174			 * to hw.
   175			 */
   176			if (br->auto_cnt == 0 ||
   177			    (br->auto_cnt == 1 && br_auto_port(p)))
   178				br_port_clear_promisc(p);
   179			else
   180				br_port_set_promisc(p);
   181		}
   182	}
   183	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-04-14 14:24 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-08 20:03 [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
2022-04-08 20:03 ` [PATCH net-next 1/6] net: refactor all NETDEV_CHANGE notifier calls to a single function Vladimir Oltean
2022-04-08 20:03 ` [PATCH net-next 2/6] net: emit NETDEV_CHANGE for changes to IFF_PROMISC | IFF_ALLMULTI Vladimir Oltean
2022-04-08 20:03 ` [PATCH net-next 3/6] net: dsa: walk through all changeupper notifier functions Vladimir Oltean
2022-04-08 20:03 ` [PATCH net-next 4/6] net: dsa: track whether bridges have foreign interfaces in them Vladimir Oltean
2022-04-08 20:03 ` [PATCH net-next 5/6] net: dsa: monitor changes to bridge promiscuity Vladimir Oltean
2022-04-08 20:03 ` [PATCH net-next 6/6] net: bridge: avoid uselessly making offloaded ports promiscuous Vladimir Oltean
2022-04-09 10:17   ` kernel test robot
2022-04-09 11:04     ` Vladimir Oltean
2022-04-09 11:04       ` Vladimir Oltean
2022-04-14 13:53   ` kernel test robot
2022-04-08 20:14 ` [PATCH net-next 0/6] Disable host flooding for DSA ports under a bridge Vladimir Oltean
2022-04-08 20:52   ` Nikolay Aleksandrov
2022-04-09 19:46 ` Tobias Waldekranz
2022-04-09 20:45   ` Vladimir Oltean
2022-04-10 18:02     ` Tobias Waldekranz
2022-04-10 22:03       ` Vladimir Oltean
2022-04-11  9:33         ` Tobias Waldekranz
2022-04-11 10:55           ` Vladimir Oltean
2022-04-11 13:14             ` Tobias Waldekranz
2022-04-11 13:55               ` Vladimir Oltean
2022-04-11 14:24                 ` Vladimir Oltean

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.