Linux-OMAP Archive on lore.kernel.org
* [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA
@ 2021-03-18 23:18 Vladimir Oltean
From: Vladimir Oltean @ 2021-03-18 23:18 UTC
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This series has two objectives:
- To make LAG uppers on top of DSA ports work regardless of the order in
  which we link interfaces to their masters (first make the port join the
  LAG, then the LAG join the bridge, or the other way around).
- To make DSA ports support non-offloaded LAG interfaces properly.

There was a design decision to be made in patches 2-4 on whether we
should adopt the "push" model, where the driver just calls:

  switchdev_bridge_port_offloaded(brport_dev,
                                  &atomic_notifier_block,
                                  &blocking_notifier_block,
                                  extack);

and the bridge just replays the entire collection of switchdev port
attributes and objects that it has, in some predefined order and with
some predefined error handling logic;


or the "pull" model, where the driver, apart from calling:

  switchdev_bridge_port_offloaded(brport_dev, extack);

has the task of "dumpster diving" (as Tobias puts it) through the bridge
attributes and objects by itself, by calling:

  - br_vlan_replay
  - br_fdb_replay
  - br_mdb_replay
  - br_vlan_enabled
  - br_port_flag_is_set
  - br_port_get_stp_state
  - br_multicast_router
  - br_get_ageing_time

(not necessarily all of them, and not necessarily in this order, and
with driver-defined error handling).
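To make the "pull" model concrete, here is a minimal userspace sketch, not kernel code. The bridge state a driver would query is reduced to a hypothetical struct brport_snapshot, and hw_set_*() and hw_port_pull_sync() are invented stand-ins for driver callbacks. It only illustrates the point above: the order and the error policy (here, skip -EOPNOTSUPP and abort on anything else) are chosen by the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of what a driver "pulls" from the bridge
 * (in the kernel: br_port_get_stp_state(), br_vlan_enabled(), ...).
 */
struct brport_snapshot {
	unsigned char stp_state;
	bool vlan_filtering;
	unsigned int ageing_time_ms;
};

/* Hypothetical hardware port model */
struct hw_port {
	unsigned char stp_state;
	bool vlan_filtering;
	bool can_set_ageing;	/* false: ageing time is fixed in hardware */
	bool stp_broken;	/* simulate a hard failure */
};

static int hw_set_stp_state(struct hw_port *p, unsigned char state)
{
	if (p->stp_broken)
		return -EIO;
	p->stp_state = state;
	return 0;
}

static int hw_set_vlan_filtering(struct hw_port *p, bool enabled)
{
	p->vlan_filtering = enabled;
	return 0;
}

static int hw_set_ageing_time(struct hw_port *p, unsigned int msecs)
{
	(void)msecs;	/* nothing to program in this model */
	return p->can_set_ageing ? 0 : -EOPNOTSUPP;
}

/* Driver-chosen order and error policy: -EOPNOTSUPP is skipped,
 * any other error aborts the sync.
 */
int hw_port_pull_sync(struct hw_port *p, const struct brport_snapshot *s)
{
	int err;

	err = hw_set_stp_state(p, s->stp_state);
	if (err && err != -EOPNOTSUPP)
		return err;

	err = hw_set_vlan_filtering(p, s->vlan_filtering);
	if (err && err != -EOPNOTSUPP)
		return err;

	err = hw_set_ageing_time(p, s->ageing_time_ms);
	if (err && err != -EOPNOTSUPP)
		return err;

	return 0;
}
```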

Even though I am not in love with the "pull" model myself, I chose it
because there is a fundamental catch with replaying switchdev events,
illustrated by this sequence:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0 <- this will replay the objects once for
                                 the bond0 bridge port, and the swp0
                                 switchdev port will process them
ip link set swp1 master bond0 <- this will replay the objects again for
                                 the bond0 bridge port, and the swp1
                                 switchdev port will see them, but swp0
                                 will see them for the second time now

Basically, I believe that it is implementation-defined whether the driver
wants to error out on switchdev objects seen twice on a port, and the
bridge should not enforce a certain model for that. For example, for FDB
entries added to a bonding interface, the underlying switchdev driver
might have an abstraction for just that: an FDB entry pointing towards a
logical (as opposed to physical) port. So when the second port joins the
LAG (and, through it, the bridge), it doesn't really need to replay FDB
entries, since there is already at least one hardware port which has
been receiving those events, and the FDB entries don't need to be added
a second time to the same logical port.
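A minimal sketch of such a logical-port abstraction, assuming a hypothetical driver whose FDB entries point to a LAG ID (lport) rather than to a physical port. All names are invented for illustration; the point is only that a replayed entry which already exists on the logical port can be accepted without programming the hardware a second time.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define FDB_SIZE 16

/* Hypothetical FDB entry pointing to a logical port (LAG ID) */
struct fdb_entry {
	unsigned char mac[6];
	unsigned short vid;
	int lport;
	bool used;
};

struct fdb {
	struct fdb_entry e[FDB_SIZE];
	int hw_writes;	/* how many times the hardware was programmed */
};

static struct fdb_entry *fdb_find(struct fdb *f, const unsigned char *mac,
				  unsigned short vid, int lport)
{
	for (int i = 0; i < FDB_SIZE; i++)
		if (f->e[i].used && f->e[i].vid == vid &&
		    f->e[i].lport == lport && !memcmp(f->e[i].mac, mac, 6))
			return &f->e[i];
	return NULL;
}

/* A replayed entry that already exists on the logical port is accepted
 * silently, so a second LAG member joining does not program it twice.
 */
int fdb_add(struct fdb *f, const unsigned char *mac, unsigned short vid,
	    int lport)
{
	if (fdb_find(f, mac, vid, lport))
		return 0;

	for (int i = 0; i < FDB_SIZE; i++) {
		if (f->e[i].used)
			continue;
		memcpy(f->e[i].mac, mac, 6);
		f->e[i].vid = vid;
		f->e[i].lport = lport;
		f->e[i].used = true;
		f->hw_writes++;
		return 0;
	}

	return -ENOSPC;
}
```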
In the other corner, we have the drivers that handle switchdev port
attributes on a LAG as individual switchdev port attributes on physical
ports (example: VLAN filtering). In fact, the switchdev_handle_port_attr_set
helper facilitates this: it is a fan-out from a single orig_dev towards
multiple lowers that pass the check_cb().
But that's the point: switchdev_handle_port_attr_set is just a helper
which the driver _opts_ to use. The bridge can't enforce the "push"
model, because that would assume that all drivers handle port attributes
in the same way, which is probably false.
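A simplified userspace model of that fan-out, loosely based on the idea behind switchdev_handle_port_attr_set() but not reproducing its actual signature or return convention: struct ndev, fan_out_attr_set() and the callbacks are hypothetical. One attribute set on bond0 reaches only the lowers that pass check_cb.

```c
#include <stdbool.h>

#define MAX_LOWERS 4

/* Hypothetical, heavily reduced net_device */
struct ndev {
	const char *name;
	bool is_switch_port;
	struct ndev *lowers[MAX_LOWERS];
	int n_lowers;
	bool vlan_filtering;	/* the attribute being set */
};

typedef bool (*check_cb_t)(const struct ndev *dev);
typedef int (*set_cb_t)(struct ndev *dev, bool val);

/* Returns the number of ports programmed, or a negative error */
int fan_out_attr_set(struct ndev *orig_dev, check_cb_t check_cb,
		     set_cb_t set_cb, bool val)
{
	int count = 0;

	if (check_cb(orig_dev)) {
		int err = set_cb(orig_dev, val);

		return err ? err : 1;
	}

	/* Not one of ours: recurse into the lowers (e.g. a bond's ports) */
	for (int i = 0; i < orig_dev->n_lowers; i++) {
		int ret = fan_out_attr_set(orig_dev->lowers[i], check_cb,
					   set_cb, val);

		if (ret < 0)
			return ret;
		count += ret;
	}

	return count;
}

bool is_our_port(const struct ndev *dev)
{
	return dev->is_switch_port;
}

int port_set_vlan_filtering(struct ndev *dev, bool val)
{
	dev->vlan_filtering = val;
	return 0;
}
```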

For this reason, I preferred to go with the "pull" model for this patch
set. Just to see how bad it is for other switchdev drivers to copy-paste
this logic, I added the pull support to ocelot too, and I think it's
pretty manageable.

This patch set is RFC because it is minimally tested, and I would like
to get some feedback/agreement regarding the design decisions taken,
before I spend any more time on this.

There are also some things I probably broke, but I couldn't come up with
anything better. For example, I can't seem to figure out if mlxsw does the right
thing when joining a bonding interface that is already a bridge port.
I think it probably doesn't, so in that case, the placement I found for
the switchdev_bridge_port_offload() probably needs some adjustment when
there exists a LAG upper.

If possible, I would like the maintainers of the switchdev drivers to
tell me if this change introduces any regressions to how packets are
flooded (actually not flooded) in software by the bridge between two
ports belonging to the same ASIC ID.

I should mention that this patch series is written on top of Tobias'
series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210318192540.895062-1-tobias@waldekranz.com/
which should get applied soon.

Vladimir Oltean (16):
  net: dsa: call dsa_port_bridge_join when joining a LAG that is already
    in a bridge
  net: dsa: pass extack to dsa_port_{bridge,lag}_join
  net: dsa: inherit the actual bridge port flags at join time
  net: dsa: sync up with bridge port's STP state when joining
  net: dsa: sync up VLAN filtering state when joining the bridge
  net: dsa: sync multicast router state when joining the bridge
  net: dsa: sync ageing time when joining the bridge
  net: dsa: replay port and host-joined mdb entries when joining the
    bridge
  net: dsa: replay port and local fdb entries when joining the bridge
  net: dsa: replay VLANs installed on port when joining the bridge
  net: ocelot: support multiple bridges
  net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged
    LAG
  net: ocelot: replay switchdev events when joining bridge
  net: dsa: don't set skb->offload_fwd_mark when not offloading the
    bridge
  net: dsa: return -EOPNOTSUPP when driver does not implement
    .port_lag_join
  net: bridge: switchdev: let drivers inform which bridge ports are
    offloaded

 drivers/net/dsa/ocelot/felix.c                |   4 +-
 .../ethernet/freescale/dpaa2/dpaa2-switch.c   |   4 +-
 .../marvell/prestera/prestera_switchdev.c     |   7 +
 .../mellanox/mlxsw/spectrum_switchdev.c       |   4 +-
 drivers/net/ethernet/mscc/ocelot.c            |  90 ++++----
 drivers/net/ethernet/mscc/ocelot_net.c        | 210 +++++++++++++++---
 drivers/net/ethernet/rocker/rocker_ofdpa.c    |   8 +-
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      |   7 +-
 drivers/net/ethernet/ti/cpsw_new.c            |   6 +-
 include/linux/if_bridge.h                     |  56 +++++
 include/net/switchdev.h                       |   1 +
 include/soc/mscc/ocelot.h                     |  13 +-
 net/bridge/br_fdb.c                           |  52 +++++
 net/bridge/br_if.c                            |  11 +-
 net/bridge/br_mdb.c                           |  84 +++++++
 net/bridge/br_private.h                       |   8 +-
 net/bridge/br_stp.c                           |  27 +++
 net/bridge/br_switchdev.c                     |  94 +++++++-
 net/bridge/br_vlan.c                          |  71 ++++++
 net/dsa/dsa_priv.h                            |  23 +-
 net/dsa/port.c                                | 201 +++++++++++++----
 net/dsa/slave.c                               |  11 +-
 net/dsa/switch.c                              |   4 +-
 net/dsa/tag_brcm.c                            |   2 +-
 net/dsa/tag_dsa.c                             |  15 +-
 net/dsa/tag_hellcreek.c                       |   2 +-
 net/dsa/tag_ksz.c                             |   2 +-
 net/dsa/tag_lan9303.c                         |   3 +-
 net/dsa/tag_mtk.c                             |   2 +-
 net/dsa/tag_ocelot.c                          |   2 +-
 net/dsa/tag_ocelot_8021q.c                    |   2 +-
 net/dsa/tag_rtl4_a.c                          |   2 +-
 net/dsa/tag_sja1105.c                         |   4 +-
 net/dsa/tag_xrs700x.c                         |   2 +-
 34 files changed, 845 insertions(+), 189 deletions(-)

-- 
2.25.1


* [RFC PATCH v2 net-next 01/16] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge
From: Vladimir Oltean @ 2021-03-18 23:18 UTC
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

DSA can properly detect and offload this sequence of operations:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set swp0 master bond0
ip link set bond0 master br0

But not this one:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

The second sequence is actually more complicated: in the time elapsed
between the enslavement of bond0 and its offloading via swp0, a lot of
things could have happened to the bond0 bridge port in terms of
switchdev objects (host MDBs, VLANs, altered STP state, etc.). So this is
a bit of a can of worms, and making sure that the DSA port's state is in
sync with this already existing bridge port is handled in the next
patches.
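A userspace model of why both orders end up in the same place once the LAG join path also checks whether the LAG already has a bridge master. The structures and helpers (struct iface, port_lag_join(), lag_join_bridge()) are hypothetical simplifications with no notifiers and no rollback, not the DSA code.

```c
#include <stdbool.h>

/* Hypothetical model of br0 -> bond0 -> swp0 */
struct iface {
	struct iface *master;	/* upper device, or 0 if none */
	bool is_bridge;
};

struct port_model {
	struct iface *lag_dev;		/* LAG the port offloads */
	struct iface *bridge_dev;	/* bridge the port offloads */
};

static void port_bridge_join(struct port_model *dp, struct iface *br)
{
	dp->bridge_dev = br;
}

/* swp0 joins bond0: if bond0 is already a bridge port, join the bridge too */
void port_lag_join(struct port_model *dp, struct iface *lag)
{
	dp->lag_dev = lag;
	if (lag->master && lag->master->is_bridge)
		port_bridge_join(dp, lag->master);
}

/* bond0 joins br0: a port already in bond0 follows it into br0 */
void lag_join_bridge(struct port_model *dp, struct iface *lag,
		     struct iface *br)
{
	lag->master = br;
	if (dp->lag_dev == lag)
		port_bridge_join(dp, br);
}
```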

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/port.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index c9c6d7ab3f47..d39262a9fe0e 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -249,17 +249,31 @@ int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag,
 		.lag = lag,
 		.info = uinfo,
 	};
+	struct net_device *bridge_dev;
 	int err;
 
 	dsa_lag_map(dp->ds->dst, lag);
 	dp->lag_dev = lag;
 
 	err = dsa_port_notify(dp, DSA_NOTIFIER_LAG_JOIN, &info);
-	if (err) {
-		dp->lag_dev = NULL;
-		dsa_lag_unmap(dp->ds->dst, lag);
-	}
+	if (err)
+		goto err_lag_join;
 
+	bridge_dev = netdev_master_upper_dev_get(lag);
+	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+		return 0;
+
+	err = dsa_port_bridge_join(dp, bridge_dev);
+	if (err)
+		goto err_bridge_join;
+
+	return 0;
+
+err_bridge_join:
+	dsa_port_notify(dp, DSA_NOTIFIER_LAG_LEAVE, &info);
+err_lag_join:
+	dp->lag_dev = NULL;
+	dsa_lag_unmap(dp->ds->dst, lag);
 	return err;
 }
 
-- 
2.25.1


* [RFC PATCH v2 net-next 02/16] net: dsa: pass extack to dsa_port_{bridge,lag}_join
From: Vladimir Oltean @ 2021-03-18 23:18 UTC
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This is a pretty noisy change that was broken out of the larger change
for replaying switchdev attributes and objects at bridge join time,
which is when these extack objects are actually used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa_priv.h | 6 ++++--
 net/dsa/port.c     | 8 +++++---
 net/dsa/slave.c    | 7 +++++--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 4c43c5406834..b8778c5d8529 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -181,12 +181,14 @@ int dsa_port_enable_rt(struct dsa_port *dp, struct phy_device *phy);
 int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
 void dsa_port_disable_rt(struct dsa_port *dp);
 void dsa_port_disable(struct dsa_port *dp);
-int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
+int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
+			 struct netlink_ext_ack *extack);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_lag_change(struct dsa_port *dp,
 			struct netdev_lag_lower_state_info *linfo);
 int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag_dev,
-		      struct netdev_lag_upper_info *uinfo);
+		      struct netdev_lag_upper_info *uinfo,
+		      struct netlink_ext_ack *extack);
 void dsa_port_lag_leave(struct dsa_port *dp, struct net_device *lag_dev);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
 			    struct netlink_ext_ack *extack);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index d39262a9fe0e..fcbe5b1545b8 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -144,7 +144,8 @@ static void dsa_port_change_brport_flags(struct dsa_port *dp,
 	}
 }
 
-int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
+int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
+			 struct netlink_ext_ack *extack)
 {
 	struct dsa_notifier_bridge_info info = {
 		.tree_index = dp->ds->dst->index,
@@ -241,7 +242,8 @@ int dsa_port_lag_change(struct dsa_port *dp,
 }
 
 int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag,
-		      struct netdev_lag_upper_info *uinfo)
+		      struct netdev_lag_upper_info *uinfo,
+		      struct netlink_ext_ack *extack)
 {
 	struct dsa_notifier_lag_info info = {
 		.sw_index = dp->ds->index,
@@ -263,7 +265,7 @@ int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag,
 	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
 		return 0;
 
-	err = dsa_port_bridge_join(dp, bridge_dev);
+	err = dsa_port_bridge_join(dp, bridge_dev, extack);
 	if (err)
 		goto err_bridge_join;
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 992fcab4b552..1ff48be476bb 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1976,11 +1976,14 @@ static int dsa_slave_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct netlink_ext_ack *extack;
 	int err = NOTIFY_DONE;
 
+	extack = netdev_notifier_info_to_extack(&info->info);
+
 	if (netif_is_bridge_master(info->upper_dev)) {
 		if (info->linking) {
-			err = dsa_port_bridge_join(dp, info->upper_dev);
+			err = dsa_port_bridge_join(dp, info->upper_dev, extack);
 			if (!err)
 				dsa_bridge_mtu_normalization(dp);
 			err = notifier_from_errno(err);
@@ -1991,7 +1994,7 @@ static int dsa_slave_changeupper(struct net_device *dev,
 	} else if (netif_is_lag_master(info->upper_dev)) {
 		if (info->linking) {
 			err = dsa_port_lag_join(dp, info->upper_dev,
-						info->upper_info);
+						info->upper_info, extack);
 			if (err == -EOPNOTSUPP) {
 				NL_SET_ERR_MSG_MOD(info->info.extack,
 						   "Offloading not supported");
-- 
2.25.1


* [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time
From: Vladimir Oltean @ 2021-03-18 23:18 UTC
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

DSA currently assumes that the bridge port starts off with this
constellation of bridge port flags:

- learning on
- unicast flooding on
- multicast flooding on
- broadcast flooding on

just by virtue of code copy-pasta from the bridge layer (new_nbp).
This was a simple enough strategy thus far, because the 'bridge join'
moment always coincided with the 'bridge port creation' moment.

But with sandwiched interfaces, such as:

 br0
  |
bond0
  |
 swp0

it may happen that the user has had time to change the bridge port flags
of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
assume that the bridge port flags are those determined by new_nbp, when
in fact this can happen:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set bond0 type bridge_slave learning off
ip link set swp0 master bond0

Now swp0 has learning enabled, bond0 has learning disabled. Not nice.

Fix this by "dumpster diving" through the actual bridge port flags with
br_port_flag_is_set, at bridge join time.

We use this opportunity to split dsa_port_change_brport_flags into two
distinct functions called dsa_port_inherit_brport_flags and
dsa_port_clear_brport_flags, now that the implementation for the two
cases is no longer similar.
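The per-flag inheritance can be sketched in userspace as follows, assuming illustrative flag bits (F_*, deliberately not the kernel's BR_* values) and a hypothetical struct hw_port. Like the patch, it applies the bridge port's actual flags one bit at a time and tolerates -EOPNOTSUPP for flags the hardware cannot change.

```c
#include <errno.h>

/* Illustrative flag bits, not the real BR_* values */
#define F_LEARNING	(1UL << 0)
#define F_FLOOD		(1UL << 1)
#define F_MCAST_FLOOD	(1UL << 2)
#define F_BCAST_FLOOD	(1UL << 3)

/* Hypothetical hardware port: which flags it supports and which are on */
struct hw_port {
	unsigned long supported;
	unsigned long enabled;
};

static int hw_set_flag(struct hw_port *p, unsigned long flag, int on)
{
	if (!(p->supported & flag))
		return -EOPNOTSUPP;
	if (on)
		p->enabled |= flag;
	else
		p->enabled &= ~flag;
	return 0;
}

/* Copy the bridge port's actual flags (brport_flags) to the hardware,
 * one bit at a time, instead of assuming the new_nbp() defaults.
 * Flags the hardware does not support are skipped.
 */
int inherit_brport_flags(struct hw_port *p, unsigned long brport_flags)
{
	const unsigned long mask = F_LEARNING | F_FLOOD | F_MCAST_FLOOD |
				   F_BCAST_FLOOD;

	for (unsigned long bit = 1; bit <= mask; bit <<= 1) {
		int err;

		if (!(mask & bit))
			continue;
		err = hw_set_flag(p, bit, !!(brport_flags & bit));
		if (err && err != -EOPNOTSUPP)
			return err;
	}

	return 0;
}
```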

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/port.c | 123 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 82 insertions(+), 41 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index fcbe5b1545b8..346c50467810 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -122,26 +122,82 @@ void dsa_port_disable(struct dsa_port *dp)
 	rtnl_unlock();
 }
 
-static void dsa_port_change_brport_flags(struct dsa_port *dp,
-					 bool bridge_offload)
+static void dsa_port_clear_brport_flags(struct dsa_port *dp,
+					struct netlink_ext_ack *extack)
 {
 	struct switchdev_brport_flags flags;
-	int flag;
 
-	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-	if (bridge_offload)
-		flags.val = flags.mask;
-	else
-		flags.val = flags.mask & ~BR_LEARNING;
+	flags.mask = BR_LEARNING;
+	flags.val = 0;
+	dsa_port_bridge_flags(dp, flags, extack);
+
+	flags.mask = BR_FLOOD;
+	flags.val = BR_FLOOD;
+	dsa_port_bridge_flags(dp, flags, extack);
+
+	flags.mask = BR_MCAST_FLOOD;
+	flags.val = BR_MCAST_FLOOD;
+	dsa_port_bridge_flags(dp, flags, extack);
+
+	flags.mask = BR_BCAST_FLOOD;
+	flags.val = BR_BCAST_FLOOD;
+	dsa_port_bridge_flags(dp, flags, extack);
+}
+
+static int dsa_port_inherit_brport_flags(struct dsa_port *dp,
+					 struct netlink_ext_ack *extack)
+{
+	const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+				   BR_BCAST_FLOOD;
+	struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+	int flag, err;
+
+	for_each_set_bit(flag, &mask, 32) {
+		struct switchdev_brport_flags flags = {0};
 
-	for_each_set_bit(flag, &flags.mask, 32) {
-		struct switchdev_brport_flags tmp;
+		flags.mask = BIT(flag);
 
-		tmp.val = flags.val & BIT(flag);
-		tmp.mask = BIT(flag);
+		if (br_port_flag_is_set(brport_dev, BIT(flag)))
+			flags.val = BIT(flag);
 
-		dsa_port_bridge_flags(dp, tmp, NULL);
+		err = dsa_port_bridge_flags(dp, flags, extack);
+		if (err && err != -EOPNOTSUPP)
+			return err;
 	}
+
+	return 0;
+}
+
+static int dsa_port_switchdev_sync(struct dsa_port *dp,
+				   struct netlink_ext_ack *extack)
+{
+	int err;
+
+	err = dsa_port_inherit_brport_flags(dp, extack);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+/* Configure the port for standalone mode (no address learning, flood
+ * everything, BR_STATE_FORWARDING, etc).
+ * The bridge only emits SWITCHDEV_ATTR_ID_PORT_* events when the user
+ * requests it through netlink or sysfs, but not automatically at port
+ * join or leave, so we need to handle resetting the brport flags ourselves.
+ * But we even prefer it that way, because otherwise, some setups might never
+ * get the notification they need, for example, when a port leaves a LAG that
+ * offloads the bridge, it becomes standalone, but as far as the bridge is
+ * concerned, no port ever left.
+ */
+static void dsa_port_switchdev_unsync(struct dsa_port *dp)
+{
+	dsa_port_clear_brport_flags(dp, NULL);
+
+	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
+	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
+	 */
+	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
 }
 
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
@@ -155,24 +211,25 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
 	};
 	int err;
 
-	/* Notify the port driver to set its configurable flags in a way that
-	 * matches the initial settings of a bridge port.
-	 */
-	dsa_port_change_brport_flags(dp, true);
-
 	/* Here the interface is already bridged. Reflect the current
 	 * configuration so that drivers can program their chips accordingly.
 	 */
 	dp->bridge_dev = br;
 
 	err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info);
+	if (err)
+		goto out_rollback;
 
-	/* The bridging is rolled back on error */
-	if (err) {
-		dsa_port_change_brport_flags(dp, false);
-		dp->bridge_dev = NULL;
-	}
+	err = dsa_port_switchdev_sync(dp, extack);
+	if (err)
+		goto out_rollback_unbridge;
 
+	return 0;
+
+out_rollback_unbridge:
+	dsa_broadcast(DSA_NOTIFIER_BRIDGE_LEAVE, &info);
+out_rollback:
+	dp->bridge_dev = NULL;
 	return err;
 }
 
@@ -186,6 +243,8 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	};
 	int err;
 
+	dsa_port_switchdev_unsync(dp);
+
 	/* Here the port is already unbridged. Reflect the current configuration
 	 * so that drivers can program their chips accordingly.
 	 */
@@ -194,24 +253,6 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_LEAVE, &info);
 	if (err)
 		pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
-
-	/* Configure the port for standalone mode (no address learning,
-	 * flood everything).
-	 * The bridge only emits SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events
-	 * when the user requests it through netlink or sysfs, but not
-	 * automatically at port join or leave, so we need to handle resetting
-	 * the brport flags ourselves. But we even prefer it that way, because
-	 * otherwise, some setups might never get the notification they need,
-	 * for example, when a port leaves a LAG that offloads the bridge,
-	 * it becomes standalone, but as far as the bridge is concerned, no
-	 * port ever left.
-	 */
-	dsa_port_change_brport_flags(dp, false);
-
-	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
-	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
-	 */
-	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
 }
 
 int dsa_port_lag_change(struct dsa_port *dp,
-- 
2.25.1


* [RFC PATCH v2 net-next 04/16] net: dsa: sync up with bridge port's STP state when joining
From: Vladimir Oltean @ 2021-03-18 23:18 UTC
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

It may happen that we have the following topology:

ip link add br0 type bridge stp_state 1
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0
ip link set swp1 master bond0

STP decides that it should put bond0 into the BLOCKING state, and
that's that. The ports that are offloading the bond0 bridge port, and
are therefore listening for the switchdev port attributes emitted for
it, see that STP state attribute and can react to it, so we can program
swp0 and swp1 into the BLOCKING state.

But if then we do:

ip link set swp2 master bond0

then, as far as the bridge is concerned, nothing has changed: it still
has one bridge port. But the new lower port, swp2, will not see any STP
state change notification and will remain FORWARDING, which is the state
in which the standalone code leaves it.

Add a function to the bridge which retrieves the current STP state, such
that drivers can synchronize to it when they may have missed switchdev
events.
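A minimal userspace sketch of the intended use, assuming hypothetical struct net_if, struct bridge_port and struct hw_port types. The state constants use the same values as BR_STATE_* in include/uapi/linux/if_bridge.h, and get_stp_state() follows the same contract as the new br_port_get_stp_state(): the port's current state, or DISABLED if the device is not a bridge port.

```c
/* Same values as BR_STATE_* in include/uapi/linux/if_bridge.h */
enum {
	STATE_DISABLED = 0,
	STATE_LISTENING = 1,
	STATE_LEARNING = 2,
	STATE_FORWARDING = 3,
	STATE_BLOCKING = 4,
};

/* Hypothetical model types */
struct bridge_port {
	unsigned char state;
};

struct net_if {
	struct bridge_port *brport;	/* 0 if not a bridge port */
};

struct hw_port {
	unsigned char stp_state;
};

/* Current STP state of the bridge port, or DISABLED if not bridged */
unsigned char get_stp_state(const struct net_if *dev)
{
	return dev->brport ? dev->brport->state : STATE_DISABLED;
}

/* At join time, pull the state instead of waiting for a notification
 * that was emitted before this port became a lower of the bridge port.
 */
void sync_stp_state(struct hw_port *p, const struct net_if *brport_dev)
{
	p->stp_state = get_stp_state(brport_dev);
}
```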

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |  6 ++++++
 net/bridge/br_stp.c       | 14 ++++++++++++++
 net/dsa/port.c            |  7 +++++++
 3 files changed, 27 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b979005ea39c..920d3a02cc68 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -136,6 +136,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
 				    __u16 vid);
 void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
+u8 br_port_get_stp_state(const struct net_device *dev);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -154,6 +155,11 @@ br_port_flag_is_set(const struct net_device *dev, unsigned long flag)
 {
 	return false;
 }
+
+static inline u8 br_port_get_stp_state(const struct net_device *dev)
+{
+	return BR_STATE_DISABLED;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 21c6781906aa..86b5e05d3f21 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -64,6 +64,20 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
 	}
 }
 
+u8 br_port_get_stp_state(const struct net_device *dev)
+{
+	struct net_bridge_port *p;
+
+	ASSERT_RTNL();
+
+	p = br_port_get_rtnl(dev);
+	if (!p)
+		return BR_STATE_DISABLED;
+
+	return p->state;
+}
+EXPORT_SYMBOL_GPL(br_port_get_stp_state);
+
 /* called under bridge lock */
 struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no)
 {
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 346c50467810..785374744462 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -171,12 +171,19 @@ static int dsa_port_inherit_brport_flags(struct dsa_port *dp,
 static int dsa_port_switchdev_sync(struct dsa_port *dp,
 				   struct netlink_ext_ack *extack)
 {
+	struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+	u8 stp_state;
 	int err;
 
 	err = dsa_port_inherit_brport_flags(dp, extack);
 	if (err)
 		return err;
 
+	stp_state = br_port_get_stp_state(brport_dev);
+	err = dsa_port_set_state(dp, stp_state);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
-- 
2.25.1


* [RFC PATCH v2 net-next 05/16] net: dsa: sync up VLAN filtering state when joining the bridge
From: Vladimir Oltean @ 2021-03-18 23:18 UTC
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This is the same situation as for other switchdev port attributes: if we
join an already-created bridge port, such as a bond master interface,
then we can miss the initial switchdev notification emitted by the
bridge for this port.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/port.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index 785374744462..ac1afe182c3b 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -172,6 +172,7 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 				   struct netlink_ext_ack *extack)
 {
 	struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+	struct net_device *br = dp->bridge_dev;
 	u8 stp_state;
 	int err;
 
@@ -184,6 +185,10 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 	if (err && err != -EOPNOTSUPP)
 		return err;
 
+	err = dsa_port_vlan_filtering(dp, br, extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
@@ -205,6 +210,8 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp)
 	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
 	 */
 	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
+
+	/* VLAN filtering is handled by dsa_switch_bridge_leave */
 }
 
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
-- 
2.25.1


* [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router state when joining the bridge
From: Vladimir Oltean @ 2021-03-18 23:18 UTC
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Make sure that the multicast router setting of the bridge is picked up
correctly by DSA when joining, regardless of whether there are
sandwiched interfaces or not. The SWITCHDEV_ATTR_ID_BRIDGE_MROUTER port
attribute is only emitted from br_mc_router_state_change, i.e. when the
setting changes, and not when a port joins the bridge.
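
Each attribute synced at join time in this series follows the same error-handling convention: -EOPNOTSUPP only means the switch has no knob for that attribute. Below is a sketch of that convention in plain C. The step functions are hypothetical stand-ins for the dsa_port_*() calls:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type: 0 on success, -EOPNOTSUPP if the switch has no
 * knob for this attribute, any other negative errno on real failure. */
typedef int (*sync_step_fn)(void *priv);

/* Run the steps in order; -EOPNOTSUPP is tolerated, anything else aborts. */
static int run_sync_steps(const sync_step_fn *steps, size_t n, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		int err = steps[i](priv);

		if (err && err != -EOPNOTSUPP)
			return err;
	}
	return 0;
}

/* Example steps; each counts its invocation through priv. */
static int sync_stp_state(void *priv)
{
	++*(int *)priv;
	return 0;
}

static int sync_mrouter(void *priv)
{
	++*(int *)priv;
	return -EOPNOTSUPP;	/* e.g. driver lacks the mrouter hook */
}

static int sync_ageing_time(void *priv)
{
	++*(int *)priv;
	return -EINVAL;		/* e.g. value out of hardware range */
}
```

With this convention, a driver that does not implement an optional hook can still join the bridge, while a real error still aborts the join.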

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/port.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index ac1afe182c3b..8380509ee47c 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -189,6 +189,10 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 	if (err && err != -EOPNOTSUPP)
 		return err;
 
+	err = dsa_port_mrouter(dp->cpu_dp, br_multicast_router(br), extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
@@ -212,6 +216,12 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp)
 	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
 
 	/* VLAN filtering is handled by dsa_switch_bridge_leave */
+
+	/* Some drivers handle the notification for having a local multicast
+	 * router by allowing multicast to be flooded to the CPU, so we should
+	 * allow this in standalone mode too.
+	 */
+	dsa_port_mrouter(dp->cpu_dp, true, NULL);
 }
 
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
-- 
2.25.1


* [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time when joining the bridge
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (5 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router " Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-19 22:13   ` Florian Fainelli
  2021-03-22 11:20   ` Tobias Waldekranz
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries " Vladimir Oltean
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:

sysfs/ioctl/netlink
-> br_set_ageing_time
   -> __set_ageing_time

therefore not at bridge port creation time, so:
(a) drivers had to hardcode the initial value for the address ageing time,
    because they didn't get any notification
(b) that hardcoded value can be out of sync if the user changes the
    ageing time before enslaving the port to the bridge
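
The value passes through several units. br->ageing_time is kept in jiffies, and br_get_ageing_time() reports it as clock_t (USER_HZ ticks). As far as I can tell, dsa_port_ageing_time() then converts that back to jiffies and then to milliseconds for the driver. The arithmetic below assumes HZ=250 and USER_HZ=100 purely for illustration. The model_*() helpers are simplified stand-ins that ignore the rounding done by the real ones:

```c
#include <assert.h>

/* Assumptions for this sketch only: CONFIG_HZ=250, USER_HZ=100. */
#define MODEL_HZ      250UL
#define MODEL_USER_HZ 100UL

/* Simplified stand-ins for jiffies_to_clock_t(), clock_t_to_jiffies()
 * and jiffies_to_msecs(). */
static unsigned long model_jiffies_to_clock_t(unsigned long j)
{
	return j * MODEL_USER_HZ / MODEL_HZ;
}

static unsigned long model_clock_t_to_jiffies(unsigned long c)
{
	return c * MODEL_HZ / MODEL_USER_HZ;
}

static unsigned long model_jiffies_to_msecs(unsigned long j)
{
	return j * 1000UL / MODEL_HZ;
}
```

For the default bridge ageing time of 300 seconds, this gives 75000 jiffies, 30000 clock_t ticks and, finally, 300000 ms at the driver.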

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |  6 ++++++
 net/bridge/br_stp.c       | 13 +++++++++++++
 net/dsa/port.c            | 10 ++++++++++
 3 files changed, 29 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 920d3a02cc68..ebd16495459c 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
 void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
 u8 br_port_get_stp_state(const struct net_device *dev);
+clock_t br_get_ageing_time(struct net_device *br_dev);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev)
 {
 	return BR_STATE_DISABLED;
 }
+
+static inline clock_t br_get_ageing_time(struct net_device *br_dev)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 86b5e05d3f21..3dafb6143cff 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
 	return 0;
 }
 
+clock_t br_get_ageing_time(struct net_device *br_dev)
+{
+	struct net_bridge *br;
+
+	if (!netif_is_bridge_master(br_dev))
+		return 0;
+
+	br = netdev_priv(br_dev);
+
+	return jiffies_to_clock_t(br->ageing_time);
+}
+EXPORT_SYMBOL_GPL(br_get_ageing_time);
+
 /* called under bridge lock */
 void __br_set_topology_change(struct net_bridge *br, unsigned char val)
 {
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 8380509ee47c..9fde2371e1bc 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -173,6 +173,7 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 {
 	struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
 	struct net_device *br = dp->bridge_dev;
+	clock_t ageing_time;
 	u8 stp_state;
 	int err;
 
@@ -193,6 +194,11 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 	if (err && err != -EOPNOTSUPP)
 		return err;
 
+	ageing_time = br_get_ageing_time(br);
+	err = dsa_port_ageing_time(dp, ageing_time);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
@@ -222,6 +228,10 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp)
 	 * allow this in standalone mode too.
 	 */
 	dsa_port_mrouter(dp->cpu_dp, true, NULL);
+
+	/* Ageing time may be global to the switch chip, so don't change it
+	 * here because we have no good reason (or value) to change it to.
+	 */
 }
 
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
-- 
2.25.1


* [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries when joining the bridge
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (6 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time " Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-19 22:20   ` Florian Fainelli
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb " Vladimir Oltean
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

I have udhcpcd in my system and this is configured to bring interfaces
up as soon as they are created.

I create a bridge as follows:

ip link add br0 type bridge

As soon as I create the bridge and udhcpcd brings it up, I have some
other crap (avahi) that starts sending some random IPv6 packets to
advertise some local services, and from there, the br0 bridge joins the
following IPv6 groups:

33:33:ff:6d:c1:9c vid 0
33:33:00:00:00:6a vid 0
33:33:00:00:00:fb vid 0

br_dev_xmit
-> br_multicast_rcv
   -> br_ip6_multicast_add_group
      -> __br_multicast_add_group
         -> br_multicast_host_join
            -> br_mdb_notify

This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
hooked up, and switchdev will attempt to offload the host joined groups
to an empty list of ports. Of course nobody offloads them.
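
For reference, the MAC addresses above are the Ethernet mappings of the IPv6 groups: 33:33 followed by the low 32 bits of the group, per RFC 2464. br_mdb_replay_one() performs this conversion with ipv6_eth_mc_map(), and with ip_eth_mc_map() for IPv4 (RFC 1112). Below is a userspace restatement of both mappings. The function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IPv6: 33:33 followed by the low 32 bits of the group address. */
static void model_ipv6_mc_map(const uint8_t grp[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &grp[12], 4);
}

/* IPv4: 01:00:5e followed by the low 23 bits of the group address. */
static void model_ipv4_mc_map(const uint8_t grp[4], uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = grp[1] & 0x7f;
	mac[4] = grp[2];
	mac[5] = grp[3];
}
```

For example, the solicited-node group ff02::1:ff6d:c19c maps to 33:33:ff:6d:c1:9c, the first address in the list above.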

Then when we add a port to br0:

ip link set swp0 master br0

the bridge doesn't replay the host-joined MDB entries from br_add_if.
Eventually the host-joined addresses expire and a switchdev
notification for deleting them is emitted but, surprise, the original
addition was already completely missed.

The strategy to address this problem is to replay the MDB entries (both
the port ones and the host joined ones) when the new port joins the
bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
be populated and only then attached to a bridge that you offload).
However, there are two possibilities: the addresses can be 'pushed' by
the bridge into the port, or the port can 'pull' them from the bridge.

Considering that in the general case, the new port can be really late to
the party, and there may have been many other switchdev ports that
already received the initial notification, we would like to avoid
delivering duplicate events to them, since they might misbehave. And
currently, the bridge calls the entire switchdev notifier chain, whereas
for replaying it should just call the notifier block of the new guy.
But the bridge doesn't know what is the new guy's notifier block, it
just knows where the switchdev notifier chain is. So for simplification,
we make this a driver-initiated pull for now, and the notifier block is
passed as an argument.
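
The difference between re-emitting on the whole chain and a driver-initiated pull can be modelled in a few lines. This sketch uses made-up types, not the kernel notifier API. The existing port has already seen each event once. Replaying only to the notifier block passed in by the joining port avoids handing the existing port duplicates:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subscriber: counts the "add" events it receives. */
struct model_nb {
	int (*call)(struct model_nb *nb, int event_id);
	int events_seen;
};

static int model_count_event(struct model_nb *nb, int event_id)
{
	(void)event_id;
	nb->events_seen++;
	return 0;
}

/* Push to everyone: what re-emitting on the whole chain would do. */
static void model_chain_call(struct model_nb **chain, size_t n, int event_id)
{
	for (size_t i = 0; i < n; i++)
		chain[i]->call(chain[i], event_id);
}

/* Driver-initiated pull: replay the existing entries only to the
 * notifier block supplied by the caller (the new port's driver). */
static int model_replay(const int *entries, size_t n, struct model_nb *nb)
{
	for (size_t i = 0; i < n; i++) {
		int err = nb->call(nb, entries[i]);

		if (err)
			return err;
	}
	return 0;
}
```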

To emulate the calling context for mdb objects (deferred and put on the
blocking notifier chain), we must iterate under RCU protection through
the bridge's mdb entries, queue them, and only deliver them to the
notifier once we're out of the RCU read-side critical section.
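
The queue-then-notify pattern can be sketched in userspace as follows. A global in_atomic flag stands in for the RCU read-side critical section, and malloc() stands in for the atomic allocation that would be needed there. blocking_add() plays the role of a blocking switchdev handler. All of these names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static bool in_atomic;	/* stand-in for "inside rcu_read_lock()" */

struct queued_obj {
	int group;
	struct queued_obj *next;
};

/* Blocking-context consumer: refuses to run in atomic context. */
static int blocking_add(int group, int *sum)
{
	if (in_atomic)
		return -EINVAL;
	*sum += group;
	return 0;
}

static int replay_deferred(const int *groups, size_t n, int *sum)
{
	struct queued_obj *head = NULL, **tail = &head, *obj, *tmp;
	int err = 0;

	in_atomic = true;			/* rcu_read_lock() */
	for (size_t i = 0; i < n; i++) {
		obj = malloc(sizeof(*obj));	/* GFP_ATOMIC in a kernel */
		if (!obj) {
			err = -ENOMEM;
			break;
		}
		obj->group = groups[i];
		obj->next = NULL;
		*tail = obj;
		tail = &obj->next;
	}
	in_atomic = false;			/* rcu_read_unlock() */

	/* Notify only after leaving the critical section, in order. */
	for (obj = head; obj && !err; obj = obj->next)
		err = blocking_add(obj->group, sum);

	/* Free the queue whether or not notification succeeded. */
	for (obj = head; obj; obj = tmp) {
		tmp = obj->next;
		free(obj);
	}
	return err;
}
```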

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |  9 +++++
 net/bridge/br_mdb.c       | 84 +++++++++++++++++++++++++++++++++++++++
 net/dsa/dsa_priv.h        |  2 +
 net/dsa/port.c            |  6 +++
 net/dsa/slave.c           |  2 +-
 5 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index ebd16495459c..4c25dafb013d 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
 bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
 bool br_multicast_enabled(const struct net_device *dev);
 bool br_multicast_router(const struct net_device *dev);
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb, struct netlink_ext_ack *extack);
 #else
 static inline int br_multicast_list_adjacent(struct net_device *dev,
 					     struct list_head *br_ip_list)
@@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct net_device *dev)
 {
 	return false;
 }
+static inline int br_mdb_replay(struct net_device *br_dev,
+				struct net_device *dev,
+				struct notifier_block *nb,
+				struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 8846c5bcd075..23973186094c 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -506,6 +506,90 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
 	kfree(priv);
 }
 
+static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
+			     struct net_bridge_mdb_entry *mp, int obj_id,
+			     struct net_device *orig_dev,
+			     struct netlink_ext_ack *extack)
+{
+	struct switchdev_notifier_port_obj_info obj_info = {
+		.info = {
+			.dev = dev,
+			.extack = extack,
+		},
+	};
+	struct switchdev_obj_port_mdb mdb = {
+		.obj = {
+			.orig_dev = orig_dev,
+			.id = obj_id,
+		},
+		.vid = mp->addr.vid,
+	};
+	int err;
+
+	if (mp->addr.proto == htons(ETH_P_IP))
+		ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
+#if IS_ENABLED(CONFIG_IPV6)
+	else if (mp->addr.proto == htons(ETH_P_IPV6))
+		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
+#endif
+	else
+		ether_addr_copy(mdb.addr, mp->addr.dst.mac_addr);
+
+	obj_info.obj = &mdb.obj;
+
+	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
+	return notifier_to_errno(err);
+}
+
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb, struct netlink_ext_ack *extack)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct list_head mdb_list;
+	struct net_bridge *br;
+	int err = 0;
+
+	ASSERT_RTNL();
+
+	INIT_LIST_HEAD(&mdb_list);
+
+	if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	br = netdev_priv(br_dev);
+
+	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
+		return 0;
+
+	hlist_for_each_entry(mp, &br->mdb_list, mdb_node) {
+		struct net_bridge_port_group __rcu **pp;
+		struct net_bridge_port_group *p;
+
+		if (mp->host_joined) {
+			err = br_mdb_replay_one(nb, dev, mp,
+						SWITCHDEV_OBJ_ID_HOST_MDB,
+						br_dev, extack);
+			if (err)
+				return err;
+		}
+
+		for (pp = &mp->ports; (p = rtnl_dereference(*pp)) != NULL;
+		     pp = &p->next) {
+			if (p->key.port->dev != dev)
+				continue;
+
+			err = br_mdb_replay_one(nb, dev, mp,
+						SWITCHDEV_OBJ_ID_PORT_MDB,
+						dev, extack);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(br_mdb_replay);
+
 static void br_mdb_switchdev_host_port(struct net_device *dev,
 				       struct net_device *lower_dev,
 				       struct net_bridge_mdb_entry *mp,
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index b8778c5d8529..b14c43cb88bb 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -262,6 +262,8 @@ static inline bool dsa_tree_offloads_bridge_port(struct dsa_switch_tree *dst,
 
 /* slave.c */
 extern const struct dsa_device_ops notag_netdev_ops;
+extern struct notifier_block dsa_slave_switchdev_blocking_notifier;
+
 void dsa_slave_mii_bus_init(struct dsa_switch *ds);
 int dsa_slave_create(struct dsa_port *dp);
 void dsa_slave_destroy(struct net_device *slave_dev);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 9fde2371e1bc..6670612f96c6 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -199,6 +199,12 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 	if (err && err != -EOPNOTSUPP)
 		return err;
 
+	err = br_mdb_replay(br, brport_dev,
+			    &dsa_slave_switchdev_blocking_notifier,
+			    extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1ff48be476bb..b974d8f84a2e 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2396,7 +2396,7 @@ static struct notifier_block dsa_slave_switchdev_notifier = {
 	.notifier_call = dsa_slave_switchdev_event,
 };
 
-static struct notifier_block dsa_slave_switchdev_blocking_notifier = {
+struct notifier_block dsa_slave_switchdev_blocking_notifier = {
 	.notifier_call = dsa_slave_switchdev_blocking_event,
 };
 
-- 
2.25.1


* [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb entries when joining the bridge
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (7 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries " Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-22 15:44   ` Tobias Waldekranz
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 10/16] net: dsa: replay VLANs installed on port " Vladimir Oltean
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

When a DSA port joins a LAG that already had an FDB entry pointing to it:

ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link set swp0 master bond0

the DSA port will have no idea that this FDB entry is there, because it
missed the switchdev event emitted at its creation.

Ido Schimmel pointed this out during a discussion about challenges with
switchdev offloading of stacked interfaces between the physical port and
the bridge, and recommended to just catch that condition and deny the
CHANGEUPPER event:
https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/

But in fact, we might need to deal with the hard thing anyway, which is
to replay all FDB addresses relevant to this port, because it isn't just
about static FDB entries, but also about local addresses (ones that are
not forwarded but terminated by the bridge). For those, we can't just say
'oh yeah, there was an upper already so I'm not joining that'.

So, similar to the logic for replaying MDB entries, add a function that
must be called by individual switchdev drivers and replays local FDB
entries as well as ones pointing towards a bridge port. This time, we
use the atomic switchdev notifier block, since FDB entries are normally
emitted on the atomic switchdev notifier chain.
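
A sketch of the filtering rule that decides which FDB entries get replayed. The rule keeps local entries, reported against the bridge device, plus entries pointing at the joining bridge port. Devices are modelled by name, and a NULL destination stands in for fdb->dst == NULL. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* dst names the destination bridge port, or is NULL for a local entry. */
struct model_fdb {
	const char *dst;
};

/* Count the entries that would be replayed to bridge port 'dev' of bridge
 * 'br'. Entries towards other bridge ports are skipped. */
static size_t model_fdb_replay_count(const struct model_fdb *fdbs, size_t n,
				     const char *br, const char *dev)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const char *dst_dev = fdbs[i].dst ? fdbs[i].dst : br;

		if (strcmp(dst_dev, br) != 0 && strcmp(dst_dev, dev) != 0)
			continue;
		count++;
	}
	return count;
}
```

In the bond0 example above, the local addresses of br0 and the static entry on bond0 would be replayed to swp0's driver. Entries learned on other bridge ports would not.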

Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |  9 +++++++
 include/net/switchdev.h   |  1 +
 net/bridge/br_fdb.c       | 52 +++++++++++++++++++++++++++++++++++++++
 net/dsa/dsa_priv.h        |  1 +
 net/dsa/port.c            |  4 +++
 net/dsa/slave.c           |  2 +-
 6 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 4c25dafb013d..89596134e88f 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -147,6 +147,8 @@ void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
 u8 br_port_get_stp_state(const struct net_device *dev);
 clock_t br_get_ageing_time(struct net_device *br_dev);
+int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -175,6 +177,13 @@ static inline clock_t br_get_ageing_time(struct net_device *br_dev)
 {
 	return 0;
 }
+
+static inline int br_fdb_replay(struct net_device *br_dev,
+				struct net_device *dev,
+				struct notifier_block *nb)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index b7fc7d0f54e2..7688ec572757 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -205,6 +205,7 @@ struct switchdev_notifier_info {
 
 struct switchdev_notifier_fdb_info {
 	struct switchdev_notifier_info info; /* must be first */
+	struct list_head list;
 	const unsigned char *addr;
 	u16 vid;
 	u8 added_by_user:1,
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index b7490237f3fc..49125cc196ac 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -726,6 +726,58 @@ static inline size_t fdb_nlmsg_size(void)
 		+ nla_total_size(sizeof(u8)); /* NFEA_ACTIVITY_NOTIFY */
 }
 
+static int br_fdb_replay_one(struct notifier_block *nb,
+			     struct net_bridge_fdb_entry *fdb,
+			     struct net_device *dev)
+{
+	struct switchdev_notifier_fdb_info item = {};
+	int err;
+
+	item.addr = fdb->key.addr.addr;
+	item.vid = fdb->key.vlan_id;
+	item.added_by_user = test_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
+	item.offloaded = test_bit(BR_FDB_OFFLOADED, &fdb->flags);
+	item.info.dev = dev;
+
+	err = nb->notifier_call(nb, SWITCHDEV_FDB_ADD_TO_DEVICE, &item);
+	return notifier_to_errno(err);
+}
+
+int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb)
+{
+	struct net_bridge_fdb_entry *fdb;
+	struct net_bridge *br;
+	int err = 0;
+
+	if (!netif_is_bridge_master(br_dev))
+		return -EINVAL;
+
+	if (!netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	br = netdev_priv(br_dev);
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(fdb, &br->fdb_list, fdb_node) {
+		struct net_device *dst_dev;
+
+		dst_dev = fdb->dst ? fdb->dst->dev : br->dev;
+		if (dst_dev != br_dev && dst_dev != dev)
+			continue;
+
+		err = br_fdb_replay_one(nb, fdb, dst_dev);
+		if (err)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL(br_fdb_replay);
+
 static void fdb_notify(struct net_bridge *br,
 		       const struct net_bridge_fdb_entry *fdb, int type,
 		       bool swdev_notify)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index b14c43cb88bb..92282de54230 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -262,6 +262,7 @@ static inline bool dsa_tree_offloads_bridge_port(struct dsa_switch_tree *dst,
 
 /* slave.c */
 extern const struct dsa_device_ops notag_netdev_ops;
+extern struct notifier_block dsa_slave_switchdev_notifier;
 extern struct notifier_block dsa_slave_switchdev_blocking_notifier;
 
 void dsa_slave_mii_bus_init(struct dsa_switch *ds);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 6670612f96c6..9850051071f2 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -205,6 +205,10 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 	if (err && err != -EOPNOTSUPP)
 		return err;
 
+	err = br_fdb_replay(br, brport_dev, &dsa_slave_switchdev_notifier);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index b974d8f84a2e..c51e52418a62 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2392,7 +2392,7 @@ static struct notifier_block dsa_slave_nb __read_mostly = {
 	.notifier_call  = dsa_slave_netdevice_event,
 };
 
-static struct notifier_block dsa_slave_switchdev_notifier = {
+struct notifier_block dsa_slave_switchdev_notifier = {
 	.notifier_call = dsa_slave_switchdev_event,
 };
 
-- 
2.25.1


* [RFC PATCH v2 net-next 10/16] net: dsa: replay VLANs installed on port when joining the bridge
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (8 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb " Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-19 22:24   ` Florian Fainelli
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 11/16] net: ocelot: support multiple bridges Vladimir Oltean
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Currently this simple setup:

ip link add br0 type bridge vlan_filtering 1
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

will not work because the bridge has created the PVID in br_add_if ->
nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
but that was too early, since swp0 was not yet a lower of bond0, so it
had no reason to act upon that notification.
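
A sketch, in plain C with hypothetical names, of what the replay has to reconstruct for each VLAN: the PVID and untagged flags, plus the return value of every per-VLAN call, so that a failure is propagated rather than silently dropped. The MODEL_VLAN_INFO_* values mirror the uapi BRIDGE_VLAN_INFO_* bits. VLAN 1 in the setup above is the bridge's default PVID, which is installed as untagged:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_VLAN_INFO_PVID     (1 << 1)
#define MODEL_VLAN_INFO_UNTAGGED (1 << 2)

struct model_vlan {
	uint16_t vid;
	int untagged;
};

/* Hypothetical consumer: rejects VIDs that cannot be offloaded. */
static int model_vlan_add(uint16_t vid, uint16_t flags, uint16_t *pvid_seen)
{
	if (vid == 0 || vid >= 4095)
		return -ERANGE;
	if (flags & MODEL_VLAN_INFO_PVID)
		*pvid_seen = vid;
	return 0;
}

static int model_vlan_replay(const struct model_vlan *vlans, size_t n,
			     uint16_t pvid, uint16_t *pvid_seen)
{
	for (size_t i = 0; i < n; i++) {
		uint16_t flags = 0;
		int err;

		if (vlans[i].vid == pvid)
			flags |= MODEL_VLAN_INFO_PVID;
		if (vlans[i].untagged)
			flags |= MODEL_VLAN_INFO_UNTAGGED;

		/* Capture the return value: testing a stale 'err' here
		 * would silently ignore failures. */
		err = model_vlan_add(vlans[i].vid, flags, pvid_seen);
		if (err)
			return err;
	}
	return 0;
}
```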

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h | 10 ++++++
 net/bridge/br_vlan.c      | 71 +++++++++++++++++++++++++++++++++++++++
 net/dsa/port.c            |  6 ++++
 3 files changed, 87 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 89596134e88f..ea176c508c0d 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -111,6 +111,8 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid);
 int br_vlan_get_proto(const struct net_device *dev, u16 *p_proto);
 int br_vlan_get_info(const struct net_device *dev, u16 vid,
 		     struct bridge_vlan_info *p_vinfo);
+int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
+		   struct notifier_block *nb, struct netlink_ext_ack *extack);
 #else
 static inline bool br_vlan_enabled(const struct net_device *dev)
 {
@@ -137,6 +139,14 @@ static inline int br_vlan_get_info(const struct net_device *dev, u16 vid,
 {
 	return -EINVAL;
 }
+
+static inline int br_vlan_replay(struct net_device *br_dev,
+				 struct net_device *dev,
+				 struct notifier_block *nb,
+				 struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 8829f621b8ec..45a4eac1b217 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1751,6 +1751,77 @@ void br_vlan_notify(const struct net_bridge *br,
 	kfree_skb(skb);
 }
 
+static int br_vlan_replay_one(struct notifier_block *nb,
+			      struct net_device *dev,
+			      struct switchdev_obj_port_vlan *vlan,
+			      struct netlink_ext_ack *extack)
+{
+	struct switchdev_notifier_port_obj_info obj_info = {
+		.info = {
+			.dev = dev,
+			.extack = extack,
+		},
+		.obj = &vlan->obj,
+	};
+	int err;
+
+	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
+	return notifier_to_errno(err);
+}
+
+int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
+		   struct notifier_block *nb, struct netlink_ext_ack *extack)
+{
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+	struct net_bridge_port *p;
+	struct net_bridge *br;
+	int err = 0;
+	u16 pvid;
+
+	ASSERT_RTNL();
+
+	if (!netif_is_bridge_master(br_dev))
+		return -EINVAL;
+
+	if (!netif_is_bridge_master(dev) && !netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	if (netif_is_bridge_master(dev)) {
+		br = netdev_priv(dev);
+		vg = br_vlan_group(br);
+		p = NULL;
+	} else {
+		p = br_port_get_rtnl(dev);
+		if (WARN_ON(!p))
+			return -EINVAL;
+		vg = nbp_vlan_group(p);
+		br = p->br;
+	}
+
+	if (!vg)
+		return 0;
+
+	pvid = br_get_pvid(vg);
+
+	list_for_each_entry(v, &vg->vlan_list, vlist) {
+		struct switchdev_obj_port_vlan vlan = {
+			.obj.orig_dev = dev,
+			.obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
+			.flags = br_vlan_flags(v, pvid),
+			.vid = v->vid,
+		};
+
+		if (!br_vlan_should_use(v))
+			continue;
+
+		err = br_vlan_replay_one(nb, dev, &vlan, extack);
+		if (err)
+			return err;
+	}
+
+	return err;
+}
 /* check if v_curr can enter a range ending in range_end */
 bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
 			     const struct net_bridge_vlan *range_end)
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 9850051071f2..6c3c357ac409 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -209,6 +209,12 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
 	if (err && err != -EOPNOTSUPP)
 		return err;
 
+	err = br_vlan_replay(br, brport_dev,
+			     &dsa_slave_switchdev_blocking_notifier,
+			     extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
-- 
2.25.1


* [RFC PATCH v2 net-next 11/16] net: ocelot: support multiple bridges
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (9 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 10/16] net: dsa: replay VLANs installed on port " Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 12/16] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG Vladimir Oltean
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

The ocelot switches are a bit odd in that they do not have an STP state
to put the ports into. Instead, the forwarding configuration is delayed
from the typical port_bridge_join into stp_state_set, when the port enters
the BR_STATE_FORWARDING state.

I can only guess that the implementation of this quirk is what led to
the simplification of the driver such that only one bridge could be
offloaded at a time.

We can simplify the data structures somewhat, and introduce a per-port
bridge device pointer and STP state, similar to how the LAG offload
works now (there we have a per-port bonding device pointer and TX
enabled state). This allows offloading multiple bridges with relative
ease, while still keeping in place the quirk to delay the programming of
the PGIDs.
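
Because there is no STP state register, the port behaviour for each bridge STP state has to be derived from two knobs: address learning and membership in the forwarding masks. The sketch below shows that mapping. The state values are copied from the uapi BR_STATE_* constants, and all other names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum model_br_state {
	MODEL_BR_STATE_DISABLED = 0,
	MODEL_BR_STATE_LISTENING = 1,
	MODEL_BR_STATE_LEARNING = 2,
	MODEL_BR_STATE_FORWARDING = 3,
	MODEL_BR_STATE_BLOCKING = 4,
};

/* Learning is on in LEARNING and FORWARDING, and only if enabled for
 * the port (the learn_ena check in ocelot_bridge_stp_state_set). */
static bool model_learning(enum model_br_state s, bool learn_ena)
{
	return learn_ena && (s == MODEL_BR_STATE_LEARNING ||
			     s == MODEL_BR_STATE_FORWARDING);
}

/* The port only enters the forwarding masks once it is FORWARDING. */
static bool model_forwarding(enum model_br_state s)
{
	return s == MODEL_BR_STATE_FORWARDING;
}
```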

We actually need this change now because we need to remove the bogus
restriction in ocelot_bridge_stp_state_set that requires
ocelot->bridge_mask to contain BIT(port), without which that function
is a no-op.
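
Here is a simplified model of the per-bridge forwarding mask that ocelot_get_bridge_fwd_mask() builds below. It ignores the bond and CPU port handling. Bridges are identified by small integers, and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_PORTS 4
#define MODEL_BIT(n) (1U << (n))

/* bridge == 0 means standalone; forwarding mirrors stp_state. */
struct model_port {
	int bridge;
	int forwarding;
};

/* Ports that forward and belong to the same bridge as 'port', minus the
 * port itself. Ports under a different bridge are never included. */
static uint32_t model_fwd_mask(const struct model_port *ports, int port)
{
	uint32_t mask = 0;

	if (!ports[port].bridge)
		return 0;

	for (int i = 0; i < MODEL_NUM_PORTS; i++)
		if (ports[i].forwarding && ports[i].bridge == ports[port].bridge)
			mask |= MODEL_BIT(i);

	return mask & ~MODEL_BIT(port);
}
```

Computing the mask per bridge, instead of from a single global bridge_fwd_mask, is what keeps ports under two different bridges isolated from each other.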

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 72 +++++++++++++++---------------
 include/soc/mscc/ocelot.h          |  7 ++-
 2 files changed, 39 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 9f0c9bdd9f5d..ce57929ba3d1 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -766,7 +766,7 @@ int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb)
 	/* Everything we see on an interface that is in the HW bridge
 	 * has already been forwarded.
 	 */
-	if (ocelot->bridge_mask & BIT(src_port))
+	if (ocelot->ports[src_port]->bridge)
 		skb->offload_fwd_mark = 1;
 
 	skb->protocol = eth_type_trans(skb, dev);
@@ -1183,6 +1183,26 @@ static u32 ocelot_get_bond_mask(struct ocelot *ocelot, struct net_device *bond,
 	return mask;
 }
 
+static u32 ocelot_get_bridge_fwd_mask(struct ocelot *ocelot,
+				      struct net_device *bridge)
+{
+	u32 mask = 0;
+	int port;
+
+	for (port = 0; port < ocelot->num_phys_ports; port++) {
+		struct ocelot_port *ocelot_port = ocelot->ports[port];
+
+		if (!ocelot_port)
+			continue;
+
+		if (ocelot_port->stp_state == BR_STATE_FORWARDING &&
+		    ocelot_port->bridge == bridge)
+			mask |= BIT(port);
+	}
+
+	return mask;
+}
+
 static u32 ocelot_get_dsa_8021q_cpu_mask(struct ocelot *ocelot)
 {
 	u32 mask = 0;
@@ -1232,10 +1252,12 @@ void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot)
 			 */
 			mask = GENMASK(ocelot->num_phys_ports - 1, 0);
 			mask &= ~cpu_fwd_mask;
-		} else if (ocelot->bridge_fwd_mask & BIT(port)) {
+		} else if (ocelot_port->bridge) {
+			struct net_device *bridge = ocelot_port->bridge;
 			struct net_device *bond = ocelot_port->bond;
 
-			mask = ocelot->bridge_fwd_mask & ~BIT(port);
+			mask = ocelot_get_bridge_fwd_mask(ocelot, bridge);
+			mask &= ~BIT(port);
 			if (bond) {
 				mask &= ~ocelot_get_bond_mask(ocelot, bond,
 							      false);
@@ -1256,29 +1278,16 @@ EXPORT_SYMBOL(ocelot_apply_bridge_fwd_mask);
 void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
 {
 	struct ocelot_port *ocelot_port = ocelot->ports[port];
-	u32 port_cfg;
-
-	if (!(BIT(port) & ocelot->bridge_mask))
-		return;
+	u32 learn_ena = 0;
 
-	port_cfg = ocelot_read_gix(ocelot, ANA_PORT_PORT_CFG, port);
+	ocelot_port->stp_state = state;
 
-	switch (state) {
-	case BR_STATE_FORWARDING:
-		ocelot->bridge_fwd_mask |= BIT(port);
-		fallthrough;
-	case BR_STATE_LEARNING:
-		if (ocelot_port->learn_ena)
-			port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
-		break;
-
-	default:
-		port_cfg &= ~ANA_PORT_PORT_CFG_LEARN_ENA;
-		ocelot->bridge_fwd_mask &= ~BIT(port);
-		break;
-	}
+	if ((state == BR_STATE_LEARNING || state == BR_STATE_FORWARDING) &&
+	    ocelot_port->learn_ena)
+		learn_ena = ANA_PORT_PORT_CFG_LEARN_ENA;
 
-	ocelot_write_gix(ocelot, port_cfg, ANA_PORT_PORT_CFG, port);
+	ocelot_rmw_gix(ocelot, learn_ena, ANA_PORT_PORT_CFG_LEARN_ENA,
+		       ANA_PORT_PORT_CFG, port);
 
 	ocelot_apply_bridge_fwd_mask(ocelot);
 }
@@ -1508,16 +1517,9 @@ EXPORT_SYMBOL(ocelot_port_mdb_del);
 int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
 			    struct net_device *bridge)
 {
-	if (!ocelot->bridge_mask) {
-		ocelot->hw_bridge_dev = bridge;
-	} else {
-		if (ocelot->hw_bridge_dev != bridge)
-			/* This is adding the port to a second bridge, this is
-			 * unsupported */
-			return -ENODEV;
-	}
+	struct ocelot_port *ocelot_port = ocelot->ports[port];
 
-	ocelot->bridge_mask |= BIT(port);
+	ocelot_port->bridge = bridge;
 
 	return 0;
 }
@@ -1526,13 +1528,11 @@ EXPORT_SYMBOL(ocelot_port_bridge_join);
 int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
 			     struct net_device *bridge)
 {
+	struct ocelot_port *ocelot_port = ocelot->ports[port];
 	struct ocelot_vlan pvid = {0}, native_vlan = {0};
 	int ret;
 
-	ocelot->bridge_mask &= ~BIT(port);
-
-	if (!ocelot->bridge_mask)
-		ocelot->hw_bridge_dev = NULL;
+	ocelot_port->bridge = NULL;
 
 	ret = ocelot_port_vlan_filtering(ocelot, port, false);
 	if (ret)
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 0a0751bf97dd..ce7e5c1bd90d 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -615,6 +615,9 @@ struct ocelot_port {
 	bool				lag_tx_active;
 
 	u16				mrp_ring_id;
+
+	struct net_device		*bridge;
+	u8				stp_state;
 };
 
 struct ocelot {
@@ -634,10 +637,6 @@ struct ocelot {
 	int				num_frame_refs;
 	int				num_mact_rows;
 
-	struct net_device		*hw_bridge_dev;
-	u16				bridge_mask;
-	u16				bridge_fwd_mask;
-
 	struct ocelot_port		**ports;
 
 	u8				base_mac[ETH_ALEN];
-- 
2.25.1



* [RFC PATCH v2 net-next 12/16] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (10 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 11/16] net: ocelot: support multiple bridges Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 13/16] net: ocelot: replay switchdev events when joining bridge Vladimir Oltean
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Similar to the DSA situation, ocelot supports LAG offload but treats
this scenario improperly:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

We handle it the same way as DSA does: simulate a 'bridge join' on
'lag join' if we detect that the bonding upper already has a bridge
upper.

Again as with DSA, ocelot supports software fallback for LAG, and in
that case we must not call ocelot_netdevice_changeupper on the lower
ports when the non-offloaded bond itself joins or leaves a bridge.
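
The join sequence can be sketched in plain C. Everything here is hypothetical (`struct netdev`, `struct sw_port`, `hw_lag_join`, `hw_bridge_join`, `hw_lag_leave`) and only mirrors the shape of ocelot_netdevice_lag_join: offload the LAG first, treat -EOPNOTSUPP as software fallback with no bridge join, then check whether the bond has a bridge master and, if the simulated bridge join fails, roll the LAG offload back. The early return on errors other than -EOPNOTSUPP is our own assumption; the patch itself only tests for -EOPNOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified net device. */
struct netdev {
	struct netdev *master;	/* upper device, or NULL */
	bool is_bridge;
};

/* Hypothetical switch port state with fault injection for testing. */
struct sw_port {
	bool lag_supported;	/* can the hardware offload this LAG? */
	bool bridge_join_fails;	/* make the bridge join fail */
	struct netdev *bond;	/* offloaded LAG, or NULL */
	struct netdev *bridge;	/* offloaded bridge, or NULL */
};

static int hw_lag_join(struct sw_port *p, struct netdev *bond)
{
	if (!p->lag_supported)
		return -EOPNOTSUPP;
	p->bond = bond;
	return 0;
}

static void hw_lag_leave(struct sw_port *p)
{
	p->bond = NULL;
}

static int hw_bridge_join(struct sw_port *p, struct netdev *bridge)
{
	if (p->bridge_join_fails)
		return -ENOMEM;
	p->bridge = bridge;
	return 0;
}

/* Mirrors the shape of ocelot_netdevice_lag_join(). */
static int port_lag_join(struct sw_port *p, struct netdev *bond)
{
	struct netdev *bridge;
	int err;

	err = hw_lag_join(p, bond);
	if (err == -EOPNOTSUPP)
		return 0;	/* software LAG: do not offload its bridge either */
	if (err)
		return err;	/* assumption: other errors abort the join */

	bridge = bond->master;
	if (!bridge || !bridge->is_bridge)
		return 0;	/* the bond is not (yet) in a bridge */

	err = hw_bridge_join(p, bridge);
	if (err)
		hw_lag_leave(p);	/* undo the LAG offload on failure */

	return err;
}
```

Returning 0 on -EOPNOTSUPP keeps the bond usable in software, while the port stays in standalone mode as far as the hardware is concerned.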

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/mscc/ocelot_net.c | 111 +++++++++++++++++++------
 1 file changed, 86 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index c08164cd88f4..d1376f7b34fd 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1117,10 +1117,15 @@ static int ocelot_port_obj_del(struct net_device *dev,
 	return ret;
 }
 
-static int ocelot_netdevice_bridge_join(struct ocelot *ocelot, int port,
-					struct net_device *bridge)
+static int ocelot_netdevice_bridge_join(struct net_device *dev,
+					struct net_device *bridge,
+					struct netlink_ext_ack *extack)
 {
+	struct ocelot_port_private *priv = netdev_priv(dev);
+	struct ocelot_port *ocelot_port = &priv->port;
+	struct ocelot *ocelot = ocelot_port->ocelot;
 	struct switchdev_brport_flags flags;
+	int port = priv->chip_port;
 	int err;
 
 	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
@@ -1135,10 +1140,14 @@ static int ocelot_netdevice_bridge_join(struct ocelot *ocelot, int port,
 	return 0;
 }
 
-static int ocelot_netdevice_bridge_leave(struct ocelot *ocelot, int port,
+static int ocelot_netdevice_bridge_leave(struct net_device *dev,
 					 struct net_device *bridge)
 {
+	struct ocelot_port_private *priv = netdev_priv(dev);
+	struct ocelot_port *ocelot_port = &priv->port;
+	struct ocelot *ocelot = ocelot_port->ocelot;
 	struct switchdev_brport_flags flags;
+	int port = priv->chip_port;
 	int err;
 
 	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
@@ -1151,43 +1160,89 @@ static int ocelot_netdevice_bridge_leave(struct ocelot *ocelot, int port,
 	return err;
 }
 
-static int ocelot_netdevice_changeupper(struct net_device *dev,
-					struct netdev_notifier_changeupper_info *info)
+static int ocelot_netdevice_lag_join(struct net_device *dev,
+				     struct net_device *bond,
+				     struct netdev_lag_upper_info *info,
+				     struct netlink_ext_ack *extack)
 {
 	struct ocelot_port_private *priv = netdev_priv(dev);
 	struct ocelot_port *ocelot_port = &priv->port;
 	struct ocelot *ocelot = ocelot_port->ocelot;
+	struct net_device *bridge_dev;
 	int port = priv->chip_port;
+	int err;
+
+	err = ocelot_port_lag_join(ocelot, port, bond, info);
+	if (err == -EOPNOTSUPP) {
+		NL_SET_ERR_MSG_MOD(extack, "Offloading not supported");
+		return 0;
+	}
+
+	bridge_dev = netdev_master_upper_dev_get(bond);
+	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+		return 0;
+
+	err = ocelot_netdevice_bridge_join(dev, bridge_dev, extack);
+	if (err)
+		goto err_bridge_join;
+
+	return 0;
+
+err_bridge_join:
+	ocelot_port_lag_leave(ocelot, port, bond);
+	return err;
+}
+
+static int ocelot_netdevice_lag_leave(struct net_device *dev,
+				      struct net_device *bond)
+{
+	struct ocelot_port_private *priv = netdev_priv(dev);
+	struct ocelot_port *ocelot_port = &priv->port;
+	struct ocelot *ocelot = ocelot_port->ocelot;
+	struct net_device *bridge_dev;
+	int port = priv->chip_port;
+
+	ocelot_port_lag_leave(ocelot, port, bond);
+
+	bridge_dev = netdev_master_upper_dev_get(bond);
+	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+		return 0;
+
+	return ocelot_netdevice_bridge_leave(dev, bridge_dev);
+}
+
+static int ocelot_netdevice_changeupper(struct net_device *dev,
+					struct netdev_notifier_changeupper_info *info)
+{
+	struct netlink_ext_ack *extack;
 	int err = 0;
 
+	extack = netdev_notifier_info_to_extack(&info->info);
+
 	if (netif_is_bridge_master(info->upper_dev)) {
-		if (info->linking) {
-			err = ocelot_netdevice_bridge_join(ocelot, port,
-							   info->upper_dev);
-		} else {
-			err = ocelot_netdevice_bridge_leave(ocelot, port,
-							    info->upper_dev);
-		}
+		if (info->linking)
+			err = ocelot_netdevice_bridge_join(dev, info->upper_dev,
+							   extack);
+		else
+			err = ocelot_netdevice_bridge_leave(dev, info->upper_dev);
 	}
 	if (netif_is_lag_master(info->upper_dev)) {
-		if (info->linking) {
-			err = ocelot_port_lag_join(ocelot, port,
-						   info->upper_dev,
-						   info->upper_info);
-			if (err == -EOPNOTSUPP) {
-				NL_SET_ERR_MSG_MOD(info->info.extack,
-						   "Offloading not supported");
-				err = 0;
-			}
-		} else {
-			ocelot_port_lag_leave(ocelot, port,
-					      info->upper_dev);
-		}
+		if (info->linking)
+			err = ocelot_netdevice_lag_join(dev, info->upper_dev,
+							info->upper_info, extack);
+		else
+			ocelot_netdevice_lag_leave(dev, info->upper_dev);
 	}
 
 	return notifier_from_errno(err);
 }
 
+/* Treat CHANGEUPPER events on an offloaded LAG as individual CHANGEUPPER
+ * events for the lower physical ports of the LAG.
+ * If the LAG upper isn't offloaded, ignore its CHANGEUPPER events.
+ * In case the LAG joined a bridge, notify that we are offloading it and can do
+ * forwarding in hardware towards it.
+ */
 static int
 ocelot_netdevice_lag_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
@@ -1197,6 +1252,12 @@ ocelot_netdevice_lag_changeupper(struct net_device *dev,
 	int err = NOTIFY_DONE;
 
 	netdev_for_each_lower_dev(dev, lower, iter) {
+		struct ocelot_port_private *priv = netdev_priv(lower);
+		struct ocelot_port *ocelot_port = &priv->port;
+
+		if (ocelot_port->bond != dev)
+			return NOTIFY_OK;
+
 		err = ocelot_netdevice_changeupper(lower, info);
 		if (err)
 			return notifier_from_errno(err);
-- 
2.25.1



* [RFC PATCH v2 net-next 13/16] net: ocelot: replay switchdev events when joining bridge
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (11 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 12/16] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge Vladimir Oltean
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

The premise of this change is that the switchdev port attributes and
objects offloaded by ocelot might have been missed when we join a
bridge port that already exists, such as a bonding interface.

The patch pulls these switchdev attributes and objects from the bridge
on behalf of the 'bridge port' net device, which may be either the
ocelot switch interface or its bonding upper interface.
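
As a rough illustration of pulling attributes "on behalf of the bridge port", the sketch below resolves the bridge port (the bond if the switch port has a LAG upper, otherwise the port itself) and copies the offloaded subset of its flags one bit at a time, the way ocelot_inherit_brport_flags loops over br_port_flag_is_set. The types, flag names and bit values are hypothetical and do not match the kernel's BR_* constants.

```c
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1UL << (n))

/* Hypothetical flag values; the kernel's BR_* constants are different. */
#define FLAG_LEARNING	BIT(0)
#define FLAG_FLOOD	BIT(1)
#define FLAG_MCAST	BIT(2)
#define FLAG_BCAST	BIT(3)
#define FLAG_OTHER	BIT(4)	/* a flag the switch does not offload */

#define OFFLOADED_FLAGS (FLAG_LEARNING | FLAG_FLOOD | FLAG_MCAST | FLAG_BCAST)

struct netdev {
	unsigned long brport_flags;	/* as configured on the bridge port */
	struct netdev *lag_upper;	/* bond this port belongs to, or NULL */
};

/* The bridge port is the bond if the switch port is in one, else the port. */
static const struct netdev *brport_of(const struct netdev *port)
{
	return port->lag_upper ? port->lag_upper : port;
}

/* Single-flag query, standing in for br_port_flag_is_set(). */
static bool flag_is_set(const struct netdev *brport, unsigned long flag)
{
	return brport->brport_flags & flag;
}

/* Copy the offloaded subset of flags from the bridge port, bit by bit. */
static unsigned long inherit_flags(const struct netdev *port)
{
	const struct netdev *brport = brport_of(port);
	unsigned long val = 0;
	int bit;

	for (bit = 0; bit < 32; bit++) {
		unsigned long flag = BIT(bit);

		if ((OFFLOADED_FLAGS & flag) && flag_is_set(brport, flag))
			val |= flag;
	}

	return val;
}

/* Standalone defaults applied on leave: flood everything, learn nothing. */
static unsigned long standalone_flags(void)
{
	return OFFLOADED_FLAGS & ~FLAG_LEARNING;
}
```

The bit loop is equivalent to `brport_flags & OFFLOADED_FLAGS`; it is written per bit here only to mirror the per-flag br_port_flag_is_set calls in the patch. standalone_flags corresponds to ocelot_clear_brport_flags, which restores flooding and disables learning when the port leaves the bridge.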

ocelot_net.c belongs strictly to the switchdev ocelot driver, while
ocelot.c is part of a library shared with the DSA felix driver.
The ocelot_port_bridge_leave function (part of the common library) used
to call ocelot_port_vlan_filtering(false), which is not necessary for
DSA, since the DSA framework already takes care of that. So we move
this call to ocelot_switchdev_unsync, which is specific to the
switchdev driver.

After this code movement, ocelot_port_bridge_leave can no longer fail,
so we change its return type from int to void. ocelot_port_bridge_join,
which could not fail either, gets the same treatment.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/ocelot/felix.c         |   4 +-
 drivers/net/ethernet/mscc/ocelot.c     |  18 ++--
 drivers/net/ethernet/mscc/ocelot_net.c | 117 +++++++++++++++++++++----
 include/soc/mscc/ocelot.h              |   6 +-
 4 files changed, 111 insertions(+), 34 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 628afb47b579..6b5442be0230 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -719,7 +719,9 @@ static int felix_bridge_join(struct dsa_switch *ds, int port,
 {
 	struct ocelot *ocelot = ds->priv;
 
-	return ocelot_port_bridge_join(ocelot, port, br);
+	ocelot_port_bridge_join(ocelot, port, br);
+
+	return 0;
 }
 
 static void felix_bridge_leave(struct dsa_switch *ds, int port,
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index ce57929ba3d1..1a36b416fd9b 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1514,34 +1514,28 @@ int ocelot_port_mdb_del(struct ocelot *ocelot, int port,
 }
 EXPORT_SYMBOL(ocelot_port_mdb_del);
 
-int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
-			    struct net_device *bridge)
+void ocelot_port_bridge_join(struct ocelot *ocelot, int port,
+			     struct net_device *bridge)
 {
 	struct ocelot_port *ocelot_port = ocelot->ports[port];
 
 	ocelot_port->bridge = bridge;
 
-	return 0;
+	ocelot_apply_bridge_fwd_mask(ocelot);
 }
 EXPORT_SYMBOL(ocelot_port_bridge_join);
 
-int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
-			     struct net_device *bridge)
+void ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
+			      struct net_device *bridge)
 {
 	struct ocelot_port *ocelot_port = ocelot->ports[port];
 	struct ocelot_vlan pvid = {0}, native_vlan = {0};
-	int ret;
 
 	ocelot_port->bridge = NULL;
 
-	ret = ocelot_port_vlan_filtering(ocelot, port, false);
-	if (ret)
-		return ret;
-
 	ocelot_port_set_pvid(ocelot, port, pvid);
 	ocelot_port_set_native_vlan(ocelot, port, native_vlan);
-
-	return 0;
+	ocelot_apply_bridge_fwd_mask(ocelot);
 }
 EXPORT_SYMBOL(ocelot_port_bridge_leave);
 
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index d1376f7b34fd..d38ffc7cf5f0 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1117,47 +1117,126 @@ static int ocelot_port_obj_del(struct net_device *dev,
 	return ret;
 }
 
+static void ocelot_inherit_brport_flags(struct ocelot *ocelot, int port,
+					struct net_device *brport_dev)
+{
+	struct switchdev_brport_flags flags = {0};
+	int flag;
+
+	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+
+	for_each_set_bit(flag, &flags.mask, 32)
+		if (br_port_flag_is_set(brport_dev, BIT(flag)))
+			flags.val |= BIT(flag);
+
+	ocelot_port_bridge_flags(ocelot, port, flags);
+}
+
+static void ocelot_clear_brport_flags(struct ocelot *ocelot, int port)
+{
+	struct switchdev_brport_flags flags;
+
+	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+	flags.val = flags.mask & ~BR_LEARNING;
+
+	ocelot_port_bridge_flags(ocelot, port, flags);
+}
+
+static int ocelot_switchdev_sync(struct ocelot *ocelot, int port,
+				 struct net_device *brport_dev,
+				 struct net_device *bridge_dev,
+				 struct netlink_ext_ack *extack)
+{
+	clock_t ageing_time;
+	u8 stp_state;
+	int err;
+
+	ocelot_inherit_brport_flags(ocelot, port, brport_dev);
+
+	stp_state = br_port_get_stp_state(brport_dev);
+	ocelot_bridge_stp_state_set(ocelot, port, stp_state);
+
+	err = ocelot_port_vlan_filtering(ocelot, port,
+					 br_vlan_enabled(bridge_dev));
+	if (err)
+		return err;
+
+	ageing_time = br_get_ageing_time(bridge_dev);
+	ocelot_port_attr_ageing_set(ocelot, port, ageing_time);
+
+	err = br_mdb_replay(bridge_dev, brport_dev,
+			    &ocelot_switchdev_blocking_nb, extack);
+	if (err)
+		return err;
+
+	err = br_fdb_replay(bridge_dev, brport_dev, &ocelot_switchdev_nb);
+	if (err)
+		return err;
+
+	err = br_vlan_replay(bridge_dev, brport_dev,
+			     &ocelot_switchdev_blocking_nb, extack);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int ocelot_switchdev_unsync(struct ocelot *ocelot, int port)
+{
+	int err;
+
+	err = ocelot_port_vlan_filtering(ocelot, port, false);
+	if (err)
+		return err;
+
+	ocelot_clear_brport_flags(ocelot, port);
+
+	ocelot_bridge_stp_state_set(ocelot, port, BR_STATE_FORWARDING);
+
+	return 0;
+}
+
 static int ocelot_netdevice_bridge_join(struct net_device *dev,
+					struct net_device *brport_dev,
 					struct net_device *bridge,
 					struct netlink_ext_ack *extack)
 {
 	struct ocelot_port_private *priv = netdev_priv(dev);
 	struct ocelot_port *ocelot_port = &priv->port;
 	struct ocelot *ocelot = ocelot_port->ocelot;
-	struct switchdev_brport_flags flags;
 	int port = priv->chip_port;
 	int err;
 
-	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-	flags.val = flags.mask;
+	ocelot_port_bridge_join(ocelot, port, bridge);
 
-	err = ocelot_port_bridge_join(ocelot, port, bridge);
+	err = ocelot_switchdev_sync(ocelot, port, brport_dev, bridge, extack);
 	if (err)
-		return err;
-
-	ocelot_port_bridge_flags(ocelot, port, flags);
+		goto err_switchdev_sync;
 
 	return 0;
+
+err_switchdev_sync:
+	ocelot_port_bridge_leave(ocelot, port, bridge);
+	return err;
 }
 
 static int ocelot_netdevice_bridge_leave(struct net_device *dev,
+					 struct net_device *brport_dev,
 					 struct net_device *bridge)
 {
 	struct ocelot_port_private *priv = netdev_priv(dev);
 	struct ocelot_port *ocelot_port = &priv->port;
 	struct ocelot *ocelot = ocelot_port->ocelot;
-	struct switchdev_brport_flags flags;
 	int port = priv->chip_port;
 	int err;
 
-	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-	flags.val = flags.mask & ~BR_LEARNING;
-
-	err = ocelot_port_bridge_leave(ocelot, port, bridge);
+	err = ocelot_switchdev_unsync(ocelot, port);
+	if (err)
+		return err;
 
-	ocelot_port_bridge_flags(ocelot, port, flags);
+	ocelot_port_bridge_leave(ocelot, port, bridge);
 
-	return err;
+	return 0;
 }
 
 static int ocelot_netdevice_lag_join(struct net_device *dev,
@@ -1182,7 +1261,7 @@ static int ocelot_netdevice_lag_join(struct net_device *dev,
 	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
 		return 0;
 
-	err = ocelot_netdevice_bridge_join(dev, bridge_dev, extack);
+	err = ocelot_netdevice_bridge_join(dev, bond, bridge_dev, extack);
 	if (err)
 		goto err_bridge_join;
 
@@ -1208,7 +1287,7 @@ static int ocelot_netdevice_lag_leave(struct net_device *dev,
 	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
 		return 0;
 
-	return ocelot_netdevice_bridge_leave(dev, bridge_dev);
+	return ocelot_netdevice_bridge_leave(dev, bond, bridge_dev);
 }
 
 static int ocelot_netdevice_changeupper(struct net_device *dev,
@@ -1221,10 +1300,12 @@ static int ocelot_netdevice_changeupper(struct net_device *dev,
 
 	if (netif_is_bridge_master(info->upper_dev)) {
 		if (info->linking)
-			err = ocelot_netdevice_bridge_join(dev, info->upper_dev,
+			err = ocelot_netdevice_bridge_join(dev, dev,
+							   info->upper_dev,
 							   extack);
 		else
-			err = ocelot_netdevice_bridge_leave(dev, info->upper_dev);
+			err = ocelot_netdevice_bridge_leave(dev, dev,
+							    info->upper_dev);
 	}
 	if (netif_is_lag_master(info->upper_dev)) {
 		if (info->linking)
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index ce7e5c1bd90d..68cdc7ceaf4d 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -803,10 +803,10 @@ int ocelot_port_pre_bridge_flags(struct ocelot *ocelot, int port,
 				 struct switchdev_brport_flags val);
 void ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
 			      struct switchdev_brport_flags val);
-int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
-			    struct net_device *bridge);
-int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
+void ocelot_port_bridge_join(struct ocelot *ocelot, int port,
 			     struct net_device *bridge);
+void ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
+			      struct net_device *bridge);
 int ocelot_fdb_dump(struct ocelot *ocelot, int port,
 		    dsa_fdb_dump_cb_t *cb, void *data);
 int ocelot_fdb_add(struct ocelot *ocelot, int port,
-- 
2.25.1



* [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (12 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 13/16] net: ocelot: replay switchdev events when joining bridge Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-19  8:52   ` DENG Qingfang
  2021-03-22 16:06   ` Florian Fainelli
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join Vladimir Oltean
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded Vladimir Oltean
  15 siblings, 2 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

DSA has recently gained the ability to deal gracefully with upper
interfaces it cannot offload, such as the bridge, bonding or team
drivers. When such uppers exist, the ports are still in standalone mode
as far as the hardware is concerned.

But when we deliver packets to the software bridge so that it can do
the forwarding, there is an unpleasant surprise: the bridge refuses to
forward them. This is because we unconditionally set
skb->offload_fwd_mark = true, which tells the bridge that the frames
were already forwarded in hardware by us.

Since dp->bridge_dev is populated only when the bridge is offloaded in
hardware, and not in the software fallback case, let's introduce a new
helper, callable from the tagger data path, which sets
skb->offload_fwd_mark to zero when there is no hardware offload for
bridging. This lets the bridge forward packets back to other interfaces
of our switch, if needed.

Without this change, sending a packet to the CPU for an unoffloaded
interface triggers the WARN_ON_ONCE in nbp_switchdev_frame_mark():

void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
			      struct sk_buff *skb)
{
	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
}
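
The receive-side decision after this patch can be summarized as a small pure function. The `struct rx_info` type and both helpers are hypothetical simplifications: `port_offloads_bridge` stands for `dp->bridge_dev != NULL`, and bridge_would_warn models only the condition checked by the WARN_ON_ONCE above, not the bridge code itself.

```c
#include <stdbool.h>

/* Hypothetical summary of what a tagger knows about a received frame. */
struct rx_info {
	bool trunk;			/* arrived via an offloaded LAG */
	bool trap;			/* trapped to the CPU, not forwarded */
	bool port_offloads_bridge;	/* stands for dp->bridge_dev != NULL */
};

/* Mark decision as made in dsa_rcv_ll() (tag_dsa.c) after this patch. */
static bool rx_offload_fwd_mark(const struct rx_info *rx)
{
	if (rx->trunk)
		return true;	/* skb->dev is not a DSA slave: mark as before */
	if (rx->trap)
		return false;	/* trapped frames keep the default mark of 0 */
	return rx->port_offloads_bridge;	/* dsa_default_offload_fwd_mark() */
}

/* Condition behind the WARN_ON_ONCE() in nbp_switchdev_frame_mark():
 * a marked frame arriving on a bridge port that has no offload mark.
 */
static bool bridge_would_warn(bool skb_mark, bool brport_offloaded)
{
	return skb_mark && !brport_offloaded;
}
```

In the software-fallback case the frame is no longer marked, so the condition behind the warning is never met, whereas the old unconditional mark (`skb_mark == true` on an unoffloaded bridge port) always met it.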

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 net/dsa/dsa_priv.h         | 14 ++++++++++++++
 net/dsa/tag_brcm.c         |  2 +-
 net/dsa/tag_dsa.c          | 15 +++++++++++----
 net/dsa/tag_hellcreek.c    |  2 +-
 net/dsa/tag_ksz.c          |  2 +-
 net/dsa/tag_lan9303.c      |  3 ++-
 net/dsa/tag_mtk.c          |  2 +-
 net/dsa/tag_ocelot.c       |  2 +-
 net/dsa/tag_ocelot_8021q.c |  2 +-
 net/dsa/tag_rtl4_a.c       |  2 +-
 net/dsa/tag_sja1105.c      |  4 ++--
 net/dsa/tag_xrs700x.c      |  2 +-
 12 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 92282de54230..b61bef79ce84 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -349,6 +349,20 @@ static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb)
 	return skb;
 }
 
+/* If the ingress port offloads the bridge, we mark the frame as autonomously
+ * forwarded by hardware, so the software bridge doesn't forward it twice, back
+ * to us, because we already did. However, if we're in fallback mode and we do
+ * software bridging, we are not offloading it, therefore the dp->bridge_dev
+ * pointer is not populated, and flooding needs to be done by software (we are
+ * effectively operating in standalone ports mode).
+ */
+static inline void dsa_default_offload_fwd_mark(struct sk_buff *skb)
+{
+	struct dsa_port *dp = dsa_slave_to_port(skb->dev);
+
+	skb->offload_fwd_mark = !!(dp->bridge_dev);
+}
+
 /* switch.c */
 int dsa_switch_register_notifier(struct dsa_switch *ds);
 void dsa_switch_unregister_notifier(struct dsa_switch *ds);
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index e2577a7dcbca..a8880b3bb106 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -150,7 +150,7 @@ static struct sk_buff *brcm_tag_rcv_ll(struct sk_buff *skb,
 	/* Remove Broadcom tag and update checksum */
 	skb_pull_rcsum(skb, BRCM_TAG_LEN);
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 7e7b7decdf39..09ab9c25e686 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -162,8 +162,8 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 				  u8 extra)
 {
+	bool trap = false, trunk = false;
 	int source_device, source_port;
-	bool trunk = false;
 	enum dsa_code code;
 	enum dsa_cmd cmd;
 	u8 *dsa_header;
@@ -174,8 +174,6 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 	cmd = dsa_header[0] >> 6;
 	switch (cmd) {
 	case DSA_CMD_FORWARD:
-		skb->offload_fwd_mark = 1;
-
 		trunk = !!(dsa_header[1] & 7);
 		break;
 
@@ -194,7 +192,6 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 			 * device (like a bridge) that forwarding has
 			 * already been done by hardware.
 			 */
-			skb->offload_fwd_mark = 1;
 			break;
 		case DSA_CODE_MGMT_TRAP:
 		case DSA_CODE_IGMP_MLD_TRAP:
@@ -202,6 +199,7 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 			/* Traps have, by definition, not been
 			 * forwarded by hardware, so don't mark them.
 			 */
+			trap = true;
 			break;
 		default:
 			/* Reserved code, this could be anything. Drop
@@ -235,6 +233,15 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 	if (!skb->dev)
 		return NULL;
 
+	/* When using LAG offload, skb->dev is not a DSA slave interface,
+	 * so we cannot call dsa_default_offload_fwd_mark and we need to
+	 * special-case it.
+	 */
+	if (trunk)
+		skb->offload_fwd_mark = true;
+	else if (!trap)
+		dsa_default_offload_fwd_mark(skb);
+
 	/* If the 'tagged' bit is set; convert the DSA tag to a 802.1Q
 	 * tag, and delete the ethertype (extra) if applicable. If the
 	 * 'tagged' bit is cleared; delete the DSA tag, and ethertype
diff --git a/net/dsa/tag_hellcreek.c b/net/dsa/tag_hellcreek.c
index a09805c8e1ab..c1ee6eefafe4 100644
--- a/net/dsa/tag_hellcreek.c
+++ b/net/dsa/tag_hellcreek.c
@@ -44,7 +44,7 @@ static struct sk_buff *hellcreek_rcv(struct sk_buff *skb,
 
 	pskb_trim_rcsum(skb, skb->len - HELLCREEK_TAG_LEN);
 
-	skb->offload_fwd_mark = true;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 4820dbcedfa2..8eee63a5b93b 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -24,7 +24,7 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb,
 
 	pskb_trim_rcsum(skb, skb->len - len);
 
-	skb->offload_fwd_mark = true;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index aa1318dccaf0..3a5494d2f7b1 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -115,7 +115,8 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
 	skb_pull_rcsum(skb, 2 + 2);
 	memmove(skb->data - ETH_HLEN, skb->data - (ETH_HLEN + LAN9303_TAG_LEN),
 		2 * ETH_ALEN);
-	skb->offload_fwd_mark = !(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU);
+	if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
+		dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index f9b2966d1936..92ab21d2ceca 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -92,7 +92,7 @@ static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (!skb->dev)
 		return NULL;
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index f9df9cac81c5..1deba3f1bb82 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -123,7 +123,7 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
 		 */
 		return NULL;
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 	skb->priority = qos_class;
 
 	/* Ocelot switches copy frames unmodified to the CPU. However, it is
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 5f3e8e124a82..447e1eeb357c 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -81,7 +81,7 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
 	if (!skb->dev)
 		return NULL;
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 	skb->priority = qos_class;
 
 	return skb;
diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
index e9176475bac8..1864e3a74df8 100644
--- a/net/dsa/tag_rtl4_a.c
+++ b/net/dsa/tag_rtl4_a.c
@@ -114,7 +114,7 @@ static struct sk_buff *rtl4a_tag_rcv(struct sk_buff *skb,
 		skb->data - ETH_HLEN - RTL4_A_HDR_LEN,
 		2 * ETH_ALEN);
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 50496013cdb7..45cdf64f0e88 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -295,8 +295,6 @@ static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
 	is_link_local = sja1105_is_link_local(skb);
 	is_meta = sja1105_is_meta_frame(skb);
 
-	skb->offload_fwd_mark = 1;
-
 	if (is_tagged) {
 		/* Normal traffic path. */
 		skb_push_rcsum(skb, ETH_HLEN);
@@ -339,6 +337,8 @@ static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
 		return NULL;
 	}
 
+	dsa_default_offload_fwd_mark(skb);
+
 	if (subvlan)
 		sja1105_decode_subvlan(skb, subvlan);
 
diff --git a/net/dsa/tag_xrs700x.c b/net/dsa/tag_xrs700x.c
index 858cdf9d2913..1208549f45c1 100644
--- a/net/dsa/tag_xrs700x.c
+++ b/net/dsa/tag_xrs700x.c
@@ -46,7 +46,7 @@ static struct sk_buff *xrs700x_rcv(struct sk_buff *skb, struct net_device *dev,
 		return NULL;
 
 	/* Frame is forwarded by hardware, don't forward in software. */
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
-- 
2.25.1



* [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (13 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-22 15:51   ` Florian Fainelli
  2021-03-22 15:58   ` Tobias Waldekranz
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded Vladimir Oltean
  15 siblings, 2 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

The DSA core has a layered structure, and even though we end up
returning 0 (success) to user space when setting a bonding/team upper
that can't be offloaded, some parts of the framework still need to
know that the offload did not take place.

For example, if dsa_switch_lag_join returns 0 as it currently does,
dsa_port_lag_join has no way to tell a successful offload from a
software fallback, and it will call dsa_port_bridge_join afterwards.
Then we'll think we're offloading the bridge master of the LAG, when in
fact we're not even offloading the LAG. In turn, this will make us set
skb->offload_fwd_mark = true, which is incorrect: the bridge will then
refuse to forward those frames in software.
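To make the layering argument concrete, here is a toy model in plain C (not kernel code) of the control flow described above. All model_* names are hypothetical stand-ins: model_switch_lag_join() plays dsa_switch_lag_join(), model_port_lag_join() plays dsa_port_lag_join() (which joins the LAG's bridge only on success), and model_changeupper() plays the upper layer that, by assumption, turns -EOPNOTSUPP into success for user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-port state; not the real struct dsa_port. */
struct model_port {
	bool driver_has_lag_join; /* does the driver implement .port_lag_join? */
	bool lag_offloaded;       /* stands in for dp->lag_dev being set */
	bool bridge_offloaded;    /* stands in for dp->bridge_dev being set */
};

/* Lowest layer: "patched" selects the behavior after this patch. */
static int model_switch_lag_join(struct model_port *p, bool patched)
{
	if (p->driver_has_lag_join) {
		p->lag_offloaded = true;
		return 0;
	}
	/* Before the patch the fallback looked like a success. */
	return patched ? -EOPNOTSUPP : 0;
}

/* Middle layer: only joins the LAG's bridge if the LAG join succeeded. */
static int model_port_lag_join(struct model_port *p, bool lag_in_bridge,
			       bool patched)
{
	int err = model_switch_lag_join(p, patched);

	if (err)
		return err;
	if (lag_in_bridge)
		p->bridge_offloaded = true;
	return 0;
}

/* Top layer: software fallback is still reported as success to user space. */
static int model_changeupper(struct model_port *p, bool lag_in_bridge,
			     bool patched)
{
	int err = model_port_lag_join(p, lag_in_bridge, patched);

	if (err == -EOPNOTSUPP)
		err = 0;
	return err;
}
```

In both variants user space sees 0; the difference is that, without the error code, the middle layer wrongly believes it offloads the bridge for a LAG it never offloaded.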

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/switch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 4b5da89dc27a..162bbb2f5cec 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -213,7 +213,7 @@ static int dsa_switch_lag_join(struct dsa_switch *ds,
 						   info->port, info->lag,
 						   info->info);
 
-	return 0;
+	return -EOPNOTSUPP;
 }
 
 static int dsa_switch_lag_leave(struct dsa_switch *ds,
@@ -226,7 +226,7 @@ static int dsa_switch_lag_leave(struct dsa_switch *ds,
 		return ds->ops->crosschip_lag_leave(ds, info->sw_index,
 						    info->port, info->lag);
 
-	return 0;
+	return -EOPNOTSUPP;
 }
 
 static bool dsa_switch_mdb_match(struct dsa_switch *ds, int port,
-- 
2.25.1



* [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded
  2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (14 preceding siblings ...)
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join Vladimir Oltean
@ 2021-03-18 23:18 ` Vladimir Oltean
  2021-03-22 16:30   ` Tobias Waldekranz
  15 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-18 23:18 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	netdev, linux-kernel, Roopa Prabhu, Nikolay Aleksandrov,
	Jiri Pirko, Ido Schimmel, Alexandre Belloni, UNGLinuxDriver,
	Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

On reception of an skb, the bridge checks if it was marked as 'already
forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
is, it puts a mark of its own on that skb, with the switchdev mark of
the ingress port. Then during forwarding, it enforces that the egress
port must have a different switchdev mark than the ingress one (this is
done in nbp_switchdev_allowed_egress).

Non-switchdev drivers don't report any physical switch id (either
through devlink or through .ndo_get_port_parent_id), therefore the
bridge assigns them a switchdev mark of 0, and packets coming from them
will always have skb->offload_fwd_mark = 0. So no egress restrictions
apply to them.
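The two checks described above can be modeled in a few lines of plain C. This is a simplified sketch, not the bridge code: model_frame_mark() loosely follows nbp_switchdev_frame_mark() (minus the WARN_ON_ONCE) and model_allowed_egress() loosely follows nbp_switchdev_allowed_egress(); the model_* types are hypothetical stand-ins for struct net_bridge_port and the skb/BR_INPUT_SKB_CB fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bridge port: mark 0 means "no switch offloads this port". */
struct model_brport {
	const char *name;
	int offload_fwd_mark;
};

/* Hypothetical skb: the driver flag plus the mark the bridge records. */
struct model_skb {
	bool offload_fwd_mark;   /* set by the driver on RX */
	int cb_offload_fwd_mark; /* set by the bridge on ingress */
};

/* On ingress: if the driver says "already forwarded in hardware",
 * remember the ingress port's switchdev mark.
 */
static void model_frame_mark(const struct model_brport *in,
			     struct model_skb *skb)
{
	if (skb->offload_fwd_mark && in->offload_fwd_mark)
		skb->cb_offload_fwd_mark = in->offload_fwd_mark;
}

/* On egress: refuse ports that share the recorded mark, because the
 * hardware has already delivered the frame there.
 */
static bool model_allowed_egress(const struct model_brport *out,
				 const struct model_skb *skb)
{
	return !skb->offload_fwd_mark ||
	       skb->cb_offload_fwd_mark != out->offload_fwd_mark;
}
```

For example, with swp0 and swp1 both carrying mark 1 and a non-switchdev eth0 carrying mark 0, a hardware-forwarded frame from swp0 is blocked towards swp1 but still reaches eth0, while frames from eth0 are never restricted.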

Problems appear due to the fact that DSA would like to perform software
fallback for bonding and team interfaces that the physical switch cannot
offload.

         +-- br0 -+
        /   / |    \
       /   /  |     \
      /   /   |      \
     /   /    |       \
    /   /     |        \
   /    |     |       bond0
  /     |     |      /    \
 swp0  swp1  swp2  swp3  swp4

There, it is desirable that the presence of swp3 and swp4 under a
non-offloaded LAG does not preclude us from doing hardware bridging
between swp0, swp1 and swp2. The bandwidth of the CPU is often high
enough that software bridging between {swp0,swp1,swp2} and bond0 is not
impractical.

But this creates an impossible paradox given the current way in which
port switchdev marks are assigned. When the driver receives a packet
from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
something.

- If we set it to 0, then the bridge will forward it towards swp1, swp2
  and bond0. But the switch has already forwarded it towards swp1 and
  swp2 (not to bond0, remember, that isn't offloaded, so as far as the
  switch is concerned, ports swp3 and swp4 are not looking up the FDB,
  and the entire bond0 is a destination that is strictly behind the
  CPU). But we don't want duplicated traffic towards swp1 and swp2, so
  it's not ok to set skb->offload_fwd_mark = 0.

- If we set it to 1, then the bridge will not forward the skb towards
  the ports with the same switchdev mark, i.e. not to swp1, swp2 and
  bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
  have forwarded the skb there.
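The dilemma above can be checked by counting copies. The following toy C model assumes, as the text says, that the recursive mark assignment gives bond0 the same mark as swp0-swp2, and that the ASIC floods a frame from swp0 only to swp1 and swp2. The arrays and the flood_from_swp0() helper are hypothetical and only illustrate the reasoning.

```c
#include <assert.h>
#include <stdbool.h>

#define NPORTS 4
enum { SWP0, SWP1, SWP2, BOND0 };

/* Marks as assigned by recursing through bond0's lowers: bond0 ends up
 * sharing the mark of the offloaded ports.
 */
static const int mark[NPORTS] = { 1, 1, 1, 1 };

/* Ports the ASIC floods to by itself; bond0 sits behind the CPU. */
static const bool hw_floods[NPORTS] = { false, true, true, false };

/* Count the copies of a frame flooded from swp0 that reach each port,
 * for a given skb->offload_fwd_mark chosen by the driver.
 */
static void flood_from_swp0(bool skb_mark, int copies[NPORTS])
{
	for (int i = 0; i < NPORTS; i++) {
		copies[i] = hw_floods[i] ? 1 : 0;
		if (i == SWP0)
			continue; /* never sent back out of the ingress port */
		/* same rule as nbp_switchdev_allowed_egress() */
		if (!skb_mark || mark[SWP0] != mark[i])
			copies[i]++;
	}
}
```

With skb_mark = 0, swp1 and swp2 each receive two copies (one from hardware, one from software); with skb_mark = 1, bond0 receives none. Neither value is correct while bond0 shares the mark of the offloaded ports.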

So the real issue is that bond0 will be assigned the same switchdev mark
as {swp0,swp1,swp2}, because the function that assigns switchdev marks
to bridge ports, nbp_switchdev_mark_set, recurses through bond0's lower
interfaces until it finds something that implements devlink.

A solution is to give the bridge explicit hints as to what switchdev
mark it should use for each port.
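A rough sketch of what explicit hints buy us, in plain C: only ports that a driver has declared as offloaded receive a non-zero mark, and ports with the same parent id share it. This loosely follows br_switchdev_mark_get() as modified by this patch, but the model_* names, the string ppid and the assign_marks() helper are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bridge port with an explicit offload hint. */
struct model_brport {
	bool offloaded; /* did a driver call switchdev_bridge_port_offload()? */
	char ppid[8];   /* parent switch id reported by that driver */
	int mark;
};

/* Reuse the mark of an earlier offloaded port with the same ppid,
 * otherwise allocate a new one; non-offloaded ports keep mark 0.
 */
static void assign_marks(struct model_brport *ports, int n)
{
	int next_mark = 0; /* per-bridge counter, like br->offload_fwd_mark */

	for (int i = 0; i < n; i++) {
		ports[i].mark = 0;
		if (!ports[i].offloaded)
			continue;
		for (int j = 0; j < i; j++) {
			if (ports[j].offloaded &&
			    !strcmp(ports[j].ppid, ports[i].ppid)) {
				ports[i].mark = ports[j].mark;
				break;
			}
		}
		if (!ports[i].mark)
			ports[i].mark = ++next_mark;
	}
}

/* Same egress rule as before, on plain integers. */
static bool model_allowed_egress(int ingress_mark, bool skb_mark,
				 int egress_mark)
{
	return !skb_mark || ingress_mark != egress_mark;
}
```

With swp0, swp1 and swp2 offloaded by the same switch and bond0 left unoffloaded, bond0 gets mark 0, so the driver can set skb->offload_fwd_mark = 1 for frames from swp0: they are still kept away from swp1 and swp2 but are forwarded to bond0.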

Currently, the bridging offload is very 'silent': a driver registers a
netdevice notifier, which is put on the netns's notifier chain, and
which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
bridge, and the lower is an interface it knows about (one registered by
this driver, normally). Then, from within that notifier, it does a bunch
of stuff behind the bridge's back, without the bridge necessarily
knowing that there's somebody offloading that port. It looks like this:

     ip link set swp0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v
        call_netdevice_notifiers
                  |
                  v
       dsa_slave_netdevice_event
                  |
                  v
        oh, hey! it's for me!
                  |
                  v
           .port_bridge_join

What we do to solve the conundrum is to be less silent, and emit a
notification back. Something like this:

     ip link set swp0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v                    bridge: Aye! I'll use this
        call_netdevice_notifiers           ^  ppid as the
                  |                        |  switchdev mark for
                  v                        |  this port, and zero
       dsa_slave_netdevice_event           |  if I got nothing.
                  |                        |
                  v                        |
        oh, hey! it's for me!              |
                  |                        |
                  v                        |
           .port_bridge_join               |
                  |                        |
                  +------------------------+
             switchdev_bridge_port_offload(swp0)

Then stacked interfaces (like bond0 on top of swp3/swp4) would be
treated differently in DSA, depending on whether we can or cannot
offload them.

The offload case:

    ip link set bond0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v                    bridge: Aye! I'll use this
        call_netdevice_notifiers           ^  ppid as the
                  |                        |  switchdev mark for
                  v                        |        bond0.
       dsa_slave_netdevice_event           | Coincidentally (or not),
                  |                        | bond0 and swp0, swp1, swp2
                  v                        | all have the same switchdev
        hmm, it's not quite for me,        | mark now, since the ASIC
         but my driver has already         | is able to forward towards
           called .port_lag_join           | all these ports in hw.
          for it, because I have           |
      a port with dp->lag_dev == bond0.    |
                  |                        |
                  v                        |
           .port_bridge_join               |
           for swp3 and swp4               |
                  |                        |
                  +------------------------+
            switchdev_bridge_port_offload(bond0)

And the non-offload case:

    ip link set bond0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v                    bridge waiting:
        call_netdevice_notifiers           ^  huh, switchdev_bridge_port_offload
                  |                        |  wasn't called, okay, I'll use a
                  v                        |  switchdev mark of zero for this one.
       dsa_slave_netdevice_event           :  Then packets received on swp0 will
                  |                        :  not be forwarded towards swp1, but
                  v                        :  they will towards bond0.
         it's not for me, but
       bond0 is an upper of swp3
      and swp4, but their dp->lag_dev
       is NULL because they couldn't
            offload it.
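The decision DSA makes in the two cases above can be summarized as a small C sketch. It is an illustrative model only: the model_* types and model_lag_bridge_join() are hypothetical, with lag_dev standing in for dp->lag_dev, and the real code goes through the netdevice notifier and .port_bridge_join rather than a simple loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_netdev {
	const char *name;
};

/* Hypothetical DSA port: lag_dev is NULL if the LAG could not be offloaded. */
struct model_dsa_port {
	const struct model_netdev *slave;
	const struct model_netdev *lag_dev;
	bool bridge_joined;
};

/* Called when brport (e.g. bond0) joins a bridge. Returns true if the
 * caller should then report brport as offloaded to the bridge.
 */
static bool model_lag_bridge_join(struct model_dsa_port *ports, size_t n,
				  const struct model_netdev *brport)
{
	bool offloaded = false;

	for (size_t i = 0; i < n; i++) {
		if (ports[i].lag_dev != brport)
			continue;
		ports[i].bridge_joined = true; /* stands in for .port_bridge_join */
		offloaded = true;
	}
	return offloaded;
}
```

In the offload case both swp3 and swp4 point at bond0 and the function returns true; in the non-offload case their lag_dev is NULL, nothing joins, and the bridge is never told that bond0 is offloaded.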

Since the lowers of a bridge port can come and go, a bridge port can
dynamically toggle between offloaded and unoffloaded, depending on its
current set of lowers. Therefore, we need an equivalent
switchdev_bridge_port_unoffload too.
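Because a bonding/team bridge port may be reported as offloaded once per lower, the offload state behaves like a reference count. Below is a userspace C sketch of that bookkeeping, loosely following nbp_switchdev_mark_set() and nbp_switchdev_mark_clear() from the diff below. The model_* names are hypothetical, the new_mark argument stands in for the result of br_switchdev_mark_get(), and assert() replaces the WARN_ON() checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bridge port with an offload reference count. */
struct model_brport {
	int offload_count;
	char ppid[8];
	int offload_fwd_mark;
};

static int model_offload(struct model_brport *p, const char *ppid,
			 int new_mark)
{
	if (p->offload_count) {
		/* One bridge port cannot span two physical switches. */
		if (strcmp(p->ppid, ppid))
			return -EBUSY;
		p->offload_count++; /* e.g. a second LAG lower */
		return 0;
	}
	snprintf(p->ppid, sizeof(p->ppid), "%s", ppid);
	p->offload_count = 1;
	p->offload_fwd_mark = new_mark;
	return 0;
}

static void model_unoffload(struct model_brport *p, const char *ppid)
{
	/* stands in for the WARN_ON() sanity checks */
	assert(!strcmp(p->ppid, ppid) && p->offload_count > 0);
	if (--p->offload_count)
		return; /* other lowers still offload this port */
	p->offload_fwd_mark = 0;
}
```

For example, when swp3 and swp4 both offload bond0 the count reaches 2; removing one of them keeps the mark, and only removing the last one drops bond0 back to mark 0.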

This patch changes the way any switchdev driver interacts with the
bridge. From now on, everybody needs to call switchdev_bridge_port_offload,
otherwise the bridge will treat the port as non-offloaded and allow
software flooding to other ports from the same ASIC.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  4 +-
 .../marvell/prestera/prestera_switchdev.c     |  7 ++
 .../mellanox/mlxsw/spectrum_switchdev.c       |  4 +-
 drivers/net/ethernet/mscc/ocelot_net.c        |  4 +-
 drivers/net/ethernet/rocker/rocker_ofdpa.c    |  8 +-
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  7 +-
 drivers/net/ethernet/ti/cpsw_new.c            |  6 +-
 include/linux/if_bridge.h                     | 16 ++++
 net/bridge/br_if.c                            | 11 +--
 net/bridge/br_private.h                       |  8 +-
 net/bridge/br_switchdev.c                     | 94 ++++++++++++++++---
 11 files changed, 138 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
index 2fd05dd18d46..f20556178e33 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
@@ -1518,7 +1518,7 @@ static int dpaa2_switch_port_bridge_join(struct net_device *netdev,
 	if (err)
 		goto err_egress_flood;
 
-	return 0;
+	return switchdev_bridge_port_offload(netdev, NULL);
 
 err_egress_flood:
 	dpaa2_switch_port_set_fdb(port_priv, NULL);
@@ -1552,6 +1552,8 @@ static int dpaa2_switch_port_bridge_leave(struct net_device *netdev)
 	struct ethsw_core *ethsw = port_priv->ethsw_data;
 	int err;
 
+	switchdev_bridge_port_unoffload(netdev);
+
 	/* First of all, fast age any learn FDB addresses on this switch port */
 	dpaa2_switch_port_fast_age(port_priv);
 
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
index 49e052273f30..0b0d5db7b85b 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
@@ -443,6 +443,10 @@ static int prestera_port_bridge_join(struct prestera_port *port,
 		goto err_brport_create;
 	}
 
+	err = switchdev_bridge_port_offload(port->dev, NULL);
+	if (err)
+		goto err_brport_offload;
+
 	if (bridge->vlan_enabled)
 		return 0;
 
@@ -453,6 +457,7 @@ static int prestera_port_bridge_join(struct prestera_port *port,
 	return 0;
 
 err_port_join:
+err_brport_offload:
 	prestera_bridge_port_put(br_port);
 err_brport_create:
 	prestera_bridge_put(bridge);
@@ -520,6 +525,8 @@ static void prestera_port_bridge_leave(struct prestera_port *port,
 	if (!br_port)
 		return;
 
+	switchdev_bridge_port_unoffload(port->dev);
+
 	bridge = br_port->bridge;
 
 	if (bridge->vlan_enabled)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 23b7e8d6386b..7fa0b3653819 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2326,7 +2326,7 @@ int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
 	if (err)
 		goto err_port_join;
 
-	return 0;
+	return switchdev_bridge_port_offload(brport_dev, extack);
 
 err_port_join:
 	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
@@ -2348,6 +2348,8 @@ void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
 	if (!bridge_port)
 		return;
 
+	switchdev_bridge_port_unoffload(brport_dev);
+
 	bridge_device->ops->port_leave(bridge_device, bridge_port,
 				       mlxsw_sp_port);
 	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index d38ffc7cf5f0..b917d9dd8a6a 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1213,7 +1213,7 @@ static int ocelot_netdevice_bridge_join(struct net_device *dev,
 	if (err)
 		goto err_switchdev_sync;
 
-	return 0;
+	return switchdev_bridge_port_offload(brport_dev, extack);
 
 err_switchdev_sync:
 	ocelot_port_bridge_leave(ocelot, port, bridge);
@@ -1234,6 +1234,8 @@ static int ocelot_netdevice_bridge_leave(struct net_device *dev,
 	if (err)
 		return err;
 
+	switchdev_bridge_port_unoffload(brport_dev);
+
 	ocelot_port_bridge_leave(ocelot, port, bridge);
 
 	return 0;
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 967a634ee9ac..9b6d7cac112b 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2592,13 +2592,19 @@ static int ofdpa_port_bridge_join(struct ofdpa_port *ofdpa_port,
 
 	ofdpa_port->bridge_dev = bridge;
 
-	return ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
+	err = ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
+	if (err)
+		return err;
+
+	return switchdev_bridge_port_offload(ofdpa_port->dev, NULL);
 }
 
 static int ofdpa_port_bridge_leave(struct ofdpa_port *ofdpa_port)
 {
 	int err;
 
+	switchdev_bridge_port_unoffload(ofdpa_port->dev);
+
 	err = ofdpa_port_vlan_del(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
 	if (err)
 		return err;
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 638d7b03be4b..fe2e38971acc 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -7,6 +7,7 @@
 
 #include <linux/clk.h>
 #include <linux/etherdevice.h>
+#include <linux/if_bridge.h>
 #include <linux/if_vlan.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
@@ -2082,6 +2083,7 @@ static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
 {
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
 	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
+	int err;
 
 	if (!common->br_members) {
 		common->hw_bridge_dev = br_ndev;
@@ -2097,7 +2099,8 @@ static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
 
 	am65_cpsw_port_offload_fwd_mark_update(common);
 
-	return NOTIFY_DONE;
+	err = switchdev_bridge_port_offload(ndev, NULL);
+	return notifier_to_errno(err);
 }
 
 static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
@@ -2105,6 +2108,8 @@ static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
 	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
 
+	switchdev_bridge_port_unoffload(ndev);
+
 	common->br_members &= ~BIT(priv->port->port_id);
 
 	am65_cpsw_port_offload_fwd_mark_update(common);
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 58a64313ac00..6347532fb39d 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1508,6 +1508,7 @@ static int cpsw_netdevice_port_link(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
+	int err;
 
 	if (!cpsw->br_members) {
 		cpsw->hw_bridge_dev = br_ndev;
@@ -1523,7 +1524,8 @@ static int cpsw_netdevice_port_link(struct net_device *ndev,
 
 	cpsw_port_offload_fwd_mark_update(cpsw);
 
-	return NOTIFY_DONE;
+	err = switchdev_bridge_port_offload(ndev, NULL);
+	return notifier_to_errno(err);
 }
 
 static void cpsw_netdevice_port_unlink(struct net_device *ndev)
@@ -1531,6 +1533,8 @@ static void cpsw_netdevice_port_unlink(struct net_device *ndev)
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
 
+	switchdev_bridge_port_unoffload(ndev);
+
 	cpsw->br_members &= ~BIT(priv->emac_port);
 
 	cpsw_port_offload_fwd_mark_update(cpsw);
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index ea176c508c0d..4fbee6d5fc16 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -196,4 +196,20 @@ static inline int br_fdb_replay(struct net_device *br_dev,
 }
 #endif
 
+#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_NET_SWITCHDEV)
+int switchdev_bridge_port_offload(struct net_device *dev,
+				  struct netlink_ext_ack *extack);
+int switchdev_bridge_port_unoffload(struct net_device *dev);
+#else
+int switchdev_bridge_port_offload(struct net_device *dev,
+				  struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+int switchdev_bridge_port_unoffload(struct net_device *dev)
+{
+}
+#endif
+
 #endif
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index f7d2f472ae24..930a09f27e0d 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -643,10 +643,6 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 	if (err)
 		goto err5;
 
-	err = nbp_switchdev_mark_set(p);
-	if (err)
-		goto err6;
-
 	dev_disable_lro(dev);
 
 	list_add_rcu(&p->list, &br->port_list);
@@ -671,13 +667,13 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 		 */
 		err = dev_pre_changeaddr_notify(br->dev, dev->dev_addr, extack);
 		if (err)
-			goto err7;
+			goto err6;
 	}
 
 	err = nbp_vlan_init(p, extack);
 	if (err) {
 		netdev_err(dev, "failed to initialize vlan filtering on this port\n");
-		goto err7;
+		goto err6;
 	}
 
 	spin_lock_bh(&br->lock);
@@ -700,11 +696,10 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 
 	return 0;
 
-err7:
+err6:
 	list_del_rcu(&p->list);
 	br_fdb_delete_by_port(br, p, 0, 1);
 	nbp_update_port_count(br);
-err6:
 	netdev_upper_dev_unlink(dev, br->dev);
 err5:
 	dev->priv_flags &= ~IFF_BRIDGE_PORT;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index d7d167e10b70..1982b5887d0f 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -326,8 +326,10 @@ struct net_bridge_port {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	struct netpoll			*np;
 #endif
+	int				offload_count;
 #ifdef CONFIG_NET_SWITCHDEV
 	int				offload_fwd_mark;
+	struct netdev_phys_item_id	ppid;
 #endif
 	u16				group_fwd_mask;
 	u16				backup_redirected_cnt;
@@ -1572,7 +1574,6 @@ static inline void br_sysfs_delbr(struct net_device *dev) { return; }
 
 /* br_switchdev.c */
 #ifdef CONFIG_NET_SWITCHDEV
-int nbp_switchdev_mark_set(struct net_bridge_port *p);
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb);
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
@@ -1592,11 +1593,6 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
 	skb->offload_fwd_mark = 0;
 }
 #else
-static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
-{
-	return 0;
-}
-
 static inline void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 					    struct sk_buff *skb)
 {
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index b89503832fcc..4cf7902f056c 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -8,37 +8,109 @@
 
 #include "br_private.h"
 
-static int br_switchdev_mark_get(struct net_bridge *br, struct net_device *dev)
+static int br_switchdev_mark_get(struct net_bridge *br,
+				 struct net_bridge_port *new_nbp)
 {
 	struct net_bridge_port *p;
 
 	/* dev is yet to be added to the port list. */
 	list_for_each_entry(p, &br->port_list, list) {
-		if (netdev_port_same_parent_id(dev, p->dev))
+		if (!p->offload_count)
+			continue;
+
+		if (netdev_phys_item_id_same(&p->ppid, &new_nbp->ppid))
 			return p->offload_fwd_mark;
 	}
 
 	return ++br->offload_fwd_mark;
 }
 
-int nbp_switchdev_mark_set(struct net_bridge_port *p)
+static int nbp_switchdev_mark_set(struct net_bridge_port *p,
+				  struct netdev_phys_item_id ppid,
+				  struct netlink_ext_ack *extack)
+{
+	if (p->offload_count) {
+		/* Prevent unsupported configurations such as a bridge port
+		 * which is a bonding interface, and the member ports are from
+		 * different hardware switches.
+		 */
+		if (!netdev_phys_item_id_same(&p->ppid, &ppid)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Same bridge port cannot be offloaded by two physical switches");
+			return -EBUSY;
+		}
+		/* Be tolerant with drivers that call switchdev_bridge_port_offload()
+		 * more than once for the same bridge port, such as when the
+		 * bridge port is an offloaded bonding/team interface.
+		 */
+		p->offload_count++;
+		return 0;
+	}
+
+	p->ppid = ppid;
+	p->offload_count = 1;
+	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p);
+
+	return 0;
+}
+
+static void nbp_switchdev_mark_clear(struct net_bridge_port *p,
+				     struct netdev_phys_item_id ppid)
+{
+	if (WARN_ON(!netdev_phys_item_id_same(&p->ppid, &ppid)))
+		return;
+	if (WARN_ON(!p->offload_count))
+		return;
+
+	p->offload_count--;
+	if (p->offload_count)
+		return;
+
+	p->offload_fwd_mark = 0;
+}
+
+/* Let the bridge know that this port is offloaded, so that it can use
+ * the port parent id obtained by recursion to determine the bridge
+ * port's switchdev mark.
+ */
+int switchdev_bridge_port_offload(struct net_device *dev,
+				  struct netlink_ext_ack *extack)
 {
-	struct netdev_phys_item_id ppid = { };
+	struct netdev_phys_item_id ppid;
+	struct net_bridge_port *p;
 	int err;
 
-	ASSERT_RTNL();
+	p = br_port_get_rtnl(dev);
+	if (!p)
+		return -ENODEV;
 
-	err = dev_get_port_parent_id(p->dev, &ppid, true);
-	if (err) {
-		if (err == -EOPNOTSUPP)
-			return 0;
+	err = dev_get_port_parent_id(dev, &ppid, true);
+	if (err)
+		return err;
+
+	return nbp_switchdev_mark_set(p, ppid, extack);
+}
+EXPORT_SYMBOL_GPL(switchdev_bridge_port_offload);
+
+int switchdev_bridge_port_unoffload(struct net_device *dev)
+{
+	struct netdev_phys_item_id ppid;
+	struct net_bridge_port *p;
+	int err;
+
+	p = br_port_get_rtnl(dev);
+	if (!p)
+		return -ENODEV;
+
+	err = dev_get_port_parent_id(dev, &ppid, true);
+	if (err)
 		return err;
-	}
 
-	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev);
+	nbp_switchdev_mark_clear(p, ppid);
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(switchdev_bridge_port_unoffload);
 
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb)
-- 
2.25.1



* Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge Vladimir Oltean
@ 2021-03-19  8:52   ` DENG Qingfang
  2021-03-19  9:06     ` Vladimir Oltean
  2021-03-22 16:06   ` Florian Fainelli
  1 sibling, 1 reply; 52+ messages in thread
From: DENG Qingfang @ 2021-03-19  8:52 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Tobias Waldekranz, netdev, linux-kernel,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Alexandre Belloni, UNGLinuxDriver, Vadym Kochan, Taras Chornyi,
	Grygorii Strashko, Vignesh Raghavendra, Ioana Ciornei,
	Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18:27AM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> DSA has gained the recent ability to deal gracefully with upper
> interfaces it cannot offload, such as the bridge, bonding or team
> drivers. When such uppers exist, the ports are still in standalone mode
> as far as the hardware is concerned.
> 
> But when we deliver packets to the software bridge in order for that to
> do the forwarding, there is an unpleasant surprise in that the bridge
> will refuse to forward them. This is because we unconditionally set
> skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
> were already forwarded in hardware by us.
> 
> Since dp->bridge_dev is populated only when there is hardware offload
> for it, but not in the software fallback case, let's introduce a new
> helper that can be called from the tagger data path which sets the
> skb->offload_fwd_mark accordingly to zero when there is no hardware
> offload for bridging. This lets the bridge forward packets back to other
> interfaces of our switch, if needed.
> 
> Without this change, sending a packet to the CPU for an unoffloaded
> interface triggers this WARN_ON:
> 
> void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
> 			      struct sk_buff *skb)
> {
> 	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
> 		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
> }
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  net/dsa/dsa_priv.h         | 14 ++++++++++++++
>  net/dsa/tag_brcm.c         |  2 +-
>  net/dsa/tag_dsa.c          | 15 +++++++++++----
>  net/dsa/tag_hellcreek.c    |  2 +-
>  net/dsa/tag_ksz.c          |  2 +-
>  net/dsa/tag_lan9303.c      |  3 ++-
>  net/dsa/tag_mtk.c          |  2 +-
>  net/dsa/tag_ocelot.c       |  2 +-
>  net/dsa/tag_ocelot_8021q.c |  2 +-
>  net/dsa/tag_rtl4_a.c       |  2 +-
>  net/dsa/tag_sja1105.c      |  4 ++--
>  net/dsa/tag_xrs700x.c      |  2 +-
>  12 files changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 92282de54230..b61bef79ce84 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -349,6 +349,20 @@ static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb)
>  	return skb;
>  }
>  
> +/* If the ingress port offloads the bridge, we mark the frame as autonomously
> + * forwarded by hardware, so the software bridge doesn't forward it twice, back
> + * to us, because we already did. However, if we're in fallback mode and we do
> + * software bridging, we are not offloading it, therefore the dp->bridge_dev
> + * pointer is not populated, and flooding needs to be done by software (we are
> + * effectively operating in standalone ports mode).
> + */
> +static inline void dsa_default_offload_fwd_mark(struct sk_buff *skb)
> +{
> +	struct dsa_port *dp = dsa_slave_to_port(skb->dev);
> +
> +	skb->offload_fwd_mark = !!(dp->bridge_dev);
> +}

So offload_fwd_mark is set iff the ingress port offloads the bridge.
Consider this set up on a switch which does NOT support LAG offload:

        +----- br0 -----+
        |               |
      bond0             |
        |               |         (Linux interfaces)
    +---+---+       +---+---+
    |       |       |       |
+-------+-------+-------+-------+
| sw0p0 | sw0p1 | sw0p2 | sw0p3 |
+-------+-------+-------+-------+
    |       |       |       |
    +---A---+       B       C     (LAN clients)


sw0p0 and sw0p1 should be in standalone mode (offload_fwd_mark = 0),
while sw0p2 and sw0p3 are offloaded (offload_fwd_mark = 1).

When a frame is sent into sw0p2 or sw0p3, can it be forwarded to sw0p0 or
sw0p1?

Setting offload_fwd_mark to 0 could also cause packet loss on
switches that perform learning on the CPU port:

When client C is talking to client A, frames from C will:
1. Enter sw0p3, where the switch will learn C is reachable via sw0p3.
2. Be sent to the CPU port and bounced back, where the switch will learn C is
   reachable via the CPU port, overwriting the previously learned FDB entry.
3. Be sent out of either sw0p0 or sw0p1, and reach its destination - A.

During step 2, if client B sends a frame to C, the frame will be forwarded to
the CPU, which will think it has already been forwarded by the switch, and
refuse to forward it back, resulting in packet loss.

Many switch TX tags (mtk, qca, rtl) have a bit to disable source address
learning on a per-frame basis. We should utilise that.
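The race described above can be reduced to a toy C model with a single FDB entry for client C. It is only an illustration of the reasoning: the model_switch type, c_talks_to_a(), b_talks_to_c() and the tx_no_learn flag (standing in for a per-frame "disable learning" bit in the TX tag) are hypothetical and do not correspond to any particular tagger.

```c
#include <assert.h>
#include <stdbool.h>

enum port { SW0P0, SW0P1, SW0P2, SW0P3, CPU };

/* Hypothetical single-entry FDB: where the switch believes C lives. */
struct model_switch {
	enum port fdb_c;
};

/* C -> A: learned on sw0p3, trapped to the CPU (bond0 is not offloaded),
 * then sent back out by the CPU. Unless the TX tag suppresses learning,
 * the switch relearns C behind the CPU port.
 */
static void c_talks_to_a(struct model_switch *sw, bool tx_no_learn)
{
	sw->fdb_c = SW0P3;     /* step 1 */
	if (!tx_no_learn)
		sw->fdb_c = CPU; /* step 2 */
}

/* B -> C from sw0p2. If the FDB points at the CPU, the frame reaches the
 * CPU with offload_fwd_mark set and the software bridge will not send it
 * to sw0p3, which shares sw0p2's switchdev mark. Returns true if delivered.
 */
static bool b_talks_to_c(const struct model_switch *sw)
{
	return sw->fdb_c == SW0P3; /* otherwise lost */
}
```

With learning left enabled on the bounced frame, B's frame to C is lost; with the per-frame learning-disable bit set, the FDB keeps pointing at sw0p3 and the frame is forwarded in hardware.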

> +
>  /* switch.c */
>  int dsa_switch_register_notifier(struct dsa_switch *ds);
>  void dsa_switch_unregister_notifier(struct dsa_switch *ds);
> diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
> index e2577a7dcbca..a8880b3bb106 100644
> --- a/net/dsa/tag_brcm.c
> +++ b/net/dsa/tag_brcm.c
> @@ -150,7 +150,7 @@ static struct sk_buff *brcm_tag_rcv_ll(struct sk_buff *skb,
>  	/* Remove Broadcom tag and update checksum */
>  	skb_pull_rcsum(skb, BRCM_TAG_LEN);
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
> index 7e7b7decdf39..09ab9c25e686 100644
> --- a/net/dsa/tag_dsa.c
> +++ b/net/dsa/tag_dsa.c
> @@ -162,8 +162,8 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
>  static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  				  u8 extra)
>  {
> +	bool trap = false, trunk = false;
>  	int source_device, source_port;
> -	bool trunk = false;
>  	enum dsa_code code;
>  	enum dsa_cmd cmd;
>  	u8 *dsa_header;
> @@ -174,8 +174,6 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  	cmd = dsa_header[0] >> 6;
>  	switch (cmd) {
>  	case DSA_CMD_FORWARD:
> -		skb->offload_fwd_mark = 1;
> -
>  		trunk = !!(dsa_header[1] & 7);
>  		break;
>  
> @@ -194,7 +192,6 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  			 * device (like a bridge) that forwarding has
>  			 * already been done by hardware.
>  			 */
> -			skb->offload_fwd_mark = 1;
>  			break;
>  		case DSA_CODE_MGMT_TRAP:
>  		case DSA_CODE_IGMP_MLD_TRAP:
> @@ -202,6 +199,7 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  			/* Traps have, by definition, not been
>  			 * forwarded by hardware, so don't mark them.
>  			 */
> +			trap = true;
>  			break;
>  		default:
>  			/* Reserved code, this could be anything. Drop
> @@ -235,6 +233,15 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  	if (!skb->dev)
>  		return NULL;
>  
> +	/* When using LAG offload, skb->dev is not a DSA slave interface,
> +	 * so we cannot call dsa_default_offload_fwd_mark and we need to
> +	 * special-case it.
> +	 */
> +	if (trunk)
> +		skb->offload_fwd_mark = true;
> +	else if (!trap)
> +		dsa_default_offload_fwd_mark(skb);
> +
>  	/* If the 'tagged' bit is set; convert the DSA tag to a 802.1Q
>  	 * tag, and delete the ethertype (extra) if applicable. If the
>  	 * 'tagged' bit is cleared; delete the DSA tag, and ethertype
> diff --git a/net/dsa/tag_hellcreek.c b/net/dsa/tag_hellcreek.c
> index a09805c8e1ab..c1ee6eefafe4 100644
> --- a/net/dsa/tag_hellcreek.c
> +++ b/net/dsa/tag_hellcreek.c
> @@ -44,7 +44,7 @@ static struct sk_buff *hellcreek_rcv(struct sk_buff *skb,
>  
>  	pskb_trim_rcsum(skb, skb->len - HELLCREEK_TAG_LEN);
>  
> -	skb->offload_fwd_mark = true;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
> index 4820dbcedfa2..8eee63a5b93b 100644
> --- a/net/dsa/tag_ksz.c
> +++ b/net/dsa/tag_ksz.c
> @@ -24,7 +24,7 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb,
>  
>  	pskb_trim_rcsum(skb, skb->len - len);
>  
> -	skb->offload_fwd_mark = true;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
> index aa1318dccaf0..3a5494d2f7b1 100644
> --- a/net/dsa/tag_lan9303.c
> +++ b/net/dsa/tag_lan9303.c
> @@ -115,7 +115,8 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
>  	skb_pull_rcsum(skb, 2 + 2);
>  	memmove(skb->data - ETH_HLEN, skb->data - (ETH_HLEN + LAN9303_TAG_LEN),
>  		2 * ETH_ALEN);
> -	skb->offload_fwd_mark = !(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU);
> +	if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
> +		dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
> index f9b2966d1936..92ab21d2ceca 100644
> --- a/net/dsa/tag_mtk.c
> +++ b/net/dsa/tag_mtk.c
> @@ -92,7 +92,7 @@ static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if (!skb->dev)
>  		return NULL;
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
> index f9df9cac81c5..1deba3f1bb82 100644
> --- a/net/dsa/tag_ocelot.c
> +++ b/net/dsa/tag_ocelot.c
> @@ -123,7 +123,7 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
>  		 */
>  		return NULL;
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  	skb->priority = qos_class;
>  
>  	/* Ocelot switches copy frames unmodified to the CPU. However, it is
> diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
> index 5f3e8e124a82..447e1eeb357c 100644
> --- a/net/dsa/tag_ocelot_8021q.c
> +++ b/net/dsa/tag_ocelot_8021q.c
> @@ -81,7 +81,7 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
>  	if (!skb->dev)
>  		return NULL;
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  	skb->priority = qos_class;
>  
>  	return skb;
> diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
> index e9176475bac8..1864e3a74df8 100644
> --- a/net/dsa/tag_rtl4_a.c
> +++ b/net/dsa/tag_rtl4_a.c
> @@ -114,7 +114,7 @@ static struct sk_buff *rtl4a_tag_rcv(struct sk_buff *skb,
>  		skb->data - ETH_HLEN - RTL4_A_HDR_LEN,
>  		2 * ETH_ALEN);
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 50496013cdb7..45cdf64f0e88 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -295,8 +295,6 @@ static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
>  	is_link_local = sja1105_is_link_local(skb);
>  	is_meta = sja1105_is_meta_frame(skb);
>  
> -	skb->offload_fwd_mark = 1;
> -
>  	if (is_tagged) {
>  		/* Normal traffic path. */
>  		skb_push_rcsum(skb, ETH_HLEN);
> @@ -339,6 +337,8 @@ static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
>  		return NULL;
>  	}
>  
> +	dsa_default_offload_fwd_mark(skb);
> +
>  	if (subvlan)
>  		sja1105_decode_subvlan(skb, subvlan);
>  
> diff --git a/net/dsa/tag_xrs700x.c b/net/dsa/tag_xrs700x.c
> index 858cdf9d2913..1208549f45c1 100644
> --- a/net/dsa/tag_xrs700x.c
> +++ b/net/dsa/tag_xrs700x.c
> @@ -46,7 +46,7 @@ static struct sk_buff *xrs700x_rcv(struct sk_buff *skb, struct net_device *dev,
>  		return NULL;
>  
>  	/* Frame is forwarded by hardware, don't forward in software. */
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> -- 
> 2.25.1
> 

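A minimal userspace sketch of the receive-side decision that the quoted tag_dsa.c hunk makes. It assumes a hypothetical `struct port_model` that holds only the `bridge_dev` pointer read by dsa_default_offload_fwd_mark(). This is a model of the logic, not kernel code: trunk frames are always marked, trapped frames never are, and every other frame follows whether the ingress port offloads a bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dsa_port: only the field the helper reads. */
struct port_model {
	const void *bridge_dev;	/* non-NULL only when bridging is offloaded */
};

/* Mirrors dsa_default_offload_fwd_mark(): mark only if the port offloads a bridge. */
static bool default_offload_fwd_mark(const struct port_model *dp)
{
	return dp->bridge_dev != NULL;
}

/* Mirrors the tag_dsa.c receive logic:
 * - trunk (LAG) frames: skb->dev is not a DSA slave, so mark unconditionally;
 * - trapped frames: never forwarded by hardware, so leave the mark clear;
 * - everything else: defer to the per-port default.
 */
static bool dsa_rcv_fwd_mark(bool trunk, bool trap, const struct port_model *dp)
{
	if (trunk)
		return true;
	if (trap)
		return false;
	return default_offload_fwd_mark(dp);
}
```

For a port whose bridge_dev is NULL, dsa_rcv_fwd_mark(false, false, &p) returns false, which is the software-fallback case where the bridge must do the forwarding itself.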

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-19  8:52   ` DENG Qingfang
@ 2021-03-19  9:06     ` Vladimir Oltean
  2021-03-19  9:29       ` DENG Qingfang
  0 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-19  9:06 UTC (permalink / raw)
  To: DENG Qingfang
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Tobias Waldekranz, netdev, linux-kernel,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Alexandre Belloni, UNGLinuxDriver, Vadym Kochan, Taras Chornyi,
	Grygorii Strashko, Vignesh Raghavendra, Ioana Ciornei,
	Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 04:52:31PM +0800, DENG Qingfang wrote:
> On Fri, Mar 19, 2021 at 01:18:27AM +0200, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > DSA has gained the recent ability to deal gracefully with upper
> > interfaces it cannot offload, such as the bridge, bonding or team
> > drivers. When such uppers exist, the ports are still in standalone mode
> > as far as the hardware is concerned.
> > 
> > But when we deliver packets to the software bridge in order for that to
> > do the forwarding, there is an unpleasant surprise in that the bridge
> > will refuse to forward them. This is because we unconditionally set
> > skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
> > were already forwarded in hardware by us.
> > 
> > Since dp->bridge_dev is populated only when there is hardware offload
> > for it, but not in the software fallback case, let's introduce a new
> > helper, callable from the tagger data path, which sets
> > skb->offload_fwd_mark to zero when there is no hardware
> > offload for bridging. This lets the bridge forward packets back to other
> > interfaces of our switch, if needed.
> > 
> > Without this change, sending a packet to the CPU for an unoffloaded
> > interface triggers this WARN_ON:
> > 
> > void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
> > 			      struct sk_buff *skb)
> > {
> > 	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
> > 		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
> > }
> > 
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
> > ---
> >  net/dsa/dsa_priv.h         | 14 ++++++++++++++
> >  net/dsa/tag_brcm.c         |  2 +-
> >  net/dsa/tag_dsa.c          | 15 +++++++++++----
> >  net/dsa/tag_hellcreek.c    |  2 +-
> >  net/dsa/tag_ksz.c          |  2 +-
> >  net/dsa/tag_lan9303.c      |  3 ++-
> >  net/dsa/tag_mtk.c          |  2 +-
> >  net/dsa/tag_ocelot.c       |  2 +-
> >  net/dsa/tag_ocelot_8021q.c |  2 +-
> >  net/dsa/tag_rtl4_a.c       |  2 +-
> >  net/dsa/tag_sja1105.c      |  4 ++--
> >  net/dsa/tag_xrs700x.c      |  2 +-
> >  12 files changed, 37 insertions(+), 15 deletions(-)
> > 
> > diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> > index 92282de54230..b61bef79ce84 100644
> > --- a/net/dsa/dsa_priv.h
> > +++ b/net/dsa/dsa_priv.h
> > @@ -349,6 +349,20 @@ static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb)
> >  	return skb;
> >  }
> >  
> > +/* If the ingress port offloads the bridge, we mark the frame as autonomously
> > + * forwarded by hardware, so the software bridge doesn't forward it twice, back
> > + * to us, because we already did. However, if we're in fallback mode and we do
> > + * software bridging, we are not offloading it, therefore the dp->bridge_dev
> > + * pointer is not populated, and flooding needs to be done by software (we are
> > + * effectively operating in standalone ports mode).
> > + */
> > +static inline void dsa_default_offload_fwd_mark(struct sk_buff *skb)
> > +{
> > +	struct dsa_port *dp = dsa_slave_to_port(skb->dev);
> > +
> > +	skb->offload_fwd_mark = !!(dp->bridge_dev);
> > +}
> 
> So offload_fwd_mark is set iff the ingress port offloads the bridge.
> Consider this set up on a switch which does NOT support LAG offload:
> 
>         +----- br0 -----+
>         |               |
>       bond0             |
>         |               |         (Linux interfaces)
>     +---+---+       +---+---+
>     |       |       |       |
> +-------+-------+-------+-------+
> | sw0p0 | sw0p1 | sw0p2 | sw0p3 |
> +-------+-------+-------+-------+
>     |       |       |       |
>     +---A---+       B       C     (LAN clients)
> 
> 
> sw0p0 and sw0p1 should be in standalone mode (offload_fwd_mark = 0),
> while sw0p2 and sw0p3 are offloaded (offload_fwd_mark = 1).
> 
> When a frame is sent into sw0p2 or sw0p3, can it be forwarded to sw0p0 or
> sw0p1?

bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
				  const struct sk_buff *skb)
{
	return !skb->offload_fwd_mark ||
	       BR_INPUT_SKB_CB(skb)->offload_fwd_mark != p->offload_fwd_mark;
}

where p->offload_fwd_mark is the mark of the egress port, and
BR_INPUT_SKB_CB(skb) is the mark of the ingress port, assigned here:

void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
			      struct sk_buff *skb)
{
	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
}

Basically, sw0p0 and sw0p1 have a switchdev mark of 0, and sw0p2 and
sw0p3 have a non-zero switchdev mark, so nbp_switchdev_allowed_egress
returns true in both directions, regardless of the value of
skb->offload_fwd_mark.

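The argument above can be checked with a small model of the two quoted bridge helpers. It is a sketch under simplifying assumptions: `struct brport_model` is hypothetical, bond0 is the standalone bridge port (mark 0), sw0p2 and sw0p3 share one non-zero mark, and the skb control block is assumed to start zeroed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bridge port: only its switchdev mark matters here.
 * 0 means "not offloaded"; equal non-zero marks mean "same switch". */
struct brport_model {
	int offload_fwd_mark;
};

/* Models nbp_switchdev_frame_mark(): the ingress port's mark is recorded
 * only if the driver set skb->offload_fwd_mark and the port has a mark
 * (the kernel WARNs if the port has none). */
static int frame_mark(const struct brport_model *ingress, bool skb_mark)
{
	if (skb_mark && ingress->offload_fwd_mark)
		return ingress->offload_fwd_mark;
	return 0;
}

/* Models nbp_switchdev_allowed_egress(). */
static bool allowed_egress(const struct brport_model *egress, bool skb_mark,
			   int cb_mark)
{
	return !skb_mark || cb_mark != egress->offload_fwd_mark;
}
```

Traffic between bond0 and sw0p2/sw0p3 is allowed in both directions, while a marked frame from sw0p2 is not sent back out of sw0p3, because the hardware already forwarded it.
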
> Setting offload_fwd_mark to 0 could also cause potential packet loss on
> switches that perform learning on the CPU port:
> 
> When client C is talking to client A, frames from C will:
> 1. Enter sw0p3, where the switch will learn C is reachable via sw0p3.
> 2. Be sent to the CPU port and bounced back, where the switch will learn C is
>    reachable via the CPU port, overwriting the previous learned FDB entry.
> 3. Be sent out of either sw0p0 or sw0p1, and reach its destination - A.
> 
> During step 2, if client B sends a frame to C, the frame will be forwarded to
> the CPU, which will think it is already forwarded by the switch, and refuse to
> forward it back, resulting in packet loss.
> 
> Many switch TX tags (mtk, qca, rtl) have a bit to disable source address
> learning on a per-frame basis. We should utilise that.

This is a good point actually, which I thought about, but did not give a
lot of importance to for the moment. Either we go full steam ahead with
assisted learning on the CPU port for everybody, and we selectively
learn the addresses relevant to the bridging function only, or we do
what you say, but then it will be a little bit more complicated IMO, and
have hardware dependencies, which isn't as nice.

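A hypothetical model of the loss scenario described above, to make the timing explicit. The FDB layout, the port names and `b_reaches_c()` are illustrative assumptions, not driver code. The only rule borrowed from the bridge is that a marked frame is not sent back out of a port with the same switchdev mark.

```c
#include <assert.h>
#include <stdbool.h>

enum hw_port { SW0P0, SW0P1, SW0P2, SW0P3, CPU_PORT };

/* Hypothetical hardware FDB for clients 'A'..'C': the port each client's
 * MAC address was last learned behind. */
struct fdb_model {
	enum hw_port where[3];
};

static void fdb_learn(struct fdb_model *fdb, char client, enum hw_port port)
{
	fdb->where[client - 'A'] = port;
}

/* Software bridge marks: bond0 (sw0p0/sw0p1) is not offloaded,
 * sw0p2 and sw0p3 share one non-zero mark. */
static int brport_mark(enum hw_port p)
{
	return (p == SW0P2 || p == SW0P3) ? 1 : 0;
}

/* Can B (behind sw0p2) reach C (behind sw0p3) right after C's frame to A
 * has been bounced through the CPU port? */
static bool b_reaches_c(bool learn_on_cpu_injected)
{
	struct fdb_model fdb;

	fdb_learn(&fdb, 'A', SW0P0);	/* A sits behind the bond */
	fdb_learn(&fdb, 'B', SW0P2);
	fdb_learn(&fdb, 'C', SW0P3);	/* step 1: C's frame enters sw0p3 */
	if (learn_on_cpu_injected)
		fdb_learn(&fdb, 'C', CPU_PORT);	/* step 2: overwritten */

	/* B -> C: the switch looks up C in its FDB. */
	if (fdb.where['C' - 'A'] != CPU_PORT)
		return true;	/* forwarded directly in hardware */

	/* Trapped to the CPU with offload_fwd_mark = 1 from sw0p2; the
	 * software bridge refuses egress towards a port with the same mark. */
	return brport_mark(SW0P2) != brport_mark(SW0P3);
}
```

b_reaches_c(true) returns false: once the CPU-injected frame has moved C's entry to the CPU port, B's traffic ends up in a software bridge that will not send it back, which is the case that suppressing learning on injected frames is meant to avoid.
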
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-19  9:06     ` Vladimir Oltean
@ 2021-03-19  9:29       ` DENG Qingfang
  2021-03-19 10:49         ` Vladimir Oltean
  0 siblings, 1 reply; 52+ messages in thread
From: DENG Qingfang @ 2021-03-19  9:29 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Tobias Waldekranz, netdev, linux-kernel,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Alexandre Belloni, UNGLinuxDriver, Vadym Kochan, Taras Chornyi,
	Grygorii Strashko, Vignesh Raghavendra, Ioana Ciornei,
	Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 5:06 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> This is a good point actually, which I thought about, but did not give a
> lot of importance to for the moment. Either we go full steam ahead with
> assisted learning on the CPU port for everybody, and we selectively
> learn the addresses relevant to the bridging function only, or we do
> what you say, but then it will be a little bit more complicated IMO, and
> have hardware dependencies, which isn't as nice.

Are skb->offload_fwd_mark and source DSA switch kept in dsa_slave_xmit?
I think SA learning should be bypassed iff skb->offload_fwd_mark == 1 and
source DSA switch == destination DSA switch.

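DENG's proposed condition, written out as a predicate. `struct xmit_meta` and its fields are hypothetical. Whether dsa_slave_xmit() has access to the source switch at all is exactly the open question in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-frame TX metadata; field names are illustrative. */
struct xmit_meta {
	bool offload_fwd_mark;	/* frame was already forwarded by hardware */
	int src_switch;		/* DSA switch the frame was received on */
	int dst_switch;		/* DSA switch it is being sent out of */
};

/* Proposed rule: skip source-address learning iff the frame carries the
 * forwarding mark and stays within the same switch. */
static bool bypass_sa_learning_proposed(const struct xmit_meta *m)
{
	return m->offload_fwd_mark && m->src_switch == m->dst_switch;
}
```
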
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-19  9:29       ` DENG Qingfang
@ 2021-03-19 10:49         ` Vladimir Oltean
  2021-03-22  8:04           ` DENG Qingfang
  0 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-19 10:49 UTC (permalink / raw)
  To: DENG Qingfang
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Tobias Waldekranz, netdev, linux-kernel,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Alexandre Belloni, UNGLinuxDriver, Vadym Kochan, Taras Chornyi,
	Grygorii Strashko, Vignesh Raghavendra, Ioana Ciornei,
	Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 05:29:12PM +0800, DENG Qingfang wrote:
> On Fri, Mar 19, 2021 at 5:06 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> >
> > This is a good point actually, which I thought about, but did not give a
> > lot of importance to for the moment. Either we go full steam ahead with
> > assisted learning on the CPU port for everybody, and we selectively
> > learn the addresses relevant to the bridging function only, or we do
> > what you say, but then it will be a little bit more complicated IMO, and
> > have hardware dependencies, which isn't as nice.
> 
> Are skb->offload_fwd_mark and source DSA switch kept in dsa_slave_xmit?
> I think SA learning should be bypassed iff skb->offload_fwd_mark == 1 and
> source DSA switch == destination DSA switch.

Why would you even want to look at the source net device for forwarding?
I'd say that if dp->bridge_dev is NULL in the xmit function, you certainly
want to bypass address learning if you can. Maybe also for link-local traffic.

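A sketch of the alternative suggested here, assuming a hypothetical `bypass_sa_learning()` hook in the xmit path. `is_link_local()` reimplements in userspace the check done by the kernel's is_link_local_ether_addr() (01:80:c2:00:00:0X). How a tagger would actually set a per-frame "no learning" bit is hardware-specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalent of is_link_local_ether_addr(): true for the
 * IEEE 802.1D reserved group addresses 01:80:c2:00:00:0X. */
static bool is_link_local(const uint8_t addr[6])
{
	static const uint8_t base[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	for (size_t i = 0; i < 5; i++)
		if (addr[i] != base[i])
			return false;
	return (addr[5] & 0xf0) == 0;
}

/* Suggested rule: bypass learning when the port is not offloading a
 * bridge (bridge_dev == NULL), and also for link-local destinations. */
static bool bypass_sa_learning(const void *bridge_dev, const uint8_t dst[6])
{
	return bridge_dev == NULL || is_link_local(dst);
}
```
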
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 01/16] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 01/16] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge Vladimir Oltean
@ 2021-03-19 22:04   ` Florian Fainelli
  2021-03-22 10:24   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:04 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> DSA can properly detect and offload this sequence of operations:
> 
> ip link add br0 type bridge
> ip link add bond0 type bond
> ip link set swp0 master bond0
> ip link set bond0 master br0
> 
> But not this one:
> 
> ip link add br0 type bridge
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> 
> Actually, the second one is more complicated: due to the time elapsed
> between the enslavement of bond0 and its offloading via swp0, a
> lot of things could have happened to the bond0 bridge port in terms of
> switchdev objects (host MDBs, VLANs, altered STP state etc). So this is
> a bit of a can of worms, and making sure that the DSA port's state is in
> sync with this already existing bridge port is handled in the next
> patches.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

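A hypothetical model of why the join order matters, using made-up `netdev_model` and `dsa_port_model` types. Per the commit message, DSA only handled the ordering where the LAG joins the bridge after the port joined the LAG. The sketch shows the other ordering, where bond0 is already under br0 when swp0 joins it, being covered by also joining the bridge from port_lag_join().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the objects involved; not kernel structures. */
struct netdev_model {
	struct netdev_model *master;	/* upper device, e.g. bond0 -> br0 */
	bool is_bridge;
};

struct dsa_port_model {
	struct netdev_model *lag_dev;
	struct netdev_model *bridge_dev;
};

static void port_bridge_join(struct dsa_port_model *dp, struct netdev_model *br)
{
	dp->bridge_dev = br;
}

/* Ordering 1: the port is already in the LAG, then the LAG gets a bridge upper. */
static void lag_upper_changed(struct dsa_port_model *dp, struct netdev_model *upper)
{
	if (upper->is_bridge)
		port_bridge_join(dp, upper);
}

/* Ordering 2: the LAG is already in a bridge when the port joins it,
 * so the bridge join must happen here as well. */
static void port_lag_join(struct dsa_port_model *dp, struct netdev_model *lag)
{
	dp->lag_dev = lag;
	if (lag->master && lag->master->is_bridge)
		port_bridge_join(dp, lag->master);
}
```

Both orderings end with the port's bridge_dev pointing at br0, which is the order independence the series is after.
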
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 02/16] net: dsa: pass extack to dsa_port_{bridge,lag}_join
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 02/16] net: dsa: pass extack to dsa_port_{bridge,lag}_join Vladimir Oltean
@ 2021-03-19 22:05   ` Florian Fainelli
  2021-03-22 10:25   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:05 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> This is a pretty noisy change that was broken out of the larger change
> for replaying switchdev attributes and objects at bridge join time,
> which is when these extack objects are actually used.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time Vladimir Oltean
@ 2021-03-19 22:08   ` Florian Fainelli
  2021-03-20 10:05     ` Vladimir Oltean
  0 siblings, 1 reply; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:08 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> DSA currently assumes that the bridge port starts off with this
> constellation of bridge port flags:
> 
> - learning on
> - unicast flooding on
> - multicast flooding on
> - broadcast flooding on
> 
> just by virtue of code copy-pasta from the bridge layer (new_nbp).
> This was a simple enough strategy thus far, because the 'bridge join'
> moment always coincided with the 'bridge port creation' moment.
> 
> But with sandwiched interfaces, such as:
> 
>  br0
>   |
> bond0
>   |
>  swp0
> 
> it may happen that the user has had time to change the bridge port flags
> of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
> assume that the bridge port flags are those determined by new_nbp, when
> in fact this can happen:
> 
> ip link add br0 type bridge
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set bond0 type bridge_slave learning off
> ip link set swp0 master bond0
> 
> Now swp0 has learning enabled, bond0 has learning disabled. Not nice.
> 
> Fix this by "dumpster diving" through the actual bridge port flags with
> br_port_flag_is_set, at bridge join time.
> 
> We use this opportunity to split dsa_port_change_brport_flags into two
> distinct functions called dsa_port_inherit_brport_flags and
> dsa_port_clear_brport_flags, now that the implementation for the two
> cases is no longer similar.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/dsa/port.c | 123 ++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 82 insertions(+), 41 deletions(-)
> 
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index fcbe5b1545b8..346c50467810 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -122,26 +122,82 @@ void dsa_port_disable(struct dsa_port *dp)
>  	rtnl_unlock();
>  }
>  
> -static void dsa_port_change_brport_flags(struct dsa_port *dp,
> -					 bool bridge_offload)
> +static void dsa_port_clear_brport_flags(struct dsa_port *dp,
> +					struct netlink_ext_ack *extack)
>  {
>  	struct switchdev_brport_flags flags;
> -	int flag;
>  
> -	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
> -	if (bridge_offload)
> -		flags.val = flags.mask;
> -	else
> -		flags.val = flags.mask & ~BR_LEARNING;
> +	flags.mask = BR_LEARNING;
> +	flags.val = 0;
> +	dsa_port_bridge_flags(dp, flags, extack);

Wouldn't you want to use the same for_each_set_bit() loop that
dsa_port_change_brport_flags() uses? That would be a tad more compact.
-- 
Florian

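A sketch of the compact loop suggested above, in userspace C. The FLAG_* values are hypothetical stand-ins for BR_LEARNING and friends (the real values live in include/linux/if_bridge.h), and `port_flag_is_set()` stands in for br_port_flag_is_set(). The sketch accumulates everything into one mask/value pair; issuing one call per flag instead is an equally valid design.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins, not the kernel's BR_* bit positions. */
#define FLAG_LEARNING    (1UL << 0)
#define FLAG_FLOOD       (1UL << 1)
#define FLAG_MCAST_FLOOD (1UL << 2)
#define FLAG_BCAST_FLOOD (1UL << 3)

struct brport_flags_model {
	unsigned long mask;
	unsigned long val;
};

/* Stand-in for br_port_flag_is_set(): reads the bridge port's flag bitmap. */
static bool port_flag_is_set(unsigned long brport_state, unsigned long flag)
{
	return (brport_state & flag) != 0;
}

/* Walk every bit of the mask, in the spirit of for_each_set_bit(), and
 * copy the bridge port's current value for it. */
static struct brport_flags_model inherit_flags(unsigned long brport_state)
{
	struct brport_flags_model f = {
		.mask = FLAG_LEARNING | FLAG_FLOOD | FLAG_MCAST_FLOOD | FLAG_BCAST_FLOOD,
		.val = 0,
	};

	for (unsigned int bit = 0; bit < 8 * sizeof(f.mask); bit++) {
		unsigned long flag = 1UL << bit;

		if ((f.mask & flag) && port_flag_is_set(brport_state, flag))
			f.val |= flag;
	}
	return f;
}
```
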
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 04/16] net: dsa: sync up with bridge port's STP state when joining
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 04/16] net: dsa: sync up with bridge port's STP state when joining Vladimir Oltean
@ 2021-03-19 22:11   ` Florian Fainelli
  2021-03-22 10:29   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:11 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> It may happen that we have the following topology:
> 
> ip link add br0 type bridge stp_state 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> ip link set swp1 master bond0
> 
> STP decides that it should put bond0 into the BLOCKING state, and
> that's that. The ports that are actively listening for the switchdev
> port attributes emitted for the bond0 bridge port (because they are
> offloading it) and have the honor of seeing that switchdev port
> attribute can react to it, so we can program swp0 and swp1 into the
> BLOCKING state.
> 
> But if then we do:
> 
> ip link set swp2 master bond0
> 
> then as far as the bridge is concerned, nothing has changed: it still
> has one bridge port. But the new port, swp2, will not see any STP state
> change notification and will remain FORWARDING, which is the state the
> standalone code leaves it in.
> 
> Add a function to the bridge which retrieves the current STP state, such
> that drivers can synchronize to it when they may have missed switchdev
> events.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

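A small model of the STP problem described in the patch, with hypothetical `lag_model` and `hw_port_model` types. The BR_STATE_* values match the uapi <linux/if_bridge.h>; everything else is an assumption made for illustration. The notification only reaches ports that are lowers of bond0 when it is emitted, so a late joiner either reads back the current state or stays stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BR_STATE_FORWARDING 3
#define BR_STATE_BLOCKING   4
#define MAX_LOWERS          4

struct hw_port_model {
	uint8_t stp_state;	/* what the switch port is programmed with */
};

struct lag_model {
	struct hw_port_model *lowers[MAX_LOWERS];
	int n;
	uint8_t brport_state;	/* state the bridge holds for bond0 */
};

/* Standalone ports are left FORWARDING. */
static void hw_port_init(struct hw_port_model *p)
{
	p->stp_state = BR_STATE_FORWARDING;
}

/* The bridge changes bond0's state: only current lowers are notified. */
static void bridge_set_brport_state(struct lag_model *lag, uint8_t state)
{
	lag->brport_state = state;
	for (int i = 0; i < lag->n; i++)
		lag->lowers[i]->stp_state = state;
}

/* A port joins bond0. With sync == true it reads back the current state,
 * the role br_port_get_stp_state() plays in the patch. */
static void lag_add_lower(struct lag_model *lag, struct hw_port_model *p, bool sync)
{
	assert(lag->n < MAX_LOWERS);
	lag->lowers[lag->n++] = p;
	if (sync)
		p->stp_state = lag->brport_state;
}
```
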
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 05/16] net: dsa: sync up VLAN filtering state when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 05/16] net: dsa: sync up VLAN filtering state when joining the bridge Vladimir Oltean
@ 2021-03-19 22:11   ` Florian Fainelli
  2021-03-22 10:30   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:11 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> This is the same situation as for other switchdev port attributes: if we
> join an already-created bridge port, such as a bond master interface,
> then we can miss the initial switchdev notification emitted by the
> bridge for this port.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router state when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router " Vladimir Oltean
@ 2021-03-19 22:12   ` Florian Fainelli
  2021-03-22 11:17   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:12 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Make sure that the multicast router setting of the bridge is picked up
> correctly by DSA when joining, regardless of whether there are
> sandwiched interfaces or not. The SWITCHDEV_ATTR_ID_BRIDGE_MROUTER port
> attribute is only emitted from br_mc_router_state_change.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time " Vladimir Oltean
@ 2021-03-19 22:13   ` Florian Fainelli
  2021-03-20 10:09     ` Vladimir Oltean
  2021-03-22 11:20   ` Tobias Waldekranz
  1 sibling, 1 reply; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:13 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
> 
> sysfs/ioctl/netlink
> -> br_set_ageing_time
>    -> __set_ageing_time
> 
> therefore not at bridge port creation time, so:
> (a) drivers had to hardcode the initial value for the address ageing time,
>     because they didn't get any notification
> (b) that hardcoded value can be out of sync, if the user changes the
>     ageing time before enslaving the port to the bridge
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |  6 ++++++
>  net/bridge/br_stp.c       | 13 +++++++++++++
>  net/dsa/port.c            | 10 ++++++++++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 920d3a02cc68..ebd16495459c 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>  void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
> +clock_t br_get_ageing_time(struct net_device *br_dev);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev)
>  {
>  	return BR_STATE_DISABLED;
>  }
> +
> +static inline clock_t br_get_ageing_time(struct net_device *br_dev)
> +{
> +	return 0;
> +}
>  #endif
>  
>  #endif
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index 86b5e05d3f21..3dafb6143cff 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
>  	return 0;
>  }
>  
> +clock_t br_get_ageing_time(struct net_device *br_dev)
> +{
> +	struct net_bridge *br;
> +
> +	if (!netif_is_bridge_master(br_dev))
> +		return 0;
> +
> +	br = netdev_priv(br_dev);
> +
> +	return jiffies_to_clock_t(br->ageing_time);

Don't you want an ASSERT_RTNL() in this function as well?
-- 
Florian

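For reference, the units involved in br_get_ageing_time(): br->ageing_time is kept in jiffies and returned as clock_t (USER_HZ ticks). The sketch below is a simplified model of that conversion, plus a clock_t-to-milliseconds step a driver might apply; the kernel's jiffies_to_clock_t() is more careful about rounding for some HZ values. The example assumes the bridge default ageing time of 300 seconds and USER_HZ = 100.

```c
#include <assert.h>
#include <stdint.h>

#define USER_HZ 100	/* clock_t ticks per second exposed to userspace */

/* Simplified model of jiffies_to_clock_t() for a given HZ. */
static uint64_t jiffies_to_clock(uint64_t jiffies, unsigned int hz)
{
	return jiffies * USER_HZ / hz;
}

/* clock_t ticks to milliseconds, as a driver programming hardware might need. */
static uint64_t clock_to_msecs(uint64_t clock)
{
	return clock * 1000 / USER_HZ;
}
```

The clock_t value is independent of the kernel's HZ, which is why it is a convenient unit to hand to drivers.
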
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries " Vladimir Oltean
@ 2021-03-19 22:20   ` Florian Fainelli
  2021-03-20  9:53     ` Vladimir Oltean
  0 siblings, 1 reply; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:20 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> I have udhcpcd in my system and this is configured to bring interfaces
> up as soon as they are created.
> 
> I create a bridge as follows:
> 
> ip link add br0 type bridge
> 
> As soon as I create the bridge and udhcpcd brings it up, I have some
> other crap (avahi)

How dare you ;)

> that starts sending some random IPv6 packets to
> advertise some local services, and from there, the br0 bridge joins the
> following IPv6 groups:
> 
> 33:33:ff:6d:c1:9c vid 0
> 33:33:00:00:00:6a vid 0
> 33:33:00:00:00:fb vid 0
> 
> br_dev_xmit
> -> br_multicast_rcv
>    -> br_ip6_multicast_add_group
>       -> __br_multicast_add_group
>          -> br_multicast_host_join
>             -> br_mdb_notify
> 
> This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
> hooked up, and switchdev will attempt to offload the host joined groups
> to an empty list of ports. Of course nobody offloads them.
> 
> Then when we add a port to br0:
> 
> ip link set swp0 master br0
> 
> the bridge doesn't replay the host-joined MDB entries from br_add_if,
> and eventually the host joined addresses expire, and a switchdev
> notification for deleting it is emitted, but surprise, the original
> addition was already completely missed.
> 
> The strategy to address this problem is to replay the MDB entries (both
> the port ones and the host joined ones) when the new port joins the
> bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
> be populated and only then attached to a bridge that you offload).
> However there are 2 possibilities: the addresses can be 'pushed' by the
> bridge into the port, or the port can 'pull' them from the bridge.
> 
> Considering that in the general case, the new port can be really late to
> the party, and there may have been many other switchdev ports that
> already received the initial notification, we would like to avoid
> delivering duplicate events to them, since they might misbehave. And
> currently, the bridge calls the entire switchdev notifier chain, whereas
> for replaying it should just call the notifier block of the new guy.
> But the bridge doesn't know what the new guy's notifier block is; it
> just knows where the switchdev notifier chain is. So for simplification,
> we make this a driver-initiated pull for now, and the notifier block is
> passed as an argument.
> 
> To emulate the calling context for mdb objects (deferred and put on the
> blocking notifier chain), we must iterate under RCU protection through
> the bridge's mdb entries, queue them, and only call them once we're out
> of the RCU read-side critical section.
> 
> Suggested-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |  9 +++++
>  net/bridge/br_mdb.c       | 84 +++++++++++++++++++++++++++++++++++++++
>  net/dsa/dsa_priv.h        |  2 +
>  net/dsa/port.c            |  6 +++
>  net/dsa/slave.c           |  2 +-
>  5 files changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index ebd16495459c..4c25dafb013d 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>  bool br_multicast_enabled(const struct net_device *dev);
>  bool br_multicast_router(const struct net_device *dev);
> +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
> +		  struct notifier_block *nb, struct netlink_ext_ack *extack);
>  #else
>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>  					     struct list_head *br_ip_list)
> @@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct net_device *dev)
>  {
>  	return false;
>  }
> +static inline int br_mdb_replay(struct net_device *br_dev,
> +				struct net_device *dev,
> +				struct notifier_block *nb,
> +				struct netlink_ext_ack *extack)
> +{
> +	return -EINVAL;

Should we return -EOPNOTSUPP such that this is not made fatal for DSA if
someone compiles their kernel with CONFIG_BRIDGE_IGMP_SNOOPING disabled?

> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> index 8846c5bcd075..23973186094c 100644
> --- a/net/bridge/br_mdb.c
> +++ b/net/bridge/br_mdb.c
> @@ -506,6 +506,90 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
>  	kfree(priv);
>  }
>  
> +static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
> +			     struct net_bridge_mdb_entry *mp, int obj_id,
> +			     struct net_device *orig_dev,
> +			     struct netlink_ext_ack *extack)
> +{
> +	struct switchdev_notifier_port_obj_info obj_info = {
> +		.info = {
> +			.dev = dev,
> +			.extack = extack,
> +		},
> +	};
> +	struct switchdev_obj_port_mdb mdb = {
> +		.obj = {
> +			.orig_dev = orig_dev,
> +			.id = obj_id,
> +		},
> +		.vid = mp->addr.vid,
> +	};
> +	int err;
> +
> +	if (mp->addr.proto == htons(ETH_P_IP))
> +		ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
> +#if IS_ENABLED(CONFIG_IPV6)
> +	else if (mp->addr.proto == htons(ETH_P_IPV6))
> +		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
> +#endif
> +	else
> +		ether_addr_copy(mdb.addr, mp->addr.dst.mac_addr);

How would you feel about reusing br_mdb_switchdev_host_port() here and
passing a 'type' value that is neither RTM_NEWMDB nor RTM_DELMDB just so you
don't have to duplicate that code here and we ensure it is in sync?
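For reference, the address mapping that would be duplicated is the standard one. IPv4 groups map to `01:00:5e` followed by the low 23 bits of the group address (RFC 1112). IPv6 groups map to `33:33` followed by the last 32 bits of the group address (RFC 2464). The userspace sketch below reimplements only that rule. The helper names are hypothetical, and it is not the kernel's `ip_eth_mc_map()`/`ipv6_eth_mc_map()`, which take network-byte-order input.

```c
#include <stdint.h>
#include <string.h>

/* IPv4 group (host byte order here) -> 01:00:5e + low 23 bits. */
void ipv4_mc_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* bit 23 of the group is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}

/* IPv6 group -> 33:33 + last 4 bytes of the address. */
void ipv6_mc_to_mac(const uint8_t group[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &group[12], 4);
}
```

With this rule, the mDNS group ff02::fb becomes 33:33:00:00:00:fb, which is one of the host-joined entries listed in the commit message.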
-- 
Florian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 10/16] net: dsa: replay VLANs installed on port when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 10/16] net: dsa: replay VLANs installed on port " Vladimir Oltean
@ 2021-03-19 22:24   ` Florian Fainelli
  0 siblings, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-19 22:24 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Currently this simple setup:
> 
> ip link add br0 type bridge vlan_filtering 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> 
> will not work because the bridge has created the PVID in br_add_if ->
> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
> but that was too early, since swp0 was not yet a lower of bond0, so it
> had no reason to act upon that notification.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h | 10 ++++++
>  net/bridge/br_vlan.c      | 71 +++++++++++++++++++++++++++++++++++++++
>  net/dsa/port.c            |  6 ++++
>  3 files changed, 87 insertions(+)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 89596134e88f..ea176c508c0d 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -111,6 +111,8 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid);
>  int br_vlan_get_proto(const struct net_device *dev, u16 *p_proto);
>  int br_vlan_get_info(const struct net_device *dev, u16 vid,
>  		     struct bridge_vlan_info *p_vinfo);
> +int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
> +		   struct notifier_block *nb, struct netlink_ext_ack *extack);
>  #else
>  static inline bool br_vlan_enabled(const struct net_device *dev)
>  {
> @@ -137,6 +139,14 @@ static inline int br_vlan_get_info(const struct net_device *dev, u16 vid,
>  {
>  	return -EINVAL;
>  }
> +
> +static inline int br_vlan_replay(struct net_device *br_dev,
> +				 struct net_device *dev,
> +				 struct notifier_block *nb,
> +				 struct netlink_ext_ack *extack)
> +{
> +	return -EINVAL;

Same comment as for patch 8: CONFIG_BRIDGE_VLAN_FILTERING can be turned off,
even if this does not really make practical sense with a hardware
switch. Should we return -EOPNOTSUPP instead?
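The return value of the shim matters because of the caller pattern already used in this series, `if (err && err != -EOPNOTSUPP) return err;`. Under that pattern, -EOPNOTSUPP means "feature not available, carry on", and any other error aborts the join. Below is a minimal userspace sketch of that behaviour. The function names are hypothetical stand-ins for the real shims and DSA code.

```c
#include <errno.h>

/* Hypothetical shims for a bridge feature that is compiled out. */
int replay_shim_einval(void)     { return -EINVAL; }
int replay_shim_eopnotsupp(void) { return -EOPNOTSUPP; }

/* Mirrors the DSA pattern: only errors other than -EOPNOTSUPP are fatal. */
int port_join(int (*replay)(void))
{
	int err = replay();

	if (err && err != -EOPNOTSUPP)
		return err;	/* the bridge join fails */

	return 0;		/* a missing feature is tolerated */
}
```

With -EINVAL, a kernel built without the bridge option would refuse the join outright. With -EOPNOTSUPP, the port joins and only the replay is skipped.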
-- 
Florian

* Re: [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries when joining the bridge
  2021-03-19 22:20   ` Florian Fainelli
@ 2021-03-20  9:53     ` Vladimir Oltean
  2021-03-22 15:56       ` Florian Fainelli
  0 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-20  9:53 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Tobias Waldekranz, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Fri, Mar 19, 2021 at 03:20:38PM -0700, Florian Fainelli wrote:
>
>
> On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > I have udhcpcd in my system and this is configured to bring interfaces
> > up as soon as they are created.
> >
> > I create a bridge as follows:
> >
> > ip link add br0 type bridge
> >
> > As soon as I create the bridge and udhcpcd brings it up, I have some
> > other crap (avahi)
>
> How dare you ;)

Well, it comes preinstalled on my system, I don't need it, and it has
caused me nothing but trouble. So I think it has earned its title :D

> > that starts sending some random IPv6 packets to
> > advertise some local services, and from there, the br0 bridge joins the
> > following IPv6 groups:
> >
> > 33:33:ff:6d:c1:9c vid 0
> > 33:33:00:00:00:6a vid 0
> > 33:33:00:00:00:fb vid 0
> >
> > br_dev_xmit
> > -> br_multicast_rcv
> >    -> br_ip6_multicast_add_group
> >       -> __br_multicast_add_group
> >          -> br_multicast_host_join
> >             -> br_mdb_notify
> >
> > This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
> > hooked up, and switchdev will attempt to offload the host joined groups
> > to an empty list of ports. Of course nobody offloads them.
> >
> > Then when we add a port to br0:
> >
> > ip link set swp0 master br0
> >
> > the bridge doesn't replay the host-joined MDB entries from br_add_if,
> > and eventually the host joined addresses expire, and a switchdev
> > notification for deleting it is emitted, but surprise, the original
> > addition was already completely missed.
> >
> > The strategy to address this problem is to replay the MDB entries (both
> > the port ones and the host joined ones) when the new port joins the
> > bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
> > be populated and only then attached to a bridge that you offload).
> > However there are 2 possibilities: the addresses can be 'pushed' by the
> > bridge into the port, or the port can 'pull' them from the bridge.
> >
> > Considering that in the general case, the new port can be really late to
> > the party, and there may have been many other switchdev ports that
> > already received the initial notification, we would like to avoid
> > delivering duplicate events to them, since they might misbehave. And
> > currently, the bridge calls the entire switchdev notifier chain, whereas
> > for replaying it should just call the notifier block of the new guy.
> > But the bridge doesn't know what is the new guy's notifier block, it
> > just knows where the switchdev notifier chain is. So for simplification,
> > we make this a driver-initiated pull for now, and the notifier block is
> > passed as an argument.
> >
> > To emulate the calling context for mdb objects (deferred and put on the
> > blocking notifier chain), we must iterate under RCU protection through
> > the bridge's mdb entries, queue them, and only call them once we're out
> > of the RCU read-side critical section.
> >
> > Suggested-by: Ido Schimmel <idosch@idosch.org>
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  include/linux/if_bridge.h |  9 +++++
> >  net/bridge/br_mdb.c       | 84 +++++++++++++++++++++++++++++++++++++++
> >  net/dsa/dsa_priv.h        |  2 +
> >  net/dsa/port.c            |  6 +++
> >  net/dsa/slave.c           |  2 +-
> >  5 files changed, 102 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > index ebd16495459c..4c25dafb013d 100644
> > --- a/include/linux/if_bridge.h
> > +++ b/include/linux/if_bridge.h
> > @@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
> >  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
> >  bool br_multicast_enabled(const struct net_device *dev);
> >  bool br_multicast_router(const struct net_device *dev);
> > +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
> > +		  struct notifier_block *nb, struct netlink_ext_ack *extack);
> >  #else
> >  static inline int br_multicast_list_adjacent(struct net_device *dev,
> >  					     struct list_head *br_ip_list)
> > @@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct net_device *dev)
> >  {
> >  	return false;
> >  }
> > +static inline int br_mdb_replay(struct net_device *br_dev,
> > +				struct net_device *dev,
> > +				struct notifier_block *nb,
> > +				struct netlink_ext_ack *extack)
> > +{
> > +	return -EINVAL;
>
> Should we return -EOPNOTSUPP such that this is not made fatal for DSA if
> someone compiles their kernel with CONFIG_BRIDGE_IGMP_SNOOPING disabled?

Sure, I'll change the return values of the shims everywhere.

> > +}
> >  #endif
> >
> >  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
> > diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> > index 8846c5bcd075..23973186094c 100644
> > --- a/net/bridge/br_mdb.c
> > +++ b/net/bridge/br_mdb.c
> > @@ -506,6 +506,90 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
> >  	kfree(priv);
> >  }
> >
> > +static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
> > +			     struct net_bridge_mdb_entry *mp, int obj_id,
> > +			     struct net_device *orig_dev,
> > +			     struct netlink_ext_ack *extack)
> > +{
> > +	struct switchdev_notifier_port_obj_info obj_info = {
> > +		.info = {
> > +			.dev = dev,
> > +			.extack = extack,
> > +		},
> > +	};
> > +	struct switchdev_obj_port_mdb mdb = {
> > +		.obj = {
> > +			.orig_dev = orig_dev,
> > +			.id = obj_id,
> > +		},
> > +		.vid = mp->addr.vid,
> > +	};
> > +	int err;
> > +
> > +	if (mp->addr.proto == htons(ETH_P_IP))
> > +		ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +	else if (mp->addr.proto == htons(ETH_P_IPV6))
> > +		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
> > +#endif
> > +	else
> > +		ether_addr_copy(mdb.addr, mp->addr.dst.mac_addr);
>
> How would you feel about reusing br_mdb_switchdev_host_port() here and
> passing a 'type' value that is neither RTM_NEWMDB nor RTM_DELMDB, just so you
> don't have to duplicate that code here and we ensure it stays in sync?

The trouble is that br_mdb_switchdev_host calls switchdev_port_obj_add,
and I think the agreement was that replayed events should be a silent,
one-to-one conversation via a direct call to the notifier block of the
interested driver, as opposed to a call to the entire notifier chain
which would make everybody else in the system see duplicates. This is
the reason why I duplicated mostly everything.
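To make the duplicate-delivery argument concrete, here is a small userspace model. `struct nb`, `chain_notify()` and `replay_to()` are hypothetical simplifications, not the kernel's `struct notifier_block` or notifier-chain API. Emitting on the chain reaches every registered listener. A replay addressed to one block reaches only the driver that asked for it.

```c
#include <stddef.h>

/* Simplified notifier block: a callback plus a "next" link. */
struct nb {
	int (*call)(struct nb *self, unsigned long event);
	struct nb *next;
	int events_seen;
};

/* Example listener that just counts what it receives. */
int count_event(struct nb *self, unsigned long event)
{
	(void)event;
	self->events_seen++;
	return 0;
}

/* Emitting on the chain: every registered block sees the event. */
void chain_notify(struct nb *head, unsigned long event)
{
	for (struct nb *n = head; n; n = n->next)
		n->call(n, event);
}

/* Replay: events go only to the block supplied by the new driver. */
int replay_to(struct nb *target, const unsigned long *events, size_t num)
{
	for (size_t i = 0; i < num; i++) {
		int err = target->call(target, events[i]);

		if (err)
			return err;
	}
	return 0;
}
```

If `new_drv` registers after an event was emitted and then has it replayed to it directly, `old_drv` stays at one event. Walking the chain again would give `old_drv` a second copy of something it has already offloaded.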

* Re: [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time
  2021-03-19 22:08   ` Florian Fainelli
@ 2021-03-20 10:05     ` Vladimir Oltean
  0 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-20 10:05 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Tobias Waldekranz, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Fri, Mar 19, 2021 at 03:08:46PM -0700, Florian Fainelli wrote:
> 
> 
> On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > DSA currently assumes that the bridge port starts off with this
> > constellation of bridge port flags:
> > 
> > - learning on
> > - unicast flooding on
> > - multicast flooding on
> > - broadcast flooding on
> > 
> > just by virtue of code copy-pasta from the bridge layer (new_nbp).
> > This was a simple enough strategy thus far, because the 'bridge join'
> > moment always coincided with the 'bridge port creation' moment.
> > 
> > But with sandwiched interfaces, such as:
> > 
> >  br0
> >   |
> > bond0
> >   |
> >  swp0
> > 
> > it may happen that the user has had time to change the bridge port flags
> > of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
> > assume that the bridge port flags are those determined by new_nbp, when
> > in fact this can happen:
> > 
> > ip link add br0 type bridge
> > ip link add bond0 type bond
> > ip link set bond0 master br0
> > ip link set bond0 type bridge_slave learning off
> > ip link set swp0 master bond0
> > 
> > Now swp0 has learning enabled, bond0 has learning disabled. Not nice.
> > 
> > Fix this by "dumpster diving" through the actual bridge port flags with
> > br_port_flag_is_set, at bridge join time.
> > 
> > We use this opportunity to split dsa_port_change_brport_flags into two
> > distinct functions called dsa_port_inherit_brport_flags and
> > dsa_port_clear_brport_flags, now that the implementation for the two
> > cases is no longer similar.
> > 
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  net/dsa/port.c | 123 ++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 82 insertions(+), 41 deletions(-)
> > 
> > diff --git a/net/dsa/port.c b/net/dsa/port.c
> > index fcbe5b1545b8..346c50467810 100644
> > --- a/net/dsa/port.c
> > +++ b/net/dsa/port.c
> > @@ -122,26 +122,82 @@ void dsa_port_disable(struct dsa_port *dp)
> >  	rtnl_unlock();
> >  }
> >  
> > -static void dsa_port_change_brport_flags(struct dsa_port *dp,
> > -					 bool bridge_offload)
> > +static void dsa_port_clear_brport_flags(struct dsa_port *dp,
> > +					struct netlink_ext_ack *extack)
> >  {
> >  	struct switchdev_brport_flags flags;
> > -	int flag;
> >  
> > -	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
> > -	if (bridge_offload)
> > -		flags.val = flags.mask;
> > -	else
> > -		flags.val = flags.mask & ~BR_LEARNING;
> > +	flags.mask = BR_LEARNING;
> > +	flags.val = 0;
> > +	dsa_port_bridge_flags(dp, flags, extack);
> 
> Wouldn't you want to use the same for_each_set_bit() loop that
> dsa_port_change_brport_flags() uses? That would be a tad more compact.
> -- 
> Florian

The reworded version has an equal number of lines, but at least it
catches errors now:

static void dsa_port_clear_brport_flags(struct dsa_port *dp,
					struct netlink_ext_ack *extack)
{
	const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
	const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
				   BR_BCAST_FLOOD;
	int flag, err;

	for_each_set_bit(flag, &mask, 32) {
		struct switchdev_brport_flags flags = {0};

		flags.mask = BIT(flag);
		flags.val = val & BIT(flag);

		err = dsa_port_bridge_flags(dp, flags, extack);
		if (err && err != -EOPNOTSUPP)
			dev_err(dp->ds->dev,
				"failed to clear bridge port flag %d: %d (%pe)\n",
				flag, err, ERR_PTR(err));
	}
}
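The per-flag loop above can be exercised in userspace. This sketch makes several assumptions. The `BR_*` bit positions are hypothetical, for illustration only; the real values are in include/linux/if_bridge.h. A plain bit scan replaces `for_each_set_bit()`. A stub that rejects broadcast flood control with -EOPNOTSUPP replaces `dsa_port_bridge_flags()`. Unlike the function above, it returns a failure count instead of only logging, so that the behaviour can be checked.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical bit positions, for illustration only. */
#define BR_LEARNING	(1UL << 5)
#define BR_FLOOD	(1UL << 6)
#define BR_MCAST_FLOOD	(1UL << 11)
#define BR_BCAST_FLOOD	(1UL << 14)

struct brport_flags {
	unsigned long val;
	unsigned long mask;
};

/* Stub driver hook: broadcast flood control is "unsupported". */
int port_bridge_flags(unsigned long *state, struct brport_flags f)
{
	if (f.mask & BR_BCAST_FLOOD)
		return -EOPNOTSUPP;
	*state = (*state & ~f.mask) | (f.val & f.mask);
	return 0;
}

/* Standalone defaults: learning off, all flooding on, one flag at a time. */
int clear_brport_flags(unsigned long *state)
{
	const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
	const unsigned long mask = BR_LEARNING | val;
	int failures = 0;

	for (int bit = 0; bit < 32; bit++) {
		struct brport_flags f;
		int err;

		if (!(mask & (1UL << bit)))
			continue;

		f.mask = 1UL << bit;
		f.val = val & f.mask;

		err = port_bridge_flags(state, f);
		if (err && err != -EOPNOTSUPP) {
			fprintf(stderr, "failed to clear flag %d: %d\n", bit, err);
			failures++;
		}
	}
	return failures;
}
```

Applying one flag per call is what lets an unsupported flag be skipped without losing the others.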

* Re: [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time when joining the bridge
  2021-03-19 22:13   ` Florian Fainelli
@ 2021-03-20 10:09     ` Vladimir Oltean
  0 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-20 10:09 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Tobias Waldekranz, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Fri, Mar 19, 2021 at 03:13:03PM -0700, Florian Fainelli wrote:
> > diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> > index 86b5e05d3f21..3dafb6143cff 100644
> > --- a/net/bridge/br_stp.c
> > +++ b/net/bridge/br_stp.c
> > @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
> >  	return 0;
> >  }
> >  
> > +clock_t br_get_ageing_time(struct net_device *br_dev)
> > +{
> > +	struct net_bridge *br;
> > +
> > +	if (!netif_is_bridge_master(br_dev))
> > +		return 0;
> > +
> > +	br = netdev_priv(br_dev);
> > +
> > +	return jiffies_to_clock_t(br->ageing_time);
> 
> Don't you want an ASSERT_RTNL() in this function as well?

Hmm, I'm not sure. I don't think I'm accessing anything that is under
the protection of the rtnl_mutex. If anything, the ageing time is
protected by the "bridge lock", but I don't think there's much of an
issue if I read an unsigned int while not holding it.
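For context on the value being read: the bridge stores `ageing_time` in jiffies, and `jiffies_to_clock_t()` converts it to USER_HZ ticks, the centisecond unit used by the netlink `ageing_time` attribute. DSA then turns that value into milliseconds for the driver. The arithmetic is sketched below under two assumptions: HZ=250 (a kernel build option) and USER_HZ=100. The helper names are hypothetical, and the kernel's own helpers handle rounding and overflow more carefully.

```c
#include <stdint.h>

#define HZ	250UL	/* assumed CONFIG_HZ; varies per kernel build */
#define USER_HZ	100UL	/* clock_t ticks per second seen by userspace */

uint64_t jiffies_to_clock_ticks(uint64_t j)
{
	return j * USER_HZ / HZ;
}

uint64_t clock_ticks_to_jiffies(uint64_t c)
{
	return c * HZ / USER_HZ;
}

uint64_t jiffies_to_ms(uint64_t j)
{
	return j * 1000 / HZ;
}
```

With these assumptions, the default 300 s ageing time is 75000 jiffies, 30000 clock_t ticks and 300000 ms.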

* Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-19 10:49         ` Vladimir Oltean
@ 2021-03-22  8:04           ` DENG Qingfang
  2021-03-22 22:23             ` Vladimir Oltean
  0 siblings, 1 reply; 52+ messages in thread
From: DENG Qingfang @ 2021-03-22  8:04 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Tobias Waldekranz, netdev, linux-kernel,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Alexandre Belloni, UNGLinuxDriver, Vadym Kochan, Taras Chornyi,
	Grygorii Strashko, Vignesh Raghavendra, Ioana Ciornei,
	Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 6:49 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> Why would you even want to look at the source net device for forwarding?
> I'd say that if dp->bridge_dev is NULL in the xmit function, you certainly
> want to bypass address learning if you can. Maybe also for link-local traffic.

Also for trapped traffic (snooping, tc-flower trap action) if the CPU
sends them back.
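One possible shape for that decision in a tagger's xmit path is sketched below. Everything in it is hypothetical: the struct fields, the helper names, and the per-packet "trapped" marker. It only encodes the three cases raised in this discussion. These are a port that is not under a bridge, a link-local destination (01:80:c2:00:00:00 to 01:80:c2:00:00:0f), and traffic that the CPU sends back after trapping it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-packet context for this sketch. */
struct xmit_ctx {
	bool port_in_bridge;	/* analogous to dp->bridge_dev != NULL */
	bool trapped_to_cpu;	/* frame was trapped (snooping, tc trap) */
	uint8_t dst[6];
};

/* Matches 01:80:c2:00:00:00 .. 01:80:c2:00:00:0f */
bool is_link_local(const uint8_t *a)
{
	static const uint8_t base[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	return memcmp(a, base, sizeof(base)) == 0 && (a[5] & 0xf0) == 0;
}

/* Should the switch skip learning the source address of this frame? */
bool xmit_bypass_learning(const struct xmit_ctx *ctx)
{
	if (!ctx->port_in_bridge)
		return true;
	if (is_link_local(ctx->dst))
		return true;
	return ctx->trapped_to_cpu;
}
```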

* Re: [RFC PATCH v2 net-next 01/16] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 01/16] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge Vladimir Oltean
  2021-03-19 22:04   ` Florian Fainelli
@ 2021-03-22 10:24   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 10:24 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> DSA can properly detect and offload this sequence of operations:
>
> ip link add br0 type bridge
> ip link add bond0 type bond
> ip link set swp0 master bond0
> ip link set bond0 master br0
>
> But not this one:
>
> ip link add br0 type bridge
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
>
> Actually the second one is more complicated, due to the elapsed time
> between the enslavement of bond0 and the offloading of it via swp0, a
> lot of things could have happened to the bond0 bridge port in terms of
> switchdev objects (host MDBs, VLANs, altered STP state etc). So this is
> a bit of a can of worms, and making sure that the DSA port's state is in
> sync with this already existing bridge port is handled in the next
> patches.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
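The ordering problem fixed by this patch can be modelled in a few lines. Whichever of the two `ip link set ... master` commands runs first, the port should end up joining the bridge once both links exist. The structs and functions below are hypothetical and are not the DSA or bonding data structures. The `if` in `port_join_lag()` corresponds to the case this patch adds: joining a LAG that is already in a bridge.

```c
#include <stddef.h>

/* Hypothetical model of the netdev hierarchy. */
struct dev {
	struct dev *master;	/* upper device, or NULL */
	int is_bridge;
};

struct port {
	struct dev dev;
	struct dev *bridge;	/* bridge this port offloads, or NULL */
};

/* Port joins a LAG: if the LAG is already bridged, join that bridge too. */
void port_join_lag(struct port *p, struct dev *lag)
{
	p->dev.master = lag;
	if (lag->master && lag->master->is_bridge)
		p->bridge = lag->master;
}

/* LAG joins a bridge: every port already under the LAG joins it. */
void lag_join_bridge(struct dev *lag, struct dev *br,
		     struct port *ports, size_t num)
{
	lag->master = br;
	for (size_t i = 0; i < num; i++)
		if (ports[i].dev.master == lag)
			ports[i].bridge = br;
}
```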

* Re: [RFC PATCH v2 net-next 02/16] net: dsa: pass extack to dsa_port_{bridge,lag}_join
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 02/16] net: dsa: pass extack to dsa_port_{bridge,lag}_join Vladimir Oltean
  2021-03-19 22:05   ` Florian Fainelli
@ 2021-03-22 10:25   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 10:25 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> This is a pretty noisy change that was broken out of the larger change
> for replaying switchdev attributes and objects at bridge join time,
> which is when these extack objects are actually used.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

* Re: [RFC PATCH v2 net-next 04/16] net: dsa: sync up with bridge port's STP state when joining
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 04/16] net: dsa: sync up with bridge port's STP state when joining Vladimir Oltean
  2021-03-19 22:11   ` Florian Fainelli
@ 2021-03-22 10:29   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 10:29 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> It may happen that we have the following topology:
>
> ip link add br0 type bridge stp_state 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> ip link set swp1 master bond0
>
> STP decides that it should put bond0 into the BLOCKING state, and
> that's that. The ports that are actively listening for the switchdev
> port attributes emitted for the bond0 bridge port (because they are
> offloading it) and have the honor of seeing that switchdev port
> attribute can react to it, so we can program swp0 and swp1 into the
> BLOCKING state.
>
> But if then we do:
>
> ip link set swp2 master bond0
>
> then as far as the bridge is concerned, nothing has changed: it still
> has one bridge port. But this new switch port will not see any STP state
> change notification and will remain FORWARDING, which is the state the
> standalone code leaves it in.
>
> Add a function to the bridge which retrieves the current STP state, such
> that drivers can synchronize to it when they may have missed switchdev
> events.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
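The idea behind the new getter can be shown with a small model. A port that joins a LAG late copies the bridge port's current STP state, instead of staying in the FORWARDING state that standalone mode left it in. The state values match the BR_STATE_* constants in include/uapi/linux/if_bridge.h. The structs and functions are hypothetical.

```c
/* Values as in include/uapi/linux/if_bridge.h */
enum {
	BR_STATE_DISABLED = 0,
	BR_STATE_LISTENING = 1,
	BR_STATE_LEARNING = 2,
	BR_STATE_FORWARDING = 3,
	BR_STATE_BLOCKING = 4,
};

/* Hypothetical bridge port (bond0) and switch port (swp2). */
struct bridge_port { int stp_state; };
struct sw_port { int hw_stp_state; };

/* Standalone mode: the port forwards everything. */
void sw_port_standalone(struct sw_port *p)
{
	p->hw_stp_state = BR_STATE_FORWARDING;
}

/* On join, read the current state instead of waiting for a notification
 * that was emitted before this port was a lower of bond0. */
void sw_port_sync_stp(struct sw_port *p, const struct bridge_port *bp)
{
	p->hw_stp_state = bp->stp_state;
}
```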

* Re: [RFC PATCH v2 net-next 05/16] net: dsa: sync up VLAN filtering state when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 05/16] net: dsa: sync up VLAN filtering state when joining the bridge Vladimir Oltean
  2021-03-19 22:11   ` Florian Fainelli
@ 2021-03-22 10:30   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 10:30 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> This is the same situation as for other switchdev port attributes: if we
> join an already-created bridge port, such as a bond master interface,
> then we can miss the initial switchdev notification emitted by the
> bridge for this port.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

* Re: [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router state when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router " Vladimir Oltean
  2021-03-19 22:12   ` Florian Fainelli
@ 2021-03-22 11:17   ` Tobias Waldekranz
  2021-03-22 11:43     ` Vladimir Oltean
  1 sibling, 1 reply; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 11:17 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> Make sure that the multicast router setting of the bridge is picked up
> correctly by DSA when joining, regardless of whether there are
> sandwiched interfaces or not. The SWITCHDEV_ATTR_ID_BRIDGE_MROUTER port
> attribute is only emitted from br_mc_router_state_change.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/dsa/port.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index ac1afe182c3b..8380509ee47c 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -189,6 +189,10 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
>  	if (err && err != -EOPNOTSUPP)
>  		return err;
>  
> +	err = dsa_port_mrouter(dp->cpu_dp, br_multicast_router(br), extack);
> +	if (err && err != -EOPNOTSUPP)
> +		return err;
> +
>  	return 0;
>  }
>  
> @@ -212,6 +216,12 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp)
>  	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
>  
>  	/* VLAN filtering is handled by dsa_switch_bridge_leave */
> +
> +	/* Some drivers treat the notification for having a local multicast
> +	 * router by allowing multicast to be flooded to the CPU, so we should
> +	 * allow this in standalone mode too.
> +	 */
> +	dsa_port_mrouter(dp->cpu_dp, true, NULL);

Is this really for the DSA layer to decide? The driver has already been
notified that at least one port is now in standalone mode. So if that
particular driver then requires all multicast to be flooded towards the
CPU, it can make that decision on its own.

E.g. say that you implement standalone mode using a matchall TCAM rule
that maps all frames coming in on a particular port to the CPU. You
could still leave flooding of unknown multicast off in that case. Now
that driver has to figure out if the notification about a multicast
router on the CPU is a real router, or the DSA layer telling it
something that it can safely ignore.

Today I think that most (all?) DSA drivers treat mrouter in the same
way as the multicast flooding bridge flag. But AFAIK, the semantic
meaning of the setting is "flood IP multicast to this port because there
is a router behind it somewhere". This means unknown _IP_ multicast, but
also all known (IGMP/MLD) groups. As most smaller devices cannot
separate IP multicast from the non-IP variety, we flood everything. But
we should also make sure that the port in question receives all known
groups for the _bridge_ in question. Because this is really a bridge
setting, though that information is not carried over to the driver
today. So reusing it in this way feels like it could be problematic down
the road.
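The mrouter semantics described above can be summarised as an egress-port computation for an IP multicast frame. The sketch below simplifies under stated assumptions: ports are bits in a mask, field names are hypothetical, and there is no querier or VLAN handling. It is not the bridge's actual forwarding code. It shows that a router port receives both unknown IP multicast and every known group. A port with only multicast flooding enabled receives unknown multicast alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bridge multicast state, one bit per port. */
struct mc_state {
	uint32_t mrouter_ports;		/* ports with a router behind them */
	uint32_t mcast_flood_ports;	/* ports with multicast flooding on */
};

/* Egress ports for an IP multicast frame. 'group_members' is only
 * meaningful when 'group_known' is true. */
uint32_t ip_mc_egress(const struct mc_state *s, bool group_known,
		      uint32_t group_members, unsigned int ingress)
{
	uint32_t ports;

	if (group_known)
		ports = group_members | s->mrouter_ports;
	else
		ports = s->mcast_flood_ports | s->mrouter_ports;

	/* Never reflect the frame back out of its ingress port. */
	return ports & ~(1u << ingress);
}
```

A driver that maps mrouter onto its unknown-multicast flood mask covers only the second branch. The first branch (known groups also reaching the router port) would need the extra per-bridge information mentioned above.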

>  }
>  
>  int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
> -- 
> 2.25.1

* Re: [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time " Vladimir Oltean
  2021-03-19 22:13   ` Florian Fainelli
@ 2021-03-22 11:20   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 11:20 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
>
> sysfs/ioctl/netlink
> -> br_set_ageing_time
>    -> __set_ageing_time
>
> therefore not at bridge port creation time, so:
> (a) drivers had to hardcode the initial value for the address ageing time,
>     because they didn't get any notification
> (b) that hardcoded value can be out of sync, if the user changes the
>     ageing time before enslaving the port to the bridge
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

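A minimal userspace sketch of the conversion a driver would do once the bridge's current ageing time is handed to it at join time, instead of relying on a value hardcoded at probe time. It assumes the value arrives as clock_t in USER_HZ units with USER_HZ = 100 (true on most architectures) and that the hardware is programmed in milliseconds, in the spirit of the clock_t_to_jiffies() + jiffies_to_msecs() pair used by dsa_port_ageing_time(). The names sketch_port, sketch_clock_t_to_ms() and sketch_port_bridge_join() are hypothetical.

```c
#include <assert.h>

/* Assumption: USER_HZ is 100, as on most architectures. */
#define SKETCH_USER_HZ 100UL

/* Hypothetical per-port driver state. */
struct sketch_port {
	unsigned long hw_ageing_ms;	/* value currently programmed in hardware */
};

/* Convert clock_t ticks (USER_HZ units) to milliseconds. */
static unsigned long sketch_clock_t_to_ms(unsigned long ageing_clock)
{
	return ageing_clock * 1000UL / SKETCH_USER_HZ;
}

/* Join-time sync: program whatever the bridge currently uses, so that a
 * change made before enslaving the port is not lost.
 */
static void sketch_port_bridge_join(struct sketch_port *port,
				    unsigned long br_ageing_clock)
{
	port->hw_ageing_ms = sketch_clock_t_to_ms(br_ageing_clock);
}
```

For example, if the user changed the ageing time to 60 s (6000 clock_t ticks) before enslaving the port, the join-time sync programs 60000 ms instead of the stale value, which is case (b) from the commit message.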
* Re: [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router state when joining the bridge
  2021-03-22 11:17   ` Tobias Waldekranz
@ 2021-03-22 11:43     ` Vladimir Oltean
  0 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-22 11:43 UTC
  To: Tobias Waldekranz
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Mon, Mar 22, 2021 at 12:17:33PM +0100, Tobias Waldekranz wrote:
> On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > Make sure that the multicast router setting of the bridge is picked up
> > correctly by DSA when joining, regardless of whether there are
> > sandwiched interfaces or not. The SWITCHDEV_ATTR_ID_BRIDGE_MROUTER port
> > attribute is only emitted from br_mc_router_state_change.
> >
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  net/dsa/port.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/net/dsa/port.c b/net/dsa/port.c
> > index ac1afe182c3b..8380509ee47c 100644
> > --- a/net/dsa/port.c
> > +++ b/net/dsa/port.c
> > @@ -189,6 +189,10 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
> >  	if (err && err != -EOPNOTSUPP)
> >  		return err;
> >  
> > +	err = dsa_port_mrouter(dp->cpu_dp, br_multicast_router(br), extack);
> > +	if (err && err != -EOPNOTSUPP)
> > +		return err;
> > +
> >  	return 0;
> >  }
> >  
> > @@ -212,6 +216,12 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp)
> >  	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
> >  
> >  	/* VLAN filtering is handled by dsa_switch_bridge_leave */
> > +
> > +	/* Some drivers treat the notification for having a local multicast
> > +	 * router by allowing multicast to be flooded to the CPU, so we should
> > +	 * allow this in standalone mode too.
> > +	 */
> > +	dsa_port_mrouter(dp->cpu_dp, true, NULL);
> 
> Is this really for the DSA layer to decide? The driver has already been
> notified that at least one port is now in standalone mode. So if that
> particular driver then requires all multicast to be flooded towards the
> CPU, it can make that decision on its own.
> 
> E.g. say that you implement standalone mode using a matchall TCAM rule
> that maps all frames coming in on a particular port to the CPU. You
> could still leave flooding of unknown multicast off in that case. Now
> that driver has to figure out if the notification about a multicast
> router on the CPU is a real router, or the DSA layer telling it
> something that it can safely ignore.
> 
> Today I think that most (all?) DSA drivers treat mrouter in the same
> way as the multicast flooding bridge flag. But AFAIK, the semantic
> meaning of the setting is "flood IP multicast to this port because there
> is a router behind it somewhere". This means unknown _IP_ multicast, but
> also all known (IGMP/MLD) groups. As most smaller devices cannot
> separate IP multicast from the non-IP variety, we flood everything. But
> we should also make sure that the port in question receives all known
> groups for the _bridge_ in question, because this is really a bridge
> setting, though that information is not carried over to the driver
> today. So reusing it in this way feels like it could be problematic down
> the road.

I agree with your objections in principle, but somehow I would like to
make progress with this patch series which is not really about how we
deal with IP multicast flooding to the CPU port in standalone ports
mode, so I would like to not get bogged down too much into this for now.
Don't forget that up until recent commit a8b659e7ff75 ("net: dsa: act as
passthrough for bridge port flags"), DSA drivers had no real idea
whether multicast flooding was meant for IP or not. And in standalone
mode, the way things work now is that the CPU port should see all
traffic, so it isn't wrong to do what this patch does.
Unless you see a breaking change introduced by this patch, we can
revisit this discussion for the "RX filtering on DSA" series, where it
is more relevant.

* Re: [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb entries when joining the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb " Vladimir Oltean
@ 2021-03-22 15:44   ` Tobias Waldekranz
  2021-03-22 16:19     ` Vladimir Oltean
  0 siblings, 1 reply; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 15:44 UTC
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> When a DSA port joins a LAG that already had an FDB entry pointing to it:
>
> ip link set bond0 master br0
> bridge fdb add dev bond0 00:01:02:03:04:05 master static
> ip link set swp0 master bond0
>
> the DSA port will have no idea that this FDB entry is there, because it
> missed the switchdev event emitted at its creation.
>
> Ido Schimmel pointed this out during a discussion about challenges with
> switchdev offloading of stacked interfaces between the physical port and
> the bridge, and recommended to just catch that condition and deny the
> CHANGEUPPER event:
> https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/
>
> But in fact, we might need to deal with the hard thing anyway, which is
> to replay all FDB addresses relevant to this port, because it isn't just
> static FDB entries, but also local addresses (ones that are not
> forwarded but terminated by the bridge). There, we can't just say 'oh
> yeah, there was an upper already so I'm not joining that'.
>
> So, similar to the logic for replaying MDB entries, add a function that
> must be called by individual switchdev drivers and replays local FDB
> entries as well as ones pointing towards a bridge port. This time, we
> use the atomic switchdev notifier block, since that's what FDB entries
> expect for some reason.
>
> Reported-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |  9 +++++++
>  include/net/switchdev.h   |  1 +
>  net/bridge/br_fdb.c       | 52 +++++++++++++++++++++++++++++++++++++++
>  net/dsa/dsa_priv.h        |  1 +
>  net/dsa/port.c            |  4 +++
>  net/dsa/slave.c           |  2 +-
>  6 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 4c25dafb013d..89596134e88f 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -147,6 +147,8 @@ void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
>  clock_t br_get_ageing_time(struct net_device *br_dev);
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +		  struct notifier_block *nb);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -175,6 +177,13 @@ static inline clock_t br_get_ageing_time(struct net_device *br_dev)
>  {
>  	return 0;
>  }
> +
> +static inline int br_fdb_replay(struct net_device *br_dev,
> +				struct net_device *dev,
> +				struct notifier_block *nb)
> +{
> +	return -EINVAL;
> +}
>  #endif
>  
>  #endif
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index b7fc7d0f54e2..7688ec572757 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -205,6 +205,7 @@ struct switchdev_notifier_info {
>  
>  struct switchdev_notifier_fdb_info {
>  	struct switchdev_notifier_info info; /* must be first */
> +	struct list_head list;
>  	const unsigned char *addr;
>  	u16 vid;
>  	u8 added_by_user:1,
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b7490237f3fc..49125cc196ac 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -726,6 +726,58 @@ static inline size_t fdb_nlmsg_size(void)
>  		+ nla_total_size(sizeof(u8)); /* NFEA_ACTIVITY_NOTIFY */
>  }
>  
> +static int br_fdb_replay_one(struct notifier_block *nb,
> +			     struct net_bridge_fdb_entry *fdb,
> +			     struct net_device *dev)
> +{
> +	struct switchdev_notifier_fdb_info item;
> +	int err;
> +
> +	item.addr = fdb->key.addr.addr;
> +	item.vid = fdb->key.vlan_id;
> +	item.added_by_user = test_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> +	item.offloaded = test_bit(BR_FDB_OFFLOADED, &fdb->flags);
> +	item.info.dev = dev;
> +
> +	err = nb->notifier_call(nb, SWITCHDEV_FDB_ADD_TO_DEVICE, &item);
> +	return notifier_to_errno(err);
> +}
> +
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +		  struct notifier_block *nb)
> +{
> +	struct net_bridge_fdb_entry *fdb;
> +	struct net_bridge *br;
> +	int err = 0;
> +
> +	if (!netif_is_bridge_master(br_dev))
> +		return -EINVAL;
> +
> +	if (!netif_is_bridge_port(dev))
> +		return -EINVAL;
> +
> +	br = netdev_priv(br_dev);
> +
> +	rcu_read_lock();
> +
> +	hlist_for_each_entry_rcu(fdb, &br->fdb_list, fdb_node) {
> +		struct net_device *dst_dev;
> +
> +		dst_dev = fdb->dst ? fdb->dst->dev : br->dev;
> +		if (dst_dev != br_dev && dst_dev != dev)
> +			continue;
> +

I do not know if it is a problem or not, more of an observation: This is
not guaranteed to be an exact replay of the events that the bridge port
(i.e. bond0 or whatever) has received since, in fdb_insert, we exit
early when adding local entries if that address is already in the
database.

Do we have to guard against this somehow? Or maybe we should consider
the current behavior a bug and make sure to always send the event in the
first place?

> +		err = br_fdb_replay_one(nb, fdb, dst_dev);
> +		if (err)
> +			break;
> +	}
> +
> +	rcu_read_unlock();
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(br_fdb_replay);
> +
>  static void fdb_notify(struct net_bridge *br,
>  		       const struct net_bridge_fdb_entry *fdb, int type,
>  		       bool swdev_notify)
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index b14c43cb88bb..92282de54230 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -262,6 +262,7 @@ static inline bool dsa_tree_offloads_bridge_port(struct dsa_switch_tree *dst,
>  
>  /* slave.c */
>  extern const struct dsa_device_ops notag_netdev_ops;
> +extern struct notifier_block dsa_slave_switchdev_notifier;
>  extern struct notifier_block dsa_slave_switchdev_blocking_notifier;
>  
>  void dsa_slave_mii_bus_init(struct dsa_switch *ds);
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index 6670612f96c6..9850051071f2 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -205,6 +205,10 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
>  	if (err && err != -EOPNOTSUPP)
>  		return err;
>  
> +	err = br_fdb_replay(br, brport_dev, &dsa_slave_switchdev_notifier);
> +	if (err && err != -EOPNOTSUPP)
> +		return err;
> +
>  	return 0;
>  }
>  
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index b974d8f84a2e..c51e52418a62 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -2392,7 +2392,7 @@ static struct notifier_block dsa_slave_nb __read_mostly = {
>  	.notifier_call  = dsa_slave_netdevice_event,
>  };
>  
> -static struct notifier_block dsa_slave_switchdev_notifier = {
> +struct notifier_block dsa_slave_switchdev_notifier = {
>  	.notifier_call = dsa_slave_switchdev_event,
>  };
>  
> -- 
> 2.25.1

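The filtering rule in br_fdb_replay() above can be modelled in userspace roughly as follows: an entry is replayed if it points at the joining bridge port, or if it has no destination port and therefore belongs to the bridge device itself, and iteration stops at the first error. This is only a sketch: sketch_dev, sketch_fdb, sketch_fdb_replay() and sketch_count_cb() are hypothetical stand-ins for net_device, net_bridge_fdb_entry and the notifier block, and there is no RCU here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for net_device and net_bridge_fdb_entry. */
struct sketch_dev {
	const char *name;
};

struct sketch_fdb {
	unsigned char addr[6];
	unsigned short vid;
	struct sketch_dev *dst;	/* NULL: address of the bridge device itself */
};

/* Plays the role of nb->notifier_call(); returns 0 or a negative errno. */
typedef int (*sketch_replay_cb)(struct sketch_dev *dev,
				const struct sketch_fdb *fdb, void *priv);

/* Replay entries relevant to @port: those reported against the bridge
 * device and those pointing at @port. Stop at the first error.
 */
static int sketch_fdb_replay(struct sketch_dev *br_dev, struct sketch_dev *port,
			     const struct sketch_fdb *fdbs, size_t n,
			     sketch_replay_cb cb, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		struct sketch_dev *dst_dev = fdbs[i].dst ? fdbs[i].dst : br_dev;
		int err;

		if (dst_dev != br_dev && dst_dev != port)
			continue;

		err = cb(dst_dev, &fdbs[i], priv);
		if (err)
			return err;
	}

	return 0;
}

/* Example callback: count how many entries were replayed. */
static int sketch_count_cb(struct sketch_dev *dev,
			   const struct sketch_fdb *fdb, void *priv)
{
	(void)dev;
	(void)fdb;
	(*(int *)priv)++;
	return 0;
}
```

With one entry pointing at bond0, one bridge-local entry and one entry pointing at swp1, replaying for bond0 invokes the callback twice.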
* Re: [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join Vladimir Oltean
@ 2021-03-22 15:51   ` Florian Fainelli
  2021-03-22 15:58   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-22 15:51 UTC
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> The DSA core has a layered structure, and even though we end up
> returning 0 (success) to user space when setting a bonding/team upper
> that can't be offloaded, some parts of the framework actually need to
> know that we couldn't offload that.
> 
> For example, if dsa_switch_lag_join returns 0 as it currently does,
> dsa_port_lag_join has no way to tell a successful offload from a
> software fallback, and it will call dsa_port_bridge_join afterwards.
> Then we'll think we're offloading the bridge master of the LAG, when in
> fact we're not even offloading the LAG. In turn, this will make us set
> skb->offload_fwd_mark = true, which is incorrect and the bridge doesn't
> like it.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

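A userspace sketch of the error convention this patch relies on: the lower layer reports a missing .port_lag_join as -EOPNOTSUPP instead of 0, so that the caller can tell a software fallback apart from a real offload and skip offloading the LAG's bridge in that case. All names are hypothetical. The commit message says user space still ends up seeing success; turning -EOPNOTSUPP back into 0 inside sketch_lag_port_join() is a simplification of this sketch, not necessarily where DSA does it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver ops: .port_lag_join is optional. */
struct sketch_switch_ops {
	int (*port_lag_join)(int port);
};

struct sketch_lag_port {
	const struct sketch_switch_ops *ops;
	int index;
	bool lag_offloaded;
	bool bridge_offloaded;
};

/* Lower layer: say "not offloaded" explicitly rather than returning 0. */
static int sketch_switch_lag_join(struct sketch_lag_port *p)
{
	if (!p->ops->port_lag_join)
		return -EOPNOTSUPP;

	return p->ops->port_lag_join(p->index);
}

/* Upper layer: a software LAG is still a success, but it must not be
 * followed by offloading the bridge that the LAG is part of.
 */
static int sketch_lag_port_join(struct sketch_lag_port *p, bool lag_in_bridge)
{
	int err = sketch_switch_lag_join(p);

	if (err == -EOPNOTSUPP)
		return 0;	/* software fallback: port stays standalone */
	if (err)
		return err;

	p->lag_offloaded = true;
	if (lag_in_bridge)
		p->bridge_offloaded = true;	/* stands in for the bridge join */

	return 0;
}

/* Example op for a driver that can offload the LAG. */
static int sketch_lag_join_ok(int port)
{
	(void)port;
	return 0;
}
```

Without the explicit -EOPNOTSUPP, both ports in the usage below would end up with bridge_offloaded set, which is the situation that later leads to skb->offload_fwd_mark being set incorrectly.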
* Re: [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries when joining the bridge
  2021-03-20  9:53     ` Vladimir Oltean
@ 2021-03-22 15:56       ` Florian Fainelli
  0 siblings, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-22 15:56 UTC
  To: Vladimir Oltean
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Tobias Waldekranz, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean



On 3/20/2021 2:53 AM, Vladimir Oltean wrote:
> On Fri, Mar 19, 2021 at 03:20:38PM -0700, Florian Fainelli wrote:
>>
>>
>> On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
>>> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>>>
>>> I have udhcpcd in my system and this is configured to bring interfaces
>>> up as soon as they are created.
>>>
>>> I create a bridge as follows:
>>>
>>> ip link add br0 type bridge
>>>
>>> As soon as I create the bridge and udhcpcd brings it up, I have some
>>> other crap (avahi)
>>
>> How dare you ;)
> 
> Well, it comes preinstalled on my system, I don't need it, and it has
> caused me nothing but trouble. So I think it has earned its title :D
> 
>>> that starts sending some random IPv6 packets to
>>> advertise some local services, and from there, the br0 bridge joins the
>>> following IPv6 groups:
>>>
>>> 33:33:ff:6d:c1:9c vid 0
>>> 33:33:00:00:00:6a vid 0
>>> 33:33:00:00:00:fb vid 0
>>>
>>> br_dev_xmit
>>> -> br_multicast_rcv
>>>    -> br_ip6_multicast_add_group
>>>       -> __br_multicast_add_group
>>>          -> br_multicast_host_join
>>>             -> br_mdb_notify
>>>
>>> This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
>>> hooked up, and switchdev will attempt to offload the host joined groups
>>> to an empty list of ports. Of course nobody offloads them.
>>>
>>> Then when we add a port to br0:
>>>
>>> ip link set swp0 master br0
>>>
>>> the bridge doesn't replay the host-joined MDB entries from br_add_if,
>>> and eventually the host joined addresses expire, and a switchdev
>>> notification for deleting them is emitted, but surprise, the original
>>> addition was already completely missed.
>>>
>>> The strategy to address this problem is to replay the MDB entries (both
>>> the port ones and the host joined ones) when the new port joins the
>>> bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
>>> be populated and only then attached to a bridge that you offload).
>>> However there are 2 possibilities: the addresses can be 'pushed' by the
>>> bridge into the port, or the port can 'pull' them from the bridge.
>>>
>>> Considering that in the general case, the new port can be really late to
>>> the party, and there may have been many other switchdev ports that
>>> already received the initial notification, we would like to avoid
>>> delivering duplicate events to them, since they might misbehave. And
>>> currently, the bridge calls the entire switchdev notifier chain, whereas
>>> for replaying it should just call the notifier block of the new guy.
>>> But the bridge doesn't know what is the new guy's notifier block, it
>>> just knows where the switchdev notifier chain is. So for simplification,
>>> we make this a driver-initiated pull for now, and the notifier block is
>>> passed as an argument.
>>>
>>> To emulate the calling context for mdb objects (deferred and put on the
>>> blocking notifier chain), we must iterate under RCU protection through
>>> the bridge's mdb entries, queue them, and only call them once we're out
>>> of the RCU read-side critical section.
>>>
>>> Suggested-by: Ido Schimmel <idosch@idosch.org>
>>> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>>> ---
>>>  include/linux/if_bridge.h |  9 +++++
>>>  net/bridge/br_mdb.c       | 84 +++++++++++++++++++++++++++++++++++++++
>>>  net/dsa/dsa_priv.h        |  2 +
>>>  net/dsa/port.c            |  6 +++
>>>  net/dsa/slave.c           |  2 +-
>>>  5 files changed, 102 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
>>> index ebd16495459c..4c25dafb013d 100644
>>> --- a/include/linux/if_bridge.h
>>> +++ b/include/linux/if_bridge.h
>>> @@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
>>>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>>>  bool br_multicast_enabled(const struct net_device *dev);
>>>  bool br_multicast_router(const struct net_device *dev);
>>> +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
>>> +		  struct notifier_block *nb, struct netlink_ext_ack *extack);
>>>  #else
>>>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>>>  					     struct list_head *br_ip_list)
>>> @@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct net_device *dev)
>>>  {
>>>  	return false;
>>>  }
>>> +static inline int br_mdb_replay(struct net_device *br_dev,
>>> +				struct net_device *dev,
>>> +				struct notifier_block *nb,
>>> +				struct netlink_ext_ack *extack)
>>> +{
>>> +	return -EINVAL;
>>
>> Should we return -EOPNOTSUPP such that this is not made fatal for DSA if
>> someone compiles their kernel with CONFIG_BRIDGE_IGMP_SNOOPING disabled?
> 
> Sure, I'll change the return values of the shims everywhere.
> 
>>> +}
>>>  #endif
>>>
>>>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
>>> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
>>> index 8846c5bcd075..23973186094c 100644
>>> --- a/net/bridge/br_mdb.c
>>> +++ b/net/bridge/br_mdb.c
>>> @@ -506,6 +506,90 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
>>>  	kfree(priv);
>>>  }
>>>
>>> +static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
>>> +			     struct net_bridge_mdb_entry *mp, int obj_id,
>>> +			     struct net_device *orig_dev,
>>> +			     struct netlink_ext_ack *extack)
>>> +{
>>> +	struct switchdev_notifier_port_obj_info obj_info = {
>>> +		.info = {
>>> +			.dev = dev,
>>> +			.extack = extack,
>>> +		},
>>> +	};
>>> +	struct switchdev_obj_port_mdb mdb = {
>>> +		.obj = {
>>> +			.orig_dev = orig_dev,
>>> +			.id = obj_id,
>>> +		},
>>> +		.vid = mp->addr.vid,
>>> +	};
>>> +	int err;
>>> +
>>> +	if (mp->addr.proto == htons(ETH_P_IP))
>>> +		ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
>>> +#if IS_ENABLED(CONFIG_IPV6)
>>> +	else if (mp->addr.proto == htons(ETH_P_IPV6))
>>> +		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
>>> +#endif
>>> +	else
>>> +		ether_addr_copy(mdb.addr, mp->addr.dst.mac_addr);
>>
>> How would you feel about re-using br_mdb_switchdev_host_port() here and
>> pass a 'type' value that is neither RTM_NEWMDB nor RTM_DELMDB just so you
>> don't have to duplicate that code here and we ensure it is in sync?
> 
> The trouble is that br_mdb_switchdev_host calls switchdev_port_obj_add,
> and I think the agreement was that replayed events should be a silent,
> one-to-one conversation via a direct call to the notifier block of the
> interested driver, as opposed to a call to the entire notifier chain
> which would make everybody else in the system see duplicates. This is
> the reason why I duplicated mostly everything.

It's not a whole lot of notification, but if you passed a type argument
that is neither of the two supported values (say -1),
br_mdb_switchdev_host_port() would end its execution there, and that
would avoid the duplication altogether. I am not stuck on that idea and
can hardly think for now of why this function would change, or why the
switchdev_obj_port_mdb would change, too.
-- 
Florian

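For reference, the address conversion duplicated between br_mdb_switchdev_host_port() and the replay path is the standard mapping of IP multicast groups to Ethernet addresses that ip_eth_mc_map() and ipv6_eth_mc_map() implement: 01:00:5e followed by the low 23 bits of an IPv4 group, and 33:33 followed by the low 32 bits of an IPv6 group. A userspace sketch, with hypothetical function names and, for simplicity, the IPv4 group taken in host byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IPv4 group (host byte order in this sketch) -> 01:00:5e + low 23 bits. */
static void sketch_ip4_mc_map(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* top bit of the low 24 is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}

/* IPv6 group -> 33:33 + last four bytes of the address. */
static void sketch_ip6_mc_map(const uint8_t group[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &group[12], 4);
}
```

Mapping the mDNS group ff02::fb gives 33:33:00:00:00:fb, and the solicited-node group ff02::1:ff6d:c19c gives 33:33:ff:6d:c1:9c, two of the groups listed in the commit message above.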
* Re: [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join Vladimir Oltean
  2021-03-22 15:51   ` Florian Fainelli
@ 2021-03-22 15:58   ` Tobias Waldekranz
  1 sibling, 0 replies; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 15:58 UTC
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> The DSA core has a layered structure, and even though we end up
> returning 0 (success) to user space when setting a bonding/team upper
> that can't be offloaded, some parts of the framework actually need to
> know that we couldn't offload that.
>
> For example, if dsa_switch_lag_join returns 0 as it currently does,
> dsa_port_lag_join has no way to tell a successful offload from a
> software fallback, and it will call dsa_port_bridge_join afterwards.
> Then we'll think we're offloading the bridge master of the LAG, when in
> fact we're not even offloading the LAG. In turn, this will make us set
> skb->offload_fwd_mark = true, which is incorrect and the bridge doesn't
> like it.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

* Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge Vladimir Oltean
  2021-03-19  8:52   ` DENG Qingfang
@ 2021-03-22 16:06   ` Florian Fainelli
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2021-03-22 16:06 UTC
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Tobias Waldekranz, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean



On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> DSA has gained the recent ability to deal gracefully with upper
> interfaces it cannot offload, such as the bridge, bonding or team
> drivers. When such uppers exist, the ports are still in standalone mode
> as far as the hardware is concerned.
> 
> But when we deliver packets to the software bridge in order for that to
> do the forwarding, there is an unpleasant surprise in that the bridge
> will refuse to forward them. This is because we unconditionally set
> skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
> were already forwarded in hardware by us.
> 
> Since dp->bridge_dev is populated only when there is hardware offload
> for it, but not in the software fallback case, let's introduce a new
> helper that can be called from the tagger data path which sets the
> skb->offload_fwd_mark to zero when there is no hardware
> offload for bridging. This lets the bridge forward packets back to other
> interfaces of our switch, if needed.
> 
> Without this change, sending a packet to the CPU for an unoffloaded
> interface triggers this WARN_ON:
> 
> void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
> 			      struct sk_buff *skb)
> {
> 	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
> 		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
> }
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

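A userspace model of the two checks involved. First, the receive path should only claim hardware forwarding when the ingress port really offloads the bridge, which is the point of this patch. Second, the bridge skips software forwarding towards ports whose switchdev mark equals the ingress one, as nbp_switchdev_allowed_egress() does. The struct and function names are hypothetical, and a mark of 0 is assumed to mean "not offloaded", which is what patch 16 of this series proposes for non-offloaded ports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bridge port; fwd_mark 0 means "not offloaded by a switch". */
struct sketch_brport {
	int fwd_mark;
};

/* Receive side: only mark the frame as "already forwarded in hardware" if
 * the ingress port really offloads the bridge.
 */
static bool sketch_rx_sets_mark(const struct sketch_brport *ingress)
{
	return ingress->fwd_mark != 0;
}

/* Forwarding side, modelled on nbp_switchdev_allowed_egress(): a marked
 * frame is not forwarded in software to ports sharing the ingress mark,
 * because the switch has already delivered it there.
 */
static bool sketch_allowed_egress(const struct sketch_brport *ingress,
				  const struct sketch_brport *egress)
{
	bool skb_mark = sketch_rx_sets_mark(ingress);

	return !skb_mark || ingress->fwd_mark != egress->fwd_mark;
}
```

With swp0 and swp1 sharing mark 1, a flooded frame from swp0 is not forwarded again in software to swp1, and it reaches bond0 only if bond0 carries a different mark than swp0.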
* Re: [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb entries when joining the bridge
  2021-03-22 15:44   ` Tobias Waldekranz
@ 2021-03-22 16:19     ` Vladimir Oltean
  2021-03-22 17:07       ` Tobias Waldekranz
  0 siblings, 1 reply; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-22 16:19 UTC
  To: Tobias Waldekranz
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Mon, Mar 22, 2021 at 04:44:41PM +0100, Tobias Waldekranz wrote:
> I do not know if it is a problem or not, more of an observation: This is
> not guaranteed to be an exact replay of the events that the bridge port
> (i.e. bond0 or whatever) has received since, in fdb_insert, we exit
> early when adding local entries if that address is already in the
> database.
> 
> Do we have to guard against this somehow? Or maybe we should consider
> the current behavior a bug and make sure to always send the event in the
> first place?

I don't really understand what you're saying.
fdb_insert has:

	fdb = br_fdb_find(br, addr, vid);
	if (fdb) {
		/* it is okay to have multiple ports with same
		 * address, just use the first one.
		 */
		if (test_bit(BR_FDB_LOCAL, &fdb->flags))
			return 0;
		br_warn(br, "adding interface %s with same address as a received packet (addr:%pM, vlan:%u)\n",
		       source ? source->dev->name : br->dev->name, addr, vid);
		fdb_delete(br, fdb, true);
	}

	fdb = fdb_create(br, source, addr, vid,
			 BIT(BR_FDB_LOCAL) | BIT(BR_FDB_STATIC));

Basically, if the {addr, vid} pair already exists in the fdb, and it
points to a local entry, fdb_create is bypassed.

Whereas my br_fdb_replay() function iterates over br->fdb_list, which is
exactly where fdb_create() also lays its eggs. That is to say, unless
I'm missing something, duplicate local FDB entries that skipped the
fdb_create() call in fdb_insert() because they were for already-existing
local FDB entries will also be skipped by br_fdb_replay(), because it
iterates over a br->fdb_list which contains unique local addresses.
Where am I wrong?

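A small userspace model of the argument above, assuming a flat table with at most one entry per {addr, vid} key. A second port with an already-known local address gets no entry, and therefore no switchdev event. A replay that walks the table for that port finds nothing for it either, so the two paths agree. The table layout and all names are hypothetical. A non-local entry with the same key is simply taken over in place, standing in for the fdb_delete() + fdb_create() sequence, and the replay counts local entries only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_FDB 16

/* Hypothetical flat FDB: at most one entry per {addr, vid} key. */
struct sketch_fdb_entry {
	unsigned char addr[6];
	unsigned short vid;
	int port;	/* owning bridge port; 0 stands for the bridge itself */
	bool local;
};

struct sketch_fdb_table {
	struct sketch_fdb_entry e[SKETCH_MAX_FDB];
	int n;
};

static struct sketch_fdb_entry *sketch_find(struct sketch_fdb_table *t,
					    const unsigned char *addr,
					    unsigned short vid)
{
	for (int i = 0; i < t->n; i++)
		if (t->e[i].vid == vid && !memcmp(t->e[i].addr, addr, 6))
			return &t->e[i];
	return NULL;
}

/* Modelled on the fdb_insert() excerpt above. Returns true if a local
 * entry was created, i.e. if a switchdev event would have been emitted.
 */
static bool sketch_insert_local(struct sketch_fdb_table *t, int port,
				const unsigned char *addr, unsigned short vid)
{
	struct sketch_fdb_entry *f = sketch_find(t, addr, vid);

	if (f && f->local)
		return false;	/* "just use the first one": no new entry */

	if (!f) {
		if (t->n == SKETCH_MAX_FDB)
			return false;	/* table full (limitation of the sketch) */
		f = &t->e[t->n++];
	}

	memcpy(f->addr, addr, 6);
	f->vid = vid;
	f->port = port;
	f->local = true;
	return true;
}

/* Walk the table as a replay for @port would, counting local entries that
 * belong to @port or to the bridge itself.
 */
static int sketch_replay_local_count(const struct sketch_fdb_table *t, int port)
{
	int count = 0;

	for (int i = 0; i < t->n; i++)
		if (t->e[i].local && (t->e[i].port == port || t->e[i].port == 0))
			count++;
	return count;
}
```

If port 1 and port 2 share a MAC address, only port 1 gets a local entry, and a replay for port 2 produces nothing, matching the absence of an event at insertion time.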
* Re: [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded
  2021-03-18 23:18 ` [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded Vladimir Oltean
@ 2021-03-22 16:30   ` Tobias Waldekranz
  2021-03-22 17:19     ` Vladimir Oltean
  0 siblings, 1 reply; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 16:30 UTC
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	linux-kernel, Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Vadym Kochan,
	Taras Chornyi, Grygorii Strashko, Vignesh Raghavendra,
	Ioana Ciornei, Ivan Vecera, linux-omap, Vladimir Oltean

On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> On reception of an skb, the bridge checks if it was marked as 'already
> forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
> is, it puts a mark of its own on that skb, with the switchdev mark of
> the ingress port. Then during forwarding, it enforces that the egress
> port must have a different switchdev mark than the ingress one (this is
> done in nbp_switchdev_allowed_egress).
>
> Non-switchdev drivers don't report any physical switch id (neither
> through devlink nor .ndo_get_port_parent_id), therefore the bridge
> assigns them a switchdev mark of 0, and packets coming from them will
> always have skb->offload_fwd_mark = 0. So there aren't any restrictions.
>
> Problems appear due to the fact that DSA would like to perform software
> fallback for bonding and team interfaces that the physical switch cannot
> offload.
>
>          +-- br0 -+
>         /   / |    \
>        /   /  |     \
>       /   /   |      \
>      /   /    |       \
>     /   /     |        \
>    /    |     |       bond0
>   /     |     |      /    \
>  swp0  swp1  swp2  swp3  swp4
>
> There, it is desirable that the presence of swp3 and swp4 under a
> non-offloaded LAG does not preclude us from doing hardware bridging
> between swp0, swp1 and swp2. The bandwidth of the CPU is oftentimes high
> enough that software bridging between {swp0,swp1,swp2} and bond0 is not
> impractical.
>
> But this creates an impossible paradox given the current way in which
> port switchdev marks are assigned. When the driver receives a packet
> from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
> something.
>
> - If we set it to 0, then the bridge will forward it towards swp1, swp2
>   and bond0. But the switch has already forwarded it towards swp1 and
>   swp2 (not to bond0, remember, that isn't offloaded, so as far as the
>   switch is concerned, ports swp3 and swp4 are not looking up the FDB,
>   and the entire bond0 is a destination that is strictly behind the
>   CPU). But we don't want duplicated traffic towards swp1 and swp2, so
>   it's not ok to set skb->offload_fwd_mark = 0.
>
> - If we set it to 1, then the bridge will not forward the skb towards
>   the ports with the same switchdev mark, i.e. not to swp1, swp2 and
>   bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
>   have forwarded the skb there.
>
> So the real issue is that bond0 will be assigned the same switchdev mark
> as {swp0,swp1,swp2}, because the function that assigns switchdev marks
> to bridge ports, nbp_switchdev_mark_set, recurses through bond0's lower
> interfaces until it finds devices that report a port parent ID (via
> ndo_get_port_parent_id or devlink).
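The recursion can be modelled in userspace to show why bond0 inherits the parent ID of swp3/swp4. This is a sketch that loosely follows dev_get_port_parent_id(dev, ppid, true): a device either reports its own switch ID, or the lookup recurses into its lowers and fails if they disagree. The struct, the function and the use of an integer in place of struct netdev_phys_item_id are hypothetical simplifications.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical netdev: a switchdev port reports switch_id != 0, while a
 * stacked device (bond/team) reports nothing and has lower devices. */
struct model_dev {
	unsigned long switch_id;	/* 0 means "does not report one" */
	struct model_dev *lowers[4];
	int num_lowers;
};

/* Recursive parent ID lookup: succeed only if all lowers agree. */
static int model_get_parent_id(const struct model_dev *dev,
			       unsigned long *id)
{
	unsigned long first = 0;
	int i, err = -EOPNOTSUPP;

	if (dev->switch_id) {
		*id = dev->switch_id;
		return 0;
	}

	for (i = 0; i < dev->num_lowers; i++) {
		err = model_get_parent_id(dev->lowers[i], id);
		if (err)
			return err;
		if (!first)
			first = *id;
		else if (first != *id)
			return -EOPNOTSUPP;	/* lowers from different switches */
	}
	return err;	/* -EOPNOTSUPP if there were no lowers at all */
}
```

Whether or not the switch actually offloads the LAG, bond0 resolves to the same ID as its lowers, which is why the old code handed it the same mark as swp0.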
>
> A solution is to give the bridge explicit hints as to what switchdev
> mark it should use for each port.
>
> Currently, the bridging offload is very 'silent': a driver registers a
> netdevice notifier, which is put on the netns's notifier chain, and
> which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
> bridge, and the lower is an interface it knows about (one registered by
> this driver, normally). Then, from within that notifier, it does a bunch
> of stuff behind the bridge's back, without the bridge necessarily
> knowing that there's somebody offloading that port. It looks like this:
>
>      ip link set swp0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v
>         call_netdevice_notifiers
>                   |
>                   v
>        dsa_slave_netdevice_event
>                   |
>                   v
>         oh, hey! it's for me!
>                   |
>                   v
>            .port_bridge_join
>
> What we do to solve the conundrum is to be less silent, and emit a
> notification back. Something like this:
>
>      ip link set swp0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v                    bridge: Aye! I'll use this
>         call_netdevice_notifiers           ^  ppid as the
>                   |                        |  switchdev mark for
>                   v                        |  this port, and zero
>        dsa_slave_netdevice_event           |  if I got nothing.
>                   |                        |
>                   v                        |
>         oh, hey! it's for me!              |
>                   |                        |
>                   v                        |
>            .port_bridge_join               |
>                   |                        |
>                   +------------------------+
>              switchdev_bridge_port_offload(swp0)
>
> Then stacked interfaces (like bond0 on top of swp3/swp4) would be
> treated differently in DSA, depending on whether we can or cannot
> offload them.
>
> The offload case:
>
>     ip link set bond0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v                    bridge: Aye! I'll use this
>         call_netdevice_notifiers           ^  ppid as the
>                   |                        |  switchdev mark for
>                   v                        |        bond0.
>        dsa_slave_netdevice_event           | Coincidentally (or not),
>                   |                        | bond0 and swp0, swp1, swp2
>                   v                        | all have the same switchdev
>         hmm, it's not quite for me,        | mark now, since the ASIC
>          but my driver has already         | is able to forward towards
>            called .port_lag_join           | all these ports in hw.
>           for it, because I have           |
>       a port with dp->lag_dev == bond0.    |
>                   |                        |
>                   v                        |
>            .port_bridge_join               |
>            for swp3 and swp4               |
>                   |                        |
>                   +------------------------+
>             switchdev_bridge_port_offload(bond0)
>
> And the non-offload case:
>
>     ip link set bond0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v                    bridge waiting:
>         call_netdevice_notifiers           ^  huh, switchdev_bridge_port_offload
>                   |                        |  wasn't called, okay, I'll use a
>                   v                        |  switchdev mark of zero for this one.
>        dsa_slave_netdevice_event           :  Then packets received on swp0 will
>                   |                        :  not be forwarded towards swp1, but
>                   v                        :  they will towards bond0.
>          it's not for me, but
>        bond0 is an upper of swp3
>       and swp4, but their dp->lag_dev
>        is NULL because they couldn't
>             offload it.
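The decision that DSA makes in the two diagrams can be sketched as follows. This assumes, as the diagrams describe, that only ports whose dp->lag_dev points at the LAG perform a bridge join and report the LAG as offloaded, once per member port. The types and the function are hypothetical stand-ins, not the DSA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DSA port: lag_dev is non-NULL only if the driver's
 * .port_lag_join accepted the LAG, i.e. the LAG is offloaded. */
struct model_dsa_port {
	const char *name;
	const void *lag_dev;
};

/* Handle "LAG joined a bridge": each port offloading the LAG would do
 * .port_bridge_join and then switchdev_bridge_port_offload(lag). The
 * return value is the number of such offload reports; 0 means the
 * bridge never hears from the driver and keeps a mark of zero. */
static int model_lag_bridge_join(const struct model_dsa_port *ports,
				 size_t n, const void *lag)
{
	size_t i;
	int offload_calls = 0;

	for (i = 0; i < n; i++)
		if (lag && ports[i].lag_dev == lag)
			offload_calls++;
	return offload_calls;
}
```

In the offloaded case the bridge receives two reports for bond0 (from swp3 and swp4). That is why the bridge side below has to tolerate repeated offload calls for the same bridge port.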
>
> We can therefore conclude that the lowers of a bridge port can come and
> go, so depending on its current set of lowers, a bridge port can
> dynamically toggle between offloaded and unoffloaded.
> Therefore, we need an equivalent switchdev_bridge_port_unoffload too.
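The reference counting that the patch introduces in br_switchdev.c can be modelled in userspace. This is a sketch that follows nbp_switchdev_mark_set(), nbp_switchdev_mark_clear() and br_switchdev_mark_get() as posted, simplified as follows: the parent ID is an integer, and the WARN_ON() paths return silently. All model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

struct model_brport {
	unsigned long ppid;	/* stands in for struct netdev_phys_item_id */
	int offload_count;	/* number of offload reports for this port */
	int offload_fwd_mark;	/* 0 means "not offloaded" */
};

struct model_bridge {
	struct model_brport *ports[8];
	int num_ports;
	int last_mark;		/* like br->offload_fwd_mark */
};

/* Reuse the mark of another offloaded port with the same parent ID,
 * otherwise hand out a fresh one. */
static int model_mark_get(struct model_bridge *br,
			  const struct model_brport *np)
{
	int i;

	for (i = 0; i < br->num_ports; i++) {
		const struct model_brport *p = br->ports[i];

		if (p == np || !p->offload_count)
			continue;
		if (p->ppid == np->ppid)
			return p->offload_fwd_mark;
	}
	return ++br->last_mark;
}

static int model_offload(struct model_bridge *br, struct model_brport *p,
			 unsigned long ppid)
{
	if (p->offload_count) {
		if (p->ppid != ppid)
			return -EBUSY;	/* lowers on two different switches */
		p->offload_count++;	/* e.g. second member of an offloaded LAG */
		return 0;
	}
	p->ppid = ppid;
	p->offload_count = 1;
	p->offload_fwd_mark = model_mark_get(br, p);
	return 0;
}

static void model_unoffload(struct model_brport *p, unsigned long ppid)
{
	if (p->ppid != ppid || !p->offload_count)
		return;			/* the patch WARNs here */
	if (--p->offload_count == 0)
		p->offload_fwd_mark = 0;	/* back to software-only */
}
```

The mark only drops to zero once the last lower has called unoffload, so a LAG that loses one of two offloading members stays offloaded.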
>
> This patch changes the way any switchdev driver interacts with the
> bridge. From now on, everybody needs to call switchdev_bridge_port_offload,
> otherwise the bridge will treat the port as non-offloaded and allow
> software flooding to other ports from the same ASIC.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  4 +-
>  .../marvell/prestera/prestera_switchdev.c     |  7 ++
>  .../mellanox/mlxsw/spectrum_switchdev.c       |  4 +-
>  drivers/net/ethernet/mscc/ocelot_net.c        |  4 +-
>  drivers/net/ethernet/rocker/rocker_ofdpa.c    |  8 +-
>  drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  7 +-
>  drivers/net/ethernet/ti/cpsw_new.c            |  6 +-

Why is net/dsa not included in this change?

>  include/linux/if_bridge.h                     | 16 ++++
>  net/bridge/br_if.c                            | 11 +--
>  net/bridge/br_private.h                       |  8 +-
>  net/bridge/br_switchdev.c                     | 94 ++++++++++++++++---
>  11 files changed, 138 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> index 2fd05dd18d46..f20556178e33 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> @@ -1518,7 +1518,7 @@ static int dpaa2_switch_port_bridge_join(struct net_device *netdev,
>  	if (err)
>  		goto err_egress_flood;
>  
> -	return 0;
> +	return switchdev_bridge_port_offload(netdev, NULL);
>  
>  err_egress_flood:
>  	dpaa2_switch_port_set_fdb(port_priv, NULL);
> @@ -1552,6 +1552,8 @@ static int dpaa2_switch_port_bridge_leave(struct net_device *netdev)
>  	struct ethsw_core *ethsw = port_priv->ethsw_data;
>  	int err;
>  
> +	switchdev_bridge_port_unoffload(netdev);
> +
>  	/* First of all, fast age any learn FDB addresses on this switch port */
>  	dpaa2_switch_port_fast_age(port_priv);
>  
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
> index 49e052273f30..0b0d5db7b85b 100644
> --- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
> @@ -443,6 +443,10 @@ static int prestera_port_bridge_join(struct prestera_port *port,
>  		goto err_brport_create;
>  	}
>  
> +	err = switchdev_bridge_port_offload(port->dev, NULL);
> +	if (err)
> +		goto err_brport_offload;
> +
>  	if (bridge->vlan_enabled)
>  		return 0;
>  
> @@ -453,6 +457,7 @@ static int prestera_port_bridge_join(struct prestera_port *port,
>  	return 0;
>  
>  err_port_join:
> +err_brport_offload:
>  	prestera_bridge_port_put(br_port);
>  err_brport_create:
>  	prestera_bridge_put(bridge);
> @@ -520,6 +525,8 @@ static void prestera_port_bridge_leave(struct prestera_port *port,
>  	if (!br_port)
>  		return;
>  
> +	switchdev_bridge_port_unoffload(port->dev);
> +
>  	bridge = br_port->bridge;
>  
>  	if (bridge->vlan_enabled)
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> index 23b7e8d6386b..7fa0b3653819 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> @@ -2326,7 +2326,7 @@ int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
>  	if (err)
>  		goto err_port_join;
>  
> -	return 0;
> +	return switchdev_bridge_port_offload(brport_dev, extack);
>  
>  err_port_join:
>  	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
> @@ -2348,6 +2348,8 @@ void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
>  	if (!bridge_port)
>  		return;
>  
> +	switchdev_bridge_port_unoffload(brport_dev);
> +
>  	bridge_device->ops->port_leave(bridge_device, bridge_port,
>  				       mlxsw_sp_port);
>  	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> index d38ffc7cf5f0..b917d9dd8a6a 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -1213,7 +1213,7 @@ static int ocelot_netdevice_bridge_join(struct net_device *dev,
>  	if (err)
>  		goto err_switchdev_sync;
>  
> -	return 0;
> +	return switchdev_bridge_port_offload(brport_dev, extack);
>  
>  err_switchdev_sync:
>  	ocelot_port_bridge_leave(ocelot, port, bridge);
> @@ -1234,6 +1234,8 @@ static int ocelot_netdevice_bridge_leave(struct net_device *dev,
>  	if (err)
>  		return err;
>  
> +	switchdev_bridge_port_unoffload(brport_dev);
> +
>  	ocelot_port_bridge_leave(ocelot, port, bridge);
>  
>  	return 0;
> diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
> index 967a634ee9ac..9b6d7cac112b 100644
> --- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
> +++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
> @@ -2592,13 +2592,19 @@ static int ofdpa_port_bridge_join(struct ofdpa_port *ofdpa_port,
>  
>  	ofdpa_port->bridge_dev = bridge;
>  
> -	return ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
> +	err = ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
> +	if (err)
> +		return err;
> +
> +	return switchdev_bridge_port_offload(ofdpa_port->dev, NULL);
>  }
>  
>  static int ofdpa_port_bridge_leave(struct ofdpa_port *ofdpa_port)
>  {
>  	int err;
>  
> +	switchdev_bridge_port_unoffload(ofdpa_port->dev);
> +
>  	err = ofdpa_port_vlan_del(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
>  	if (err)
>  		return err;
> diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> index 638d7b03be4b..fe2e38971acc 100644
> --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> @@ -7,6 +7,7 @@
>  
>  #include <linux/clk.h>
>  #include <linux/etherdevice.h>
> +#include <linux/if_bridge.h>
>  #include <linux/if_vlan.h>
>  #include <linux/interrupt.h>
>  #include <linux/kernel.h>
> @@ -2082,6 +2083,7 @@ static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
>  {
>  	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>  	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
> +	int err;
>  
>  	if (!common->br_members) {
>  		common->hw_bridge_dev = br_ndev;
> @@ -2097,7 +2099,8 @@ static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
>  
>  	am65_cpsw_port_offload_fwd_mark_update(common);
>  
> -	return NOTIFY_DONE;
> +	err = switchdev_bridge_port_offload(ndev, NULL);
> +	return notifier_to_errno(err);
>  }
>  
>  static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
> @@ -2105,6 +2108,8 @@ static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
>  	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>  	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
>  
> +	switchdev_bridge_port_unoffload(ndev);
> +
>  	common->br_members &= ~BIT(priv->port->port_id);
>  
>  	am65_cpsw_port_offload_fwd_mark_update(common);
> diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
> index 58a64313ac00..6347532fb39d 100644
> --- a/drivers/net/ethernet/ti/cpsw_new.c
> +++ b/drivers/net/ethernet/ti/cpsw_new.c
> @@ -1508,6 +1508,7 @@ static int cpsw_netdevice_port_link(struct net_device *ndev,
>  {
>  	struct cpsw_priv *priv = netdev_priv(ndev);
>  	struct cpsw_common *cpsw = priv->cpsw;
> +	int err;
>  
>  	if (!cpsw->br_members) {
>  		cpsw->hw_bridge_dev = br_ndev;
> @@ -1523,7 +1524,8 @@ static int cpsw_netdevice_port_link(struct net_device *ndev,
>  
>  	cpsw_port_offload_fwd_mark_update(cpsw);
>  
> -	return NOTIFY_DONE;
> +	err = switchdev_bridge_port_offload(ndev, NULL);
> +	return notifier_to_errno(err);
>  }
>  
>  static void cpsw_netdevice_port_unlink(struct net_device *ndev)
> @@ -1531,6 +1533,8 @@ static void cpsw_netdevice_port_unlink(struct net_device *ndev)
>  	struct cpsw_priv *priv = netdev_priv(ndev);
>  	struct cpsw_common *cpsw = priv->cpsw;
>  
> +	switchdev_bridge_port_unoffload(ndev);
> +
>  	cpsw->br_members &= ~BIT(priv->emac_port);
>  
>  	cpsw_port_offload_fwd_mark_update(cpsw);
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index ea176c508c0d..4fbee6d5fc16 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -196,4 +196,20 @@ static inline int br_fdb_replay(struct net_device *br_dev,
>  }
>  #endif
>  
> +#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_NET_SWITCHDEV)
> +int switchdev_bridge_port_offload(struct net_device *dev,
> +				  struct netlink_ext_ack *extack);
> +int switchdev_bridge_port_unoffload(struct net_device *dev);
> +#else
> +static inline int switchdev_bridge_port_offload(struct net_device *dev,
> +						struct netlink_ext_ack *extack)
> +{
> +	return 0;
> +}
> +static inline int switchdev_bridge_port_unoffload(struct net_device *dev)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #endif
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index f7d2f472ae24..930a09f27e0d 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -643,10 +643,6 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  	if (err)
>  		goto err5;
>  
> -	err = nbp_switchdev_mark_set(p);
> -	if (err)
> -		goto err6;
> -
>  	dev_disable_lro(dev);
>  
>  	list_add_rcu(&p->list, &br->port_list);
> @@ -671,13 +667,13 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  		 */
>  		err = dev_pre_changeaddr_notify(br->dev, dev->dev_addr, extack);
>  		if (err)
> -			goto err7;
> +			goto err6;
>  	}
>  
>  	err = nbp_vlan_init(p, extack);
>  	if (err) {
>  		netdev_err(dev, "failed to initialize vlan filtering on this port\n");
> -		goto err7;
> +		goto err6;
>  	}
>  
>  	spin_lock_bh(&br->lock);
> @@ -700,11 +696,10 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  
>  	return 0;
>  
> -err7:
> +err6:
>  	list_del_rcu(&p->list);
>  	br_fdb_delete_by_port(br, p, 0, 1);
>  	nbp_update_port_count(br);
> -err6:
>  	netdev_upper_dev_unlink(dev, br->dev);
>  err5:
>  	dev->priv_flags &= ~IFF_BRIDGE_PORT;
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index d7d167e10b70..1982b5887d0f 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -326,8 +326,10 @@ struct net_bridge_port {
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>  	struct netpoll			*np;
>  #endif
> +	int				offload_count;

Should this be conditional on CONFIG_NET_SWITCHDEV?

>  #ifdef CONFIG_NET_SWITCHDEV
>  	int				offload_fwd_mark;
> +	struct netdev_phys_item_id	ppid;
>  #endif
>  	u16				group_fwd_mask;
>  	u16				backup_redirected_cnt;
> @@ -1572,7 +1574,6 @@ static inline void br_sysfs_delbr(struct net_device *dev) { return; }
>  
>  /* br_switchdev.c */
>  #ifdef CONFIG_NET_SWITCHDEV
> -int nbp_switchdev_mark_set(struct net_bridge_port *p);
>  void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
>  			      struct sk_buff *skb);
>  bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> @@ -1592,11 +1593,6 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
>  	skb->offload_fwd_mark = 0;
>  }
>  #else
> -static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
> -{
> -	return 0;
> -}
> -
>  static inline void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
>  					    struct sk_buff *skb)
>  {
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index b89503832fcc..4cf7902f056c 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -8,37 +8,109 @@
>  
>  #include "br_private.h"
>  
> -static int br_switchdev_mark_get(struct net_bridge *br, struct net_device *dev)
> +static int br_switchdev_mark_get(struct net_bridge *br,
> +				 struct net_bridge_port *new_nbp)
>  {
>  	struct net_bridge_port *p;
>  
>  	/* dev is yet to be added to the port list. */
>  	list_for_each_entry(p, &br->port_list, list) {
> -		if (netdev_port_same_parent_id(dev, p->dev))
> +		if (!p->offload_count)
> +			continue;
> +
> +		if (netdev_phys_item_id_same(&p->ppid, &new_nbp->ppid))
>  			return p->offload_fwd_mark;
>  	}
>  
>  	return ++br->offload_fwd_mark;
>  }
>  
> -int nbp_switchdev_mark_set(struct net_bridge_port *p)
> +static int nbp_switchdev_mark_set(struct net_bridge_port *p,
> +				  struct netdev_phys_item_id ppid,
> +				  struct netlink_ext_ack *extack)
> +{
> +	if (p->offload_count) {
> +		/* Prevent unsupported configurations such as a bridge port
> +		 * which is a bonding interface, and the member ports are from
> +		 * different hardware switches.
> +		 */
> +		if (!netdev_phys_item_id_same(&p->ppid, &ppid)) {
> +			NL_SET_ERR_MSG_MOD(extack,
> +					   "Same bridge port cannot be offloaded by two physical switches");
> +			return -EBUSY;
> +		}
> +		/* Be tolerant with drivers that call SWITCHDEV_BRPORT_OFFLOADED
> +		 * more than once for the same bridge port, such as when the
> +		 * bridge port is an offloaded bonding/team interface.
> +		 */
> +		p->offload_count++;
> +		return 0;
> +	}
> +
> +	p->ppid = ppid;
> +	p->offload_count = 1;
> +	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p);
> +
> +	return 0;
> +}
> +
> +static void nbp_switchdev_mark_clear(struct net_bridge_port *p,
> +				     struct netdev_phys_item_id ppid)
> +{
> +	if (WARN_ON(!netdev_phys_item_id_same(&p->ppid, &ppid)))
> +		return;
> +	if (WARN_ON(!p->offload_count))
> +		return;
> +
> +	p->offload_count--;
> +	if (p->offload_count)
> +		return;
> +
> +	p->offload_fwd_mark = 0;
> +}
> +
> +/* Let the bridge know that this port is offloaded, so that it can use
> + * the port parent id obtained by recursion to determine the bridge
> + * port's switchdev mark.
> + */
> +int switchdev_bridge_port_offload(struct net_device *dev,
> +				  struct netlink_ext_ack *extack)
>  {
> -	struct netdev_phys_item_id ppid = { };
> +	struct netdev_phys_item_id ppid;
> +	struct net_bridge_port *p;
>  	int err;
>  
> -	ASSERT_RTNL();
> +	p = br_port_get_rtnl(dev);
> +	if (!p)
> +		return -ENODEV;
>  
> -	err = dev_get_port_parent_id(p->dev, &ppid, true);
> -	if (err) {
> -		if (err == -EOPNOTSUPP)
> -			return 0;
> +	err = dev_get_port_parent_id(dev, &ppid, true);
> +	if (err)
> +		return err;
> +
> +	return nbp_switchdev_mark_set(p, ppid, extack);
> +}
> +EXPORT_SYMBOL_GPL(switchdev_bridge_port_offload);
> +
> +int switchdev_bridge_port_unoffload(struct net_device *dev)
> +{
> +	struct netdev_phys_item_id ppid;
> +	struct net_bridge_port *p;
> +	int err;
> +

Should we ASSERT_RTNL here as well?

> +	p = br_port_get_rtnl(dev);
> +	if (!p)
> +		return -ENODEV;
> +
> +	err = dev_get_port_parent_id(dev, &ppid, true);
> +	if (err)
>  		return err;
> -	}
>  
> -	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev);
> +	nbp_switchdev_mark_clear(p, ppid);
>  
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(switchdev_bridge_port_unoffload);
>  
>  void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
>  			      struct sk_buff *skb)
> -- 
> 2.25.1

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb entries when joining the bridge
  2021-03-22 16:19     ` Vladimir Oltean
@ 2021-03-22 17:07       ` Tobias Waldekranz
  2021-03-22 17:13         ` Vladimir Oltean
  0 siblings, 1 reply; 52+ messages in thread
From: Tobias Waldekranz @ 2021-03-22 17:07 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Mon, Mar 22, 2021 at 18:19, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Mon, Mar 22, 2021 at 04:44:41PM +0100, Tobias Waldekranz wrote:
>> I do not know if it is a problem or not, more of an observation: This is
>> not guaranteed to be an exact replay of the events that the bridge port
>> (i.e. bond0 or whatever) has received since, in fdb_insert, we exit
>> early when adding local entries if that address is already in the
>> database.
>> 
>> Do we have to guard against this somehow? Or maybe we should consider
>> the current behavior a bug and make sure to always send the event in the
>> first place?
>
> I don't really understand what you're saying.
> fdb_insert has:
>
> 	fdb = br_fdb_find(br, addr, vid);
> 	if (fdb) {
> 		/* it is okay to have multiple ports with same
> 		 * address, just use the first one.
> 		 */
> 		if (test_bit(BR_FDB_LOCAL, &fdb->flags))
> 			return 0;
> 		br_warn(br, "adding interface %s with same address as a received packet (addr:%pM, vlan:%u)\n",
> 		       source ? source->dev->name : br->dev->name, addr, vid);
> 		fdb_delete(br, fdb, true);
> 	}
>
> 	fdb = fdb_create(br, source, addr, vid,
> 			 BIT(BR_FDB_LOCAL) | BIT(BR_FDB_STATIC));
>
> Basically, if the {addr, vid} pair already exists in the fdb, and it
> points to a local entry, fdb_create is bypassed.
>
> Whereas my br_fdb_replay() function iterates over br->fdb_list, which is
> exactly where fdb_create() also lays its eggs. That is to say, unless
> I'm missing something, duplicate local FDB entries that skipped the
> fdb_create() call in fdb_insert() because they were for already-existing
> local FDB entries will also be skipped by br_fdb_replay(), because it
> iterates over a br->fdb_list which contains unique local addresses.
> Where am I wrong?

No, you are right. I was thinking back to my attempt at offloading local
addresses and I distinctly remembered that local addresses could be
added without a notification being sent.

But that is not what is happening: the address is simply already
inserted on another port. So the notification may or may not reach DSA,
depending on the ordering of events. But there will be no discrepancy
between that and the replay.
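The argument that insert-time notifications and the replay agree can be checked with a small userspace model. This is a sketch, assuming only the local-address path of fdb_insert() described above. A local {addr, vid} that already exists returns early, so it creates neither a new list entry nor a notification, and the replay walks that same list. All model_* names are hypothetical. Learned entries, deletions and non-local replay are simplified away.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_FDB_MAX 16

struct model_fdb_entry {
	unsigned char addr[6];
	unsigned short vid;
	int port;		/* first port that installed the address */
	bool local;
};

struct model_fdb {
	struct model_fdb_entry entries[MODEL_FDB_MAX];
	int n;
	int notifications;	/* switchdev events emitted at insert time */
};

static struct model_fdb_entry *model_fdb_find(struct model_fdb *fdb,
					      const unsigned char *addr,
					      unsigned short vid)
{
	int i;

	for (i = 0; i < fdb->n; i++)
		if (fdb->entries[i].vid == vid &&
		    !memcmp(fdb->entries[i].addr, addr, 6))
			return &fdb->entries[i];
	return NULL;
}

/* Local path of fdb_insert(): an existing local entry means early exit. */
static int model_fdb_insert_local(struct model_fdb *fdb, int port,
				  const unsigned char *addr, unsigned short vid)
{
	struct model_fdb_entry *e = model_fdb_find(fdb, addr, vid);

	if (e && e->local)
		return 0;
	if (!e) {
		if (fdb->n == MODEL_FDB_MAX)
			return -ENOSPC;
		e = &fdb->entries[fdb->n++];
	}
	/* A learned entry is overwritten (the kernel deletes and recreates). */
	memcpy(e->addr, addr, 6);
	e->vid = vid;
	e->port = port;
	e->local = true;
	fdb->notifications++;
	return 0;
}

/* Replay visits each unique local entry in the list exactly once. */
static int model_fdb_replay_local(const struct model_fdb *fdb)
{
	int i, replayed = 0;

	for (i = 0; i < fdb->n; i++)
		if (fdb->entries[i].local)
			replayed++;
	return replayed;
}
```

Inserting the same local address from two ports produces a single entry owned by the first port. The replay count therefore equals the number of notifications that were sent.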


* Re: [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb entries when joining the bridge
  2021-03-22 17:07       ` Tobias Waldekranz
@ 2021-03-22 17:13         ` Vladimir Oltean
  0 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-22 17:13 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Mon, Mar 22, 2021 at 06:07:51PM +0100, Tobias Waldekranz wrote:
> On Mon, Mar 22, 2021 at 18:19, Vladimir Oltean <olteanv@gmail.com> wrote:
> > On Mon, Mar 22, 2021 at 04:44:41PM +0100, Tobias Waldekranz wrote:
> >> I do not know if it is a problem or not, more of an observation: This is
> >> not guaranteed to be an exact replay of the events that the bridge port
> >> (i.e. bond0 or whatever) has received since, in fdb_insert, we exit
> >> early when adding local entries if that address is already in the
> >> database.
> >> 
> >> Do we have to guard against this somehow? Or maybe we should consider
> >> the current behavior a bug and make sure to always send the event in the
> >> first place?
> >
> > I don't really understand what you're saying.
> > fdb_insert has:
> >
> > 	fdb = br_fdb_find(br, addr, vid);
> > 	if (fdb) {
> > 		/* it is okay to have multiple ports with same
> > 		 * address, just use the first one.
> > 		 */
> > 		if (test_bit(BR_FDB_LOCAL, &fdb->flags))
> > 			return 0;
> > 		br_warn(br, "adding interface %s with same address as a received packet (addr:%pM, vlan:%u)\n",
> > 		       source ? source->dev->name : br->dev->name, addr, vid);
> > 		fdb_delete(br, fdb, true);
> > 	}
> >
> > 	fdb = fdb_create(br, source, addr, vid,
> > 			 BIT(BR_FDB_LOCAL) | BIT(BR_FDB_STATIC));
> >
> > Basically, if the {addr, vid} pair already exists in the fdb, and it
> > points to a local entry, fdb_create is bypassed.
> >
> > Whereas my br_fdb_replay() function iterates over br->fdb_list, which is
> > exactly where fdb_create() also lays its eggs. That is to say, unless
> > I'm missing something, duplicate local FDB entries that skipped the
> > fdb_create() call in fdb_insert() because they were for already-existing
> > local FDB entries will also be skipped by br_fdb_replay(), because it
> > iterates over a br->fdb_list which contains unique local addresses.
> > Where am I wrong?
> 
> No, you are right. I was thinking back to my attempt at offloading local
> addresses and I distinctly remembered that local addresses could be
> added without a notification being sent.
> 
> But that is not what is happening: the address is simply already
> inserted on another port. So the notification may or may not reach DSA,
> depending on the ordering of events. But there will be no discrepancy
> between that and the replay.

I'm not saying that the bridge isn't broken, because it is, but for
different reasons, as explained here:
https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-9-olteanv@gmail.com/

What I can do is make br_switchdev_fdb_notify() skip fdb entries
with the BR_FDB_LOCAL bit set, and target that patch against "net", with
a Fixes: tag of 6b26b51b1d13 ("net: bridge: Add support for notifying
devices about FDB add/del").
Then I can also skip the entries with BR_FDB_LOCAL from br_fdb_replay.
Then, when I return to the "RX filtering for DSA" series, I can add the
"is_local" bit to switchdev FDB objects, and make all drivers reject
"is_local" entries (which is what the linked patch does) unless more
specific treatment is applied to those (trap to CPU).
Nikolay?
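The proposed filtering can be sketched as a single predicate shared by the notification path and the replay path, so that both skip entries with the local bit set. This is a userspace sketch under stated assumptions. The kernel tests BR_FDB_LOCAL with test_bit() on fdb->flags, while the MODEL_* bit numbers, the struct and the functions below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for BR_FDB_LOCAL / BR_FDB_STATIC. */
enum { MODEL_FDB_LOCAL, MODEL_FDB_STATIC };
#define MODEL_BIT(nr) (1UL << (nr))

struct model_fdb_obj {
	unsigned long flags;
};

/* Shared predicate: local (host) addresses are not offloaded. */
static int model_fdb_should_offload(const struct model_fdb_obj *e)
{
	return !(e->flags & MODEL_BIT(MODEL_FDB_LOCAL));
}

/* Replay applies the same predicate as the notifier path, and passes
 * each surviving entry to an optional driver callback. */
static size_t model_fdb_replay(const struct model_fdb_obj *entries, size_t n,
			       void (*cb)(const struct model_fdb_obj *, void *),
			       void *ctx)
{
	size_t i, sent = 0;

	for (i = 0; i < n; i++) {
		if (!model_fdb_should_offload(&entries[i]))
			continue;
		if (cb)
			cb(&entries[i], ctx);
		sent++;
	}
	return sent;
}
```

Keeping the check in one helper avoids the notifier and replay paths drifting apart once the local bit starts being filtered.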


* Re: [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded
  2021-03-22 16:30   ` Tobias Waldekranz
@ 2021-03-22 17:19     ` Vladimir Oltean
  0 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-22 17:19 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Vadym Kochan, Taras Chornyi, Grygorii Strashko,
	Vignesh Raghavendra, Ioana Ciornei, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Mon, Mar 22, 2021 at 05:30:52PM +0100, Tobias Waldekranz wrote:
> > ---
> >  .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  4 +-
> >  .../marvell/prestera/prestera_switchdev.c     |  7 ++
> >  .../mellanox/mlxsw/spectrum_switchdev.c       |  4 +-
> >  drivers/net/ethernet/mscc/ocelot_net.c        |  4 +-
> >  drivers/net/ethernet/rocker/rocker_ofdpa.c    |  8 +-
> >  drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  7 +-
> >  drivers/net/ethernet/ti/cpsw_new.c            |  6 +-
> 
> Why is net/dsa not included in this change?

I don't know, must have gone shopping somewhere?
I'll make sure DSA is included in this change when I resend.


* Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
  2021-03-22  8:04           ` DENG Qingfang
@ 2021-03-22 22:23             ` Vladimir Oltean
  0 siblings, 0 replies; 52+ messages in thread
From: Vladimir Oltean @ 2021-03-22 22:23 UTC (permalink / raw)
  To: DENG Qingfang
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Tobias Waldekranz, netdev, linux-kernel,
	Roopa Prabhu, Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel,
	Alexandre Belloni, UNGLinuxDriver, Vadym Kochan, Taras Chornyi,
	Grygorii Strashko, Vignesh Raghavendra, Ioana Ciornei,
	Ivan Vecera, linux-omap, Vladimir Oltean

On Mon, Mar 22, 2021 at 04:04:01PM +0800, DENG Qingfang wrote:
> On Fri, Mar 19, 2021 at 6:49 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> > Why would you even want to look at the source net device for forwarding?
> > I'd say that if dp->bridge_dev is NULL in the xmit function, you certainly
> > want to bypass address learning if you can. Maybe also for link-local traffic.
> 
> Also for trapped traffic (snooping, tc-flower trap action) if the CPU
> sends them back.

This sounds like an interesting use case. Please tell me more about what
commands I could run to reinject trapped packets into the hardware data
path.



Thread overview: 52+ messages
2021-03-18 23:18 [RFC PATCH v2 net-next 00/16] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 01/16] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge Vladimir Oltean
2021-03-19 22:04   ` Florian Fainelli
2021-03-22 10:24   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 02/16] net: dsa: pass extack to dsa_port_{bridge,lag}_join Vladimir Oltean
2021-03-19 22:05   ` Florian Fainelli
2021-03-22 10:25   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time Vladimir Oltean
2021-03-19 22:08   ` Florian Fainelli
2021-03-20 10:05     ` Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 04/16] net: dsa: sync up with bridge port's STP state when joining Vladimir Oltean
2021-03-19 22:11   ` Florian Fainelli
2021-03-22 10:29   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 05/16] net: dsa: sync up VLAN filtering state when joining the bridge Vladimir Oltean
2021-03-19 22:11   ` Florian Fainelli
2021-03-22 10:30   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 06/16] net: dsa: sync multicast router " Vladimir Oltean
2021-03-19 22:12   ` Florian Fainelli
2021-03-22 11:17   ` Tobias Waldekranz
2021-03-22 11:43     ` Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time " Vladimir Oltean
2021-03-19 22:13   ` Florian Fainelli
2021-03-20 10:09     ` Vladimir Oltean
2021-03-22 11:20   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 08/16] net: dsa: replay port and host-joined mdb entries " Vladimir Oltean
2021-03-19 22:20   ` Florian Fainelli
2021-03-20  9:53     ` Vladimir Oltean
2021-03-22 15:56       ` Florian Fainelli
2021-03-18 23:18 ` [RFC PATCH v2 net-next 09/16] net: dsa: replay port and local fdb " Vladimir Oltean
2021-03-22 15:44   ` Tobias Waldekranz
2021-03-22 16:19     ` Vladimir Oltean
2021-03-22 17:07       ` Tobias Waldekranz
2021-03-22 17:13         ` Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 10/16] net: dsa: replay VLANs installed on port " Vladimir Oltean
2021-03-19 22:24   ` Florian Fainelli
2021-03-18 23:18 ` [RFC PATCH v2 net-next 11/16] net: ocelot: support multiple bridges Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 12/16] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 13/16] net: ocelot: replay switchdev events when joining bridge Vladimir Oltean
2021-03-18 23:18 ` [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge Vladimir Oltean
2021-03-19  8:52   ` DENG Qingfang
2021-03-19  9:06     ` Vladimir Oltean
2021-03-19  9:29       ` DENG Qingfang
2021-03-19 10:49         ` Vladimir Oltean
2021-03-22  8:04           ` DENG Qingfang
2021-03-22 22:23             ` Vladimir Oltean
2021-03-22 16:06   ` Florian Fainelli
2021-03-18 23:18 ` [RFC PATCH v2 net-next 15/16] net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join Vladimir Oltean
2021-03-22 15:51   ` Florian Fainelli
2021-03-22 15:58   ` Tobias Waldekranz
2021-03-18 23:18 ` [RFC PATCH v2 net-next 16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded Vladimir Oltean
2021-03-22 16:30   ` Tobias Waldekranz
2021-03-22 17:19     ` Vladimir Oltean
