All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
@ 2022-02-09 21:30 Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 1/5] net: bridge: vlan: br_vlan_add: notify switchdev only when changed Vladimir Oltean
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-09 21:30 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

The motivation behind these patches is that Rafael reported the
following error with mv88e6xxx when the first switch port joins a
bridge:

mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95 (-EOPNOTSUPP)

The FDB entry that's added is the MAC address of the bridge, in VID 1
(the default_pvid), being replayed as part of br_add_if() -> ... ->
nbp_switchdev_sync_objs().

-EOPNOTSUPP is the mv88e6xxx driver's way of saying that VID 1 doesn't
exist in the VTU, so it can't program the ATU with a FID, something
which it needs.

It appears to be a race, but it isn't, since we only end up installing
VID 1 in the VTU by coincidence. DSA's approximation of programming
VLANs on the CPU port together with the user ports breaks down with
host FDB entries on mv88e6xxx, since that strictly requires the VTU to
contain the VID. But the user may freely add VLANs pointing just towards
the bridge, and FDB entries in those VLANs, and DSA will not be aware of
them, because it only listens for VLANs on user ports.

To create a solution that scales properly to cross-chip setups and
doesn't leak entries behind, some changes in the bridge driver are
required. I believe that these are for the better overall, but I may be
wrong. Namely, the same refcounting procedure that DSA has in place for
host FDB and MDB entries can be replicated for VLANs, except that it's
garbage in, garbage out: the VLAN addition and removal notifications
from switchdev aren't balanced. So the first 3 patches attempt to deal
with that.

This patch set has been superficially tested on a board with 3 mv88e6xxx
switches in a daisy chain and appears to produce the primary desired
effect - the driver no longer returns -EOPNOTSUPP when the first port
joins a bridge, and is successful in performing local termination under
a VLAN-aware bridge.
As an additional side effect, it silences the annoying "p%d: already a
member of VLAN %d\n" warning messages that the mv88e6xxx driver produces
when coupled with systemd-networkd, and a few VLANs are configured.
Furthermore, it advances Florian's idea from a few years back, which
never got merged:
https://lore.kernel.org/lkml/20180624153339.13572-1-f.fainelli@gmail.com/

Some testing:

root@debian:~# bridge vlan add dev br0 vid 101 pvid self
[  100.709220] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_add: port 9 vlan 101
[  100.873426] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_add: port 10 vlan 101
[  100.892314] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_add: port 9 vlan 101
[  101.053392] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_add: port 10 vlan 101
[  101.076994] mv88e6085 d0032004.mdio-mii:12: mv88e6xxx_port_vlan_add: port 9 vlan 101
root@debian:~# bridge vlan add dev br0 vid 101 pvid self
root@debian:~# bridge vlan add dev br0 vid 101 pvid self
root@debian:~# bridge vlan
port              vlan-id
eth0              1 PVID Egress Untagged
lan9              1 PVID Egress Untagged
lan10             1 PVID Egress Untagged
lan11             1 PVID Egress Untagged
lan12             1 PVID Egress Untagged
lan13             1 PVID Egress Untagged
lan14             1 PVID Egress Untagged
lan15             1 PVID Egress Untagged
lan16             1 PVID Egress Untagged
lan17             1 PVID Egress Untagged
lan18             1 PVID Egress Untagged
lan19             1 PVID Egress Untagged
lan20             1 PVID Egress Untagged
lan21             1 PVID Egress Untagged
lan22             1 PVID Egress Untagged
lan23             1 PVID Egress Untagged
lan24             1 PVID Egress Untagged
sfp               1 PVID Egress Untagged
lan1              1 PVID Egress Untagged
lan2              1 PVID Egress Untagged
lan3              1 PVID Egress Untagged
lan4              1 PVID Egress Untagged
lan5              1 PVID Egress Untagged
lan6              1 PVID Egress Untagged
lan7              1 PVID Egress Untagged
lan8              1 PVID Egress Untagged
br0               1 Egress Untagged
                  101 PVID
root@debian:~# bridge vlan del dev br0 vid 101 pvid self
[  108.340487] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_del: port 9 vlan 101
[  108.379167] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_del: port 10 vlan 101
[  108.402319] mv88e6085 d0032004.mdio-mii:12: mv88e6xxx_port_vlan_del: port 9 vlan 101
[  108.425866] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_del: port 9 vlan 101
[  108.452280] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_del: port 10 vlan 101
root@debian:~# bridge vlan del dev br0 vid 101 pvid self
root@debian:~# bridge vlan del dev br0 vid 101 pvid self
root@debian:~# bridge vlan
port              vlan-id
eth0              1 PVID Egress Untagged
lan9              1 PVID Egress Untagged
lan10             1 PVID Egress Untagged
lan11             1 PVID Egress Untagged
lan12             1 PVID Egress Untagged
lan13             1 PVID Egress Untagged
lan14             1 PVID Egress Untagged
lan15             1 PVID Egress Untagged
lan16             1 PVID Egress Untagged
lan17             1 PVID Egress Untagged
lan18             1 PVID Egress Untagged
lan19             1 PVID Egress Untagged
lan20             1 PVID Egress Untagged
lan21             1 PVID Egress Untagged
lan22             1 PVID Egress Untagged
lan23             1 PVID Egress Untagged
lan24             1 PVID Egress Untagged
sfp               1 PVID Egress Untagged
lan1              1 PVID Egress Untagged
lan2              1 PVID Egress Untagged
lan3              1 PVID Egress Untagged
lan4              1 PVID Egress Untagged
lan5              1 PVID Egress Untagged
lan6              1 PVID Egress Untagged
lan7              1 PVID Egress Untagged
lan8              1 PVID Egress Untagged
br0               1 Egress Untagged
root@debian:~# bridge vlan del dev br0 vid 101 pvid self

Vladimir Oltean (5):
  net: bridge: vlan: br_vlan_add: notify switchdev only when changed
  net: bridge: vlan: nbp_vlan_add: notify switchdev only when changed
  net: bridge: vlan: notify a switchdev deletion when modifying flags of
    existing VLAN
  net: bridge: switchdev: replay VLANs present on the bridge device
    itself
  net: dsa: add explicit support for host bridge VLANs

 include/net/dsa.h         |  12 +++
 net/bridge/br_switchdev.c |   9 ++
 net/bridge/br_vlan.c      | 110 ++++++++++++++++++-----
 net/dsa/dsa2.c            |   2 +
 net/dsa/dsa_priv.h        |   7 ++
 net/dsa/port.c            |  42 +++++++++
 net/dsa/slave.c           |  97 +++++++++++++--------
 net/dsa/switch.c          | 179 ++++++++++++++++++++++++++++++++++++--
 8 files changed, 392 insertions(+), 66 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH net-next 1/5] net: bridge: vlan: br_vlan_add: notify switchdev only when changed
  2022-02-09 21:30 [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Vladimir Oltean
@ 2022-02-09 21:30 ` Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 2/5] net: bridge: vlan: nbp_vlan_add: " Vladimir Oltean
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-09 21:30 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

When a VLAN entry is added multiple times in a row to a bridge:

bridge vlan add dev br0 vid 100 self

the bridge notifies switchdev each time, even if nothing changed, which
makes driver-level accounting impossible.

Move br_switchdev_port_vlan_add() out of br_vlan_add_existing(), keep
track of exactly what br_vlan_add_existing() changed, only call
br_switchdev_port_vlan_add() afterwards and if anything changed at all,
and restore whatever br_vlan_add_existing() may have done on the error
path of br_switchdev_port_vlan_add().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/bridge/br_vlan.c | 68 ++++++++++++++++++++++++++++++++------------
 1 file changed, 50 insertions(+), 18 deletions(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 1402d5ca242d..c7cadc1b4f71 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -667,44 +667,56 @@ bool br_should_learn(struct net_bridge_port *p, struct sk_buff *skb, u16 *vid)
 static int br_vlan_add_existing(struct net_bridge *br,
 				struct net_bridge_vlan_group *vg,
 				struct net_bridge_vlan *vlan,
-				u16 flags, bool *changed,
+				u16 flags, bool *brentry_created,
+				bool *flags_changed,
 				struct netlink_ext_ack *extack)
 {
 	int err;
 
-	err = br_switchdev_port_vlan_add(br->dev, vlan->vid, flags, extack);
-	if (err && err != -EOPNOTSUPP)
-		return err;
+	*brentry_created = false;
+	*flags_changed = false;
 
 	if (!br_vlan_is_brentry(vlan)) {
 		/* Trying to change flags of non-existent bridge vlan */
-		if (!(flags & BRIDGE_VLAN_INFO_BRENTRY)) {
-			err = -EINVAL;
-			goto err_flags;
-		}
+		if (!(flags & BRIDGE_VLAN_INFO_BRENTRY))
+			return -EINVAL;
+
 		/* It was only kept for port vlans, now make it real */
 		err = br_fdb_add_local(br, NULL, br->dev->dev_addr, vlan->vid);
 		if (err) {
 			br_err(br, "failed to insert local address into bridge forwarding table\n");
-			goto err_fdb_insert;
+			return err;
 		}
 
 		refcount_inc(&vlan->refcnt);
 		vlan->flags |= BRIDGE_VLAN_INFO_BRENTRY;
 		vg->num_vlans++;
-		*changed = true;
+		*brentry_created = true;
 		br_multicast_toggle_one_vlan(vlan, true);
 	}
 
 	if (__vlan_add_flags(vlan, flags))
-		*changed = true;
+		*flags_changed = true;
 
 	return 0;
+}
 
-err_fdb_insert:
-err_flags:
-	br_switchdev_port_vlan_del(br->dev, vlan->vid);
-	return err;
+static void br_vlan_restore_existing(struct net_bridge *br,
+				     struct net_bridge_vlan_group *vg,
+				     struct net_bridge_vlan *vlan,
+				     u16 flags, bool del_brentry,
+				     bool restore_flags)
+{
+	if (del_brentry) {
+		br_fdb_find_delete_local(br, NULL, br->dev->dev_addr, vlan->vid);
+
+		refcount_dec(&vlan->refcnt);
+		vlan->flags &= ~BRIDGE_VLAN_INFO_BRENTRY;
+		vg->num_vlans--;
+	}
+
+	if (restore_flags)
+		__vlan_add_flags(vlan, flags);
 }
 
 /* Must be protected by RTNL.
@@ -723,9 +735,29 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, bool *changed,
 	*changed = false;
 	vg = br_vlan_group(br);
 	vlan = br_vlan_find(vg, vid);
-	if (vlan)
-		return br_vlan_add_existing(br, vg, vlan, flags, changed,
-					    extack);
+	if (vlan) {
+		bool brentry_created, flags_changed;
+		u16 old_flags = vlan->flags;
+
+		ret = br_vlan_add_existing(br, vg, vlan, flags,
+					   &brentry_created, &flags_changed,
+					   extack);
+		if (ret)
+			return ret;
+
+		*changed = brentry_created || flags_changed;
+		if (*changed) {
+			ret = br_switchdev_port_vlan_add(br->dev, vlan->vid,
+							 flags, extack);
+			if (ret && ret != -EOPNOTSUPP) {
+				br_vlan_restore_existing(br, vg, vlan, old_flags,
+							 brentry_created, flags_changed);
+				return ret;
+			}
+		}
+
+		return 0;
+	}
 
 	vlan = kzalloc(sizeof(*vlan), GFP_KERNEL);
 	if (!vlan)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH net-next 2/5] net: bridge: vlan: nbp_vlan_add: notify switchdev only when changed
  2022-02-09 21:30 [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 1/5] net: bridge: vlan: br_vlan_add: notify switchdev only when changed Vladimir Oltean
@ 2022-02-09 21:30 ` Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 3/5] net: bridge: vlan: notify a switchdev deletion when modifying flags of existing VLAN Vladimir Oltean
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-09 21:30 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

When a VLAN entry is added multiple times in a row to a bridge port:

bridge vlan add dev lan12 vid 100 master static

the bridge notifies switchdev each time, even if nothing changed, which
makes driver-level accounting impossible.

Since nbp_vlan_add() changes only the flags of the existing port VLAN,
record the old flags, and restore them on the error path of
br_switchdev_port_vlan_add().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/bridge/br_vlan.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index c7cadc1b4f71..3c149b54124e 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1279,11 +1279,18 @@ int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags,
 	*changed = false;
 	vlan = br_vlan_find(nbp_vlan_group(port), vid);
 	if (vlan) {
-		/* Pass the flags to the hardware bridge */
-		ret = br_switchdev_port_vlan_add(port->dev, vid, flags, extack);
-		if (ret && ret != -EOPNOTSUPP)
-			return ret;
+		u16 old_flags = vlan->flags;
+
 		*changed = __vlan_add_flags(vlan, flags);
+		if (*changed) {
+			/* Pass the flags to the hardware bridge */
+			ret = br_switchdev_port_vlan_add(port->dev, vid, flags,
+							 extack);
+			if (ret && ret != -EOPNOTSUPP) {
+				__vlan_add_flags(vlan, old_flags);
+				return ret;
+			}
+		}
 
 		return 0;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH net-next 3/5] net: bridge: vlan: notify a switchdev deletion when modifying flags of existing VLAN
  2022-02-09 21:30 [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 1/5] net: bridge: vlan: br_vlan_add: notify switchdev only when changed Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 2/5] net: bridge: vlan: nbp_vlan_add: " Vladimir Oltean
@ 2022-02-09 21:30 ` Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 4/5] net: bridge: switchdev: replay VLANs present on the bridge device itself Vladimir Oltean
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-09 21:30 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

There are several reasons why this is desirable.

A user is free to install a bridge VLAN several times in a row, with the
same VID just with the flags changed:

bridge vlan add dev swp0 vid 100 pvid untagged
bridge vlan add dev swp0 vid 100

After the second command runs, a subtle implication is that swp0 no
longer has a pvid, so untagged and priority-tagged traffic received on
that port should be dropped.

A quick survey of the in-tree code base shows that at least some
switchdev drivers do not appear to properly detect this condition, like
for example:

- sparx5_vlan_vid_del() sets port->pvid = 0 if the pvid VLAN was
  deleted, but doesn't set port->pvid = 0 in sparx5_vlan_vid_add() if
  the pvid VLAN was re-added as a non-pvid VLAN

- dpaa2_switch_port_del_vlan() detects the deletion of the pvid VLAN and
  sets the port pvid to 4095. However dpaa2_switch_port_add_vlan() does
  not detect when the pvid VLAN was reinstalled as non-pvid, and doesn't
  have logic to do the same thing as port_del_vlan().

- cpsw_port_vlan_del() calls cpsw_set_pvid(priv, 0, 0, 0), but
  cpsw_port_vlan_add() doesn't.

The list can probably go on. Some drivers handle the condition well, but
the point is that subtle code is easy to get wrong and hence best avoided.

The second argument has to do with driver-level bookkeeping of VLANs.
The goal there, broadly speaking, is to be able to count the number of
VLANs in use as the number of calls to br_switchdev_port_vlan_add()
minus the calls to br_switchdev_port_vlan_del(). This is especially
possible since now we no longer call br_switchdev_port_vlan_add() when
nothing has changed about the VLAN. But when a VLAN is readded with
different flags, the number of additions and deletions is unbalanced.

This is particularly problematic for DSA, who would like to treat VLANs
pointing towards the bridge device as VLANs added on the CPU port.
The CPU port is special because it is a shared resource for all net
devices, in other words VLAN X must be present on the CPU port as long
as:
(a) VLAN X is present in the bridge's VLAN group
(b) at least one switch net device is present as a bridge port

So DSA must reference-count VLANs added on the CPU port (once per switch
net device), and only delete VLAN X once there isn't any switch bridge
port offloading the bridge anymore. Using br_switchdev_port_vlan_{add,del}
is a good indication, except that it is easily thrown off when a VLAN is
reinstalled with different flags. Avoiding this condition at DSA's level
would mean keeping track, for each VLAN, which port it came from, and
what flags it had on that port, to distinguish between a change of flags
(which shouldn't bump the refcount) and a real VLAN addition on a
different net device (which should). Doing this would increase the
memory usage beyond reasonable, when we should be able to make do with a
single refcounting integer, not even the flags are needed.

The presented solution is similar to how FDB entry migration is handled
by commit 90dc8fd36078 ("net: bridge: notify switchdev of disappearance
of old FDB entry upon migration"): when a VLAN is readded with different
flags, first delete it, then readd it with the right ones.

This is a bit against the original spirit of switchdev notifiers which
were supposed to be transactional, but I think part of that was an
excuse to keep the complexity in the bridge driver low, and part was an
overly idealistic view of reality, thinking that complex device drivers
would be any better in terms of bug count. The transactional model for
VLAN offloading was removed in commit ffb68fc58e96 ("net: switchdev:
remove the transaction structure from port object notifiers"), and as
such, the only possible solution in the current API is first to delete,
then add. Treating errors in this model is best-effort, meaning that if
the deletion succeeds, addition fails, we attempt to reinstall old VLAN
with original flags, but that may fail too, but we give up and don't
even attempt to check the error code for the restoration. It isn't
formally perfect, but under the assumption that the original VLAN had
already been accepted once by the driver, and we're holding the
rtnl_mutex, so not much can happen between deletion and addition,
I think there's a good chance that the driver will accept to reinstall
the original VLAN. Plus, there are areas of the bridge driver where
errors from switchdev aren't even checked at all (FDBs), so VLANs
wouldn't be the worst part.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/bridge/br_vlan.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 3c149b54124e..5da7e2e23e68 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -747,11 +747,30 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, bool *changed,
 
 		*changed = brentry_created || flags_changed;
 		if (*changed) {
+			/* First notify to switchdev a deletion of the VLAN
+			 * with the old flags, then an addition with the new
+			 * ones.
+			 */
+			if (flags_changed) {
+				ret = br_switchdev_port_vlan_del(br->dev,
+								 vlan->vid);
+				if (ret && ret != -EOPNOTSUPP) {
+					br_vlan_restore_existing(br, vg, vlan,
+								 old_flags,
+								 brentry_created,
+								 flags_changed);
+					return ret;
+				}
+			}
+
 			ret = br_switchdev_port_vlan_add(br->dev, vlan->vid,
 							 flags, extack);
 			if (ret && ret != -EOPNOTSUPP) {
 				br_vlan_restore_existing(br, vg, vlan, old_flags,
 							 brentry_created, flags_changed);
+				if (flags_changed)
+					br_switchdev_port_vlan_add(br->dev, vlan->vid,
+								   old_flags, NULL);
 				return ret;
 			}
 		}
@@ -1283,10 +1302,18 @@ int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags,
 
 		*changed = __vlan_add_flags(vlan, flags);
 		if (*changed) {
-			/* Pass the flags to the hardware bridge */
+			/* Delete the old VLAN and pass the updated flags
+			 * to the hardware bridge.
+			 */
+			ret = br_switchdev_port_vlan_del(port->dev, vid);
+			if (ret && ret != -EOPNOTSUPP)
+				return ret;
+
 			ret = br_switchdev_port_vlan_add(port->dev, vid, flags,
 							 extack);
 			if (ret && ret != -EOPNOTSUPP) {
+				br_switchdev_port_vlan_add(port->dev, vid,
+							   old_flags, NULL);
 				__vlan_add_flags(vlan, old_flags);
 				return ret;
 			}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH net-next 4/5] net: bridge: switchdev: replay VLANs present on the bridge device itself
  2022-02-09 21:30 [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Vladimir Oltean
                   ` (2 preceding siblings ...)
  2022-02-09 21:30 ` [RFC PATCH net-next 3/5] net: bridge: vlan: notify a switchdev deletion when modifying flags of existing VLAN Vladimir Oltean
@ 2022-02-09 21:30 ` Vladimir Oltean
  2022-02-09 21:30 ` [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs Vladimir Oltean
  2022-02-13 18:54 ` [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Nikolay Aleksandrov
  5 siblings, 0 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-09 21:30 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

The mv88e6xxx DSA driver currently prints this error message as soon as
the first port of a switch joins a bridge:

mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95

where a6:ef:77:c8:5f:3d vid 1 is a local FDB entry corresponding to the
bridge MAC address in the default_pvid.

The -EOPNOTSUPP is returned by mv88e6xxx_port_db_load_purge() because it
tries to map VID 1 to a FID (the ATU is indexed by FID not VID), but
fails to do so. This is because ->port_fdb_add() is called before
->port_vlan_add() for VID 1.

The abridged timeline of the calls is:

br_add_if
-> netdev_master_upper_dev_link
   -> dsa_port_bridge_join
      -> switchdev_bridge_port_offload
         -> br_switchdev_vlan_replay (*)
         -> br_switchdev_fdb_replay
            -> mv88e6xxx_port_fdb_add
-> nbp_vlan_init
   -> nbp_vlan_add
      -> mv88e6xxx_port_vlan_add

and the issue is that at the time of (*), the bridge port isn't in VID 1
(nbp_vlan_init hasn't been called), therefore br_switchdev_vlan_replay()
won't have anything to replay, therefore VID 1 won't be in the VTU by
the time mv88e6xxx_port_fdb_add() is called.

This happens only when the first port of a switch joins. For further
ports, the initial mv88e6xxx_port_vlan_add() is sufficient for VID 1 to
be loaded in the VTU (which is switch-wide, not per port).

The problem is somewhat unique to mv88e6xxx by chance, because most
other drivers offload an FDB entry by VID, so FDBs and VLANs can be
added asynchronously with respect to each other, but addressing the
issue at the bridge layer makes sense, since what mv88e6xxx requires
isn't absurd.

To fix this problem, we need to recognize that it isn't the VLAN group
of the port that we're interested in, but the VLAN group of the bridge
itself (so it isn't a timing issue, but rather insufficient information
being passed from switchdev to drivers).

Currently nbp_switchdev_sync_objs() only calls br_switchdev_vlan_replay()
for VLANs corresponding to the port, but the VLANs corresponding to the
bridge itself, for local termination, also need to be replayed.
In this case, VID 1 is not (yet) present in the port's VLAN group but is
present in the bridge's VLAN group.

Call br_switchdev_vlan_replay() once more, with the br_dev passed as the
"dev" argument. The function is already prepared to deal with this and
search in the right VLAN group, just that the functionality hasn't been
utilized yet.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/bridge/br_switchdev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index f8fbaaa7c501..d2d59fcfc0d4 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -681,11 +681,18 @@ static int nbp_switchdev_sync_objs(struct net_bridge_port *p, const void *ctx,
 	struct net_device *dev = p->dev;
 	int err;
 
+	/* Port VLANs */
 	err = br_switchdev_vlan_replay(br_dev, dev, ctx, true, blocking_nb,
 				       extack);
 	if (err && err != -EOPNOTSUPP)
 		return err;
 
+	/* And VLANs on the bridge device */
+	err = br_switchdev_vlan_replay(br_dev, br_dev, ctx, true, blocking_nb,
+				       extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	err = br_switchdev_mdb_replay(br_dev, dev, ctx, true, blocking_nb,
 				      extack);
 	if (err && err != -EOPNOTSUPP)
@@ -708,6 +715,8 @@ static void nbp_switchdev_unsync_objs(struct net_bridge_port *p,
 
 	br_switchdev_vlan_replay(br_dev, dev, ctx, false, blocking_nb, NULL);
 
+	br_switchdev_vlan_replay(br_dev, br_dev, ctx, false, blocking_nb, NULL);
+
 	br_switchdev_mdb_replay(br_dev, dev, ctx, false, blocking_nb, NULL);
 
 	br_switchdev_fdb_replay(br_dev, ctx, false, atomic_nb);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs
  2022-02-09 21:30 [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Vladimir Oltean
                   ` (3 preceding siblings ...)
  2022-02-09 21:30 ` [RFC PATCH net-next 4/5] net: bridge: switchdev: replay VLANs present on the bridge device itself Vladimir Oltean
@ 2022-02-09 21:30 ` Vladimir Oltean
  2022-02-10 15:30   ` Vladimir Oltean
  2022-02-13  1:09   ` Tobias Waldekranz
  2022-02-13 18:54 ` [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Nikolay Aleksandrov
  5 siblings, 2 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-09 21:30 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it
does so on user ports. This is good for basic functionality but has
several limitations:

- the VLAN group which must reach the CPU may be radically different
  from the VLAN group that must be autonomously forwarded by the switch.
  In other words, the admin may want to isolate noisy stations and avoid
  traffic from them going to the control processor of the switch, where
  it would just waste useless cycles. The bridge already supports
  independent control of VLAN groups on bridge ports and on the bridge
  itself, and when VLAN-aware, it will drop packets in software anyway
  if their VID isn't added as a 'self' entry towards the bridge device.

- Replaying host FDB entries may depend, for some drivers like mv88e6xxx,
  on replaying the host VLANs as well. The 2 VLAN groups are
  approximately the same in most regular cases, but there are corner
  cases when timing matters, and DSA's approximation of replicating
  VLANs on shared ports simply does not work.

- It is possible to artificially fill the VLAN table of a switch, by
  walking through the entire VLAN space, adding and deleting them.
  For each VLAN added on a user port, DSA will add it on shared ports
  too, but for each VLAN deletion on a user port, it will remain
  installed on shared ports, since DSA has no good indication of whether
  the VLAN is still in use or not. If the hardware has a limited number
  of VLAN table entries, this may uselessly consume that space.

Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN
addition and removal events, DSA has a simple and straightforward task
of separating the bridge port VLANs (these have an orig_dev which is a
DSA slave interface, or a LAG interface) from the host VLANs (these have
an orig_dev which is a bridge interface), and to keep a simple reference
count of each VID on each shared port.

Forwarding VLANs must be installed on the bridge ports and on all DSA
ports interconnecting them. We don't have a good view of the exact
topology, so we simply install forwarding VLANs on all DSA ports, which
is what has been done until now.

Host VLANs must be installed primarily on the dedicated CPU port of each
bridge port. More subtly, they must also be installed on upstream-facing
and downstream-facing DSA ports that are connecting the bridge ports and
the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB
entry may be absent from VTU) is still addressed even if that switch is
in a cross-chip setup, and it has no local CPU port.

Therefore:
- user ports contain only bridge port (forwarding) VLANs, and no
  refcounting is necessary
- DSA ports contain both forwarding and host VLANs. Refcounting is
  necessary among these 2 types.
- CPU ports contain only host VLANs. Refcounting is also necessary.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |  12 +++
 net/dsa/dsa2.c     |   2 +
 net/dsa/dsa_priv.h |   7 ++
 net/dsa/port.c     |  42 +++++++++++
 net/dsa/slave.c    |  97 ++++++++++++++----------
 net/dsa/switch.c   | 179 +++++++++++++++++++++++++++++++++++++++++++--
 6 files changed, 295 insertions(+), 44 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index fd1f62a6e0a8..313295c1b0c6 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -312,6 +312,12 @@ struct dsa_port {
 	struct mutex		addr_lists_lock;
 	struct list_head	fdbs;
 	struct list_head	mdbs;
+
+	/* List of host VLANs that CPU and upstream-facing DSA ports
+	 * are members of.
+	 */
+	struct mutex		vlans_lock;
+	struct list_head	vlans;
 };
 
 /* TODO: ideally DSA ports would have a single dp->link_dp member,
@@ -332,6 +338,12 @@ struct dsa_mac_addr {
 	struct list_head list;
 };
 
+struct dsa_vlan {
+	u16 vid;
+	refcount_t refcount;
+	struct list_head list;
+};
+
 struct dsa_switch {
 	struct device *dev;
 
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index e498c927c3d0..1df8c2356463 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -453,8 +453,10 @@ static int dsa_port_setup(struct dsa_port *dp)
 		return 0;
 
 	mutex_init(&dp->addr_lists_lock);
+	mutex_init(&dp->vlans_lock);
 	INIT_LIST_HEAD(&dp->fdbs);
 	INIT_LIST_HEAD(&dp->mdbs);
+	INIT_LIST_HEAD(&dp->vlans);
 
 	if (ds->ops->port_setup) {
 		err = ds->ops->port_setup(ds, dp->index);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 2bbfa9efe9f8..6a3878157b0a 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -34,6 +34,8 @@ enum {
 	DSA_NOTIFIER_HOST_MDB_DEL,
 	DSA_NOTIFIER_VLAN_ADD,
 	DSA_NOTIFIER_VLAN_DEL,
+	DSA_NOTIFIER_HOST_VLAN_ADD,
+	DSA_NOTIFIER_HOST_VLAN_DEL,
 	DSA_NOTIFIER_MTU,
 	DSA_NOTIFIER_TAG_PROTO,
 	DSA_NOTIFIER_TAG_PROTO_CONNECT,
@@ -234,6 +236,11 @@ int dsa_port_vlan_add(struct dsa_port *dp,
 		      struct netlink_ext_ack *extack);
 int dsa_port_vlan_del(struct dsa_port *dp,
 		      const struct switchdev_obj_port_vlan *vlan);
+int dsa_port_host_vlan_add(struct dsa_port *dp,
+			   const struct switchdev_obj_port_vlan *vlan,
+			   struct netlink_ext_ack *extack);
+int dsa_port_host_vlan_del(struct dsa_port *dp,
+			   const struct switchdev_obj_port_vlan *vlan);
 int dsa_port_mrp_add(const struct dsa_port *dp,
 		     const struct switchdev_obj_mrp *mrp);
 int dsa_port_mrp_del(const struct dsa_port *dp,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index bd78192e0e47..cca5cf686f74 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -904,6 +904,48 @@ int dsa_port_vlan_del(struct dsa_port *dp,
 	return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
 }
 
+int dsa_port_host_vlan_add(struct dsa_port *dp,
+			   const struct switchdev_obj_port_vlan *vlan,
+			   struct netlink_ext_ack *extack)
+{
+	struct dsa_notifier_vlan_info info = {
+		.sw_index = dp->ds->index,
+		.port = dp->index,
+		.vlan = vlan,
+		.extack = extack,
+	};
+	struct dsa_port *cpu_dp = dp->cpu_dp;
+	int err;
+
+	err = dsa_port_notify(dp, DSA_NOTIFIER_HOST_VLAN_ADD, &info);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	vlan_vid_add(cpu_dp->master, htons(ETH_P_8021Q), vlan->vid);
+
+	return err;
+}
+
+int dsa_port_host_vlan_del(struct dsa_port *dp,
+			   const struct switchdev_obj_port_vlan *vlan)
+{
+	struct dsa_notifier_vlan_info info = {
+		.sw_index = dp->ds->index,
+		.port = dp->index,
+		.vlan = vlan,
+	};
+	struct dsa_port *cpu_dp = dp->cpu_dp;
+	int err;
+
+	err = dsa_port_notify(dp, DSA_NOTIFIER_HOST_VLAN_DEL, &info);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	vlan_vid_del(cpu_dp->master, htons(ETH_P_8021Q), vlan->vid);
+
+	return err;
+}
+
 int dsa_port_mrp_add(const struct dsa_port *dp,
 		     const struct switchdev_obj_mrp *mrp)
 {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2b5b0f294233..769dabe7db91 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -348,9 +348,8 @@ static int dsa_slave_vlan_add(struct net_device *dev,
 			      const struct switchdev_obj *obj,
 			      struct netlink_ext_ack *extack)
 {
-	struct net_device *master = dsa_slave_to_master(dev);
 	struct dsa_port *dp = dsa_slave_to_port(dev);
-	struct switchdev_obj_port_vlan vlan;
+	struct switchdev_obj_port_vlan *vlan;
 	int err;
 
 	if (dsa_port_skip_vlan_configuration(dp)) {
@@ -358,14 +357,14 @@ static int dsa_slave_vlan_add(struct net_device *dev,
 		return 0;
 	}
 
-	vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
+	vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
 
 	/* Deny adding a bridge VLAN when there is already an 802.1Q upper with
 	 * the same VID.
 	 */
 	if (br_vlan_enabled(dsa_port_bridge_dev_get(dp))) {
 		rcu_read_lock();
-		err = dsa_slave_vlan_check_for_8021q_uppers(dev, &vlan);
+		err = dsa_slave_vlan_check_for_8021q_uppers(dev, vlan);
 		rcu_read_unlock();
 		if (err) {
 			NL_SET_ERR_MSG_MOD(extack,
@@ -374,21 +373,33 @@ static int dsa_slave_vlan_add(struct net_device *dev,
 		}
 	}
 
-	err = dsa_port_vlan_add(dp, &vlan, extack);
-	if (err)
-		return err;
+	return dsa_port_vlan_add(dp, vlan, extack);
+}
+
+static int dsa_slave_host_vlan_add(struct net_device *dev,
+				   const struct switchdev_obj *obj,
+				   struct netlink_ext_ack *extack)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct switchdev_obj_port_vlan vlan;
 
-	/* We need the dedicated CPU port to be a member of the VLAN as well.
-	 * Even though drivers often handle CPU membership in special ways,
+	if (dsa_port_skip_vlan_configuration(dp)) {
+		NL_SET_ERR_MSG_MOD(extack, "skipping configuration of VLAN");
+		return 0;
+	}
+
+	vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
+
+	/* Even though drivers often handle CPU membership in special ways,
 	 * it doesn't make sense to program a PVID, so clear this flag.
 	 */
 	vlan.flags &= ~BRIDGE_VLAN_INFO_PVID;
 
-	err = dsa_port_vlan_add(dp->cpu_dp, &vlan, extack);
-	if (err)
-		return err;
+	/* Skip case 3 VLANs from __vlan_add() from the bridge driver */
+	if (!(vlan.flags & BRIDGE_VLAN_INFO_BRENTRY))
+		return 0;
 
-	return vlan_vid_add(master, htons(ETH_P_8021Q), vlan.vid);
+	return dsa_port_host_vlan_add(dp, &vlan, extack);
 }
 
 static int dsa_slave_port_obj_add(struct net_device *dev, const void *ctx,
@@ -415,10 +426,17 @@ static int dsa_slave_port_obj_add(struct net_device *dev, const void *ctx,
 		err = dsa_port_host_mdb_add(dp, SWITCHDEV_OBJ_PORT_MDB(obj));
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
-			return -EOPNOTSUPP;
+		if (netif_is_bridge_master(obj->orig_dev)) {
+			if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
+				return -EOPNOTSUPP;
+
+			err = dsa_slave_host_vlan_add(dev, obj, extack);
+		} else {
+			if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
+				return -EOPNOTSUPP;
 
-		err = dsa_slave_vlan_add(dev, obj, extack);
+			err = dsa_slave_vlan_add(dev, obj, extack);
+		}
 		break;
 	case SWITCHDEV_OBJ_ID_MRP:
 		if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
@@ -444,26 +462,29 @@ static int dsa_slave_port_obj_add(struct net_device *dev, const void *ctx,
 static int dsa_slave_vlan_del(struct net_device *dev,
 			      const struct switchdev_obj *obj)
 {
-	struct net_device *master = dsa_slave_to_master(dev);
 	struct dsa_port *dp = dsa_slave_to_port(dev);
 	struct switchdev_obj_port_vlan *vlan;
-	int err;
 
 	if (dsa_port_skip_vlan_configuration(dp))
 		return 0;
 
 	vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
 
-	/* Do not deprogram the CPU port as it may be shared with other user
-	 * ports which can be members of this VLAN as well.
-	 */
-	err = dsa_port_vlan_del(dp, vlan);
-	if (err)
-		return err;
+	return dsa_port_vlan_del(dp, vlan);
+}
 
-	vlan_vid_del(master, htons(ETH_P_8021Q), vlan->vid);
+static int dsa_slave_host_vlan_del(struct net_device *dev,
+				   const struct switchdev_obj *obj)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct switchdev_obj_port_vlan *vlan;
 
-	return 0;
+	if (dsa_port_skip_vlan_configuration(dp))
+		return 0;
+
+	vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+
+	return dsa_port_host_vlan_del(dp, vlan);
 }
 
 static int dsa_slave_port_obj_del(struct net_device *dev, const void *ctx,
@@ -489,10 +510,17 @@ static int dsa_slave_port_obj_del(struct net_device *dev, const void *ctx,
 		err = dsa_port_host_mdb_del(dp, SWITCHDEV_OBJ_PORT_MDB(obj));
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
-			return -EOPNOTSUPP;
+		if (netif_is_bridge_master(obj->orig_dev)) {
+			if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
+				return -EOPNOTSUPP;
 
-		err = dsa_slave_vlan_del(dev, obj);
+			err = dsa_slave_host_vlan_del(dev, obj);
+		} else {
+			if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
+				return -EOPNOTSUPP;
+
+			err = dsa_slave_vlan_del(dev, obj);
+		}
 		break;
 	case SWITCHDEV_OBJ_ID_MRP:
 		if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
@@ -1405,7 +1433,7 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
 	}
 
 	/* And CPU port... */
-	ret = dsa_port_vlan_add(dp->cpu_dp, &vlan, &extack);
+	ret = dsa_port_host_vlan_add(dp, &vlan, &extack);
 	if (ret) {
 		if (extack._msg)
 			netdev_err(dev, "CPU port %d: %s\n", dp->cpu_dp->index,
@@ -1413,7 +1441,7 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
 		return ret;
 	}
 
-	return vlan_vid_add(master, proto, vid);
+	return 0;
 }
 
 static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
@@ -1428,16 +1456,11 @@ static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
 	};
 	int err;
 
-	/* Do not deprogram the CPU port as it may be shared with other user
-	 * ports which can be members of this VLAN as well.
-	 */
 	err = dsa_port_vlan_del(dp, &vlan);
 	if (err)
 		return err;
 
-	vlan_vid_del(master, proto, vid);
-
-	return 0;
+	return dsa_port_host_vlan_del(dp, &vlan);
 }
 
 static int dsa_slave_restore_vlan(struct net_device *vdev, int vid, void *arg)
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 4866b58649e4..9e4570bdea2f 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -558,6 +558,7 @@ static int dsa_switch_host_mdb_del(struct dsa_switch *ds,
 	return err;
 }
 
+/* Port VLANs match on the targeted port and on all DSA ports */
 static bool dsa_port_vlan_match(struct dsa_port *dp,
 				struct dsa_notifier_vlan_info *info)
 {
@@ -570,6 +571,118 @@ static bool dsa_port_vlan_match(struct dsa_port *dp,
 	return false;
 }
 
+/* Host VLANs match on the targeted port's CPU port, and on all DSA ports
+ * (upstream and downstream) of that switch and its upstream switches.
+ */
+static bool dsa_port_host_vlan_match(struct dsa_port *dp,
+				     struct dsa_notifier_vlan_info *info)
+{
+	struct dsa_port *targeted_dp, *cpu_dp;
+	struct dsa_switch *targeted_ds;
+
+	targeted_ds = dsa_switch_find(dp->ds->dst->index, info->sw_index);
+	targeted_dp = dsa_to_port(targeted_ds, info->port);
+	cpu_dp = targeted_dp->cpu_dp;
+
+	if (dsa_switch_is_upstream_of(dp->ds, targeted_ds))
+		return dsa_port_is_dsa(dp) || dp == cpu_dp;
+
+	return false;
+}
+
+static struct dsa_vlan *dsa_vlan_find(struct list_head *vlan_list,
+				      const struct switchdev_obj_port_vlan *vlan)
+{
+	struct dsa_vlan *v;
+
+	list_for_each_entry(v, vlan_list, list)
+		if (v->vid == vlan->vid)
+			return v;
+
+	return NULL;
+}
+
+static int dsa_port_do_vlan_add(struct dsa_port *dp,
+				const struct switchdev_obj_port_vlan *vlan,
+				struct netlink_ext_ack *extack)
+{
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+	struct dsa_vlan *v;
+	int err = 0;
+
+	/* No need to bother with refcounting for user ports */
+	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
+		return ds->ops->port_vlan_add(ds, port, vlan, extack);
+
+	mutex_lock(&dp->vlans_lock);
+
+	v = dsa_vlan_find(&dp->vlans, vlan);
+	if (v) {
+		refcount_inc(&v->refcount);
+		goto out;
+	}
+
+	v = kzalloc(sizeof(*v), GFP_KERNEL);
+	if (!v) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = ds->ops->port_vlan_add(ds, port, vlan, extack);
+	if (err) {
+		kfree(v);
+		goto out;
+	}
+
+	v->vid = vlan->vid;
+	refcount_set(&v->refcount, 1);
+	list_add_tail(&v->list, &dp->vlans);
+
+out:
+	mutex_unlock(&dp->vlans_lock);
+
+	return err;
+}
+
+static int dsa_port_do_vlan_del(struct dsa_port *dp,
+				const struct switchdev_obj_port_vlan *vlan)
+{
+	struct dsa_switch *ds = dp->ds;
+	int port = dp->index;
+	struct dsa_vlan *v;
+	int err = 0;
+
+	/* No need to bother with refcounting for user ports */
+	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
+		return ds->ops->port_vlan_del(ds, port, vlan);
+
+	mutex_lock(&dp->vlans_lock);
+
+	v = dsa_vlan_find(&dp->vlans, vlan);
+	if (!v) {
+		err = -ENOENT;
+		goto out;
+	}
+
+	if (!refcount_dec_and_test(&v->refcount))
+		goto out;
+
+	err = ds->ops->port_vlan_del(ds, port, vlan);
+	if (err) {
+		refcount_set(&v->refcount, 1);
+		goto out;
+	}
+
+	list_del(&v->list);
+	kfree(v);
+
+out:
+	mutex_unlock(&dp->vlans_lock);
+
+	return err;
+}
+
 static int dsa_switch_vlan_add(struct dsa_switch *ds,
 			       struct dsa_notifier_vlan_info *info)
 {
@@ -581,8 +694,8 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
 
 	dsa_switch_for_each_port(dp, ds) {
 		if (dsa_port_vlan_match(dp, info)) {
-			err = ds->ops->port_vlan_add(ds, dp->index, info->vlan,
-						     info->extack);
+			err = dsa_port_do_vlan_add(dp, info->vlan,
+						   info->extack);
 			if (err)
 				return err;
 		}
@@ -594,15 +707,61 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
 static int dsa_switch_vlan_del(struct dsa_switch *ds,
 			       struct dsa_notifier_vlan_info *info)
 {
+	struct dsa_port *dp;
+	int err;
+
 	if (!ds->ops->port_vlan_del)
 		return -EOPNOTSUPP;
 
-	if (ds->index == info->sw_index)
-		return ds->ops->port_vlan_del(ds, info->port, info->vlan);
+	dsa_switch_for_each_port(dp, ds) {
+		if (dsa_port_vlan_match(dp, info)) {
+			err = dsa_port_do_vlan_del(dp, info->vlan);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
+static int dsa_switch_host_vlan_add(struct dsa_switch *ds,
+				    struct dsa_notifier_vlan_info *info)
+{
+	struct dsa_port *dp;
+	int err;
+
+	if (!ds->ops->port_vlan_add)
+		return -EOPNOTSUPP;
+
+	dsa_switch_for_each_port(dp, ds) {
+		if (dsa_port_host_vlan_match(dp, info)) {
+			err = dsa_port_do_vlan_add(dp, info->vlan,
+						   info->extack);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
+static int dsa_switch_host_vlan_del(struct dsa_switch *ds,
+				    struct dsa_notifier_vlan_info *info)
+{
+	struct dsa_port *dp;
+	int err;
+
+	if (!ds->ops->port_vlan_del)
+		return -EOPNOTSUPP;
+
+	dsa_switch_for_each_port(dp, ds) {
+		if (dsa_port_host_vlan_match(dp, info)) {
+			err = dsa_port_do_vlan_del(dp, info->vlan);
+			if (err)
+				return err;
+		}
+	}
 
-	/* Do not deprogram the DSA links as they may be used as conduit
-	 * for other VLAN members in the fabric.
-	 */
 	return 0;
 }
 
@@ -764,6 +923,12 @@ static int dsa_switch_event(struct notifier_block *nb,
 	case DSA_NOTIFIER_VLAN_DEL:
 		err = dsa_switch_vlan_del(ds, info);
 		break;
+	case DSA_NOTIFIER_HOST_VLAN_ADD:
+		err = dsa_switch_host_vlan_add(ds, info);
+		break;
+	case DSA_NOTIFIER_HOST_VLAN_DEL:
+		err = dsa_switch_host_vlan_del(ds, info);
+		break;
 	case DSA_NOTIFIER_MTU:
 		err = dsa_switch_mtu(ds, info);
 		break;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs
  2022-02-09 21:30 ` [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs Vladimir Oltean
@ 2022-02-10 15:30   ` Vladimir Oltean
  2022-02-13  1:09   ` Tobias Waldekranz
  1 sibling, 0 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-10 15:30 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

On Wed, Feb 09, 2022 at 11:30:43PM +0200, Vladimir Oltean wrote:
> - It is possible to artificially fill the VLAN table of a switch, by
>   walking through the entire VLAN space, adding and deleting them.
>   For each VLAN added on a user port, DSA will add it on shared ports
>   too, but for each VLAN deletion on a user port, it will remain
>   installed on shared ports, since DSA has no good indication of whether
>   the VLAN is still in use or not. If the hardware has a limited number
>   of VLAN table entries, this may uselessly consume that space.

There's another, more important angle to this which I forgot while I was
writing the commit message. If you don't have a way to delete VLANs on
CPU ports, you have no way of removing yourself from a VLAN with noisy
stations, once you've entered it. I'll make sure to update the commit
message for v2.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs
  2022-02-09 21:30 ` [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs Vladimir Oltean
  2022-02-10 15:30   ` Vladimir Oltean
@ 2022-02-13  1:09   ` Tobias Waldekranz
  2022-02-13 11:34     ` Vladimir Oltean
  1 sibling, 1 reply; 20+ messages in thread
From: Tobias Waldekranz @ 2022-02-13  1:09 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer


Hi Vladimir,

Thanks for working on this!

On Wed, Feb 09, 2022 at 23:30, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it
> does so on user ports. This is good for basic functionality but has
> several limitations:
>
> - the VLAN group which must reach the CPU may be radically different
>   from the VLAN group that must be autonomously forwarded by the switch.
>   In other words, the admin may want to isolate noisy stations and avoid
>   traffic from them going to the control processor of the switch, where
>   it would just waste useless cycles. The bridge already supports
>   independent control of VLAN groups on bridge ports and on the bridge
>   itself, and when VLAN-aware, it will drop packets in software anyway
>   if their VID isn't added as a 'self' entry towards the bridge device.
>
> - Replaying host FDB entries may depend, for some drivers like mv88e6xxx,
>   on replaying the host VLANs as well. The 2 VLAN groups are
>   approximately the same in most regular cases, but there are corner
>   cases when timing matters, and DSA's approximation of replicating
>   VLANs on shared ports simply does not work.
>
> - It is possible to artificially fill the VLAN table of a switch, by
>   walking through the entire VLAN space, adding and deleting them.
>   For each VLAN added on a user port, DSA will add it on shared ports
>   too, but for each VLAN deletion on a user port, it will remain
>   installed on shared ports, since DSA has no good indication of whether
>   the VLAN is still in use or not. If the hardware has a limited number
>   of VLAN table entries, this may uselessly consume that space.
>
> Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN
> addition and removal events, DSA has a simple and straightforward task
> of separating the bridge port VLANs (these have an orig_dev which is a
> DSA slave interface, or a LAG interface) from the host VLANs (these have
> an orig_dev which is a bridge interface), and to keep a simple reference
> count of each VID on each shared port.
>
> Forwarding VLANs must be installed on the bridge ports and on all DSA
> ports interconnecting them. We don't have a good view of the exact
> topology, so we simply install forwarding VLANs on all DSA ports, which
> is what has been done until now.
>
> Host VLANs must be installed primarily on the dedicated CPU port of each
> bridge port. More subtly, they must also be installed on upstream-facing
> and downstream-facing DSA ports that are connecting the bridge ports and
> the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB
> entry may be absent from VTU) is still addressed even if that switch is
> in a cross-chip setup, and it has no local CPU port.
>
> Therefore:
> - user ports contain only bridge port (forwarding) VLANs, and no
>   refcounting is necessary
> - DSA ports contain both forwarding and host VLANs. Refcounting is
>   necessary among these 2 types.
> - CPU ports contain only host VLANs. Refcounting is also necessary.

This is pretty much true, though this does not take foreign interfaces
into account. It would be great if the condifion could be refined to:

    The CPU port should be a member of all VLANs where either (a) the
    host is a member, or (b) at least one foreign interface is a member.

I.e. in a situation like this:

   br0
   / \
swp0 tap0

If br0 is not a member of VLAN X, but tap0 is, then the CPU port still
needs to be a member of X. Otherwise the (software) bridge will never
get a chance to software forward it over the tunnel.

> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/net/dsa.h  |  12 +++
>  net/dsa/dsa2.c     |   2 +
>  net/dsa/dsa_priv.h |   7 ++
>  net/dsa/port.c     |  42 +++++++++++
>  net/dsa/slave.c    |  97 ++++++++++++++----------
>  net/dsa/switch.c   | 179 +++++++++++++++++++++++++++++++++++++++++++--
>  6 files changed, 295 insertions(+), 44 deletions(-)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index fd1f62a6e0a8..313295c1b0c6 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -312,6 +312,12 @@ struct dsa_port {
>  	struct mutex		addr_lists_lock;
>  	struct list_head	fdbs;
>  	struct list_head	mdbs;
> +
> +	/* List of host VLANs that CPU and upstream-facing DSA ports
> +	 * are members of.
> +	 */
> +	struct mutex		vlans_lock;
> +	struct list_head	vlans;
>  };
>  
>  /* TODO: ideally DSA ports would have a single dp->link_dp member,
> @@ -332,6 +338,12 @@ struct dsa_mac_addr {
>  	struct list_head list;
>  };
>  
> +struct dsa_vlan {
> +	u16 vid;
> +	refcount_t refcount;
> +	struct list_head list;
> +};
> +
>  struct dsa_switch {
>  	struct device *dev;
>  
> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
> index e498c927c3d0..1df8c2356463 100644
> --- a/net/dsa/dsa2.c
> +++ b/net/dsa/dsa2.c
> @@ -453,8 +453,10 @@ static int dsa_port_setup(struct dsa_port *dp)
>  		return 0;
>  
>  	mutex_init(&dp->addr_lists_lock);
> +	mutex_init(&dp->vlans_lock);
>  	INIT_LIST_HEAD(&dp->fdbs);
>  	INIT_LIST_HEAD(&dp->mdbs);
> +	INIT_LIST_HEAD(&dp->vlans);
>  
>  	if (ds->ops->port_setup) {
>  		err = ds->ops->port_setup(ds, dp->index);
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 2bbfa9efe9f8..6a3878157b0a 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -34,6 +34,8 @@ enum {
>  	DSA_NOTIFIER_HOST_MDB_DEL,
>  	DSA_NOTIFIER_VLAN_ADD,
>  	DSA_NOTIFIER_VLAN_DEL,
> +	DSA_NOTIFIER_HOST_VLAN_ADD,
> +	DSA_NOTIFIER_HOST_VLAN_DEL,
>  	DSA_NOTIFIER_MTU,
>  	DSA_NOTIFIER_TAG_PROTO,
>  	DSA_NOTIFIER_TAG_PROTO_CONNECT,
> @@ -234,6 +236,11 @@ int dsa_port_vlan_add(struct dsa_port *dp,
>  		      struct netlink_ext_ack *extack);
>  int dsa_port_vlan_del(struct dsa_port *dp,
>  		      const struct switchdev_obj_port_vlan *vlan);
> +int dsa_port_host_vlan_add(struct dsa_port *dp,
> +			   const struct switchdev_obj_port_vlan *vlan,
> +			   struct netlink_ext_ack *extack);
> +int dsa_port_host_vlan_del(struct dsa_port *dp,
> +			   const struct switchdev_obj_port_vlan *vlan);
>  int dsa_port_mrp_add(const struct dsa_port *dp,
>  		     const struct switchdev_obj_mrp *mrp);
>  int dsa_port_mrp_del(const struct dsa_port *dp,
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index bd78192e0e47..cca5cf686f74 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -904,6 +904,48 @@ int dsa_port_vlan_del(struct dsa_port *dp,
>  	return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
>  }
>  
> +int dsa_port_host_vlan_add(struct dsa_port *dp,
> +			   const struct switchdev_obj_port_vlan *vlan,
> +			   struct netlink_ext_ack *extack)
> +{
> +	struct dsa_notifier_vlan_info info = {
> +		.sw_index = dp->ds->index,
> +		.port = dp->index,
> +		.vlan = vlan,
> +		.extack = extack,
> +	};
> +	struct dsa_port *cpu_dp = dp->cpu_dp;
> +	int err;
> +
> +	err = dsa_port_notify(dp, DSA_NOTIFIER_HOST_VLAN_ADD, &info);
> +	if (err && err != -EOPNOTSUPP)
> +		return err;
> +
> +	vlan_vid_add(cpu_dp->master, htons(ETH_P_8021Q), vlan->vid);
> +
> +	return err;
> +}
> +
> +int dsa_port_host_vlan_del(struct dsa_port *dp,
> +			   const struct switchdev_obj_port_vlan *vlan)
> +{
> +	struct dsa_notifier_vlan_info info = {
> +		.sw_index = dp->ds->index,
> +		.port = dp->index,
> +		.vlan = vlan,
> +	};
> +	struct dsa_port *cpu_dp = dp->cpu_dp;
> +	int err;
> +
> +	err = dsa_port_notify(dp, DSA_NOTIFIER_HOST_VLAN_DEL, &info);
> +	if (err && err != -EOPNOTSUPP)
> +		return err;
> +
> +	vlan_vid_del(cpu_dp->master, htons(ETH_P_8021Q), vlan->vid);
> +
> +	return err;
> +}
> +
>  int dsa_port_mrp_add(const struct dsa_port *dp,
>  		     const struct switchdev_obj_mrp *mrp)
>  {
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 2b5b0f294233..769dabe7db91 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -348,9 +348,8 @@ static int dsa_slave_vlan_add(struct net_device *dev,
>  			      const struct switchdev_obj *obj,
>  			      struct netlink_ext_ack *extack)
>  {
> -	struct net_device *master = dsa_slave_to_master(dev);
>  	struct dsa_port *dp = dsa_slave_to_port(dev);
> -	struct switchdev_obj_port_vlan vlan;
> +	struct switchdev_obj_port_vlan *vlan;
>  	int err;
>  
>  	if (dsa_port_skip_vlan_configuration(dp)) {
> @@ -358,14 +357,14 @@ static int dsa_slave_vlan_add(struct net_device *dev,
>  		return 0;
>  	}
>  
> -	vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
> +	vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
>  
>  	/* Deny adding a bridge VLAN when there is already an 802.1Q upper with
>  	 * the same VID.
>  	 */
>  	if (br_vlan_enabled(dsa_port_bridge_dev_get(dp))) {
>  		rcu_read_lock();
> -		err = dsa_slave_vlan_check_for_8021q_uppers(dev, &vlan);
> +		err = dsa_slave_vlan_check_for_8021q_uppers(dev, vlan);
>  		rcu_read_unlock();
>  		if (err) {
>  			NL_SET_ERR_MSG_MOD(extack,
> @@ -374,21 +373,33 @@ static int dsa_slave_vlan_add(struct net_device *dev,
>  		}
>  	}
>  
> -	err = dsa_port_vlan_add(dp, &vlan, extack);
> -	if (err)
> -		return err;
> +	return dsa_port_vlan_add(dp, vlan, extack);
> +}
> +
> +static int dsa_slave_host_vlan_add(struct net_device *dev,
> +				   const struct switchdev_obj *obj,
> +				   struct netlink_ext_ack *extack)
> +{
> +	struct dsa_port *dp = dsa_slave_to_port(dev);
> +	struct switchdev_obj_port_vlan vlan;
>  
> -	/* We need the dedicated CPU port to be a member of the VLAN as well.
> -	 * Even though drivers often handle CPU membership in special ways,
> +	if (dsa_port_skip_vlan_configuration(dp)) {
> +		NL_SET_ERR_MSG_MOD(extack, "skipping configuration of VLAN");
> +		return 0;
> +	}
> +
> +	vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
> +
> +	/* Even though drivers often handle CPU membership in special ways,
>  	 * it doesn't make sense to program a PVID, so clear this flag.
>  	 */
>  	vlan.flags &= ~BRIDGE_VLAN_INFO_PVID;
>  
> -	err = dsa_port_vlan_add(dp->cpu_dp, &vlan, extack);
> -	if (err)
> -		return err;
> +	/* Skip case 3 VLANs from __vlan_add() from the bridge driver */
> +	if (!(vlan.flags & BRIDGE_VLAN_INFO_BRENTRY))
> +		return 0;
>  
> -	return vlan_vid_add(master, htons(ETH_P_8021Q), vlan.vid);
> +	return dsa_port_host_vlan_add(dp, &vlan, extack);
>  }
>  
>  static int dsa_slave_port_obj_add(struct net_device *dev, const void *ctx,
> @@ -415,10 +426,17 @@ static int dsa_slave_port_obj_add(struct net_device *dev, const void *ctx,
>  		err = dsa_port_host_mdb_add(dp, SWITCHDEV_OBJ_PORT_MDB(obj));
>  		break;
>  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> -		if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
> -			return -EOPNOTSUPP;
> +		if (netif_is_bridge_master(obj->orig_dev)) {
> +			if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
> +				return -EOPNOTSUPP;
> +
> +			err = dsa_slave_host_vlan_add(dev, obj, extack);
> +		} else {
> +			if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
> +				return -EOPNOTSUPP;
>  
> -		err = dsa_slave_vlan_add(dev, obj, extack);
> +			err = dsa_slave_vlan_add(dev, obj, extack);
> +		}
>  		break;
>  	case SWITCHDEV_OBJ_ID_MRP:
>  		if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
> @@ -444,26 +462,29 @@ static int dsa_slave_port_obj_add(struct net_device *dev, const void *ctx,
>  static int dsa_slave_vlan_del(struct net_device *dev,
>  			      const struct switchdev_obj *obj)
>  {
> -	struct net_device *master = dsa_slave_to_master(dev);
>  	struct dsa_port *dp = dsa_slave_to_port(dev);
>  	struct switchdev_obj_port_vlan *vlan;
> -	int err;
>  
>  	if (dsa_port_skip_vlan_configuration(dp))
>  		return 0;
>  
>  	vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
>  
> -	/* Do not deprogram the CPU port as it may be shared with other user
> -	 * ports which can be members of this VLAN as well.
> -	 */
> -	err = dsa_port_vlan_del(dp, vlan);
> -	if (err)
> -		return err;
> +	return dsa_port_vlan_del(dp, vlan);
> +}
>  
> -	vlan_vid_del(master, htons(ETH_P_8021Q), vlan->vid);
> +static int dsa_slave_host_vlan_del(struct net_device *dev,
> +				   const struct switchdev_obj *obj)
> +{
> +	struct dsa_port *dp = dsa_slave_to_port(dev);
> +	struct switchdev_obj_port_vlan *vlan;
>  
> -	return 0;
> +	if (dsa_port_skip_vlan_configuration(dp))
> +		return 0;
> +
> +	vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> +
> +	return dsa_port_host_vlan_del(dp, vlan);
>  }
>  
>  static int dsa_slave_port_obj_del(struct net_device *dev, const void *ctx,
> @@ -489,10 +510,17 @@ static int dsa_slave_port_obj_del(struct net_device *dev, const void *ctx,
>  		err = dsa_port_host_mdb_del(dp, SWITCHDEV_OBJ_PORT_MDB(obj));
>  		break;
>  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> -		if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
> -			return -EOPNOTSUPP;
> +		if (netif_is_bridge_master(obj->orig_dev)) {
> +			if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
> +				return -EOPNOTSUPP;
>  
> -		err = dsa_slave_vlan_del(dev, obj);
> +			err = dsa_slave_host_vlan_del(dev, obj);
> +		} else {
> +			if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
> +				return -EOPNOTSUPP;
> +
> +			err = dsa_slave_vlan_del(dev, obj);
> +		}
>  		break;
>  	case SWITCHDEV_OBJ_ID_MRP:
>  		if (!dsa_port_offloads_bridge_dev(dp, obj->orig_dev))
> @@ -1405,7 +1433,7 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
>  	}
>  
>  	/* And CPU port... */
> -	ret = dsa_port_vlan_add(dp->cpu_dp, &vlan, &extack);
> +	ret = dsa_port_host_vlan_add(dp, &vlan, &extack);
>  	if (ret) {
>  		if (extack._msg)
>  			netdev_err(dev, "CPU port %d: %s\n", dp->cpu_dp->index,
> @@ -1413,7 +1441,7 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
>  		return ret;
>  	}
>  
> -	return vlan_vid_add(master, proto, vid);
> +	return 0;
>  }
>  
>  static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
> @@ -1428,16 +1456,11 @@ static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
>  	};
>  	int err;
>  
> -	/* Do not deprogram the CPU port as it may be shared with other user
> -	 * ports which can be members of this VLAN as well.
> -	 */
>  	err = dsa_port_vlan_del(dp, &vlan);
>  	if (err)
>  		return err;
>  
> -	vlan_vid_del(master, proto, vid);
> -
> -	return 0;
> +	return dsa_port_host_vlan_del(dp, &vlan);
>  }
>  
>  static int dsa_slave_restore_vlan(struct net_device *vdev, int vid, void *arg)
> diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> index 4866b58649e4..9e4570bdea2f 100644
> --- a/net/dsa/switch.c
> +++ b/net/dsa/switch.c
> @@ -558,6 +558,7 @@ static int dsa_switch_host_mdb_del(struct dsa_switch *ds,
>  	return err;
>  }
>  
> +/* Port VLANs match on the targeted port and on all DSA ports */
>  static bool dsa_port_vlan_match(struct dsa_port *dp,
>  				struct dsa_notifier_vlan_info *info)
>  {
> @@ -570,6 +571,118 @@ static bool dsa_port_vlan_match(struct dsa_port *dp,
>  	return false;
>  }
>  
> +/* Host VLANs match on the targeted port's CPU port, and on all DSA ports
> + * (upstream and downstream) of that switch and its upstream switches.
> + */
> +static bool dsa_port_host_vlan_match(struct dsa_port *dp,
> +				     struct dsa_notifier_vlan_info *info)
> +{
> +	struct dsa_port *targeted_dp, *cpu_dp;
> +	struct dsa_switch *targeted_ds;
> +
> +	targeted_ds = dsa_switch_find(dp->ds->dst->index, info->sw_index);
> +	targeted_dp = dsa_to_port(targeted_ds, info->port);
> +	cpu_dp = targeted_dp->cpu_dp;
> +
> +	if (dsa_switch_is_upstream_of(dp->ds, targeted_ds))
> +		return dsa_port_is_dsa(dp) || dp == cpu_dp;
> +
> +	return false;
> +}
> +
> +static struct dsa_vlan *dsa_vlan_find(struct list_head *vlan_list,
> +				      const struct switchdev_obj_port_vlan *vlan)
> +{
> +	struct dsa_vlan *v;
> +
> +	list_for_each_entry(v, vlan_list, list)
> +		if (v->vid == vlan->vid)
> +			return v;
> +
> +	return NULL;
> +}
> +
> +static int dsa_port_do_vlan_add(struct dsa_port *dp,
> +				const struct switchdev_obj_port_vlan *vlan,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct dsa_switch *ds = dp->ds;
> +	int port = dp->index;
> +	struct dsa_vlan *v;
> +	int err = 0;
> +
> +	/* No need to bother with refcounting for user ports */
> +	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
> +		return ds->ops->port_vlan_add(ds, port, vlan, extack);
> +
> +	mutex_lock(&dp->vlans_lock);
> +
> +	v = dsa_vlan_find(&dp->vlans, vlan);
> +	if (v) {
> +		refcount_inc(&v->refcount);
> +		goto out;
> +	}
> +
> +	v = kzalloc(sizeof(*v), GFP_KERNEL);
> +	if (!v) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	err = ds->ops->port_vlan_add(ds, port, vlan, extack);
> +	if (err) {
> +		kfree(v);
> +		goto out;
> +	}
> +
> +	v->vid = vlan->vid;
> +	refcount_set(&v->refcount, 1);
> +	list_add_tail(&v->list, &dp->vlans);
> +
> +out:
> +	mutex_unlock(&dp->vlans_lock);
> +
> +	return err;
> +}
> +
> +static int dsa_port_do_vlan_del(struct dsa_port *dp,
> +				const struct switchdev_obj_port_vlan *vlan)
> +{
> +	struct dsa_switch *ds = dp->ds;
> +	int port = dp->index;
> +	struct dsa_vlan *v;
> +	int err = 0;
> +
> +	/* No need to bother with refcounting for user ports */
> +	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
> +		return ds->ops->port_vlan_del(ds, port, vlan);
> +
> +	mutex_lock(&dp->vlans_lock);
> +
> +	v = dsa_vlan_find(&dp->vlans, vlan);
> +	if (!v) {
> +		err = -ENOENT;
> +		goto out;
> +	}
> +
> +	if (!refcount_dec_and_test(&v->refcount))
> +		goto out;
> +
> +	err = ds->ops->port_vlan_del(ds, port, vlan);
> +	if (err) {
> +		refcount_set(&v->refcount, 1);
> +		goto out;
> +	}
> +
> +	list_del(&v->list);
> +	kfree(v);
> +
> +out:
> +	mutex_unlock(&dp->vlans_lock);
> +
> +	return err;
> +}
> +
>  static int dsa_switch_vlan_add(struct dsa_switch *ds,
>  			       struct dsa_notifier_vlan_info *info)
>  {
> @@ -581,8 +694,8 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
>  
>  	dsa_switch_for_each_port(dp, ds) {
>  		if (dsa_port_vlan_match(dp, info)) {
> -			err = ds->ops->port_vlan_add(ds, dp->index, info->vlan,
> -						     info->extack);
> +			err = dsa_port_do_vlan_add(dp, info->vlan,
> +						   info->extack);
>  			if (err)
>  				return err;
>  		}
> @@ -594,15 +707,61 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
>  static int dsa_switch_vlan_del(struct dsa_switch *ds,
>  			       struct dsa_notifier_vlan_info *info)
>  {
> +	struct dsa_port *dp;
> +	int err;
> +
>  	if (!ds->ops->port_vlan_del)
>  		return -EOPNOTSUPP;
>  
> -	if (ds->index == info->sw_index)
> -		return ds->ops->port_vlan_del(ds, info->port, info->vlan);
> +	dsa_switch_for_each_port(dp, ds) {
> +		if (dsa_port_vlan_match(dp, info)) {
> +			err = dsa_port_do_vlan_del(dp, info->vlan);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int dsa_switch_host_vlan_add(struct dsa_switch *ds,
> +				    struct dsa_notifier_vlan_info *info)
> +{
> +	struct dsa_port *dp;
> +	int err;
> +
> +	if (!ds->ops->port_vlan_add)
> +		return -EOPNOTSUPP;
> +
> +	dsa_switch_for_each_port(dp, ds) {
> +		if (dsa_port_host_vlan_match(dp, info)) {
> +			err = dsa_port_do_vlan_add(dp, info->vlan,
> +						   info->extack);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int dsa_switch_host_vlan_del(struct dsa_switch *ds,
> +				    struct dsa_notifier_vlan_info *info)
> +{
> +	struct dsa_port *dp;
> +	int err;
> +
> +	if (!ds->ops->port_vlan_del)
> +		return -EOPNOTSUPP;
> +
> +	dsa_switch_for_each_port(dp, ds) {
> +		if (dsa_port_host_vlan_match(dp, info)) {
> +			err = dsa_port_do_vlan_del(dp, info->vlan);
> +			if (err)
> +				return err;
> +		}
> +	}
>  
> -	/* Do not deprogram the DSA links as they may be used as conduit
> -	 * for other VLAN members in the fabric.
> -	 */
>  	return 0;
>  }
>  
> @@ -764,6 +923,12 @@ static int dsa_switch_event(struct notifier_block *nb,
>  	case DSA_NOTIFIER_VLAN_DEL:
>  		err = dsa_switch_vlan_del(ds, info);
>  		break;
> +	case DSA_NOTIFIER_HOST_VLAN_ADD:
> +		err = dsa_switch_host_vlan_add(ds, info);
> +		break;
> +	case DSA_NOTIFIER_HOST_VLAN_DEL:
> +		err = dsa_switch_host_vlan_del(ds, info);
> +		break;
>  	case DSA_NOTIFIER_MTU:
>  		err = dsa_switch_mtu(ds, info);
>  		break;
> -- 
> 2.25.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs
  2022-02-13  1:09   ` Tobias Waldekranz
@ 2022-02-13 11:34     ` Vladimir Oltean
  0 siblings, 0 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-13 11:34 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer

On Sun, Feb 13, 2022 at 02:09:35AM +0100, Tobias Waldekranz wrote:
> > Therefore:
> > - user ports contain only bridge port (forwarding) VLANs, and no
> >   refcounting is necessary
> > - DSA ports contain both forwarding and host VLANs. Refcounting is
> >   necessary among these 2 types.
> > - CPU ports contain only host VLANs. Refcounting is also necessary.
> 
> This is pretty much true, though this does not take foreign interfaces
> into account. It would be great if the condifion could be refined to:
> 
>     The CPU port should be a member of all VLANs where either (a) the
>     host is a member, or (b) at least one foreign interface is a member.
> 
> I.e. in a situation like this:
> 
>    br0
>    / \
> swp0 tap0
> 
> If br0 is not a member of VLAN X, but tap0 is, then the CPU port still
> needs to be a member of X. Otherwise the (software) bridge will never
> get a chance to software forward it over the tunnel.

This is a good observation and it can be done - just in the same way as
we treat FDB entries on foregin interfaces as upstream-facing FDB entries
in DSA.

It also has implications upon the replay procedure, because if we want
this to happen, we must replay all bridge VLAN groups, not just the port
and the bridge VLANs. Again, similar to how br_switchdev_fdb_replay()
replays the entire FDB.

I can start working on these changes, but in parallel I'd also like an
Ack from Nikolay, Roopa, Jiri, Ido on the approach from patches 1-3,
since the whole refcounting thing depends on that. I think it's fairly
safe, but maybe it breaks something I haven't thought of...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-09 21:30 [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Vladimir Oltean
                   ` (4 preceding siblings ...)
  2022-02-09 21:30 ` [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs Vladimir Oltean
@ 2022-02-13 18:54 ` Nikolay Aleksandrov
  2022-02-13 20:02   ` Vladimir Oltean
  5 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-02-13 18:54 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Jakub Kicinski, David S. Miller, Florian Fainelli, Andrew Lunn,
	Vivien Didelot, Vladimir Oltean, Roopa Prabhu, Jiri Pirko,
	Ido Schimmel, Rafael Richter, Daniel Klauer, Tobias Waldekranz

On 09/02/2022 23:30, Vladimir Oltean wrote:
> The motivation behind these patches is that Rafael reported the
> following error with mv88e6xxx when the first switch port joins a
> bridge:
> 
> mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95 (-EOPNOTSUPP)
> 
> The FDB entry that's added is the MAC address of the bridge, in VID 1
> (the default_pvid), being replayed as part of br_add_if() -> ... ->
> nbp_switchdev_sync_objs().
> 
> -EOPNOTSUPP is the mv88e6xxx driver's way of saying that VID 1 doesn't
> exist in the VTU, so it can't program the ATU with a FID, something
> which it needs.
> 
> It appears to be a race, but it isn't, since we only end up installing
> VID 1 in the VTU by coincidence. DSA's approximation of programming
> VLANs on the CPU port together with the user ports breaks down with
> host FDB entries on mv88e6xxx, since that strictly requires the VTU to
> contain the VID. But the user may freely add VLANs pointing just towards
> the bridge, and FDB entries in those VLANs, and DSA will not be aware of
> them, because it only listens for VLANs on user ports.
> 
> To create a solution that scales properly to cross-chip setups and
> doesn't leak entries behind, some changes in the bridge driver are
> required. I believe that these are for the better overall, but I may be
> wrong. Namely, the same refcounting procedure that DSA has in place for
> host FDB and MDB entries can be replicated for VLANs, except that it's
> garbage in, garbage out: the VLAN addition and removal notifications
> from switchdev aren't balanced. So the first 3 patches attempt to deal
> with that.
> 
> This patch set has been superficially tested on a board with 3 mv88e6xxx
> switches in a daisy chain and appears to produce the primary desired
> effect - the driver no longer returns -EOPNOTSUPP when the first port
> joins a bridge, and is successful in performing local termination under
> a VLAN-aware bridge.
> As an additional side effect, it silences the annoying "p%d: already a
> member of VLAN %d\n" warning messages that the mv88e6xxx driver produces
> when coupled with systemd-networkd, and a few VLANs are configured.
> Furthermore, it advances Florian's idea from a few years back, which
> never got merged:
> https://lore.kernel.org/lkml/20180624153339.13572-1-f.fainelli@gmail.com/
> 

Hi,
I don't like the VLAN delete on simple flags change to workaround some devices'
broken behaviour, in general I'd like to avoid adding driver workarounds in the bridge.
Either those drivers should be fixed (best approach IMO), or the workaround should only
affect them, and not everyone. The point is that a vlan has much more state than a simple
fdb, and deleting it could result in a lot more unnecessary churn which can be avoided
if these flags can be changed properly. The host replay sounds good, but please re-work
the flags change logic and push the changes down where they're needed.

Thanks,
 Nik




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-13 18:54 ` [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Nikolay Aleksandrov
@ 2022-02-13 20:02   ` Vladimir Oltean
  2022-02-13 20:13     ` Vladimir Oltean
  2022-02-14  9:05     ` Nikolay Aleksandrov
  0 siblings, 2 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-13 20:02 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Jiri Pirko, Ido Schimmel, Rafael Richter, Daniel Klauer,
	Tobias Waldekranz

Hi Nikolay,

On Sun, Feb 13, 2022 at 08:54:50PM +0200, Nikolay Aleksandrov wrote:
> Hi,
> I don't like the VLAN delete on simple flags change to workaround some devices'
> broken behaviour, in general I'd like to avoid adding driver workarounds in the bridge.
> Either those drivers should be fixed (best approach IMO), or the workaround should only
> affect them, and not everyone. The point is that a vlan has much more state than a simple
> fdb, and deleting it could result in a lot more unnecessary churn which can be avoided
> if these flags can be changed properly.

Agree, but the broken drivers was just an added bonus I thought I'd mention,
since the subtle implications of the API struck me as odd the first time
I realized them.

The point is that it's impossible for a switchdev driver to do correct
refcounting for this example (taken from Tobias):

   br0
   / \
swp0 tap0
 ^     ^
DSA   foreign interface

(1) ip link add br0 type bridge
(2) ip link set swp0 master br0
(3) ip link set tap0 master br0
(4) bridge vlan add dev tap0 vid 100
(5) bridge vlan add dev br0 vid 100 self
(6) bridge vlan add dev br0 vid 100 pvid self
(7) bridge vlan add dev br0 vid 100 pvid untagged self
(8) bridge vlan del dev br0 vid 100 self
(8) bridge vlan del dev tap0 vid 100

basically, if DSA were to keep track of the host-facing users of VID 100
in order to keep the CPU port programmed in that VID, it needs a way to
detect the fact that commands (6) and (7) operate on the same VID as (5),
and on a different VID than (8). So practically, it needs to keep a
shadow copy of each bridge VLAN so that it can figure out whether a
switchdev notification is for an existing VLAN or for a new one.

This is really undesirable in my mind as well, and I see two middle grounds
(both untested):

(a) call br_vlan_get_info() from the DSA switchdev notification handler
    to figure out whether the VLAN is new or not. As far as I can see in
    __vlan_add(), br_switchdev_port_vlan_add() is called before the
    insertion of the VLAN into &vg->vlan_hash, so the absence from there
    could be used as an indicator that the VLAN is new, and that the
    refcount needs to be bumped, regardless of knowing exactly which
    bridge or bridge port the VLAN came from. The important part is that
    it isn't just a flag change, for which we don't want to bump the
    refcount, and that we can rely on the bridge database and not keep a
    separate one. The disadvantage seems to be that the solution is a
    bit fragile and puts a bit too much pressure on the bridge code
    structure, if it even works (need to try).

(b) extend struct switchdev_obj_port_vlan with a "bool existing" flag
    which is set to true by the "_add_existing" bridge code paths.
    This flag can be ignored by non-interested parties, and used by DSA
    and others as a hint whether to bump a refcount on the VID or not.

(c) (just a variation of b) I feel there should have been a
    SWITCHDEV_PORT_OBJ_CHANGE instead of just SWITCHDEV_PORT_OBJ_ADD,
    but it's probably too late for that.

So what do you think about option (b)?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-13 20:02   ` Vladimir Oltean
@ 2022-02-13 20:13     ` Vladimir Oltean
  2022-02-14  9:05     ` Nikolay Aleksandrov
  1 sibling, 0 replies; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-13 20:13 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Jiri Pirko, Ido Schimmel, Rafael Richter, Daniel Klauer,
	Tobias Waldekranz

On Sun, Feb 13, 2022 at 10:02:55PM +0200, Vladimir Oltean wrote:
> (a) call br_vlan_get_info() from the DSA switchdev notification handler
>     to figure out whether the VLAN is new or not. As far as I can see in
>     __vlan_add(), br_switchdev_port_vlan_add() is called before the
>     insertion of the VLAN into &vg->vlan_hash, so the absence from there
>     could be used as an indicator that the VLAN is new, and that the
>     refcount needs to be bumped, regardless of knowing exactly which
>     bridge or bridge port the VLAN came from. The important part is that
>     it isn't just a flag change, for which we don't want to bump the
>     refcount, and that we can rely on the bridge database and not keep a
>     separate one. The disadvantage seems to be that the solution is a
>     bit fragile and puts a bit too much pressure on the bridge code
>     structure, if it even works (need to try).

Ah, this is too fragile, I thought of a case where it's broken already:
for VLAN replays, it's technically a 'new VLAN' not a changed one, yet
br_vlan_get_info() will still find it so it will get detected as changed.
So the bridge has to pass the information that the switchdev notifier is
just for a change of flags somehow.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-13 20:02   ` Vladimir Oltean
  2022-02-13 20:13     ` Vladimir Oltean
@ 2022-02-14  9:05     ` Nikolay Aleksandrov
  2022-02-14  9:59       ` Ido Schimmel
  1 sibling, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-02-14  9:05 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Jiri Pirko, Ido Schimmel, Rafael Richter, Daniel Klauer,
	Tobias Waldekranz

On 13/02/2022 22:02, Vladimir Oltean wrote:
> Hi Nikolay,
> 
> On Sun, Feb 13, 2022 at 08:54:50PM +0200, Nikolay Aleksandrov wrote:
>> Hi,
>> I don't like the VLAN delete on simple flags change to workaround some devices'
>> broken behaviour, in general I'd like to avoid adding driver workarounds in the bridge.
>> Either those drivers should be fixed (best approach IMO), or the workaround should only
>> affect them, and not everyone. The point is that a vlan has much more state than a simple
>> fdb, and deleting it could result in a lot more unnecessary churn which can be avoided
>> if these flags can be changed properly.
> 
> Agree, but the broken drivers was just an added bonus I thought I'd mention,
> since the subtle implications of the API struck me as odd the first time
> I realized them.
> 
> The point is that it's impossible for a switchdev driver to do correct
> refcounting for this example (taken from Tobias):
> 
>    br0
>    / \
> swp0 tap0
>  ^     ^
> DSA   foreign interface
> 
> (1) ip link add br0 type bridge
> (2) ip link set swp0 master br0
> (3) ip link set tap0 master br0
> (4) bridge vlan add dev tap0 vid 100
> (5) bridge vlan add dev br0 vid 100 self
> (6) bridge vlan add dev br0 vid 100 pvid self
> (7) bridge vlan add dev br0 vid 100 pvid untagged self
> (8) bridge vlan del dev br0 vid 100 self
> (8) bridge vlan del dev tap0 vid 100
> 
> basically, if DSA were to keep track of the host-facing users of VID 100
> in order to keep the CPU port programmed in that VID, it needs a way to
> detect the fact that commands (6) and (7) operate on the same VID as (5),
> and on a different VID than (8). So practically, it needs to keep a
> shadow copy of each bridge VLAN so that it can figure out whether a
> switchdev notification is for an existing VLAN or for a new one.
> 
> This is really undesirable in my mind as well, and I see two middle grounds
> (both untested):
> 
> (a) call br_vlan_get_info() from the DSA switchdev notification handler
>     to figure out whether the VLAN is new or not. As far as I can see in
>     __vlan_add(), br_switchdev_port_vlan_add() is called before the
>     insertion of the VLAN into &vg->vlan_hash, so the absence from there
>     could be used as an indicator that the VLAN is new, and that the
>     refcount needs to be bumped, regardless of knowing exactly which
>     bridge or bridge port the VLAN came from. The important part is that
>     it isn't just a flag change, for which we don't want to bump the
>     refcount, and that we can rely on the bridge database and not keep a
>     separate one. The disadvantage seems to be that the solution is a
>     bit fragile and puts a bit too much pressure on the bridge code
>     structure, if it even works (need to try).
> 

This is undesirable for many reasons, one of which you already noted. :)

> (b) extend struct switchdev_obj_port_vlan with a "bool existing" flag
>     which is set to true by the "_add_existing" bridge code paths.
>     This flag can be ignored by non-interested parties, and used by DSA
>     and others as a hint whether to bump a refcount on the VID or not.
> 
> (c) (just a variation of b) I feel there should have been a
>     SWITCHDEV_PORT_OBJ_CHANGE instead of just SWITCHDEV_PORT_OBJ_ADD,
>     but it's probably too late for that.
> 
> So what do you think about option (b)?

(b) sounds good if it will be enough for DSA, it looks like the least
intrusive way to do it. Also passing that information would make simpler
some inferring by other means that the vlan already exists in drivers.

Cheers,
 Nik

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-14  9:05     ` Nikolay Aleksandrov
@ 2022-02-14  9:59       ` Ido Schimmel
  2022-02-14 10:07         ` Vladimir Oltean
  0 siblings, 1 reply; 20+ messages in thread
From: Ido Schimmel @ 2022-02-14  9:59 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Vladimir Oltean, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	Roopa Prabhu, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

On Mon, Feb 14, 2022 at 11:05:54AM +0200, Nikolay Aleksandrov wrote:
> On 13/02/2022 22:02, Vladimir Oltean wrote:
> > Hi Nikolay,
> > 
> > On Sun, Feb 13, 2022 at 08:54:50PM +0200, Nikolay Aleksandrov wrote:
> >> Hi,
> >> I don't like the VLAN delete on simple flags change to workaround some devices'
> >> broken behaviour, in general I'd like to avoid adding driver workarounds in the bridge.
> >> Either those drivers should be fixed (best approach IMO), or the workaround should only
> >> affect them, and not everyone. The point is that a vlan has much more state than a simple
> >> fdb, and deleting it could result in a lot more unnecessary churn which can be avoided
> >> if these flags can be changed properly.
> > 
> > Agree, but the broken drivers was just an added bonus I thought I'd mention,
> > since the subtle implications of the API struck me as odd the first time
> > I realized them.
> > 
> > The point is that it's impossible for a switchdev driver to do correct
> > refcounting for this example (taken from Tobias):
> > 
> >    br0
> >    / \
> > swp0 tap0
> >  ^     ^
> > DSA   foreign interface
> > 
> > (1) ip link add br0 type bridge
> > (2) ip link set swp0 master br0
> > (3) ip link set tap0 master br0
> > (4) bridge vlan add dev tap0 vid 100
> > (5) bridge vlan add dev br0 vid 100 self
> > (6) bridge vlan add dev br0 vid 100 pvid self
> > (7) bridge vlan add dev br0 vid 100 pvid untagged self
> > (8) bridge vlan del dev br0 vid 100 self
> > (8) bridge vlan del dev tap0 vid 100
> > 
> > basically, if DSA were to keep track of the host-facing users of VID 100
> > in order to keep the CPU port programmed in that VID, it needs a way to
> > detect the fact that commands (6) and (7) operate on the same VID as (5),
> > and on a different VID than (8). So practically, it needs to keep a
> > shadow copy of each bridge VLAN so that it can figure out whether a
> > switchdev notification is for an existing VLAN or for a new one.
> > 
> > This is really undesirable in my mind as well, and I see two middle grounds
> > (both untested):
> > 
> > (a) call br_vlan_get_info() from the DSA switchdev notification handler
> >     to figure out whether the VLAN is new or not. As far as I can see in
> >     __vlan_add(), br_switchdev_port_vlan_add() is called before the
> >     insertion of the VLAN into &vg->vlan_hash, so the absence from there
> >     could be used as an indicator that the VLAN is new, and that the
> >     refcount needs to be bumped, regardless of knowing exactly which
> >     bridge or bridge port the VLAN came from. The important part is that
> >     it isn't just a flag change, for which we don't want to bump the
> >     refcount, and that we can rely on the bridge database and not keep a
> >     separate one. The disadvantage seems to be that the solution is a
> >     bit fragile and puts a bit too much pressure on the bridge code
> >     structure, if it even works (need to try).
> > 
> 
> This is undesirable for many reasons, one of which you already noted. :)
> 
> > (b) extend struct switchdev_obj_port_vlan with a "bool existing" flag
> >     which is set to true by the "_add_existing" bridge code paths.
> >     This flag can be ignored by non-interested parties, and used by DSA
> >     and others as a hint whether to bump a refcount on the VID or not.
> > 
> > (c) (just a variation of b) I feel there should have been a
> >     SWITCHDEV_PORT_OBJ_CHANGE instead of just SWITCHDEV_PORT_OBJ_ADD,
> >     but it's probably too late for that.
> > 
> > So what do you think about option (b)?
> 
> (b) sounds good if it will be enough for DSA, it looks like the least
> intrusive way to do it. Also passing that information would make simpler
> some inferring by other means that the vlan already exists in drivers.

Sounds good to me as well. I assume it means patches #1 and #2 will be
changed to make use of this flag and patch #3 will be dropped?

> 
> Cheers,
>  Nik

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-14  9:59       ` Ido Schimmel
@ 2022-02-14 10:07         ` Vladimir Oltean
  2022-02-14 10:27           ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-14 10:07 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Nikolay Aleksandrov, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	Roopa Prabhu, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

On Mon, Feb 14, 2022 at 11:59:02AM +0200, Ido Schimmel wrote:
> Sounds good to me as well. I assume it means patches #1 and #2 will be
> changed to make use of this flag and patch #3 will be dropped?

Not quite. Patches 1 and 2 will be kept as-is, since fundamentally,
their goal is to eliminate a useless SWITCHDEV_PORT_OBJ_ADD event when
really nothing has changed (flags == old_flags, no brentry was created).

Patch 3 will be replaced with another patch that extends
br_switchdev_port_vlan_add() and struct switchdev_obj_port_vlan with a
"bool changed" meaning that "this notification signifies a change of
flags for an existing VLAN", and the callers of br_switchdev_port_vlan_add()
will populate it with false, or *changed, or *flags_changed, as appropriate.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-14 10:07         ` Vladimir Oltean
@ 2022-02-14 10:27           ` Nikolay Aleksandrov
  2022-02-14 10:42             ` Vladimir Oltean
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-02-14 10:27 UTC (permalink / raw)
  To: Vladimir Oltean, Ido Schimmel
  Cc: netdev, Jakub Kicinski, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, Roopa Prabhu,
	Jiri Pirko, Ido Schimmel, Rafael Richter, Daniel Klauer,
	Tobias Waldekranz

On 14/02/2022 12:07, Vladimir Oltean wrote:
> On Mon, Feb 14, 2022 at 11:59:02AM +0200, Ido Schimmel wrote:
>> Sounds good to me as well. I assume it means patches #1 and #2 will be
>> changed to make use of this flag and patch #3 will be dropped?
> 
> Not quite. Patches 1 and 2 will be kept as-is, since fundamentally,
> their goal is to eliminate a useless SWITCHDEV_PORT_OBJ_ADD event when
> really nothing has changed (flags == old_flags, no brentry was created).
> 

I don't think that's needed, a two-line change like
"vlan_already_exists == true && old_flags == flags then do nothing"
would do the trick in DSA for all drivers and avoid the churn of these patches.
It will also keep the order of the events consistent with (most of) the rest
of the bridge. I'd prefer the simpler change which avoids config reverts than
these two patches.

> Patch 3 will be replaced with another patch that extends
> br_switchdev_port_vlan_add() and struct switchdev_obj_port_vlan with a
> "bool changed" meaning that "this notification signifies a change of
> flags for an existing VLAN", and the callers of br_switchdev_port_vlan_add()
> will populate it with false, or *changed, or *flags_changed, as appropriate.

Thanks,
 Nik


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-14 10:27           ` Nikolay Aleksandrov
@ 2022-02-14 10:42             ` Vladimir Oltean
  2022-02-14 11:11               ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-14 10:42 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	Roopa Prabhu, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

On Mon, Feb 14, 2022 at 12:27:57PM +0200, Nikolay Aleksandrov wrote:
> On 14/02/2022 12:07, Vladimir Oltean wrote:
> > On Mon, Feb 14, 2022 at 11:59:02AM +0200, Ido Schimmel wrote:
> >> Sounds good to me as well. I assume it means patches #1 and #2 will be
> >> changed to make use of this flag and patch #3 will be dropped?
> > 
> > Not quite. Patches 1 and 2 will be kept as-is, since fundamentally,
> > their goal is to eliminate a useless SWITCHDEV_PORT_OBJ_ADD event when
> > really nothing has changed (flags == old_flags, no brentry was created).
> > 
> 
> I don't think that's needed, a two-line change like
> "vlan_already_exists == true && old_flags == flags then do nothing"
> would do the trick in DSA for all drivers and avoid the churn of these patches.
> It will also keep the order of the events consistent with (most of) the rest
> of the bridge. I'd prefer the simpler change which avoids config reverts than
> these two patches.

I understand you prefer the simpler change which avoids reverting the
struct net_bridge_vlan on error, but "vlan_already_exists == true &&
old_flags == flags then do nothing" is not possible in DSA, because DSA
does not know "old_flags" unless we also pass that via
struct switchdev_obj_port_vlan. If we do that, I don't have much of an
objection, but it still seems cleaner to me if the bridge didn't notify
switchdev at all when it has nothing to say.

Or where do you mean to place the two-line change?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-14 10:42             ` Vladimir Oltean
@ 2022-02-14 11:11               ` Nikolay Aleksandrov
  2022-02-14 12:04                 ` Vladimir Oltean
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-02-14 11:11 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Ido Schimmel, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	Roopa Prabhu, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

On 14/02/2022 12:42, Vladimir Oltean wrote:
> On Mon, Feb 14, 2022 at 12:27:57PM +0200, Nikolay Aleksandrov wrote:
>> On 14/02/2022 12:07, Vladimir Oltean wrote:
>>> On Mon, Feb 14, 2022 at 11:59:02AM +0200, Ido Schimmel wrote:
>>>> Sounds good to me as well. I assume it means patches #1 and #2 will be
>>>> changed to make use of this flag and patch #3 will be dropped?
>>>
>>> Not quite. Patches 1 and 2 will be kept as-is, since fundamentally,
>>> their goal is to eliminate a useless SWITCHDEV_PORT_OBJ_ADD event when
>>> really nothing has changed (flags == old_flags, no brentry was created).
>>>
>>
>> I don't think that's needed, a two-line change like
>> "vlan_already_exists == true && old_flags == flags then do nothing"
>> would do the trick in DSA for all drivers and avoid the churn of these patches.
>> It will also keep the order of the events consistent with (most of) the rest
>> of the bridge. I'd prefer the simpler change which avoids config reverts than
>> these two patches.
> 
> I understand you prefer the simpler change which avoids reverting the
> struct net_bridge_vlan on error, but "vlan_already_exists == true &&
> old_flags == flags then do nothing" is not possible in DSA, because DSA
> does not know "old_flags" unless we also pass that via
> struct switchdev_obj_port_vlan. If we do that, I don't have much of an
> objection, but it still seems cleaner to me if the bridge didn't notify
> switchdev at all when it has nothing to say.
> 
> Or where do you mean to place the two-line change?

You mention a couple of times in both patches that you'd like to add dsa
vlan refcounting infra similar to dsa's host fdb and mdb. So I assumed that involves
keeping track of vlan entries in dsa? If so, then I thought you'd record the old flags there.

Alternatively I don't mind passing old flags if you don't intend on keeping
vlan information in dsa, it's uglier but it will avoid the reverts which will
also avoid additional notifications when these cases are hit. It makes sense
to have both old flags and new flags if the switchdev checks are done pre-config
change so it can veto any transitions it can't handle for some reason.

A third option is to do the flags check in the bridge and avoid doing the
switchdev call (e.g. in br_switchdev_port_vlan_ calls), that should avoid
the reverts as well.

If you intend to add vlan refcounting in dsa, then I'd go with just keeping track
of the flags, you'll have vlan state anyway, otherwise it's up to you. I'm ok
with both options for old flags. 

Cheers,
 Nik

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-14 11:11               ` Nikolay Aleksandrov
@ 2022-02-14 12:04                 ` Vladimir Oltean
  2022-02-14 15:01                   ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Oltean @ 2022-02-14 12:04 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	Roopa Prabhu, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

On Mon, Feb 14, 2022 at 01:11:01PM +0200, Nikolay Aleksandrov wrote:
> On 14/02/2022 12:42, Vladimir Oltean wrote:
> > On Mon, Feb 14, 2022 at 12:27:57PM +0200, Nikolay Aleksandrov wrote:
> >> On 14/02/2022 12:07, Vladimir Oltean wrote:
> >>> On Mon, Feb 14, 2022 at 11:59:02AM +0200, Ido Schimmel wrote:
> >>>> Sounds good to me as well. I assume it means patches #1 and #2 will be
> >>>> changed to make use of this flag and patch #3 will be dropped?
> >>>
> >>> Not quite. Patches 1 and 2 will be kept as-is, since fundamentally,
> >>> their goal is to eliminate a useless SWITCHDEV_PORT_OBJ_ADD event when
> >>> really nothing has changed (flags == old_flags, no brentry was created).
> >>>
> >>
> >> I don't think that's needed, a two-line change like
> >> "vlan_already_exists == true && old_flags == flags then do nothing"
> >> would do the trick in DSA for all drivers and avoid the churn of these patches.
> >> It will also keep the order of the events consistent with (most of) the rest
> >> of the bridge. I'd prefer the simpler change which avoids config reverts than
> >> these two patches.
> > 
> > I understand you prefer the simpler change which avoids reverting the
> > struct net_bridge_vlan on error, but "vlan_already_exists == true &&
> > old_flags == flags then do nothing" is not possible in DSA, because DSA
> > does not know "old_flags" unless we also pass that via
> > struct switchdev_obj_port_vlan. If we do that, I don't have much of an
> > objection, but it still seems cleaner to me if the bridge didn't notify
> > switchdev at all when it has nothing to say.
> > 
> > Or where do you mean to place the two-line change?
> 
> You mention a couple of times in both patches that you'd like to add dsa
> vlan refcounting infra similar to dsa's host fdb and mdb. So I assumed that involves
> keeping track of vlan entries in dsa? If so, then I thought you'd record the old flags there.
> 
> Alternatively I don't mind passing old flags if you don't intend on keeping
> vlan information in dsa, it's uglier but it will avoid the reverts which will
> also avoid additional notifications when these cases are hit. It makes sense
> to have both old flags and new flags if the switchdev checks are done pre-config
> change so it can veto any transitions it can't handle for some reason.
> 
> A third option is to do the flags check in the bridge and avoid doing the
> switchdev call (e.g. in br_switchdev_port_vlan_ calls), that should avoid
> the reverts as well.
> 
> If you intend to add vlan refcounting in dsa, then I'd go with just keeping track
> of the flags, you'll have vlan state anyway, otherwise it's up to you. I'm ok
> with both options for old flags. 

I understand it can be confusing why DSA needs to keep VLAN refcounting
yet it doesn't keep track of flags, so let me explain.

First thing to mention is that VLAN flags on CPU and DSA (cascade) ports
don't make much sense at the level of the DSA core. These ports are
really pipes that transport packets between switches and between the
switch and the CPU, so 'pvid' and 'untagged' don't really make sense.
An unwritten convention is for DSA hardware drivers to program the CPU
and DSA ports such that the VLAN information is unmodified w.r.t. how it
was/will be on the wire. The only important aspect about VLAN on DSA and
CPU ports is the VID membership list.

This isn't the major reason, though, so secondly, I need to introduce a topology.

               eth0                                           eth1
           (DSA master)                                (foreign interface)
                |
                |
                | CPU port (no netdev)
        +----------------+
        |     sw0p0      |
        |                |
        |    Switch 0    |
        |                |
        | sw0p0   sw0p1  |
        +----------------+
            |       | DSA/cascade port (no netdev)
            |       |
            |       |
            |       | DSA/cascade port (no netdev)
            |    +--------------+
            |    | sw1p0        |
            |    |              |
            |    |   Switch 1   |
            |    |              |
            |    |sw1p1  sw1p2  |
            |    +--------------+
 user port  |       |      |
(has netdev)   user port  user port

-----------------------------------------------------------

Example 1: you want forwarding in VLAN 100 between sw0p0, sw1p1 and
sw1p2, so you issue these commands:

ip link add br0 type bridge vlan_filtering 1
ip link set sw0p0 master br0 && bridge vlan add dev sw0p0 vid 100
ip link set sw1p1 master br0 && bridge vlan add dev sw1p1 vid 100
ip link set sw1p2 master br0 && bridge vlan add dev sw1p2 vid 100

DSA must figure out that the sw0p1 and sw1p0 (the cascade ports
interconnecting the 2 switches) must also be members of VID 100.
So it must privately add each cascade port of a switch to this VID,
as long as a user port of that switch is in VID 100.

So the refcounting must be kept on the destination of where those VLANs
are programmed (the cascade ports), not on the sources of where those
VLANs came from (the user ports).

In this example, sw0p1 and sw1p0 will be members of VID 100 and will
have a refcount of 3 on it (it is needed by 3 user ports).

-----------------------------------------------------------

Example 2: you figure out that sw1p2 should in fact be pvid untagged in
VID 100, so you change that:

bridge vlan add dev sw1p2 pvid untagged

DSA doesn't care about a change of flags, so it needs to change nothing
about the DSA ports. It just needs to pass on the notification of the
change of flags to the device driver for sw1p2.

-----------------------------------------------------------

Example 3: you realize that you know what, you don't want sw1p2 to be a
member in VID 100 at all:

bridge vlan del dev sw1p2 vid 100

DSA must decrement the refcount of VID 100 on sw0p1 and sw1p0 from 3 to 2.

-----------------------------------------------------------

Example 4: you want local termination in VID 100 for this bridge, so you do:

bridge vlan add dev br0 vid 100 self

To ensure local termination works for all DSA user ports that are
members of the bridge (sw0p0, sw1p1), DSA must ensure that the CPU port
and the DSA ports towards it are members of VID 100.

So sw0p0 joins VID 100 with a refcount of 1, while sw0p1 and sw1p0 just
bump the refcount again from 2 to 3.

-----------------------------------------------------------

Example 5: you actually want to forward in VID 100 to a foreign
interface as well:

ip link set eth1 master br0 && bridge vlan add dev eth1 vid 100

DSA must bump the refcount of VID 100 on sw0p0 (the CPU port) from 1 to 2,
and for the cascade ports from 3 to 4. This is such that, if we decide
that we no longer want local termination (example 4), VID 100 is not
removed on those ports. Local termination vs software forwarding is
identical from DSA's perspective, but different from the stack's perspective.


The way this worked until now was by an approximation that held quite
honorably more often than not: whenever a bridge VLAN is added to a user
port, just add it to all DSA and CPU ports too, and never remove it.
However, it gets quite limiting.



So the way in which VLAN membership on CPU and DSA ports is refcounted
makes it not really practical to keep original flags of each VLAN on
each source port it came from. Your suggestion of keeping each vlan->flags
in DSA practically boils down to keeping an entire copy of the bridge
VLAN database. I hope it's clearer now why it isn't really the path I
would like to follow.

I can look into pruning useless calls in br_switchdev_port_vlan_add()
itself, if that's what you prefer. Before looking closer at the code,
that's what I originally had in mind, but it would pollute all callers
of that function, since we'd have to provide a "bool existing" +
"u16 old_flags" set of extra arguments to it, and most call paths would
pass "false, 0". When I noticed that the call paths I'm interested in
already compute "bool changed" based on __vlan_add_flags(), I thought it
would be a bit redundant to duplicate logic from that function.
Just reorder the call to __vlan_add_flags() and I have the info I need.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA
  2022-02-14 12:04                 ` Vladimir Oltean
@ 2022-02-14 15:01                   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2022-02-14 15:01 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Ido Schimmel, netdev, Jakub Kicinski, David S. Miller,
	Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	Roopa Prabhu, Jiri Pirko, Ido Schimmel, Rafael Richter,
	Daniel Klauer, Tobias Waldekranz

On 14/02/2022 14:04, Vladimir Oltean wrote:
> On Mon, Feb 14, 2022 at 01:11:01PM +0200, Nikolay Aleksandrov wrote:
>> On 14/02/2022 12:42, Vladimir Oltean wrote:
>>> On Mon, Feb 14, 2022 at 12:27:57PM +0200, Nikolay Aleksandrov wrote:
>>>> On 14/02/2022 12:07, Vladimir Oltean wrote:
>>>>> On Mon, Feb 14, 2022 at 11:59:02AM +0200, Ido Schimmel wrote:
>>>>>> Sounds good to me as well. I assume it means patches #1 and #2 will be
>>>>>> changed to make use of this flag and patch #3 will be dropped?
>>>>>
>>>>> Not quite. Patches 1 and 2 will be kept as-is, since fundamentally,
>>>>> their goal is to eliminate a useless SWITCHDEV_PORT_OBJ_ADD event when
>>>>> really nothing has changed (flags == old_flags, no brentry was created).
>>>>>
>>>>
>>>> I don't think that's needed, a two-line change like
>>>> "vlan_already_exists == true && old_flags == flags then do nothing"
>>>> would do the trick in DSA for all drivers and avoid the churn of these patches.
>>>> It will also keep the order of the events consistent with (most of) the rest
>>>> of the bridge. I'd prefer the simpler change which avoids config reverts than
>>>> these two patches.
>>>
>>> I understand you prefer the simpler change which avoids reverting the
>>> struct net_bridge_vlan on error, but "vlan_already_exists == true &&
>>> old_flags == flags then do nothing" is not possible in DSA, because DSA
>>> does not know "old_flags" unless we also pass that via
>>> struct switchdev_obj_port_vlan. If we do that, I don't have much of an
>>> objection, but it still seems cleaner to me if the bridge didn't notify
>>> switchdev at all when it has nothing to say.
>>>
>>> Or where do you mean to place the two-line change?
>>
>> You mention a couple of times in both patches that you'd like to add dsa
>> vlan refcounting infra similar to dsa's host fdb and mdb. So I assumed that involves
>> keeping track of vlan entries in dsa? If so, then I thought you'd record the old flags there.
>>
>> Alternatively I don't mind passing old flags if you don't intend on keeping
>> vlan information in dsa, it's uglier but it will avoid the reverts which will
>> also avoid additional notifications when these cases are hit. It makes sense
>> to have both old flags and new flags if the switchdev checks are done pre-config
>> change so it can veto any transitions it can't handle for some reason.
>>
>> A third option is to do the flags check in the bridge and avoid doing the
>> switchdev call (e.g. in br_switchdev_port_vlan_ calls), that should avoid
>> the reverts as well.
>>
>> If you intend to add vlan refcounting in dsa, then I'd go with just keeping track
>> of the flags, you'll have vlan state anyway, otherwise it's up to you. I'm ok
>> with both options for old flags. 
> 
> I understand it can be confusing why DSA needs to keep VLAN refcounting
> yet it doesn't keep track of flags, so let me explain.
> 
> First thing to mention is that VLAN flags on CPU and DSA (cascade) ports
> don't make much sense at the level of the DSA core. These ports are
> really pipes that transport packets between switches and between the
> switch and the CPU, so 'pvid' and 'untagged' don't really make sense.
> An unwritten convention is for DSA hardware drivers to program the CPU
> and DSA ports such that the VLAN information is unmodified w.r.t. how it
> was/will be on the wire. The only important aspect about VLAN on DSA and
> CPU ports is the VID membership list.
> 
> This isn't the major reason, though, so secondly, I need to introduce a topology.
> 
>                eth0                                           eth1
>            (DSA master)                                (foreign interface)
>                 |
>                 |
>                 | CPU port (no netdev)
>         +----------------+
>         |     sw0p0      |
>         |                |
>         |    Switch 0    |
>         |                |
>         | sw0p0   sw0p1  |
>         +----------------+
>             |       | DSA/cascade port (no netdev)
>             |       |
>             |       |
>             |       | DSA/cascade port (no netdev)
>             |    +--------------+
>             |    | sw1p0        |
>             |    |              |
>             |    |   Switch 1   |
>             |    |              |
>             |    |sw1p1  sw1p2  |
>             |    +--------------+
>  user port  |       |      |
> (has netdev)   user port  user port
> 
> -----------------------------------------------------------
> 
> Example 1: you want forwarding in VLAN 100 between sw0p0, sw1p1 and
> sw1p2, so you issue these commands:
> 
> ip link add br0 type bridge vlan_filtering 1
> ip link set sw0p0 master br0 && bridge vlan add dev sw0p0 vid 100
> ip link set sw1p1 master br0 && bridge vlan add dev sw1p1 vid 100
> ip link set sw1p2 master br0 && bridge vlan add dev sw1p2 vid 100
> 
> DSA must figure out that the sw0p1 and sw1p0 (the cascade ports
> interconnecting the 2 switches) must also be members of VID 100.
> So it must privately add each cascade port of a switch to this VID,
> as long as a user port of that switch is in VID 100.
> 
> So the refcounting must be kept on the destination of where those VLANs
> are programmed (the cascade ports), not on the sources of where those
> VLANs came from (the user ports).
> 
> In this example, sw0p1 and sw1p0 will be members of VID 100 and will
> have a refcount of 3 on it (it is needed by 3 user ports).
> 
> -----------------------------------------------------------
> 
> Example 2: you figure out that sw1p2 should in fact be pvid untagged in
> VID 100, so you change that:
> 
> bridge vlan add dev sw1p2 pvid untagged
> 
> DSA doesn't care about a change of flags, so it needs to change nothing
> about the DSA ports. It just needs to pass on the notification of the
> change of flags to the device driver for sw1p2.
> 
> -----------------------------------------------------------
> 
> Example 3: you realize that you know what, you don't want sw1p2 to be a
> member in VID 100 at all:
> 
> bridge vlan del dev sw1p2 vid 100
> 
> DSA must decrement the refcount of VID 100 on sw0p1 and sw1p0 from 3 to 2.
> 
> -----------------------------------------------------------
> 
> Example 4: you want local termination in VID 100 for this bridge, so you do:
> 
> bridge vlan add dev br0 vid 100 self
> 
> To ensure local termination works for all DSA user ports that are
> members of the bridge (sw0p0, sw1p1), DSA must ensure that the CPU port
> and the DSA ports towards it are members of VID 100.
> 
> So sw0p0 joins VID 100 with a refcount of 1, while sw0p1 and sw1p0 just
> bump the refcount again from 2 to 3.
> 
> -----------------------------------------------------------
> 
> Example 5: you actually want to forward in VID 100 to a foreign
> interface as well:
> 
> ip link set eth1 master br0 && bridge vlan add dev eth1 vid 100
> 
> DSA must bump the refcount of VID 100 on sw0p0 (the CPU port) from 1 to 2,
> and for the cascade ports from 3 to 4. This is such that, if we decide
> that we no longer want local termination (example 4), VID 100 is not
> removed on those ports. Local termination vs software forwarding is
> identical from DSA's perspective, but different from the stack's perspective.
> 
> 
> The way this worked until now was by an approximation that held quite
> honorably more often than not: whenever a bridge VLAN is added to a user
> port, just add it to all DSA and CPU ports too, and never remove it.
> However, it gets quite limiting.
> 
> 
> 
> So the way in which VLAN membership on CPU and DSA ports is refcounted
> makes it not really practical to keep original flags of each VLAN on
> each source port it came from. Your suggestion of keeping each vlan->flags
> in DSA practically boils down to keeping an entire copy of the bridge
> VLAN database. I hope it's clearer now why it isn't really the path I
> would like to follow.
> 

I see, thank you for the examples and the detailed explanation.

> I can look into pruning useless calls in br_switchdev_port_vlan_add()
> itself, if that's what you prefer. Before looking closer at the code,
> that's what I originally had in mind, but it would pollute all callers
> of that function, since we'd have to provide a "bool existing" +
> "u16 old_flags" set of extra arguments to it, and most call paths would
> pass "false, 0". When I noticed that the call paths I'm interested in
> already compute "bool changed" based on __vlan_add_flags(), I thought it
> would be a bit redundant to duplicate logic from that function.
> Just reorder the call to __vlan_add_flags() and I have the info I need.

You already have to add "existing", so I don't see why not add
old flags as well. As I mentioned it makes sense to have it for pre-config
veto of unsupported state transitions. I think it will be simpler and
we'll avoid the config reverts and more notifications.

Cheers,
 Nik


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-02-14 15:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09 21:30 [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Vladimir Oltean
2022-02-09 21:30 ` [RFC PATCH net-next 1/5] net: bridge: vlan: br_vlan_add: notify switchdev only when changed Vladimir Oltean
2022-02-09 21:30 ` [RFC PATCH net-next 2/5] net: bridge: vlan: nbp_vlan_add: " Vladimir Oltean
2022-02-09 21:30 ` [RFC PATCH net-next 3/5] net: bridge: vlan: notify a switchdev deletion when modifying flags of existing VLAN Vladimir Oltean
2022-02-09 21:30 ` [RFC PATCH net-next 4/5] net: bridge: switchdev: replay VLANs present on the bridge device itself Vladimir Oltean
2022-02-09 21:30 ` [RFC PATCH net-next 5/5] net: dsa: add explicit support for host bridge VLANs Vladimir Oltean
2022-02-10 15:30   ` Vladimir Oltean
2022-02-13  1:09   ` Tobias Waldekranz
2022-02-13 11:34     ` Vladimir Oltean
2022-02-13 18:54 ` [RFC PATCH net-next 0/5] Replay and offload host VLAN entries in DSA Nikolay Aleksandrov
2022-02-13 20:02   ` Vladimir Oltean
2022-02-13 20:13     ` Vladimir Oltean
2022-02-14  9:05     ` Nikolay Aleksandrov
2022-02-14  9:59       ` Ido Schimmel
2022-02-14 10:07         ` Vladimir Oltean
2022-02-14 10:27           ` Nikolay Aleksandrov
2022-02-14 10:42             ` Vladimir Oltean
2022-02-14 11:11               ` Nikolay Aleksandrov
2022-02-14 12:04                 ` Vladimir Oltean
2022-02-14 15:01                   ` Nikolay Aleksandrov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.