linux-omap.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA
@ 2021-03-22 23:51 Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state Vladimir Oltean
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Changes in v4:
- Added missing EXPORT_SYMBOL_GPL
- Using READ_ONCE(fdb->dst)
- Split patches into (a) adding the bridge helpers (b) making DSA use them
- br_mdb_replay went back to the v1 approach where it allocated memory
  in atomic context
- Created a br_switchdev_mdb_populate which reduces some of the code
  duplication
- Fixed the error message in dsa_port_clear_brport_flags
- Replaced "dsa_port_vlan_filtering(dp, br, extack)" with
  "dsa_port_vlan_filtering(dp, br_vlan_enabled(br), extack)" (duh)
- Added review tags (sorry if I missed any)

The objective of this series is to make LAG uppers on top of switchdev
ports work regardless of which order we link interfaces to their masters
(first make the port join the LAG, then the LAG join the bridge, or the
other way around).

There was a design decision to be made in patches 2-4 on whether we
should adopt the "push" model (which attempts to solve the problem
centrally, in the bridge layer) where the driver just calls:

  switchdev_bridge_port_offloaded(brport_dev,
                                  &atomic_notifier_block,
                                  &blocking_notifier_block,
                                  extack);

and the bridge just replays the entire collection of switchdev port
attributes and objects that it has, in some predefined order and with
some predefined error handling logic;


or the "pull" model (which attempts to solve the problem by giving the
driver the rope to hang itself), where the driver, apart from calling:

  switchdev_bridge_port_offloaded(brport_dev, extack);

has the task of "dumpster diving" (as Tobias puts it) through the bridge
attributes and objects by itself, by calling:

  - br_vlan_replay
  - br_fdb_replay
  - br_mdb_replay
  - br_vlan_enabled
  - br_port_flag_is_set
  - br_port_get_stp_state
  - br_multicast_router
  - br_get_ageing_time

(not necessarily all of them, and not necessarily in this order, and
with driver-defined error handling).

Even though I'm not in love myself with the "pull" model, I chose it
because there is a fundamental trick with replaying switchdev events
like this:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0 <- this will replay the objects once for
                                 the bond0 bridge port, and the swp0
                                 switchdev port will process them
ip link set swp1 master bond0 <- this will replay the objects again for
                                 the bond0 bridge port, and the swp1
                                 switchdev port will see them, but swp0
                                 will see them for the second time now

Basically I believe that it is implementation defined whether the driver
wants to error out on switchdev objects seen twice on a port, and the
bridge should not enforce a certain model for that. For example, for FDB
entries added to a bonding interface, the underling switchdev driver
might have an abstraction for just that: an FDB entry pointing towards a
logical (as opposed to physical) port. So when the second port joins the
bridge, it doesn't realy need to replay FDB entries, since there is
already at least one hardware port which has been receiving those
events, and the FDB entries don't need to be added a second time to the
same logical port.
In the other corner, we have the drivers that handle switchdev port
attributes on a LAG as individual switchdev port attributes on physical
ports (example: VLAN filtering). In fact, the switchdev_handle_port_attr_set
helper facilitates this: it is a fan-out from a single orig_dev towards
multiple lowers that pass the check_cb().
But that's the point: switchdev_handle_port_attr_set is just a helper
which the driver _opts_ to use. The bridge can't enforce the "push"
model, because that would assume that all drivers handle port attributes
in the same way, which is probably false.

For this reason, I preferred to go with the "pull" mode for this patch
set. Just to see how bad it is for other switchdev drivers to copy-paste
this logic, I added the pull support to ocelot too, and I think it's
pretty manageable.

Vladimir Oltean (11):
  net: bridge: add helper for retrieving the current bridge port STP
    state
  net: bridge: add helper to retrieve the current ageing time
  net: bridge: add helper to replay port and host-joined mdb entries
  net: bridge: add helper to replay port and local fdb entries
  net: bridge: add helper to replay VLANs installed on port
  net: dsa: call dsa_port_bridge_join when joining a LAG that is already
    in a bridge
  net: dsa: pass extack to dsa_port_{bridge,lag}_join
  net: dsa: inherit the actual bridge port flags at join time
  net: dsa: sync up switchdev objects and port attributes when joining
    the bridge
  net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged
    LAG
  net: ocelot: replay switchdev events when joining bridge

 drivers/net/dsa/ocelot/felix.c         |   4 +-
 drivers/net/ethernet/mscc/Kconfig      |   3 +-
 drivers/net/ethernet/mscc/ocelot.c     |  18 +--
 drivers/net/ethernet/mscc/ocelot_net.c | 208 +++++++++++++++++++++----
 include/linux/if_bridge.h              |  40 +++++
 include/net/switchdev.h                |   1 +
 include/soc/mscc/ocelot.h              |   6 +-
 net/bridge/br_fdb.c                    |  50 ++++++
 net/bridge/br_mdb.c                    | 148 ++++++++++++++++--
 net/bridge/br_stp.c                    |  27 ++++
 net/bridge/br_vlan.c                   |  73 +++++++++
 net/dsa/dsa_priv.h                     |   9 +-
 net/dsa/port.c                         | 197 +++++++++++++++++------
 net/dsa/slave.c                        |  11 +-
 14 files changed, 675 insertions(+), 120 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-23 10:33   ` Nikolay Aleksandrov
  2021-03-22 23:51 ` [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time Vladimir Oltean
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

It may happen that we have the following topology with DSA or any other
switchdev driver with LAG offload:

ip link add br0 type bridge stp_state 1
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0
ip link set swp1 master bond0

STP decides that it should put bond0 into the BLOCKING state, and
that's that. The ports that are actively listening for the switchdev
port attributes emitted for the bond0 bridge port (because they are
offloading it) and have the honor of seeing that switchdev port
attribute can react to it, so we can program swp0 and swp1 into the
BLOCKING state.

But if then we do:

ip link set swp2 master bond0

then as far as the bridge is concerned, nothing has changed: it still
has one bridge port. But this new bridge port will not see any STP state
change notification and will remain FORWARDING, which is how the
standalone code leaves it in.

We need a function in the bridge driver which retrieves the current STP
state, such that drivers can synchronize to it when they may have missed
switchdev events.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 include/linux/if_bridge.h |  6 ++++++
 net/bridge/br_stp.c       | 14 ++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b979005ea39c..920d3a02cc68 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -136,6 +136,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
 				    __u16 vid);
 void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
+u8 br_port_get_stp_state(const struct net_device *dev);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -154,6 +155,11 @@ br_port_flag_is_set(const struct net_device *dev, unsigned long flag)
 {
 	return false;
 }
+
+static inline u8 br_port_get_stp_state(const struct net_device *dev)
+{
+	return BR_STATE_DISABLED;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 21c6781906aa..86b5e05d3f21 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -64,6 +64,20 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
 	}
 }
 
+u8 br_port_get_stp_state(const struct net_device *dev)
+{
+	struct net_bridge_port *p;
+
+	ASSERT_RTNL();
+
+	p = br_port_get_rtnl(dev);
+	if (!p)
+		return BR_STATE_DISABLED;
+
+	return p->state;
+}
+EXPORT_SYMBOL_GPL(br_port_get_stp_state);
+
 /* called under bridge lock */
 struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-23 10:36   ` Nikolay Aleksandrov
  2021-03-22 23:51 ` [PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries Vladimir Oltean
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:

sysfs/ioctl/netlink
-> br_set_ageing_time
   -> __set_ageing_time

therefore not at bridge port creation time, so:
(a) switchdev drivers have to hardcode the initial value for the address
    ageing time, because they didn't get any notification
(b) that hardcoded value can be out of sync, if the user changes the
    ageing time before enslaving the port to the bridge

We need a helper in the bridge, such that switchdev drivers can query
the current value of the bridge ageing time when they start offloading
it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 include/linux/if_bridge.h |  6 ++++++
 net/bridge/br_stp.c       | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 920d3a02cc68..ebd16495459c 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
 void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
 u8 br_port_get_stp_state(const struct net_device *dev);
+clock_t br_get_ageing_time(struct net_device *br_dev);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev)
 {
 	return BR_STATE_DISABLED;
 }
+
+static inline clock_t br_get_ageing_time(struct net_device *br_dev)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 86b5e05d3f21..3dafb6143cff 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
 	return 0;
 }
 
+clock_t br_get_ageing_time(struct net_device *br_dev)
+{
+	struct net_bridge *br;
+
+	if (!netif_is_bridge_master(br_dev))
+		return 0;
+
+	br = netdev_priv(br_dev);
+
+	return jiffies_to_clock_t(br->ageing_time);
+}
+EXPORT_SYMBOL_GPL(br_get_ageing_time);
+
 /* called under bridge lock */
 void __br_set_topology_change(struct net_bridge *br, unsigned char val)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-23 12:00   ` Nikolay Aleksandrov
  2021-03-22 23:51 ` [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries Vladimir Oltean
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

I have a system with DSA ports, and udhcpcd is configured to bring
interfaces up as soon as they are created.

I create a bridge as follows:

ip link add br0 type bridge

As soon as I create the bridge and udhcpcd brings it up, I also have
avahi which automatically starts sending IPv6 packets to advertise some
local services, and because of that, the br0 bridge joins the following
IPv6 groups due to the code path detailed below:

33:33:ff:6d:c1:9c vid 0
33:33:00:00:00:6a vid 0
33:33:00:00:00:fb vid 0

br_dev_xmit
-> br_multicast_rcv
   -> br_ip6_multicast_add_group
      -> __br_multicast_add_group
         -> br_multicast_host_join
            -> br_mdb_notify

This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
hooked up, and switchdev will attempt to offload the host joined groups
to an empty list of ports. Of course nobody offloads them.

Then when we add a port to br0:

ip link set swp0 master br0

the bridge doesn't replay the host-joined MDB entries from br_add_if,
and eventually the host joined addresses expire, and a switchdev
notification for deleting it is emitted, but surprise, the original
addition was already completely missed.

The strategy to address this problem is to replay the MDB entries (both
the port ones and the host joined ones) when the new port joins the
bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
be populated and only then attached to a bridge that you offload).
However there are 2 possibilities: the addresses can be 'pushed' by the
bridge into the port, or the port can 'pull' them from the bridge.

Considering that in the general case, the new port can be really late to
the party, and there may have been many other switchdev ports that
already received the initial notification, we would like to avoid
delivering duplicate events to them, since they might misbehave. And
currently, the bridge calls the entire switchdev notifier chain, whereas
for replaying it should just call the notifier block of the new guy.
But the bridge doesn't know what is the new guy's notifier block, it
just knows where the switchdev notifier chain is. So for simplification,
we make this a driver-initiated pull for now, and the notifier block is
passed as an argument.

To emulate the calling context for mdb objects (deferred and put on the
blocking notifier chain), we must iterate under RCU protection through
the bridge's mdb entries, queue them, and only call them once we're out
of the RCU read-side critical section.

There was some opportunity for reuse between br_mdb_switchdev_host_port,
br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
mdb object is created, so a helper was created.

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |   9 +++
 include/net/switchdev.h   |   1 +
 net/bridge/br_mdb.c       | 148 +++++++++++++++++++++++++++++++++-----
 3 files changed, 141 insertions(+), 17 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index ebd16495459c..f6472969bb44 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
 bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
 bool br_multicast_enabled(const struct net_device *dev);
 bool br_multicast_router(const struct net_device *dev);
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb, struct netlink_ext_ack *extack);
 #else
 static inline int br_multicast_list_adjacent(struct net_device *dev,
 					     struct list_head *br_ip_list)
@@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct net_device *dev)
 {
 	return false;
 }
+static inline int br_mdb_replay(struct net_device *br_dev,
+				struct net_device *dev,
+				struct notifier_block *nb,
+				struct netlink_ext_ack *extack)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index b7fc7d0f54e2..8c3218177136 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -68,6 +68,7 @@ enum switchdev_obj_id {
 };
 
 struct switchdev_obj {
+	struct list_head list;
 	struct net_device *orig_dev;
 	enum switchdev_obj_id id;
 	u32 flags;
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 8846c5bcd075..95fa4af0e8dd 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -506,6 +506,134 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
 	kfree(priv);
 }
 
+static void br_switchdev_mdb_populate(struct switchdev_obj_port_mdb *mdb,
+				      const struct net_bridge_mdb_entry *mp)
+{
+	if (mp->addr.proto == htons(ETH_P_IP))
+		ip_eth_mc_map(mp->addr.dst.ip4, mdb->addr);
+#if IS_ENABLED(CONFIG_IPV6)
+	else if (mp->addr.proto == htons(ETH_P_IPV6))
+		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb->addr);
+#endif
+	else
+		ether_addr_copy(mdb->addr, mp->addr.dst.mac_addr);
+
+	mdb->vid = mp->addr.vid;
+}
+
+static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
+			     struct switchdev_obj_port_mdb *mdb,
+			     struct netlink_ext_ack *extack)
+{
+	struct switchdev_notifier_port_obj_info obj_info = {
+		.info = {
+			.dev = dev,
+			.extack = extack,
+		},
+		.obj = &mdb->obj,
+	};
+	int err;
+
+	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
+	return notifier_to_errno(err);
+}
+
+static int br_mdb_queue_one(struct list_head *mdb_list,
+			    enum switchdev_obj_id id,
+			    const struct net_bridge_mdb_entry *mp,
+			    struct net_device *orig_dev)
+{
+	struct switchdev_obj_port_mdb *mdb;
+
+	mdb = kzalloc(sizeof(*mdb), GFP_ATOMIC);
+	if (!mdb)
+		return -ENOMEM;
+
+	mdb->obj.id = id;
+	mdb->obj.orig_dev = orig_dev;
+	br_switchdev_mdb_populate(mdb, mp);
+	list_add_tail(&mdb->obj.list, mdb_list);
+
+	return 0;
+}
+
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb, struct netlink_ext_ack *extack)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct switchdev_obj *obj, *tmp;
+	struct net_bridge *br;
+	LIST_HEAD(mdb_list);
+	int err = 0;
+
+	ASSERT_RTNL();
+
+	if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	br = netdev_priv(br_dev);
+
+	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
+		return 0;
+
+	/* We cannot walk over br->mdb_list protected just by the rtnl_mutex,
+	 * because the write-side protection is br->multicast_lock. But we
+	 * need to emulate the [ blocking ] calling context of a regular
+	 * switchdev event, so since both br->multicast_lock and RCU read side
+	 * critical sections are atomic, we have no choice but to pick the RCU
+	 * read side lock, queue up all our events, leave the critical section
+	 * and notify switchdev from blocking context.
+	 */
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
+		struct net_bridge_port_group __rcu **pp;
+		struct net_bridge_port_group *p;
+
+		if (mp->host_joined) {
+			err = br_mdb_queue_one(&mdb_list,
+					       SWITCHDEV_OBJ_ID_HOST_MDB,
+					       mp, br_dev);
+			if (err) {
+				rcu_read_unlock();
+				goto out_free_mdb;
+			}
+		}
+
+		for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
+		     pp = &p->next) {
+			if (p->key.port->dev != dev)
+				continue;
+
+			err = br_mdb_queue_one(&mdb_list,
+					       SWITCHDEV_OBJ_ID_PORT_MDB,
+					       mp, dev);
+			if (err) {
+				rcu_read_unlock();
+				goto out_free_mdb;
+			}
+		}
+	}
+
+	rcu_read_unlock();
+
+	list_for_each_entry(obj, &mdb_list, list) {
+		err = br_mdb_replay_one(nb, dev, SWITCHDEV_OBJ_PORT_MDB(obj),
+					extack);
+		if (err)
+			goto out_free_mdb;
+	}
+
+out_free_mdb:
+	list_for_each_entry_safe(obj, tmp, &mdb_list, list) {
+		list_del(&obj->list);
+		kfree(SWITCHDEV_OBJ_PORT_MDB(obj));
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(br_mdb_replay);
+
 static void br_mdb_switchdev_host_port(struct net_device *dev,
 				       struct net_device *lower_dev,
 				       struct net_bridge_mdb_entry *mp,
@@ -515,18 +643,12 @@ static void br_mdb_switchdev_host_port(struct net_device *dev,
 		.obj = {
 			.id = SWITCHDEV_OBJ_ID_HOST_MDB,
 			.flags = SWITCHDEV_F_DEFER,
+			.orig_dev = dev,
 		},
-		.vid = mp->addr.vid,
 	};
 
-	if (mp->addr.proto == htons(ETH_P_IP))
-		ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
-#if IS_ENABLED(CONFIG_IPV6)
-	else
-		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
-#endif
+	br_switchdev_mdb_populate(&mdb, mp);
 
-	mdb.obj.orig_dev = dev;
 	switch (type) {
 	case RTM_NEWMDB:
 		switchdev_port_obj_add(lower_dev, &mdb.obj, NULL);
@@ -558,21 +680,13 @@ void br_mdb_notify(struct net_device *dev,
 			.id = SWITCHDEV_OBJ_ID_PORT_MDB,
 			.flags = SWITCHDEV_F_DEFER,
 		},
-		.vid = mp->addr.vid,
 	};
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
 	if (pg) {
-		if (mp->addr.proto == htons(ETH_P_IP))
-			ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
-#if IS_ENABLED(CONFIG_IPV6)
-		else if (mp->addr.proto == htons(ETH_P_IPV6))
-			ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
-#endif
-		else
-			ether_addr_copy(mdb.addr, mp->addr.dst.mac_addr);
+		br_switchdev_mdb_populate(&mdb, mp);
 
 		mdb.obj.orig_dev = pg->key.port->dev;
 		switch (type) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (2 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-23 11:12   ` Nikolay Aleksandrov
  2021-03-22 23:51 ` [PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port Vladimir Oltean
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

When a switchdev port starts offloading a LAG that is already in a
bridge and has an FDB entry pointing to it:

ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link set swp0 master bond0

the switchdev driver will have no idea that this FDB entry is there,
because it missed the switchdev event emitted at its creation.

Ido Schimmel pointed this out during a discussion about challenges with
switchdev offloading of stacked interfaces between the physical port and
the bridge, and recommended to just catch that condition and deny the
CHANGEUPPER event:
https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/

But in fact, we might need to deal with the hard thing anyway, which is
to replay all FDB addresses relevant to this port, because it isn't just
static FDB entries, but also local addresses (ones that are not
forwarded but terminated by the bridge). There, we can't just say 'oh
yeah, there was an upper already so I'm not joining that'.

So, similar to the logic for replaying MDB entries, add a function that
must be called by individual switchdev drivers and replays local FDB
entries as well as ones pointing towards a bridge port. This time, we
use the atomic switchdev notifier block, since that's what FDB entries
expect for some reason.

Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |  9 +++++++
 net/bridge/br_fdb.c       | 50 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index f6472969bb44..b564c4486a45 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -147,6 +147,8 @@ void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
 u8 br_port_get_stp_state(const struct net_device *dev);
 clock_t br_get_ageing_time(struct net_device *br_dev);
+int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -175,6 +177,13 @@ static inline clock_t br_get_ageing_time(struct net_device *br_dev)
 {
 	return 0;
 }
+
+static inline int br_fdb_replay(struct net_device *br_dev,
+				struct net_device *dev,
+				struct notifier_block *nb)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index b7490237f3fc..698b79747d32 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -726,6 +726,56 @@ static inline size_t fdb_nlmsg_size(void)
 		+ nla_total_size(sizeof(u8)); /* NFEA_ACTIVITY_NOTIFY */
 }
 
+static int br_fdb_replay_one(struct notifier_block *nb,
+			     struct net_bridge_fdb_entry *fdb,
+			     struct net_device *dev)
+{
+	struct switchdev_notifier_fdb_info item;
+	int err;
+
+	item.addr = fdb->key.addr.addr;
+	item.vid = fdb->key.vlan_id;
+	item.added_by_user = test_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
+	item.offloaded = test_bit(BR_FDB_OFFLOADED, &fdb->flags);
+	item.info.dev = dev;
+
+	err = nb->notifier_call(nb, SWITCHDEV_FDB_ADD_TO_DEVICE, &item);
+	return notifier_to_errno(err);
+}
+
+int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb)
+{
+	struct net_bridge_fdb_entry *fdb;
+	struct net_bridge *br;
+	int err = 0;
+
+	if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	br = netdev_priv(br_dev);
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(fdb, &br->fdb_list, fdb_node) {
+		struct net_bridge_port *dst = READ_ONCE(fdb->dst);
+		struct net_device *dst_dev;
+
+		dst_dev = dst ? dst->dev : br->dev;
+		if (dst_dev != br_dev && dst_dev != dev)
+			continue;
+
+		err = br_fdb_replay_one(nb, fdb, dst_dev);
+		if (err)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(br_fdb_replay);
+
 static void fdb_notify(struct net_bridge *br,
 		       const struct net_bridge_fdb_entry *fdb, int type,
 		       bool swdev_notify)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (3 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-23 14:06   ` Nikolay Aleksandrov
  2021-03-22 23:51 ` [PATCH v4 net-next 06/11] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge Vladimir Oltean
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Currently this simple setup with DSA:

ip link add br0 type bridge vlan_filtering 1
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

will not work because the bridge has created the PVID in br_add_if ->
nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
but that was too early, since swp0 was not yet a lower of bond0, so it
had no reason to act upon that notification.

We need a helper in the bridge to replay the switchdev VLAN objects that
were notified since the bridge port creation, because some of them may
have been missed.

As opposed to the br_mdb_replay function, the vg->vlan_list write side
protection is offered by the rtnl_mutex which is sleepable, so we don't
need to queue up the objects in atomic context, we can replay them right
away.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h | 10 ++++++
 net/bridge/br_vlan.c      | 73 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b564c4486a45..2cc35038a8ca 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -111,6 +111,8 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid);
 int br_vlan_get_proto(const struct net_device *dev, u16 *p_proto);
 int br_vlan_get_info(const struct net_device *dev, u16 vid,
 		     struct bridge_vlan_info *p_vinfo);
+int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
+		   struct notifier_block *nb, struct netlink_ext_ack *extack);
 #else
 static inline bool br_vlan_enabled(const struct net_device *dev)
 {
@@ -137,6 +139,14 @@ static inline int br_vlan_get_info(const struct net_device *dev, u16 vid,
 {
 	return -EINVAL;
 }
+
+static inline int br_vlan_replay(struct net_device *br_dev,
+				 struct net_device *dev,
+				 struct notifier_block *nb,
+				 struct netlink_ext_ack *extack)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 8829f621b8ec..ca8daccff217 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1751,6 +1751,79 @@ void br_vlan_notify(const struct net_bridge *br,
 	kfree_skb(skb);
 }
 
+static int br_vlan_replay_one(struct notifier_block *nb,
+			      struct net_device *dev,
+			      struct switchdev_obj_port_vlan *vlan,
+			      struct netlink_ext_ack *extack)
+{
+	struct switchdev_notifier_port_obj_info obj_info = {
+		.info = {
+			.dev = dev,
+			.extack = extack,
+		},
+		.obj = &vlan->obj,
+	};
+	int err;
+
+	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
+	return notifier_to_errno(err);
+}
+
+int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
+		   struct notifier_block *nb, struct netlink_ext_ack *extack)
+{
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+	struct net_bridge_port *p;
+	struct net_bridge *br;
+	int err = 0;
+	u16 pvid;
+
+	ASSERT_RTNL();
+
+	if (!netif_is_bridge_master(br_dev))
+		return -EINVAL;
+
+	if (!netif_is_bridge_master(dev) && !netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	if (netif_is_bridge_master(dev)) {
+		br = netdev_priv(dev);
+		vg = br_vlan_group(br);
+		p = NULL;
+	} else {
+		p = br_port_get_rtnl(dev);
+		if (WARN_ON(!p))
+			return -EINVAL;
+		vg = nbp_vlan_group(p);
+		br = p->br;
+	}
+
+	if (!vg)
+		return 0;
+
+	pvid = br_get_pvid(vg);
+
+	list_for_each_entry(v, &vg->vlan_list, vlist) {
+		struct switchdev_obj_port_vlan vlan = {
+			.obj.orig_dev = dev,
+			.obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
+			.flags = br_vlan_flags(v, pvid),
+			.vid = v->vid,
+		};
+
+		if (!br_vlan_should_use(v))
+			continue;
+
+		br_vlan_replay_one(nb, dev, &vlan, extack);
+		if (err)
+			return err;
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(br_vlan_replay);
+
 /* check if v_curr can enter a range ending in range_end */
 bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
 			     const struct net_bridge_vlan *range_end)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 06/11] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (4 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 07/11] net: dsa: pass extack to dsa_port_{bridge,lag}_join Vladimir Oltean
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

DSA can properly detect and offload this sequence of operations:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set swp0 master bond0
ip link set bond0 master br0

But not this one:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

Actually the second one is more complicated, due to the elapsed time
between the enslavement of bond0 and the offloading of it via swp0, a
lot of things could have happened to the bond0 bridge port in terms of
switchdev objects (host MDBs, VLANs, altered STP state etc). So this is
a bit of a can of worms, and making sure that the DSA port's state is in
sync with this already existing bridge port is handled in the next
patches.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 net/dsa/port.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index c9c6d7ab3f47..d39262a9fe0e 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -249,17 +249,31 @@ int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag,
 		.lag = lag,
 		.info = uinfo,
 	};
+	struct net_device *bridge_dev;
 	int err;
 
 	dsa_lag_map(dp->ds->dst, lag);
 	dp->lag_dev = lag;
 
 	err = dsa_port_notify(dp, DSA_NOTIFIER_LAG_JOIN, &info);
-	if (err) {
-		dp->lag_dev = NULL;
-		dsa_lag_unmap(dp->ds->dst, lag);
-	}
+	if (err)
+		goto err_lag_join;
 
+	bridge_dev = netdev_master_upper_dev_get(lag);
+	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+		return 0;
+
+	err = dsa_port_bridge_join(dp, bridge_dev);
+	if (err)
+		goto err_bridge_join;
+
+	return 0;
+
+err_bridge_join:
+	dsa_port_notify(dp, DSA_NOTIFIER_LAG_LEAVE, &info);
+err_lag_join:
+	dp->lag_dev = NULL;
+	dsa_lag_unmap(dp->ds->dst, lag);
 	return err;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 07/11] net: dsa: pass extack to dsa_port_{bridge,lag}_join
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (5 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 06/11] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 08/11] net: dsa: inherit the actual bridge port flags at join time Vladimir Oltean
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This is a pretty noisy change that was broken out of the larger change
for replaying switchdev attributes and objects at bridge join time,
which is when these extack objects are actually used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 net/dsa/dsa_priv.h | 6 ++++--
 net/dsa/port.c     | 8 +++++---
 net/dsa/slave.c    | 7 +++++--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 4c43c5406834..b8778c5d8529 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -181,12 +181,14 @@ int dsa_port_enable_rt(struct dsa_port *dp, struct phy_device *phy);
 int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
 void dsa_port_disable_rt(struct dsa_port *dp);
 void dsa_port_disable(struct dsa_port *dp);
-int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
+int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
+			 struct netlink_ext_ack *extack);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_lag_change(struct dsa_port *dp,
 			struct netdev_lag_lower_state_info *linfo);
 int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag_dev,
-		      struct netdev_lag_upper_info *uinfo);
+		      struct netdev_lag_upper_info *uinfo,
+		      struct netlink_ext_ack *extack);
 void dsa_port_lag_leave(struct dsa_port *dp, struct net_device *lag_dev);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
 			    struct netlink_ext_ack *extack);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index d39262a9fe0e..fcbe5b1545b8 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -144,7 +144,8 @@ static void dsa_port_change_brport_flags(struct dsa_port *dp,
 	}
 }
 
-int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
+int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
+			 struct netlink_ext_ack *extack)
 {
 	struct dsa_notifier_bridge_info info = {
 		.tree_index = dp->ds->dst->index,
@@ -241,7 +242,8 @@ int dsa_port_lag_change(struct dsa_port *dp,
 }
 
 int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag,
-		      struct netdev_lag_upper_info *uinfo)
+		      struct netdev_lag_upper_info *uinfo,
+		      struct netlink_ext_ack *extack)
 {
 	struct dsa_notifier_lag_info info = {
 		.sw_index = dp->ds->index,
@@ -263,7 +265,7 @@ int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag,
 	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
 		return 0;
 
-	err = dsa_port_bridge_join(dp, bridge_dev);
+	err = dsa_port_bridge_join(dp, bridge_dev, extack);
 	if (err)
 		goto err_bridge_join;
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 992fcab4b552..1ff48be476bb 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1976,11 +1976,14 @@ static int dsa_slave_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct netlink_ext_ack *extack;
 	int err = NOTIFY_DONE;
 
+	extack = netdev_notifier_info_to_extack(&info->info);
+
 	if (netif_is_bridge_master(info->upper_dev)) {
 		if (info->linking) {
-			err = dsa_port_bridge_join(dp, info->upper_dev);
+			err = dsa_port_bridge_join(dp, info->upper_dev, extack);
 			if (!err)
 				dsa_bridge_mtu_normalization(dp);
 			err = notifier_from_errno(err);
@@ -1991,7 +1994,7 @@ static int dsa_slave_changeupper(struct net_device *dev,
 	} else if (netif_is_lag_master(info->upper_dev)) {
 		if (info->linking) {
 			err = dsa_port_lag_join(dp, info->upper_dev,
-						info->upper_info);
+						info->upper_info, extack);
 			if (err == -EOPNOTSUPP) {
 				NL_SET_ERR_MSG_MOD(info->info.extack,
 						   "Offloading not supported");
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 08/11] net: dsa: inherit the actual bridge port flags at join time
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (6 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 07/11] net: dsa: pass extack to dsa_port_{bridge,lag}_join Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 09/11] net: dsa: sync up switchdev objects and port attributes when joining the bridge Vladimir Oltean
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

DSA currently assumes that the bridge port starts off with this
constellation of bridge port flags:

- learning on
- unicast flooding on
- multicast flooding on
- broadcast flooding on

just by virtue of code copy-pasta from the bridge layer (new_nbp).
This was a simple enough strategy thus far, because the 'bridge join'
moment always coincided with the 'bridge port creation' moment.

But with sandwiched interfaces, such as:

 br0
  |
bond0
  |
 swp0

it may happen that the user has had time to change the bridge port flags
of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
assume that the bridge port flags are those determined by new_nbp, when
in fact this can happen:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set bond0 type bridge_slave learning off
ip link set swp0 master br0

Now swp0 has learning enabled, bond0 has learning disabled. Not nice.

Fix this by "dumpster diving" through the actual bridge port flags with
br_port_flag_is_set, at bridge join time.

We use this opportunity to split dsa_port_change_brport_flags into two
distinct functions called dsa_port_inherit_brport_flags and
dsa_port_clear_brport_flags, now that the implementation for the two
cases is no longer similar. This patch also creates two functions called
dsa_port_switchdev_sync and dsa_port_switchdev_unsync which collect what
we have so far, even if that's asymmetrical. More is going to be added
in the next patch.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/port.c | 123 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 82 insertions(+), 41 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index fcbe5b1545b8..c712bf3da0a0 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -122,28 +122,84 @@ void dsa_port_disable(struct dsa_port *dp)
 	rtnl_unlock();
 }
 
-static void dsa_port_change_brport_flags(struct dsa_port *dp,
-					 bool bridge_offload)
+static int dsa_port_inherit_brport_flags(struct dsa_port *dp,
+					 struct netlink_ext_ack *extack)
 {
-	struct switchdev_brport_flags flags;
-	int flag;
+	const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+				   BR_BCAST_FLOOD;
+	struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+	int flag, err;
 
-	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-	if (bridge_offload)
-		flags.val = flags.mask;
-	else
-		flags.val = flags.mask & ~BR_LEARNING;
+	for_each_set_bit(flag, &mask, 32) {
+		struct switchdev_brport_flags flags = {0};
+
+		flags.mask = BIT(flag);
 
-	for_each_set_bit(flag, &flags.mask, 32) {
-		struct switchdev_brport_flags tmp;
+		if (br_port_flag_is_set(brport_dev, BIT(flag)))
+			flags.val = BIT(flag);
+
+		err = dsa_port_bridge_flags(dp, flags, extack);
+		if (err && err != -EOPNOTSUPP)
+			return err;
+	}
 
-		tmp.val = flags.val & BIT(flag);
-		tmp.mask = BIT(flag);
+	return 0;
+}
 
-		dsa_port_bridge_flags(dp, tmp, NULL);
+static void dsa_port_clear_brport_flags(struct dsa_port *dp)
+{
+	const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+	const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+				   BR_BCAST_FLOOD;
+	int flag, err;
+
+	for_each_set_bit(flag, &mask, 32) {
+		struct switchdev_brport_flags flags = {0};
+
+		flags.mask = BIT(flag);
+		flags.val = val & BIT(flag);
+
+		err = dsa_port_bridge_flags(dp, flags, NULL);
+		if (err && err != -EOPNOTSUPP)
+			dev_err(dp->ds->dev,
+				"failed to clear bridge port flag %lu: %pe\n",
+				flags.val, ERR_PTR(err));
 	}
 }
 
+static int dsa_port_switchdev_sync(struct dsa_port *dp,
+				   struct netlink_ext_ack *extack)
+{
+	int err;
+
+	err = dsa_port_inherit_brport_flags(dp, extack);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void dsa_port_switchdev_unsync(struct dsa_port *dp)
+{
+	/* Configure the port for standalone mode (no address learning,
+	 * flood everything).
+	 * The bridge only emits SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events
+	 * when the user requests it through netlink or sysfs, but not
+	 * automatically at port join or leave, so we need to handle resetting
+	 * the brport flags ourselves. But we even prefer it that way, because
+	 * otherwise, some setups might never get the notification they need,
+	 * for example, when a port leaves a LAG that offloads the bridge,
+	 * it becomes standalone, but as far as the bridge is concerned, no
+	 * port ever left.
+	 */
+	dsa_port_clear_brport_flags(dp);
+
+	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
+	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
+	 */
+	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
+}
+
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
 			 struct netlink_ext_ack *extack)
 {
@@ -155,24 +211,25 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
 	};
 	int err;
 
-	/* Notify the port driver to set its configurable flags in a way that
-	 * matches the initial settings of a bridge port.
-	 */
-	dsa_port_change_brport_flags(dp, true);
-
 	/* Here the interface is already bridged. Reflect the current
 	 * configuration so that drivers can program their chips accordingly.
 	 */
 	dp->bridge_dev = br;
 
 	err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info);
+	if (err)
+		goto out_rollback;
 
-	/* The bridging is rolled back on error */
-	if (err) {
-		dsa_port_change_brport_flags(dp, false);
-		dp->bridge_dev = NULL;
-	}
+	err = dsa_port_switchdev_sync(dp, extack);
+	if (err)
+		goto out_rollback_unbridge;
 
+	return 0;
+
+out_rollback_unbridge:
+	dsa_broadcast(DSA_NOTIFIER_BRIDGE_LEAVE, &info);
+out_rollback:
+	dp->bridge_dev = NULL;
 	return err;
 }
 
@@ -195,23 +252,7 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	if (err)
 		pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
 
-	/* Configure the port for standalone mode (no address learning,
-	 * flood everything).
-	 * The bridge only emits SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events
-	 * when the user requests it through netlink or sysfs, but not
-	 * automatically at port join or leave, so we need to handle resetting
-	 * the brport flags ourselves. But we even prefer it that way, because
-	 * otherwise, some setups might never get the notification they need,
-	 * for example, when a port leaves a LAG that offloads the bridge,
-	 * it becomes standalone, but as far as the bridge is concerned, no
-	 * port ever left.
-	 */
-	dsa_port_change_brport_flags(dp, false);
-
-	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
-	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
-	 */
-	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
+	dsa_port_switchdev_unsync(dp);
 }
 
 int dsa_port_lag_change(struct dsa_port *dp,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 09/11] net: dsa: sync up switchdev objects and port attributes when joining the bridge
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (7 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 08/11] net: dsa: inherit the actual bridge port flags at join time Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 10/11] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG Vladimir Oltean
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

If we join an already-created bridge port, such as a bond master
interface, then we can miss the initial switchdev notifications emitted
by the bridge for this port, while it wasn't offloaded by anybody.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa_priv.h |  3 +++
 net/dsa/port.c     | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 net/dsa/slave.c    |  4 ++--
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index b8778c5d8529..92282de54230 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -262,6 +262,9 @@ static inline bool dsa_tree_offloads_bridge_port(struct dsa_switch_tree *dst,
 
 /* slave.c */
 extern const struct dsa_device_ops notag_netdev_ops;
+extern struct notifier_block dsa_slave_switchdev_notifier;
+extern struct notifier_block dsa_slave_switchdev_blocking_notifier;
+
 void dsa_slave_mii_bus_init(struct dsa_switch *ds);
 int dsa_slave_create(struct dsa_port *dp);
 void dsa_slave_destroy(struct net_device *slave_dev);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index c712bf3da0a0..01e30264b25b 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -170,12 +170,46 @@ static void dsa_port_clear_brport_flags(struct dsa_port *dp)
 static int dsa_port_switchdev_sync(struct dsa_port *dp,
 				   struct netlink_ext_ack *extack)
 {
+	struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+	struct net_device *br = dp->bridge_dev;
 	int err;
 
 	err = dsa_port_inherit_brport_flags(dp, extack);
 	if (err)
 		return err;
 
+	err = dsa_port_set_state(dp, br_port_get_stp_state(brport_dev));
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = dsa_port_vlan_filtering(dp, br_vlan_enabled(br), extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = dsa_port_mrouter(dp->cpu_dp, br_multicast_router(br), extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = dsa_port_ageing_time(dp, br_get_ageing_time(br));
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = br_mdb_replay(br, brport_dev,
+			    &dsa_slave_switchdev_blocking_notifier,
+			    extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = br_fdb_replay(br, brport_dev, &dsa_slave_switchdev_notifier);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = br_vlan_replay(br, brport_dev,
+			     &dsa_slave_switchdev_blocking_notifier,
+			     extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
 	return 0;
 }
 
@@ -198,6 +232,18 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp)
 	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
 	 */
 	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
+
+	/* VLAN filtering is handled by dsa_switch_bridge_leave */
+
+	/* Some drivers treat the notification for having a local multicast
+	 * router by allowing multicast to be flooded to the CPU, so we should
+	 * allow this in standalone mode too.
+	 */
+	dsa_port_mrouter(dp->cpu_dp, true, NULL);
+
+	/* Ageing time may be global to the switch chip, so don't change it
+	 * here because we have no good reason (or value) to change it to.
+	 */
 }
 
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1ff48be476bb..c51e52418a62 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2392,11 +2392,11 @@ static struct notifier_block dsa_slave_nb __read_mostly = {
 	.notifier_call  = dsa_slave_netdevice_event,
 };
 
-static struct notifier_block dsa_slave_switchdev_notifier = {
+struct notifier_block dsa_slave_switchdev_notifier = {
 	.notifier_call = dsa_slave_switchdev_event,
 };
 
-static struct notifier_block dsa_slave_switchdev_blocking_notifier = {
+struct notifier_block dsa_slave_switchdev_blocking_notifier = {
 	.notifier_call = dsa_slave_switchdev_blocking_event,
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 10/11] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (8 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 09/11] net: dsa: sync up switchdev objects and port attributes when joining the bridge Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-22 23:51 ` [PATCH v4 net-next 11/11] net: ocelot: replay switchdev events when joining bridge Vladimir Oltean
  2021-03-23 22:00 ` [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA patchwork-bot+netdevbpf
  11 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Similar to the DSA situation, ocelot supports LAG offload but treats
this scenario improperly:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

We do the same thing as we do there, which is to simulate a 'bridge join'
on 'lag join', if we detect that the bonding upper has a bridge upper.

Again, same as DSA, ocelot supports software fallback for LAG, and in
that case, we should avoid calling ocelot_netdevice_changeupper.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/mscc/ocelot_net.c | 111 +++++++++++++++++++------
 1 file changed, 86 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index c08164cd88f4..d1376f7b34fd 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1117,10 +1117,15 @@ static int ocelot_port_obj_del(struct net_device *dev,
 	return ret;
 }
 
-static int ocelot_netdevice_bridge_join(struct ocelot *ocelot, int port,
-					struct net_device *bridge)
+static int ocelot_netdevice_bridge_join(struct net_device *dev,
+					struct net_device *bridge,
+					struct netlink_ext_ack *extack)
 {
+	struct ocelot_port_private *priv = netdev_priv(dev);
+	struct ocelot_port *ocelot_port = &priv->port;
+	struct ocelot *ocelot = ocelot_port->ocelot;
 	struct switchdev_brport_flags flags;
+	int port = priv->chip_port;
 	int err;
 
 	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
@@ -1135,10 +1140,14 @@ static int ocelot_netdevice_bridge_join(struct ocelot *ocelot, int port,
 	return 0;
 }
 
-static int ocelot_netdevice_bridge_leave(struct ocelot *ocelot, int port,
+static int ocelot_netdevice_bridge_leave(struct net_device *dev,
 					 struct net_device *bridge)
 {
+	struct ocelot_port_private *priv = netdev_priv(dev);
+	struct ocelot_port *ocelot_port = &priv->port;
+	struct ocelot *ocelot = ocelot_port->ocelot;
 	struct switchdev_brport_flags flags;
+	int port = priv->chip_port;
 	int err;
 
 	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
@@ -1151,43 +1160,89 @@ static int ocelot_netdevice_bridge_leave(struct ocelot *ocelot, int port,
 	return err;
 }
 
-static int ocelot_netdevice_changeupper(struct net_device *dev,
-					struct netdev_notifier_changeupper_info *info)
+static int ocelot_netdevice_lag_join(struct net_device *dev,
+				     struct net_device *bond,
+				     struct netdev_lag_upper_info *info,
+				     struct netlink_ext_ack *extack)
 {
 	struct ocelot_port_private *priv = netdev_priv(dev);
 	struct ocelot_port *ocelot_port = &priv->port;
 	struct ocelot *ocelot = ocelot_port->ocelot;
+	struct net_device *bridge_dev;
 	int port = priv->chip_port;
+	int err;
+
+	err = ocelot_port_lag_join(ocelot, port, bond, info);
+	if (err == -EOPNOTSUPP) {
+		NL_SET_ERR_MSG_MOD(extack, "Offloading not supported");
+		return 0;
+	}
+
+	bridge_dev = netdev_master_upper_dev_get(bond);
+	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+		return 0;
+
+	err = ocelot_netdevice_bridge_join(dev, bridge_dev, extack);
+	if (err)
+		goto err_bridge_join;
+
+	return 0;
+
+err_bridge_join:
+	ocelot_port_lag_leave(ocelot, port, bond);
+	return err;
+}
+
+static int ocelot_netdevice_lag_leave(struct net_device *dev,
+				      struct net_device *bond)
+{
+	struct ocelot_port_private *priv = netdev_priv(dev);
+	struct ocelot_port *ocelot_port = &priv->port;
+	struct ocelot *ocelot = ocelot_port->ocelot;
+	struct net_device *bridge_dev;
+	int port = priv->chip_port;
+
+	ocelot_port_lag_leave(ocelot, port, bond);
+
+	bridge_dev = netdev_master_upper_dev_get(bond);
+	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+		return 0;
+
+	return ocelot_netdevice_bridge_leave(dev, bridge_dev);
+}
+
+static int ocelot_netdevice_changeupper(struct net_device *dev,
+					struct netdev_notifier_changeupper_info *info)
+{
+	struct netlink_ext_ack *extack;
 	int err = 0;
 
+	extack = netdev_notifier_info_to_extack(&info->info);
+
 	if (netif_is_bridge_master(info->upper_dev)) {
-		if (info->linking) {
-			err = ocelot_netdevice_bridge_join(ocelot, port,
-							   info->upper_dev);
-		} else {
-			err = ocelot_netdevice_bridge_leave(ocelot, port,
-							    info->upper_dev);
-		}
+		if (info->linking)
+			err = ocelot_netdevice_bridge_join(dev, info->upper_dev,
+							   extack);
+		else
+			err = ocelot_netdevice_bridge_leave(dev, info->upper_dev);
 	}
 	if (netif_is_lag_master(info->upper_dev)) {
-		if (info->linking) {
-			err = ocelot_port_lag_join(ocelot, port,
-						   info->upper_dev,
-						   info->upper_info);
-			if (err == -EOPNOTSUPP) {
-				NL_SET_ERR_MSG_MOD(info->info.extack,
-						   "Offloading not supported");
-				err = 0;
-			}
-		} else {
-			ocelot_port_lag_leave(ocelot, port,
-					      info->upper_dev);
-		}
+		if (info->linking)
+			err = ocelot_netdevice_lag_join(dev, info->upper_dev,
+							info->upper_info, extack);
+		else
+			ocelot_netdevice_lag_leave(dev, info->upper_dev);
 	}
 
 	return notifier_from_errno(err);
 }
 
+/* Treat CHANGEUPPER events on an offloaded LAG as individual CHANGEUPPER
+ * events for the lower physical ports of the LAG.
+ * If the LAG upper isn't offloaded, ignore its CHANGEUPPER events.
+ * In case the LAG joined a bridge, notify that we are offloading it and can do
+ * forwarding in hardware towards it.
+ */
 static int
 ocelot_netdevice_lag_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
@@ -1197,6 +1252,12 @@ ocelot_netdevice_lag_changeupper(struct net_device *dev,
 	int err = NOTIFY_DONE;
 
 	netdev_for_each_lower_dev(dev, lower, iter) {
+		struct ocelot_port_private *priv = netdev_priv(lower);
+		struct ocelot_port *ocelot_port = &priv->port;
+
+		if (ocelot_port->bond != dev)
+			return NOTIFY_OK;
+
 		err = ocelot_netdevice_changeupper(lower, info);
 		if (err)
 			return notifier_from_errno(err);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 net-next 11/11] net: ocelot: replay switchdev events when joining bridge
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (9 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 10/11] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG Vladimir Oltean
@ 2021-03-22 23:51 ` Vladimir Oltean
  2021-03-23 22:00 ` [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA patchwork-bot+netdevbpf
  11 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-22 23:51 UTC (permalink / raw)
  To: Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu,
	Nikolay Aleksandrov, Jiri Pirko, Ido Schimmel, Alexandre Belloni,
	UNGLinuxDriver, Ivan Vecera, linux-omap, Vladimir Oltean

From: Vladimir Oltean <vladimir.oltean@nxp.com>

The premise of this change is that the switchdev port attributes and
objects offloaded by ocelot might have been missed when we are joining
an already existing bridge port, such as a bonding interface.

The patch pulls these switchdev attributes and objects from the bridge,
on behalf of the 'bridge port' net device which might be either the
ocelot switch interface, or the bonding upper interface.

The ocelot_net.c belongs strictly to the switchdev ocelot driver, while
ocelot.c is part of a library shared with the DSA felix driver.
The ocelot_port_bridge_leave function (part of the common library) used
to call ocelot_port_vlan_filtering(false), something which is not
necessary for DSA, since the framework deals with that already there.
So we move this function to ocelot_switchdev_unsync, which is specific
to the switchdev driver.

The code movement described above makes ocelot_port_bridge_leave no
longer return an error code, so we change its type from int to void.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/ocelot/felix.c         |   4 +-
 drivers/net/ethernet/mscc/Kconfig      |   3 +-
 drivers/net/ethernet/mscc/ocelot.c     |  18 ++--
 drivers/net/ethernet/mscc/ocelot_net.c | 117 +++++++++++++++++++++----
 include/soc/mscc/ocelot.h              |   6 +-
 5 files changed, 113 insertions(+), 35 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 628afb47b579..6b5442be0230 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -719,7 +719,9 @@ static int felix_bridge_join(struct dsa_switch *ds, int port,
 {
 	struct ocelot *ocelot = ds->priv;
 
-	return ocelot_port_bridge_join(ocelot, port, br);
+	ocelot_port_bridge_join(ocelot, port, br);
+
+	return 0;
 }
 
 static void felix_bridge_leave(struct dsa_switch *ds, int port,
diff --git a/drivers/net/ethernet/mscc/Kconfig b/drivers/net/ethernet/mscc/Kconfig
index 05cb040c2677..2d3157e4d081 100644
--- a/drivers/net/ethernet/mscc/Kconfig
+++ b/drivers/net/ethernet/mscc/Kconfig
@@ -11,7 +11,7 @@ config NET_VENDOR_MICROSEMI
 
 if NET_VENDOR_MICROSEMI
 
-# Users should depend on NET_SWITCHDEV, HAS_IOMEM
+# Users should depend on NET_SWITCHDEV, HAS_IOMEM, BRIDGE
 config MSCC_OCELOT_SWITCH_LIB
 	select NET_DEVLINK
 	select REGMAP_MMIO
@@ -24,6 +24,7 @@ config MSCC_OCELOT_SWITCH_LIB
 
 config MSCC_OCELOT_SWITCH
 	tristate "Ocelot switch driver"
+	depends on BRIDGE || BRIDGE=n
 	depends on NET_SWITCHDEV
 	depends on HAS_IOMEM
 	depends on OF_NET
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index ce57929ba3d1..1a36b416fd9b 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1514,34 +1514,28 @@ int ocelot_port_mdb_del(struct ocelot *ocelot, int port,
 }
 EXPORT_SYMBOL(ocelot_port_mdb_del);
 
-int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
-			    struct net_device *bridge)
+void ocelot_port_bridge_join(struct ocelot *ocelot, int port,
+			     struct net_device *bridge)
 {
 	struct ocelot_port *ocelot_port = ocelot->ports[port];
 
 	ocelot_port->bridge = bridge;
 
-	return 0;
+	ocelot_apply_bridge_fwd_mask(ocelot);
 }
 EXPORT_SYMBOL(ocelot_port_bridge_join);
 
-int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
-			     struct net_device *bridge)
+void ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
+			      struct net_device *bridge)
 {
 	struct ocelot_port *ocelot_port = ocelot->ports[port];
 	struct ocelot_vlan pvid = {0}, native_vlan = {0};
-	int ret;
 
 	ocelot_port->bridge = NULL;
 
-	ret = ocelot_port_vlan_filtering(ocelot, port, false);
-	if (ret)
-		return ret;
-
 	ocelot_port_set_pvid(ocelot, port, pvid);
 	ocelot_port_set_native_vlan(ocelot, port, native_vlan);
-
-	return 0;
+	ocelot_apply_bridge_fwd_mask(ocelot);
 }
 EXPORT_SYMBOL(ocelot_port_bridge_leave);
 
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index d1376f7b34fd..36f32a4d9b0f 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1117,47 +1117,126 @@ static int ocelot_port_obj_del(struct net_device *dev,
 	return ret;
 }
 
+static void ocelot_inherit_brport_flags(struct ocelot *ocelot, int port,
+					struct net_device *brport_dev)
+{
+	struct switchdev_brport_flags flags = {0};
+	int flag;
+
+	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+
+	for_each_set_bit(flag, &flags.mask, 32)
+		if (br_port_flag_is_set(brport_dev, BIT(flag)))
+			flags.val |= BIT(flag);
+
+	ocelot_port_bridge_flags(ocelot, port, flags);
+}
+
+static void ocelot_clear_brport_flags(struct ocelot *ocelot, int port)
+{
+	struct switchdev_brport_flags flags;
+
+	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+	flags.val = flags.mask & ~BR_LEARNING;
+
+	ocelot_port_bridge_flags(ocelot, port, flags);
+}
+
+static int ocelot_switchdev_sync(struct ocelot *ocelot, int port,
+				 struct net_device *brport_dev,
+				 struct net_device *bridge_dev,
+				 struct netlink_ext_ack *extack)
+{
+	clock_t ageing_time;
+	u8 stp_state;
+	int err;
+
+	ocelot_inherit_brport_flags(ocelot, port, brport_dev);
+
+	stp_state = br_port_get_stp_state(brport_dev);
+	ocelot_bridge_stp_state_set(ocelot, port, stp_state);
+
+	err = ocelot_port_vlan_filtering(ocelot, port,
+					 br_vlan_enabled(bridge_dev));
+	if (err)
+		return err;
+
+	ageing_time = br_get_ageing_time(bridge_dev);
+	ocelot_port_attr_ageing_set(ocelot, port, ageing_time);
+
+	err = br_mdb_replay(bridge_dev, brport_dev,
+			    &ocelot_switchdev_blocking_nb, extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	err = br_fdb_replay(bridge_dev, brport_dev, &ocelot_switchdev_nb);
+	if (err)
+		return err;
+
+	err = br_vlan_replay(bridge_dev, brport_dev,
+			     &ocelot_switchdev_blocking_nb, extack);
+	if (err && err != -EOPNOTSUPP)
+		return err;
+
+	return 0;
+}
+
+static int ocelot_switchdev_unsync(struct ocelot *ocelot, int port)
+{
+	int err;
+
+	err = ocelot_port_vlan_filtering(ocelot, port, false);
+	if (err)
+		return err;
+
+	ocelot_clear_brport_flags(ocelot, port);
+
+	ocelot_bridge_stp_state_set(ocelot, port, BR_STATE_FORWARDING);
+
+	return 0;
+}
+
 static int ocelot_netdevice_bridge_join(struct net_device *dev,
+					struct net_device *brport_dev,
 					struct net_device *bridge,
 					struct netlink_ext_ack *extack)
 {
 	struct ocelot_port_private *priv = netdev_priv(dev);
 	struct ocelot_port *ocelot_port = &priv->port;
 	struct ocelot *ocelot = ocelot_port->ocelot;
-	struct switchdev_brport_flags flags;
 	int port = priv->chip_port;
 	int err;
 
-	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-	flags.val = flags.mask;
+	ocelot_port_bridge_join(ocelot, port, bridge);
 
-	err = ocelot_port_bridge_join(ocelot, port, bridge);
+	err = ocelot_switchdev_sync(ocelot, port, brport_dev, bridge, extack);
 	if (err)
-		return err;
-
-	ocelot_port_bridge_flags(ocelot, port, flags);
+		goto err_switchdev_sync;
 
 	return 0;
+
+err_switchdev_sync:
+	ocelot_port_bridge_leave(ocelot, port, bridge);
+	return err;
 }
 
 static int ocelot_netdevice_bridge_leave(struct net_device *dev,
+					 struct net_device *brport_dev,
 					 struct net_device *bridge)
 {
 	struct ocelot_port_private *priv = netdev_priv(dev);
 	struct ocelot_port *ocelot_port = &priv->port;
 	struct ocelot *ocelot = ocelot_port->ocelot;
-	struct switchdev_brport_flags flags;
 	int port = priv->chip_port;
 	int err;
 
-	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-	flags.val = flags.mask & ~BR_LEARNING;
+	err = ocelot_switchdev_unsync(ocelot, port);
+	if (err)
+		return err;
 
-	err = ocelot_port_bridge_leave(ocelot, port, bridge);
+	ocelot_port_bridge_leave(ocelot, port, bridge);
 
-	ocelot_port_bridge_flags(ocelot, port, flags);
-
-	return err;
+	return 0;
 }
 
 static int ocelot_netdevice_lag_join(struct net_device *dev,
@@ -1182,7 +1261,7 @@ static int ocelot_netdevice_lag_join(struct net_device *dev,
 	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
 		return 0;
 
-	err = ocelot_netdevice_bridge_join(dev, bridge_dev, extack);
+	err = ocelot_netdevice_bridge_join(dev, bond, bridge_dev, extack);
 	if (err)
 		goto err_bridge_join;
 
@@ -1208,7 +1287,7 @@ static int ocelot_netdevice_lag_leave(struct net_device *dev,
 	if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
 		return 0;
 
-	return ocelot_netdevice_bridge_leave(dev, bridge_dev);
+	return ocelot_netdevice_bridge_leave(dev, bond, bridge_dev);
 }
 
 static int ocelot_netdevice_changeupper(struct net_device *dev,
@@ -1221,10 +1300,12 @@ static int ocelot_netdevice_changeupper(struct net_device *dev,
 
 	if (netif_is_bridge_master(info->upper_dev)) {
 		if (info->linking)
-			err = ocelot_netdevice_bridge_join(dev, info->upper_dev,
+			err = ocelot_netdevice_bridge_join(dev, dev,
+							   info->upper_dev,
 							   extack);
 		else
-			err = ocelot_netdevice_bridge_leave(dev, info->upper_dev);
+			err = ocelot_netdevice_bridge_leave(dev, dev,
+							    info->upper_dev);
 	}
 	if (netif_is_lag_master(info->upper_dev)) {
 		if (info->linking)
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index ce7e5c1bd90d..68cdc7ceaf4d 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -803,10 +803,10 @@ int ocelot_port_pre_bridge_flags(struct ocelot *ocelot, int port,
 				 struct switchdev_brport_flags val);
 void ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
 			      struct switchdev_brport_flags val);
-int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
-			    struct net_device *bridge);
-int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
+void ocelot_port_bridge_join(struct ocelot *ocelot, int port,
 			     struct net_device *bridge);
+void ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
+			      struct net_device *bridge);
 int ocelot_fdb_dump(struct ocelot *ocelot, int port,
 		    dsa_fdb_dump_cb_t *cb, void *data);
 int ocelot_fdb_add(struct ocelot *ocelot, int port,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state
  2021-03-22 23:51 ` [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state Vladimir Oltean
@ 2021-03-23 10:33   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Aleksandrov @ 2021-03-23 10:33 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Ivan Vecera,
	linux-omap, Vladimir Oltean

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> It may happen that we have the following topology with DSA or any other
> switchdev driver with LAG offload:
> 
> ip link add br0 type bridge stp_state 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> ip link set swp1 master bond0
> 
> STP decides that it should put bond0 into the BLOCKING state, and
> that's that. The ports that are actively listening for the switchdev
> port attributes emitted for the bond0 bridge port (because they are
> offloading it) and have the honor of seeing that switchdev port
> attribute can react to it, so we can program swp0 and swp1 into the
> BLOCKING state.
> 
> But if then we do:
> 
> ip link set swp2 master bond0
> 
> then as far as the bridge is concerned, nothing has changed: it still
> has one bridge port. But this new bridge port will not see any STP state
> change notification and will remain FORWARDING, which is how the
> standalone code leaves it in.
> 
> We need a function in the bridge driver which retrieves the current STP
> state, such that drivers can synchronize to it when they may have missed
> switchdev events.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  include/linux/if_bridge.h |  6 ++++++
>  net/bridge/br_stp.c       | 14 ++++++++++++++
>  2 files changed, 20 insertions(+)
> 

Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time
  2021-03-22 23:51 ` [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time Vladimir Oltean
@ 2021-03-23 10:36   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Aleksandrov @ 2021-03-23 10:36 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Ivan Vecera,
	linux-omap, Vladimir Oltean

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
> 
> sysfs/ioctl/netlink
> -> br_set_ageing_time
>    -> __set_ageing_time
> 
> therefore not at bridge port creation time, so:
> (a) switchdev drivers have to hardcode the initial value for the address
>     ageing time, because they didn't get any notification
> (b) that hardcoded value can be out of sync, if the user changes the
>     ageing time before enslaving the port to the bridge
> 
> We need a helper in the bridge, such that switchdev drivers can query
> the current value of the bridge ageing time when they start offloading
> it.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  include/linux/if_bridge.h |  6 ++++++
>  net/bridge/br_stp.c       | 13 +++++++++++++
>  2 files changed, 19 insertions(+)
> 

The patch is mostly fine, there are a few minor nits (const qualifiers). If there
is another version of the patch-set please add them, either way:

Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>

> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 920d3a02cc68..ebd16495459c 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>  void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
> +clock_t br_get_ageing_time(struct net_device *br_dev);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev)
>  {
>  	return BR_STATE_DISABLED;
>  }
> +
> +static inline clock_t br_get_ageing_time(struct net_device *br_dev)

const

> +{
> +	return 0;
> +}
>  #endif
>  
>  #endif
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index 86b5e05d3f21..3dafb6143cff 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
>  	return 0;
>  }
>  
> +clock_t br_get_ageing_time(struct net_device *br_dev)

const

> +{
> +	struct net_bridge *br;

const

> +
> +	if (!netif_is_bridge_master(br_dev))
> +		return 0;
> +
> +	br = netdev_priv(br_dev);
> +
> +	return jiffies_to_clock_t(br->ageing_time);
> +}
> +EXPORT_SYMBOL_GPL(br_get_ageing_time);
> +
>  /* called under bridge lock */
>  void __br_set_topology_change(struct net_bridge *br, unsigned char val)
>  {
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries
  2021-03-22 23:51 ` [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries Vladimir Oltean
@ 2021-03-23 11:12   ` Nikolay Aleksandrov
  2021-03-23 18:11     ` Vladimir Oltean
  0 siblings, 1 reply; 19+ messages in thread
From: Nikolay Aleksandrov @ 2021-03-23 11:12 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Ivan Vecera,
	linux-omap, Vladimir Oltean

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> When a switchdev port starts offloading a LAG that is already in a
> bridge and has an FDB entry pointing to it:
> 
> ip link set bond0 master br0
> bridge fdb add dev bond0 00:01:02:03:04:05 master static
> ip link set swp0 master bond0
> 
> the switchdev driver will have no idea that this FDB entry is there,
> because it missed the switchdev event emitted at its creation.
> 
> Ido Schimmel pointed this out during a discussion about challenges with
> switchdev offloading of stacked interfaces between the physical port and
> the bridge, and recommended to just catch that condition and deny the
> CHANGEUPPER event:
> https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/
> 
> But in fact, we might need to deal with the hard thing anyway, which is
> to replay all FDB addresses relevant to this port, because it isn't just
> static FDB entries, but also local addresses (ones that are not
> forwarded but terminated by the bridge). There, we can't just say 'oh
> yeah, there was an upper already so I'm not joining that'.
> 
> So, similar to the logic for replaying MDB entries, add a function that
> must be called by individual switchdev drivers and replays local FDB
> entries as well as ones pointing towards a bridge port. This time, we
> use the atomic switchdev notifier block, since that's what FDB entries
> expect for some reason.
> 

I get the reason to have both bridge and bridge port devices (although the bridge
is really unnecessary as it can be inferred from the port), but it looks kind of
weird at first glance, I mean we get all of the port's fdbs and all of the bridge
fdbs every time (dst == NULL). The code itself is correct and the alternative
to take only 1 net_device and act based on its type would add another
step to the process per-port which also doesn't sound good...
There are a few minor const nits below too, again if there is another version
please take care of them, for the patch:

Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>

> Reported-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |  9 +++++++
>  net/bridge/br_fdb.c       | 50 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index f6472969bb44..b564c4486a45 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -147,6 +147,8 @@ void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
>  clock_t br_get_ageing_time(struct net_device *br_dev);
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +		  struct notifier_block *nb);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -175,6 +177,13 @@ static inline clock_t br_get_ageing_time(struct net_device *br_dev)
>  {
>  	return 0;
>  }
> +
> +static inline int br_fdb_replay(struct net_device *br_dev,
> +				struct net_device *dev,
> +				struct notifier_block *nb)
> +{
> +	return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #endif
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b7490237f3fc..698b79747d32 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -726,6 +726,56 @@ static inline size_t fdb_nlmsg_size(void)
>  		+ nla_total_size(sizeof(u8)); /* NFEA_ACTIVITY_NOTIFY */
>  }
>  
> +static int br_fdb_replay_one(struct notifier_block *nb,
> +			     struct net_bridge_fdb_entry *fdb,
> +			     struct net_device *dev)
> +{
> +	struct switchdev_notifier_fdb_info item;
> +	int err;
> +
> +	item.addr = fdb->key.addr.addr;
> +	item.vid = fdb->key.vlan_id;
> +	item.added_by_user = test_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> +	item.offloaded = test_bit(BR_FDB_OFFLOADED, &fdb->flags);
> +	item.info.dev = dev;
> +
> +	err = nb->notifier_call(nb, SWITCHDEV_FDB_ADD_TO_DEVICE, &item);
> +	return notifier_to_errno(err);
> +}
> +
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +		  struct notifier_block *nb)

The devices can be const

> +{
> +	struct net_bridge_fdb_entry *fdb;
> +	struct net_bridge *br;
> +	int err = 0;
> +
> +	if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
> +		return -EINVAL;
> +
> +	br = netdev_priv(br_dev);
> +
> +	rcu_read_lock();
> +
> +	hlist_for_each_entry_rcu(fdb, &br->fdb_list, fdb_node) {
> +		struct net_bridge_port *dst = READ_ONCE(fdb->dst);

const

> +		struct net_device *dst_dev;
> +
> +		dst_dev = dst ? dst->dev : br->dev;
> +		if (dst_dev != br_dev && dst_dev != dev)
> +			continue;
> +
> +		err = br_fdb_replay_one(nb, fdb, dst_dev);
> +		if (err)
> +			break;
> +	}
> +
> +	rcu_read_unlock();
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(br_fdb_replay);
> +
>  static void fdb_notify(struct net_bridge *br,
>  		       const struct net_bridge_fdb_entry *fdb, int type,
>  		       bool swdev_notify)
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries
  2021-03-22 23:51 ` [PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries Vladimir Oltean
@ 2021-03-23 12:00   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Aleksandrov @ 2021-03-23 12:00 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Ivan Vecera,
	linux-omap, Vladimir Oltean

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> I have a system with DSA ports, and udhcpcd is configured to bring
> interfaces up as soon as they are created.
> 
> I create a bridge as follows:
> 
> ip link add br0 type bridge
> 
> As soon as I create the bridge and udhcpcd brings it up, I also have
> avahi which automatically starts sending IPv6 packets to advertise some
> local services, and because of that, the br0 bridge joins the following
> IPv6 groups due to the code path detailed below:
> 
> 33:33:ff:6d:c1:9c vid 0
> 33:33:00:00:00:6a vid 0
> 33:33:00:00:00:fb vid 0
> 
> br_dev_xmit
> -> br_multicast_rcv
>    -> br_ip6_multicast_add_group
>       -> __br_multicast_add_group
>          -> br_multicast_host_join
>             -> br_mdb_notify
> 
> This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
> hooked up, and switchdev will attempt to offload the host joined groups
> to an empty list of ports. Of course nobody offloads them.
> 
> Then when we add a port to br0:
> 
> ip link set swp0 master br0
> 
> the bridge doesn't replay the host-joined MDB entries from br_add_if,
> and eventually the host joined addresses expire, and a switchdev
> notification for deleting it is emitted, but surprise, the original
> addition was already completely missed.
> 
> The strategy to address this problem is to replay the MDB entries (both
> the port ones and the host joined ones) when the new port joins the
> bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
> be populated and only then attached to a bridge that you offload).
> However there are 2 possibilities: the addresses can be 'pushed' by the
> bridge into the port, or the port can 'pull' them from the bridge.
> 
> Considering that in the general case, the new port can be really late to
> the party, and there may have been many other switchdev ports that
> already received the initial notification, we would like to avoid
> delivering duplicate events to them, since they might misbehave. And
> currently, the bridge calls the entire switchdev notifier chain, whereas
> for replaying it should just call the notifier block of the new guy.
> But the bridge doesn't know what is the new guy's notifier block, it
> just knows where the switchdev notifier chain is. So for simplification,
> we make this a driver-initiated pull for now, and the notifier block is
> passed as an argument.
> 
> To emulate the calling context for mdb objects (deferred and put on the
> blocking notifier chain), we must iterate under RCU protection through
> the bridge's mdb entries, queue them, and only call them once we're out
> of the RCU read-side critical section.
> 
> There was some opportunity for reuse between br_mdb_switchdev_host_port,
> br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
> mdb object is created, so a helper was created.
> 
> Suggested-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h |   9 +++
>  include/net/switchdev.h   |   1 +
>  net/bridge/br_mdb.c       | 148 +++++++++++++++++++++++++++++++++-----
>  3 files changed, 141 insertions(+), 17 deletions(-)
> 

Absolutely the same comments here as for the fdb version.
The code looks correct.

Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>

> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index ebd16495459c..f6472969bb44 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>  bool br_multicast_enabled(const struct net_device *dev);
>  bool br_multicast_router(const struct net_device *dev);
> +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
> +		  struct notifier_block *nb, struct netlink_ext_ack *extack);
>  #else
>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>  					     struct list_head *br_ip_list)
> @@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct net_device *dev)
>  {
>  	return false;
>  }
> +static inline int br_mdb_replay(struct net_device *br_dev,
> +				struct net_device *dev,
> +				struct notifier_block *nb,
> +				struct netlink_ext_ack *extack)
> +{
> +	return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index b7fc7d0f54e2..8c3218177136 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -68,6 +68,7 @@ enum switchdev_obj_id {
>  };
>  
>  struct switchdev_obj {
> +	struct list_head list;
>  	struct net_device *orig_dev;
>  	enum switchdev_obj_id id;
>  	u32 flags;
> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> index 8846c5bcd075..95fa4af0e8dd 100644
> --- a/net/bridge/br_mdb.c
> +++ b/net/bridge/br_mdb.c
> @@ -506,6 +506,134 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
>  	kfree(priv);
>  }
>  
> +static void br_switchdev_mdb_populate(struct switchdev_obj_port_mdb *mdb,
> +				      const struct net_bridge_mdb_entry *mp)
> +{
> +	if (mp->addr.proto == htons(ETH_P_IP))
> +		ip_eth_mc_map(mp->addr.dst.ip4, mdb->addr);
> +#if IS_ENABLED(CONFIG_IPV6)
> +	else if (mp->addr.proto == htons(ETH_P_IPV6))
> +		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb->addr);
> +#endif
> +	else
> +		ether_addr_copy(mdb->addr, mp->addr.dst.mac_addr);
> +
> +	mdb->vid = mp->addr.vid;
> +}
> +
> +static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
> +			     struct switchdev_obj_port_mdb *mdb,
> +			     struct netlink_ext_ack *extack)
> +{
> +	struct switchdev_notifier_port_obj_info obj_info = {
> +		.info = {
> +			.dev = dev,
> +			.extack = extack,
> +		},
> +		.obj = &mdb->obj,
> +	};
> +	int err;
> +
> +	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
> +	return notifier_to_errno(err);
> +}
> +
> +static int br_mdb_queue_one(struct list_head *mdb_list,
> +			    enum switchdev_obj_id id,
> +			    const struct net_bridge_mdb_entry *mp,
> +			    struct net_device *orig_dev)
> +{
> +	struct switchdev_obj_port_mdb *mdb;
> +
> +	mdb = kzalloc(sizeof(*mdb), GFP_ATOMIC);
> +	if (!mdb)
> +		return -ENOMEM;
> +
> +	mdb->obj.id = id;
> +	mdb->obj.orig_dev = orig_dev;
> +	br_switchdev_mdb_populate(mdb, mp);
> +	list_add_tail(&mdb->obj.list, mdb_list);
> +
> +	return 0;
> +}
> +
> +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
> +		  struct notifier_block *nb, struct netlink_ext_ack *extack)
> +{
> +	struct net_bridge_mdb_entry *mp;
> +	struct switchdev_obj *obj, *tmp;
> +	struct net_bridge *br;
> +	LIST_HEAD(mdb_list);
> +	int err = 0;
> +
> +	ASSERT_RTNL();
> +
> +	if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
> +		return -EINVAL;
> +
> +	br = netdev_priv(br_dev);
> +
> +	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
> +		return 0;
> +
> +	/* We cannot walk over br->mdb_list protected just by the rtnl_mutex,
> +	 * because the write-side protection is br->multicast_lock. But we
> +	 * need to emulate the [ blocking ] calling context of a regular
> +	 * switchdev event, so since both br->multicast_lock and RCU read side
> +	 * critical sections are atomic, we have no choice but to pick the RCU
> +	 * read side lock, queue up all our events, leave the critical section
> +	 * and notify switchdev from blocking context.
> +	 */
> +	rcu_read_lock();
> +
> +	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
> +		struct net_bridge_port_group __rcu **pp;
> +		struct net_bridge_port_group *p;
> +
> +		if (mp->host_joined) {
> +			err = br_mdb_queue_one(&mdb_list,
> +					       SWITCHDEV_OBJ_ID_HOST_MDB,
> +					       mp, br_dev);
> +			if (err) {
> +				rcu_read_unlock();
> +				goto out_free_mdb;
> +			}
> +		}
> +
> +		for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
> +		     pp = &p->next) {
> +			if (p->key.port->dev != dev)
> +				continue;
> +
> +			err = br_mdb_queue_one(&mdb_list,
> +					       SWITCHDEV_OBJ_ID_PORT_MDB,
> +					       mp, dev);
> +			if (err) {
> +				rcu_read_unlock();
> +				goto out_free_mdb;
> +			}
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +
> +	list_for_each_entry(obj, &mdb_list, list) {
> +		err = br_mdb_replay_one(nb, dev, SWITCHDEV_OBJ_PORT_MDB(obj),
> +					extack);
> +		if (err)
> +			goto out_free_mdb;
> +	}
> +
> +out_free_mdb:
> +	list_for_each_entry_safe(obj, tmp, &mdb_list, list) {
> +		list_del(&obj->list);
> +		kfree(SWITCHDEV_OBJ_PORT_MDB(obj));
> +	}
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(br_mdb_replay);
> +
>  static void br_mdb_switchdev_host_port(struct net_device *dev,
>  				       struct net_device *lower_dev,
>  				       struct net_bridge_mdb_entry *mp,
> @@ -515,18 +643,12 @@ static void br_mdb_switchdev_host_port(struct net_device *dev,
>  		.obj = {
>  			.id = SWITCHDEV_OBJ_ID_HOST_MDB,
>  			.flags = SWITCHDEV_F_DEFER,
> +			.orig_dev = dev,
>  		},
> -		.vid = mp->addr.vid,
>  	};
>  
> -	if (mp->addr.proto == htons(ETH_P_IP))
> -		ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
> -#if IS_ENABLED(CONFIG_IPV6)
> -	else
> -		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
> -#endif
> +	br_switchdev_mdb_populate(&mdb, mp);
>  
> -	mdb.obj.orig_dev = dev;
>  	switch (type) {
>  	case RTM_NEWMDB:
>  		switchdev_port_obj_add(lower_dev, &mdb.obj, NULL);
> @@ -558,21 +680,13 @@ void br_mdb_notify(struct net_device *dev,
>  			.id = SWITCHDEV_OBJ_ID_PORT_MDB,
>  			.flags = SWITCHDEV_F_DEFER,
>  		},
> -		.vid = mp->addr.vid,
>  	};
>  	struct net *net = dev_net(dev);
>  	struct sk_buff *skb;
>  	int err = -ENOBUFS;
>  
>  	if (pg) {
> -		if (mp->addr.proto == htons(ETH_P_IP))
> -			ip_eth_mc_map(mp->addr.dst.ip4, mdb.addr);
> -#if IS_ENABLED(CONFIG_IPV6)
> -		else if (mp->addr.proto == htons(ETH_P_IPV6))
> -			ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb.addr);
> -#endif
> -		else
> -			ether_addr_copy(mdb.addr, mp->addr.dst.mac_addr);
> +		br_switchdev_mdb_populate(&mdb, mp);
>  
>  		mdb.obj.orig_dev = pg->key.port->dev;
>  		switch (type) {
> 
n

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port
  2021-03-22 23:51 ` [PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port Vladimir Oltean
@ 2021-03-23 14:06   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Aleksandrov @ 2021-03-23 14:06 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski, David S. Miller
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Tobias Waldekranz,
	Claudiu Manoil, netdev, linux-kernel, Roopa Prabhu, Jiri Pirko,
	Ido Schimmel, Alexandre Belloni, UNGLinuxDriver, Ivan Vecera,
	linux-omap, Vladimir Oltean

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Currently this simple setup with DSA:
> 
> ip link add br0 type bridge vlan_filtering 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> 
> will not work because the bridge has created the PVID in br_add_if ->
> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
> but that was too early, since swp0 was not yet a lower of bond0, so it
> had no reason to act upon that notification.
> 
> We need a helper in the bridge to replay the switchdev VLAN objects that
> were notified since the bridge port creation, because some of them may
> have been missed.
> 
> As opposed to the br_mdb_replay function, the vg->vlan_list write side
> protection is offered by the rtnl_mutex which is sleepable, so we don't
> need to queue up the objects in atomic context, we can replay them right
> away.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/if_bridge.h | 10 ++++++
>  net/bridge/br_vlan.c      | 73 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)
> 

Same comments about the const qualifiers as the other patches.
The code looks good to me otherwise.

Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>

> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index b564c4486a45..2cc35038a8ca 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -111,6 +111,8 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid);
>  int br_vlan_get_proto(const struct net_device *dev, u16 *p_proto);
>  int br_vlan_get_info(const struct net_device *dev, u16 vid,
>  		     struct bridge_vlan_info *p_vinfo);
> +int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
> +		   struct notifier_block *nb, struct netlink_ext_ack *extack);
>  #else
>  static inline bool br_vlan_enabled(const struct net_device *dev)
>  {
> @@ -137,6 +139,14 @@ static inline int br_vlan_get_info(const struct net_device *dev, u16 vid,
>  {
>  	return -EINVAL;
>  }
> +
> +static inline int br_vlan_replay(struct net_device *br_dev,
> +				 struct net_device *dev,
> +				 struct notifier_block *nb,
> +				 struct netlink_ext_ack *extack)
> +{
> +	return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE)
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 8829f621b8ec..ca8daccff217 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1751,6 +1751,79 @@ void br_vlan_notify(const struct net_bridge *br,
>  	kfree_skb(skb);
>  }
>  
> +static int br_vlan_replay_one(struct notifier_block *nb,
> +			      struct net_device *dev,
> +			      struct switchdev_obj_port_vlan *vlan,
> +			      struct netlink_ext_ack *extack)
> +{
> +	struct switchdev_notifier_port_obj_info obj_info = {
> +		.info = {
> +			.dev = dev,
> +			.extack = extack,
> +		},
> +		.obj = &vlan->obj,
> +	};
> +	int err;
> +
> +	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
> +	return notifier_to_errno(err);
> +}
> +
> +int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
> +		   struct notifier_block *nb, struct netlink_ext_ack *extack)
> +{
> +	struct net_bridge_vlan_group *vg;
> +	struct net_bridge_vlan *v;
> +	struct net_bridge_port *p;
> +	struct net_bridge *br;
> +	int err = 0;
> +	u16 pvid;
> +
> +	ASSERT_RTNL();
> +
> +	if (!netif_is_bridge_master(br_dev))
> +		return -EINVAL;
> +
> +	if (!netif_is_bridge_master(dev) && !netif_is_bridge_port(dev))
> +		return -EINVAL;
> +
> +	if (netif_is_bridge_master(dev)) {
> +		br = netdev_priv(dev);
> +		vg = br_vlan_group(br);
> +		p = NULL;
> +	} else {
> +		p = br_port_get_rtnl(dev);
> +		if (WARN_ON(!p))
> +			return -EINVAL;
> +		vg = nbp_vlan_group(p);
> +		br = p->br;
> +	}
> +
> +	if (!vg)
> +		return 0;
> +
> +	pvid = br_get_pvid(vg);
> +
> +	list_for_each_entry(v, &vg->vlan_list, vlist) {
> +		struct switchdev_obj_port_vlan vlan = {
> +			.obj.orig_dev = dev,
> +			.obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
> +			.flags = br_vlan_flags(v, pvid),
> +			.vid = v->vid,
> +		};
> +
> +		if (!br_vlan_should_use(v))
> +			continue;
> +
> +		br_vlan_replay_one(nb, dev, &vlan, extack);
> +		if (err)
> +			return err;
> +	}
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(br_vlan_replay);
> +
>  /* check if v_curr can enter a range ending in range_end */
>  bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
>  			     const struct net_bridge_vlan *range_end)
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries
  2021-03-23 11:12   ` Nikolay Aleksandrov
@ 2021-03-23 18:11     ` Vladimir Oltean
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2021-03-23 18:11 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, David S. Miller, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Tobias Waldekranz, Claudiu Manoil, netdev,
	linux-kernel, Roopa Prabhu, Jiri Pirko, Ido Schimmel,
	Alexandre Belloni, UNGLinuxDriver, Ivan Vecera, linux-omap,
	Vladimir Oltean

On Tue, Mar 23, 2021 at 01:12:33PM +0200, Nikolay Aleksandrov wrote:
> On 23/03/2021 01:51, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > When a switchdev port starts offloading a LAG that is already in a
> > bridge and has an FDB entry pointing to it:
> > 
> > ip link set bond0 master br0
> > bridge fdb add dev bond0 00:01:02:03:04:05 master static
> > ip link set swp0 master bond0
> > 
> > the switchdev driver will have no idea that this FDB entry is there,
> > because it missed the switchdev event emitted at its creation.
> > 
> > Ido Schimmel pointed this out during a discussion about challenges with
> > switchdev offloading of stacked interfaces between the physical port and
> > the bridge, and recommended to just catch that condition and deny the
> > CHANGEUPPER event:
> > https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/
> > 
> > But in fact, we might need to deal with the hard thing anyway, which is
> > to replay all FDB addresses relevant to this port, because it isn't just
> > static FDB entries, but also local addresses (ones that are not
> > forwarded but terminated by the bridge). There, we can't just say 'oh
> > yeah, there was an upper already so I'm not joining that'.
> > 
> > So, similar to the logic for replaying MDB entries, add a function that
> > must be called by individual switchdev drivers and replays local FDB
> > entries as well as ones pointing towards a bridge port. This time, we
> > use the atomic switchdev notifier block, since that's what FDB entries
> > expect for some reason.
> > 
> 
> I get the reason to have both bridge and bridge port devices (although the bridge
> is really unnecessary as it can be inferred from the port), but it looks kind of
> weird at first glance, I mean we get all of the port's fdbs and all of the bridge
> fdbs every time (dst == NULL). The code itself is correct and the alternative
> to take only 1 net_device and act based on its type would add another
> step to the process per-port which also doesn't sound good...
> There are a few minor const nits below too, again if there is another version
> please take care of them, for the patch:
> 
> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>

Thanks for the review. For host MDB entries, those are already offloaded
to every bridge port (which yes, is still giving me headaches), so
replaying them for every port that calls br_mdb_replay is at least
consistent with that. For br_fdb_replay, honestly I am not yet sure
because mainline DSA does not yet handle local FDBs, I might end up
touching things up a little when I come back to the "RX filtering in
DSA" series (I need to address Ido's feedback by then too).  I would
just like to get something started. It's even possible that by the end
of the kernel development cycle, the end result might not even look
anything remotely similar to what we have here - this is just what I
deemed as "good enough as a small first step".

If nobody has objections or sees problems with the current series, I
think I'd prefer to send a follow-up with the const conversions, so I
can spam less people with another 11 emails.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA
  2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
                   ` (10 preceding siblings ...)
  2021-03-22 23:51 ` [PATCH v4 net-next 11/11] net: ocelot: replay switchdev events when joining bridge Vladimir Oltean
@ 2021-03-23 22:00 ` patchwork-bot+netdevbpf
  11 siblings, 0 replies; 19+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-03-23 22:00 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: kuba, davem, andrew, vivien.didelot, f.fainelli, tobias,
	claudiu.manoil, netdev, linux-kernel, roopa, nikolay, jiri,
	idosch, alexandre.belloni, UNGLinuxDriver, ivecera, linux-omap,
	vladimir.oltean

Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Tue, 23 Mar 2021 01:51:41 +0200 you wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Changes in v4:
> - Added missing EXPORT_SYMBOL_GPL
> - Using READ_ONCE(fdb->dst)
> - Split patches into (a) adding the bridge helpers (b) making DSA use them
> - br_mdb_replay went back to the v1 approach where it allocated memory
>   in atomic context
> - Created a br_switchdev_mdb_populate which reduces some of the code
>   duplication
> - Fixed the error message in dsa_port_clear_brport_flags
> - Replaced "dsa_port_vlan_filtering(dp, br, extack)" with
>   "dsa_port_vlan_filtering(dp, br_vlan_enabled(br), extack)" (duh)
> - Added review tags (sorry if I missed any)
> 
> [...]

Here is the summary with links:
  - [v4,net-next,01/11] net: bridge: add helper for retrieving the current bridge port STP state
    https://git.kernel.org/netdev/net-next/c/c0e715bbd50e
  - [v4,net-next,02/11] net: bridge: add helper to retrieve the current ageing time
    https://git.kernel.org/netdev/net-next/c/f1d42ea10056
  - [v4,net-next,03/11] net: bridge: add helper to replay port and host-joined mdb entries
    https://git.kernel.org/netdev/net-next/c/4f2673b3a2b6
  - [v4,net-next,04/11] net: bridge: add helper to replay port and local fdb entries
    https://git.kernel.org/netdev/net-next/c/04846f903b53
  - [v4,net-next,05/11] net: bridge: add helper to replay VLANs installed on port
    https://git.kernel.org/netdev/net-next/c/22f67cdfae6a
  - [v4,net-next,06/11] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge
    https://git.kernel.org/netdev/net-next/c/185c9a760a61
  - [v4,net-next,07/11] net: dsa: pass extack to dsa_port_{bridge,lag}_join
    https://git.kernel.org/netdev/net-next/c/2afc526ab342
  - [v4,net-next,08/11] net: dsa: inherit the actual bridge port flags at join time
    https://git.kernel.org/netdev/net-next/c/5961d6a12c13
  - [v4,net-next,09/11] net: dsa: sync up switchdev objects and port attributes when joining the bridge
    https://git.kernel.org/netdev/net-next/c/010e269f91be
  - [v4,net-next,10/11] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG
    https://git.kernel.org/netdev/net-next/c/81ef35e7619a
  - [v4,net-next,11/11] net: ocelot: replay switchdev events when joining bridge
    https://git.kernel.org/netdev/net-next/c/e4bd44e89dcf

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-03-23 22:01 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-22 23:51 [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA Vladimir Oltean
2021-03-22 23:51 ` [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state Vladimir Oltean
2021-03-23 10:33   ` Nikolay Aleksandrov
2021-03-22 23:51 ` [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time Vladimir Oltean
2021-03-23 10:36   ` Nikolay Aleksandrov
2021-03-22 23:51 ` [PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries Vladimir Oltean
2021-03-23 12:00   ` Nikolay Aleksandrov
2021-03-22 23:51 ` [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries Vladimir Oltean
2021-03-23 11:12   ` Nikolay Aleksandrov
2021-03-23 18:11     ` Vladimir Oltean
2021-03-22 23:51 ` [PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port Vladimir Oltean
2021-03-23 14:06   ` Nikolay Aleksandrov
2021-03-22 23:51 ` [PATCH v4 net-next 06/11] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge Vladimir Oltean
2021-03-22 23:51 ` [PATCH v4 net-next 07/11] net: dsa: pass extack to dsa_port_{bridge,lag}_join Vladimir Oltean
2021-03-22 23:51 ` [PATCH v4 net-next 08/11] net: dsa: inherit the actual bridge port flags at join time Vladimir Oltean
2021-03-22 23:51 ` [PATCH v4 net-next 09/11] net: dsa: sync up switchdev objects and port attributes when joining the bridge Vladimir Oltean
2021-03-22 23:51 ` [PATCH v4 net-next 10/11] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG Vladimir Oltean
2021-03-22 23:51 ` [PATCH v4 net-next 11/11] net: ocelot: replay switchdev events when joining bridge Vladimir Oltean
2021-03-23 22:00 ` [PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).