All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: netdev@vger.kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, Ido Schimmel <idosch@idosch.org>,
	DENG Qingfang <dqfext@gmail.com>,
	Tobias Waldekranz <tobias@waldekranz.com>,
	George McCollister <george.mccollister@gmail.com>,
	Vlad Yasevich <vyasevich@gmail.com>,
	Roopa Prabhu <roopa@nvidia.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>
Subject: [RFC PATCH v2 net-next 14/17] net: dsa: replay port and host-joined mdb entries when joining the bridge
Date: Wed, 24 Feb 2021 13:43:47 +0200	[thread overview]
Message-ID: <20210224114350.2791260-15-olteanv@gmail.com> (raw)
In-Reply-To: <20210224114350.2791260-1-olteanv@gmail.com>

From: Vladimir Oltean <vladimir.oltean@nxp.com>

I have udhcpcd in my system and this is configured to bring interfaces
up as soon as they are created.

I create a bridge as follows:

ip link add br0 type bridge

As soon as I create the bridge and udhcpcd brings it up, I have some
other crap (avahi) that starts sending some random IPv6 packets to
advertise some local services, and from there, the br0 bridge joins the
following IPv6 groups:

33:33:ff:6d:c1:9c vid 0
33:33:00:00:00:6a vid 0
33:33:00:00:00:fb vid 0

br_dev_xmit
-> br_multicast_rcv
   -> br_ip6_multicast_add_group
      -> __br_multicast_add_group
         -> br_multicast_host_join
            -> br_mdb_notify

This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
hooked up, and switchdev will attempt to offload the host joined groups
to an empty list of ports. Of course nobody offloads them.

Then when we add a port to br0:

ip link set swp0 master br0

the bridge doesn't replay the host-joined MDB entries from br_add_if,
and eventually the host joined addresses expire, and a switchdev
notification for deleting it is emitted, but surprise, the original
addition was already completely missed.

The strategy to address this problem is to replay the MDB entries (both
the port ones and the host joined ones) when the new port joins the
bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
be populated and only then attached to a bridge that you offload).
However there are 2 possibilities: the addresses can be 'pushed' by the
bridge into the port, or the port can 'pull' them from the bridge.

Considering that in the general case, the new port can be really late to
the party, and there may have been many other switchdev ports that
already received the initial notification, we would like to avoid
delivering duplicate events to them, since they might misbehave. And
currently, the bridge calls the entire switchdev notifier chain, whereas
for replaying it should just call the notifier block of the new guy.
But the bridge doesn't know what is the new guy's notifier block, it
just knows where the switchdev notifier chain is. So for simplification,
we make this a driver-initiated pull for now, and the notifier block is
passed as an argument.

To emulate the calling context for mdb objects (deferred and put on the
blocking notifier chain), we must iterate under RCU protection through
the bridge's mdb entries, queue them, and only call them once we're out
of the RCU read-side critical section.

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |   8 +++
 include/net/switchdev.h   |   1 +
 net/bridge/br_mdb.c       | 117 ++++++++++++++++++++++++++++++++++++++
 net/dsa/slave.c           |  17 +++++-
 4 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b979005ea39c..2f0e5713bf39 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
 bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
 bool br_multicast_enabled(const struct net_device *dev);
 bool br_multicast_router(const struct net_device *dev);
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb);
 #else
 static inline int br_multicast_list_adjacent(struct net_device *dev,
 					     struct list_head *br_ip_list)
@@ -93,6 +95,12 @@ static inline bool br_multicast_router(const struct net_device *dev)
 {
 	return false;
 }
+static inline int br_mdb_replay(struct net_device *br_dev,
+				struct net_device *dev,
+				struct notifier_block *nb)
+{
+	return -EINVAL;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index ca7223a79135..f1a5a9a3634d 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -68,6 +68,7 @@ enum switchdev_obj_id {
 };
 
 struct switchdev_obj {
+	struct list_head list;
 	struct net_device *orig_dev;
 	enum switchdev_obj_id id;
 	u32 flags;
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 8846c5bcd075..170353510c35 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -506,6 +506,123 @@ static void br_mdb_complete(struct net_device *dev, int err, void *priv)
 	kfree(priv);
 }
 
+static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
+			     struct switchdev_obj_port_mdb *mdb)
+{
+	struct switchdev_notifier_port_obj_info obj_info = {
+		.info = {
+			.dev = dev,
+		},
+		.obj = &mdb->obj,
+	};
+	int err;
+
+	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
+	return notifier_to_errno(err);
+}
+
+static int br_mdb_queue_one(struct list_head *mdb_list,
+			    enum switchdev_obj_id id,
+			    struct net_bridge_mdb_entry *mp,
+			    struct net_device *orig_dev)
+{
+	struct switchdev_obj_port_mdb *mdb;
+
+	mdb = kzalloc(sizeof(*mdb), GFP_ATOMIC);
+	if (!mdb)
+		return -ENOMEM;
+
+	mdb->obj.id = id;
+	mdb->obj.orig_dev = orig_dev;
+	mdb->vid = mp->addr.vid;
+
+	if (mp->addr.proto == htons(ETH_P_IP))
+		ip_eth_mc_map(mp->addr.dst.ip4, mdb->addr);
+#if IS_ENABLED(CONFIG_IPV6)
+	else if (mp->addr.proto == htons(ETH_P_IPV6))
+		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb->addr);
+#endif
+	else
+		ether_addr_copy(mdb->addr, mp->addr.dst.mac_addr);
+
+	list_add_tail(&mdb->obj.list, mdb_list);
+
+	return 0;
+}
+
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct switchdev_obj *obj, *tmp;
+	struct list_head mdb_list;
+	struct net_bridge *br;
+	int err = 0;
+
+	ASSERT_RTNL();
+
+	INIT_LIST_HEAD(&mdb_list);
+
+	if (!netif_is_bridge_master(br_dev))
+		return -EINVAL;
+
+	if (!netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	br = netdev_priv(br_dev);
+
+	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
+		return 0;
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
+		struct net_bridge_port_group __rcu **pp;
+		struct net_bridge_port_group *p;
+
+		if (mp->host_joined) {
+			err = br_mdb_queue_one(&mdb_list,
+					       SWITCHDEV_OBJ_ID_HOST_MDB,
+					       mp, br_dev);
+			if (err) {
+				rcu_read_unlock();
+				goto out_free_mdb;
+			}
+		}
+
+		for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
+		     pp = &p->next) {
+			if (p->key.port->dev != dev)
+				continue;
+
+			err = br_mdb_queue_one(&mdb_list,
+					       SWITCHDEV_OBJ_ID_PORT_MDB,
+					       mp, dev);
+			if (err) {
+				rcu_read_unlock();
+				goto out_free_mdb;
+			}
+		}
+	}
+
+	rcu_read_unlock();
+
+	list_for_each_entry(obj, &mdb_list, list) {
+		err = br_mdb_replay_one(nb, dev, SWITCHDEV_OBJ_PORT_MDB(obj));
+		if (err)
+			goto out_free_mdb;
+	}
+
+out_free_mdb:
+	list_for_each_entry_safe(obj, tmp, &mdb_list, list) {
+		list_del(&obj->list);
+		kfree(SWITCHDEV_OBJ_PORT_MDB(obj));
+	}
+
+	return err;
+}
+EXPORT_SYMBOL(br_mdb_replay);
+
 static void br_mdb_switchdev_host_port(struct net_device *dev,
 				       struct net_device *lower_dev,
 				       struct net_bridge_mdb_entry *mp,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index a32875d3dc5f..10b4a0f72dcb 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2290,6 +2290,9 @@ bool dsa_slave_dev_check(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(dsa_slave_dev_check);
 
+/* Circular reference */
+static struct notifier_block dsa_slave_switchdev_blocking_notifier;
+
 static int dsa_slave_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
 {
@@ -2297,10 +2300,15 @@ static int dsa_slave_changeupper(struct net_device *dev,
 	int err = NOTIFY_DONE;
 
 	if (netif_is_bridge_master(info->upper_dev)) {
+		struct net_device *bridge_dev = info->upper_dev;
+
 		if (info->linking) {
-			err = dsa_port_bridge_join(dp, info->upper_dev);
-			if (!err)
+			err = dsa_port_bridge_join(dp, bridge_dev);
+			if (!err) {
 				dsa_bridge_mtu_normalization(dp);
+				br_mdb_replay(bridge_dev, dev,
+					      &dsa_slave_switchdev_blocking_notifier);
+			}
 			err = notifier_from_errno(err);
 		} else {
 			dsa_port_bridge_leave(dp, info->upper_dev);
@@ -2361,6 +2369,11 @@ dsa_slave_lag_changeupper(struct net_device *dev,
 			break;
 	}
 
+	if (netif_is_bridge_master(info->upper_dev) && !err) {
+		br_mdb_replay(info->upper_dev, dev,
+			      &dsa_slave_switchdev_blocking_notifier);
+	}
+
 	return err;
 }
 
-- 
2.25.1


  parent reply	other threads:[~2021-02-24 11:48 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24 11:43 [RFC PATCH v2 net-next 00/17] RX filtering in DSA Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 01/17] net: dsa: reference count the host mdb addresses Vladimir Oltean
2021-02-26  9:20   ` Tobias Waldekranz
2021-02-24 11:43 ` [RFC PATCH v2 net-next 02/17] net: dsa: reference count the host fdb addresses Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 03/17] net: dsa: install the host MDB and FDB entries in the master's RX filter Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 04/17] net: dsa: install the port MAC addresses as host fdb entries Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 05/17] net: bridge: implement unicast filtering for the bridge device Vladimir Oltean
2021-03-01 15:22   ` Ido Schimmel
2022-02-22 11:21     ` Vladimir Oltean
2022-02-22 16:54       ` Ido Schimmel
2022-02-22 17:18         ` Vladimir Oltean
2022-02-24 13:22           ` Ido Schimmel
2022-02-24 13:52             ` Vladimir Oltean
2022-03-01 16:20               ` Ido Schimmel
2022-03-02 11:17                 ` Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 06/17] net: dsa: add addresses obtained from RX filtering to host addresses Vladimir Oltean
2021-02-26 10:59   ` Tobias Waldekranz
2021-02-26 13:28     ` Vladimir Oltean
2021-02-26 22:44       ` Tobias Waldekranz
2021-02-24 11:43 ` [RFC PATCH v2 net-next 07/17] net: bridge: switchdev: refactor br_switchdev_fdb_notify Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 08/17] net: bridge: switchdev: include local flag in FDB notifications Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 09/17] net: bridge: switchdev: send FDB notifications for host addresses Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 10/17] net: dsa: include bridge addresses which are local in the host fdb list Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 11/17] net: dsa: include fdb entries pointing to bridge " Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 12/17] net: dsa: sync static FDB entries on foreign interfaces to hardware Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 13/17] net: dsa: mv88e6xxx: Request assisted learning on CPU port Vladimir Oltean
2021-02-24 11:43 ` Vladimir Oltean [this message]
2021-02-24 11:43 ` [RFC PATCH v2 net-next 15/17] net: dsa: replay port and local fdb entries when joining the bridge Vladimir Oltean
2021-02-26 12:23   ` Tobias Waldekranz
2021-02-26 18:08     ` Vladimir Oltean
2021-02-24 11:43 ` [RFC PATCH v2 net-next 16/17] net: bridge: switchdev: let drivers inform which bridge ports are offloaded Vladimir Oltean
2021-02-24 14:25   ` kernel test robot
2021-02-24 11:43 ` [RFC PATCH v2 net-next 17/17] net: bridge: offloaded ports are always promiscuous Vladimir Oltean
2021-02-24 15:21   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210224114350.2791260-15-olteanv@gmail.com \
    --to=olteanv@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=dqfext@gmail.com \
    --cc=f.fainelli@gmail.com \
    --cc=george.mccollister@gmail.com \
    --cc=idosch@idosch.org \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=roopa@nvidia.com \
    --cc=tobias@waldekranz.com \
    --cc=vivien.didelot@gmail.com \
    --cc=vyasevich@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.