All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: netdev@vger.kernel.org, Roopa Prabhu <roopa@nvidia.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Ido Schimmel <idosch@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Vladimir Oltean <olteanv@gmail.com>, Jiri Pirko <jiri@nvidia.com>
Subject: [RFC PATCH net-next 11/15] net: bridge: make fdb_add_entry() wait for switchdev feedback
Date: Tue, 26 Oct 2021 01:24:11 +0300	[thread overview]
Message-ID: <20211025222415.983883-12-vladimir.oltean@nxp.com> (raw)
In-Reply-To: <20211025222415.983883-1-vladimir.oltean@nxp.com>

At the moment, switchdev gets notified of FDB entries via the
br_switchdev_fdb_notify() function, which is called from fdb_notify().
In turn, fdb_notify is called from a wide variety of places, mixing data
path learning, STP timers, sysfs handlers, netlink IFLA_BRPORT_FLUSH
handlers, RTM_NEWNEIGH/RTM_DELNEIGH handlers, FDB garbage collection
timers, and others.

The common denominator is that FDB entries are created and added to the
bridge hash table under the br->hash_lock. And so is the switchdev
notification.

Because SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events are notified on the
switchdev atomic notifier chain and all drivers need sleepable context
to offload those FDB entries, the current situation is that all drivers
register a private workqueue on which they schedule work items from the
atomic switchdev notifier chain. The practical implication is that they
can't return error codes and extack messages from their private
workqueue, and even if they could (or if they would return an error
directly from the atomic notifier), the bridge would still ignore those.

We're not structurally changing anything, because there are reasons why
things are the way they are (the theoretical possibility of performance,
basically). Instead, this patch just adds a mechanism based on a
completion structure by which user space can wait for the driver's
deferred work item to finish, and return an error code.

It works as follows:

- br_switchdev_fdb_notify() behaves as before, but we introduce a new
  br_switchdev_fdb_notify_async() which contains some bridge data.
  The functions which don't care what switchdev has to say keep calling
  br_switchdev_fdb_notify(), the others (for now fdb_add_entry(), the
  function that processes RTM_NEWNEIGH) calls
  br_switchdev_fdb_notify_async()

- every function that calls br_switchdev_fdb_notify_async() must declare
  a struct br_switchdev_fdb_wait_ctx on stack. This is the storage for
  the completion structure. Then br_switchdev_fdb_notify_async() will
  create a special struct switchdev_notifier_fdb_info that contains some
  function pointers that wake up the bridge process waiting for this
  completion.

- every function that calls br_switchdev_fdb_notify_async() under
  br->hash_lock must release this lock before it can sleep waiting for
  the completion, then it has to take the lock again and search for the
  FDB entry again.

- in the case of fdb_add_entry(), we have nothing to do if switchdev was
  happy, otherwise we need to take the hash_lock again and delete the
  FDB entry we've just created. We may not find the FDB entry we were
  trying to delete, due to factors such as ultra short ageing time.
  In those cases we do nothing. The rollback logic for fdb_add_entry()
  is copied from fdb_delete_by_addr_and_port() except here we don't
  notify switchdev - we know it doesn't contain that entry and doesn't
  want it. I've renamed the existing fdb_delete_by_addr_and_port()
  function into fdb_delete_by_addr_and_port_switchdev().

The API exposed to switchdev drivers is comprised of two functions:

* switchdev_fdb_mark_pending() must be called from atomic context, to
  tell the bridge to wait
* switchdev_fdb_mark_done() can be called from any context, it tells
  the bridge to stop waiting

When nobody has called switchdev_fdb_mark_pending(), the bridge doesn't
wait. There is no race condition here, because the atomic notifiers have
finished by the time br_switchdev_fdb_notify_async() finishes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/switchdev.h   |  17 +++++
 net/bridge/br_fdb.c       | 138 +++++++++++++++++++++++++++++++++++---
 net/bridge/br_private.h   |  16 +++++
 net/bridge/br_switchdev.c |  34 ++++++++--
 4 files changed, 190 insertions(+), 15 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 559f63abc15b..67f7b22e5332 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -218,6 +218,9 @@ struct switchdev_notifier_info {
 
 struct switchdev_notifier_fdb_info {
 	struct switchdev_notifier_info info; /* must be first */
+	void (*pending_fn)(unsigned long cookie);
+	void (*done_fn)(unsigned long cookie, int err);
+	unsigned long cookie;
 	unsigned char addr[ETH_ALEN];
 	u16 vid;
 	u8 added_by_user:1,
@@ -260,6 +263,20 @@ switchdev_fdb_is_dynamically_learned(const struct switchdev_notifier_fdb_info *f
 	return !fdb_info->added_by_user && !fdb_info->is_local;
 }
 
+static inline void
+switchdev_fdb_mark_pending(const struct switchdev_notifier_fdb_info *fdb_info)
+{
+	if (fdb_info->pending_fn)
+		fdb_info->pending_fn(fdb_info->cookie);
+}
+
+static inline void
+switchdev_fdb_mark_done(const struct switchdev_notifier_fdb_info *fdb_info, int err)
+{
+	if (fdb_info->done_fn)
+		fdb_info->done_fn(fdb_info->cookie, err);
+}
+
 #ifdef CONFIG_NET_SWITCHDEV
 
 int switchdev_bridge_port_offload(struct net_device *brport_dev,
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 2095bdf24e42..e8afe64dadcc 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -201,6 +201,83 @@ static void fdb_notify(struct net_bridge *br,
 	rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
+struct br_switchdev_fdb_wait_ctx {
+	/* Serializes switchdev driver calls to switchdev_fdb_mark_done */
+	struct mutex lock;
+	struct completion done;
+	bool pending;
+	int pending_count;
+	int err;
+};
+
+static void
+br_switchdev_fdb_wait_ctx_init(struct br_switchdev_fdb_wait_ctx *wait_ctx)
+{
+	wait_ctx->pending = false;
+	wait_ctx->pending_count = 0;
+	wait_ctx->err = 0;
+	mutex_init(&wait_ctx->lock);
+	init_completion(&wait_ctx->done);
+}
+
+static void br_switchdev_fdb_pending(unsigned long cookie)
+{
+	struct br_switchdev_fdb_wait_ctx *wait_ctx;
+
+	wait_ctx = (struct br_switchdev_fdb_wait_ctx *)cookie;
+
+	wait_ctx->pending = true;
+	/* Drivers serialize on the switchdev atomic notifier chain when they
+	 * call switchdev_fdb_mark_pending, so no locking is necessary.
+	 */
+	wait_ctx->pending_count++;
+}
+
+static void br_switchdev_fdb_done(unsigned long cookie, int err)
+{
+	struct br_switchdev_fdb_wait_ctx *wait_ctx;
+	bool done;
+
+	wait_ctx = (struct br_switchdev_fdb_wait_ctx *)cookie;
+
+	/* Potentially multiple drivers might call switchdev_fdb_mark_done,
+	 * each from its own deferred context. So we need to serialize here.
+	 */
+	mutex_lock(&wait_ctx->lock);
+
+	/* Do not overwrite errors with success stories. This preserves the
+	 * last non-zero error code, which may or may not coincide with the
+	 * last extack.
+	 */
+	if (err)
+		wait_ctx->err = err;
+
+	wait_ctx->pending_count--;
+	done = wait_ctx->pending_count == 0;
+
+	mutex_unlock(&wait_ctx->lock);
+
+	if (done)
+		complete(&wait_ctx->done);
+}
+
+static int br_switchdev_fdb_wait(struct br_switchdev_fdb_wait_ctx *wait_ctx)
+{
+	/* If the pending flag isn't set, there's nothing to wait for
+	 * (switchdev not compiled, no driver interested, driver with the
+	 * legacy silent behavior, etc).
+	 * We need a dedicated pending flag as opposed to looking at the
+	 * pending_count, because we'd need a lock to look at the
+	 * pending_count (it's decremented concurrently with us), whereas we
+	 * need no locking to look at the pending flag: it was set (if it was)
+	 * during the atomic notifier call.
+	 */
+	if (wait_ctx->pending)
+		wait_for_completion(&wait_ctx->done);
+
+	return wait_ctx->err;
+}
+
 static void br_fdb_notify(struct net_bridge *br,
 			  const struct net_bridge_fdb_entry *fdb, int type,
 			  bool swdev_notify)
@@ -211,6 +288,18 @@ static void br_fdb_notify(struct net_bridge *br,
 	fdb_notify(br, fdb, type);
 }
 
+static void br_fdb_notify_async(struct net_bridge *br,
+				const struct net_bridge_fdb_entry *fdb,
+				int type, struct netlink_ext_ack *extack,
+				struct br_switchdev_fdb_wait_ctx *wait_ctx)
+{
+	br_switchdev_fdb_notify_async(br, fdb, type, br_switchdev_fdb_pending,
+				      br_switchdev_fdb_done,
+				      (unsigned long)wait_ctx, extack);
+
+	fdb_notify(br, fdb, type);
+}
+
 static struct net_bridge_fdb_entry *fdb_find_rcu(struct rhashtable *tbl,
 						 const unsigned char *addr,
 						 __u16 vid)
@@ -841,6 +930,26 @@ int br_fdb_get(struct sk_buff *skb,
 	return err;
 }
 
+/* Delete an FDB entry and don't notify switchdev */
+static void fdb_delete_by_addr_and_port(struct net_bridge *br,
+					const struct net_bridge_port *p,
+					const u8 *addr, u16 vlan)
+{
+	struct net_bridge_fdb_entry *fdb;
+
+	spin_lock_bh(&br->hash_lock);
+
+	fdb = br_fdb_find(br, addr, vlan);
+	if (!fdb || READ_ONCE(fdb->dst) != p) {
+		spin_unlock_bh(&br->hash_lock);
+		return;
+	}
+
+	fdb_delete(br, fdb, false);
+
+	spin_unlock_bh(&br->hash_lock);
+}
+
 /* returns true if the fdb is modified */
 static bool fdb_handle_notify(struct net_bridge_fdb_entry *fdb, u8 notify)
 {
@@ -868,14 +977,16 @@ static bool fdb_handle_notify(struct net_bridge_fdb_entry *fdb, u8 notify)
 /* Update (create or replace) forwarding database entry */
 static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
 			 const u8 *addr, struct ndmsg *ndm, u16 flags, u16 vid,
-			 struct nlattr *nfea_tb[])
+			 struct nlattr *nfea_tb[], struct netlink_ext_ack *extack)
 {
 	bool is_sticky = !!(ndm->ndm_flags & NTF_STICKY);
 	bool refresh = !nfea_tb[NFEA_DONT_REFRESH];
+	struct br_switchdev_fdb_wait_ctx wait_ctx;
 	struct net_bridge_fdb_entry *fdb;
 	u16 state = ndm->ndm_state;
 	bool modified = false;
 	u8 notify = 0;
+	int err;
 
 	/* If the port cannot learn allow only local and static entries */
 	if (source && !(state & NUD_PERMANENT) && !(state & NUD_NOARP) &&
@@ -899,6 +1010,8 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
 			return -EINVAL;
 	}
 
+	br_switchdev_fdb_wait_ctx_init(&wait_ctx);
+
 	spin_lock_bh(&br->hash_lock);
 
 	fdb = br_fdb_find(br, addr, vid);
@@ -959,12 +1072,16 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
 	if (modified) {
 		if (refresh)
 			fdb->updated = jiffies;
-		br_fdb_notify(br, fdb, RTM_NEWNEIGH, true);
+		br_fdb_notify_async(br, fdb, RTM_NEWNEIGH, extack, &wait_ctx);
 	}
 
 	spin_unlock_bh(&br->hash_lock);
 
-	return 0;
+	err = br_switchdev_fdb_wait(&wait_ctx);
+	if (err)
+		fdb_delete_by_addr_and_port(br, source, addr, vid);
+
+	return err;
 }
 
 static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br,
@@ -996,7 +1113,8 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br,
 		}
 		err = br_fdb_external_learn_add(br, p, addr, vid, true);
 	} else {
-		err = fdb_add_entry(br, p, addr, ndm, nlh_flags, vid, nfea_tb);
+		err = fdb_add_entry(br, p, addr, ndm, nlh_flags, vid, nfea_tb,
+				    extack);
 	}
 
 	return err;
@@ -1090,9 +1208,13 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 	return err;
 }
 
-static int fdb_delete_by_addr_and_port(struct net_bridge *br,
-				       const struct net_bridge_port *p,
-				       const u8 *addr, u16 vlan)
+/* Delete an FDB entry and notify switchdev.
+ * Caller must hold &br->hash_lock.
+ */
+static int
+fdb_delete_by_addr_and_port_switchdev(struct net_bridge *br,
+				      const struct net_bridge_port *p,
+				      const u8 *addr, u16 vlan)
 {
 	struct net_bridge_fdb_entry *fdb;
 
@@ -1112,7 +1234,7 @@ static int __br_fdb_delete(struct net_bridge *br,
 	int err;
 
 	spin_lock_bh(&br->hash_lock);
-	err = fdb_delete_by_addr_and_port(br, p, addr, vid);
+	err = fdb_delete_by_addr_and_port_switchdev(br, p, addr, vid);
 	spin_unlock_bh(&br->hash_lock);
 
 	return err;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3c9327628060..f5f7501dad7d 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1987,6 +1987,12 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
 			       struct netlink_ext_ack *extack);
 void br_switchdev_fdb_notify(struct net_bridge *br,
 			     const struct net_bridge_fdb_entry *fdb, int type);
+void br_switchdev_fdb_notify_async(struct net_bridge *br,
+				   const struct net_bridge_fdb_entry *fdb, int type,
+				   void (*pending_fn)(unsigned long cookie),
+				   void (*done_fn)(unsigned long cookie, int err),
+				   unsigned long cookie,
+				   struct netlink_ext_ack *extack);
 int br_switchdev_port_vlan_add(struct net_device *dev, u16 vid, u16 flags,
 			       struct netlink_ext_ack *extack);
 int br_switchdev_port_vlan_del(struct net_device *dev, u16 vid);
@@ -2073,6 +2079,16 @@ br_switchdev_fdb_notify(struct net_bridge *br,
 {
 }
 
+static inline void
+br_switchdev_fdb_notify_async(struct net_bridge *br,
+			      const struct net_bridge_fdb_entry *fdb, int type,
+			      void (*pending_fn)(unsigned long cookie),
+			      void (*done_fn)(unsigned long cookie, int err),
+			      unsigned long cookie,
+			      struct netlink_ext_ack *extack)
+{
+}
+
 static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
 {
 }
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index f58fb06ae641..6e3040f6f636 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -125,7 +125,10 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
 static void br_switchdev_fdb_populate(struct net_bridge *br,
 				      struct switchdev_notifier_fdb_info *item,
 				      const struct net_bridge_fdb_entry *fdb,
-				      const void *ctx)
+				      const void *ctx,
+				      void (*pending_fn)(unsigned long cookie),
+				      void (*done_fn)(unsigned long cookie, int err),
+				      unsigned long cookie)
 {
 	const struct net_bridge_port *p = READ_ONCE(fdb->dst);
 
@@ -136,28 +139,44 @@ static void br_switchdev_fdb_populate(struct net_bridge *br,
 	item->is_local = test_bit(BR_FDB_LOCAL, &fdb->flags);
 	item->info.dev = (!p || item->is_local) ? br->dev : p->dev;
 	item->info.ctx = ctx;
+	item->pending_fn = pending_fn;
+	item->done_fn = done_fn;
+	item->cookie = cookie;
 }
 
 void
-br_switchdev_fdb_notify(struct net_bridge *br,
-			const struct net_bridge_fdb_entry *fdb, int type)
+br_switchdev_fdb_notify_async(struct net_bridge *br,
+			      const struct net_bridge_fdb_entry *fdb, int type,
+			      void (*pending_fn)(unsigned long cookie),
+			      void (*done_fn)(unsigned long cookie, int err),
+			      unsigned long cookie,
+			      struct netlink_ext_ack *extack)
 {
 	struct switchdev_notifier_fdb_info item;
 
-	br_switchdev_fdb_populate(br, &item, fdb, NULL);
+	br_switchdev_fdb_populate(br, &item, fdb, NULL, pending_fn,
+				  done_fn, cookie);
 
 	switch (type) {
 	case RTM_DELNEIGH:
 		call_switchdev_notifiers(SWITCHDEV_FDB_DEL_TO_DEVICE,
-					 item.info.dev, &item.info, NULL);
+					 item.info.dev, &item.info, extack);
 		break;
 	case RTM_NEWNEIGH:
 		call_switchdev_notifiers(SWITCHDEV_FDB_ADD_TO_DEVICE,
-					 item.info.dev, &item.info, NULL);
+					 item.info.dev, &item.info, extack);
 		break;
 	}
 }
 
+void
+br_switchdev_fdb_notify(struct net_bridge *br,
+			const struct net_bridge_fdb_entry *fdb, int type)
+{
+	br_switchdev_fdb_notify_async(br, fdb, type, NULL, NULL,
+				      (unsigned long)NULL, NULL);
+}
+
 int br_switchdev_port_vlan_add(struct net_device *dev, u16 vid, u16 flags,
 			       struct netlink_ext_ack *extack)
 {
@@ -287,7 +306,8 @@ static int br_fdb_replay_one(struct net_bridge *br, struct notifier_block *nb,
 	struct switchdev_notifier_fdb_info item;
 	int err;
 
-	br_switchdev_fdb_populate(br, &item, fdb, ctx);
+	br_switchdev_fdb_populate(br, &item, fdb, ctx, NULL,
+				  NULL, (unsigned long)NULL);
 
 	err = nb->notifier_call(nb, action, &item);
 	return notifier_to_errno(err);
-- 
2.25.1


  parent reply	other threads:[~2021-10-25 22:25 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-25 22:24 [RFC PATCH net-next 00/15] Synchronous feedback on FDB add/del from switchdev to the bridge Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 01/15] net: bridge: remove fdb_notify forward declaration Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 02/15] net: bridge: remove fdb_insert " Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 03/15] net: bridge: rename fdb_insert to fdb_add_local Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 04/15] net: bridge: rename br_fdb_insert to br_fdb_add_local Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 05/15] net: bridge: move br_fdb_replay inside br_switchdev.c Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 06/15] net: bridge: create a common function for populating switchdev FDB entries Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 07/15] net: switchdev: keep the MAC address by value in struct switchdev_notifier_fdb_info Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 08/15] net: bridge: take the hash_lock inside fdb_add_entry Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 09/15] net: bridge: rename fdb_notify to br_fdb_notify Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 10/15] net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device Vladimir Oltean
2021-10-25 22:24 ` Vladimir Oltean [this message]
2021-10-25 22:24 ` [RFC PATCH net-next 12/15] net: rtnetlink: pass extack to .ndo_fdb_del Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 13/15] net: bridge: wait for errors from switchdev when deleting FDB entries Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 14/15] net: dsa: propagate feedback to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE Vladimir Oltean
2021-10-25 22:24 ` [RFC PATCH net-next 15/15] net: dsa: propagate extack to .port_fdb_{add,del} Vladimir Oltean
2021-10-26 10:40 ` [RFC PATCH net-next 00/15] Synchronous feedback on FDB add/del from switchdev to the bridge Nikolay Aleksandrov
2021-10-26 11:25   ` Vladimir Oltean
2021-10-26 12:20     ` Nikolay Aleksandrov
2021-10-26 12:38       ` Ido Schimmel
2021-10-26 16:54       ` Vladimir Oltean
2021-10-26 17:10         ` Nikolay Aleksandrov
2021-10-26 19:01           ` Vladimir Oltean
2021-10-26 19:56             ` Nikolay Aleksandrov
2021-10-26 21:51               ` Vladimir Oltean
2021-10-26 22:27                 ` Nikolay Aleksandrov
2021-10-27  9:20                   ` Nikolay Aleksandrov
2021-10-27  9:36                     ` Nikolay Aleksandrov
2021-10-27  7:24                 ` Ido Schimmel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211025222415.983883-12-vladimir.oltean@nxp.com \
    --to=vladimir.oltean@nxp.com \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=idosch@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=olteanv@gmail.com \
    --cc=roopa@nvidia.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.