All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del}
@ 2021-08-24 11:40 Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 1/8] net: dsa: sja1105: wait for dynamic config command completion on writes too Vladimir Oltean
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

This is a heads-up to DSA driver maintainers: in about 3 weeks time
(when the development cycle for v5.16 begins), I will come back with
these patches and attempt to drop the rtnl_lock guarantee from the DSA
.port_fdb_add and .port_fdb_del methods.

Plans might change, but this seems like an overall beneficial change if
we could make it, regardless of whether it is going to be part of the
final solution for enforcing FDB isolation in DSA.

After applying the entire patch set, the .port_fdb_add and .port_fdb_del
methods will run unlocked, and I would appreciate any regression test
that a maintainer can run on their hardware. Most drivers have locking
of sorts, but I wouldn't trust it, since it wasn't really put to test
until now.

The change set is structured as follows (from bottom to top):

1. A self test that driver maintainers could run while testing their
   newly unlocked ops. It is not bullet-proof, but I have found issues
   with it, so maybe it is useful.

   To run it, rsync the entire "selftests" folder to the board, then run:

   ./selftests/drivers/net/dsa/test_bridge_fdb_stress.sh sw0p2

   For the sja1105 driver, this was an indication that the internal
   locking was not sufficient:

[  282.615386] sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:04:05 vid 1: -ENOENT <- printed by the driver
[  282.624796] sja1105 spi2.0: port 2 failed to add 00:01:02:03:04:05 vid 1 to fdb: -2 <- printed by DSA

   The self-test does not test traffic, but it would be nice to check if
   the switch still behaves normally after the test finishes.

2. The DSA changes themselves that drop the rtnl_lock.

3. Some example changes that I needed to make in two drivers I could
   test. I would very much prefer avoiding non-expert, wide ranging
   locking schemes such as an "FDB lock" or a "register lock". It is
   best to understand what needs to be atomic and what can be safely
   concurrent.

As usual, feedback and ACKs/NACKs are very welcome.

Vladimir Oltean (8):
  net: dsa: sja1105: wait for dynamic config command completion on
    writes too
  net: dsa: sja1105: serialize access to the dynamic config interface
  net: mscc: ocelot: serialize access to the MAC table
  net: dsa: introduce locking for the address lists on CPU and DSA ports
  net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work
  net: dsa: flush switchdev workqueue when leaving the bridge
  selftests: lib: forwarding: allow tests to not require mz and jq
  selftests: net: dsa: add a stress test for unlocked FDB operations

 MAINTAINERS                                   |  1 +
 drivers/net/dsa/sja1105/sja1105.h             |  2 +
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 91 ++++++++++++++-----
 drivers/net/dsa/sja1105/sja1105_main.c        |  1 +
 drivers/net/ethernet/mscc/ocelot.c            | 53 ++++++++---
 include/net/dsa.h                             |  1 +
 include/soc/mscc/ocelot.h                     |  3 +
 net/dsa/dsa.c                                 |  5 +
 net/dsa/dsa2.c                                |  1 +
 net/dsa/dsa_priv.h                            |  2 +
 net/dsa/port.c                                |  2 +
 net/dsa/slave.c                               |  2 -
 net/dsa/switch.c                              | 76 +++++++++++-----
 .../drivers/net/dsa/test_bridge_fdb_stress.sh | 48 ++++++++++
 tools/testing/selftests/net/forwarding/lib.sh | 10 +-
 15 files changed, 235 insertions(+), 63 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/dsa/test_bridge_fdb_stress.sh

-- 
2.25.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 1/8] net: dsa: sja1105: wait for dynamic config command completion on writes too
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 2/8] net: dsa: sja1105: serialize access to the dynamic config interface Vladimir Oltean
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

The hardware manual says that software should attempt a new dynamic
config access (be it a a write or a read-back) only while the VALID bit
is cleared. The VALID bit is set by software to 1, and it remains set as
long as the hardware is still processing the request.

Currently the driver only polls for the command completion only for
reads, because that's when we need the actual data read back. Writes
have been more or less "asynchronous", although this has never been an
observable issue.

This change makes sja1105_dynamic_config_write poll the VALID bit as
well, to absolutely ensure that a follow-up access to the static config
finds the VALID bit cleared.

So VALID means "work in progress", while VALIDENT means "entry being
read is valid". On reads we check the VALIDENT bit too, while on writes
that bit is not always defined. So we need to factor it out of the loop,
and make the loop provide back the unpacked command structure, so that
sja1105_dynamic_config_read can check the VALIDENT bit.

The change also attempts to convert the open-coded loop to use the
read_poll_timeout macro, since I know this will come up during review.
It's more code, but hey, it uses read_poll_timeout!

Tested on SJA1105T, SJA1105S, SJA1110A.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 81 ++++++++++++++-----
 1 file changed, 59 insertions(+), 22 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index f2049f52833c..32ec34f181de 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -1170,6 +1170,56 @@ const struct sja1105_dynamic_table_ops sja1110_dyn_ops[BLK_IDX_MAX_DYN] = {
 	},
 };
 
+#define SJA1105_DYNAMIC_CONFIG_SLEEP_US		10
+#define SJA1105_DYNAMIC_CONFIG_TIMEOUT_US	100000
+
+static int
+sja1105_dynamic_config_poll_valid(struct sja1105_private *priv,
+				  struct sja1105_dyn_cmd *cmd,
+				  const struct sja1105_dynamic_table_ops *ops)
+{
+	u8 packed_buf[SJA1105_MAX_DYN_CMD_SIZE] = {};
+	int rc;
+
+	/* We don't _need_ to read the full entry, just the command area which
+	 * is a fixed SJA1105_SIZE_DYN_CMD. But our cmd_packing() API expects a
+	 * buffer that contains the full entry too. Additionally, our API
+	 * doesn't really know how many bytes into the buffer does the command
+	 * area really begin. So just read back the whole entry.
+	 */
+	rc = sja1105_xfer_buf(priv, SPI_READ, ops->addr, packed_buf,
+			      ops->packed_size);
+	if (rc)
+		return rc;
+
+	/* Unpack the command structure, and return it to the caller in case it
+	 * needs to perform further checks on it (VALIDENT).
+	 */
+	memset(cmd, 0, sizeof(*cmd));
+	ops->cmd_packing(packed_buf, cmd, UNPACK);
+
+	/* Hardware hasn't cleared VALID => still working on it */
+	return cmd->valid ? -EAGAIN : 0;
+}
+
+/* Poll the dynamic config entry's control area until the hardware has
+ * cleared the VALID bit, which means we have confirmation that it has
+ * finished processing the command.
+ */
+static int
+sja1105_dynamic_config_wait_complete(struct sja1105_private *priv,
+				     struct sja1105_dyn_cmd *cmd,
+				     const struct sja1105_dynamic_table_ops *ops)
+{
+	int rc;
+
+	return read_poll_timeout(sja1105_dynamic_config_poll_valid,
+				 rc, rc != -EAGAIN,
+				 SJA1105_DYNAMIC_CONFIG_SLEEP_US,
+				 SJA1105_DYNAMIC_CONFIG_TIMEOUT_US,
+				 false, priv, cmd, ops);
+}
+
 /* Provides read access to the settings through the dynamic interface
  * of the switch.
  * @blk_idx	is used as key to select from the sja1105_dynamic_table_ops.
@@ -1196,7 +1246,6 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv,
 	struct sja1105_dyn_cmd cmd = {0};
 	/* SPI payload buffer */
 	u8 packed_buf[SJA1105_MAX_DYN_CMD_SIZE] = {0};
-	int retries = 3;
 	int rc;
 
 	if (blk_idx >= BLK_IDX_MAX_DYN)
@@ -1239,28 +1288,12 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv,
 	if (rc < 0)
 		return rc;
 
-	/* Loop until we have confirmation that hardware has finished
-	 * processing the command and has cleared the VALID field
-	 */
-	do {
-		memset(packed_buf, 0, ops->packed_size);
-
-		/* Retrieve the read operation's result */
-		rc = sja1105_xfer_buf(priv, SPI_READ, ops->addr, packed_buf,
-				      ops->packed_size);
-		if (rc < 0)
-			return rc;
-
-		cmd = (struct sja1105_dyn_cmd) {0};
-		ops->cmd_packing(packed_buf, &cmd, UNPACK);
-
-		if (!cmd.valident && !(ops->access & OP_VALID_ANYWAY))
-			return -ENOENT;
-		cpu_relax();
-	} while (cmd.valid && --retries);
+	rc = sja1105_dynamic_config_wait_complete(priv, &cmd, ops);
+	if (rc < 0)
+		return rc;
 
-	if (cmd.valid)
-		return -ETIMEDOUT;
+	if (!cmd.valident && !(ops->access & OP_VALID_ANYWAY))
+		return -ENOENT;
 
 	/* Don't dereference possibly NULL pointer - maybe caller
 	 * only wanted to see whether the entry existed or not.
@@ -1321,6 +1354,10 @@ int sja1105_dynamic_config_write(struct sja1105_private *priv,
 	if (rc < 0)
 		return rc;
 
+	rc = sja1105_dynamic_config_wait_complete(priv, &cmd, ops);
+	if (rc < 0)
+		return rc;
+
 	cmd = (struct sja1105_dyn_cmd) {0};
 	ops->cmd_packing(packed_buf, &cmd, UNPACK);
 	if (cmd.errors)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 2/8] net: dsa: sja1105: serialize access to the dynamic config interface
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 1/8] net: dsa: sja1105: wait for dynamic config command completion on writes too Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 3/8] net: mscc: ocelot: serialize access to the MAC table Vladimir Oltean
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

The sja1105 hardware seems as concurrent as can be, but when we create a
background script that adds/removes a rain of FDB entries without the
rtnl_mutex taken, then in parallel we do another operation like run
'bridge fdb show', we can notice these errors popping up:

sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2

Luckily what is going on does not require a major rework in the driver.
The sja1105_dynamic_config_read() function sends multiple SPI buffers to
the peripheral until the operation completes. We should not do anything
until the hardware clears the VALID bit.

But since there is no locking (i.e. right now we are implicitly
serialized by the rtnl_mutex, but if we remove that), it might be
possible that the process which performs the dynamic config read is
preempted and another one performs a dynamic config write.

What will happen in that case is that sja1105_dynamic_config_read(),
when it resumes, expects to see VALIDENT set for the entry it reads
back. But it won't.

This can be corrected by introducing a mutex for serializing SPI
accesses to the dynamic config interface which should be atomic with
respect to each other.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/sja1105/sja1105.h                |  2 ++
 drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 12 ++++++++++--
 drivers/net/dsa/sja1105/sja1105_main.c           |  1 +
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index 2e899c9f036d..78624851d1f8 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -248,6 +248,8 @@ struct sja1105_private {
 	 * the switch doesn't confuse them with one another.
 	 */
 	struct mutex mgmt_lock;
+	/* Serializes access to the dynamic config interface */
+	struct mutex dynamic_config_lock;
 	struct devlink_region **regions;
 	struct sja1105_cbs_entry *cbs;
 	struct mii_bus *mdio_base_t1;
diff --git a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
index 32ec34f181de..7729d3f8b7f5 100644
--- a/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_dynamic_config.c
@@ -1283,12 +1283,16 @@ int sja1105_dynamic_config_read(struct sja1105_private *priv,
 		ops->entry_packing(packed_buf, entry, PACK);
 
 	/* Send SPI write operation: read config table entry */
+	mutex_lock(&priv->dynamic_config_lock);
 	rc = sja1105_xfer_buf(priv, SPI_WRITE, ops->addr, packed_buf,
 			      ops->packed_size);
-	if (rc < 0)
+	if (rc < 0) {
+		mutex_unlock(&priv->dynamic_config_lock);
 		return rc;
+	}
 
 	rc = sja1105_dynamic_config_wait_complete(priv, &cmd, ops);
+	mutex_unlock(&priv->dynamic_config_lock);
 	if (rc < 0)
 		return rc;
 
@@ -1349,12 +1353,16 @@ int sja1105_dynamic_config_write(struct sja1105_private *priv,
 		ops->entry_packing(packed_buf, entry, PACK);
 
 	/* Send SPI write operation: read config table entry */
+	mutex_lock(&priv->dynamic_config_lock);
 	rc = sja1105_xfer_buf(priv, SPI_WRITE, ops->addr, packed_buf,
 			      ops->packed_size);
-	if (rc < 0)
+	if (rc < 0) {
+		mutex_unlock(&priv->dynamic_config_lock);
 		return rc;
+	}
 
 	rc = sja1105_dynamic_config_wait_complete(priv, &cmd, ops);
+	mutex_unlock(&priv->dynamic_config_lock);
 	if (rc < 0)
 		return rc;
 
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 05ba65042b5f..dbfbb949a485 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -3285,6 +3285,7 @@ static int sja1105_probe(struct spi_device *spi)
 	priv->ds = ds;
 
 	mutex_init(&priv->ptp_data.lock);
+	mutex_init(&priv->dynamic_config_lock);
 	mutex_init(&priv->mgmt_lock);
 
 	rc = sja1105_parse_dt(priv);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 3/8] net: mscc: ocelot: serialize access to the MAC table
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 1/8] net: dsa: sja1105: wait for dynamic config command completion on writes too Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 2/8] net: dsa: sja1105: serialize access to the dynamic config interface Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 4/8] net: dsa: introduce locking for the address lists on CPU and DSA ports Vladimir Oltean
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

DSA would like to remove the rtnl_lock from its
SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handlers, and the felix driver uses
the same MAC table functions as ocelot.

This means that the MAC table functions will no longer be implicitly
serialized with respect to each other by the rtnl_mutex, we need to add
a dedicated lock in ocelot for the non-atomic operations of selecting a
MAC table row, reading/writing what we want and polling for completion.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 53 +++++++++++++++++++++++-------
 include/soc/mscc/ocelot.h          |  3 ++
 2 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index c581b955efb3..9f481cf19931 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -20,11 +20,13 @@ struct ocelot_mact_entry {
 	enum macaccess_entry_type type;
 };
 
+/* Must be called with &ocelot->mact_lock held */
 static inline u32 ocelot_mact_read_macaccess(struct ocelot *ocelot)
 {
 	return ocelot_read(ocelot, ANA_TABLES_MACACCESS);
 }
 
+/* Must be called with &ocelot->mact_lock held */
 static inline int ocelot_mact_wait_for_completion(struct ocelot *ocelot)
 {
 	u32 val;
@@ -36,6 +38,7 @@ static inline int ocelot_mact_wait_for_completion(struct ocelot *ocelot)
 		TABLE_UPDATE_SLEEP_US, TABLE_UPDATE_TIMEOUT_US);
 }
 
+/* Must be called with &ocelot->mact_lock held */
 static void ocelot_mact_select(struct ocelot *ocelot,
 			       const unsigned char mac[ETH_ALEN],
 			       unsigned int vid)
@@ -67,6 +70,7 @@ int ocelot_mact_learn(struct ocelot *ocelot, int port,
 		ANA_TABLES_MACACCESS_ENTRYTYPE(type) |
 		ANA_TABLES_MACACCESS_MAC_TABLE_CMD(MACACCESS_CMD_LEARN);
 	unsigned int mc_ports;
+	int err;
 
 	/* Set MAC_CPU_COPY if the CPU port is used by a multicast entry */
 	if (type == ENTRYTYPE_MACv4)
@@ -79,18 +83,28 @@ int ocelot_mact_learn(struct ocelot *ocelot, int port,
 	if (mc_ports & BIT(ocelot->num_phys_ports))
 		cmd |= ANA_TABLES_MACACCESS_MAC_CPU_COPY;
 
+	mutex_lock(&ocelot->mact_lock);
+
 	ocelot_mact_select(ocelot, mac, vid);
 
 	/* Issue a write command */
 	ocelot_write(ocelot, cmd, ANA_TABLES_MACACCESS);
 
-	return ocelot_mact_wait_for_completion(ocelot);
+	err = ocelot_mact_wait_for_completion(ocelot);
+
+	mutex_unlock(&ocelot->mact_lock);
+
+	return err;
 }
 EXPORT_SYMBOL(ocelot_mact_learn);
 
 int ocelot_mact_forget(struct ocelot *ocelot,
 		       const unsigned char mac[ETH_ALEN], unsigned int vid)
 {
+	int err;
+
+	mutex_lock(&ocelot->mact_lock);
+
 	ocelot_mact_select(ocelot, mac, vid);
 
 	/* Issue a forget command */
@@ -98,7 +112,11 @@ int ocelot_mact_forget(struct ocelot *ocelot,
 		     ANA_TABLES_MACACCESS_MAC_TABLE_CMD(MACACCESS_CMD_FORGET),
 		     ANA_TABLES_MACACCESS);
 
-	return ocelot_mact_wait_for_completion(ocelot);
+	err = ocelot_mact_wait_for_completion(ocelot);
+
+	mutex_unlock(&ocelot->mact_lock);
+
+	return err;
 }
 EXPORT_SYMBOL(ocelot_mact_forget);
 
@@ -114,7 +132,9 @@ static void ocelot_mact_init(struct ocelot *ocelot)
 		   | ANA_AGENCTRL_LEARN_IGNORE_VLAN,
 		   ANA_AGENCTRL);
 
-	/* Clear the MAC table */
+	/* Clear the MAC table. We are not concurrent with anyone, so
+	 * holding &ocelot->mact_lock is pointless.
+	 */
 	ocelot_write(ocelot, MACACCESS_CMD_INIT, ANA_TABLES_MACACCESS);
 }
 
@@ -1028,6 +1048,7 @@ int ocelot_port_fdb_do_dump(const unsigned char *addr, u16 vid,
 }
 EXPORT_SYMBOL(ocelot_port_fdb_do_dump);
 
+/* Must be called with &ocelot->mact_lock held */
 static int ocelot_mact_read(struct ocelot *ocelot, int port, int row, int col,
 			    struct ocelot_mact_entry *entry)
 {
@@ -1078,33 +1099,40 @@ static int ocelot_mact_read(struct ocelot *ocelot, int port, int row, int col,
 int ocelot_fdb_dump(struct ocelot *ocelot, int port,
 		    dsa_fdb_dump_cb_t *cb, void *data)
 {
+	int err = 0;
 	int i, j;
 
+	/* We could take the lock just around ocelot_mact_read, but doing so
+	 * thousands of times in a row seems rather pointless and inefficient.
+	 */
+	mutex_lock(&ocelot->mact_lock);
+
 	/* Loop through all the mac tables entries. */
 	for (i = 0; i < ocelot->num_mact_rows; i++) {
 		for (j = 0; j < 4; j++) {
 			struct ocelot_mact_entry entry;
 			bool is_static;
-			int ret;
 
-			ret = ocelot_mact_read(ocelot, port, i, j, &entry);
+			err = ocelot_mact_read(ocelot, port, i, j, &entry);
 			/* If the entry is invalid (wrong port, invalid...),
 			 * skip it.
 			 */
-			if (ret == -EINVAL)
+			if (err == -EINVAL)
 				continue;
-			else if (ret)
-				return ret;
+			else if (err)
+				break;
 
 			is_static = (entry.type == ENTRYTYPE_LOCKED);
 
-			ret = cb(entry.mac, entry.vid, is_static, data);
-			if (ret)
-				return ret;
+			err = cb(entry.mac, entry.vid, is_static, data);
+			if (err)
+				break;
 		}
 	}
 
-	return 0;
+	mutex_unlock(&ocelot->mact_lock);
+
+	return err;
 }
 EXPORT_SYMBOL(ocelot_fdb_dump);
 
@@ -2085,6 +2113,7 @@ int ocelot_init(struct ocelot *ocelot)
 
 	mutex_init(&ocelot->stats_lock);
 	mutex_init(&ocelot->ptp_lock);
+	mutex_init(&ocelot->mact_lock);
 	spin_lock_init(&ocelot->ptp_clock_lock);
 	snprintf(queue_name, sizeof(queue_name), "%s-stats",
 		 dev_name(ocelot->dev));
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 06706a9fd5b1..682cd058096c 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -674,6 +674,9 @@ struct ocelot {
 	struct delayed_work		stats_work;
 	struct workqueue_struct		*stats_queue;
 
+	/* Lock for serializing access to the MAC table */
+	struct mutex			mact_lock;
+
 	struct workqueue_struct		*owq;
 
 	u8				ptp:1;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 4/8] net: dsa: introduce locking for the address lists on CPU and DSA ports
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
                   ` (2 preceding siblings ...)
  2021-08-24 11:40 ` [RFC PATCH net-next 3/8] net: mscc: ocelot: serialize access to the MAC table Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 5/8] net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work Vladimir Oltean
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

Now that the rtnl_mutex is going away for dsa_port_{host_,}fdb_{add,del},
no one is serializing access to the address lists that DSA keeps for the
purpose of reference counting on shared ports (CPU and cascade ports).

It can happen for one dsa_switch_do_fdb_del to do list_del on a dp->fdbs
element while another dsa_switch_do_fdb_{add,del} is traversing dp->fdbs.
We need to avoid that.

Currently dp->mdbs is not at risk, because dsa_switch_do_mdb_{add,del}
still runs under the rtnl_mutex. But it would be nice if it would not
depend on that being the case. So let's introduce a mutex per port (the
address lists are per port too) and share it between dp->mdbs and
dp->fdbs.

The place where we put the locking is interesting. It could be tempting
to put a DSA-level lock which still serializes calls to
.port_fdb_{add,del}, but it would still not avoid concurrency with other
driver code paths that are currently under rtnl_mutex (.port_fdb_dump,
.port_fast_age). So it would add a very false sense of security (and
adding a global switch-wide lock in DSA to resynchronize with the
rtnl_lock is also counterproductive and hard).

So the locking is intentionally done only where the dp->fdbs and dp->mdbs
lists are traversed. That means, from a driver perspective, that
.port_fdb_add will be called with the dp->addr_lists_lock mutex held on
the CPU port, but not held on user ports. This is done so that driver
writers are not encouraged to rely on any guarantee offered by
dp->addr_lists_lock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h |  1 +
 net/dsa/dsa2.c    |  1 +
 net/dsa/switch.c  | 76 ++++++++++++++++++++++++++++++++---------------
 3 files changed, 54 insertions(+), 24 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index f9a17145255a..bed1fbc0215c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -285,6 +285,7 @@ struct dsa_port {
 	/* List of MAC addresses that must be forwarded on this port.
 	 * These are only valid on CPU ports and DSA links.
 	 */
+	struct mutex		addr_lists_lock;
 	struct list_head	fdbs;
 	struct list_head	mdbs;
 
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 1b2b25d7bd02..8ddf10e27d85 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -435,6 +435,7 @@ static int dsa_port_setup(struct dsa_port *dp)
 	if (dp->setup)
 		return 0;
 
+	mutex_init(&dp->addr_lists_lock);
 	INIT_LIST_HEAD(&dp->fdbs);
 	INIT_LIST_HEAD(&dp->mdbs);
 
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 1c797ec8e2c2..40e28eedac59 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -214,26 +214,30 @@ static int dsa_switch_do_mdb_add(struct dsa_switch *ds, int port,
 {
 	struct dsa_port *dp = dsa_to_port(ds, port);
 	struct dsa_mac_addr *a;
-	int err;
+	int err = 0;
 
 	/* No need to bother with refcounting for user ports */
 	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
 		return ds->ops->port_mdb_add(ds, port, mdb);
 
+	mutex_lock(&dp->addr_lists_lock);
+
 	a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid);
 	if (a) {
 		refcount_inc(&a->refcount);
-		return 0;
+		goto out;
 	}
 
 	a = kzalloc(sizeof(*a), GFP_KERNEL);
-	if (!a)
-		return -ENOMEM;
+	if (!a) {
+		err = -ENOMEM;
+		goto out;
+	}
 
 	err = ds->ops->port_mdb_add(ds, port, mdb);
 	if (err) {
 		kfree(a);
-		return err;
+		goto out;
 	}
 
 	ether_addr_copy(a->addr, mdb->addr);
@@ -241,7 +245,10 @@ static int dsa_switch_do_mdb_add(struct dsa_switch *ds, int port,
 	refcount_set(&a->refcount, 1);
 	list_add_tail(&a->list, &dp->mdbs);
 
-	return 0;
+out:
+	mutex_unlock(&dp->addr_lists_lock);
+
+	return err;
 }
 
 static int dsa_switch_do_mdb_del(struct dsa_switch *ds, int port,
@@ -249,29 +256,36 @@ static int dsa_switch_do_mdb_del(struct dsa_switch *ds, int port,
 {
 	struct dsa_port *dp = dsa_to_port(ds, port);
 	struct dsa_mac_addr *a;
-	int err;
+	int err = 0;
 
 	/* No need to bother with refcounting for user ports */
 	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
 		return ds->ops->port_mdb_del(ds, port, mdb);
 
+	mutex_lock(&dp->addr_lists_lock);
+
 	a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid);
-	if (!a)
-		return -ENOENT;
+	if (!a) {
+		err = -ENOENT;
+		goto out;
+	}
 
 	if (!refcount_dec_and_test(&a->refcount))
-		return 0;
+		goto out;
 
 	err = ds->ops->port_mdb_del(ds, port, mdb);
 	if (err) {
 		refcount_inc(&a->refcount);
-		return err;
+		goto out;
 	}
 
 	list_del(&a->list);
 	kfree(a);
 
-	return 0;
+out:
+	mutex_unlock(&dp->addr_lists_lock);
+
+	return err;
 }
 
 static int dsa_switch_do_fdb_add(struct dsa_switch *ds, int port,
@@ -279,26 +293,30 @@ static int dsa_switch_do_fdb_add(struct dsa_switch *ds, int port,
 {
 	struct dsa_port *dp = dsa_to_port(ds, port);
 	struct dsa_mac_addr *a;
-	int err;
+	int err = 0;
 
 	/* No need to bother with refcounting for user ports */
 	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
 		return ds->ops->port_fdb_add(ds, port, addr, vid);
 
+	mutex_lock(&dp->addr_lists_lock);
+
 	a = dsa_mac_addr_find(&dp->fdbs, addr, vid);
 	if (a) {
 		refcount_inc(&a->refcount);
-		return 0;
+		goto out;
 	}
 
 	a = kzalloc(sizeof(*a), GFP_KERNEL);
-	if (!a)
-		return -ENOMEM;
+	if (!a) {
+		err = -ENOMEM;
+		goto out;
+	}
 
 	err = ds->ops->port_fdb_add(ds, port, addr, vid);
 	if (err) {
 		kfree(a);
-		return err;
+		goto out;
 	}
 
 	ether_addr_copy(a->addr, addr);
@@ -306,7 +324,10 @@ static int dsa_switch_do_fdb_add(struct dsa_switch *ds, int port,
 	refcount_set(&a->refcount, 1);
 	list_add_tail(&a->list, &dp->fdbs);
 
-	return 0;
+out:
+	mutex_unlock(&dp->addr_lists_lock);
+
+	return err;
 }
 
 static int dsa_switch_do_fdb_del(struct dsa_switch *ds, int port,
@@ -314,29 +335,36 @@ static int dsa_switch_do_fdb_del(struct dsa_switch *ds, int port,
 {
 	struct dsa_port *dp = dsa_to_port(ds, port);
 	struct dsa_mac_addr *a;
-	int err;
+	int err = 0;
 
 	/* No need to bother with refcounting for user ports */
 	if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
 		return ds->ops->port_fdb_del(ds, port, addr, vid);
 
+	mutex_lock(&dp->addr_lists_lock);
+
 	a = dsa_mac_addr_find(&dp->fdbs, addr, vid);
-	if (!a)
-		return -ENOENT;
+	if (!a) {
+		err = -ENOENT;
+		goto out;
+	}
 
 	if (!refcount_dec_and_test(&a->refcount))
-		return 0;
+		goto out;
 
 	err = ds->ops->port_fdb_del(ds, port, addr, vid);
 	if (err) {
 		refcount_inc(&a->refcount);
-		return err;
+		goto out;
 	}
 
 	list_del(&a->list);
 	kfree(a);
 
-	return 0;
+out:
+	mutex_unlock(&dp->addr_lists_lock);
+
+	return err;
 }
 
 static int dsa_switch_host_fdb_add(struct dsa_switch *ds,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 5/8] net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
                   ` (3 preceding siblings ...)
  2021-08-24 11:40 ` [RFC PATCH net-next 4/8] net: dsa: introduce locking for the address lists on CPU and DSA ports Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 6/8] net: dsa: flush switchdev workqueue when leaving the bridge Vladimir Oltean
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

After talking with Ido Schimmel, it became clear that rtnl_lock is not
actually required for anything that is done inside the
SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE deferred work handlers.

The reason why it was probably added by Arkadi Sharshevsky in commit
c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
notification") was to offer the same locking/serialization guarantees as
.ndo_fdb_{add,del} and avoid reworking any drivers.

DSA has implemented .ndo_fdb_add and .ndo_fdb_del until commit
b117e1e8a86d ("net: dsa: delete dsa_legacy_fdb_add and
dsa_legacy_fdb_del") - that is to say, until fairly recently.

But those methods have been deleted, so now we are free to drop the
rtnl_lock as well.

Note that exposing DSA switch drivers to an unlocked method which was
previously serialized by the rtnl_mutex is a potentially dangerous
affair. Driver writers couldn't ensure that their internal locking
scheme does the right thing even if they wanted.

We could err on the side of paranoia and introduce a switch-wide lock
inside the DSA framework, but that seems way overreaching. Instead, we
could check as many drivers for regressions as we can, fix those first,
then let this change go in once it is assumed to be fairly safe.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/slave.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 662ff531d4e2..53394fb43d67 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2381,7 +2381,6 @@ static void dsa_slave_switchdev_event_work(struct work_struct *work)
 
 	dp = dsa_to_port(ds, switchdev_work->port);
 
-	rtnl_lock();
 	switch (switchdev_work->event) {
 	case SWITCHDEV_FDB_ADD_TO_DEVICE:
 		if (switchdev_work->host_addr)
@@ -2416,7 +2415,6 @@ static void dsa_slave_switchdev_event_work(struct work_struct *work)
 
 		break;
 	}
-	rtnl_unlock();
 
 	dev_put(switchdev_work->dev);
 	kfree(switchdev_work);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 6/8] net: dsa: flush switchdev workqueue when leaving the bridge
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
                   ` (4 preceding siblings ...)
  2021-08-24 11:40 ` [RFC PATCH net-next 5/8] net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 7/8] selftests: lib: forwarding: allow tests to not require mz and jq Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 8/8] selftests: net: dsa: add a stress test for unlocked FDB operations Vladimir Oltean
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

DSA is preparing to offer switch drivers an API through which they can
associate each FDB entry with a struct net_device *bridge_dev. This can
be used to perform FDB isolation (the FDB lookup performed on the
ingress of a standalone, or bridged port, should not find an FDB entry
that is present in the FDB of another bridge).

In preparation of that work, DSA needs to ensure that by the time we
call the switch .port_fdb_add and .port_fdb_del methods, the
dp->bridge_dev pointer is still valid, i.e. the port is still a bridge
port.

Currently this is true for .port_fdb_add, but not guaranteed to be true
for .port_fdb_del. This is because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
API requires drivers that must have sleepable context to handle those
events to schedule the deferred work themselves. DSA does this through
the dsa_owq.

It can happen that a port leaves a bridge, del_nbp() flushes the FDB on
that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context,
DSA schedules its deferred work, but del_nbp() finishes unlinking the
bridge as a master from the port before DSA's deferred work is run.

Fundamentally, the port must not be unlinked from the bridge until all
FDB deletion deferred work items have been flushed. The bridge must wait
for the completion of these hardware accesses.

I have tried to address this issue centrally in switchdev by making
SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev
level, which would offer implicit synchronization with del_nbp:

https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/

but it seems that any attempt to modify switchdev's behavior and make
the events blocking there would introduce undesirable side effects in
other switchdev consumers.

The most undesirable behavior seems to be that
switchdev_deferred_process_work() takes the rtnl_mutex itself, which
would be worse off than having the rtnl_mutex taken individually from
drivers which is what we have now.

So to offer the needed guarantee to DSA switch drivers, I have come up
with a compromise solution that does not require switchdev rework:
we already have a hook at the last moment in time when the bridge is
still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush
the dsa_owq manually from there, which makes all FDB deletions
synchronous.

Major problem: the NETDEV_PRECHANGEUPPER event runs with rtnl_mutex held,
so flushing dsa_owq would deadlock if dsa_slave_switchdev_event_work
would take the rtnl_mutex too.

So not only would it be desirable to drop the rtnl_lock from DSA, it is
actually mandatory to do so.

This change requires ACKs from driver maintainers, since we expose
switches to a method which is now unlocked and can trigger concurrency
issue in the access to hardware.

I've eyeballed the existing drivers, and have needed to patch sja1105
and felix/ocelot. I am also looking at the b53 driver where the ARL ops
are unlocked. The other drivers do seem to have a mutex of sorts, but I
am fairly skeptical that its serialization features have really been put
to the test (knowing that the rtnl_mutex serialized accesses already).
So any regression test from drivers that implement:
- .port_fdb_add
- .port_fdb_del
- .port_fdb_dump
- .port_mdb_add
- .port_mdb_del
- .port_fast_age

is appreciated.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa.c      | 5 +++++
 net/dsa/dsa_priv.h | 2 ++
 net/dsa/port.c     | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 1dc45e40f961..8e7207c85d61 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -345,6 +345,11 @@ bool dsa_schedule_work(struct work_struct *work)
 	return queue_work(dsa_owq, work);
 }
 
+void dsa_flush_work(void)
+{
+	flush_workqueue(dsa_owq);
+}
+
 int dsa_devlink_param_get(struct devlink *dl, u32 id,
 			  struct devlink_param_gset_ctx *ctx)
 {
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 33ab7d7af9eb..1dc28ad4b8a8 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -170,6 +170,8 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
 const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
 
 bool dsa_schedule_work(struct work_struct *work);
+void dsa_flush_work(void);
+
 const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
 
 static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 616330a16d31..65ce114b9fc8 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -380,6 +380,8 @@ void dsa_port_pre_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	switchdev_bridge_port_unoffload(brport_dev, dp,
 					&dsa_slave_switchdev_notifier,
 					&dsa_slave_switchdev_blocking_notifier);
+
+	dsa_flush_work();
 }
 
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 7/8] selftests: lib: forwarding: allow tests to not require mz and jq
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
                   ` (5 preceding siblings ...)
  2021-08-24 11:40 ` [RFC PATCH net-next 6/8] net: dsa: flush switchdev workqueue when leaving the bridge Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  2021-08-24 11:40 ` [RFC PATCH net-next 8/8] selftests: net: dsa: add a stress test for unlocked FDB operations Vladimir Oltean
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

These programs are useful, but not all selftests require them.

Additionally, on embedded boards without package management (things like
buildroot), installing mausezahn or jq is not always as trivial as
downloading a package from the web.

So it is actually a bit annoying to require programs that are not used.
Introduce options that can be set by scripts to not enforce these
dependencies. For compatibility, default to "yes".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 tools/testing/selftests/net/forwarding/lib.sh | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 42e28c983d41..b937472d2e17 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -20,6 +20,8 @@ MC_CLI=${MC_CLI:=smcroutectl}
 PING_TIMEOUT=${PING_TIMEOUT:=5}
 WAIT_TIMEOUT=${WAIT_TIMEOUT:=20}
 INTERFACE_TIMEOUT=${INTERFACE_TIMEOUT:=600}
+REQUIRE_JQ=${REQUIRE_JQ:=yes}
+REQUIRE_MZ=${REQUIRE_MZ:=yes}
 
 relative_path="${BASH_SOURCE%/*}"
 if [[ "$relative_path" == "${BASH_SOURCE}" ]]; then
@@ -138,8 +140,12 @@ require_command()
 	fi
 }
 
-require_command jq
-require_command $MZ
+if [[ "$REQUIRE_JQ" = "yes" ]]; then
+	require_command jq
+fi
+if [[ "$REQUIRE_MZ" = "yes" ]]; then
+	require_command $MZ
+fi
 
 if [[ ! -v NUM_NETIFS ]]; then
 	echo "SKIP: importer does not define \"NUM_NETIFS\""
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH net-next 8/8] selftests: net: dsa: add a stress test for unlocked FDB operations
  2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
                   ` (6 preceding siblings ...)
  2021-08-24 11:40 ` [RFC PATCH net-next 7/8] selftests: lib: forwarding: allow tests to not require mz and jq Vladimir Oltean
@ 2021-08-24 11:40 ` Vladimir Oltean
  7 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-08-24 11:40 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	UNGLinuxDriver, DENG Qingfang, Kurt Kanzenbach, Hauke Mehrtens,
	Woojung Huh, Sean Wang, Landen Chao, Alexandre Belloni,
	George McCollister, John Crispin, Aleksander Jan Bajkowski,
	Egil Hjelmeland, Oleksij Rempel

This test is a bit strange in that it is perhaps more manual than
others: it does not transmit a clear OK/FAIL verdict, because user space
does not have synchronous feedback from the kernel. If a hardware access
fails, it is in deferred context.

Nonetheless, on sja1105 I have used it successfully to find and solve a
concurrency issue, so it can be used as a starting point for other
driver maintainers too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 MAINTAINERS                                   |  1 +
 .../drivers/net/dsa/test_bridge_fdb_stress.sh | 48 +++++++++++++++++++
 2 files changed, 49 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/dsa/test_bridge_fdb_stress.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 06e39d3eba93..23b65ef5f11a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12958,6 +12958,7 @@ F:	include/linux/dsa/
 F:	include/linux/platform_data/dsa.h
 F:	include/net/dsa.h
 F:	net/dsa/
+F:	tools/testing/selftests/drivers/net/dsa/
 
 NETWORKING [GENERAL]
 M:	"David S. Miller" <davem@davemloft.net>
diff --git a/tools/testing/selftests/drivers/net/dsa/test_bridge_fdb_stress.sh b/tools/testing/selftests/drivers/net/dsa/test_bridge_fdb_stress.sh
new file mode 100755
index 000000000000..cdb7f9ca2251
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/dsa/test_bridge_fdb_stress.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Bridge FDB entries can be offloaded to DSA switches without holding the
+# rtnl_mutex. Traditionally this mutex has conferred drivers implicit
+# serialization, which means their code paths are not well tested in the
+# presence of concurrency.
+# This test creates a background task that stresses the FDB by adding and
+# deleting an entry many times in a row without the rtnl_mutex held.
+# It then tests the driver resistance to concurrency by calling .ndo_fdb_dump
+# (with rtnl_mutex held) from a foreground task.
+# Since either the FDB dump or the additions/removals can fail, but the
+# additions and removals are performed in deferred as opposed to process
+# context, we cannot simply check for user space error codes.
+
+WAIT_TIME=1
+NUM_NETIFS=1
+REQUIRE_JQ="no"
+REQUIRE_MZ="no"
+NETIF_CREATE="no"
+lib_dir=$(dirname $0)/../../../net/forwarding
+source $lib_dir/lib.sh
+
+cleanup() {
+	echo "Cleaning up"
+	ip link del br0
+	kill $pid
+	killall bash
+	echo "Please check kernel log for errors"
+}
+trap 'cleanup' EXIT
+
+eth=${NETIFS[p1]}
+
+ip link del br0 2&>1 >/dev/null || :
+ip link add br0 type bridge && ip link set $eth master br0
+
+(while :; do
+	bridge fdb add 00:01:02:03:04:05 dev $eth master static
+	bridge fdb del 00:01:02:03:04:05 dev $eth master static
+done) &
+pid=$!
+
+for i in $(seq 1 50); do
+	bridge fdb show > /dev/null
+	sleep 3
+	echo "$((${i} * 2))% complete..."
+done
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-24 11:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-24 11:40 [RFC PATCH net-next 0/8] Drop rtnl_lock from DSA .port_fdb_{add,del} Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 1/8] net: dsa: sja1105: wait for dynamic config command completion on writes too Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 2/8] net: dsa: sja1105: serialize access to the dynamic config interface Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 3/8] net: mscc: ocelot: serialize access to the MAC table Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 4/8] net: dsa: introduce locking for the address lists on CPU and DSA ports Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 5/8] net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 6/8] net: dsa: flush switchdev workqueue when leaving the bridge Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 7/8] selftests: lib: forwarding: allow tests to not require mz and jq Vladimir Oltean
2021-08-24 11:40 ` [RFC PATCH net-next 8/8] selftests: net: dsa: add a stress test for unlocked FDB operations Vladimir Oltean

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.