[PATCH v2 net-next 1/2] net: dsa: mv88e6xxx: Improve performance of busy bit polling

From: Tobias Waldekranz @ 2022-01-28 10:49 UTC
  To: davem, kuba
  Cc: netdev, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, linux-kernel

Avoid a long delay when a busy bit is still set and has to be polled
again.

Measurements on a system with two Opals (6097F) and one Agate (6352)
show that even with this much tighter loop, we have about a 50% chance
of the bit being cleared on the first poll; all other accesses see the
bit cleared on the second poll.

On a standard MDIO bus running MDC at 2.5MHz, a single access with 32
bits of preamble plus 32 bits of data takes 64*(1/2.5MHz) = 25.6us.

This means that mv88e6xxx_smi_direct_wait took 26us + CPU overhead in
the fast scenario, but 26us + 1500us + 26us + CPU overhead in the slow
case - bringing the average close to 1ms.

With this change in place, the slow case is closer to 2*26us + CPU
overhead, with the average well below 100us - a 10x improvement.
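
The arithmetic above can be reproduced with a small user-space sketch.
It is not part of the patch: the helper names are hypothetical, CPU
overhead is ignored, and the 50/50 split between first-poll and
second-poll success is taken from the measurements quoted above.

```c
/* Time for one MDIO access in microseconds: bits clocked / MDC frequency. */
static double mdio_access_us(unsigned int bits, double mdc_hz)
{
	return bits / mdc_hz * 1e6;
}

/*
 * Average wait in microseconds, assuming half of the waits succeed on
 * the first poll and the other half on the second poll, with sleep_us
 * spent between the two polls. CPU overhead is not modelled.
 */
static double avg_wait_us(double access_us, double sleep_us)
{
	double fast = access_us;
	double slow = access_us + sleep_us + access_us;

	return 0.5 * fast + 0.5 * slow;
}
```

With 64 bits at 2.5MHz, mdio_access_us() gives 25.6us. Using the
1500us sleep from the text, avg_wait_us(25.6, 1500) comes to about
788us, and avg_wait_us(25.6, 0) to about 38us, in line with the
"close to 1ms" and "well below 100us" figures once CPU overhead is
added.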

This translates to real-world gains. On a 3-chip, 20-port system,
the modprobe time drops by about 86%:

Before:

root@coronet:~# time modprobe mv88e6xxx
real    0m 15.99s
user    0m 0.00s
sys     0m 1.52s

After:

root@coronet:~# time modprobe mv88e6xxx
real    0m 2.21s
user    0m 0.00s
sys     0m 1.54s

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 10 +++++++---
 drivers/net/dsa/mv88e6xxx/smi.c  |  8 ++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)
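
For illustration only, the shape of the new wait loops (a deadline
plus a guaranteed second attempt) can be sketched in user space. This
is an assumption-laden stand-in, not the driver code: poll_mask(),
read_fn, struct fake_reg and fake_read() are hypothetical names,
clock_gettime(CLOCK_MONOTONIC) replaces jiffies/time_before(), and
cpu_relax() has no equivalent here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register read: returns 0 and stores the value, or a negative errno. */
typedef int (*read_fn)(void *ctx, uint16_t *data);

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll until (data & mask) == val. Keep polling while the deadline has
 * not passed, but always make at least two attempts, so that a slow
 * first read cannot by itself cause a timeout.
 */
static int poll_mask(read_fn read, void *ctx, uint16_t mask, uint16_t val,
		     uint64_t timeout_ns)
{
	const uint64_t deadline = now_ns() + timeout_ns;
	uint16_t data;
	int err;
	int i;

	for (i = 0; now_ns() < deadline || i < 2; i++) {
		err = read(ctx, &data);
		if (err)
			return err;

		if ((data & mask) == val)
			return 0;
	}

	return -ETIMEDOUT;
}

/* Hypothetical fake register: the busy bit (bit 15) stays set for busy_reads reads. */
struct fake_reg {
	int busy_reads;
	int reads;
};

static int fake_read(void *ctx, uint16_t *data)
{
	struct fake_reg *r = ctx;

	r->reads++;
	*data = (r->reads <= r->busy_reads) ? 0x8000 : 0;
	return 0;
}
```

Calling poll_mask(fake_read, &reg, 0x8000, 0, 50000000ull) on a fake
register that is busy for one read returns 0 after the second read.
With a zero timeout and a register that never clears, the loop still
performs exactly two reads before returning -ETIMEDOUT.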

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 58ca684d73f7..de8a568a8c53 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -86,12 +86,16 @@ int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val)
 int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
 			u16 mask, u16 val)
 {
+	const unsigned long timeout = jiffies + msecs_to_jiffies(50);
 	u16 data;
 	int err;
 	int i;
 
-	/* There's no bus specific operation to wait for a mask */
-	for (i = 0; i < 16; i++) {
+	/* There's no bus specific operation to wait for a mask. Even
+	 * if the initial poll takes longer than 50ms, always do at
+	 * least one more attempt.
+	 */
+	for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) {
 		err = mv88e6xxx_read(chip, addr, reg, &data);
 		if (err)
 			return err;
@@ -99,7 +103,7 @@ int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
 		if ((data & mask) == val)
 			return 0;
 
-		usleep_range(1000, 2000);
+		cpu_relax();
 	}
 
 	dev_err(chip->dev, "Timeout while waiting for switch\n");
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 282fe08db050..466d2aaa9fcb 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -55,11 +55,15 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
 static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip,
 				     int dev, int reg, int bit, int val)
 {
+	const unsigned long timeout = jiffies + msecs_to_jiffies(50);
 	u16 data;
 	int err;
 	int i;
 
-	for (i = 0; i < 16; i++) {
+	/* Even if the initial poll takes longer than 50ms, always do
+	 * at least one more attempt.
+	 */
+	for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) {
 		err = mv88e6xxx_smi_direct_read(chip, dev, reg, &data);
 		if (err)
 			return err;
@@ -67,7 +71,7 @@ static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip,
 		if (!!(data & BIT(bit)) == !!val)
 			return 0;
 
-		usleep_range(1000, 2000);
+		cpu_relax();
 	}
 
 	return -ETIMEDOUT;
-- 
2.25.1

