linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands
@ 2022-09-05 17:01 Vladimir Oltean
  2022-09-05 17:01 ` [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet Vladimir Oltean
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Vladimir Oltean @ 2022-09-05 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Xiaoliang Yang, Claudiu Manoil, Alexandre Belloni,
	UNGLinuxDriver, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Michael Walle, Vinicius Costa Gomes, Maxim Kochetkov,
	Colin Foster, Richie Pearn, linux-kernel

This series fixes some bugs which are not quite new, but date from v5.13
when static guard bands were enabled by Michael Walle to prevent
tc-taprio overruns.

The investigation started when Xiaoliang asked privately what is the
expected max SDU for a traffic class when its minimum gate interval is
10 us. The answer, as it turns out, is not an L1 size of 1250 octets,
but 1245 octets, since otherwise, the switch will not consider frames
for egress scheduling, because the static guard band is exactly as large
as the time interval. The switch needs a minimum of 33 ns outside of the
guard band to consider a frame for scheduling, and the reduction of the
max SDU by 5 provides exactly for that.

The fix for that (patch 1/3) is relatively small, but during testing, it
became apparent that cut-through forwarding prevents oversized frame
dropping from working properly. This is solved through the larger patch
2/3. Finally, patch 3/3 fixes one more tc-taprio locking problem found
through code inspection.

Vladimir Oltean (3):
  net: dsa: felix: tc-taprio intervals smaller than MTU should send at
    least one packet
  net: dsa: felix: disable cut-through forwarding for frames oversized
    for tc-taprio
  net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in
    vsc9959_sched_speed_set

 drivers/net/dsa/ocelot/felix_vsc9959.c | 161 +++++++++++++++++--------
 1 file changed, 112 insertions(+), 49 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
  2022-09-05 17:01 [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands Vladimir Oltean
@ 2022-09-05 17:01 ` Vladimir Oltean
  2022-09-05 22:53   ` Michael Walle
  2022-09-05 17:01 ` [PATCH v2 net 2/3] net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio Vladimir Oltean
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2022-09-05 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Xiaoliang Yang, Claudiu Manoil, Alexandre Belloni,
	UNGLinuxDriver, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Michael Walle, Vinicius Costa Gomes, Maxim Kochetkov,
	Colin Foster, Richie Pearn, linux-kernel

The blamed commit broke tc-taprio schedules such as this one:

tc qdisc replace dev $swp1 root taprio \
        num_tc 8 \
        map 0 1 2 3 4 5 6 7 \
        queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
        base-time 0 \
        sched-entry S 0x7f 990000 \
        sched-entry S 0x80  10000 \
        flags 0x2

because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.

Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
we need to discuss what happens with TC 7 depending on kernel version,
since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
oversized frames with tc-taprio instead of hanging the port"), did not
touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.

1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
  1518, and at gigabit this introduces a static guard band (independent
  of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
  time of 20 octets => 160 ns). But this is larger than the time window
  itself, of 10000 ns. So, the queue system never considers a frame with
  TC 7 as eligible for transmission, since the gate practically never
  opens, and these frames are forever stuck in the TX queues and hang
  the port.

2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
  enabling oversized frame dropping, we make an effort to set
  QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
  one more role, which we did not take into account: per-tc static guard
  band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
  There is a discrepancy between what the driver thinks (that there is
  no guard band, and 100% of min_gate_len[tc] is available for egress
  scheduling) and what the hardware actually does (crops the equivalent
  of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
  means that the hardware thinks it has exactly 0 ns for scheduling tc 7.

In both cases, even minimum sized Ethernet frames are stuck on egress
rather than being considered for scheduling on TC 7, even if they would
fit given a proper configuration. Considering the current situation,
with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
octets in size are not eligible for oversized dropping (because they are
smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
for scheduling either, because the min_gate_len[7] (10000 ns) minus the
guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
octet == 9840 ns) minus the guard band auto-added for L1 overhead by
QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 octets)
leaves 0 ns for scheduling in the queue system proper.

Investigating the hardware behavior, it becomes apparent that the queue
system needs precisely 33 ns of 'gate open' time in order to consider a
frame as eligible for scheduling to a tc. So the solution to this
problem is to amend vsc9959_tas_guard_bands_update(), by giving the
per-tc guard bands less space by exactly 33 ns, just enough for one
frame to be scheduled in that interval. This allows the queue system to
make forward progress for that port-tc, and prevents it from hanging.

Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2:
- add Xiaoliang's Reported-by: tag, since he did report the correct
  issue after all (my bad)
- reword commit message and explanations
- reserve just 33 ns of min_gate_len[tc] as useful space for scheduling,
  rather than one whole max MTU frame worth of time. In the given
  example of 10 us tc-taprio interval, this raises our useful max SDU
  to 1225, compared to 601 octets in v1.

 drivers/net/dsa/ocelot/felix_vsc9959.c | 35 +++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 1cdce8a98d1d..262be2cf4e4b 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -22,6 +22,7 @@
 #define VSC9959_NUM_PORTS		6
 
 #define VSC9959_TAS_GCL_ENTRY_MAX	63
+#define VSC9959_TAS_MIN_GATE_LEN_NS	33
 #define VSC9959_VCAP_POLICER_BASE	63
 #define VSC9959_VCAP_POLICER_MAX	383
 #define VSC9959_SWITCH_PCI_BAR		4
@@ -1478,6 +1479,23 @@ static void vsc9959_mdio_bus_free(struct ocelot *ocelot)
 	mdiobus_free(felix->imdio);
 }
 
+/* The switch considers any frame (regardless of size) as eligible for
+ * transmission if the traffic class gate is open for at least 33 ns.
+ * Overruns are prevented by cropping an interval at the end of the gate time
+ * slot for which egress scheduling is blocked, but we need to still keep 33 ns
+ * available for one packet to be transmitted, otherwise the port tc will hang.
+ * This function returns the size of a gate interval that remains available for
+ * setting the guard band, after reserving the space for one egress frame.
+ */
+static u64 vsc9959_tas_remaining_gate_len_ps(u64 gate_len_ns)
+{
+	/* Gate always open */
+	if (gate_len_ns == U64_MAX)
+		return U64_MAX;
+
+	return (gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
+}
+
 /* Extract shortest continuous gate open intervals in ns for each traffic class
  * of a cyclic tc-taprio schedule. If a gate is always open, the duration is
  * considered U64_MAX. If the gate is always closed, it is considered 0.
@@ -1596,10 +1614,13 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
 	vsc9959_tas_min_gate_lengths(ocelot_port->taprio, min_gate_len);
 
 	for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
+		u64 remaining_gate_len_ps;
 		u32 max_sdu;
 
-		if (min_gate_len[tc] == U64_MAX /* Gate always open */ ||
-		    min_gate_len[tc] * PSEC_PER_NSEC > needed_bit_time_ps) {
+		remaining_gate_len_ps =
+			vsc9959_tas_remaining_gate_len_ps(min_gate_len[tc]);
+
+		if (remaining_gate_len_ps > needed_bit_time_ps) {
 			/* Setting QMAXSDU_CFG to 0 disables oversized frame
 			 * dropping.
 			 */
@@ -1612,9 +1633,15 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
 			/* If traffic class doesn't support a full MTU sized
 			 * frame, make sure to enable oversize frame dropping
 			 * for frames larger than the smallest that would fit.
+			 *
+			 * However, the exact same register, QSYS_QMAXSDU_CFG_*,
+			 * controls not only oversized frame dropping, but also
+			 * per-tc static guard band lengths, so it reduces the
+			 * useful gate interval length. Therefore, be careful
+			 * to calculate a guard band (and therefore max_sdu)
+			 * that still leaves 33 ns available in the time slot.
 			 */
-			max_sdu = div_u64(min_gate_len[tc] * PSEC_PER_NSEC,
-					  picos_per_byte);
+			max_sdu = div_u64(remaining_gate_len_ps, picos_per_byte);
 			/* A TC gate may be completely closed, which is a
 			 * special case where all packets are oversized.
 			 * Any limit smaller than 64 octets accomplishes this
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 net 2/3] net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
  2022-09-05 17:01 [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands Vladimir Oltean
  2022-09-05 17:01 ` [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet Vladimir Oltean
@ 2022-09-05 17:01 ` Vladimir Oltean
  2022-09-05 17:01 ` [PATCH v2 net 3/3] net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set Vladimir Oltean
  2022-09-07 13:00 ` [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands patchwork-bot+netdevbpf
  3 siblings, 0 replies; 8+ messages in thread
From: Vladimir Oltean @ 2022-09-05 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Xiaoliang Yang, Claudiu Manoil, Alexandre Belloni,
	UNGLinuxDriver, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Michael Walle, Vinicius Costa Gomes, Maxim Kochetkov,
	Colin Foster, Richie Pearn, linux-kernel

Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
frames even way larger than 601 octets are transmitted even though these
should be considered as oversized, according to the documentation, and
dropped.

Since oversized frame dropping depends on frame size, which is only
known at the EOF stage, and therefore not at SOF when cut-through
forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
into consideration for traffic classes that are cut-through.

Since cut-through forwarding has no UAPI to control it, and the driver
enables it based on the mantra "if we can, then why not", the strategy
is to alter vsc9959_cut_through_fwd() to take into consideration which
tc's have oversize frame dropping enabled, and disable cut-through for
them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
cut-through determination process.

There are 2 strategies for vsc9959_cut_through_fwd() to determine
whether a tc has oversized dropping enabled or not. One is to keep a bit
mask of traffic classes per port, and the other is to read back from the
hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
feature is enabled). We choose reading back from registers, because
struct ocelot_port is shared with drivers (ocelot, seville) that don't
support either cut-through nor tc-taprio, and we don't have a felix
specific extension of struct ocelot_port. Furthermore, reading registers
from the Felix hardware is quite cheap, since they are memory-mapped.

Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2: none

 drivers/net/dsa/ocelot/felix_vsc9959.c | 122 ++++++++++++++++---------
 1 file changed, 79 insertions(+), 43 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 262be2cf4e4b..ad7c795d6612 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1557,6 +1557,65 @@ static void vsc9959_tas_min_gate_lengths(struct tc_taprio_qopt_offload *taprio,
 			min_gate_len[tc] = 0;
 }
 
+/* ocelot_write_rix is a macro that concatenates QSYS_MAXSDU_CFG_* with _RSZ,
+ * so we need to spell out the register access to each traffic class in helper
+ * functions, to simplify callers
+ */
+static void vsc9959_port_qmaxsdu_set(struct ocelot *ocelot, int port, int tc,
+				     u32 max_sdu)
+{
+	switch (tc) {
+	case 0:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_0,
+				 port);
+		break;
+	case 1:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_1,
+				 port);
+		break;
+	case 2:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_2,
+				 port);
+		break;
+	case 3:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_3,
+				 port);
+		break;
+	case 4:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_4,
+				 port);
+		break;
+	case 5:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_5,
+				 port);
+		break;
+	case 6:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_6,
+				 port);
+		break;
+	case 7:
+		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_7,
+				 port);
+		break;
+	}
+}
+
+static u32 vsc9959_port_qmaxsdu_get(struct ocelot *ocelot, int port, int tc)
+{
+	switch (tc) {
+	case 0: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_0, port);
+	case 1: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_1, port);
+	case 2: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_2, port);
+	case 3: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_3, port);
+	case 4: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_4, port);
+	case 5: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_5, port);
+	case 6: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_6, port);
+	case 7: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_7, port);
+	default:
+		return 0;
+	}
+}
+
 /* Update QSYS_PORT_MAX_SDU to make sure the static guard bands added by the
  * switch (see the ALWAYS_GUARD_BAND_SCH_Q comment) are correct at all MTU
  * values (the default value is 1518). Also, for traffic class windows smaller
@@ -1613,6 +1672,8 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
 
 	vsc9959_tas_min_gate_lengths(ocelot_port->taprio, min_gate_len);
 
+	mutex_lock(&ocelot->fwd_domain_lock);
+
 	for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
 		u64 remaining_gate_len_ps;
 		u32 max_sdu;
@@ -1664,47 +1725,14 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
 				 max_sdu);
 		}
 
-		/* ocelot_write_rix is a macro that concatenates
-		 * QSYS_MAXSDU_CFG_* with _RSZ, so we need to spell out
-		 * the writes to each traffic class
-		 */
-		switch (tc) {
-		case 0:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_0,
-					 port);
-			break;
-		case 1:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_1,
-					 port);
-			break;
-		case 2:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_2,
-					 port);
-			break;
-		case 3:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_3,
-					 port);
-			break;
-		case 4:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_4,
-					 port);
-			break;
-		case 5:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_5,
-					 port);
-			break;
-		case 6:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_6,
-					 port);
-			break;
-		case 7:
-			ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_7,
-					 port);
-			break;
-		}
+		vsc9959_port_qmaxsdu_set(ocelot, port, tc, max_sdu);
 	}
 
 	ocelot_write_rix(ocelot, maxlen, QSYS_PORT_MAX_SDU, port);
+
+	ocelot->ops->cut_through_fwd(ocelot);
+
+	mutex_unlock(&ocelot->fwd_domain_lock);
 }
 
 static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port,
@@ -2797,7 +2825,7 @@ static void vsc9959_cut_through_fwd(struct ocelot *ocelot)
 {
 	struct felix *felix = ocelot_to_felix(ocelot);
 	struct dsa_switch *ds = felix->ds;
-	int port, other_port;
+	int tc, port, other_port;
 
 	lockdep_assert_held(&ocelot->fwd_domain_lock);
 
@@ -2841,19 +2869,27 @@ static void vsc9959_cut_through_fwd(struct ocelot *ocelot)
 				min_speed = other_ocelot_port->speed;
 		}
 
-		/* Enable cut-through forwarding for all traffic classes. */
-		if (ocelot_port->speed == min_speed)
+		/* Enable cut-through forwarding for all traffic classes that
+		 * don't have oversized dropping enabled, since this check is
+		 * bypassed in cut-through mode.
+		 */
+		if (ocelot_port->speed == min_speed) {
 			val = GENMASK(7, 0);
 
+			for (tc = 0; tc < OCELOT_NUM_TC; tc++)
+				if (vsc9959_port_qmaxsdu_get(ocelot, port, tc))
+					val &= ~BIT(tc);
+		}
+
 set:
 		tmp = ocelot_read_rix(ocelot, ANA_CUT_THRU_CFG, port);
 		if (tmp == val)
 			continue;
 
 		dev_dbg(ocelot->dev,
-			"port %d fwd mask 0x%lx speed %d min_speed %d, %s cut-through forwarding\n",
+			"port %d fwd mask 0x%lx speed %d min_speed %d, %s cut-through forwarding on TC mask 0x%x\n",
 			port, mask, ocelot_port->speed, min_speed,
-			val ? "enabling" : "disabling");
+			val ? "enabling" : "disabling", val);
 
 		ocelot_write_rix(ocelot, val, ANA_CUT_THRU_CFG, port);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 net 3/3] net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
  2022-09-05 17:01 [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands Vladimir Oltean
  2022-09-05 17:01 ` [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet Vladimir Oltean
  2022-09-05 17:01 ` [PATCH v2 net 2/3] net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio Vladimir Oltean
@ 2022-09-05 17:01 ` Vladimir Oltean
  2022-09-07 13:00 ` [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands patchwork-bot+netdevbpf
  3 siblings, 0 replies; 8+ messages in thread
From: Vladimir Oltean @ 2022-09-05 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Xiaoliang Yang, Claudiu Manoil, Alexandre Belloni,
	UNGLinuxDriver, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Michael Walle, Vinicius Costa Gomes, Maxim Kochetkov,
	Colin Foster, Richie Pearn, linux-kernel

The read-modify-write of QSYS_TAG_CONFIG from vsc9959_sched_speed_set()
runs unlocked with respect to the other functions that access it, which
are vsc9959_tas_guard_bands_update(), vsc9959_qos_port_tas_set() and
vsc9959_tas_clock_adjust(). All the others are under ocelot->tas_lock,
so move the vsc9959_sched_speed_set() access under that lock as well, to
resolve the concurrency.

Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2: none

 drivers/net/dsa/ocelot/felix_vsc9959.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index ad7c795d6612..f8f19a85744c 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1759,13 +1759,13 @@ static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port,
 		break;
 	}
 
+	mutex_lock(&ocelot->tas_lock);
+
 	ocelot_rmw_rix(ocelot,
 		       QSYS_TAG_CONFIG_LINK_SPEED(tas_speed),
 		       QSYS_TAG_CONFIG_LINK_SPEED_M,
 		       QSYS_TAG_CONFIG, port);
 
-	mutex_lock(&ocelot->tas_lock);
-
 	if (ocelot_port->taprio)
 		vsc9959_tas_guard_bands_update(ocelot, port);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
  2022-09-05 17:01 ` [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet Vladimir Oltean
@ 2022-09-05 22:53   ` Michael Walle
  2022-09-06  0:11     ` Vladimir Oltean
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Walle @ 2022-09-05 22:53 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xiaoliang Yang, Claudiu Manoil, Alexandre Belloni,
	UNGLinuxDriver, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vinicius Costa Gomes, Maxim Kochetkov, Colin Foster,
	Richie Pearn, linux-kernel

Am 2022-09-05 19:01, schrieb Vladimir Oltean:
> The blamed commit broke tc-taprio schedules such as this one:
> 
> tc qdisc replace dev $swp1 root taprio \
>         num_tc 8 \
>         map 0 1 2 3 4 5 6 7 \
>         queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
>         base-time 0 \
>         sched-entry S 0x7f 990000 \
>         sched-entry S 0x80  10000 \
>         flags 0x2
> 
> because the gate entry for TC 7 (S 0x80 10000 ns) now has a static 
> guard
> band added earlier than its 'gate close' event, such that packet
> overruns won't occur in the worst case of the largest packet possible.
> 
> Since guard bands are statically determined based on the per-tc
> QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
> we need to discuss what happens with TC 7 depending on kernel version,
> since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
> oversized frames with tc-taprio instead of hanging the port"), did not
> touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
> 
> 1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults 
> to
>   1518, and at gigabit this introduces a static guard band (independent
>   of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
>   time of 20 octets => 160 ns). But this is larger than the time window
>   itself, of 10000 ns. So, the queue system never considers a frame 
> with
>   TC 7 as eligible for transmission, since the gate practically never
>   opens, and these frames are forever stuck in the TX queues and hang
>   the port.
> 
> 2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
>   enabling oversized frame dropping, we make an effort to set
>   QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
>   one more role, which we did not take into account: per-tc static 
> guard
>   band, expressed in L2 byte time (auto-adjusted for FCS and L1 
> overhead).
>   There is a discrepancy between what the driver thinks (that there is
>   no guard band, and 100% of min_gate_len[tc] is available for egress
>   scheduling) and what the hardware actually does (crops the equivalent
>   of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
>   means that the hardware thinks it has exactly 0 ns for scheduling tc 
> 7.
> 
> In both cases, even minimum sized Ethernet frames are stuck on egress
> rather than being considered for scheduling on TC 7, even if they would
> fit given a proper configuration. Considering the current situation,
> with vsc9959_tas_guard_bands_update(), frames between 60 octets and 
> 1230
> octets in size are not eligible for oversized dropping (because they 
> are
> smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
> for scheduling either, because the min_gate_len[7] (10000 ns) minus the
> guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
> octet == 9840 ns) minus the guard band auto-added for L1 overhead by
> QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 octets)
> leaves 0 ns for scheduling in the queue system proper.
> 
> Investigating the hardware behavior, it becomes apparent that the queue
> system needs precisely 33 ns of 'gate open' time in order to consider a
> frame as eligible for scheduling to a tc. So the solution to this
> problem is to amend vsc9959_tas_guard_bands_update(), by giving the
> per-tc guard bands less space by exactly 33 ns, just enough for one
> frame to be scheduled in that interval. This allows the queue system to
> make forward progress for that port-tc, and prevents it from hanging.
> 
> Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
> Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

I haven't looked at the overall code, but the solution described
above sounds good.

FWIW, I don't think such a schedule, where exactly one frame
can be sent, is very likely in the wild though. Imagine a piece
of software is generating one frame per cycle. It might happen
that during one (hardware) cycle there is no frame ready (because
it is software and it jitters), but then in the next cycle, there
are now two frames ready. In that case you'll always lag one frame
behind and you'll never recover from it.

Either I'd make sure I can send at two frames in one cycle, or
my software would only send a frame every other cycle.

Thanks for taking care of this!

-michael

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
  2022-09-05 22:53   ` Michael Walle
@ 2022-09-06  0:11     ` Vladimir Oltean
  2022-09-06  7:17       ` Michael Walle
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2022-09-06  0:11 UTC (permalink / raw)
  To: Michael Walle
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xiaoliang Yang, Claudiu Manoil, Alexandre Belloni,
	UNGLinuxDriver, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vinicius Costa Gomes, Maxim Kochetkov, Colin Foster,
	Richie Pearn, linux-kernel

On Tue, Sep 06, 2022 at 12:53:20AM +0200, Michael Walle wrote:
> I haven't looked at the overall code, but the solution described
> above sounds good.
> 
> FWIW, I don't think such a schedule, where exactly one frame
> can be sent, is very likely in the wild though. Imagine a piece
> of software is generating one frame per cycle. It might happen
> that during one (hardware) cycle there is no frame ready (because
> it is software and it jitters), but then in the next cycle, there
> are now two frames ready. In that case you'll always lag one frame
> behind and you'll never recover from it.
> 
> Either I'd make sure I can send at two frames in one cycle, or
> my software would only send a frame every other cycle.

A 10 us interval is a 10 us interval, it shouldn't matter if you slice
it up as one 1250B frame, or two 500B frames, or four 200B frames, etc.
Except with the Microchip hardware implementation, it does. In v1, we
were slicing the 10 us interval in half for useful traffic and half for
the guard band. So we could fit more small packets in 5 us. In v2, at
your proposal, we are slicing it in 33 ns for the useful traffic, and
10 us - 33 ns for the guard band. This indeed allows for a single
packet, be it big or small. It's how the hardware works; without any
other input data point, a slicing point needs to be put somewhere.
Somehow it's just as arbitrary in v2 as where it was in v1, just
optimized for a different metric which you're now saying is less practical.

By the way, I was a fool in last year's discussion on guard bands for
saying that there isn't any way for the user to control per-tc MTU.
IEEE 802.1Qbv, later standardized as IEEE 802.1Q clause 8.6.8.4
Enhancements for scheduled traffic, does contain a queueMaxSDUTable
structure with queueMaxSDU elements. I guess I have no choice except to
add this to the tc-taprio UAPI in a net-next patch, because as I've
explained above, even though I've solved the port hanging issue, this
hardware needs more fine tuning to obtain a differentiation between many
small packets vs few large packets per interval.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
  2022-09-06  0:11     ` Vladimir Oltean
@ 2022-09-06  7:17       ` Michael Walle
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Walle @ 2022-09-06  7:17 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xiaoliang Yang, Claudiu Manoil, Alexandre Belloni,
	UNGLinuxDriver, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vinicius Costa Gomes, Maxim Kochetkov, Colin Foster,
	Richie Pearn, linux-kernel

Am 2022-09-06 02:11, schrieb Vladimir Oltean:
> On Tue, Sep 06, 2022 at 12:53:20AM +0200, Michael Walle wrote:
>> I haven't looked at the overall code, but the solution described
>> above sounds good.
>> 
>> FWIW, I don't think such a schedule, where exactly one frame
>> can be sent, is very likely in the wild though. Imagine a piece
>> of software is generating one frame per cycle. It might happen
>> that during one (hardware) cycle there is no frame ready (because
>> it is software and it jitters), but then in the next cycle, there
>> are now two frames ready. In that case you'll always lag one frame
>> behind and you'll never recover from it.
>> 
>> Either I'd make sure I can send at two frames in one cycle, or
>> my software would only send a frame every other cycle.
> 
> A 10 us interval is a 10 us interval, it shouldn't matter if you slice
> it up as one 1250B frame, or two 500B frames, or four 200B frames, etc.
> Except with the Microchip hardware implementation, it does. In v1, we
> were slicing the 10 us interval in half for useful traffic and half for
> the guard band. So we could fit more small packets in 5 us. In v2, at
> your proposal, we are slicing it in 33 ns for the useful traffic, and
> 10 us - 33 ns for the guard band. This indeed allows for a single
> packet, be it big or small. It's how the hardware works; without any
> other input data point, a slicing point needs to be put somewhere.
> Somehow it's just as arbitrary in v2 as where it was in v1, just
> optimized for a different metric which you're now saying is less 
> practical.

I actually checked the code before writing and saw that one could
change the guard band by setting the MTU of the interface. I though,
"ah ok, then there is no issue". After sleeping, I noticed that you'd
restrict the size of all the frames on the interface. Doh ;)

-michael

> By the way, I was a fool in last year's discussion on guard bands for
> saying that there isn't any way for the user to control per-tc MTU.
> IEEE 802.1Qbv, later standardized as IEEE 802.1Q clause 8.6.8.4
> Enhancements for scheduled traffic, does contain a queueMaxSDUTable
> structure with queueMaxSDU elements. I guess I have no choice except to
> add this to the tc-taprio UAPI in a net-next patch, because as I've
> explained above, even though I've solved the port hanging issue, this
> hardware needs more fine tuning to obtain a differentiation between 
> many
> small packets vs few large packets per interval.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands
  2022-09-05 17:01 [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands Vladimir Oltean
                   ` (2 preceding siblings ...)
  2022-09-05 17:01 ` [PATCH v2 net 3/3] net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set Vladimir Oltean
@ 2022-09-07 13:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-07 13:00 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, davem, edumazet, kuba, pabeni, xiaoliang.yang_1,
	claudiu.manoil, alexandre.belloni, UNGLinuxDriver, andrew,
	vivien.didelot, f.fainelli, michael, vinicius.gomes, fido_max,
	colin.foster, richard.pearn, linux-kernel

Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Mon,  5 Sep 2022 20:01:22 +0300 you wrote:
> This series fixes some bugs which are not quite new, but date from v5.13
> when static guard bands were enabled by Michael Walle to prevent
> tc-taprio overruns.
> 
> The investigation started when Xiaoliang asked privately what is the
> expected max SDU for a traffic class when its minimum gate interval is
> 10 us. The answer, as it turns out, is not an L1 size of 1250 octets,
> but 1245 octets, since otherwise, the switch will not consider frames
> for egress scheduling, because the static guard band is exactly as large
> as the time interval. The switch needs a minimum of 33 ns outside of the
> guard band to consider a frame for scheduling, and the reduction of the
> max SDU by 5 provides exactly for that.
> 
> [...]

Here is the summary with links:
  - [v2,net,1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
    https://git.kernel.org/netdev/net/c/11afdc6526de
  - [v2,net,2/3] net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
    https://git.kernel.org/netdev/net/c/843794bbdef8
  - [v2,net,3/3] net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
    https://git.kernel.org/netdev/net/c/a4bb481aeb9d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-07 13:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-05 17:01 [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands Vladimir Oltean
2022-09-05 17:01 ` [PATCH v2 net 1/3] net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet Vladimir Oltean
2022-09-05 22:53   ` Michael Walle
2022-09-06  0:11     ` Vladimir Oltean
2022-09-06  7:17       ` Michael Walle
2022-09-05 17:01 ` [PATCH v2 net 2/3] net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio Vladimir Oltean
2022-09-05 17:01 ` [PATCH v2 net 3/3] net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set Vladimir Oltean
2022-09-07 13:00 ` [PATCH v2 net 0/3] Fixes for Felix DSA driver calculation of tc-taprio guard bands patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).