netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] add egress rate limit offload for Marvell 6393X family
@ 2023-06-09 14:18 alexis.lothore
  2023-06-09 14:18 ` [PATCH net-next 1/2] net: dsa: mv88e6xxx: allow driver to hook TC callback alexis.lothore
  2023-06-09 14:18 ` [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family alexis.lothore
  0 siblings, 2 replies; 14+ messages in thread
From: alexis.lothore @ 2023-06-09 14:18 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, Thomas Petazzoni, paul.arola, scott.roberts

From: Alexis Lothoré <alexis.lothore@bootlin.com>

This series aims to give access to egress rate shaping offloading available
on Marvell 88E6393X family (88E6393X/88E6193X/88E6191X/88E6361)

The switch offers a very basic egress rate limiter: rate can be configured
from 64kbps up to 10gbps depending on the model, with some specific
increments depending on the targeted rate, and is "burstless".

Since available controls are quite limited, this series proposes to provide
controls to userspace through a TBF qdisc. Since hardware features do no
completely matches what TBF qidsc expects, some passed parameters (burst,
latency) are simply ignored

- 1st commit allows mv88e6xxx driver to attach the port_setup_tc callback
  in dsa_switch_opts and to dispatch it to any switch implementing it
- 2nd commit add tbf configuration for 88E6393X family

Alexis Lothoré (2):
  net: dsa: mv88e6xxx: allow driver to hook TC callback
  net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family

 drivers/net/dsa/mv88e6xxx/chip.c |  18 ++++++
 drivers/net/dsa/mv88e6xxx/chip.h |   3 +-
 drivers/net/dsa/mv88e6xxx/port.c | 104 +++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/port.h |  17 ++++-
 4 files changed, 139 insertions(+), 3 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next 1/2] net: dsa: mv88e6xxx: allow driver to hook TC callback
  2023-06-09 14:18 [PATCH net-next 0/2] add egress rate limit offload for Marvell 6393X family alexis.lothore
@ 2023-06-09 14:18 ` alexis.lothore
  2023-06-09 14:18 ` [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family alexis.lothore
  1 sibling, 0 replies; 14+ messages in thread
From: alexis.lothore @ 2023-06-09 14:18 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, Thomas Petazzoni, paul.arola, scott.roberts

From: Alexis Lothoré <alexis.lothore@bootlin.com>

Marvell switches have some traffic control offloading capabilities, like
for example per-port rate limiting. Allow driver to benefit from this
offloading by hooking TC setup callback in DSA

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 17 +++++++++++++++++
 drivers/net/dsa/mv88e6xxx/chip.h |  3 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8b51756bd805..0f1ae2aeaf00 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -7091,6 +7091,22 @@ static int mv88e6xxx_crosschip_lag_leave(struct dsa_switch *ds, int sw_index,
 	return err_sync ? : err_pvt;
 }
 
+static int mv88e6xxx_port_setup_tc(struct dsa_switch *ds, int port,
+				   enum tc_setup_type type, void *type_data)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	if (!chip->info->ops->port_setup_tc)
+		return -EOPNOTSUPP;
+
+	mv88e6xxx_reg_lock(chip);
+	err = chip->info->ops->port_setup_tc(ds, port, type, type_data);
+	mv88e6xxx_reg_unlock(chip);
+
+	return err;
+}
+
 static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.get_tag_protocol	= mv88e6xxx_get_tag_protocol,
 	.change_tag_protocol	= mv88e6xxx_change_tag_protocol,
@@ -7158,6 +7174,7 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.crosschip_lag_change	= mv88e6xxx_crosschip_lag_change,
 	.crosschip_lag_join	= mv88e6xxx_crosschip_lag_join,
 	.crosschip_lag_leave	= mv88e6xxx_crosschip_lag_leave,
+	.port_setup_tc		= mv88e6xxx_port_setup_tc,
 };
 
 static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 0ad34b2d8913..b259671b2cd7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -560,7 +560,8 @@ struct mv88e6xxx_ops {
 	 */
 	int (*port_set_upstream_port)(struct mv88e6xxx_chip *chip, int port,
 				      int upstream_port);
-
+	int (*port_setup_tc)(struct dsa_switch *ds, int port,
+			     enum tc_setup_type type, void *type_data);
 	/* Snapshot the statistics for a port. The statistics can then
 	 * be read back a leisure but still with a consistent view.
 	 */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 14:18 [PATCH net-next 0/2] add egress rate limit offload for Marvell 6393X family alexis.lothore
  2023-06-09 14:18 ` [PATCH net-next 1/2] net: dsa: mv88e6xxx: allow driver to hook TC callback alexis.lothore
@ 2023-06-09 14:18 ` alexis.lothore
  2023-06-09 14:53   ` Andrew Lunn
                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: alexis.lothore @ 2023-06-09 14:18 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, Thomas Petazzoni, paul.arola, scott.roberts

From: Alexis Lothoré <alexis.lothore@bootlin.com>

The 6393x switches family has a basic, per-port egress rate limit
capability. This feature allows to configure any rate limit between 64kbps
and 10Gbps. This rate limit is said to be "burstless"
The switch offers the following controls, per port:
- count mode: frames, L1 bytes, L2 bytes, L3 bytes
- egress rate: recommended fixed values to be programmed, depending on
  actually targeted rate:
  - 64 kbps for rate between 64kbps and 1Mbps
  - 1 Mbps for rate between 1Mbps and 100Mbps
  - 10 Mbps for rate between 100Mbps and 1Gbps
  - 100Mbps for rate between 1Gbps and 10Gbps
- egress decrement rate, as number of steps programmed in egress rate
- an optional frame overhead count(0 to 60), which will be added to counted
  bytes to adjust (decrease) rate limiting. Could be used for example to
  avoid saturating a receiving side which add more encapsulation to frames

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c |   1 +
 drivers/net/dsa/mv88e6xxx/port.c | 104 +++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/port.h |  17 ++++-
 3 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 0f1ae2aeaf00..901698513f26 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -5633,6 +5633,7 @@ static const struct mv88e6xxx_ops mv88e6393x_ops = {
 	.port_set_cmode = mv88e6393x_port_set_cmode,
 	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.port_set_upstream_port = mv88e6393x_port_set_upstream_port,
+	.port_setup_tc = mv88e6393x_port_setup_tc,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index dd66ec902d4c..b2f0087807ad 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -12,12 +12,16 @@
 #include <linux/if_bridge.h>
 #include <linux/phy.h>
 #include <linux/phylink.h>
+#include <net/pkt_cls.h>
 
 #include "chip.h"
 #include "global2.h"
 #include "port.h"
 #include "serdes.h"
 
+#define MBPS_TO_KBPS(x)	((x) * 1000)
+#define GBPS_TO_KBPS(x)	(MBPS_TO_KBPS(x) * 1000)
+
 int mv88e6xxx_port_read(struct mv88e6xxx_chip *chip, int port, int reg,
 			u16 *val)
 {
@@ -1497,6 +1501,106 @@ int mv88e6393x_port_set_upstream_port(struct mv88e6xxx_chip *chip, int port,
 	return mv88e6393x_port_policy_write(chip, port, ptr, data);
 }
 
+int mv88e6393x_tbf_add(struct mv88e6xxx_chip *chip, int port,
+		       struct tc_tbf_qopt_offload_replace_params *replace_params)
+{
+	int rate_kbps = DIV_ROUND_UP(replace_params->rate.rate_bytes_ps * 8, 1000);
+	int overhead = DIV_ROUND_UP(replace_params->rate.overhead, 4);
+	int rate_step, decrement_rate, err;
+	u16 val;
+
+	if (rate_kbps < MV88E6393X_PORT_EGRESS_RATE_MIN_KBPS ||
+	    rate_kbps >= MV88E6393X_PORT_EGRESS_RATE_MAX_KBPS)
+		return -EOPNOTSUPP;
+
+	if (replace_params->rate.overhead > MV88E6393X_PORT_EGRESS_MAX_OVERHEAD)
+		return -EOPNOTSUPP;
+
+	/* Switch supports only max rate configuration. There is no
+	 * configurable burst/max size nor latency.
+	 * Formula defining registers value is:
+	 * EgressRate = 8 * EgressDec / (16ns * desired Rate)
+	 * EgressRate is a set of fixed values depending of targeted range
+	 */
+	if (rate_kbps < MBPS_TO_KBPS(1)) {
+		decrement_rate = rate_kbps / 64;
+		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_64_KBPS;
+	} else if (rate_kbps < MBPS_TO_KBPS(100)) {
+		decrement_rate = rate_kbps / MBPS_TO_KBPS(1);
+		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_1_MBPS;
+	} else if (rate_kbps < GBPS_TO_KBPS(1)) {
+		decrement_rate = rate_kbps / MBPS_TO_KBPS(10);
+		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_10_MBPS;
+	} else {
+		decrement_rate = rate_kbps / MBPS_TO_KBPS(100);
+		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_100_MBPS;
+	}
+
+	dev_dbg(chip->dev, "p%d: adding egress tbf qdisc with %dkbps rate",
+		port, rate_kbps);
+	val = decrement_rate;
+	val |= (overhead << MV88E6XXX_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT);
+	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
+				   val);
+	if (err)
+		return err;
+
+	val = rate_step;
+	/* Configure mode to bits per second mode, on layer 1 */
+	val |= MV88E6XXX_PORT_EGRESS_RATE_CTL2_COUNT_L1_BYTES;
+	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
+				   val);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int mv88e6393x_tbf_del(struct mv88e6xxx_chip *chip, int port)
+{
+	int err;
+
+	dev_dbg(chip->dev, "p%d: removing tbf qdisc", port);
+	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
+				   0x0000);
+	if (err)
+		return err;
+	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
+				    0x0001);
+}
+
+static int mv88e6393x_tc_setup_qdisc_tbf(struct mv88e6xxx_chip *chip, int port,
+					 struct tc_tbf_qopt_offload *qopt)
+{
+	/* Device only supports per-port egress rate limiting */
+	if (qopt->parent != TC_H_ROOT)
+		return -EOPNOTSUPP;
+
+	switch (qopt->command) {
+	case TC_TBF_REPLACE:
+		return mv88e6393x_tbf_add(chip, port, &qopt->replace_params);
+	case TC_TBF_DESTROY:
+		return mv88e6393x_tbf_del(chip, port);
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+int mv88e6393x_port_setup_tc(struct dsa_switch *ds, int port,
+			     enum tc_setup_type type, void *type_data)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+
+	switch (type) {
+	case TC_SETUP_QDISC_TBF:
+		return mv88e6393x_tc_setup_qdisc_tbf(chip, port, type_data);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 int mv88e6393x_port_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip)
 {
 	u16 ptr;
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index 86deeb347cbc..791ad335b647 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -222,10 +222,21 @@
 #define MV88E6095_PORT_CTL2_CPU_PORT_MASK		0x000f
 
 /* Offset 0x09: Egress Rate Control */
-#define MV88E6XXX_PORT_EGRESS_RATE_CTL1		0x09
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL1				0x09
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_64_KBPS		0x1E84
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_1_MBPS		0x01F4
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_10_MBPS		0x0032
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_100_MBPS		0x0005
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT	8
+#define MV88E6393X_PORT_EGRESS_RATE_MIN_KBPS			64
+#define MV88E6393X_PORT_EGRESS_RATE_MAX_KBPS			10000000
+#define MV88E6393X_PORT_EGRESS_MAX_OVERHEAD			60
 
 /* Offset 0x0A: Egress Rate Control 2 */
-#define MV88E6XXX_PORT_EGRESS_RATE_CTL2		0x0a
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL2				0x0a
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL2_COUNT_L1_BYTES		0x4000
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL2_COUNT_L2_BYTES		0x8000
+#define MV88E6XXX_PORT_EGRESS_RATE_CTL2_COUNT_L3_BYTES		0xC000
 
 /* Offset 0x0B: Port Association Vector */
 #define MV88E6XXX_PORT_ASSOC_VECTOR			0x0b
@@ -415,6 +426,8 @@ int mv88e6393x_set_egress_port(struct mv88e6xxx_chip *chip,
 			       int port);
 int mv88e6393x_port_set_upstream_port(struct mv88e6xxx_chip *chip, int port,
 				      int upstream_port);
+int mv88e6393x_port_setup_tc(struct dsa_switch *ds, int port,
+			     enum tc_setup_type type, void *type_data);
 int mv88e6393x_port_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip);
 int mv88e6393x_port_set_ether_type(struct mv88e6xxx_chip *chip, int port,
 				   u16 etype);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 14:18 ` [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family alexis.lothore
@ 2023-06-09 14:53   ` Andrew Lunn
  2023-06-09 16:27     ` Alexis Lothoré
  2023-06-09 14:57   ` Vladimir Oltean
  2023-06-09 15:52   ` kernel test robot
  2 siblings, 1 reply; 14+ messages in thread
From: Andrew Lunn @ 2023-06-09 14:53 UTC (permalink / raw)
  To: alexis.lothore
  Cc: Florian Fainelli, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

> +int mv88e6393x_tbf_add(struct mv88e6xxx_chip *chip, int port,
> +		       struct tc_tbf_qopt_offload_replace_params *replace_params)
> +{
> +	int rate_kbps = DIV_ROUND_UP(replace_params->rate.rate_bytes_ps * 8, 1000);
> +	int overhead = DIV_ROUND_UP(replace_params->rate.overhead, 4);
> +	int rate_step, decrement_rate, err;
> +	u16 val;
> +
> +	if (rate_kbps < MV88E6393X_PORT_EGRESS_RATE_MIN_KBPS ||
> +	    rate_kbps >= MV88E6393X_PORT_EGRESS_RATE_MAX_KBPS)
> +		return -EOPNOTSUPP;
> +
> +	if (replace_params->rate.overhead > MV88E6393X_PORT_EGRESS_MAX_OVERHEAD)
> +		return -EOPNOTSUPP;
> +
> +	/* Switch supports only max rate configuration. There is no
> +	 * configurable burst/max size nor latency.

Can you return -EOPNOTSUPP if these values are not 0? That should make
it clear to the user they are not supported.

>  /* Offset 0x09: Egress Rate Control */
> -#define MV88E6XXX_PORT_EGRESS_RATE_CTL1		0x09
> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1				0x09
> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_64_KBPS		0x1E84
> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_1_MBPS		0x01F4
> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_10_MBPS		0x0032
> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_100_MBPS		0x0005
> +#define MV88E6XXXw_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT	8

Are they above values specific to the 6393? Or will they also work for
other families? You use the MV88E6XXX prefix which means they should
be generic across all devices.

	Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 14:18 ` [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family alexis.lothore
  2023-06-09 14:53   ` Andrew Lunn
@ 2023-06-09 14:57   ` Vladimir Oltean
  2023-06-09 16:27     ` Alexis Lothoré
  2023-06-09 15:52   ` kernel test robot
  2 siblings, 1 reply; 14+ messages in thread
From: Vladimir Oltean @ 2023-06-09 14:57 UTC (permalink / raw)
  To: alexis.lothore
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

On Fri, Jun 09, 2023 at 04:18:12PM +0200, alexis.lothore@bootlin.com wrote:
> +int mv88e6393x_tbf_add(struct mv88e6xxx_chip *chip, int port,
> +		       struct tc_tbf_qopt_offload_replace_params *replace_params)
> +{
> +	int rate_kbps = DIV_ROUND_UP(replace_params->rate.rate_bytes_ps * 8, 1000);
> +	int overhead = DIV_ROUND_UP(replace_params->rate.overhead, 4);
> +	int rate_step, decrement_rate, err;
> +	u16 val;
> +
> +	if (rate_kbps < MV88E6393X_PORT_EGRESS_RATE_MIN_KBPS ||
> +	    rate_kbps >= MV88E6393X_PORT_EGRESS_RATE_MAX_KBPS)
> +		return -EOPNOTSUPP;
> +
> +	if (replace_params->rate.overhead > MV88E6393X_PORT_EGRESS_MAX_OVERHEAD)
> +		return -EOPNOTSUPP;

How does tbf react to the driver returning -EOPNOTSUPP? I see tbf_offload_change()
returns void and doesn't check the ndo_setup_tc() return code.

Should we resolve that so that the error code is propagated to the user?

Also, it would be nice to extend struct tc_tbf_qopt_offload with a
netlink extack, for the driver to state exactly the reason for the
offload failure.

Not sure if EOPNOTSUPP is the return code to use here for range checks,
rather than ERANGE.

> +
> +	/* Switch supports only max rate configuration. There is no
> +	 * configurable burst/max size nor latency.
> +	 * Formula defining registers value is:
> +	 * EgressRate = 8 * EgressDec / (16ns * desired Rate)
> +	 * EgressRate is a set of fixed values depending of targeted range
> +	 */
> +	if (rate_kbps < MBPS_TO_KBPS(1)) {
> +		decrement_rate = rate_kbps / 64;
> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_64_KBPS;
> +	} else if (rate_kbps < MBPS_TO_KBPS(100)) {
> +		decrement_rate = rate_kbps / MBPS_TO_KBPS(1);
> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_1_MBPS;
> +	} else if (rate_kbps < GBPS_TO_KBPS(1)) {
> +		decrement_rate = rate_kbps / MBPS_TO_KBPS(10);
> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_10_MBPS;
> +	} else {
> +		decrement_rate = rate_kbps / MBPS_TO_KBPS(100);
> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_100_MBPS;
> +	}
> +
> +	dev_dbg(chip->dev, "p%d: adding egress tbf qdisc with %dkbps rate",
> +		port, rate_kbps);
> +	val = decrement_rate;
> +	val |= (overhead << MV88E6XXX_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT);
> +	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> +				   val);
> +	if (err)
> +		return err;
> +
> +	val = rate_step;
> +	/* Configure mode to bits per second mode, on layer 1 */
> +	val |= MV88E6XXX_PORT_EGRESS_RATE_CTL2_COUNT_L1_BYTES;
> +	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				   val);
> +	if (err)
> +		return err;
> +
> +	return 0;
> +}
> +
> +int mv88e6393x_tbf_del(struct mv88e6xxx_chip *chip, int port)
> +{
> +	int err;
> +
> +	dev_dbg(chip->dev, "p%d: removing tbf qdisc", port);
> +	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				   0x0000);
> +	if (err)
> +		return err;
> +	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> +				    0x0001);

I guess this should return void and proceed on errors, rather than exit early.
Maybe shout out loud that things went wrong.

> +}
> +
> +static int mv88e6393x_tc_setup_qdisc_tbf(struct mv88e6xxx_chip *chip, int port,
> +					 struct tc_tbf_qopt_offload *qopt)
> +{
> +	/* Device only supports per-port egress rate limiting */
> +	if (qopt->parent != TC_H_ROOT)
> +		return -EOPNOTSUPP;
> +
> +	switch (qopt->command) {
> +	case TC_TBF_REPLACE:
> +		return mv88e6393x_tbf_add(chip, port, &qopt->replace_params);
> +	case TC_TBF_DESTROY:
> +		return mv88e6393x_tbf_del(chip, port);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return -EOPNOTSUPP;
> +}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 14:18 ` [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family alexis.lothore
  2023-06-09 14:53   ` Andrew Lunn
  2023-06-09 14:57   ` Vladimir Oltean
@ 2023-06-09 15:52   ` kernel test robot
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-06-09 15:52 UTC (permalink / raw)
  To: alexis.lothore, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: oe-kbuild-all, netdev, linux-kernel, Thomas Petazzoni,
	paul.arola, scott.roberts

Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/alexis-lothore-bootlin-com/net-dsa-mv88e6xxx-allow-driver-to-hook-TC-callback/20230609-222048
base:   net-next/main
patch link:    https://lore.kernel.org/r/20230609141812.297521-3-alexis.lothore%40bootlin.com
patch subject: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20230609/202306092327.Gf6CXGqE-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 12.3.0
reproduce (this is a W=1 build):
        mkdir -p ~/bin
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
        git fetch net-next main
        git checkout net-next/main
        b4 shazam https://lore.kernel.org/r/20230609141812.297521-3-alexis.lothore@bootlin.com
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=alpha olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=alpha SHELL=/bin/bash drivers/net/

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306092327.Gf6CXGqE-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/dsa/mv88e6xxx/port.c:1504:5: warning: no previous prototype for 'mv88e6393x_tbf_add' [-Wmissing-prototypes]
    1504 | int mv88e6393x_tbf_add(struct mv88e6xxx_chip *chip, int port,
         |     ^~~~~~~~~~~~~~~~~~
>> drivers/net/dsa/mv88e6xxx/port.c:1559:5: warning: no previous prototype for 'mv88e6393x_tbf_del' [-Wmissing-prototypes]
    1559 | int mv88e6393x_tbf_del(struct mv88e6xxx_chip *chip, int port)
         |     ^~~~~~~~~~~~~~~~~~


vim +/mv88e6393x_tbf_add +1504 drivers/net/dsa/mv88e6xxx/port.c

  1503	
> 1504	int mv88e6393x_tbf_add(struct mv88e6xxx_chip *chip, int port,
  1505			       struct tc_tbf_qopt_offload_replace_params *replace_params)
  1506	{
  1507		int rate_kbps = DIV_ROUND_UP(replace_params->rate.rate_bytes_ps * 8, 1000);
  1508		int overhead = DIV_ROUND_UP(replace_params->rate.overhead, 4);
  1509		int rate_step, decrement_rate, err;
  1510		u16 val;
  1511	
  1512		if (rate_kbps < MV88E6393X_PORT_EGRESS_RATE_MIN_KBPS ||
  1513		    rate_kbps >= MV88E6393X_PORT_EGRESS_RATE_MAX_KBPS)
  1514			return -EOPNOTSUPP;
  1515	
  1516		if (replace_params->rate.overhead > MV88E6393X_PORT_EGRESS_MAX_OVERHEAD)
  1517			return -EOPNOTSUPP;
  1518	
  1519		/* Switch supports only max rate configuration. There is no
  1520		 * configurable burst/max size nor latency.
  1521		 * Formula defining registers value is:
  1522		 * EgressRate = 8 * EgressDec / (16ns * desired Rate)
  1523		 * EgressRate is a set of fixed values depending of targeted range
  1524		 */
  1525		if (rate_kbps < MBPS_TO_KBPS(1)) {
  1526			decrement_rate = rate_kbps / 64;
  1527			rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_64_KBPS;
  1528		} else if (rate_kbps < MBPS_TO_KBPS(100)) {
  1529			decrement_rate = rate_kbps / MBPS_TO_KBPS(1);
  1530			rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_1_MBPS;
  1531		} else if (rate_kbps < GBPS_TO_KBPS(1)) {
  1532			decrement_rate = rate_kbps / MBPS_TO_KBPS(10);
  1533			rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_10_MBPS;
  1534		} else {
  1535			decrement_rate = rate_kbps / MBPS_TO_KBPS(100);
  1536			rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_100_MBPS;
  1537		}
  1538	
  1539		dev_dbg(chip->dev, "p%d: adding egress tbf qdisc with %dkbps rate",
  1540			port, rate_kbps);
  1541		val = decrement_rate;
  1542		val |= (overhead << MV88E6XXX_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT);
  1543		err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
  1544					   val);
  1545		if (err)
  1546			return err;
  1547	
  1548		val = rate_step;
  1549		/* Configure mode to bits per second mode, on layer 1 */
  1550		val |= MV88E6XXX_PORT_EGRESS_RATE_CTL2_COUNT_L1_BYTES;
  1551		err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
  1552					   val);
  1553		if (err)
  1554			return err;
  1555	
  1556		return 0;
  1557	}
  1558	
> 1559	int mv88e6393x_tbf_del(struct mv88e6xxx_chip *chip, int port)
  1560	{
  1561		int err;
  1562	
  1563		dev_dbg(chip->dev, "p%d: removing tbf qdisc", port);
  1564		err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
  1565					   0x0000);
  1566		if (err)
  1567			return err;
  1568		return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
  1569					    0x0001);
  1570	}
  1571	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 14:57   ` Vladimir Oltean
@ 2023-06-09 16:27     ` Alexis Lothoré
  0 siblings, 0 replies; 14+ messages in thread
From: Alexis Lothoré @ 2023-06-09 16:27 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

Hello Vladimir, thanks for the feedback,

On 6/9/23 16:57, Vladimir Oltean wrote:
> On Fri, Jun 09, 2023 at 04:18:12PM +0200, alexis.lothore@bootlin.com wrote:
>> +int mv88e6393x_tbf_add(struct mv88e6xxx_chip *chip, int port,
>> +		       struct tc_tbf_qopt_offload_replace_params *replace_params)
>> +{
>> +	int rate_kbps = DIV_ROUND_UP(replace_params->rate.rate_bytes_ps * 8, 1000);
>> +	int overhead = DIV_ROUND_UP(replace_params->rate.overhead, 4);
>> +	int rate_step, decrement_rate, err;
>> +	u16 val;
>> +
>> +	if (rate_kbps < MV88E6393X_PORT_EGRESS_RATE_MIN_KBPS ||
>> +	    rate_kbps >= MV88E6393X_PORT_EGRESS_RATE_MAX_KBPS)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (replace_params->rate.overhead > MV88E6393X_PORT_EGRESS_MAX_OVERHEAD)
>> +		return -EOPNOTSUPP;
> 
> How does tbf react to the driver returning -EOPNOTSUPP? I see tbf_offload_change()
> returns void and doesn't check the ndo_setup_tc() return code.
> 
> Should we resolve that so that the error code is propagated to the user?

Indeed, checking some other TC Qdisc, some reports ndo_setup_tc errors (htb,
taprio, ...) to caller, some others do not (red, ets...). I can give it a try
and see the impact
> 
> Also, it would be nice to extend struct tc_tbf_qopt_offload with a
> netlink extack, for the driver to state exactly the reason for the
> offload failure.

ACK, I will add the extack struct

> 
> Not sure if EOPNOTSUPP is the return code to use here for range checks,
> rather than ERANGE.

I was not sure about proper error codes on all those checks. Since all those
errors are about what hardware can handle/can not handle, I felt like EOPNOTSUPP
was the most relevant one. But indeed it may make more sense for user to get
ERANGE here, I will update accordingly
>> +
>> +	/* Switch supports only max rate configuration. There is no
>> +	 * configurable burst/max size nor latency.
>> +	 * Formula defining registers value is:
>> +	 * EgressRate = 8 * EgressDec / (16ns * desired Rate)
>> +	 * EgressRate is a set of fixed values depending of targeted range
>> +	 */
>> +	if (rate_kbps < MBPS_TO_KBPS(1)) {
>> +		decrement_rate = rate_kbps / 64;
>> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_64_KBPS;
>> +	} else if (rate_kbps < MBPS_TO_KBPS(100)) {
>> +		decrement_rate = rate_kbps / MBPS_TO_KBPS(1);
>> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_1_MBPS;
>> +	} else if (rate_kbps < GBPS_TO_KBPS(1)) {
>> +		decrement_rate = rate_kbps / MBPS_TO_KBPS(10);
>> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_10_MBPS;
>> +	} else {
>> +		decrement_rate = rate_kbps / MBPS_TO_KBPS(100);
>> +		rate_step = MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_100_MBPS;
>> +	}
>> +
>> +	dev_dbg(chip->dev, "p%d: adding egress tbf qdisc with %dkbps rate",
>> +		port, rate_kbps);
>> +	val = decrement_rate;
>> +	val |= (overhead << MV88E6XXX_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT);
>> +	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
>> +				   val);
>> +	if (err)
>> +		return err;
>> +
>> +	val = rate_step;
>> +	/* Configure mode to bits per second mode, on layer 1 */
>> +	val |= MV88E6XXX_PORT_EGRESS_RATE_CTL2_COUNT_L1_BYTES;
>> +	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
>> +				   val);
>> +	if (err)
>> +		return err;
>> +
>> +	return 0;
>> +}
>> +
>> +int mv88e6393x_tbf_del(struct mv88e6xxx_chip *chip, int port)
>> +{
>> +	int err;
>> +
>> +	dev_dbg(chip->dev, "p%d: removing tbf qdisc", port);
>> +	err = mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
>> +				   0x0000);
>> +	if (err)
>> +		return err;
>> +	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
>> +				    0x0001);
> 
> I guess this should return void and proceed on errors, rather than exit early.
> Maybe shout out loud that things went wrong.

ACK

> 
>> +}
>> +
>> +static int mv88e6393x_tc_setup_qdisc_tbf(struct mv88e6xxx_chip *chip, int port,
>> +					 struct tc_tbf_qopt_offload *qopt)
>> +{
>> +	/* Device only supports per-port egress rate limiting */
>> +	if (qopt->parent != TC_H_ROOT)
>> +		return -EOPNOTSUPP;
>> +
>> +	switch (qopt->command) {
>> +	case TC_TBF_REPLACE:
>> +		return mv88e6393x_tbf_add(chip, port, &qopt->replace_params);
>> +	case TC_TBF_DESTROY:
>> +		return mv88e6393x_tbf_del(chip, port);
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	return -EOPNOTSUPP;
>> +}

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 14:53   ` Andrew Lunn
@ 2023-06-09 16:27     ` Alexis Lothoré
  2023-06-09 17:16       ` Andrew Lunn
  0 siblings, 1 reply; 14+ messages in thread
From: Alexis Lothoré @ 2023-06-09 16:27 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

Hi Andrew, thanks for the review,

On 6/9/23 16:53, Andrew Lunn wrote:
>> +int mv88e6393x_tbf_add(struct mv88e6xxx_chip *chip, int port,
>> +		       struct tc_tbf_qopt_offload_replace_params *replace_params)
>> +{
>> +	int rate_kbps = DIV_ROUND_UP(replace_params->rate.rate_bytes_ps * 8, 1000);
>> +	int overhead = DIV_ROUND_UP(replace_params->rate.overhead, 4);
>> +	int rate_step, decrement_rate, err;
>> +	u16 val;
>> +
>> +	if (rate_kbps < MV88E6393X_PORT_EGRESS_RATE_MIN_KBPS ||
>> +	    rate_kbps >= MV88E6393X_PORT_EGRESS_RATE_MAX_KBPS)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (replace_params->rate.overhead > MV88E6393X_PORT_EGRESS_MAX_OVERHEAD)
>> +		return -EOPNOTSUPP;
>> +
>> +	/* Switch supports only max rate configuration. There is no
>> +	 * configurable burst/max size nor latency.
> 
> Can you return -EOPNOTSUPP if these values are not 0? That should make
> it clear to the user they are not supported.

Yes, I can do that (or maybe -EINVAL to match Vladimir's comment ?). I think
it's worth mentioning that I encountered an issue regarding those values during
tests: I use tc program to set the tbf, and I observed that tc does not even
reach kernel to set the qdisc if we pass no burst/latency value OR if we set it
to 0. So tc enforces right on userspace side non-zero value for those
parameters, and I have passed random values and ignored them on kernel side.
Checking available doc about tc-tbf makes me feel like that indeed a TBF qdisc
command without burst or latency value makes no sense, except my use case can
not have such values. That's what I struggled a bit to find a proper qdisc to
match hardware cap. I may fallback to a custom netlink program to improve testing.
> 
>>  /* Offset 0x09: Egress Rate Control */
>> -#define MV88E6XXX_PORT_EGRESS_RATE_CTL1		0x09
>> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1				0x09
>> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_64_KBPS		0x1E84
>> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_1_MBPS		0x01F4
>> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_10_MBPS		0x0032
>> +#define MV88E6XXX_PORT_EGRESS_RATE_CTL1_STEP_100_MBPS		0x0005
>> +#define MV88E6XXXw_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT	8
> 
> Are they above values specific to the 6393? Or will they also work for
> other families? You use the MV88E6XXX prefix which means they should
> be generic across all devices.

I have no idea about EGRESS_RATE_CTL1 and EGRESSE_RATE_CTL2 registers layout or
features for other switches supported in mv88e6xxx, and it is likely risky to
assume it is identical, so indeed I will rename those defines to make them
specific to 6393 (+ nasty typo in
MV88E6XXXw_PORT_EGRESS_RATE_CTL1_FRAME_OVERHEAD_SHIFT)

Thanks,
Alexis

> 
> 	Andrew

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 16:27     ` Alexis Lothoré
@ 2023-06-09 17:16       ` Andrew Lunn
  2023-06-09 17:38         ` Alexis Lothoré
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Lunn @ 2023-06-09 17:16 UTC (permalink / raw)
  To: Alexis Lothoré
  Cc: Florian Fainelli, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

> Yes, I can do that (or maybe -EINVAL to match Vladimir's comment ?). I think
> it's worth mentioning that I encountered an issue regarding those values during
> tests: I use tc program to set the tbf, and I observed that tc does not even
> reach kernel to set the qdisc if we pass no burst/latency value OR if we set it
> to 0. So tc enforces right on userspace side non-zero value for those
> parameters, and I have passed random values and ignored them on kernel side.

That is not good. Please take a look around and see if any other
driver offloads TBF, and what they do with burst.

> Checking available doc about tc-tbf makes me feel like that indeed a TBF qdisc
> command without burst or latency value makes no sense, except my use case can
> not have such values. That's what I struggled a bit to find a proper qdisc to
> match hardware cap. I may fallback to a custom netlink program to improve testing.

We don't really want a custom application, since we want users to use
TC to set this up.

Looking at the 6390 datasheet, Queue Counter Registers, mode 8 gives
the number of egress buffers for a port. You could validate that the
switch has at least the requested number of buffers assigned to the
port? There is quite a bit you can configure, so maybe there is a way
to influence the number of buffers, so you can actually implement the
burst parameter?

      Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-09 17:16       ` Andrew Lunn
@ 2023-06-09 17:38         ` Alexis Lothoré
       [not found]           ` <CA+sq2CcG4pQDLcw+fTkcEfTZv6zPY3pcGCKeOy8owiaRF2HELA@mail.gmail.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Alexis Lothoré @ 2023-06-09 17:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

On 6/9/23 19:16, Andrew Lunn wrote:
>> Yes, I can do that (or maybe -EINVAL to match Vladimir's comment ?). I think
>> it's worth mentioning that I encountered an issue regarding those values during
>> tests: I use tc program to set the tbf, and I observed that tc does not even
>> reach kernel to set the qdisc if we pass no burst/latency value OR if we set it
>> to 0. So tc enforces right on userspace side non-zero value for those
>> parameters, and I have passed random values and ignored them on kernel side.
> 
> That is not good. Please take a look around and see if any other
> driver offloads TBF, and what they do with burst.
> 
>> Checking available doc about tc-tbf makes me feel like that indeed a TBF qdisc
>> command without burst or latency value makes no sense, except my use case can
>> not have such values. That's what I struggled a bit to find a proper qdisc to
>> match hardware cap. I may fallback to a custom netlink program to improve testing.
> 
> We don't really want a custom application, since we want users to use
> TC to set this up.
> 
> Looking at the 6390 datasheet, Queue Counter Registers, mode 8 gives
> the number of egress buffers for a port. You could validate that the
> switch has at least the requested number of buffers assigned to the
> port? There is quite a bit you can configure, so maybe there is a way
> to influence the number of buffers, so you can actually implement the
> burst parameter?

Thanks for the pointers. I will check the egress buffers configuration and see
if I can come up with something better

> 
>       Andrew

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
       [not found]           ` <CA+sq2CcG4pQDLcw+fTkcEfTZv6zPY3pcGCKeOy8owiaRF2HELA@mail.gmail.com>
@ 2023-06-12  8:54             ` Alexis Lothoré
  2023-06-12  9:43             ` Vladimir Oltean
  1 sibling, 0 replies; 14+ messages in thread
From: Alexis Lothoré @ 2023-06-12  8:54 UTC (permalink / raw)
  To: Sunil Kovvuri
  Cc: Andrew Lunn, Florian Fainelli, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

Hi Sunil,

On 6/12/23 08:34, Sunil Kovvuri wrote:
> 
> 
> On Fri, Jun 9, 2023 at 11:08 PM Alexis Lothoré <alexis.lothore@bootlin.com
> <mailto:alexis.lothore@bootlin.com>> wrote:
> 
>     On 6/9/23 19:16, Andrew Lunn wrote:
>     >> Yes, I can do that (or maybe -EINVAL to match Vladimir's comment ?). I think
>     >> it's worth mentioning that I encountered an issue regarding those values
>     during
>     >> tests: I use tc program to set the tbf, and I observed that tc does not even
>     >> reach kernel to set the qdisc if we pass no burst/latency value OR if we
>     set it
>     >> to 0. So tc enforces right on userspace side non-zero value for those
>     >> parameters, and I have passed random values and ignored them on kernel side.
>     >
>     > That is not good. Please take a look around and see if any other
>     > driver offloads TBF, and what they do with burst.
>     >
>     >> Checking available doc about tc-tbf makes me feel like that indeed a TBF
>     qdisc
>     >> command without burst or latency value makes no sense, except my use case can
>     >> not have such values. That's what I struggled a bit to find a proper qdisc to
>     >> match hardware cap. I may fallback to a custom netlink program to improve
>     testing.
>     >
>     > We don't really want a custom application, since we want users to use
>     > TC to set this up.
>     >
>     > Looking at the 6390 datasheet, Queue Counter Registers, mode 8 gives
>     > the number of egress buffers for a port. You could validate that the
>     > switch has at least the requested number of buffers assigned to the
>     > port? There is quite a bit you can configure, so maybe there is a way
>     > to influence the number of buffers, so you can actually implement the
>     > burst parameter?
> 
>     Thanks for the pointers. I will check the egress buffers configuration and see
>     if I can come up with something better
> 
> 
> For setting up simple per-port ratelimit, instead of TBF isn't "egress matchall"
> suitable here ?

I guess you are suggesting matchall + policer ? At first glance, I see no
obvious elements showing if one or another is more relevant. From user point of
view, controls are pretty much the same (rate + burst at least), but it looks
like policer is more of a pass/drop action, contrary to TBF which has some delay
notions, so it would solve the latency/limit absence of control. I am not sure
how it would look like on kernel side and how it would behave (how is managed
the filter, how can the policer be offloaded). I see some port_policer_add/del
callbacks in DSA, I will take a look at that as well and check differences with
TBF. Thanks for the suggestion.

Alexis
> 
> Thanks,
> Sunil. 

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
       [not found]           ` <CA+sq2CcG4pQDLcw+fTkcEfTZv6zPY3pcGCKeOy8owiaRF2HELA@mail.gmail.com>
  2023-06-12  8:54             ` Alexis Lothoré
@ 2023-06-12  9:43             ` Vladimir Oltean
  2023-06-12 18:23               ` Sunil Kovvuri
  1 sibling, 1 reply; 14+ messages in thread
From: Vladimir Oltean @ 2023-06-12  9:43 UTC (permalink / raw)
  To: Sunil Kovvuri
  Cc: Alexis Lothoré,
	Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

Hi Sunil,

On Mon, Jun 12, 2023 at 12:04:56PM +0530, Sunil Kovvuri wrote:
> For setting up simple per-port ratelimit, instead of TBF isn't "egress
> matchall" suitable here ?

"matchall" is a filter. What would be the associated action for a
port-level shaper?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-12  9:43             ` Vladimir Oltean
@ 2023-06-12 18:23               ` Sunil Kovvuri
  2023-06-12 18:51                 ` Vladimir Oltean
  0 siblings, 1 reply; 14+ messages in thread
From: Sunil Kovvuri @ 2023-06-12 18:23 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Alexis Lothoré,
	Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

On Mon, Jun 12, 2023 at 3:13 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> Hi Sunil,
>
> On Mon, Jun 12, 2023 at 12:04:56PM +0530, Sunil Kovvuri wrote:
> > For setting up simple per-port ratelimit, instead of TBF isn't "egress
> > matchall" suitable here ?
>
> "matchall" is a filter. What would be the associated action for a
> port-level shaper?

As Alexis mentioned I was referring to "matchall + policer".

Thanks,
Sunil.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family
  2023-06-12 18:23               ` Sunil Kovvuri
@ 2023-06-12 18:51                 ` Vladimir Oltean
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2023-06-12 18:51 UTC (permalink / raw)
  To: Sunil Kovvuri
  Cc: Alexis Lothoré,
	Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, netdev,
	Thomas Petazzoni, paul.arola, scott.roberts

On Mon, Jun 12, 2023 at 11:53:06PM +0530, Sunil Kovvuri wrote:
> On Mon, Jun 12, 2023 at 3:13 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> >
> > Hi Sunil,
> >
> > On Mon, Jun 12, 2023 at 12:04:56PM +0530, Sunil Kovvuri wrote:
> > > For setting up simple per-port ratelimit, instead of TBF isn't "egress
> > > matchall" suitable here ?
> >
> > "matchall" is a filter. What would be the associated action for a
> > port-level shaper?
> 
> As Alexis mentioned I was referring to "matchall + policer".

The idea would be to pick a software representation which matches the
hardware behavior. A policer drops excess packets, a shaper queues them.
This hardware supports some sort of egress rate shaping.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-12 18:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-09 14:18 [PATCH net-next 0/2] add egress rate limit offload for Marvell 6393X family alexis.lothore
2023-06-09 14:18 ` [PATCH net-next 1/2] net: dsa: mv88e6xxx: allow driver to hook TC callback alexis.lothore
2023-06-09 14:18 ` [PATCH net-next 2/2] net: dsa: mv88e6xxx: implement egress tbf qdisc for 6393x family alexis.lothore
2023-06-09 14:53   ` Andrew Lunn
2023-06-09 16:27     ` Alexis Lothoré
2023-06-09 17:16       ` Andrew Lunn
2023-06-09 17:38         ` Alexis Lothoré
     [not found]           ` <CA+sq2CcG4pQDLcw+fTkcEfTZv6zPY3pcGCKeOy8owiaRF2HELA@mail.gmail.com>
2023-06-12  8:54             ` Alexis Lothoré
2023-06-12  9:43             ` Vladimir Oltean
2023-06-12 18:23               ` Sunil Kovvuri
2023-06-12 18:51                 ` Vladimir Oltean
2023-06-09 14:57   ` Vladimir Oltean
2023-06-09 16:27     ` Alexis Lothoré
2023-06-09 15:52   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).