[PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
@ 2021-01-17  8:02 Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 1/5] devlink: Add ecn_mark trap Ido Schimmel
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Ido Schimmel @ 2021-01-17  8:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, petrm, jiri, amcohen, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Petr says:

The RED qdisc currently supports two qevents: "early_drop" and "mark". The
filters added to the block bound to the "early_drop" qevent are executed on
packets for which the RED algorithm decides that they should be
early-dropped. The "mark" filters are similarly executed on ECT packets
that are marked as ECN-CE (Congestion Encountered).

A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
later, provided that the classifier used is "matchall", that the action
used is either "trap" or "mirred", and a handful or further limitations.

This patchset similarly offloads "mark" filters.

Patch set overview:

Patches #1 and #2 add the trap, under which packets will be reported to the
CPU, if the qevent filter uses the action "trap".

Patch #3 then recognizes FLOW_BLOCK_BINDER_TYPE_RED_MARK as a binder type,
and offloads the attached filters similarly to _EARLY_DROP.

Patch #4 cleans up some unused variables in a selftest, and patch #5 adds a
new selftest for the RED "mark" qevent offload.

Petr Machata (5):
  devlink: Add ecn_mark trap
  mlxsw: spectrum_trap: Add ecn_mark trap
  mlxsw: spectrum_qdisc: Offload RED qevent mark
  selftests: mlxsw: sch_red_core: Drop two unused variables
  selftests: mlxsw: RED: Add selftests for the mark qevent

 .../networking/devlink/devlink-trap.rst       |  4 +
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  2 +
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  2 +
 .../ethernet/mellanox/mlxsw/spectrum_qdisc.c  | 14 +++-
 .../ethernet/mellanox/mlxsw/spectrum_span.c   | 16 ++++
 .../ethernet/mellanox/mlxsw/spectrum_span.h   |  1 +
 .../ethernet/mellanox/mlxsw/spectrum_trap.c   |  9 ++
 include/net/devlink.h                         |  3 +
 net/core/devlink.c                            |  1 +
 .../drivers/net/mlxsw/sch_red_core.sh         | 84 ++++++++++++++++++-
 .../drivers/net/mlxsw/sch_red_ets.sh          | 74 ++++++++++++++--
 11 files changed, 200 insertions(+), 10 deletions(-)

-- 
2.29.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next 1/5] devlink: Add ecn_mark trap
  2021-01-17  8:02 [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Ido Schimmel
@ 2021-01-17  8:02 ` Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 2/5] mlxsw: spectrum_trap: " Ido Schimmel
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ido Schimmel @ 2021-01-17  8:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, petrm, jiri, amcohen, mlxsw, Ido Schimmel

From: Petr Machata <petrm@nvidia.com>

Add the packet trap that can report packets that were ECN marked due to RED
AQM.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 Documentation/networking/devlink/devlink-trap.rst | 4 ++++
 include/net/devlink.h                             | 3 +++
 net/core/devlink.c                                | 1 +
 3 files changed, 8 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst
index d875f3e1e9cf..c6026b44617e 100644
--- a/Documentation/networking/devlink/devlink-trap.rst
+++ b/Documentation/networking/devlink/devlink-trap.rst
@@ -480,6 +480,10 @@ be added to the following table:
      - ``drop``
      - Traps packets that the device decided to drop in case they hit a
        blackhole nexthop
+   * - ``ecn_mark``
+     - ``drop``
+     - Traps ECN-capable packets that were marked with CE (Congestion
+       Encountered) code point by RED algorithm instead of being dropped
 
 Driver-specific Packet Traps
 ============================
diff --git a/include/net/devlink.h b/include/net/devlink.h
index f466819cc477..dd0c0b8fba6e 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -836,6 +836,7 @@ enum devlink_trap_generic_id {
 	DEVLINK_TRAP_GENERIC_ID_GTP_PARSING,
 	DEVLINK_TRAP_GENERIC_ID_ESP_PARSING,
 	DEVLINK_TRAP_GENERIC_ID_BLACKHOLE_NEXTHOP,
+	DEVLINK_TRAP_GENERIC_ID_ECN_MARK,
 
 	/* Add new generic trap IDs above */
 	__DEVLINK_TRAP_GENERIC_ID_MAX,
@@ -1061,6 +1062,8 @@ enum devlink_trap_group_generic_id {
 	"esp_parsing"
 #define DEVLINK_TRAP_GENERIC_NAME_BLACKHOLE_NEXTHOP \
 	"blackhole_nexthop"
+#define DEVLINK_TRAP_GENERIC_NAME_ECN_MARK \
+	"ecn_mark"
 
 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
 	"l2_drops"
diff --git a/net/core/devlink.c b/net/core/devlink.c
index ee828e4b1007..f86688bfad46 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -9508,6 +9508,7 @@ static const struct devlink_trap devlink_trap_generic[] = {
 	DEVLINK_TRAP(GTP_PARSING, DROP),
 	DEVLINK_TRAP(ESP_PARSING, DROP),
 	DEVLINK_TRAP(BLACKHOLE_NEXTHOP, DROP),
+	DEVLINK_TRAP(ECN_MARK, DROP),
 };
 
 #define DEVLINK_TRAP_GROUP(_id)						      \
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 2/5] mlxsw: spectrum_trap: Add ecn_mark trap
  2021-01-17  8:02 [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 1/5] devlink: Add ecn_mark trap Ido Schimmel
@ 2021-01-17  8:02 ` Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 3/5] mlxsw: spectrum_qdisc: Offload RED qevent mark Ido Schimmel
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ido Schimmel @ 2021-01-17  8:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, petrm, jiri, amcohen, mlxsw, Ido Schimmel

From: Petr Machata <petrm@nvidia.com>

Register with devlink a new buffer drop trap, ecn_mark. Since Spectrum-1
does not support the ecn_mark trap, do it only for Spectrum-2 and above.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
index 4ef12e3e021a..45cf6c027cac 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
@@ -51,6 +51,8 @@ enum {
 enum {
 	/* Packet was early dropped. */
 	MLXSW_SP_MIRROR_REASON_INGRESS_WRED = 9,
+	/* Packet was ECN marked. */
+	MLXSW_SP_MIRROR_REASON_EGRESS_ECN = 13,
 };
 
 static int mlxsw_sp_rx_listener(struct mlxsw_sp *mlxsw_sp, struct sk_buff *skb,
@@ -1760,6 +1762,13 @@ mlxsw_sp2_trap_items_arr[] = {
 		},
 		.is_source = true,
 	},
+	{
+		.trap = MLXSW_SP_TRAP_BUFFER_DROP(ECN_MARK),
+		.listeners_arr = {
+			MLXSW_SP_RXL_BUFFER_DISCARD(EGRESS_ECN),
+		},
+		.is_source = true,
+	},
 };
 
 static int
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 3/5] mlxsw: spectrum_qdisc: Offload RED qevent mark
  2021-01-17  8:02 [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 1/5] devlink: Add ecn_mark trap Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 2/5] mlxsw: spectrum_trap: " Ido Schimmel
@ 2021-01-17  8:02 ` Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 4/5] selftests: mlxsw: sch_red_core: Drop two unused variables Ido Schimmel
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ido Schimmel @ 2021-01-17  8:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, petrm, jiri, amcohen, mlxsw, Ido Schimmel

From: Petr Machata <petrm@nvidia.com>

The RED "mark" qevent can be offloaded under the same exact conditions as
the RED "tail_drop" qevent. Therefore recognize its binding type in the
TC_SETUP_BLOCK handler and translate to the right SPAN trigger.

The corresponding hardware trigger, MLXSW_SP_SPAN_TRIGGER_ECN, is
considered "egress", unlike the previously-offloaded _EARLY_DROP. Add a
helper to spectrum_span, mlxsw_sp_span_trigger_is_ingress(), to classify
triggers to ingress and egress. Pass result of this instead of hardcoding
true when calling mlxsw_sp_span_analyzed_port_get()/_put().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c   |  2 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   |  2 ++
 .../net/ethernet/mellanox/mlxsw/spectrum_qdisc.c | 14 +++++++++++---
 .../net/ethernet/mellanox/mlxsw/spectrum_span.c  | 16 ++++++++++++++++
 .../net/ethernet/mellanox/mlxsw/spectrum_span.h  |  1 +
 5 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 1650d9852b5b..e93b96837a44 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -992,6 +992,8 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 		return mlxsw_sp_setup_tc_block_clsact(mlxsw_sp_port, f, false);
 	case FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP:
 		return mlxsw_sp_setup_tc_block_qevent_early_drop(mlxsw_sp_port, f);
+	case FLOW_BLOCK_BINDER_TYPE_RED_MARK:
+		return mlxsw_sp_setup_tc_block_qevent_mark(mlxsw_sp_port, f);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a6956cfc9cb1..0e5fd6a73da1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -1109,6 +1109,8 @@ int mlxsw_sp_setup_tc_fifo(struct mlxsw_sp_port *mlxsw_sp_port,
 			   struct tc_fifo_qopt_offload *p);
 int mlxsw_sp_setup_tc_block_qevent_early_drop(struct mlxsw_sp_port *mlxsw_sp_port,
 					      struct flow_block_offload *f);
+int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
+					struct flow_block_offload *f);
 
 /* spectrum_fid.c */
 bool mlxsw_sp_fid_is_dummy(struct mlxsw_sp *mlxsw_sp, u16 fid_index);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
index fd672c6c9133..5cd8854e6070 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
@@ -1327,6 +1327,7 @@ static int mlxsw_sp_qevent_span_configure(struct mlxsw_sp *mlxsw_sp,
 					  const struct mlxsw_sp_span_agent_parms *agent_parms,
 					  int *p_span_id)
 {
+	bool ingress = mlxsw_sp_span_trigger_is_ingress(qevent_binding->span_trigger);
 	struct mlxsw_sp_port *mlxsw_sp_port = qevent_binding->mlxsw_sp_port;
 	struct mlxsw_sp_span_trigger_parms trigger_parms = {};
 	int span_id;
@@ -1336,7 +1337,7 @@ static int mlxsw_sp_qevent_span_configure(struct mlxsw_sp *mlxsw_sp,
 	if (err)
 		return err;
 
-	err = mlxsw_sp_span_analyzed_port_get(mlxsw_sp_port, true);
+	err = mlxsw_sp_span_analyzed_port_get(mlxsw_sp_port, ingress);
 	if (err)
 		goto err_analyzed_port_get;
 
@@ -1358,7 +1359,7 @@ static int mlxsw_sp_qevent_span_configure(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_span_agent_unbind(mlxsw_sp, qevent_binding->span_trigger, mlxsw_sp_port,
 				   &trigger_parms);
 err_agent_bind:
-	mlxsw_sp_span_analyzed_port_put(mlxsw_sp_port, true);
+	mlxsw_sp_span_analyzed_port_put(mlxsw_sp_port, ingress);
 err_analyzed_port_get:
 	mlxsw_sp_span_agent_put(mlxsw_sp, span_id);
 	return err;
@@ -1368,6 +1369,7 @@ static void mlxsw_sp_qevent_span_deconfigure(struct mlxsw_sp *mlxsw_sp,
 					     struct mlxsw_sp_qevent_binding *qevent_binding,
 					     int span_id)
 {
+	bool ingress = mlxsw_sp_span_trigger_is_ingress(qevent_binding->span_trigger);
 	struct mlxsw_sp_port *mlxsw_sp_port = qevent_binding->mlxsw_sp_port;
 	struct mlxsw_sp_span_trigger_parms trigger_parms = {
 		.span_id = span_id,
@@ -1377,7 +1379,7 @@ static void mlxsw_sp_qevent_span_deconfigure(struct mlxsw_sp *mlxsw_sp,
 				      qevent_binding->tclass_num);
 	mlxsw_sp_span_agent_unbind(mlxsw_sp, qevent_binding->span_trigger, mlxsw_sp_port,
 				   &trigger_parms);
-	mlxsw_sp_span_analyzed_port_put(mlxsw_sp_port, true);
+	mlxsw_sp_span_analyzed_port_put(mlxsw_sp_port, ingress);
 	mlxsw_sp_span_agent_put(mlxsw_sp, span_id);
 }
 
@@ -1828,6 +1830,12 @@ int mlxsw_sp_setup_tc_block_qevent_early_drop(struct mlxsw_sp_port *mlxsw_sp_por
 	return mlxsw_sp_setup_tc_block_qevent(mlxsw_sp_port, f, MLXSW_SP_SPAN_TRIGGER_EARLY_DROP);
 }
 
+int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
+					struct flow_block_offload *f)
+{
+	return mlxsw_sp_setup_tc_block_qevent(mlxsw_sp_port, f, MLXSW_SP_SPAN_TRIGGER_ECN);
+}
+
 int mlxsw_sp_tc_qdisc_init(struct mlxsw_sp_port *mlxsw_sp_port)
 {
 	struct mlxsw_sp_qdisc_state *qdisc_state;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index c6c5826aba41..d44a93d69972 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -1631,6 +1631,22 @@ void mlxsw_sp_span_trigger_disable(struct mlxsw_sp_port *mlxsw_sp_port,
 	return trigger_entry->ops->disable(trigger_entry, mlxsw_sp_port, tc);
 }
 
+bool mlxsw_sp_span_trigger_is_ingress(enum mlxsw_sp_span_trigger trigger)
+{
+	switch (trigger) {
+	case MLXSW_SP_SPAN_TRIGGER_INGRESS:
+	case MLXSW_SP_SPAN_TRIGGER_EARLY_DROP:
+	case MLXSW_SP_SPAN_TRIGGER_TAIL_DROP:
+		return true;
+	case MLXSW_SP_SPAN_TRIGGER_EGRESS:
+	case MLXSW_SP_SPAN_TRIGGER_ECN:
+		return false;
+	}
+
+	WARN_ON_ONCE(1);
+	return false;
+}
+
 static int mlxsw_sp1_span_init(struct mlxsw_sp *mlxsw_sp)
 {
 	size_t arr_size = ARRAY_SIZE(mlxsw_sp1_span_entry_ops_arr);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
index d907718bc8c5..dbf54752d814 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
@@ -103,6 +103,7 @@ int mlxsw_sp_span_trigger_enable(struct mlxsw_sp_port *mlxsw_sp_port,
 				 enum mlxsw_sp_span_trigger trigger, u8 tc);
 void mlxsw_sp_span_trigger_disable(struct mlxsw_sp_port *mlxsw_sp_port,
 				   enum mlxsw_sp_span_trigger trigger, u8 tc);
+bool mlxsw_sp_span_trigger_is_ingress(enum mlxsw_sp_span_trigger trigger);
 
 extern const struct mlxsw_sp_span_ops mlxsw_sp1_span_ops;
 extern const struct mlxsw_sp_span_ops mlxsw_sp2_span_ops;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 4/5] selftests: mlxsw: sch_red_core: Drop two unused variables
  2021-01-17  8:02 [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Ido Schimmel
                   ` (2 preceding siblings ...)
  2021-01-17  8:02 ` [PATCH net-next 3/5] mlxsw: spectrum_qdisc: Offload RED qevent mark Ido Schimmel
@ 2021-01-17  8:02 ` Ido Schimmel
  2021-01-17  8:02 ` [PATCH net-next 5/5] selftests: mlxsw: RED: Add selftests for the mark qevent Ido Schimmel
  2021-01-19 22:22 ` [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Jakub Kicinski
  5 siblings, 0 replies; 14+ messages in thread
From: Ido Schimmel @ 2021-01-17  8:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, petrm, jiri, amcohen, mlxsw, Ido Schimmel

From: Petr Machata <petrm@nvidia.com>

These variables are cut'n'pasted from other functions in the file and not
actually used.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index b0cb1aaffdda..d439ee0eaa8a 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -551,10 +551,8 @@ do_drop_test()
 	local trigger=$1; shift
 	local subtest=$1; shift
 	local fetch_counter=$1; shift
-	local backlog
 	local base
 	local now
-	local pct
 
 	RET=0
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 5/5] selftests: mlxsw: RED: Add selftests for the mark qevent
  2021-01-17  8:02 [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Ido Schimmel
                   ` (3 preceding siblings ...)
  2021-01-17  8:02 ` [PATCH net-next 4/5] selftests: mlxsw: sch_red_core: Drop two unused variables Ido Schimmel
@ 2021-01-17  8:02 ` Ido Schimmel
  2021-01-19 22:22 ` [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Jakub Kicinski
  5 siblings, 0 replies; 14+ messages in thread
From: Ido Schimmel @ 2021-01-17  8:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, petrm, jiri, amcohen, mlxsw, Ido Schimmel

From: Petr Machata <petrm@nvidia.com>

Add do_mark_test(), which is to do_ecn_test() like do_drop_test() is to
do_red_test(): meant to test that actions on the RED mark qevent block are
offloaded, and executed on ECN-marked packets.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 .../drivers/net/mlxsw/sch_red_core.sh         | 82 +++++++++++++++++++
 .../drivers/net/mlxsw/sch_red_ets.sh          | 74 +++++++++++++++--
 2 files changed, 151 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index d439ee0eaa8a..6c2ddf76b4b8 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -544,6 +544,51 @@ do_mc_backlog_test()
 	log_test "TC $((vlan - 10)): Qdisc reports MC backlog"
 }
 
+do_mark_test()
+{
+	local vlan=$1; shift
+	local limit=$1; shift
+	local subtest=$1; shift
+	local fetch_counter=$1; shift
+	local should_fail=$1; shift
+	local base
+
+	RET=0
+
+	start_tcp_traffic $h1.$vlan $(ipaddr 1 $vlan) $(ipaddr 3 $vlan) $h3_mac tos=0x01
+
+	# Create a bit of a backlog and observe no mirroring due to marks.
+	qevent_rule_install_$subtest
+
+	build_backlog $vlan $((2 * limit / 3)) tcp tos=0x01 >/dev/null
+
+	base=$($fetch_counter)
+	count=$(busywait 1100 until_counter_is ">= $((base + 1))" $fetch_counter)
+	check_fail $? "Spurious packets ($base -> $count) observed without buffer pressure"
+
+	# Above limit, everything should be mirrored, we should see lots of
+	# packets.
+	build_backlog $vlan $((3 * limit / 2)) tcp tos=0x01 >/dev/null
+	busywait_for_counter 1100 +10000 \
+		 $fetch_counter > /dev/null
+	check_err_fail "$should_fail" $? "ECN-marked packets $subtest'd"
+
+	# When the rule is uninstalled, there should be no mirroring.
+	qevent_rule_uninstall_$subtest
+	busywait_for_counter 1100 +10 \
+		 $fetch_counter > /dev/null
+	check_fail $? "Spurious packets observed after uninstall"
+
+	if ((should_fail)); then
+		log_test "TC $((vlan - 10)): marked packets not $subtest'd"
+	else
+		log_test "TC $((vlan - 10)): marked packets $subtest'd"
+	fi
+
+	stop_traffic
+	sleep 1
+}
+
 do_drop_test()
 {
 	local vlan=$1; shift
@@ -626,6 +671,22 @@ do_drop_mirror_test()
 	tc filter del dev $h2 ingress pref 1 handle 101 flower
 }
 
+do_mark_mirror_test()
+{
+	local vlan=$1; shift
+	local limit=$1; shift
+
+	tc filter add dev $h2 ingress pref 1 handle 101 prot ip \
+	   flower skip_sw ip_proto tcp \
+	   action drop
+
+	do_mark_test "$vlan" "$limit" mirror \
+		     qevent_counter_fetch_mirror \
+		     $(: should_fail=)0
+
+	tc filter del dev $h2 ingress pref 1 handle 101 flower
+}
+
 qevent_rule_install_trap()
 {
 	tc filter add block 10 pref 1234 handle 102 matchall skip_sw \
@@ -653,3 +714,24 @@ do_drop_trap_test()
 	do_drop_test "$vlan" "$limit" "$trap_name" trap \
 		     "qevent_counter_fetch_trap $trap_name"
 }
+
+do_mark_trap_test()
+{
+	local vlan=$1; shift
+	local limit=$1; shift
+	local should_fail=$1; shift
+
+	do_mark_test "$vlan" "$limit" trap \
+		     "qevent_counter_fetch_trap ecn_mark" \
+		     "$should_fail"
+}
+
+do_mark_trap_test_pass()
+{
+	do_mark_trap_test "$@" 0
+}
+
+do_mark_trap_test_fail()
+{
+	do_mark_trap_test "$@" 1
+}
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh
index 3f007c5f8361..566cad32f346 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_ets.sh
@@ -9,6 +9,8 @@ ALL_TESTS="
 	mc_backlog_test
 	red_mirror_test
 	red_trap_test
+	ecn_trap_test
+	ecn_mirror_test
 "
 : ${QDISC:=ets}
 source sch_red_core.sh
@@ -21,28 +23,60 @@ source sch_red_core.sh
 BACKLOG1=200000
 BACKLOG2=500000
 
-install_qdisc()
+install_root_qdisc()
 {
-	local -a args=("$@")
-
 	tc qdisc add dev $swp3 root handle 10: $QDISC \
 	   bands 8 priomap 7 6 5 4 3 2 1 0
+}
+
+install_qdisc_tc0()
+{
+	local -a args=("$@")
+
 	tc qdisc add dev $swp3 parent 10:8 handle 108: red \
 	   limit 1000000 min $BACKLOG1 max $((BACKLOG1 + 1)) \
 	   probability 1.0 avpkt 8000 burst 38 "${args[@]}"
+}
+
+install_qdisc_tc1()
+{
+	local -a args=("$@")
+
 	tc qdisc add dev $swp3 parent 10:7 handle 107: red \
 	   limit 1000000 min $BACKLOG2 max $((BACKLOG2 + 1)) \
 	   probability 1.0 avpkt 8000 burst 63 "${args[@]}"
+}
+
+install_qdisc()
+{
+	install_root_qdisc
+	install_qdisc_tc0 "$@"
+	install_qdisc_tc1 "$@"
 	sleep 1
 }
 
-uninstall_qdisc()
+uninstall_qdisc_tc0()
 {
-	tc qdisc del dev $swp3 parent 10:7
 	tc qdisc del dev $swp3 parent 10:8
+}
+
+uninstall_qdisc_tc1()
+{
+	tc qdisc del dev $swp3 parent 10:7
+}
+
+uninstall_root_qdisc()
+{
 	tc qdisc del dev $swp3 root
 }
 
+uninstall_qdisc()
+{
+	uninstall_qdisc_tc0
+	uninstall_qdisc_tc1
+	uninstall_root_qdisc
+}
+
 ecn_test()
 {
 	install_qdisc ecn
@@ -105,6 +139,36 @@ red_trap_test()
 	uninstall_qdisc
 }
 
+ecn_mirror_test()
+{
+	install_qdisc ecn qevent mark block 10
+
+	do_mark_mirror_test 10 $BACKLOG1
+	do_mark_mirror_test 11 $BACKLOG2
+
+	uninstall_qdisc
+}
+
+ecn_trap_test()
+{
+	install_root_qdisc
+	install_qdisc_tc0 ecn qevent mark block 10
+	install_qdisc_tc1 ecn
+
+	do_mark_trap_test_pass 10 $BACKLOG1
+	do_mark_trap_test_fail 11 $BACKLOG2
+
+	uninstall_qdisc_tc1
+	install_qdisc_tc1 ecn qevent mark block 10
+
+	do_mark_trap_test_pass 10 $BACKLOG1
+	do_mark_trap_test_pass 11 $BACKLOG2
+
+	uninstall_qdisc_tc1
+	uninstall_qdisc_tc0
+	uninstall_root_qdisc
+}
+
 trap cleanup EXIT
 
 setup_prepare
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-17  8:02 [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Ido Schimmel
                   ` (4 preceding siblings ...)
  2021-01-17  8:02 ` [PATCH net-next 5/5] selftests: mlxsw: RED: Add selftests for the mark qevent Ido Schimmel
@ 2021-01-19 22:22 ` Jakub Kicinski
  2021-01-20  9:14   ` Ido Schimmel
  5 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-01-19 22:22 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> The RED qdisc currently supports two qevents: "early_drop" and "mark". The
> filters added to the block bound to the "early_drop" qevent are executed on
> packets for which the RED algorithm decides that they should be
> early-dropped. The "mark" filters are similarly executed on ECT packets
> that are marked as ECN-CE (Congestion Encountered).
> 
> A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
> later, provided that the classifier used is "matchall", that the action
> used is either "trap" or "mirred", and a handful or further limitations.

For early_drop trap or mirred makes obvious sense, no explanation
needed.

But for marked as a user I'd like to see a _copy_ of the packet, 
while the original continues on its marry way to the destination.
I'd venture to say that e.g. for a DCTCP deployment mark+trap is
unusable, at least for tracing, because it distorts the operation 
by effectively dropping instead of marking.

Am I reading this right?

If that is the case and you really want to keep the mark+trap
functionality - I feel like at least better documentation is needed.
The current two liner should also be rewritten, quoting from patch 1:

> * - ``ecn_mark``
>   - ``drop``
>   - Traps ECN-capable packets that were marked with CE (Congestion
>     Encountered) code point by RED algorithm instead of being dropped

That needs to say that the trap is for datagrams trapped by a qevent.
Otherwise "Traps ... instead of being dropped" is too much of a
thought-shortcut, marked packets are not dropped.

(I'd also think that trap is better documented next to early_drop,
let's look at it from the reader's perspective)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-19 22:22 ` [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Jakub Kicinski
@ 2021-01-20  9:14   ` Ido Schimmel
  2021-01-21  0:45     ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Ido Schimmel @ 2021-01-20  9:14 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote:
> On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote:
> > From: Ido Schimmel <idosch@nvidia.com>
> > 
> > The RED qdisc currently supports two qevents: "early_drop" and "mark". The
> > filters added to the block bound to the "early_drop" qevent are executed on
> > packets for which the RED algorithm decides that they should be
> > early-dropped. The "mark" filters are similarly executed on ECT packets
> > that are marked as ECN-CE (Congestion Encountered).
> > 
> > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
> > later, provided that the classifier used is "matchall", that the action
> > used is either "trap" or "mirred", and a handful or further limitations.
> 
> For early_drop trap or mirred makes obvious sense, no explanation
> needed.
> 
> But for marked as a user I'd like to see a _copy_ of the packet, 
> while the original continues on its marry way to the destination.
> I'd venture to say that e.g. for a DCTCP deployment mark+trap is
> unusable, at least for tracing, because it distorts the operation 
> by effectively dropping instead of marking.
> 
> Am I reading this right?

You get a copy of the packet as otherwise it will create a lot of
problems (like you wrote).

> 
> If that is the case and you really want to keep the mark+trap
> functionality - I feel like at least better documentation is needed.
> The current two liner should also be rewritten, quoting from patch 1:
> 
> > * - ``ecn_mark``
> >   - ``drop``
> >   - Traps ECN-capable packets that were marked with CE (Congestion
> >     Encountered) code point by RED algorithm instead of being dropped
> 
> That needs to say that the trap is for datagrams trapped by a qevent.
> Otherwise "Traps ... instead of being dropped" is too much of a
> thought-shortcut, marked packets are not dropped.
> 
> (I'd also think that trap is better documented next to early_drop,
> let's look at it from the reader's perspective)

How about:

"Traps a copy of ECN-capable packets that were marked with CE
(Congestion Encountered) code point by RED algorithm instead of being
dropped. The trap is enabled by attaching a filter with action 'trap' to
the 'mark' qevent of the RED qdisc."

In addition, this output:

$ devlink trap show pci/0000:06:00.0 trap ecn_mark 
pci/0000:06:00.0:
  name ecn_mark type drop generic true action trap group buffer_drops

Can be converted to:

$ devlink trap show pci/0000:06:00.0 trap ecn_mark 
pci/0000:06:00.0:
  name ecn_mark type drop generic true action mirror group buffer_drops

"mirror: The packet is forwarded by the underlying device and a copy is sent to
the CPU."

In this case the action is static and you cannot change it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-20  9:14   ` Ido Schimmel
@ 2021-01-21  0:45     ` Jakub Kicinski
  2021-01-21 10:23       ` Ido Schimmel
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-01-21  0:45 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote:
> On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote:
> > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote:  
> > > From: Ido Schimmel <idosch@nvidia.com>
> > > 
> > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The
> > > filters added to the block bound to the "early_drop" qevent are executed on
> > > packets for which the RED algorithm decides that they should be
> > > early-dropped. The "mark" filters are similarly executed on ECT packets
> > > that are marked as ECN-CE (Congestion Encountered).
> > > 
> > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
> > > later, provided that the classifier used is "matchall", that the action
> > > used is either "trap" or "mirred", and a handful or further limitations.  
> > 
> > For early_drop trap or mirred makes obvious sense, no explanation
> > needed.
> > 
> > But for marked as a user I'd like to see a _copy_ of the packet, 
> > while the original continues on its marry way to the destination.
> > I'd venture to say that e.g. for a DCTCP deployment mark+trap is
> > unusable, at least for tracing, because it distorts the operation 
> > by effectively dropping instead of marking.
> > 
> > Am I reading this right?  
> 
> You get a copy of the packet as otherwise it will create a lot of
> problems (like you wrote).

Hm, so am I missing some background on semantics on TC_ACT_TRAP?
Or perhaps you use a different action code?

AFAICT the code in the kernel is:

struct sk_buff *tcf_qevent_handle(...

	case TC_ACT_STOLEN:
	case TC_ACT_QUEUED:
	case TC_ACT_TRAP:
		__qdisc_drop(skb, to_free);
		*ret = __NET_XMIT_STOLEN;
		return NULL;

Having TRAP mean DROP makes sense for filters, but in case of qevents
shouldn't they be a no-op?

Looking at sch_red looks like TRAP being a no-op would actually give us
the expected behavior.

> > If that is the case and you really want to keep the mark+trap
> > functionality - I feel like at least better documentation is needed.
> > The current two liner should also be rewritten, quoting from patch 1:
> >   
> > > * - ``ecn_mark``
> > >   - ``drop``
> > >   - Traps ECN-capable packets that were marked with CE (Congestion
> > >     Encountered) code point by RED algorithm instead of being dropped  
> > 
> > That needs to say that the trap is for datagrams trapped by a qevent.
> > Otherwise "Traps ... instead of being dropped" is too much of a
> > thought-shortcut, marked packets are not dropped.
> > 
> > (I'd also think that trap is better documented next to early_drop,
> > let's look at it from the reader's perspective)  
> 
> How about:
> 
> "Traps a copy of ECN-capable packets that were marked with CE

I think "Traps copies" or "Traps the copy of .. packet"?
I'm not a native speaker but there seems to be a grammatical mix here.

> (Congestion Encountered) code point by RED algorithm instead of being
> dropped. The trap is enabled by attaching a filter with action 'trap' to

... instead of those copies being dropped.

> the 'mark' qevent of the RED qdisc."
>
> In addition, this output:
> 
> $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> pci/0000:06:00.0:
>   name ecn_mark type drop generic true action trap group buffer_drops
> 
> Can be converted to:
> 
> $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> pci/0000:06:00.0:
>   name ecn_mark type drop generic true action mirror group buffer_drops
> 
> "mirror: The packet is forwarded by the underlying device and a copy is sent to
> the CPU."
> 
> In this case the action is static and you cannot change it.

Oh yes, that's nice, I thought mirror in traps means mirror to another
port. Are there already traps which implement the mirroring / trapping
a clone? Quick grep yields nothing of substance.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-21  0:45     ` Jakub Kicinski
@ 2021-01-21 10:23       ` Ido Schimmel
  2021-01-21 17:19         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Ido Schimmel @ 2021-01-21 10:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Wed, Jan 20, 2021 at 04:45:08PM -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote:
> > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote:
> > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote:  
> > > > From: Ido Schimmel <idosch@nvidia.com>
> > > > 
> > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The
> > > > filters added to the block bound to the "early_drop" qevent are executed on
> > > > packets for which the RED algorithm decides that they should be
> > > > early-dropped. The "mark" filters are similarly executed on ECT packets
> > > > that are marked as ECN-CE (Congestion Encountered).
> > > > 
> > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
> > > > later, provided that the classifier used is "matchall", that the action
> > > > used is either "trap" or "mirred", and a handful or further limitations.  
> > > 
> > > For early_drop trap or mirred makes obvious sense, no explanation
> > > needed.
> > > 
> > > But for marked as a user I'd like to see a _copy_ of the packet, 
> > > while the original continues on its marry way to the destination.
> > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is
> > > unusable, at least for tracing, because it distorts the operation 
> > > by effectively dropping instead of marking.
> > > 
> > > Am I reading this right?  
> > 
> > You get a copy of the packet as otherwise it will create a lot of
> > problems (like you wrote).
> 
> Hm, so am I missing some background on semantics on TC_ACT_TRAP?
> Or perhaps you use a different action code?

Well, to make it really clear, we can add TC_ACT_TRAP_MIRROR.

TC_ACT_TRAP: Sole copy goes to the CPU
TC_ACT_TRAP_MIRROR: The packet is forwarded by the underlying device and
a copy is sent to the CPU

And only allow (in mlxsw) attaching filters with TC_ACT_TRAP_MIRROR to
the "mark" qevent.

> 
> AFAICT the code in the kernel is:
> 
> struct sk_buff *tcf_qevent_handle(...
> 
> 	case TC_ACT_STOLEN:
> 	case TC_ACT_QUEUED:
> 	case TC_ACT_TRAP:
> 		__qdisc_drop(skb, to_free);
> 		*ret = __NET_XMIT_STOLEN;
> 		return NULL;
> 
> Having TRAP mean DROP makes sense for filters, but in case of qevents
> shouldn't they be a no-op?
> 
> Looking at sch_red looks like TRAP being a no-op would actually give us
> the expected behavior.

I'm not sure it makes sense to try to interpret these actions in
software (I expect they will be used with "skip_sw" filters), but
TC_ACT_TRAP_MIRROR can be a no-op like you suggested.

> 
> > > If that is the case and you really want to keep the mark+trap
> > > functionality - I feel like at least better documentation is needed.
> > > The current two liner should also be rewritten, quoting from patch 1:
> > >   
> > > > * - ``ecn_mark``
> > > >   - ``drop``
> > > >   - Traps ECN-capable packets that were marked with CE (Congestion
> > > >     Encountered) code point by RED algorithm instead of being dropped  
> > > 
> > > That needs to say that the trap is for datagrams trapped by a qevent.
> > > Otherwise "Traps ... instead of being dropped" is too much of a
> > > thought-shortcut, marked packets are not dropped.
> > > 
> > > (I'd also think that trap is better documented next to early_drop,
> > > let's look at it from the reader's perspective)  
> > 
> > How about:
> > 
> > "Traps a copy of ECN-capable packets that were marked with CE
> 
> I think "Traps copies" or "Traps the copy of .. packet"?
> I'm not a native speaker but there seems to be a grammatical mix here.
> 
> > (Congestion Encountered) code point by RED algorithm instead of being
> > dropped. The trap is enabled by attaching a filter with action 'trap' to
> 
> ... instead of those copies being dropped.

Will reword

> 
> > the 'mark' qevent of the RED qdisc."
> >
> > In addition, this output:
> > 
> > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > pci/0000:06:00.0:
> >   name ecn_mark type drop generic true action trap group buffer_drops
> > 
> > Can be converted to:
> > 
> > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > pci/0000:06:00.0:
> >   name ecn_mark type drop generic true action mirror group buffer_drops
> > 
> > "mirror: The packet is forwarded by the underlying device and a copy is sent to
> > the CPU."
> > 
> > In this case the action is static and you cannot change it.
> 
> Oh yes, that's nice, I thought mirror in traps means mirror to another
> port. Are there already traps which implement the mirroring / trapping
> a clone? Quick grep yields nothing of substance.

Yes. That's why we have the 'offload_fwd_mark' and 'offload_l3_fwd_mark'
bits in the skb. For example, we let the hardware flood ARP requests
('arp_request'), but also send a copy to the CPU in case it needs to
update its neighbour table. The trapping happens at L2, so we only set
the 'offload_fwd_mark' bit. It will tell the bridge driver to not flood
the packet again.

The 'offload_l3_fwd_mark' bit is mainly used to support one-armed router
use cases where a packet is forwarded through the same interface through
which it was received ('uc_loopback'). We do the forwarding in hardware,
but also send a copy to the CPU to give the kernel the chance to
generate an ICMP redirect if it was not disabled by the user. See more
info in commit 55827458e058 ("Merge branch
'mlxsw-Add-one-armed-router-support'").

I also want to explain how the qevent stuff works in hardware to make
sure it is all clear. We have the ability to bind different triggers to
a mirroring (SPAN) agent. The agent can point to a physical port /
virtual interface (e.g., gretap for ERSPAN) or to the CPU port. The
first is programmed via the mirred action and the second using the trap
action.

The triggers can be simple such as Rx/Tx packet (matchall + mirred) or
policy engine (flower + mirred). The more advanced triggers are various
buffer events such as early drops ('early_drop' qevent) and ECN marking
('mark' qevent). Currently, it is only possible to bind these triggers
to a mirroring agent which is why we only support (in mlxsw) attaching
matchall filters to these qevents. In the future we might be able to
bind ACLs to these triggers in which case we will allow attaching flower
filters. devlink-trap is really only a read-only interface in this case,
meant to tell you why you go the packet from the hardware datapath. The
enablement / disablement is done by tc which gives us feature parity
with the software datapath.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-21 10:23       ` Ido Schimmel
@ 2021-01-21 17:19         ` Jakub Kicinski
  2021-01-23 15:28           ` Ido Schimmel
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-01-21 17:19 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Thu, 21 Jan 2021 12:23:18 +0200 Ido Schimmel wrote:
> On Wed, Jan 20, 2021 at 04:45:08PM -0800, Jakub Kicinski wrote:
> > On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote:  
> > > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote:  
> > > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote:    
> > > > > From: Ido Schimmel <idosch@nvidia.com>
> > > > > 
> > > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The
> > > > > filters added to the block bound to the "early_drop" qevent are executed on
> > > > > packets for which the RED algorithm decides that they should be
> > > > > early-dropped. The "mark" filters are similarly executed on ECT packets
> > > > > that are marked as ECN-CE (Congestion Encountered).
> > > > > 
> > > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
> > > > > later, provided that the classifier used is "matchall", that the action
> > > > > used is either "trap" or "mirred", and a handful or further limitations.    
> > > > 
> > > > For early_drop trap or mirred makes obvious sense, no explanation
> > > > needed.
> > > > 
> > > > But for marked as a user I'd like to see a _copy_ of the packet, 
> > > > while the original continues on its marry way to the destination.
> > > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is
> > > > unusable, at least for tracing, because it distorts the operation 
> > > > by effectively dropping instead of marking.
> > > > 
> > > > Am I reading this right?    
> > > 
> > > You get a copy of the packet as otherwise it will create a lot of
> > > problems (like you wrote).  
> > 
> > Hm, so am I missing some background on semantics on TC_ACT_TRAP?
> > Or perhaps you use a different action code?  
> 
> Well, to make it really clear, we can add TC_ACT_TRAP_MIRROR.
> 
> TC_ACT_TRAP: Sole copy goes to the CPU
> TC_ACT_TRAP_MIRROR: The packet is forwarded by the underlying device and
> a copy is sent to the CPU
> 
> And only allow (in mlxsw) attaching filters with TC_ACT_TRAP_MIRROR to
> the "mark" qevent.
> 
> > 
> > AFAICT the code in the kernel is:
> > 
> > struct sk_buff *tcf_qevent_handle(...
> > 
> > 	case TC_ACT_STOLEN:
> > 	case TC_ACT_QUEUED:
> > 	case TC_ACT_TRAP:
> > 		__qdisc_drop(skb, to_free);
> > 		*ret = __NET_XMIT_STOLEN;
> > 		return NULL;
> > 
> > Having TRAP mean DROP makes sense for filters, but in case of qevents
> > shouldn't they be a no-op?
> > 
> > Looking at sch_red looks like TRAP being a no-op would actually give us
> > the expected behavior.  
> 
> I'm not sure it makes sense to try to interpret these actions in
> software (I expect they will be used with "skip_sw" filters), but
> TC_ACT_TRAP_MIRROR can be a no-op like you suggested.

Well our paradigm is SW defines the behavior, we can't have HW forward
and copy, while the SW drops the frame. Some engineer will try to
implement this some day in their switch driver, look at the SW behavior
and scratch their head.

> > > the 'mark' qevent of the RED qdisc."
> > >
> > > In addition, this output:
> > > 
> > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > > pci/0000:06:00.0:
> > >   name ecn_mark type drop generic true action trap group buffer_drops
> > > 
> > > Can be converted to:
> > > 
> > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > > pci/0000:06:00.0:
> > >   name ecn_mark type drop generic true action mirror group buffer_drops
> > > 
> > > "mirror: The packet is forwarded by the underlying device and a copy is sent to
> > > the CPU."
> > > 
> > > In this case the action is static and you cannot change it.  
> > 
> > Oh yes, that's nice, I thought mirror in traps means mirror to another
> > port. Are there already traps which implement the mirroring / trapping
> > a clone? Quick grep yields nothing of substance.  
> 
> Yes. That's why we have the 'offload_fwd_mark' and 'offload_l3_fwd_mark'
> bits in the skb. For example, we let the hardware flood ARP requests
> ('arp_request'), but also send a copy to the CPU in case it needs to
> update its neighbour table. The trapping happens at L2, so we only set
> the 'offload_fwd_mark' bit. It will tell the bridge driver to not flood
> the packet again.
> 
> The 'offload_l3_fwd_mark' bit is mainly used to support one-armed router
> use cases where a packet is forwarded through the same interface through
> which it was received ('uc_loopback'). We do the forwarding in hardware,
> but also send a copy to the CPU to give the kernel the chance to
> generate an ICMP redirect if it was not disabled by the user. See more
> info in commit 55827458e058 ("Merge branch
> 'mlxsw-Add-one-armed-router-support'").

I see, thanks for the example, but just to be clear those are "internal
traps", they don't have any impact on the devlink trap uAPI (in case we
want to change the definition of MIRRED since nothing is using it).

> I also want to explain how the qevent stuff works in hardware to make
> sure it is all clear. We have the ability to bind different triggers to
> a mirroring (SPAN) agent. The agent can point to a physical port /
> virtual interface (e.g., gretap for ERSPAN) or to the CPU port. The
> first is programmed via the mirred action and the second using the trap
> action.
> 
> The triggers can be simple such as Rx/Tx packet (matchall + mirred) or
> policy engine (flower + mirred). The more advanced triggers are various
> buffer events such as early drops ('early_drop' qevent) and ECN marking
> ('mark' qevent). Currently, it is only possible to bind these triggers
> to a mirroring agent which is why we only support (in mlxsw) attaching
> matchall filters to these qevents. In the future we might be able to
> bind ACLs to these triggers in which case we will allow attaching flower
> filters. devlink-trap is really only a read-only interface in this case,
> meant to tell you why you go the packet from the hardware datapath. The
> enablement / disablement is done by tc which gives us feature parity
> with the software datapath.

Thanks for the explanation. I feel more and more convinced now that
we should have TC_ACT_TRAP_MIRROR and the devlink trap should only 
be on/off :S Current model of "if ACT_TRAP consult devlink for trap
configuration" is impossible to model in SW since it doesn't have a
equivalent of devlink traps. Or we need that equivalent..

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-21 17:19         ` Jakub Kicinski
@ 2021-01-23 15:28           ` Ido Schimmel
  2021-01-23 19:55             ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Ido Schimmel @ 2021-01-23 15:28 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Thu, Jan 21, 2021 at 09:19:40AM -0800, Jakub Kicinski wrote:
> On Thu, 21 Jan 2021 12:23:18 +0200 Ido Schimmel wrote:
> > On Wed, Jan 20, 2021 at 04:45:08PM -0800, Jakub Kicinski wrote:
> > > On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote:  
> > > > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote:  
> > > > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote:    
> > > > > > From: Ido Schimmel <idosch@nvidia.com>
> > > > > > 
> > > > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The
> > > > > > filters added to the block bound to the "early_drop" qevent are executed on
> > > > > > packets for which the RED algorithm decides that they should be
> > > > > > early-dropped. The "mark" filters are similarly executed on ECT packets
> > > > > > that are marked as ECN-CE (Congestion Encountered).
> > > > > > 
> > > > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
> > > > > > later, provided that the classifier used is "matchall", that the action
> > > > > > used is either "trap" or "mirred", and a handful or further limitations.    
> > > > > 
> > > > > For early_drop trap or mirred makes obvious sense, no explanation
> > > > > needed.
> > > > > 
> > > > > But for marked as a user I'd like to see a _copy_ of the packet, 
> > > > > while the original continues on its marry way to the destination.
> > > > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is
> > > > > unusable, at least for tracing, because it distorts the operation 
> > > > > by effectively dropping instead of marking.
> > > > > 
> > > > > Am I reading this right?    
> > > > 
> > > > You get a copy of the packet as otherwise it will create a lot of
> > > > problems (like you wrote).  
> > > 
> > > Hm, so am I missing some background on semantics on TC_ACT_TRAP?
> > > Or perhaps you use a different action code?  
> > 
> > Well, to make it really clear, we can add TC_ACT_TRAP_MIRROR.
> > 
> > TC_ACT_TRAP: Sole copy goes to the CPU
> > TC_ACT_TRAP_MIRROR: The packet is forwarded by the underlying device and
> > a copy is sent to the CPU
> > 
> > And only allow (in mlxsw) attaching filters with TC_ACT_TRAP_MIRROR to
> > the "mark" qevent.
> > 
> > > 
> > > AFAICT the code in the kernel is:
> > > 
> > > struct sk_buff *tcf_qevent_handle(...
> > > 
> > > 	case TC_ACT_STOLEN:
> > > 	case TC_ACT_QUEUED:
> > > 	case TC_ACT_TRAP:
> > > 		__qdisc_drop(skb, to_free);
> > > 		*ret = __NET_XMIT_STOLEN;
> > > 		return NULL;
> > > 
> > > Having TRAP mean DROP makes sense for filters, but in case of qevents
> > > shouldn't they be a no-op?
> > > 
> > > Looking at sch_red looks like TRAP being a no-op would actually give us
> > > the expected behavior.  
> > 
> > I'm not sure it makes sense to try to interpret these actions in
> > software (I expect they will be used with "skip_sw" filters), but
> > TC_ACT_TRAP_MIRROR can be a no-op like you suggested.
> 
> Well our paradigm is SW defines the behavior, we can't have HW forward
> and copy, while the SW drops the frame. Some engineer will try to
> implement this some day in their switch driver, look at the SW behavior
> and scratch their head.

OK, TC_ACT_TRAP_MIRROR will be a no-op in software

> 
> > > > the 'mark' qevent of the RED qdisc."
> > > >
> > > > In addition, this output:
> > > > 
> > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > > > pci/0000:06:00.0:
> > > >   name ecn_mark type drop generic true action trap group buffer_drops
> > > > 
> > > > Can be converted to:
> > > > 
> > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > > > pci/0000:06:00.0:
> > > >   name ecn_mark type drop generic true action mirror group buffer_drops
> > > > 
> > > > "mirror: The packet is forwarded by the underlying device and a copy is sent to
> > > > the CPU."
> > > > 
> > > > In this case the action is static and you cannot change it.  
> > > 
> > > Oh yes, that's nice, I thought mirror in traps means mirror to another
> > > port. Are there already traps which implement the mirroring / trapping
> > > a clone? Quick grep yields nothing of substance.  
> > 
> > Yes. That's why we have the 'offload_fwd_mark' and 'offload_l3_fwd_mark'
> > bits in the skb. For example, we let the hardware flood ARP requests
> > ('arp_request'), but also send a copy to the CPU in case it needs to
> > update its neighbour table. The trapping happens at L2, so we only set
> > the 'offload_fwd_mark' bit. It will tell the bridge driver to not flood
> > the packet again.
> > 
> > The 'offload_l3_fwd_mark' bit is mainly used to support one-armed router
> > use cases where a packet is forwarded through the same interface through
> > which it was received ('uc_loopback'). We do the forwarding in hardware,
> > but also send a copy to the CPU to give the kernel the chance to
> > generate an ICMP redirect if it was not disabled by the user. See more
> > info in commit 55827458e058 ("Merge branch
> > 'mlxsw-Add-one-armed-router-support'").
> 
> I see, thanks for the example, but just to be clear those are "internal
> traps", they don't have any impact on the devlink trap uAPI (in case we
> want to change the definition of MIRRED since nothing is using it).

It's not MIRRED, but MIRROR. Anyway, these are not internal traps:

$ devlink trap show pci/0000:01:00.0 trap arp_request 
pci/0000:01:00.0:
  name arp_request type control generic true action mirror group neigh_discovery

> 
> > I also want to explain how the qevent stuff works in hardware to make
> > sure it is all clear. We have the ability to bind different triggers to
> > a mirroring (SPAN) agent. The agent can point to a physical port /
> > virtual interface (e.g., gretap for ERSPAN) or to the CPU port. The
> > first is programmed via the mirred action and the second using the trap
> > action.
> > 
> > The triggers can be simple such as Rx/Tx packet (matchall + mirred) or
> > policy engine (flower + mirred). The more advanced triggers are various
> > buffer events such as early drops ('early_drop' qevent) and ECN marking
> > ('mark' qevent). Currently, it is only possible to bind these triggers
> > to a mirroring agent which is why we only support (in mlxsw) attaching
> > matchall filters to these qevents. In the future we might be able to
> > bind ACLs to these triggers in which case we will allow attaching flower
> > filters. devlink-trap is really only a read-only interface in this case,
> > meant to tell you why you go the packet from the hardware datapath. The
> > enablement / disablement is done by tc which gives us feature parity
> > with the software datapath.
> 
> Thanks for the explanation. I feel more and more convinced now that
> we should have TC_ACT_TRAP_MIRROR and the devlink trap should only 
> be on/off :S Current model of "if ACT_TRAP consult devlink for trap
> configuration" is impossible to model in SW since it doesn't have a
> equivalent of devlink traps. Or we need that equivalent..

Wait, the current model is not "if ACT_TRAP consult devlink for trap
configuration". 'ecn_mark' action is always 'trap' ('mirror' in v2) and
can't be changed. Such packets can always be sent to the CPU, but the
decision of whether to send them or not is based on the presence of tc
filters attached to RED's 'mark' qevent with TC_ACT_TRAP
(TC_ACT_TRAP_MIRROR in v2).

I believe that with the proposed changes in v2 it should be perfectly
clear that ECN marked packets are forwarded in hardware and a copy is
sent to the CPU.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-23 15:28           ` Ido Schimmel
@ 2021-01-23 19:55             ` Jakub Kicinski
  2021-01-24  8:42               ` Ido Schimmel
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-01-23 19:55 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Sat, 23 Jan 2021 17:28:02 +0200 Ido Schimmel wrote:
> > Thanks for the explanation. I feel more and more convinced now that
> > we should have TC_ACT_TRAP_MIRROR and the devlink trap should only 
> > be on/off :S Current model of "if ACT_TRAP consult devlink for trap
> > configuration" is impossible to model in SW since it doesn't have a
> > equivalent of devlink traps. Or we need that equivalent..  
> 
> Wait, the current model is not "if ACT_TRAP consult devlink for trap
> configuration". 'ecn_mark' action is always 'trap' ('mirror' in v2) and
> can't be changed. Such packets can always be sent to the CPU, but the
> decision of whether to send them or not is based on the presence of tc
> filters attached to RED's 'mark' qevent with TC_ACT_TRAP
> (TC_ACT_TRAP_MIRROR in v2).

I see, missed that, but I think my point conceptually stands, right?
Part of forwarding behavior was (in v1) only expressed in control 
plane (devlink) not dataplane (tc).
 
> I believe that with the proposed changes in v2 it should be perfectly
> clear that ECN marked packets are forwarded in hardware and a copy is
> sent to the CPU.

Yup, sounds good.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"
  2021-01-23 19:55             ` Jakub Kicinski
@ 2021-01-24  8:42               ` Ido Schimmel
  0 siblings, 0 replies; 14+ messages in thread
From: Ido Schimmel @ 2021-01-24  8:42 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, petrm, jiri, amcohen, mlxsw, Ido Schimmel

On Sat, Jan 23, 2021 at 11:55:27AM -0800, Jakub Kicinski wrote:
> On Sat, 23 Jan 2021 17:28:02 +0200 Ido Schimmel wrote:
> > > Thanks for the explanation. I feel more and more convinced now that
> > > we should have TC_ACT_TRAP_MIRROR and the devlink trap should only 
> > > be on/off :S Current model of "if ACT_TRAP consult devlink for trap
> > > configuration" is impossible to model in SW since it doesn't have a
> > > equivalent of devlink traps. Or we need that equivalent..  
> > 
> > Wait, the current model is not "if ACT_TRAP consult devlink for trap
> > configuration". 'ecn_mark' action is always 'trap' ('mirror' in v2) and
> > can't be changed. Such packets can always be sent to the CPU, but the
> > decision of whether to send them or not is based on the presence of tc
> > filters attached to RED's 'mark' qevent with TC_ACT_TRAP
> > (TC_ACT_TRAP_MIRROR in v2).
> 
> I see, missed that, but I think my point conceptually stands, right?
> Part of forwarding behavior was (in v1) only expressed in control 
> plane (devlink) not dataplane (tc).

I don't think so? The action was set to 'trap' in both devlink and tc.

>  
> > I believe that with the proposed changes in v2 it should be perfectly
> > clear that ECN marked packets are forwarded in hardware and a copy is
> > sent to the CPU.
> 
> Yup, sounds good.

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-01-24  8:43 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-17  8:02 [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Ido Schimmel
2021-01-17  8:02 ` [PATCH net-next 1/5] devlink: Add ecn_mark trap Ido Schimmel
2021-01-17  8:02 ` [PATCH net-next 2/5] mlxsw: spectrum_trap: " Ido Schimmel
2021-01-17  8:02 ` [PATCH net-next 3/5] mlxsw: spectrum_qdisc: Offload RED qevent mark Ido Schimmel
2021-01-17  8:02 ` [PATCH net-next 4/5] selftests: mlxsw: sch_red_core: Drop two unused variables Ido Schimmel
2021-01-17  8:02 ` [PATCH net-next 5/5] selftests: mlxsw: RED: Add selftests for the mark qevent Ido Schimmel
2021-01-19 22:22 ` [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark" Jakub Kicinski
2021-01-20  9:14   ` Ido Schimmel
2021-01-21  0:45     ` Jakub Kicinski
2021-01-21 10:23       ` Ido Schimmel
2021-01-21 17:19         ` Jakub Kicinski
2021-01-23 15:28           ` Ido Schimmel
2021-01-23 19:55             ` Jakub Kicinski
2021-01-24  8:42               ` Ido Schimmel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).