All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
@ 2022-09-08  1:23 Amritha Nambiar
  2022-09-08  1:24 ` [net-next PATCH v2 1/4] act_skbedit: Add support for action skbedit " Amritha Nambiar
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Amritha Nambiar @ 2022-09-08  1:23 UTC (permalink / raw)
  To: netdev, kuba
  Cc: alexander.duyck, jhs, jiri, xiyou.wangcong, vinicius.gomes,
	sridhar.samudrala, amritha.nambiar

Based on the discussion on
https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
the following series extends the skbedit tc action to RX queue mapping.
Currently, the skbedit action in tc allows overriding the transmit
queue. This series extends that ability so that skbedit can also select
the receive queue for incoming packets. Offload of this action is added
on the receive side, and the ice driver is enabled to offload this type
of filter into hardware, accepting packets on the device's specified
receive queue.

v2: Added documentation in Documentation/networking

---

Amritha Nambiar (4):
      act_skbedit: Add support for action skbedit RX queue mapping
      act_skbedit: Offload skbedit queue mapping for receive queue
      ice: Enable RX queue selection using skbedit action
      Documentation: networking: TC queue based filtering


 Documentation/networking/tc-queue-filters.rst |   24 ++
 drivers/net/ethernet/intel/ice/ice.h          |   15 +
 drivers/net/ethernet/intel/ice/ice_main.c     |    2 
 drivers/net/ethernet/intel/ice/ice_tc_lib.c   |  351 ++++++++++++++++++-------
 drivers/net/ethernet/intel/ice/ice_tc_lib.h   |   40 ++-
 include/net/act_api.h                         |    1 
 include/net/flow_offload.h                    |    2 
 include/net/tc_act/tc_skbedit.h               |   11 +
 net/sched/act_skbedit.c                       |   40 ++-
 net/sched/cls_api.c                           |    7 
 10 files changed, 376 insertions(+), 117 deletions(-)
 create mode 100644 Documentation/networking/tc-queue-filters.rst

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [net-next PATCH v2 1/4] act_skbedit: Add support for action skbedit RX queue mapping
  2022-09-08  1:23 [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Amritha Nambiar
@ 2022-09-08  1:24 ` Amritha Nambiar
  2022-09-08  1:24 ` [net-next PATCH v2 2/4] act_skbedit: Offload skbedit queue mapping for receive queue Amritha Nambiar
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Amritha Nambiar @ 2022-09-08  1:24 UTC (permalink / raw)
  To: netdev, kuba
  Cc: alexander.duyck, jhs, jiri, xiyou.wangcong, vinicius.gomes,
	sridhar.samudrala, amritha.nambiar

The skbedit action in tc allows selecting the transmit queue on an
interface with multiple queues. This patch extends the skbedit action
to also support selecting the receive queue on which packets arrive.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 net/sched/act_skbedit.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index e3bd11dfe1ca..9b8274d09117 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -37,6 +37,24 @@ static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
 	return netdev_cap_txqueue(skb->dev, queue_mapping);
 }
 
+static void tcf_skbedit_act_txq(struct tcf_skbedit_params *params,
+				struct sk_buff *skb)
+{
+	if (skb->dev->real_num_tx_queues > params->queue_mapping) {
+#ifdef CONFIG_NET_EGRESS
+		netdev_xmit_skip_txqueue(true);
+#endif
+		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
+	}
+}
+
+static void tcf_skbedit_act_rxq(struct tcf_skbedit_params *params,
+				struct sk_buff *skb)
+{
+	if (skb->dev->real_num_rx_queues > params->queue_mapping)
+		skb_record_rx_queue(skb, params->queue_mapping);
+}
+
 static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			   struct tcf_result *res)
 {
@@ -71,12 +89,11 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			break;
 		}
 	}
-	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping) {
-#ifdef CONFIG_NET_EGRESS
-		netdev_xmit_skip_txqueue(true);
-#endif
-		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
+		if (!skb_at_tc_ingress(skb))
+			tcf_skbedit_act_txq(params, skb);
+		else
+			tcf_skbedit_act_rxq(params, skb);
 	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next PATCH v2 2/4] act_skbedit: Offload skbedit queue mapping for receive queue
  2022-09-08  1:23 [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Amritha Nambiar
  2022-09-08  1:24 ` [net-next PATCH v2 1/4] act_skbedit: Add support for action skbedit " Amritha Nambiar
@ 2022-09-08  1:24 ` Amritha Nambiar
  2022-09-08  1:24 ` [net-next PATCH v2 3/4] ice: Enable RX queue selection using skbedit action Amritha Nambiar
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Amritha Nambiar @ 2022-09-08  1:24 UTC (permalink / raw)
  To: netdev, kuba
  Cc: alexander.duyck, jhs, jiri, xiyou.wangcong, vinicius.gomes,
	sridhar.samudrala, amritha.nambiar

Add support for offloading the skbedit queue_mapping action on the
receive side. This enables offloading filters for receive queue
selection to the hardware using the skbedit action; traffic then
arrives on the RX queue requested in the skbedit action parameter.
A new tc action flag, TCA_ACT_FLAGS_AT_INGRESS, is introduced to
identify the traffic direction for which the queue_mapping action is
requested at filter addition time. It is used to disallow offloading
the skbedit queue_mapping action on the transmit side.

Example:
$ tc filter add dev $IFACE ingress protocol ip flower dst_ip $DST_IP \
  skip_sw action skbedit queue_mapping $rxq_id

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/net/act_api.h           |    1 +
 include/net/flow_offload.h      |    2 ++
 include/net/tc_act/tc_skbedit.h |   11 +++++++++++
 net/sched/act_skbedit.c         |   11 +++++++++--
 net/sched/cls_api.c             |    7 +++++++
 5 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 9cf6870b526e..7eb78519d579 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -67,6 +67,7 @@ struct tc_action {
 #define TCA_ACT_FLAGS_BIND	(1U << (TCA_ACT_FLAGS_USER_BITS + 1))
 #define TCA_ACT_FLAGS_REPLACE	(1U << (TCA_ACT_FLAGS_USER_BITS + 2))
 #define TCA_ACT_FLAGS_NO_RTNL	(1U << (TCA_ACT_FLAGS_USER_BITS + 3))
+#define TCA_ACT_FLAGS_AT_INGRESS	(1U << (TCA_ACT_FLAGS_USER_BITS + 4))
 
 /* Update lastuse only if needed, to avoid dirtying a cache line.
  * We use a temp variable to avoid fetching jiffies twice.
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 2a9a9e42e7fd..8b7786343a03 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -149,6 +149,7 @@ enum flow_action_id {
 	FLOW_ACTION_MARK,
 	FLOW_ACTION_PTYPE,
 	FLOW_ACTION_PRIORITY,
+	FLOW_ACTION_RX_QUEUE_MAPPING,
 	FLOW_ACTION_WAKE,
 	FLOW_ACTION_QUEUE,
 	FLOW_ACTION_SAMPLE,
@@ -241,6 +242,7 @@ struct flow_action_entry {
 		u32			csum_flags;	/* FLOW_ACTION_CSUM */
 		u32			mark;		/* FLOW_ACTION_MARK */
 		u16                     ptype;          /* FLOW_ACTION_PTYPE */
+		u16			rx_queue;	/* FLOW_ACTION_RX_QUEUE_MAPPING */
 		u32			priority;	/* FLOW_ACTION_PRIORITY */
 		struct {				/* FLOW_ACTION_QUEUE */
 			u32		ctx;
diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index dc1079f28e13..07145aafb0f1 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -95,6 +95,17 @@ static inline u32 tcf_skbedit_priority(const struct tc_action *a)
 	return priority;
 }
 
+static inline u16 tcf_skbedit_rx_queue_mapping(const struct tc_action *a)
+{
+	u16 rx_queue;
+
+	rcu_read_lock();
+	rx_queue = rcu_dereference(to_skbedit(a)->params)->queue_mapping;
+	rcu_read_unlock();
+
+	return rx_queue;
+}
+
 /* Return true iff action is queue_mapping */
 static inline bool is_tcf_skbedit_queue_mapping(const struct tc_action *a)
 {
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 9b8274d09117..f5d92ba916e6 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -410,8 +410,12 @@ static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
 			entry->id = FLOW_ACTION_PRIORITY;
 			entry->priority = tcf_skbedit_priority(act);
 		} else if (is_tcf_skbedit_queue_mapping(act)) {
-			NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"queue_mapping\" option is used");
-			return -EOPNOTSUPP;
+			if (!(act->tcfa_flags & TCA_ACT_FLAGS_AT_INGRESS)) {
+				NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"queue_mapping\" option is used on transmit side");
+				return -EOPNOTSUPP;
+			}
+			entry->id = FLOW_ACTION_RX_QUEUE_MAPPING;
+			entry->rx_queue = tcf_skbedit_rx_queue_mapping(act);
 		} else if (is_tcf_skbedit_inheritdsfield(act)) {
 			NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"inheritdsfield\" option is used");
 			return -EOPNOTSUPP;
@@ -429,6 +433,9 @@ static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
 			fl_action->id = FLOW_ACTION_PTYPE;
 		else if (is_tcf_skbedit_priority(act))
 			fl_action->id = FLOW_ACTION_PRIORITY;
+		else if (is_tcf_skbedit_queue_mapping(act) &&
+			 (act->tcfa_flags & TCA_ACT_FLAGS_AT_INGRESS))
+			fl_action->id = FLOW_ACTION_RX_QUEUE_MAPPING;
 		else
 			return -EOPNOTSUPP;
 	}
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1ebab4b11262..9cc11395396b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1953,6 +1953,11 @@ static void tfilter_put(struct tcf_proto *tp, void *fh)
 		tp->ops->put(tp, fh);
 }
 
+static bool is_qdisc_ingress(__u32 classid)
+{
+	return (TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_INGRESS));
+}
+
 static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			  struct netlink_ext_ack *extack)
 {
@@ -2143,6 +2148,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		flags |= TCA_ACT_FLAGS_REPLACE;
 	if (!rtnl_held)
 		flags |= TCA_ACT_FLAGS_NO_RTNL;
+	if (is_qdisc_ingress(parent))
+		flags |= TCA_ACT_FLAGS_AT_INGRESS;
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      flags, extack);
 	if (err == 0) {


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next PATCH v2 3/4] ice: Enable RX queue selection using skbedit action
  2022-09-08  1:23 [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Amritha Nambiar
  2022-09-08  1:24 ` [net-next PATCH v2 1/4] act_skbedit: Add support for action skbedit " Amritha Nambiar
  2022-09-08  1:24 ` [net-next PATCH v2 2/4] act_skbedit: Offload skbedit queue mapping for receive queue Amritha Nambiar
@ 2022-09-08  1:24 ` Amritha Nambiar
  2022-09-08  1:24 ` [net-next PATCH v2 4/4] Documentation: networking: TC queue based filtering Amritha Nambiar
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Amritha Nambiar @ 2022-09-08  1:24 UTC (permalink / raw)
  To: netdev, kuba
  Cc: alexander.duyck, jhs, jiri, xiyou.wangcong, vinicius.gomes,
	sridhar.samudrala, amritha.nambiar

This patch uses the TC skbedit queue_mapping action to support
forwarding packets to a device queue. Filters with such a
forward-to-queue action are installed as the highest priority switch
filters in HW.
Example:
$ tc filter add dev ens4f0 protocol ip ingress flower \
  dst_ip 192.168.1.12 ip_proto tcp dst_port 5001 \
  skip_sw action skbedit queue_mapping 5

The above command adds an ingress filter; incoming packets matching it
are accepted into queue 5. The queue number is in decimal format.

Refactored ice_add_tc_flower_adv_fltr() to consolidate code for the
FWD_TO_VSI and FWD_TO_QUEUE actions.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h        |   15 +
 drivers/net/ethernet/intel/ice/ice_main.c   |    2 
 drivers/net/ethernet/intel/ice/ice_tc_lib.c |  351 +++++++++++++++++++--------
 drivers/net/ethernet/intel/ice/ice_tc_lib.h |   40 ++-
 4 files changed, 299 insertions(+), 109 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 001500afc4a6..d75c8d11f867 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -137,6 +137,21 @@
  */
 #define ICE_BW_KBPS_DIVISOR		125
 
+/* Default recipes have priority 4 and below, hence priority values between 5..7
+ * can be used as filter priority for advanced switch filter (advanced switch
+ * filters need new recipe to be created for specified extraction sequence
+ * because default recipe extraction sequence does not represent custom
+ * extraction)
+ */
+#define ICE_SWITCH_FLTR_PRIO_QUEUE	7
+/* prio 6 is reserved for future use (e.g. switch filter with L3 fields +
+ * (Optional: IP TOS/TTL) + L4 fields + (optionally: TCP fields such as
+ * SYN/FIN/RST))
+ */
+#define ICE_SWITCH_FLTR_PRIO_RSVD	6
+#define ICE_SWITCH_FLTR_PRIO_VSI	5
+#define ICE_SWITCH_FLTR_PRIO_QGRP	ICE_SWITCH_FLTR_PRIO_VSI
+
 /* Macro for each VSI in a PF */
 #define ice_for_each_vsi(pf, i) \
 	for ((i) = 0; (i) < (pf)->num_alloc_vsi; (i)++)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 14edf7614406..7d675ef167da 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -8245,7 +8245,7 @@ static void ice_rem_all_chnl_fltrs(struct ice_pf *pf)
 
 		rule.rid = fltr->rid;
 		rule.rule_id = fltr->rule_id;
-		rule.vsi_handle = fltr->dest_id;
+		rule.vsi_handle = fltr->dest_vsi_handle;
 		status = ice_rem_adv_rule_by_id(&pf->hw, &rule);
 		if (status) {
 			if (status == -ENOENT)
diff --git a/drivers/net/ethernet/intel/ice/ice_tc_lib.c b/drivers/net/ethernet/intel/ice/ice_tc_lib.c
index 42df686e0215..f95bd7d2f293 100644
--- a/drivers/net/ethernet/intel/ice/ice_tc_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_tc_lib.c
@@ -679,13 +679,123 @@ ice_eswitch_add_tc_fltr(struct ice_vsi *vsi, struct ice_tc_flower_fltr *fltr)
 	 */
 	fltr->rid = rule_added.rid;
 	fltr->rule_id = rule_added.rule_id;
-	fltr->dest_id = rule_added.vsi_handle;
+	fltr->dest_vsi_handle = rule_added.vsi_handle;
 
 exit:
 	kfree(list);
 	return ret;
 }
 
+/**
+ * ice_locate_vsi_using_queue - locate VSI using queue (forward to queue action)
+ * @vsi: Pointer to VSI
+ * @tc_fltr: Pointer to tc_flower_filter
+ *
+ * Locate the VSI using specified queue. When ADQ is not enabled, always
+ * return input VSI, otherwise locate corresponding VSI based on per channel
+ * offset and qcount
+ */
+static struct ice_vsi *
+ice_locate_vsi_using_queue(struct ice_vsi *vsi,
+			   struct ice_tc_flower_fltr *tc_fltr)
+{
+	int num_tc, tc, queue;
+
+	/* if ADQ is not active, passed VSI is the candidate VSI */
+	if (!ice_is_adq_active(vsi->back))
+		return vsi;
+
+	/* Locate the VSI (it could still be main PF VSI or CHNL_VSI depending
+	 * upon queue number)
+	 */
+	num_tc = vsi->mqprio_qopt.qopt.num_tc;
+	queue = tc_fltr->action.fwd.q.queue;
+
+	for (tc = 0; tc < num_tc; tc++) {
+		int qcount = vsi->mqprio_qopt.qopt.count[tc];
+		int offset = vsi->mqprio_qopt.qopt.offset[tc];
+
+		if (queue >= offset && queue < offset + qcount) {
+			/* for non-ADQ TCs, passed VSI is the candidate VSI */
+			if (tc < ICE_CHNL_START_TC)
+				return vsi;
+			else
+				return vsi->tc_map_vsi[tc];
+		}
+	}
+	return NULL;
+}
+
+static struct ice_rx_ring *
+ice_locate_rx_ring_using_queue(struct ice_vsi *vsi,
+			       struct ice_tc_flower_fltr *tc_fltr)
+{
+	u16 queue = tc_fltr->action.fwd.q.queue;
+
+	return queue < vsi->num_rxq ? vsi->rx_rings[queue] : NULL;
+}
+
+/**
+ * ice_tc_forward_action - Determine destination VSI and queue for the action
+ * @vsi: Pointer to VSI
+ * @tc_fltr: Pointer to TC flower filter structure
+ *
+ * Validates the tc forward action and determines the destination VSI and queue
+ * for the forward action.
+ */
+static struct ice_vsi *
+ice_tc_forward_action(struct ice_vsi *vsi, struct ice_tc_flower_fltr *tc_fltr)
+{
+	struct ice_rx_ring *ring = NULL;
+	struct ice_vsi *ch_vsi = NULL;
+	struct ice_pf *pf = vsi->back;
+	struct device *dev;
+	u32 tc_class;
+
+	dev = ice_pf_to_dev(pf);
+
+	/* Get the destination VSI and/or destination queue and validate them */
+	switch (tc_fltr->action.fltr_act) {
+	case ICE_FWD_TO_VSI:
+		tc_class = tc_fltr->action.fwd.tc.tc_class;
+		/* Select the destination VSI */
+		if (tc_class < ICE_CHNL_START_TC) {
+			NL_SET_ERR_MSG_MOD(tc_fltr->extack,
+					   "Unable to add filter because of unsupported destination");
+			return ERR_PTR(-EOPNOTSUPP);
+		}
+		/* Locate ADQ VSI depending on hw_tc number */
+		ch_vsi = vsi->tc_map_vsi[tc_class];
+		break;
+	case ICE_FWD_TO_Q:
+		/* Locate the Rx queue */
+		ring = ice_locate_rx_ring_using_queue(vsi, tc_fltr);
+		if (!ring) {
+			dev_err(dev,
+				"Unable to locate Rx queue for action fwd_to_queue: %u\n",
+				tc_fltr->action.fwd.q.queue);
+			return ERR_PTR(-EINVAL);
+		}
+		/* Determine destination VSI even though the action is
+		 * FWD_TO_QUEUE, because QUEUE is associated with VSI
+		 */
+		ch_vsi = tc_fltr->dest_vsi;
+		break;
+	default:
+		dev_err(dev,
+			"Unable to add filter because of unsupported action %u (supported actions: fwd to tc, fwd to queue)\n",
+			tc_fltr->action.fltr_act);
+		return ERR_PTR(-EINVAL);
+	}
+	/* Must have valid ch_vsi (it could be main VSI or ADQ VSI) */
+	if (!ch_vsi) {
+		dev_err(dev,
+			"Unable to add filter because specified destination VSI doesn't exist\n");
+		return ERR_PTR(-EINVAL);
+	}
+	return ch_vsi;
+}
+
 /**
  * ice_add_tc_flower_adv_fltr - add appropriate filter rules
  * @vsi: Pointer to VSI
@@ -727,11 +837,10 @@ ice_add_tc_flower_adv_fltr(struct ice_vsi *vsi,
 		return -EOPNOTSUPP;
 	}
 
-	/* get the channel (aka ADQ VSI) */
-	if (tc_fltr->dest_vsi)
-		ch_vsi = tc_fltr->dest_vsi;
-	else
-		ch_vsi = vsi->tc_map_vsi[tc_fltr->action.tc_class];
+	/* validate forwarding action VSI and queue */
+	ch_vsi = ice_tc_forward_action(vsi, tc_fltr);
+	if (IS_ERR(ch_vsi))
+		return PTR_ERR(ch_vsi);
 
 	lkups_cnt = ice_tc_count_lkups(flags, headers, tc_fltr);
 	list = kcalloc(lkups_cnt, sizeof(*list), GFP_ATOMIC);
@@ -745,30 +854,40 @@ ice_add_tc_flower_adv_fltr(struct ice_vsi *vsi,
 	}
 
 	rule_info.sw_act.fltr_act = tc_fltr->action.fltr_act;
-	if (tc_fltr->action.tc_class >= ICE_CHNL_START_TC) {
-		if (!ch_vsi) {
-			NL_SET_ERR_MSG_MOD(tc_fltr->extack, "Unable to add filter because specified destination doesn't exist");
-			ret = -EINVAL;
-			goto exit;
-		}
+	/* specify the cookie as filter_rule_id */
+	rule_info.fltr_rule_id = tc_fltr->cookie;
 
-		rule_info.sw_act.fltr_act = ICE_FWD_TO_VSI;
+	switch (tc_fltr->action.fltr_act) {
+	case ICE_FWD_TO_VSI:
 		rule_info.sw_act.vsi_handle = ch_vsi->idx;
-		rule_info.priority = 7;
+		rule_info.priority = ICE_SWITCH_FLTR_PRIO_VSI;
 		rule_info.sw_act.src = hw->pf_id;
 		rule_info.rx = true;
 		dev_dbg(dev, "add switch rule for TC:%u vsi_idx:%u, lkups_cnt:%u\n",
-			tc_fltr->action.tc_class,
+			tc_fltr->action.fwd.tc.tc_class,
 			rule_info.sw_act.vsi_handle, lkups_cnt);
-	} else {
+		break;
+	case ICE_FWD_TO_Q:
+		/* HW queue number in global space */
+		rule_info.sw_act.fwd_id.q_id = tc_fltr->action.fwd.q.hw_queue;
+		rule_info.sw_act.vsi_handle = ch_vsi->idx;
+		rule_info.priority = ICE_SWITCH_FLTR_PRIO_QUEUE;
+		rule_info.sw_act.src = hw->pf_id;
+		rule_info.rx = true;
+		dev_dbg(dev, "add switch rule action to forward to queue:%u (HW queue %u), lkups_cnt:%u\n",
+			tc_fltr->action.fwd.q.queue,
+			tc_fltr->action.fwd.q.hw_queue, lkups_cnt);
+		break;
+	default:
 		rule_info.sw_act.flag |= ICE_FLTR_TX;
+		/* In case of Tx (LOOKUP_TX), src needs to be src VSI */
 		rule_info.sw_act.src = vsi->idx;
+		/* 'rx' is false, direction of rule is LOOKUP_TX */
 		rule_info.rx = false;
+		rule_info.priority = ICE_SWITCH_FLTR_PRIO_VSI;
+		break;
 	}
 
-	/* specify the cookie as filter_rule_id */
-	rule_info.fltr_rule_id = tc_fltr->cookie;
-
 	ret = ice_add_adv_rule(hw, list, lkups_cnt, &rule_info, &rule_added);
 	if (ret == -EEXIST) {
 		NL_SET_ERR_MSG_MOD(tc_fltr->extack,
@@ -786,19 +905,14 @@ ice_add_tc_flower_adv_fltr(struct ice_vsi *vsi,
 	 */
 	tc_fltr->rid = rule_added.rid;
 	tc_fltr->rule_id = rule_added.rule_id;
-	if (tc_fltr->action.tc_class > 0 && ch_vsi) {
-		/* For PF ADQ, VSI type is set as ICE_VSI_CHNL, and
-		 * for PF ADQ filter, it is not yet set in tc_fltr,
-		 * hence store the dest_vsi ptr in tc_fltr
-		 */
-		if (ch_vsi->type == ICE_VSI_CHNL)
-			tc_fltr->dest_vsi = ch_vsi;
+	tc_fltr->dest_vsi_handle = rule_added.vsi_handle;
+	if (tc_fltr->action.fltr_act == ICE_FWD_TO_VSI ||
+	    tc_fltr->action.fltr_act == ICE_FWD_TO_Q) {
+		tc_fltr->dest_vsi = ch_vsi;
 		/* keep track of advanced switch filter for
-		 * destination VSI (channel VSI)
+		 * destination VSI
 		 */
 		ch_vsi->num_chnl_fltr++;
-		/* in this case, dest_id is VSI handle (sw handle) */
-		tc_fltr->dest_id = rule_added.vsi_handle;
 
 		/* keeps track of channel filters for PF VSI */
 		if (vsi->type == ICE_VSI_PF &&
@@ -806,10 +920,22 @@ ice_add_tc_flower_adv_fltr(struct ice_vsi *vsi,
 			      ICE_TC_FLWR_FIELD_ENC_DST_MAC)))
 			pf->num_dmac_chnl_fltrs++;
 	}
-	dev_dbg(dev, "added switch rule (lkups_cnt %u, flags 0x%x) for TC %u, rid %u, rule_id %u, vsi_idx %u\n",
-		lkups_cnt, flags,
-		tc_fltr->action.tc_class, rule_added.rid,
-		rule_added.rule_id, rule_added.vsi_handle);
+	switch (tc_fltr->action.fltr_act) {
+	case ICE_FWD_TO_VSI:
+		dev_dbg(dev, "added switch rule (lkups_cnt %u, flags 0x%x), action is forward to TC %u, rid %u, rule_id %u, vsi_idx %u\n",
+			lkups_cnt, flags,
+			tc_fltr->action.fwd.tc.tc_class, rule_added.rid,
+			rule_added.rule_id, rule_added.vsi_handle);
+		break;
+	case ICE_FWD_TO_Q:
+		dev_dbg(dev, "added switch rule (lkups_cnt %u, flags 0x%x), action is forward to queue: %u (HW queue %u), rid %u, rule_id %u\n",
+			lkups_cnt, flags, tc_fltr->action.fwd.q.queue,
+			tc_fltr->action.fwd.q.hw_queue, rule_added.rid,
+			rule_added.rule_id);
+		break;
+	default:
+		break;
+	}
 exit:
 	kfree(list);
 	return ret;
@@ -1389,43 +1515,15 @@ ice_add_switch_fltr(struct ice_vsi *vsi, struct ice_tc_flower_fltr *fltr)
 }
 
 /**
- * ice_handle_tclass_action - Support directing to a traffic class
+ * ice_prep_adq_filter - Prepare ADQ filter with the required additional headers
  * @vsi: Pointer to VSI
- * @cls_flower: Pointer to TC flower offload structure
  * @fltr: Pointer to TC flower filter structure
  *
- * Support directing traffic to a traffic class
+ * Prepare ADQ filter with the required additional header fields
  */
 static int
-ice_handle_tclass_action(struct ice_vsi *vsi,
-			 struct flow_cls_offload *cls_flower,
-			 struct ice_tc_flower_fltr *fltr)
+ice_prep_adq_filter(struct ice_vsi *vsi, struct ice_tc_flower_fltr *fltr)
 {
-	int tc = tc_classid_to_hwtc(vsi->netdev, cls_flower->classid);
-	struct ice_vsi *main_vsi;
-
-	if (tc < 0) {
-		NL_SET_ERR_MSG_MOD(fltr->extack, "Unable to add filter because specified destination is invalid");
-		return -EINVAL;
-	}
-	if (!tc) {
-		NL_SET_ERR_MSG_MOD(fltr->extack, "Unable to add filter because of invalid destination");
-		return -EINVAL;
-	}
-
-	if (!(vsi->all_enatc & BIT(tc))) {
-		NL_SET_ERR_MSG_MOD(fltr->extack, "Unable to add filter because of non-existence destination");
-		return -EINVAL;
-	}
-
-	/* Redirect to a TC class or Queue Group */
-	main_vsi = ice_get_main_vsi(vsi->back);
-	if (!main_vsi || !main_vsi->netdev) {
-		NL_SET_ERR_MSG_MOD(fltr->extack,
-				   "Unable to add filter because of invalid netdevice");
-		return -EINVAL;
-	}
-
 	if ((fltr->flags & ICE_TC_FLWR_FIELD_TENANT_ID) &&
 	    (fltr->flags & (ICE_TC_FLWR_FIELD_DST_MAC |
 			   ICE_TC_FLWR_FIELD_SRC_MAC))) {
@@ -1437,9 +1535,8 @@ ice_handle_tclass_action(struct ice_vsi *vsi,
 	/* For ADQ, filter must include dest MAC address, otherwise unwanted
 	 * packets with unrelated MAC address get delivered to ADQ VSIs as long
 	 * as remaining filter criteria is satisfied such as dest IP address
-	 * and dest/src L4 port. Following code is trying to handle:
-	 * 1. For non-tunnel, if user specify MAC addresses, use them (means
-	 * this code won't do anything
+	 * and dest/src L4 port. Below code handles the following cases:
+	 * 1. For non-tunnel, if user specify MAC addresses, use them.
 	 * 2. For non-tunnel, if user didn't specify MAC address, add implicit
 	 * dest MAC to be lower netdev's active unicast MAC address
 	 * 3. For tunnel,  as of now TC-filter through flower classifier doesn't
@@ -1462,35 +1559,97 @@ ice_handle_tclass_action(struct ice_vsi *vsi,
 		eth_broadcast_addr(fltr->outer_headers.l2_mask.dst_mac);
 	}
 
-	/* validate specified dest MAC address, make sure either it belongs to
-	 * lower netdev or any of MACVLAN. MACVLANs MAC address are added as
-	 * unicast MAC filter destined to main VSI.
-	 */
-	if (!ice_mac_fltr_exist(&main_vsi->back->hw,
-				fltr->outer_headers.l2_key.dst_mac,
-				main_vsi->idx)) {
-		NL_SET_ERR_MSG_MOD(fltr->extack,
-				   "Unable to add filter because legacy MAC filter for specified destination doesn't exist");
-		return -EINVAL;
-	}
-
 	/* Make sure VLAN is already added to main VSI, before allowing ADQ to
 	 * add a VLAN based filter such as MAC + VLAN + L4 port.
 	 */
 	if (fltr->flags & ICE_TC_FLWR_FIELD_VLAN) {
 		u16 vlan_id = be16_to_cpu(fltr->outer_headers.vlan_hdr.vlan_id);
 
-		if (!ice_vlan_fltr_exist(&main_vsi->back->hw, vlan_id,
-					 main_vsi->idx)) {
+		if (!ice_vlan_fltr_exist(&vsi->back->hw, vlan_id, vsi->idx)) {
 			NL_SET_ERR_MSG_MOD(fltr->extack,
 					   "Unable to add filter because legacy VLAN filter for specified destination doesn't exist");
 			return -EINVAL;
 		}
 	}
+	return 0;
+}
+
+/**
+ * ice_handle_tclass_action - Support directing to a traffic class
+ * @vsi: Pointer to VSI
+ * @cls_flower: Pointer to TC flower offload structure
+ * @fltr: Pointer to TC flower filter structure
+ *
+ * Support directing traffic to a traffic class/queue-set
+ */
+static int
+ice_handle_tclass_action(struct ice_vsi *vsi,
+			 struct flow_cls_offload *cls_flower,
+			 struct ice_tc_flower_fltr *fltr)
+{
+	int tc = tc_classid_to_hwtc(vsi->netdev, cls_flower->classid);
+
+	/* user specified hw_tc (must be non-zero for ADQ TC), action is forward
+	 * to hw_tc (i.e. ADQ channel number)
+	 */
+	if (tc < ICE_CHNL_START_TC) {
+		NL_SET_ERR_MSG_MOD(fltr->extack,
+				   "Unable to add filter because of unsupported destination");
+		return -EOPNOTSUPP;
+	}
+	if (!(vsi->all_enatc & BIT(tc))) {
+		NL_SET_ERR_MSG_MOD(fltr->extack,
+				   "Unable to add filter because of non-existence destination");
+		return -EINVAL;
+	}
 	fltr->action.fltr_act = ICE_FWD_TO_VSI;
-	fltr->action.tc_class = tc;
+	fltr->action.fwd.tc.tc_class = tc;
 
-	return 0;
+	return ice_prep_adq_filter(vsi, fltr);
+}
+
+static int
+ice_tc_forward_to_queue(struct ice_vsi *vsi, struct ice_tc_flower_fltr *fltr,
+			struct flow_action_entry *act)
+{
+	struct ice_vsi *ch_vsi = NULL;
+	u16 queue = act->rx_queue;
+
+	if (queue > vsi->num_rxq) {
+		NL_SET_ERR_MSG_MOD(fltr->extack,
+				   "Unable to add filter because specified queue is invalid");
+		return -EINVAL;
+	}
+	fltr->action.fltr_act = ICE_FWD_TO_Q;
+	fltr->action.fwd.q.queue = queue;
+	/* determine corresponding HW queue */
+	fltr->action.fwd.q.hw_queue = vsi->rxq_map[queue];
+
+	/* If ADQ is configured, and the queue belongs to ADQ VSI, then prepare
+	 * ADQ switch filter
+	 */
+	ch_vsi = ice_locate_vsi_using_queue(vsi, fltr);
+	if (!ch_vsi)
+		return -EINVAL;
+	fltr->dest_vsi = ch_vsi;
+	if (!ice_is_chnl_fltr(fltr))
+		return 0;
+
+	return ice_prep_adq_filter(vsi, fltr);
+}
+
+static int
+ice_tc_parse_action(struct ice_vsi *vsi, struct ice_tc_flower_fltr *fltr,
+		    struct flow_action_entry *act)
+{
+	switch (act->id) {
+	case FLOW_ACTION_RX_QUEUE_MAPPING:
+		/* forward to queue */
+		return ice_tc_forward_to_queue(vsi, fltr, act);
+	default:
+		NL_SET_ERR_MSG_MOD(fltr->extack, "Unsupported TC action");
+		return -EOPNOTSUPP;
+	}
 }
 
 /**
@@ -1509,7 +1668,7 @@ ice_parse_tc_flower_actions(struct ice_vsi *vsi,
 	struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
 	struct flow_action *flow_action = &rule->action;
 	struct flow_action_entry *act;
-	int i;
+	int i, err;
 
 	if (cls_flower->classid)
 		return ice_handle_tclass_action(vsi, cls_flower, fltr);
@@ -1518,21 +1677,13 @@ ice_parse_tc_flower_actions(struct ice_vsi *vsi,
 		return -EINVAL;
 
 	flow_action_for_each(i, act, flow_action) {
-		if (ice_is_eswitch_mode_switchdev(vsi->back)) {
-			int err = ice_eswitch_tc_parse_action(fltr, act);
-
-			if (err)
-				return err;
-			continue;
-		}
-		/* Allow only one rule per filter */
-
-		/* Drop action */
-		if (act->id == FLOW_ACTION_DROP) {
-			NL_SET_ERR_MSG_MOD(fltr->extack, "Unsupported action DROP");
-			return -EINVAL;
-		}
-		fltr->action.fltr_act = ICE_FWD_TO_VSI;
+		if (ice_is_eswitch_mode_switchdev(vsi->back))
+			err = ice_eswitch_tc_parse_action(fltr, act);
+		else
+			err = ice_tc_parse_action(vsi, fltr, act);
+		if (err)
+			return err;
+		continue;
 	}
 	return 0;
 }
@@ -1552,7 +1703,7 @@ static int ice_del_tc_fltr(struct ice_vsi *vsi, struct ice_tc_flower_fltr *fltr)
 
 	rule_rem.rid = fltr->rid;
 	rule_rem.rule_id = fltr->rule_id;
-	rule_rem.vsi_handle = fltr->dest_id;
+	rule_rem.vsi_handle = fltr->dest_vsi_handle;
 	err = ice_rem_adv_rule_by_id(&pf->hw, &rule_rem);
 	if (err) {
 		if (err == -ENOENT) {
diff --git a/drivers/net/ethernet/intel/ice/ice_tc_lib.h b/drivers/net/ethernet/intel/ice/ice_tc_lib.h
index f397ed02606d..248f979db6ef 100644
--- a/drivers/net/ethernet/intel/ice/ice_tc_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_tc_lib.h
@@ -42,7 +42,20 @@ struct ice_indr_block_priv {
 };
 
 struct ice_tc_flower_action {
-	u32 tc_class;
+	/* forward action specific params */
+	union {
+		struct {
+			u32 tc_class; /* forward to hw_tc */
+			u32 rsvd;
+		} tc;
+		struct {
+			u16 queue; /* forward to queue */
+			/* To add filter in HW, absolute queue number in global
+			 * space of queues (between 0...N) is needed
+			 */
+			u16 hw_queue;
+		} q;
+	} fwd;
 	enum ice_sw_fwd_act_type fltr_act;
 };
 
@@ -123,11 +136,11 @@ struct ice_tc_flower_fltr {
 	 */
 	u16 rid;
 	u16 rule_id;
-	/* this could be queue/vsi_idx (sw handle)/queue_group, depending upon
-	 * destination type
+	/* VSI handle of the destination VSI (it could be main PF VSI, CHNL_VSI,
+	 * VF VSI)
 	 */
-	u16 dest_id;
-	/* if dest_id is vsi_idx, then need to store destination VSI ptr */
+	u16 dest_vsi_handle;
+	/* ptr to destination VSI */
 	struct ice_vsi *dest_vsi;
 	/* direction of fltr for eswitch use case */
 	enum ice_eswitch_fltr_direction direction;
@@ -154,12 +167,23 @@ struct ice_tc_flower_fltr {
  * @f: Pointer to tc-flower filter
  *
  * Criteria to determine of given filter is valid channel filter
- * or not is based on its "destination". If destination is hw_tc (aka tc_class)
- * and it is non-zero, then it is valid channel (aka ADQ) filter
+ * or not is based on its destination.
+ * For forward to VSI action, if destination is valid hw_tc (aka tc_class)
+ * and in supported range of TCs for ADQ, then return true.
+ * For forward to queue, as long as dest_vsi is valid and it is of type
+ * VSI_CHNL (PF ADQ VSI is of type VSI_CHNL), return true.
+ * NOTE: For forward to queue, the correct dest_vsi is still set in
+ * tc_fltr based on the specified destination queue.
  */
 static inline bool ice_is_chnl_fltr(struct ice_tc_flower_fltr *f)
 {
-	return !!f->action.tc_class;
+	if (f->action.fltr_act == ICE_FWD_TO_VSI)
+		return f->action.fwd.tc.tc_class >= ICE_CHNL_START_TC &&
+		       f->action.fwd.tc.tc_class < ICE_CHNL_MAX_TC;
+	else if (f->action.fltr_act == ICE_FWD_TO_Q)
+		return f->dest_vsi && f->dest_vsi->type == ICE_VSI_CHNL;
+
+	return false;
 }
 
 /**


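As a standalone illustration of the reworked destination handling in this hunk, the sketch below mirrors the new `fwd` union and the `ice_is_chnl_fltr()` logic using simplified stand-in types. The enum values, the `ICE_CHNL_START_TC`/`ICE_CHNL_MAX_TC` constants, and the struct layouts here are illustrative only, not the real ice driver definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the driver types in this hunk. */
enum ice_sw_fwd_act_type { ICE_FWD_TO_VSI, ICE_FWD_TO_Q };
enum ice_vsi_type { ICE_VSI_PF, ICE_VSI_CHNL };

/* Illustrative bounds for the ADQ traffic-class range. */
#define ICE_CHNL_START_TC 1
#define ICE_CHNL_MAX_TC   16

struct ice_vsi {
	enum ice_vsi_type type;
};

struct ice_tc_flower_action {
	/* forward action specific params, as in the patch */
	union {
		struct {
			uint32_t tc_class; /* forward to hw_tc */
			uint32_t rsvd;
		} tc;
		struct {
			uint16_t queue;    /* forward to queue */
			uint16_t hw_queue; /* absolute queue in global space */
		} q;
	} fwd;
	enum ice_sw_fwd_act_type fltr_act;
};

struct ice_tc_flower_fltr {
	struct ice_tc_flower_action action;
	struct ice_vsi *dest_vsi;
};

/* Mirrors the logic of the reworked ice_is_chnl_fltr() above:
 * forward-to-VSI filters are channel filters when tc_class falls in
 * the ADQ range; forward-to-queue filters are channel filters when
 * the destination VSI is a channel VSI.
 */
static bool is_chnl_fltr(const struct ice_tc_flower_fltr *f)
{
	if (f->action.fltr_act == ICE_FWD_TO_VSI)
		return f->action.fwd.tc.tc_class >= ICE_CHNL_START_TC &&
		       f->action.fwd.tc.tc_class < ICE_CHNL_MAX_TC;
	if (f->action.fltr_act == ICE_FWD_TO_Q)
		return f->dest_vsi && f->dest_vsi->type == ICE_VSI_CHNL;
	return false;
}
```

The union keeps the two mutually exclusive destinations (hw_tc vs. queue) in the same footprint, with `fltr_act` as the discriminant.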
^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next PATCH v2 4/4] Documentation: networking: TC queue based filtering
  2022-09-08  1:23 [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Amritha Nambiar
                   ` (2 preceding siblings ...)
  2022-09-08  1:24 ` [net-next PATCH v2 3/4] ice: Enable RX queue selection using skbedit action Amritha Nambiar
@ 2022-09-08  1:24 ` Amritha Nambiar
  2022-09-08 17:34   ` kernel test robot
  2022-09-08 15:27 ` [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Alexander Duyck
  2022-09-21 20:29 ` Jakub Kicinski
  5 siblings, 1 reply; 18+ messages in thread
From: Amritha Nambiar @ 2022-09-08  1:24 UTC (permalink / raw)
  To: netdev, kuba
  Cc: alexander.duyck, jhs, jiri, xiyou.wangcong, vinicius.gomes,
	sridhar.samudrala, amritha.nambiar

Added tc-queue-filters.rst with notes on TC filters for
selecting a set of queues and/or a queue.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 Documentation/networking/tc-queue-filters.rst |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/networking/tc-queue-filters.rst

diff --git a/Documentation/networking/tc-queue-filters.rst b/Documentation/networking/tc-queue-filters.rst
new file mode 100644
index 000000000000..5839ddd23eb4
--- /dev/null
+++ b/Documentation/networking/tc-queue-filters.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+TC queue based filtering
+=========================
+
+TC can be used for directing traffic either to a set of queues or
+to a single queue on both the transmit and receive side.
+
+On the transmit side:
+1. TC filter directing traffic to a set of queues is achieved
+   using the action skbedit priority for Tx priority selection,
+   the priority maps to a traffic class (set of queues).
+2. TC filter directs traffic to a transmit queue with the action
+   skbedit queue_mapping $tx_qid.
+
+Likewise, on the receive side, the two filters for selecting a set of
+queues and/or a single queue are supported as below:
+1. TC flower filter directs traffic to a set of queues using the
+   'hw_tc' option of tc-flower.
+   hw_tc $TCID - Specify a hardware traffic class to pass matching
+   packets on to. TCID is in the range 0 through 15.
+2. TC filter with action skbedit queue_mapping $rx_qid selects a
+   receive queue.

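The two receive-side filters the document describes might look like the following tc invocations. The device name, IP address, TC ID, and queue ID are placeholders, and `skip_sw` (a standard tc-flower flag requesting hardware offload) may or may not be required depending on the driver:

```shell
# Receive-side steering to a set of queues: match on destination IP and
# hand matching packets to the queues backing hardware traffic class 1.
tc filter add dev eth0 protocol ip ingress \
    flower dst_ip 192.168.1.10 hw_tc 1

# Receive-side steering to a single queue: the skbedit queue_mapping
# action from this series, offloaded in hardware via skip_sw.
tc filter add dev eth0 protocol ip ingress \
    flower skip_sw dst_ip 192.168.1.10 \
    action skbedit queue_mapping 4
```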


* Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-08  1:23 [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Amritha Nambiar
                   ` (3 preceding siblings ...)
  2022-09-08  1:24 ` [net-next PATCH v2 4/4] Documentation: networking: TC queue based filtering Amritha Nambiar
@ 2022-09-08 15:27 ` Alexander Duyck
  2022-09-09  9:18   ` Nambiar, Amritha
  2022-09-21 20:29 ` Jakub Kicinski
  5 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2022-09-08 15:27 UTC (permalink / raw)
  To: Amritha Nambiar
  Cc: netdev, kuba, jhs, jiri, xiyou.wangcong, vinicius.gomes,
	sridhar.samudrala

On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar
<amritha.nambiar@intel.com> wrote:
>
> Based on the discussion on
> https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> the following series extends skbedit tc action to RX queue mapping.
> Currently, skbedit action in tc allows overriding of transmit queue.
> Extending this ability of skedit action supports the selection of receive
> queue for incoming packets. Offloading this action is added for receive
> side. Enabled ice driver to offload this type of filter into the
> hardware for accepting packets to the device's receive queue.
>
> v2: Added documentation in Documentation/networking
>
> ---
>
> Amritha Nambiar (4):
>       act_skbedit: Add support for action skbedit RX queue mapping
>       act_skbedit: Offload skbedit queue mapping for receive queue
>       ice: Enable RX queue selection using skbedit action
>       Documentation: networking: TC queue based filtering

I don't think skbedit is the right thing to be updating for this. In
the case of Tx we were using it because at the time we stored the
sockets Tx queue in the skb, so it made sense to edit it there if we
wanted to tweak things before it got to the qdisc layer. However it
didn't have a direct impact on the hardware and only really affected
the software routing in the device, which eventually resulted in which
hardware queue and qdisc was selected.

The problem with editing the receive queue is that the hardware
offloaded case versus the software offloaded can have very different
behaviors. I wonder if this wouldn't be better served by being an
extension of the mirred ingress redirect action which is already used
for multiple hardware offloads as I recall.

In this case you would want to be redirecting packets received on a
port to being received on a specific queue on that port. By using the
redirect action it would take the packet out of the receive path and
reinsert it, being able to account for anything such as the RPS
configuration on the device so the behavior would be closer to what
the hardware offloaded behavior would be.


* Re: [net-next PATCH v2 4/4] Documentation: networking: TC queue based filtering
  2022-09-08  1:24 ` [net-next PATCH v2 4/4] Documentation: networking: TC queue based filtering Amritha Nambiar
@ 2022-09-08 17:34   ` kernel test robot
  0 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2022-09-08 17:34 UTC (permalink / raw)
  To: Amritha Nambiar, netdev, kuba
  Cc: kbuild-all, alexander.duyck, jhs, jiri, xiyou.wangcong,
	vinicius.gomes, sridhar.samudrala, amritha.nambiar

Hi Amritha,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Amritha-Nambiar/Extend-action-skbedit-to-RX-queue-mapping/20220908-091523
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 2018b22a759e26a4c7e3ac6c60c283cfbd2c9c93
reproduce:
        # https://github.com/intel-lab-lkp/linux/commit/1c8a93c17a4a1a9ba7be2aba8b1886d12ea14d8c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Amritha-Nambiar/Extend-action-skbedit-to-RX-queue-mapping/20220908-091523
        git checkout 1c8a93c17a4a1a9ba7be2aba8b1886d12ea14d8c
        make menuconfig
        # enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
        make htmldocs

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> Documentation/networking/tc-queue-filters.rst:12: WARNING: Unexpected indentation.
>> Documentation/networking/tc-queue-filters.rst:14: WARNING: Block quote ends without a blank line; unexpected unindent.
>> Documentation/networking/tc-queue-filters.rst: WARNING: document isn't included in any toctree

vim +12 Documentation/networking/tc-queue-filters.rst

     9	
    10	On the transmit side:
    11	1. TC filter directing traffic to a set of queues is achieved
  > 12	   using the action skbedit priority for Tx priority selection,
    13	   the priority maps to a traffic class (set of queues).
  > 14	2. TC filter directs traffic to a transmit queue with the action
    15	   skbedit queue_mapping $tx_qid.
    16	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* RE: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-08 15:27 ` [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Alexander Duyck
@ 2022-09-09  9:18   ` Nambiar, Amritha
  2022-09-09 18:35     ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: Nambiar, Amritha @ 2022-09-09  9:18 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, kuba, jhs, jiri, xiyou.wangcong, Gomes, Vinicius,
	Samudrala, Sridhar

> -----Original Message-----
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Sent: Thursday, September 8, 2022 8:28 AM
> To: Nambiar, Amritha <amritha.nambiar@intel.com>
> Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> <vinicius.gomes@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> mapping
> 
> On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar
> <amritha.nambiar@intel.com> wrote:
> >
> > Based on the discussion on
> > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> > the following series extends skbedit tc action to RX queue mapping.
> > Currently, skbedit action in tc allows overriding of transmit queue.
> > Extending this ability of skedit action supports the selection of receive
> > queue for incoming packets. Offloading this action is added for receive
> > side. Enabled ice driver to offload this type of filter into the
> > hardware for accepting packets to the device's receive queue.
> >
> > v2: Added documentation in Documentation/networking
> >
> > ---
> >
> > Amritha Nambiar (4):
> >       act_skbedit: Add support for action skbedit RX queue mapping
> >       act_skbedit: Offload skbedit queue mapping for receive queue
> >       ice: Enable RX queue selection using skbedit action
> >       Documentation: networking: TC queue based filtering
> 
> I don't think skbedit is the right thing to be updating for this. In
> the case of Tx we were using it because at the time we stored the
> sockets Tx queue in the skb, so it made sense to edit it there if we
> wanted to tweak things before it got to the qdisc layer. However it
> didn't have a direct impact on the hardware and only really affected
> the software routing in the device, which eventually resulted in which
> hardware queue and qdisc was selected.
> 
> The problem with editing the receive queue is that the hardware
> offloaded case versus the software offloaded can have very different
> behaviors. I wonder if this wouldn't be better served by being an

Could you please explain how the hardware offload and software cases
behave differently in the skbedit case? From Jakub's suggestion on
https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/,
it looked like the skbedit action fits better to align the hardware and
software descriptions of the RX queue offload (considering the skb
metadata remains the same in the offload vs. no-offload case).

> extension of the mirred ingress redirect action which is already used
> for multiple hardware offloads as I recall.
> 
> In this case you would want to be redirecting packets received on a
> port to being received on a specific queue on that port. By using the
> redirect action it would take the packet out of the receive path and
> reinsert it, being able to account for anything such as the RPS
> configuration on the device so the behavior would be closer to what
> the hardware offloaded behavior would be.

Wouldn't this be overkill as we only want to accept packets into a
predetermined queue? IIUC, the mirred redirect action typically moves
packets from one interface to another; the filter is added on an
interface different from the destination interface. In our case, with
the destination interface being the same, I am not understanding the
need for a loopback. Also, WRT RPS, I am not sure I understand the
impact here. In hardware, once the offloaded filter executes to select
the queue, RSS does not run. In software, if RPS executes before
sch_handle_ingress(), wouldn't any tc actions (mirred redirect or
skbedit overriding the queue) behave in a similar way?


* Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-09  9:18   ` Nambiar, Amritha
@ 2022-09-09 18:35     ` Alexander Duyck
  2022-09-15  8:45       ` Nambiar, Amritha
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2022-09-09 18:35 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, kuba, jhs, jiri, xiyou.wangcong, Gomes, Vinicius,
	Samudrala, Sridhar

On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha
<amritha.nambiar@intel.com> wrote:
>
> > -----Original Message-----
> > From: Alexander Duyck <alexander.duyck@gmail.com>
> > Sent: Thursday, September 8, 2022 8:28 AM
> > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>
> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> > mapping
> >
> > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar
> > <amritha.nambiar@intel.com> wrote:
> > >
> > > Based on the discussion on
> > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> > > the following series extends skbedit tc action to RX queue mapping.
> > > Currently, skbedit action in tc allows overriding of transmit queue.
> > > Extending this ability of skedit action supports the selection of receive
> > > queue for incoming packets. Offloading this action is added for receive
> > > side. Enabled ice driver to offload this type of filter into the
> > > hardware for accepting packets to the device's receive queue.
> > >
> > > v2: Added documentation in Documentation/networking
> > >
> > > ---
> > >
> > > Amritha Nambiar (4):
> > >       act_skbedit: Add support for action skbedit RX queue mapping
> > >       act_skbedit: Offload skbedit queue mapping for receive queue
> > >       ice: Enable RX queue selection using skbedit action
> > >       Documentation: networking: TC queue based filtering
> >
> > I don't think skbedit is the right thing to be updating for this. In
> > the case of Tx we were using it because at the time we stored the
> > sockets Tx queue in the skb, so it made sense to edit it there if we
> > wanted to tweak things before it got to the qdisc layer. However it
> > didn't have a direct impact on the hardware and only really affected
> > the software routing in the device, which eventually resulted in which
> > hardware queue and qdisc was selected.
> >
> > The problem with editing the receive queue is that the hardware
> > offloaded case versus the software offloaded can have very different
> > behaviors. I wonder if this wouldn't be better served by being an
>
> Could you please explain how the hardware offload and software cases
> behave differently in the skbedit case. From Jakub's suggestion on
> https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/,
> it looked like the skbedit action fits better to align the hardware and
> software description of RX queue offload (considering the skb metadata
> remains same in offload vs no-offload case).

So specifically my concern is RPS. The problem is RPS takes place
before your TC rule would be applied in the software case, but after
it has been applied in the hardware case. As a result the behavior
will be different for one versus the other. With the redirect action
it will pull the packet out of the Rx pipeline and reinsert it so that
RPS will be applied to the packet and it would be received on the CPUs
expected.

> > extension of the mirred ingress redirect action which is already used
> > for multiple hardware offloads as I recall.
> >
> > In this case you would want to be redirecting packets received on a
> > port to being received on a specific queue on that port. By using the
> > redirect action it would take the packet out of the receive path and
> > reinsert it, being able to account for anything such as the RPS
> > configuration on the device so the behavior would be closer to what
> > the hardware offloaded behavior would be.
>
> Wouldn't this be an overkill as we only want to accept packets into a
> predetermined queue? IIUC, the mirred redirect action typically moves
> packets from one interface to another, the filter is added on interface
> different from the destination interface. In our case, with the
> destination interface being the same, I am not understanding the need
> for a loopback. Also, WRT to RPS, not sure I understand the impact
> here. In hardware, once the offloaded filter executes to select the queue,
> RSS does not run. In software, if RPS executes before
> sch_handle_ingress(), wouldn't any tc-actions (mirred redirect or skbedit
> overriding the queue) behave in similar way ?

The problem is that RPS != RSS. You can use the two together to spread
work out over a greater set of queues. So for example, in a NUMA system
with multiple sockets/nodes you might use RSS to split the work up
into per-node queues, and then use RPS to split up the work across
CPUs within that node. If you pick a packet up from one device and
redirect it via the mirred action, RPS is applied as though the
packet was received on the device, so the RPS queue would be correct,
assuming you updated the queue. That is what I am looking for. What
this patch is doing is creating a situation where the effect is very
different between the hardware and software version of things, which
would likely break things for a use case such as this.

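The ordering difference being argued here can be sketched with a toy model. All names and the queue-to-CPU rule below are invented purely for illustration (real RPS hashes the flow and consults a per-queue CPU mask): in the software-only case RPS runs on the queue the NIC already chose, before the tc action rewrites queue_mapping, while with the hardware offload the filter steers first and RPS sees the final queue.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for RPS CPU selection: derive a CPU from the receive
 * queue. The mapping is invented; it only matters that it depends on
 * which queue RPS observes.
 */
static int rps_cpu(uint16_t rx_queue)
{
	return rx_queue * 2;
}

/* Software path in this series: RPS has already run on the original
 * hardware-selected queue by the time sch_handle_ingress() applies
 * the skbedit action. Returns the final queue, reports the RPS CPU.
 */
static int sw_path(uint16_t hw_queue, uint16_t tc_queue, int *cpu)
{
	*cpu = rps_cpu(hw_queue); /* RPS sees the pre-skbedit queue */
	return tc_queue;          /* queue_mapping rewritten afterwards */
}

/* Hardware-offloaded path: the filter steers in the NIC, so RPS
 * already observes the final queue.
 */
static int hw_path(uint16_t tc_queue, int *cpu)
{
	*cpu = rps_cpu(tc_queue);
	return tc_queue;
}
```

Both paths land the packet on the same final queue, but the RPS CPU differs, which is exactly the software/hardware divergence described above.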

* RE: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-09 18:35     ` Alexander Duyck
@ 2022-09-15  8:45       ` Nambiar, Amritha
  2022-09-15 15:21         ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: Nambiar, Amritha @ 2022-09-15  8:45 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, kuba, jhs, jiri, xiyou.wangcong, Gomes, Vinicius,
	Samudrala, Sridhar

> -----Original Message-----
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Sent: Friday, September 9, 2022 11:35 AM
> To: Nambiar, Amritha <amritha.nambiar@intel.com>
> Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> <vinicius.gomes@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> mapping
> 
> On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha
> <amritha.nambiar@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Alexander Duyck <alexander.duyck@gmail.com>
> > > Sent: Thursday, September 8, 2022 8:28 AM
> > > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > > <sridhar.samudrala@intel.com>
> > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> > > mapping
> > >
> > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar
> > > <amritha.nambiar@intel.com> wrote:
> > > >
> > > > Based on the discussion on
> > > >
> https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> > > > the following series extends skbedit tc action to RX queue mapping.
> > > > Currently, skbedit action in tc allows overriding of transmit queue.
> > > > Extending this ability of skedit action supports the selection of receive
> > > > queue for incoming packets. Offloading this action is added for receive
> > > > side. Enabled ice driver to offload this type of filter into the
> > > > hardware for accepting packets to the device's receive queue.
> > > >
> > > > v2: Added documentation in Documentation/networking
> > > >
> > > > ---
> > > >
> > > > Amritha Nambiar (4):
> > > >       act_skbedit: Add support for action skbedit RX queue mapping
> > > >       act_skbedit: Offload skbedit queue mapping for receive queue
> > > >       ice: Enable RX queue selection using skbedit action
> > > >       Documentation: networking: TC queue based filtering
> > >
> > > I don't think skbedit is the right thing to be updating for this. In
> > > the case of Tx we were using it because at the time we stored the
> > > sockets Tx queue in the skb, so it made sense to edit it there if we
> > > wanted to tweak things before it got to the qdisc layer. However it
> > > didn't have a direct impact on the hardware and only really affected
> > > the software routing in the device, which eventually resulted in which
> > > hardware queue and qdisc was selected.
> > >
> > > The problem with editing the receive queue is that the hardware
> > > offloaded case versus the software offloaded can have very different
> > > behaviors. I wonder if this wouldn't be better served by being an
> >
> > Could you please explain how the hardware offload and software cases
> > behave differently in the skbedit case. From Jakub's suggestion on
> > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/,
> > it looked like the skbedit action fits better to align the hardware and
> > software description of RX queue offload (considering the skb metadata
> > remains same in offload vs no-offload case).
> 
> So specifically my concern is RPS. The problem is RPS takes place
> before your TC rule would be applied in the software case, but after
> it has been applied in the hardware case. As a result the behavior
> will be different for one versus the other. With the redirect action
> it will pull the packet out of the Rx pipeline and reinsert it so that
> RPS will be applied to the packet and it would be received on the CPUs
> expected.
> 

Okay, so I understand that without HW offload, the SW behavior would
not align for RPS, i.e., the RPS CPU would be derived from a queue
(already selected by HW, RSS etc.), and may not align with the queue
selected by the SW TC rule. And I see your point, the solution to this
would be reinserting the packet after updating the queue. But, as I
look more into this, there are still some more concerns I have.

IIUC, we may be looking at a potential TC rule as below:
tc filter add dev ethX ingress ... \
action mirred ingress redirect dev ethX rxqueue <rx_qid>

It looks to me that this configuration could possibly result in loops
recursively calling act_mirred. Since the redirection is from ingress
to ingress on the same device, when the packet is reinserted into the
RX pipeline of the same device, RPS and tc classification happen again,
and the tc filter with the mirred action executes once more, redirecting
and reinserting the packet. act_mirred keeps a CPU counter of recursive
calls for the action and drops the packet when the limit is reached.
If this is a valid configuration, I cannot find any code that uses a
combination of skb->redirect and skb->from_ingress to check for and
prevent recursive classification (further execution of the TC mirred
redirect action).

Also, since reinserting the packet after updating the queue would fix
the RPS inconsistency, can this be done from the skbedit action instead
of mirred redirect? So, if the skbedit action is used for Rx queue
selection, maybe this sequence helps:

RPS on RX q1 -> TC action skbedit RX q2 ->
always reinsert if action skbedit is on RX -> RPS on RX q2 ->
stop further execution of TC action RX skbedit

> > > extension of the mirred ingress redirect action which is already used
> > > for multiple hardware offloads as I recall.
> > >
> > > In this case you would want to be redirecting packets received on a
> > > port to being received on a specific queue on that port. By using the
> > > redirect action it would take the packet out of the receive path and
> > > reinsert it, being able to account for anything such as the RPS
> > > configuration on the device so the behavior would be closer to what
> > > the hardware offloaded behavior would be.
> >
> > Wouldn't this be an overkill as we only want to accept packets into a
> > predetermined queue? IIUC, the mirred redirect action typically moves
> > packets from one interface to another, the filter is added on interface
> > different from the destination interface. In our case, with the
> > destination interface being the same, I am not understanding the need
> > for a loopback. Also, WRT to RPS, not sure I understand the impact
> > here. In hardware, once the offloaded filter executes to select the queue,
> > RSS does not run. In software, if RPS executes before
> > sch_handle_ingress(), wouldn't any tc-actions (mirred redirect or skbedit
> > overriding the queue) behave in similar way ?
> 
> The problem is that RPS != RSS. You can use the two together to spread
> work out over a greater set of queues. So for example in a NUMA system
> with multiple sockets/nodes you might use RSS to split the work up
> into a per-node queue(s), and then use RPS to split up the work across
> CPUs within that node. If you pick a packet up from one device and
> redirect it via the mirred action the RPS is applied as though the
> packet was received on the device so the RPS queue would be correct
> assuming you updated the queue. That is what I am looking for. What
> this patch is doing is creating a situation where the effect is very
> different between the hardware and software version of things which
> would likely break things for a use case such as this.


* Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-15  8:45       ` Nambiar, Amritha
@ 2022-09-15 15:21         ` Alexander Duyck
  2022-09-15 22:09           ` Nambiar, Amritha
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2022-09-15 15:21 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, kuba, jhs, jiri, xiyou.wangcong, Gomes, Vinicius,
	Samudrala, Sridhar

On Thu, Sep 15, 2022 at 1:45 AM Nambiar, Amritha
<amritha.nambiar@intel.com> wrote:
>
> > -----Original Message-----
> > From: Alexander Duyck <alexander.duyck@gmail.com>
> > Sent: Friday, September 9, 2022 11:35 AM
> > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>
> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> > mapping
> >
> > On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha
> > <amritha.nambiar@intel.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Alexander Duyck <alexander.duyck@gmail.com>
> > > > Sent: Thursday, September 8, 2022 8:28 AM
> > > > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > > > <sridhar.samudrala@intel.com>
> > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> > > > mapping
> > > >
> > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar
> > > > <amritha.nambiar@intel.com> wrote:
> > > > >
> > > > > Based on the discussion on
> > > > >
> > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> > > > > the following series extends skbedit tc action to RX queue mapping.
> > > > > Currently, skbedit action in tc allows overriding of transmit queue.
> > > > > Extending this ability of skedit action supports the selection of receive
> > > > > queue for incoming packets. Offloading this action is added for receive
> > > > > side. Enabled ice driver to offload this type of filter into the
> > > > > hardware for accepting packets to the device's receive queue.
> > > > >
> > > > > v2: Added documentation in Documentation/networking
> > > > >
> > > > > ---
> > > > >
> > > > > Amritha Nambiar (4):
> > > > >       act_skbedit: Add support for action skbedit RX queue mapping
> > > > >       act_skbedit: Offload skbedit queue mapping for receive queue
> > > > >       ice: Enable RX queue selection using skbedit action
> > > > >       Documentation: networking: TC queue based filtering
> > > >
> > > > I don't think skbedit is the right thing to be updating for this. In
> > > > the case of Tx we were using it because at the time we stored the
> > > > sockets Tx queue in the skb, so it made sense to edit it there if we
> > > > wanted to tweak things before it got to the qdisc layer. However it
> > > > didn't have a direct impact on the hardware and only really affected
> > > > the software routing in the device, which eventually resulted in which
> > > > hardware queue and qdisc was selected.
> > > >
> > > > The problem with editing the receive queue is that the hardware
> > > > offloaded case versus the software offloaded can have very different
> > > > behaviors. I wonder if this wouldn't be better served by being an
> > >
> > > Could you please explain how the hardware offload and software cases
> > > behave differently in the skbedit case. From Jakub's suggestion on
> > > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/,
> > > it looked like the skbedit action fits better to align the hardware and
> > > software description of RX queue offload (considering the skb metadata
> > > remains same in offload vs no-offload case).
> >
> > So specifically my concern is RPS. The problem is RPS takes place
> > before your TC rule would be applied in the software case, but after
> > it has been applied in the hardware case. As a result the behavior
> > will be different for one versus the other. With the redirect action
> > it will pull the packet out of the Rx pipeline and reinsert it so that
> > RPS will be applied to the packet and it would be received on the CPUs
> > expected.
> >
>
> Okay, so I understand that without HW offload, the SW behavior would
> not align for RPS, i.e., RPS CPU would be from a queue (already selected
> by HW, RSS etc.),  and may not align with the queue selected from
> the SW TC rule. And I see your point, the solution to this would be
> reinserting the packet after updating the queue. But, as I look more into
> this, there are still some more concerns I have.
>
> IIUC, we may be looking at a potential TC rule as below:
> tc filter add dev ethX ingress ... \
> action mirred ingress redirect dev ethX rxqueue <rx_qid>
>
> It looks to me that this configuration could possibly result in loops
> recursively calling act_mirred. Since the redirection is from ingress
> to ingress on the same device, when the packet is reinserted into the
> RX pipeline of the same device, RPS and tc classification happens again,
> the tc filter with act mirred executes redirecting and reinserting the
> packet again. act_mirred keeps a CPU counter of recursive calls for the
> action and drops the packet when the limit is reached.
> If this is a valid configuration, I cannot find any code that perhaps uses
> a combination of skb->redirect and skb->from_ingress to check and
> prevent recursive classification (further execution of TC mirred redirect
> action).

The recursion problem is easily solved by simply not requeueing again
if the packet is already on the queue it is supposed to be on. If you
have rules that are bouncing the traffic between two queues, it
wouldn't be any different than the potential issue of bouncing traffic
between two netdevs, which is why the recursion counters were added.

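The termination argument can be sketched as follows, with a toy model using invented names rather than the actual act_mirred implementation: requeue only when the packet is not already on the destination queue, so a single ingress-to-ingress redirect rule settles after one pass instead of recursing until the nesting limit drops the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal packet state for the sketch: current Rx queue and how many
 * times it has been reinserted into the receive path.
 */
struct pkt {
	uint16_t rx_queue;
	int passes;
};

/* Guarded ingress-to-ingress redirect: if the packet is already on
 * the target queue, do nothing, breaking the loop. Otherwise steer it
 * and reinsert it, so RPS runs again on the new queue.
 */
static bool redirect_to_queue(struct pkt *p, uint16_t target)
{
	if (p->rx_queue == target)
		return false;   /* already there: no requeue, no loop */
	p->rx_queue = target;   /* steer and reinsert into the Rx path */
	p->passes++;
	return true;
}
```

With this guard, a well-formed rule converges in one pass; only a pair of rules bouncing traffic between two queues would keep requeueing, which is the same pathology as bouncing between two netdevs.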
> Also, since reinserting the packet after updating the queue would fix
> the RPS inconsistency, can this be done from the skbedit action instead
> of mirred redirect ? So, if skbedit action is used for Rx queue selection,
> maybe this sequence helps:
>
> RPS on RX q1 -> TC action skbedit RX q2 ->
> always reinsert if action skbedit is on RX -> RPS on RX q2 ->
> stop further execution of TC action RX skbedit

That is changing the function of skbedit. In addition the skbedit
action isn't meant to be offloaded. The skbedit action is only really
supposed to edit skb metadata, it isn't supposed to take other actions
on the frame. What we want to avoid is making skbedit into another
mirred.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-15 15:21         ` Alexander Duyck
@ 2022-09-15 22:09           ` Nambiar, Amritha
  2022-09-16 15:58             ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: Nambiar, Amritha @ 2022-09-15 22:09 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, kuba, jhs, jiri, xiyou.wangcong, Gomes, Vinicius,
	Samudrala, Sridhar

> -----Original Message-----
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Sent: Thursday, September 15, 2022 8:21 AM
> To: Nambiar, Amritha <amritha.nambiar@intel.com>
> Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> <vinicius.gomes@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> mapping
> 
> On Thu, Sep 15, 2022 at 1:45 AM Nambiar, Amritha
> <amritha.nambiar@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Alexander Duyck <alexander.duyck@gmail.com>
> > > Sent: Friday, September 9, 2022 11:35 AM
> > > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > > <sridhar.samudrala@intel.com>
> > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> > > mapping
> > >
> > > On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha
> > > <amritha.nambiar@intel.com> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Alexander Duyck <alexander.duyck@gmail.com>
> > > > > Sent: Thursday, September 8, 2022 8:28 AM
> > > > > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > > > > <sridhar.samudrala@intel.com>
> > > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX
> queue
> > > > > mapping
> > > > >
> > > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar
> > > > > <amritha.nambiar@intel.com> wrote:
> > > > > >
> > > > > > Based on the discussion on
> > > > > >
> > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> > > > > > the following series extends skbedit tc action to RX queue mapping.
> > > > > > Currently, skbedit action in tc allows overriding of transmit queue.
> > > > > > Extending this ability of skbedit action supports the selection of
> receive
> > > > > > queue for incoming packets. Offloading this action is added for
> receive
> > > > > > side. Enabled ice driver to offload this type of filter into the
> > > > > > hardware for accepting packets to the device's receive queue.
> > > > > >
> > > > > > v2: Added documentation in Documentation/networking
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > Amritha Nambiar (4):
> > > > > >       act_skbedit: Add support for action skbedit RX queue mapping
> > > > > >       act_skbedit: Offload skbedit queue mapping for receive queue
> > > > > >       ice: Enable RX queue selection using skbedit action
> > > > > >       Documentation: networking: TC queue based filtering
> > > > >
> > > > > I don't think skbedit is the right thing to be updating for this. In
> > > > > the case of Tx we were using it because at the time we stored the
> > > > > sockets Tx queue in the skb, so it made sense to edit it there if we
> > > > > wanted to tweak things before it got to the qdisc layer. However it
> > > > > didn't have a direct impact on the hardware and only really affected
> > > > > the software routing in the device, which eventually resulted in which
> > > > > hardware queue and qdisc was selected.
> > > > >
> > > > > The problem with editing the receive queue is that the hardware
> > > > > offloaded case versus the software offloaded can have very different
> > > > > behaviors. I wonder if this wouldn't be better served by being an
> > > >
> > > > Could you please explain how the hardware offload and software cases
> > > > behave differently in the skbedit case. From Jakub's suggestion on
> > > >
> https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/,
> > > > it looked like the skbedit action fits better to align the hardware and
> > > > software description of RX queue offload (considering the skb metadata
> > > > remains same in offload vs no-offload case).
> > >
> > > So specifically my concern is RPS. The problem is RPS takes place
> > > before your TC rule would be applied in the software case, but after
> > > it has been applied in the hardware case. As a result the behavior
> > > will be different for one versus the other. With the redirect action
> > > it will pull the packet out of the Rx pipeline and reinsert it so that
> > > RPS will be applied to the packet and it would be received on the CPUs
> > > expected.
> > >
> >
> > Okay, so I understand that without HW offload, the SW behavior would
> > not align for RPS, i.e., RPS CPU would be from a queue (already selected
> > by HW, RSS etc.),  and may not align with the queue selected from
> > the SW TC rule. And I see your point, the solution to this would be
> > reinserting the packet after updating the queue. But, as I look more into
> > this, there are still some more concerns I have.
> >
> > IIUC, we may be looking at a potential TC rule as below:
> > tc filter add dev ethX ingress ... \
> > action mirred ingress redirect dev ethX rxqueue <rx_qid>
> >
> > It looks to me that this configuration could possibly result in loops
> > recursively calling act_mirred. Since the redirection is from ingress
> > to ingress on the same device, when the packet is reinserted into the
> > RX pipeline of the same device, RPS and tc classification happens again,
> > the tc filter with act mirred executes redirecting and reinserting the
> > packet again. act_mirred keeps a CPU counter of recursive calls for the
> > action and drops the packet when the limit is reached.
> > If this is a valid configuration, I cannot find any code that perhaps uses
> > a combination of skb->redirect and skb->from_ingress to check and
> > prevent recursive classification (further execution of TC mirred redirect
> > action).
> 
> The recursion problem is easily solved by simply not requeueing again
> if the packet is on the queue it is supposed to be. If you have rules
> that are bouncing the traffic between two queues it wouldn't be any
> different than the potential issue of bouncing traffic between two
> netdevs which is why the recursion counters were added.
> 

Okay, makes sense. So, redirecting (ingress to ingress) on the same
device with the queue_mapping extension would make it possible to
update the queue. Without the queue_mapping extension, ingress to
ingress redirect on the same device would simply bounce the packets and
eventually drop them once the recursion limit is reached.

> > Also, since reinserting the packet after updating the queue would fix
> > the RPS inconsistency, can this be done from the skbedit action instead
> > of mirred redirect ? So, if skbedit action is used for Rx queue selection,
> > maybe this sequence helps:
> >
> > RPS on RX q1 -> TC action skbedit RX q2 ->
> > always reinsert if action skbedit is on RX -> RPS on RX q2 ->
> > stop further execution of TC action RX skbedit
> 
> That is changing the function of skbedit. In addition the skbedit
> action isn't meant to be offloaded. The skbedit action is only really
> supposed to edit skb metadata, it isn't supposed to take other actions
> on the frame. What we want to avoid is making skbedit into another
> mirred.

Okay, so having skbedit change the skb metadata and also take the additional
action here (reinserting the packet) is not right. But regarding the skbedit
action not being meant for offload, I do find that skbedit already allows
offload of certain parameters like mark, ptype, and priority.
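For reference, the existing offloadable skbedit parameters mentioned here are used along the lines of the sketch below (the match and mark value are illustrative; whether a given rule is actually accepted in hardware depends on the driver):

```shell
# Sketch: skbedit parameters that already have flow_offload
# counterparts (mark, ptype, priority). skip_sw forces the rule to be
# programmed in hardware or rejected.
tc qdisc add dev ethX ingress
tc filter add dev ethX ingress protocol ip flower dst_ip 10.0.0.1 \
        skip_sw action skbedit mark 42
```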


* Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-15 22:09           ` Nambiar, Amritha
@ 2022-09-16 15:58             ` Alexander Duyck
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2022-09-16 15:58 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, kuba, jhs, jiri, xiyou.wangcong, Gomes, Vinicius,
	Samudrala, Sridhar

On Thu, Sep 15, 2022 at 3:09 PM Nambiar, Amritha
<amritha.nambiar@intel.com> wrote:
>
> > -----Original Message-----
> > From: Alexander Duyck <alexander.duyck@gmail.com>
> > Sent: Thursday, September 15, 2022 8:21 AM
> > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>
> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> > mapping
> >
> > On Thu, Sep 15, 2022 at 1:45 AM Nambiar, Amritha
> > <amritha.nambiar@intel.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Alexander Duyck <alexander.duyck@gmail.com>
> > > > Sent: Friday, September 9, 2022 11:35 AM
> > > > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > > > <sridhar.samudrala@intel.com>
> > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> > > > mapping
> > > >
> > > > On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha
> > > > <amritha.nambiar@intel.com> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Alexander Duyck <alexander.duyck@gmail.com>
> > > > > > Sent: Thursday, September 8, 2022 8:28 AM
> > > > > > To: Nambiar, Amritha <amritha.nambiar@intel.com>
> > > > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com;
> > > > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius
> > > > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar
> > > > > > <sridhar.samudrala@intel.com>
> > > > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX
> > queue
> > > > > > mapping
> > > > > >
> > > > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar
> > > > > > <amritha.nambiar@intel.com> wrote:
> > > > > > >
> > > > > > > Based on the discussion on
> > > > > > >
> > > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> > > > > > > the following series extends skbedit tc action to RX queue mapping.
> > > > > > > Currently, skbedit action in tc allows overriding of transmit queue.
> > > > > > > Extending this ability of skbedit action supports the selection of
> > receive
> > > > > > > queue for incoming packets. Offloading this action is added for
> > receive
> > > > > > > side. Enabled ice driver to offload this type of filter into the
> > > > > > > hardware for accepting packets to the device's receive queue.
> > > > > > >
> > > > > > > v2: Added documentation in Documentation/networking
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > Amritha Nambiar (4):
> > > > > > >       act_skbedit: Add support for action skbedit RX queue mapping
> > > > > > >       act_skbedit: Offload skbedit queue mapping for receive queue
> > > > > > >       ice: Enable RX queue selection using skbedit action
> > > > > > >       Documentation: networking: TC queue based filtering
> > > > > >
> > > > > > I don't think skbedit is the right thing to be updating for this. In
> > > > > > the case of Tx we were using it because at the time we stored the
> > > > > > sockets Tx queue in the skb, so it made sense to edit it there if we
> > > > > > wanted to tweak things before it got to the qdisc layer. However it
> > > > > > didn't have a direct impact on the hardware and only really affected
> > > > > > the software routing in the device, which eventually resulted in which
> > > > > > hardware queue and qdisc was selected.
> > > > > >
> > > > > > The problem with editing the receive queue is that the hardware
> > > > > > offloaded case versus the software offloaded can have very different
> > > > > > behaviors. I wonder if this wouldn't be better served by being an
> > > > >
> > > > > Could you please explain how the hardware offload and software cases
> > > > > behave differently in the skbedit case. From Jakub's suggestion on
> > > > >
> > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/,
> > > > > it looked like the skbedit action fits better to align the hardware and
> > > > > software description of RX queue offload (considering the skb metadata
> > > > > remains same in offload vs no-offload case).
> > > >
> > > > So specifically my concern is RPS. The problem is RPS takes place
> > > > before your TC rule would be applied in the software case, but after
> > > > it has been applied in the hardware case. As a result the behavior
> > > > will be different for one versus the other. With the redirect action
> > > > it will pull the packet out of the Rx pipeline and reinsert it so that
> > > > RPS will be applied to the packet and it would be received on the CPUs
> > > > expected.
> > > >
> > >
> > > Okay, so I understand that without HW offload, the SW behavior would
> > > not align for RPS, i.e., RPS CPU would be from a queue (already selected
> > > by HW, RSS etc.),  and may not align with the queue selected from
> > > the SW TC rule. And I see your point, the solution to this would be
> > > reinserting the packet after updating the queue. But, as I look more into
> > > this, there are still some more concerns I have.
> > >
> > > IIUC, we may be looking at a potential TC rule as below:
> > > tc filter add dev ethX ingress ... \
> > > action mirred ingress redirect dev ethX rxqueue <rx_qid>
> > >
> > > It looks to me that this configuration could possibly result in loops
> > > recursively calling act_mirred. Since the redirection is from ingress
> > > to ingress on the same device, when the packet is reinserted into the
> > > RX pipeline of the same device, RPS and tc classification happens again,
> > > the tc filter with act mirred executes redirecting and reinserting the
> > > packet again. act_mirred keeps a CPU counter of recursive calls for the
> > > action and drops the packet when the limit is reached.
> > > If this is a valid configuration, I cannot find any code that perhaps uses
> > > a combination of skb->redirect and skb->from_ingress to check and
> > > prevent recursive classification (further execution of TC mirred redirect
> > > action).
> >
> > The recursion problem is easily solved by simply not requeueing again
> > if the packet is on the queue it is supposed to be. If you have rules
> > that are bouncing the traffic between two queues it wouldn't be any
> > different than the potential issue of bouncing traffic between two
> > netdevs which is why the recursion counters were added.
> >
>
> Okay, makes sense. So, redirecting (ingress to ingress) on the same
> device with the queue_mapping extension would make it possible to
> update the queue. Without the queue_mapping extension, ingress to
> ingress redirect on the same device would simply bounce the packets and
> eventually drop them once the recursion limit is reached.

Right. Basically this would be an extension for the redirect action
that would move a queue up to a first class citizen of sorts.

> > > Also, since reinserting the packet after updating the queue would fix
> > > the RPS inconsistency, can this be done from the skbedit action instead
> > > of mirred redirect ? So, if skbedit action is used for Rx queue selection,
> > > maybe this sequence helps:
> > >
> > > RPS on RX q1 -> TC action skbedit RX q2 ->
> > > always reinsert if action skbedit is on RX -> RPS on RX q2 ->
> > > stop further execution of TC action RX skbedit
> >
> > That is changing the function of skbedit. In addition the skbedit
> > action isn't meant to be offloaded. The skbedit action is only really
> > supposed to edit skb metadata, it isn't supposed to take other actions
> > on the frame. What we want to avoid is making skbedit into another
> > mirred.
>
> Okay, so having skbedit change the skb metadata and also take the additional
> action here (reinserting the packet) is not right. But regarding the skbedit
> action not being meant for offload, I do find that skbedit already allows
> offload of certain parameters like mark, ptype, and priority.

I had overlooked the offload use as it isn't exactly a direct use
case. It is needed in the context of offloading a qdisc if I am not
mistaken. So it isn't really the action that is being offloaded but
the queueing discipline itself; the skbedit is more of a passenger
along for the ride. The general idea is that the values would be
consumed by the qdisc layer to take some action, and since the qdisc
is being handled down in the hardware, the skbedit goes down into the
hardware with it. I'm not thrilled, but I can see the justification,
as we have had discussions about renaming the action before since it
really just edits the metadata for the frame.

The big difference is that it is the qdisc taking the action versus
the action directly, and all we are tweaking is the metadata, which
is the main task of skbedit.
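As a rough illustration of that "passenger" relationship (the qdisc and priority mapping below are chosen for illustration, not taken from the thread):

```shell
# skbedit itself only writes skb->priority; the offloaded mqprio qdisc
# (hw 1) is what maps that priority to a traffic class and hardware
# queue, so the skbedit value is consumed down in the hardware with it.
tc qdisc add dev ethX root mqprio num_tc 2 map 0 0 0 1 \
        queues 2@0 2@2 hw 1
tc qdisc add dev ethX clsact
tc filter add dev ethX egress protocol ip flower dst_ip 10.0.0.1 \
        action skbedit priority 3
```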


* Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-08  1:23 [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Amritha Nambiar
                   ` (4 preceding siblings ...)
  2022-09-08 15:27 ` [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Alexander Duyck
@ 2022-09-21 20:29 ` Jakub Kicinski
  2022-09-22  8:19   ` Nambiar, Amritha
  5 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2022-09-21 20:29 UTC (permalink / raw)
  To: Amritha Nambiar
  Cc: netdev, alexander.duyck, jhs, jiri, xiyou.wangcong,
	vinicius.gomes, sridhar.samudrala

On Wed, 07 Sep 2022 18:23:57 -0700 Amritha Nambiar wrote:
> Based on the discussion on
> https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> the following series extends skbedit tc action to RX queue mapping.
> Currently, skbedit action in tc allows overriding of transmit queue.
> Extending this ability of skbedit action supports the selection of receive
> queue for incoming packets. Offloading this action is added for receive
> side. Enabled ice driver to offload this type of filter into the
> hardware for accepting packets to the device's receive queue.
> 
> v2: Added documentation in Documentation/networking

Alex and I had a quick chat about this, I think we can work around 
the difficulties with duplicating the behavior in SW by enforcing 
that the action can only be used with skip_sw. Either skbedit or
Alex's suggested approach with act_mirred. Or new hw-only action.

Alex pointed out that it'd be worth documenting the priorities of 
aRFS vs this thing, which one will be used if HW matches both.


* RE: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-21 20:29 ` Jakub Kicinski
@ 2022-09-22  8:19   ` Nambiar, Amritha
  2022-09-22 12:29     ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Nambiar, Amritha @ 2022-09-22  8:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, alexander.duyck, jhs, jiri, xiyou.wangcong, Gomes,
	Vinicius, Samudrala, Sridhar

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, September 21, 2022 1:29 PM
> To: Nambiar, Amritha <amritha.nambiar@intel.com>
> Cc: netdev@vger.kernel.org; alexander.duyck@gmail.com;
> jhs@mojatatu.com; jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes,
> Vinicius <vinicius.gomes@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> mapping
> 
> On Wed, 07 Sep 2022 18:23:57 -0700 Amritha Nambiar wrote:
> > Based on the discussion on
> > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/,
> > the following series extends skbedit tc action to RX queue mapping.
> > Currently, skbedit action in tc allows overriding of transmit queue.
> > Extending this ability of skbedit action supports the selection of receive
> > queue for incoming packets. Offloading this action is added for receive
> > side. Enabled ice driver to offload this type of filter into the
> > hardware for accepting packets to the device's receive queue.
> >
> > v2: Added documentation in Documentation/networking
> 
> Alex and I had a quick chat about this, I think we can work around
> the difficulties with duplicating the behavior in SW by enforcing
> that the action can only be used with skip_sw. Either skbedit or
> Alex's suggested approach with act_mirred. Or new hw-only action.
> 

Okay, I will go with skbedit and enforce skip_sw for the action on the
Rx side. So, skbedit for the TX queue is a SW-only action and skbedit
for the RX queue is a HW-only action.
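Assuming the existing queue_mapping keyword is simply reused on the ingress side, as in this series, such a HW-only rule might look like the sketch below (match and queue id are illustrative):

```shell
# Sketch: RX queue selection as a hardware-only filter. skip_sw is
# enforced because there is no software datapath for an Rx
# queue_mapping action; the driver programs the filter into the device.
tc qdisc add dev ethX ingress
tc filter add dev ethX ingress protocol ip flower dst_ip 192.168.1.12 \
        skip_sw action skbedit queue_mapping 5
```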

> Alex pointed out that it'd be worth documenting the priorities of
> aRFS vs this thing, which one will be used if HW matches both.

Sure, I will document the priorities of aRFS and TC filter selecting
the Rx queue. On Intel E810 devices, the TC filter selecting Rx queue
has higher priority over aRFS (Flow director filter) if both the filters
are matched.



* Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-22  8:19   ` Nambiar, Amritha
@ 2022-09-22 12:29     ` Jakub Kicinski
  2022-09-23  1:09       ` Nambiar, Amritha
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2022-09-22 12:29 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, alexander.duyck, jhs, jiri, xiyou.wangcong, Gomes,
	Vinicius, Samudrala, Sridhar

On Thu, 22 Sep 2022 08:19:07 +0000 Nambiar, Amritha wrote:
> > Alex pointed out that it'd be worth documenting the priorities of
> > aRFS vs this thing, which one will be used if HW matches both.  
> 
> Sure, I will document the priorities of aRFS and TC filter selecting
> the Rx queue. On Intel E810 devices, the TC filter selecting Rx queue
> has higher priority over aRFS (Flow director filter) if both the filters
> are matched.

Is it configurable? :S If we think about RSS context selection we'd
expect the context to be selected based on for example the IPv6 daddr 
of the container but we still want aRFS to select the exact queue...
Is there a counterargument for giving the flow director higher prio?


* RE: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-22 12:29     ` Jakub Kicinski
@ 2022-09-23  1:09       ` Nambiar, Amritha
  2022-09-23 13:02         ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Nambiar, Amritha @ 2022-09-23  1:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, alexander.duyck, jhs, jiri, xiyou.wangcong, Gomes,
	Vinicius, Samudrala, Sridhar

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, September 22, 2022 5:29 AM
> To: Nambiar, Amritha <amritha.nambiar@intel.com>
> Cc: netdev@vger.kernel.org; alexander.duyck@gmail.com;
> jhs@mojatatu.com; jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes,
> Vinicius <vinicius.gomes@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue
> mapping
> 
> On Thu, 22 Sep 2022 08:19:07 +0000 Nambiar, Amritha wrote:
> > > Alex pointed out that it'd be worth documenting the priorities of
> > > aRFS vs this thing, which one will be used if HW matches both.
> >
> > Sure, I will document the priorities of aRFS and TC filter selecting
> > the Rx queue. On Intel E810 devices, the TC filter selecting Rx queue
> > has higher priority over aRFS (Flow director filter) if both the filters
> > are matched.
> 
> Is it configurable? :S If we think about RSS context selection we'd
> expect the context to be selected based on for example the IPv6 daddr
> of the container but we still want aRFS to select the exact queue...

This behavior is similar on E810 currently, i.e., TC filter selects the set
of queues (like the RSS context), and then the flow director filter can
be used to select the exact queue within the queue-set. We want to have
the ability to select the exact queue within the queue-set via TC (and not
flow director filter).

On E810, TC filters are added to a hardware block called the "switch". This
block supports two types of actions in hardware, termed "forward to
queue" and "forward to queue-set". aRFS filters are added to a different
HW block called the "flow director" (FD). The FD block comes after the switch
block in the HW pipeline. The FD block has certain restrictions (it can only
hold filters with the same packet type). The switch table does not have
this restriction.

When both the TC filter and FD filter compete for queue selection (i.e. both
filters are matched and the action is to set a queue), the pipeline resolves
conflicts based on metadata priority. The priorities are not user configurable.
The ice driver programs these priorities based on the metadata at each stage,
action etc. Switch (TC) filters with a forward-to-queue action take priority
over an FD filter assigning a queue. The hash filter has the lowest priority.

> Is there a counterargument for giving the flow director higher prio?

Isn't the HW behavior on RX (WRT setting the queue) consistent
with what we have in SW for TX? In SW, the TX queue set via a TC filter
(skbedit action) has the highest precedence, over XPS and then jhash (lowest).
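The existing SW Tx-side behavior being referenced is the long-standing skbedit usage, roughly as below (match and queue id are illustrative):

```shell
# Existing Tx-side behavior: a queue selected via the skbedit action
# takes precedence over XPS (and the fallback hash) for matching
# packets.
tc qdisc add dev ethX clsact
tc filter add dev ethX egress protocol ip flower dst_ip 192.168.1.12 \
        action skbedit queue_mapping 4
```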



* Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping
  2022-09-23  1:09       ` Nambiar, Amritha
@ 2022-09-23 13:02         ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2022-09-23 13:02 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, alexander.duyck, jhs, jiri, xiyou.wangcong, Gomes,
	Vinicius, Samudrala, Sridhar

On Fri, 23 Sep 2022 01:09:38 +0000 Nambiar, Amritha wrote:
> > Is it configurable? :S If we think about RSS context selection we'd
> > expect the context to be selected based on for example the IPv6 daddr
> > of the container but we still want aRFS to select the exact queue...  
> 
> This behavior is similar on E810 currently, i.e., TC filter selects the set
> of queues (like the RSS context), and then the flow director filter can
> be used to select the exact queue within the queue-set. We want to have
> the ability to select the exact queue within the queue-set via TC (and not
> flow director filter).
> 
> On E810, TC filters are added to a hardware block called the "switch". This
> block supports two types of actions in hardware, termed "forward to
> queue" and "forward to queue-set". aRFS filters are added to a different
> HW block called the "flow director" (FD). The FD block comes after the switch
> block in the HW pipeline. The FD block has certain restrictions (it can only
> hold filters with the same packet type). The switch table does not have
> this restriction.
> 
> When both the TC filter and FD filter compete for queue selection (i.e. both
> filters are matched and the action is to set a queue), the pipeline resolves
> conflicts based on metadata priority. The priorities are not user configurable.
> The ice driver programs these priorities based on the metadata at each stage,
> action etc. Switch (TC) filters with a forward-to-queue action take priority
> over an FD filter assigning a queue. The hash filter has the lowest priority.
> 
> > Is there a counterargument for giving the flow director higher prio?  
> 
> Isn't the HW behavior on RX (WRT setting the queue) consistent
> with what we have in SW for TX? In SW, the TX queue set via a TC filter
> (skbedit action) has the highest precedence, over XPS and then jhash (lowest).

Alright, I guess that could work as well, thanks for the explanation.
Initially I thought that it'd be strange if the queue-set selection
was before aRFS and queue-id selection after, if they are both to be
controlled by TC. But I can see how that makes most practical sense :S


end of thread, other threads:[~2022-09-23 13:02 UTC | newest]

Thread overview: 18+ messages
-- links below jump to the message on this page --
2022-09-08  1:23 [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Amritha Nambiar
2022-09-08  1:24 ` [net-next PATCH v2 1/4] act_skbedit: Add support for action skbedit " Amritha Nambiar
2022-09-08  1:24 ` [net-next PATCH v2 2/4] act_skbedit: Offload skbedit queue mapping for receive queue Amritha Nambiar
2022-09-08  1:24 ` [net-next PATCH v2 3/4] ice: Enable RX queue selection using skbedit action Amritha Nambiar
2022-09-08  1:24 ` [net-next PATCH v2 4/4] Documentation: networking: TC queue based filtering Amritha Nambiar
2022-09-08 17:34   ` kernel test robot
2022-09-08 15:27 ` [net-next PATCH v2 0/4] Extend action skbedit to RX queue mapping Alexander Duyck
2022-09-09  9:18   ` Nambiar, Amritha
2022-09-09 18:35     ` Alexander Duyck
2022-09-15  8:45       ` Nambiar, Amritha
2022-09-15 15:21         ` Alexander Duyck
2022-09-15 22:09           ` Nambiar, Amritha
2022-09-16 15:58             ` Alexander Duyck
2022-09-21 20:29 ` Jakub Kicinski
2022-09-22  8:19   ` Nambiar, Amritha
2022-09-22 12:29     ` Jakub Kicinski
2022-09-23  1:09       ` Nambiar, Amritha
2022-09-23 13:02         ` Jakub Kicinski
