* [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action
@ 2023-02-06 17:43 Paul Blakey
  2023-02-06 17:43 ` [PATCH net-next v9 1/7] " Paul Blakey
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:43 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov

Hi,

This series adds support for hardware miss to instruct tc to continue execution
in a specific tc action instance on a filter's action list. The mlx5 driver patch
(besides the refactors) shows its usage instead of using just chain restore.

Currently a filter's action list must be executed all together or
not at all, as drivers are only able to tell tc to continue executing from a
specific tc chain, and not from a specific filter/action.

This is troublesome with regard to action CT, where new connections should
be sent to software (via tc chain restore), and established connections can
be handled in hardware.

Checking for new connections is done when executing the ct action in hardware
(by checking the packet's tuple against known established tuples).
But if there is a packet modification (pedit) action before action CT and the
checked tuple is a new connection, hardware will need to revert the previous
packet modifications before sending it back to software so it can
re-match the same tc filter in software and re-execute its CT action.

The following is an example configuration of stateless NAT
on the mlx5 driver that isn't supported before this patchset:

 #Setup corresponding mlx5 VFs in namespaces
 $ ip netns add ns0
 $ ip netns add ns1
 $ ip link set dev enp8s0f0v0 netns ns0
 $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
 $ ip link set dev enp8s0f0v1 netns ns1
 $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up

 #Setup tc arp and ct rules on mlx5 VF representors
 $ tc qdisc add dev enp8s0f0_0 ingress
 $ tc qdisc add dev enp8s0f0_1 ingress
 $ ifconfig enp8s0f0_0 up
 $ ifconfig enp8s0f0_1 up

 #Original side
 $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
    ct_state -trk ip_proto tcp dst_port 8888 \
      action pedit ex munge tcp dport set 5001 pipe \
      action csum ip tcp pipe \
      action ct pipe \
      action goto chain 1
 $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
    ct_state +trk+est \
      action mirred egress redirect dev enp8s0f0_1
 $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
    ct_state +trk+new \
      action ct commit pipe \
      action mirred egress redirect dev enp8s0f0_1
 $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
      action mirred egress redirect dev enp8s0f0_1

 #Reply side
 $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
      action mirred egress redirect dev enp8s0f0_0
 $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
    ct_state -trk ip_proto tcp \
      action ct pipe \
      action pedit ex munge tcp sport set 8888 pipe \
      action csum ip tcp pipe \
      action mirred egress redirect dev enp8s0f0_0

 #Run traffic
 $ ip netns exec ns1 iperf -s -p 5001&
 $ sleep 2 #wait for iperf to fully open
 $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888

 #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
 $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
        Sent hardware 9310116832 bytes 6149672 pkt
        Sent hardware 9310116832 bytes 6149672 pkt
        Sent hardware 9310116832 bytes 6149672 pkt

A new connection executing the first filter in hardware will first rewrite
the dst port to the new port, and then the ct action is executed.
Because this is a new connection, hardware will need to send this back
to software, on chain 0, to execute the first filter again in software.
The dst port needs to be reverted, otherwise it won't re-match the old
dst port in the first filter. Because of that, the mlx5 driver currently
rejects offloading the above action ct rule.

This series adds support for partial offload of a filter's action list,
letting tc software continue processing in the specific action instance
where hardware left off (in the above case, after the "action pedit ex munge tcp
dport..." of the first rule), allowing support for scenarios such as the above.
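
As a rough illustration of "continue where hardware left off", the
following userspace sketch resumes an action list from a given index,
in the spirit of patch 1's tcf_exts_exec_ex(), which passes
exts->actions + act_index to tcf_action_exec(). All names here
(struct packet, run_actions_from(), the ACT_* codes and the three toy
actions) are hypothetical and exist only for this example; it is not
the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

#define ACT_PIPE 0	/* continue with the next action */
#define ACT_STOP 1	/* action consumed the packet, stop here */

/* Toy packet state, only what the example needs. */
struct packet {
	unsigned int dport;
	int tracked;
	int redirected;
};

typedef int (*action_fn)(struct packet *pkt);

/* Toy stand-ins for pedit, ct and mirred redirect. */
static int act_pedit_dport(struct packet *pkt)
{
	pkt->dport = 5001;
	return ACT_PIPE;
}

static int act_ct(struct packet *pkt)
{
	pkt->tracked = 1;
	return ACT_PIPE;
}

static int act_redirect(struct packet *pkt)
{
	pkt->redirected = 1;
	return ACT_STOP;
}

/* Run actions[start..n-1]; start == 0 is the ordinary full execution,
 * a non-zero start skips the actions hardware already performed.
 */
static int run_actions_from(action_fn *actions, size_t n, size_t start,
			    struct packet *pkt)
{
	int ret = ACT_PIPE;
	size_t i;

	assert(start <= n);
	for (i = start; i < n; i++) {
		ret = actions[i](pkt);
		if (ret != ACT_PIPE)
			break;
	}
	return ret;
}
```

With a list of { pedit, ct, redirect } and a packet whose dst port was
already rewritten in hardware, calling run_actions_from() with start = 1
executes only ct and redirect, so the rewrite neither has to be reverted
nor is it applied twice.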

Changelog:
	v1->v2:
	Fixed compilation without CONFIG_NET_CLS
	Cover letter re-write

	v2->v3:
	Unlock spin_lock on error in cls flower filter handle refactor
	Cover letter

	v3->v4:
	Silence warning by clang

	v4->v5:
	Cover letter example
	Removed ifdef as much as possible by using inline stubs

	v5->v6:
	Removed new inlines in cls_api.c (bot complained in patchwork)
	Added reviewed-by/ack - Thanks!

	v6->v7:
	Removed WARN_ON from pkt path (leon)
	Removed unnecessary return in void func

	v7->v8:
	Removed #if IS_ENABLED on skb ext adding Kconfig changes
	Complex variable init in separate lines
	if,else if, else if ---> switch case

	v8->v9:
	Removed even more IS_ENABLED because of Kconfig

Paul Blakey (7):
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: flower: Support hardware miss to tc action
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5e: TC, Set CT miss to the specific ct action instance

 .../net/ethernet/mellanox/mlx5/core/Kconfig   |   4 +-
 .../ethernet/mellanox/mlx5/core/en/rep/tc.c   | 225 ++------------
 .../mellanox/mlx5/core/en/tc/sample.c         |   2 +-
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  39 +--
 .../ethernet/mellanox/mlx5/core/en/tc_ct.h    |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 280 ++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  23 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   2 +
 .../mellanox/mlx5/core/lib/fs_chains.c        |  14 +-
 include/linux/skbuff.h                        |   6 +-
 include/net/flow_offload.h                    |   1 +
 include/net/pkt_cls.h                         |  34 ++-
 include/net/sch_generic.h                     |   2 +
 net/openvswitch/flow.c                        |   3 +-
 net/sched/act_api.c                           |   2 +-
 net/sched/cls_api.c                           | 213 ++++++++++++-
 net/sched/cls_flower.c                        |  73 +++--
 18 files changed, 602 insertions(+), 327 deletions(-)

-- 
2.30.1



* [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
@ 2023-02-06 17:43 ` Paul Blakey
  2023-02-10  2:21   ` Marcelo Ricardo Leitner
  2023-02-06 17:43 ` [PATCH net-next v9 2/7] net/sched: flower: Move filter handle initialization earlier Paul Blakey
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:43 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman

For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.

CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.

Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.
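
The cookie layout can be tried out in userspace. The sketch below
mirrors the idea of union tcf_exts_miss_cookie from this patch (a
32-bit per-filter base plus the action index), but the names
miss_cookie, miss_cookie_get() and miss_cookie_split() are made up for
the example, and the xarray lookup of the base is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the cookie layout: a 32-bit per-filter base
 * plus the action's index in the filter's action array.
 */
union miss_cookie {
	struct {
		uint32_t base;
		uint32_t act_index;
	};
	uint64_t cookie;
};

static_assert(sizeof(union miss_cookie) == sizeof(uint64_t),
	      "cookie must fit in 64 bits");

/* Build the cookie handed to the driver. A zero base means the filter
 * has no miss support, so the resulting cookie is 0 as well.
 */
static uint64_t miss_cookie_get(uint32_t base, uint32_t act_index)
{
	union miss_cookie mc = { .act_index = act_index };

	if (!base)
		return 0;

	mc.base = base;
	return mc.cookie;
}

/* Split a cookie reported on a miss back into base and action index. */
static uint32_t miss_cookie_split(uint64_t cookie, uint32_t *act_index)
{
	union miss_cookie mc = { .cookie = cookie };

	*act_index = mc.act_index;
	return mc.base;
}
```

Because packing and unpacking go through the same union, the round trip
does not depend on host endianness; the driver only needs to treat the
64-bit value as opaque.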

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/linux/skbuff.h     |   6 +-
 include/net/flow_offload.h |   1 +
 include/net/pkt_cls.h      |  34 +++---
 include/net/sch_generic.h  |   2 +
 net/openvswitch/flow.c     |   3 +-
 net/sched/act_api.c        |   2 +-
 net/sched/cls_api.c        | 213 +++++++++++++++++++++++++++++++++++--
 7 files changed, 234 insertions(+), 27 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1fa95b916342..9b9aa854068f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -311,12 +311,16 @@ struct nf_bridge_info {
  * and read by ovs to recirc_id.
  */
 struct tc_skb_ext {
-	__u32 chain;
+	union {
+		u64 act_miss_cookie;
+		__u32 chain;
+	};
 	__u16 mru;
 	__u16 zone;
 	u8 post_ct:1;
 	u8 post_ct_snat:1;
 	u8 post_ct_dnat:1;
+	u8 act_miss:1; /* Set if act_miss_cookie is used */
 };
 #endif
 
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 0400a0ac8a29..88db7346eb7a 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie);
 struct flow_action_entry {
 	enum flow_action_id		id;
 	u32				hw_index;
+	u64				miss_cookie;
 	enum flow_action_hw_stats	hw_stats;
 	action_destr			destructor;
 	void				*destructor_priv;
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index cd410a87517b..e395f2a84ed2 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -59,6 +59,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 void tcf_block_put(struct tcf_block *block);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei);
+int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
+		     int police, struct tcf_proto *tp, u32 handle, bool used_action_miss);
 
 static inline bool tcf_block_shared(struct tcf_block *block)
 {
@@ -229,6 +231,7 @@ struct tcf_exts {
 	struct tc_action **actions;
 	struct net	*net;
 	netns_tracker	ns_tracker;
+	struct tcf_exts_miss_cookie_node *miss_cookie_node;
 #endif
 	/* Map to export classifier specific extension TLV types to the
 	 * generic extensions API. Unsupported extensions must be set to 0.
@@ -240,21 +243,11 @@ struct tcf_exts {
 static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
 				int action, int police)
 {
-#ifdef CONFIG_NET_CLS_ACT
-	exts->type = 0;
-	exts->nr_actions = 0;
-	/* Note: we do not own yet a reference on net.
-	 * This reference might be taken later from tcf_exts_get_net().
-	 */
-	exts->net = net;
-	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
-				GFP_KERNEL);
-	if (!exts->actions)
-		return -ENOMEM;
+#ifdef CONFIG_NET_CLS
+	return tcf_exts_init_ex(exts, net, action, police, NULL, 0, false);
+#else
+	return -EOPNOTSUPP;
 #endif
-	exts->action = action;
-	exts->police = police;
-	return 0;
 }
 
 /* Return false if the netns is being destroyed in cleanup_net(). Callers
@@ -353,6 +346,18 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
 	return TC_ACT_OK;
 }
 
+static inline int
+tcf_exts_exec_ex(struct sk_buff *skb, struct tcf_exts *exts, int act_index,
+		 struct tcf_result *res)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	return tcf_action_exec(skb, exts->actions + act_index,
+			       exts->nr_actions - act_index, res);
+#else
+	return TC_ACT_OK;
+#endif
+}
+
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
 		      struct nlattr **tb, struct nlattr *rate_tlv,
 		      struct tcf_exts *exts, u32 flags,
@@ -577,6 +582,7 @@ int tc_setup_offload_action(struct flow_action *flow_action,
 void tc_cleanup_offload_action(struct flow_action *flow_action);
 int tc_setup_action(struct flow_action *flow_action,
 		    struct tc_action *actions[],
+		    u32 miss_cookie_base,
 		    struct netlink_ext_ack *extack);
 
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index af4aa66aaa4e..fab5ba3e61b7 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -369,6 +369,8 @@ struct tcf_proto_ops {
 						struct nlattr **tca,
 						struct netlink_ext_ack *extack);
 	void			(*tmplt_destroy)(void *tmplt_priv);
+	struct tcf_exts *	(*get_exts)(const struct tcf_proto *tp,
+					    u32 handle);
 
 	/* rtnetlink specific */
 	int			(*dump)(struct net*, struct tcf_proto*, void *,
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index e20d1a973417..69f91460a55c 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -1038,7 +1038,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	if (tc_skb_ext_tc_enabled()) {
 		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
-		key->recirc_id = tc_ext ? tc_ext->chain : 0;
+		key->recirc_id = tc_ext && !tc_ext->act_miss ?
+				 tc_ext->chain : 0;
 		OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0;
 		post_ct = tc_ext ? tc_ext->post_ct : false;
 		post_ct_snat = post_ct ? tc_ext->post_ct_snat : false;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index cd09ef49df22..16fd3d30eb12 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -272,7 +272,7 @@ static int tcf_action_offload_add_ex(struct tc_action *action,
 	if (err)
 		goto fl_err;
 
-	err = tc_setup_action(&fl_action->action, actions, extack);
+	err = tc_setup_action(&fl_action->action, actions, 0, extack);
 	if (err) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Failed to setup tc actions for offload");
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 5b4a95e8a1ee..8ff9530fef68 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -22,6 +22,7 @@
 #include <linux/idr.h>
 #include <linux/jhash.h>
 #include <linux/rculist.h>
+#include <linux/rhashtable.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/netlink.h>
@@ -50,6 +51,109 @@ static LIST_HEAD(tcf_proto_base);
 /* Protects list of registered TC modules. It is pure SMP lock. */
 static DEFINE_RWLOCK(cls_mod_lock);
 
+static struct xarray tcf_exts_miss_cookies_xa;
+struct tcf_exts_miss_cookie_node {
+	const struct tcf_chain *chain;
+	const struct tcf_proto *tp;
+	const struct tcf_exts *exts;
+	u32 chain_index;
+	u32 tp_prio;
+	u32 handle;
+	u32 miss_cookie_base;
+	struct rcu_head rcu;
+};
+
+/* Each tc action entry cookie will be comprised of 32bit miss_cookie_base +
+ * action index in the exts tc actions array.
+ */
+union tcf_exts_miss_cookie {
+	struct {
+		u32 miss_cookie_base;
+		u32 act_index;
+	};
+	u64 miss_cookie;
+};
+
+#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+static int
+tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
+				u32 handle)
+{
+	struct tcf_exts_miss_cookie_node *n;
+	static u32 next;
+	int err;
+
+	if (WARN_ON(!handle || !tp->ops->get_exts))
+		return -EINVAL;
+
+	n = kzalloc(sizeof(*n), GFP_KERNEL);
+	if (!n)
+		return -ENOMEM;
+
+	n->chain_index = tp->chain->index;
+	n->chain = tp->chain;
+	n->tp_prio = tp->prio;
+	n->tp = tp;
+	n->exts = exts;
+	n->handle = handle;
+
+	err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base,
+			      n, xa_limit_32b, &next, GFP_KERNEL);
+	if (err)
+		goto err_xa_alloc;
+
+	exts->miss_cookie_node = n;
+	return 0;
+
+err_xa_alloc:
+	kfree(n);
+	return err;
+}
+
+static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
+{
+	struct tcf_exts_miss_cookie_node *n;
+
+	if (!exts->miss_cookie_node)
+		return;
+
+	n = exts->miss_cookie_node;
+	xa_erase(&tcf_exts_miss_cookies_xa, n->miss_cookie_base);
+	kfree_rcu(n, rcu);
+}
+
+static struct tcf_exts_miss_cookie_node *
+tcf_exts_miss_cookie_lookup(u64 miss_cookie, int *act_index)
+{
+	union tcf_exts_miss_cookie mc = { .miss_cookie = miss_cookie, };
+
+	*act_index = mc.act_index;
+	return xa_load(&tcf_exts_miss_cookies_xa, mc.miss_cookie_base);
+}
+#else /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
+static int
+tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
+				u32 handle)
+{
+	return 0;
+}
+
+static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
+{
+}
+#endif /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
+
+static u64 tcf_exts_miss_cookie_get(u32 miss_cookie_base, int act_index)
+{
+	union tcf_exts_miss_cookie mc = { .act_index = act_index, };
+
+	if (!miss_cookie_base)
+		return 0;
+
+	mc.miss_cookie_base = miss_cookie_base;
+	return mc.miss_cookie;
+}
+
 #ifdef CONFIG_NET_CLS_ACT
 DEFINE_STATIC_KEY_FALSE(tc_skb_ext_tc);
 EXPORT_SYMBOL(tc_skb_ext_tc);
@@ -1549,6 +1653,8 @@ static inline int __tcf_classify(struct sk_buff *skb,
 				 const struct tcf_proto *orig_tp,
 				 struct tcf_result *res,
 				 bool compat_mode,
+				 struct tcf_exts_miss_cookie_node *n,
+				 int act_index,
 				 u32 *last_executed_chain)
 {
 #ifdef CONFIG_NET_CLS_ACT
@@ -1560,13 +1666,36 @@ static inline int __tcf_classify(struct sk_buff *skb,
 #endif
 	for (; tp; tp = rcu_dereference_bh(tp->next)) {
 		__be16 protocol = skb_protocol(skb, false);
-		int err;
+		int err = 0;
 
-		if (tp->protocol != protocol &&
-		    tp->protocol != htons(ETH_P_ALL))
-			continue;
+		if (n) {
+			struct tcf_exts *exts;
+
+			if (n->tp_prio != tp->prio)
+				continue;
+
+			/* We re-lookup the tp and chain based on index instead
+			 * of having hard refs and locks to them, so do a sanity
+			 * check if any of tp,chain,exts was replaced by the
+			 * time we got here with a cookie from hardware.
+			 */
+			if (unlikely(n->tp != tp || n->tp->chain != n->chain ||
+				     !tp->ops->get_exts))
+				return TC_ACT_SHOT;
+
+			exts = tp->ops->get_exts(tp, n->handle);
+			if (unlikely(!exts || n->exts != exts))
+				return TC_ACT_SHOT;
 
-		err = tc_classify(skb, tp, res);
+			n = NULL;
+			err = tcf_exts_exec_ex(skb, exts, act_index, res);
+		} else {
+			if (tp->protocol != protocol &&
+			    tp->protocol != htons(ETH_P_ALL))
+				continue;
+
+			err = tc_classify(skb, tp, res);
+		}
 #ifdef CONFIG_NET_CLS_ACT
 		if (unlikely(err == TC_ACT_RECLASSIFY && !compat_mode)) {
 			first_tp = orig_tp;
@@ -1582,6 +1711,9 @@ static inline int __tcf_classify(struct sk_buff *skb,
 			return err;
 	}
 
+	if (unlikely(n))
+		return TC_ACT_SHOT;
+
 	return TC_ACT_UNSPEC; /* signal: continue lookup */
 #ifdef CONFIG_NET_CLS_ACT
 reset:
@@ -1606,21 +1738,33 @@ int tcf_classify(struct sk_buff *skb,
 #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	u32 last_executed_chain = 0;
 
-	return __tcf_classify(skb, tp, tp, res, compat_mode,
+	return __tcf_classify(skb, tp, tp, res, compat_mode, NULL, 0,
 			      &last_executed_chain);
 #else
 	u32 last_executed_chain = tp ? tp->chain->index : 0;
+	struct tcf_exts_miss_cookie_node *n = NULL;
 	const struct tcf_proto *orig_tp = tp;
 	struct tc_skb_ext *ext;
+	int act_index = 0;
 	int ret;
 
 	if (block) {
 		ext = skb_ext_find(skb, TC_SKB_EXT);
 
-		if (ext && ext->chain) {
+		if (ext && (ext->chain || ext->act_miss)) {
 			struct tcf_chain *fchain;
+			u32 chain = ext->chain;
 
-			fchain = tcf_chain_lookup_rcu(block, ext->chain);
+			if (ext->act_miss) {
+				n = tcf_exts_miss_cookie_lookup(ext->act_miss_cookie,
+								&act_index);
+				if (!n)
+					return TC_ACT_SHOT;
+
+				chain = n->chain_index;
+			}
+
+			fchain = tcf_chain_lookup_rcu(block, chain);
 			if (!fchain)
 				return TC_ACT_SHOT;
 
@@ -1632,7 +1776,7 @@ int tcf_classify(struct sk_buff *skb,
 		}
 	}
 
-	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode,
+	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, n, act_index,
 			     &last_executed_chain);
 
 	if (tc_skb_ext_tc_enabled()) {
@@ -3056,9 +3200,48 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
+		     int police, struct tcf_proto *tp, u32 handle,
+		     bool use_action_miss)
+{
+	int err = 0;
+
+#ifdef CONFIG_NET_CLS_ACT
+	exts->type = 0;
+	exts->nr_actions = 0;
+	/* Note: we do not own yet a reference on net.
+	 * This reference might be taken later from tcf_exts_get_net().
+	 */
+	exts->net = net;
+	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
+				GFP_KERNEL);
+	if (!exts->actions)
+		return -ENOMEM;
+#endif
+
+	exts->action = action;
+	exts->police = police;
+
+	if (!use_action_miss)
+		return 0;
+
+	err = tcf_exts_miss_cookie_base_alloc(exts, tp, handle);
+	if (err)
+		goto err_miss_alloc;
+
+	return 0;
+
+err_miss_alloc:
+	tcf_exts_destroy(exts);
+	return err;
+}
+EXPORT_SYMBOL(tcf_exts_init_ex);
+
 void tcf_exts_destroy(struct tcf_exts *exts)
 {
 #ifdef CONFIG_NET_CLS_ACT
+	tcf_exts_miss_cookie_base_destroy(exts);
+
 	if (exts->actions) {
 		tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
 		kfree(exts->actions);
@@ -3547,6 +3730,7 @@ static int tc_setup_offload_act(struct tc_action *act,
 
 int tc_setup_action(struct flow_action *flow_action,
 		    struct tc_action *actions[],
+		    u32 miss_cookie_base,
 		    struct netlink_ext_ack *extack)
 {
 	int i, j, k, index, err = 0;
@@ -3577,6 +3761,8 @@ int tc_setup_action(struct flow_action *flow_action,
 		for (k = 0; k < index ; k++) {
 			entry[k].hw_stats = tc_act_hw_stats(act->hw_stats);
 			entry[k].hw_index = act->tcfa_index;
+			entry[k].miss_cookie =
+				tcf_exts_miss_cookie_get(miss_cookie_base, i);
 		}
 
 		j += index;
@@ -3599,10 +3785,15 @@ int tc_setup_offload_action(struct flow_action *flow_action,
 			    struct netlink_ext_ack *extack)
 {
 #ifdef CONFIG_NET_CLS_ACT
+	u32 miss_cookie_base;
+
 	if (!exts)
 		return 0;
 
-	return tc_setup_action(flow_action, exts->actions, extack);
+	miss_cookie_base = exts->miss_cookie_node ?
+			   exts->miss_cookie_node->miss_cookie_base : 0;
+	return tc_setup_action(flow_action, exts->actions, miss_cookie_base,
+			       extack);
 #else
 	return 0;
 #endif
@@ -3770,6 +3961,8 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_register_pernet_subsys;
 
+	xa_init_flags(&tcf_exts_miss_cookies_xa, XA_FLAGS_ALLOC1);
+
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
 		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
-- 
2.30.1



* [PATCH net-next v9 2/7] net/sched: flower: Move filter handle initialization earlier
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
  2023-02-06 17:43 ` [PATCH net-next v9 1/7] " Paul Blakey
@ 2023-02-06 17:43 ` Paul Blakey
  2023-02-06 17:43 ` [PATCH net-next v9 3/7] net/sched: flower: Support hardware miss to tc action Paul Blakey
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:43 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman

To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.

Move filter handle initialization earlier.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
---
 net/sched/cls_flower.c | 62 ++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 0b15698b3531..564b862870c7 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -2192,10 +2192,6 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	INIT_LIST_HEAD(&fnew->hw_list);
 	refcount_set(&fnew->refcnt, 1);
 
-	err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0);
-	if (err < 0)
-		goto errout;
-
 	if (tb[TCA_FLOWER_FLAGS]) {
 		fnew->flags = nla_get_u32(tb[TCA_FLOWER_FLAGS]);
 
@@ -2205,15 +2201,45 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		}
 	}
 
+	if (!fold) {
+		spin_lock(&tp->lock);
+		if (!handle) {
+			handle = 1;
+			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+					    INT_MAX, GFP_ATOMIC);
+		} else {
+			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+					    handle, GFP_ATOMIC);
+
+			/* Filter with specified handle was concurrently
+			 * inserted after initial check in cls_api. This is not
+			 * necessarily an error if NLM_F_EXCL is not set in
+			 * message flags. Returning EAGAIN will cause cls_api to
+			 * try to update concurrently inserted rule.
+			 */
+			if (err == -ENOSPC)
+				err = -EAGAIN;
+		}
+		spin_unlock(&tp->lock);
+
+		if (err)
+			goto errout;
+	}
+	fnew->handle = handle;
+
+	err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0);
+	if (err < 0)
+		goto errout_idr;
+
 	err = fl_set_parms(net, tp, fnew, mask, base, tb, tca[TCA_RATE],
 			   tp->chain->tmplt_priv, flags, fnew->flags,
 			   extack);
 	if (err)
-		goto errout;
+		goto errout_idr;
 
 	err = fl_check_assign_mask(head, fnew, fold, mask);
 	if (err)
-		goto errout;
+		goto errout_idr;
 
 	err = fl_ht_insert_unique(fnew, fold, &in_ht);
 	if (err)
@@ -2279,29 +2305,9 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		refcount_dec(&fold->refcnt);
 		__fl_put(fold);
 	} else {
-		if (handle) {
-			/* user specifies a handle and it doesn't exist */
-			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
-					    handle, GFP_ATOMIC);
-
-			/* Filter with specified handle was concurrently
-			 * inserted after initial check in cls_api. This is not
-			 * necessarily an error if NLM_F_EXCL is not set in
-			 * message flags. Returning EAGAIN will cause cls_api to
-			 * try to update concurrently inserted rule.
-			 */
-			if (err == -ENOSPC)
-				err = -EAGAIN;
-		} else {
-			handle = 1;
-			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
-					    INT_MAX, GFP_ATOMIC);
-		}
-		if (err)
-			goto errout_hw;
+		idr_replace(&head->handle_idr, fnew, fnew->handle);
 
 		refcount_inc(&fnew->refcnt);
-		fnew->handle = handle;
 		list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
 		spin_unlock(&tp->lock);
 	}
@@ -2324,6 +2330,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 				       fnew->mask->filter_ht_params);
 errout_mask:
 	fl_mask_put(head, fnew->mask);
+errout_idr:
+	idr_remove(&head->handle_idr, fnew->handle);
 errout:
 	__fl_put(fnew);
 errout_tb:
-- 
2.30.1



* [PATCH net-next v9 3/7] net/sched: flower: Support hardware miss to tc action
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
  2023-02-06 17:43 ` [PATCH net-next v9 1/7] " Paul Blakey
  2023-02-06 17:43 ` [PATCH net-next v9 2/7] net/sched: flower: Move filter handle initialization earlier Paul Blakey
@ 2023-02-06 17:43 ` Paul Blakey
  2023-02-06 17:44 ` [PATCH net-next v9 4/7] net/mlx5: Kconfig: Make tc offload depend on tc skb extension Paul Blakey
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:43 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman

To support hardware miss to a tc action on the flower classifier,
implement the get_exts callback that returns a filter's exts (actions)
by handle, and set up miss support for the filter's exts by passing
the tp and the filter's handle to tcf_exts_init_ex().
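
The reason a handle-based lookup is needed is the sanity check in
patch 1: on a miss, the exts found by handle must still be the exts
recorded when the cookie was allocated. A simplified userspace sketch
of that check follows; struct filter, struct miss_node, the flat table
and resolve_miss() are assumptions made for illustration (flower uses
an IDR, not an array).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FILTERS 8

struct exts {
	int nr_actions;
};

struct filter {
	uint32_t handle;	/* 0 marks a free slot */
	struct exts exts;
};

/* What was recorded when the miss cookie was allocated. */
struct miss_node {
	uint32_t handle;
	const struct exts *exts;
};

/* Stand-in for the classifier's get_exts callback: find exts by handle. */
static struct exts *get_exts(struct filter *tbl, uint32_t handle)
{
	size_t i;

	if (!handle)
		return NULL;

	for (i = 0; i < MAX_FILTERS; i++)
		if (tbl[i].handle == handle)
			return &tbl[i].exts;
	return NULL;
}

/* Return the exts to resume from, or NULL if the filter was deleted or
 * replaced since the cookie was handed to hardware.
 */
static struct exts *resolve_miss(struct filter *tbl, const struct miss_node *n)
{
	struct exts *exts;

	assert(tbl && n);
	exts = get_exts(tbl, n->handle);
	if (!exts || exts != n->exts)
		return NULL;
	return exts;
}
```

A NULL result corresponds to the TC_ACT_SHOT paths in __tcf_classify():
a stale cookie drops the packet instead of resuming in the wrong filter.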

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
---
 net/sched/cls_flower.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 564b862870c7..5da7f6d02e5d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -534,6 +534,15 @@ static struct cls_fl_filter *__fl_get(struct cls_fl_head *head, u32 handle)
 	return f;
 }
 
+static struct tcf_exts *fl_get_exts(const struct tcf_proto *tp, u32 handle)
+{
+	struct cls_fl_head *head = rcu_dereference_bh(tp->root);
+	struct cls_fl_filter *f;
+
+	f = idr_find(&head->handle_idr, handle);
+	return f ? &f->exts : NULL;
+}
+
 static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
 		       bool *last, bool rtnl_held,
 		       struct netlink_ext_ack *extack)
@@ -2227,7 +2236,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	}
 	fnew->handle = handle;
 
-	err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0);
+	err = tcf_exts_init_ex(&fnew->exts, net, TCA_FLOWER_ACT, 0, tp, handle,
+			       !tc_skip_hw(fnew->flags));
 	if (err < 0)
 		goto errout_idr;
 
@@ -3449,6 +3459,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
 	.tmplt_create	= fl_tmplt_create,
 	.tmplt_destroy	= fl_tmplt_destroy,
 	.tmplt_dump	= fl_tmplt_dump,
+	.get_exts	= fl_get_exts,
 	.owner		= THIS_MODULE,
 	.flags		= TCF_PROTO_OPS_DOIT_UNLOCKED,
 };
-- 
2.30.1



* [PATCH net-next v9 4/7] net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
                   ` (2 preceding siblings ...)
  2023-02-06 17:43 ` [PATCH net-next v9 3/7] net/sched: flower: Support hardware miss to tc action Paul Blakey
@ 2023-02-06 17:44 ` Paul Blakey
  2023-02-10  2:29   ` Marcelo Ricardo Leitner
  2023-02-06 17:44 ` [PATCH net-next v9 5/7] net/mlx5: Refactor tc miss handling to a single function Paul Blakey
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:44 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov

Tc skb extension is a basic requirement for using tc
offload to support correct restoration on action miss.

Depend on it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig     | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c  | 7 -------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c     | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h     | 2 --
 5 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 26685fd0fdaa..bb1d7b039a7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -85,7 +85,7 @@ config MLX5_BRIDGE
 
 config MLX5_CLS_ACT
 	bool "MLX5 TC classifier action support"
-	depends on MLX5_ESWITCH && NET_CLS_ACT
+	depends on MLX5_ESWITCH && NET_CLS_ACT && NET_TC_SKB_EXT
 	default y
 	help
 	  mlx5 ConnectX offloads support for TC classifier action (NET_CLS_ACT),
@@ -100,7 +100,7 @@ config MLX5_CLS_ACT
 
 config MLX5_TC_CT
 	bool "MLX5 TC connection tracking offload support"
-	depends on MLX5_CLS_ACT && NF_FLOW_TABLE && NET_ACT_CT && NET_TC_SKB_EXT
+	depends on MLX5_CLS_ACT && NF_FLOW_TABLE && NET_ACT_CT
 	default y
 	help
 	  Say Y here if you want to support offloading connection tracking rules
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
index b08339d986d5..fcb4cf526727 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
@@ -762,7 +762,6 @@ static bool mlx5e_restore_skb_chain(struct sk_buff *skb, u32 chain, u32 reg_c1,
 	struct mlx5e_priv *priv = netdev_priv(skb->dev);
 	u32 tunnel_id = (reg_c1 >> ESW_TUN_OFFSET) & TUNNEL_ID_MASK;
 
-#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	if (chain) {
 		struct mlx5_rep_uplink_priv *uplink_priv;
 		struct mlx5e_rep_priv *uplink_rpriv;
@@ -784,7 +783,6 @@ static bool mlx5e_restore_skb_chain(struct sk_buff *skb, u32 chain, u32 reg_c1,
 					      zone_restore_id))
 			return false;
 	}
-#endif /* CONFIG_NET_TC_SKB_EXT */
 
 	return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 193562c14c44..2251f33c3865 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -2078,13 +2078,6 @@ mlx5_tc_ct_init_check_support(struct mlx5e_priv *priv,
 	const char *err_msg = NULL;
 	int err = 0;
 
-#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
-	/* cannot restore chain ID on HW miss */
-
-	err_msg = "tc skb extension missing";
-	err = -EOPNOTSUPP;
-	goto out_err;
-#endif
 	if (IS_ERR_OR_NULL(post_act)) {
 		/* Ignore_flow_level support isn't supported by default for VFs and so post_act
 		 * won't be supported. Skip showing error msg.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4e6f5caf8ab6..b173c7e9e553 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -5565,7 +5565,6 @@ int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
 			 struct sk_buff *skb)
 {
-#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	u32 chain = 0, chain_tag, reg_b, zone_restore_id;
 	struct mlx5e_priv *priv = netdev_priv(skb->dev);
 	struct mlx5_mapped_obj mapped_obj;
@@ -5603,7 +5602,6 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
 		netdev_dbg(priv->netdev, "Invalid mapped object type: %d\n", mapped_obj.type);
 		return false;
 	}
-#endif /* CONFIG_NET_TC_SKB_EXT */
 
 	return true;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index ce516dc7f3fd..ee9c8f31491e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -370,7 +370,6 @@ struct mlx5e_tc_table *mlx5e_tc_table_alloc(void);
 void mlx5e_tc_table_free(struct mlx5e_tc_table *tc);
 static inline bool mlx5e_cqe_regb_chain(struct mlx5_cqe64 *cqe)
 {
-#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	u32 chain, reg_b;
 
 	reg_b = be32_to_cpu(cqe->ft_metadata);
@@ -381,7 +380,6 @@ static inline bool mlx5e_cqe_regb_chain(struct mlx5_cqe64 *cqe)
 	chain = reg_b & MLX5E_TC_TABLE_CHAIN_TAG_MASK;
 	if (chain)
 		return true;
-#endif
 
 	return false;
 }
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH net-next v9 5/7] net/mlx5: Refactor tc miss handling to a single function
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
                   ` (3 preceding siblings ...)
  2023-02-06 17:44 ` [PATCH net-next v9 4/7] net/mlx5: Kconfig: Make tc offload depend on tc skb extension Paul Blakey
@ 2023-02-06 17:44 ` Paul Blakey
  2023-02-06 17:44 ` [PATCH net-next v9 6/7] net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG Paul Blakey
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:44 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov

Move tc miss handling code to en_tc.c, and remove
duplicate code.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
---
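
Note for readers (not part of the commit): the control flow after this
refactor can be modelled in plain userspace C. This is only a rough
sketch under stated assumptions, not the driver code. All names below
(obj_type, mapped_obj, update_priv, obj_table, lookup, update_skb,
update_skb_nic) are hypothetical stand-ins, the table replaces
mapping_find(), and the tunnel, CT and sample restore steps are left
out. It shows one shared handler that dispatches on the mapped object
type, and a NIC wrapper that calls it with tunnel_id 0 and a NULL
tc_priv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's mapped object types. */
enum obj_type { OBJ_CHAIN, OBJ_SAMPLE, OBJ_INT_PORT };

struct mapped_obj {
	enum obj_type type;
	uint32_t chain;
};

/* Models the two flags this patch adds to struct mlx5e_tc_update_priv. */
struct update_priv {
	bool skb_done;   /* packet already consumed (sampled); caller frees it */
	bool forward_tx; /* caller should transmit instead of receive */
};

/* Toy table indexed by mapped object id; stands in for mapping_find(). */
static const struct mapped_obj obj_table[] = {
	[1] = { .type = OBJ_CHAIN, .chain = 5 },
	[2] = { .type = OBJ_SAMPLE },
	[3] = { .type = OBJ_INT_PORT },
};

static bool lookup(uint32_t id, struct mapped_obj *out)
{
	if (id == 0 || id >= sizeof(obj_table) / sizeof(obj_table[0]))
		return false;
	*out = obj_table[id];
	return true;
}

/* Shared handler; tc_priv is NULL when called from the NIC path. */
static bool update_skb(uint32_t id, uint32_t tunnel_id,
		       struct update_priv *tc_priv, uint32_t *restored_chain)
{
	struct mapped_obj obj;

	assert(restored_chain);
	if (!lookup(id, &obj))
		return false;

	switch (obj.type) {
	case OBJ_CHAIN:
		/* Chain restore works on both paths. */
		*restored_chain = obj.chain;
		return true;
	case OBJ_SAMPLE:
		/* Sketch-only guard: reject when there is no tc_priv. */
		if (!tc_priv)
			return false;
		tc_priv->skb_done = true;
		return true;
	case OBJ_INT_PORT:
		if (!tc_priv)
			return false;
		/* Tunnel restore takes precedence (not modelled here). */
		if (tunnel_id)
			return true;
		/* Assumption: the int port lookup asks for a TX forward. */
		tc_priv->forward_tx = true;
		return true;
	}

	return false;
}

/* NIC wrapper: no tunnel id and no tc_priv. */
static bool update_skb_nic(uint32_t id, uint32_t *restored_chain)
{
	return update_skb(id, 0, NULL, restored_chain);
}
```

The NULL checks in the sample and int port branches are a choice made
for this sketch only; the patch itself passes tc_priv through to the
restore helpers unchanged.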
 .../ethernet/mellanox/mlx5/core/en/rep/tc.c   | 223 ++----------------
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 220 +++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  11 +-
 4 files changed, 231 insertions(+), 227 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
index fcb4cf526727..0b84665989fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /* Copyright (c) 2020 Mellanox Technologies. */
 
-#include <net/dst_metadata.h>
 #include <linux/netdevice.h>
 #include <linux/if_macvlan.h>
 #include <linux/list.h>
@@ -665,230 +664,54 @@ void mlx5e_rep_tc_netdevice_event_unregister(struct mlx5e_rep_priv *rpriv)
 				 mlx5e_rep_indr_block_unbind);
 }
 
-static bool mlx5e_restore_tunnel(struct mlx5e_priv *priv, struct sk_buff *skb,
-				 struct mlx5e_tc_update_priv *tc_priv,
-				 u32 tunnel_id)
-{
-	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct tunnel_match_enc_opts enc_opts = {};
-	struct mlx5_rep_uplink_priv *uplink_priv;
-	struct mlx5e_rep_priv *uplink_rpriv;
-	struct metadata_dst *tun_dst;
-	struct tunnel_match_key key;
-	u32 tun_id, enc_opts_id;
-	struct net_device *dev;
-	int err;
-
-	enc_opts_id = tunnel_id & ENC_OPTS_BITS_MASK;
-	tun_id = tunnel_id >> ENC_OPTS_BITS;
-
-	if (!tun_id)
-		return true;
-
-	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
-	uplink_priv = &uplink_rpriv->uplink_priv;
-
-	err = mapping_find(uplink_priv->tunnel_mapping, tun_id, &key);
-	if (err) {
-		netdev_dbg(priv->netdev,
-			   "Couldn't find tunnel for tun_id: %d, err: %d\n",
-			   tun_id, err);
-		return false;
-	}
-
-	if (enc_opts_id) {
-		err = mapping_find(uplink_priv->tunnel_enc_opts_mapping,
-				   enc_opts_id, &enc_opts);
-		if (err) {
-			netdev_dbg(priv->netdev,
-				   "Couldn't find tunnel (opts) for tun_id: %d, err: %d\n",
-				   enc_opts_id, err);
-			return false;
-		}
-	}
-
-	if (key.enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
-		tun_dst = __ip_tun_set_dst(key.enc_ipv4.src, key.enc_ipv4.dst,
-					   key.enc_ip.tos, key.enc_ip.ttl,
-					   key.enc_tp.dst, TUNNEL_KEY,
-					   key32_to_tunnel_id(key.enc_key_id.keyid),
-					   enc_opts.key.len);
-	} else if (key.enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
-		tun_dst = __ipv6_tun_set_dst(&key.enc_ipv6.src, &key.enc_ipv6.dst,
-					     key.enc_ip.tos, key.enc_ip.ttl,
-					     key.enc_tp.dst, 0, TUNNEL_KEY,
-					     key32_to_tunnel_id(key.enc_key_id.keyid),
-					     enc_opts.key.len);
-	} else {
-		netdev_dbg(priv->netdev,
-			   "Couldn't restore tunnel, unsupported addr_type: %d\n",
-			   key.enc_control.addr_type);
-		return false;
-	}
-
-	if (!tun_dst) {
-		netdev_dbg(priv->netdev, "Couldn't restore tunnel, no tun_dst\n");
-		return false;
-	}
-
-	tun_dst->u.tun_info.key.tp_src = key.enc_tp.src;
-
-	if (enc_opts.key.len)
-		ip_tunnel_info_opts_set(&tun_dst->u.tun_info,
-					enc_opts.key.data,
-					enc_opts.key.len,
-					enc_opts.key.dst_opt_type);
-
-	skb_dst_set(skb, (struct dst_entry *)tun_dst);
-	dev = dev_get_by_index(&init_net, key.filter_ifindex);
-	if (!dev) {
-		netdev_dbg(priv->netdev,
-			   "Couldn't find tunnel device with ifindex: %d\n",
-			   key.filter_ifindex);
-		return false;
-	}
-
-	/* Set fwd_dev so we do dev_put() after datapath */
-	tc_priv->fwd_dev = dev;
-
-	skb->dev = dev;
-
-	return true;
-}
-
-static bool mlx5e_restore_skb_chain(struct sk_buff *skb, u32 chain, u32 reg_c1,
-				    struct mlx5e_tc_update_priv *tc_priv)
-{
-	struct mlx5e_priv *priv = netdev_priv(skb->dev);
-	u32 tunnel_id = (reg_c1 >> ESW_TUN_OFFSET) & TUNNEL_ID_MASK;
-
-	if (chain) {
-		struct mlx5_rep_uplink_priv *uplink_priv;
-		struct mlx5e_rep_priv *uplink_rpriv;
-		struct tc_skb_ext *tc_skb_ext;
-		struct mlx5_eswitch *esw;
-		u32 zone_restore_id;
-
-		tc_skb_ext = tc_skb_ext_alloc(skb);
-		if (!tc_skb_ext) {
-			WARN_ON(1);
-			return false;
-		}
-		tc_skb_ext->chain = chain;
-		zone_restore_id = reg_c1 & ESW_ZONE_ID_MASK;
-		esw = priv->mdev->priv.eswitch;
-		uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
-		uplink_priv = &uplink_rpriv->uplink_priv;
-		if (!mlx5e_tc_ct_restore_flow(uplink_priv->ct_priv, skb,
-					      zone_restore_id))
-			return false;
-	}
-
-	return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
-}
-
-static void mlx5_rep_tc_post_napi_receive(struct mlx5e_tc_update_priv *tc_priv)
-{
-	if (tc_priv->fwd_dev)
-		dev_put(tc_priv->fwd_dev);
-}
-
-static void mlx5e_restore_skb_sample(struct mlx5e_priv *priv, struct sk_buff *skb,
-				     struct mlx5_mapped_obj *mapped_obj,
-				     struct mlx5e_tc_update_priv *tc_priv)
-{
-	if (!mlx5e_restore_tunnel(priv, skb, tc_priv, mapped_obj->sample.tunnel_id)) {
-		netdev_dbg(priv->netdev,
-			   "Failed to restore tunnel info for sampled packet\n");
-		return;
-	}
-	mlx5e_tc_sample_skb(skb, mapped_obj);
-	mlx5_rep_tc_post_napi_receive(tc_priv);
-}
-
-static bool mlx5e_restore_skb_int_port(struct mlx5e_priv *priv, struct sk_buff *skb,
-				       struct mlx5_mapped_obj *mapped_obj,
-				       struct mlx5e_tc_update_priv *tc_priv,
-				       bool *forward_tx,
-				       u32 reg_c1)
-{
-	u32 tunnel_id = (reg_c1 >> ESW_TUN_OFFSET) & TUNNEL_ID_MASK;
-	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_rep_uplink_priv *uplink_priv;
-	struct mlx5e_rep_priv *uplink_rpriv;
-
-	/* Tunnel restore takes precedence over int port restore */
-	if (tunnel_id)
-		return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
-
-	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
-	uplink_priv = &uplink_rpriv->uplink_priv;
-
-	if (mlx5e_tc_int_port_dev_fwd(uplink_priv->int_port_priv, skb,
-				      mapped_obj->int_port_metadata, forward_tx)) {
-		/* Set fwd_dev for future dev_put */
-		tc_priv->fwd_dev = skb->dev;
-
-		return true;
-	}
-
-	return false;
-}
-
 void mlx5e_rep_tc_receive(struct mlx5_cqe64 *cqe, struct mlx5e_rq *rq,
 			  struct sk_buff *skb)
 {
-	u32 reg_c1 = be32_to_cpu(cqe->ft_metadata);
+	u32 reg_c0, reg_c1, zone_restore_id, tunnel_id;
 	struct mlx5e_tc_update_priv tc_priv = {};
-	struct mlx5_mapped_obj mapped_obj;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+	struct mlx5_tc_ct_priv *ct_priv;
+	struct mapping_ctx *mapping_ctx;
 	struct mlx5_eswitch *esw;
-	bool forward_tx = false;
 	struct mlx5e_priv *priv;
-	u32 reg_c0;
-	int err;
 
 	reg_c0 = (be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK);
 	if (!reg_c0 || reg_c0 == MLX5_FS_DEFAULT_FLOW_TAG)
 		goto forward;
 
-	/* If reg_c0 is not equal to the default flow tag then skb->mark
+	/* If mapped_obj_id is not equal to the default flow tag then skb->mark
 	 * is not supported and must be reset back to 0.
 	 */
 	skb->mark = 0;
 
 	priv = netdev_priv(skb->dev);
 	esw = priv->mdev->priv.eswitch;
-	err = mapping_find(esw->offloads.reg_c0_obj_pool, reg_c0, &mapped_obj);
-	if (err) {
-		netdev_dbg(priv->netdev,
-			   "Couldn't find mapped object for reg_c0: %d, err: %d\n",
-			   reg_c0, err);
-		goto free_skb;
-	}
+	mapping_ctx = esw->offloads.reg_c0_obj_pool;
+	reg_c1 = be32_to_cpu(cqe->ft_metadata);
+	zone_restore_id = reg_c1 & ESW_ZONE_ID_MASK;
+	tunnel_id = (reg_c1 >> ESW_TUN_OFFSET) & TUNNEL_ID_MASK;
 
-	if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN) {
-		if (!mlx5e_restore_skb_chain(skb, mapped_obj.chain, reg_c1, &tc_priv) &&
-		    !mlx5_ipsec_is_rx_flow(cqe))
-			goto free_skb;
-	} else if (mapped_obj.type == MLX5_MAPPED_OBJ_SAMPLE) {
-		mlx5e_restore_skb_sample(priv, skb, &mapped_obj, &tc_priv);
-		goto free_skb;
-	} else if (mapped_obj.type == MLX5_MAPPED_OBJ_INT_PORT_METADATA) {
-		if (!mlx5e_restore_skb_int_port(priv, skb, &mapped_obj, &tc_priv,
-						&forward_tx, reg_c1))
-			goto free_skb;
-	} else {
-		netdev_dbg(priv->netdev, "Invalid mapped object type: %d\n", mapped_obj.type);
+	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &uplink_rpriv->uplink_priv;
+	ct_priv = uplink_priv->ct_priv;
+
+	if (!mlx5_ipsec_is_rx_flow(cqe) &&
+	    !mlx5e_tc_update_skb(cqe, skb, mapping_ctx, reg_c0, ct_priv, zone_restore_id, tunnel_id,
+				 &tc_priv))
 		goto free_skb;
-	}
 
 forward:
-	if (forward_tx)
+	if (tc_priv.skb_done)
+		goto free_skb;
+
+	if (tc_priv.forward_tx)
 		dev_queue_xmit(skb);
 	else
 		napi_gro_receive(rq->cq.napi, skb);
 
-	mlx5_rep_tc_post_napi_receive(&tc_priv);
+	if (tc_priv.fwd_dev)
+		dev_put(tc_priv.fwd_dev);
 
 	return;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index a9473a51edc1..fea0c2aa95e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1792,7 +1792,7 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 
 	if (mlx5e_cqe_regb_chain(cqe))
-		if (!mlx5e_tc_update_skb(cqe, skb)) {
+		if (!mlx5e_tc_update_skb_nic(cqe, skb)) {
 			dev_kfree_skb_any(skb);
 			goto free_wqe;
 		}
@@ -2259,7 +2259,7 @@ static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cq
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 
 	if (mlx5e_cqe_regb_chain(cqe))
-		if (!mlx5e_tc_update_skb(cqe, skb)) {
+		if (!mlx5e_tc_update_skb_nic(cqe, skb)) {
 			dev_kfree_skb_any(skb);
 			goto mpwrq_cqe_out;
 		}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index b173c7e9e553..a6399dc870c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -43,6 +43,7 @@
 #include <net/ipv6_stubs.h>
 #include <net/bareudp.h>
 #include <net/bonding.h>
+#include <net/dst_metadata.h>
 #include "en.h"
 #include "en/tc/post_act.h"
 #include "en_rep.h"
@@ -5562,46 +5563,219 @@ int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
-			 struct sk_buff *skb)
+static bool mlx5e_tc_restore_tunnel(struct mlx5e_priv *priv, struct sk_buff *skb,
+				    struct mlx5e_tc_update_priv *tc_priv,
+				    u32 tunnel_id)
 {
-	u32 chain = 0, chain_tag, reg_b, zone_restore_id;
-	struct mlx5e_priv *priv = netdev_priv(skb->dev);
-	struct mlx5_mapped_obj mapped_obj;
-	struct tc_skb_ext *tc_skb_ext;
-	struct mlx5e_tc_table *tc;
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct tunnel_match_enc_opts enc_opts = {};
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+	struct metadata_dst *tun_dst;
+	struct tunnel_match_key key;
+	u32 tun_id, enc_opts_id;
+	struct net_device *dev;
 	int err;
 
-	reg_b = be32_to_cpu(cqe->ft_metadata);
-	tc = mlx5e_fs_get_tc(priv->fs);
-	chain_tag = reg_b & MLX5E_TC_TABLE_CHAIN_TAG_MASK;
+	enc_opts_id = tunnel_id & ENC_OPTS_BITS_MASK;
+	tun_id = tunnel_id >> ENC_OPTS_BITS;
+
+	if (!tun_id)
+		return true;
+
+	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &uplink_rpriv->uplink_priv;
 
-	err = mapping_find(tc->mapping, chain_tag, &mapped_obj);
+	err = mapping_find(uplink_priv->tunnel_mapping, tun_id, &key);
 	if (err) {
 		netdev_dbg(priv->netdev,
-			   "Couldn't find chain for chain tag: %d, err: %d\n",
-			   chain_tag, err);
+			   "Couldn't find tunnel for tun_id: %d, err: %d\n",
+			   tun_id, err);
+		return false;
+	}
+
+	if (enc_opts_id) {
+		err = mapping_find(uplink_priv->tunnel_enc_opts_mapping,
+				   enc_opts_id, &enc_opts);
+		if (err) {
+			netdev_dbg(priv->netdev,
+				   "Couldn't find tunnel (opts) for tun_id: %d, err: %d\n",
+				   enc_opts_id, err);
+			return false;
+		}
+	}
+
+	switch (key.enc_control.addr_type) {
+	case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
+		tun_dst = __ip_tun_set_dst(key.enc_ipv4.src, key.enc_ipv4.dst,
+					   key.enc_ip.tos, key.enc_ip.ttl,
+					   key.enc_tp.dst, TUNNEL_KEY,
+					   key32_to_tunnel_id(key.enc_key_id.keyid),
+					   enc_opts.key.len);
+		break;
+	case FLOW_DISSECTOR_KEY_IPV6_ADDRS:
+		tun_dst = __ipv6_tun_set_dst(&key.enc_ipv6.src, &key.enc_ipv6.dst,
+					     key.enc_ip.tos, key.enc_ip.ttl,
+					     key.enc_tp.dst, 0, TUNNEL_KEY,
+					     key32_to_tunnel_id(key.enc_key_id.keyid),
+					     enc_opts.key.len);
+		break;
+	default:
+		netdev_dbg(priv->netdev,
+			   "Couldn't restore tunnel, unsupported addr_type: %d\n",
+			   key.enc_control.addr_type);
+		return false;
+	}
+
+	if (!tun_dst) {
+		netdev_dbg(priv->netdev, "Couldn't restore tunnel, no tun_dst\n");
+		return false;
+	}
+
+	tun_dst->u.tun_info.key.tp_src = key.enc_tp.src;
+
+	if (enc_opts.key.len)
+		ip_tunnel_info_opts_set(&tun_dst->u.tun_info,
+					enc_opts.key.data,
+					enc_opts.key.len,
+					enc_opts.key.dst_opt_type);
+
+	skb_dst_set(skb, (struct dst_entry *)tun_dst);
+	dev = dev_get_by_index(&init_net, key.filter_ifindex);
+	if (!dev) {
+		netdev_dbg(priv->netdev,
+			   "Couldn't find tunnel device with ifindex: %d\n",
+			   key.filter_ifindex);
 		return false;
 	}
 
-	if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN) {
-		chain = mapped_obj.chain;
+	/* Set fwd_dev so we do dev_put() after datapath */
+	tc_priv->fwd_dev = dev;
+
+	skb->dev = dev;
+
+	return true;
+}
+
+static bool mlx5e_tc_restore_skb_chain(struct sk_buff *skb, struct mlx5_tc_ct_priv *ct_priv,
+				       u32 chain, u32 zone_restore_id,
+				       u32 tunnel_id,  struct mlx5e_tc_update_priv *tc_priv)
+{
+	struct mlx5e_priv *priv = netdev_priv(skb->dev);
+	struct tc_skb_ext *tc_skb_ext;
+
+	if (chain) {
+		if (!mlx5e_tc_ct_restore_flow(ct_priv, skb, zone_restore_id))
+			return false;
+
 		tc_skb_ext = tc_skb_ext_alloc(skb);
-		if (WARN_ON(!tc_skb_ext))
+		if (!tc_skb_ext) {
+			WARN_ON(1);
 			return false;
+		}
 
 		tc_skb_ext->chain = chain;
+	}
 
-		zone_restore_id = (reg_b >> MLX5_REG_MAPPING_MOFFSET(NIC_ZONE_RESTORE_TO_REG)) &
-			ESW_ZONE_ID_MASK;
+	if (tc_priv)
+		return mlx5e_tc_restore_tunnel(priv, skb, tc_priv, tunnel_id);
 
-		if (!mlx5e_tc_ct_restore_flow(tc->ct, skb,
-					      zone_restore_id))
-			return false;
-	} else {
+	return true;
+}
+
+static void mlx5e_tc_restore_skb_sample(struct mlx5e_priv *priv, struct sk_buff *skb,
+					struct mlx5_mapped_obj *mapped_obj,
+					struct mlx5e_tc_update_priv *tc_priv)
+{
+	if (!mlx5e_tc_restore_tunnel(priv, skb, tc_priv, mapped_obj->sample.tunnel_id)) {
+		netdev_dbg(priv->netdev,
+			   "Failed to restore tunnel info for sampled packet\n");
+		return;
+	}
+	mlx5e_tc_sample_skb(skb, mapped_obj);
+}
+
+static bool mlx5e_tc_restore_skb_int_port(struct mlx5e_priv *priv, struct sk_buff *skb,
+					  struct mlx5_mapped_obj *mapped_obj,
+					  struct mlx5e_tc_update_priv *tc_priv,
+					  u32 tunnel_id)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+	bool forward_tx = false;
+
+	/* Tunnel restore takes precedence over int port restore */
+	if (tunnel_id)
+		return mlx5e_tc_restore_tunnel(priv, skb, tc_priv, tunnel_id);
+
+	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &uplink_rpriv->uplink_priv;
+
+	if (mlx5e_tc_int_port_dev_fwd(uplink_priv->int_port_priv, skb,
+				      mapped_obj->int_port_metadata, &forward_tx)) {
+		/* Set fwd_dev for future dev_put */
+		tc_priv->fwd_dev = skb->dev;
+		tc_priv->forward_tx = forward_tx;
+
+		return true;
+	}
+
+	return false;
+}
+
+bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb,
+			 struct mapping_ctx *mapping_ctx, u32 mapped_obj_id,
+			 struct mlx5_tc_ct_priv *ct_priv,
+			 u32 zone_restore_id, u32 tunnel_id,
+			 struct mlx5e_tc_update_priv *tc_priv)
+{
+	struct mlx5e_priv *priv = netdev_priv(skb->dev);
+	struct mlx5_mapped_obj mapped_obj;
+	int err;
+
+	err = mapping_find(mapping_ctx, mapped_obj_id, &mapped_obj);
+	if (err) {
+		netdev_dbg(skb->dev,
+			   "Couldn't find mapped object for mapped_obj_id: %d, err: %d\n",
+			   mapped_obj_id, err);
+		return false;
+	}
+
+	switch (mapped_obj.type) {
+	case MLX5_MAPPED_OBJ_CHAIN:
+		return mlx5e_tc_restore_skb_chain(skb, ct_priv, mapped_obj.chain, zone_restore_id,
+						  tunnel_id, tc_priv);
+	case MLX5_MAPPED_OBJ_SAMPLE:
+		mlx5e_tc_restore_skb_sample(priv, skb, &mapped_obj, tc_priv);
+		tc_priv->skb_done = true;
+		return true;
+	case MLX5_MAPPED_OBJ_INT_PORT_METADATA:
+		return mlx5e_tc_restore_skb_int_port(priv, skb, &mapped_obj, tc_priv, tunnel_id);
+	default:
 		netdev_dbg(priv->netdev, "Invalid mapped object type: %d\n", mapped_obj.type);
 		return false;
 	}
 
-	return true;
+	return false;
+}
+
+bool mlx5e_tc_update_skb_nic(struct mlx5_cqe64 *cqe, struct sk_buff *skb)
+{
+	struct mlx5e_priv *priv = netdev_priv(skb->dev);
+	u32 mapped_obj_id, reg_b, zone_restore_id;
+	struct mlx5_tc_ct_priv *ct_priv;
+	struct mapping_ctx *mapping_ctx;
+	struct mlx5e_tc_table *tc;
+
+	reg_b = be32_to_cpu(cqe->ft_metadata);
+	tc = mlx5e_fs_get_tc(priv->fs);
+	mapped_obj_id = reg_b & MLX5E_TC_TABLE_CHAIN_TAG_MASK;
+	zone_restore_id = (reg_b >> MLX5_REG_MAPPING_MOFFSET(NIC_ZONE_RESTORE_TO_REG)) &
+			  ESW_ZONE_ID_MASK;
+	ct_priv = tc->ct;
+	mapping_ctx = tc->mapping;
+
+	return mlx5e_tc_update_skb(cqe, skb, mapping_ctx, mapped_obj_id, ct_priv, zone_restore_id,
+				   0, NULL);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index ee9c8f31491e..1c52c8915c3a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -59,6 +59,8 @@ int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags);
 
 struct mlx5e_tc_update_priv {
 	struct net_device *fwd_dev;
+	bool skb_done;
+	bool forward_tx;
 };
 
 struct mlx5_nic_flow_attr {
@@ -384,14 +386,19 @@ static inline bool mlx5e_cqe_regb_chain(struct mlx5_cqe64 *cqe)
 	return false;
 }
 
-bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb);
+bool mlx5e_tc_update_skb_nic(struct mlx5_cqe64 *cqe, struct sk_buff *skb);
+bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb,
+			 struct mapping_ctx *mapping_ctx, u32 mapped_obj_id,
+			 struct mlx5_tc_ct_priv *ct_priv,
+			 u32 zone_restore_id, u32 tunnel_id,
+			 struct mlx5e_tc_update_priv *tc_priv);
 #else /* CONFIG_MLX5_CLS_ACT */
 static inline struct mlx5e_tc_table *mlx5e_tc_table_alloc(void) { return NULL; }
 static inline void mlx5e_tc_table_free(struct mlx5e_tc_table *tc) {}
 static inline bool mlx5e_cqe_regb_chain(struct mlx5_cqe64 *cqe)
 { return false; }
 static inline bool
-mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb)
+mlx5e_tc_update_skb_nic(struct mlx5_cqe64 *cqe, struct sk_buff *skb)
 { return true; }
 #endif
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH net-next v9 6/7] net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
                   ` (4 preceding siblings ...)
  2023-02-06 17:44 ` [PATCH net-next v9 5/7] net/mlx5: Refactor tc miss handling to a single function Paul Blakey
@ 2023-02-06 17:44 ` Paul Blakey
  2023-02-06 17:44 ` [PATCH net-next v9 7/7] net/mlx5e: TC, Set CT miss to the specific ct action instance Paul Blakey
  2023-02-10  1:56 ` [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Marcelo Ricardo Leitner
  7 siblings, 0 replies; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:44 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Leon Romanovsky

This reg usage is always a mapped object, not necessarily
containing chain info.

Rename to properly convey what it stores.
This patch doesn't change any functionality.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/tc/sample.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |  4 ++--
 .../ethernet/mellanox/mlx5/core/lib/fs_chains.c    | 14 +++++++-------
 5 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c
index f2c2c752bd1c..558a776359af 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c
@@ -237,7 +237,7 @@ sample_modify_hdr_get(struct mlx5_core_dev *mdev, u32 obj_id,
 	int err;
 
 	err = mlx5e_tc_match_to_reg_set(mdev, mod_acts, MLX5_FLOW_NAMESPACE_FDB,
-					CHAIN_TO_REG, obj_id);
+					MAPPED_OBJ_TO_REG, obj_id);
 	if (err)
 		goto err_set_regc0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 2251f33c3865..de751d084770 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1875,7 +1875,7 @@ __mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *ct_priv,
 	ct_flow->chain_mapping = chain_mapping;
 
 	err = mlx5e_tc_match_to_reg_set(priv->mdev, pre_mod_acts, ct_priv->ns_type,
-					CHAIN_TO_REG, chain_mapping);
+					MAPPED_OBJ_TO_REG, chain_mapping);
 	if (err) {
 		ct_dbg("Failed to set chain register mapping");
 		goto err_mapping;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index a6399dc870c2..f0ce1d1ae8ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -105,7 +105,7 @@ struct mlx5e_tc_table {
 };
 
 struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
-	[CHAIN_TO_REG] = {
+	[MAPPED_OBJ_TO_REG] = {
 		.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_0,
 		.moffset = 0,
 		.mlen = 16,
@@ -132,7 +132,7 @@ struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
 	 * into reg_b that is passed to SW since we don't
 	 * jump between steering domains.
 	 */
-	[NIC_CHAIN_TO_REG] = {
+	[NIC_MAPPED_OBJ_TO_REG] = {
 		.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_B,
 		.moffset = 0,
 		.mlen = 16,
@@ -1585,7 +1585,7 @@ mlx5e_tc_offload_to_slow_path(struct mlx5_eswitch *esw,
 		goto err_get_chain;
 
 	err = mlx5e_tc_match_to_reg_set(esw->dev, &mod_acts, MLX5_FLOW_NAMESPACE_FDB,
-					CHAIN_TO_REG, chain_mapping);
+					MAPPED_OBJ_TO_REG, chain_mapping);
 	if (err)
 		goto err_reg_set;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 1c52c8915c3a..680333ab63fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -229,7 +229,7 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe);
 void mlx5e_tc_reoffload_flows_work(struct work_struct *work);
 
 enum mlx5e_tc_attr_to_reg {
-	CHAIN_TO_REG,
+	MAPPED_OBJ_TO_REG,
 	VPORT_TO_REG,
 	TUNNEL_TO_REG,
 	CTSTATE_TO_REG,
@@ -238,7 +238,7 @@ enum mlx5e_tc_attr_to_reg {
 	MARK_TO_REG,
 	LABELS_TO_REG,
 	FTEID_TO_REG,
-	NIC_CHAIN_TO_REG,
+	NIC_MAPPED_OBJ_TO_REG,
 	NIC_ZONE_RESTORE_TO_REG,
 	PACKET_COLOR_TO_REG,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
index df58cba37930..81ed91fee59b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
@@ -214,7 +214,7 @@ create_chain_restore(struct fs_chain *chain)
 	struct mlx5_eswitch *esw = chain->chains->dev->priv.eswitch;
 	u8 modact[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)] = {};
 	struct mlx5_fs_chains *chains = chain->chains;
-	enum mlx5e_tc_attr_to_reg chain_to_reg;
+	enum mlx5e_tc_attr_to_reg mapped_obj_to_reg;
 	struct mlx5_modify_hdr *mod_hdr;
 	u32 index;
 	int err;
@@ -242,7 +242,7 @@ create_chain_restore(struct fs_chain *chain)
 	chain->id = index;
 
 	if (chains->ns == MLX5_FLOW_NAMESPACE_FDB) {
-		chain_to_reg = CHAIN_TO_REG;
+		mapped_obj_to_reg = MAPPED_OBJ_TO_REG;
 		chain->restore_rule = esw_add_restore_rule(esw, chain->id);
 		if (IS_ERR(chain->restore_rule)) {
 			err = PTR_ERR(chain->restore_rule);
@@ -253,7 +253,7 @@ create_chain_restore(struct fs_chain *chain)
 		 * since we write the metadata to reg_b
 		 * that is passed to SW directly.
 		 */
-		chain_to_reg = NIC_CHAIN_TO_REG;
+		mapped_obj_to_reg = NIC_MAPPED_OBJ_TO_REG;
 	} else {
 		err = -EINVAL;
 		goto err_rule;
@@ -261,12 +261,12 @@ create_chain_restore(struct fs_chain *chain)
 
 	MLX5_SET(set_action_in, modact, action_type, MLX5_ACTION_TYPE_SET);
 	MLX5_SET(set_action_in, modact, field,
-		 mlx5e_tc_attr_to_reg_mappings[chain_to_reg].mfield);
+		 mlx5e_tc_attr_to_reg_mappings[mapped_obj_to_reg].mfield);
 	MLX5_SET(set_action_in, modact, offset,
-		 mlx5e_tc_attr_to_reg_mappings[chain_to_reg].moffset);
+		 mlx5e_tc_attr_to_reg_mappings[mapped_obj_to_reg].moffset);
 	MLX5_SET(set_action_in, modact, length,
-		 mlx5e_tc_attr_to_reg_mappings[chain_to_reg].mlen == 32 ?
-		 0 : mlx5e_tc_attr_to_reg_mappings[chain_to_reg].mlen);
+		 mlx5e_tc_attr_to_reg_mappings[mapped_obj_to_reg].mlen == 32 ?
+		 0 : mlx5e_tc_attr_to_reg_mappings[mapped_obj_to_reg].mlen);
 	MLX5_SET(set_action_in, modact, data, chain->id);
 	mod_hdr = mlx5_modify_header_alloc(chains->dev, chains->ns,
 					   1, modact);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH net-next v9 7/7] net/mlx5e: TC, Set CT miss to the specific ct action instance
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
                   ` (5 preceding siblings ...)
  2023-02-06 17:44 ` [PATCH net-next v9 6/7] net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG Paul Blakey
@ 2023-02-06 17:44 ` Paul Blakey
  2023-02-10  1:56 ` [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Marcelo Ricardo Leitner
  7 siblings, 0 replies; 21+ messages in thread
From: Paul Blakey @ 2023-02-06 17:44 UTC (permalink / raw)
  To: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller
  Cc: Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov

Currently, CT misses restore the missed chain on the tc skb extension so
tc will continue from the relevant chain. Instead, restore the CT action's
miss cookie on the extension, which will instruct tc to continue from
this specific CT action instance on the relevant filter's action list.

Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
this miss mapping instead of the current chain miss object (CHAIN_MISS)
for CT action misses.

To restore this new miss mapping value, add a RX restore rule for each
such mapping value.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
---
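
Note for readers (not part of the commit): the idea of mapping a 64-bit
miss cookie to a small id that fits in a register, and translating it
back on the RX miss path, can be sketched in userspace C as below. This
is a minimal model under stated assumptions, not the driver's mapping
code. The names act_miss_map, sketch_map_get and sketch_map_find and the
table size are hypothetical, and refcounting and the per-mapping RX
restore rule are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 8 /* arbitrary size for this sketch */

/* Hypothetical table: slot i holds the cookie mapped to id i + 1. */
struct act_miss_map {
	uint64_t cookies[MAX_MAPPINGS];
	bool used[MAX_MAPPINGS];
};

/*
 * Offload time: return a nonzero id for the cookie, reusing an existing
 * slot when the cookie is already mapped. Returns 0 if the table is full.
 */
static uint32_t sketch_map_get(struct act_miss_map *m, uint64_t cookie)
{
	size_t i, free_slot = MAX_MAPPINGS;

	assert(m);
	for (i = 0; i < MAX_MAPPINGS; i++) {
		if (m->used[i] && m->cookies[i] == cookie)
			return (uint32_t)i + 1;
		if (!m->used[i] && free_slot == MAX_MAPPINGS)
			free_slot = i;
	}
	if (free_slot == MAX_MAPPINGS)
		return 0;

	m->used[free_slot] = true;
	m->cookies[free_slot] = cookie;
	return (uint32_t)free_slot + 1;
}

/*
 * Miss time: translate the id read back from the register into the
 * cookie that is then stored on the tc skb extension.
 */
static bool sketch_map_find(const struct act_miss_map *m, uint32_t id,
			    uint64_t *cookie)
{
	assert(m && cookie);
	if (id == 0 || id > MAX_MAPPINGS || !m->used[id - 1])
		return false;
	*cookie = m->cookies[id - 1];
	return true;
}
```

Id 0 is kept as "no mapping" in this sketch so that a zeroed register
value never resolves to a cookie.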
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    | 32 +++++-----
 .../ethernet/mellanox/mlx5/core/en/tc_ct.h    |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 64 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  6 ++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  2 +
 5 files changed, 82 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index de751d084770..5c58ec279b10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -59,6 +59,7 @@ struct mlx5_tc_ct_debugfs {
 
 struct mlx5_tc_ct_priv {
 	struct mlx5_core_dev *dev;
+	struct mlx5e_priv *priv;
 	const struct net_device *netdev;
 	struct mod_hdr_tbl *mod_hdr_tbl;
 	struct xarray tuple_ids;
@@ -85,7 +86,6 @@ struct mlx5_ct_flow {
 	struct mlx5_flow_attr *pre_ct_attr;
 	struct mlx5_flow_handle *pre_ct_rule;
 	struct mlx5_ct_ft *ft;
-	u32 chain_mapping;
 };
 
 struct mlx5_ct_zone_rule {
@@ -1445,6 +1445,7 @@ mlx5_tc_ct_parse_action(struct mlx5_tc_ct_priv *priv,
 	attr->ct_attr.zone = act->ct.zone;
 	attr->ct_attr.ct_action = act->ct.action;
 	attr->ct_attr.nf_ft = act->ct.flow_table;
+	attr->ct_attr.act_miss_cookie = act->miss_cookie;
 
 	return 0;
 }
@@ -1782,7 +1783,7 @@ mlx5_tc_ct_del_ft_cb(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft)
  *	+ ft prio (tc chain)  +
  *	+ original match      +
  *	+---------------------+
- *		 | set chain miss mapping
+ *		 | set act_miss_cookie mapping
  *		 | set fte_id
  *		 | set tunnel_id
  *		 | do decap
@@ -1827,7 +1828,7 @@ __mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *ct_priv,
 	struct mlx5_flow_attr *pre_ct_attr;
 	struct mlx5_modify_hdr *mod_hdr;
 	struct mlx5_ct_flow *ct_flow;
-	int chain_mapping = 0, err;
+	int act_miss_mapping = 0, err;
 	struct mlx5_ct_ft *ft;
 	u16 zone;
 
@@ -1862,22 +1863,18 @@ __mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *ct_priv,
 	pre_ct_attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
 			       MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
 
-	/* Write chain miss tag for miss in ct table as we
-	 * don't go though all prios of this chain as normal tc rules
-	 * miss.
-	 */
-	err = mlx5_chains_get_chain_mapping(ct_priv->chains, attr->chain,
-					    &chain_mapping);
+	err = mlx5e_tc_action_miss_mapping_get(ct_priv->priv, attr, attr->ct_attr.act_miss_cookie,
+					       &act_miss_mapping);
 	if (err) {
-		ct_dbg("Failed to get chain register mapping for chain");
-		goto err_get_chain;
+		ct_dbg("Failed to get register mapping for act miss");
+		goto err_get_act_miss;
 	}
-	ct_flow->chain_mapping = chain_mapping;
+	attr->ct_attr.act_miss_mapping = act_miss_mapping;
 
 	err = mlx5e_tc_match_to_reg_set(priv->mdev, pre_mod_acts, ct_priv->ns_type,
-					MAPPED_OBJ_TO_REG, chain_mapping);
+					MAPPED_OBJ_TO_REG, act_miss_mapping);
 	if (err) {
-		ct_dbg("Failed to set chain register mapping");
+		ct_dbg("Failed to set act miss register mapping");
 		goto err_mapping;
 	}
 
@@ -1941,8 +1938,8 @@ __mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *ct_priv,
 	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
 err_mapping:
 	mlx5e_mod_hdr_dealloc(pre_mod_acts);
-	mlx5_chains_put_chain_mapping(ct_priv->chains, ct_flow->chain_mapping);
-err_get_chain:
+	mlx5e_tc_action_miss_mapping_put(ct_priv->priv, attr, act_miss_mapping);
+err_get_act_miss:
 	kfree(ct_flow->pre_ct_attr);
 err_alloc_pre:
 	mlx5_tc_ct_del_ft_cb(ct_priv, ft);
@@ -1981,7 +1978,7 @@ __mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
 	mlx5_tc_rule_delete(priv, ct_flow->pre_ct_rule, pre_ct_attr);
 	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
 
-	mlx5_chains_put_chain_mapping(ct_priv->chains, ct_flow->chain_mapping);
+	mlx5e_tc_action_miss_mapping_put(ct_priv->priv, attr, attr->ct_attr.act_miss_mapping);
 	mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
 
 	kfree(ct_flow->pre_ct_attr);
@@ -2154,6 +2151,7 @@ mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
 	}
 
 	spin_lock_init(&ct_priv->ht_lock);
+	ct_priv->priv = priv;
 	ct_priv->ns_type = ns_type;
 	ct_priv->chains = chains;
 	ct_priv->netdev = priv->netdev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
index 5bbd6b92840f..5c5ddaa83055 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -28,6 +28,8 @@ struct mlx5_ct_attr {
 	struct mlx5_ct_flow *ct_flow;
 	struct nf_flowtable *nf_ft;
 	u32 ct_labels_id;
+	u32 act_miss_mapping;
+	u64 act_miss_cookie;
 };
 
 #define zone_to_reg_ct {\
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index f0ce1d1ae8ad..91798291f235 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3801,6 +3801,7 @@ mlx5e_clone_flow_attr_for_post_act(struct mlx5_flow_attr *attr,
 	attr2->parse_attr = parse_attr;
 	attr2->dest_chain = 0;
 	attr2->dest_ft = NULL;
+	attr2->act_id_restore_rule = NULL;
 
 	if (ns_type == MLX5_FLOW_NAMESPACE_FDB) {
 		attr2->esw_attr->out_count = 0;
@@ -5657,14 +5658,19 @@ static bool mlx5e_tc_restore_tunnel(struct mlx5e_priv *priv, struct sk_buff *skb
 	return true;
 }
 
-static bool mlx5e_tc_restore_skb_chain(struct sk_buff *skb, struct mlx5_tc_ct_priv *ct_priv,
-				       u32 chain, u32 zone_restore_id,
-				       u32 tunnel_id,  struct mlx5e_tc_update_priv *tc_priv)
+static bool mlx5e_tc_restore_skb_tc_meta(struct sk_buff *skb, struct mlx5_tc_ct_priv *ct_priv,
+					 struct mlx5_mapped_obj *mapped_obj, u32 zone_restore_id,
+					 u32 tunnel_id,  struct mlx5e_tc_update_priv *tc_priv)
 {
 	struct mlx5e_priv *priv = netdev_priv(skb->dev);
 	struct tc_skb_ext *tc_skb_ext;
+	u64 act_miss_cookie;
+	u32 chain;
 
-	if (chain) {
+	chain = mapped_obj->type == MLX5_MAPPED_OBJ_CHAIN ? mapped_obj->chain : 0;
+	act_miss_cookie = mapped_obj->type == MLX5_MAPPED_OBJ_ACT_MISS ?
+			  mapped_obj->act_miss_cookie : 0;
+	if (chain || act_miss_cookie) {
 		if (!mlx5e_tc_ct_restore_flow(ct_priv, skb, zone_restore_id))
 			return false;
 
@@ -5674,7 +5680,12 @@ static bool mlx5e_tc_restore_skb_chain(struct sk_buff *skb, struct mlx5_tc_ct_pr
 			return false;
 		}
 
-		tc_skb_ext->chain = chain;
+		if (act_miss_cookie) {
+			tc_skb_ext->act_miss_cookie = act_miss_cookie;
+			tc_skb_ext->act_miss = 1;
+		} else {
+			tc_skb_ext->chain = chain;
+		}
 	}
 
 	if (tc_priv)
@@ -5744,8 +5755,9 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb,
 
 	switch (mapped_obj.type) {
 	case MLX5_MAPPED_OBJ_CHAIN:
-		return mlx5e_tc_restore_skb_chain(skb, ct_priv, mapped_obj.chain, zone_restore_id,
-						  tunnel_id, tc_priv);
+	case MLX5_MAPPED_OBJ_ACT_MISS:
+		return mlx5e_tc_restore_skb_tc_meta(skb, ct_priv, &mapped_obj, zone_restore_id,
+						    tunnel_id, tc_priv);
 	case MLX5_MAPPED_OBJ_SAMPLE:
 		mlx5e_tc_restore_skb_sample(priv, skb, &mapped_obj, tc_priv);
 		tc_priv->skb_done = true;
@@ -5779,3 +5791,41 @@ bool mlx5e_tc_update_skb_nic(struct mlx5_cqe64 *cqe, struct sk_buff *skb)
 	return mlx5e_tc_update_skb(cqe, skb, mapping_ctx, mapped_obj_id, ct_priv, zone_restore_id,
 				   0, NULL);
 }
+
+int mlx5e_tc_action_miss_mapping_get(struct mlx5e_priv *priv, struct mlx5_flow_attr *attr,
+				     u64 act_miss_cookie, u32 *act_miss_mapping)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mlx5_mapped_obj mapped_obj = {};
+	struct mapping_ctx *ctx;
+	int err;
+
+	ctx = esw->offloads.reg_c0_obj_pool;
+
+	mapped_obj.type = MLX5_MAPPED_OBJ_ACT_MISS;
+	mapped_obj.act_miss_cookie = act_miss_cookie;
+	err = mapping_add(ctx, &mapped_obj, act_miss_mapping);
+	if (err)
+		return err;
+
+	attr->act_id_restore_rule = esw_add_restore_rule(esw, *act_miss_mapping);
+	if (IS_ERR(attr->act_id_restore_rule))
+		goto err_rule;
+
+	return 0;
+
+err_rule:
+	mapping_remove(ctx, *act_miss_mapping);
+	return err;
+}
+
+void mlx5e_tc_action_miss_mapping_put(struct mlx5e_priv *priv, struct mlx5_flow_attr *attr,
+				      u32 act_miss_mapping)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mapping_ctx *ctx;
+
+	ctx = esw->offloads.reg_c0_obj_pool;
+	mlx5_del_flow_rules(attr->act_id_restore_rule);
+	mapping_remove(ctx, act_miss_mapping);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 680333ab63fc..fda722fed6b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -101,6 +101,7 @@ struct mlx5_flow_attr {
 	struct mlx5_flow_attr *branch_true;
 	struct mlx5_flow_attr *branch_false;
 	struct mlx5_flow_attr *jumping_attr;
+	struct mlx5_flow_handle *act_id_restore_rule;
 	/* keep this union last */
 	union {
 		DECLARE_FLEX_ARRAY(struct mlx5_esw_flow_attr, esw_attr);
@@ -402,4 +403,9 @@ mlx5e_tc_update_skb_nic(struct mlx5_cqe64 *cqe, struct sk_buff *skb)
 { return true; }
 #endif
 
+int mlx5e_tc_action_miss_mapping_get(struct mlx5e_priv *priv, struct mlx5_flow_attr *attr,
+				     u64 act_miss_cookie, u32 *act_miss_mapping);
+void mlx5e_tc_action_miss_mapping_put(struct mlx5e_priv *priv, struct mlx5_flow_attr *attr,
+				      u32 act_miss_mapping);
+
 #endif /* __MLX5_EN_TC_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 5b5a215a7dc5..747981b868bd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -52,12 +52,14 @@ enum mlx5_mapped_obj_type {
 	MLX5_MAPPED_OBJ_CHAIN,
 	MLX5_MAPPED_OBJ_SAMPLE,
 	MLX5_MAPPED_OBJ_INT_PORT_METADATA,
+	MLX5_MAPPED_OBJ_ACT_MISS,
 };
 
 struct mlx5_mapped_obj {
 	enum mlx5_mapped_obj_type type;
 	union {
 		u32 chain;
+		u64 act_miss_cookie;
 		struct {
 			u32 group_id;
 			u32 rate;
-- 
2.30.1



* Re: [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
                   ` (6 preceding siblings ...)
  2023-02-06 17:44 ` [PATCH net-next v9 7/7] net/mlx5e: TC, Set CT miss to the specific ct action instance Paul Blakey
@ 2023-02-10  1:56 ` Marcelo Ricardo Leitner
  2023-02-13 16:25   ` Paul Blakey
  7 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-02-10  1:56 UTC (permalink / raw)
  To: Paul Blakey
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov

Hi,

On Mon, Feb 06, 2023 at 07:43:56PM +0200, Paul Blakey wrote:
> Hi,
> 
> This series adds support for hardware miss to instruct tc to continue execution
> in a specific tc action instance on a filter's action list. The mlx5 driver patch
> (besides the refactors) shows its usage instead of using just chain restore.
> 
> Currently a filter's action list must be executed all together or
> not at all as driver are only able to tell tc to continue executing from a
> specific tc chain, and not a specific filter/action.
> 
> This is troublesome with regards to action CT, where new connections should
> be sent to software (via tc chain restore), and established connections can
> be handled in hardware.
> 
> Checking for new connections is done when executing the ct action in hardware
> (by checking the packet's tuple against known established tuples).
> But if there is a packet modification (pedit) action before action CT and the
> checked tuple is a new connection, hardware will need to revert the previous
> packet modifications before sending it back to software so it can
> re-match the same tc filter in software and re-execute its CT action.
> 
> The following is an example configuration of stateless nat
> on mlx5 driver that isn't supported before this patchset:
> 
>  #Setup corresponding mlx5 VFs in namespaces
>  $ ip netns add ns0
>  $ ip netns add ns1
>  $ ip link set dev enp8s0f0v0 netns ns0
>  $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
>  $ ip link set dev enp8s0f0v1 netns ns1
>  $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
> 
>  #Setup tc arp and ct rules on mlx5 VF representors
>  $ tc qdisc add dev enp8s0f0_0 ingress
>  $ tc qdisc add dev enp8s0f0_1 ingress
>  $ ifconfig enp8s0f0_0 up
>  $ ifconfig enp8s0f0_1 up
> 
>  #Original side
>  $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
>     ct_state -trk ip_proto tcp dst_port 8888 \
>       action pedit ex munge tcp dport set 5001 pipe \
>       action csum ip tcp pipe \
>       action ct pipe \
>       action goto chain 1
>  $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>     ct_state +trk+est \
>       action mirred egress redirect dev enp8s0f0_1
>  $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>     ct_state +trk+new \
>       action ct commit pipe \
>       action mirred egress redirect dev enp8s0f0_1
>  $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
>       action mirred egress redirect dev enp8s0f0_1
> 
>  #Reply side
>  $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
>       action mirred egress redirect dev enp8s0f0_0
>  $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
>     ct_state -trk ip_proto tcp \
>       action ct pipe \
>       action pedit ex munge tcp sport set 8888 pipe \
>       action csum ip tcp pipe \
>       action mirred egress redirect dev enp8s0f0_0
> 
>  #Run traffic
>  $ ip netns exec ns1 iperf -s -p 5001&
>  $ sleep 2 #wait for iperf to fully open
>  $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
> 
>  #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
>  $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
>         Sent hardware 9310116832 bytes 6149672 pkt
>         Sent hardware 9310116832 bytes 6149672 pkt
>         Sent hardware 9310116832 bytes 6149672 pkt

I see Jamal had asked about stats on the other version, but then no
dependency was set. I think we _must_ have a dependency of this
patchset on the per-action stats one. Otherwise the stats above will
get messy. Without the per-action stats, the last one is replicated
to the other actions. But then, will hw count a packet for which it
did only the first action? I don't see how it would, and then for all
but the first one the packet will be accounted twice.

With this said, it would be nice to provide a sample of what the sw and
hw stats would look like _after_ this patchset as well.

Btw I'll add my Reviewed-by tag to the per-action stats one in a few.

> 
> A new connection executing the first filter in hardware will first rewrite
> the dst port to the new port, and then the ct action is executed.
> Because this is a new connection, hardware will need to send this back
> to software, on chain 0, to execute the first filter again in software.
> The dst port needs to be reverted otherwise it won't re-match the old
> dst port in the first filter. Because of that, currently mlx5 driver will
> reject offloading the above action ct rule.
> 
> This series adds support for partial offload of a filter's action list,

We should avoid this terminology as is, as it can create confusion. It
is not that it is offloading action 1 and not action 2. Instead, it is
adding support for a more fine-grained miss to sw. Perhaps "support for
partially executing in hw".

> and letting tc software continue processing in the specific action instance
> where hardware left off (in the above case after the "action pedit ex munge tcp
> dport... of the first rule") allowing support for scenarios such as the above.
> 
> Changelog:
> 	v1->v2:
> 	Fixed compilation without CONFIG_NET_CLS
> 	Cover letter re-write
> 
> 	v2->v3:
> 	Unlock spin_lock on error in cls flower filter handle refactor
> 	Cover letter
> 
> 	v3->v4:
> 	Silence warning by clang
> 
> 	v4->v5:
> 	Cover letter example
> 	Removed ifdef as much as possible by using inline stubs
> 
> 	v5->v6:
> 	Removed new inlines in cls_api.c (bot complained in patchwork)
> 	Added reviewed-by/ack - Thanks!
> 
> 	v6->v7:
> 	Removed WARN_ON from pkt path (leon)
> 	Removed unnecessary return in void func
> 
> 	v7->v8:
> 	Removed #if IS_ENABLED on skb ext adding Kconfig changes
> 	Complex variable init in separate lines
> 	if,else if, else if ---> switch case
> 
> 	v8->v9:
> 	Removed even more IS_ENABLED because of Kconfig
> 
> Paul Blakey (7):
>   net/sched: cls_api: Support hardware miss to tc action
>   net/sched: flower: Move filter handle initialization earlier
>   net/sched: flower: Support hardware miss to tc action
>   net/mlx5: Kconfig: Make tc offload depend on tc skb extension
>   net/mlx5: Refactor tc miss handling to a single function
>   net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
>   net/mlx5e: TC, Set CT miss to the specific ct action instance
> 
>  .../net/ethernet/mellanox/mlx5/core/Kconfig   |   4 +-
>  .../ethernet/mellanox/mlx5/core/en/rep/tc.c   | 225 ++------------
>  .../mellanox/mlx5/core/en/tc/sample.c         |   2 +-
>  .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  39 +--
>  .../ethernet/mellanox/mlx5/core/en/tc_ct.h    |   2 +
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   4 +-
>  .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 280 ++++++++++++++++--
>  .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  23 +-
>  .../net/ethernet/mellanox/mlx5/core/eswitch.h |   2 +
>  .../mellanox/mlx5/core/lib/fs_chains.c        |  14 +-
>  include/linux/skbuff.h                        |   6 +-
>  include/net/flow_offload.h                    |   1 +
>  include/net/pkt_cls.h                         |  34 ++-
>  include/net/sch_generic.h                     |   2 +
>  net/openvswitch/flow.c                        |   3 +-
>  net/sched/act_api.c                           |   2 +-
>  net/sched/cls_api.c                           | 213 ++++++++++++-
>  net/sched/cls_flower.c                        |  73 +++--
>  18 files changed, 602 insertions(+), 327 deletions(-)
> 
> -- 
> 2.30.1
> 


* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-06 17:43 ` [PATCH net-next v9 1/7] " Paul Blakey
@ 2023-02-10  2:21   ` Marcelo Ricardo Leitner
  2023-02-13 16:13     ` Paul Blakey
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-02-10  2:21 UTC (permalink / raw)
  To: Paul Blakey
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman

On Mon, Feb 06, 2023 at 07:43:57PM +0200, Paul Blakey wrote:
> For drivers to support partial offload of a filter's action list,
> add support for action miss to specify an action instance to
> continue from in sw.
> 
> CT action in particular can't be fully offloaded, as new connections
> need to be handled in software. This imposes other limitations on
> the actions that can be offloaded together with the CT action, such
> as packet modifications.
> 
> Assign each action on a filter's action list a unique miss_cookie
> which drivers can then use to fill action_miss part of the tc skb
> extension. On getting back this miss_cookie, find the action
> instance with relevant cookie and continue classifying from there.
> 
> Signed-off-by: Paul Blakey <paulb@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> Reviewed-by: Simon Horman <simon.horman@corigine.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
>  include/linux/skbuff.h     |   6 +-
>  include/net/flow_offload.h |   1 +
>  include/net/pkt_cls.h      |  34 +++---
>  include/net/sch_generic.h  |   2 +
>  net/openvswitch/flow.c     |   3 +-
>  net/sched/act_api.c        |   2 +-
>  net/sched/cls_api.c        | 213 +++++++++++++++++++++++++++++++++++--
>  7 files changed, 234 insertions(+), 27 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 1fa95b916342..9b9aa854068f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -311,12 +311,16 @@ struct nf_bridge_info {
>   * and read by ovs to recirc_id.
>   */
>  struct tc_skb_ext {
> -	__u32 chain;
> +	union {
> +		u64 act_miss_cookie;
> +		__u32 chain;
> +	};
>  	__u16 mru;
>  	__u16 zone;
>  	u8 post_ct:1;
>  	u8 post_ct_snat:1;
>  	u8 post_ct_dnat:1;
> +	u8 act_miss:1; /* Set if act_miss_cookie is used */
>  };
>  #endif
>  
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 0400a0ac8a29..88db7346eb7a 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie);
>  struct flow_action_entry {
>  	enum flow_action_id		id;
>  	u32				hw_index;
> +	u64				miss_cookie;

The per-action stats patchset is adding a cookie for the actions as
well, and exactly on this struct:

@@ -228,6 +228,7 @@ struct flow_action_cookie *flow_action_cookie_create(void *data,
 struct flow_action_entry {
        enum flow_action_id             id;
        u32                             hw_index;
+       unsigned long                   act_cookie;
        enum flow_action_hw_stats       hw_stats;
        action_destr                    destructor;
        void                            *destructor_priv;

There, it is a simple value: the act pointer itself. Here, it is already more
complex. Can they be merged into only one, maybe?
If not, perhaps act_cookie should be renamed to stats_cookie then.
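
To make the difference concrete: as the cls_api.c hunk further down in
this patch shows, the miss cookie is not a pointer but a 32-bit base
combined with the action's index in the filter's action list. A minimal
user-space sketch of that packing follows; the names miss_cookie_pack()
and miss_cookie_unpack() are made up for illustration and are not the
kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified copy of the layout used by the patch: a 32-bit base that
 * identifies the filter's exts, plus the action's index in that list.
 */
union miss_cookie {
	struct {
		uint32_t base;
		uint32_t act_index;
	};
	uint64_t value;
};

/* Build a 64-bit cookie from a base and an action index. */
static uint64_t miss_cookie_pack(uint32_t base, uint32_t act_index)
{
	union miss_cookie mc = { .value = 0 };

	/* As in the patch, a zero base means "no cookie". */
	if (!base)
		return 0;

	mc.base = base;
	mc.act_index = act_index;
	return mc.value;
}

/* Split a cookie back into its base and action index. */
static void miss_cookie_unpack(uint64_t cookie, uint32_t *base,
			       uint32_t *act_index)
{
	union miss_cookie mc = { .value = cookie };

	assert(base && act_index);
	*base = mc.base;
	*act_index = mc.act_index;
}
```

Since the action index is encoded in the value itself, two actions on
the same filter get different cookies from one allocated base, which a
bare act pointer would not give.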

>  	enum flow_action_hw_stats	hw_stats;
>  	action_destr			destructor;
>  	void				*destructor_priv;
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index cd410a87517b..e395f2a84ed2 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -59,6 +59,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
>  void tcf_block_put(struct tcf_block *block);
>  void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
>  		       struct tcf_block_ext_info *ei);
> +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
> +		     int police, struct tcf_proto *tp, u32 handle, bool used_action_miss);
>  
>  static inline bool tcf_block_shared(struct tcf_block *block)
>  {
> @@ -229,6 +231,7 @@ struct tcf_exts {
>  	struct tc_action **actions;
>  	struct net	*net;
>  	netns_tracker	ns_tracker;
> +	struct tcf_exts_miss_cookie_node *miss_cookie_node;
>  #endif
>  	/* Map to export classifier specific extension TLV types to the
>  	 * generic extensions API. Unsupported extensions must be set to 0.
> @@ -240,21 +243,11 @@ struct tcf_exts {
>  static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
>  				int action, int police)
>  {
> -#ifdef CONFIG_NET_CLS_ACT
> -	exts->type = 0;
> -	exts->nr_actions = 0;
> -	/* Note: we do not own yet a reference on net.
> -	 * This reference might be taken later from tcf_exts_get_net().
> -	 */
> -	exts->net = net;
> -	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
> -				GFP_KERNEL);
> -	if (!exts->actions)
> -		return -ENOMEM;
> +#ifdef CONFIG_NET_CLS
> +	return tcf_exts_init_ex(exts, net, action, police, NULL, 0, false);
> +#else
> +	return -EOPNOTSUPP;
>  #endif
> -	exts->action = action;
> -	exts->police = police;
> -	return 0;
>  }
>  
>  /* Return false if the netns is being destroyed in cleanup_net(). Callers
> @@ -353,6 +346,18 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
>  	return TC_ACT_OK;
>  }
>  
> +static inline int
> +tcf_exts_exec_ex(struct sk_buff *skb, struct tcf_exts *exts, int act_index,
> +		 struct tcf_result *res)
> +{
> +#ifdef CONFIG_NET_CLS_ACT
> +	return tcf_action_exec(skb, exts->actions + act_index,
> +			       exts->nr_actions - act_index, res);
> +#else
> +	return TC_ACT_OK;
> +#endif
> +}
> +
>  int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
>  		      struct nlattr **tb, struct nlattr *rate_tlv,
>  		      struct tcf_exts *exts, u32 flags,
> @@ -577,6 +582,7 @@ int tc_setup_offload_action(struct flow_action *flow_action,
>  void tc_cleanup_offload_action(struct flow_action *flow_action);
>  int tc_setup_action(struct flow_action *flow_action,
>  		    struct tc_action *actions[],
> +		    u32 miss_cookie_base,
>  		    struct netlink_ext_ack *extack);
>  
>  int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index af4aa66aaa4e..fab5ba3e61b7 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -369,6 +369,8 @@ struct tcf_proto_ops {
>  						struct nlattr **tca,
>  						struct netlink_ext_ack *extack);
>  	void			(*tmplt_destroy)(void *tmplt_priv);
> +	struct tcf_exts *	(*get_exts)(const struct tcf_proto *tp,
> +					    u32 handle);
>  
>  	/* rtnetlink specific */
>  	int			(*dump)(struct net*, struct tcf_proto*, void *,
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index e20d1a973417..69f91460a55c 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -1038,7 +1038,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>  	if (tc_skb_ext_tc_enabled()) {
>  		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> -		key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +		key->recirc_id = tc_ext && !tc_ext->act_miss ?
> +				 tc_ext->chain : 0;
>  		OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0;
>  		post_ct = tc_ext ? tc_ext->post_ct : false;
>  		post_ct_snat = post_ct ? tc_ext->post_ct_snat : false;
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index cd09ef49df22..16fd3d30eb12 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -272,7 +272,7 @@ static int tcf_action_offload_add_ex(struct tc_action *action,
>  	if (err)
>  		goto fl_err;
>  
> -	err = tc_setup_action(&fl_action->action, actions, extack);
> +	err = tc_setup_action(&fl_action->action, actions, 0, extack);
>  	if (err) {
>  		NL_SET_ERR_MSG_MOD(extack,
>  				   "Failed to setup tc actions for offload");
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 5b4a95e8a1ee..8ff9530fef68 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -22,6 +22,7 @@
>  #include <linux/idr.h>
>  #include <linux/jhash.h>
>  #include <linux/rculist.h>
> +#include <linux/rhashtable.h>
>  #include <net/net_namespace.h>
>  #include <net/sock.h>
>  #include <net/netlink.h>
> @@ -50,6 +51,109 @@ static LIST_HEAD(tcf_proto_base);
>  /* Protects list of registered TC modules. It is pure SMP lock. */
>  static DEFINE_RWLOCK(cls_mod_lock);
>  
> +static struct xarray tcf_exts_miss_cookies_xa;
> +struct tcf_exts_miss_cookie_node {
> +	const struct tcf_chain *chain;
> +	const struct tcf_proto *tp;
> +	const struct tcf_exts *exts;
> +	u32 chain_index;
> +	u32 tp_prio;
> +	u32 handle;
> +	u32 miss_cookie_base;
> +	struct rcu_head rcu;
> +};
> +
> +/* Each tc action entry cookie will be comprised of 32bit miss_cookie_base +
> + * action index in the exts tc actions array.
> + */
> +union tcf_exts_miss_cookie {
> +	struct {
> +		u32 miss_cookie_base;
> +		u32 act_index;
> +	};
> +	u64 miss_cookie;
> +};
> +
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +static int
> +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
> +				u32 handle)
> +{
> +	struct tcf_exts_miss_cookie_node *n;
> +	static u32 next;

What protects this static variable from concurrent access?

> +	int err;
> +
> +	if (WARN_ON(!handle || !tp->ops->get_exts))
> +		return -EINVAL;
> +
> +	n = kzalloc(sizeof(*n), GFP_KERNEL);
> +	if (!n)
> +		return -ENOMEM;
> +
> +	n->chain_index = tp->chain->index;
> +	n->chain = tp->chain;
> +	n->tp_prio = tp->prio;
> +	n->tp = tp;
> +	n->exts = exts;
> +	n->handle = handle;
> +
> +	err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base,
> +			      n, xa_limit_32b, &next, GFP_KERNEL);
> +	if (err)
> +		goto err_xa_alloc;
> +
> +	exts->miss_cookie_node = n;
> +	return 0;
> +
> +err_xa_alloc:
> +	kfree(n);
> +	return err;
> +}
> +
> +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
> +{
> +	struct tcf_exts_miss_cookie_node *n;
> +
> +	if (!exts->miss_cookie_node)
> +		return;
> +
> +	n = exts->miss_cookie_node;
> +	xa_erase(&tcf_exts_miss_cookies_xa, n->miss_cookie_base);
> +	kfree_rcu(n, rcu);
> +}
> +
> +static struct tcf_exts_miss_cookie_node *
> +tcf_exts_miss_cookie_lookup(u64 miss_cookie, int *act_index)
> +{
> +	union tcf_exts_miss_cookie mc = { .miss_cookie = miss_cookie, };
> +
> +	*act_index = mc.act_index;
> +	return xa_load(&tcf_exts_miss_cookies_xa, mc.miss_cookie_base);
> +}
> +#else /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
> +static int
> +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
> +				u32 handle)
> +{
> +	return 0;
> +}
> +
> +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
> +{
> +}
> +#endif /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
> +
> +static u64 tcf_exts_miss_cookie_get(u32 miss_cookie_base, int act_index)
> +{
> +	union tcf_exts_miss_cookie mc = { .act_index = act_index, };
> +
> +	if (!miss_cookie_base)
> +		return 0;
> +
> +	mc.miss_cookie_base = miss_cookie_base;
> +	return mc.miss_cookie;
> +}
> +
>  #ifdef CONFIG_NET_CLS_ACT
>  DEFINE_STATIC_KEY_FALSE(tc_skb_ext_tc);
>  EXPORT_SYMBOL(tc_skb_ext_tc);
> @@ -1549,6 +1653,8 @@ static inline int __tcf_classify(struct sk_buff *skb,
>  				 const struct tcf_proto *orig_tp,
>  				 struct tcf_result *res,
>  				 bool compat_mode,
> +				 struct tcf_exts_miss_cookie_node *n,
> +				 int act_index,
>  				 u32 *last_executed_chain)
>  {
>  #ifdef CONFIG_NET_CLS_ACT
> @@ -1560,13 +1666,36 @@ static inline int __tcf_classify(struct sk_buff *skb,
>  #endif
>  	for (; tp; tp = rcu_dereference_bh(tp->next)) {
>  		__be16 protocol = skb_protocol(skb, false);
> -		int err;
> +		int err = 0;
>  
> -		if (tp->protocol != protocol &&
> -		    tp->protocol != htons(ETH_P_ALL))
> -			continue;
> +		if (n) {
> +			struct tcf_exts *exts;
> +
> +			if (n->tp_prio != tp->prio)
> +				continue;
> +
> +			/* We re-lookup the tp and chain based on index instead
> +			 * of having hard refs and locks to them, so do a sanity
> +			 * check if any of tp,chain,exts was replaced by the
> +			 * time we got here with a cookie from hardware.
> +			 */
> +			if (unlikely(n->tp != tp || n->tp->chain != n->chain ||
> +				     !tp->ops->get_exts))
> +				return TC_ACT_SHOT;
> +
> +			exts = tp->ops->get_exts(tp, n->handle);
> +			if (unlikely(!exts || n->exts != exts))
> +				return TC_ACT_SHOT;
>  
> -		err = tc_classify(skb, tp, res);
> +			n = NULL;
> +			err = tcf_exts_exec_ex(skb, exts, act_index, res);
> +		} else {
> +			if (tp->protocol != protocol &&
> +			    tp->protocol != htons(ETH_P_ALL))
> +				continue;
> +
> +			err = tc_classify(skb, tp, res);
> +		}
>  #ifdef CONFIG_NET_CLS_ACT
>  		if (unlikely(err == TC_ACT_RECLASSIFY && !compat_mode)) {
>  			first_tp = orig_tp;
> @@ -1582,6 +1711,9 @@ static inline int __tcf_classify(struct sk_buff *skb,
>  			return err;
>  	}
>  
> +	if (unlikely(n))
> +		return TC_ACT_SHOT;
> +
>  	return TC_ACT_UNSPEC; /* signal: continue lookup */
>  #ifdef CONFIG_NET_CLS_ACT
>  reset:
> @@ -1606,21 +1738,33 @@ int tcf_classify(struct sk_buff *skb,
>  #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>  	u32 last_executed_chain = 0;
>  
> -	return __tcf_classify(skb, tp, tp, res, compat_mode,
> +	return __tcf_classify(skb, tp, tp, res, compat_mode, NULL, 0,
>  			      &last_executed_chain);
>  #else
>  	u32 last_executed_chain = tp ? tp->chain->index : 0;
> +	struct tcf_exts_miss_cookie_node *n = NULL;
>  	const struct tcf_proto *orig_tp = tp;
>  	struct tc_skb_ext *ext;
> +	int act_index = 0;
>  	int ret;
>  
>  	if (block) {
>  		ext = skb_ext_find(skb, TC_SKB_EXT);
>  
> -		if (ext && ext->chain) {
> +		if (ext && (ext->chain || ext->act_miss)) {
>  			struct tcf_chain *fchain;
> +			u32 chain = ext->chain;

IMHO it would be nice to avoid this assignment here, because there is
nothing above saying that the union value can be read as a chain. For
C, yes, it is okay. The worry is about ensuring that whatever reads
it is reading what it expects it to be.
Perhaps just declare it here, and then:

>  
> -			fchain = tcf_chain_lookup_rcu(block, ext->chain);
> +			if (ext->act_miss) {
> +				n = tcf_exts_miss_cookie_lookup(ext->act_miss_cookie,
> +								&act_index);
> +				if (!n)
> +					return TC_ACT_SHOT;
> +
> +				chain = n->chain_index;
> +			}

			else {
				chain = ext->chain;
			}

> +
> +			fchain = tcf_chain_lookup_rcu(block, chain);
>  			if (!fchain)
>  				return TC_ACT_SHOT;
>  
> @@ -1632,7 +1776,7 @@ int tcf_classify(struct sk_buff *skb,
>  		}
>  	}
>  
> -	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode,
> +	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, n, act_index,
>  			     &last_executed_chain);
>  
>  	if (tc_skb_ext_tc_enabled()) {
> @@ -3056,9 +3200,48 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
>  	return skb->len;
>  }
>  
> +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
> +		     int police, struct tcf_proto *tp, u32 handle,
> +		     bool use_action_miss)
> +{
> +	int err = 0;
> +
> +#ifdef CONFIG_NET_CLS_ACT
> +	exts->type = 0;
> +	exts->nr_actions = 0;
> +	/* Note: we do not own yet a reference on net.
> +	 * This reference might be taken later from tcf_exts_get_net().
> +	 */
> +	exts->net = net;
> +	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
> +				GFP_KERNEL);
> +	if (!exts->actions)
> +		return -ENOMEM;
> +#endif
> +
> +	exts->action = action;
> +	exts->police = police;
> +
> +	if (!use_action_miss)
> +		return 0;
> +
> +	err = tcf_exts_miss_cookie_base_alloc(exts, tp, handle);
> +	if (err)
> +		goto err_miss_alloc;
> +
> +	return 0;
> +
> +err_miss_alloc:
> +	tcf_exts_destroy(exts);
> +	return err;
> +}
> +EXPORT_SYMBOL(tcf_exts_init_ex);
> +
>  void tcf_exts_destroy(struct tcf_exts *exts)
>  {
>  #ifdef CONFIG_NET_CLS_ACT
> +	tcf_exts_miss_cookie_base_destroy(exts);
> +
>  	if (exts->actions) {
>  		tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
>  		kfree(exts->actions);
> @@ -3547,6 +3730,7 @@ static int tc_setup_offload_act(struct tc_action *act,
>  
>  int tc_setup_action(struct flow_action *flow_action,
>  		    struct tc_action *actions[],
> +		    u32 miss_cookie_base,
>  		    struct netlink_ext_ack *extack)
>  {
>  	int i, j, k, index, err = 0;
> @@ -3577,6 +3761,8 @@ int tc_setup_action(struct flow_action *flow_action,
>  		for (k = 0; k < index ; k++) {
>  			entry[k].hw_stats = tc_act_hw_stats(act->hw_stats);
>  			entry[k].hw_index = act->tcfa_index;
> +			entry[k].miss_cookie =
> +				tcf_exts_miss_cookie_get(miss_cookie_base, i);
>  		}
>  
>  		j += index;
> @@ -3599,10 +3785,15 @@ int tc_setup_offload_action(struct flow_action *flow_action,
>  			    struct netlink_ext_ack *extack)
>  {
>  #ifdef CONFIG_NET_CLS_ACT
> +	u32 miss_cookie_base;
> +
>  	if (!exts)
>  		return 0;
>  
> -	return tc_setup_action(flow_action, exts->actions, extack);
> +	miss_cookie_base = exts->miss_cookie_node ?
> +			   exts->miss_cookie_node->miss_cookie_base : 0;
> +	return tc_setup_action(flow_action, exts->actions, miss_cookie_base,
> +			       extack);
>  #else
>  	return 0;
>  #endif
> @@ -3770,6 +3961,8 @@ static int __init tc_filter_init(void)
>  	if (err)
>  		goto err_register_pernet_subsys;
>  
> +	xa_init_flags(&tcf_exts_miss_cookies_xa, XA_FLAGS_ALLOC1);
> +
>  	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
>  		      RTNL_FLAG_DOIT_UNLOCKED);
>  	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
> -- 
> 2.30.1
> 


* Re: [PATCH net-next v9 4/7] net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  2023-02-06 17:44 ` [PATCH net-next v9 4/7] net/mlx5: Kconfig: Make tc offload depend on tc skb extension Paul Blakey
@ 2023-02-10  2:29   ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-02-10  2:29 UTC (permalink / raw)
  To: Paul Blakey
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov

On Mon, Feb 06, 2023 at 07:44:00PM +0200, Paul Blakey wrote:
> Tc skb extension is a basic requirement for using tc
> offload to support correct restoration on action miss.

Btw, this is great. There was at least one report on OVN upstream
because the person didn't know skb extensions had to be enabled
manually.


* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-10  2:21   ` Marcelo Ricardo Leitner
@ 2023-02-13 16:13     ` Paul Blakey
  2023-02-13 18:43       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Blakey @ 2023-02-13 16:13 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman



On 10/02/2023 04:21, Marcelo Ricardo Leitner wrote:
> On Mon, Feb 06, 2023 at 07:43:57PM +0200, Paul Blakey wrote:
>> For drivers to support partial offload of a filter's action list,
>> add support for action miss to specify an action instance to
>> continue from in sw.
>>
>> CT action in particular can't be fully offloaded, as new connections
>> need to be handled in software. This imposes other limitations on
>> the actions that can be offloaded together with the CT action, such
>> as packet modifications.
>>
>> Assign each action on a filter's action list a unique miss_cookie
>> which drivers can then use to fill action_miss part of the tc skb
>> extension. On getting back this miss_cookie, find the action
>> instance with relevant cookie and continue classifying from there.
>>
>> Signed-off-by: Paul Blakey <paulb@nvidia.com>
>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>> Reviewed-by: Simon Horman <simon.horman@corigine.com>
>> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>> ---
>>   include/linux/skbuff.h     |   6 +-
>>   include/net/flow_offload.h |   1 +
>>   include/net/pkt_cls.h      |  34 +++---
>>   include/net/sch_generic.h  |   2 +
>>   net/openvswitch/flow.c     |   3 +-
>>   net/sched/act_api.c        |   2 +-
>>   net/sched/cls_api.c        | 213 +++++++++++++++++++++++++++++++++++--
>>   7 files changed, 234 insertions(+), 27 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 1fa95b916342..9b9aa854068f 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -311,12 +311,16 @@ struct nf_bridge_info {
>>    * and read by ovs to recirc_id.
>>    */
>>   struct tc_skb_ext {
>> -	__u32 chain;
>> +	union {
>> +		u64 act_miss_cookie;
>> +		__u32 chain;
>> +	};
>>   	__u16 mru;
>>   	__u16 zone;
>>   	u8 post_ct:1;
>>   	u8 post_ct_snat:1;
>>   	u8 post_ct_dnat:1;
>> +	u8 act_miss:1; /* Set if act_miss_cookie is used */
>>   };
>>   #endif
>>   
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 0400a0ac8a29..88db7346eb7a 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie);
>>   struct flow_action_entry {
>>   	enum flow_action_id		id;
>>   	u32				hw_index;
>> +	u64				miss_cookie;
> 
> The per-action stats patchset is adding a cookie for the actions as
> well, and exactly on this struct:
> 
> @@ -228,6 +228,7 @@ struct flow_action_cookie *flow_action_cookie_create(void *data,
>   struct flow_action_entry {
>          enum flow_action_id             id;
>          u32                             hw_index;
> +       unsigned long                   act_cookie;
>          enum flow_action_hw_stats       hw_stats;
>          action_destr                    destructor;
>          void                            *destructor_priv;
> 
> There, it is a simple value: the act pointer itself. Here, it is already more
> complex. Can them be merged into only one maybe?
> If not, perhaps act_cookie should be renamed to stats_cookie then.

I don't think it can be shared. Actions can be shared between multiple
filters, while the miss cookie would be different for each used instance
(it takes the filter into account).

So I'll rename it.
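
To illustrate the difference, here is a minimal user-space sketch (not
kernel code) modelled on the union tcf_exts_miss_cookie from this patch.
The helper names miss_cookie_get()/miss_cookie_lookup() and the base
values used below are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* User-space mirror of the patch's cookie layout (an assumption for
 * illustration): a per-filter base plus the action's index in that
 * filter's action list, viewed together as one 64-bit value.
 */
union miss_cookie {
	struct {
		uint32_t miss_cookie_base; /* identifies the filter's exts */
		uint32_t act_index;        /* position in the action list */
	};
	uint64_t miss_cookie;
};

static_assert(sizeof(union miss_cookie) == sizeof(uint64_t),
	      "cookie must pack into 64 bits");

/* Build a cookie; a zero base means "no miss cookie", as in the patch. */
static uint64_t miss_cookie_get(uint32_t base, uint32_t act_index)
{
	union miss_cookie mc = { .act_index = act_index };

	if (!base)
		return 0;

	mc.miss_cookie_base = base;
	return mc.miss_cookie;
}

/* Split a cookie back into its base (returned) and action index. */
static uint32_t miss_cookie_lookup(uint64_t cookie, uint32_t *act_index)
{
	union miss_cookie mc = { .miss_cookie = cookie };

	*act_index = mc.act_index;
	return mc.miss_cookie_base;
}
```

With this layout, the same action sitting at index 2 in two different
filters (say bases 7 and 8) yields two different cookies, whereas a
pointer-based act_cookie would be identical for both.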

> 
>>   	enum flow_action_hw_stats	hw_stats;
>>   	action_destr			destructor;
>>   	void				*destructor_priv;
>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index cd410a87517b..e395f2a84ed2 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -59,6 +59,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
>>   void tcf_block_put(struct tcf_block *block);
>>   void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
>>   		       struct tcf_block_ext_info *ei);
>> +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
>> +		     int police, struct tcf_proto *tp, u32 handle, bool used_action_miss);
>>   
>>   static inline bool tcf_block_shared(struct tcf_block *block)
>>   {
>> @@ -229,6 +231,7 @@ struct tcf_exts {
>>   	struct tc_action **actions;
>>   	struct net	*net;
>>   	netns_tracker	ns_tracker;
>> +	struct tcf_exts_miss_cookie_node *miss_cookie_node;
>>   #endif
>>   	/* Map to export classifier specific extension TLV types to the
>>   	 * generic extensions API. Unsupported extensions must be set to 0.
>> @@ -240,21 +243,11 @@ struct tcf_exts {
>>   static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
>>   				int action, int police)
>>   {
>> -#ifdef CONFIG_NET_CLS_ACT
>> -	exts->type = 0;
>> -	exts->nr_actions = 0;
>> -	/* Note: we do not own yet a reference on net.
>> -	 * This reference might be taken later from tcf_exts_get_net().
>> -	 */
>> -	exts->net = net;
>> -	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
>> -				GFP_KERNEL);
>> -	if (!exts->actions)
>> -		return -ENOMEM;
>> +#ifdef CONFIG_NET_CLS
>> +	return tcf_exts_init_ex(exts, net, action, police, NULL, 0, false);
>> +#else
>> +	return -EOPNOTSUPP;
>>   #endif
>> -	exts->action = action;
>> -	exts->police = police;
>> -	return 0;
>>   }
>>   
>>   /* Return false if the netns is being destroyed in cleanup_net(). Callers
>> @@ -353,6 +346,18 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
>>   	return TC_ACT_OK;
>>   }
>>   
>> +static inline int
>> +tcf_exts_exec_ex(struct sk_buff *skb, struct tcf_exts *exts, int act_index,
>> +		 struct tcf_result *res)
>> +{
>> +#ifdef CONFIG_NET_CLS_ACT
>> +	return tcf_action_exec(skb, exts->actions + act_index,
>> +			       exts->nr_actions - act_index, res);
>> +#else
>> +	return TC_ACT_OK;
>> +#endif
>> +}
>> +
>>   int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
>>   		      struct nlattr **tb, struct nlattr *rate_tlv,
>>   		      struct tcf_exts *exts, u32 flags,
>> @@ -577,6 +582,7 @@ int tc_setup_offload_action(struct flow_action *flow_action,
>>   void tc_cleanup_offload_action(struct flow_action *flow_action);
>>   int tc_setup_action(struct flow_action *flow_action,
>>   		    struct tc_action *actions[],
>> +		    u32 miss_cookie_base,
>>   		    struct netlink_ext_ack *extack);
>>   
>>   int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index af4aa66aaa4e..fab5ba3e61b7 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -369,6 +369,8 @@ struct tcf_proto_ops {
>>   						struct nlattr **tca,
>>   						struct netlink_ext_ack *extack);
>>   	void			(*tmplt_destroy)(void *tmplt_priv);
>> +	struct tcf_exts *	(*get_exts)(const struct tcf_proto *tp,
>> +					    u32 handle);
>>   
>>   	/* rtnetlink specific */
>>   	int			(*dump)(struct net*, struct tcf_proto*, void *,
>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> index e20d1a973417..69f91460a55c 100644
>> --- a/net/openvswitch/flow.c
>> +++ b/net/openvswitch/flow.c
>> @@ -1038,7 +1038,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>>   #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>   	if (tc_skb_ext_tc_enabled()) {
>>   		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
>> -		key->recirc_id = tc_ext ? tc_ext->chain : 0;
>> +		key->recirc_id = tc_ext && !tc_ext->act_miss ?
>> +				 tc_ext->chain : 0;
>>   		OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0;
>>   		post_ct = tc_ext ? tc_ext->post_ct : false;
>>   		post_ct_snat = post_ct ? tc_ext->post_ct_snat : false;
>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>> index cd09ef49df22..16fd3d30eb12 100644
>> --- a/net/sched/act_api.c
>> +++ b/net/sched/act_api.c
>> @@ -272,7 +272,7 @@ static int tcf_action_offload_add_ex(struct tc_action *action,
>>   	if (err)
>>   		goto fl_err;
>>   
>> -	err = tc_setup_action(&fl_action->action, actions, extack);
>> +	err = tc_setup_action(&fl_action->action, actions, 0, extack);
>>   	if (err) {
>>   		NL_SET_ERR_MSG_MOD(extack,
>>   				   "Failed to setup tc actions for offload");
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 5b4a95e8a1ee..8ff9530fef68 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -22,6 +22,7 @@
>>   #include <linux/idr.h>
>>   #include <linux/jhash.h>
>>   #include <linux/rculist.h>
>> +#include <linux/rhashtable.h>
>>   #include <net/net_namespace.h>
>>   #include <net/sock.h>
>>   #include <net/netlink.h>
>> @@ -50,6 +51,109 @@ static LIST_HEAD(tcf_proto_base);
>>   /* Protects list of registered TC modules. It is pure SMP lock. */
>>   static DEFINE_RWLOCK(cls_mod_lock);
>>   
>> +static struct xarray tcf_exts_miss_cookies_xa;
>> +struct tcf_exts_miss_cookie_node {
>> +	const struct tcf_chain *chain;
>> +	const struct tcf_proto *tp;
>> +	const struct tcf_exts *exts;
>> +	u32 chain_index;
>> +	u32 tp_prio;
>> +	u32 handle;
>> +	u32 miss_cookie_base;
>> +	struct rcu_head rcu;
>> +};
>> +
>> +/* Each tc action entry cookie will be comprised of 32bit miss_cookie_base +
>> + * action index in the exts tc actions array.
>> + */
>> +union tcf_exts_miss_cookie {
>> +	struct {
>> +		u32 miss_cookie_base;
>> +		u32 act_index;
>> +	};
>> +	u64 miss_cookie;
>> +};
>> +
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +static int
>> +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
>> +				u32 handle)
>> +{
>> +	struct tcf_exts_miss_cookie_node *n;
>> +	static u32 next;
> 
> What protects this static variable from concurrent access?


Nothing, but it only acts as a hint for xa_alloc_cyclic(), so I don't
think synchronization is needed here.
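
As a rough illustration of what "hint" means here, below is a toy,
single-threaded user-space allocator. It is not the xarray
implementation, and id_table, id_alloc_cyclic(), ID_MIN and ID_MAX are
made-up names. It only shows that the next value selects where the
search starts, so a stale value cannot cause an ID that is in use to be
handed out again:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_MIN 1u
#define ID_MAX 8u

/* Toy ID space: used[i] is true while ID i is allocated. */
struct id_table {
	bool used[ID_MAX + 1];
};

/* Allocate a free ID, searching cyclically from *next.
 * Returns 0 and stores the ID in *id, or -1 if every ID is taken.
 */
static int id_alloc_cyclic(struct id_table *t, uint32_t *id, uint32_t *next)
{
	uint32_t start, cand;

	assert(t && id && next);

	/* An out-of-range hint just restarts the search at ID_MIN. */
	start = (*next < ID_MIN || *next > ID_MAX) ? ID_MIN : *next;
	cand = start;

	do {
		if (!t->used[cand]) {
			t->used[cand] = true;
			*id = cand;
			*next = (cand == ID_MAX) ? ID_MIN : cand + 1;
			return 0;
		}
		cand = (cand == ID_MAX) ? ID_MIN : cand + 1;
	} while (cand != start);

	return -1;
}
```

If next is reset to an ID that is already taken, the search simply walks
forward to the first free slot.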


> 
>> +	int err;
>> +
>> +	if (WARN_ON(!handle || !tp->ops->get_exts))
>> +		return -EINVAL;
>> +
>> +	n = kzalloc(sizeof(*n), GFP_KERNEL);
>> +	if (!n)
>> +		return -ENOMEM;
>> +
>> +	n->chain_index = tp->chain->index;
>> +	n->chain = tp->chain;
>> +	n->tp_prio = tp->prio;
>> +	n->tp = tp;
>> +	n->exts = exts;
>> +	n->handle = handle;
>> +
>> +	err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base,
>> +			      n, xa_limit_32b, &next, GFP_KERNEL);
>> +	if (err)
>> +		goto err_xa_alloc;
>> +
>> +	exts->miss_cookie_node = n;
>> +	return 0;
>> +
>> +err_xa_alloc:
>> +	kfree(n);
>> +	return err;
>> +}
>> +
>> +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
>> +{
>> +	struct tcf_exts_miss_cookie_node *n;
>> +
>> +	if (!exts->miss_cookie_node)
>> +		return;
>> +
>> +	n = exts->miss_cookie_node;
>> +	xa_erase(&tcf_exts_miss_cookies_xa, n->miss_cookie_base);
>> +	kfree_rcu(n, rcu);
>> +}
>> +
>> +static struct tcf_exts_miss_cookie_node *
>> +tcf_exts_miss_cookie_lookup(u64 miss_cookie, int *act_index)
>> +{
>> +	union tcf_exts_miss_cookie mc = { .miss_cookie = miss_cookie, };
>> +
>> +	*act_index = mc.act_index;
>> +	return xa_load(&tcf_exts_miss_cookies_xa, mc.miss_cookie_base);
>> +}
>> +#else /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
>> +static int
>> +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
>> +				u32 handle)
>> +{
>> +	return 0;
>> +}
>> +
>> +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
>> +{
>> +}
>> +#endif /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
>> +
>> +static u64 tcf_exts_miss_cookie_get(u32 miss_cookie_base, int act_index)
>> +{
>> +	union tcf_exts_miss_cookie mc = { .act_index = act_index, };
>> +
>> +	if (!miss_cookie_base)
>> +		return 0;
>> +
>> +	mc.miss_cookie_base = miss_cookie_base;
>> +	return mc.miss_cookie;
>> +}
>> +
>>   #ifdef CONFIG_NET_CLS_ACT
>>   DEFINE_STATIC_KEY_FALSE(tc_skb_ext_tc);
>>   EXPORT_SYMBOL(tc_skb_ext_tc);
>> @@ -1549,6 +1653,8 @@ static inline int __tcf_classify(struct sk_buff *skb,
>>   				 const struct tcf_proto *orig_tp,
>>   				 struct tcf_result *res,
>>   				 bool compat_mode,
>> +				 struct tcf_exts_miss_cookie_node *n,
>> +				 int act_index,
>>   				 u32 *last_executed_chain)
>>   {
>>   #ifdef CONFIG_NET_CLS_ACT
>> @@ -1560,13 +1666,36 @@ static inline int __tcf_classify(struct sk_buff *skb,
>>   #endif
>>   	for (; tp; tp = rcu_dereference_bh(tp->next)) {
>>   		__be16 protocol = skb_protocol(skb, false);
>> -		int err;
>> +		int err = 0;
>>   
>> -		if (tp->protocol != protocol &&
>> -		    tp->protocol != htons(ETH_P_ALL))
>> -			continue;
>> +		if (n) {
>> +			struct tcf_exts *exts;
>> +
>> +			if (n->tp_prio != tp->prio)
>> +				continue;
>> +
>> +			/* We re-lookup the tp and chain based on index instead
>> +			 * of having hard refs and locks to them, so do a sanity
>> +			 * check if any of tp,chain,exts was replaced by the
>> +			 * time we got here with a cookie from hardware.
>> +			 */
>> +			if (unlikely(n->tp != tp || n->tp->chain != n->chain ||
>> +				     !tp->ops->get_exts))
>> +				return TC_ACT_SHOT;
>> +
>> +			exts = tp->ops->get_exts(tp, n->handle);
>> +			if (unlikely(!exts || n->exts != exts))
>> +				return TC_ACT_SHOT;
>>   
>> -		err = tc_classify(skb, tp, res);
>> +			n = NULL;
>> +			err = tcf_exts_exec_ex(skb, exts, act_index, res);
>> +		} else {
>> +			if (tp->protocol != protocol &&
>> +			    tp->protocol != htons(ETH_P_ALL))
>> +				continue;
>> +
>> +			err = tc_classify(skb, tp, res);
>> +		}
>>   #ifdef CONFIG_NET_CLS_ACT
>>   		if (unlikely(err == TC_ACT_RECLASSIFY && !compat_mode)) {
>>   			first_tp = orig_tp;
>> @@ -1582,6 +1711,9 @@ static inline int __tcf_classify(struct sk_buff *skb,
>>   			return err;
>>   	}
>>   
>> +	if (unlikely(n))
>> +		return TC_ACT_SHOT;
>> +
>>   	return TC_ACT_UNSPEC; /* signal: continue lookup */
>>   #ifdef CONFIG_NET_CLS_ACT
>>   reset:
>> @@ -1606,21 +1738,33 @@ int tcf_classify(struct sk_buff *skb,
>>   #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>   	u32 last_executed_chain = 0;
>>   
>> -	return __tcf_classify(skb, tp, tp, res, compat_mode,
>> +	return __tcf_classify(skb, tp, tp, res, compat_mode, NULL, 0,
>>   			      &last_executed_chain);
>>   #else
>>   	u32 last_executed_chain = tp ? tp->chain->index : 0;
>> +	struct tcf_exts_miss_cookie_node *n = NULL;
>>   	const struct tcf_proto *orig_tp = tp;
>>   	struct tc_skb_ext *ext;
>> +	int act_index = 0;
>>   	int ret;
>>   
>>   	if (block) {
>>   		ext = skb_ext_find(skb, TC_SKB_EXT);
>>   
>> -		if (ext && ext->chain) {
>> +		if (ext && (ext->chain || ext->act_miss)) {
>>   			struct tcf_chain *fchain;
>> +			u32 chain = ext->chain;
> 
> IMHO it would be nice to avoid this assignment here, because there is
> nothing above saying that the union value can be read as a chain. For
> C, yes, it is okay. The worry is about ensuring that whatever is
> reading it, is reading what it is expecting it to be.
> Perhaps just declare it here, and then:

Right, not nice, will fix.

> 
>>   
>> -			fchain = tcf_chain_lookup_rcu(block, ext->chain);
>> +			if (ext->act_miss) {
>> +				n = tcf_exts_miss_cookie_lookup(ext->act_miss_cookie,
>> +								&act_index);
>> +				if (!n)
>> +					return TC_ACT_SHOT;
>> +
>> +				chain = n->chain_index;
>> +			}
> 
> 			else {
> 				chain = ext->chain;
> 			}
> 
>> +
>> +			fchain = tcf_chain_lookup_rcu(block, chain);
>>   			if (!fchain)
>>   				return TC_ACT_SHOT;
>>   
>> @@ -1632,7 +1776,7 @@ int tcf_classify(struct sk_buff *skb,
>>   		}
>>   	}
>>   
>> -	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode,
>> +	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, n, act_index,
>>   			     &last_executed_chain);
>>   
>>   	if (tc_skb_ext_tc_enabled()) {
>> @@ -3056,9 +3200,48 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
>>   	return skb->len;
>>   }
>>   
>> +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
>> +		     int police, struct tcf_proto *tp, u32 handle,
>> +		     bool use_action_miss)
>> +{
>> +	int err = 0;
>> +
>> +#ifdef CONFIG_NET_CLS_ACT
>> +	exts->type = 0;
>> +	exts->nr_actions = 0;
>> +	/* Note: we do not own yet a reference on net.
>> +	 * This reference might be taken later from tcf_exts_get_net().
>> +	 */
>> +	exts->net = net;
>> +	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
>> +				GFP_KERNEL);
>> +	if (!exts->actions)
>> +		return -ENOMEM;
>> +#endif
>> +
>> +	exts->action = action;
>> +	exts->police = police;
>> +
>> +	if (!use_action_miss)
>> +		return 0;
>> +
>> +	err = tcf_exts_miss_cookie_base_alloc(exts, tp, handle);
>> +	if (err)
>> +		goto err_miss_alloc;
>> +
>> +	return 0;
>> +
>> +err_miss_alloc:
>> +	tcf_exts_destroy(exts);
>> +	return err;
>> +}
>> +EXPORT_SYMBOL(tcf_exts_init_ex);
>> +
>>   void tcf_exts_destroy(struct tcf_exts *exts)
>>   {
>>   #ifdef CONFIG_NET_CLS_ACT
>> +	tcf_exts_miss_cookie_base_destroy(exts);
>> +
>>   	if (exts->actions) {
>>   		tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
>>   		kfree(exts->actions);
>> @@ -3547,6 +3730,7 @@ static int tc_setup_offload_act(struct tc_action *act,
>>   
>>   int tc_setup_action(struct flow_action *flow_action,
>>   		    struct tc_action *actions[],
>> +		    u32 miss_cookie_base,
>>   		    struct netlink_ext_ack *extack)
>>   {
>>   	int i, j, k, index, err = 0;
>> @@ -3577,6 +3761,8 @@ int tc_setup_action(struct flow_action *flow_action,
>>   		for (k = 0; k < index ; k++) {
>>   			entry[k].hw_stats = tc_act_hw_stats(act->hw_stats);
>>   			entry[k].hw_index = act->tcfa_index;
>> +			entry[k].miss_cookie =
>> +				tcf_exts_miss_cookie_get(miss_cookie_base, i);
>>   		}
>>   
>>   		j += index;
>> @@ -3599,10 +3785,15 @@ int tc_setup_offload_action(struct flow_action *flow_action,
>>   			    struct netlink_ext_ack *extack)
>>   {
>>   #ifdef CONFIG_NET_CLS_ACT
>> +	u32 miss_cookie_base;
>> +
>>   	if (!exts)
>>   		return 0;
>>   
>> -	return tc_setup_action(flow_action, exts->actions, extack);
>> +	miss_cookie_base = exts->miss_cookie_node ?
>> +			   exts->miss_cookie_node->miss_cookie_base : 0;
>> +	return tc_setup_action(flow_action, exts->actions, miss_cookie_base,
>> +			       extack);
>>   #else
>>   	return 0;
>>   #endif
>> @@ -3770,6 +3961,8 @@ static int __init tc_filter_init(void)
>>   	if (err)
>>   		goto err_register_pernet_subsys;
>>   
>> +	xa_init_flags(&tcf_exts_miss_cookies_xa, XA_FLAGS_ALLOC1);
>> +
>>   	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
>>   		      RTNL_FLAG_DOIT_UNLOCKED);
>>   	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
>> -- 
>> 2.30.1
>>


* Re: [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-10  1:56 ` [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Marcelo Ricardo Leitner
@ 2023-02-13 16:25   ` Paul Blakey
  2023-02-13 18:27     ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Blakey @ 2023-02-13 16:25 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov



On 10/02/2023 03:56, Marcelo Ricardo Leitner wrote:
> Hi,
> 
> On Mon, Feb 06, 2023 at 07:43:56PM +0200, Paul Blakey wrote:
>> Hi,
>>
>> This series adds support for hardware miss to instruct tc to continue execution
>> in a specific tc action instance on a filter's action list. The mlx5 driver patch
>> (besides the refactors) shows its usage instead of using just chain restore.
>>
>> Currently a filter's action list must be executed all together or
>> not at all as driver are only able to tell tc to continue executing from a
>> specific tc chain, and not a specific filter/action.
>>
>> This is troublesome with regards to action CT, where new connections should
>> be sent to software (via tc chain restore), and established connections can
>> be handled in hardware.
>>
>> Checking for new connections is done when executing the ct action in hardware
>> (by checking the packet's tuple against known established tuples).
>> But if there is a packet modification (pedit) action before action CT and the
>> checked tuple is a new connection, hardware will need to revert the previous
>> packet modifications before sending it back to software so it can
>> re-match the same tc filter in software and re-execute its CT action.
>>
>> The following is an example configuration of stateless nat
>> on mlx5 driver that isn't supported before this patchet:
>>
>>   #Setup corrosponding mlx5 VFs in namespaces
>>   $ ip netns add ns0
>>   $ ip netns add ns1
>>   $ ip link set dev enp8s0f0v0 netns ns0
>>   $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
>>   $ ip link set dev enp8s0f0v1 netns ns1
>>   $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
>>
>>   #Setup tc arp and ct rules on mxl5 VF representors
>>   $ tc qdisc add dev enp8s0f0_0 ingress
>>   $ tc qdisc add dev enp8s0f0_1 ingress
>>   $ ifconfig enp8s0f0_0 up
>>   $ ifconfig enp8s0f0_1 up
>>
>>   #Original side
>>   $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
>>      ct_state -trk ip_proto tcp dst_port 8888 \
>>        action pedit ex munge tcp dport set 5001 pipe \
>>        action csum ip tcp pipe \
>>        action ct pipe \
>>        action goto chain 1
>>   $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>>      ct_state +trk+est \
>>        action mirred egress redirect dev enp8s0f0_1
>>   $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>>      ct_state +trk+new \
>>        action ct commit pipe \
>>        action mirred egress redirect dev enp8s0f0_1
>>   $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
>>        action mirred egress redirect dev enp8s0f0_1
>>
>>   #Reply side
>>   $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
>>        action mirred egress redirect dev enp8s0f0_0
>>   $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
>>      ct_state -trk ip_proto tcp \
>>        action ct pipe \
>>        action pedit ex munge tcp sport set 8888 pipe \
>>        action csum ip tcp pipe \
>>        action mirred egress redirect dev enp8s0f0_0
>>
>>   #Run traffic
>>   $ ip netns exec ns1 iperf -s -p 5001&
>>   $ sleep 2 #wait for iperf to fully open
>>   $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
>>
>>   #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
>>   $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
>>          Sent hardware 9310116832 bytes 6149672 pkt
>>          Sent hardware 9310116832 bytes 6149672 pkt
>>          Sent hardware 9310116832 bytes 6149672 pkt
> 
> I see Jamal had asked about stats on the other version, but then no
> dependency was set. I think we _must_ have a dependency of this
> patchet on the per-action stats one. Otherwise the stats above will
> get messy.  Without the per-action stats, the last one is replicated
> to the other actions. But then, will hw count the packet that it did
> only the first action? I don't see how it would, and then for the all
> but first one the packet will be accounted twice.
> 
> With this said, it would be nice to provide a sample of how the sw and
> hw stats would look like _after_ this patchset as well.
> 
> Btw I'll add my Reviewed-by tag to the per-action stats one in a few.


This patchset doesn't actually need to rely on the per-action stats,
because the driver still reorders the action list so that CT comes
first, and the example in the cover letter will be rejected because it
can't be reordered. So we are still doing all (CT and the rest) or nothing.

But we wanted to confirm the API before committing the rest of the
driver patches, since they contain a lot of refactors, so we split the
work into two series: the API first, then the mlx5 driver only. This is
just the API with a bare minimum driver change to use it. From the tc
user's perspective nothing has changed here.

So do we first continue with this (after I fix your suggestions) and
then submit the rest, or should I rebase on the per-action stats and
add even more mlx5 patches here, which are mostly not relevant? I think
we are ok with the API changes.


> 
>>
>> A new connection executing the first filter in hardware will first rewrite
>> the dst port to the new port, and then the ct action is executed,
>> because this is a new connection, hardware will need to be send this back
>> to software, on chain 0, to execute the first filter again in software.
>> The dst port needs to be reverted otherwise it won't re-match the old
>> dst port in the first filter. Because of that, currently mlx5 driver will
>> reject offloading the above action ct rule.
>>
>> This series adds supports partial offload of a filter's action list,
> 
> We should avoid this terminology as is, as it can create confusion. It
> is not that it is offloading action 1 and not action 2. Instead, it is
> adding support to a more fine grained miss to sw. Perhaps "support for
> partially executing in hw".
> 
>> and letting tc software continue processing in the specific action instance
>> where hardware left off (in the above case after the "action pedit ex munge tcp
>> dport... of the first rule") allowing support for scenarios such as the above.
>>
>> Changelog:
>> 	v1->v2:
>> 	Fixed compilation without CONFIG_NET_CLS
>> 	Cover letter re-write
>>
>> 	v2->v3:
>> 	Unlock spin_lock on error in cls flower filter handle refactor
>> 	Cover letter
>>
>> 	v3->v4:
>> 	Silence warning by clang
>>
>> 	v4->v5:
>> 	Cover letter example
>> 	Removed ifdef as much as possible by using inline stubs
>>
>> 	v5->v6:
>> 	Removed new inlines in cls_api.c (bot complained in patchwork)
>> 	Added reviewed-by/ack - Thanks!
>>
>> 	v6->v7:
>> 	Removed WARN_ON from pkt path (leon)
>> 	Removed unnecessary return in void func
>>
>> 	v7->v8:
>> 	Removed #if IS_ENABLED on skb ext adding Kconfig changes
>> 	Complex variable init in seperate lines
>> 	if,else if, else if ---> switch case
>>
>> 	v8->v9:
>> 	Removed even more IS_ENABLED because of Kconfig
>>
>> Paul Blakey (7):
>>    net/sched: cls_api: Support hardware miss to tc action
>>    net/sched: flower: Move filter handle initialization earlier
>>    net/sched: flower: Support hardware miss to tc action
>>    net/mlx5: Kconfig: Make tc offload depend on tc skb extension
>>    net/mlx5: Refactor tc miss handling to a single function
>>    net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
>>    net/mlx5e: TC, Set CT miss to the specific ct action instance
>>
>>   .../net/ethernet/mellanox/mlx5/core/Kconfig   |   4 +-
>>   .../ethernet/mellanox/mlx5/core/en/rep/tc.c   | 225 ++------------
>>   .../mellanox/mlx5/core/en/tc/sample.c         |   2 +-
>>   .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  39 +--
>>   .../ethernet/mellanox/mlx5/core/en/tc_ct.h    |   2 +
>>   .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   4 +-
>>   .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 280 ++++++++++++++++--
>>   .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  23 +-
>>   .../net/ethernet/mellanox/mlx5/core/eswitch.h |   2 +
>>   .../mellanox/mlx5/core/lib/fs_chains.c        |  14 +-
>>   include/linux/skbuff.h                        |   6 +-
>>   include/net/flow_offload.h                    |   1 +
>>   include/net/pkt_cls.h                         |  34 ++-
>>   include/net/sch_generic.h                     |   2 +
>>   net/openvswitch/flow.c                        |   3 +-
>>   net/sched/act_api.c                           |   2 +-
>>   net/sched/cls_api.c                           | 213 ++++++++++++-
>>   net/sched/cls_flower.c                        |  73 +++--
>>   18 files changed, 602 insertions(+), 327 deletions(-)
>>
>> -- 
>> 2.30.1
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-13 16:25   ` Paul Blakey
@ 2023-02-13 18:27     ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-02-13 18:27 UTC (permalink / raw)
  To: Paul Blakey
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov

On Mon, Feb 13, 2023 at 06:25:14PM +0200, Paul Blakey wrote:
> 
> 
> On 10/02/2023 03:56, Marcelo Ricardo Leitner wrote:
> > Hi,
> > 
> > On Mon, Feb 06, 2023 at 07:43:56PM +0200, Paul Blakey wrote:
> > > Hi,
> > > 
> > > This series adds support for hardware miss to instruct tc to continue execution
> > > in a specific tc action instance on a filter's action list. The mlx5 driver patch
> > > (besides the refactors) shows its usage instead of using just chain restore.
> > > 
> > > Currently a filter's action list must be executed all together or
> > > > not at all as drivers are only able to tell tc to continue executing from a
> > > specific tc chain, and not a specific filter/action.
> > > 
> > > This is troublesome with regards to action CT, where new connections should
> > > be sent to software (via tc chain restore), and established connections can
> > > be handled in hardware.
> > > 
> > > Checking for new connections is done when executing the ct action in hardware
> > > (by checking the packet's tuple against known established tuples).
> > > But if there is a packet modification (pedit) action before action CT and the
> > > checked tuple is a new connection, hardware will need to revert the previous
> > > packet modifications before sending it back to software so it can
> > > re-match the same tc filter in software and re-execute its CT action.
> > > 
> > > The following is an example configuration of stateless nat
> > > > on mlx5 driver that isn't supported before this patchset:
> > > > 
> > > >   #Setup corresponding mlx5 VFs in namespaces
> > >   $ ip netns add ns0
> > >   $ ip netns add ns1
> > >   $ ip link set dev enp8s0f0v0 netns ns0
> > >   $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
> > >   $ ip link set dev enp8s0f0v1 netns ns1
> > >   $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
> > > 
> > > >   #Setup tc arp and ct rules on mlx5 VF representors
> > >   $ tc qdisc add dev enp8s0f0_0 ingress
> > >   $ tc qdisc add dev enp8s0f0_1 ingress
> > >   $ ifconfig enp8s0f0_0 up
> > >   $ ifconfig enp8s0f0_1 up
> > > 
> > >   #Original side
> > >   $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
> > >      ct_state -trk ip_proto tcp dst_port 8888 \
> > >        action pedit ex munge tcp dport set 5001 pipe \
> > >        action csum ip tcp pipe \
> > >        action ct pipe \
> > >        action goto chain 1
> > >   $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
> > >      ct_state +trk+est \
> > >        action mirred egress redirect dev enp8s0f0_1
> > >   $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
> > >      ct_state +trk+new \
> > >        action ct commit pipe \
> > >        action mirred egress redirect dev enp8s0f0_1
> > >   $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
> > >        action mirred egress redirect dev enp8s0f0_1
> > > 
> > >   #Reply side
> > >   $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
> > >        action mirred egress redirect dev enp8s0f0_0
> > >   $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
> > >      ct_state -trk ip_proto tcp \
> > >        action ct pipe \
> > >        action pedit ex munge tcp sport set 8888 pipe \
> > >        action csum ip tcp pipe \
> > >        action mirred egress redirect dev enp8s0f0_0
> > > 
> > >   #Run traffic
> > >   $ ip netns exec ns1 iperf -s -p 5001&
> > >   $ sleep 2 #wait for iperf to fully open
> > >   $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
> > > 
> > >   #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
> > >   $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
> > >          Sent hardware 9310116832 bytes 6149672 pkt
> > >          Sent hardware 9310116832 bytes 6149672 pkt
> > >          Sent hardware 9310116832 bytes 6149672 pkt
> > 
> > I see Jamal had asked about stats on the other version, but then no
> > dependency was set. I think we _must_ have a dependency of this
> > patchset on the per-action stats one. Otherwise the stats above will
> > get messy.  Without the per-action stats, the last one is replicated
> > to the other actions. But then, will hw count a packet for which it did
> > only the first action? I don't see how it would, and then for all
> > but the first one the packet will be accounted twice.
> > 
> > With this said, it would be nice to provide a sample of what the sw and
> > hw stats would look like _after_ this patchset as well.
> > 
> > Btw I'll add my Reviewed-by tag to the per-action stats one in a few.
> 
> 
> This patchset actually doesn't need to rely on the per actions stats because
> the driver is still reordering the action list so CT will be first and the
> example in cover letter will be rejected because it can't be reordered. Then
> we are still doing all (CT and the rest) or nothing.

Ah right. I thought the driver patches here were undoing the
reordering already.

> 
> But we wanted to confirm the API before committing the rest of the driver
> patches since it has a lot of refactors, so we split it into two series - API,
> then only MLX5 DRIVER. This is just the API with a bare minimum driver
> change to just use the API. From tc user perspective nothing changed here.
> 
> So do we first continue with this (after i fix your suggestions), and then
> we'll submit the rest, or should I rebase on the per action stats, add even

Well, the two patchsets will certainly create conflicts when adding
the cookie. Not sure if the maintainers can accommodate it.

> more mlx5 patches here which are mostly not relevant, since I think we are
> ok with the API changes.
> 
> 
> > 
> > > 
> > > A new connection executing the first filter in hardware will first rewrite
> > > > the dst port to the new port, and then the ct action is executed.
> > > > Because this is a new connection, hardware will need to send this back
> > > > to software, on chain 0, to execute the first filter again in software.
> > > The dst port needs to be reverted otherwise it won't re-match the old
> > > dst port in the first filter. Because of that, currently mlx5 driver will
> > > reject offloading the above action ct rule.
> > > 
> > > > This series adds support for partial offload of a filter's action list,
> > 
> > We should avoid this terminology as is, as it can create confusion. It
> > is not that it is offloading action 1 and not action 2. Instead, it is
> > adding support to a more fine grained miss to sw. Perhaps "support for
> > partially executing in hw".
> > 
> > > and letting tc software continue processing in the specific action instance
> > > where hardware left off (in the above case after the "action pedit ex munge tcp
> > > dport... of the first rule") allowing support for scenarios such as the above.
> > > 
> > > Changelog:
> > > 	v1->v2:
> > > 	Fixed compilation without CONFIG_NET_CLS
> > > 	Cover letter re-write
> > > 
> > > 	v2->v3:
> > > 	Unlock spin_lock on error in cls flower filter handle refactor
> > > 	Cover letter
> > > 
> > > 	v3->v4:
> > > 	Silence warning by clang
> > > 
> > > 	v4->v5:
> > > 	Cover letter example
> > > 	Removed ifdef as much as possible by using inline stubs
> > > 
> > > 	v5->v6:
> > > 	Removed new inlines in cls_api.c (bot complained in patchwork)
> > > 	Added reviewed-by/ack - Thanks!
> > > 
> > > 	v6->v7:
> > > 	Removed WARN_ON from pkt path (leon)
> > > 	Removed unnecessary return in void func
> > > 
> > > 	v7->v8:
> > > 	Removed #if IS_ENABLED on skb ext adding Kconfig changes
> > > 	Complex variable init in seperate lines
> > > 	if,else if, else if ---> switch case
> > > 
> > > 	v8->v9:
> > > 	Removed even more IS_ENABLED because of Kconfig
> > > 
> > > Paul Blakey (7):
> > >    net/sched: cls_api: Support hardware miss to tc action
> > >    net/sched: flower: Move filter handle initialization earlier
> > >    net/sched: flower: Support hardware miss to tc action
> > >    net/mlx5: Kconfig: Make tc offload depend on tc skb extension
> > >    net/mlx5: Refactor tc miss handling to a single function
> > >    net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
> > >    net/mlx5e: TC, Set CT miss to the specific ct action instance
> > > 
> > >   .../net/ethernet/mellanox/mlx5/core/Kconfig   |   4 +-
> > >   .../ethernet/mellanox/mlx5/core/en/rep/tc.c   | 225 ++------------
> > >   .../mellanox/mlx5/core/en/tc/sample.c         |   2 +-
> > >   .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  39 +--
> > >   .../ethernet/mellanox/mlx5/core/en/tc_ct.h    |   2 +
> > >   .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   4 +-
> > >   .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 280 ++++++++++++++++--
> > >   .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  23 +-
> > >   .../net/ethernet/mellanox/mlx5/core/eswitch.h |   2 +
> > >   .../mellanox/mlx5/core/lib/fs_chains.c        |  14 +-
> > >   include/linux/skbuff.h                        |   6 +-
> > >   include/net/flow_offload.h                    |   1 +
> > >   include/net/pkt_cls.h                         |  34 ++-
> > >   include/net/sch_generic.h                     |   2 +
> > >   net/openvswitch/flow.c                        |   3 +-
> > >   net/sched/act_api.c                           |   2 +-
> > >   net/sched/cls_api.c                           | 213 ++++++++++++-
> > >   net/sched/cls_flower.c                        |  73 +++--
> > >   18 files changed, 602 insertions(+), 327 deletions(-)
> > > 
> > > -- 
> > > 2.30.1
> > > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-13 16:13     ` Paul Blakey
@ 2023-02-13 18:43       ` Marcelo Ricardo Leitner
  2023-02-14 12:14         ` Paul Blakey
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-02-13 18:43 UTC (permalink / raw)
  To: Paul Blakey
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman

On Mon, Feb 13, 2023 at 06:13:34PM +0200, Paul Blakey wrote:
> 
> 
> On 10/02/2023 04:21, Marcelo Ricardo Leitner wrote:
> > On Mon, Feb 06, 2023 at 07:43:57PM +0200, Paul Blakey wrote:
> > > For drivers to support partial offload of a filter's action list,
> > > add support for action miss to specify an action instance to
> > > continue from in sw.
> > > 
> > > CT action in particular can't be fully offloaded, as new connections
> > > need to be handled in software. This imposes other limitations on
> > > the actions that can be offloaded together with the CT action, such
> > > as packet modifications.
> > > 
> > > Assign each action on a filter's action list a unique miss_cookie
> > > which drivers can then use to fill action_miss part of the tc skb
> > > extension. On getting back this miss_cookie, find the action
> > > instance with relevant cookie and continue classifying from there.
> > > 
> > > Signed-off-by: Paul Blakey <paulb@nvidia.com>
> > > Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> > > Reviewed-by: Simon Horman <simon.horman@corigine.com>
> > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > ---
> > >   include/linux/skbuff.h     |   6 +-
> > >   include/net/flow_offload.h |   1 +
> > >   include/net/pkt_cls.h      |  34 +++---
> > >   include/net/sch_generic.h  |   2 +
> > >   net/openvswitch/flow.c     |   3 +-
> > >   net/sched/act_api.c        |   2 +-
> > >   net/sched/cls_api.c        | 213 +++++++++++++++++++++++++++++++++++--
> > >   7 files changed, 234 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index 1fa95b916342..9b9aa854068f 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -311,12 +311,16 @@ struct nf_bridge_info {
> > >    * and read by ovs to recirc_id.
> > >    */
> > >   struct tc_skb_ext {
> > > -	__u32 chain;
> > > +	union {
> > > +		u64 act_miss_cookie;
> > > +		__u32 chain;
> > > +	};
> > >   	__u16 mru;
> > >   	__u16 zone;
> > >   	u8 post_ct:1;
> > >   	u8 post_ct_snat:1;
> > >   	u8 post_ct_dnat:1;
> > > +	u8 act_miss:1; /* Set if act_miss_cookie is used */
> > >   };
> > >   #endif
> > > diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> > > index 0400a0ac8a29..88db7346eb7a 100644
> > > --- a/include/net/flow_offload.h
> > > +++ b/include/net/flow_offload.h
> > > @@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie);
> > >   struct flow_action_entry {
> > >   	enum flow_action_id		id;
> > >   	u32				hw_index;
> > > +	u64				miss_cookie;
> > 
> > The per-action stats patchset is adding a cookie for the actions as
> > well, and exactly on this struct:
> > 
> > @@ -228,6 +228,7 @@ struct flow_action_cookie *flow_action_cookie_create(void *data,
> >   struct flow_action_entry {
> >          enum flow_action_id             id;
> >          u32                             hw_index;
> > +       unsigned long                   act_cookie;
> >          enum flow_action_hw_stats       hw_stats;
> >          action_destr                    destructor;
> >          void                            *destructor_priv;
> > 
> > > There, it is a simple value: the act pointer itself. Here, it is already more
> > > complex. Can they be merged into only one, maybe?
> > If not, perhaps act_cookie should be renamed to stats_cookie then.
> 
> I don't think it can be shared: actions can be shared between multiple
> filters, while the miss cookie would be different for each used instance
> (takes the filter into account).

Good point. So at best it would be a masked value where part A works
for the miss here and part B for the stats, which is pretty much what
the two cookies are giving, just without having to do bit gymnastics,
yes.
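
For reference, a minimal userspace sketch of how the miss cookie is put
together. It mirrors union tcf_exts_miss_cookie and
tcf_exts_miss_cookie_get() from the patch, but the names
miss_cookie_sketch, miss_cookie_pack() and miss_cookie_unpack() are
hypothetical and the base values are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of union tcf_exts_miss_cookie: a 32-bit per-filter
 * base plus the action's index in that filter's action list, viewed
 * together as one 64-bit cookie.
 */
union miss_cookie_sketch {
	struct {
		uint32_t miss_cookie_base;
		uint32_t act_index;
	};
	uint64_t miss_cookie;
};

/* Build a cookie; a zero base means "no miss cookie", as in the patch. */
static uint64_t miss_cookie_pack(uint32_t base, uint32_t act_index)
{
	union miss_cookie_sketch mc = { .act_index = act_index, };

	if (!base)
		return 0;

	mc.miss_cookie_base = base;
	return mc.miss_cookie;
}

/* Split a cookie back into its base (returned) and action index. */
static uint32_t miss_cookie_unpack(uint64_t cookie, uint32_t *act_index)
{
	union miss_cookie_sketch mc = { .miss_cookie = cookie, };

	assert(act_index != NULL);
	*act_index = mc.act_index;
	return mc.miss_cookie_base;
}
```

An action shared by two filters at the same index still yields two
different cookies, e.g. miss_cookie_pack(1, 2) and miss_cookie_pack(2, 2),
because the base is per filter. That is the part a plain act pointer
cannot express.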

> 
> So I'll rename it.
> 
> > 
> > >   	enum flow_action_hw_stats	hw_stats;
> > >   	action_destr			destructor;
> > >   	void				*destructor_priv;
> > > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > > index cd410a87517b..e395f2a84ed2 100644
> > > --- a/include/net/pkt_cls.h
> > > +++ b/include/net/pkt_cls.h
> > > @@ -59,6 +59,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
> > >   void tcf_block_put(struct tcf_block *block);
> > >   void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
> > >   		       struct tcf_block_ext_info *ei);
> > > +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
> > > +		     int police, struct tcf_proto *tp, u32 handle, bool used_action_miss);
> > >   static inline bool tcf_block_shared(struct tcf_block *block)
> > >   {
> > > @@ -229,6 +231,7 @@ struct tcf_exts {
> > >   	struct tc_action **actions;
> > >   	struct net	*net;
> > >   	netns_tracker	ns_tracker;
> > > +	struct tcf_exts_miss_cookie_node *miss_cookie_node;
> > >   #endif
> > >   	/* Map to export classifier specific extension TLV types to the
> > >   	 * generic extensions API. Unsupported extensions must be set to 0.
> > > @@ -240,21 +243,11 @@ struct tcf_exts {
> > >   static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net,
> > >   				int action, int police)
> > >   {
> > > -#ifdef CONFIG_NET_CLS_ACT
> > > -	exts->type = 0;
> > > -	exts->nr_actions = 0;
> > > -	/* Note: we do not own yet a reference on net.
> > > -	 * This reference might be taken later from tcf_exts_get_net().
> > > -	 */
> > > -	exts->net = net;
> > > -	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
> > > -				GFP_KERNEL);
> > > -	if (!exts->actions)
> > > -		return -ENOMEM;
> > > +#ifdef CONFIG_NET_CLS
> > > +	return tcf_exts_init_ex(exts, net, action, police, NULL, 0, false);
> > > +#else
> > > +	return -EOPNOTSUPP;
> > >   #endif
> > > -	exts->action = action;
> > > -	exts->police = police;
> > > -	return 0;
> > >   }
> > >   /* Return false if the netns is being destroyed in cleanup_net(). Callers
> > > @@ -353,6 +346,18 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
> > >   	return TC_ACT_OK;
> > >   }
> > > +static inline int
> > > +tcf_exts_exec_ex(struct sk_buff *skb, struct tcf_exts *exts, int act_index,
> > > +		 struct tcf_result *res)
> > > +{
> > > +#ifdef CONFIG_NET_CLS_ACT
> > > +	return tcf_action_exec(skb, exts->actions + act_index,
> > > +			       exts->nr_actions - act_index, res);
> > > +#else
> > > +	return TC_ACT_OK;
> > > +#endif
> > > +}
> > > +
> > >   int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
> > >   		      struct nlattr **tb, struct nlattr *rate_tlv,
> > >   		      struct tcf_exts *exts, u32 flags,
> > > @@ -577,6 +582,7 @@ int tc_setup_offload_action(struct flow_action *flow_action,
> > >   void tc_cleanup_offload_action(struct flow_action *flow_action);
> > >   int tc_setup_action(struct flow_action *flow_action,
> > >   		    struct tc_action *actions[],
> > > +		    u32 miss_cookie_base,
> > >   		    struct netlink_ext_ack *extack);
> > >   int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > > index af4aa66aaa4e..fab5ba3e61b7 100644
> > > --- a/include/net/sch_generic.h
> > > +++ b/include/net/sch_generic.h
> > > @@ -369,6 +369,8 @@ struct tcf_proto_ops {
> > >   						struct nlattr **tca,
> > >   						struct netlink_ext_ack *extack);
> > >   	void			(*tmplt_destroy)(void *tmplt_priv);
> > > +	struct tcf_exts *	(*get_exts)(const struct tcf_proto *tp,
> > > +					    u32 handle);
> > >   	/* rtnetlink specific */
> > >   	int			(*dump)(struct net*, struct tcf_proto*, void *,
> > > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > > index e20d1a973417..69f91460a55c 100644
> > > --- a/net/openvswitch/flow.c
> > > +++ b/net/openvswitch/flow.c
> > > @@ -1038,7 +1038,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> > >   #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> > >   	if (tc_skb_ext_tc_enabled()) {
> > >   		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> > > -		key->recirc_id = tc_ext ? tc_ext->chain : 0;
> > > +		key->recirc_id = tc_ext && !tc_ext->act_miss ?
> > > +				 tc_ext->chain : 0;
> > >   		OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0;
> > >   		post_ct = tc_ext ? tc_ext->post_ct : false;
> > >   		post_ct_snat = post_ct ? tc_ext->post_ct_snat : false;
> > > diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> > > index cd09ef49df22..16fd3d30eb12 100644
> > > --- a/net/sched/act_api.c
> > > +++ b/net/sched/act_api.c
> > > @@ -272,7 +272,7 @@ static int tcf_action_offload_add_ex(struct tc_action *action,
> > >   	if (err)
> > >   		goto fl_err;
> > > -	err = tc_setup_action(&fl_action->action, actions, extack);
> > > +	err = tc_setup_action(&fl_action->action, actions, 0, extack);
> > >   	if (err) {
> > >   		NL_SET_ERR_MSG_MOD(extack,
> > >   				   "Failed to setup tc actions for offload");
> > > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > > index 5b4a95e8a1ee..8ff9530fef68 100644
> > > --- a/net/sched/cls_api.c
> > > +++ b/net/sched/cls_api.c
> > > @@ -22,6 +22,7 @@
> > >   #include <linux/idr.h>
> > >   #include <linux/jhash.h>
> > >   #include <linux/rculist.h>
> > > +#include <linux/rhashtable.h>
> > >   #include <net/net_namespace.h>
> > >   #include <net/sock.h>
> > >   #include <net/netlink.h>
> > > @@ -50,6 +51,109 @@ static LIST_HEAD(tcf_proto_base);
> > >   /* Protects list of registered TC modules. It is pure SMP lock. */
> > >   static DEFINE_RWLOCK(cls_mod_lock);
> > > +static struct xarray tcf_exts_miss_cookies_xa;
> > > +struct tcf_exts_miss_cookie_node {
> > > +	const struct tcf_chain *chain;
> > > +	const struct tcf_proto *tp;
> > > +	const struct tcf_exts *exts;
> > > +	u32 chain_index;
> > > +	u32 tp_prio;
> > > +	u32 handle;
> > > +	u32 miss_cookie_base;
> > > +	struct rcu_head rcu;
> > > +};
> > > +
> > > +/* Each tc action entry cookie will be comprised of 32bit miss_cookie_base +
> > > + * action index in the exts tc actions array.
> > > + */
> > > +union tcf_exts_miss_cookie {
> > > +	struct {
> > > +		u32 miss_cookie_base;
> > > +		u32 act_index;
> > > +	};
> > > +	u64 miss_cookie;
> > > +};
> > > +
> > > +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> > > +static int
> > > +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
> > > +				u32 handle)
> > > +{
> > > +	struct tcf_exts_miss_cookie_node *n;
> > > +	static u32 next;
> > 
> > What protects this static variable from concurrent access?
> 
> 
> Nothing, but it acts as a suggestion for xa_alloc, so I don't
> think synchronization is needed here.
> 
> 
> > 
> > > +	int err;
> > > +
> > > +	if (WARN_ON(!handle || !tp->ops->get_exts))
> > > +		return -EINVAL;
> > > +
> > > +	n = kzalloc(sizeof(*n), GFP_KERNEL);
> > > +	if (!n)
> > > +		return -ENOMEM;
> > > +
> > > +	n->chain_index = tp->chain->index;
> > > +	n->chain = tp->chain;
> > > +	n->tp_prio = tp->prio;
> > > +	n->tp = tp;
> > > +	n->exts = exts;
> > > +	n->handle = handle;
> > > +
> > > +	err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base,
> > > +			      n, xa_limit_32b, &next, GFP_KERNEL);
> > > +	if (err)
> > > +		goto err_xa_alloc;
> > > +
> > > +	exts->miss_cookie_node = n;
> > > +	return 0;
> > > +
> > > +err_xa_alloc:
> > > +	kfree(n);
> > > +	return err;
> > > +}
> > > +
> > > +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
> > > +{
> > > +	struct tcf_exts_miss_cookie_node *n;
> > > +
> > > +	if (!exts->miss_cookie_node)
> > > +		return;
> > > +
> > > +	n = exts->miss_cookie_node;
> > > +	xa_erase(&tcf_exts_miss_cookies_xa, n->miss_cookie_base);
> > > +	kfree_rcu(n, rcu);
> > > +}
> > > +
> > > +static struct tcf_exts_miss_cookie_node *
> > > +tcf_exts_miss_cookie_lookup(u64 miss_cookie, int *act_index)
> > > +{
> > > +	union tcf_exts_miss_cookie mc = { .miss_cookie = miss_cookie, };
> > > +
> > > +	*act_index = mc.act_index;
> > > +	return xa_load(&tcf_exts_miss_cookies_xa, mc.miss_cookie_base);
> > > +}
> > > +#else /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
> > > +static int
> > > +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp,
> > > +				u32 handle)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts)
> > > +{
> > > +}
> > > +#endif /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */
> > > +
> > > +static u64 tcf_exts_miss_cookie_get(u32 miss_cookie_base, int act_index)
> > > +{
> > > +	union tcf_exts_miss_cookie mc = { .act_index = act_index, };
> > > +
> > > +	if (!miss_cookie_base)
> > > +		return 0;
> > > +
> > > +	mc.miss_cookie_base = miss_cookie_base;
> > > +	return mc.miss_cookie;
> > > +}
> > > +
> > >   #ifdef CONFIG_NET_CLS_ACT
> > >   DEFINE_STATIC_KEY_FALSE(tc_skb_ext_tc);
> > >   EXPORT_SYMBOL(tc_skb_ext_tc);
> > > @@ -1549,6 +1653,8 @@ static inline int __tcf_classify(struct sk_buff *skb,
> > >   				 const struct tcf_proto *orig_tp,
> > >   				 struct tcf_result *res,
> > >   				 bool compat_mode,
> > > +				 struct tcf_exts_miss_cookie_node *n,
> > > +				 int act_index,
> > >   				 u32 *last_executed_chain)
> > >   {
> > >   #ifdef CONFIG_NET_CLS_ACT
> > > @@ -1560,13 +1666,36 @@ static inline int __tcf_classify(struct sk_buff *skb,
> > >   #endif
> > >   	for (; tp; tp = rcu_dereference_bh(tp->next)) {
> > >   		__be16 protocol = skb_protocol(skb, false);
> > > -		int err;
> > > +		int err = 0;
> > > -		if (tp->protocol != protocol &&
> > > -		    tp->protocol != htons(ETH_P_ALL))
> > > -			continue;
> > > +		if (n) {
> > > +			struct tcf_exts *exts;
> > > +
> > > +			if (n->tp_prio != tp->prio)
> > > +				continue;
> > > +
> > > +			/* We re-lookup the tp and chain based on index instead
> > > +			 * of having hard refs and locks to them, so do a sanity
> > > +			 * check if any of tp,chain,exts was replaced by the
> > > +			 * time we got here with a cookie from hardware.
> > > +			 */
> > > +			if (unlikely(n->tp != tp || n->tp->chain != n->chain ||
> > > +				     !tp->ops->get_exts))
> > > +				return TC_ACT_SHOT;
> > > +
> > > +			exts = tp->ops->get_exts(tp, n->handle);
> > > +			if (unlikely(!exts || n->exts != exts))
> > > +				return TC_ACT_SHOT;
> > > -		err = tc_classify(skb, tp, res);
> > > +			n = NULL;
> > > +			err = tcf_exts_exec_ex(skb, exts, act_index, res);
> > > +		} else {
> > > +			if (tp->protocol != protocol &&
> > > +			    tp->protocol != htons(ETH_P_ALL))
> > > +				continue;
> > > +
> > > +			err = tc_classify(skb, tp, res);
> > > +		}
> > >   #ifdef CONFIG_NET_CLS_ACT
> > >   		if (unlikely(err == TC_ACT_RECLASSIFY && !compat_mode)) {
> > >   			first_tp = orig_tp;
> > > @@ -1582,6 +1711,9 @@ static inline int __tcf_classify(struct sk_buff *skb,
> > >   			return err;
> > >   	}
> > > +	if (unlikely(n))
> > > +		return TC_ACT_SHOT;
> > > +
> > >   	return TC_ACT_UNSPEC; /* signal: continue lookup */
> > >   #ifdef CONFIG_NET_CLS_ACT
> > >   reset:
> > > @@ -1606,21 +1738,33 @@ int tcf_classify(struct sk_buff *skb,
> > >   #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> > >   	u32 last_executed_chain = 0;
> > > -	return __tcf_classify(skb, tp, tp, res, compat_mode,
> > > +	return __tcf_classify(skb, tp, tp, res, compat_mode, NULL, 0,
> > >   			      &last_executed_chain);
> > >   #else
> > >   	u32 last_executed_chain = tp ? tp->chain->index : 0;
> > > +	struct tcf_exts_miss_cookie_node *n = NULL;
> > >   	const struct tcf_proto *orig_tp = tp;
> > >   	struct tc_skb_ext *ext;
> > > +	int act_index = 0;
> > >   	int ret;
> > >   	if (block) {
> > >   		ext = skb_ext_find(skb, TC_SKB_EXT);
> > > -		if (ext && ext->chain) {
> > > +		if (ext && (ext->chain || ext->act_miss)) {
> > >   			struct tcf_chain *fchain;
> > > +			u32 chain = ext->chain;
> > 
> > IMHO it would be nice to avoid this assignment here, because there is
> > nothing above saying that the union value can be read as a chain. For
> > C, yes, it is okay. The worry is about ensuring that whatever is
> > reading it, is reading what it is expecting it to be.
> > Perhaps just declare it here, and then:
> 
> Right, not nice, will fix.
> 
> > 
> > > -			fchain = tcf_chain_lookup_rcu(block, ext->chain);
> > > +			if (ext->act_miss) {
> > > +				n = tcf_exts_miss_cookie_lookup(ext->act_miss_cookie,
> > > +								&act_index);
> > > +				if (!n)
> > > +					return TC_ACT_SHOT;
> > > +
> > > +				chain = n->chain_index;
> > > +			}
> > 
> > 			else {
> > 				chain = ext->chain;
> > 			}
> > 
> > > +
> > > +			fchain = tcf_chain_lookup_rcu(block, chain);
> > >   			if (!fchain)
> > >   				return TC_ACT_SHOT;
> > > @@ -1632,7 +1776,7 @@ int tcf_classify(struct sk_buff *skb,
> > >   		}
> > >   	}
> > > -	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode,
> > > +	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, n, act_index,
> > >   			     &last_executed_chain);
> > >   	if (tc_skb_ext_tc_enabled()) {
> > > @@ -3056,9 +3200,48 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
> > >   	return skb->len;
> > >   }
> > > +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action,
> > > +		     int police, struct tcf_proto *tp, u32 handle,
> > > +		     bool use_action_miss)
> > > +{
> > > +	int err = 0;
> > > +
> > > +#ifdef CONFIG_NET_CLS_ACT
> > > +	exts->type = 0;
> > > +	exts->nr_actions = 0;
> > > +	/* Note: we do not own yet a reference on net.
> > > +	 * This reference might be taken later from tcf_exts_get_net().
> > > +	 */
> > > +	exts->net = net;
> > > +	exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *),
> > > +				GFP_KERNEL);
> > > +	if (!exts->actions)
> > > +		return -ENOMEM;
> > > +#endif
> > > +
> > > +	exts->action = action;
> > > +	exts->police = police;
> > > +
> > > +	if (!use_action_miss)
> > > +		return 0;
> > > +
> > > +	err = tcf_exts_miss_cookie_base_alloc(exts, tp, handle);
> > > +	if (err)
> > > +		goto err_miss_alloc;
> > > +
> > > +	return 0;
> > > +
> > > +err_miss_alloc:
> > > +	tcf_exts_destroy(exts);
> > > +	return err;
> > > +}
> > > +EXPORT_SYMBOL(tcf_exts_init_ex);
> > > +
> > >   void tcf_exts_destroy(struct tcf_exts *exts)
> > >   {
> > >   #ifdef CONFIG_NET_CLS_ACT
> > > +	tcf_exts_miss_cookie_base_destroy(exts);
> > > +
> > >   	if (exts->actions) {
> > >   		tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
> > >   		kfree(exts->actions);
> > > @@ -3547,6 +3730,7 @@ static int tc_setup_offload_act(struct tc_action *act,
> > >   int tc_setup_action(struct flow_action *flow_action,
> > >   		    struct tc_action *actions[],
> > > +		    u32 miss_cookie_base,
> > >   		    struct netlink_ext_ack *extack)
> > >   {
> > >   	int i, j, k, index, err = 0;
> > > @@ -3577,6 +3761,8 @@ int tc_setup_action(struct flow_action *flow_action,
> > >   		for (k = 0; k < index ; k++) {
> > >   			entry[k].hw_stats = tc_act_hw_stats(act->hw_stats);
> > >   			entry[k].hw_index = act->tcfa_index;
> > > +			entry[k].miss_cookie =
> > > +				tcf_exts_miss_cookie_get(miss_cookie_base, i);
> > >   		}
> > >   		j += index;
> > > @@ -3599,10 +3785,15 @@ int tc_setup_offload_action(struct flow_action *flow_action,
> > >   			    struct netlink_ext_ack *extack)
> > >   {
> > >   #ifdef CONFIG_NET_CLS_ACT
> > > +	u32 miss_cookie_base;
> > > +
> > >   	if (!exts)
> > >   		return 0;
> > > -	return tc_setup_action(flow_action, exts->actions, extack);
> > > +	miss_cookie_base = exts->miss_cookie_node ?
> > > +			   exts->miss_cookie_node->miss_cookie_base : 0;
> > > +	return tc_setup_action(flow_action, exts->actions, miss_cookie_base,
> > > +			       extack);
> > >   #else
> > >   	return 0;
> > >   #endif
> > > @@ -3770,6 +3961,8 @@ static int __init tc_filter_init(void)
> > >   	if (err)
> > >   		goto err_register_pernet_subsys;
> > > +	xa_init_flags(&tcf_exts_miss_cookies_xa, XA_FLAGS_ALLOC1);
> > > +
> > >   	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
> > >   		      RTNL_FLAG_DOIT_UNLOCKED);
> > >   	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
> > > -- 
> > > 2.30.1
> > > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-13 18:43       ` Marcelo Ricardo Leitner
@ 2023-02-14 12:14         ` Paul Blakey
  2023-02-14 12:31           ` Oz Shlomo
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Blakey @ 2023-02-14 12:14 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Oz Shlomo, Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman


On 13/02/2023 20:43, Marcelo Ricardo Leitner wrote:
> On Mon, Feb 13, 2023 at 06:13:34PM +0200, Paul Blakey wrote:
>>
>> On 10/02/2023 04:21, Marcelo Ricardo Leitner wrote:
>>> On Mon, Feb 06, 2023 at 07:43:57PM +0200, Paul Blakey wrote:
>>>> For drivers to support partial offload of a filter's action list,
>>>> add support for action miss to specify an action instance to
>>>> continue from in sw.
>>>>
>>>> CT action in particular can't be fully offloaded, as new connections
>>>> need to be handled in software. This imposes other limitations on
>>>> the actions that can be offloaded together with the CT action, such
>>>> as packet modifications.
>>>>
>>>> Assign each action on a filter's action list a unique miss_cookie
>>>> which drivers can then use to fill action_miss part of the tc skb
>>>> extension. On getting back this miss_cookie, find the action
>>>> instance with relevant cookie and continue classifying from there.
>>>>
>>>> Signed-off-by: Paul Blakey <paulb@nvidia.com>
>>>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>>>> Reviewed-by: Simon Horman <simon.horman@corigine.com>
>>>> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>>>> ---
>>>>   include/linux/skbuff.h     |   6 +-
>>>>   include/net/flow_offload.h |   1 +
>>>>   include/net/pkt_cls.h      |  34 +++---
>>>>   include/net/sch_generic.h  |   2 +
>>>>   net/openvswitch/flow.c     |   3 +-
>>>>   net/sched/act_api.c        |   2 +-
>>>>   net/sched/cls_api.c        | 213 +++++++++++++++++++++++++++++++++++--
>>>>   7 files changed, 234 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 1fa95b916342..9b9aa854068f 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -311,12 +311,16 @@ struct nf_bridge_info {
>>>>    * and read by ovs to recirc_id.
>>>>    */
>>>>   struct tc_skb_ext {
>>>> -	__u32 chain;
>>>> +	union {
>>>> +		u64 act_miss_cookie;
>>>> +		__u32 chain;
>>>> +	};
>>>>   	__u16 mru;
>>>>   	__u16 zone;
>>>>   	u8 post_ct:1;
>>>>   	u8 post_ct_snat:1;
>>>>   	u8 post_ct_dnat:1;
>>>> +	u8 act_miss:1; /* Set if act_miss_cookie is used */
>>>>   };
>>>>   #endif
>>>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>>>> index 0400a0ac8a29..88db7346eb7a 100644
>>>> --- a/include/net/flow_offload.h
>>>> +++ b/include/net/flow_offload.h
>>>> @@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie);
>>>>   struct flow_action_entry {
>>>>   	enum flow_action_id		id;
>>>>   	u32				hw_index;
>>>> +	u64				miss_cookie;
>>> The per-action stats patchset is adding a cookie for the actions as
>>> well, and exactly on this struct:
>>>
>>> @@ -228,6 +228,7 @@ struct flow_action_cookie *flow_action_cookie_create(void *data,
>>>   struct flow_action_entry {
>>>          enum flow_action_id             id;
>>>          u32                             hw_index;
>>> +       unsigned long                   act_cookie;
>>>          enum flow_action_hw_stats       hw_stats;
>>>          action_destr                    destructor;
>>>          void                            *destructor_priv;
>>>
>>> There, it is a simple value: the act pointer itself. Here, it is already more
>>> complex. Can they be merged into only one, maybe?
>>> If not, perhaps act_cookie should be renamed to stats_cookie then.
>> I don't think it can be shared: actions can be shared between multiple
>> filters, while the miss cookie would be different for each used instance
>> (it takes the filter into account).
> Good point. So it would at best be a masked value where part A works
> for the miss here and part B for the stats, which is pretty much what
> the two cookies are giving, just without having to do bit gymnastics,
> yes.

The act cookie uses 64 bits (to store the pointer and avoid a mapping), and I'm using at least
32 bits, so there is no simple type that will contain both.

So I'll rename the act_cookie to stats_cookie once I rebase.
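
To make the size argument concrete: a pointer-sized act cookie (64 bits on 64-bit builds) plus at least 32 bits of miss cookie needs more than 64 bits, so the two cannot share a single u64. The following is a minimal userspace sketch, not the kernel code. It assumes the miss cookie is built from the u32 miss_cookie_base and the action's index on the filter's action list; the helper names and the exact bit layout are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical layout (an assumption for illustration only):
 *   bits 63..32  per-filter miss_cookie_base
 *   bits 31..0   index of the action on the filter's action list
 */
static inline uint64_t sketch_miss_cookie_pack(uint32_t base, uint32_t act_index)
{
	return ((uint64_t)base << 32) | act_index;
}

/* Recover the per-filter base from a packed miss cookie. */
static inline uint32_t sketch_miss_cookie_base(uint64_t cookie)
{
	return (uint32_t)(cookie >> 32);
}

/* Recover the action's position on the filter's action list. */
static inline uint32_t sketch_miss_cookie_act_index(uint64_t cookie)
{
	return (uint32_t)(cookie & 0xffffffffu);
}
```

With this layout the whole u64 is already consumed by the miss cookie, which is why a separate field is needed for the pointer-based act cookie.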




* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-14 12:14         ` Paul Blakey
@ 2023-02-14 12:31           ` Oz Shlomo
  2023-02-14 18:48             ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Oz Shlomo @ 2023-02-14 12:31 UTC (permalink / raw)
  To: Paul Blakey, Marcelo Ricardo Leitner
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman


On 14/02/2023 14:14, Paul Blakey wrote:
> On 13/02/2023 20:43, Marcelo Ricardo Leitner wrote:
>> On Mon, Feb 13, 2023 at 06:13:34PM +0200, Paul Blakey wrote:
>>> On 10/02/2023 04:21, Marcelo Ricardo Leitner wrote:
>>>> On Mon, Feb 06, 2023 at 07:43:57PM +0200, Paul Blakey wrote:
>>>>> For drivers to support partial offload of a filter's action list,
>>>>> add support for action miss to specify an action instance to
>>>>> continue from in sw.
>>>>>
>>>>> CT action in particular can't be fully offloaded, as new connections
>>>>> need to be handled in software. This imposes other limitations on
>>>>> the actions that can be offloaded together with the CT action, such
>>>>> as packet modifications.
>>>>>
>>>>> Assign each action on a filter's action list a unique miss_cookie
>>>>> which drivers can then use to fill action_miss part of the tc skb
>>>>> extension. On getting back this miss_cookie, find the action
>>>>> instance with relevant cookie and continue classifying from there.
>>>>>
>>>>> Signed-off-by: Paul Blakey <paulb@nvidia.com>
>>>>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>>>>> Reviewed-by: Simon Horman <simon.horman@corigine.com>
>>>>> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>>>>> ---
>>>>>    include/linux/skbuff.h     |   6 +-
>>>>>    include/net/flow_offload.h |   1 +
>>>>>    include/net/pkt_cls.h      |  34 +++---
>>>>>    include/net/sch_generic.h  |   2 +
>>>>>    net/openvswitch/flow.c     |   3 +-
>>>>>    net/sched/act_api.c        |   2 +-
>>>>>    net/sched/cls_api.c        | 213 +++++++++++++++++++++++++++++++++++--
>>>>>    7 files changed, 234 insertions(+), 27 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>> index 1fa95b916342..9b9aa854068f 100644
>>>>> --- a/include/linux/skbuff.h
>>>>> +++ b/include/linux/skbuff.h
>>>>> @@ -311,12 +311,16 @@ struct nf_bridge_info {
>>>>>     * and read by ovs to recirc_id.
>>>>>     */
>>>>>    struct tc_skb_ext {
>>>>> -	__u32 chain;
>>>>> +	union {
>>>>> +		u64 act_miss_cookie;
>>>>> +		__u32 chain;
>>>>> +	};
>>>>>    	__u16 mru;
>>>>>    	__u16 zone;
>>>>>    	u8 post_ct:1;
>>>>>    	u8 post_ct_snat:1;
>>>>>    	u8 post_ct_dnat:1;
>>>>> +	u8 act_miss:1; /* Set if act_miss_cookie is used */
>>>>>    };
>>>>>    #endif
>>>>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>>>>> index 0400a0ac8a29..88db7346eb7a 100644
>>>>> --- a/include/net/flow_offload.h
>>>>> +++ b/include/net/flow_offload.h
>>>>> @@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie);
>>>>>    struct flow_action_entry {
>>>>>    	enum flow_action_id		id;
>>>>>    	u32				hw_index;
>>>>> +	u64				miss_cookie;
>>>> The per-action stats patchset is adding a cookie for the actions as
>>>> well, and exactly on this struct:
>>>>
>>>> @@ -228,6 +228,7 @@ struct flow_action_cookie *flow_action_cookie_create(void *data,
>>>>    struct flow_action_entry {
>>>>           enum flow_action_id             id;
>>>>           u32                             hw_index;
>>>> +       unsigned long                   act_cookie;
>>>>           enum flow_action_hw_stats       hw_stats;
>>>>           action_destr                    destructor;
>>>>           void                            *destructor_priv;
>>>>
>>>> There, it is a simple value: the act pointer itself. Here, it is already more
>>>> complex. Can they be merged into only one, maybe?
>>>> If not, perhaps act_cookie should be renamed to stats_cookie then.
>>> I don't think it can be shared: actions can be shared between multiple
>>> filters, while the miss cookie would be different for each used instance
>>> (it takes the filter into account).
>> Good point. So it would at best be a masked value where part A works
>> for the miss here and part B for the stats, which is pretty much what
>> the two cookies are giving, just without having to do bit gymnastics,
>> yes.
> The act cookie uses 64 bits (to store the pointer and avoid a mapping), and I'm using at least
> 32 bits, so there is no simple type that will contain both.
>
> So I'll rename the act_cookie to stats_cookie once I rebase.
>
The current act_cookie uniquely identifies the action instance.

I think it might be used in other use cases and not just for stats.

Actually, I think the current naming scheme of act_cookie and 
miss_cookie makes sense.



* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-14 12:31           ` Oz Shlomo
@ 2023-02-14 18:48             ` Marcelo Ricardo Leitner
  2023-02-14 19:24               ` Edward Cree
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-02-14 18:48 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman

On Tue, Feb 14, 2023 at 02:31:06PM +0200, Oz Shlomo wrote:
> 
> On 14/02/2023 14:14, Paul Blakey wrote:
> > On 13/02/2023 20:43, Marcelo Ricardo Leitner wrote:
> > > On Mon, Feb 13, 2023 at 06:13:34PM +0200, Paul Blakey wrote:
> > > > On 10/02/2023 04:21, Marcelo Ricardo Leitner wrote:
> > > > > On Mon, Feb 06, 2023 at 07:43:57PM +0200, Paul Blakey wrote:
> > > > > > For drivers to support partial offload of a filter's action list,
> > > > > > add support for action miss to specify an action instance to
> > > > > > continue from in sw.
> > > > > > 
> > > > > > CT action in particular can't be fully offloaded, as new connections
> > > > > > need to be handled in software. This imposes other limitations on
> > > > > > the actions that can be offloaded together with the CT action, such
> > > > > > as packet modifications.
> > > > > > 
> > > > > > Assign each action on a filter's action list a unique miss_cookie
> > > > > > which drivers can then use to fill action_miss part of the tc skb
> > > > > > extension. On getting back this miss_cookie, find the action
> > > > > > instance with relevant cookie and continue classifying from there.
> > > > > > 
> > > > > > Signed-off-by: Paul Blakey <paulb@nvidia.com>
> > > > > > Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> > > > > > Reviewed-by: Simon Horman <simon.horman@corigine.com>
> > > > > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > > > ---
> > > > > >    include/linux/skbuff.h     |   6 +-
> > > > > >    include/net/flow_offload.h |   1 +
> > > > > >    include/net/pkt_cls.h      |  34 +++---
> > > > > >    include/net/sch_generic.h  |   2 +
> > > > > >    net/openvswitch/flow.c     |   3 +-
> > > > > >    net/sched/act_api.c        |   2 +-
> > > > > >    net/sched/cls_api.c        | 213 +++++++++++++++++++++++++++++++++++--
> > > > > >    7 files changed, 234 insertions(+), 27 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > > index 1fa95b916342..9b9aa854068f 100644
> > > > > > --- a/include/linux/skbuff.h
> > > > > > +++ b/include/linux/skbuff.h
> > > > > > @@ -311,12 +311,16 @@ struct nf_bridge_info {
> > > > > >     * and read by ovs to recirc_id.
> > > > > >     */
> > > > > >    struct tc_skb_ext {
> > > > > > -	__u32 chain;
> > > > > > +	union {
> > > > > > +		u64 act_miss_cookie;
> > > > > > +		__u32 chain;
> > > > > > +	};
> > > > > >    	__u16 mru;
> > > > > >    	__u16 zone;
> > > > > >    	u8 post_ct:1;
> > > > > >    	u8 post_ct_snat:1;
> > > > > >    	u8 post_ct_dnat:1;
> > > > > > +	u8 act_miss:1; /* Set if act_miss_cookie is used */
> > > > > >    };
> > > > > >    #endif
> > > > > > diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> > > > > > index 0400a0ac8a29..88db7346eb7a 100644
> > > > > > --- a/include/net/flow_offload.h
> > > > > > +++ b/include/net/flow_offload.h
> > > > > > @@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie);
> > > > > >    struct flow_action_entry {
> > > > > >    	enum flow_action_id		id;
> > > > > >    	u32				hw_index;
> > > > > > +	u64				miss_cookie;
> > > > > The per-action stats patchset is adding a cookie for the actions as
> > > > > well, and exactly on this struct:
> > > > > 
> > > > > @@ -228,6 +228,7 @@ struct flow_action_cookie *flow_action_cookie_create(void *data,
> > > > >    struct flow_action_entry {
> > > > >           enum flow_action_id             id;
> > > > >           u32                             hw_index;
> > > > > +       unsigned long                   act_cookie;
> > > > >           enum flow_action_hw_stats       hw_stats;
> > > > >           action_destr                    destructor;
> > > > >           void                            *destructor_priv;
> > > > > 
> > > > > There, it is a simple value: the act pointer itself. Here, it is already more
> > > > > complex. Can they be merged into only one, maybe?
> > > > > If not, perhaps act_cookie should be renamed to stats_cookie then.
> > > > I don't think it can be shared: actions can be shared between multiple
> > > > filters, while the miss cookie would be different for each used instance
> > > > (it takes the filter into account).
> > > Good point. So it would at best be a masked value where part A works
> > > for the miss here and part B for the stats, which is pretty much what
> > > the two cookies are giving, just without having to do bit gymnastics,
> > > yes.
> > The act cookie uses 64 bits (to store the pointer and avoid a mapping), and I'm using at least
> > 32 bits, so there is no simple type that will contain both.
> > 
> > So I'll rename the act_cookie to stats_cookie once I rebase.
> > 
> The current act_cookie uniquely identifies the action instance.
> 

While miss_cookie also uniquely identifies the action, within a filter list.
Or, miss_cookie uniquely identifies an action configuration.
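
As a rough illustration of that distinction, here is a userspace sketch (not kernel code) of one action shared by two filters. The act cookie is the action pointer, as in the per-action stats patchset, so it is the same in both filters. The miss cookie is derived from a per-filter base plus the position on the list, so it differs. All struct and function names, and the bit layout, are hypothetical stand-ins.

```c
#include <stdint.h>

/* Userspace stand-ins; none of these are the kernel's real structures. */
struct sketch_action {
	int kind;
};

struct sketch_entry {
	unsigned long act_cookie; /* same for every filter sharing the action */
	uint64_t miss_cookie;     /* differs per filter and per list position */
};

struct sketch_filter {
	uint32_t miss_cookie_base;        /* assumed unique per filter */
	struct sketch_action *actions[4]; /* the filter's action list */
	unsigned int nr_actions;
};

/* Fill one offload entry per action on the filter's action list. */
static void sketch_fill_entries(const struct sketch_filter *f,
				struct sketch_entry *entries)
{
	unsigned int i;

	for (i = 0; i < f->nr_actions; i++) {
		/* Instance identity: the action pointer itself. */
		entries[i].act_cookie = (unsigned long)(uintptr_t)f->actions[i];
		/* Configuration identity: which filter, which list position. */
		entries[i].miss_cookie =
			((uint64_t)f->miss_cookie_base << 32) | i;
	}
}
```

Two filters that both reference the same ct action would end up with equal act_cookie values but different miss_cookie values, which is why a miss can be resolved to a specific filter and list position.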

> I think it might be used in other use cases and not just for stats.

I don't disagree.

> 
> Actually, I think the current naming scheme of act_cookie and miss_cookie
> makes sense.

act_cookie is actually already used (and differently) in tc_action itself here:
1045ba77a596 ("net sched actions: Add support for user cookies")

With this, IMO it is tough to know which one takes precedence where.

Then perhaps,
act_cookie here -> instance_cookie
miss_cookie -> config_cookie

Sorry for the bikeshedding, btw, but these cookies are getting
confusing. We need them to taste nice :-}


* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-14 18:48             ` Marcelo Ricardo Leitner
@ 2023-02-14 19:24               ` Edward Cree
  2023-02-15 10:09                 ` Paul Blakey
  0 siblings, 1 reply; 21+ messages in thread
From: Edward Cree @ 2023-02-14 19:24 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, Oz Shlomo
  Cc: Paul Blakey, netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman

On 14/02/2023 18:48, Marcelo Ricardo Leitner wrote:
> On Tue, Feb 14, 2023 at 02:31:06PM +0200, Oz Shlomo wrote:
>> Actually, I think the current naming scheme of act_cookie and miss_cookie
>> makes sense.
> 
> Then perhaps,
> act_cookie here -> instance_cookie
> miss_cookie -> config_cookie
> 
> Sorry for the bikeshedding, btw, but these cookies are getting
> confusing. We need them to taste nice :-}

I'm with Oz, keep the current name for act_cookie.

(In my ideal world, it'd just be called cookie, and the existing
 cookie in struct flow_action_entry would be renamed user_cookie.
 Because act_cookie is the same thing conceptually as
 flow_cls_offload.cookie.  Though I wonder if that means it
 belongs in struct flow_offload_action instead?)

-ed


* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-14 19:24               ` Edward Cree
@ 2023-02-15 10:09                 ` Paul Blakey
  2023-02-15 16:14                   ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Blakey @ 2023-02-15 10:09 UTC (permalink / raw)
  To: Edward Cree, Marcelo Ricardo Leitner, Oz Shlomo
  Cc: netdev, Saeed Mahameed, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Cong Wang, David S. Miller,
	Jiri Pirko, Roi Dayan, Vlad Buslov, Simon Horman



On 14/02/2023 21:24, Edward Cree wrote:
> On 14/02/2023 18:48, Marcelo Ricardo Leitner wrote:
>> On Tue, Feb 14, 2023 at 02:31:06PM +0200, Oz Shlomo wrote:
>>> Actually, I think the current naming scheme of act_cookie and miss_cookie
>>> makes sense.
>>
>> Then perhaps,
>> act_cookie here -> instance_cookie
>> miss_cookie -> config_cookie
>>
>> Sorry for the bikeshedding, btw, but these cookies are getting
>> confusing. We need them to taste nice :-}
> 
> I'm with Oz, keep the current name for act_cookie.
> 
> (In my ideal world, it'd just be called cookie, and the existing
>   cookie in struct flow_action_entry would be renamed user_cookie.
>   Because act_cookie is the same thing conceptually as
>   flow_cls_offload.cookie.  Though I wonder if that means it
>   belongs in struct flow_offload_action instead?)
> 
> -ed




Ok so I want to add this patch to the series:


 From 326938812758dbd2591b221452708504911ca419 Mon Sep 17 00:00:00 2001
From: Paul Blakey <paulb@nvidia.com>
Date: Wed, 15 Feb 2023 10:57:40 +0200
Subject: [PATCH] net: sched: Rename user cookie and act cookie

struct tc_action->act_cookie is a user-defined cookie,
and the related struct flow_action_entry->act_cookie is
used as a handle similar to struct flow_cls_offload->cookie.

Rename tc_action->act_cookie and flow_action_entry->cookie
to user_cookie, and flow_action_entry->act_cookie to cookie,
so that their names better fit their usage.

Issue: 3226890
Change-Id: I3cfff2323f50234250e510062fd27307b6aa1896
Signed-off-by: Paul Blakey <paulb@nvidia.com>
---

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 2d06b44..208809a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4180,7 +4180,7 @@

  		parse_state->actions |= attr->action;
  		if (!tc_act->stats_action)
-			attr->tc_act_cookies[attr->tc_act_cookies_count++] = act->act_cookie;
+			attr->tc_act_cookies[attr->tc_act_cookies_count++] = act->cookie;

  		/* Split attr for multi table act if not the last act. */
  		if (jump_state.jump_target ||
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 2a6f443..4ae0580 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -39,7 +39,7 @@
  	struct gnet_stats_basic_sync __percpu *cpu_bstats;
  	struct gnet_stats_basic_sync __percpu *cpu_bstats_hw;
  	struct gnet_stats_queue __percpu *cpu_qstats;
-	struct tc_cookie	__rcu *act_cookie;
+	struct tc_cookie	__rcu *user_cookie;
  	struct tcf_chain	__rcu *goto_chain;
  	u32			tcfa_flags;
  	u8			hw_stats;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 8c05455..9c5cb12 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -228,7 +228,7 @@
  struct flow_action_entry {
  	enum flow_action_id		id;
  	u32				hw_index;
-	unsigned long			act_cookie;
+	unsigned long			cookie;
  	enum flow_action_hw_stats	hw_stats;
  	action_destr			destructor;
  	void				*destructor_priv;
@@ -321,7 +321,7 @@
  			u16		sid;
  		} pppoe;
  	};
-	struct flow_action_cookie *cookie; /* user defined action cookie */
+	struct flow_action_cookie *user_cookie; /* user defined action cookie */
  };

  struct flow_action {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index eda58b7..e67ebc9 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -125,7 +125,7 @@
  	free_percpu(p->cpu_bstats_hw);
  	free_percpu(p->cpu_qstats);

-	tcf_set_action_cookie(&p->act_cookie, NULL);
+	tcf_set_action_cookie(&p->user_cookie, NULL);
  	if (chain)
  		tcf_chain_put_by_act(chain);

@@ -431,14 +431,14 @@

  static size_t tcf_action_shared_attrs_size(const struct tc_action *act)
  {
-	struct tc_cookie *act_cookie;
+	struct tc_cookie *user_cookie;
  	u32 cookie_len = 0;

  	rcu_read_lock();
-	act_cookie = rcu_dereference(act->act_cookie);
+	user_cookie = rcu_dereference(act->user_cookie);

-	if (act_cookie)
-		cookie_len = nla_total_size(act_cookie->len);
+	if (user_cookie)
+		cookie_len = nla_total_size(user_cookie->len);
  	rcu_read_unlock();

  	return  nla_total_size(0) /* action number nested */
@@ -488,7 +488,7 @@
  		goto nla_put_failure;

  	rcu_read_lock();
-	cookie = rcu_dereference(a->act_cookie);
+	cookie = rcu_dereference(a->user_cookie);
  	if (cookie) {
  		if (nla_put(skb, TCA_ACT_COOKIE, cookie->len, cookie->data)) {
  			rcu_read_unlock();
@@ -1362,9 +1362,9 @@
  {
  	bool police = flags & TCA_ACT_FLAGS_POLICE;
  	struct nla_bitfield32 userflags = { 0, 0 };
+	struct tc_cookie *user_cookie = NULL;
  	u8 hw_stats = TCA_ACT_HW_STATS_ANY;
  	struct nlattr *tb[TCA_ACT_MAX + 1];
-	struct tc_cookie *cookie = NULL;
  	struct tc_action *a;
  	int err;

@@ -1375,8 +1375,8 @@
  		if (err < 0)
  			return ERR_PTR(err);
  		if (tb[TCA_ACT_COOKIE]) {
-			cookie = nla_memdup_cookie(tb);
-			if (!cookie) {
+			user_cookie = nla_memdup_cookie(tb);
+			if (!user_cookie) {
  				NL_SET_ERR_MSG(extack, "No memory to generate TC cookie");
  				err = -ENOMEM;
  				goto err_out;
@@ -1402,7 +1402,7 @@
  	*init_res = err;

  	if (!police && tb[TCA_ACT_COOKIE])
-		tcf_set_action_cookie(&a->act_cookie, cookie);
+		tcf_set_action_cookie(&a->user_cookie, user_cookie);

  	if (!police)
  		a->hw_stats = hw_stats;
@@ -1410,9 +1410,9 @@
  	return a;

  err_out:
-	if (cookie) {
-		kfree(cookie->data);
-		kfree(cookie);
+	if (user_cookie) {
+		kfree(user_cookie->data);
+		kfree(user_cookie);
  	}
  	return ERR_PTR(err);
  }
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bfabc9c..656049e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3490,28 +3490,28 @@
  }
  EXPORT_SYMBOL(tc_setup_cb_reoffload);

-static int tcf_act_get_cookie(struct flow_action_entry *entry,
-			      const struct tc_action *act)
+static int tcf_act_get_user_cookie(struct flow_action_entry *entry,
+				   const struct tc_action *act)
  {
-	struct tc_cookie *cookie;
+	struct tc_cookie *user_cookie;
  	int err = 0;

  	rcu_read_lock();
-	cookie = rcu_dereference(act->act_cookie);
-	if (cookie) {
-		entry->cookie = flow_action_cookie_create(cookie->data,
-							  cookie->len,
-							  GFP_ATOMIC);
-		if (!entry->cookie)
+	user_cookie = rcu_dereference(act->user_cookie);
+	if (user_cookie) {
+		entry->user_cookie = flow_action_cookie_create(user_cookie->data,
+							       user_cookie->len,
+							       GFP_ATOMIC);
+		if (!entry->user_cookie)
  			err = -ENOMEM;
  	}
  	rcu_read_unlock();
  	return err;
  }

-static void tcf_act_put_cookie(struct flow_action_entry *entry)
+static void tcf_act_put_user_cookie(struct flow_action_entry *entry)
  {
-	flow_action_cookie_destroy(entry->cookie);
+	flow_action_cookie_destroy(entry->user_cookie);
  }

  void tc_cleanup_offload_action(struct flow_action *flow_action)
@@ -3520,7 +3520,7 @@
  	int i;

  	flow_action_for_each(i, entry, flow_action) {
-		tcf_act_put_cookie(entry);
+		tcf_act_put_user_cookie(entry);
  		if (entry->destructor)
  			entry->destructor(entry->destructor_priv);
  	}
@@ -3565,7 +3565,7 @@

  		entry = &flow_action->entries[j];
  		spin_lock_bh(&act->tcfa_lock);
-		err = tcf_act_get_cookie(entry, act);
+		err = tcf_act_get_user_cookie(entry, act);
  		if (err)
  			goto err_out_locked;

@@ -3577,7 +3577,7 @@
  		for (k = 0; k < index ; k++) {
  			entry[k].hw_stats = tc_act_hw_stats(act->hw_stats);
  			entry[k].hw_index = act->tcfa_index;
-			entry[k].act_cookie = (unsigned long)act;
+			entry[k].cookie = (unsigned long)act;
  		}

  		j += index;



* Re: [PATCH net-next v9 1/7] net/sched: cls_api: Support hardware miss to tc action
  2023-02-15 10:09                 ` Paul Blakey
@ 2023-02-15 16:14                   ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-02-15 16:14 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Edward Cree, Oz Shlomo, netdev, Saeed Mahameed, Paolo Abeni,
	Jakub Kicinski, Eric Dumazet, Jamal Hadi Salim, Cong Wang,
	David S. Miller, Jiri Pirko, Roi Dayan, Vlad Buslov,
	Simon Horman

On Wed, Feb 15, 2023 at 12:09:51PM +0200, Paul Blakey wrote:
> 
> 
> On 14/02/2023 21:24, Edward Cree wrote:
> > On 14/02/2023 18:48, Marcelo Ricardo Leitner wrote:
> > > On Tue, Feb 14, 2023 at 02:31:06PM +0200, Oz Shlomo wrote:
> > > > Actually, I think the current naming scheme of act_cookie and miss_cookie
> > > > makes sense.
> > > 
> > > Then perhaps,
> > > act_cookie here -> instance_cookie
> > > miss_cookie -> config_cookie
> > > 
> > > Sorry for the bikeshedding, btw, but these cookies are getting
> > > confusing. We need them to taste nice :-}
> > 
> > I'm with Oz, keep the current name for act_cookie.
> > 
> > (In my ideal world, it'd just be called cookie, and the existing
> >   cookie in struct flow_action_entry would be renamed user_cookie.
> >   Because act_cookie is the same thing conceptually as
> >   flow_cls_offload.cookie.  Though I wonder if that means it
> >   belongs in struct flow_offload_action instead?)
> > 
> > -ed
> 
> 
> 
> 
> Ok so I want to add this patch to the series:
> 
> 
> From 326938812758dbd2591b221452708504911ca419 Mon Sep 17 00:00:00 2001
> From: Paul Blakey <paulb@nvidia.com>
> Date: Wed, 15 Feb 2023 10:57:40 +0200
> Subject: [PATCH] net: sched: Rename user cookie and act cookie
> 
> struct tc_action->act_cookie is a user-defined cookie,
> and the related struct flow_action_entry->act_cookie is
> used as a handle similar to struct flow_cls_offload->cookie.
> 
> Rename tc_action->act_cookie and flow_action_entry->cookie
> to user_cookie, and flow_action_entry->act_cookie to cookie,
> so that their names better fit their usage.

Makes sense. This helps a lot already.
(I didn't review the patch carefully yet, but it seems good)
Thanks!


end of thread, other threads:[~2023-02-15 16:14 UTC | newest]

Thread overview: 21+ messages
2023-02-06 17:43 [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Paul Blakey
2023-02-06 17:43 ` [PATCH net-next v9 1/7] " Paul Blakey
2023-02-10  2:21   ` Marcelo Ricardo Leitner
2023-02-13 16:13     ` Paul Blakey
2023-02-13 18:43       ` Marcelo Ricardo Leitner
2023-02-14 12:14         ` Paul Blakey
2023-02-14 12:31           ` Oz Shlomo
2023-02-14 18:48             ` Marcelo Ricardo Leitner
2023-02-14 19:24               ` Edward Cree
2023-02-15 10:09                 ` Paul Blakey
2023-02-15 16:14                   ` Marcelo Ricardo Leitner
2023-02-06 17:43 ` [PATCH net-next v9 2/7] net/sched: flower: Move filter handle initialization earlier Paul Blakey
2023-02-06 17:43 ` [PATCH net-next v9 3/7] net/sched: flower: Support hardware miss to tc action Paul Blakey
2023-02-06 17:44 ` [PATCH net-next v9 4/7] net/mlx5: Kconfig: Make tc offload depend on tc skb extension Paul Blakey
2023-02-10  2:29   ` Marcelo Ricardo Leitner
2023-02-06 17:44 ` [PATCH net-next v9 5/7] net/mlx5: Refactor tc miss handling to a single function Paul Blakey
2023-02-06 17:44 ` [PATCH net-next v9 6/7] net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG Paul Blakey
2023-02-06 17:44 ` [PATCH net-next v9 7/7] net/mlx5e: TC, Set CT miss to the specific ct action instance Paul Blakey
2023-02-10  1:56 ` [PATCH net-next v9 0/7] net/sched: cls_api: Support hardware miss to tc action Marcelo Ricardo Leitner
2023-02-13 16:25   ` Paul Blakey
2023-02-13 18:27     ` Marcelo Ricardo Leitner
