* [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode
From: saeed @ 2020-09-23 22:48 UTC
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Saeed Mahameed

From: Saeed Mahameed <saeedm@nvidia.com>

Hi Dave, Jakub,

This series adds support for connection tracking in NIC mode, along
with a few trivial cleanup patches.
v1->v2:
 - Remove "fixup!" comment from commit message (Jakub)
 - More information and use case description in the tag message
   (Cover-letter) (Jakub)

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---

The following changes since commit 748d1c8a425ec529d541f082ee7a81f6a51fa120:

  Merge branch 'devlink-Use-nla_policy-to-validate-range' (2020-09-22 17:38:42 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2020-09-22

for you to fetch changes up to 987cd5f049a2b5ed46901f6a874040a08d21d31f:

  net/mlx5: remove unreachable return (2020-09-23 15:44:39 -0700)

----------------------------------------------------------------
mlx5-updates-2020-09-22

This series includes the following mlx5 updates:

1) Add support for Connection Tracking offload in NIC mode.
   Supporting CT offload in NIC mode on Mellanox cards is useful for
   scenarios where a dual-port NIC serves as a gateway between two
   networks, forwarding traffic between them.

   Since the traffic is not terminated on the host in this case, there
   is no need for SRIOV VFs or switchdev mode.

   Mellanox NIC cards already support offloading packet forwarding
   between the physical ports without going through the host, so
   combining this with CT offload allows users to create a gateway with
   forwarding and CT (including NAT) offload capabilities in
   non-switchdev mode.

   To support connection tracking in non-switchdev (single NIC) mode, we
   reuse the current connection tracking infrastructure, implemented on
   top of the E-Switch and the mlx5 generic flow table chains APIs, and
   make it work on non-E-Switch steering domains such as the NIC RX
   domain. To that end, the following was performed (an illustrative
   sketch of the resulting API appears after the list):

 1.1) Refactor the current flow steering chains infrastructure and
      update the TC NIC mode implementation to use flow table chains.
 1.2) Refactor the current Connection Tracking (CT) infrastructure to
      not assume an E-Switch backend, making the CT layer agnostic to
      the underlying steering mode (E-Switch/NIC).
 1.3) Add the plumbing to support CT offload in NIC mode.

2) Trivial code cleanups.
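
For illustration only (not part of the series), below is a minimal
sketch of how a consumer such as the CT layer uses the namespace
agnostic chains API added by this series. example_use_chains() is a
hypothetical helper; the real callers are in en/tc_ct.c and
eswitch_offloads.c in the patches below:

	/* Sketch: "chains" is the struct mlx5_fs_chains * that the
	 * owning steering domain obtained from mlx5_chains_create().
	 */
	static int example_use_chains(struct mlx5_fs_chains *chains)
	{
		struct mlx5_flow_table *ft, *global_ft;

		/* Refcounted lookup/creation of the flow table backing
		 * (chain 0, prio 1, level 0), in any steering domain.
		 */
		ft = mlx5_chains_get_table(chains, 0, 1, 0);
		if (IS_ERR(ft))
			return PTR_ERR(ft);

		/* Unmanaged "global" tables, e.g. the CT/CT-NAT tables */
		global_ft = mlx5_chains_create_global_table(chains);
		if (IS_ERR(global_ft)) {
			mlx5_chains_put_table(chains, 0, 1, 0);
			return PTR_ERR(global_ft);
		}

		/* ... install flow rules ... */

		mlx5_chains_destroy_global_table(chains, global_ft);
		mlx5_chains_put_table(chains, 0, 1, 0);
		return 0;
	}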

----------------------------------------------------------------
Ariel Levkovich (9):
      net/mlx5: Refactor multi chains and prios support
      net/mlx5: Allow ft level ignore for nic rx tables
      net/mlx5e: Tc nic flows to use mlx5_chains flow tables
      net/mlx5e: Split nic tc flow allocation and creation
      net/mlx5: Refactor tc flow attributes structure
      net/mlx5e: Add tc chains offload support for nic flows
      net/mlx5e: rework ct offload init messages
      net/mlx5e: Support CT offload for tc nic flows
      net/mlx5e: Keep direct reference to mlx5_core_dev in tc ct

Denis Efremov (2):
      net/mlx5e: IPsec: Use kvfree() for memory allocated with kvzalloc()
      net/mlx5e: Use kfree() to free fd->g in accel_fs_tcp_create_groups()

Oz Shlomo (1):
      net/mlx5e: CT: Use the same counter for both directions

Pavel Machek (CIP) (1):
      net/mlx5: remove unreachable return

Qinglang Miao (1):
      net/mlx5: simplify the return expression of mlx5_ec_init()

Saeed Mahameed (1):
      net/mlx5e: TC: Remove unused parameter from mlx5_tc_ct_add_no_trk_match()

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c     |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/fs.h    |   7 +-
 .../net/ethernet/mellanox/mlx5/core/en/rep/tc.c    |  22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 525 +++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h |  75 +-
 .../ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c  |   2 +-
 .../mellanox/mlx5/core/en_accel/ipsec_fs.c         |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  10 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 865 +++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |  97 +++
 .../net/ethernet/mellanox/mlx5/core/esw/chains.c   | 944 ---------------------
 .../net/ethernet/mellanox/mlx5/core/esw/chains.h   |  68 --
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  39 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 309 +++++--
 .../mellanox/mlx5/core/eswitch_offloads_termtbl.c  |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   5 +-
 .../net/ethernet/mellanox/mlx5/core/lib/clock.c    |   2 -
 .../ethernet/mellanox/mlx5/core/lib/fs_chains.c    | 911 ++++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/lib/fs_chains.h    |  93 ++
 21 files changed, 2339 insertions(+), 1658 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/esw/chains.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/esw/chains.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.h


* [net-next V2 01/15] net/mlx5: Refactor multi chains and prios support
From: saeed @ 2020-09-23 22:48 UTC
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed, Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

Decouple the chains infrastructure from eswitch and make
it generic to support other steering namespaces.

The change defines an agnostic data structure to keep
all the relevant information for maintaining flow table
chaining in any steering namespace. Each namespace that
requires table chaining must allocate such a data
structure.

The chains creation code receives the steering namespace and flow
table parameters from the caller, so it operates agnostically when
creating the resources required to maintain the table chaining
function. Parts of the code that are specific to eswitch
functionality are moved to eswitch files.
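
As an illustration, a minimal sketch of the caller side, modeled on
the esw_chains_create() hunk in this patch (fdb_max and miss_fdb
stand in for values the calling namespace computes itself):

	struct mlx5_chains_attr attr = {};
	struct mlx5_fs_chains *chains;

	attr.ns = MLX5_FLOW_NAMESPACE_FDB;	/* caller's namespace */
	attr.max_ft_sz = fdb_max;
	attr.max_grp_num = esw->params.large_group_num;
	attr.default_ft = miss_fdb;	/* default miss destination */
	attr.max_restore_tag = esw_get_max_restore_tag(esw);

	chains = mlx5_chains_create(esw->dev, &attr);
	if (IS_ERR(chains))
		return PTR_ERR(chains);

	/* ... mlx5_chains_get_table() / mlx5_chains_put_table() ... */

	mlx5_chains_destroy(chains);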

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../ethernet/mellanox/mlx5/core/en/rep/tc.c   |  16 +-
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  31 +-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   1 -
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  12 +-
 .../ethernet/mellanox/mlx5/core/esw/chains.c  | 944 ------------------
 .../ethernet/mellanox/mlx5/core/esw/chains.h  |  68 --
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  12 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 159 ++-
 .../mellanox/mlx5/core/lib/fs_chains.c        | 902 +++++++++++++++++
 .../mellanox/mlx5/core/lib/fs_chains.h        |  93 ++
 11 files changed, 1174 insertions(+), 1066 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/esw/chains.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/esw/chains.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 0b3eaa102751..9826a041e407 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -37,7 +37,7 @@ mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
 mlx5_core-$(CONFIG_MLX5_ESWITCH)     += lag_mp.o lib/geneve.o lib/port_tun.o \
 					en_rep.o en/rep/bond.o en/mod_hdr.o
 mlx5_core-$(CONFIG_MLX5_CLS_ACT)     += en_tc.o en/rep/tc.o en/rep/neigh.o \
-					en/mapping.o esw/chains.o en/tc_tun.o \
+					en/mapping.o lib/fs_chains.o en/tc_tun.o \
 					en/tc_tun_vxlan.o en/tc_tun_gre.o en/tc_tun_geneve.o \
 					en/tc_tun_mplsoudp.o diag/en_tc_tracepoint.o
 mlx5_core-$(CONFIG_MLX5_TC_CT)	     += en/tc_ct.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
index 79cc42d88eec..771e73f211fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
@@ -12,7 +12,7 @@
 #include "neigh.h"
 #include "en_rep.h"
 #include "eswitch.h"
-#include "esw/chains.h"
+#include "lib/fs_chains.h"
 #include "en/tc_ct.h"
 #include "en/mapping.h"
 #include "en/tc_tun.h"
@@ -191,7 +191,7 @@ static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
 	case TC_SETUP_CLSFLOWER:
 		memcpy(&tmp, f, sizeof(*f));
 
-		if (!mlx5_esw_chains_prios_supported(esw))
+		if (!mlx5_chains_prios_supported(esw_chains(esw)))
 			return -EOPNOTSUPP;
 
 		/* Re-use tc offload path by moving the ft flow to the
@@ -203,12 +203,12 @@ static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
 		 *
 		 * We only support chain 0 of FT offload.
 		 */
-		if (tmp.common.prio >= mlx5_esw_chains_get_prio_range(esw))
+		if (tmp.common.prio >= mlx5_chains_get_prio_range(esw_chains(esw)))
 			return -EOPNOTSUPP;
 		if (tmp.common.chain_index != 0)
 			return -EOPNOTSUPP;
 
-		tmp.common.chain_index = mlx5_esw_chains_get_ft_chain(esw);
+		tmp.common.chain_index = mlx5_chains_get_nf_ft_chain(esw_chains(esw));
 		tmp.common.prio++;
 		err = mlx5e_rep_setup_tc_cls_flower(priv, &tmp, flags);
 		memcpy(&f->stats, &tmp.stats, sizeof(f->stats));
@@ -378,12 +378,12 @@ static int mlx5e_rep_indr_setup_ft_cb(enum tc_setup_type type,
 		 *
 		 * We only support chain 0 of FT offload.
 		 */
-		if (!mlx5_esw_chains_prios_supported(esw) ||
-		    tmp.common.prio >= mlx5_esw_chains_get_prio_range(esw) ||
+		if (!mlx5_chains_prios_supported(esw_chains(esw)) ||
+		    tmp.common.prio >= mlx5_chains_get_prio_range(esw_chains(esw)) ||
 		    tmp.common.chain_index)
 			return -EOPNOTSUPP;
 
-		tmp.common.chain_index = mlx5_esw_chains_get_ft_chain(esw);
+		tmp.common.chain_index = mlx5_chains_get_nf_ft_chain(esw_chains(esw));
 		tmp.common.prio++;
 		err = mlx5e_rep_indr_offload(priv->netdev, &tmp, priv, flags);
 		memcpy(&f->stats, &tmp.stats, sizeof(f->stats));
@@ -626,7 +626,7 @@ bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
 	priv = netdev_priv(skb->dev);
 	esw = priv->mdev->priv.eswitch;
 
-	err = mlx5_eswitch_get_chain_for_tag(esw, reg_c0, &chain);
+	err = mlx5_get_chain_for_tag(esw_chains(esw), reg_c0, &chain);
 	if (err) {
 		netdev_dbg(priv->netdev,
 			   "Couldn't find chain for chain tag: %d, err: %d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index bc5f72ec3623..579f888c22ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -14,7 +14,7 @@
 #include <linux/workqueue.h>
 #include <linux/xarray.h>
 
-#include "esw/chains.h"
+#include "lib/fs_chains.h"
 #include "en/tc_ct.h"
 #include "en/mod_hdr.h"
 #include "en/mapping.h"
@@ -1485,8 +1485,8 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	 * don't go though all prios of this chain as normal tc rules
 	 * miss.
 	 */
-	err = mlx5_esw_chains_get_chain_mapping(esw, attr->chain,
-						&chain_mapping);
+	err = mlx5_chains_get_chain_mapping(esw_chains(esw), attr->chain,
+					    &chain_mapping);
 	if (err) {
 		ct_dbg("Failed to get chain register mapping for chain");
 		goto err_get_chain;
@@ -1582,7 +1582,7 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
 err_mapping:
 	dealloc_mod_hdr_actions(&pre_mod_acts);
-	mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
+	mlx5_chains_put_chain_mapping(esw_chains(esw), ct_flow->chain_mapping);
 err_get_chain:
 	idr_remove(&ct_priv->fte_ids, fte_id);
 err_idr:
@@ -1694,7 +1694,7 @@ __mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
 	if (ct_flow->post_ct_rule) {
 		mlx5_eswitch_del_offloaded_rule(esw, ct_flow->post_ct_rule,
 						&ct_flow->post_ct_attr);
-		mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
+		mlx5_chains_put_chain_mapping(esw_chains(esw), ct_flow->chain_mapping);
 		idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
 		mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
 	}
@@ -1817,14 +1817,14 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 
 	ct_priv->esw = esw;
 	ct_priv->netdev = rpriv->netdev;
-	ct_priv->ct = mlx5_esw_chains_create_global_table(esw);
+	ct_priv->ct = mlx5_chains_create_global_table(esw_chains(esw));
 	if (IS_ERR(ct_priv->ct)) {
 		err = PTR_ERR(ct_priv->ct);
 		mlx5_tc_ct_init_err(rpriv, "failed to create ct table", err);
 		goto err_ct_tbl;
 	}
 
-	ct_priv->ct_nat = mlx5_esw_chains_create_global_table(esw);
+	ct_priv->ct_nat = mlx5_chains_create_global_table(esw_chains(esw));
 	if (IS_ERR(ct_priv->ct_nat)) {
 		err = PTR_ERR(ct_priv->ct_nat);
 		mlx5_tc_ct_init_err(rpriv, "failed to create ct nat table",
@@ -1832,7 +1832,7 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 		goto err_ct_nat_tbl;
 	}
 
-	ct_priv->post_ct = mlx5_esw_chains_create_global_table(esw);
+	ct_priv->post_ct = mlx5_chains_create_global_table(esw_chains(esw));
 	if (IS_ERR(ct_priv->post_ct)) {
 		err = PTR_ERR(ct_priv->post_ct);
 		mlx5_tc_ct_init_err(rpriv, "failed to create post ct table",
@@ -1852,9 +1852,9 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 	return 0;
 
 err_post_ct_tbl:
-	mlx5_esw_chains_destroy_global_table(esw, ct_priv->ct_nat);
+	mlx5_chains_destroy_global_table(esw_chains(esw), ct_priv->ct_nat);
 err_ct_nat_tbl:
-	mlx5_esw_chains_destroy_global_table(esw, ct_priv->ct);
+	mlx5_chains_destroy_global_table(esw_chains(esw), ct_priv->ct);
 err_ct_tbl:
 	mapping_destroy(ct_priv->labels_mapping);
 err_mapping_labels:
@@ -1871,13 +1871,18 @@ void
 mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
 {
 	struct mlx5_tc_ct_priv *ct_priv = uplink_priv->ct_priv;
+	struct mlx5_fs_chains *chains;
+	struct mlx5_eswitch *esw;
 
 	if (!ct_priv)
 		return;
 
-	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->post_ct);
-	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct_nat);
-	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct);
+	esw = ct_priv->esw;
+	chains = esw_chains(esw);
+
+	mlx5_chains_destroy_global_table(chains, ct_priv->post_ct);
+	mlx5_chains_destroy_global_table(chains, ct_priv->ct_nat);
+	mlx5_chains_destroy_global_table(chains, ct_priv->ct);
 	mapping_destroy(ct_priv->zone_mapping);
 	mapping_destroy(ct_priv->labels_mapping);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 97ba2da56cf9..9f5c97d22af4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -39,7 +39,6 @@
 #include <net/ipv6_stubs.h>
 
 #include "eswitch.h"
-#include "esw/chains.h"
 #include "en.h"
 #include "en_rep.h"
 #include "en/txrx.h"
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 28053c3c4380..557769c16393 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -57,7 +57,6 @@
 #include "en/rep/neigh.h"
 #include "en_tc.h"
 #include "eswitch.h"
-#include "esw/chains.h"
 #include "fs_core.h"
 #include "en/port.h"
 #include "en/tc_tun.h"
@@ -66,6 +65,7 @@
 #include "en/mod_hdr.h"
 #include "lib/devcom.h"
 #include "lib/geneve.h"
+#include "lib/fs_chains.h"
 #include "diag/en_tc_tracepoint.h"
 
 #define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)
@@ -1180,7 +1180,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	int err = 0;
 	int out_index;
 
-	if (!mlx5_esw_chains_prios_supported(esw) && attr->prio != 1) {
+	if (!mlx5_chains_prios_supported(esw_chains(esw)) && attr->prio != 1) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "E-switch priorities unsupported, upgrade FW");
 		return -EOPNOTSUPP;
@@ -1191,14 +1191,14 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	 * FDB_FT_CHAIN which is outside tc range.
 	 * See mlx5e_rep_setup_ft_cb().
 	 */
-	max_chain = mlx5_esw_chains_get_chain_range(esw);
+	max_chain = mlx5_chains_get_chain_range(esw_chains(esw));
 	if (!mlx5e_is_ft_flow(flow) && attr->chain > max_chain) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Requested chain is out of supported range");
 		return -EOPNOTSUPP;
 	}
 
-	max_prio = mlx5_esw_chains_get_prio_range(esw);
+	max_prio = mlx5_chains_get_prio_range(esw_chains(esw));
 	if (attr->prio > max_prio) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Requested priority is out of supported range");
@@ -3845,7 +3845,7 @@ static int mlx5_validate_goto_chain(struct mlx5_eswitch *esw,
 				    u32 actions,
 				    struct netlink_ext_ack *extack)
 {
-	u32 max_chain = mlx5_esw_chains_get_chain_range(esw);
+	u32 max_chain = mlx5_chains_get_chain_range(esw_chains(esw));
 	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	bool ft_flow = mlx5e_is_ft_flow(flow);
 	u32 dest_chain = act->chain_index;
@@ -3855,7 +3855,7 @@ static int mlx5_validate_goto_chain(struct mlx5_eswitch *esw,
 		return -EOPNOTSUPP;
 	}
 
-	if (!mlx5_esw_chains_backwards_supported(esw) &&
+	if (!mlx5_chains_backwards_supported(esw_chains(esw)) &&
 	    dest_chain <= attr->chain) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Goto lower numbered chain isn't supported");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/chains.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/chains.c
deleted file mode 100644
index d5bf908dfecd..000000000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/chains.c
+++ /dev/null
@@ -1,944 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
-// Copyright (c) 2020 Mellanox Technologies.
-
-#include <linux/mlx5/driver.h>
-#include <linux/mlx5/mlx5_ifc.h>
-#include <linux/mlx5/fs.h>
-
-#include "esw/chains.h"
-#include "en/mapping.h"
-#include "mlx5_core.h"
-#include "fs_core.h"
-#include "eswitch.h"
-#include "en.h"
-#include "en_tc.h"
-
-#define esw_chains_priv(esw) ((esw)->fdb_table.offloads.esw_chains_priv)
-#define esw_chains_lock(esw) (esw_chains_priv(esw)->lock)
-#define esw_chains_ht(esw) (esw_chains_priv(esw)->chains_ht)
-#define esw_chains_mapping(esw) (esw_chains_priv(esw)->chains_mapping)
-#define esw_prios_ht(esw) (esw_chains_priv(esw)->prios_ht)
-#define fdb_pool_left(esw) (esw_chains_priv(esw)->fdb_left)
-#define tc_slow_fdb(esw) ((esw)->fdb_table.offloads.slow_fdb)
-#define tc_end_fdb(esw) (esw_chains_priv(esw)->tc_end_fdb)
-#define fdb_ignore_flow_level_supported(esw) \
-	(MLX5_CAP_ESW_FLOWTABLE_FDB((esw)->dev, ignore_flow_level))
-#define fdb_modify_header_fwd_to_table_supported(esw) \
-	(MLX5_CAP_ESW_FLOWTABLE((esw)->dev, fdb_modify_header_fwd_to_table))
-
-/* Firmware currently has 4 pool of 4 sizes that it supports (ESW_POOLS),
- * and a virtual memory region of 16M (ESW_SIZE), this region is duplicated
- * for each flow table pool. We can allocate up to 16M of each pool,
- * and we keep track of how much we used via get_next_avail_sz_from_pool.
- * Firmware doesn't report any of this for now.
- * ESW_POOL is expected to be sorted from large to small and match firmware
- * pools.
- */
-#define ESW_SIZE (16 * 1024 * 1024)
-static const unsigned int ESW_POOLS[] = { 4 * 1024 * 1024,
-					  1 * 1024 * 1024,
-					  64 * 1024,
-					  128 };
-#define ESW_FT_TBL_SZ (64 * 1024)
-
-struct mlx5_esw_chains_priv {
-	struct rhashtable chains_ht;
-	struct rhashtable prios_ht;
-	/* Protects above chains_ht and prios_ht */
-	struct mutex lock;
-
-	struct mlx5_flow_table *tc_end_fdb;
-	struct mapping_ctx *chains_mapping;
-
-	int fdb_left[ARRAY_SIZE(ESW_POOLS)];
-};
-
-struct fdb_chain {
-	struct rhash_head node;
-
-	u32 chain;
-
-	int ref;
-	int id;
-
-	struct mlx5_eswitch *esw;
-	struct list_head prios_list;
-	struct mlx5_flow_handle *restore_rule;
-	struct mlx5_modify_hdr *miss_modify_hdr;
-};
-
-struct fdb_prio_key {
-	u32 chain;
-	u32 prio;
-	u32 level;
-};
-
-struct fdb_prio {
-	struct rhash_head node;
-	struct list_head list;
-
-	struct fdb_prio_key key;
-
-	int ref;
-
-	struct fdb_chain *fdb_chain;
-	struct mlx5_flow_table *fdb;
-	struct mlx5_flow_table *next_fdb;
-	struct mlx5_flow_group *miss_group;
-	struct mlx5_flow_handle *miss_rule;
-};
-
-static const struct rhashtable_params chain_params = {
-	.head_offset = offsetof(struct fdb_chain, node),
-	.key_offset = offsetof(struct fdb_chain, chain),
-	.key_len = sizeof_field(struct fdb_chain, chain),
-	.automatic_shrinking = true,
-};
-
-static const struct rhashtable_params prio_params = {
-	.head_offset = offsetof(struct fdb_prio, node),
-	.key_offset = offsetof(struct fdb_prio, key),
-	.key_len = sizeof_field(struct fdb_prio, key),
-	.automatic_shrinking = true,
-};
-
-bool mlx5_esw_chains_prios_supported(struct mlx5_eswitch *esw)
-{
-	return esw->fdb_table.flags & ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
-}
-
-bool mlx5_esw_chains_backwards_supported(struct mlx5_eswitch *esw)
-{
-	return mlx5_esw_chains_prios_supported(esw) &&
-	       fdb_ignore_flow_level_supported(esw);
-}
-
-u32 mlx5_esw_chains_get_chain_range(struct mlx5_eswitch *esw)
-{
-	if (!mlx5_esw_chains_prios_supported(esw))
-		return 1;
-
-	if (fdb_ignore_flow_level_supported(esw))
-		return UINT_MAX - 1;
-
-	return FDB_TC_MAX_CHAIN;
-}
-
-u32 mlx5_esw_chains_get_ft_chain(struct mlx5_eswitch *esw)
-{
-	return mlx5_esw_chains_get_chain_range(esw) + 1;
-}
-
-u32 mlx5_esw_chains_get_prio_range(struct mlx5_eswitch *esw)
-{
-	if (!mlx5_esw_chains_prios_supported(esw))
-		return 1;
-
-	if (fdb_ignore_flow_level_supported(esw))
-		return UINT_MAX;
-
-	return FDB_TC_MAX_PRIO;
-}
-
-static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
-{
-	if (fdb_ignore_flow_level_supported(esw))
-		return UINT_MAX;
-
-	return FDB_TC_LEVELS_PER_PRIO;
-}
-
-#define POOL_NEXT_SIZE 0
-static int
-mlx5_esw_chains_get_avail_sz_from_pool(struct mlx5_eswitch *esw,
-				       int desired_size)
-{
-	int i, found_i = -1;
-
-	for (i = ARRAY_SIZE(ESW_POOLS) - 1; i >= 0; i--) {
-		if (fdb_pool_left(esw)[i] && ESW_POOLS[i] > desired_size) {
-			found_i = i;
-			if (desired_size != POOL_NEXT_SIZE)
-				break;
-		}
-	}
-
-	if (found_i != -1) {
-		--fdb_pool_left(esw)[found_i];
-		return ESW_POOLS[found_i];
-	}
-
-	return 0;
-}
-
-static void
-mlx5_esw_chains_put_sz_to_pool(struct mlx5_eswitch *esw, int sz)
-{
-	int i;
-
-	for (i = ARRAY_SIZE(ESW_POOLS) - 1; i >= 0; i--) {
-		if (sz == ESW_POOLS[i]) {
-			++fdb_pool_left(esw)[i];
-			return;
-		}
-	}
-
-	WARN_ONCE(1, "Couldn't find size %d in fdb size pool", sz);
-}
-
-static void
-mlx5_esw_chains_init_sz_pool(struct mlx5_eswitch *esw)
-{
-	u32 fdb_max;
-	int i;
-
-	fdb_max = 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, log_max_ft_size);
-
-	for (i = ARRAY_SIZE(ESW_POOLS) - 1; i >= 0; i--)
-		fdb_pool_left(esw)[i] =
-			ESW_POOLS[i] <= fdb_max ? ESW_SIZE / ESW_POOLS[i] : 0;
-}
-
-static struct mlx5_flow_table *
-mlx5_esw_chains_create_fdb_table(struct mlx5_eswitch *esw,
-				 u32 chain, u32 prio, u32 level)
-{
-	struct mlx5_flow_table_attr ft_attr = {};
-	struct mlx5_flow_namespace *ns;
-	struct mlx5_flow_table *fdb;
-	int sz;
-
-	if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
-		ft_attr.flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
-				  MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
-
-	sz = (chain == mlx5_esw_chains_get_ft_chain(esw)) ?
-	     mlx5_esw_chains_get_avail_sz_from_pool(esw, ESW_FT_TBL_SZ) :
-	     mlx5_esw_chains_get_avail_sz_from_pool(esw, POOL_NEXT_SIZE);
-	if (!sz)
-		return ERR_PTR(-ENOSPC);
-	ft_attr.max_fte = sz;
-
-	/* We use tc_slow_fdb(esw) as the table's next_ft till
-	 * ignore_flow_level is allowed on FT creation and not just for FTEs.
-	 * Instead caller should add an explicit miss rule if needed.
-	 */
-	ft_attr.next_ft = tc_slow_fdb(esw);
-
-	/* The root table(chain 0, prio 1, level 0) is required to be
-	 * connected to the previous prio (FDB_BYPASS_PATH if exists).
-	 * We always create it, as a managed table, in order to align with
-	 * fs_core logic.
-	 */
-	if (!fdb_ignore_flow_level_supported(esw) ||
-	    (chain == 0 && prio == 1 && level == 0)) {
-		ft_attr.level = level;
-		ft_attr.prio = prio - 1;
-		ns = mlx5_get_fdb_sub_ns(esw->dev, chain);
-	} else {
-		ft_attr.flags |= MLX5_FLOW_TABLE_UNMANAGED;
-		ft_attr.prio = FDB_TC_OFFLOAD;
-		/* Firmware doesn't allow us to create another level 0 table,
-		 * so we create all unmanaged tables as level 1.
-		 *
-		 * To connect them, we use explicit miss rules with
-		 * ignore_flow_level. Caller is responsible to create
-		 * these rules (if needed).
-		 */
-		ft_attr.level = 1;
-		ns = mlx5_get_flow_namespace(esw->dev, MLX5_FLOW_NAMESPACE_FDB);
-	}
-
-	ft_attr.autogroup.num_reserved_entries = 2;
-	ft_attr.autogroup.max_num_groups = esw->params.large_group_num;
-	fdb = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
-	if (IS_ERR(fdb)) {
-		esw_warn(esw->dev,
-			 "Failed to create FDB table err %d (chain: %d, prio: %d, level: %d, size: %d)\n",
-			 (int)PTR_ERR(fdb), chain, prio, level, sz);
-		mlx5_esw_chains_put_sz_to_pool(esw, sz);
-		return fdb;
-	}
-
-	return fdb;
-}
-
-static void
-mlx5_esw_chains_destroy_fdb_table(struct mlx5_eswitch *esw,
-				  struct mlx5_flow_table *fdb)
-{
-	mlx5_esw_chains_put_sz_to_pool(esw, fdb->max_fte);
-	mlx5_destroy_flow_table(fdb);
-}
-
-static int
-create_fdb_chain_restore(struct fdb_chain *fdb_chain)
-{
-	char modact[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)];
-	struct mlx5_eswitch *esw = fdb_chain->esw;
-	struct mlx5_modify_hdr *mod_hdr;
-	u32 index;
-	int err;
-
-	if (fdb_chain->chain == mlx5_esw_chains_get_ft_chain(esw) ||
-	    !mlx5_esw_chains_prios_supported(esw))
-		return 0;
-
-	err = mapping_add(esw_chains_mapping(esw), &fdb_chain->chain, &index);
-	if (err)
-		return err;
-	if (index == MLX5_FS_DEFAULT_FLOW_TAG) {
-		/* we got the special default flow tag id, so we won't know
-		 * if we actually marked the packet with the restore rule
-		 * we create.
-		 *
-		 * This case isn't possible with MLX5_FS_DEFAULT_FLOW_TAG = 0.
-		 */
-		err = mapping_add(esw_chains_mapping(esw),
-				  &fdb_chain->chain, &index);
-		mapping_remove(esw_chains_mapping(esw),
-			       MLX5_FS_DEFAULT_FLOW_TAG);
-		if (err)
-			return err;
-	}
-
-	fdb_chain->id = index;
-
-	MLX5_SET(set_action_in, modact, action_type, MLX5_ACTION_TYPE_SET);
-	MLX5_SET(set_action_in, modact, field,
-		 mlx5e_tc_attr_to_reg_mappings[CHAIN_TO_REG].mfield);
-	MLX5_SET(set_action_in, modact, offset,
-		 mlx5e_tc_attr_to_reg_mappings[CHAIN_TO_REG].moffset * 8);
-	MLX5_SET(set_action_in, modact, length,
-		 mlx5e_tc_attr_to_reg_mappings[CHAIN_TO_REG].mlen * 8);
-	MLX5_SET(set_action_in, modact, data, fdb_chain->id);
-	mod_hdr = mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_FDB,
-					   1, modact);
-	if (IS_ERR(mod_hdr)) {
-		err = PTR_ERR(mod_hdr);
-		goto err_mod_hdr;
-	}
-	fdb_chain->miss_modify_hdr = mod_hdr;
-
-	fdb_chain->restore_rule = esw_add_restore_rule(esw, fdb_chain->id);
-	if (IS_ERR(fdb_chain->restore_rule)) {
-		err = PTR_ERR(fdb_chain->restore_rule);
-		goto err_rule;
-	}
-
-	return 0;
-
-err_rule:
-	mlx5_modify_header_dealloc(esw->dev, fdb_chain->miss_modify_hdr);
-err_mod_hdr:
-	/* Datapath can't find this mapping, so we can safely remove it */
-	mapping_remove(esw_chains_mapping(esw), fdb_chain->id);
-	return err;
-}
-
-static void destroy_fdb_chain_restore(struct fdb_chain *fdb_chain)
-{
-	struct mlx5_eswitch *esw = fdb_chain->esw;
-
-	if (!fdb_chain->miss_modify_hdr)
-		return;
-
-	mlx5_del_flow_rules(fdb_chain->restore_rule);
-	mlx5_modify_header_dealloc(esw->dev, fdb_chain->miss_modify_hdr);
-	mapping_remove(esw_chains_mapping(esw), fdb_chain->id);
-}
-
-static struct fdb_chain *
-mlx5_esw_chains_create_fdb_chain(struct mlx5_eswitch *esw, u32 chain)
-{
-	struct fdb_chain *fdb_chain = NULL;
-	int err;
-
-	fdb_chain = kvzalloc(sizeof(*fdb_chain), GFP_KERNEL);
-	if (!fdb_chain)
-		return ERR_PTR(-ENOMEM);
-
-	fdb_chain->esw = esw;
-	fdb_chain->chain = chain;
-	INIT_LIST_HEAD(&fdb_chain->prios_list);
-
-	err = create_fdb_chain_restore(fdb_chain);
-	if (err)
-		goto err_restore;
-
-	err = rhashtable_insert_fast(&esw_chains_ht(esw), &fdb_chain->node,
-				     chain_params);
-	if (err)
-		goto err_insert;
-
-	return fdb_chain;
-
-err_insert:
-	destroy_fdb_chain_restore(fdb_chain);
-err_restore:
-	kvfree(fdb_chain);
-	return ERR_PTR(err);
-}
-
-static void
-mlx5_esw_chains_destroy_fdb_chain(struct fdb_chain *fdb_chain)
-{
-	struct mlx5_eswitch *esw = fdb_chain->esw;
-
-	rhashtable_remove_fast(&esw_chains_ht(esw), &fdb_chain->node,
-			       chain_params);
-
-	destroy_fdb_chain_restore(fdb_chain);
-	kvfree(fdb_chain);
-}
-
-static struct fdb_chain *
-mlx5_esw_chains_get_fdb_chain(struct mlx5_eswitch *esw, u32 chain)
-{
-	struct fdb_chain *fdb_chain;
-
-	fdb_chain = rhashtable_lookup_fast(&esw_chains_ht(esw), &chain,
-					   chain_params);
-	if (!fdb_chain) {
-		fdb_chain = mlx5_esw_chains_create_fdb_chain(esw, chain);
-		if (IS_ERR(fdb_chain))
-			return fdb_chain;
-	}
-
-	fdb_chain->ref++;
-
-	return fdb_chain;
-}
-
-static struct mlx5_flow_handle *
-mlx5_esw_chains_add_miss_rule(struct fdb_chain *fdb_chain,
-			      struct mlx5_flow_table *fdb,
-			      struct mlx5_flow_table *next_fdb)
-{
-	struct mlx5_eswitch *esw = fdb_chain->esw;
-	struct mlx5_flow_destination dest = {};
-	struct mlx5_flow_act act = {};
-
-	act.flags  = FLOW_ACT_IGNORE_FLOW_LEVEL | FLOW_ACT_NO_APPEND;
-	act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
-	dest.type  = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
-	dest.ft = next_fdb;
-
-	if (next_fdb == tc_end_fdb(esw) &&
-	    mlx5_esw_chains_prios_supported(esw)) {
-		act.modify_hdr = fdb_chain->miss_modify_hdr;
-		act.action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
-	}
-
-	return mlx5_add_flow_rules(fdb, NULL, &act, &dest, 1);
-}
-
-static int
-mlx5_esw_chains_update_prio_prevs(struct fdb_prio *fdb_prio,
-				  struct mlx5_flow_table *next_fdb)
-{
-	struct mlx5_flow_handle *miss_rules[FDB_TC_LEVELS_PER_PRIO + 1] = {};
-	struct fdb_chain *fdb_chain = fdb_prio->fdb_chain;
-	struct fdb_prio *pos;
-	int n = 0, err;
-
-	if (fdb_prio->key.level)
-		return 0;
-
-	/* Iterate in reverse order until reaching the level 0 rule of
-	 * the previous priority, adding all the miss rules first, so we can
-	 * revert them if any of them fails.
-	 */
-	pos = fdb_prio;
-	list_for_each_entry_continue_reverse(pos,
-					     &fdb_chain->prios_list,
-					     list) {
-		miss_rules[n] = mlx5_esw_chains_add_miss_rule(fdb_chain,
-							      pos->fdb,
-							      next_fdb);
-		if (IS_ERR(miss_rules[n])) {
-			err = PTR_ERR(miss_rules[n]);
-			goto err_prev_rule;
-		}
-
-		n++;
-		if (!pos->key.level)
-			break;
-	}
-
-	/* Success, delete old miss rules, and update the pointers. */
-	n = 0;
-	pos = fdb_prio;
-	list_for_each_entry_continue_reverse(pos,
-					     &fdb_chain->prios_list,
-					     list) {
-		mlx5_del_flow_rules(pos->miss_rule);
-
-		pos->miss_rule = miss_rules[n];
-		pos->next_fdb = next_fdb;
-
-		n++;
-		if (!pos->key.level)
-			break;
-	}
-
-	return 0;
-
-err_prev_rule:
-	while (--n >= 0)
-		mlx5_del_flow_rules(miss_rules[n]);
-
-	return err;
-}
-
-static void
-mlx5_esw_chains_put_fdb_chain(struct fdb_chain *fdb_chain)
-{
-	if (--fdb_chain->ref == 0)
-		mlx5_esw_chains_destroy_fdb_chain(fdb_chain);
-}
-
-static struct fdb_prio *
-mlx5_esw_chains_create_fdb_prio(struct mlx5_eswitch *esw,
-				u32 chain, u32 prio, u32 level)
-{
-	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
-	struct mlx5_flow_handle *miss_rule = NULL;
-	struct mlx5_flow_group *miss_group;
-	struct fdb_prio *fdb_prio = NULL;
-	struct mlx5_flow_table *next_fdb;
-	struct fdb_chain *fdb_chain;
-	struct mlx5_flow_table *fdb;
-	struct list_head *pos;
-	u32 *flow_group_in;
-	int err;
-
-	fdb_chain = mlx5_esw_chains_get_fdb_chain(esw, chain);
-	if (IS_ERR(fdb_chain))
-		return ERR_CAST(fdb_chain);
-
-	fdb_prio = kvzalloc(sizeof(*fdb_prio), GFP_KERNEL);
-	flow_group_in = kvzalloc(inlen, GFP_KERNEL);
-	if (!fdb_prio || !flow_group_in) {
-		err = -ENOMEM;
-		goto err_alloc;
-	}
-
-	/* Chain's prio list is sorted by prio and level.
-	 * And all levels of some prio point to the next prio's level 0.
-	 * Example list (prio, level):
-	 * (3,0)->(3,1)->(5,0)->(5,1)->(6,1)->(7,0)
-	 * In hardware, we will we have the following pointers:
-	 * (3,0) -> (5,0) -> (7,0) -> Slow path
-	 * (3,1) -> (5,0)
-	 * (5,1) -> (7,0)
-	 * (6,1) -> (7,0)
-	 */
-
-	/* Default miss for each chain: */
-	next_fdb = (chain == mlx5_esw_chains_get_ft_chain(esw)) ?
-		    tc_slow_fdb(esw) :
-		    tc_end_fdb(esw);
-	list_for_each(pos, &fdb_chain->prios_list) {
-		struct fdb_prio *p = list_entry(pos, struct fdb_prio, list);
-
-		/* exit on first pos that is larger */
-		if (prio < p->key.prio || (prio == p->key.prio &&
-					   level < p->key.level)) {
-			/* Get next level 0 table */
-			next_fdb = p->key.level == 0 ? p->fdb : p->next_fdb;
-			break;
-		}
-	}
-
-	fdb = mlx5_esw_chains_create_fdb_table(esw, chain, prio, level);
-	if (IS_ERR(fdb)) {
-		err = PTR_ERR(fdb);
-		goto err_create;
-	}
-
-	MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index,
-		 fdb->max_fte - 2);
-	MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index,
-		 fdb->max_fte - 1);
-	miss_group = mlx5_create_flow_group(fdb, flow_group_in);
-	if (IS_ERR(miss_group)) {
-		err = PTR_ERR(miss_group);
-		goto err_group;
-	}
-
-	/* Add miss rule to next_fdb */
-	miss_rule = mlx5_esw_chains_add_miss_rule(fdb_chain, fdb, next_fdb);
-	if (IS_ERR(miss_rule)) {
-		err = PTR_ERR(miss_rule);
-		goto err_miss_rule;
-	}
-
-	fdb_prio->miss_group = miss_group;
-	fdb_prio->miss_rule = miss_rule;
-	fdb_prio->next_fdb = next_fdb;
-	fdb_prio->fdb_chain = fdb_chain;
-	fdb_prio->key.chain = chain;
-	fdb_prio->key.prio = prio;
-	fdb_prio->key.level = level;
-	fdb_prio->fdb = fdb;
-
-	err = rhashtable_insert_fast(&esw_prios_ht(esw), &fdb_prio->node,
-				     prio_params);
-	if (err)
-		goto err_insert;
-
-	list_add(&fdb_prio->list, pos->prev);
-
-	/* Table is ready, connect it */
-	err = mlx5_esw_chains_update_prio_prevs(fdb_prio, fdb);
-	if (err)
-		goto err_update;
-
-	kvfree(flow_group_in);
-	return fdb_prio;
-
-err_update:
-	list_del(&fdb_prio->list);
-	rhashtable_remove_fast(&esw_prios_ht(esw), &fdb_prio->node,
-			       prio_params);
-err_insert:
-	mlx5_del_flow_rules(miss_rule);
-err_miss_rule:
-	mlx5_destroy_flow_group(miss_group);
-err_group:
-	mlx5_esw_chains_destroy_fdb_table(esw, fdb);
-err_create:
-err_alloc:
-	kvfree(fdb_prio);
-	kvfree(flow_group_in);
-	mlx5_esw_chains_put_fdb_chain(fdb_chain);
-	return ERR_PTR(err);
-}
-
-static void
-mlx5_esw_chains_destroy_fdb_prio(struct mlx5_eswitch *esw,
-				 struct fdb_prio *fdb_prio)
-{
-	struct fdb_chain *fdb_chain = fdb_prio->fdb_chain;
-
-	WARN_ON(mlx5_esw_chains_update_prio_prevs(fdb_prio,
-						  fdb_prio->next_fdb));
-
-	list_del(&fdb_prio->list);
-	rhashtable_remove_fast(&esw_prios_ht(esw), &fdb_prio->node,
-			       prio_params);
-	mlx5_del_flow_rules(fdb_prio->miss_rule);
-	mlx5_destroy_flow_group(fdb_prio->miss_group);
-	mlx5_esw_chains_destroy_fdb_table(esw, fdb_prio->fdb);
-	mlx5_esw_chains_put_fdb_chain(fdb_chain);
-	kvfree(fdb_prio);
-}
-
-struct mlx5_flow_table *
-mlx5_esw_chains_get_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
-			  u32 level)
-{
-	struct mlx5_flow_table *prev_fts;
-	struct fdb_prio *fdb_prio;
-	struct fdb_prio_key key;
-	int l = 0;
-
-	if ((chain > mlx5_esw_chains_get_chain_range(esw) &&
-	     chain != mlx5_esw_chains_get_ft_chain(esw)) ||
-	    prio > mlx5_esw_chains_get_prio_range(esw) ||
-	    level > mlx5_esw_chains_get_level_range(esw))
-		return ERR_PTR(-EOPNOTSUPP);
-
-	/* create earlier levels for correct fs_core lookup when
-	 * connecting tables.
-	 */
-	for (l = 0; l < level; l++) {
-		prev_fts = mlx5_esw_chains_get_table(esw, chain, prio, l);
-		if (IS_ERR(prev_fts)) {
-			fdb_prio = ERR_CAST(prev_fts);
-			goto err_get_prevs;
-		}
-	}
-
-	key.chain = chain;
-	key.prio = prio;
-	key.level = level;
-
-	mutex_lock(&esw_chains_lock(esw));
-	fdb_prio = rhashtable_lookup_fast(&esw_prios_ht(esw), &key,
-					  prio_params);
-	if (!fdb_prio) {
-		fdb_prio = mlx5_esw_chains_create_fdb_prio(esw, chain,
-							   prio, level);
-		if (IS_ERR(fdb_prio))
-			goto err_create_prio;
-	}
-
-	++fdb_prio->ref;
-	mutex_unlock(&esw_chains_lock(esw));
-
-	return fdb_prio->fdb;
-
-err_create_prio:
-	mutex_unlock(&esw_chains_lock(esw));
-err_get_prevs:
-	while (--l >= 0)
-		mlx5_esw_chains_put_table(esw, chain, prio, l);
-	return ERR_CAST(fdb_prio);
-}
-
-void
-mlx5_esw_chains_put_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
-			  u32 level)
-{
-	struct fdb_prio *fdb_prio;
-	struct fdb_prio_key key;
-
-	key.chain = chain;
-	key.prio = prio;
-	key.level = level;
-
-	mutex_lock(&esw_chains_lock(esw));
-	fdb_prio = rhashtable_lookup_fast(&esw_prios_ht(esw), &key,
-					  prio_params);
-	if (!fdb_prio)
-		goto err_get_prio;
-
-	if (--fdb_prio->ref == 0)
-		mlx5_esw_chains_destroy_fdb_prio(esw, fdb_prio);
-	mutex_unlock(&esw_chains_lock(esw));
-
-	while (level-- > 0)
-		mlx5_esw_chains_put_table(esw, chain, prio, level);
-
-	return;
-
-err_get_prio:
-	mutex_unlock(&esw_chains_lock(esw));
-	WARN_ONCE(1,
-		  "Couldn't find table: (chain: %d prio: %d level: %d)",
-		  chain, prio, level);
-}
-
-struct mlx5_flow_table *
-mlx5_esw_chains_get_tc_end_ft(struct mlx5_eswitch *esw)
-{
-	return tc_end_fdb(esw);
-}
-
-struct mlx5_flow_table *
-mlx5_esw_chains_create_global_table(struct mlx5_eswitch *esw)
-{
-	u32 chain, prio, level;
-	int err;
-
-	if (!fdb_ignore_flow_level_supported(esw)) {
-		err = -EOPNOTSUPP;
-
-		esw_warn(esw->dev,
-			 "Couldn't create global flow table, ignore_flow_level not supported.");
-		goto err_ignore;
-	}
-
-	chain = mlx5_esw_chains_get_chain_range(esw),
-	prio = mlx5_esw_chains_get_prio_range(esw);
-	level = mlx5_esw_chains_get_level_range(esw);
-
-	return mlx5_esw_chains_create_fdb_table(esw, chain, prio, level);
-
-err_ignore:
-	return ERR_PTR(err);
-}
-
-void
-mlx5_esw_chains_destroy_global_table(struct mlx5_eswitch *esw,
-				     struct mlx5_flow_table *ft)
-{
-	mlx5_esw_chains_destroy_fdb_table(esw, ft);
-}
-
-static int
-mlx5_esw_chains_init(struct mlx5_eswitch *esw)
-{
-	struct mlx5_esw_chains_priv *chains_priv;
-	struct mlx5_core_dev *dev = esw->dev;
-	u32 max_flow_counter, fdb_max;
-	struct mapping_ctx *mapping;
-	int err;
-
-	chains_priv = kzalloc(sizeof(*chains_priv), GFP_KERNEL);
-	if (!chains_priv)
-		return -ENOMEM;
-	esw_chains_priv(esw) = chains_priv;
-
-	max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) |
-			    MLX5_CAP_GEN(dev, max_flow_counter_15_0);
-	fdb_max = 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size);
-
-	esw_debug(dev,
-		  "Init esw offloads chains, max counters(%d), groups(%d), max flow table size(%d)\n",
-		  max_flow_counter, esw->params.large_group_num, fdb_max);
-
-	mlx5_esw_chains_init_sz_pool(esw);
-
-	if (!MLX5_CAP_ESW_FLOWTABLE(esw->dev, multi_fdb_encap) &&
-	    esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE) {
-		esw->fdb_table.flags &= ~ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
-		esw_warn(dev, "Tc chains and priorities offload aren't supported, update firmware if needed\n");
-	} else if (!mlx5_eswitch_reg_c1_loopback_enabled(esw)) {
-		esw->fdb_table.flags &= ~ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
-		esw_warn(dev, "Tc chains and priorities offload aren't supported\n");
-	} else if (!fdb_modify_header_fwd_to_table_supported(esw)) {
-		/* Disabled when ttl workaround is needed, e.g
-		 * when ESWITCH_IPV4_TTL_MODIFY_ENABLE = true in mlxconfig
-		 */
-		esw_warn(dev,
-			 "Tc chains and priorities offload aren't supported, check firmware version, or mlxconfig settings\n");
-		esw->fdb_table.flags &= ~ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
-	} else {
-		esw->fdb_table.flags |= ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
-		esw_info(dev, "Supported tc offload range - chains: %u, prios: %u\n",
-			 mlx5_esw_chains_get_chain_range(esw),
-			 mlx5_esw_chains_get_prio_range(esw));
-	}
-
-	err = rhashtable_init(&esw_chains_ht(esw), &chain_params);
-	if (err)
-		goto init_chains_ht_err;
-
-	err = rhashtable_init(&esw_prios_ht(esw), &prio_params);
-	if (err)
-		goto init_prios_ht_err;
-
-	mapping = mapping_create(sizeof(u32), esw_get_max_restore_tag(esw),
-				 true);
-	if (IS_ERR(mapping)) {
-		err = PTR_ERR(mapping);
-		goto mapping_err;
-	}
-	esw_chains_mapping(esw) = mapping;
-
-	mutex_init(&esw_chains_lock(esw));
-
-	return 0;
-
-mapping_err:
-	rhashtable_destroy(&esw_prios_ht(esw));
-init_prios_ht_err:
-	rhashtable_destroy(&esw_chains_ht(esw));
-init_chains_ht_err:
-	kfree(chains_priv);
-	return err;
-}
-
-static void
-mlx5_esw_chains_cleanup(struct mlx5_eswitch *esw)
-{
-	mutex_destroy(&esw_chains_lock(esw));
-	mapping_destroy(esw_chains_mapping(esw));
-	rhashtable_destroy(&esw_prios_ht(esw));
-	rhashtable_destroy(&esw_chains_ht(esw));
-
-	kfree(esw_chains_priv(esw));
-}
-
-static int
-mlx5_esw_chains_open(struct mlx5_eswitch *esw)
-{
-	struct mlx5_flow_table *ft;
-	int err;
-
-	/* Create tc_end_fdb(esw) which is the always created ft chain */
-	ft = mlx5_esw_chains_get_table(esw, mlx5_esw_chains_get_ft_chain(esw),
-				       1, 0);
-	if (IS_ERR(ft))
-		return PTR_ERR(ft);
-
-	tc_end_fdb(esw) = ft;
-
-	/* Always open the root for fast path */
-	ft = mlx5_esw_chains_get_table(esw, 0, 1, 0);
-	if (IS_ERR(ft)) {
-		err = PTR_ERR(ft);
-		goto level_0_err;
-	}
-
-	/* Open level 1 for split rules now if prios isn't supported  */
-	if (!mlx5_esw_chains_prios_supported(esw)) {
-		err = mlx5_esw_vport_tbl_get(esw);
-		if (err)
-			goto level_1_err;
-	}
-
-	return 0;
-
-level_1_err:
-	mlx5_esw_chains_put_table(esw, 0, 1, 0);
-level_0_err:
-	mlx5_esw_chains_put_table(esw, mlx5_esw_chains_get_ft_chain(esw), 1, 0);
-	return err;
-}
-
-static void
-mlx5_esw_chains_close(struct mlx5_eswitch *esw)
-{
-	if (!mlx5_esw_chains_prios_supported(esw))
-		mlx5_esw_vport_tbl_put(esw);
-	mlx5_esw_chains_put_table(esw, 0, 1, 0);
-	mlx5_esw_chains_put_table(esw, mlx5_esw_chains_get_ft_chain(esw), 1, 0);
-}
-
-int
-mlx5_esw_chains_create(struct mlx5_eswitch *esw)
-{
-	int err;
-
-	err = mlx5_esw_chains_init(esw);
-	if (err)
-		return err;
-
-	err = mlx5_esw_chains_open(esw);
-	if (err)
-		goto err_open;
-
-	return 0;
-
-err_open:
-	mlx5_esw_chains_cleanup(esw);
-	return err;
-}
-
-void
-mlx5_esw_chains_destroy(struct mlx5_eswitch *esw)
-{
-	mlx5_esw_chains_close(esw);
-	mlx5_esw_chains_cleanup(esw);
-}
-
-int
-mlx5_esw_chains_get_chain_mapping(struct mlx5_eswitch *esw, u32 chain,
-				  u32 *chain_mapping)
-{
-	return mapping_add(esw_chains_mapping(esw), &chain, chain_mapping);
-}
-
-int
-mlx5_esw_chains_put_chain_mapping(struct mlx5_eswitch *esw, u32 chain_mapping)
-{
-	return mapping_remove(esw_chains_mapping(esw), chain_mapping);
-}
-
-int mlx5_eswitch_get_chain_for_tag(struct mlx5_eswitch *esw, u32 tag,
-				   u32 *chain)
-{
-	int err;
-
-	err = mapping_find(esw_chains_mapping(esw), tag, chain);
-	if (err) {
-		esw_warn(esw->dev, "Can't find chain for tag: %d\n", tag);
-		return -ENOENT;
-	}
-
-	return 0;
-}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/chains.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/chains.h
deleted file mode 100644
index 7679ac359e31..000000000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/chains.h
+++ /dev/null
@@ -1,68 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
-/* Copyright (c) 2020 Mellanox Technologies. */
-
-#ifndef __ML5_ESW_CHAINS_H__
-#define __ML5_ESW_CHAINS_H__
-
-#include "eswitch.h"
-
-#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
-
-bool
-mlx5_esw_chains_prios_supported(struct mlx5_eswitch *esw);
-bool
-mlx5_esw_chains_backwards_supported(struct mlx5_eswitch *esw);
-u32
-mlx5_esw_chains_get_prio_range(struct mlx5_eswitch *esw);
-u32
-mlx5_esw_chains_get_chain_range(struct mlx5_eswitch *esw);
-u32
-mlx5_esw_chains_get_ft_chain(struct mlx5_eswitch *esw);
-
-struct mlx5_flow_table *
-mlx5_esw_chains_get_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
-			  u32 level);
-void
-mlx5_esw_chains_put_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
-			  u32 level);
-
-struct mlx5_flow_table *
-mlx5_esw_chains_get_tc_end_ft(struct mlx5_eswitch *esw);
-
-struct mlx5_flow_table *
-mlx5_esw_chains_create_global_table(struct mlx5_eswitch *esw);
-void
-mlx5_esw_chains_destroy_global_table(struct mlx5_eswitch *esw,
-				     struct mlx5_flow_table *ft);
-
-int
-mlx5_esw_chains_get_chain_mapping(struct mlx5_eswitch *esw, u32 chain,
-				  u32 *chain_mapping);
-int
-mlx5_esw_chains_put_chain_mapping(struct mlx5_eswitch *esw,
-				  u32 chain_mapping);
-
-int mlx5_esw_chains_create(struct mlx5_eswitch *esw);
-void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw);
-
-int
-mlx5_eswitch_get_chain_for_tag(struct mlx5_eswitch *esw, u32 tag, u32 *chain);
-
-#else /* CONFIG_MLX5_CLS_ACT */
-
-static inline struct mlx5_flow_table *
-mlx5_esw_chains_get_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
-			  u32 level) { return ERR_PTR(-EOPNOTSUPP); }
-static inline void
-mlx5_esw_chains_put_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
-			  u32 level) {}
-
-static inline struct mlx5_flow_table *
-mlx5_esw_chains_get_tc_end_ft(struct mlx5_eswitch *esw) { return ERR_PTR(-EOPNOTSUPP); }
-
-static inline int mlx5_esw_chains_create(struct mlx5_eswitch *esw) { return 0; }
-static inline void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw) {}
-
-#endif /* CONFIG_MLX5_CLS_ACT */
-
-#endif /* __ML5_ESW_CHAINS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 7455fbd21a0a..fc23d57e9e44 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -42,6 +42,7 @@
 #include <linux/mlx5/vport.h>
 #include <linux/mlx5/fs.h>
 #include "lib/mpfs.h"
+#include "lib/fs_chains.h"
 #include "en/tc_ct.h"
 
 #ifdef CONFIG_MLX5_ESWITCH
@@ -62,6 +63,9 @@
 #define mlx5_esw_has_fwd_fdb(dev) \
 	MLX5_CAP_ESW_FLOWTABLE(dev, fdb_multi_path_to_table)
 
+#define esw_chains(esw) \
+	((esw)->fdb_table.offloads.esw_chains_priv)
+
 struct vport_ingress {
 	struct mlx5_flow_table *acl;
 	struct mlx5_flow_handle *allow_rule;
@@ -154,12 +158,6 @@ struct mlx5_vport {
 	enum mlx5_eswitch_vport_event enabled_events;
 };
 
-enum offloads_fdb_flags {
-	ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED = BIT(0),
-};
-
-struct mlx5_esw_chains_priv;
-
 struct mlx5_eswitch_fdb {
 	union {
 		struct legacy_fdb {
@@ -183,7 +181,7 @@ struct mlx5_eswitch_fdb {
 			struct mlx5_flow_handle *miss_rule_multi;
 			int vlan_push_pop_refcount;
 
-			struct mlx5_esw_chains_priv *esw_chains_priv;
+			struct mlx5_fs_chains *esw_chains_priv;
 			struct {
 				DECLARE_HASHTABLE(table, 8);
 				/* Protects vports.table */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 00bcf97cecbc..38eef5a8feb9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -39,12 +39,12 @@
 #include "mlx5_core.h"
 #include "eswitch.h"
 #include "esw/acl/ofld.h"
-#include "esw/chains.h"
 #include "rdma.h"
 #include "en.h"
 #include "fs_core.h"
 #include "lib/devcom.h"
 #include "lib/eq.h"
+#include "lib/fs_chains.h"
 
 /* There are two match-all miss flows, one for unicast dst mac and
  * one for multicast.
@@ -294,6 +294,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 {
 	struct mlx5_flow_destination dest[MLX5_MAX_FLOW_FWD_VPORTS + 1] = {};
 	struct mlx5_flow_act flow_act = { .flags = FLOW_ACT_NO_APPEND, };
+	struct mlx5_fs_chains *chains = esw_chains(esw);
 	bool split = !!(attr->split_count);
 	struct mlx5_flow_handle *rule;
 	struct mlx5_flow_table *fdb;
@@ -329,12 +330,12 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 		} else if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) {
 			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
 			dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
-			dest[i].ft = mlx5_esw_chains_get_tc_end_ft(esw);
+			dest[i].ft = mlx5_chains_get_tc_end_ft(chains);
 			i++;
 		} else if (attr->dest_chain) {
 			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
-			ft = mlx5_esw_chains_get_table(esw, attr->dest_chain,
-						       1, 0);
+			ft = mlx5_chains_get_table(chains, attr->dest_chain,
+						   1, 0);
 			if (IS_ERR(ft)) {
 				rule = ERR_CAST(ft);
 				goto err_create_goto_table;
@@ -385,8 +386,8 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 		fdb = esw_vport_tbl_get(esw, attr);
 	} else {
 		if (attr->chain || attr->prio)
-			fdb = mlx5_esw_chains_get_table(esw, attr->chain,
-							attr->prio, 0);
+			fdb = mlx5_chains_get_table(chains, attr->chain,
+						    attr->prio, 0);
 		else
 			fdb = attr->fdb;
 
@@ -416,10 +417,10 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	if (split)
 		esw_vport_tbl_put(esw, attr);
 	else if (attr->chain || attr->prio)
-		mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 0);
+		mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 err_esw_get:
 	if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) && attr->dest_chain)
-		mlx5_esw_chains_put_table(esw, attr->dest_chain, 1, 0);
+		mlx5_chains_put_table(chains, attr->dest_chain, 1, 0);
 err_create_goto_table:
 	return rule;
 }
@@ -431,12 +432,13 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 {
 	struct mlx5_flow_destination dest[MLX5_MAX_FLOW_FWD_VPORTS + 1] = {};
 	struct mlx5_flow_act flow_act = { .flags = FLOW_ACT_NO_APPEND, };
+	struct mlx5_fs_chains *chains = esw_chains(esw);
 	struct mlx5_flow_table *fast_fdb;
 	struct mlx5_flow_table *fwd_fdb;
 	struct mlx5_flow_handle *rule;
 	int i;
 
-	fast_fdb = mlx5_esw_chains_get_table(esw, attr->chain, attr->prio, 0);
+	fast_fdb = mlx5_chains_get_table(chains, attr->chain, attr->prio, 0);
 	if (IS_ERR(fast_fdb)) {
 		rule = ERR_CAST(fast_fdb);
 		goto err_get_fast;
@@ -483,7 +485,7 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 add_err:
 	esw_vport_tbl_put(esw, attr);
 err_get_fwd:
-	mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 0);
+	mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 err_get_fast:
 	return rule;
 }
@@ -494,6 +496,7 @@ __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw,
 			struct mlx5_esw_flow_attr *attr,
 			bool fwd_rule)
 {
+	struct mlx5_fs_chains *chains = esw_chains(esw);
 	bool split = (attr->split_count > 0);
 	int i;
 
@@ -511,15 +514,14 @@ __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw,
 
 	if (fwd_rule)  {
 		esw_vport_tbl_put(esw, attr);
-		mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 0);
+		mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 	} else {
 		if (split)
 			esw_vport_tbl_put(esw, attr);
 		else if (attr->chain || attr->prio)
-			mlx5_esw_chains_put_table(esw, attr->chain, attr->prio,
-						  0);
+			mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 		if (attr->dest_chain)
-			mlx5_esw_chains_put_table(esw, attr->dest_chain, 1, 0);
+			mlx5_chains_put_table(chains, attr->dest_chain, 1, 0);
 	}
 }
 
@@ -1137,6 +1139,126 @@ static void esw_set_flow_group_source_port(struct mlx5_eswitch *esw,
 	}
 }
 
+#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
+#define fdb_modify_header_fwd_to_table_supported(esw) \
+	(MLX5_CAP_ESW_FLOWTABLE((esw)->dev, fdb_modify_header_fwd_to_table))
+static void esw_init_chains_offload_flags(struct mlx5_eswitch *esw, u32 *flags)
+{
+	struct mlx5_core_dev *dev = esw->dev;
+
+	if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, ignore_flow_level))
+		*flags |= MLX5_CHAINS_IGNORE_FLOW_LEVEL_SUPPORTED;
+
+	if (!MLX5_CAP_ESW_FLOWTABLE(dev, multi_fdb_encap) &&
+	    esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE) {
+		*flags &= ~MLX5_CHAINS_AND_PRIOS_SUPPORTED;
+		esw_warn(dev, "Tc chains and priorities offload aren't supported, update firmware if needed\n");
+	} else if (!mlx5_eswitch_reg_c1_loopback_enabled(esw)) {
+		*flags &= ~MLX5_CHAINS_AND_PRIOS_SUPPORTED;
+		esw_warn(dev, "Tc chains and priorities offload aren't supported\n");
+	} else if (!fdb_modify_header_fwd_to_table_supported(esw)) {
+		/* Disabled when ttl workaround is needed, e.g
+		 * when ESWITCH_IPV4_TTL_MODIFY_ENABLE = true in mlxconfig
+		 */
+		esw_warn(dev,
+			 "Tc chains and priorities offload aren't supported, check firmware version, or mlxconfig settings\n");
+		*flags &= ~MLX5_CHAINS_AND_PRIOS_SUPPORTED;
+	} else {
+		*flags |= MLX5_CHAINS_AND_PRIOS_SUPPORTED;
+		esw_info(dev, "Supported tc chains and prios offload\n");
+	}
+
+	if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
+		*flags |= MLX5_CHAINS_FT_TUNNEL_SUPPORTED;
+}
+
+static int
+esw_chains_create(struct mlx5_eswitch *esw, struct mlx5_flow_table *miss_fdb)
+{
+	struct mlx5_core_dev *dev = esw->dev;
+	struct mlx5_flow_table *nf_ft, *ft;
+	struct mlx5_chains_attr attr = {};
+	struct mlx5_fs_chains *chains;
+	u32 fdb_max;
+	int err;
+
+	fdb_max = 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size);
+
+	esw_init_chains_offload_flags(esw, &attr.flags);
+	attr.ns = MLX5_FLOW_NAMESPACE_FDB;
+	attr.max_ft_sz = fdb_max;
+	attr.max_grp_num = esw->params.large_group_num;
+	attr.default_ft = miss_fdb;
+	attr.max_restore_tag = esw_get_max_restore_tag(esw);
+
+	chains = mlx5_chains_create(dev, &attr);
+	if (IS_ERR(chains)) {
+		err = PTR_ERR(chains);
+		esw_warn(dev, "Failed to create fdb chains err(%d)\n", err);
+		return err;
+	}
+
+	esw->fdb_table.offloads.esw_chains_priv = chains;
+
+	/* Create tc_end_ft which is the always created ft chain */
+	nf_ft = mlx5_chains_get_table(chains, mlx5_chains_get_nf_ft_chain(chains),
+				      1, 0);
+	if (IS_ERR(nf_ft)) {
+		err = PTR_ERR(nf_ft);
+		goto nf_ft_err;
+	}
+
+	/* Always open the root for fast path */
+	ft = mlx5_chains_get_table(chains, 0, 1, 0);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		goto level_0_err;
+	}
+
+	/* Open level 1 for split fdb rules now if prios isn't supported  */
+	if (!mlx5_chains_prios_supported(chains)) {
+		err = mlx5_esw_vport_tbl_get(esw);
+		if (err)
+			goto level_1_err;
+	}
+
+	mlx5_chains_set_end_ft(chains, nf_ft);
+
+	return 0;
+
+level_1_err:
+	mlx5_chains_put_table(chains, 0, 1, 0);
+level_0_err:
+	mlx5_chains_put_table(chains, mlx5_chains_get_nf_ft_chain(chains), 1, 0);
+nf_ft_err:
+	mlx5_chains_destroy(chains);
+	esw->fdb_table.offloads.esw_chains_priv = NULL;
+
+	return err;
+}
+
+static void
+esw_chains_destroy(struct mlx5_eswitch *esw, struct mlx5_fs_chains *chains)
+{
+	if (!mlx5_chains_prios_supported(chains))
+		mlx5_esw_vport_tbl_put(esw);
+	mlx5_chains_put_table(chains, 0, 1, 0);
+	mlx5_chains_put_table(chains, mlx5_chains_get_nf_ft_chain(chains), 1, 0);
+	mlx5_chains_destroy(chains);
+}
+
+#else /* CONFIG_MLX5_CLS_ACT */
+
+static int
+esw_chains_create(struct mlx5_eswitch *esw, struct mlx5_flow_table *miss_fdb)
+{ return 0; }
+
+static void
+esw_chains_destroy(struct mlx5_eswitch *esw, struct mlx5_fs_chains *chains)
+{}
+
+#endif
+
 static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw)
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
@@ -1192,9 +1314,9 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw)
 	}
 	esw->fdb_table.offloads.slow_fdb = fdb;
 
-	err = mlx5_esw_chains_create(esw);
+	err = esw_chains_create(esw, fdb);
 	if (err) {
-		esw_warn(dev, "Failed to create fdb chains err(%d)\n", err);
+		esw_warn(dev, "Failed to open fdb chains err(%d)\n", err);
 		goto fdb_chains_err;
 	}
 
@@ -1288,7 +1410,7 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw)
 peer_miss_err:
 	mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
 send_vport_err:
-	mlx5_esw_chains_destroy(esw);
+	esw_chains_destroy(esw, esw_chains(esw));
 fdb_chains_err:
 	mlx5_destroy_flow_table(esw->fdb_table.offloads.slow_fdb);
 slow_fdb_err:
@@ -1312,7 +1434,8 @@ static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw)
 		mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp);
 	mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
 
-	mlx5_esw_chains_destroy(esw);
+	esw_chains_destroy(esw, esw_chains(esw));
+
 	mlx5_destroy_flow_table(esw->fdb_table.offloads.slow_fdb);
 	/* Holds true only as long as DMFS is the default */
 	mlx5_flow_namespace_set_mode(esw->fdb_table.offloads.ns,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
new file mode 100644
index 000000000000..5bd65cdc9b07
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
@@ -0,0 +1,902 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2020 Mellanox Technologies.
+
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/mlx5_ifc.h>
+#include <linux/mlx5/fs.h>
+
+#include "lib/fs_chains.h"
+#include "en/mapping.h"
+#include "mlx5_core.h"
+#include "fs_core.h"
+#include "eswitch.h"
+#include "en.h"
+#include "en_tc.h"
+
+#define chains_lock(chains) ((chains)->lock)
+#define chains_ht(chains) ((chains)->chains_ht)
+#define chains_mapping(chains) ((chains)->chains_mapping)
+#define prios_ht(chains) ((chains)->prios_ht)
+#define ft_pool_left(chains) ((chains)->ft_left)
+#define tc_default_ft(chains) ((chains)->tc_default_ft)
+#define tc_end_ft(chains) ((chains)->tc_end_ft)
+#define ns_to_chains_fs_prio(ns) ((ns) == MLX5_FLOW_NAMESPACE_FDB ? \
+				  FDB_TC_OFFLOAD : MLX5E_TC_PRIO)
+
+/* Firmware currently has 4 pools of 4 sizes that it supports (FT_POOLS),
+ * and a virtual memory region of 16M (FT_SIZE), which is duplicated
+ * for each flow table pool. We can allocate up to 16M of each pool,
+ * and we keep track of how much we used via
+ * mlx5_chains_get_avail_sz_from_pool.
+ * Firmware doesn't report any of this for now.
+ * FT_POOLS is expected to be sorted from large to small and to match the
+ * firmware pools.
+ */
+#define FT_SIZE (16 * 1024 * 1024)
+static const unsigned int FT_POOLS[] = { 4 * 1024 * 1024,
+					  1 * 1024 * 1024,
+					  64 * 1024,
+					  128 };
+#define FT_TBL_SZ (64 * 1024)
+
+struct mlx5_fs_chains {
+	struct mlx5_core_dev *dev;
+
+	struct rhashtable chains_ht;
+	struct rhashtable prios_ht;
+	/* Protects above chains_ht and prios_ht */
+	struct mutex lock;
+
+	struct mlx5_flow_table *tc_default_ft;
+	struct mlx5_flow_table *tc_end_ft;
+	struct mapping_ctx *chains_mapping;
+
+	enum mlx5_flow_namespace_type ns;
+	u32 group_num;
+	u32 flags;
+
+	int ft_left[ARRAY_SIZE(FT_POOLS)];
+};
+
+struct fs_chain {
+	struct rhash_head node;
+
+	u32 chain;
+
+	int ref;
+	int id;
+
+	struct mlx5_fs_chains *chains;
+	struct list_head prios_list;
+	struct mlx5_flow_handle *restore_rule;
+	struct mlx5_modify_hdr *miss_modify_hdr;
+};
+
+struct prio_key {
+	u32 chain;
+	u32 prio;
+	u32 level;
+};
+
+struct prio {
+	struct rhash_head node;
+	struct list_head list;
+
+	struct prio_key key;
+
+	int ref;
+
+	struct fs_chain *chain;
+	struct mlx5_flow_table *ft;
+	struct mlx5_flow_table *next_ft;
+	struct mlx5_flow_group *miss_group;
+	struct mlx5_flow_handle *miss_rule;
+};
+
+static const struct rhashtable_params chain_params = {
+	.head_offset = offsetof(struct fs_chain, node),
+	.key_offset = offsetof(struct fs_chain, chain),
+	.key_len = sizeof_field(struct fs_chain, chain),
+	.automatic_shrinking = true,
+};
+
+static const struct rhashtable_params prio_params = {
+	.head_offset = offsetof(struct prio, node),
+	.key_offset = offsetof(struct prio, key),
+	.key_len = sizeof_field(struct prio, key),
+	.automatic_shrinking = true,
+};
+
+bool mlx5_chains_prios_supported(struct mlx5_fs_chains *chains)
+{
+	return chains->flags & MLX5_CHAINS_AND_PRIOS_SUPPORTED;
+}
+
+static bool mlx5_chains_ignore_flow_level_supported(struct mlx5_fs_chains *chains)
+{
+	return chains->flags & MLX5_CHAINS_IGNORE_FLOW_LEVEL_SUPPORTED;
+}
+
+bool mlx5_chains_backwards_supported(struct mlx5_fs_chains *chains)
+{
+	return mlx5_chains_prios_supported(chains) &&
+	       mlx5_chains_ignore_flow_level_supported(chains);
+}
+
+u32 mlx5_chains_get_chain_range(struct mlx5_fs_chains *chains)
+{
+	if (!mlx5_chains_prios_supported(chains))
+		return 1;
+
+	if (mlx5_chains_ignore_flow_level_supported(chains))
+		return UINT_MAX - 1;
+
+	/* We should get here only for eswitch case */
+	return FDB_TC_MAX_CHAIN;
+}
+
+u32 mlx5_chains_get_nf_ft_chain(struct mlx5_fs_chains *chains)
+{
+	return mlx5_chains_get_chain_range(chains) + 1;
+}
+
+u32 mlx5_chains_get_prio_range(struct mlx5_fs_chains *chains)
+{
+	if (!mlx5_chains_prios_supported(chains))
+		return 1;
+
+	if (mlx5_chains_ignore_flow_level_supported(chains))
+		return UINT_MAX;
+
+	/* We should get here only for eswitch case */
+	return FDB_TC_MAX_PRIO;
+}
+
+static unsigned int mlx5_chains_get_level_range(struct mlx5_fs_chains *chains)
+{
+	if (mlx5_chains_ignore_flow_level_supported(chains))
+		return UINT_MAX;
+
+	/* Same value for FDB and NIC RX tables */
+	return FDB_TC_LEVELS_PER_PRIO;
+}
+
+void
+mlx5_chains_set_end_ft(struct mlx5_fs_chains *chains,
+		       struct mlx5_flow_table *ft)
+{
+	tc_end_ft(chains) = ft;
+}
+
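+/* A desired_size of POOL_NEXT_SIZE selects the largest size still
+ * available in the pools; any other value selects the smallest pool
+ * size that is larger than desired_size.
+ */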
+#define POOL_NEXT_SIZE 0
+static int
+mlx5_chains_get_avail_sz_from_pool(struct mlx5_fs_chains *chains,
+				   int desired_size)
+{
+	int i, found_i = -1;
+
+	for (i = ARRAY_SIZE(FT_POOLS) - 1; i >= 0; i--) {
+		if (ft_pool_left(chains)[i] && FT_POOLS[i] > desired_size) {
+			found_i = i;
+			if (desired_size != POOL_NEXT_SIZE)
+				break;
+		}
+	}
+
+	if (found_i != -1) {
+		--ft_pool_left(chains)[found_i];
+		return FT_POOLS[found_i];
+	}
+
+	return 0;
+}
+
+static void
+mlx5_chains_put_sz_to_pool(struct mlx5_fs_chains *chains, int sz)
+{
+	int i;
+
+	for (i = ARRAY_SIZE(FT_POOLS) - 1; i >= 0; i--) {
+		if (sz == FT_POOLS[i]) {
+			++ft_pool_left(chains)[i];
+			return;
+		}
+	}
+
+	WARN_ONCE(1, "Couldn't find size %d in flow table size pool", sz);
+}
+
+static void
+mlx5_chains_init_sz_pool(struct mlx5_fs_chains *chains, u32 ft_max)
+{
+	int i;
+
+	for (i = ARRAY_SIZE(FT_POOLS) - 1; i >= 0; i--)
+		ft_pool_left(chains)[i] =
+			FT_POOLS[i] <= ft_max ? FT_SIZE / FT_POOLS[i] : 0;
+}
+
+static struct mlx5_flow_table *
+mlx5_chains_create_table(struct mlx5_fs_chains *chains,
+			 u32 chain, u32 prio, u32 level)
+{
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_flow_namespace *ns;
+	struct mlx5_flow_table *ft;
+	int sz;
+
+	if (chains->flags & MLX5_CHAINS_FT_TUNNEL_SUPPORTED)
+		ft_attr.flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
+				  MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
+
+	sz = (chain == mlx5_chains_get_nf_ft_chain(chains)) ?
+	     mlx5_chains_get_avail_sz_from_pool(chains, FT_TBL_SZ) :
+	     mlx5_chains_get_avail_sz_from_pool(chains, POOL_NEXT_SIZE);
+	if (!sz)
+		return ERR_PTR(-ENOSPC);
+	ft_attr.max_fte = sz;
+
+	/* We use tc_default_ft(chains) as the table's next_ft until
+	 * ignore_flow_level is allowed on FT creation, and not just for FTEs.
+	 * Instead, the caller should add an explicit miss rule if needed.
+	 */
+	ft_attr.next_ft = tc_default_ft(chains);
+
+	/* The root table (chain 0, prio 1, level 0) is required to be
+	 * connected to the previous fs_core managed prio.
+	 * We always create it, as a managed table, in order to align with
+	 * fs_core logic.
+	 */
+	if (!mlx5_chains_ignore_flow_level_supported(chains) ||
+	    (chain == 0 && prio == 1 && level == 0)) {
+		ft_attr.level = level;
+		ft_attr.prio = prio - 1;
+		ns = (chains->ns == MLX5_FLOW_NAMESPACE_FDB) ?
+			mlx5_get_fdb_sub_ns(chains->dev, chain) :
+			mlx5_get_flow_namespace(chains->dev, chains->ns);
+	} else {
+		ft_attr.flags |= MLX5_FLOW_TABLE_UNMANAGED;
+		ft_attr.prio = ns_to_chains_fs_prio(chains->ns);
+		/* Firmware doesn't allow us to create another level 0 table,
+		 * so we create all unmanaged tables as level 1.
+		 *
+		 * To connect them, we use explicit miss rules with
+		 * ignore_flow_level. Caller is responsible to create
+		 * these rules (if needed).
+		 */
+		ft_attr.level = 1;
+		ns = mlx5_get_flow_namespace(chains->dev, chains->ns);
+	}
+
+	ft_attr.autogroup.num_reserved_entries = 2;
+	ft_attr.autogroup.max_num_groups = chains->group_num;
+	ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
+	if (IS_ERR(ft)) {
+		mlx5_core_warn(chains->dev, "Failed to create chains table err %d (chain: %d, prio: %d, level: %d, size: %d)\n",
+			       (int)PTR_ERR(ft), chain, prio, level, sz);
+		mlx5_chains_put_sz_to_pool(chains, sz);
+		return ft;
+	}
+
+	return ft;
+}
+
+static void
+mlx5_chains_destroy_table(struct mlx5_fs_chains *chains,
+			  struct mlx5_flow_table *ft)
+{
+	mlx5_chains_put_sz_to_pool(chains, ft->max_fte);
+	mlx5_destroy_flow_table(ft);
+}
+
+static int
+create_chain_restore(struct fs_chain *chain)
+{
+	struct mlx5_eswitch *esw = chain->chains->dev->priv.eswitch;
+	char modact[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)];
+	struct mlx5_fs_chains *chains = chain->chains;
+	enum mlx5e_tc_attr_to_reg chain_to_reg;
+	struct mlx5_modify_hdr *mod_hdr;
+	u32 index;
+	int err;
+
+	if (chain->chain == mlx5_chains_get_nf_ft_chain(chains) ||
+	    !mlx5_chains_prios_supported(chains))
+		return 0;
+
+	err = mapping_add(chains_mapping(chains), &chain->chain, &index);
+	if (err)
+		return err;
+	if (index == MLX5_FS_DEFAULT_FLOW_TAG) {
+		/* we got the special default flow tag id, so we won't know
+		 * if we actually marked the packet with the restore rule
+		 * we create.
+		 *
+		 * This case isn't possible with MLX5_FS_DEFAULT_FLOW_TAG = 0.
+		 */
+		err = mapping_add(chains_mapping(chains),
+				  &chain->chain, &index);
+		mapping_remove(chains_mapping(chains),
+			       MLX5_FS_DEFAULT_FLOW_TAG);
+		if (err)
+			return err;
+	}
+
+	chain->id = index;
+
+	if (chains->ns == MLX5_FLOW_NAMESPACE_FDB) {
+		chain_to_reg = CHAIN_TO_REG;
+		chain->restore_rule = esw_add_restore_rule(esw, chain->id);
+		if (IS_ERR(chain->restore_rule)) {
+			err = PTR_ERR(chain->restore_rule);
+			goto err_rule;
+		}
+	} else {
+		err = -EINVAL;
+		goto err_rule;
+	}
+
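+	/* Build the modify-header action that writes the mapped chain id
+	 * (chain->id) into the chain-restore register; it is applied by
+	 * this chain's miss rules.
+	 */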
+	MLX5_SET(set_action_in, modact, action_type, MLX5_ACTION_TYPE_SET);
+	MLX5_SET(set_action_in, modact, field,
+		 mlx5e_tc_attr_to_reg_mappings[chain_to_reg].mfield);
+	MLX5_SET(set_action_in, modact, offset,
+		 mlx5e_tc_attr_to_reg_mappings[chain_to_reg].moffset * 8);
+	MLX5_SET(set_action_in, modact, length,
+		 mlx5e_tc_attr_to_reg_mappings[chain_to_reg].mlen * 8);
+	MLX5_SET(set_action_in, modact, data, chain->id);
+	mod_hdr = mlx5_modify_header_alloc(chains->dev, chains->ns,
+					   1, modact);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		goto err_mod_hdr;
+	}
+	chain->miss_modify_hdr = mod_hdr;
+
+	return 0;
+
+err_mod_hdr:
+	if (!IS_ERR_OR_NULL(chain->restore_rule))
+		mlx5_del_flow_rules(chain->restore_rule);
+err_rule:
+	/* Datapath can't find this mapping, so we can safely remove it */
+	mapping_remove(chains_mapping(chains), chain->id);
+	return err;
+}
+
+static void destroy_chain_restore(struct fs_chain *chain)
+{
+	struct mlx5_fs_chains *chains = chain->chains;
+
+	if (!chain->miss_modify_hdr)
+		return;
+
+	if (chain->restore_rule)
+		mlx5_del_flow_rules(chain->restore_rule);
+
+	mlx5_modify_header_dealloc(chains->dev, chain->miss_modify_hdr);
+	mapping_remove(chains_mapping(chains), chain->id);
+}
+
+static struct fs_chain *
+mlx5_chains_create_chain(struct mlx5_fs_chains *chains, u32 chain)
+{
+	struct fs_chain *chain_s = NULL;
+	int err;
+
+	chain_s = kvzalloc(sizeof(*chain_s), GFP_KERNEL);
+	if (!chain_s)
+		return ERR_PTR(-ENOMEM);
+
+	chain_s->chains = chains;
+	chain_s->chain = chain;
+	INIT_LIST_HEAD(&chain_s->prios_list);
+
+	err = create_chain_restore(chain_s);
+	if (err)
+		goto err_restore;
+
+	err = rhashtable_insert_fast(&chains_ht(chains), &chain_s->node,
+				     chain_params);
+	if (err)
+		goto err_insert;
+
+	return chain_s;
+
+err_insert:
+	destroy_chain_restore(chain_s);
+err_restore:
+	kvfree(chain_s);
+	return ERR_PTR(err);
+}
+
+static void
+mlx5_chains_destroy_chain(struct fs_chain *chain)
+{
+	struct mlx5_fs_chains *chains = chain->chains;
+
+	rhashtable_remove_fast(&chains_ht(chains), &chain->node,
+			       chain_params);
+
+	destroy_chain_restore(chain);
+	kvfree(chain);
+}
+
+static struct fs_chain *
+mlx5_chains_get_chain(struct mlx5_fs_chains *chains, u32 chain)
+{
+	struct fs_chain *chain_s;
+
+	chain_s = rhashtable_lookup_fast(&chains_ht(chains), &chain,
+					 chain_params);
+	if (!chain_s) {
+		chain_s = mlx5_chains_create_chain(chains, chain);
+		if (IS_ERR(chain_s))
+			return chain_s;
+	}
+
+	chain_s->ref++;
+
+	return chain_s;
+}
+
+static struct mlx5_flow_handle *
+mlx5_chains_add_miss_rule(struct fs_chain *chain,
+			  struct mlx5_flow_table *ft,
+			  struct mlx5_flow_table *next_ft)
+{
+	struct mlx5_fs_chains *chains = chain->chains;
+	struct mlx5_flow_destination dest = {};
+	struct mlx5_flow_act act = {};
+
+	act.flags  = FLOW_ACT_IGNORE_FLOW_LEVEL | FLOW_ACT_NO_APPEND;
+	act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	dest.type  = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+	dest.ft = next_ft;
+
+	if (next_ft == tc_end_ft(chains) &&
+	    chain->chain != mlx5_chains_get_nf_ft_chain(chains) &&
+	    mlx5_chains_prios_supported(chains)) {
+		act.modify_hdr = chain->miss_modify_hdr;
+		act.action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+	}
+
+	return mlx5_add_flow_rules(ft, NULL, &act, &dest, 1);
+}
+
+static int
+mlx5_chains_update_prio_prevs(struct prio *prio,
+			      struct mlx5_flow_table *next_ft)
+{
+	struct mlx5_flow_handle *miss_rules[FDB_TC_LEVELS_PER_PRIO + 1] = {};
+	struct fs_chain *chain = prio->chain;
+	struct prio *pos;
+	int n = 0, err;
+
+	if (prio->key.level)
+		return 0;
+
+	/* Iterate in reverse order until reaching the level 0 rule of
+	 * the previous priority, adding all the miss rules first, so we can
+	 * revert them if any of them fails.
+	 */
+	pos = prio;
+	list_for_each_entry_continue_reverse(pos,
+					     &chain->prios_list,
+					     list) {
+		miss_rules[n] = mlx5_chains_add_miss_rule(chain,
+							  pos->ft,
+							  next_ft);
+		if (IS_ERR(miss_rules[n])) {
+			err = PTR_ERR(miss_rules[n]);
+			goto err_prev_rule;
+		}
+
+		n++;
+		if (!pos->key.level)
+			break;
+	}
+
+	/* Success, delete old miss rules, and update the pointers. */
+	n = 0;
+	pos = prio;
+	list_for_each_entry_continue_reverse(pos,
+					     &chain->prios_list,
+					     list) {
+		mlx5_del_flow_rules(pos->miss_rule);
+
+		pos->miss_rule = miss_rules[n];
+		pos->next_ft = next_ft;
+
+		n++;
+		if (!pos->key.level)
+			break;
+	}
+
+	return 0;
+
+err_prev_rule:
+	while (--n >= 0)
+		mlx5_del_flow_rules(miss_rules[n]);
+
+	return err;
+}
+
+static void
+mlx5_chains_put_chain(struct fs_chain *chain)
+{
+	if (--chain->ref == 0)
+		mlx5_chains_destroy_chain(chain);
+}
+
+static struct prio *
+mlx5_chains_create_prio(struct mlx5_fs_chains *chains,
+			u32 chain, u32 prio, u32 level)
+{
+	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+	struct mlx5_flow_handle *miss_rule = NULL;
+	struct mlx5_flow_group *miss_group;
+	struct mlx5_flow_table *next_ft;
+	struct mlx5_flow_table *ft;
+	struct prio *prio_s = NULL;
+	struct fs_chain *chain_s;
+	struct list_head *pos;
+	u32 *flow_group_in;
+	int err;
+
+	chain_s = mlx5_chains_get_chain(chains, chain);
+	if (IS_ERR(chain_s))
+		return ERR_CAST(chain_s);
+
+	prio_s = kvzalloc(sizeof(*prio_s), GFP_KERNEL);
+	flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+	if (!prio_s || !flow_group_in) {
+		err = -ENOMEM;
+		goto err_alloc;
+	}
+
+	/* A chain's prio list is sorted by prio and level.
+	 * All levels of a given prio point to the next prio's level 0.
+	 * Example list (prio, level):
+	 * (3,0)->(3,1)->(5,0)->(5,1)->(6,1)->(7,0)
+	 * In hardware, we will have the following pointers:
+	 * (3,0) -> (5,0) -> (7,0) -> Slow path
+	 * (3,1) -> (5,0)
+	 * (5,1) -> (7,0)
+	 * (6,1) -> (7,0)
+	 */
+
+	/* Default miss for each chain: */
+	next_ft = (chain == mlx5_chains_get_nf_ft_chain(chains)) ?
+		  tc_default_ft(chains) :
+		  tc_end_ft(chains);
+	list_for_each(pos, &chain_s->prios_list) {
+		struct prio *p = list_entry(pos, struct prio, list);
+
+		/* exit on first pos that is larger */
+		if (prio < p->key.prio || (prio == p->key.prio &&
+					   level < p->key.level)) {
+			/* Get next level 0 table */
+			next_ft = p->key.level == 0 ? p->ft : p->next_ft;
+			break;
+		}
+	}
+
+	ft = mlx5_chains_create_table(chains, chain, prio, level);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		goto err_create;
+	}
+
+	MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index,
+		 ft->max_fte - 2);
+	MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index,
+		 ft->max_fte - 1);
+	miss_group = mlx5_create_flow_group(ft, flow_group_in);
+	if (IS_ERR(miss_group)) {
+		err = PTR_ERR(miss_group);
+		goto err_group;
+	}
+
+	/* Add miss rule to next_ft */
+	miss_rule = mlx5_chains_add_miss_rule(chain_s, ft, next_ft);
+	if (IS_ERR(miss_rule)) {
+		err = PTR_ERR(miss_rule);
+		goto err_miss_rule;
+	}
+
+	prio_s->miss_group = miss_group;
+	prio_s->miss_rule = miss_rule;
+	prio_s->next_ft = next_ft;
+	prio_s->chain = chain_s;
+	prio_s->key.chain = chain;
+	prio_s->key.prio = prio;
+	prio_s->key.level = level;
+	prio_s->ft = ft;
+
+	err = rhashtable_insert_fast(&prios_ht(chains), &prio_s->node,
+				     prio_params);
+	if (err)
+		goto err_insert;
+
+	list_add(&prio_s->list, pos->prev);
+
+	/* Table is ready, connect it */
+	err = mlx5_chains_update_prio_prevs(prio_s, ft);
+	if (err)
+		goto err_update;
+
+	kvfree(flow_group_in);
+	return prio_s;
+
+err_update:
+	list_del(&prio_s->list);
+	rhashtable_remove_fast(&prios_ht(chains), &prio_s->node,
+			       prio_params);
+err_insert:
+	mlx5_del_flow_rules(miss_rule);
+err_miss_rule:
+	mlx5_destroy_flow_group(miss_group);
+err_group:
+	mlx5_chains_destroy_table(chains, ft);
+err_create:
+err_alloc:
+	kvfree(prio_s);
+	kvfree(flow_group_in);
+	mlx5_chains_put_chain(chain_s);
+	return ERR_PTR(err);
+}
+
+static void
+mlx5_chains_destroy_prio(struct mlx5_fs_chains *chains,
+			 struct prio *prio)
+{
+	struct fs_chain *chain = prio->chain;
+
+	WARN_ON(mlx5_chains_update_prio_prevs(prio,
+					      prio->next_ft));
+
+	list_del(&prio->list);
+	rhashtable_remove_fast(&prios_ht(chains), &prio->node,
+			       prio_params);
+	mlx5_del_flow_rules(prio->miss_rule);
+	mlx5_destroy_flow_group(prio->miss_group);
+	mlx5_chains_destroy_table(chains, prio->ft);
+	mlx5_chains_put_chain(chain);
+	kvfree(prio);
+}
+
+struct mlx5_flow_table *
+mlx5_chains_get_table(struct mlx5_fs_chains *chains, u32 chain, u32 prio,
+		      u32 level)
+{
+	struct mlx5_flow_table *prev_fts;
+	struct prio *prio_s;
+	struct prio_key key;
+	int l = 0;
+
+	if ((chain > mlx5_chains_get_chain_range(chains) &&
+	     chain != mlx5_chains_get_nf_ft_chain(chains)) ||
+	    prio > mlx5_chains_get_prio_range(chains) ||
+	    level > mlx5_chains_get_level_range(chains))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* create earlier levels for correct fs_core lookup when
+	 * connecting tables.
+	 */
+	for (l = 0; l < level; l++) {
+		prev_fts = mlx5_chains_get_table(chains, chain, prio, l);
+		if (IS_ERR(prev_fts)) {
+			prio_s = ERR_CAST(prev_fts);
+			goto err_get_prevs;
+		}
+	}
+
+	key.chain = chain;
+	key.prio = prio;
+	key.level = level;
+
+	mutex_lock(&chains_lock(chains));
+	prio_s = rhashtable_lookup_fast(&prios_ht(chains), &key,
+					prio_params);
+	if (!prio_s) {
+		prio_s = mlx5_chains_create_prio(chains, chain,
+						 prio, level);
+		if (IS_ERR(prio_s))
+			goto err_create_prio;
+	}
+
+	++prio_s->ref;
+	mutex_unlock(&chains_lock(chains));
+
+	return prio_s->ft;
+
+err_create_prio:
+	mutex_unlock(&chains_lock(chains));
+err_get_prevs:
+	while (--l >= 0)
+		mlx5_chains_put_table(chains, chain, prio, l);
+	return ERR_CAST(prio_s);
+}
+
+void
+mlx5_chains_put_table(struct mlx5_fs_chains *chains, u32 chain, u32 prio,
+		      u32 level)
+{
+	struct prio *prio_s;
+	struct prio_key key;
+
+	key.chain = chain;
+	key.prio = prio;
+	key.level = level;
+
+	mutex_lock(&chains_lock(chains));
+	prio_s = rhashtable_lookup_fast(&prios_ht(chains), &key,
+					prio_params);
+	if (!prio_s)
+		goto err_get_prio;
+
+	if (--prio_s->ref == 0)
+		mlx5_chains_destroy_prio(chains, prio_s);
+	mutex_unlock(&chains_lock(chains));
+
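+	/* Release the references on the earlier levels that were taken
+	 * by mlx5_chains_get_table().
+	 */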
+	while (level-- > 0)
+		mlx5_chains_put_table(chains, chain, prio, level);
+
+	return;
+
+err_get_prio:
+	mutex_unlock(&chains_lock(chains));
+	WARN_ONCE(1,
+		  "Couldn't find table: (chain: %d prio: %d level: %d)",
+		  chain, prio, level);
+}
+
+struct mlx5_flow_table *
+mlx5_chains_get_tc_end_ft(struct mlx5_fs_chains *chains)
+{
+	return tc_end_ft(chains);
+}
+
+struct mlx5_flow_table *
+mlx5_chains_create_global_table(struct mlx5_fs_chains *chains)
+{
+	u32 chain, prio, level;
+	int err;
+
+	if (!mlx5_chains_ignore_flow_level_supported(chains)) {
+		err = -EOPNOTSUPP;
+
+		mlx5_core_warn(chains->dev,
+			       "Couldn't create global flow table, ignore_flow_level not supported.");
+		goto err_ignore;
+	}
+
+	chain = mlx5_chains_get_chain_range(chains);
+	prio = mlx5_chains_get_prio_range(chains);
+	level = mlx5_chains_get_level_range(chains);
+
+	return mlx5_chains_create_table(chains, chain, prio, level);
+
+err_ignore:
+	return ERR_PTR(err);
+}
+
+void
+mlx5_chains_destroy_global_table(struct mlx5_fs_chains *chains,
+				 struct mlx5_flow_table *ft)
+{
+	mlx5_chains_destroy_table(chains, ft);
+}
+
+static struct mlx5_fs_chains *
+mlx5_chains_init(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr)
+{
+	struct mlx5_fs_chains *chains_priv;
+	struct mapping_ctx *mapping;
+	u32 max_flow_counter;
+	int err;
+
+	chains_priv = kzalloc(sizeof(*chains_priv), GFP_KERNEL);
+	if (!chains_priv)
+		return ERR_PTR(-ENOMEM);
+
+	max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) |
+			    MLX5_CAP_GEN(dev, max_flow_counter_15_0);
+
+	mlx5_core_dbg(dev,
+		      "Init flow table chains, max counters(%d), groups(%d), max flow table size(%d)\n",
+		      max_flow_counter, attr->max_grp_num, attr->max_ft_sz);
+
+	chains_priv->dev = dev;
+	chains_priv->flags = attr->flags;
+	chains_priv->ns = attr->ns;
+	chains_priv->group_num = attr->max_grp_num;
+	tc_default_ft(chains_priv) = tc_end_ft(chains_priv) = attr->default_ft;
+
+	mlx5_core_info(dev, "Supported tc offload range - chains: %u, prios: %u\n",
+		       mlx5_chains_get_chain_range(chains_priv),
+		       mlx5_chains_get_prio_range(chains_priv));
+
+	mlx5_chains_init_sz_pool(chains_priv, attr->max_ft_sz);
+
+	err = rhashtable_init(&chains_ht(chains_priv), &chain_params);
+	if (err)
+		goto init_chains_ht_err;
+
+	err = rhashtable_init(&prios_ht(chains_priv), &prio_params);
+	if (err)
+		goto init_prios_ht_err;
+
+	mapping = mapping_create(sizeof(u32), attr->max_restore_tag,
+				 true);
+	if (IS_ERR(mapping)) {
+		err = PTR_ERR(mapping);
+		goto mapping_err;
+	}
+	chains_mapping(chains_priv) = mapping;
+
+	mutex_init(&chains_lock(chains_priv));
+
+	return chains_priv;
+
+mapping_err:
+	rhashtable_destroy(&prios_ht(chains_priv));
+init_prios_ht_err:
+	rhashtable_destroy(&chains_ht(chains_priv));
+init_chains_ht_err:
+	kfree(chains_priv);
+	return ERR_PTR(err);
+}
+
+static void
+mlx5_chains_cleanup(struct mlx5_fs_chains *chains)
+{
+	mutex_destroy(&chains_lock(chains));
+	mapping_destroy(chains_mapping(chains));
+	rhashtable_destroy(&prios_ht(chains));
+	rhashtable_destroy(&chains_ht(chains));
+
+	kfree(chains);
+}
+
+struct mlx5_fs_chains *
+mlx5_chains_create(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr)
+{
+	struct mlx5_fs_chains *chains;
+
+	chains = mlx5_chains_init(dev, attr);
+
+	return chains;
+}
+
+void
+mlx5_chains_destroy(struct mlx5_fs_chains *chains)
+{
+	mlx5_chains_cleanup(chains);
+}
+
+int
+mlx5_chains_get_chain_mapping(struct mlx5_fs_chains *chains, u32 chain,
+			      u32 *chain_mapping)
+{
+	return mapping_add(chains_mapping(chains), &chain, chain_mapping);
+}
+
+int
+mlx5_chains_put_chain_mapping(struct mlx5_fs_chains *chains, u32 chain_mapping)
+{
+	return mapping_remove(chains_mapping(chains), chain_mapping);
+}
+
+int mlx5_get_chain_for_tag(struct mlx5_fs_chains *chains, u32 tag,
+			   u32 *chain)
+{
+	int err;
+
+	err = mapping_find(chains_mapping(chains), tag, chain);
+	if (err) {
+		mlx5_core_warn(chains->dev, "Can't find chain for tag: %d\n", tag);
+		return -ENOENT;
+	}
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.h
new file mode 100644
index 000000000000..6d5be31b05dd
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies. */
+
+#ifndef __MLX5_FS_CHAINS_H__
+#define __MLX5_FS_CHAINS_H__
+
+#include <linux/mlx5/fs.h>
+
+struct mlx5_fs_chains;
+
+enum mlx5_chains_flags {
+	MLX5_CHAINS_AND_PRIOS_SUPPORTED = BIT(0),
+	MLX5_CHAINS_IGNORE_FLOW_LEVEL_SUPPORTED = BIT(1),
+	MLX5_CHAINS_FT_TUNNEL_SUPPORTED = BIT(2),
+};
+
+struct mlx5_chains_attr {
+	enum mlx5_flow_namespace_type ns;
+	u32 flags;
+	u32 max_ft_sz;
+	u32 max_grp_num;
+	struct mlx5_flow_table *default_ft;
+	u32 max_restore_tag;
+};
+
+#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
+
+bool
+mlx5_chains_prios_supported(struct mlx5_fs_chains *chains);
+bool
+mlx5_chains_backwards_supported(struct mlx5_fs_chains *chains);
+u32
+mlx5_chains_get_prio_range(struct mlx5_fs_chains *chains);
+u32
+mlx5_chains_get_chain_range(struct mlx5_fs_chains *chains);
+u32
+mlx5_chains_get_nf_ft_chain(struct mlx5_fs_chains *chains);
+
+struct mlx5_flow_table *
+mlx5_chains_get_table(struct mlx5_fs_chains *chains, u32 chain, u32 prio,
+		      u32 level);
+void
+mlx5_chains_put_table(struct mlx5_fs_chains *chains, u32 chain, u32 prio,
+		      u32 level);
+
+struct mlx5_flow_table *
+mlx5_chains_get_tc_end_ft(struct mlx5_fs_chains *chains);
+
+struct mlx5_flow_table *
+mlx5_chains_create_global_table(struct mlx5_fs_chains *chains);
+void
+mlx5_chains_destroy_global_table(struct mlx5_fs_chains *chains,
+				 struct mlx5_flow_table *ft);
+
+int
+mlx5_chains_get_chain_mapping(struct mlx5_fs_chains *chains, u32 chain,
+			      u32 *chain_mapping);
+int
+mlx5_chains_put_chain_mapping(struct mlx5_fs_chains *chains,
+			      u32 chain_mapping);
+
+struct mlx5_fs_chains *
+mlx5_chains_create(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr);
+void mlx5_chains_destroy(struct mlx5_fs_chains *chains);
+
+int
+mlx5_get_chain_for_tag(struct mlx5_fs_chains *chains, u32 tag, u32 *chain);
+
+void
+mlx5_chains_set_end_ft(struct mlx5_fs_chains *chains,
+		       struct mlx5_flow_table *ft);
+
+#else /* CONFIG_MLX5_CLS_ACT */
+
+static inline struct mlx5_flow_table *
+mlx5_chains_get_table(struct mlx5_fs_chains *chains, u32 chain, u32 prio,
+		      u32 level) { return ERR_PTR(-EOPNOTSUPP); }
+static inline void
+mlx5_chains_put_table(struct mlx5_fs_chains *chains, u32 chain, u32 prio,
+		      u32 level) {}
+
+static inline struct mlx5_flow_table *
+mlx5_chains_get_tc_end_ft(struct mlx5_fs_chains *chains) { return ERR_PTR(-EOPNOTSUPP); }
+
+static inline struct mlx5_fs_chains *
+mlx5_chains_create(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr)
+{ return NULL; }
+static inline void
+mlx5_chains_destroy(struct mlx5_fs_chains *chains) {}
+
+#endif /* CONFIG_MLX5_CLS_ACT */
+
+#endif /* __MLX5_FS_CHAINS_H__ */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next V2 02/15] net/mlx5: Allow ft level ignore for nic rx tables
  2020-09-23 22:48 [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode saeed
  2020-09-23 22:48 ` [net-next V2 01/15] net/mlx5: Refactor multi chains and prios support saeed
@ 2020-09-23 22:48 ` saeed
  2020-09-23 22:48 ` [net-next V2 03/15] net/mlx5e: Tc nic flows to use mlx5_chains flow tables saeed
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed, Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

Allow setting a flow table with a lower level
as a rule destination in nic rx tables.
This is required in order to support table chaining
of tc nic flows.
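
For illustration only (not part of this patch), a minimal sketch of the
kind of rule this change permits: forwarding from a NIC RX table to a
lower-level NIC RX table by setting ignore_flow_level on the rule. The
table pointers here are placeholders:

	/* Sketch: assumes high_ft and low_ft were created in the NIC RX
	 * namespace, with low_ft at a lower level than high_ft.
	 */
	static struct mlx5_flow_handle *
	fwd_to_lower_level_table(struct mlx5_flow_table *high_ft,
				 struct mlx5_flow_table *low_ft)
	{
		struct mlx5_flow_destination dest = {};
		struct mlx5_flow_act act = {};

		act.flags  = FLOW_ACT_IGNORE_FLOW_LEVEL;
		act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
		dest.type  = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
		dest.ft    = low_ft;

		/* dest_is_valid() now accepts this for FS_FT_NIC_RX too */
		return mlx5_add_flow_rules(high_ft, NULL, &act, &dest, 1);
	}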

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 75fa44eee434..6141e9ec8190 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1595,11 +1595,12 @@ static bool dest_is_valid(struct mlx5_flow_destination *dest,
 		return true;
 
 	if (ignore_level) {
-		if (ft->type != FS_FT_FDB)
+		if (ft->type != FS_FT_FDB &&
+		    ft->type != FS_FT_NIC_RX)
 			return false;
 
 		if (dest->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
-		    dest->ft->type != FS_FT_FDB)
+		    ft->type != dest->ft->type)
 			return false;
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next V2 03/15] net/mlx5e: Tc nic flows to use mlx5_chains flow tables
  2020-09-23 22:48 [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode saeed
  2020-09-23 22:48 ` [net-next V2 01/15] net/mlx5: Refactor multi chains and prios support saeed
  2020-09-23 22:48 ` [net-next V2 02/15] net/mlx5: Allow ft level ignore for nic rx tables saeed
@ 2020-09-23 22:48 ` saeed
  2020-09-23 22:48 ` [net-next V2 04/15] net/mlx5e: Split nic tc flow allocation and creation saeed
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed, Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

Change the nic tc flows offload path to use the chains and prios
infrastructure for flow table creation, as a preparation for
supporting tc multi chains and priorities for nic flows.

Add an instance of the table chaining database to the nic tc struct
and perform the root table creation and destruction via the chains
api, while keeping the limit of a single chain (0) in nic tc mode.
This will be extended to support multiple chains in the following
patches.

The flow table sizes and default miss table parameters that are
provided to the chains creation api are kept the same.
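
For illustration only (not part of this patch), the resulting lifetime,
condensed into one sketch from the calls added below (error handling
elided):

	/* init: one chains instance backs the nic tc tables */
	attr.ns = MLX5_FLOW_NAMESPACE_KERNEL;
	attr.max_ft_sz = mlx5e_tc_nic_get_ft_size(dev);
	attr.max_grp_num = MLX5E_TC_TABLE_NUM_GROUPS;
	attr.default_ft = priv->fs.vlan.ft.t;
	tc->chains = mlx5_chains_create(dev, &attr);

	/* first offloaded flow: take a reference on chain 0, prio 1 */
	tc->t = mlx5_chains_get_table(tc->chains, 0, 1, MLX5E_TC_FT_LEVEL);

	/* last offloaded flow removed: drop the reference */
	mlx5_chains_put_table(tc->chains, 0, 1, MLX5E_TC_FT_LEVEL);

	/* cleanup */
	mlx5_chains_destroy(tc->chains);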

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   |  5 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 86 ++++++++++++-------
 2 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index 6fdcd5e69476..ef3c9a165b1d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -12,9 +12,12 @@ enum {
 };
 
 struct mlx5e_tc_table {
-	/* protects flow table */
+	/* Protects the dynamic assignment of the t member,
+	 * which is the nic tc root table.
+	 */
 	struct mutex			t_lock;
 	struct mlx5_flow_table		*t;
+	struct mlx5_fs_chains           *chains;
 
 	struct rhashtable               ht;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 557769c16393..0cc81f8d2f5e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -68,6 +68,7 @@
 #include "lib/fs_chains.h"
 #include "diag/en_tc_tracepoint.h"
 
+#define nic_chains(priv) ((priv)->fs.tc.chains)
 #define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)
 
 struct mlx5_nic_flow_attr {
@@ -170,7 +171,7 @@ struct mlx5e_tc_flow_parse_attr {
 };
 
 #define MLX5E_TC_TABLE_NUM_GROUPS 4
-#define MLX5E_TC_TABLE_MAX_GROUP_SIZE BIT(16)
+#define MLX5E_TC_TABLE_MAX_GROUP_SIZE BIT(18)
 
 struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
 	[CHAIN_TO_REG] = {
@@ -898,6 +899,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 {
 	struct mlx5_flow_context *flow_context = &parse_attr->spec.flow_context;
 	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
+	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	struct mlx5_core_dev *dev = priv->mdev;
 	struct mlx5_flow_destination dest[2] = {};
 	struct mlx5_flow_act flow_act = {
@@ -948,35 +950,19 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 			return err;
 	}
 
-	mutex_lock(&priv->fs.tc.t_lock);
-	if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
-		struct mlx5_flow_table_attr ft_attr = {};
-		int tc_grp_size, tc_tbl_size, tc_num_grps;
-		u32 max_flow_counter;
-
-		max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) |
-				    MLX5_CAP_GEN(dev, max_flow_counter_15_0);
-
-		tc_grp_size = min_t(int, max_flow_counter, MLX5E_TC_TABLE_MAX_GROUP_SIZE);
-
-		tc_tbl_size = min_t(int, tc_grp_size * MLX5E_TC_TABLE_NUM_GROUPS,
-				    BIT(MLX5_CAP_FLOWTABLE_NIC_RX(dev, log_max_ft_size)));
-		tc_num_grps = MLX5E_TC_TABLE_NUM_GROUPS;
-
-		ft_attr.prio = MLX5E_TC_PRIO;
-		ft_attr.max_fte = tc_tbl_size;
-		ft_attr.level = MLX5E_TC_FT_LEVEL;
-		ft_attr.autogroup.max_num_groups = tc_num_grps;
-		priv->fs.tc.t =
-			mlx5_create_auto_grouped_flow_table(priv->fs.ns,
-							    &ft_attr);
-		if (IS_ERR(priv->fs.tc.t)) {
-			mutex_unlock(&priv->fs.tc.t_lock);
+	mutex_lock(&tc->t_lock);
+	if (IS_ERR_OR_NULL(tc->t)) {
+		/* Create the root table here if it doesn't exist yet */
+		tc->t =
+			mlx5_chains_get_table(nic_chains(priv), 0, 1, MLX5E_TC_FT_LEVEL);
+
+		if (IS_ERR(tc->t)) {
+			mutex_unlock(&tc->t_lock);
 			NL_SET_ERR_MSG_MOD(extack,
 					   "Failed to create tc offload table");
 			netdev_err(priv->netdev,
 				   "Failed to create tc offload table\n");
-			return PTR_ERR(priv->fs.tc.t);
+			return PTR_ERR(tc->t);
 		}
 	}
 
@@ -994,6 +980,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 				  struct mlx5e_tc_flow *flow)
 {
 	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
+	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	struct mlx5_fc *counter = NULL;
 
 	counter = attr->counter;
@@ -1002,8 +989,9 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 	mlx5_fc_destroy(priv->mdev, counter);
 
 	mutex_lock(&priv->fs.tc.t_lock);
-	if (!mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD)) && priv->fs.tc.t) {
-		mlx5_destroy_flow_table(priv->fs.tc.t);
+	if (!mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD)) &&
+	    !IS_ERR_OR_NULL(tc->t)) {
+		mlx5_chains_put_table(nic_chains(priv), 0, 1, MLX5E_TC_FT_LEVEL);
 		priv->fs.tc.t = NULL;
 	}
 	mutex_unlock(&priv->fs.tc.t_lock);
@@ -4951,9 +4939,27 @@ static int mlx5e_tc_netdev_event(struct notifier_block *this,
 	return NOTIFY_DONE;
 }
 
+static int mlx5e_tc_nic_get_ft_size(struct mlx5_core_dev *dev)
+{
+	int tc_grp_size, tc_tbl_size;
+	u32 max_flow_counter;
+
+	max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) |
+			    MLX5_CAP_GEN(dev, max_flow_counter_15_0);
+
+	tc_grp_size = min_t(int, max_flow_counter, MLX5E_TC_TABLE_MAX_GROUP_SIZE);
+
+	tc_tbl_size = min_t(int, tc_grp_size * MLX5E_TC_TABLE_NUM_GROUPS,
+			    BIT(MLX5_CAP_FLOWTABLE_NIC_RX(dev, log_max_ft_size)));
+
+	return tc_tbl_size;
+}
+
 int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 {
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
+	struct mlx5_core_dev *dev = priv->mdev;
+	struct mlx5_chains_attr attr = {};
 	int err;
 
 	mlx5e_mod_hdr_tbl_init(&tc->mod_hdr);
@@ -4965,6 +4971,17 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 	if (err)
 		return err;
 
+	attr.ns = MLX5_FLOW_NAMESPACE_KERNEL;
+	attr.max_ft_sz = mlx5e_tc_nic_get_ft_size(dev);
+	attr.max_grp_num = MLX5E_TC_TABLE_NUM_GROUPS;
+	attr.default_ft = priv->fs.vlan.ft.t;
+
+	tc->chains = mlx5_chains_create(dev, &attr);
+	if (IS_ERR(tc->chains)) {
+		err = PTR_ERR(tc->chains);
+		goto err_chains;
+	}
+
 	tc->netdevice_nb.notifier_call = mlx5e_tc_netdev_event;
 	err = register_netdevice_notifier_dev_net(priv->netdev,
 						  &tc->netdevice_nb,
@@ -4972,8 +4989,15 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 	if (err) {
 		tc->netdevice_nb.notifier_call = NULL;
 		mlx5_core_warn(priv->mdev, "Failed to register netdev notifier\n");
+		goto err_reg;
 	}
 
+	return 0;
+
+err_reg:
+	mlx5_chains_destroy(tc->chains);
+err_chains:
+	rhashtable_destroy(&tc->ht);
 	return err;
 }
 
@@ -4998,13 +5022,15 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv)
 	mlx5e_mod_hdr_tbl_destroy(&tc->mod_hdr);
 	mutex_destroy(&tc->hairpin_tbl_lock);
 
-	rhashtable_destroy(&tc->ht);
+	rhashtable_free_and_destroy(&tc->ht, _mlx5e_tc_del_flow, NULL);
 
 	if (!IS_ERR_OR_NULL(tc->t)) {
-		mlx5_destroy_flow_table(tc->t);
+		mlx5_chains_put_table(tc->chains, 0, 1, MLX5E_TC_FT_LEVEL);
 		tc->t = NULL;
 	}
 	mutex_destroy(&tc->t_lock);
+
+	mlx5_chains_destroy(tc->chains);
 }
 
 int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next V2 04/15] net/mlx5e: Split nic tc flow allocation and creation
  2020-09-23 22:48 [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode saeed
                   ` (2 preceding siblings ...)
  2020-09-23 22:48 ` [net-next V2 03/15] net/mlx5e: Tc nic flows to use mlx5_chains flow tables saeed
@ 2020-09-23 22:48 ` saeed
  2020-09-23 22:48 ` [net-next V2 05/15] net/mlx5: Refactor tc flow attributes structure saeed
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed, Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

To prepare for future support of CT offload with nic tc flows, where
the flow rule is not created immediately but rather following
a later event, this patch splits the nic rule creation
and deletion into 2 parts:
1. Creating/deleting and setting the rule attributes.
2. Creating/deleting the flow table and the flow rule itself.

This way the attributes can be prepared and stored in the
flow handle when the tc flow is created, but the rule can
actually be created at any point in the future, using these
preallocated attributes.
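
For illustration only (not part of this patch), a sketch of how a
deferred user could consume the split API; the function name here is
hypothetical:

	/* Called at some later point, e.g. from an event handler, with
	 * the spec and attributes prepared when the tc flow was parsed.
	 */
	static int offload_prepared_flow(struct mlx5e_priv *priv,
					 struct mlx5_flow_spec *spec,
					 struct mlx5_nic_flow_attr *attr)
	{
		struct mlx5_flow_handle *rule;

		rule = mlx5e_add_offloaded_nic_rule(priv, spec, attr);
		if (IS_ERR(rule))
			return PTR_ERR(rule);

		/* ... and on teardown ... */
		mlx5e_del_offloaded_nic_rule(priv, rule);
		return 0;
	}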

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 116 +++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |   7 ++
 2 files changed, 77 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 0cc81f8d2f5e..4b810ad9d6d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -891,39 +891,31 @@ static void mlx5e_hairpin_flow_del(struct mlx5e_priv *priv,
 	flow->hpe = NULL;
 }
 
-static int
-mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
-		      struct mlx5e_tc_flow_parse_attr *parse_attr,
-		      struct mlx5e_tc_flow *flow,
-		      struct netlink_ext_ack *extack)
+struct mlx5_flow_handle *
+mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
+			     struct mlx5_flow_spec *spec,
+			     struct mlx5_nic_flow_attr *attr)
 {
-	struct mlx5_flow_context *flow_context = &parse_attr->spec.flow_context;
-	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
+	struct mlx5_flow_context *flow_context = &spec->flow_context;
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
-	struct mlx5_core_dev *dev = priv->mdev;
 	struct mlx5_flow_destination dest[2] = {};
 	struct mlx5_flow_act flow_act = {
 		.action = attr->action,
 		.flags    = FLOW_ACT_NO_APPEND,
 	};
-	struct mlx5_fc *counter = NULL;
-	int err, dest_ix = 0;
+	struct mlx5_flow_handle *rule;
+	int dest_ix = 0;
 
 	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
 	flow_context->flow_tag = attr->flow_tag;
 
-	if (flow_flag_test(flow, HAIRPIN)) {
-		err = mlx5e_hairpin_flow_add(priv, flow, parse_attr, extack);
-		if (err)
-			return err;
-
-		if (flow_flag_test(flow, HAIRPIN_RSS)) {
-			dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
-			dest[dest_ix].ft = attr->hairpin_ft;
-		} else {
-			dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
-			dest[dest_ix].tir_num = attr->hairpin_tirn;
-		}
+	if (attr->hairpin_ft) {
+		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+		dest[dest_ix].ft = attr->hairpin_ft;
+		dest_ix++;
+	} else if (attr->hairpin_tirn) {
+		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
+		dest[dest_ix].tir_num = attr->hairpin_tirn;
 		dest_ix++;
 	} else if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
@@ -931,24 +923,14 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		dest_ix++;
 	}
 
-	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
-		counter = mlx5_fc_create(dev, true);
-		if (IS_ERR(counter))
-			return PTR_ERR(counter);
-
+	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
-		dest[dest_ix].counter_id = mlx5_fc_id(counter);
+		dest[dest_ix].counter_id = mlx5_fc_id(attr->counter);
 		dest_ix++;
-		attr->counter = counter;
 	}
 
-	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
-		err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
+	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		flow_act.modify_hdr = attr->modify_hdr;
-		dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
-		if (err)
-			return err;
-	}
 
 	mutex_lock(&tc->t_lock);
 	if (IS_ERR_OR_NULL(tc->t)) {
@@ -958,35 +940,77 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 
 		if (IS_ERR(tc->t)) {
 			mutex_unlock(&tc->t_lock);
-			NL_SET_ERR_MSG_MOD(extack,
-					   "Failed to create tc offload table");
 			netdev_err(priv->netdev,
 				   "Failed to create tc offload table\n");
-			return PTR_ERR(tc->t);
+			return ERR_CAST(tc->t);
 		}
 	}
+	mutex_unlock(&tc->t_lock);
 
 	if (attr->match_level != MLX5_MATCH_NONE)
-		parse_attr->spec.match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
+		spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
 
-	flow->rule[0] = mlx5_add_flow_rules(priv->fs.tc.t, &parse_attr->spec,
-					    &flow_act, dest, dest_ix);
-	mutex_unlock(&priv->fs.tc.t_lock);
+	rule = mlx5_add_flow_rules(tc->t, spec,
+				   &flow_act, dest, dest_ix);
+	if (IS_ERR(rule))
+		return ERR_CAST(rule);
+
+	return rule;
+}
+
+static int
+mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
+		      struct mlx5e_tc_flow_parse_attr *parse_attr,
+		      struct mlx5e_tc_flow *flow,
+		      struct netlink_ext_ack *extack)
+{
+	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
+	struct mlx5_core_dev *dev = priv->mdev;
+	struct mlx5_fc *counter = NULL;
+	int err;
+
+	if (flow_flag_test(flow, HAIRPIN)) {
+		err = mlx5e_hairpin_flow_add(priv, flow, parse_attr, extack);
+		if (err)
+			return err;
+	}
+
+	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
+		counter = mlx5_fc_create(dev, true);
+		if (IS_ERR(counter))
+			return PTR_ERR(counter);
+
+		attr->counter = counter;
+	}
+
+	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
+		err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
+		dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
+		if (err)
+			return err;
+	}
+
+	flow->rule[0] = mlx5e_add_offloaded_nic_rule(priv, &parse_attr->spec,
+						     attr);
 
 	return PTR_ERR_OR_ZERO(flow->rule[0]);
 }
 
+void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
+				  struct mlx5_flow_handle *rule)
+{
+	mlx5_del_flow_rules(rule);
+}
+
 static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 				  struct mlx5e_tc_flow *flow)
 {
 	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
-	struct mlx5_fc *counter = NULL;
 
-	counter = attr->counter;
 	if (!IS_ERR_OR_NULL(flow->rule[0]))
-		mlx5_del_flow_rules(flow->rule[0]);
-	mlx5_fc_destroy(priv->mdev, counter);
+		mlx5e_del_offloaded_nic_rule(priv, flow->rule[0]);
+	mlx5_fc_destroy(priv->mdev, attr->counter);
 
 	mutex_lock(&priv->fs.tc.t_lock);
 	if (!mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD)) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 437f680728fd..2d63a75a9326 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -181,6 +181,13 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv);
 int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 			    void *cb_priv);
 
+struct mlx5_nic_flow_attr;
+struct mlx5_flow_handle *
+mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
+			     struct mlx5_flow_spec *spec,
+			     struct mlx5_nic_flow_attr *attr);
+void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
+				  struct mlx5_flow_handle *rule);
 #else /* CONFIG_MLX5_CLS_ACT */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next V2 05/15] net/mlx5: Refactor tc flow attributes structure
  2020-09-23 22:48 [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode saeed
                   ` (3 preceding siblings ...)
  2020-09-23 22:48 ` [net-next V2 04/15] net/mlx5e: Split nic tc flow allocation and creation saeed
@ 2020-09-23 22:48 ` saeed
  2020-09-23 22:48 ` [net-next V2 06/15] net/mlx5e: Add tc chains offload support for nic flows saeed
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Vlad Buslov, Saeed Mahameed,
	Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

In order to support chains and connection tracking offload for
nic flows, we need a common flow attributes
struct so that these features can be agnostic and have access to
a single attributes struct, regardless of the flow type.

Therefore, a new tc flow attributes format is introduced to allow
access to attributes that are common to eswitch and nic flows.

The common attributes are always allocated for new flows,
regardless of their type, while the type specific attributes are
separated into different structs and allocated based on the
flow type to avoid memory waste.

When allocating the flow attributes, the caller provides the flow
steering namespace, and according to the namespace type the
additional space for the extra, type specific, attributes is
determined and added to the total attribute allocation size.

In addition, the attributes that are common to both
flow types are moved to the common attributes struct.
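
For illustration only (not part of this patch), a sketch of the
allocation pattern described above; the actual helper belongs to this
series, so treat this as an assumption-based outline with the
type-specific attributes trailing the common struct:

	struct mlx5_flow_attr *
	mlx5_alloc_flow_attr(enum mlx5_flow_namespace_type type)
	{
		u32 ex_attr_size = (type == MLX5_FLOW_NAMESPACE_FDB) ?
				   sizeof(struct mlx5_esw_flow_attr) :
				   sizeof(struct mlx5_nic_flow_attr);

		/* Common attributes first, then the esw/nic specific tail */
		return kzalloc(sizeof(struct mlx5_flow_attr) + ex_attr_size,
			       GFP_KERNEL);
	}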

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    |  99 ++++--
 .../ethernet/mellanox/mlx5/core/en/tc_ct.h    |  14 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 304 ++++++++++--------
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  36 ++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  27 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 150 +++++----
 .../mlx5/core/eswitch_offloads_termtbl.c      |   8 +-
 7 files changed, 376 insertions(+), 262 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 579f888c22ab..9509f8674e5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -55,8 +55,8 @@ struct mlx5_tc_ct_priv {
 };
 
 struct mlx5_ct_flow {
-	struct mlx5_esw_flow_attr pre_ct_attr;
-	struct mlx5_esw_flow_attr post_ct_attr;
+	struct mlx5_flow_attr *pre_ct_attr;
+	struct mlx5_flow_attr *post_ct_attr;
 	struct mlx5_flow_handle *pre_ct_rule;
 	struct mlx5_flow_handle *post_ct_rule;
 	struct mlx5_ct_ft *ft;
@@ -67,7 +67,7 @@ struct mlx5_ct_flow {
 struct mlx5_ct_zone_rule {
 	struct mlx5_flow_handle *rule;
 	struct mlx5e_mod_hdr_handle *mh;
-	struct mlx5_esw_flow_attr attr;
+	struct mlx5_flow_attr *attr;
 	bool nat;
 };
 
@@ -400,7 +400,7 @@ mlx5_tc_ct_entry_del_rule(struct mlx5_tc_ct_priv *ct_priv,
 			  bool nat)
 {
 	struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat];
-	struct mlx5_esw_flow_attr *attr = &zone_rule->attr;
+	struct mlx5_flow_attr *attr = zone_rule->attr;
 	struct mlx5_eswitch *esw = ct_priv->esw;
 
 	ct_dbg("Deleting ct entry rule in zone %d", entry->tuple.zone);
@@ -409,6 +409,7 @@ mlx5_tc_ct_entry_del_rule(struct mlx5_tc_ct_priv *ct_priv,
 	mlx5e_mod_hdr_detach(ct_priv->esw->dev,
 			     &esw->offloads.mod_hdr, zone_rule->mh);
 	mapping_remove(ct_priv->labels_mapping, attr->ct_attr.ct_labels_id);
+	kfree(attr);
 }
 
 static void
@@ -588,7 +589,7 @@ mlx5_tc_ct_entry_create_nat(struct mlx5_tc_ct_priv *ct_priv,
 
 static int
 mlx5_tc_ct_entry_create_mod_hdr(struct mlx5_tc_ct_priv *ct_priv,
-				struct mlx5_esw_flow_attr *attr,
+				struct mlx5_flow_attr *attr,
 				struct flow_rule *flow_rule,
 				struct mlx5e_mod_hdr_handle **mh,
 				u8 zone_restore_id, bool nat)
@@ -650,9 +651,9 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 			  bool nat, u8 zone_restore_id)
 {
 	struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat];
-	struct mlx5_esw_flow_attr *attr = &zone_rule->attr;
 	struct mlx5_eswitch *esw = ct_priv->esw;
 	struct mlx5_flow_spec *spec = NULL;
+	struct mlx5_flow_attr *attr;
 	int err;
 
 	zone_rule->nat = nat;
@@ -661,6 +662,12 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 	if (!spec)
 		return -ENOMEM;
 
+	attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	if (!attr) {
+		err = -ENOMEM;
+		goto err_attr;
+	}
+
 	err = mlx5_tc_ct_entry_create_mod_hdr(ct_priv, attr, flow_rule,
 					      &zone_rule->mh,
 					      zone_restore_id, nat);
@@ -674,7 +681,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 		       MLX5_FLOW_CONTEXT_ACTION_COUNT;
 	attr->dest_chain = 0;
 	attr->dest_ft = ct_priv->post_ct;
-	attr->fdb = nat ? ct_priv->ct_nat : ct_priv->ct;
+	attr->ft = nat ? ct_priv->ct_nat : ct_priv->ct;
 	attr->outer_match_level = MLX5_MATCH_L4;
 	attr->counter = entry->counter;
 	attr->flags |= MLX5_ESW_ATTR_FLAG_NO_IN_PORT;
@@ -691,6 +698,8 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 		goto err_rule;
 	}
 
+	zone_rule->attr = attr;
+
 	kfree(spec);
 	ct_dbg("Offloaded ct entry rule in zone %d", entry->tuple.zone);
 
@@ -701,6 +710,8 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 			     &esw->offloads.mod_hdr, zone_rule->mh);
 	mapping_remove(ct_priv->labels_mapping, attr->ct_attr.ct_labels_id);
 err_mod_hdr:
+	kfree(attr);
+err_attr:
 	kfree(spec);
 	return err;
 }
@@ -1056,7 +1067,7 @@ mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
 
 int
 mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
-			struct mlx5_esw_flow_attr *attr,
+			struct mlx5_flow_attr *attr,
 			const struct flow_action_entry *act,
 			struct netlink_ext_ack *extack)
 {
@@ -1429,14 +1440,14 @@ static struct mlx5_flow_handle *
 __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 			  struct mlx5e_tc_flow *flow,
 			  struct mlx5_flow_spec *orig_spec,
-			  struct mlx5_esw_flow_attr *attr)
+			  struct mlx5_flow_attr *attr)
 {
 	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	bool nat = attr->ct_attr.ct_action & TCA_CT_ACT_NAT;
 	struct mlx5e_tc_mod_hdr_acts pre_mod_acts = {};
 	struct mlx5_flow_spec *post_ct_spec = NULL;
 	struct mlx5_eswitch *esw = ct_priv->esw;
-	struct mlx5_esw_flow_attr *pre_ct_attr;
+	struct mlx5_flow_attr *pre_ct_attr;
 	struct mlx5_modify_hdr *mod_hdr;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_ct_flow *ct_flow;
@@ -1471,10 +1482,22 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	}
 	ct_flow->fte_id = fte_id;
 
-	/* Base esw attributes of both rules on original rule attribute */
-	pre_ct_attr = &ct_flow->pre_ct_attr;
-	memcpy(pre_ct_attr, attr, sizeof(*attr));
-	memcpy(&ct_flow->post_ct_attr, attr, sizeof(*attr));
+	/* Base flow attributes of both rules on original rule attribute */
+	ct_flow->pre_ct_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	if (!ct_flow->pre_ct_attr) {
+		err = -ENOMEM;
+		goto err_alloc_pre;
+	}
+
+	ct_flow->post_ct_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	if (!ct_flow->post_ct_attr) {
+		err = -ENOMEM;
+		goto err_alloc_post;
+	}
+
+	pre_ct_attr = ct_flow->pre_ct_attr;
+	memcpy(pre_ct_attr, attr, ESW_FLOW_ATTR_SZ);
+	memcpy(ct_flow->post_ct_attr, attr, ESW_FLOW_ATTR_SZ);
 
 	/* Modify the original rule's action to fwd and modify, leave decap */
 	pre_ct_attr->action = attr->action & MLX5_FLOW_CONTEXT_ACTION_DECAP;
@@ -1541,15 +1564,15 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 				    fte_id, MLX5_FTE_ID_MASK);
 
 	/* Put post_ct rule on post_ct fdb */
-	ct_flow->post_ct_attr.chain = 0;
-	ct_flow->post_ct_attr.prio = 0;
-	ct_flow->post_ct_attr.fdb = ct_priv->post_ct;
+	ct_flow->post_ct_attr->chain = 0;
+	ct_flow->post_ct_attr->prio = 0;
+	ct_flow->post_ct_attr->ft = ct_priv->post_ct;
 
-	ct_flow->post_ct_attr.inner_match_level = MLX5_MATCH_NONE;
-	ct_flow->post_ct_attr.outer_match_level = MLX5_MATCH_NONE;
-	ct_flow->post_ct_attr.action &= ~(MLX5_FLOW_CONTEXT_ACTION_DECAP);
+	ct_flow->post_ct_attr->inner_match_level = MLX5_MATCH_NONE;
+	ct_flow->post_ct_attr->outer_match_level = MLX5_MATCH_NONE;
+	ct_flow->post_ct_attr->action &= ~(MLX5_FLOW_CONTEXT_ACTION_DECAP);
 	rule = mlx5_eswitch_add_offloaded_rule(esw, post_ct_spec,
-					       &ct_flow->post_ct_attr);
+					       ct_flow->post_ct_attr);
 	ct_flow->post_ct_rule = rule;
 	if (IS_ERR(ct_flow->post_ct_rule)) {
 		err = PTR_ERR(ct_flow->post_ct_rule);
@@ -1577,13 +1600,17 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 
 err_insert_orig:
 	mlx5_eswitch_del_offloaded_rule(ct_priv->esw, ct_flow->post_ct_rule,
-					&ct_flow->post_ct_attr);
+					ct_flow->post_ct_attr);
 err_insert_post_ct:
 	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
 err_mapping:
 	dealloc_mod_hdr_actions(&pre_mod_acts);
 	mlx5_chains_put_chain_mapping(esw_chains(esw), ct_flow->chain_mapping);
 err_get_chain:
+	kfree(ct_flow->post_ct_attr);
+err_alloc_post:
+	kfree(ct_flow->pre_ct_attr);
+err_alloc_pre:
 	idr_remove(&ct_priv->fte_ids, fte_id);
 err_idr:
 	mlx5_tc_ct_del_ft_cb(ct_priv, ft);
@@ -1597,12 +1624,12 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 static struct mlx5_flow_handle *
 __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 				struct mlx5_flow_spec *orig_spec,
-				struct mlx5_esw_flow_attr *attr,
+				struct mlx5_flow_attr *attr,
 				struct mlx5e_tc_mod_hdr_acts *mod_acts)
 {
 	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	struct mlx5_eswitch *esw = ct_priv->esw;
-	struct mlx5_esw_flow_attr *pre_ct_attr;
+	struct mlx5_flow_attr *pre_ct_attr;
 	struct mlx5_modify_hdr *mod_hdr;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_ct_flow *ct_flow;
@@ -1613,8 +1640,13 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 		return ERR_PTR(-ENOMEM);
 
 	/* Base esw attributes on original rule attribute */
-	pre_ct_attr = &ct_flow->pre_ct_attr;
-	memcpy(pre_ct_attr, attr, sizeof(*attr));
+	pre_ct_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	if (!pre_ct_attr) {
+		err = -ENOMEM;
+		goto err_attr;
+	}
+
+	memcpy(pre_ct_attr, attr, ESW_FLOW_ATTR_SZ);
 
 	err = mlx5_tc_ct_entry_set_registers(ct_priv, mod_acts, 0, 0, 0, 0);
 	if (err) {
@@ -1644,6 +1676,7 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 	}
 
 	attr->ct_attr.ct_flow = ct_flow;
+	ct_flow->pre_ct_attr = pre_ct_attr;
 	ct_flow->pre_ct_rule = rule;
 	return rule;
 
@@ -1652,6 +1685,10 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 err_set_registers:
 	netdev_warn(priv->netdev,
 		    "Failed to offload ct clear flow, err %d\n", err);
+	kfree(pre_ct_attr);
+err_attr:
+	kfree(ct_flow);
+
 	return ERR_PTR(err);
 }
 
@@ -1659,7 +1696,7 @@ struct mlx5_flow_handle *
 mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
-			struct mlx5_esw_flow_attr *attr,
+			struct mlx5_flow_attr *attr,
 			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
 {
 	bool clear_action = attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR;
@@ -1684,7 +1721,7 @@ static void
 __mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
 			 struct mlx5_ct_flow *ct_flow)
 {
-	struct mlx5_esw_flow_attr *pre_ct_attr = &ct_flow->pre_ct_attr;
+	struct mlx5_flow_attr *pre_ct_attr = ct_flow->pre_ct_attr;
 	struct mlx5_eswitch *esw = ct_priv->esw;
 
 	mlx5_eswitch_del_offloaded_rule(esw, ct_flow->pre_ct_rule,
@@ -1693,18 +1730,20 @@ __mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
 
 	if (ct_flow->post_ct_rule) {
 		mlx5_eswitch_del_offloaded_rule(esw, ct_flow->post_ct_rule,
-						&ct_flow->post_ct_attr);
+						ct_flow->post_ct_attr);
 		mlx5_chains_put_chain_mapping(esw_chains(esw), ct_flow->chain_mapping);
 		idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
 		mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
 	}
 
+	kfree(ct_flow->pre_ct_attr);
+	kfree(ct_flow->post_ct_attr);
 	kfree(ct_flow);
 }
 
 void
 mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow,
-		       struct mlx5_esw_flow_attr *attr)
+		       struct mlx5_flow_attr *attr)
 {
 	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	struct mlx5_ct_flow *ct_flow = attr->ct_attr.ct_flow;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
index 708c216325d3..2bfe930faa3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -10,7 +10,7 @@
 
 #include "en.h"
 
-struct mlx5_esw_flow_attr;
+struct mlx5_flow_attr;
 struct mlx5e_tc_mod_hdr_acts;
 struct mlx5_rep_uplink_priv;
 struct mlx5e_tc_flow;
@@ -101,7 +101,7 @@ mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
 			    struct mlx5_flow_spec *spec);
 int
 mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
-			struct mlx5_esw_flow_attr *attr,
+			struct mlx5_flow_attr *attr,
 			const struct flow_action_entry *act,
 			struct netlink_ext_ack *extack);
 
@@ -109,12 +109,12 @@ struct mlx5_flow_handle *
 mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
-			struct mlx5_esw_flow_attr *attr,
+			struct mlx5_flow_attr *attr,
 			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
 void
 mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
 		       struct mlx5e_tc_flow *flow,
-		       struct mlx5_esw_flow_attr *attr);
+		       struct mlx5_flow_attr *attr);
 
 bool
 mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
@@ -162,7 +162,7 @@ mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
 
 static inline int
 mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
-			struct mlx5_esw_flow_attr *attr,
+			struct mlx5_flow_attr *attr,
 			const struct flow_action_entry *act,
 			struct netlink_ext_ack *extack)
 {
@@ -175,7 +175,7 @@ static inline struct mlx5_flow_handle *
 mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
-			struct mlx5_esw_flow_attr *attr,
+			struct mlx5_flow_attr *attr,
 			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
 {
 	return ERR_PTR(-EOPNOTSUPP);
@@ -184,7 +184,7 @@ mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 static inline void
 mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
 		       struct mlx5e_tc_flow *flow,
-		       struct mlx5_esw_flow_attr *attr)
+		       struct mlx5_flow_attr *attr)
 {
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4b810ad9d6d6..a54821107566 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -70,17 +70,6 @@
 
 #define nic_chains(priv) ((priv)->fs.tc.chains)
 #define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)
-
-struct mlx5_nic_flow_attr {
-	u32 action;
-	u32 flow_tag;
-	struct mlx5_modify_hdr *modify_hdr;
-	u32 hairpin_tirn;
-	u8 match_level;
-	struct mlx5_flow_table	*hairpin_ft;
-	struct mlx5_fc		*counter;
-};
-
 #define MLX5E_TC_FLOW_BASE (MLX5E_TC_FLAG_LAST_EXPORTED_BIT + 1)
 
 enum {
@@ -154,11 +143,7 @@ struct mlx5e_tc_flow {
 	struct rcu_head		rcu_head;
 	struct completion	init_done;
 	int tunnel_id; /* the mapped tunnel id of this flow */
-
-	union {
-		struct mlx5_esw_flow_attr esw_attr[0];
-		struct mlx5_nic_flow_attr nic_attr[0];
-	};
+	struct mlx5_flow_attr *attr;
 };
 
 struct mlx5e_tc_flow_parse_attr {
@@ -416,10 +401,7 @@ static int mlx5e_attach_mod_hdr(struct mlx5e_priv *priv,
 		return PTR_ERR(mh);
 
 	modify_hdr = mlx5e_mod_hdr_get(mh);
-	if (mlx5e_is_eswitch_flow(flow))
-		flow->esw_attr->modify_hdr = modify_hdr;
-	else
-		flow->nic_attr->modify_hdr = modify_hdr;
+	flow->attr->modify_hdr = modify_hdr;
 	flow->mh = mh;
 
 	return 0;
@@ -859,9 +841,9 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
 attach_flow:
 	if (hpe->hp->num_channels > 1) {
 		flow_flag_set(flow, HAIRPIN_RSS);
-		flow->nic_attr->hairpin_ft = hpe->hp->ttc.ft.t;
+		flow->attr->nic_attr->hairpin_ft = hpe->hp->ttc.ft.t;
 	} else {
-		flow->nic_attr->hairpin_tirn = hpe->hp->tirn;
+		flow->attr->nic_attr->hairpin_tirn = hpe->hp->tirn;
 	}
 
 	flow->hpe = hpe;
@@ -894,9 +876,10 @@ static void mlx5e_hairpin_flow_del(struct mlx5e_priv *priv,
 struct mlx5_flow_handle *
 mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 			     struct mlx5_flow_spec *spec,
-			     struct mlx5_nic_flow_attr *attr)
+			     struct mlx5_flow_attr *attr)
 {
 	struct mlx5_flow_context *flow_context = &spec->flow_context;
+	struct mlx5_nic_flow_attr *nic_attr = attr->nic_attr;
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	struct mlx5_flow_destination dest[2] = {};
 	struct mlx5_flow_act flow_act = {
@@ -907,15 +890,15 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 	int dest_ix = 0;
 
 	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
-	flow_context->flow_tag = attr->flow_tag;
+	flow_context->flow_tag = nic_attr->flow_tag;
 
-	if (attr->hairpin_ft) {
+	if (nic_attr->hairpin_ft) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
-		dest[dest_ix].ft = attr->hairpin_ft;
+		dest[dest_ix].ft = nic_attr->hairpin_ft;
 		dest_ix++;
-	} else if (attr->hairpin_tirn) {
+	} else if (nic_attr->hairpin_tirn) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
-		dest[dest_ix].tir_num = attr->hairpin_tirn;
+		dest[dest_ix].tir_num = nic_attr->hairpin_tirn;
 		dest_ix++;
 	} else if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
@@ -947,7 +930,7 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 	}
 	mutex_unlock(&tc->t_lock);
 
-	if (attr->match_level != MLX5_MATCH_NONE)
+	if (attr->outer_match_level != MLX5_MATCH_NONE)
 		spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
 
 	rule = mlx5_add_flow_rules(tc->t, spec,
@@ -964,7 +947,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		      struct mlx5e_tc_flow *flow,
 		      struct netlink_ext_ack *extack)
 {
-	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
 	struct mlx5_core_dev *dev = priv->mdev;
 	struct mlx5_fc *counter = NULL;
 	int err;
@@ -1005,7 +988,7 @@ void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
 static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 				  struct mlx5e_tc_flow *flow)
 {
-	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 
 	if (!IS_ERR_OR_NULL(flow->rule[0]))
@@ -1025,6 +1008,8 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 
 	if (flow_flag_test(flow, HAIRPIN))
 		mlx5e_hairpin_flow_del(priv, flow);
+
+	kfree(flow->attr);
 }
 
 static void mlx5e_detach_encap(struct mlx5e_priv *priv,
@@ -1047,7 +1032,7 @@ static struct mlx5_flow_handle *
 mlx5e_tc_offload_fdb_rules(struct mlx5_eswitch *esw,
 			   struct mlx5e_tc_flow *flow,
 			   struct mlx5_flow_spec *spec,
-			   struct mlx5_esw_flow_attr *attr)
+			   struct mlx5_flow_attr *attr)
 {
 	struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts;
 	struct mlx5_flow_handle *rule;
@@ -1063,7 +1048,7 @@ mlx5e_tc_offload_fdb_rules(struct mlx5_eswitch *esw,
 	if (IS_ERR(rule))
 		return rule;
 
-	if (attr->split_count) {
+	if (attr->esw_attr->split_count) {
 		flow->rule[1] = mlx5_eswitch_add_fwd_rule(esw, spec, attr);
 		if (IS_ERR(flow->rule[1])) {
 			mlx5_eswitch_del_offloaded_rule(esw, rule, attr);
@@ -1077,7 +1062,7 @@ mlx5e_tc_offload_fdb_rules(struct mlx5_eswitch *esw,
 static void
 mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
 			     struct mlx5e_tc_flow *flow,
-			     struct mlx5_esw_flow_attr *attr)
+			     struct mlx5_flow_attr *attr)
 {
 	flow_flag_clear(flow, OFFLOADED);
 
@@ -1086,7 +1071,7 @@ mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
 		return;
 	}
 
-	if (attr->split_count)
+	if (attr->esw_attr->split_count)
 		mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
 
 	mlx5_eswitch_del_offloaded_rule(esw, flow->rule[0], attr);
@@ -1097,18 +1082,24 @@ mlx5e_tc_offload_to_slow_path(struct mlx5_eswitch *esw,
 			      struct mlx5e_tc_flow *flow,
 			      struct mlx5_flow_spec *spec)
 {
-	struct mlx5_esw_flow_attr slow_attr;
+	struct mlx5_flow_attr *slow_attr;
 	struct mlx5_flow_handle *rule;
 
-	memcpy(&slow_attr, flow->esw_attr, sizeof(slow_attr));
-	slow_attr.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
-	slow_attr.split_count = 0;
-	slow_attr.flags |= MLX5_ESW_ATTR_FLAG_SLOW_PATH;
+	slow_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	if (!slow_attr)
+		return ERR_PTR(-ENOMEM);
 
-	rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, &slow_attr);
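+	/* clone the attr, including its inline esw part, then retarget
+	 * it at the slow path table
+	 */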
+	memcpy(slow_attr, flow->attr, ESW_FLOW_ATTR_SZ);
+	slow_attr->action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	slow_attr->esw_attr->split_count = 0;
+	slow_attr->flags |= MLX5_ESW_ATTR_FLAG_SLOW_PATH;
+
+	rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, slow_attr);
 	if (!IS_ERR(rule))
 		flow_flag_set(flow, SLOW);
 
+	kfree(slow_attr);
+
 	return rule;
 }
 
@@ -1116,14 +1107,19 @@ static void
 mlx5e_tc_unoffload_from_slow_path(struct mlx5_eswitch *esw,
 				  struct mlx5e_tc_flow *flow)
 {
-	struct mlx5_esw_flow_attr slow_attr;
+	struct mlx5_flow_attr *slow_attr;
 
-	memcpy(&slow_attr, flow->esw_attr, sizeof(slow_attr));
-	slow_attr.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
-	slow_attr.split_count = 0;
-	slow_attr.flags |= MLX5_ESW_ATTR_FLAG_SLOW_PATH;
-	mlx5e_tc_unoffload_fdb_rules(esw, flow, &slow_attr);
+	slow_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	if (!slow_attr) {
+		mlx5_core_warn(flow->priv->mdev,
+			       "Unable to alloc attr to unoffload slow path rule\n");
+		return;
+	}
+
+	memcpy(slow_attr, flow->attr, ESW_FLOW_ATTR_SZ);
+	slow_attr->action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	slow_attr->esw_attr->split_count = 0;
+	slow_attr->flags |= MLX5_ESW_ATTR_FLAG_SLOW_PATH;
+	mlx5e_tc_unoffload_fdb_rules(esw, flow, slow_attr);
 	flow_flag_clear(flow, SLOW);
+	kfree(slow_attr);
 }
 
 /* Caller must obtain uplink_priv->unready_flows_lock mutex before calling this
@@ -1181,9 +1177,10 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 		      struct netlink_ext_ack *extack)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
-	struct mlx5e_tc_flow_parse_attr *parse_attr = attr->parse_attr;
 	struct net_device *out_dev, *encap_dev = NULL;
+	struct mlx5e_tc_flow_parse_attr *parse_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
+	struct mlx5_esw_flow_attr *esw_attr;
 	struct mlx5_fc *counter = NULL;
 	struct mlx5e_rep_priv *rpriv;
 	struct mlx5e_priv *out_priv;
@@ -1223,10 +1220,13 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 			return err;
 	}
 
+	parse_attr = attr->parse_attr;
+	esw_attr = attr->esw_attr;
+
 	for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) {
 		int mirred_ifindex;
 
-		if (!(attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP))
+		if (!(esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP))
 			continue;
 
 		mirred_ifindex = parse_attr->mirred_ifindex[out_index];
@@ -1239,8 +1239,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 
 		out_priv = netdev_priv(encap_dev);
 		rpriv = out_priv->ppriv;
-		attr->dests[out_index].rep = rpriv->rep;
-		attr->dests[out_index].mdev = out_priv->mdev;
+		esw_attr->dests[out_index].rep = rpriv->rep;
+		esw_attr->dests[out_index].mdev = out_priv->mdev;
 	}
 
 	err = mlx5_eswitch_add_vlan_action(esw, attr);
@@ -1256,7 +1256,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	}
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
-		counter = mlx5_fc_create(attr->counter_dev, true);
+		counter = mlx5_fc_create(esw_attr->counter_dev, true);
 		if (IS_ERR(counter))
 			return PTR_ERR(counter);
 
@@ -1282,7 +1282,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 
 static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow)
 {
-	struct mlx5_flow_spec *spec = &flow->esw_attr->parse_attr->spec;
+	struct mlx5_flow_spec *spec = &flow->attr->parse_attr->spec;
 	void *headers_v = MLX5_ADDR_OF(fte_match_param,
 				       spec->match_value,
 				       misc_parameters_3);
@@ -1297,7 +1297,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 				  struct mlx5e_tc_flow *flow)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
 	int out_index;
 
 	mlx5e_put_flow_tunnel_id(flow);
@@ -1318,22 +1318,24 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 	mlx5_eswitch_del_vlan_action(esw, attr);
 
 	for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++)
-		if (attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP) {
+		if (attr->esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP) {
 			mlx5e_detach_encap(priv, flow, out_index);
 			kfree(attr->parse_attr->tun_info[out_index]);
 		}
 	kvfree(attr->parse_attr);
 
-	mlx5_tc_ct_match_del(priv, &flow->esw_attr->ct_attr);
+	mlx5_tc_ct_match_del(priv, &flow->attr->ct_attr);
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		mlx5e_detach_mod_hdr(priv, flow);
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT)
-		mlx5_fc_destroy(attr->counter_dev, attr->counter);
+		mlx5_fc_destroy(attr->esw_attr->counter_dev, attr->counter);
 
 	if (flow_flag_test(flow, L3_TO_L2_DECAP))
 		mlx5e_detach_decap(priv, flow);
+
+	kfree(flow->attr);
 }
 
 void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
@@ -1343,6 +1345,7 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_esw_flow_attr *esw_attr;
 	struct mlx5_flow_handle *rule;
+	struct mlx5_flow_attr *attr;
 	struct mlx5_flow_spec *spec;
 	struct mlx5e_tc_flow *flow;
 	int err;
@@ -1365,8 +1368,9 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
 
 		if (!mlx5e_is_offloaded_flow(flow))
 			continue;
-		esw_attr = flow->esw_attr;
-		spec = &esw_attr->parse_attr->spec;
+		attr = flow->attr;
+		esw_attr = attr->esw_attr;
+		spec = &attr->parse_attr->spec;
 
 		esw_attr->dests[flow->tmp_efi_index].pkt_reformat = e->pkt_reformat;
 		esw_attr->dests[flow->tmp_efi_index].flags |= MLX5_ESW_DEST_ENCAP_VALID;
@@ -1386,7 +1390,7 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
 		if (!all_flow_encaps_valid)
 			continue;
 		/* update from slow path rule to encap rule */
-		rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, esw_attr);
+		rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, attr);
 		if (IS_ERR(rule)) {
 			err = PTR_ERR(rule);
 			mlx5_core_warn(priv->mdev, "Failed to update cached encapsulation flow, %d\n",
@@ -1406,7 +1410,9 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 			      struct list_head *flow_list)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mlx5_esw_flow_attr *esw_attr;
 	struct mlx5_flow_handle *rule;
+	struct mlx5_flow_attr *attr;
 	struct mlx5_flow_spec *spec;
 	struct mlx5e_tc_flow *flow;
 	int err;
@@ -1414,12 +1420,14 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 	list_for_each_entry(flow, flow_list, tmp_list) {
 		if (!mlx5e_is_offloaded_flow(flow))
 			continue;
-		spec = &flow->esw_attr->parse_attr->spec;
+		attr = flow->attr;
+		esw_attr = attr->esw_attr;
+		spec = &attr->parse_attr->spec;
 
 		/* update from encap rule to slow path rule */
 		rule = mlx5e_tc_offload_to_slow_path(esw, flow, spec);
 		/* mark the flow's encap dest as non-valid */
-		flow->esw_attr->dests[flow->tmp_efi_index].flags &= ~MLX5_ESW_DEST_ENCAP_VALID;
+		esw_attr->dests[flow->tmp_efi_index].flags &= ~MLX5_ESW_DEST_ENCAP_VALID;
 
 		if (IS_ERR(rule)) {
 			err = PTR_ERR(rule);
@@ -1428,7 +1436,7 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 			continue;
 		}
 
-		mlx5e_tc_unoffload_fdb_rules(esw, flow, flow->esw_attr);
+		mlx5e_tc_unoffload_fdb_rules(esw, flow, attr);
 		flow->rule[0] = rule;
 		/* was unset when fast path rule removed */
 		flow_flag_set(flow, OFFLOADED);
@@ -1441,10 +1449,7 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 
 static struct mlx5_fc *mlx5e_tc_get_counter(struct mlx5e_tc_flow *flow)
 {
-	if (mlx5e_is_eswitch_flow(flow))
-		return flow->esw_attr->counter;
-	else
-		return flow->nic_attr->counter;
+	return flow->attr->counter;
 }
 
 /* Takes reference to all flows attached to encap and adds the flows to
@@ -1810,11 +1815,11 @@ static int mlx5e_get_flow_tunnel_id(struct mlx5e_priv *priv,
 {
 	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
 	struct netlink_ext_ack *extack = f->common.extack;
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts;
 	struct flow_match_enc_opts enc_opts_match;
 	struct tunnel_match_enc_opts tun_enc_opts;
 	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5_flow_attr *attr = flow->attr;
 	struct mlx5e_rep_priv *uplink_rpriv;
 	struct tunnel_match_key tunnel_key;
 	bool enc_opts_is_dont_care = true;
@@ -1964,8 +1969,8 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
 	if (!mlx5e_is_eswitch_flow(flow))
 		return -EOPNOTSUPP;
 
-	needs_mapping = !!flow->esw_attr->chain;
-	sets_mapping = !flow->esw_attr->chain && flow_has_tc_fwd_action(f);
+	needs_mapping = !!flow->attr->chain;
+	sets_mapping = !flow->attr->chain && flow_has_tc_fwd_action(f);
 	*match_inner = !needs_mapping;
 
 	if ((needs_mapping || sets_mapping) &&
@@ -1977,7 +1982,7 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
 		return -EOPNOTSUPP;
 	}
 
-	if (!flow->esw_attr->chain) {
+	if (!flow->attr->chain) {
 		err = mlx5e_tc_tun_parse(filter_dev, priv, spec, f,
 					 match_level);
 		if (err) {
@@ -1992,7 +1997,7 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
 		 * object
 		 */
 		if (!netif_is_bareudp(filter_dev))
-			flow->esw_attr->action |= MLX5_FLOW_CONTEXT_ACTION_DECAP;
+			flow->attr->action |= MLX5_FLOW_CONTEXT_ACTION_DECAP;
 	}
 
 	if (!needs_mapping && !sets_mapping)
@@ -2495,12 +2500,9 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 		}
 	}
 
-	if (is_eswitch_flow) {
-		flow->esw_attr->inner_match_level = inner_match_level;
-		flow->esw_attr->outer_match_level = outer_match_level;
-	} else {
-		flow->nic_attr->match_level = non_tunnel_match_level;
-	}
+	flow->attr->inner_match_level = inner_match_level;
+	flow->attr->outer_match_level = outer_match_level;
 
 	return err;
 }
@@ -3134,12 +3136,13 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 	bool ct_flow = false, ct_clear = false;
 	u32 actions;
 
+	ct_clear = flow->attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR;
+	ct_flow = flow_flag_test(flow, CT) && !ct_clear;
+	actions = flow->attr->action;
+
 	if (mlx5e_is_eswitch_flow(flow)) {
-		actions = flow->esw_attr->action;
-		ct_clear = flow->esw_attr->ct_attr.ct_action &
-			   TCA_CT_ACT_CLEAR;
-		ct_flow = flow_flag_test(flow, CT) && !ct_clear;
-		if (flow->esw_attr->split_count && ct_flow) {
+		if (flow->attr->esw_attr->split_count && ct_flow) {
 			/* All registers used by ct are cleared when using
 			 * split rules.
 			 */
@@ -3147,8 +3150,6 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 					   "Can't offload mirroring with action ct");
 			return false;
 		}
-	} else {
-		actions = flow->nic_attr->action;
 	}
 
 	if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
@@ -3252,9 +3253,10 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 				struct mlx5e_tc_flow *flow,
 				struct netlink_ext_ack *extack)
 {
-	struct mlx5_nic_flow_attr *attr = flow->nic_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
 	struct pedit_headers_action hdrs[2] = {};
 	const struct flow_action_entry *act;
+	struct mlx5_nic_flow_attr *nic_attr;
 	u32 action = 0;
 	int err, i;
 
@@ -3265,7 +3267,9 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 					FLOW_ACTION_HW_STATS_DELAYED_BIT))
 		return -EOPNOTSUPP;
 
-	attr->flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
+	nic_attr = attr->nic_attr;
+
+	nic_attr->flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
 
 	flow_action_for_each(i, act, flow_action) {
 		switch (act->id) {
@@ -3332,7 +3336,7 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 				return -EINVAL;
 			}
 
-			attr->flow_tag = mark;
+			nic_attr->flow_tag = mark;
 			action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 			}
 			break;
@@ -3489,8 +3493,8 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 			      bool *encap_valid)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
 	const struct ip_tunnel_info *tun_info;
 	struct encap_key key;
 	struct mlx5e_encap_entry *e;
@@ -3576,8 +3580,8 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 	flow->encaps[out_index].index = out_index;
 	*encap_dev = e->out_dev;
 	if (e->flags & MLX5_ENCAP_ENTRY_VALID) {
-		attr->dests[out_index].pkt_reformat = e->pkt_reformat;
-		attr->dests[out_index].flags |= MLX5_ESW_DEST_ENCAP_VALID;
+		attr->esw_attr->dests[out_index].pkt_reformat = e->pkt_reformat;
+		attr->esw_attr->dests[out_index].flags |= MLX5_ESW_DEST_ENCAP_VALID;
 		*encap_valid = true;
 	} else {
 		*encap_valid = false;
@@ -3604,14 +3608,14 @@ static int mlx5e_attach_decap(struct mlx5e_priv *priv,
 			      struct netlink_ext_ack *extack)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
+	struct mlx5_esw_flow_attr *attr = flow->attr->esw_attr;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 	struct mlx5e_decap_entry *d;
 	struct mlx5e_decap_key key;
 	uintptr_t hash_key;
 	int err = 0;
 
-	parse_attr = attr->parse_attr;
+	parse_attr = flow->attr->parse_attr;
 	if (sizeof(parse_attr->eth) > MLX5_CAP_ESW(priv->mdev, max_encap_header_size)) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "encap header larger than max supported");
@@ -3753,7 +3757,7 @@ static struct net_device *get_fdb_out_dev(struct net_device *uplink_dev,
 }
 
 static int add_vlan_push_action(struct mlx5e_priv *priv,
-				struct mlx5_esw_flow_attr *attr,
+				struct mlx5_flow_attr *attr,
 				struct net_device **out_dev,
 				u32 *action)
 {
@@ -3766,7 +3770,7 @@ static int add_vlan_push_action(struct mlx5e_priv *priv,
 	};
 	int err;
 
-	err = parse_tc_vlan_action(priv, &vlan_act, attr, action);
+	err = parse_tc_vlan_action(priv, &vlan_act, attr->esw_attr, action);
 	if (err)
 		return err;
 
@@ -3779,7 +3783,7 @@ static int add_vlan_push_action(struct mlx5e_priv *priv,
 }
 
 static int add_vlan_pop_action(struct mlx5e_priv *priv,
-			       struct mlx5_esw_flow_attr *attr,
+			       struct mlx5_flow_attr *attr,
 			       u32 *action)
 {
 	struct flow_action_entry vlan_act = {
@@ -3790,7 +3794,7 @@ static int add_vlan_pop_action(struct mlx5e_priv *priv,
 	nest_level = attr->parse_attr->filter_dev->lower_level -
 						priv->netdev->lower_level;
 	while (nest_level--) {
-		err = parse_tc_vlan_action(priv, &vlan_act, attr, action);
+		err = parse_tc_vlan_action(priv, &vlan_act, attr->esw_attr, action);
 		if (err)
 			return err;
 	}
@@ -3858,7 +3862,7 @@ static int mlx5_validate_goto_chain(struct mlx5_eswitch *esw,
 				    struct netlink_ext_ack *extack)
 {
 	u32 max_chain = mlx5_chains_get_chain_range(esw_chains(esw));
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
 	bool ft_flow = mlx5e_is_ft_flow(flow);
 	u32 dest_chain = act->chain_index;
 
@@ -3895,15 +3899,15 @@ static int verify_uplink_forwarding(struct mlx5e_priv *priv,
 				    struct net_device *out_dev,
 				    struct netlink_ext_ack *extack)
 {
+	struct mlx5_esw_flow_attr *attr = flow->attr->esw_attr;
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	struct mlx5e_rep_priv *rep_priv;
 
 	/* Forwarding non encapsulated traffic between
 	 * uplink ports is allowed only if
 	 * termination_table_raw_traffic cap is set.
 	 *
-	 * Input vport was stored esw_attr->in_rep.
+	 * Input vport was stored in attr->in_rep.
 	 * In LAG case, *priv* is the private data of
 	 * uplink which may be not the input vport.
 	 */
@@ -3938,13 +3942,14 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 {
 	struct pedit_headers_action hdrs[2] = {};
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
-	struct mlx5e_tc_flow_parse_attr *parse_attr = attr->parse_attr;
+	struct mlx5e_tc_flow_parse_attr *parse_attr;
 	struct mlx5e_rep_priv *rpriv = priv->ppriv;
 	const struct ip_tunnel_info *info = NULL;
+	struct mlx5_flow_attr *attr = flow->attr;
 	int ifindexes[MLX5_MAX_FLOW_FWD_VPORTS];
 	bool ft_flow = mlx5e_is_ft_flow(flow);
 	const struct flow_action_entry *act;
+	struct mlx5_esw_flow_attr *esw_attr;
 	bool encap = false, decap = false;
 	u32 action = attr->action;
 	int err, i, if_count = 0;
@@ -3957,6 +3962,9 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 					FLOW_ACTION_HW_STATS_DELAYED_BIT))
 		return -EOPNOTSUPP;
 
+	esw_attr = attr->esw_attr;
+	parse_attr = attr->parse_attr;
+
 	flow_action_for_each(i, act, flow_action) {
 		switch (act->id) {
 		case FLOW_ACTION_DROP:
@@ -4013,7 +4021,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 
 			if (!flow_flag_test(flow, L3_TO_L2_DECAP)) {
 				action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
-				attr->split_count = attr->out_count;
+				esw_attr->split_count = esw_attr->out_count;
 			}
 			break;
 		case FLOW_ACTION_CSUM:
@@ -4050,27 +4058,27 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 				return -EOPNOTSUPP;
 			}
 
-			if (attr->out_count >= MLX5_MAX_FLOW_FWD_VPORTS) {
+			if (esw_attr->out_count >= MLX5_MAX_FLOW_FWD_VPORTS) {
 				NL_SET_ERR_MSG_MOD(extack,
 						   "can't support more output ports, can't offload forwarding");
 				netdev_warn(priv->netdev,
 					    "can't support more than %d output ports, can't offload forwarding\n",
-					    attr->out_count);
+					    esw_attr->out_count);
 				return -EOPNOTSUPP;
 			}
 
 			action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
 				  MLX5_FLOW_CONTEXT_ACTION_COUNT;
 			if (encap) {
-				parse_attr->mirred_ifindex[attr->out_count] =
+				parse_attr->mirred_ifindex[esw_attr->out_count] =
 					out_dev->ifindex;
-				parse_attr->tun_info[attr->out_count] = dup_tun_info(info);
-				if (!parse_attr->tun_info[attr->out_count])
+				parse_attr->tun_info[esw_attr->out_count] = dup_tun_info(info);
+				if (!parse_attr->tun_info[esw_attr->out_count])
 					return -ENOMEM;
 				encap = false;
-				attr->dests[attr->out_count].flags |=
+				esw_attr->dests[esw_attr->out_count].flags |=
 					MLX5_ESW_DEST_ENCAP;
-				attr->out_count++;
+				esw_attr->out_count++;
 				/* attr->dests[].rep is resolved when we
 				 * handle encap
 				 */
@@ -4119,9 +4127,9 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 
 				out_priv = netdev_priv(out_dev);
 				rpriv = out_priv->ppriv;
-				attr->dests[attr->out_count].rep = rpriv->rep;
-				attr->dests[attr->out_count].mdev = out_priv->mdev;
-				attr->out_count++;
+				esw_attr->dests[esw_attr->out_count].rep = rpriv->rep;
+				esw_attr->dests[esw_attr->out_count].mdev = out_priv->mdev;
+				esw_attr->out_count++;
 			} else if (parse_attr->filter_dev != priv->netdev) {
 				/* All mlx5 devices are called to configure
 				 * high level device filters. Therefore, the
@@ -4159,12 +4167,12 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 							      act, parse_attr, hdrs,
 							      &action, extack);
 			} else {
-				err = parse_tc_vlan_action(priv, act, attr, &action);
+				err = parse_tc_vlan_action(priv, act, esw_attr, &action);
 			}
 			if (err)
 				return err;
 
-			attr->split_count = attr->out_count;
+			esw_attr->split_count = esw_attr->out_count;
 			break;
 		case FLOW_ACTION_VLAN_MANGLE:
 			err = add_vlan_rewrite_action(priv,
@@ -4174,7 +4182,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			if (err)
 				return err;
 
-			attr->split_count = attr->out_count;
+			esw_attr->split_count = esw_attr->out_count;
 			break;
 		case FLOW_ACTION_TUNNEL_DECAP:
 			decap = true;
@@ -4228,7 +4236,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
 			if (!((action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP) ||
 			      (action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH)))
-				attr->split_count = 0;
+				esw_attr->split_count = 0;
 		}
 	}
 
@@ -4268,7 +4276,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 		return -EOPNOTSUPP;
 	}
 
-	if (attr->split_count > 0 && !mlx5_esw_has_fwd_fdb(priv->mdev)) {
+	if (esw_attr->split_count > 0 && !mlx5_esw_has_fwd_fdb(priv->mdev)) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "current firmware doesn't support split rule for port mirroring");
 		netdev_warn_once(priv->netdev, "current firmware doesn't support split rule for port mirroring\n");
@@ -4319,25 +4327,37 @@ static struct rhashtable *get_tc_ht(struct mlx5e_priv *priv,
 
 static bool is_peer_flow_needed(struct mlx5e_tc_flow *flow)
 {
-	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
-	bool is_rep_ingress = attr->in_rep->vport != MLX5_VPORT_UPLINK &&
+	struct mlx5_esw_flow_attr *esw_attr = flow->attr->esw_attr;
+	struct mlx5_flow_attr *attr = flow->attr;
+	bool is_rep_ingress = esw_attr->in_rep->vport != MLX5_VPORT_UPLINK &&
 		flow_flag_test(flow, INGRESS);
 	bool act_is_encap = !!(attr->action &
 			       MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT);
-	bool esw_paired = mlx5_devcom_is_paired(attr->in_mdev->priv.devcom,
+	bool esw_paired = mlx5_devcom_is_paired(esw_attr->in_mdev->priv.devcom,
 						MLX5_DEVCOM_ESW_OFFLOADS);
 
 	if (!esw_paired)
 		return false;
 
-	if ((mlx5_lag_is_sriov(attr->in_mdev) ||
-	     mlx5_lag_is_multipath(attr->in_mdev)) &&
+	if ((mlx5_lag_is_sriov(esw_attr->in_mdev) ||
+	     mlx5_lag_is_multipath(esw_attr->in_mdev)) &&
 	    (is_rep_ingress || act_is_encap))
 		return true;
 
 	return false;
 }
 
+struct mlx5_flow_attr *
+mlx5_alloc_flow_attr(enum mlx5_flow_namespace_type type)
+{
+	u32 ex_attr_size = (type == MLX5_FLOW_NAMESPACE_FDB) ?
+				sizeof(struct mlx5_esw_flow_attr) :
+				sizeof(struct mlx5_nic_flow_attr);
+	struct mlx5_flow_attr *attr;
+
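+	/* a single allocation covers the generic attr plus the inline
+	 * esw/nic specific part selected by @type
+	 */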
+	return kzalloc(sizeof(*attr) + ex_attr_size, GFP_KERNEL);
+}
+
 static int
 mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
 		 struct flow_cls_offload *f, unsigned long flow_flags,
@@ -4345,19 +4365,24 @@ mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
 		 struct mlx5e_tc_flow **__flow)
 {
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
+	struct mlx5_flow_attr *attr = NULL;
 	struct mlx5e_tc_flow *flow;
 	int out_index, err;
 
-	flow = kzalloc(sizeof(*flow) + attr_size, GFP_KERNEL);
+	flow = kzalloc(sizeof(*flow), GFP_KERNEL);
 	parse_attr = kvzalloc(sizeof(*parse_attr), GFP_KERNEL);
 	if (!parse_attr || !flow) {
 		err = -ENOMEM;
 		goto err_free;
 	}
+
+	flow->flags = flow_flags;
+	flow->cookie = f->cookie;
+	flow->priv = priv;
+
+	attr = mlx5_alloc_flow_attr(get_flow_name_space(flow));
+	if (!attr) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+	flow->attr = attr;
 
-	flow->cookie = f->cookie;
-	flow->flags = flow_flags;
-	flow->priv = priv;
 	for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++)
 		INIT_LIST_HEAD(&flow->encaps[out_index].list);
 	INIT_LIST_HEAD(&flow->hairpin);
@@ -4373,11 +4398,12 @@ mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
 err_free:
 	kfree(flow);
 	kvfree(parse_attr);
+	kfree(attr);
 	return err;
 }
 
 static void
-mlx5e_flow_esw_attr_init(struct mlx5_esw_flow_attr *esw_attr,
+mlx5e_flow_esw_attr_init(struct mlx5_flow_attr *attr,
 			 struct mlx5e_priv *priv,
 			 struct mlx5e_tc_flow_parse_attr *parse_attr,
 			 struct flow_cls_offload *f,
@@ -4385,10 +4411,11 @@ mlx5e_flow_esw_attr_init(struct mlx5_esw_flow_attr *esw_attr,
 			 struct mlx5_core_dev *in_mdev)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 
-	esw_attr->parse_attr = parse_attr;
-	esw_attr->chain = f->common.chain_index;
-	esw_attr->prio = f->common.prio;
+	attr->parse_attr = parse_attr;
+	attr->chain = f->common.chain_index;
+	attr->prio = f->common.prio;
 
 	esw_attr->in_rep = in_rep;
 	esw_attr->in_mdev = in_mdev;
@@ -4422,7 +4449,7 @@ __mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 		goto out;
 
 	parse_attr->filter_dev = filter_dev;
-	mlx5e_flow_esw_attr_init(flow->esw_attr,
+	mlx5e_flow_esw_attr_init(flow->attr,
 				 priv, parse_attr,
 				 f, in_rep, in_mdev);
 
@@ -4433,7 +4460,7 @@ __mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 
 	/* actions validation depends on parsing the ct matches first */
 	err = mlx5_tc_ct_match_add(priv, &parse_attr->spec, f,
-				   &flow->esw_attr->ct_attr, extack);
+				   &flow->attr->ct_attr, extack);
 	if (err)
 		goto err_free;
 
@@ -4464,6 +4491,7 @@ static int mlx5e_tc_add_fdb_peer_flow(struct flow_cls_offload *f,
 {
 	struct mlx5e_priv *priv = flow->priv, *peer_priv;
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch, *peer_esw;
+	struct mlx5_esw_flow_attr *attr = flow->attr->esw_attr;
 	struct mlx5_devcom *devcom = priv->mdev->priv.devcom;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 	struct mlx5e_rep_priv *peer_urpriv;
@@ -4483,15 +4511,15 @@ static int mlx5e_tc_add_fdb_peer_flow(struct flow_cls_offload *f,
 	 * original flow and packets redirected from uplink use the
 	 * peer mdev.
 	 */
-	if (flow->esw_attr->in_rep->vport == MLX5_VPORT_UPLINK)
+	if (attr->in_rep->vport == MLX5_VPORT_UPLINK)
 		in_mdev = peer_priv->mdev;
 	else
 		in_mdev = priv->mdev;
 
-	parse_attr = flow->esw_attr->parse_attr;
+	parse_attr = flow->attr->parse_attr;
 	peer_flow = __mlx5e_add_fdb_flow(peer_priv, f, flow_flags,
 					 parse_attr->filter_dev,
-					 flow->esw_attr->in_rep, in_mdev);
+					 attr->in_rep, in_mdev);
 	if (IS_ERR(peer_flow)) {
 		err = PTR_ERR(peer_flow);
 		goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 2d63a75a9326..9e84f03eebce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -35,17 +35,48 @@
 
 #include <net/pkt_cls.h>
 #include "en.h"
+#include "eswitch.h"
+#include "en/tc_ct.h"
 
 #define MLX5E_TC_FLOW_ID_MASK 0x0000ffff
 
 #ifdef CONFIG_MLX5_ESWITCH
 
+#define ESW_FLOW_ATTR_SZ (sizeof(struct mlx5_flow_attr) +\
+			  sizeof(struct mlx5_esw_flow_attr))
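+/* bytes to copy when cloning an FDB flow attr, covering its inline esw part */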
+
 int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags);
 
 struct mlx5e_tc_update_priv {
 	struct net_device *tun_dev;
 };
 
+struct mlx5_nic_flow_attr {
+	u32 flow_tag;
+	u32 hairpin_tirn;
+	struct mlx5_flow_table *hairpin_ft;
+};
+
+struct mlx5_flow_attr {
+	u32 action;
+	struct mlx5_fc *counter;
+	struct mlx5_modify_hdr *modify_hdr;
+	struct mlx5_ct_attr ct_attr;
+	struct mlx5e_tc_flow_parse_attr *parse_attr;
+	u32 chain;
+	u16 prio;
+	u32 dest_chain;
+	struct mlx5_flow_table *ft;
+	struct mlx5_flow_table *dest_ft;
+	u8 inner_match_level;
+	u8 outer_match_level;
+	u32 flags;
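+	/* keep the union last: the esw/nic specific attr is allocated
+	 * inline, right after the generic attr (see mlx5_alloc_flow_attr())
+	 */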
+	union {
+		struct mlx5_esw_flow_attr esw_attr[0];
+		struct mlx5_nic_flow_attr nic_attr[0];
+	};
+};
+
 #if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
 
 struct tunnel_match_key {
@@ -181,11 +212,10 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv);
 int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 			    void *cb_priv);
 
-struct mlx5_nic_flow_attr;
 struct mlx5_flow_handle *
 mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 			     struct mlx5_flow_spec *spec,
-			     struct mlx5_nic_flow_attr *attr);
+			     struct mlx5_flow_attr *attr);
 void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
 				  struct mlx5_flow_handle *rule);
 #else /* CONFIG_MLX5_CLS_ACT */
@@ -196,6 +226,8 @@ mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv)
 { return -EOPNOTSUPP; }
 #endif /* CONFIG_MLX5_CLS_ACT */
 
+struct mlx5_flow_attr *mlx5_alloc_flow_attr(enum mlx5_flow_namespace_type type);
+
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index fc23d57e9e44..66393fbdcd94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -328,7 +328,7 @@ struct mlx5_termtbl_handle;
 
 bool
 mlx5_eswitch_termtbl_required(struct mlx5_eswitch *esw,
-			      struct mlx5_esw_flow_attr *attr,
+			      struct mlx5_flow_attr *attr,
 			      struct mlx5_flow_act *flow_act,
 			      struct mlx5_flow_spec *spec);
 
@@ -348,19 +348,19 @@ mlx5_eswitch_termtbl_put(struct mlx5_eswitch *esw,
 struct mlx5_flow_handle *
 mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_spec *spec,
-				struct mlx5_esw_flow_attr *attr);
+				struct mlx5_flow_attr *attr);
 struct mlx5_flow_handle *
 mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 			  struct mlx5_flow_spec *spec,
-			  struct mlx5_esw_flow_attr *attr);
+			  struct mlx5_flow_attr *attr);
 void
 mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_handle *rule,
-				struct mlx5_esw_flow_attr *attr);
+				struct mlx5_flow_attr *attr);
 void
 mlx5_eswitch_del_fwd_rule(struct mlx5_eswitch *esw,
 			  struct mlx5_flow_handle *rule,
-			  struct mlx5_esw_flow_attr *attr);
+			  struct mlx5_flow_attr *attr);
 
 struct mlx5_flow_handle *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, u16 vport,
@@ -400,7 +400,6 @@ struct mlx5_esw_flow_attr {
 	int split_count;
 	int out_count;
 
-	int	action;
 	__be16	vlan_proto[MLX5_FS_VLAN_DEPTH];
 	u16	vlan_vid[MLX5_FS_VLAN_DEPTH];
 	u8	vlan_prio[MLX5_FS_VLAN_DEPTH];
@@ -412,19 +411,7 @@ struct mlx5_esw_flow_attr {
 		struct mlx5_core_dev *mdev;
 		struct mlx5_termtbl_handle *termtbl;
 	} dests[MLX5_MAX_FLOW_FWD_VPORTS];
-	struct  mlx5_modify_hdr *modify_hdr;
-	u8	inner_match_level;
-	u8	outer_match_level;
-	struct mlx5_fc *counter;
-	u32	chain;
-	u16	prio;
-	u32	dest_chain;
-	u32	flags;
-	struct mlx5_flow_table *fdb;
-	struct mlx5_flow_table *dest_ft;
-	struct mlx5_ct_attr ct_attr;
 	struct mlx5_pkt_reformat *decap_pkt_reformat;
-	struct mlx5e_tc_flow_parse_attr *parse_attr;
 };
 
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
@@ -450,9 +437,9 @@ int mlx5_devlink_port_function_hw_addr_set(struct devlink *devlink,
 void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type);
 
 int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
-				 struct mlx5_esw_flow_attr *attr);
+				 struct mlx5_flow_attr *attr);
 int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
-				 struct mlx5_esw_flow_attr *attr);
+				 struct mlx5_flow_attr *attr);
 int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
 				  u16 vport, u16 vlan, u8 qos, u8 set_flags);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 38eef5a8feb9..ffd5d540a19e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -45,6 +45,7 @@
 #include "lib/devcom.h"
 #include "lib/eq.h"
 #include "lib/fs_chains.h"
+#include "en_tc.h"
 
 /* There are two match-all miss flows, one for unicast dst mac and
  * one for multicast.
@@ -66,6 +67,12 @@ struct mlx5_vport_key {
 	u16 vhca_id;
 } __packed;
 
+struct mlx5_vport_tbl_attr {
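+	/* the chain/prio/vport triplet keys the per vport FDB table */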
+	u16 chain;
+	u16 prio;
+	u16 vport;
+};
+
 struct mlx5_vport_table {
 	struct hlist_node hlist;
 	struct mlx5_flow_table *fdb;
@@ -94,10 +101,10 @@ esw_vport_tbl_create(struct mlx5_eswitch *esw, struct mlx5_flow_namespace *ns)
 }
 
 static u32 flow_attr_to_vport_key(struct mlx5_eswitch *esw,
-				  struct mlx5_esw_flow_attr *attr,
+				  struct mlx5_vport_tbl_attr *attr,
 				  struct mlx5_vport_key *key)
 {
-	key->vport = attr->in_rep->vport;
+	key->vport = attr->vport;
 	key->chain = attr->chain;
 	key->prio = attr->prio;
 	key->vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id);
@@ -118,7 +125,7 @@ esw_vport_tbl_lookup(struct mlx5_eswitch *esw, struct mlx5_vport_key *skey, u32
 }
 
 static void
-esw_vport_tbl_put(struct mlx5_eswitch *esw, struct mlx5_esw_flow_attr *attr)
+esw_vport_tbl_put(struct mlx5_eswitch *esw, struct mlx5_vport_tbl_attr *attr)
 {
 	struct mlx5_vport_table *e;
 	struct mlx5_vport_key key;
@@ -138,7 +145,7 @@ esw_vport_tbl_put(struct mlx5_eswitch *esw, struct mlx5_esw_flow_attr *attr)
 }
 
 static struct mlx5_flow_table *
-esw_vport_tbl_get(struct mlx5_eswitch *esw, struct mlx5_esw_flow_attr *attr)
+esw_vport_tbl_get(struct mlx5_eswitch *esw, struct mlx5_vport_tbl_attr *attr)
 {
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *ns;
@@ -189,16 +196,15 @@ esw_vport_tbl_get(struct mlx5_eswitch *esw, struct mlx5_esw_flow_attr *attr)
 
 int mlx5_esw_vport_tbl_get(struct mlx5_eswitch *esw)
 {
-	struct mlx5_esw_flow_attr attr = {};
-	struct mlx5_eswitch_rep rep = {};
+	struct mlx5_vport_tbl_attr attr;
 	struct mlx5_flow_table *fdb;
 	struct mlx5_vport *vport;
 	int i;
 
+	attr.chain = 0;
 	attr.prio = 1;
-	attr.in_rep = &rep;
 	mlx5_esw_for_all_vports(esw, i, vport) {
-		attr.in_rep->vport = vport->vport;
+		attr.vport = vport->vport;
 		fdb = esw_vport_tbl_get(esw, &attr);
 		if (IS_ERR(fdb))
 			goto out;
@@ -212,15 +218,14 @@ int mlx5_esw_vport_tbl_get(struct mlx5_eswitch *esw)
 
 void mlx5_esw_vport_tbl_put(struct mlx5_eswitch *esw)
 {
-	struct mlx5_esw_flow_attr attr = {};
-	struct mlx5_eswitch_rep rep = {};
+	struct mlx5_vport_tbl_attr attr;
 	struct mlx5_vport *vport;
 	int i;
 
+	attr.chain = 0;
 	attr.prio = 1;
-	attr.in_rep = &rep;
 	mlx5_esw_for_all_vports(esw, i, vport) {
-		attr.in_rep->vport = vport->vport;
+		attr.vport = vport->vport;
 		esw_vport_tbl_put(esw, &attr);
 	}
 }
@@ -290,12 +295,14 @@ mlx5_eswitch_set_rule_source_port(struct mlx5_eswitch *esw,
 struct mlx5_flow_handle *
 mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_spec *spec,
-				struct mlx5_esw_flow_attr *attr)
+				struct mlx5_flow_attr *attr)
 {
 	struct mlx5_flow_destination dest[MLX5_MAX_FLOW_FWD_VPORTS + 1] = {};
 	struct mlx5_flow_act flow_act = { .flags = FLOW_ACT_NO_APPEND, };
+	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 	struct mlx5_fs_chains *chains = esw_chains(esw);
-	bool split = !!(attr->split_count);
+	bool split = !!(esw_attr->split_count);
+	struct mlx5_vport_tbl_attr fwd_attr;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_flow_table *fdb;
 	int j, i = 0;
@@ -309,13 +316,13 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 		flow_act.action &= ~(MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH |
 				     MLX5_FLOW_CONTEXT_ACTION_VLAN_POP);
 	else if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH) {
-		flow_act.vlan[0].ethtype = ntohs(attr->vlan_proto[0]);
-		flow_act.vlan[0].vid = attr->vlan_vid[0];
-		flow_act.vlan[0].prio = attr->vlan_prio[0];
+		flow_act.vlan[0].ethtype = ntohs(esw_attr->vlan_proto[0]);
+		flow_act.vlan[0].vid = esw_attr->vlan_vid[0];
+		flow_act.vlan[0].prio = esw_attr->vlan_prio[0];
 		if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH_2) {
-			flow_act.vlan[1].ethtype = ntohs(attr->vlan_proto[1]);
-			flow_act.vlan[1].vid = attr->vlan_vid[1];
-			flow_act.vlan[1].prio = attr->vlan_prio[1];
+			flow_act.vlan[1].ethtype = ntohs(esw_attr->vlan_proto[1]);
+			flow_act.vlan[1].vid = esw_attr->vlan_vid[1];
+			flow_act.vlan[1].prio = esw_attr->vlan_prio[1];
 		}
 	}
 
@@ -345,28 +352,29 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 			dest[i].ft = ft;
 			i++;
 		} else {
-			for (j = attr->split_count; j < attr->out_count; j++) {
+			for (j = esw_attr->split_count; j < esw_attr->out_count; j++) {
 				dest[i].type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
-				dest[i].vport.num = attr->dests[j].rep->vport;
+				dest[i].vport.num = esw_attr->dests[j].rep->vport;
 				dest[i].vport.vhca_id =
-					MLX5_CAP_GEN(attr->dests[j].mdev, vhca_id);
+					MLX5_CAP_GEN(esw_attr->dests[j].mdev, vhca_id);
 				if (MLX5_CAP_ESW(esw->dev, merged_eswitch))
 					dest[i].vport.flags |=
 						MLX5_FLOW_DEST_VPORT_VHCA_ID;
-				if (attr->dests[j].flags & MLX5_ESW_DEST_ENCAP) {
+				if (esw_attr->dests[j].flags & MLX5_ESW_DEST_ENCAP) {
 					flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT;
-					flow_act.pkt_reformat = attr->dests[j].pkt_reformat;
+					flow_act.pkt_reformat =
+							esw_attr->dests[j].pkt_reformat;
 					dest[i].vport.flags |= MLX5_FLOW_DEST_VPORT_REFORMAT_ID;
 					dest[i].vport.pkt_reformat =
-						attr->dests[j].pkt_reformat;
+						esw_attr->dests[j].pkt_reformat;
 				}
 				i++;
 			}
 		}
 	}
 
-	if (attr->decap_pkt_reformat)
-		flow_act.pkt_reformat = attr->decap_pkt_reformat;
+	if (esw_attr->decap_pkt_reformat)
+		flow_act.pkt_reformat = esw_attr->decap_pkt_reformat;
 
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		dest[i].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
@@ -383,26 +391,30 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 		flow_act.modify_hdr = attr->modify_hdr;
 
 	if (split) {
-		fdb = esw_vport_tbl_get(esw, attr);
+		fwd_attr.chain = attr->chain;
+		fwd_attr.prio = attr->prio;
+		fwd_attr.vport = esw_attr->in_rep->vport;
+
+		fdb = esw_vport_tbl_get(esw, &fwd_attr);
 	} else {
 		if (attr->chain || attr->prio)
 			fdb = mlx5_chains_get_table(chains, attr->chain,
 						    attr->prio, 0);
 		else
-			fdb = attr->fdb;
+			fdb = attr->ft;
 
 		if (!(attr->flags & MLX5_ESW_ATTR_FLAG_NO_IN_PORT))
-			mlx5_eswitch_set_rule_source_port(esw, spec, attr);
+			mlx5_eswitch_set_rule_source_port(esw, spec, esw_attr);
 	}
 	if (IS_ERR(fdb)) {
 		rule = ERR_CAST(fdb);
 		goto err_esw_get;
 	}
 
-	mlx5_eswitch_set_rule_flow_source(esw, spec, attr);
+	mlx5_eswitch_set_rule_flow_source(esw, spec, esw_attr);
 
 	if (mlx5_eswitch_termtbl_required(esw, attr, &flow_act, spec))
-		rule = mlx5_eswitch_add_termtbl_rule(esw, fdb, spec, attr,
+		rule = mlx5_eswitch_add_termtbl_rule(esw, fdb, spec, esw_attr,
 						     &flow_act, dest, i);
 	else
 		rule = mlx5_add_flow_rules(fdb, spec, &flow_act, dest, i);
@@ -415,7 +427,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 
 err_add_rule:
 	if (split)
-		esw_vport_tbl_put(esw, attr);
+		esw_vport_tbl_put(esw, &fwd_attr);
 	else if (attr->chain || attr->prio)
 		mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 err_esw_get:
@@ -428,11 +440,13 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 struct mlx5_flow_handle *
 mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 			  struct mlx5_flow_spec *spec,
-			  struct mlx5_esw_flow_attr *attr)
+			  struct mlx5_flow_attr *attr)
 {
 	struct mlx5_flow_destination dest[MLX5_MAX_FLOW_FWD_VPORTS + 1] = {};
 	struct mlx5_flow_act flow_act = { .flags = FLOW_ACT_NO_APPEND, };
+	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 	struct mlx5_fs_chains *chains = esw_chains(esw);
+	struct mlx5_vport_tbl_attr fwd_attr;
 	struct mlx5_flow_table *fast_fdb;
 	struct mlx5_flow_table *fwd_fdb;
 	struct mlx5_flow_handle *rule;
@@ -444,31 +458,33 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 		goto err_get_fast;
 	}
 
-	fwd_fdb = esw_vport_tbl_get(esw, attr);
+	fwd_attr.chain = attr->chain;
+	fwd_attr.prio = attr->prio;
+	fwd_attr.vport = esw_attr->in_rep->vport;
+	fwd_fdb = esw_vport_tbl_get(esw, &fwd_attr);
 	if (IS_ERR(fwd_fdb)) {
 		rule = ERR_CAST(fwd_fdb);
 		goto err_get_fwd;
 	}
 
 	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
-	for (i = 0; i < attr->split_count; i++) {
+	for (i = 0; i < esw_attr->split_count; i++) {
 		dest[i].type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
-		dest[i].vport.num = attr->dests[i].rep->vport;
+		dest[i].vport.num = esw_attr->dests[i].rep->vport;
 		dest[i].vport.vhca_id =
-			MLX5_CAP_GEN(attr->dests[i].mdev, vhca_id);
+			MLX5_CAP_GEN(esw_attr->dests[i].mdev, vhca_id);
 		if (MLX5_CAP_ESW(esw->dev, merged_eswitch))
 			dest[i].vport.flags |= MLX5_FLOW_DEST_VPORT_VHCA_ID;
-		if (attr->dests[i].flags & MLX5_ESW_DEST_ENCAP) {
+		if (esw_attr->dests[i].flags & MLX5_ESW_DEST_ENCAP) {
 			dest[i].vport.flags |= MLX5_FLOW_DEST_VPORT_REFORMAT_ID;
-			dest[i].vport.pkt_reformat = attr->dests[i].pkt_reformat;
+			dest[i].vport.pkt_reformat = esw_attr->dests[i].pkt_reformat;
 		}
 	}
 	dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 	dest[i].ft = fwd_fdb,
 	i++;
 
-	mlx5_eswitch_set_rule_source_port(esw, spec, attr);
-	mlx5_eswitch_set_rule_flow_source(esw, spec, attr);
+	mlx5_eswitch_set_rule_source_port(esw, spec, esw_attr);
 
 	if (attr->outer_match_level != MLX5_MATCH_NONE)
 		spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
@@ -483,7 +499,7 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 
 	return rule;
 add_err:
-	esw_vport_tbl_put(esw, attr);
+	esw_vport_tbl_put(esw, &fwd_attr);
 err_get_fwd:
 	mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 err_get_fast:
@@ -493,11 +509,13 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 static void
 __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw,
 			struct mlx5_flow_handle *rule,
-			struct mlx5_esw_flow_attr *attr,
+			struct mlx5_flow_attr *attr,
 			bool fwd_rule)
 {
+	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 	struct mlx5_fs_chains *chains = esw_chains(esw);
-	bool split = (attr->split_count > 0);
+	bool split = (esw_attr->split_count > 0);
+	struct mlx5_vport_tbl_attr fwd_attr;
 	int i;
 
 	mlx5_del_flow_rules(rule);
@@ -505,19 +523,25 @@ __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw,
 	if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH)) {
 		/* unref the term table */
 		for (i = 0; i < MLX5_MAX_FLOW_FWD_VPORTS; i++) {
-			if (attr->dests[i].termtbl)
-				mlx5_eswitch_termtbl_put(esw, attr->dests[i].termtbl);
+			if (esw_attr->dests[i].termtbl)
+				mlx5_eswitch_termtbl_put(esw, esw_attr->dests[i].termtbl);
 		}
 	}
 
 	atomic64_dec(&esw->offloads.num_flows);
 
+	if (fwd_rule || split) {
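+		/* recreate the vport table key this rule was inserted with */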
+		fwd_attr.chain = attr->chain;
+		fwd_attr.prio = attr->prio;
+		fwd_attr.vport = esw_attr->in_rep->vport;
+	}
+
 	if (fwd_rule)  {
-		esw_vport_tbl_put(esw, attr);
+		esw_vport_tbl_put(esw, &fwd_attr);
 		mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 	} else {
 		if (split)
-			esw_vport_tbl_put(esw, attr);
+			esw_vport_tbl_put(esw, &fwd_attr);
 		else if (attr->chain || attr->prio)
 			mlx5_chains_put_table(chains, attr->chain, attr->prio, 0);
 		if (attr->dest_chain)
@@ -528,7 +552,7 @@ __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw,
 void
 mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_handle *rule,
-				struct mlx5_esw_flow_attr *attr)
+				struct mlx5_flow_attr *attr)
 {
 	__mlx5_eswitch_del_rule(esw, rule, attr, false);
 }
@@ -536,7 +560,7 @@ mlx5_eswitch_del_offloaded_rule(struct mlx5_eswitch *esw,
 void
 mlx5_eswitch_del_fwd_rule(struct mlx5_eswitch *esw,
 			  struct mlx5_flow_handle *rule,
-			  struct mlx5_esw_flow_attr *attr)
+			  struct mlx5_flow_attr *attr)
 {
 	__mlx5_eswitch_del_rule(esw, rule, attr, true);
 }
@@ -613,9 +637,10 @@ static int esw_add_vlan_action_check(struct mlx5_esw_flow_attr *attr,
 }
 
 int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
-				 struct mlx5_esw_flow_attr *attr)
+				 struct mlx5_flow_attr *attr)
 {
 	struct offloads_fdb *offloads = &esw->fdb_table.offloads;
+	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 	struct mlx5_eswitch_rep *vport = NULL;
 	bool push, pop, fwd;
 	int err = 0;
@@ -631,17 +656,17 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 
 	mutex_lock(&esw->state_lock);
 
-	err = esw_add_vlan_action_check(attr, push, pop, fwd);
+	err = esw_add_vlan_action_check(esw_attr, push, pop, fwd);
 	if (err)
 		goto unlock;
 
 	attr->flags &= ~MLX5_ESW_ATTR_FLAG_VLAN_HANDLED;
 
-	vport = esw_vlan_action_get_vport(attr, push, pop);
+	vport = esw_vlan_action_get_vport(esw_attr, push, pop);
 
 	if (!push && !pop && fwd) {
 		/* tracks VF --> wire rules without vlan push action */
-		if (attr->dests[0].rep->vport == MLX5_VPORT_UPLINK) {
+		if (esw_attr->dests[0].rep->vport == MLX5_VPORT_UPLINK) {
 			vport->vlan_refcount++;
 			attr->flags |= MLX5_ESW_ATTR_FLAG_VLAN_HANDLED;
 		}
@@ -664,11 +689,11 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 		if (vport->vlan_refcount)
 			goto skip_set_push;
 
-		err = __mlx5_eswitch_set_vport_vlan(esw, vport->vport, attr->vlan_vid[0], 0,
-						    SET_VLAN_INSERT | SET_VLAN_STRIP);
+		err = __mlx5_eswitch_set_vport_vlan(esw, vport->vport, esw_attr->vlan_vid[0],
+						    0, SET_VLAN_INSERT | SET_VLAN_STRIP);
 		if (err)
 			goto out;
-		vport->vlan = attr->vlan_vid[0];
+		vport->vlan = esw_attr->vlan_vid[0];
 skip_set_push:
 		vport->vlan_refcount++;
 	}
@@ -681,9 +706,10 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 }
 
 int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
-				 struct mlx5_esw_flow_attr *attr)
+				 struct mlx5_flow_attr *attr)
 {
 	struct offloads_fdb *offloads = &esw->fdb_table.offloads;
+	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 	struct mlx5_eswitch_rep *vport = NULL;
 	bool push, pop, fwd;
 	int err = 0;
@@ -701,11 +727,11 @@ int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
 
 	mutex_lock(&esw->state_lock);
 
-	vport = esw_vlan_action_get_vport(attr, push, pop);
+	vport = esw_vlan_action_get_vport(esw_attr, push, pop);
 
 	if (!push && !pop && fwd) {
 		/* tracks VF --> wire rules without vlan push action */
-		if (attr->dests[0].rep->vport == MLX5_VPORT_UPLINK)
+		if (esw_attr->dests[0].rep->vport == MLX5_VPORT_UPLINK)
 			vport->vlan_refcount--;
 
 		goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c
index 17a0d2bc102b..ec679560a95d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c
@@ -3,6 +3,7 @@
 
 #include <linux/mlx5/fs.h>
 #include "eswitch.h"
+#include "en_tc.h"
 #include "fs_core.h"
 
 struct mlx5_termtbl_handle {
@@ -228,10 +229,11 @@ static bool mlx5_eswitch_offload_is_uplink_port(const struct mlx5_eswitch *esw,
 
 bool
 mlx5_eswitch_termtbl_required(struct mlx5_eswitch *esw,
-			      struct mlx5_esw_flow_attr *attr,
+			      struct mlx5_flow_attr *attr,
 			      struct mlx5_flow_act *flow_act,
 			      struct mlx5_flow_spec *spec)
 {
+	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 	int i;
 
 	if (!MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, termination_table) ||
@@ -244,8 +246,8 @@ mlx5_eswitch_termtbl_required(struct mlx5_eswitch *esw,
 		return true;
 
 	/* hairpin */
-	for (i = attr->split_count; i < attr->out_count; i++)
-		if (attr->dests[i].rep->vport == MLX5_VPORT_UPLINK)
+	for (i = esw_attr->split_count; i < esw_attr->out_count; i++)
+		if (esw_attr->dests[i].rep->vport == MLX5_VPORT_UPLINK)
 			return true;
 
 	return false;
-- 
2.26.2



* [net-next V2 06/15] net/mlx5e: Add tc chains offload support for nic flows
  2020-09-23 22:48 [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode saeed
                   ` (4 preceding siblings ...)
  2020-09-23 22:48 ` [net-next V2 05/15] net/mlx5: Refactor tc flow attributes structure saeed
@ 2020-09-23 22:48 ` saeed
  2020-09-23 22:48 ` [net-next V2 07/15] net/mlx5e: rework ct offload init messages saeed
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed, Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

Allow adding nic tc flow rules with goto chain action.

Connecting the nic flows to the mlx5 chains infrastructure in the
previous patches allows us to support the creation of chained flow
tables and rules that direct packets to another chain for further
processing. This is a required preparation for supporting CT offload
for nic tc flows.

We allow the creation of 256 different chains for nic flows since we
have 8 bits available for the chain restore tag in case of a miss.
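
For illustration only (not part of the patch): a minimal stand-alone
sketch of how the chain restore tag is recovered from reg_b on a miss,
mirroring what mlx5e_tc_update_skb() does below. The mask mirrors
MLX5E_TC_TABLE_CHAIN_TAG_MASK added to en_tc.h in this patch; the
helper name and the sample value are hypothetical.

#include <stdint.h>
#include <stdio.h>

/* Mirrors MLX5E_TC_TABLE_CHAIN_TAG_MASK from en_tc.h in this patch:
 * the low bits of reg_b carry the chain restore tag.
 */
#define CHAIN_TAG_BITS 16
#define CHAIN_TAG_MASK ((1u << CHAIN_TAG_BITS) - 1)

/* Hypothetical helper: recover the chain tag that a missed packet
 * carries in reg_b, as mlx5e_tc_update_skb() does. A tag of 0 means
 * the miss happened in chain 0 and no tc_skb_ext restore is needed.
 */
static uint32_t chain_tag_from_reg_b(uint32_t reg_b)
{
	return reg_b & CHAIN_TAG_MASK;
}

int main(void)
{
	uint32_t reg_b = 0x00030002; /* hypothetical CQE metadata */

	printf("chain tag: %u\n", (unsigned)chain_tag_from_reg_b(reg_b));
	return 0;
}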

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  10 +
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 252 ++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  34 ++-
 .../mellanox/mlx5/core/lib/fs_chains.c        |  11 +-
 4 files changed, 250 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 310533fc9950..599f5b5ebc97 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1260,6 +1260,11 @@ static void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	}
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+
+	if (mlx5e_cqe_regb_chain(cqe))
+		if (!mlx5e_tc_update_skb(cqe, skb))
+			goto free_wqe;
+
 	napi_gro_receive(rq->cq.napi, skb);
 
 free_wqe:
@@ -1521,6 +1526,11 @@ static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cq
 		goto mpwrq_cqe_out;
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+
+	if (mlx5e_cqe_regb_chain(cqe))
+		if (!mlx5e_tc_update_skb(cqe, skb))
+			goto mpwrq_cqe_out;
+
 	napi_gro_receive(rq->cq.napi, skb);
 
 mpwrq_cqe_out:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index a54821107566..da05c4c195ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -177,6 +177,15 @@ struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
 	[MARK_TO_REG] = mark_to_reg_ct,
 	[LABELS_TO_REG] = labels_to_reg_ct,
 	[FTEID_TO_REG] = fteid_to_reg_ct,
+	/* For NIC rules we store the restore metadata directly
+	 * into reg_b that is passed to SW since we don't
+	 * jump between steering domains.
+	 */
+	[NIC_CHAIN_TO_REG] = {
+		.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_B,
+		.moffset = 0,
+		.mlen = 2,
+	},
 };
 
 static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow);
@@ -879,6 +888,7 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 			     struct mlx5_flow_attr *attr)
 {
 	struct mlx5_flow_context *flow_context = &spec->flow_context;
+	struct mlx5_fs_chains *nic_chains = nic_chains(priv);
 	struct mlx5_nic_flow_attr *nic_attr = attr->nic_attr;
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	struct mlx5_flow_destination dest[2] = {};
@@ -887,6 +897,7 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 		.flags    = FLOW_ACT_NO_APPEND,
 	};
 	struct mlx5_flow_handle *rule;
+	struct mlx5_flow_table *ft;
 	int dest_ix = 0;
 
 	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
@@ -902,10 +913,22 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 		dest_ix++;
 	} else if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
-		dest[dest_ix].ft = priv->fs.vlan.ft.t;
+		if (attr->dest_chain) {
+			dest[dest_ix].ft = mlx5_chains_get_table(nic_chains,
+								 attr->dest_chain, 1,
+								 MLX5E_TC_FT_LEVEL);
+			if (IS_ERR(dest[dest_ix].ft))
+				return ERR_CAST(dest[dest_ix].ft);
+		} else {
+			dest[dest_ix].ft = priv->fs.vlan.ft.t;
+		}
 		dest_ix++;
 	}
 
+	if (dest[0].type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
+	    MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ignore_flow_level))
+		flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
 		dest[dest_ix].counter_id = mlx5_fc_id(attr->counter);
@@ -919,26 +942,47 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 	if (IS_ERR_OR_NULL(tc->t)) {
 		/* Create the root table here if doesn't exist yet */
 		tc->t =
-			mlx5_chains_get_table(nic_chains(priv), 0, 1, MLX5E_TC_FT_LEVEL);
+			mlx5_chains_get_table(nic_chains, 0, 1, MLX5E_TC_FT_LEVEL);
 
 		if (IS_ERR(tc->t)) {
 			mutex_unlock(&tc->t_lock);
 			netdev_err(priv->netdev,
 				   "Failed to create tc offload table\n");
-			return ERR_CAST(tc->t);
+			rule = ERR_CAST(priv->fs.tc.t);
+			goto err_ft_get;
 		}
 	}
 	mutex_unlock(&tc->t_lock);
 
+	ft = mlx5_chains_get_table(nic_chains,
+				   attr->chain, attr->prio,
+				   MLX5E_TC_FT_LEVEL);
+	if (IS_ERR(ft)) {
+		rule = ERR_CAST(ft);
+		goto err_ft_get;
+	}
+
 	if (attr->outer_match_level != MLX5_MATCH_NONE)
 		spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
 
-	rule = mlx5_add_flow_rules(tc->t, spec,
+	rule = mlx5_add_flow_rules(ft, spec,
 				   &flow_act, dest, dest_ix);
 	if (IS_ERR(rule))
-		return ERR_CAST(rule);
+		goto err_rule;
 
 	return rule;
+
+err_rule:
+	mlx5_chains_put_table(nic_chains,
+			      attr->chain, attr->prio,
+			      MLX5E_TC_FT_LEVEL);
+err_ft_get:
+	if (attr->dest_chain)
+		mlx5_chains_put_table(nic_chains,
+				      attr->dest_chain, 1,
+				      MLX5E_TC_FT_LEVEL);
+
+	return ERR_CAST(rule);
 }
 
 static int
@@ -980,9 +1024,19 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 }
 
 void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
-				  struct mlx5_flow_handle *rule)
+				  struct mlx5_flow_handle *rule,
+				  struct mlx5_flow_attr *attr)
 {
+	struct mlx5_fs_chains *nic_chains = nic_chains(priv);
+
 	mlx5_del_flow_rules(rule);
+
+	mlx5_chains_put_table(nic_chains, attr->chain, attr->prio,
+			      MLX5E_TC_FT_LEVEL);
+
+	if (attr->dest_chain)
+		mlx5_chains_put_table(nic_chains, attr->dest_chain, 1,
+				      MLX5E_TC_FT_LEVEL);
 }
 
 static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
@@ -992,9 +1046,14 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 
 	if (!IS_ERR_OR_NULL(flow->rule[0]))
-		mlx5e_del_offloaded_nic_rule(priv, flow->rule[0]);
+		mlx5e_del_offloaded_nic_rule(priv, flow->rule[0], attr);
 	mlx5_fc_destroy(priv->mdev, attr->counter);
 
+	flow_flag_clear(flow, OFFLOADED);
+
+	/* Remove root table if no rules are left to avoid
+	 * extra steering hops.
+	 */
 	mutex_lock(&priv->fs.tc.t_lock);
 	if (!mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD)) &&
 	    !IS_ERR_OR_NULL(tc->t)) {
@@ -3247,6 +3306,57 @@ add_vlan_prio_tag_rewrite_action(struct mlx5e_priv *priv,
 				       extack);
 }
 
+static int validate_goto_chain(struct mlx5e_priv *priv,
+			       struct mlx5e_tc_flow *flow,
+			       const struct flow_action_entry *act,
+			       u32 actions,
+			       struct netlink_ext_ack *extack)
+{
+	bool is_esw = mlx5e_is_eswitch_flow(flow);
+	struct mlx5_flow_attr *attr = flow->attr;
+	bool ft_flow = mlx5e_is_ft_flow(flow);
+	u32 dest_chain = act->chain_index;
+	struct mlx5_fs_chains *chains;
+	struct mlx5_eswitch *esw;
+	u32 reformat_and_fwd;
+	u32 max_chain;
+
+	esw = priv->mdev->priv.eswitch;
+	chains = is_esw ? esw_chains(esw) : nic_chains(priv);
+	max_chain = mlx5_chains_get_chain_range(chains);
+	reformat_and_fwd = is_esw ?
+			   MLX5_CAP_ESW_FLOWTABLE_FDB(priv->mdev, reformat_and_fwd_to_table) :
+			   MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, reformat_and_fwd_to_table);
+
+	if (ft_flow) {
+		NL_SET_ERR_MSG_MOD(extack, "Goto action is not supported");
+		return -EOPNOTSUPP;
+	}
+
+	if (!mlx5_chains_backwards_supported(chains) &&
+	    dest_chain <= attr->chain) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Goto lower numbered chain isn't supported");
+		return -EOPNOTSUPP;
+	}
+
+	if (dest_chain > max_chain) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Requested destination chain is out of supported range");
+		return -EOPNOTSUPP;
+	}
+
+	if (actions & (MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT |
+		       MLX5_FLOW_CONTEXT_ACTION_DECAP) &&
+	    !reformat_and_fwd) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Goto chain is not allowed if action has reformat or decap");
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
 static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 				struct flow_action *flow_action,
 				struct mlx5e_tc_flow_parse_attr *parse_attr,
@@ -3290,8 +3400,7 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 			if (err)
 				return err;
 
-			action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
-				  MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+			action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
 			break;
 		case FLOW_ACTION_VLAN_MANGLE:
 			err = add_vlan_rewrite_action(priv,
@@ -3340,6 +3449,15 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 			action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 			}
 			break;
+		case FLOW_ACTION_GOTO:
+			err = validate_goto_chain(priv, flow, act, action,
+						  extack);
+			if (err)
+				return err;
+
+			action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
+			attr->dest_chain = act->chain_index;
+			break;
 		default:
 			NL_SET_ERR_MSG_MOD(extack, "The offload action is not supported");
 			return -EOPNOTSUPP;
@@ -3362,6 +3480,18 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 	}
 
 	attr->action = action;
+
+	if (attr->dest_chain) {
+		if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
+			NL_SET_ERR_MSG(extack, "Mirroring goto chain rules isn't supported");
+			return -EOPNOTSUPP;
+		}
+		attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	}
+
+	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
+		attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+
 	if (!actions_match_supported(priv, flow_action, parse_attr, flow, extack))
 		return -EOPNOTSUPP;
 
@@ -3855,45 +3985,6 @@ static bool is_duplicated_output_device(struct net_device *dev,
 	return false;
 }
 
-static int mlx5_validate_goto_chain(struct mlx5_eswitch *esw,
-				    struct mlx5e_tc_flow *flow,
-				    const struct flow_action_entry *act,
-				    u32 actions,
-				    struct netlink_ext_ack *extack)
-{
-	u32 max_chain = mlx5_chains_get_chain_range(esw_chains(esw));
-	struct mlx5_flow_attr *attr = flow->attr;
-	bool ft_flow = mlx5e_is_ft_flow(flow);
-	u32 dest_chain = act->chain_index;
-
-	if (ft_flow) {
-		NL_SET_ERR_MSG_MOD(extack, "Goto action is not supported");
-		return -EOPNOTSUPP;
-	}
-
-	if (!mlx5_chains_backwards_supported(esw_chains(esw)) &&
-	    dest_chain <= attr->chain) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "Goto lower numbered chain isn't supported");
-		return -EOPNOTSUPP;
-	}
-	if (dest_chain > max_chain) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "Requested destination chain is out of supported range");
-		return -EOPNOTSUPP;
-	}
-
-	if (actions & (MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT |
-		       MLX5_FLOW_CONTEXT_ACTION_DECAP) &&
-	    !MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, reformat_and_fwd_to_table)) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "Goto chain is not allowed if action has reformat or decap");
-		return -EOPNOTSUPP;
-	}
-
-	return 0;
-}
-
 static int verify_uplink_forwarding(struct mlx5e_priv *priv,
 				    struct mlx5e_tc_flow *flow,
 				    struct net_device *out_dev,
@@ -4188,8 +4279,8 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			decap = true;
 			break;
 		case FLOW_ACTION_GOTO:
-			err = mlx5_validate_goto_chain(esw, flow, act, action,
-						       extack);
+			err = validate_goto_chain(priv, flow, act, action,
+						  extack);
 			if (err)
 				return err;
 
@@ -4402,6 +4493,16 @@ mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
 	return err;
 }
 
+static void
+mlx5e_flow_attr_init(struct mlx5_flow_attr *attr,
+		     struct mlx5e_tc_flow_parse_attr *parse_attr,
+		     struct flow_cls_offload *f)
+{
+	attr->parse_attr = parse_attr;
+	attr->chain = f->common.chain_index;
+	attr->prio = f->common.prio;
+}
+
 static void
 mlx5e_flow_esw_attr_init(struct mlx5_flow_attr *attr,
 			 struct mlx5e_priv *priv,
@@ -4413,9 +4514,7 @@ mlx5e_flow_esw_attr_init(struct mlx5_flow_attr *attr,
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
 
-	attr->parse_attr = parse_attr;
-	attr->chain = f->common.chain_index;
-	attr->prio = f->common.prio;
+	mlx5e_flow_attr_init(attr, parse_attr, f);
 
 	esw_attr->in_rep = in_rep;
 	esw_attr->in_mdev = in_mdev;
@@ -4583,9 +4682,12 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 	struct mlx5e_tc_flow *flow;
 	int attr_size, err;
 
-	/* multi-chain not supported for NIC rules */
-	if (!tc_cls_can_offload_and_chain0(priv->netdev, &f->common))
+	if (!MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ignore_flow_level)) {
+		if (!tc_cls_can_offload_and_chain0(priv->netdev, &f->common))
+			return -EOPNOTSUPP;
+	} else if (!tc_can_offload_extack(priv->netdev, f->common.extack)) {
 		return -EOPNOTSUPP;
+	}
 
 	flow_flags |= BIT(MLX5E_TC_FLOW_FLAG_NIC);
 	attr_size  = sizeof(struct mlx5_nic_flow_attr);
@@ -4595,6 +4697,8 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 		goto out;
 
 	parse_attr->filter_dev = filter_dev;
+	mlx5e_flow_attr_init(flow->attr, parse_attr, f);
+
 	err = parse_cls_flower(flow->priv, flow, &parse_attr->spec,
 			       f, filter_dev);
 	if (err)
@@ -5023,6 +5127,11 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 	if (err)
 		return err;
 
+	if (MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ignore_flow_level)) {
+		attr.flags = MLX5_CHAINS_AND_PRIOS_SUPPORTED |
+			MLX5_CHAINS_IGNORE_FLOW_LEVEL_SUPPORTED;
+		attr.max_restore_tag = MLX5E_TC_TABLE_CHAIN_TAG_MASK;
+	}
 	attr.ns = MLX5_FLOW_NAMESPACE_KERNEL;
 	attr.max_ft_sz = mlx5e_tc_nic_get_ft_size(dev);
 	attr.max_grp_num = MLX5E_TC_TABLE_NUM_GROUPS;
@@ -5208,3 +5317,36 @@ int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 		return -EOPNOTSUPP;
 	}
 }
+
+bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
+			 struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	struct mlx5e_priv *priv = netdev_priv(skb->dev);
+	u32 chain = 0, chain_tag, reg_b;
+	struct tc_skb_ext *tc_skb_ext;
+	int err;
+
+	reg_b = be32_to_cpu(cqe->ft_metadata);
+
+	chain_tag = reg_b & MLX5E_TC_TABLE_CHAIN_TAG_MASK;
+
+	err = mlx5_get_chain_for_tag(nic_chains(priv), chain_tag, &chain);
+	if (err) {
+		netdev_dbg(priv->netdev,
+			   "Couldn't find chain for chain tag: %d, err: %d\n",
+			   chain_tag, err);
+		return false;
+	}
+
+	if (chain) {
+		tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
+		if (WARN_ON(!tc_skb_ext))
+			return false;
+
+		tc_skb_ext->chain = chain;
+	}
+#endif /* CONFIG_NET_TC_SKB_EXT */
+
+	return true;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 9e84f03eebce..fa78289489b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -77,6 +77,9 @@ struct mlx5_flow_attr {
 	};
 };
 
+#define MLX5E_TC_TABLE_CHAIN_TAG_BITS 16
+#define MLX5E_TC_TABLE_CHAIN_TAG_MASK GENMASK(MLX5E_TC_TABLE_CHAIN_TAG_BITS - 1, 0)
+
 #if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
 
 struct tunnel_match_key {
@@ -164,6 +167,7 @@ enum mlx5e_tc_attr_to_reg {
 	MARK_TO_REG,
 	LABELS_TO_REG,
 	FTEID_TO_REG,
+	NIC_CHAIN_TO_REG,
 };
 
 struct mlx5e_tc_attr_to_reg_mapping {
@@ -217,13 +221,16 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 			     struct mlx5_flow_spec *spec,
 			     struct mlx5_flow_attr *attr);
 void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
-				  struct mlx5_flow_handle *rule);
+				  struct mlx5_flow_handle *rule,
+				  struct mlx5_flow_attr *attr);
+
 #else /* CONFIG_MLX5_CLS_ACT */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
 static inline int
 mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv)
 { return -EOPNOTSUPP; }
+
 #endif /* CONFIG_MLX5_CLS_ACT */
 
 struct mlx5_flow_attr *mlx5_alloc_flow_attr(enum mlx5_flow_namespace_type type);
@@ -242,4 +249,29 @@ mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv)
 { return -EOPNOTSUPP; }
 #endif
 
+#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
+static inline bool mlx5e_cqe_regb_chain(struct mlx5_cqe64 *cqe)
+{
+#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	u32 chain, reg_b;
+
+	reg_b = be32_to_cpu(cqe->ft_metadata);
+
+	chain = reg_b & MLX5E_TC_TABLE_CHAIN_TAG_MASK;
+	if (chain)
+		return true;
+#endif
+
+	return false;
+}
+
+bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb);
+#else /* CONFIG_MLX5_CLS_ACT */
+static inline bool mlx5e_cqe_regb_chain(struct mlx5_cqe64 *cqe)
+{ return false; }
+static inline bool
+mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb)
+{ return true; }
+#endif
+
 #endif /* __MLX5_EN_TC_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
index 5bd65cdc9b07..947f346bdc2d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
@@ -330,6 +330,12 @@ create_chain_restore(struct fs_chain *chain)
 			err = PTR_ERR(chain->restore_rule);
 			goto err_rule;
 		}
+	} else if (chains->ns == MLX5_FLOW_NAMESPACE_KERNEL) {
+		/* For NIC RX we don't need a restore rule
+		 * since we write the metadata to reg_b
+		 * that is passed to SW directly.
+		 */
+		chain_to_reg = NIC_CHAIN_TO_REG;
 	} else {
 		err = -EINVAL;
 		goto err_rule;
@@ -447,7 +453,10 @@ mlx5_chains_add_miss_rule(struct fs_chain *chain,
 	struct mlx5_flow_destination dest = {};
 	struct mlx5_flow_act act = {};
 
-	act.flags  = FLOW_ACT_IGNORE_FLOW_LEVEL | FLOW_ACT_NO_APPEND;
+	act.flags  = FLOW_ACT_NO_APPEND;
+	if (mlx5_chains_ignore_flow_level_supported(chain->chains))
+		act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+
 	act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 	dest.type  = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 	dest.ft = next_ft;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next V2 07/15] net/mlx5e: rework ct offload init messages
  2020-09-23 22:48 [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode saeed
                   ` (5 preceding siblings ...)
  2020-09-23 22:48 ` [net-next V2 06/15] net/mlx5e: Add tc chains offload support for nic flows saeed
@ 2020-09-23 22:48 ` saeed
  2020-09-23 22:48 ` [net-next V2 08/15] net/mlx5e: Support CT offload for tc nic flows saeed
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed, Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

The changes are:
- Use mlx5_core print macros instead of netdev_warn since
  netdev is not always initialized at that stage.

- Print only a warning, without indicating an error, when the
  cause is simply missing support for CT offload.
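
For illustration only: a user-space analogue of the ordering problem
the first change addresses. Logging through a handle that is
initialized late is unsafe during early init, while the always-present
core handle is safe. All names below are hypothetical stand-ins for
netdev_warn()/mlx5_core_warn().

#include <stdio.h>

struct core_dev { const char *pci_name; };

struct priv {
	struct core_dev *mdev;   /* set first, always valid */
	const char *netdev_name; /* set later, may be NULL  */
};

static void warn_early(struct priv *p, const char *msg)
{
	/* The netdev_warn() equivalent would dereference the not yet
	 * initialized handle here:
	 *   printf("%s: %s\n", p->netdev_name, msg);
	 * The mlx5_core_warn() equivalent is safe, since the core
	 * device is already valid at this stage of init:
	 */
	printf("%s: %s\n", p->mdev->pci_name, msg);
}

int main(void)
{
	struct core_dev core = { "0000:03:00.0" };
	struct priv p = { &core, NULL };

	warn_early(&p, "tc ct offload not supported");
	return 0;
}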

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    | 39 ++++++++-----------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 9509f8674e5a..bc7589711357 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1803,24 +1803,14 @@ mlx5_tc_ct_init_check_support(struct mlx5_eswitch *esw,
 	return 0;
 }
 
-static void
-mlx5_tc_ct_init_err(struct mlx5e_rep_priv *rpriv, const char *msg, int err)
-{
-	if (msg)
-		netdev_warn(rpriv->netdev,
-			    "tc ct offload not supported, %s, err: %d\n",
-			    msg, err);
-	else
-		netdev_warn(rpriv->netdev,
-			    "tc ct offload not supported, err: %d\n",
-			    err);
-}
+#define INIT_ERR_PREFIX "tc ct offload init failed"
 
 int
 mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 {
 	struct mlx5_tc_ct_priv *ct_priv;
 	struct mlx5e_rep_priv *rpriv;
+	struct mlx5_core_dev *dev;
 	struct mlx5_eswitch *esw;
 	struct mlx5e_priv *priv;
 	const char *msg;
@@ -1828,19 +1818,20 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 
 	rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
 	priv = netdev_priv(rpriv->netdev);
-	esw = priv->mdev->priv.eswitch;
+	dev = priv->mdev;
+	esw = dev->priv.eswitch;
 
 	err = mlx5_tc_ct_init_check_support(esw, &msg);
 	if (err) {
-		mlx5_tc_ct_init_err(rpriv, msg, err);
+		mlx5_core_warn(dev,
+			       "tc ct offload not supported, %s\n",
+			       msg);
 		goto err_support;
 	}
 
 	ct_priv = kzalloc(sizeof(*ct_priv), GFP_KERNEL);
-	if (!ct_priv) {
-		mlx5_tc_ct_init_err(rpriv, NULL, -ENOMEM);
+	if (!ct_priv)
 		goto err_alloc;
-	}
 
 	ct_priv->zone_mapping = mapping_create(sizeof(u16), 0, true);
 	if (IS_ERR(ct_priv->zone_mapping)) {
@@ -1859,23 +1850,27 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 	ct_priv->ct = mlx5_chains_create_global_table(esw_chains(esw));
 	if (IS_ERR(ct_priv->ct)) {
 		err = PTR_ERR(ct_priv->ct);
-		mlx5_tc_ct_init_err(rpriv, "failed to create ct table", err);
+		mlx5_core_warn(dev,
+			       "%s, failed to create ct table err: %d\n",
+			       INIT_ERR_PREFIX, err);
 		goto err_ct_tbl;
 	}
 
 	ct_priv->ct_nat = mlx5_chains_create_global_table(esw_chains(esw));
 	if (IS_ERR(ct_priv->ct_nat)) {
 		err = PTR_ERR(ct_priv->ct_nat);
-		mlx5_tc_ct_init_err(rpriv, "failed to create ct nat table",
-				    err);
+		mlx5_core_warn(dev,
+			       "%s, failed to create ct nat table err: %d\n",
+			       INIT_ERR_PREFIX, err);
 		goto err_ct_nat_tbl;
 	}
 
 	ct_priv->post_ct = mlx5_chains_create_global_table(esw_chains(esw));
 	if (IS_ERR(ct_priv->post_ct)) {
 		err = PTR_ERR(ct_priv->post_ct);
-		mlx5_tc_ct_init_err(rpriv, "failed to create post ct table",
-				    err);
+		mlx5_core_warn(dev,
+			       "%s, failed to create post ct table err: %d\n",
+			       INIT_ERR_PREFIX, err);
 		goto err_post_ct_tbl;
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next V2 08/15] net/mlx5e: Support CT offload for tc nic flows
  2020-09-23 22:48 [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode saeed
                   ` (6 preceding siblings ...)
  2020-09-23 22:48 ` [net-next V2 07/15] net/mlx5e: rework ct offload init messages saeed
@ 2020-09-23 22:48 ` saeed
  2020-09-23 22:48 ` [net-next V2 09/15] net/mlx5e: CT: Use the same counter for both directions saeed
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 21+ messages in thread
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed, Saeed Mahameed

From: Ariel Levkovich <lariel@mellanox.com>

Add support for performing CT-related tc actions and matching on
CT state for nic flows.

The ct flows are managed and handled through a new instance of the
ct database, declared in this patch, which keeps them separate from
the eswitch ct flows database.
Offloading and unoffloading ct flows go through the existing ct
offload api, which is handed the relevant ct database reference in
each mode.

In addition, the tc ct api is refactored to be agnostic to the
flow type and to perform the resource allocations and rule
insertion in the proper steering domain of the device.

In the initialization call, the api receives and stores in the ct
database instance all the information that distinguishes nic flows
from esw flows, such as the chains database, the steering namespace
and the mod hdr table.
This way, adding ct flows to and removing them from the device can
later be performed agnostically to the flow type, as the sketch
below illustrates.
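
For illustration only (not from the patch): a tiny runnable model of
the design choice. After this refactor, each caller passes its own
chains database, mod hdr table and namespace to mlx5_tc_ct_init()
(MLX5_FLOW_NAMESPACE_KERNEL for nic mode, MLX5_FLOW_NAMESPACE_FDB for
switchdev), and later add/del operations need no flow-type branching.
The types and strings below are hypothetical stand-ins.

#include <stdio.h>
#include <stdlib.h>

enum ns_type { NS_FDB, NS_KERNEL };

/* Model of mlx5_tc_ct_priv: everything mode-specific is captured
 * once at init time.
 */
struct ct_priv {
	enum ns_type ns;
	const char *chains;  /* stands in for struct mlx5_fs_chains */
	const char *mod_hdr; /* stands in for struct mod_hdr_tbl    */
};

static struct ct_priv *ct_init(enum ns_type ns, const char *chains,
			       const char *mod_hdr)
{
	struct ct_priv *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->ns = ns;
	p->chains = chains;
	p->mod_hdr = mod_hdr;
	return p;
}

static void ct_add_flow(struct ct_priv *p, const char *tuple)
{
	/* No esw/nic branching here: the proper steering domain was
	 * selected when the ct database instance was created.
	 */
	printf("add %s via %s (%s)\n", tuple, p->chains, p->mod_hdr);
}

int main(void)
{
	struct ct_priv *nic = ct_init(NS_KERNEL, "nic chains", "nic mod_hdr");
	struct ct_priv *esw = ct_init(NS_FDB, "esw chains", "esw mod_hdr");

	if (!nic || !esw)
		return 1;
	ct_add_flow(nic, "5-tuple A");
	ct_add_flow(esw, "5-tuple B");
	free(nic);
	free(esw);
	return 0;
}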

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   |   2 +
 .../ethernet/mellanox/mlx5/core/en/rep/tc.c   |   6 +-
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    | 289 +++++++++---------
 .../ethernet/mellanox/mlx5/core/en/tc_ct.h    |  54 ++--
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 171 ++++++++---
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  26 ++
 6 files changed, 348 insertions(+), 200 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index ef3c9a165b1d..6a97452dc60e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -27,6 +27,8 @@ struct mlx5e_tc_table {
 
 	struct notifier_block     netdevice_nb;
 	struct netdev_net_notifier	netdevice_nn;
+
+	struct mlx5_tc_ct_priv         *ct;
 };
 
 struct mlx5e_flow_table {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
index 771e73f211fb..e36e505d38ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
@@ -612,7 +612,6 @@ bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
 	struct tc_skb_ext *tc_skb_ext;
 	struct mlx5_eswitch *esw;
 	struct mlx5e_priv *priv;
-	int tunnel_moffset;
 	int err;
 
 	reg_c0 = (be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK);
@@ -647,13 +646,12 @@ bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
 
 		uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
 		uplink_priv = &uplink_rpriv->uplink_priv;
-		if (!mlx5e_tc_ct_restore_flow(uplink_priv, skb,
+		if (!mlx5e_tc_ct_restore_flow(uplink_priv->ct_priv, skb,
 					      zone_restore_id))
 			return false;
 	}
 
-	tunnel_moffset = mlx5e_tc_attr_to_reg_mappings[TUNNEL_TO_REG].moffset;
-	tunnel_id = reg_c1 >> (8 * tunnel_moffset);
+	tunnel_id = reg_c1 >> REG_MAPPING_SHIFT(TUNNEL_TO_REG);
 	return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
 #endif /* CONFIG_NET_TC_SKB_EXT */
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index bc7589711357..86afef459dc6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -41,6 +41,7 @@
 struct mlx5_tc_ct_priv {
 	struct mlx5_eswitch *esw;
 	const struct net_device *netdev;
+	struct mod_hdr_tbl *mod_hdr_tbl;
 	struct idr fte_ids;
 	struct xarray tuple_ids;
 	struct rhashtable zone_ht;
@@ -52,6 +53,8 @@ struct mlx5_tc_ct_priv {
 	struct mutex control_lock; /* guards parallel adds/dels */
 	struct mapping_ctx *zone_mapping;
 	struct mapping_ctx *labels_mapping;
+	enum mlx5_flow_namespace_type ns_type;
+	struct mlx5_fs_chains *chains;
 };
 
 struct mlx5_ct_flow {
@@ -72,7 +75,7 @@ struct mlx5_ct_zone_rule {
 };
 
 struct mlx5_tc_ct_pre {
-	struct mlx5_flow_table *fdb;
+	struct mlx5_flow_table *ft;
 	struct mlx5_flow_group *flow_grp;
 	struct mlx5_flow_group *miss_grp;
 	struct mlx5_flow_handle *flow_rule;
@@ -157,18 +160,6 @@ static const struct rhashtable_params tuples_nat_ht_params = {
 	.min_size = 16 * 1024,
 };
 
-static struct mlx5_tc_ct_priv *
-mlx5_tc_ct_get_ct_priv(struct mlx5e_priv *priv)
-{
-	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_rep_uplink_priv *uplink_priv;
-	struct mlx5e_rep_priv *uplink_rpriv;
-
-	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
-	uplink_priv = &uplink_rpriv->uplink_priv;
-	return uplink_priv->ct_priv;
-}
-
 static int
 mlx5_tc_ct_rule_to_tuple(struct mlx5_ct_tuple *tuple, struct flow_rule *rule)
 {
@@ -401,13 +392,12 @@ mlx5_tc_ct_entry_del_rule(struct mlx5_tc_ct_priv *ct_priv,
 {
 	struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat];
 	struct mlx5_flow_attr *attr = zone_rule->attr;
-	struct mlx5_eswitch *esw = ct_priv->esw;
 
 	ct_dbg("Deleting ct entry rule in zone %d", entry->tuple.zone);
 
-	mlx5_eswitch_del_offloaded_rule(esw, zone_rule->rule, attr);
+	mlx5_tc_rule_delete(netdev_priv(ct_priv->netdev), zone_rule->rule, attr);
 	mlx5e_mod_hdr_detach(ct_priv->esw->dev,
-			     &esw->offloads.mod_hdr, zone_rule->mh);
+			     ct_priv->mod_hdr_tbl, zone_rule->mh);
 	mapping_remove(ct_priv->labels_mapping, attr->ct_attr.ct_labels_id);
 	kfree(attr);
 }
@@ -445,29 +435,40 @@ mlx5_tc_ct_entry_set_registers(struct mlx5_tc_ct_priv *ct_priv,
 			       u32 labels_id,
 			       u8 zone_restore_id)
 {
+	enum mlx5_flow_namespace_type ns = ct_priv->ns_type;
 	struct mlx5_eswitch *esw = ct_priv->esw;
 	int err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
 					CTSTATE_TO_REG, ct_state);
 	if (err)
 		return err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
 					MARK_TO_REG, mark);
 	if (err)
 		return err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
 					LABELS_TO_REG, labels_id);
 	if (err)
 		return err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts,
+	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
 					ZONE_RESTORE_TO_REG, zone_restore_id);
 	if (err)
 		return err;
 
+	/* Make another copy of zone id in reg_b for
+	 * NIC rx flows since we don't copy reg_c1 to
+	 * reg_b upon miss.
+	 */
+	if (ns != MLX5_FLOW_NAMESPACE_FDB) {
+		err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
+						NIC_ZONE_RESTORE_TO_REG, zone_restore_id);
+		if (err)
+			return err;
+	}
 	return 0;
 }
 
@@ -559,8 +560,7 @@ mlx5_tc_ct_entry_create_nat(struct mlx5_tc_ct_priv *ct_priv,
 	flow_action_for_each(i, act, flow_action) {
 		switch (act->id) {
 		case FLOW_ACTION_MANGLE: {
-			err = alloc_mod_hdr_actions(mdev,
-						    MLX5_FLOW_NAMESPACE_FDB,
+			err = alloc_mod_hdr_actions(mdev, ct_priv->ns_type,
 						    mod_acts);
 			if (err)
 				return err;
@@ -626,8 +626,8 @@ mlx5_tc_ct_entry_create_mod_hdr(struct mlx5_tc_ct_priv *ct_priv,
 		goto err_mapping;
 
 	*mh = mlx5e_mod_hdr_attach(ct_priv->esw->dev,
-				   &ct_priv->esw->offloads.mod_hdr,
-				   MLX5_FLOW_NAMESPACE_FDB,
+				   ct_priv->mod_hdr_tbl,
+				   ct_priv->ns_type,
 				   &mod_acts);
 	if (IS_ERR(*mh)) {
 		err = PTR_ERR(*mh);
@@ -651,7 +651,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 			  bool nat, u8 zone_restore_id)
 {
 	struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat];
-	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5e_priv *priv = netdev_priv(ct_priv->netdev);
 	struct mlx5_flow_spec *spec = NULL;
 	struct mlx5_flow_attr *attr;
 	int err;
@@ -662,7 +662,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 	if (!spec)
 		return -ENOMEM;
 
-	attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	attr = mlx5_alloc_flow_attr(ct_priv->ns_type);
 	if (!attr) {
 		err = -ENOMEM;
 		goto err_attr;
@@ -691,7 +691,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 				    entry->tuple.zone & MLX5_CT_ZONE_MASK,
 				    MLX5_CT_ZONE_MASK);
 
-	zone_rule->rule = mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
+	zone_rule->rule = mlx5_tc_rule_insert(priv, spec, attr);
 	if (IS_ERR(zone_rule->rule)) {
 		err = PTR_ERR(zone_rule->rule);
 		ct_dbg("Failed to add ct entry rule, nat: %d", nat);
@@ -707,7 +707,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 
 err_rule:
 	mlx5e_mod_hdr_detach(ct_priv->esw->dev,
-			     &esw->offloads.mod_hdr, zone_rule->mh);
+			     ct_priv->mod_hdr_tbl, zone_rule->mh);
 	mapping_remove(ct_priv->labels_mapping, attr->ct_attr.ct_labels_id);
 err_mod_hdr:
 	kfree(attr);
@@ -970,24 +970,21 @@ mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
 	return 0;
 }
 
-void mlx5_tc_ct_match_del(struct mlx5e_priv *priv, struct mlx5_ct_attr *ct_attr)
+void mlx5_tc_ct_match_del(struct mlx5_tc_ct_priv *priv, struct mlx5_ct_attr *ct_attr)
 {
-	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
-
-	if (!ct_priv || !ct_attr->ct_labels_id)
+	if (!priv || !ct_attr->ct_labels_id)
 		return;
 
-	mapping_remove(ct_priv->labels_mapping, ct_attr->ct_labels_id);
+	mapping_remove(priv->labels_mapping, ct_attr->ct_labels_id);
 }
 
 int
-mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
+mlx5_tc_ct_match_add(struct mlx5_tc_ct_priv *priv,
 		     struct mlx5_flow_spec *spec,
 		     struct flow_cls_offload *f,
 		     struct mlx5_ct_attr *ct_attr,
 		     struct netlink_ext_ack *extack)
 {
-	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
 	struct flow_dissector_key_ct *mask, *key;
 	bool trk, est, untrk, unest, new;
@@ -1000,7 +997,7 @@ mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
 	if (!flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_CT))
 		return 0;
 
-	if (!ct_priv) {
+	if (!priv) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "offload of ct matching isn't available");
 		return -EOPNOTSUPP;
@@ -1056,7 +1053,7 @@ mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
 		ct_labels[1] = key->ct_labels[1] & mask->ct_labels[1];
 		ct_labels[2] = key->ct_labels[2] & mask->ct_labels[2];
 		ct_labels[3] = key->ct_labels[3] & mask->ct_labels[3];
-		if (mapping_add(ct_priv->labels_mapping, ct_labels, &ct_attr->ct_labels_id))
+		if (mapping_add(priv->labels_mapping, ct_labels, &ct_attr->ct_labels_id))
 			return -EOPNOTSUPP;
 		mlx5e_tc_match_to_reg_match(spec, LABELS_TO_REG, ct_attr->ct_labels_id,
 					    MLX5_CT_LABELS_MASK);
@@ -1066,14 +1063,12 @@ mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
 }
 
 int
-mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+mlx5_tc_ct_parse_action(struct mlx5_tc_ct_priv *priv,
 			struct mlx5_flow_attr *attr,
 			const struct flow_action_entry *act,
 			struct netlink_ext_ack *extack)
 {
-	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
-
-	if (!ct_priv) {
+	if (!priv) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "offload of ct action isn't available");
 		return -EOPNOTSUPP;
@@ -1093,7 +1088,7 @@ static int tc_ct_pre_ct_add_rules(struct mlx5_ct_ft *ct_ft,
 	struct mlx5_tc_ct_priv *ct_priv = ct_ft->ct_priv;
 	struct mlx5e_tc_mod_hdr_acts pre_mod_acts = {};
 	struct mlx5_core_dev *dev = ct_priv->esw->dev;
-	struct mlx5_flow_table *fdb = pre_ct->fdb;
+	struct mlx5_flow_table *ft = pre_ct->ft;
 	struct mlx5_flow_destination dest = {};
 	struct mlx5_flow_act flow_act = {};
 	struct mlx5_modify_hdr *mod_hdr;
@@ -1108,14 +1103,14 @@ static int tc_ct_pre_ct_add_rules(struct mlx5_ct_ft *ct_ft,
 		return -ENOMEM;
 
 	zone = ct_ft->zone & MLX5_CT_ZONE_MASK;
-	err = mlx5e_tc_match_to_reg_set(dev, &pre_mod_acts, ZONE_TO_REG, zone);
+	err = mlx5e_tc_match_to_reg_set(dev, &pre_mod_acts, ct_priv->ns_type,
+					ZONE_TO_REG, zone);
 	if (err) {
 		ct_dbg("Failed to set zone register mapping");
 		goto err_mapping;
 	}
 
-	mod_hdr = mlx5_modify_header_alloc(dev,
-					   MLX5_FLOW_NAMESPACE_FDB,
+	mod_hdr = mlx5_modify_header_alloc(dev, ct_priv->ns_type,
 					   pre_mod_acts.num_actions,
 					   pre_mod_acts.actions);
 
@@ -1141,7 +1136,7 @@ static int tc_ct_pre_ct_add_rules(struct mlx5_ct_ft *ct_ft,
 	mlx5e_tc_match_to_reg_match(spec, CTSTATE_TO_REG, ctstate, ctstate);
 
 	dest.ft = ct_priv->post_ct;
-	rule = mlx5_add_flow_rules(fdb, spec, &flow_act, &dest, 1);
+	rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);
 	if (IS_ERR(rule)) {
 		err = PTR_ERR(rule);
 		ct_dbg("Failed to add pre ct flow rule zone %d", zone);
@@ -1152,7 +1147,7 @@ static int tc_ct_pre_ct_add_rules(struct mlx5_ct_ft *ct_ft,
 	/* add miss rule */
 	memset(spec, 0, sizeof(*spec));
 	dest.ft = nat ? ct_priv->ct_nat : ct_priv->ct;
-	rule = mlx5_add_flow_rules(fdb, spec, &flow_act, &dest, 1);
+	rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);
 	if (IS_ERR(rule)) {
 		err = PTR_ERR(rule);
 		ct_dbg("Failed to add pre ct miss rule zone %d", zone);
@@ -1203,10 +1198,10 @@ mlx5_tc_ct_alloc_pre_ct(struct mlx5_ct_ft *ct_ft,
 	void *misc;
 	int err;
 
-	ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
+	ns = mlx5_get_flow_namespace(dev, ct_priv->ns_type);
 	if (!ns) {
 		err = -EOPNOTSUPP;
-		ct_dbg("Failed to get FDB flow namespace");
+		ct_dbg("Failed to get flow namespace");
 		return err;
 	}
 
@@ -1215,7 +1210,8 @@ mlx5_tc_ct_alloc_pre_ct(struct mlx5_ct_ft *ct_ft,
 		return -ENOMEM;
 
 	ft_attr.flags = MLX5_FLOW_TABLE_UNMANAGED;
-	ft_attr.prio = FDB_TC_OFFLOAD;
+	ft_attr.prio =  ct_priv->ns_type ==  MLX5_FLOW_NAMESPACE_FDB ?
+			FDB_TC_OFFLOAD : MLX5E_TC_PRIO;
 	ft_attr.max_fte = 2;
 	ft_attr.level = 1;
 	ft = mlx5_create_flow_table(ns, &ft_attr);
@@ -1224,7 +1220,7 @@ mlx5_tc_ct_alloc_pre_ct(struct mlx5_ct_ft *ct_ft,
 		ct_dbg("Failed to create pre ct table");
 		goto out_free;
 	}
-	pre_ct->fdb = ft;
+	pre_ct->ft = ft;
 
 	/* create flow group */
 	MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 0);
@@ -1288,7 +1284,7 @@ mlx5_tc_ct_free_pre_ct(struct mlx5_ct_ft *ct_ft,
 	tc_ct_pre_ct_del_rules(ct_ft, pre_ct);
 	mlx5_destroy_flow_group(pre_ct->miss_grp);
 	mlx5_destroy_flow_group(pre_ct->flow_grp);
-	mlx5_destroy_flow_table(pre_ct->fdb);
+	mlx5_destroy_flow_table(pre_ct->ft);
 }
 
 static int
@@ -1407,7 +1403,7 @@ mlx5_tc_ct_del_ft_cb(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft)
 /* We translate the tc filter with CT action to the following HW model:
  *
  * +---------------------+
- * + fdb prio (tc chain) +
+ * + ft prio (tc chain) +
  * + original match      +
  * +---------------------+
  *      | set chain miss mapping
@@ -1437,16 +1433,16 @@ mlx5_tc_ct_del_ft_cb(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft)
  * +--------------+
  */
 static struct mlx5_flow_handle *
-__mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+__mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *ct_priv,
 			  struct mlx5e_tc_flow *flow,
 			  struct mlx5_flow_spec *orig_spec,
 			  struct mlx5_flow_attr *attr)
 {
-	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	bool nat = attr->ct_attr.ct_action & TCA_CT_ACT_NAT;
+	struct mlx5e_priv *priv = netdev_priv(ct_priv->netdev);
 	struct mlx5e_tc_mod_hdr_acts pre_mod_acts = {};
+	u32 attr_sz = ns_to_attr_sz(ct_priv->ns_type);
 	struct mlx5_flow_spec *post_ct_spec = NULL;
-	struct mlx5_eswitch *esw = ct_priv->esw;
 	struct mlx5_flow_attr *pre_ct_attr;
 	struct mlx5_modify_hdr *mod_hdr;
 	struct mlx5_flow_handle *rule;
@@ -1483,21 +1479,21 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	ct_flow->fte_id = fte_id;
 
 	/* Base flow attributes of both rules on original rule attribute */
-	ct_flow->pre_ct_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	ct_flow->pre_ct_attr = mlx5_alloc_flow_attr(ct_priv->ns_type);
 	if (!ct_flow->pre_ct_attr) {
 		err = -ENOMEM;
 		goto err_alloc_pre;
 	}
 
-	ct_flow->post_ct_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	ct_flow->post_ct_attr = mlx5_alloc_flow_attr(ct_priv->ns_type);
 	if (!ct_flow->post_ct_attr) {
 		err = -ENOMEM;
 		goto err_alloc_post;
 	}
 
 	pre_ct_attr = ct_flow->pre_ct_attr;
-	memcpy(pre_ct_attr, attr, ESW_FLOW_ATTR_SZ);
-	memcpy(ct_flow->post_ct_attr, attr, ESW_FLOW_ATTR_SZ);
+	memcpy(pre_ct_attr, attr, attr_sz);
+	memcpy(ct_flow->post_ct_attr, attr, attr_sz);
 
 	/* Modify the original rule's action to fwd and modify, leave decap */
 	pre_ct_attr->action = attr->action & MLX5_FLOW_CONTEXT_ACTION_DECAP;
@@ -1508,7 +1504,7 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	 * don't go though all prios of this chain as normal tc rules
 	 * miss.
 	 */
-	err = mlx5_chains_get_chain_mapping(esw_chains(esw), attr->chain,
+	err = mlx5_chains_get_chain_mapping(ct_priv->chains, attr->chain,
 					    &chain_mapping);
 	if (err) {
 		ct_dbg("Failed to get chain register mapping for chain");
@@ -1516,14 +1512,14 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	}
 	ct_flow->chain_mapping = chain_mapping;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+	err = mlx5e_tc_match_to_reg_set(priv->mdev, &pre_mod_acts, ct_priv->ns_type,
 					CHAIN_TO_REG, chain_mapping);
 	if (err) {
 		ct_dbg("Failed to set chain register mapping");
 		goto err_mapping;
 	}
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+	err = mlx5e_tc_match_to_reg_set(priv->mdev, &pre_mod_acts, ct_priv->ns_type,
 					FTEID_TO_REG, fte_id);
 	if (err) {
 		ct_dbg("Failed to set fte_id register mapping");
@@ -1537,7 +1533,8 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	    attr->chain == 0) {
 		u32 tun_id = mlx5e_tc_get_flow_tun_id(flow);
 
-		err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+		err = mlx5e_tc_match_to_reg_set(priv->mdev, &pre_mod_acts,
+						ct_priv->ns_type,
 						TUNNEL_TO_REG,
 						tun_id);
 		if (err) {
@@ -1546,8 +1543,7 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 		}
 	}
 
-	mod_hdr = mlx5_modify_header_alloc(esw->dev,
-					   MLX5_FLOW_NAMESPACE_FDB,
+	mod_hdr = mlx5_modify_header_alloc(priv->mdev, ct_priv->ns_type,
 					   pre_mod_acts.num_actions,
 					   pre_mod_acts.actions);
 	if (IS_ERR(mod_hdr)) {
@@ -1563,7 +1559,7 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	mlx5e_tc_match_to_reg_match(post_ct_spec, FTEID_TO_REG,
 				    fte_id, MLX5_FTE_ID_MASK);
 
-	/* Put post_ct rule on post_ct fdb */
+	/* Put post_ct rule on post_ct flow table */
 	ct_flow->post_ct_attr->chain = 0;
 	ct_flow->post_ct_attr->prio = 0;
 	ct_flow->post_ct_attr->ft = ct_priv->post_ct;
@@ -1571,8 +1567,8 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	ct_flow->post_ct_attr->inner_match_level = MLX5_MATCH_NONE;
 	ct_flow->post_ct_attr->outer_match_level = MLX5_MATCH_NONE;
 	ct_flow->post_ct_attr->action &= ~(MLX5_FLOW_CONTEXT_ACTION_DECAP);
-	rule = mlx5_eswitch_add_offloaded_rule(esw, post_ct_spec,
-					       ct_flow->post_ct_attr);
+	rule = mlx5_tc_rule_insert(priv, post_ct_spec,
+				   ct_flow->post_ct_attr);
 	ct_flow->post_ct_rule = rule;
 	if (IS_ERR(ct_flow->post_ct_rule)) {
 		err = PTR_ERR(ct_flow->post_ct_rule);
@@ -1582,10 +1578,9 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 
 	/* Change original rule point to ct table */
 	pre_ct_attr->dest_chain = 0;
-	pre_ct_attr->dest_ft = nat ? ft->pre_ct_nat.fdb : ft->pre_ct.fdb;
-	ct_flow->pre_ct_rule = mlx5_eswitch_add_offloaded_rule(esw,
-							       orig_spec,
-							       pre_ct_attr);
+	pre_ct_attr->dest_ft = nat ? ft->pre_ct_nat.ft : ft->pre_ct.ft;
+	ct_flow->pre_ct_rule = mlx5_tc_rule_insert(priv, orig_spec,
+						   pre_ct_attr);
 	if (IS_ERR(ct_flow->pre_ct_rule)) {
 		err = PTR_ERR(ct_flow->pre_ct_rule);
 		ct_dbg("Failed to add pre ct rule");
@@ -1599,13 +1594,13 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 	return rule;
 
 err_insert_orig:
-	mlx5_eswitch_del_offloaded_rule(ct_priv->esw, ct_flow->post_ct_rule,
-					ct_flow->post_ct_attr);
+	mlx5_tc_rule_delete(priv, ct_flow->post_ct_rule,
+			    ct_flow->post_ct_attr);
 err_insert_post_ct:
 	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
 err_mapping:
 	dealloc_mod_hdr_actions(&pre_mod_acts);
-	mlx5_chains_put_chain_mapping(esw_chains(esw), ct_flow->chain_mapping);
+	mlx5_chains_put_chain_mapping(ct_priv->chains, ct_flow->chain_mapping);
 err_get_chain:
 	kfree(ct_flow->post_ct_attr);
 err_alloc_post:
@@ -1622,13 +1617,13 @@ __mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 }
 
 static struct mlx5_flow_handle *
-__mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
+__mlx5_tc_ct_flow_offload_clear(struct mlx5_tc_ct_priv *ct_priv,
 				struct mlx5_flow_spec *orig_spec,
 				struct mlx5_flow_attr *attr,
 				struct mlx5e_tc_mod_hdr_acts *mod_acts)
 {
-	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
-	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5e_priv *priv = netdev_priv(ct_priv->netdev);
+	u32 attr_sz = ns_to_attr_sz(ct_priv->ns_type);
 	struct mlx5_flow_attr *pre_ct_attr;
 	struct mlx5_modify_hdr *mod_hdr;
 	struct mlx5_flow_handle *rule;
@@ -1640,13 +1635,13 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 		return ERR_PTR(-ENOMEM);
 
 	/* Base esw attributes on original rule attribute */
-	pre_ct_attr = mlx5_alloc_flow_attr(MLX5_FLOW_NAMESPACE_FDB);
+	pre_ct_attr = mlx5_alloc_flow_attr(ct_priv->ns_type);
 	if (!pre_ct_attr) {
 		err = -ENOMEM;
 		goto err_attr;
 	}
 
-	memcpy(pre_ct_attr, attr, ESW_FLOW_ATTR_SZ);
+	memcpy(pre_ct_attr, attr, attr_sz);
 
 	err = mlx5_tc_ct_entry_set_registers(ct_priv, mod_acts, 0, 0, 0, 0);
 	if (err) {
@@ -1654,8 +1649,7 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 		goto err_set_registers;
 	}
 
-	mod_hdr = mlx5_modify_header_alloc(esw->dev,
-					   MLX5_FLOW_NAMESPACE_FDB,
+	mod_hdr = mlx5_modify_header_alloc(priv->mdev, ct_priv->ns_type,
 					   mod_acts->num_actions,
 					   mod_acts->actions);
 	if (IS_ERR(mod_hdr)) {
@@ -1668,7 +1662,7 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 	pre_ct_attr->modify_hdr = mod_hdr;
 	pre_ct_attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
 
-	rule = mlx5_eswitch_add_offloaded_rule(esw, orig_spec, pre_ct_attr);
+	rule = mlx5_tc_rule_insert(priv, orig_spec, pre_ct_attr);
 	if (IS_ERR(rule)) {
 		err = PTR_ERR(rule);
 		ct_dbg("Failed to add ct clear rule");
@@ -1693,45 +1687,45 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
 }
 
 struct mlx5_flow_handle *
-mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
 			struct mlx5_flow_attr *attr,
 			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
 {
 	bool clear_action = attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR;
-	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	struct mlx5_flow_handle *rule;
 
-	if (!ct_priv)
+	if (!priv)
 		return ERR_PTR(-EOPNOTSUPP);
 
-	mutex_lock(&ct_priv->control_lock);
+	mutex_lock(&priv->control_lock);
 
 	if (clear_action)
 		rule = __mlx5_tc_ct_flow_offload_clear(priv, spec, attr, mod_hdr_acts);
 	else
 		rule = __mlx5_tc_ct_flow_offload(priv, flow, spec, attr);
-	mutex_unlock(&ct_priv->control_lock);
+	mutex_unlock(&priv->control_lock);
 
 	return rule;
 }
 
 static void
 __mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
+			 struct mlx5e_tc_flow *flow,
 			 struct mlx5_ct_flow *ct_flow)
 {
 	struct mlx5_flow_attr *pre_ct_attr = ct_flow->pre_ct_attr;
-	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5e_priv *priv = netdev_priv(ct_priv->netdev);
 
-	mlx5_eswitch_del_offloaded_rule(esw, ct_flow->pre_ct_rule,
-					pre_ct_attr);
-	mlx5_modify_header_dealloc(esw->dev, pre_ct_attr->modify_hdr);
+	mlx5_tc_rule_delete(priv, ct_flow->pre_ct_rule,
+			    pre_ct_attr);
+	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
 
 	if (ct_flow->post_ct_rule) {
-		mlx5_eswitch_del_offloaded_rule(esw, ct_flow->post_ct_rule,
-						ct_flow->post_ct_attr);
-		mlx5_chains_put_chain_mapping(esw_chains(esw), ct_flow->chain_mapping);
+		mlx5_tc_rule_delete(priv, ct_flow->post_ct_rule,
+				    ct_flow->post_ct_attr);
+		mlx5_chains_put_chain_mapping(ct_priv->chains, ct_flow->chain_mapping);
 		idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
 		mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
 	}
@@ -1742,10 +1736,10 @@ __mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
 }
 
 void
-mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow,
+mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *priv,
+		       struct mlx5e_tc_flow *flow,
 		       struct mlx5_flow_attr *attr)
 {
-	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
 	struct mlx5_ct_flow *ct_flow = attr->ct_attr.ct_flow;
 
 	/* We are called on error to clean up stuff from parsing
@@ -1754,22 +1748,15 @@ mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow,
 	if (!ct_flow)
 		return;
 
-	mutex_lock(&ct_priv->control_lock);
-	__mlx5_tc_ct_delete_flow(ct_priv, ct_flow);
-	mutex_unlock(&ct_priv->control_lock);
+	mutex_lock(&priv->control_lock);
+	__mlx5_tc_ct_delete_flow(priv, flow, ct_flow);
+	mutex_unlock(&priv->control_lock);
 }
 
 static int
-mlx5_tc_ct_init_check_support(struct mlx5_eswitch *esw,
-			      const char **err_msg)
+mlx5_tc_ct_init_check_esw_support(struct mlx5_eswitch *esw,
+				  const char **err_msg)
 {
-#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
-	/* cannot restore chain ID on HW miss */
-
-	*err_msg = "tc skb extension missing";
-	return -EOPNOTSUPP;
-#endif
-
 	if (!MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ignore_flow_level)) {
 		*err_msg = "firmware level support is missing";
 		return -EOPNOTSUPP;
@@ -1803,25 +1790,51 @@ mlx5_tc_ct_init_check_support(struct mlx5_eswitch *esw,
 	return 0;
 }
 
+static int
+mlx5_tc_ct_init_check_nic_support(struct mlx5e_priv *priv,
+				  const char **err_msg)
+{
+	if (!MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ignore_flow_level)) {
+		*err_msg = "firmware level support is missing";
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int
+mlx5_tc_ct_init_check_support(struct mlx5e_priv *priv,
+			      enum mlx5_flow_namespace_type ns_type,
+			      const char **err_msg)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	/* cannot restore chain ID on HW miss */
+
+	*err_msg = "tc skb extension missing";
+	return -EOPNOTSUPP;
+#endif
+	if (ns_type == MLX5_FLOW_NAMESPACE_FDB)
+		return mlx5_tc_ct_init_check_esw_support(esw, err_msg);
+	else
+		return mlx5_tc_ct_init_check_nic_support(priv, err_msg);
+}
+
 #define INIT_ERR_PREFIX "tc ct offload init failed"
 
-int
-mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
+struct mlx5_tc_ct_priv *
+mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
+		struct mod_hdr_tbl *mod_hdr,
+		enum mlx5_flow_namespace_type ns_type)
 {
 	struct mlx5_tc_ct_priv *ct_priv;
-	struct mlx5e_rep_priv *rpriv;
 	struct mlx5_core_dev *dev;
-	struct mlx5_eswitch *esw;
-	struct mlx5e_priv *priv;
 	const char *msg;
 	int err;
 
-	rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
-	priv = netdev_priv(rpriv->netdev);
 	dev = priv->mdev;
-	esw = dev->priv.eswitch;
-
-	err = mlx5_tc_ct_init_check_support(esw, &msg);
+	err = mlx5_tc_ct_init_check_support(priv, ns_type, &msg);
 	if (err) {
 		mlx5_core_warn(dev,
 			       "tc ct offload not supported, %s\n",
@@ -1845,9 +1858,12 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 		goto err_mapping_labels;
 	}
 
-	ct_priv->esw = esw;
-	ct_priv->netdev = rpriv->netdev;
-	ct_priv->ct = mlx5_chains_create_global_table(esw_chains(esw));
+	ct_priv->ns_type = ns_type;
+	ct_priv->chains = chains;
+	ct_priv->esw = priv->mdev->priv.eswitch;
+	ct_priv->netdev = priv->netdev;
+	ct_priv->mod_hdr_tbl = mod_hdr;
+	ct_priv->ct = mlx5_chains_create_global_table(chains);
 	if (IS_ERR(ct_priv->ct)) {
 		err = PTR_ERR(ct_priv->ct);
 		mlx5_core_warn(dev,
@@ -1856,7 +1872,7 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 		goto err_ct_tbl;
 	}
 
-	ct_priv->ct_nat = mlx5_chains_create_global_table(esw_chains(esw));
+	ct_priv->ct_nat = mlx5_chains_create_global_table(chains);
 	if (IS_ERR(ct_priv->ct_nat)) {
 		err = PTR_ERR(ct_priv->ct_nat);
 		mlx5_core_warn(dev,
@@ -1865,7 +1881,7 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 		goto err_ct_nat_tbl;
 	}
 
-	ct_priv->post_ct = mlx5_chains_create_global_table(esw_chains(esw));
+	ct_priv->post_ct = mlx5_chains_create_global_table(chains);
 	if (IS_ERR(ct_priv->post_ct)) {
 		err = PTR_ERR(ct_priv->post_ct);
 		mlx5_core_warn(dev,
@@ -1880,15 +1896,12 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 	rhashtable_init(&ct_priv->ct_tuples_ht, &tuples_ht_params);
 	rhashtable_init(&ct_priv->ct_tuples_nat_ht, &tuples_nat_ht_params);
 
-	/* Done, set ct_priv to know it initializted */
-	uplink_priv->ct_priv = ct_priv;
-
-	return 0;
+	return ct_priv;
 
 err_post_ct_tbl:
-	mlx5_chains_destroy_global_table(esw_chains(esw), ct_priv->ct_nat);
+	mlx5_chains_destroy_global_table(chains, ct_priv->ct_nat);
 err_ct_nat_tbl:
-	mlx5_chains_destroy_global_table(esw_chains(esw), ct_priv->ct);
+	mlx5_chains_destroy_global_table(chains, ct_priv->ct);
 err_ct_tbl:
 	mapping_destroy(ct_priv->labels_mapping);
 err_mapping_labels:
@@ -1898,21 +1911,18 @@ mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
 err_alloc:
 err_support:
 
-	return 0;
+	return NULL;
 }
 
 void
-mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
+mlx5_tc_ct_clean(struct mlx5_tc_ct_priv *ct_priv)
 {
-	struct mlx5_tc_ct_priv *ct_priv = uplink_priv->ct_priv;
 	struct mlx5_fs_chains *chains;
-	struct mlx5_eswitch *esw;
 
 	if (!ct_priv)
 		return;
 
-	esw = ct_priv->esw;
-	chains = esw_chains(esw);
+	chains = ct_priv->chains;
 
 	mlx5_chains_destroy_global_table(chains, ct_priv->post_ct);
 	mlx5_chains_destroy_global_table(chains, ct_priv->ct_nat);
@@ -1926,15 +1936,12 @@ mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
 	mutex_destroy(&ct_priv->control_lock);
 	idr_destroy(&ct_priv->fte_ids);
 	kfree(ct_priv);
-
-	uplink_priv->ct_priv = NULL;
 }
 
 bool
-mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+mlx5e_tc_ct_restore_flow(struct mlx5_tc_ct_priv *ct_priv,
 			 struct sk_buff *skb, u8 zone_restore_id)
 {
-	struct mlx5_tc_ct_priv *ct_priv = uplink_priv->ct_priv;
 	struct mlx5_ct_tuple tuple = {};
 	struct mlx5_ct_entry *entry;
 	u16 zone;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
index 2bfe930faa3b..bab872b76a5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -16,6 +16,8 @@ struct mlx5_rep_uplink_priv;
 struct mlx5e_tc_flow;
 struct mlx5e_priv;
 
+struct mlx5_fs_chains;
+struct mlx5_tc_ct_priv;
 struct mlx5_ct_flow;
 
 struct nf_flowtable;
@@ -76,22 +78,32 @@ struct mlx5_ct_attr {
 				 misc_parameters_2.metadata_reg_c_1) + 3,\
 }
 
+#define nic_zone_restore_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_B,\
+	.moffset = 2,\
+	.mlen = 1,\
+}
+
 #define REG_MAPPING_MLEN(reg) (mlx5e_tc_attr_to_reg_mappings[reg].mlen)
+#define REG_MAPPING_MOFFSET(reg) (mlx5e_tc_attr_to_reg_mappings[reg].moffset)
+#define REG_MAPPING_SHIFT(reg) (REG_MAPPING_MOFFSET(reg) * 8)
 #define ZONE_RESTORE_BITS (REG_MAPPING_MLEN(ZONE_RESTORE_TO_REG) * 8)
 #define ZONE_RESTORE_MAX GENMASK(ZONE_RESTORE_BITS - 1, 0)
 
 #if IS_ENABLED(CONFIG_MLX5_TC_CT)
 
-int
-mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv);
+struct mlx5_tc_ct_priv *
+mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
+		struct mod_hdr_tbl *mod_hdr,
+		enum mlx5_flow_namespace_type ns_type);
 void
-mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv);
+mlx5_tc_ct_clean(struct mlx5_tc_ct_priv *ct_priv);
 
 void
-mlx5_tc_ct_match_del(struct mlx5e_priv *priv, struct mlx5_ct_attr *ct_attr);
+mlx5_tc_ct_match_del(struct mlx5_tc_ct_priv *priv, struct mlx5_ct_attr *ct_attr);
 
 int
-mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
+mlx5_tc_ct_match_add(struct mlx5_tc_ct_priv *priv,
 		     struct mlx5_flow_spec *spec,
 		     struct flow_cls_offload *f,
 		     struct mlx5_ct_attr *ct_attr,
@@ -100,44 +112,46 @@ int
 mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
 			    struct mlx5_flow_spec *spec);
 int
-mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+mlx5_tc_ct_parse_action(struct mlx5_tc_ct_priv *priv,
 			struct mlx5_flow_attr *attr,
 			const struct flow_action_entry *act,
 			struct netlink_ext_ack *extack);
 
 struct mlx5_flow_handle *
-mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
 			struct mlx5_flow_attr *attr,
 			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
 void
-mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
+mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *priv,
 		       struct mlx5e_tc_flow *flow,
 		       struct mlx5_flow_attr *attr);
 
 bool
-mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+mlx5e_tc_ct_restore_flow(struct mlx5_tc_ct_priv *ct_priv,
 			 struct sk_buff *skb, u8 zone_restore_id);
 
 #else /* CONFIG_MLX5_TC_CT */
 
-static inline int
-mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
+static inline struct mlx5_tc_ct_priv *
+mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
+		struct mod_hdr_tbl *mod_hdr,
+		enum mlx5_flow_namespace_type ns_type)
 {
-	return 0;
+	return NULL;
 }
 
 static inline void
-mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
+mlx5_tc_ct_clean(struct mlx5_tc_ct_priv *ct_priv)
 {
 }
 
 static inline void
-mlx5_tc_ct_match_del(struct mlx5e_priv *priv, struct mlx5_ct_attr *ct_attr) {}
+mlx5_tc_ct_match_del(struct mlx5_tc_ct_priv *priv, struct mlx5_ct_attr *ct_attr) {}
 
 static inline int
-mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
+mlx5_tc_ct_match_add(struct mlx5_tc_ct_priv *priv,
 		     struct mlx5_flow_spec *spec,
 		     struct flow_cls_offload *f,
 		     struct mlx5_ct_attr *ct_attr,
@@ -149,7 +163,6 @@ mlx5_tc_ct_match_add(struct mlx5e_priv *priv,
 		return 0;
 
 	NL_SET_ERR_MSG_MOD(extack, "mlx5 tc ct offload isn't enabled.");
-	netdev_warn(priv->netdev, "mlx5 tc ct offload isn't enabled.\n");
 	return -EOPNOTSUPP;
 }
 
@@ -161,18 +174,17 @@ mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
 }
 
 static inline int
-mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+mlx5_tc_ct_parse_action(struct mlx5_tc_ct_priv *priv,
 			struct mlx5_flow_attr *attr,
 			const struct flow_action_entry *act,
 			struct netlink_ext_ack *extack)
 {
 	NL_SET_ERR_MSG_MOD(extack, "mlx5 tc ct offload isn't enabled.");
-	netdev_warn(priv->netdev, "mlx5 tc ct offload isn't enabled.\n");
 	return -EOPNOTSUPP;
 }
 
 static inline struct mlx5_flow_handle *
-mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+mlx5_tc_ct_flow_offload(struct mlx5_tc_ct_priv *priv,
 			struct mlx5e_tc_flow *flow,
 			struct mlx5_flow_spec *spec,
 			struct mlx5_flow_attr *attr,
@@ -182,14 +194,14 @@ mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
 }
 
 static inline void
-mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
+mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *priv,
 		       struct mlx5e_tc_flow *flow,
 		       struct mlx5_flow_attr *attr)
 {
 }
 
 static inline bool
-mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+mlx5e_tc_ct_restore_flow(struct mlx5_tc_ct_priv *ct_priv,
 			 struct sk_buff *skb, u8 zone_restore_id)
 {
 	if (!zone_restore_id)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index da05c4c195ff..4084a293442d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -186,6 +186,7 @@ struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
 		.moffset = 0,
 		.mlen = 2,
 	},
+	[NIC_ZONE_RESTORE_TO_REG] = nic_zone_restore_to_reg_ct,
 };
 
 static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow);
@@ -239,6 +240,7 @@ mlx5e_tc_match_to_reg_get_match(struct mlx5_flow_spec *spec,
 int
 mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
 			  struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts,
+			  enum mlx5_flow_namespace_type ns,
 			  enum mlx5e_tc_attr_to_reg type,
 			  u32 data)
 {
@@ -248,8 +250,7 @@ mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
 	char *modact;
 	int err;
 
-	err = alloc_mod_hdr_actions(mdev, MLX5_FLOW_NAMESPACE_FDB,
-				    mod_hdr_acts);
+	err = alloc_mod_hdr_actions(mdev, ns, mod_hdr_acts);
 	if (err)
 		return err;
 
@@ -270,6 +271,54 @@ mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
 	return 0;
 }
 
+#define esw_offloads_mode(esw) (mlx5_eswitch_mode(esw) == MLX5_ESWITCH_OFFLOADS)
+
+static struct mlx5_tc_ct_priv *
+get_ct_priv(struct mlx5e_priv *priv)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+
+	if (esw_offloads_mode(esw)) {
+		uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+		uplink_priv = &uplink_rpriv->uplink_priv;
+
+		return uplink_priv->ct_priv;
+	}
+
+	return priv->fs.tc.ct;
+}
+
+struct mlx5_flow_handle *
+mlx5_tc_rule_insert(struct mlx5e_priv *priv,
+		    struct mlx5_flow_spec *spec,
+		    struct mlx5_flow_attr *attr)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+	if (esw_offloads_mode(esw))
+		return mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
+
+	return	mlx5e_add_offloaded_nic_rule(priv, spec, attr);
+}
+
+void
+mlx5_tc_rule_delete(struct mlx5e_priv *priv,
+		    struct mlx5_flow_handle *rule,
+		    struct mlx5_flow_attr *attr)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+	if (esw_offloads_mode(esw)) {
+		mlx5_eswitch_del_offloaded_rule(esw, rule, attr);
+
+		return;
+	}
+
+	mlx5e_del_offloaded_nic_rule(priv, rule, attr);
+}
+
 struct mlx5e_hairpin {
 	struct mlx5_hairpin *pair;
 
@@ -365,7 +414,7 @@ static bool __flow_flag_test(struct mlx5e_tc_flow *flow, unsigned long flag)
 #define flow_flag_test(flow, flag) __flow_flag_test(flow, \
 						    MLX5E_TC_FLOW_FLAG_##flag)
 
-static bool mlx5e_is_eswitch_flow(struct mlx5e_tc_flow *flow)
+bool mlx5e_is_eswitch_flow(struct mlx5e_tc_flow *flow)
 {
 	return flow_flag_test(flow, ESWITCH);
 }
@@ -903,7 +952,11 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
 	flow_context->flow_tag = nic_attr->flow_tag;
 
-	if (nic_attr->hairpin_ft) {
+	if (attr->dest_ft) {
+		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+		dest[dest_ix].ft = attr->dest_ft;
+		dest_ix++;
+	} else if (nic_attr->hairpin_ft) {
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 		dest[dest_ix].ft = nic_attr->hairpin_ft;
 		dest_ix++;
@@ -954,9 +1007,13 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 	}
 	mutex_unlock(&tc->t_lock);
 
-	ft = mlx5_chains_get_table(nic_chains,
-				   attr->chain, attr->prio,
-				   MLX5E_TC_FT_LEVEL);
+	if (attr->chain || attr->prio)
+		ft = mlx5_chains_get_table(nic_chains,
+					   attr->chain, attr->prio,
+					   MLX5E_TC_FT_LEVEL);
+	else
+		ft = attr->ft;
+
 	if (IS_ERR(ft)) {
 		rule = ERR_CAST(ft);
 		goto err_ft_get;
@@ -973,9 +1030,10 @@ mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
 	return rule;
 
 err_rule:
-	mlx5_chains_put_table(nic_chains,
-			      attr->chain, attr->prio,
-			      MLX5E_TC_FT_LEVEL);
+	if (attr->chain || attr->prio)
+		mlx5_chains_put_table(nic_chains,
+				      attr->chain, attr->prio,
+				      MLX5E_TC_FT_LEVEL);
 err_ft_get:
 	if (attr->dest_chain)
 		mlx5_chains_put_table(nic_chains,
@@ -1017,8 +1075,12 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 			return err;
 	}
 
-	flow->rule[0] = mlx5e_add_offloaded_nic_rule(priv, &parse_attr->spec,
-						     attr);
+	if (flow_flag_test(flow, CT))
+		flow->rule[0] = mlx5_tc_ct_flow_offload(get_ct_priv(priv), flow, &parse_attr->spec,
+							attr, &parse_attr->mod_hdr_acts);
+	else
+		flow->rule[0] = mlx5e_add_offloaded_nic_rule(priv, &parse_attr->spec,
+							     attr);
 
 	return PTR_ERR_OR_ZERO(flow->rule[0]);
 }
@@ -1031,8 +1093,9 @@ void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
 
 	mlx5_del_flow_rules(rule);
 
-	mlx5_chains_put_table(nic_chains, attr->chain, attr->prio,
-			      MLX5E_TC_FT_LEVEL);
+	if (attr->chain || attr->prio)
+		mlx5_chains_put_table(nic_chains, attr->chain, attr->prio,
+				      MLX5E_TC_FT_LEVEL);
 
 	if (attr->dest_chain)
 		mlx5_chains_put_table(nic_chains, attr->dest_chain, 1,
@@ -1045,12 +1108,13 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 	struct mlx5_flow_attr *attr = flow->attr;
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 
-	if (!IS_ERR_OR_NULL(flow->rule[0]))
-		mlx5e_del_offloaded_nic_rule(priv, flow->rule[0], attr);
-	mlx5_fc_destroy(priv->mdev, attr->counter);
-
 	flow_flag_clear(flow, OFFLOADED);
 
+	if (flow_flag_test(flow, CT))
+		mlx5_tc_ct_delete_flow(get_ct_priv(flow->priv), flow, attr);
+	else if (!IS_ERR_OR_NULL(flow->rule[0]))
+		mlx5e_del_offloaded_nic_rule(priv, flow->rule[0], attr);
+
 	/* Remove root table if no rules are left to avoid
 	 * extra steering hops.
 	 */
@@ -1062,9 +1126,13 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 	}
 	mutex_unlock(&priv->fs.tc.t_lock);
 
+	kvfree(attr->parse_attr);
+
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		mlx5e_detach_mod_hdr(priv, flow);
 
+	mlx5_fc_destroy(priv->mdev, attr->counter);
+
 	if (flow_flag_test(flow, HAIRPIN))
 		mlx5e_hairpin_flow_del(priv, flow);
 
@@ -1099,7 +1167,8 @@ mlx5e_tc_offload_fdb_rules(struct mlx5_eswitch *esw,
 	if (flow_flag_test(flow, CT)) {
 		mod_hdr_acts = &attr->parse_attr->mod_hdr_acts;
 
-		return mlx5_tc_ct_flow_offload(flow->priv, flow, spec, attr,
+		return mlx5_tc_ct_flow_offload(get_ct_priv(flow->priv),
+					       flow, spec, attr,
 					       mod_hdr_acts);
 	}
 
@@ -1126,7 +1195,7 @@ mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
 	flow_flag_clear(flow, OFFLOADED);
 
 	if (flow_flag_test(flow, CT)) {
-		mlx5_tc_ct_delete_flow(flow->priv, flow, attr);
+		mlx5_tc_ct_delete_flow(get_ct_priv(flow->priv), flow, attr);
 		return;
 	}
 
@@ -1383,7 +1452,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 		}
 	kvfree(attr->parse_attr);
 
-	mlx5_tc_ct_match_del(priv, &flow->attr->ct_attr);
+	mlx5_tc_ct_match_del(get_ct_priv(priv), &flow->attr->ct_attr);
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		mlx5e_detach_mod_hdr(priv, flow);
@@ -1942,7 +2011,7 @@ static int mlx5e_get_flow_tunnel_id(struct mlx5e_priv *priv,
 	} else {
 		mod_hdr_acts = &attr->parse_attr->mod_hdr_acts;
 		err = mlx5e_tc_match_to_reg_set(priv->mdev,
-						mod_hdr_acts,
+						mod_hdr_acts, MLX5_FLOW_NAMESPACE_FDB,
 						TUNNEL_TO_REG, value);
 		if (err)
 			goto err_set;
@@ -3458,6 +3527,13 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 			action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
 			attr->dest_chain = act->chain_index;
 			break;
+		case FLOW_ACTION_CT:
+			err = mlx5_tc_ct_parse_action(get_ct_priv(priv), attr, act, extack);
+			if (err)
+				return err;
+
+			flow_flag_set(flow, CT);
+			break;
 		default:
 			NL_SET_ERR_MSG_MOD(extack, "The offload action is not supported");
 			return -EOPNOTSUPP;
@@ -4288,7 +4364,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			attr->dest_chain = act->chain_index;
 			break;
 		case FLOW_ACTION_CT:
-			err = mlx5_tc_ct_parse_action(priv, attr, act, extack);
+			err = mlx5_tc_ct_parse_action(get_ct_priv(priv), attr, act, extack);
 			if (err)
 				return err;
 
@@ -4558,7 +4634,7 @@ __mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 		goto err_free;
 
 	/* actions validation depends on parsing the ct matches first */
-	err = mlx5_tc_ct_match_add(priv, &parse_attr->spec, f,
+	err = mlx5_tc_ct_match_add(get_ct_priv(priv), &parse_attr->spec, f,
 				   &flow->attr->ct_attr, extack);
 	if (err)
 		goto err_free;
@@ -4704,6 +4780,11 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 	if (err)
 		goto err_free;
 
+	err = mlx5_tc_ct_match_add(get_ct_priv(priv), &parse_attr->spec, f,
+				   &flow->attr->ct_attr, extack);
+	if (err)
+		goto err_free;
+
 	err = parse_tc_nic_actions(priv, &rule->action, parse_attr, flow, extack);
 	if (err)
 		goto err_free;
@@ -4713,14 +4794,12 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 		goto err_free;
 
 	flow_flag_set(flow, OFFLOADED);
-	kvfree(parse_attr);
 	*__flow = flow;
 
 	return 0;
 
 err_free:
 	mlx5e_flow_put(priv, flow);
-	kvfree(parse_attr);
 out:
 	return err;
 }
@@ -5143,6 +5222,11 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 		goto err_chains;
 	}
 
+	tc->ct = mlx5_tc_ct_init(priv, tc->chains, &priv->fs.tc.mod_hdr,
+				 MLX5_FLOW_NAMESPACE_KERNEL);
+	if (IS_ERR(tc->ct))
+		goto err_ct;
+
 	tc->netdevice_nb.notifier_call = mlx5e_tc_netdev_event;
 	err = register_netdevice_notifier_dev_net(priv->netdev,
 						  &tc->netdevice_nb,
@@ -5156,6 +5240,8 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 	return 0;
 
 err_reg:
+	mlx5_tc_ct_clean(tc->ct);
+err_ct:
 	mlx5_chains_destroy(tc->chains);
 err_chains:
 	rhashtable_destroy(&tc->ht);
@@ -5191,6 +5277,7 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv)
 	}
 	mutex_destroy(&tc->t_lock);
 
+	mlx5_tc_ct_clean(tc->ct);
 	mlx5_chains_destroy(tc->chains);
 }
 
@@ -5198,15 +5285,22 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
 {
 	const size_t sz_enc_opts = sizeof(struct tunnel_match_enc_opts);
 	struct mlx5_rep_uplink_priv *uplink_priv;
-	struct mlx5e_rep_priv *priv;
+	struct mlx5e_rep_priv *rpriv;
 	struct mapping_ctx *mapping;
-	int err;
+	struct mlx5_eswitch *esw;
+	struct mlx5e_priv *priv;
+	int err = 0;
 
 	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
-	priv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
+	rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
+	priv = netdev_priv(rpriv->netdev);
+	esw = priv->mdev->priv.eswitch;
 
-	err = mlx5_tc_ct_init(uplink_priv);
-	if (err)
+	uplink_priv->ct_priv = mlx5_tc_ct_init(netdev_priv(priv->netdev),
+					       esw_chains(esw),
+					       &esw->offloads.mod_hdr,
+					       MLX5_FLOW_NAMESPACE_FDB);
+	if (IS_ERR(uplink_priv->ct_priv))
 		goto err_ct;
 
 	mapping = mapping_create(sizeof(struct tunnel_match_key),
@@ -5235,7 +5329,7 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
 err_enc_opts_mapping:
 	mapping_destroy(uplink_priv->tunnel_mapping);
 err_tun_mapping:
-	mlx5_tc_ct_clean(uplink_priv);
+	mlx5_tc_ct_clean(uplink_priv->ct_priv);
 err_ct:
 	netdev_warn(priv->netdev,
 		    "Failed to initialize tc (eswitch), err: %d", err);
@@ -5249,10 +5343,11 @@ void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht)
 	rhashtable_free_and_destroy(tc_ht, _mlx5e_tc_del_flow, NULL);
 
 	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
+
 	mapping_destroy(uplink_priv->tunnel_enc_opts_mapping);
 	mapping_destroy(uplink_priv->tunnel_mapping);
 
-	mlx5_tc_ct_clean(uplink_priv);
+	mlx5_tc_ct_clean(uplink_priv->ct_priv);
 }
 
 int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags)
@@ -5322,8 +5417,9 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
 			 struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	u32 chain = 0, chain_tag, reg_b, zone_restore_id;
 	struct mlx5e_priv *priv = netdev_priv(skb->dev);
-	u32 chain = 0, chain_tag, reg_b;
+	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	struct tc_skb_ext *tc_skb_ext;
 	int err;
 
@@ -5345,6 +5441,13 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
 			return false;
 
 		tc_skb_ext->chain = chain;
+
+		zone_restore_id = (reg_b >> REG_MAPPING_SHIFT(NIC_ZONE_RESTORE_TO_REG)) &
+				  ZONE_RESTORE_MAX;
+
+		if (!mlx5e_tc_ct_restore_flow(tc->ct, skb,
+					      zone_restore_id))
+			return false;
 	}
 #endif /* CONFIG_NET_TC_SKB_EXT */
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index fa78289489b6..3b979008143d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -42,8 +42,14 @@
 
 #ifdef CONFIG_MLX5_ESWITCH
 
+#define NIC_FLOW_ATTR_SZ (sizeof(struct mlx5_flow_attr) +\
+			  sizeof(struct mlx5_nic_flow_attr))
 #define ESW_FLOW_ATTR_SZ (sizeof(struct mlx5_flow_attr) +\
 			  sizeof(struct mlx5_esw_flow_attr))
+#define ns_to_attr_sz(ns) (((ns) == MLX5_FLOW_NAMESPACE_FDB) ?\
+			    ESW_FLOW_ATTR_SZ :\
+			    NIC_FLOW_ATTR_SZ)
+
 
 int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags);
 
@@ -124,6 +130,7 @@ enum {
 
 int mlx5e_tc_esw_init(struct rhashtable *tc_ht);
 void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht);
+bool mlx5e_is_eswitch_flow(struct mlx5e_tc_flow *flow);
 
 int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
 			   struct flow_cls_offload *f, unsigned long flags);
@@ -168,6 +175,7 @@ enum mlx5e_tc_attr_to_reg {
 	LABELS_TO_REG,
 	FTEID_TO_REG,
 	NIC_CHAIN_TO_REG,
+	NIC_ZONE_RESTORE_TO_REG,
 };
 
 struct mlx5e_tc_attr_to_reg_mapping {
@@ -185,6 +193,7 @@ bool mlx5e_is_valid_eswitch_fwd_dev(struct mlx5e_priv *priv,
 
 int mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
 			      struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts,
+			      enum mlx5_flow_namespace_type ns,
 			      enum mlx5e_tc_attr_to_reg type,
 			      u32 data);
 
@@ -224,6 +233,15 @@ void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
 				  struct mlx5_flow_handle *rule,
 				  struct mlx5_flow_attr *attr);
 
+struct mlx5_flow_handle *
+mlx5_tc_rule_insert(struct mlx5e_priv *priv,
+		    struct mlx5_flow_spec *spec,
+		    struct mlx5_flow_attr *attr);
+void
+mlx5_tc_rule_delete(struct mlx5e_priv *priv,
+		    struct mlx5_flow_handle *rule,
+		    struct mlx5_flow_attr *attr);
+
 #else /* CONFIG_MLX5_CLS_ACT */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
@@ -235,6 +253,14 @@ mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv)
 
 struct mlx5_flow_attr *mlx5_alloc_flow_attr(enum mlx5_flow_namespace_type type);
 
+struct mlx5_flow_handle *
+mlx5e_add_offloaded_nic_rule(struct mlx5e_priv *priv,
+			     struct mlx5_flow_spec *spec,
+			     struct mlx5_flow_attr *attr);
+void mlx5e_del_offloaded_nic_rule(struct mlx5e_priv *priv,
+				  struct mlx5_flow_handle *rule,
+				  struct mlx5_flow_attr *attr);
+
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
-- 
2.26.2



* [net-next V2 09/15] net/mlx5e: CT: Use the same counter for both directions
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Oz Shlomo, Roi Dayan, Saeed Mahameed

From: Oz Shlomo <ozsh@mellanox.com>

A connection is represented by two 5-tuple entries, one for each direction.
Currently, each direction allocates its own hw counter, which is
inefficient as ct aging is managed per connection.

Share the counter that was allocated for the original direction with the
reverse direction.
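
In outline: when adding rules for one direction, look up the ct entry
that was installed for the reverse tuple and, if it exists, take a
reference on its counter; only the first direction seen allocates. A
condensed sketch of that lookup (a simplified view, not the literal
driver code; error handling and the IPv4/IPv6 address swap are elided,
full details in the diff below):

	struct mlx5_ct_tuple rev_tuple = entry->tuple;
	struct mlx5_ct_entry *rev_entry;

	swap(rev_tuple.port.src, rev_tuple.port.dst);
	/* ... swap the IPv4/IPv6 addresses the same way ... */

	rev_entry = rhashtable_lookup_fast(&ct_priv->ct_tuples_ht, &rev_tuple,
					   tuples_ht_params);
	if (rev_entry &&
	    refcount_inc_not_zero(&rev_entry->shared_counter->refcount))
		return rev_entry->shared_counter; /* reuse the original counter */

	/* first direction of the connection: allocate with refcount 1 */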

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    | 94 +++++++++++++++++--
 1 file changed, 85 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 86afef459dc6..9a7bd681f8fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -51,6 +51,7 @@ struct mlx5_tc_ct_priv {
 	struct mlx5_flow_table *ct_nat;
 	struct mlx5_flow_table *post_ct;
 	struct mutex control_lock; /* guards parallel adds/dels */
+	struct mutex shared_counter_lock;
 	struct mapping_ctx *zone_mapping;
 	struct mapping_ctx *labels_mapping;
 	enum mlx5_flow_namespace_type ns_type;
@@ -117,11 +118,16 @@ struct mlx5_ct_tuple {
 	u16 zone;
 };
 
+struct mlx5_ct_shared_counter {
+	struct mlx5_fc *counter;
+	refcount_t refcount;
+};
+
 struct mlx5_ct_entry {
 	struct rhash_head node;
 	struct rhash_head tuple_node;
 	struct rhash_head tuple_nat_node;
-	struct mlx5_fc *counter;
+	struct mlx5_ct_shared_counter *shared_counter;
 	unsigned long cookie;
 	unsigned long restore_cookie;
 	struct mlx5_ct_tuple tuple;
@@ -385,6 +391,16 @@ mlx5_tc_ct_set_tuple_match(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec,
 	return 0;
 }
 
+static void
+mlx5_tc_ct_shared_counter_put(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_entry *entry)
+{
+	if (!refcount_dec_and_test(&entry->shared_counter->refcount))
+		return;
+
+	mlx5_fc_destroy(ct_priv->esw->dev, entry->shared_counter->counter);
+	kfree(entry->shared_counter);
+}
+
 static void
 mlx5_tc_ct_entry_del_rule(struct mlx5_tc_ct_priv *ct_priv,
 			  struct mlx5_ct_entry *entry,
@@ -409,7 +425,6 @@ mlx5_tc_ct_entry_del_rules(struct mlx5_tc_ct_priv *ct_priv,
 	mlx5_tc_ct_entry_del_rule(ct_priv, entry, true);
 	mlx5_tc_ct_entry_del_rule(ct_priv, entry, false);
 
-	mlx5_fc_destroy(ct_priv->esw->dev, entry->counter);
 }
 
 static struct flow_action_entry *
@@ -683,7 +698,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 	attr->dest_ft = ct_priv->post_ct;
 	attr->ft = nat ? ct_priv->ct_nat : ct_priv->ct;
 	attr->outer_match_level = MLX5_MATCH_L4;
-	attr->counter = entry->counter;
+	attr->counter = entry->shared_counter->counter;
 	attr->flags |= MLX5_ESW_ATTR_FLAG_NO_IN_PORT;
 
 	mlx5_tc_ct_set_tuple_match(netdev_priv(ct_priv->netdev), spec, flow_rule);
@@ -716,18 +731,73 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 	return err;
 }
 
+static struct mlx5_ct_shared_counter *
+mlx5_tc_ct_shared_counter_get(struct mlx5_tc_ct_priv *ct_priv,
+			      struct mlx5_ct_entry *entry)
+{
+	struct mlx5_ct_tuple rev_tuple = entry->tuple;
+	struct mlx5_ct_shared_counter *shared_counter;
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_ct_entry *rev_entry;
+	__be16 tmp_port;
+
+	/* get the reversed tuple */
+	tmp_port = rev_tuple.port.src;
+	rev_tuple.port.src = rev_tuple.port.dst;
+	rev_tuple.port.dst = tmp_port;
+
+	if (rev_tuple.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
+		__be32 tmp_addr = rev_tuple.ip.src_v4;
+
+		rev_tuple.ip.src_v4 = rev_tuple.ip.dst_v4;
+		rev_tuple.ip.dst_v4 = tmp_addr;
+	} else if (rev_tuple.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
+		struct in6_addr tmp_addr = rev_tuple.ip.src_v6;
+
+		rev_tuple.ip.src_v6 = rev_tuple.ip.dst_v6;
+		rev_tuple.ip.dst_v6 = tmp_addr;
+	} else {
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	/* Use the same counter as the reverse direction */
+	mutex_lock(&ct_priv->shared_counter_lock);
+	rev_entry = rhashtable_lookup_fast(&ct_priv->ct_tuples_ht, &rev_tuple,
+					   tuples_ht_params);
+	if (rev_entry) {
+		if (refcount_inc_not_zero(&rev_entry->shared_counter->refcount)) {
+			mutex_unlock(&ct_priv->shared_counter_lock);
+			return rev_entry->shared_counter;
+		}
+	}
+	mutex_unlock(&ct_priv->shared_counter_lock);
+
+	shared_counter = kzalloc(sizeof(*shared_counter), GFP_KERNEL);
+	if (!shared_counter)
+		return ERR_PTR(-ENOMEM);
+
+	shared_counter->counter = mlx5_fc_create(esw->dev, true);
+	if (IS_ERR(shared_counter->counter)) {
+		ct_dbg("Failed to create counter for ct entry");
+		kfree(shared_counter);
+		return ERR_PTR(PTR_ERR(shared_counter->counter));
+	}
+
+	refcount_set(&shared_counter->refcount, 1);
+	return shared_counter;
+}
+
 static int
 mlx5_tc_ct_entry_add_rules(struct mlx5_tc_ct_priv *ct_priv,
 			   struct flow_rule *flow_rule,
 			   struct mlx5_ct_entry *entry,
 			   u8 zone_restore_id)
 {
-	struct mlx5_eswitch *esw = ct_priv->esw;
 	int err;
 
-	entry->counter = mlx5_fc_create(esw->dev, true);
-	if (IS_ERR(entry->counter)) {
-		err = PTR_ERR(entry->counter);
+	entry->shared_counter = mlx5_tc_ct_shared_counter_get(ct_priv, entry);
+	if (IS_ERR(entry->shared_counter)) {
+		err = PTR_ERR(entry->shared_counter);
 		ct_dbg("Failed to create counter for ct entry");
 		return err;
 	}
@@ -747,7 +817,7 @@ mlx5_tc_ct_entry_add_rules(struct mlx5_tc_ct_priv *ct_priv,
 err_nat:
 	mlx5_tc_ct_entry_del_rule(ct_priv, entry, false);
 err_orig:
-	mlx5_fc_destroy(esw->dev, entry->counter);
+	mlx5_tc_ct_shared_counter_put(ct_priv, entry);
 	return err;
 }
 
@@ -837,12 +907,16 @@ mlx5_tc_ct_del_ft_entry(struct mlx5_tc_ct_priv *ct_priv,
 			struct mlx5_ct_entry *entry)
 {
 	mlx5_tc_ct_entry_del_rules(ct_priv, entry);
+	mutex_lock(&ct_priv->shared_counter_lock);
 	if (entry->tuple_node.next)
 		rhashtable_remove_fast(&ct_priv->ct_tuples_nat_ht,
 				       &entry->tuple_nat_node,
 				       tuples_nat_ht_params);
 	rhashtable_remove_fast(&ct_priv->ct_tuples_ht, &entry->tuple_node,
 			       tuples_ht_params);
+	mutex_unlock(&ct_priv->shared_counter_lock);
+	mlx5_tc_ct_shared_counter_put(ct_priv, entry);
+
 }
 
 static int
@@ -879,7 +953,7 @@ mlx5_tc_ct_block_flow_offload_stats(struct mlx5_ct_ft *ft,
 	if (!entry)
 		return -ENOENT;
 
-	mlx5_fc_query_cached(entry->counter, &bytes, &packets, &lastuse);
+	mlx5_fc_query_cached(entry->shared_counter->counter, &bytes, &packets, &lastuse);
 	flow_stats_update(&f->stats, bytes, packets, 0, lastuse,
 			  FLOW_ACTION_HW_STATS_DELAYED);
 
@@ -1892,6 +1966,7 @@ mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
 
 	idr_init(&ct_priv->fte_ids);
 	mutex_init(&ct_priv->control_lock);
+	mutex_init(&ct_priv->shared_counter_lock);
 	rhashtable_init(&ct_priv->zone_ht, &zone_params);
 	rhashtable_init(&ct_priv->ct_tuples_ht, &tuples_ht_params);
 	rhashtable_init(&ct_priv->ct_tuples_nat_ht, &tuples_nat_ht_params);
@@ -1934,6 +2009,7 @@ mlx5_tc_ct_clean(struct mlx5_tc_ct_priv *ct_priv)
 	rhashtable_destroy(&ct_priv->ct_tuples_nat_ht);
 	rhashtable_destroy(&ct_priv->zone_ht);
 	mutex_destroy(&ct_priv->control_lock);
+	mutex_destroy(&ct_priv->shared_counter_lock);
 	idr_destroy(&ct_priv->fte_ids);
 	kfree(ct_priv);
 }
-- 
2.26.2



* [net-next V2 10/15] net/mlx5e: TC: Remove unused parameter from mlx5_tc_ct_add_no_trk_match()
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Saeed Mahameed

From: Saeed Mahameed <saeedm@nvidia.com>

The priv parameter is never used in this function.

Fixes: 7e36feeb0467 ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 4 +---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h | 7 ++-----
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 2 +-
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 9a7bd681f8fe..fe78de54179e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1026,9 +1026,7 @@ mlx5_tc_ct_skb_to_tuple(struct sk_buff *skb, struct mlx5_ct_tuple *tuple,
 	return false;
 }
 
-int
-mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
-			    struct mlx5_flow_spec *spec)
+int mlx5_tc_ct_add_no_trk_match(struct mlx5_flow_spec *spec)
 {
 	u32 ctstate = 0, ctstate_mask = 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
index bab872b76a5a..6503b614337c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h
@@ -108,9 +108,7 @@ mlx5_tc_ct_match_add(struct mlx5_tc_ct_priv *priv,
 		     struct flow_cls_offload *f,
 		     struct mlx5_ct_attr *ct_attr,
 		     struct netlink_ext_ack *extack);
-int
-mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
-			    struct mlx5_flow_spec *spec);
+int mlx5_tc_ct_add_no_trk_match(struct mlx5_flow_spec *spec);
 int
 mlx5_tc_ct_parse_action(struct mlx5_tc_ct_priv *priv,
 			struct mlx5_flow_attr *attr,
@@ -167,8 +165,7 @@ mlx5_tc_ct_match_add(struct mlx5_tc_ct_priv *priv,
 }
 
 static inline int
-mlx5_tc_ct_add_no_trk_match(struct mlx5e_priv *priv,
-			    struct mlx5_flow_spec *spec)
+mlx5_tc_ct_add_no_trk_match(struct mlx5_flow_spec *spec)
 {
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 4084a293442d..f815b0c60a6c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3233,7 +3233,7 @@ static bool modify_header_match_supported(struct mlx5e_priv *priv,
 	 *  we can't restore ct state
 	 */
 	if (!ct_clear && modify_tuple &&
-	    mlx5_tc_ct_add_no_trk_match(priv, spec)) {
+	    mlx5_tc_ct_add_no_trk_match(spec)) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "can't offload tuple modify header with ct matches");
 		netdev_info(priv->netdev,
-- 
2.26.2



* [net-next V2 11/15] net/mlx5e: Keep direct reference to mlx5_core_dev in tc ct
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ariel Levkovich, Roi Dayan, Saeed Mahameed

From: Ariel Levkovich <lariel@nvidia.com>

Keep and use a direct reference to the mlx5 core device in all of the
tc_ct code, instead of accessing it via a pointer to the mlx5 eswitch.
This supports nic mode ct offload for VF devices, which don't have a
valid eswitch pointer set.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/tc_ct.c    | 37 +++++++++----------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index fe78de54179e..b5f8ed30047b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -39,7 +39,7 @@
 	netdev_dbg(ct_priv->netdev, "ct_debug: " fmt "\n", ##args)
 
 struct mlx5_tc_ct_priv {
-	struct mlx5_eswitch *esw;
+	struct mlx5_core_dev *dev;
 	const struct net_device *netdev;
 	struct mod_hdr_tbl *mod_hdr_tbl;
 	struct idr fte_ids;
@@ -397,7 +397,7 @@ mlx5_tc_ct_shared_counter_put(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_en
 	if (!refcount_dec_and_test(&entry->shared_counter->refcount))
 		return;
 
-	mlx5_fc_destroy(ct_priv->esw->dev, entry->shared_counter->counter);
+	mlx5_fc_destroy(ct_priv->dev, entry->shared_counter->counter);
 	kfree(entry->shared_counter);
 }
 
@@ -412,7 +412,7 @@ mlx5_tc_ct_entry_del_rule(struct mlx5_tc_ct_priv *ct_priv,
 	ct_dbg("Deleting ct entry rule in zone %d", entry->tuple.zone);
 
 	mlx5_tc_rule_delete(netdev_priv(ct_priv->netdev), zone_rule->rule, attr);
-	mlx5e_mod_hdr_detach(ct_priv->esw->dev,
+	mlx5e_mod_hdr_detach(ct_priv->dev,
 			     ct_priv->mod_hdr_tbl, zone_rule->mh);
 	mapping_remove(ct_priv->labels_mapping, attr->ct_attr.ct_labels_id);
 	kfree(attr);
@@ -424,7 +424,6 @@ mlx5_tc_ct_entry_del_rules(struct mlx5_tc_ct_priv *ct_priv,
 {
 	mlx5_tc_ct_entry_del_rule(ct_priv, entry, true);
 	mlx5_tc_ct_entry_del_rule(ct_priv, entry, false);
-
 }
 
 static struct flow_action_entry *
@@ -451,25 +450,25 @@ mlx5_tc_ct_entry_set_registers(struct mlx5_tc_ct_priv *ct_priv,
 			       u8 zone_restore_id)
 {
 	enum mlx5_flow_namespace_type ns = ct_priv->ns_type;
-	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_core_dev *dev = ct_priv->dev;
 	int err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
+	err = mlx5e_tc_match_to_reg_set(dev, mod_acts, ns,
 					CTSTATE_TO_REG, ct_state);
 	if (err)
 		return err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
+	err = mlx5e_tc_match_to_reg_set(dev, mod_acts, ns,
 					MARK_TO_REG, mark);
 	if (err)
 		return err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
+	err = mlx5e_tc_match_to_reg_set(dev, mod_acts, ns,
 					LABELS_TO_REG, labels_id);
 	if (err)
 		return err;
 
-	err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
+	err = mlx5e_tc_match_to_reg_set(dev, mod_acts, ns,
 					ZONE_RESTORE_TO_REG, zone_restore_id);
 	if (err)
 		return err;
@@ -479,7 +478,7 @@ mlx5_tc_ct_entry_set_registers(struct mlx5_tc_ct_priv *ct_priv,
 	 * reg_b upon miss.
 	 */
 	if (ns != MLX5_FLOW_NAMESPACE_FDB) {
-		err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, ns,
+		err = mlx5e_tc_match_to_reg_set(dev, mod_acts, ns,
 						NIC_ZONE_RESTORE_TO_REG, zone_restore_id);
 		if (err)
 			return err;
@@ -564,7 +563,7 @@ mlx5_tc_ct_entry_create_nat(struct mlx5_tc_ct_priv *ct_priv,
 			    struct mlx5e_tc_mod_hdr_acts *mod_acts)
 {
 	struct flow_action *flow_action = &flow_rule->action;
-	struct mlx5_core_dev *mdev = ct_priv->esw->dev;
+	struct mlx5_core_dev *mdev = ct_priv->dev;
 	struct flow_action_entry *act;
 	size_t action_size;
 	char *modact;
@@ -640,7 +639,7 @@ mlx5_tc_ct_entry_create_mod_hdr(struct mlx5_tc_ct_priv *ct_priv,
 	if (err)
 		goto err_mapping;
 
-	*mh = mlx5e_mod_hdr_attach(ct_priv->esw->dev,
+	*mh = mlx5e_mod_hdr_attach(ct_priv->dev,
 				   ct_priv->mod_hdr_tbl,
 				   ct_priv->ns_type,
 				   &mod_acts);
@@ -721,7 +720,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
 	return 0;
 
 err_rule:
-	mlx5e_mod_hdr_detach(ct_priv->esw->dev,
+	mlx5e_mod_hdr_detach(ct_priv->dev,
 			     ct_priv->mod_hdr_tbl, zone_rule->mh);
 	mapping_remove(ct_priv->labels_mapping, attr->ct_attr.ct_labels_id);
 err_mod_hdr:
@@ -737,7 +736,7 @@ mlx5_tc_ct_shared_counter_get(struct mlx5_tc_ct_priv *ct_priv,
 {
 	struct mlx5_ct_tuple rev_tuple = entry->tuple;
 	struct mlx5_ct_shared_counter *shared_counter;
-	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_core_dev *dev = ct_priv->dev;
 	struct mlx5_ct_entry *rev_entry;
 	__be16 tmp_port;
 
@@ -776,7 +775,7 @@ mlx5_tc_ct_shared_counter_get(struct mlx5_tc_ct_priv *ct_priv,
 	if (!shared_counter)
 		return ERR_PTR(-ENOMEM);
 
-	shared_counter->counter = mlx5_fc_create(esw->dev, true);
+	shared_counter->counter = mlx5_fc_create(dev, true);
 	if (IS_ERR(shared_counter->counter)) {
 		ct_dbg("Failed to create counter for ct entry");
 		kfree(shared_counter);
@@ -1159,7 +1158,7 @@ static int tc_ct_pre_ct_add_rules(struct mlx5_ct_ft *ct_ft,
 {
 	struct mlx5_tc_ct_priv *ct_priv = ct_ft->ct_priv;
 	struct mlx5e_tc_mod_hdr_acts pre_mod_acts = {};
-	struct mlx5_core_dev *dev = ct_priv->esw->dev;
+	struct mlx5_core_dev *dev = ct_priv->dev;
 	struct mlx5_flow_table *ft = pre_ct->ft;
 	struct mlx5_flow_destination dest = {};
 	struct mlx5_flow_act flow_act = {};
@@ -1246,7 +1245,7 @@ tc_ct_pre_ct_del_rules(struct mlx5_ct_ft *ct_ft,
 		       struct mlx5_tc_ct_pre *pre_ct)
 {
 	struct mlx5_tc_ct_priv *ct_priv = ct_ft->ct_priv;
-	struct mlx5_core_dev *dev = ct_priv->esw->dev;
+	struct mlx5_core_dev *dev = ct_priv->dev;
 
 	mlx5_del_flow_rules(pre_ct->flow_rule);
 	mlx5_del_flow_rules(pre_ct->miss_rule);
@@ -1260,7 +1259,7 @@ mlx5_tc_ct_alloc_pre_ct(struct mlx5_ct_ft *ct_ft,
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
 	struct mlx5_tc_ct_priv *ct_priv = ct_ft->ct_priv;
-	struct mlx5_core_dev *dev = ct_priv->esw->dev;
+	struct mlx5_core_dev *dev = ct_priv->dev;
 	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_flow_namespace *ns;
 	struct mlx5_flow_table *ft;
@@ -1932,8 +1931,8 @@ mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
 
 	ct_priv->ns_type = ns_type;
 	ct_priv->chains = chains;
-	ct_priv->esw = priv->mdev->priv.eswitch;
 	ct_priv->netdev = priv->netdev;
+	ct_priv->dev = priv->mdev;
 	ct_priv->mod_hdr_tbl = mod_hdr;
 	ct_priv->ct = mlx5_chains_create_global_table(chains);
 	if (IS_ERR(ct_priv->ct)) {
-- 
2.26.2



* [net-next V2 12/15] net/mlx5e: IPsec: Use kvfree() for memory allocated with kvzalloc()
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Denis Efremov, Saeed Mahameed

From: Denis Efremov <efremov@linux.com>

The variables flow_group_in and spec in rx_fs_create() are allocated
with kvzalloc(). It is incorrect to free them with kfree(); use
kvfree() instead.
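
For context, kvzalloc() may return either kmalloc- or vmalloc-backed
memory depending on the requested size, so only kvfree() can free it
safely. A minimal sketch of the pairing rule (the size and usage here
are illustrative, not taken from the driver):

	void *spec = kvzalloc(sz, GFP_KERNEL); /* kmalloc or vmalloc backed */

	if (!spec)
		return -ENOMEM;
	/* ... build the flow spec ... */
	kvfree(spec); /* handles both backends; kfree() would be wrong
		       * for a vmalloc-backed buffer
		       */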

Fixes: 5e466345291a ("net/mlx5e: IPsec: Add IPsec steering in local NIC RX")
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
index 429428bbc903..b974f3cd1005 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
@@ -228,8 +228,8 @@ static int rx_fs_create(struct mlx5e_priv *priv,
 	fs_prot->miss_rule = miss_rule;
 
 out:
-	kfree(flow_group_in);
-	kfree(spec);
+	kvfree(flow_group_in);
+	kvfree(spec);
 	return err;
 }
 
-- 
2.26.2



* [net-next V2 13/15] net/mlx5e: Use kfree() to free fd->g in accel_fs_tcp_create_groups()
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Denis Efremov, Saeed Mahameed

From: Denis Efremov <efremov@linux.com>

Memory ft->g in accel_fs_tcp_create_groups() is allocated with kcalloc().
It is unnecessary to free ft->g with kvfree(); use kfree() instead.
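
This is the converse of the kvzalloc()/kvfree() case: kcalloc() memory
is always kmalloc-backed, so kvfree() happens to work on it, but plain
kfree() is the exact match. A minimal illustration (the names are not
from the driver):

	struct foo **g = kcalloc(n, sizeof(*g), GFP_KERNEL);

	if (!g)
		return -ENOMEM;
	/* ... */
	kfree(g); /* kvfree(g) would also work here, but is misleading */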

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c
index 4cdd9eac647d..97f1594cee11 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c
@@ -191,7 +191,7 @@ static int accel_fs_tcp_create_groups(struct mlx5e_flow_table *ft,
 	ft->g = kcalloc(MLX5E_ACCEL_FS_TCP_NUM_GROUPS, sizeof(*ft->g), GFP_KERNEL);
 	in = kvzalloc(inlen, GFP_KERNEL);
 	if  (!in || !ft->g) {
-		kvfree(ft->g);
+		kfree(ft->g);
 		kvfree(in);
 		return -ENOMEM;
 	}
-- 
2.26.2



* [net-next V2 14/15] net/mlx5: simplify the return expression of mlx5_ec_init()
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Qinglang Miao, Saeed Mahameed

From: Qinglang Miao <miaoqinglang@huawei.com>

Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index a894ea98c95a..3dc9dd3f24dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -43,19 +43,13 @@ static void mlx5_peer_pf_cleanup(struct mlx5_core_dev *dev)
 
 int mlx5_ec_init(struct mlx5_core_dev *dev)
 {
-	int err = 0;
-
 	if (!mlx5_core_is_ecpf(dev))
 		return 0;
 
 	/* ECPF shall enable HCA for peer PF in the same way a PF
 	 * does this for its VFs.
 	 */
-	err = mlx5_peer_pf_init(dev);
-	if (err)
-		return err;
-
-	return 0;
+	return mlx5_peer_pf_init(dev);
 }
 
 void mlx5_ec_cleanup(struct mlx5_core_dev *dev)
-- 
2.26.2



* [net-next V2 15/15] net/mlx5: remove unreachable return
From: saeed @ 2020-09-23 22:48 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Pavel Machek (CIP), Saeed Mahameed

From: "Pavel Machek (CIP)" <pavel@denx.de>

The last return statement is unreachable code. I'm not sure if it will
provoke any warnings, but it looks ugly.

Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
index 7fc59e01a353..c70c1f0ca0c1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
@@ -435,8 +435,6 @@ static int mlx5_ptp_verify(struct ptp_clock_info *ptp, unsigned int pin,
 	default:
 		return -EOPNOTSUPP;
 	}
-
-	return -EOPNOTSUPP;
 }
 
 static const struct ptp_clock_info mlx5_ptp_clock_info = {
-- 
2.26.2



* Re: [pull request][net-next V2 00/15] mlx5 Connection Tracking in NIC mode
From: David Miller @ 2020-09-25  2:55 UTC (permalink / raw)
  To: saeed; +Cc: kuba, netdev, saeedm

From: saeed@kernel.org
Date: Wed, 23 Sep 2020 15:48:09 -0700

> This series adds the support for connection tracking in NIC mode,
> and attached to this series some trivial cleanup patches.
> v1->v2:
>  - Remove "fixup!" comment from commit message (Jakub)
>  - More information and use case description in the tag message
>    (Cover-letter) (Jakub)
> 
> For more information please see tag log below.
> 
> Please pull and let me know if there is any problem.

Pulled, thanks Saeed.


* Re: [net-next V2 09/15] net/mlx5e: CT: Use the same counter for both directions
From: Marcelo Ricardo Leitner @ 2020-11-27 14:01 UTC (permalink / raw)
  To: saeed
  Cc: David S. Miller, Jakub Kicinski, netdev, Oz Shlomo, Roi Dayan,
	Saeed Mahameed, Marcelo Ricardo Leitner

On Wed, Sep 23, 2020 at 03:48:18PM -0700, saeed@kernel.org wrote:
> From: Oz Shlomo <ozsh@mellanox.com>

Sorry for reviving this one, but seemed better for the context.

> 
> A connection is represented by two 5-tuple entries, one for each direction.
> Currently, each direction allocates its own hw counter, which is
> inefficient as ct aging is managed per connection.
> 
> Share the counter that was allocated for the original direction with the
> reverse direction.

Yes, aging is done per connection, but the stats are not. With this
patch, with a netperf TCP_RR test, I get this (mangled for readability):

# grep 172.0.0.4 /proc/net/nf_conntrack
ipv4     2 tcp      6
  src=172.0.0.3 dst=172.0.0.4 sport=34018 dport=33396 packets=3941992 bytes=264113427
  src=172.0.0.4 dst=172.0.0.3 sport=33396 dport=34018 packets=4 bytes=218 [HW_OFFLOAD]
  mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3

while without it (594e31bceb + act_ct patch to enable it posted
yesterday + revert), I get:

# grep 172.0.0.4 /proc/net/nf_conntrack
ipv4     2 tcp      6
  src=172.0.0.3 dst=172.0.0.4 sport=41856 dport=32776 packets=1876763 bytes=125743084
  src=172.0.0.4 dst=172.0.0.3 sport=32776 dport=41856 packets=1876761 bytes=125742951 [HW_OFFLOAD]
  mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3

The same is visible in 'ovs-appctl dpctl/dump-conntrack -s' as well.
Summing both directions into one counter like this is misleading at best.
It seems this change was motivated only by hw resource constraints. That
said, I'm wondering: can this change be reverted somehow?

  Marcelo


* Re: [net-next V2 09/15] net/mlx5e: CT: Use the same counter for both directions
From: Saeed Mahameed @ 2020-12-01 21:41 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, Ariel Levkovich
  Cc: David S. Miller, Jakub Kicinski, netdev, Oz Shlomo, Roi Dayan,
	Marcelo Ricardo Leitner

On Fri, 2020-11-27 at 11:01 -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Sep 23, 2020 at 03:48:18PM -0700, saeed@kernel.org wrote:
> > From: Oz Shlomo <ozsh@mellanox.com>
> 
> Sorry for reviving this one, but seemed better for the context.
> 
> > A connection is represented by two 5-tuple entries, one for each
> > direction.
> > Currently, each direction allocates its own hw counter, which is
> > inefficient as ct aging is managed per connection.
> > 
> > Share the counter that was allocated for the original direction
> > with the
> > reverse direction.
> 
> Yes, aging is done per connection, but the stats are not. With this
> patch, with a netperf TCP_RR test, I get this (mangled for
> readability):
> 
> # grep 172.0.0.4 /proc/net/nf_conntrack
> ipv4     2 tcp      6
>   src=172.0.0.3 dst=172.0.0.4 sport=34018 dport=33396 packets=3941992
> bytes=264113427
>   src=172.0.0.4 dst=172.0.0.3 sport=33396 dport=34018 packets=4
> bytes=218 [HW_OFFLOAD]
>   mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3
> 
> while without it (594e31bceb + act_ct patch to enable it posted
> yesterday + revert), I get:
> 
> # grep 172.0.0.4 /proc/net/nf_conntrack
> ipv4     2 tcp      6
>   src=172.0.0.3 dst=172.0.0.4 sport=41856 dport=32776 packets=1876763
> bytes=125743084
>   src=172.0.0.4 dst=172.0.0.3 sport=32776 dport=41856 packets=1876761
> bytes=125742951 [HW_OFFLOAD]
>   mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3
> 
> The same is visible in 'ovs-appctl dpctl/dump-conntrack -s' as well.
> Summing both directions into one counter like this is misleading at best.
> It seems this change was motivated only by hw resource constraints. That
> said, I'm wondering: can this change be reverted somehow?
> 
>   Marcelo

Hi Marcelo, thanks for the report.
Sorry, I am not familiar with this procfs interface.
Oz, Ariel, Roi, what is your take on this? It seems we changed the
stats behavior incorrectly.

Thanks,
Saeed.




* Re: [net-next V2 09/15] net/mlx5e: CT: Use the same counter for both directions
From: Oz Shlomo @ 2020-12-07 10:20 UTC (permalink / raw)
  To: Saeed Mahameed, Marcelo Ricardo Leitner, Ariel Levkovich
  Cc: David S. Miller, Jakub Kicinski, netdev, Oz Shlomo, Roi Dayan,
	Marcelo Ricardo Leitner

Hi Marcelo,

On 12/1/2020 11:41 PM, Saeed Mahameed wrote:
> On Fri, 2020-11-27 at 11:01 -0300, Marcelo Ricardo Leitner wrote:
>> On Wed, Sep 23, 2020 at 03:48:18PM -0700, saeed@kernel.org wrote:
>>> From: Oz Shlomo <ozsh@mellanox.com>
>>
>> Sorry for reviving this one, but seemed better for the context.
>>
>>> A connection is represented by two 5-tuple entries, one for each
>>> direction.
>>> Currently, each direction allocates its own hw counter, which is
>>> inefficient as ct aging is managed per connection.
>>>
>>> Share the counter that was allocated for the original direction
>>> with the
>>> reverse direction.
>>
>> Yes, aging is done per connection, but the stats are not. With this
>> patch, with a netperf TCP_RR test, I get this (mangled for
>> readability):
>>
>> # grep 172.0.0.4 /proc/net/nf_conntrack
>> ipv4     2 tcp      6
>>    src=172.0.0.3 dst=172.0.0.4 sport=34018 dport=33396 packets=3941992
>> bytes=264113427
>>    src=172.0.0.4 dst=172.0.0.3 sport=33396 dport=34018 packets=4
>> bytes=218 [HW_OFFLOAD]
>>    mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3
>>
>> while without it (594e31bceb + act_ct patch to enable it posted
>> yesterday + revert), I get:
>>
>> # grep 172.0.0.4 /proc/net/nf_conntrack
>> ipv4     2 tcp      6
>>    src=172.0.0.3 dst=172.0.0.4 sport=41856 dport=32776 packets=1876763
>> bytes=125743084
>>    src=172.0.0.4 dst=172.0.0.3 sport=32776 dport=41856 packets=1876761
>> bytes=125742951 [HW_OFFLOAD]
>>    mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3
>>
>> The same is visible in 'ovs-appctl dpctl/dump-conntrack -s' as well.
>> Summing both directions into one counter like this is misleading at best.
>> It seems this change was motivated only by hw resource constraints. That
>> said, I'm wondering: can this change be reverted somehow?
>>
>>    Marcelo
> 
> Hi Marcelo, thanks for the report.
> Sorry, I am not familiar with this procfs interface.
> Oz, Ariel, Roi, what is your take on this? It seems we changed the
> stats behavior incorrectly.

Indeed we overlooked the CT accounting extension.
We will submit a driver fix.

> 
> Thanks,
> Saeed.
> 
> 



* Re: [net-next V2 09/15] net/mlx5e: CT: Use the same counter for both directions
From: Marcelo Ricardo Leitner @ 2020-12-07 19:19 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: Saeed Mahameed, Ariel Levkovich, David S. Miller, Jakub Kicinski,
	netdev, Oz Shlomo, Roi Dayan, mleitner

On Mon, Dec 07, 2020 at 12:20:54PM +0200, Oz Shlomo wrote:
> On 12/1/2020 11:41 PM, Saeed Mahameed wrote:
> > On Fri, 2020-11-27 at 11:01 -0300, Marcelo Ricardo Leitner wrote:
...
> > > The same is visible in 'ovs-appctl dpctl/dump-conntrack -s' as well.
> > > Summing both directions into one counter like this is misleading at best.
> > > It seems this change was motivated only by hw resource constraints. That
> > > said, I'm wondering: can this change be reverted somehow?
> > > 
> > >    Marcelo
> > 
> > Hi Marcelo, thanks for the report.
> > Sorry, I am not familiar with this procfs interface.
> > Oz, Ariel, Roi, what is your take on this? It seems we changed the
> > stats behavior incorrectly.
> 
> Indeed we overlooked the CT accounting extension.
> We will submit a driver fix.

Cool. Thanks for confirming it, Oz.

  Marcelo

