netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios
@ 2020-01-17  0:06 Saeed Mahameed
  2020-01-17  0:06 ` [mlx5-next 01/16] net/mlx5: Add structures layout for new MCAM access reg groups Saeed Mahameed
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:06 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Saeed Mahameed

Hi Dave/Jakub,

This series has two parts, 

1) A merge commit with mlx5-next branch that include updates for mlx5
HW layouts needed for this and upcoming submissions. 

2) From Paul, Increase the number of chains and prios

Currently the Mellanox driver supports offloading tc rules that
are defined on the first 4 chains and the first 16 priorities.
The restriction stems from the firmware flow level enforcement
requiring a flow table of a certain level to point to a flow
table of a higher level. This limitation may be ignored by setting
the ignore_flow_level bit when creating flow table entries.
Use unmanaged tables and ignore flow level to create more tables than
declared by fs_core steering. Manually manage the connections between the
tables themselves.

HW table is instantiated for every tc <chain,prio> tuple. The miss rule
of every table either jumps to the next <chain,prio> table, or continues
to slow_fdb. This logic is realized by following this sequence:

1. Create an auto-grouped flow table for the specified priority with
    reserved entries

Reserved entries are allocated at the end of the flow table.
Flow groups are evaluated in sequence and therefore it is guaranteed
that the flow group defined on the last FTEs will be the last to evaluate.

Define a "match all" flow group on the reserved entries, providing
the platform to add table miss actions.

2. Set the miss rule action to jump to the next <chain,prio> table
    or the slow_fdb.

3. Link the previous priority table to point to the new table by
    updating its miss rule.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit 12e9e0d0d97cc4f2aa9a858ac8a5741f321b5287:

  Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (2020-01-16 15:48:24 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git for-net-next

for you to fetch changes up to 278d51f24330718aefd7fe86996a6da66fd345e7:

  net/mlx5: E-Switch, Increase number of chains and priorities (2020-01-16 15:48:58 -0800)

----------------------------------------------------------------

Aharon Landau (1):
  net/mlx5e: Add discard counters per priority

Aya Levin (2):
  net/mlx5: Expose resource dump register mapping
  net/mlx5e: Expose FEC feilds and related capability bit

Eran Ben Elisha (3):
  net/mlx5: Add structures layout for new MCAM access reg groups
  net/mlx5: Read MCAM register groups 1 and 2
  net/mlx5: Add structures and defines for MIRC register

Hamdan Igbaria (1):
  net/mlx5: Add copy header action struct layout

Paul Blakey (9):
  net/mlx5: Add mlx5_ifc definitions for connection tracking support
  net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
  net/mlx5: fs_core: Introduce unmanaged flow tables
  net/mlx5: Add ignore level support fwd to table rules
  net/mlx5: Allow creating autogroups with reserved entries
  net/mlx5: ft: Use getter function to get ft chain
  net/mlx5: ft: Check prio and chain sanity for ft offload
  net/mlx5: E-Switch, Refactor chains and priorities
  net/mlx5: E-Switch, Increase number of chains and priorities

 drivers/infiniband/hw/mlx5/main.c             |  10 +-
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../mellanox/mlx5/core/en_fs_ethtool.c        |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  28 +-
 .../ethernet/mellanox/mlx5/core/en_stats.c    |   1 +
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  27 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |   7 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  27 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 298 ++-----
 .../mlx5/core/eswitch_offloads_chains.c       | 758 ++++++++++++++++++
 .../mlx5/core/eswitch_offloads_chains.h       |  30 +
 .../mlx5/core/eswitch_offloads_termtbl.c      |  11 +-
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   3 +
 .../net/ethernet/mellanox/mlx5/core/fs_core.c |  96 ++-
 .../net/ethernet/mellanox/mlx5/core/fs_core.h |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fw.c  |  15 +-
 include/linux/mlx5/device.h                   |  14 +-
 include/linux/mlx5/driver.h                   |   4 +-
 include/linux/mlx5/fs.h                       |  20 +-
 include/linux/mlx5/mlx5_ifc.h                 | 222 ++++-
 20 files changed, 1222 insertions(+), 361 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h

-- 
2.24.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [mlx5-next 01/16] net/mlx5: Add structures layout for new MCAM access reg groups
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
@ 2020-01-17  0:06 ` Saeed Mahameed
  2020-01-17  0:06 ` [mlx5-next 02/16] net/mlx5: Read MCAM register groups 1 and 2 Saeed Mahameed
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:06 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Eran Ben Elisha, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

MCAM has 3 access_reg_groups (0-2). Defines data structures in order to
read and parse access_reg_groups #1 and #2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index c6abaf4f1c55..43cdf9211747 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8832,6 +8832,28 @@ struct mlx5_ifc_mcam_access_reg_bits {
 	u8         regs_31_to_0[0x20];
 };
 
+struct mlx5_ifc_mcam_access_reg_bits1 {
+	u8         regs_127_to_96[0x20];
+
+	u8         regs_95_to_64[0x20];
+
+	u8         regs_63_to_32[0x20];
+
+	u8         regs_31_to_0[0x20];
+};
+
+struct mlx5_ifc_mcam_access_reg_bits2 {
+	u8         regs_127_to_99[0x1d];
+	u8         mirc[0x1];
+	u8         regs_97_to_96[0x2];
+
+	u8         regs_95_to_64[0x20];
+
+	u8         regs_63_to_32[0x20];
+
+	u8         regs_31_to_0[0x20];
+};
+
 struct mlx5_ifc_mcam_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         feature_group[0x8];
@@ -8842,6 +8864,8 @@ struct mlx5_ifc_mcam_reg_bits {
 
 	union {
 		struct mlx5_ifc_mcam_access_reg_bits access_regs;
+		struct mlx5_ifc_mcam_access_reg_bits1 access_regs1;
+		struct mlx5_ifc_mcam_access_reg_bits2 access_regs2;
 		u8         reserved_at_0[0x80];
 	} mng_access_reg_cap_mask;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 02/16] net/mlx5: Read MCAM register groups 1 and 2
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
  2020-01-17  0:06 ` [mlx5-next 01/16] net/mlx5: Add structures layout for new MCAM access reg groups Saeed Mahameed
@ 2020-01-17  0:06 ` Saeed Mahameed
  2020-01-17  0:06 ` [mlx5-next 03/16] net/mlx5: Add structures and defines for MIRC register Saeed Mahameed
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:06 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Eran Ben Elisha, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

On load, Driver caches MCAM (Management Capabilities Mask Register)
registers. in addition to the only MCAM register group (0) the driver
already reads, here we add support for reading groups 1 and 2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c | 15 +++++++++------
 include/linux/mlx5/device.h                  | 14 +++++++++++++-
 include/linux/mlx5/driver.h                  |  2 +-
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index c375edfe528c..d89ff1d09119 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -131,11 +131,11 @@ static int mlx5_get_pcam_reg(struct mlx5_core_dev *dev)
 				   MLX5_PCAM_REGS_5000_TO_507F);
 }
 
-static int mlx5_get_mcam_reg(struct mlx5_core_dev *dev)
+static int mlx5_get_mcam_access_reg_group(struct mlx5_core_dev *dev,
+					  enum mlx5_mcam_reg_groups group)
 {
-	return mlx5_query_mcam_reg(dev, dev->caps.mcam,
-				   MLX5_MCAM_FEATURE_ENHANCED_FEATURES,
-				   MLX5_MCAM_REGS_FIRST_128);
+	return mlx5_query_mcam_reg(dev, dev->caps.mcam[group],
+				   MLX5_MCAM_FEATURE_ENHANCED_FEATURES, group);
 }
 
 static int mlx5_get_qcam_reg(struct mlx5_core_dev *dev)
@@ -221,8 +221,11 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 	if (MLX5_CAP_GEN(dev, pcam_reg))
 		mlx5_get_pcam_reg(dev);
 
-	if (MLX5_CAP_GEN(dev, mcam_reg))
-		mlx5_get_mcam_reg(dev);
+	if (MLX5_CAP_GEN(dev, mcam_reg)) {
+		mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_FIRST_128);
+		mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_0x9080_0x90FF);
+		mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_0x9100_0x917F);
+	}
 
 	if (MLX5_CAP_GEN(dev, qcam_reg))
 		mlx5_get_qcam_reg(dev);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 1a1c53f0262d..0e62c3db45e5 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1121,6 +1121,9 @@ enum mlx5_pcam_feature_groups {
 
 enum mlx5_mcam_reg_groups {
 	MLX5_MCAM_REGS_FIRST_128                    = 0x0,
+	MLX5_MCAM_REGS_0x9080_0x90FF                = 0x1,
+	MLX5_MCAM_REGS_0x9100_0x917F                = 0x2,
+	MLX5_MCAM_REGS_NUM                          = 0x3,
 };
 
 enum mlx5_mcam_feature_groups {
@@ -1269,7 +1272,16 @@ enum mlx5_qcam_feature_groups {
 	MLX5_GET(pcam_reg, (mdev)->caps.pcam, port_access_reg_cap_mask.regs_5000_to_507f.reg)
 
 #define MLX5_CAP_MCAM_REG(mdev, reg) \
-	MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_access_reg_cap_mask.access_regs.reg)
+	MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_FIRST_128], \
+		 mng_access_reg_cap_mask.access_regs.reg)
+
+#define MLX5_CAP_MCAM_REG1(mdev, reg) \
+	MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9080_0x90FF], \
+		 mng_access_reg_cap_mask.access_regs1.reg)
+
+#define MLX5_CAP_MCAM_REG2(mdev, reg) \
+	MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9100_0x917F], \
+		 mng_access_reg_cap_mask.access_regs2.reg)
 
 #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \
 	MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_feature_cap_mask.enhanced_features.fld)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 59cff380f41a..dc3c725be092 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -684,7 +684,7 @@ struct mlx5_core_dev {
 		u32 hca_cur[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
 		u32 hca_max[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
 		u32 pcam[MLX5_ST_SZ_DW(pcam_reg)];
-		u32 mcam[MLX5_ST_SZ_DW(mcam_reg)];
+		u32 mcam[MLX5_MCAM_REGS_NUM][MLX5_ST_SZ_DW(mcam_reg)];
 		u32 fpga[MLX5_ST_SZ_DW(fpga_cap)];
 		u32 qcam[MLX5_ST_SZ_DW(qcam_reg)];
 		u8  embedded_cpu;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 03/16] net/mlx5: Add structures and defines for MIRC register
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
  2020-01-17  0:06 ` [mlx5-next 01/16] net/mlx5: Add structures layout for new MCAM access reg groups Saeed Mahameed
  2020-01-17  0:06 ` [mlx5-next 02/16] net/mlx5: Read MCAM register groups 1 and 2 Saeed Mahameed
@ 2020-01-17  0:06 ` Saeed Mahameed
  2020-01-17  0:06 ` [mlx5-next 04/16] net/mlx5: Expose resource dump register mapping Saeed Mahameed
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:06 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Eran Ben Elisha, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

Add needed structures, layouts and defines for MIRC (Management Image
Re-activation Control) register. This structure will be used for the FSM
reactivation flow in the downstream patches.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/driver.h   | 1 +
 include/linux/mlx5/mlx5_ifc.h | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index dc3c725be092..5bbdbeb21ab6 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -145,6 +145,7 @@ enum {
 	MLX5_REG_MCC		 = 0x9062,
 	MLX5_REG_MCDA		 = 0x9063,
 	MLX5_REG_MCAM		 = 0x907f,
+	MLX5_REG_MIRC		 = 0x9162,
 };
 
 enum mlx5_qpts_trust_state {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 43cdf9211747..a133583c3e4f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -9471,6 +9471,13 @@ struct mlx5_ifc_mcda_reg_bits {
 	u8         data[0][0x20];
 };
 
+struct mlx5_ifc_mirc_reg_bits {
+	u8         reserved_at_0[0x18];
+	u8         status_code[0x8];
+
+	u8         reserved_at_20[0x20];
+};
+
 union mlx5_ifc_ports_control_registers_document_bits {
 	struct mlx5_ifc_bufferx_reg_bits bufferx_reg;
 	struct mlx5_ifc_eth_2819_cntrs_grp_data_layout_bits eth_2819_cntrs_grp_data_layout;
@@ -9526,6 +9533,7 @@ union mlx5_ifc_ports_control_registers_document_bits {
 	struct mlx5_ifc_mcqi_reg_bits mcqi_reg;
 	struct mlx5_ifc_mcc_reg_bits mcc_reg;
 	struct mlx5_ifc_mcda_reg_bits mcda_reg;
+	struct mlx5_ifc_mirc_reg_bits mirc_reg;
 	u8         reserved_at_0[0x60e0];
 };
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 04/16] net/mlx5: Expose resource dump register mapping
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (2 preceding siblings ...)
  2020-01-17  0:06 ` [mlx5-next 03/16] net/mlx5: Add structures and defines for MIRC register Saeed Mahameed
@ 2020-01-17  0:06 ` Saeed Mahameed
  2020-01-17  0:07 ` [mlx5-next 05/16] net/mlx5: Add copy header action struct layout Saeed Mahameed
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:06 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Aya Levin, Moshe Shemesh, Saeed Mahameed

From: Aya Levin <ayal@mellanox.com>

Add new register enumeration for resource dump. Add layout mapping for
resource dump: access command and response.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/driver.h   |   1 +
 include/linux/mlx5/mlx5_ifc.h | 130 +++++++++++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5bbdbeb21ab6..22bd0d5024c8 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -146,6 +146,7 @@ enum {
 	MLX5_REG_MCDA		 = 0x9063,
 	MLX5_REG_MCAM		 = 0x907f,
 	MLX5_REG_MIRC		 = 0x9162,
+	MLX5_REG_RESOURCE_DUMP   = 0xC000,
 };
 
 enum mlx5_qpts_trust_state {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index a133583c3e4f..6fe0431e11ec 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -823,7 +823,9 @@ struct mlx5_ifc_qos_cap_bits {
 struct mlx5_ifc_debug_cap_bits {
 	u8         core_dump_general[0x1];
 	u8         core_dump_qp[0x1];
-	u8         reserved_at_2[0x1e];
+	u8         reserved_at_2[0x7];
+	u8         resource_dump[0x1];
+	u8         reserved_at_a[0x16];
 
 	u8         reserved_at_20[0x2];
 	u8         stall_detect[0x1];
@@ -1767,6 +1769,132 @@ struct mlx5_ifc_resize_field_select_bits {
 	u8         resize_field_select[0x20];
 };
 
+struct mlx5_ifc_resource_dump_bits {
+	u8         more_dump[0x1];
+	u8         inline_dump[0x1];
+	u8         reserved_at_2[0xa];
+	u8         seq_num[0x4];
+	u8         segment_type[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         vhca_id[0x10];
+
+	u8         index1[0x20];
+
+	u8         index2[0x20];
+
+	u8         num_of_obj1[0x10];
+	u8         num_of_obj2[0x10];
+
+	u8         reserved_at_a0[0x20];
+
+	u8         device_opaque[0x40];
+
+	u8         mkey[0x20];
+
+	u8         size[0x20];
+
+	u8         address[0x40];
+
+	u8         inline_data[52][0x20];
+};
+
+struct mlx5_ifc_resource_dump_menu_record_bits {
+	u8         reserved_at_0[0x4];
+	u8         num_of_obj2_supports_active[0x1];
+	u8         num_of_obj2_supports_all[0x1];
+	u8         must_have_num_of_obj2[0x1];
+	u8         support_num_of_obj2[0x1];
+	u8         num_of_obj1_supports_active[0x1];
+	u8         num_of_obj1_supports_all[0x1];
+	u8         must_have_num_of_obj1[0x1];
+	u8         support_num_of_obj1[0x1];
+	u8         must_have_index2[0x1];
+	u8         support_index2[0x1];
+	u8         must_have_index1[0x1];
+	u8         support_index1[0x1];
+	u8         segment_type[0x10];
+
+	u8         segment_name[4][0x20];
+
+	u8         index1_name[4][0x20];
+
+	u8         index2_name[4][0x20];
+};
+
+struct mlx5_ifc_resource_dump_segment_header_bits {
+	u8         length_dw[0x10];
+	u8         segment_type[0x10];
+};
+
+struct mlx5_ifc_resource_dump_command_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         segment_called[0x10];
+	u8         vhca_id[0x10];
+
+	u8         index1[0x20];
+
+	u8         index2[0x20];
+
+	u8         num_of_obj1[0x10];
+	u8         num_of_obj2[0x10];
+};
+
+struct mlx5_ifc_resource_dump_error_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x10];
+	u8         syndrome_id[0x10];
+
+	u8         reserved_at_40[0x40];
+
+	u8         error[8][0x20];
+};
+
+struct mlx5_ifc_resource_dump_info_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x18];
+	u8         dump_version[0x8];
+
+	u8         hw_version[0x20];
+
+	u8         fw_version[0x20];
+};
+
+struct mlx5_ifc_resource_dump_menu_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x10];
+	u8         num_of_records[0x10];
+
+	struct mlx5_ifc_resource_dump_menu_record_bits record[0];
+};
+
+struct mlx5_ifc_resource_dump_resource_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x20];
+
+	u8         index1[0x20];
+
+	u8         index2[0x20];
+
+	u8         payload[0][0x20];
+};
+
+struct mlx5_ifc_resource_dump_terminate_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+};
+
+struct mlx5_ifc_menu_resource_dump_response_bits {
+	struct mlx5_ifc_resource_dump_info_segment_bits info;
+	struct mlx5_ifc_resource_dump_command_segment_bits cmd;
+	struct mlx5_ifc_resource_dump_menu_segment_bits menu;
+	struct mlx5_ifc_resource_dump_terminate_segment_bits terminate;
+};
+
 enum {
 	MLX5_MODIFY_FIELD_SELECT_MODIFY_FIELD_SELECT_CQ_PERIOD     = 0x1,
 	MLX5_MODIFY_FIELD_SELECT_MODIFY_FIELD_SELECT_CQ_MAX_COUNT  = 0x2,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 05/16] net/mlx5: Add copy header action struct layout
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (3 preceding siblings ...)
  2020-01-17  0:06 ` [mlx5-next 04/16] net/mlx5: Expose resource dump register mapping Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [mlx5-next 06/16] net/mlx5: Add mlx5_ifc definitions for connection tracking support Saeed Mahameed
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Hamdan Igbaria, Alex Vesker, Saeed Mahameed

From: Hamdan Igbaria <hamdani@mellanox.com>

Add definition for copy header action, copy action is used
to copy header fields from source to destination.

Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6fe0431e11ec..23613a6ea51c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -5609,6 +5609,21 @@ struct mlx5_ifc_add_action_in_bits {
 	u8         data[0x20];
 };
 
+struct mlx5_ifc_copy_action_in_bits {
+	u8         action_type[0x4];
+	u8         src_field[0xc];
+	u8         reserved_at_10[0x3];
+	u8         src_offset[0x5];
+	u8         reserved_at_18[0x3];
+	u8         length[0x5];
+
+	u8         reserved_at_20[0x4];
+	u8         dst_field[0xc];
+	u8         reserved_at_30[0x3];
+	u8         dst_offset[0x5];
+	u8         reserved_at_38[0x8];
+};
+
 union mlx5_ifc_set_action_in_add_action_in_auto_bits {
 	struct mlx5_ifc_set_action_in_bits set_action_in;
 	struct mlx5_ifc_add_action_in_bits add_action_in;
@@ -5618,6 +5633,7 @@ union mlx5_ifc_set_action_in_add_action_in_auto_bits {
 enum {
 	MLX5_ACTION_TYPE_SET   = 0x1,
 	MLX5_ACTION_TYPE_ADD   = 0x2,
+	MLX5_ACTION_TYPE_COPY  = 0x3,
 };
 
 enum {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 06/16] net/mlx5: Add mlx5_ifc definitions for connection tracking support
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (4 preceding siblings ...)
  2020-01-17  0:07 ` [mlx5-next 05/16] net/mlx5: Add copy header action struct layout Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [mlx5-next 07/16] net/mlx5e: Expose FEC feilds and related capability bit Saeed Mahameed
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Oz Shlomo, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

Add the required hardware definitions to mlx5_ifc:
ignore_flow_level, registers, copy_header, and fwd_and_modify cap.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Sholomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 23613a6ea51c..e9c165ffe3f9 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -375,8 +375,17 @@ struct mlx5_ifc_flow_table_fields_supported_bits {
 	u8	   outer_esp_spi[0x1];
 	u8         reserved_at_58[0x2];
 	u8         bth_dst_qp[0x1];
+	u8         reserved_at_5b[0x5];
 
-	u8         reserved_at_5b[0x25];
+	u8         reserved_at_60[0x18];
+	u8         metadata_reg_c_7[0x1];
+	u8         metadata_reg_c_6[0x1];
+	u8         metadata_reg_c_5[0x1];
+	u8         metadata_reg_c_4[0x1];
+	u8         metadata_reg_c_3[0x1];
+	u8         metadata_reg_c_2[0x1];
+	u8         metadata_reg_c_1[0x1];
+	u8         metadata_reg_c_0[0x1];
 };
 
 struct mlx5_ifc_flow_table_prop_layout_bits {
@@ -401,7 +410,8 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
 	u8	   reformat_l3_tunnel_to_l2[0x1];
 	u8	   reformat_l2_to_l3_tunnel[0x1];
 	u8	   reformat_and_modify_action[0x1];
-	u8         reserved_at_15[0x2];
+	u8	   ignore_flow_level[0x1];
+	u8         reserved_at_16[0x1];
 	u8	   table_miss_action_domain[0x1];
 	u8         termination_table[0x1];
 	u8         reserved_at_19[0x7];
@@ -722,7 +732,9 @@ enum {
 
 struct mlx5_ifc_flow_table_eswitch_cap_bits {
 	u8      fdb_to_vport_reg_c_id[0x8];
-	u8      reserved_at_8[0xf];
+	u8      reserved_at_8[0xd];
+	u8      fdb_modify_header_fwd_to_table[0x1];
+	u8      reserved_at_16[0x1];
 	u8      flow_source[0x1];
 	u8      reserved_at_18[0x2];
 	u8      multi_fdb_encap[0x1];
@@ -4141,7 +4153,8 @@ struct mlx5_ifc_set_fte_in_bits {
 	u8         reserved_at_a0[0x8];
 	u8         table_id[0x18];
 
-	u8         reserved_at_c0[0x18];
+	u8         ignore_flow_level[0x1];
+	u8         reserved_at_c1[0x17];
 	u8         modify_enable_mask[0x8];
 
 	u8         reserved_at_e0[0x20];
@@ -5627,6 +5640,7 @@ struct mlx5_ifc_copy_action_in_bits {
 union mlx5_ifc_set_action_in_add_action_in_auto_bits {
 	struct mlx5_ifc_set_action_in_bits set_action_in;
 	struct mlx5_ifc_add_action_in_bits add_action_in;
+	struct mlx5_ifc_copy_action_in_bits copy_action_in;
 	u8         reserved_at_0[0x40];
 };
 
@@ -5669,6 +5683,8 @@ enum {
 	MLX5_ACTION_IN_FIELD_METADATA_REG_C_3  = 0x54,
 	MLX5_ACTION_IN_FIELD_METADATA_REG_C_4  = 0x55,
 	MLX5_ACTION_IN_FIELD_METADATA_REG_C_5  = 0x56,
+	MLX5_ACTION_IN_FIELD_METADATA_REG_C_6  = 0x57,
+	MLX5_ACTION_IN_FIELD_METADATA_REG_C_7  = 0x58,
 	MLX5_ACTION_IN_FIELD_OUT_TCP_SEQ_NUM   = 0x59,
 	MLX5_ACTION_IN_FIELD_OUT_TCP_ACK_NUM   = 0x5B,
 };
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 07/16] net/mlx5e: Expose FEC feilds and related capability bit
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (5 preceding siblings ...)
  2020-01-17  0:07 ` [mlx5-next 06/16] net/mlx5: Add mlx5_ifc definitions for connection tracking support Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [mlx5-next 08/16] net/mlx5e: Add discard counters per priority Saeed Mahameed
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Aya Levin, Eran Ben Elisha, Saeed Mahameed

From: Aya Levin <ayal@mellanox.com>

Introduce 50G per lane FEC modes capability bit and newly supported
fields in PPLM register which allow this configuration.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index e9c165ffe3f9..2ab4562b4851 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8581,6 +8581,18 @@ struct mlx5_ifc_pplm_reg_bits {
 	u8	   fec_override_admin_50g[0x4];
 	u8	   fec_override_admin_25g[0x4];
 	u8	   fec_override_admin_10g_40g[0x4];
+
+	u8         fec_override_cap_400g_8x[0x10];
+	u8         fec_override_cap_200g_4x[0x10];
+
+	u8         fec_override_cap_100g_2x[0x10];
+	u8         fec_override_cap_50g_1x[0x10];
+
+	u8         fec_override_admin_400g_8x[0x10];
+	u8         fec_override_admin_200g_4x[0x10];
+
+	u8         fec_override_admin_100g_2x[0x10];
+	u8         fec_override_admin_50g_1x[0x10];
 };
 
 struct mlx5_ifc_ppcnt_reg_bits {
@@ -8907,7 +8919,9 @@ struct mlx5_ifc_mpegc_reg_bits {
 };
 
 struct mlx5_ifc_pcam_enhanced_features_bits {
-	u8         reserved_at_0[0x6d];
+	u8         reserved_at_0[0x68];
+	u8         fec_50G_per_lane_in_pplm[0x1];
+	u8         reserved_at_69[0x4];
 	u8         rx_icrc_encapsulated_counter[0x1];
 	u8	   reserved_at_6e[0x4];
 	u8         ptys_extended_ethernet[0x1];
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 08/16] net/mlx5e: Add discard counters per priority
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (6 preceding siblings ...)
  2020-01-17  0:07 ` [mlx5-next 07/16] net/mlx5e: Expose FEC feilds and related capability bit Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [mlx5-next 09/16] net/mlx5: Refactor mlx5_create_auto_grouped_flow_table Saeed Mahameed
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Aharon Landau, Tariq Toukan, Saeed Mahameed

From: Aharon Landau <aharonl@mellanox.com>

Add counters that count (per priority) the number of received
packets that dropped due to lack of buffers on a physical port. If
this counter is increasing, it implies that the adapter is
congested and cannot absorb the traffic coming from the network.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 1 +
 include/linux/mlx5/mlx5_ifc.h                      | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index a05158472ed1..4291db78efc9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -1133,6 +1133,7 @@ static void mlx5e_grp_per_port_buffer_congest_update_stats(struct mlx5e_priv *pr
 static const struct counter_desc pport_per_prio_traffic_stats_desc[] = {
 	{ "rx_prio%d_bytes", PPORT_PER_PRIO_OFF(rx_octets) },
 	{ "rx_prio%d_packets", PPORT_PER_PRIO_OFF(rx_frames) },
+	{ "rx_prio%d_discards", PPORT_PER_PRIO_OFF(rx_discards) },
 	{ "tx_prio%d_bytes", PPORT_PER_PRIO_OFF(tx_octets) },
 	{ "tx_prio%d_packets", PPORT_PER_PRIO_OFF(tx_frames) },
 };
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2ab4562b4851..ee0a34d66c7c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2180,7 +2180,9 @@ struct mlx5_ifc_eth_per_prio_grp_data_layout_bits {
 
 	u8         rx_pause_transition_low[0x20];
 
-	u8         reserved_at_3c0[0x40];
+	u8         rx_discards_high[0x20];
+
+	u8         rx_discards_low[0x20];
 
 	u8         device_stall_minor_watermark_cnt_high[0x20];
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [mlx5-next 09/16] net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (7 preceding siblings ...)
  2020-01-17  0:07 ` [mlx5-next 08/16] net/mlx5e: Add discard counters per priority Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [net-next 10/16] net/mlx5: fs_core: Introduce unmanaged flow tables Saeed Mahameed
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Oz Shlomo, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c             | 10 +++++----
 .../mellanox/mlx5/core/en_fs_ethtool.c        |  9 +++++---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 13 +++++++-----
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  7 +++++--
 .../mellanox/mlx5/core/eswitch_offloads.c     | 13 ++++++------
 .../mlx5/core/eswitch_offloads_termtbl.c      | 11 +++++-----
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 21 ++++++-------------
 include/linux/mlx5/fs.h                       | 16 +++++++-------
 8 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 997cbfe4b90c..90489c5f0c6f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3276,12 +3276,14 @@ static struct mlx5_ib_flow_prio *_get_prio(struct mlx5_flow_namespace *ns,
 					   int num_entries, int num_groups,
 					   u32 flags)
 {
+	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_flow_table *ft;
 
-	ft = mlx5_create_auto_grouped_flow_table(ns, priority,
-						 num_entries,
-						 num_groups,
-						 0, flags);
+	ft_attr.prio = priority;
+	ft_attr.max_fte = num_entries;
+	ft_attr.flags = flags;
+	ft_attr.autogroup.max_num_groups = num_groups;
+	ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
 	if (IS_ERR(ft))
 		return ERR_CAST(ft);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index acd946f2ddbe..3bc2ac3d53fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -58,6 +58,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct mlx5e_priv *priv,
 						  struct ethtool_rx_flow_spec *fs,
 						  int num_tuples)
 {
+	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5e_ethtool_table *eth_ft;
 	struct mlx5_flow_namespace *ns;
 	struct mlx5_flow_table *ft;
@@ -102,9 +103,11 @@ static struct mlx5e_ethtool_table *get_flow_table(struct mlx5e_priv *priv,
 	table_size = min_t(u32, BIT(MLX5_CAP_FLOWTABLE(priv->mdev,
 						       flow_table_properties_nic_receive.log_max_ft_size)),
 			   MLX5E_ETHTOOL_NUM_ENTRIES);
-	ft = mlx5_create_auto_grouped_flow_table(ns, prio,
-						 table_size,
-						 MLX5E_ETHTOOL_NUM_GROUPS, 0, 0);
+
+	ft_attr.prio = prio;
+	ft_attr.max_fte = table_size;
+	ft_attr.autogroup.max_num_groups = MLX5E_ETHTOOL_NUM_GROUPS;
+	ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
 	if (IS_ERR(ft))
 		return (void *)ft;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index db614bd6bd1f..5aafbb8d2e8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -960,7 +960,8 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 
 	mutex_lock(&priv->fs.tc.t_lock);
 	if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
-		int tc_grp_size, tc_tbl_size;
+		struct mlx5_flow_table_attr ft_attr = {};
+		int tc_grp_size, tc_tbl_size, tc_num_grps;
 		u32 max_flow_counter;
 
 		max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) |
@@ -970,13 +971,15 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 
 		tc_tbl_size = min_t(int, tc_grp_size * MLX5E_TC_TABLE_NUM_GROUPS,
 				    BIT(MLX5_CAP_FLOWTABLE_NIC_RX(dev, log_max_ft_size)));
+		tc_num_grps = MLX5E_TC_TABLE_NUM_GROUPS;
 
+		ft_attr.prio = MLX5E_TC_PRIO;
+		ft_attr.max_fte = tc_tbl_size;
+		ft_attr.level = MLX5E_TC_FT_LEVEL;
+		ft_attr.autogroup.max_num_groups = tc_num_grps;
 		priv->fs.tc.t =
 			mlx5_create_auto_grouped_flow_table(priv->fs.ns,
-							    MLX5E_TC_PRIO,
-							    tc_tbl_size,
-							    MLX5E_TC_TABLE_NUM_GROUPS,
-							    MLX5E_TC_FT_LEVEL, 0);
+							    &ft_attr);
 		if (IS_ERR(priv->fs.tc.t)) {
 			mutex_unlock(&priv->fs.tc.t_lock);
 			NL_SET_ERR_MSG_MOD(extack,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 2c965ad0d744..05b13a1e829c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -277,6 +277,7 @@ enum {
 
 static int esw_create_legacy_vepa_table(struct mlx5_eswitch *esw)
 {
+	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *root_ns;
 	struct mlx5_flow_table *fdb;
@@ -289,8 +290,10 @@ static int esw_create_legacy_vepa_table(struct mlx5_eswitch *esw)
 	}
 
 	/* num FTE 2, num FG 2 */
-	fdb = mlx5_create_auto_grouped_flow_table(root_ns, LEGACY_VEPA_PRIO,
-						  2, 2, 0, 0);
+	ft_attr.prio = LEGACY_VEPA_PRIO;
+	ft_attr.max_fte = 2;
+	ft_attr.autogroup.max_num_groups = 2;
+	fdb = mlx5_create_auto_grouped_flow_table(root_ns, &ft_attr);
 	if (IS_ERR(fdb)) {
 		err = PTR_ERR(fdb);
 		esw_warn(dev, "Failed to create VEPA FDB err %d\n", err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 243a5440867e..4b0d992263b1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -904,6 +904,7 @@ create_next_size_table(struct mlx5_eswitch *esw,
 		       int level,
 		       u32 flags)
 {
+	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_flow_table *fdb;
 	int sz;
 
@@ -911,12 +912,12 @@ create_next_size_table(struct mlx5_eswitch *esw,
 	if (!sz)
 		return ERR_PTR(-ENOSPC);
 
-	fdb = mlx5_create_auto_grouped_flow_table(ns,
-						  table_prio,
-						  sz,
-						  ESW_OFFLOADS_NUM_GROUPS,
-						  level,
-						  flags);
+	ft_attr.max_fte = sz;
+	ft_attr.prio = table_prio;
+	ft_attr.level = level;
+	ft_attr.flags = flags;
+	ft_attr.autogroup.max_num_groups = ESW_OFFLOADS_NUM_GROUPS;
+	fdb = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
 	if (IS_ERR(fdb)) {
 		esw_warn(esw->dev, "Failed to create FDB Table err %d (table prio: %d, level: %d, size: %d)\n",
 			 (int)PTR_ERR(fdb), table_prio, level, sz);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c
index 366bda1bb1c3..dc08ed9339ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c
@@ -50,8 +50,8 @@ mlx5_eswitch_termtbl_create(struct mlx5_core_dev *dev,
 			    struct mlx5_flow_act *flow_act)
 {
 	static const struct mlx5_flow_spec spec = {};
+	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_flow_namespace *root_ns;
-	int prio, flags;
 	int err;
 
 	root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
@@ -63,10 +63,11 @@ mlx5_eswitch_termtbl_create(struct mlx5_core_dev *dev,
 	/* As this is the terminating action then the termination table is the
 	 * same prio as the slow path
 	 */
-	prio = FDB_SLOW_PATH;
-	flags = MLX5_FLOW_TABLE_TERMINATION;
-	tt->termtbl = mlx5_create_auto_grouped_flow_table(root_ns, prio, 1, 1,
-							  0, flags);
+	ft_attr.flags = MLX5_FLOW_TABLE_TERMINATION;
+	ft_attr.prio = FDB_SLOW_PATH;
+	ft_attr.max_fte = 1;
+	ft_attr.autogroup.max_num_groups = 1;
+	tt->termtbl = mlx5_create_auto_grouped_flow_table(root_ns, &ft_attr);
 	if (IS_ERR(tt->termtbl)) {
 		esw_warn(dev, "Failed to create termination table\n");
 		return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 8c5df6c7d7b6..51913e2cde5c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1103,31 +1103,22 @@ EXPORT_SYMBOL(mlx5_create_lag_demux_flow_table);
 
 struct mlx5_flow_table*
 mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
-				    int prio,
-				    int num_flow_table_entries,
-				    int max_num_groups,
-				    u32 level,
-				    u32 flags)
+				    struct mlx5_flow_table_attr *ft_attr)
 {
-	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_flow_table *ft;
 
-	if (max_num_groups > num_flow_table_entries)
+	if (ft_attr->autogroup.max_num_groups > ft_attr->max_fte)
 		return ERR_PTR(-EINVAL);
 
-	ft_attr.max_fte = num_flow_table_entries;
-	ft_attr.prio    = prio;
-	ft_attr.level   = level;
-	ft_attr.flags   = flags;
-
-	ft = mlx5_create_flow_table(ns, &ft_attr);
+	ft = mlx5_create_flow_table(ns, ft_attr);
 	if (IS_ERR(ft))
 		return ft;
 
 	ft->autogroup.active = true;
-	ft->autogroup.required_groups = max_num_groups;
+	ft->autogroup.required_groups = ft_attr->autogroup.max_num_groups;
 	/* We save place for flow groups in addition to max types */
-	ft->autogroup.group_size = ft->max_fte / (max_num_groups + 1);
+	ft->autogroup.group_size = ft->max_fte /
+				   (ft->autogroup.required_groups + 1);
 
 	return ft;
 }
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 4e5b84e66822..a3f8b63839de 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -145,25 +145,25 @@ mlx5_get_flow_vport_acl_namespace(struct mlx5_core_dev *dev,
 				  enum mlx5_flow_namespace_type type,
 				  int vport);
 
-struct mlx5_flow_table *
-mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
-				    int prio,
-				    int num_flow_table_entries,
-				    int max_num_groups,
-				    u32 level,
-				    u32 flags);
-
 struct mlx5_flow_table_attr {
 	int prio;
 	int max_fte;
 	u32 level;
 	u32 flags;
+
+	struct {
+		int max_num_groups;
+	} autogroup;
 };
 
 struct mlx5_flow_table *
 mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
 		       struct mlx5_flow_table_attr *ft_attr);
 
+struct mlx5_flow_table *
+mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
+				    struct mlx5_flow_table_attr *ft_attr);
+
 struct mlx5_flow_table *
 mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
 			     int prio,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 10/16] net/mlx5: fs_core: Introduce unmanaged flow tables
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (8 preceding siblings ...)
  2020-01-17  0:07 ` [mlx5-next 09/16] net/mlx5: Refactor mlx5_create_auto_grouped_flow_table Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [net-next 11/16] net/mlx5: Add ignore level support fwd to table rules Saeed Mahameed
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Oz Shlomo, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

Currently, Most of the steering tree is statically declared ahead of time,
with steering prios instances allocated for each fdb chain to assign max
number of levels for each of them. This allows fs_core to manage the
connections and  levels of the flow tables hierarcy to prevent loops, but
restricts us with the number of supported chains and priorities.

Introduce unmananged flow tables, allowing the user to manage the flow
table connections. A unamanged table is detached from the fs_core flow
table hierarcy, and is only connected back to the hierarchy by explicit
FTEs forward actions.

This will be used together with firmware that supports ignoring the flow
table levels to increase the number of supported chains and prios.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 41 +++++++++++++------
 include/linux/mlx5/fs.h                       |  2 +
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 51913e2cde5c..456d3739b166 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1006,7 +1006,8 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 							u16 vport)
 {
 	struct mlx5_flow_root_namespace *root = find_root(&ns->node);
-	struct mlx5_flow_table *next_ft = NULL;
+	bool unmanaged = ft_attr->flags & MLX5_FLOW_TABLE_UNMANAGED;
+	struct mlx5_flow_table *next_ft;
 	struct fs_prio *fs_prio = NULL;
 	struct mlx5_flow_table *ft;
 	int log_table_sz;
@@ -1023,14 +1024,21 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 		err = -EINVAL;
 		goto unlock_root;
 	}
-	if (ft_attr->level >= fs_prio->num_levels) {
-		err = -ENOSPC;
-		goto unlock_root;
+	if (!unmanaged) {
+		/* The level is related to the
+		 * priority level range.
+		 */
+		if (ft_attr->level >= fs_prio->num_levels) {
+			err = -ENOSPC;
+			goto unlock_root;
+		}
+
+		ft_attr->level += fs_prio->start_level;
 	}
+
 	/* The level is related to the
 	 * priority level range.
 	 */
-	ft_attr->level += fs_prio->start_level;
 	ft = alloc_flow_table(ft_attr->level,
 			      vport,
 			      ft_attr->max_fte ? roundup_pow_of_two(ft_attr->max_fte) : 0,
@@ -1043,19 +1051,27 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 
 	tree_init_node(&ft->node, del_hw_flow_table, del_sw_flow_table);
 	log_table_sz = ft->max_fte ? ilog2(ft->max_fte) : 0;
-	next_ft = find_next_chained_ft(fs_prio);
+	next_ft = unmanaged ? ft_attr->next_ft :
+			      find_next_chained_ft(fs_prio);
 	ft->def_miss_action = ns->def_miss_action;
 	err = root->cmds->create_flow_table(root, ft, log_table_sz, next_ft);
 	if (err)
 		goto free_ft;
 
-	err = connect_flow_table(root->dev, ft, fs_prio);
-	if (err)
-		goto destroy_ft;
+	if (!unmanaged) {
+		err = connect_flow_table(root->dev, ft, fs_prio);
+		if (err)
+			goto destroy_ft;
+	}
+
 	ft->node.active = true;
 	down_write_ref_node(&fs_prio->node, false);
-	tree_add_node(&ft->node, &fs_prio->node);
-	list_add_flow_table(ft, fs_prio);
+	if (!unmanaged) {
+		tree_add_node(&ft->node, &fs_prio->node);
+		list_add_flow_table(ft, fs_prio);
+	} else {
+		ft->node.root = fs_prio->node.root;
+	}
 	fs_prio->num_ft++;
 	up_write_ref_node(&fs_prio->node, false);
 	mutex_unlock(&root->chain_lock);
@@ -2024,7 +2040,8 @@ int mlx5_destroy_flow_table(struct mlx5_flow_table *ft)
 	int err = 0;
 
 	mutex_lock(&root->chain_lock);
-	err = disconnect_flow_table(ft);
+	if (!(ft->flags & MLX5_FLOW_TABLE_UNMANAGED))
+		err = disconnect_flow_table(ft);
 	if (err) {
 		mutex_unlock(&root->chain_lock);
 		return err;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index a3f8b63839de..de2c838bae90 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -48,6 +48,7 @@ enum {
 	MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT = BIT(0),
 	MLX5_FLOW_TABLE_TUNNEL_EN_DECAP = BIT(1),
 	MLX5_FLOW_TABLE_TERMINATION = BIT(2),
+	MLX5_FLOW_TABLE_UNMANAGED = BIT(3),
 };
 
 #define LEFTOVERS_RULE_NUM	 2
@@ -150,6 +151,7 @@ struct mlx5_flow_table_attr {
 	int max_fte;
 	u32 level;
 	u32 flags;
+	struct mlx5_flow_table *next_ft;
 
 	struct {
 		int max_num_groups;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 11/16] net/mlx5: Add ignore level support fwd to table rules
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (9 preceding siblings ...)
  2020-01-17  0:07 ` [net-next 10/16] net/mlx5: fs_core: Introduce unmanaged flow tables Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [net-next 12/16] net/mlx5: Allow creating autogroups with reserved entries Saeed Mahameed
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba; +Cc: netdev, Paul Blakey, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

If user sets ignore flow level flag on a rule, that rule can point to
a flow table of any level, including those with levels equal or less
than the level of the flow table it is added on.

This with unamanged tables will be used to create a FDB chain/prio
hierarchy much larger than currently supported level range.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  3 +++
 .../net/ethernet/mellanox/mlx5/core/fs_core.c  | 18 +++++++++++++++---
 include/linux/mlx5/fs.h                        |  1 +
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 3c816e81f8d9..b25465d9e030 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -432,6 +432,9 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
 	MLX5_SET(set_fte_in, in, table_type, ft->type);
 	MLX5_SET(set_fte_in, in, table_id,   ft->id);
 	MLX5_SET(set_fte_in, in, flow_index, fte->index);
+	MLX5_SET(set_fte_in, in, ignore_flow_level,
+		 !!(fte->action.flags & FLOW_ACT_IGNORE_FLOW_LEVEL));
+
 	if (ft->vport) {
 		MLX5_SET(set_fte_in, in, vport_number, ft->vport);
 		MLX5_SET(set_fte_in, in, other_vport, 1);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 456d3739b166..355a424f4506 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1536,18 +1536,30 @@ static bool counter_is_valid(u32 action)
 }
 
 static bool dest_is_valid(struct mlx5_flow_destination *dest,
-			  u32 action,
+			  struct mlx5_flow_act *flow_act,
 			  struct mlx5_flow_table *ft)
 {
+	bool ignore_level = flow_act->flags & FLOW_ACT_IGNORE_FLOW_LEVEL;
+	u32 action = flow_act->action;
+
 	if (dest && (dest->type == MLX5_FLOW_DESTINATION_TYPE_COUNTER))
 		return counter_is_valid(action);
 
 	if (!(action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST))
 		return true;
 
+	if (ignore_level) {
+		if (ft->type != FS_FT_FDB)
+			return false;
+
+		if (dest->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
+		    dest->ft->type != FS_FT_FDB)
+			return false;
+	}
+
 	if (!dest || ((dest->type ==
 	    MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE) &&
-	    (dest->ft->level <= ft->level)))
+	    (dest->ft->level <= ft->level && !ignore_level)))
 		return false;
 	return true;
 }
@@ -1777,7 +1789,7 @@ _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 		return ERR_PTR(-EINVAL);
 
 	for (i = 0; i < dest_num; i++) {
-		if (!dest_is_valid(&dest[i], flow_act->action, ft))
+		if (!dest_is_valid(&dest[i], flow_act, ft))
 			return ERR_PTR(-EINVAL);
 	}
 	nested_down_read_ref_node(&ft->node, FS_LOCK_GRANDPARENT);
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index de2c838bae90..81f393fb7d96 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -196,6 +196,7 @@ struct mlx5_fs_vlan {
 
 enum {
 	FLOW_ACT_NO_APPEND = BIT(0),
+	FLOW_ACT_IGNORE_FLOW_LEVEL = BIT(1),
 };
 
 struct mlx5_flow_act {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 12/16] net/mlx5: Allow creating autogroups with reserved entries
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (10 preceding siblings ...)
  2020-01-17  0:07 ` [net-next 11/16] net/mlx5: Add ignore level support fwd to table rules Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [net-next 13/16] net/mlx5: ft: Use getter function to get ft chain Saeed Mahameed
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Oz Shlomo, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

Exclude the last n entries for an autogrouped flow table.

Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create
a miss group enabling custom actions on FT miss.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 26 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/fs_core.h |  1 +
 include/linux/mlx5/fs.h                       |  1 +
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 355a424f4506..c7a16ae05fa8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -579,7 +579,9 @@ static void del_sw_flow_group(struct fs_node *node)
 
 	rhashtable_destroy(&fg->ftes_hash);
 	ida_destroy(&fg->fte_allocator);
-	if (ft->autogroup.active && fg->max_ftes == ft->autogroup.group_size)
+	if (ft->autogroup.active &&
+	    fg->max_ftes == ft->autogroup.group_size &&
+	    fg->start_index < ft->autogroup.max_fte)
 		ft->autogroup.num_groups--;
 	err = rhltable_remove(&ft->fgs_hash,
 			      &fg->hash,
@@ -1121,9 +1123,14 @@ struct mlx5_flow_table*
 mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
 				    struct mlx5_flow_table_attr *ft_attr)
 {
+	int num_reserved_entries = ft_attr->autogroup.num_reserved_entries;
+	int autogroups_max_fte = ft_attr->max_fte - num_reserved_entries;
+	int max_num_groups = ft_attr->autogroup.max_num_groups;
 	struct mlx5_flow_table *ft;
 
-	if (ft_attr->autogroup.max_num_groups > ft_attr->max_fte)
+	if (max_num_groups > autogroups_max_fte)
+		return ERR_PTR(-EINVAL);
+	if (num_reserved_entries > ft_attr->max_fte)
 		return ERR_PTR(-EINVAL);
 
 	ft = mlx5_create_flow_table(ns, ft_attr);
@@ -1131,10 +1138,10 @@ mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
 		return ft;
 
 	ft->autogroup.active = true;
-	ft->autogroup.required_groups = ft_attr->autogroup.max_num_groups;
+	ft->autogroup.required_groups = max_num_groups;
+	ft->autogroup.max_fte = autogroups_max_fte;
 	/* We save place for flow groups in addition to max types */
-	ft->autogroup.group_size = ft->max_fte /
-				   (ft->autogroup.required_groups + 1);
+	ft->autogroup.group_size = autogroups_max_fte / (max_num_groups + 1);
 
 	return ft;
 }
@@ -1156,7 +1163,7 @@ struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
 	struct mlx5_flow_group *fg;
 	int err;
 
-	if (ft->autogroup.active)
+	if (ft->autogroup.active && start_index < ft->autogroup.max_fte)
 		return ERR_PTR(-EPERM);
 
 	down_write_ref_node(&ft->node, false);
@@ -1329,9 +1336,10 @@ static struct mlx5_flow_group *alloc_auto_flow_group(struct mlx5_flow_table  *ft
 						     const struct mlx5_flow_spec *spec)
 {
 	struct list_head *prev = &ft->node.children;
-	struct mlx5_flow_group *fg;
+	u32 max_fte = ft->autogroup.max_fte;
 	unsigned int candidate_index = 0;
 	unsigned int group_size = 0;
+	struct mlx5_flow_group *fg;
 
 	if (!ft->autogroup.active)
 		return ERR_PTR(-ENOENT);
@@ -1339,7 +1347,7 @@ static struct mlx5_flow_group *alloc_auto_flow_group(struct mlx5_flow_table  *ft
 	if (ft->autogroup.num_groups < ft->autogroup.required_groups)
 		group_size = ft->autogroup.group_size;
 
-	/*  ft->max_fte == ft->autogroup.max_types */
+	/*  max_fte == ft->autogroup.max_types */
 	if (group_size == 0)
 		group_size = 1;
 
@@ -1352,7 +1360,7 @@ static struct mlx5_flow_group *alloc_auto_flow_group(struct mlx5_flow_table  *ft
 		prev = &fg->node.list;
 	}
 
-	if (candidate_index + group_size > ft->max_fte)
+	if (candidate_index + group_size > max_fte)
 		return ERR_PTR(-ENOSPC);
 
 	fg = alloc_insert_flow_group(ft,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index c2621b911563..be5f5e32c1e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -164,6 +164,7 @@ struct mlx5_flow_table {
 		unsigned int		required_groups;
 		unsigned int		group_size;
 		unsigned int		num_groups;
+		unsigned int		max_fte;
 	} autogroup;
 	/* Protect fwd_rules */
 	struct mutex			lock;
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 81f393fb7d96..4cae16016b2b 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -155,6 +155,7 @@ struct mlx5_flow_table_attr {
 
 	struct {
 		int max_num_groups;
+		int num_reserved_entries;
 	} autogroup;
 };
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 13/16] net/mlx5: ft: Use getter function to get ft chain
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (11 preceding siblings ...)
  2020-01-17  0:07 ` [net-next 12/16] net/mlx5: Allow creating autogroups with reserved entries Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [net-next 14/16] net/mlx5: ft: Check prio and chain sanity for ft offload Saeed Mahameed
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

FT chain is defined as the next chain after tc.

To prepare for next patches that will increase the number of tc
chains available at runtime, use a getter function to get this
value.

The define is still used in static fs_core allocation,
to calculate the number of chains. This static allocation
will be used if the relevant capabilities won't be available
to support dynamic chains.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c           | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h          | 3 +++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 5 +++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index f175cb24bb67..d85b56452ee1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1268,7 +1268,7 @@ static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
 		 * reserved ft chain.
 		 */
 		memcpy(&cls_flower, f, sizeof(*f));
-		cls_flower.common.chain_index = FDB_FT_CHAIN;
+		cls_flower.common.chain_index = mlx5_eswitch_get_ft_chain(esw);
 		err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower, flags);
 		memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index ffcff3ba3701..69ff3031d1c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -364,6 +364,9 @@ mlx5_eswitch_get_prio_range(struct mlx5_eswitch *esw);
 u32
 mlx5_eswitch_get_chain_range(struct mlx5_eswitch *esw);
 
+unsigned int
+mlx5_eswitch_get_ft_chain(struct mlx5_eswitch *esw);
+
 struct mlx5_flow_handle *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, u16 vport,
 				  struct mlx5_flow_destination *dest);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 4b0d992263b1..81bafd4b44bb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -80,6 +80,11 @@ u32 mlx5_eswitch_get_chain_range(struct mlx5_eswitch *esw)
 	return 0;
 }
 
+u32 mlx5_eswitch_get_ft_chain(struct mlx5_eswitch *esw)
+{
+	return mlx5_eswitch_get_chain_range(esw) + 1;
+}
+
 u16 mlx5_eswitch_get_prio_range(struct mlx5_eswitch *esw)
 {
 	if (esw->fdb_table.flags & ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 14/16] net/mlx5: ft: Check prio and chain sanity for ft offload
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (12 preceding siblings ...)
  2020-01-17  0:07 ` [net-next 13/16] net/mlx5: ft: Use getter function to get ft chain Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [net-next 15/16] net/mlx5: E-Switch, Refactor chains and priorities Saeed Mahameed
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

Before changing the chain from original chain to ft offload chain,
make sure user doesn't actually use chains.

While here, normalize the prio range to that which we support.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  | 27 ++++++++++++++-----
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index d85b56452ee1..5d93c506ae8b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1247,8 +1247,7 @@ static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
 static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
 				 void *cb_priv)
 {
-	struct flow_cls_offload *f = type_data;
-	struct flow_cls_offload cls_flower;
+	struct flow_cls_offload tmp, *f = type_data;
 	struct mlx5e_priv *priv = cb_priv;
 	struct mlx5_eswitch *esw;
 	unsigned long flags;
@@ -1261,16 +1260,30 @@ static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
 
 	switch (type) {
 	case TC_SETUP_CLSFLOWER:
-		if (!mlx5_eswitch_prios_supported(esw) || f->common.chain_index)
+		memcpy(&tmp, f, sizeof(*f));
+
+		if (!mlx5_eswitch_prios_supported(esw) ||
+		    tmp.common.chain_index)
 			return -EOPNOTSUPP;
 
 		/* Re-use tc offload path by moving the ft flow to the
 		 * reserved ft chain.
+		 *
+		 * FT offload can use prio range [0, INT_MAX], so we
+		 * normalize it to range [1, mlx5_eswitch_get_prio_range(esw)]
+		 * as with tc, where prio 0 isn't supported.
+		 *
+		 * We only support chain 0 of FT offload.
 		 */
-		memcpy(&cls_flower, f, sizeof(*f));
-		cls_flower.common.chain_index = mlx5_eswitch_get_ft_chain(esw);
-		err = mlx5e_rep_setup_tc_cls_flower(priv, &cls_flower, flags);
-		memcpy(&f->stats, &cls_flower.stats, sizeof(f->stats));
+		if (tmp.common.prio >= mlx5_eswitch_get_prio_range(esw))
+			return -EOPNOTSUPP;
+		if (tmp.common.chain_index != 0)
+			return -EOPNOTSUPP;
+
+		tmp.common.chain_index = mlx5_eswitch_get_ft_chain(esw);
+		tmp.common.prio++;
+		err = mlx5e_rep_setup_tc_cls_flower(priv, &tmp, flags);
+		memcpy(&f->stats, &tmp.stats, sizeof(f->stats));
 		return err;
 	default:
 		return -EOPNOTSUPP;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 15/16] net/mlx5: E-Switch, Refactor chains and priorities
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (13 preceding siblings ...)
  2020-01-17  0:07 ` [net-next 14/16] net/mlx5: ft: Check prio and chain sanity for ft offload Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-17  0:07 ` [net-next 16/16] net/mlx5: E-Switch, Increase number of " Saeed Mahameed
  2020-01-19 15:17 ` [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios David Miller
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Oz Shlomo, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

To support the entire chain and prio range (32bit + 16bit),
instead of a using a static array of chains/prios of limited size, create
them dynamically, and use a rhashtable to search for existing chains/prio
combinations.

This will be used in next patch to actually increase the number using
unamanged tables support and ignore flow level capability.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  14 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  30 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 303 ++--------
 .../mlx5/core/eswitch_offloads_chains.c       | 542 ++++++++++++++++++
 .../mlx5/core/eswitch_offloads_chains.h       |  27 +
 7 files changed, 637 insertions(+), 292 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index a6f390fdb971..d3e06cec8317 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -42,7 +42,7 @@ mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
 # Core extra
 #
 mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o eswitch_offloads.o eswitch_offloads_termtbl.o \
-				      ecpf.o rdma.o
+				      ecpf.o rdma.o eswitch_offloads_chains.o
 mlx5_core-$(CONFIG_MLX5_MPFS)      += lib/mpfs.o
 mlx5_core-$(CONFIG_VXLAN)          += lib/vxlan.o
 mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 5d93c506ae8b..446eb4d6c983 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -41,6 +41,7 @@
 #include <net/ipv6_stubs.h>
 
 #include "eswitch.h"
+#include "eswitch_offloads_chains.h"
 #include "en.h"
 #include "en_rep.h"
 #include "en_tc.h"
@@ -1262,25 +1263,25 @@ static int mlx5e_rep_setup_ft_cb(enum tc_setup_type type, void *type_data,
 	case TC_SETUP_CLSFLOWER:
 		memcpy(&tmp, f, sizeof(*f));
 
-		if (!mlx5_eswitch_prios_supported(esw) ||
+		if (!mlx5_esw_chains_prios_supported(esw) ||
 		    tmp.common.chain_index)
 			return -EOPNOTSUPP;
 
 		/* Re-use tc offload path by moving the ft flow to the
 		 * reserved ft chain.
 		 *
-		 * FT offload can use prio range [0, INT_MAX], so we
-		 * normalize it to range [1, mlx5_eswitch_get_prio_range(esw)]
+		 * FT offload can use prio range [0, INT_MAX], so we normalize
+		 * it to range [1, mlx5_esw_chains_get_prio_range(esw)]
 		 * as with tc, where prio 0 isn't supported.
 		 *
 		 * We only support chain 0 of FT offload.
 		 */
-		if (tmp.common.prio >= mlx5_eswitch_get_prio_range(esw))
+		if (tmp.common.prio >= mlx5_esw_chains_get_prio_range(esw))
 			return -EOPNOTSUPP;
 		if (tmp.common.chain_index != 0)
 			return -EOPNOTSUPP;
 
-		tmp.common.chain_index = mlx5_eswitch_get_ft_chain(esw);
+		tmp.common.chain_index = mlx5_esw_chains_get_ft_chain(esw);
 		tmp.common.prio++;
 		err = mlx5e_rep_setup_tc_cls_flower(priv, &tmp, flags);
 		memcpy(&f->stats, &tmp.stats, sizeof(f->stats));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 5aafbb8d2e8e..26f559b453dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -51,6 +51,7 @@
 #include "en_rep.h"
 #include "en_tc.h"
 #include "eswitch.h"
+#include "eswitch_offloads_chains.h"
 #include "fs_core.h"
 #include "en/port.h"
 #include "en/tc_tun.h"
@@ -1083,7 +1084,7 @@ mlx5e_tc_offload_to_slow_path(struct mlx5_eswitch *esw,
 	memcpy(slow_attr, flow->esw_attr, sizeof(*slow_attr));
 	slow_attr->action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 	slow_attr->split_count = 0;
-	slow_attr->dest_chain = FDB_TC_SLOW_PATH_CHAIN;
+	slow_attr->flags |= MLX5_ESW_ATTR_FLAG_SLOW_PATH;
 
 	rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, slow_attr);
 	if (!IS_ERR(rule))
@@ -1100,7 +1101,7 @@ mlx5e_tc_unoffload_from_slow_path(struct mlx5_eswitch *esw,
 	memcpy(slow_attr, flow->esw_attr, sizeof(*slow_attr));
 	slow_attr->action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 	slow_attr->split_count = 0;
-	slow_attr->dest_chain = FDB_TC_SLOW_PATH_CHAIN;
+	slow_attr->flags |= MLX5_ESW_ATTR_FLAG_SLOW_PATH;
 	mlx5e_tc_unoffload_fdb_rules(esw, flow, slow_attr);
 	flow_flag_clear(flow, SLOW);
 }
@@ -1160,19 +1161,18 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 		      struct netlink_ext_ack *extack)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	u32 max_chain = mlx5_eswitch_get_chain_range(esw);
 	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	struct mlx5e_tc_flow_parse_attr *parse_attr = attr->parse_attr;
-	u16 max_prio = mlx5_eswitch_get_prio_range(esw);
 	struct net_device *out_dev, *encap_dev = NULL;
 	struct mlx5_fc *counter = NULL;
 	struct mlx5e_rep_priv *rpriv;
 	struct mlx5e_priv *out_priv;
 	bool encap_valid = true;
+	u32 max_prio, max_chain;
 	int err = 0;
 	int out_index;
 
-	if (!mlx5_eswitch_prios_supported(esw) && attr->prio != 1) {
+	if (!mlx5_esw_chains_prios_supported(esw) && attr->prio != 1) {
 		NL_SET_ERR_MSG(extack, "E-switch priorities unsupported, upgrade FW");
 		return -EOPNOTSUPP;
 	}
@@ -1182,11 +1182,13 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	 * FDB_FT_CHAIN which is outside tc range.
 	 * See mlx5e_rep_setup_ft_cb().
 	 */
+	max_chain = mlx5_esw_chains_get_chain_range(esw);
 	if (!mlx5e_is_ft_flow(flow) && attr->chain > max_chain) {
 		NL_SET_ERR_MSG(extack, "Requested chain is out of supported range");
 		return -EOPNOTSUPP;
 	}
 
+	max_prio = mlx5_esw_chains_get_prio_range(esw);
 	if (attr->prio > max_prio) {
 		NL_SET_ERR_MSG(extack, "Requested priority is out of supported range");
 		return -EOPNOTSUPP;
@@ -3469,7 +3471,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			break;
 		case FLOW_ACTION_GOTO: {
 			u32 dest_chain = act->chain_index;
-			u32 max_chain = mlx5_eswitch_get_chain_range(esw);
+			u32 max_chain = mlx5_esw_chains_get_chain_range(esw);
 
 			if (ft_flow) {
 				NL_SET_ERR_MSG_MOD(extack, "Goto action is not supported");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 69ff3031d1c0..4472710ccc9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -157,7 +157,7 @@ enum offloads_fdb_flags {
 	ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED = BIT(0),
 };
 
-extern const unsigned int ESW_POOLS[4];
+struct mlx5_esw_chains_priv;
 
 struct mlx5_eswitch_fdb {
 	union {
@@ -182,14 +182,7 @@ struct mlx5_eswitch_fdb {
 			struct mlx5_flow_handle *miss_rule_multi;
 			int vlan_push_pop_refcount;
 
-			struct {
-				struct mlx5_flow_table *fdb;
-				u32 num_rules;
-			} fdb_prio[FDB_NUM_CHAINS][FDB_TC_MAX_PRIO + 1][FDB_TC_LEVELS_PER_PRIO];
-			/* Protects fdb_prio table */
-			struct mutex fdb_prio_lock;
-
-			int fdb_left[ARRAY_SIZE(ESW_POOLS)];
+			struct mlx5_esw_chains_priv *esw_chains_priv;
 		} offloads;
 	};
 	u32 flags;
@@ -355,18 +348,6 @@ mlx5_eswitch_del_fwd_rule(struct mlx5_eswitch *esw,
 			  struct mlx5_flow_handle *rule,
 			  struct mlx5_esw_flow_attr *attr);
 
-bool
-mlx5_eswitch_prios_supported(struct mlx5_eswitch *esw);
-
-u16
-mlx5_eswitch_get_prio_range(struct mlx5_eswitch *esw);
-
-u32
-mlx5_eswitch_get_chain_range(struct mlx5_eswitch *esw);
-
-unsigned int
-mlx5_eswitch_get_ft_chain(struct mlx5_eswitch *esw);
-
 struct mlx5_flow_handle *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, u16 vport,
 				  struct mlx5_flow_destination *dest);
@@ -391,6 +372,11 @@ enum {
 	MLX5_ESW_DEST_ENCAP_VALID   = BIT(1),
 };
 
+enum {
+	MLX5_ESW_ATTR_FLAG_VLAN_HANDLED  = BIT(0),
+	MLX5_ESW_ATTR_FLAG_SLOW_PATH     = BIT(1),
+};
+
 struct mlx5_esw_flow_attr {
 	struct mlx5_eswitch_rep *in_rep;
 	struct mlx5_core_dev	*in_mdev;
@@ -404,7 +390,6 @@ struct mlx5_esw_flow_attr {
 	u16	vlan_vid[MLX5_FS_VLAN_DEPTH];
 	u8	vlan_prio[MLX5_FS_VLAN_DEPTH];
 	u8	total_vlan;
-	bool	vlan_handled;
 	struct {
 		u32 flags;
 		struct mlx5_eswitch_rep *rep;
@@ -419,6 +404,7 @@ struct mlx5_esw_flow_attr {
 	u32	chain;
 	u16	prio;
 	u32	dest_chain;
+	u32	flags;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 81bafd4b44bb..47f8729197e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -37,6 +37,7 @@
 #include <linux/mlx5/fs.h>
 #include "mlx5_core.h"
 #include "eswitch.h"
+#include "eswitch_offloads_chains.h"
 #include "rdma.h"
 #include "en.h"
 #include "fs_core.h"
@@ -47,10 +48,6 @@
  * one for multicast.
  */
 #define MLX5_ESW_MISS_FLOWS (2)
-
-#define fdb_prio_table(esw, chain, prio, level) \
-	(esw)->fdb_table.offloads.fdb_prio[(chain)][(prio)][(level)]
-
 #define UPLINK_REP_INDEX 0
 
 static struct mlx5_eswitch_rep *mlx5_eswitch_get_rep(struct mlx5_eswitch *esw,
@@ -62,37 +59,6 @@ static struct mlx5_eswitch_rep *mlx5_eswitch_get_rep(struct mlx5_eswitch *esw,
 	return &esw->offloads.vport_reps[idx];
 }
 
-static struct mlx5_flow_table *
-esw_get_prio_table(struct mlx5_eswitch *esw, u32 chain, u16 prio, int level);
-static void
-esw_put_prio_table(struct mlx5_eswitch *esw, u32 chain, u16 prio, int level);
-
-bool mlx5_eswitch_prios_supported(struct mlx5_eswitch *esw)
-{
-	return (!!(esw->fdb_table.flags & ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED));
-}
-
-u32 mlx5_eswitch_get_chain_range(struct mlx5_eswitch *esw)
-{
-	if (esw->fdb_table.flags & ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED)
-		return FDB_TC_MAX_CHAIN;
-
-	return 0;
-}
-
-u32 mlx5_eswitch_get_ft_chain(struct mlx5_eswitch *esw)
-{
-	return mlx5_eswitch_get_chain_range(esw) + 1;
-}
-
-u16 mlx5_eswitch_get_prio_range(struct mlx5_eswitch *esw)
-{
-	if (esw->fdb_table.flags & ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED)
-		return FDB_TC_MAX_PRIO;
-
-	return 1;
-}
-
 static bool
 esw_check_ingress_prio_tag_enabled(const struct mlx5_eswitch *esw,
 				   const struct mlx5_vport *vport)
@@ -180,10 +146,17 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	}
 
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
-		if (attr->dest_chain) {
-			struct mlx5_flow_table *ft;
+		struct mlx5_flow_table *ft;
 
-			ft = esw_get_prio_table(esw, attr->dest_chain, 1, 0);
+		if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) {
+			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+			dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+			dest[i].ft = esw->fdb_table.offloads.slow_fdb;
+			i++;
+		} else if (attr->dest_chain) {
+			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+			ft = mlx5_esw_chains_get_table(esw, attr->dest_chain,
+						       1, 0);
 			if (IS_ERR(ft)) {
 				rule = ERR_CAST(ft);
 				goto err_create_goto_table;
@@ -228,7 +201,8 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		flow_act.modify_hdr = attr->modify_hdr;
 
-	fdb = esw_get_prio_table(esw, attr->chain, attr->prio, !!split);
+	fdb = mlx5_esw_chains_get_table(esw, attr->chain, attr->prio,
+					!!split);
 	if (IS_ERR(fdb)) {
 		rule = ERR_CAST(fdb);
 		goto err_esw_get;
@@ -247,10 +221,10 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	return rule;
 
 err_add_rule:
-	esw_put_prio_table(esw, attr->chain, attr->prio, !!split);
+	mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, !!split);
 err_esw_get:
-	if (attr->dest_chain)
-		esw_put_prio_table(esw, attr->dest_chain, 1, 0);
+	if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) && attr->dest_chain)
+		mlx5_esw_chains_put_table(esw, attr->dest_chain, 1, 0);
 err_create_goto_table:
 	return rule;
 }
@@ -267,13 +241,13 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 	struct mlx5_flow_handle *rule;
 	int i;
 
-	fast_fdb = esw_get_prio_table(esw, attr->chain, attr->prio, 0);
+	fast_fdb = mlx5_esw_chains_get_table(esw, attr->chain, attr->prio, 0);
 	if (IS_ERR(fast_fdb)) {
 		rule = ERR_CAST(fast_fdb);
 		goto err_get_fast;
 	}
 
-	fwd_fdb = esw_get_prio_table(esw, attr->chain, attr->prio, 1);
+	fwd_fdb = mlx5_esw_chains_get_table(esw, attr->chain, attr->prio, 1);
 	if (IS_ERR(fwd_fdb)) {
 		rule = ERR_CAST(fwd_fdb);
 		goto err_get_fwd;
@@ -310,9 +284,9 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 
 	return rule;
 add_err:
-	esw_put_prio_table(esw, attr->chain, attr->prio, 1);
+	mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 1);
 err_get_fwd:
-	esw_put_prio_table(esw, attr->chain, attr->prio, 0);
+	mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 0);
 err_get_fast:
 	return rule;
 }
@@ -337,12 +311,13 @@ __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw,
 	atomic64_dec(&esw->offloads.num_flows);
 
 	if (fwd_rule)  {
-		esw_put_prio_table(esw, attr->chain, attr->prio, 1);
-		esw_put_prio_table(esw, attr->chain, attr->prio, 0);
+		mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 1);
+		mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 0);
 	} else {
-		esw_put_prio_table(esw, attr->chain, attr->prio, !!split);
+		mlx5_esw_chains_put_table(esw, attr->chain, attr->prio,
+					  !!split);
 		if (attr->dest_chain)
-			esw_put_prio_table(esw, attr->dest_chain, 1, 0);
+			mlx5_esw_chains_put_table(esw, attr->dest_chain, 1, 0);
 	}
 }
 
@@ -456,7 +431,7 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 	if (err)
 		goto unlock;
 
-	attr->vlan_handled = false;
+	attr->flags &= ~MLX5_ESW_ATTR_FLAG_VLAN_HANDLED;
 
 	vport = esw_vlan_action_get_vport(attr, push, pop);
 
@@ -464,7 +439,7 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 		/* tracks VF --> wire rules without vlan push action */
 		if (attr->dests[0].rep->vport == MLX5_VPORT_UPLINK) {
 			vport->vlan_refcount++;
-			attr->vlan_handled = true;
+			attr->flags |= MLX5_ESW_ATTR_FLAG_VLAN_HANDLED;
 		}
 
 		goto unlock;
@@ -495,7 +470,7 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 	}
 out:
 	if (!err)
-		attr->vlan_handled = true;
+		attr->flags |= MLX5_ESW_ATTR_FLAG_VLAN_HANDLED;
 unlock:
 	mutex_unlock(&esw->state_lock);
 	return err;
@@ -513,7 +488,7 @@ int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
 	if (mlx5_eswitch_vlan_actions_supported(esw->dev, 1))
 		return 0;
 
-	if (!attr->vlan_handled)
+	if (!(attr->flags & MLX5_ESW_ATTR_FLAG_VLAN_HANDLED))
 		return 0;
 
 	push = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH);
@@ -587,8 +562,8 @@ mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, u16 vport,
 	dest.vport.num = vport;
 	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 
-	flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb, spec,
-					&flow_act, &dest, 1);
+	flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb,
+					spec, &flow_act, &dest, 1);
 	if (IS_ERR(flow_rule))
 		esw_warn(esw->dev, "FDB: Failed to add send to vport rule err %ld\n", PTR_ERR(flow_rule));
 out:
@@ -829,8 +804,8 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
 	dest.vport.num = esw->manager_vport;
 	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 
-	flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb, spec,
-					&flow_act, &dest, 1);
+	flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb,
+					spec, &flow_act, &dest, 1);
 	if (IS_ERR(flow_rule)) {
 		err = PTR_ERR(flow_rule);
 		esw_warn(esw->dev,  "FDB: Failed to add unicast miss flow rule err %d\n", err);
@@ -844,8 +819,8 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
 	dmac_v = MLX5_ADDR_OF(fte_match_param, headers_v,
 			      outer_headers.dmac_47_16);
 	dmac_v[0] = 0x01;
-	flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb, spec,
-					&flow_act, &dest, 1);
+	flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.slow_fdb,
+					spec, &flow_act, &dest, 1);
 	if (IS_ERR(flow_rule)) {
 		err = PTR_ERR(flow_rule);
 		esw_warn(esw->dev, "FDB: Failed to add multicast miss flow rule err %d\n", err);
@@ -860,175 +835,6 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
 	return err;
 }
 
-#define ESW_OFFLOADS_NUM_GROUPS  4
-
-/* Firmware currently has 4 pool of 4 sizes that it supports (ESW_POOLS),
- * and a virtual memory region of 16M (ESW_SIZE), this region is duplicated
- * for each flow table pool. We can allocate up to 16M of each pool,
- * and we keep track of how much we used via put/get_sz_to_pool.
- * Firmware doesn't report any of this for now.
- * ESW_POOL is expected to be sorted from large to small
- */
-#define ESW_SIZE (16 * 1024 * 1024)
-const unsigned int ESW_POOLS[4] = { 4 * 1024 * 1024, 1 * 1024 * 1024,
-				    64 * 1024, 4 * 1024 };
-
-static int
-get_sz_from_pool(struct mlx5_eswitch *esw)
-{
-	int sz = 0, i;
-
-	for (i = 0; i < ARRAY_SIZE(ESW_POOLS); i++) {
-		if (esw->fdb_table.offloads.fdb_left[i]) {
-			--esw->fdb_table.offloads.fdb_left[i];
-			sz = ESW_POOLS[i];
-			break;
-		}
-	}
-
-	return sz;
-}
-
-static void
-put_sz_to_pool(struct mlx5_eswitch *esw, int sz)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(ESW_POOLS); i++) {
-		if (sz >= ESW_POOLS[i]) {
-			++esw->fdb_table.offloads.fdb_left[i];
-			break;
-		}
-	}
-}
-
-static struct mlx5_flow_table *
-create_next_size_table(struct mlx5_eswitch *esw,
-		       struct mlx5_flow_namespace *ns,
-		       u16 table_prio,
-		       int level,
-		       u32 flags)
-{
-	struct mlx5_flow_table_attr ft_attr = {};
-	struct mlx5_flow_table *fdb;
-	int sz;
-
-	sz = get_sz_from_pool(esw);
-	if (!sz)
-		return ERR_PTR(-ENOSPC);
-
-	ft_attr.max_fte = sz;
-	ft_attr.prio = table_prio;
-	ft_attr.level = level;
-	ft_attr.flags = flags;
-	ft_attr.autogroup.max_num_groups = ESW_OFFLOADS_NUM_GROUPS;
-	fdb = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
-	if (IS_ERR(fdb)) {
-		esw_warn(esw->dev, "Failed to create FDB Table err %d (table prio: %d, level: %d, size: %d)\n",
-			 (int)PTR_ERR(fdb), table_prio, level, sz);
-		put_sz_to_pool(esw, sz);
-	}
-
-	return fdb;
-}
-
-static struct mlx5_flow_table *
-esw_get_prio_table(struct mlx5_eswitch *esw, u32 chain, u16 prio, int level)
-{
-	struct mlx5_core_dev *dev = esw->dev;
-	struct mlx5_flow_table *fdb = NULL;
-	struct mlx5_flow_namespace *ns;
-	int table_prio, l = 0;
-	u32 flags = 0;
-
-	if (chain == FDB_TC_SLOW_PATH_CHAIN)
-		return esw->fdb_table.offloads.slow_fdb;
-
-	mutex_lock(&esw->fdb_table.offloads.fdb_prio_lock);
-
-	fdb = fdb_prio_table(esw, chain, prio, level).fdb;
-	if (fdb) {
-		/* take ref on earlier levels as well */
-		while (level >= 0)
-			fdb_prio_table(esw, chain, prio, level--).num_rules++;
-		mutex_unlock(&esw->fdb_table.offloads.fdb_prio_lock);
-		return fdb;
-	}
-
-	ns = mlx5_get_fdb_sub_ns(dev, chain);
-	if (!ns) {
-		esw_warn(dev, "Failed to get FDB sub namespace\n");
-		mutex_unlock(&esw->fdb_table.offloads.fdb_prio_lock);
-		return ERR_PTR(-EOPNOTSUPP);
-	}
-
-	if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
-		flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
-			  MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
-
-	table_prio = prio - 1;
-
-	/* create earlier levels for correct fs_core lookup when
-	 * connecting tables
-	 */
-	for (l = 0; l <= level; l++) {
-		if (fdb_prio_table(esw, chain, prio, l).fdb) {
-			fdb_prio_table(esw, chain, prio, l).num_rules++;
-			continue;
-		}
-
-		fdb = create_next_size_table(esw, ns, table_prio, l, flags);
-		if (IS_ERR(fdb)) {
-			l--;
-			goto err_create_fdb;
-		}
-
-		fdb_prio_table(esw, chain, prio, l).fdb = fdb;
-		fdb_prio_table(esw, chain, prio, l).num_rules = 1;
-	}
-
-	mutex_unlock(&esw->fdb_table.offloads.fdb_prio_lock);
-	return fdb;
-
-err_create_fdb:
-	mutex_unlock(&esw->fdb_table.offloads.fdb_prio_lock);
-	if (l >= 0)
-		esw_put_prio_table(esw, chain, prio, l);
-
-	return fdb;
-}
-
-static void
-esw_put_prio_table(struct mlx5_eswitch *esw, u32 chain, u16 prio, int level)
-{
-	int l;
-
-	if (chain == FDB_TC_SLOW_PATH_CHAIN)
-		return;
-
-	mutex_lock(&esw->fdb_table.offloads.fdb_prio_lock);
-
-	for (l = level; l >= 0; l--) {
-		if (--(fdb_prio_table(esw, chain, prio, l).num_rules) > 0)
-			continue;
-
-		put_sz_to_pool(esw, fdb_prio_table(esw, chain, prio, l).fdb->max_fte);
-		mlx5_destroy_flow_table(fdb_prio_table(esw, chain, prio, l).fdb);
-		fdb_prio_table(esw, chain, prio, l).fdb = NULL;
-	}
-
-	mutex_unlock(&esw->fdb_table.offloads.fdb_prio_lock);
-}
-
-static void esw_destroy_offloads_fast_fdb_tables(struct mlx5_eswitch *esw)
-{
-	/* If lazy creation isn't supported, deref the fast path tables */
-	if (!(esw->fdb_table.flags & ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED)) {
-		esw_put_prio_table(esw, 0, 1, 1);
-		esw_put_prio_table(esw, 0, 1, 0);
-	}
-}
-
 #define MAX_PF_SQ 256
 #define MAX_SQ_NVPORTS 32
 
@@ -1061,16 +867,16 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int nvports)
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
 	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_core_dev *dev = esw->dev;
-	u32 *flow_group_in, max_flow_counter;
 	struct mlx5_flow_namespace *root_ns;
 	struct mlx5_flow_table *fdb = NULL;
-	int table_size, ix, err = 0, i;
+	u32 flags = 0, *flow_group_in;
+	int table_size, ix, err = 0;
 	struct mlx5_flow_group *g;
-	u32 flags = 0, fdb_max;
 	void *match_criteria;
 	u8 *dmac;
 
 	esw_debug(esw->dev, "Create offloads FDB Tables\n");
+
 	flow_group_in = kvzalloc(inlen, GFP_KERNEL);
 	if (!flow_group_in)
 		return -ENOMEM;
@@ -1089,19 +895,6 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int nvports)
 		goto ns_err;
 	}
 
-	max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) |
-			    MLX5_CAP_GEN(dev, max_flow_counter_15_0);
-	fdb_max = 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size);
-
-	esw_debug(dev, "Create offloads FDB table, min (max esw size(2^%d), max counters(%d), groups(%d), max flow table size(%d))\n",
-		  MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size),
-		  max_flow_counter, ESW_OFFLOADS_NUM_GROUPS,
-		  fdb_max);
-
-	for (i = 0; i < ARRAY_SIZE(ESW_POOLS); i++)
-		esw->fdb_table.offloads.fdb_left[i] =
-			ESW_POOLS[i] <= fdb_max ? ESW_SIZE / ESW_POOLS[i] : 0;
-
 	table_size = nvports * MAX_SQ_NVPORTS + MAX_PF_SQ +
 		MLX5_ESW_MISS_FLOWS + esw->total_vports;
 
@@ -1124,16 +917,10 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int nvports)
 	}
 	esw->fdb_table.offloads.slow_fdb = fdb;
 
-	/* If lazy creation isn't supported, open the fast path tables now */
-	if (!MLX5_CAP_ESW_FLOWTABLE(esw->dev, multi_fdb_encap) &&
-	    esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE) {
-		esw->fdb_table.flags &= ~ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
-		esw_warn(dev, "Lazy creation of flow tables isn't supported, ignoring priorities\n");
-		esw_get_prio_table(esw, 0, 1, 0);
-		esw_get_prio_table(esw, 0, 1, 1);
-	} else {
-		esw_debug(dev, "Lazy creation of flow tables supported, deferring table opening\n");
-		esw->fdb_table.flags |= ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
+	err = mlx5_esw_chains_create(esw);
+	if (err) {
+		esw_warn(dev, "Failed to create fdb chains err(%d)\n", err);
+		goto fdb_chains_err;
 	}
 
 	/* create send-to-vport group */
@@ -1224,7 +1011,8 @@ static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int nvports)
 peer_miss_err:
 	mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
 send_vport_err:
-	esw_destroy_offloads_fast_fdb_tables(esw);
+	mlx5_esw_chains_destroy(esw);
+fdb_chains_err:
 	mlx5_destroy_flow_table(esw->fdb_table.offloads.slow_fdb);
 slow_fdb_err:
 	/* Holds true only as long as DMFS is the default */
@@ -1246,8 +1034,8 @@ static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw)
 	mlx5_destroy_flow_group(esw->fdb_table.offloads.peer_miss_grp);
 	mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
 
+	mlx5_esw_chains_destroy(esw);
 	mlx5_destroy_flow_table(esw->fdb_table.offloads.slow_fdb);
-	esw_destroy_offloads_fast_fdb_tables(esw);
 	/* Holds true only as long as DMFS is the default */
 	mlx5_flow_namespace_set_mode(esw->fdb_table.offloads.ns,
 				     MLX5_FLOW_STEERING_MODE_DMFS);
@@ -2117,7 +1905,6 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw)
 		total_vports = num_vfs + MLX5_SPECIAL_VPORTS(esw->dev);
 
 	memset(&esw->fdb_table.offloads, 0, sizeof(struct offloads_fdb));
-	mutex_init(&esw->fdb_table.offloads.fdb_prio_lock);
 
 	err = esw_create_uplink_offloads_acl_tables(esw);
 	if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
new file mode 100644
index 000000000000..a694cc52d94c
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
@@ -0,0 +1,542 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2020 Mellanox Technologies.
+
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/mlx5_ifc.h>
+#include <linux/mlx5/fs.h>
+
+#include "eswitch_offloads_chains.h"
+#include "mlx5_core.h"
+#include "fs_core.h"
+#include "eswitch.h"
+#include "en.h"
+
+#define esw_chains_priv(esw) ((esw)->fdb_table.offloads.esw_chains_priv)
+#define esw_chains_lock(esw) (esw_chains_priv(esw)->lock)
+#define esw_chains_ht(esw) (esw_chains_priv(esw)->chains_ht)
+#define esw_prios_ht(esw) (esw_chains_priv(esw)->prios_ht)
+#define fdb_pool_left(esw) (esw_chains_priv(esw)->fdb_left)
+
+#define ESW_OFFLOADS_NUM_GROUPS  4
+
+/* Firmware currently has 4 pool of 4 sizes that it supports (ESW_POOLS),
+ * and a virtual memory region of 16M (ESW_SIZE), this region is duplicated
+ * for each flow table pool. We can allocate up to 16M of each pool,
+ * and we keep track of how much we used via get_next_avail_sz_from_pool.
+ * Firmware doesn't report any of this for now.
+ * ESW_POOL is expected to be sorted from large to small and match firmware
+ * pools.
+ */
+#define ESW_SIZE (16 * 1024 * 1024)
+const unsigned int ESW_POOLS[] = { 4 * 1024 * 1024,
+				   1 * 1024 * 1024,
+				   64 * 1024,
+				   4 * 1024, };
+
+struct mlx5_esw_chains_priv {
+	struct rhashtable chains_ht;
+	struct rhashtable prios_ht;
+	/* Protects above chains_ht and prios_ht */
+	struct mutex lock;
+
+	int fdb_left[ARRAY_SIZE(ESW_POOLS)];
+};
+
+struct fdb_chain {
+	struct rhash_head node;
+
+	u32 chain;
+
+	int ref;
+
+	struct mlx5_eswitch *esw;
+};
+
+struct fdb_prio_key {
+	u32 chain;
+	u32 prio;
+	u32 level;
+};
+
+struct fdb_prio {
+	struct rhash_head node;
+
+	struct fdb_prio_key key;
+
+	int ref;
+
+	struct fdb_chain *fdb_chain;
+	struct mlx5_flow_table *fdb;
+};
+
+static const struct rhashtable_params chain_params = {
+	.head_offset = offsetof(struct fdb_chain, node),
+	.key_offset = offsetof(struct fdb_chain, chain),
+	.key_len = sizeof_field(struct fdb_chain, chain),
+	.automatic_shrinking = true,
+};
+
+static const struct rhashtable_params prio_params = {
+	.head_offset = offsetof(struct fdb_prio, node),
+	.key_offset = offsetof(struct fdb_prio, key),
+	.key_len = sizeof_field(struct fdb_prio, key),
+	.automatic_shrinking = true,
+};
+
+bool mlx5_esw_chains_prios_supported(struct mlx5_eswitch *esw)
+{
+	return esw->fdb_table.flags & ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
+}
+
+u32 mlx5_esw_chains_get_chain_range(struct mlx5_eswitch *esw)
+{
+	if (!mlx5_esw_chains_prios_supported(esw))
+		return 1;
+
+	return FDB_TC_MAX_CHAIN;
+}
+
+u32 mlx5_esw_chains_get_ft_chain(struct mlx5_eswitch *esw)
+{
+	return mlx5_esw_chains_get_chain_range(esw) + 1;
+}
+
+u32 mlx5_esw_chains_get_prio_range(struct mlx5_eswitch *esw)
+{
+	if (!mlx5_esw_chains_prios_supported(esw))
+		return 1;
+
+	return FDB_TC_MAX_PRIO;
+}
+
+static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
+{
+	return FDB_TC_LEVELS_PER_PRIO;
+}
+
+#define POOL_NEXT_SIZE 0
+static int
+mlx5_esw_chains_get_avail_sz_from_pool(struct mlx5_eswitch *esw,
+				       int desired_size)
+{
+	int i, found_i = -1;
+
+	for (i = ARRAY_SIZE(ESW_POOLS) - 1; i >= 0; i--) {
+		if (fdb_pool_left(esw)[i] && ESW_POOLS[i] > desired_size) {
+			found_i = i;
+			if (desired_size != POOL_NEXT_SIZE)
+				break;
+		}
+	}
+
+	if (found_i != -1) {
+		--fdb_pool_left(esw)[found_i];
+		return ESW_POOLS[found_i];
+	}
+
+	return 0;
+}
+
+static void
+mlx5_esw_chains_put_sz_to_pool(struct mlx5_eswitch *esw, int sz)
+{
+	int i;
+
+	for (i = ARRAY_SIZE(ESW_POOLS) - 1; i >= 0; i--) {
+		if (sz == ESW_POOLS[i]) {
+			++fdb_pool_left(esw)[i];
+			return;
+		}
+	}
+
+	WARN_ONCE(1, "Couldn't find size %d in fdb size pool", sz);
+}
+
+static void
+mlx5_esw_chains_init_sz_pool(struct mlx5_eswitch *esw)
+{
+	u32 fdb_max;
+	int i;
+
+	fdb_max = 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, log_max_ft_size);
+
+	for (i = ARRAY_SIZE(ESW_POOLS) - 1; i >= 0; i--)
+		fdb_pool_left(esw)[i] =
+			ESW_POOLS[i] <= fdb_max ? ESW_SIZE / ESW_POOLS[i] : 0;
+}
+
+static struct mlx5_flow_table *
+mlx5_esw_chains_create_fdb_table(struct mlx5_eswitch *esw,
+				 u32 chain, u32 prio, u32 level)
+{
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_flow_namespace *ns;
+	struct mlx5_flow_table *fdb;
+	int sz;
+
+	if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
+		ft_attr.flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
+				  MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
+
+	sz = mlx5_esw_chains_get_avail_sz_from_pool(esw, POOL_NEXT_SIZE);
+	if (!sz)
+		return ERR_PTR(-ENOSPC);
+
+	ft_attr.max_fte = sz;
+	ft_attr.level = level;
+	ft_attr.prio = prio - 1;
+	ft_attr.autogroup.max_num_groups = ESW_OFFLOADS_NUM_GROUPS;
+	ns = mlx5_get_fdb_sub_ns(esw->dev, chain);
+
+	fdb = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
+	if (IS_ERR(fdb)) {
+		esw_warn(esw->dev,
+			 "Failed to create FDB table err %d (chain: %d, prio: %d, level: %d, size: %d)\n",
+			 (int)PTR_ERR(fdb), chain, prio, level, sz);
+		mlx5_esw_chains_put_sz_to_pool(esw, sz);
+		return fdb;
+	}
+
+	return fdb;
+}
+
+static void
+mlx5_esw_chains_destroy_fdb_table(struct mlx5_eswitch *esw,
+				  struct mlx5_flow_table *fdb)
+{
+	mlx5_esw_chains_put_sz_to_pool(esw, fdb->max_fte);
+	mlx5_destroy_flow_table(fdb);
+}
+
+static struct fdb_chain *
+mlx5_esw_chains_create_fdb_chain(struct mlx5_eswitch *esw, u32 chain)
+{
+	struct fdb_chain *fdb_chain = NULL;
+	int err;
+
+	fdb_chain = kvzalloc(sizeof(*fdb_chain), GFP_KERNEL);
+	if (!fdb_chain)
+		return ERR_PTR(-ENOMEM);
+
+	fdb_chain->esw = esw;
+	fdb_chain->chain = chain;
+
+	err = rhashtable_insert_fast(&esw_chains_ht(esw), &fdb_chain->node,
+				     chain_params);
+	if (err)
+		goto err_insert;
+
+	return fdb_chain;
+
+err_insert:
+	kvfree(fdb_chain);
+	return ERR_PTR(err);
+}
+
+static void
+mlx5_esw_chains_destroy_fdb_chain(struct fdb_chain *fdb_chain)
+{
+	struct mlx5_eswitch *esw = fdb_chain->esw;
+
+	rhashtable_remove_fast(&esw_chains_ht(esw), &fdb_chain->node,
+			       chain_params);
+	kvfree(fdb_chain);
+}
+
+static struct fdb_chain *
+mlx5_esw_chains_get_fdb_chain(struct mlx5_eswitch *esw, u32 chain)
+{
+	struct fdb_chain *fdb_chain;
+
+	fdb_chain = rhashtable_lookup_fast(&esw_chains_ht(esw), &chain,
+					   chain_params);
+	if (!fdb_chain) {
+		fdb_chain = mlx5_esw_chains_create_fdb_chain(esw, chain);
+		if (IS_ERR(fdb_chain))
+			return fdb_chain;
+	}
+
+	fdb_chain->ref++;
+
+	return fdb_chain;
+}
+
+static void
+mlx5_esw_chains_put_fdb_chain(struct fdb_chain *fdb_chain)
+{
+	if (--fdb_chain->ref == 0)
+		mlx5_esw_chains_destroy_fdb_chain(fdb_chain);
+}
+
+static struct fdb_prio *
+mlx5_esw_chains_create_fdb_prio(struct mlx5_eswitch *esw,
+				u32 chain, u32 prio, u32 level)
+{
+	struct fdb_prio *fdb_prio = NULL;
+	struct fdb_chain *fdb_chain;
+	struct mlx5_flow_table *fdb;
+	int err;
+
+	fdb_chain = mlx5_esw_chains_get_fdb_chain(esw, chain);
+	if (IS_ERR(fdb_chain))
+		return ERR_CAST(fdb_chain);
+
+	fdb_prio = kvzalloc(sizeof(*fdb_prio), GFP_KERNEL);
+	if (!fdb_prio) {
+		err = -ENOMEM;
+		goto err_alloc;
+	}
+
+	fdb = mlx5_esw_chains_create_fdb_table(esw, fdb_chain->chain, prio,
+					       level);
+	if (IS_ERR(fdb)) {
+		err = PTR_ERR(fdb);
+		goto err_create;
+	}
+
+	fdb_prio->fdb_chain = fdb_chain;
+	fdb_prio->key.chain = chain;
+	fdb_prio->key.prio = prio;
+	fdb_prio->key.level = level;
+	fdb_prio->fdb = fdb;
+
+	err = rhashtable_insert_fast(&esw_prios_ht(esw), &fdb_prio->node,
+				     prio_params);
+	if (err)
+		goto err_insert;
+
+	return fdb_prio;
+
+err_insert:
+	mlx5_esw_chains_destroy_fdb_table(esw, fdb);
+err_create:
+	kvfree(fdb_prio);
+err_alloc:
+	mlx5_esw_chains_put_fdb_chain(fdb_chain);
+	return ERR_PTR(err);
+}
+
+static void
+mlx5_esw_chains_destroy_fdb_prio(struct mlx5_eswitch *esw,
+				 struct fdb_prio *fdb_prio)
+{
+	struct fdb_chain *fdb_chain = fdb_prio->fdb_chain;
+
+	rhashtable_remove_fast(&esw_prios_ht(esw), &fdb_prio->node,
+			       prio_params);
+	mlx5_esw_chains_destroy_fdb_table(esw, fdb_prio->fdb);
+	mlx5_esw_chains_put_fdb_chain(fdb_chain);
+	kvfree(fdb_prio);
+}
+
+struct mlx5_flow_table *
+mlx5_esw_chains_get_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
+			  u32 level)
+{
+	struct mlx5_flow_table *prev_fts;
+	struct fdb_prio *fdb_prio;
+	struct fdb_prio_key key;
+	int l = 0;
+
+	if ((chain > mlx5_esw_chains_get_chain_range(esw) &&
+	     chain != mlx5_esw_chains_get_ft_chain(esw)) ||
+	    prio > mlx5_esw_chains_get_prio_range(esw) ||
+	    level > mlx5_esw_chains_get_level_range(esw))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* create earlier levels for correct fs_core lookup when
+	 * connecting tables.
+	 */
+	for (l = 0; l < level; l++) {
+		prev_fts = mlx5_esw_chains_get_table(esw, chain, prio, l);
+		if (IS_ERR(prev_fts)) {
+			fdb_prio = ERR_CAST(prev_fts);
+			goto err_get_prevs;
+		}
+	}
+
+	key.chain = chain;
+	key.prio = prio;
+	key.level = level;
+
+	mutex_lock(&esw_chains_lock(esw));
+	fdb_prio = rhashtable_lookup_fast(&esw_prios_ht(esw), &key,
+					  prio_params);
+	if (!fdb_prio) {
+		fdb_prio = mlx5_esw_chains_create_fdb_prio(esw, chain,
+							   prio, level);
+		if (IS_ERR(fdb_prio))
+			goto err_create_prio;
+	}
+
+	++fdb_prio->ref;
+	mutex_unlock(&esw_chains_lock(esw));
+
+	return fdb_prio->fdb;
+
+err_create_prio:
+	mutex_unlock(&esw_chains_lock(esw));
+err_get_prevs:
+	while (--l >= 0)
+		mlx5_esw_chains_put_table(esw, chain, prio, l);
+	return ERR_CAST(fdb_prio);
+}
+
+void
+mlx5_esw_chains_put_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
+			  u32 level)
+{
+	struct fdb_prio *fdb_prio;
+	struct fdb_prio_key key;
+
+	key.chain = chain;
+	key.prio = prio;
+	key.level = level;
+
+	mutex_lock(&esw_chains_lock(esw));
+	fdb_prio = rhashtable_lookup_fast(&esw_prios_ht(esw), &key,
+					  prio_params);
+	if (!fdb_prio)
+		goto err_get_prio;
+
+	if (--fdb_prio->ref == 0)
+		mlx5_esw_chains_destroy_fdb_prio(esw, fdb_prio);
+	mutex_unlock(&esw_chains_lock(esw));
+
+	while (level-- > 0)
+		mlx5_esw_chains_put_table(esw, chain, prio, level);
+
+	return;
+
+err_get_prio:
+	mutex_unlock(&esw_chains_lock(esw));
+	WARN_ONCE(1,
+		  "Couldn't find table: (chain: %d prio: %d level: %d)",
+		  chain, prio, level);
+}
+
+static int
+mlx5_esw_chains_init(struct mlx5_eswitch *esw)
+{
+	struct mlx5_esw_chains_priv *chains_priv;
+	struct mlx5_core_dev *dev = esw->dev;
+	u32 max_flow_counter, fdb_max;
+	int err;
+
+	chains_priv = kzalloc(sizeof(*chains_priv), GFP_KERNEL);
+	if (!chains_priv)
+		return -ENOMEM;
+	esw_chains_priv(esw) = chains_priv;
+
+	max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) |
+			    MLX5_CAP_GEN(dev, max_flow_counter_15_0);
+	fdb_max = 1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size);
+
+	esw_debug(dev,
+		  "Init esw offloads chains, max counters(%d), groups(%d), max flow table size(%d)\n",
+		  max_flow_counter, ESW_OFFLOADS_NUM_GROUPS, fdb_max);
+
+	mlx5_esw_chains_init_sz_pool(esw);
+
+	if (!MLX5_CAP_ESW_FLOWTABLE(esw->dev, multi_fdb_encap) &&
+	    esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE) {
+		esw->fdb_table.flags &= ~ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
+		esw_warn(dev, "Tc chains and priorities offload aren't supported, update firmware if needed\n");
+	} else {
+		esw->fdb_table.flags |= ESW_FDB_CHAINS_AND_PRIOS_SUPPORTED;
+		esw_info(dev, "Supported tc offload range - chains: %u, prios: %u\n",
+			 mlx5_esw_chains_get_chain_range(esw),
+			 mlx5_esw_chains_get_prio_range(esw));
+	}
+
+	err = rhashtable_init(&esw_chains_ht(esw), &chain_params);
+	if (err)
+		goto init_chains_ht_err;
+
+	err = rhashtable_init(&esw_prios_ht(esw), &prio_params);
+	if (err)
+		goto init_prios_ht_err;
+
+	mutex_init(&esw_chains_lock(esw));
+
+	return 0;
+
+init_prios_ht_err:
+	rhashtable_destroy(&esw_chains_ht(esw));
+init_chains_ht_err:
+	kfree(chains_priv);
+	return err;
+}
+
+static void
+mlx5_esw_chains_cleanup(struct mlx5_eswitch *esw)
+{
+	mutex_destroy(&esw_chains_lock(esw));
+	rhashtable_destroy(&esw_prios_ht(esw));
+	rhashtable_destroy(&esw_chains_ht(esw));
+
+	kfree(esw_chains_priv(esw));
+}
+
+static int
+mlx5_esw_chains_open(struct mlx5_eswitch *esw)
+{
+	struct mlx5_flow_table *ft;
+	int err;
+
+	/* Always open the root for fast path */
+	ft = mlx5_esw_chains_get_table(esw, 0, 1, 0);
+	if (IS_ERR(ft))
+		return PTR_ERR(ft);
+
+	/* Open level 1 for split rules now if prios isn't supported  */
+	if (!mlx5_esw_chains_prios_supported(esw)) {
+		ft = mlx5_esw_chains_get_table(esw, 0, 1, 1);
+
+		if (IS_ERR(ft)) {
+			err = PTR_ERR(ft);
+			goto level_1_err;
+		}
+	}
+
+	return 0;
+
+level_1_err:
+	mlx5_esw_chains_put_table(esw, 0, 1, 0);
+	return err;
+}
+
+static void
+mlx5_esw_chains_close(struct mlx5_eswitch *esw)
+{
+	if (!mlx5_esw_chains_prios_supported(esw))
+		mlx5_esw_chains_put_table(esw, 0, 1, 1);
+	mlx5_esw_chains_put_table(esw, 0, 1, 0);
+}
+
+int
+mlx5_esw_chains_create(struct mlx5_eswitch *esw)
+{
+	int err;
+
+	err = mlx5_esw_chains_init(esw);
+	if (err)
+		return err;
+
+	err = mlx5_esw_chains_open(esw);
+	if (err)
+		goto err_open;
+
+	return 0;
+
+err_open:
+	mlx5_esw_chains_cleanup(esw);
+	return err;
+}
+
+void
+mlx5_esw_chains_destroy(struct mlx5_eswitch *esw)
+{
+	mlx5_esw_chains_close(esw);
+	mlx5_esw_chains_cleanup(esw);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
new file mode 100644
index 000000000000..52fadacab84d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies. */
+
+#ifndef __ML5_ESW_CHAINS_H__
+#define __ML5_ESW_CHAINS_H__
+
+bool
+mlx5_esw_chains_prios_supported(struct mlx5_eswitch *esw);
+u32
+mlx5_esw_chains_get_prio_range(struct mlx5_eswitch *esw);
+u32
+mlx5_esw_chains_get_chain_range(struct mlx5_eswitch *esw);
+u32
+mlx5_esw_chains_get_ft_chain(struct mlx5_eswitch *esw);
+
+struct mlx5_flow_table *
+mlx5_esw_chains_get_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
+			  u32 level);
+void
+mlx5_esw_chains_put_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
+			  u32 level);
+
+int mlx5_esw_chains_create(struct mlx5_eswitch *esw);
+void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw);
+
+#endif /* __ML5_ESW_CHAINS_H__ */
+
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net-next 16/16] net/mlx5: E-Switch, Increase number of chains and priorities
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (14 preceding siblings ...)
  2020-01-17  0:07 ` [net-next 15/16] net/mlx5: E-Switch, Refactor chains and priorities Saeed Mahameed
@ 2020-01-17  0:07 ` Saeed Mahameed
  2020-01-19 15:17 ` [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios David Miller
  16 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2020-01-17  0:07 UTC (permalink / raw)
  To: David S. Miller, kuba
  Cc: netdev, Paul Blakey, Roi Dayan, Oz Shlomo, Mark Bloch, Saeed Mahameed

From: Paul Blakey <paulb@mellanox.com>

Increase the number of chains and priorities to support
the whole range available in tc.

We use unmanaged tables and ignore flow level to create more
tables than what we declared to fs_core steering, and we manage
the connections between the tables themselves.

To support that we need FW with ignore_flow_level capability.
Otherwise the old behaviour will be used, where we are limited
by the number of levels we declared (4 chains, 16 prios).

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     |   3 +-
 .../mlx5/core/eswitch_offloads_chains.c       | 238 +++++++++++++++++-
 .../mlx5/core/eswitch_offloads_chains.h       |   3 +
 3 files changed, 232 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 47f8729197e0..a6d0b62ef234 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -151,7 +151,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 		if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) {
 			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
 			dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
-			dest[i].ft = esw->fdb_table.offloads.slow_fdb;
+			dest[i].ft = mlx5_esw_chains_get_tc_end_ft(esw);
 			i++;
 		} else if (attr->dest_chain) {
 			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
@@ -275,6 +275,7 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 	if (attr->outer_match_level != MLX5_MATCH_NONE)
 		spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
 
+	flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
 	rule = mlx5_add_flow_rules(fast_fdb, spec, &flow_act, dest, i);
 
 	if (IS_ERR(rule))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
index a694cc52d94c..3a60eb5360bd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
@@ -16,6 +16,10 @@
 #define esw_chains_ht(esw) (esw_chains_priv(esw)->chains_ht)
 #define esw_prios_ht(esw) (esw_chains_priv(esw)->prios_ht)
 #define fdb_pool_left(esw) (esw_chains_priv(esw)->fdb_left)
+#define tc_slow_fdb(esw) ((esw)->fdb_table.offloads.slow_fdb)
+#define tc_end_fdb(esw) (esw_chains_priv(esw)->tc_end_fdb)
+#define fdb_ignore_flow_level_supported(esw) \
+	(MLX5_CAP_ESW_FLOWTABLE_FDB((esw)->dev, ignore_flow_level))
 
 #define ESW_OFFLOADS_NUM_GROUPS  4
 
@@ -39,6 +43,8 @@ struct mlx5_esw_chains_priv {
 	/* Protects above chains_ht and prios_ht */
 	struct mutex lock;
 
+	struct mlx5_flow_table *tc_end_fdb;
+
 	int fdb_left[ARRAY_SIZE(ESW_POOLS)];
 };
 
@@ -50,6 +56,7 @@ struct fdb_chain {
 	int ref;
 
 	struct mlx5_eswitch *esw;
+	struct list_head prios_list;
 };
 
 struct fdb_prio_key {
@@ -60,6 +67,7 @@ struct fdb_prio_key {
 
 struct fdb_prio {
 	struct rhash_head node;
+	struct list_head list;
 
 	struct fdb_prio_key key;
 
@@ -67,6 +75,9 @@ struct fdb_prio {
 
 	struct fdb_chain *fdb_chain;
 	struct mlx5_flow_table *fdb;
+	struct mlx5_flow_table *next_fdb;
+	struct mlx5_flow_group *miss_group;
+	struct mlx5_flow_handle *miss_rule;
 };
 
 static const struct rhashtable_params chain_params = {
@@ -93,6 +104,9 @@ u32 mlx5_esw_chains_get_chain_range(struct mlx5_eswitch *esw)
 	if (!mlx5_esw_chains_prios_supported(esw))
 		return 1;
 
+	if (fdb_ignore_flow_level_supported(esw))
+		return UINT_MAX - 1;
+
 	return FDB_TC_MAX_CHAIN;
 }
 
@@ -106,11 +120,17 @@ u32 mlx5_esw_chains_get_prio_range(struct mlx5_eswitch *esw)
 	if (!mlx5_esw_chains_prios_supported(esw))
 		return 1;
 
+	if (fdb_ignore_flow_level_supported(esw))
+		return UINT_MAX;
+
 	return FDB_TC_MAX_PRIO;
 }
 
 static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 {
+	if (fdb_ignore_flow_level_supported(esw))
+		return UINT_MAX;
+
 	return FDB_TC_LEVELS_PER_PRIO;
 }
 
@@ -181,13 +201,40 @@ mlx5_esw_chains_create_fdb_table(struct mlx5_eswitch *esw,
 	sz = mlx5_esw_chains_get_avail_sz_from_pool(esw, POOL_NEXT_SIZE);
 	if (!sz)
 		return ERR_PTR(-ENOSPC);
-
 	ft_attr.max_fte = sz;
-	ft_attr.level = level;
-	ft_attr.prio = prio - 1;
-	ft_attr.autogroup.max_num_groups = ESW_OFFLOADS_NUM_GROUPS;
-	ns = mlx5_get_fdb_sub_ns(esw->dev, chain);
 
+	/* We use tc_slow_fdb(esw) as the table's next_ft till
+	 * ignore_flow_level is allowed on FT creation and not just for FTEs.
+	 * Instead caller should add an explicit miss rule if needed.
+	 */
+	ft_attr.next_ft = tc_slow_fdb(esw);
+
+	/* The root table(chain 0, prio 1, level 0) is required to be
+	 * connected to the previous prio (FDB_BYPASS_PATH if exists).
+	 * We always create it, as a managed table, in order to align with
+	 * fs_core logic.
+	 */
+	if (!fdb_ignore_flow_level_supported(esw) ||
+	    (chain == 0 && prio == 1 && level == 0)) {
+		ft_attr.level = level;
+		ft_attr.prio = prio - 1;
+		ns = mlx5_get_fdb_sub_ns(esw->dev, chain);
+	} else {
+		ft_attr.flags |= MLX5_FLOW_TABLE_UNMANAGED;
+		ft_attr.prio = FDB_TC_OFFLOAD;
+		/* Firmware doesn't allow us to create another level 0 table,
+		 * so we create all unmanaged tables as level 1.
+		 *
+		 * To connect them, we use explicit miss rules with
+		 * ignore_flow_level. Caller is responsible to create
+		 * these rules (if needed).
+		 */
+		ft_attr.level = 1;
+		ns = mlx5_get_flow_namespace(esw->dev, MLX5_FLOW_NAMESPACE_FDB);
+	}
+
+	ft_attr.autogroup.num_reserved_entries = 2;
+	ft_attr.autogroup.max_num_groups = ESW_OFFLOADS_NUM_GROUPS;
 	fdb = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
 	if (IS_ERR(fdb)) {
 		esw_warn(esw->dev,
@@ -220,6 +267,7 @@ mlx5_esw_chains_create_fdb_chain(struct mlx5_eswitch *esw, u32 chain)
 
 	fdb_chain->esw = esw;
 	fdb_chain->chain = chain;
+	INIT_LIST_HEAD(&fdb_chain->prios_list);
 
 	err = rhashtable_insert_fast(&esw_chains_ht(esw), &fdb_chain->node,
 				     chain_params);
@@ -261,6 +309,79 @@ mlx5_esw_chains_get_fdb_chain(struct mlx5_eswitch *esw, u32 chain)
 	return fdb_chain;
 }
 
+static struct mlx5_flow_handle *
+mlx5_esw_chains_add_miss_rule(struct mlx5_flow_table *fdb,
+			      struct mlx5_flow_table *next_fdb)
+{
+	static const struct mlx5_flow_spec spec = {};
+	struct mlx5_flow_destination dest = {};
+	struct mlx5_flow_act act = {};
+
+	act.flags  = FLOW_ACT_IGNORE_FLOW_LEVEL | FLOW_ACT_NO_APPEND;
+	act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	dest.type  = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+	dest.ft = next_fdb;
+
+	return mlx5_add_flow_rules(fdb, &spec, &act, &dest, 1);
+}
+
+static int
+mlx5_esw_chains_update_prio_prevs(struct fdb_prio *fdb_prio,
+				  struct mlx5_flow_table *next_fdb)
+{
+	struct mlx5_flow_handle *miss_rules[FDB_TC_LEVELS_PER_PRIO + 1] = {};
+	struct fdb_chain *fdb_chain = fdb_prio->fdb_chain;
+	struct fdb_prio *pos;
+	int n = 0, err;
+
+	if (fdb_prio->key.level)
+		return 0;
+
+	/* Iterate in reverse order until reaching the level 0 rule of
+	 * the previous priority, adding all the miss rules first, so we can
+	 * revert them if any of them fails.
+	 */
+	pos = fdb_prio;
+	list_for_each_entry_continue_reverse(pos,
+					     &fdb_chain->prios_list,
+					     list) {
+		miss_rules[n] = mlx5_esw_chains_add_miss_rule(pos->fdb,
+							      next_fdb);
+		if (IS_ERR(miss_rules[n])) {
+			err = PTR_ERR(miss_rules[n]);
+			goto err_prev_rule;
+		}
+
+		n++;
+		if (!pos->key.level)
+			break;
+	}
+
+	/* Success, delete old miss rules, and update the pointers. */
+	n = 0;
+	pos = fdb_prio;
+	list_for_each_entry_continue_reverse(pos,
+					     &fdb_chain->prios_list,
+					     list) {
+		mlx5_del_flow_rules(pos->miss_rule);
+
+		pos->miss_rule = miss_rules[n];
+		pos->next_fdb = next_fdb;
+
+		n++;
+		if (!pos->key.level)
+			break;
+	}
+
+	return 0;
+
+err_prev_rule:
+	while (--n >= 0)
+		mlx5_del_flow_rules(miss_rules[n]);
+
+	return err;
+}
+
 static void
 mlx5_esw_chains_put_fdb_chain(struct fdb_chain *fdb_chain)
 {
@@ -272,9 +393,15 @@ static struct fdb_prio *
 mlx5_esw_chains_create_fdb_prio(struct mlx5_eswitch *esw,
 				u32 chain, u32 prio, u32 level)
 {
+	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+	struct mlx5_flow_handle *miss_rule = NULL;
+	struct mlx5_flow_group *miss_group;
 	struct fdb_prio *fdb_prio = NULL;
+	struct mlx5_flow_table *next_fdb;
 	struct fdb_chain *fdb_chain;
 	struct mlx5_flow_table *fdb;
+	struct list_head *pos;
+	u32 *flow_group_in;
 	int err;
 
 	fdb_chain = mlx5_esw_chains_get_fdb_chain(esw, chain);
@@ -282,18 +409,65 @@ mlx5_esw_chains_create_fdb_prio(struct mlx5_eswitch *esw,
 		return ERR_CAST(fdb_chain);
 
 	fdb_prio = kvzalloc(sizeof(*fdb_prio), GFP_KERNEL);
-	if (!fdb_prio) {
+	flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+	if (!fdb_prio || !flow_group_in) {
 		err = -ENOMEM;
 		goto err_alloc;
 	}
 
-	fdb = mlx5_esw_chains_create_fdb_table(esw, fdb_chain->chain, prio,
-					       level);
+	/* Chain's prio list is sorted by prio and level.
+	 * And all levels of some prio point to the next prio's level 0.
+	 * Example list (prio, level):
+	 * (3,0)->(3,1)->(5,0)->(5,1)->(6,1)->(7,0)
+	 * In hardware, we will we have the following pointers:
+	 * (3,0) -> (5,0) -> (7,0) -> Slow path
+	 * (3,1) -> (5,0)
+	 * (5,1) -> (7,0)
+	 * (6,1) -> (7,0)
+	 */
+
+	/* Default miss for each chain: */
+	next_fdb = (chain == mlx5_esw_chains_get_ft_chain(esw)) ?
+		    tc_slow_fdb(esw) :
+		    tc_end_fdb(esw);
+	list_for_each(pos, &fdb_chain->prios_list) {
+		struct fdb_prio *p = list_entry(pos, struct fdb_prio, list);
+
+		/* exit on first pos that is larger */
+		if (prio < p->key.prio || (prio == p->key.prio &&
+					   level < p->key.level)) {
+			/* Get next level 0 table */
+			next_fdb = p->key.level == 0 ? p->fdb : p->next_fdb;
+			break;
+		}
+	}
+
+	fdb = mlx5_esw_chains_create_fdb_table(esw, chain, prio, level);
 	if (IS_ERR(fdb)) {
 		err = PTR_ERR(fdb);
 		goto err_create;
 	}
 
+	MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index,
+		 fdb->max_fte - 2);
+	MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index,
+		 fdb->max_fte - 1);
+	miss_group = mlx5_create_flow_group(fdb, flow_group_in);
+	if (IS_ERR(miss_group)) {
+		err = PTR_ERR(miss_group);
+		goto err_group;
+	}
+
+	/* Add miss rule to next_fdb */
+	miss_rule = mlx5_esw_chains_add_miss_rule(fdb, next_fdb);
+	if (IS_ERR(miss_rule)) {
+		err = PTR_ERR(miss_rule);
+		goto err_miss_rule;
+	}
+
+	fdb_prio->miss_group = miss_group;
+	fdb_prio->miss_rule = miss_rule;
+	fdb_prio->next_fdb = next_fdb;
 	fdb_prio->fdb_chain = fdb_chain;
 	fdb_prio->key.chain = chain;
 	fdb_prio->key.prio = prio;
@@ -305,13 +479,30 @@ mlx5_esw_chains_create_fdb_prio(struct mlx5_eswitch *esw,
 	if (err)
 		goto err_insert;
 
+	list_add(&fdb_prio->list, pos->prev);
+
+	/* Table is ready, connect it */
+	err = mlx5_esw_chains_update_prio_prevs(fdb_prio, fdb);
+	if (err)
+		goto err_update;
+
+	kvfree(flow_group_in);
 	return fdb_prio;
 
+err_update:
+	list_del(&fdb_prio->list);
+	rhashtable_remove_fast(&esw_prios_ht(esw), &fdb_prio->node,
+			       prio_params);
 err_insert:
+	mlx5_del_flow_rules(miss_rule);
+err_miss_rule:
+	mlx5_destroy_flow_group(miss_group);
+err_group:
 	mlx5_esw_chains_destroy_fdb_table(esw, fdb);
 err_create:
-	kvfree(fdb_prio);
 err_alloc:
+	kvfree(fdb_prio);
+	kvfree(flow_group_in);
 	mlx5_esw_chains_put_fdb_chain(fdb_chain);
 	return ERR_PTR(err);
 }
@@ -322,8 +513,14 @@ mlx5_esw_chains_destroy_fdb_prio(struct mlx5_eswitch *esw,
 {
 	struct fdb_chain *fdb_chain = fdb_prio->fdb_chain;
 
+	WARN_ON(mlx5_esw_chains_update_prio_prevs(fdb_prio,
+						  fdb_prio->next_fdb));
+
+	list_del(&fdb_prio->list);
 	rhashtable_remove_fast(&esw_prios_ht(esw), &fdb_prio->node,
 			       prio_params);
+	mlx5_del_flow_rules(fdb_prio->miss_rule);
+	mlx5_destroy_flow_group(fdb_prio->miss_group);
 	mlx5_esw_chains_destroy_fdb_table(esw, fdb_prio->fdb);
 	mlx5_esw_chains_put_fdb_chain(fdb_chain);
 	kvfree(fdb_prio);
@@ -415,6 +612,12 @@ mlx5_esw_chains_put_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
 		  chain, prio, level);
 }
 
+struct mlx5_flow_table *
+mlx5_esw_chains_get_tc_end_ft(struct mlx5_eswitch *esw)
+{
+	return tc_end_fdb(esw);
+}
+
 static int
 mlx5_esw_chains_init(struct mlx5_eswitch *esw)
 {
@@ -484,11 +687,21 @@ mlx5_esw_chains_open(struct mlx5_eswitch *esw)
 	struct mlx5_flow_table *ft;
 	int err;
 
-	/* Always open the root for fast path */
-	ft = mlx5_esw_chains_get_table(esw, 0, 1, 0);
+	/* Create tc_end_fdb(esw) which is the always created ft chain */
+	ft = mlx5_esw_chains_get_table(esw, mlx5_esw_chains_get_ft_chain(esw),
+				       1, 0);
 	if (IS_ERR(ft))
 		return PTR_ERR(ft);
 
+	tc_end_fdb(esw) = ft;
+
+	/* Always open the root for fast path */
+	ft = mlx5_esw_chains_get_table(esw, 0, 1, 0);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		goto level_0_err;
+	}
+
 	/* Open level 1 for split rules now if prios isn't supported  */
 	if (!mlx5_esw_chains_prios_supported(esw)) {
 		ft = mlx5_esw_chains_get_table(esw, 0, 1, 1);
@@ -503,6 +716,8 @@ mlx5_esw_chains_open(struct mlx5_eswitch *esw)
 
 level_1_err:
 	mlx5_esw_chains_put_table(esw, 0, 1, 0);
+level_0_err:
+	mlx5_esw_chains_put_table(esw, mlx5_esw_chains_get_ft_chain(esw), 1, 0);
 	return err;
 }
 
@@ -512,6 +727,7 @@ mlx5_esw_chains_close(struct mlx5_eswitch *esw)
 	if (!mlx5_esw_chains_prios_supported(esw))
 		mlx5_esw_chains_put_table(esw, 0, 1, 1);
 	mlx5_esw_chains_put_table(esw, 0, 1, 0);
+	mlx5_esw_chains_put_table(esw, mlx5_esw_chains_get_ft_chain(esw), 1, 0);
 }
 
 int
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
index 52fadacab84d..2e13097fe348 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
@@ -20,6 +20,9 @@ void
 mlx5_esw_chains_put_table(struct mlx5_eswitch *esw, u32 chain, u32 prio,
 			  u32 level);
 
+struct mlx5_flow_table *
+mlx5_esw_chains_get_tc_end_ft(struct mlx5_eswitch *esw);
+
 int mlx5_esw_chains_create(struct mlx5_eswitch *esw);
 void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios
  2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
                   ` (15 preceding siblings ...)
  2020-01-17  0:07 ` [net-next 16/16] net/mlx5: E-Switch, Increase number of " Saeed Mahameed
@ 2020-01-19 15:17 ` David Miller
  16 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2020-01-19 15:17 UTC (permalink / raw)
  To: saeedm; +Cc: kuba, netdev

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Fri, 17 Jan 2020 00:06:50 +0000

> This series has two parts, 
> 
> 1) A merge commit with mlx5-next branch that include updates for mlx5
> HW layouts needed for this and upcoming submissions. 
> 
> 2) From Paul, Increase the number of chains and prios
> 
> Currently the Mellanox driver supports offloading tc rules that
> are defined on the first 4 chains and the first 16 priorities.
> The restriction stems from the firmware flow level enforcement
> requiring a flow table of a certain level to point to a flow
> table of a higher level. This limitation may be ignored by setting
> the ignore_flow_level bit when creating flow table entries.
> Use unmanaged tables and ignore flow level to create more tables than
> declared by fs_core steering. Manually manage the connections between the
> tables themselves.
> 
> HW table is instantiated for every tc <chain,prio> tuple. The miss rule
> of every table either jumps to the next <chain,prio> table, or continues
> to slow_fdb. This logic is realized by following this sequence:
> 
> 1. Create an auto-grouped flow table for the specified priority with
>     reserved entries
> 
> Reserved entries are allocated at the end of the flow table.
> Flow groups are evaluated in sequence and therefore it is guaranteed
> that the flow group defined on the last FTEs will be the last to evaluate.
> 
> Define a "match all" flow group on the reserved entries, providing
> the platform to add table miss actions.
> 
> 2. Set the miss rule action to jump to the next <chain,prio> table
>     or the slow_fdb.
> 
> 3. Link the previous priority table to point to the new table by
>     updating its miss rule.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Saeed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-01-19 15:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17  0:06 [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios Saeed Mahameed
2020-01-17  0:06 ` [mlx5-next 01/16] net/mlx5: Add structures layout for new MCAM access reg groups Saeed Mahameed
2020-01-17  0:06 ` [mlx5-next 02/16] net/mlx5: Read MCAM register groups 1 and 2 Saeed Mahameed
2020-01-17  0:06 ` [mlx5-next 03/16] net/mlx5: Add structures and defines for MIRC register Saeed Mahameed
2020-01-17  0:06 ` [mlx5-next 04/16] net/mlx5: Expose resource dump register mapping Saeed Mahameed
2020-01-17  0:07 ` [mlx5-next 05/16] net/mlx5: Add copy header action struct layout Saeed Mahameed
2020-01-17  0:07 ` [mlx5-next 06/16] net/mlx5: Add mlx5_ifc definitions for connection tracking support Saeed Mahameed
2020-01-17  0:07 ` [mlx5-next 07/16] net/mlx5e: Expose FEC feilds and related capability bit Saeed Mahameed
2020-01-17  0:07 ` [mlx5-next 08/16] net/mlx5e: Add discard counters per priority Saeed Mahameed
2020-01-17  0:07 ` [mlx5-next 09/16] net/mlx5: Refactor mlx5_create_auto_grouped_flow_table Saeed Mahameed
2020-01-17  0:07 ` [net-next 10/16] net/mlx5: fs_core: Introduce unmanaged flow tables Saeed Mahameed
2020-01-17  0:07 ` [net-next 11/16] net/mlx5: Add ignore level support fwd to table rules Saeed Mahameed
2020-01-17  0:07 ` [net-next 12/16] net/mlx5: Allow creating autogroups with reserved entries Saeed Mahameed
2020-01-17  0:07 ` [net-next 13/16] net/mlx5: ft: Use getter function to get ft chain Saeed Mahameed
2020-01-17  0:07 ` [net-next 14/16] net/mlx5: ft: Check prio and chain sanity for ft offload Saeed Mahameed
2020-01-17  0:07 ` [net-next 15/16] net/mlx5: E-Switch, Refactor chains and priorities Saeed Mahameed
2020-01-17  0:07 ` [net-next 16/16] net/mlx5: E-Switch, Increase number of " Saeed Mahameed
2020-01-19 15:17 ` [net-next 00/16][pull request] Mellanox, mlx5 E-Switch chains and prios David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).