* [PATCH mlx5-next 00/16] mlx5 next updates 2020-11-20
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky; +Cc: netdev, linux-rdma

Hi,

This series includes trivial updates to the mlx5 next branch:
1) HW definitions for upcoming features
2) Include-file and general cleanups
3) Add the upcoming BlueField-3 device ID
4) Define a flow steering priority for VDPA
5) Export missing steering APIs for ULPs; these will be used later by
   the VDPA driver to create a flow steering domain for VDPA queues
6) Minor ECPF (Embedded CPU Physical Function) improvements for BlueField

If there are no objections, this series will be applied to mlx5-next and
sent as a pull request to both the net-next and rdma-next trees.

Thanks,
Saeed.

---

Aya Levin (1):
  net/mlx5: Expose IP-in-IP TX and RX capability bits

Chris Mi (2):
  net/mlx5: Add sample offload hardware bits and structures
  net/mlx5: Add sampler destination type

Eli Cohen (2):
  net/mlx5: Add VDPA priority to NIC RX namespace
  net/mlx5: Export steering related functions

Eran Ben Elisha (1):
  net/mlx5: Add ts_cqe_to_dest_cqn related bits

Meir Lichtinger (1):
  net/mlx5: Update the list of the PCI supported devices

Muhammad Sammar (2):
  net/mlx5: Check dr mask size against mlx5_match_param size
  net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits

Parav Pandit (6):
  net/mlx5: Avoid exposing driver internal command helpers
  net/mlx5: Update the hardware interface definition for vhca state
  net/mlx5: Make API mlx5_core_is_ecpf accept const pointer
  net/mlx5: Rename peer_pf to host_pf
  net/mlx5: Enable host PF HCA after eswitch is initialized
  net/mlx5: Treat host PF vport as other (non eswitch manager) vport

Yishai Hadas (1):
  net/mlx5: Expose other function ifc bits

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  3 -
 .../mellanox/mlx5/core/diag/fs_tracepoint.c   |  3 +
 .../net/ethernet/mellanox/mlx5/core/ecpf.c    | 76 ++++++++++-----
 .../net/ethernet/mellanox/mlx5/core/ecpf.h    |  3 +
 .../mellanox/mlx5/core/esw/acl/helper.c       |  5 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 29 +++++-
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  | 57 ++++++-----
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 27 +++---
 .../net/ethernet/mellanox/mlx5/core/fs_core.h |  2 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    | 19 ++--
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  4 +
 .../ethernet/mellanox/mlx5/core/pagealloc.c   | 12 +--
 .../mellanox/mlx5/core/steering/dr_matcher.c  |  2 +-
 .../mellanox/mlx5/core/steering/dr_rule.c     |  3 +-
 .../mellanox/mlx5/core/steering/dr_types.h    |  1 +
 include/linux/mlx5/device.h                   |  8 ++
 include/linux/mlx5/driver.h                   |  8 +-
 include/linux/mlx5/fs.h                       |  7 +-
 include/linux/mlx5/mlx5_ifc.h                 | 94 +++++++++++++++++--
 include/uapi/rdma/mlx5_user_ioctl_cmds.h      |  2 +-
 20 files changed, 260 insertions(+), 105 deletions(-)

-- 
2.26.2


* [PATCH mlx5-next 01/16] net/mlx5: Add sample offload hardware bits and structures
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky; +Cc: netdev, linux-rdma, Chris Mi, Oz Shlomo

From: Chris Mi <cmi@nvidia.com>

Hardware introduces a flow sampler object for packet sampling.
Add the offload hardware bits and structures.
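
For illustration, creating the object would go through the general
object commands; a minimal sketch, assuming the usual mlx5_cmd_exec()
flow (mdev, table_type, sample_ratio and the table ids are
placeholders):

  u32 in[MLX5_ST_SZ_DW(create_sampler_obj_in)] = {};
  u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {};
  void *obj;
  int err;

  MLX5_SET(general_obj_in_cmd_hdr, in, opcode,
           MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
  MLX5_SET(general_obj_in_cmd_hdr, in, obj_type,
           MLX5_GENERAL_OBJECT_TYPES_SAMPLER);

  obj = MLX5_ADDR_OF(create_sampler_obj_in, in, sampler_object);
  MLX5_SET(sampler_obj, obj, table_type, table_type);
  MLX5_SET(sampler_obj, obj, sample_ratio, sample_ratio);
  MLX5_SET(sampler_obj, obj, sample_table_id, sample_table_id);
  MLX5_SET(sampler_obj, obj, default_table_id, default_table_id);

  err = mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
  if (!err)
      sampler_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);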

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 651591a2965d..65ea35af0527 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -10657,11 +10657,13 @@ struct mlx5_ifc_affiliated_event_header_bits {
 enum {
 	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = BIT(0xc),
 	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC = BIT(0x13),
+	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT(0x20),
 };
 
 enum {
 	MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = 0xc,
 	MLX5_GENERAL_OBJECT_TYPES_IPSEC = 0x13,
+	MLX5_GENERAL_OBJECT_TYPES_SAMPLER = 0x20,
 };
 
 enum {
@@ -10736,6 +10738,33 @@ struct mlx5_ifc_create_encryption_key_in_bits {
 	struct mlx5_ifc_encryption_key_obj_bits encryption_key_object;
 };
 
+struct mlx5_ifc_sampler_obj_bits {
+	u8         modify_field_select[0x40];
+
+	u8         table_type[0x8];
+	u8         level[0x8];
+	u8         reserved_at_50[0xf];
+	u8         ignore_flow_level[0x1];
+
+	u8         sample_ratio[0x20];
+
+	u8         reserved_at_80[0x8];
+	u8         sample_table_id[0x18];
+
+	u8         reserved_at_a0[0x8];
+	u8         default_table_id[0x18];
+
+	u8         sw_steering_icm_address_rx[0x40];
+	u8         sw_steering_icm_address_tx[0x40];
+
+	u8         reserved_at_140[0xa0];
+};
+
+struct mlx5_ifc_create_sampler_obj_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr;
+	struct mlx5_ifc_sampler_obj_bits sampler_object;
+};
+
 enum {
 	MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_KEY_SIZE_128 = 0x0,
 	MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_KEY_SIZE_256 = 0x1,
-- 
2.26.2


* [PATCH mlx5-next 02/16] net/mlx5: Add sampler destination type
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky; +Cc: netdev, linux-rdma, Chris Mi, Oz Shlomo

From: Chris Mi <cmi@nvidia.com>

The flow sampler object is a new destination type. Add a new member
to the flow destination union.
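
For illustration, a rule pointing at a sampler could then look like the
sketch below (ft, spec, flow_act and sampler_id are placeholders from
the surrounding code):

  struct mlx5_flow_destination dest = {};
  struct mlx5_flow_handle *rule;

  dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_SAMPLER;
  dest.sampler_id = sampler_id; /* object id from CREATE_GENERAL_OBJECT */
  rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);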

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c | 3 +++
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c             | 3 +++
 include/linux/mlx5/fs.h                                      | 1 +
 include/linux/mlx5/mlx5_ifc.h                                | 1 +
 4 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
index a700f3c86899..87d65f6b5310 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
@@ -247,6 +247,9 @@ const char *parse_fs_dst(struct trace_seq *p,
 	case MLX5_FLOW_DESTINATION_TYPE_TIR:
 		trace_seq_printf(p, "tir=%u\n", dst->tir_num);
 		break;
+	case MLX5_FLOW_DESTINATION_TYPE_FLOW_SAMPLER:
+		trace_seq_printf(p, "sampler_id=%u\n", dst->sampler_id);
+		break;
 	case MLX5_FLOW_DESTINATION_TYPE_COUNTER:
 		trace_seq_printf(p, "counter_id=%u\n", counter_id);
 		break;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index babe3405132a..c2fed9c3d75c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -515,6 +515,9 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
 						 dst->dest_attr.vport.pkt_reformat->id);
 				}
 				break;
+			case MLX5_FLOW_DESTINATION_TYPE_FLOW_SAMPLER:
+				id = dst->dest_attr.sampler_id;
+				break;
 			default:
 				id = dst->dest_attr.tir_num;
 			}
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 846d94ad04bc..35d2cc1646d3 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -132,6 +132,7 @@ struct mlx5_flow_destination {
 			struct mlx5_pkt_reformat *pkt_reformat;
 			u8		flags;
 		} vport;
+		u32			sampler_id;
 	};
 };
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 65ea35af0527..2f2add4bd5e1 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1616,6 +1616,7 @@ enum mlx5_flow_destination_type {
 	MLX5_FLOW_DESTINATION_TYPE_VPORT        = 0x0,
 	MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE   = 0x1,
 	MLX5_FLOW_DESTINATION_TYPE_TIR          = 0x2,
+	MLX5_FLOW_DESTINATION_TYPE_FLOW_SAMPLER = 0x6,
 
 	MLX5_FLOW_DESTINATION_TYPE_PORT         = 0x99,
 	MLX5_FLOW_DESTINATION_TYPE_COUNTER      = 0x100,
-- 
2.26.2


* [PATCH mlx5-next 03/16] net/mlx5: Check dr mask size against mlx5_match_param size
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Muhammad Sammar, Alex Vesker, Mark Bloch

From: Muhammad Sammar <muhammads@nvidia.com>

This allows passing the misc4 match param from userspace when a
function such as ib_flow_matcher_create() is called.
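
For illustration, a full-size mask is now accepted; a sketch, assuming
the SW steering API (tbl, prio, criteria and buf are placeholders):

  struct mlx5dr_match_parameters mask = {
      .match_sz  = DR_SZ_MATCH_PARAM, /* MLX5_ST_SZ_DW_MATCH_PARAM * 4 */
      .match_buf = buf,               /* full fte_match_param layout */
  };
  struct mlx5dr_matcher *matcher;

  matcher = mlx5dr_matcher_create(tbl, prio, criteria, &mask);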

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c    | 3 +--
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h   | 1 +
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
index 7df883686d46..1b3b2acd45c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
@@ -630,7 +630,7 @@ static int dr_matcher_init(struct mlx5dr_matcher *matcher,
 	}
 
 	if (mask) {
-		if (mask->match_sz > sizeof(struct mlx5dr_match_param)) {
+		if (mask->match_sz > DR_SZ_MATCH_PARAM) {
 			mlx5dr_err(dmn, "Invalid match size attribute\n");
 			return -EINVAL;
 		}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
index b3c9dc032026..6d73719db1f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
@@ -874,8 +874,7 @@ static bool dr_rule_verify(struct mlx5dr_matcher *matcher,
 	u32 s_idx, e_idx;
 
 	if (!value_size ||
-	    (value_size > sizeof(struct mlx5dr_match_param) ||
-	     (value_size % sizeof(u32)))) {
+	    (value_size > DR_SZ_MATCH_PARAM || (value_size % sizeof(u32)))) {
 		mlx5dr_err(matcher->tbl->dmn, "Rule parameters length is incorrect\n");
 		return false;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index f50f3b107aa3..937f469ec678 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -17,6 +17,7 @@
 #define WIRE_PORT 0xFFFF
 #define DR_STE_SVLAN 0x1
 #define DR_STE_CVLAN 0x2
+#define DR_SZ_MATCH_PARAM (MLX5_ST_SZ_DW_MATCH_PARAM * 4)
 
 #define mlx5dr_err(dmn, arg...) mlx5_core_err((dmn)->mdev, ##arg)
 #define mlx5dr_info(dmn, arg...) mlx5_core_info((dmn)->mdev, ##arg)
-- 
2.26.2


* [PATCH mlx5-next 04/16] net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Muhammad Sammar, Alex Vesker, Mark Bloch

From: Muhammad Sammar <muhammads@nvidia.com>

Add misc4 match params to enable matching on prog_sample_fields.
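
For illustration, matching on a programmable sample field could look
like the sketch below (spec, field_id, field_mask and field_val are
placeholders):

  void *misc4_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
                               misc_parameters_4);
  void *misc4_v = MLX5_ADDR_OF(fte_match_param, spec->match_value,
                               misc_parameters_4);

  MLX5_SET(fte_match_set_misc4, misc4_c, prog_sample_field_id_0, 0xffffffff);
  MLX5_SET(fte_match_set_misc4, misc4_v, prog_sample_field_id_0, field_id);
  MLX5_SET(fte_match_set_misc4, misc4_c, prog_sample_field_value_0, field_mask);
  MLX5_SET(fte_match_set_misc4, misc4_v, prog_sample_field_value_0, field_val);
  spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_4;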

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/fs_core.h |  2 +-
 include/linux/mlx5/device.h                   |  1 +
 include/linux/mlx5/mlx5_ifc.h                 | 25 ++++++++++++++++++-
 include/uapi/rdma/mlx5_user_ioctl_cmds.h      |  2 +-
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index afe7f0bffb93..b24a9849c45e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -194,7 +194,7 @@ struct mlx5_ft_underlay_qp {
 	u32 qpn;
 };
 
-#define MLX5_FTE_MATCH_PARAM_RESERVED	reserved_at_a00
+#define MLX5_FTE_MATCH_PARAM_RESERVED	reserved_at_c00
 /* Calculate the fte_match_param length and without the reserved length.
  * Make sure the reserved field is the last.
  */
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index cf824366a7d1..e9639c4cf2ed 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1076,6 +1076,7 @@ enum {
 	MLX5_MATCH_INNER_HEADERS	= 1 << 2,
 	MLX5_MATCH_MISC_PARAMETERS_2	= 1 << 3,
 	MLX5_MATCH_MISC_PARAMETERS_3	= 1 << 4,
+	MLX5_MATCH_MISC_PARAMETERS_4	= 1 << 5,
 };
 
 enum {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2f2add4bd5e1..11c24fafd7f2 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -623,6 +623,26 @@ struct mlx5_ifc_fte_match_set_misc3_bits {
 	u8         reserved_at_140[0xc0];
 };
 
+struct mlx5_ifc_fte_match_set_misc4_bits {
+	u8         prog_sample_field_value_0[0x20];
+
+	u8         prog_sample_field_id_0[0x20];
+
+	u8         prog_sample_field_value_1[0x20];
+
+	u8         prog_sample_field_id_1[0x20];
+
+	u8         prog_sample_field_value_2[0x20];
+
+	u8         prog_sample_field_id_2[0x20];
+
+	u8         prog_sample_field_value_3[0x20];
+
+	u8         prog_sample_field_id_3[0x20];
+
+	u8         reserved_at_100[0x100];
+};
+
 struct mlx5_ifc_cmd_pas_bits {
 	u8         pa_h[0x20];
 
@@ -1669,7 +1689,9 @@ struct mlx5_ifc_fte_match_param_bits {
 
 	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
 
-	u8         reserved_at_a00[0x600];
+	struct mlx5_ifc_fte_match_set_misc4_bits misc_parameters_4;
+
+	u8         reserved_at_c00[0x400];
 };
 
 enum {
@@ -5462,6 +5484,7 @@ enum {
 	MLX5_QUERY_FLOW_GROUP_OUT_MATCH_CRITERIA_ENABLE_INNER_HEADERS    = 0x2,
 	MLX5_QUERY_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS_2 = 0x3,
 	MLX5_QUERY_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS_3 = 0x4,
+	MLX5_QUERY_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS_4 = 0x5,
 };
 
 struct mlx5_ifc_query_flow_group_out_bits {
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index e24d66d278cf..3fd9b380a091 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -232,7 +232,7 @@ enum mlx5_ib_device_query_context_attrs {
 	MLX5_IB_ATTR_QUERY_CONTEXT_RESP_UCTX = (1U << UVERBS_ID_NS_SHIFT),
 };
 
-#define MLX5_IB_DW_MATCH_PARAM 0x80
+#define MLX5_IB_DW_MATCH_PARAM 0x90
 
 struct mlx5_ib_match_params {
 	__u32	match_params[MLX5_IB_DW_MATCH_PARAM];
-- 
2.26.2


* [PATCH mlx5-next 05/16] net/mlx5: Add ts_cqe_to_dest_cqn related bits
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Eran Ben Elisha, Tariq Toukan

From: Eran Ben Elisha <eranbe@nvidia.com>

Add a bit in the HCA capabilities layout to indicate whether
ts_cqe_to_dest_cqn is supported.

In addition, add a ts_cqe_to_dest_cqn field to the SQ context, allowing
the driver to set the actual CQN.
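
For illustration, a driver consuming these bits when creating an SQ
might do the following (sketch only; mdev, sqc and cqn are
placeholders):

  if (MLX5_CAP_GEN(mdev, ts_cqe_to_dest_cqn))
      MLX5_SET(sqc, sqc, ts_cqe_to_dest_cqn, cqn); /* CQ for TX timestamps */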

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 11c24fafd7f2..632b9a61fda5 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1261,7 +1261,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8	   ece_support[0x1];
 	u8	   reserved_at_a4[0x7];
 	u8         log_max_srq[0x5];
-	u8         reserved_at_b0[0x10];
+	u8         reserved_at_b0[0x2];
+	u8         ts_cqe_to_dest_cqn[0x1];
+	u8         reserved_at_b3[0xd];
 
 	u8         max_sgl_for_optimized_performance[0x8];
 	u8         log_max_cq_sz[0x8];
@@ -3312,8 +3314,12 @@ struct mlx5_ifc_sqc_bits {
 	u8         reserved_at_80[0x10];
 	u8         hairpin_peer_vhca[0x10];
 
-	u8         reserved_at_a0[0x50];
+	u8         reserved_at_a0[0x20];
 
+	u8         reserved_at_c0[0x8];
+	u8         ts_cqe_to_dest_cqn[0x18];
+
+	u8         reserved_at_e0[0x10];
 	u8         packet_pacing_rate_limit_index[0x10];
 	u8         tis_lst_sz[0x10];
 	u8         reserved_at_110[0x10];
-- 
2.26.2


* [PATCH mlx5-next 06/16] net/mlx5: Avoid exposing driver internal command helpers
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky; +Cc: netdev, linux-rdma, Parav Pandit

From: Parav Pandit <parav@nvidia.com>

The mlx5 command init and cleanup routines are internal to the
mlx5_core driver. Hence, avoid exporting them and move their
definitions to the driver's internal header mlx5_core.h.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c       | 3 ---
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h | 4 ++++
 include/linux/mlx5/driver.h                         | 4 ----
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index e49387dbef98..50c7b9ee80c3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -2142,7 +2142,6 @@ int mlx5_cmd_init(struct mlx5_core_dev *dev)
 	kvfree(cmd->stats);
 	return err;
 }
-EXPORT_SYMBOL(mlx5_cmd_init);
 
 void mlx5_cmd_cleanup(struct mlx5_core_dev *dev)
 {
@@ -2155,11 +2154,9 @@ void mlx5_cmd_cleanup(struct mlx5_core_dev *dev)
 	dma_pool_destroy(cmd->pool);
 	kvfree(cmd->stats);
 }
-EXPORT_SYMBOL(mlx5_cmd_cleanup);
 
 void mlx5_cmd_set_state(struct mlx5_core_dev *dev,
 			enum mlx5_cmdif_state cmdif_state)
 {
 	dev->cmd.state = cmdif_state;
 }
-EXPORT_SYMBOL(mlx5_cmd_set_state);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 8cec85ab419d..9d00efa9e6bc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -122,6 +122,10 @@ enum mlx5_semaphore_space_address {
 
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
+int mlx5_cmd_init(struct mlx5_core_dev *dev);
+void mlx5_cmd_cleanup(struct mlx5_core_dev *dev);
+void mlx5_cmd_set_state(struct mlx5_core_dev *dev,
+			enum mlx5_cmdif_state cmdif_state);
 int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id);
 int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev);
 int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index add85094f9a5..5e84b1d53650 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -888,10 +888,6 @@ enum {
 	CMD_ALLOWED_OPCODE_ALL,
 };
 
-int mlx5_cmd_init(struct mlx5_core_dev *dev);
-void mlx5_cmd_cleanup(struct mlx5_core_dev *dev);
-void mlx5_cmd_set_state(struct mlx5_core_dev *dev,
-			enum mlx5_cmdif_state cmdif_state);
 void mlx5_cmd_use_events(struct mlx5_core_dev *dev);
 void mlx5_cmd_use_polling(struct mlx5_core_dev *dev);
 void mlx5_cmd_allowed_opcode(struct mlx5_core_dev *dev, u16 opcode);
-- 
2.26.2


* [PATCH mlx5-next 07/16] net/mlx5: Update the list of the PCI supported devices
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Meir Lichtinger, Eran Ben Elisha, Tariq Toukan

From: Meir Lichtinger <meirl@nvidia.com>

Add the upcoming BlueField-3 device ID.

Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 8ff207aa1479..a9757ccb9d16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1594,6 +1594,7 @@ static const struct pci_device_id mlx5_core_pci_table[] = {
 	{ PCI_VDEVICE(MELLANOX, 0xa2d2) },			/* BlueField integrated ConnectX-5 network controller */
 	{ PCI_VDEVICE(MELLANOX, 0xa2d3), MLX5_PCI_DEV_IS_VF},	/* BlueField integrated ConnectX-5 network controller VF */
 	{ PCI_VDEVICE(MELLANOX, 0xa2d6) },			/* BlueField-2 integrated ConnectX-6 Dx network controller */
+	{ PCI_VDEVICE(MELLANOX, 0xa2dc) },			/* BlueField-3 integrated ConnectX-7 network controller */
 	{ 0, }
 };
 
-- 
2.26.2


* [PATCH mlx5-next 08/16] net/mlx5: Update the hardware interface definition for vhca state
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky; +Cc: netdev, linux-rdma, Parav Pandit

From: Parav Pandit <parav@nvidia.com>

Update the hardware interface definitions to query and modify vhca
state, along with the related EQE and event code.
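
For illustration, a handler consuming the new EQE might look like the
sketch below (registration glue omitted; names are placeholders):

  static int vhca_state_notifier(struct notifier_block *nb,
                                 unsigned long event, void *data)
  {
      const struct mlx5_eqe *eqe = data;
      u16 function_id = be16_to_cpu(eqe->data.vhca_state.function_id);
      u16 ec_function = be16_to_cpu(eqe->data.vhca_state.ec_function);

      /* e.g. queue work that issues QUERY_VHCA_STATE for function_id */
      return NOTIFY_OK;
  }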

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/device.h   |  7 +++++++
 include/linux/mlx5/mlx5_ifc.h | 17 ++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index e9639c4cf2ed..f1de49d64a98 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -346,6 +346,7 @@ enum mlx5_event {
 	MLX5_EVENT_TYPE_NIC_VPORT_CHANGE   = 0xd,
 
 	MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED = 0xe,
+	MLX5_EVENT_TYPE_VHCA_STATE_CHANGE = 0xf,
 
 	MLX5_EVENT_TYPE_DCT_DRAINED        = 0x1c,
 	MLX5_EVENT_TYPE_DCT_KEY_VIOLATION  = 0x1d,
@@ -717,6 +718,11 @@ struct mlx5_eqe_sync_fw_update {
 	u8 sync_rst_state;
 };
 
+struct mlx5_eqe_vhca_state {
+	__be16 ec_function;
+	__be16 function_id;
+} __packed;
+
 union ev_data {
 	__be32				raw[7];
 	struct mlx5_eqe_cmd		cmd;
@@ -736,6 +742,7 @@ union ev_data {
 	struct mlx5_eqe_temp_warning	temp_warning;
 	struct mlx5_eqe_xrq_err		xrq_err;
 	struct mlx5_eqe_sync_fw_update	sync_fw_update;
+	struct mlx5_eqe_vhca_state	vhca_state;
 } __packed;
 
 struct mlx5_eqe {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 632b9a61fda5..3ace1976514c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -299,6 +299,8 @@ enum {
 	MLX5_CMD_OP_CREATE_UMEM                   = 0xa08,
 	MLX5_CMD_OP_DESTROY_UMEM                  = 0xa0a,
 	MLX5_CMD_OP_SYNC_STEERING                 = 0xb00,
+	MLX5_CMD_OP_QUERY_VHCA_STATE              = 0xb0d,
+	MLX5_CMD_OP_MODIFY_VHCA_STATE             = 0xb0e,
 	MLX5_CMD_OP_MAX
 };
 
@@ -1244,7 +1246,15 @@ enum mlx5_fc_bulk_alloc_bitmask {
 #define MLX5_FC_BULK_NUM_FCS(fc_enum) (MLX5_FC_BULK_SIZE_FACTOR * (fc_enum))
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8         reserved_at_0[0x30];
+	u8         reserved_at_0[0x20];
+
+	u8         reserved_at_20[0x3];
+	u8         event_on_vhca_state_teardown_request[0x1];
+	u8         event_on_vhca_state_in_use[0x1];
+	u8         event_on_vhca_state_active[0x1];
+	u8         event_on_vhca_state_allocated[0x1];
+	u8         event_on_vhca_state_invalid[0x1];
+	u8         reserved_at_28[0x8];
 	u8         vhca_id[0x10];
 
 	u8         reserved_at_40[0x40];
@@ -1534,7 +1544,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         disable_local_lb_uc[0x1];
 	u8         disable_local_lb_mc[0x1];
 	u8         log_min_hairpin_wq_data_sz[0x5];
-	u8         reserved_at_3e8[0x3];
+	u8         reserved_at_3e8[0x2];
+	u8         vhca_state[0x1];
 	u8         log_max_vlan_list[0x5];
 	u8         reserved_at_3f0[0x3];
 	u8         log_max_current_mc_list[0x5];
@@ -1602,7 +1613,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         max_num_of_monitor_counters[0x10];
 	u8         num_ppcnt_monitor_counters[0x10];
 
-	u8         reserved_at_640[0x10];
+	u8         max_num_sf[0x10];
 	u8         num_q_monitor_counters[0x10];
 
 	u8         reserved_at_660[0x20];
-- 
2.26.2


* [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Aya Levin, Moshe Shemesh

From: Aya Levin <ayal@nvidia.com>

Expose the FW indication that it supports stateless offloads for
IP-over-IP tunneled packets per direction. In some hardware, such as
ConnectX-4, IP-in-IP support is not symmetric: steering on the inner
header is supported, but TX checksum and TSO are not. Add a
per-direction IP-in-IP capability to cover this case as well.

Note: the global tunnel_stateless_ip_over_ip bit is set only if both
per-direction indications are set.
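
For illustration, a driver could gate the TX-side offloads on the new
bit; a sketch (mdev and netdev are placeholders):

  if (MLX5_CAP_ETH(mdev, tunnel_stateless_ip_over_ip_tx))
      netdev->hw_features |= NETIF_F_GSO_IPXIP4 | NETIF_F_GSO_IPXIP6;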

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 3ace1976514c..96888f9f822d 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -913,7 +913,10 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8         tunnel_stateless_ipv4_over_vxlan[0x1];
 	u8         tunnel_stateless_ip_over_ip[0x1];
 	u8         insert_trailer[0x1];
-	u8         reserved_at_2b[0x5];
+	u8         reserved_at_2b[0x1];
+	u8         tunnel_stateless_ip_over_ip_rx[0x1];
+	u8         tunnel_stateless_ip_over_ip_tx[0x1];
+	u8         reserved_at_2e[0x2];
 	u8         max_vxlan_udp_ports[0x8];
 	u8         reserved_at_38[0x6];
 	u8         max_geneve_opt_len[0x1];
-- 
2.26.2


* [PATCH mlx5-next 10/16] net/mlx5: Expose other function ifc bits
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Yishai Hadas, Parav Pandit

From: Yishai Hadas <yishaih@nvidia.com>

Expose the 'other function' ifc bits to enable setting HCA caps on
behalf of another function.

In addition, expose the vhca_resource_manager bit, which controls
whether the 'other function' functionality is supported by firmware.
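
For illustration, setting caps on behalf of another function might look
like the sketch below (allocation checks and the cap copy are elided;
fid is a placeholder):

  void *set_ctx = kvzalloc(MLX5_ST_SZ_BYTES(set_hca_cap_in), GFP_KERNEL);

  MLX5_SET(set_hca_cap_in, set_ctx, opcode, MLX5_CMD_OP_SET_HCA_CAP);
  MLX5_SET(set_hca_cap_in, set_ctx, other_function, 1);
  MLX5_SET(set_hca_cap_in, set_ctx, function_id, fid);
  /* copy the adjusted caps into the 'capability' union, then: */
  err = mlx5_cmd_exec_in(dev, set_hca_cap, set_ctx);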

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 96888f9f822d..3e337386faa8 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1249,7 +1249,8 @@ enum mlx5_fc_bulk_alloc_bitmask {
 #define MLX5_FC_BULK_NUM_FCS(fc_enum) (MLX5_FC_BULK_SIZE_FACTOR * (fc_enum))
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8         reserved_at_0[0x20];
+	u8         reserved_at_0[0x1f];
+	u8         vhca_resource_manager[0x1];
 
 	u8         reserved_at_20[0x3];
 	u8         event_on_vhca_state_teardown_request[0x1];
@@ -4247,7 +4248,11 @@ struct mlx5_ifc_set_hca_cap_in_bits {
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
 
-	u8         reserved_at_40[0x40];
+	u8         other_function[0x1];
+	u8         reserved_at_41[0xf];
+	u8         function_id[0x10];
+
+	u8         reserved_at_60[0x20];
 
 	union mlx5_ifc_hca_cap_union_bits capability;
 };
-- 
2.26.2


* [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Eli Cohen, Mark Bloch, Maor Gottlieb

From: Eli Cohen <eli@mellanox.com>

Add a new namespace type to the NIC RX root namespace to allow
inserting VDPA rules before regular NIC rules but after bypass rules,
thus letting DPDK take precedence in packet processing.
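
For illustration, a consumer resolves the namespace as usual; a minimal
sketch (mdev is a placeholder):

  struct mlx5_flow_namespace *ns;

  ns = mlx5_get_flow_namespace(mdev, MLX5_FLOW_NAMESPACE_VDPA);
  if (!ns)
      return -EOPNOTSUPP;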

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +++++++++-
 include/linux/mlx5/fs.h                           |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 16091838bfcf..e095c5968e67 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -118,6 +118,10 @@
 #define ANCHOR_NUM_PRIOS 1
 #define ANCHOR_MIN_LEVEL (BY_PASS_MIN_LEVEL + 1)
 
+#define VDPA_PRIO_NUM_LEVELS 1
+#define VDPA_NUM_PRIOS 1
+#define VDPA_MIN_LEVEL 1
+
 #define OFFLOADS_MAX_FT 2
 #define OFFLOADS_NUM_PRIOS 2
 #define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS)
@@ -147,7 +151,7 @@ static struct init_tree_node {
 	enum mlx5_flow_table_miss_action def_miss_action;
 } root_fs = {
 	.type = FS_TYPE_NAMESPACE,
-	.ar_size = 7,
+	.ar_size = 8,
 	  .children = (struct init_tree_node[]){
 		  ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0, FS_CHAINING_CAPS,
 			   ADD_NS(MLX5_FLOW_TABLE_MISS_ACTION_DEF,
@@ -165,6 +169,10 @@ static struct init_tree_node {
 			   ADD_NS(MLX5_FLOW_TABLE_MISS_ACTION_DEF,
 				  ADD_MULTIPLE_PRIO(ETHTOOL_NUM_PRIOS,
 						    ETHTOOL_PRIO_NUM_LEVELS))),
+		  ADD_PRIO(0, VDPA_MIN_LEVEL, 0, FS_CHAINING_CAPS,
+			   ADD_NS(MLX5_FLOW_TABLE_MISS_ACTION_DEF,
+				  ADD_MULTIPLE_PRIO(VDPA_NUM_PRIOS,
+						    VDPA_PRIO_NUM_LEVELS))),
 		  ADD_PRIO(0, KERNEL_MIN_LEVEL, 0, {},
 			   ADD_NS(MLX5_FLOW_TABLE_MISS_ACTION_DEF,
 				  ADD_MULTIPLE_PRIO(KERNEL_NIC_TC_NUM_PRIOS,
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 35d2cc1646d3..97176d623d74 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -67,6 +67,7 @@ enum mlx5_flow_namespace_type {
 	MLX5_FLOW_NAMESPACE_LAG,
 	MLX5_FLOW_NAMESPACE_OFFLOADS,
 	MLX5_FLOW_NAMESPACE_ETHTOOL,
+	MLX5_FLOW_NAMESPACE_VDPA,
 	MLX5_FLOW_NAMESPACE_KERNEL,
 	MLX5_FLOW_NAMESPACE_LEFTOVERS,
 	MLX5_FLOW_NAMESPACE_ANCHOR,
-- 
2.26.2


* [PATCH mlx5-next 12/16] net/mlx5: Export steering related functions
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky; +Cc: netdev, linux-rdma, Eli Cohen, Mark Bloch

From: Eli Cohen <elic@nvidia.com>

Export
 mlx5_create_flow_table()
 mlx5_create_flow_group()
 mlx5_destroy_flow_group().

These symbols are required by the VDPA implementation to create rules
that consume VDPA-specific traffic.

There is no need to add prototypes to a header file since they already
exist in include/linux/mlx5/fs.h.
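
For illustration, a ULP such as the VDPA driver might use the exported
symbols as in the sketch below (ns comes from mlx5_get_flow_namespace();
MAX_RULES and error handling are placeholders):

  struct mlx5_flow_table_attr ft_attr = {};
  u32 *in;

  ft_attr.max_fte = MAX_RULES;
  ft = mlx5_create_flow_table(ns, &ft_attr);

  in = kvzalloc(MLX5_ST_SZ_BYTES(create_flow_group_in), GFP_KERNEL);
  MLX5_SET(create_flow_group_in, in, start_flow_index, 0);
  MLX5_SET(create_flow_group_in, in, end_flow_index, MAX_RULES - 1);
  fg = mlx5_create_flow_group(ft, in);
  ...
  mlx5_destroy_flow_group(fg);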

Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index e095c5968e67..9feab81ab919 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1153,6 +1153,7 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
 {
 	return __mlx5_create_flow_table(ns, ft_attr, FS_FT_OP_MOD_NORMAL, 0);
 }
+EXPORT_SYMBOL(mlx5_create_flow_table);
 
 struct mlx5_flow_table *mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
 						     int prio, int max_fte,
@@ -1244,6 +1245,7 @@ struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
 
 	return fg;
 }
+EXPORT_SYMBOL(mlx5_create_flow_group);
 
 static struct mlx5_flow_rule *alloc_rule(struct mlx5_flow_destination *dest)
 {
@@ -2146,6 +2148,7 @@ void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
 		mlx5_core_warn(get_dev(&fg->node), "Flow group %d wasn't destroyed, refcount > 1\n",
 			       fg->id);
 }
+EXPORT_SYMBOL(mlx5_destroy_flow_group);
 
 struct mlx5_flow_namespace *mlx5_get_fdb_sub_ns(struct mlx5_core_dev *dev,
 						int n)
-- 
2.26.2


* [PATCH mlx5-next 13/16] net/mlx5: Make API mlx5_core_is_ecpf accept const pointer
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Parav Pandit, Bodong Wang

From: Parav Pandit <parav@nvidia.com>

A subsequent patch implements a helper API that takes mlx5_core_dev as
a const pointer; make its caller API accept a const pointer as well.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/driver.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5e84b1d53650..d6ef3068d7d3 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1133,7 +1133,7 @@ static inline bool mlx5_core_is_vf(const struct mlx5_core_dev *dev)
 	return dev->coredev_type == MLX5_COREDEV_VF;
 }
 
-static inline bool mlx5_core_is_ecpf(struct mlx5_core_dev *dev)
+static inline bool mlx5_core_is_ecpf(const struct mlx5_core_dev *dev)
 {
 	return dev->caps.embedded_cpu;
 }
-- 
2.26.2


* [PATCH mlx5-next 14/16] net/mlx5: Rename peer_pf to host_pf
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Parav Pandit, Bodong Wang

From: Parav Pandit <parav@nvidia.com>

To match the hardware spec, rename peer_pf to host_pf.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/ecpf.c    | 51 ++++++++++++-------
 .../ethernet/mellanox/mlx5/core/pagealloc.c   | 12 ++---
 include/linux/mlx5/driver.h                   |  2 +-
 3 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index 3dc9dd3f24dc..68ca0e2b26cd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -8,37 +8,52 @@ bool mlx5_read_embedded_cpu(struct mlx5_core_dev *dev)
 	return (ioread32be(&dev->iseg->initializing) >> MLX5_ECPU_BIT_NUM) & 1;
 }
 
-static int mlx5_peer_pf_init(struct mlx5_core_dev *dev)
+static int mlx5_cmd_host_pf_enable_hca(struct mlx5_core_dev *dev)
 {
-	u32 in[MLX5_ST_SZ_DW(enable_hca_in)] = {};
-	int err;
+	u32 out[MLX5_ST_SZ_DW(enable_hca_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(enable_hca_in)]   = {};
 
 	MLX5_SET(enable_hca_in, in, opcode, MLX5_CMD_OP_ENABLE_HCA);
-	err = mlx5_cmd_exec_in(dev, enable_hca, in);
+	MLX5_SET(enable_hca_in, in, function_id, 0);
+	MLX5_SET(enable_hca_in, in, embedded_cpu_function, 0);
+	return mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out));
+}
+
+static int mlx5_cmd_host_pf_disable_hca(struct mlx5_core_dev *dev)
+{
+	u32 out[MLX5_ST_SZ_DW(disable_hca_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(disable_hca_in)]   = {};
+
+	MLX5_SET(disable_hca_in, in, opcode, MLX5_CMD_OP_DISABLE_HCA);
+	MLX5_SET(disable_hca_in, in, function_id, 0);
+	MLX5_SET(disable_hca_in, in, embedded_cpu_function, 0);
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+static int mlx5_host_pf_init(struct mlx5_core_dev *dev)
+{
+	int err;
+
+	err = mlx5_cmd_host_pf_enable_hca(dev);
 	if (err)
-		mlx5_core_err(dev, "Failed to enable peer PF HCA err(%d)\n",
-			      err);
+		mlx5_core_err(dev, "Failed to enable external host PF HCA err(%d)\n", err);
 
 	return err;
 }
 
-static void mlx5_peer_pf_cleanup(struct mlx5_core_dev *dev)
+static void mlx5_host_pf_cleanup(struct mlx5_core_dev *dev)
 {
-	u32 in[MLX5_ST_SZ_DW(disable_hca_in)] = {};
 	int err;
 
-	MLX5_SET(disable_hca_in, in, opcode, MLX5_CMD_OP_DISABLE_HCA);
-	err = mlx5_cmd_exec_in(dev, disable_hca, in);
+	err = mlx5_cmd_host_pf_disable_hca(dev);
 	if (err) {
-		mlx5_core_err(dev, "Failed to disable peer PF HCA err(%d)\n",
-			      err);
+		mlx5_core_err(dev, "Failed to disable external host PF HCA err(%d)\n", err);
 		return;
 	}
 
-	err = mlx5_wait_for_pages(dev, &dev->priv.peer_pf_pages);
+	err = mlx5_wait_for_pages(dev, &dev->priv.host_pf_pages);
 	if (err)
-		mlx5_core_warn(dev, "Timeout reclaiming peer PF pages err(%d)\n",
-			       err);
+		mlx5_core_warn(dev, "Timeout reclaiming external host PF pages err(%d)\n", err);
 }
 
 int mlx5_ec_init(struct mlx5_core_dev *dev)
@@ -46,10 +61,10 @@ int mlx5_ec_init(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_ecpf(dev))
 		return 0;
 
-	/* ECPF shall enable HCA for peer PF in the same way a PF
+	/* ECPF shall enable HCA for host PF in the same way a PF
 	 * does this for its VFs.
 	 */
-	return mlx5_peer_pf_init(dev);
+	return mlx5_host_pf_init(dev);
 }
 
 void mlx5_ec_cleanup(struct mlx5_core_dev *dev)
@@ -57,5 +72,5 @@ void mlx5_ec_cleanup(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_ecpf(dev))
 		return;
 
-	mlx5_peer_pf_cleanup(dev);
+	mlx5_host_pf_cleanup(dev);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 150638814517..539baea358bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -374,7 +374,7 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	if (func_id)
 		dev->priv.vfs_pages += npages;
 	else if (mlx5_core_is_ecpf(dev) && !ec_function)
-		dev->priv.peer_pf_pages += npages;
+		dev->priv.host_pf_pages += npages;
 
 	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x, err %d\n",
 		      npages, ec_function, func_id, err);
@@ -416,7 +416,7 @@ static void release_all_pages(struct mlx5_core_dev *dev, u32 func_id,
 	if (func_id)
 		dev->priv.vfs_pages -= npages;
 	else if (mlx5_core_is_ecpf(dev) && !ec_function)
-		dev->priv.peer_pf_pages -= npages;
+		dev->priv.host_pf_pages -= npages;
 
 	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x\n",
 		      npages, ec_function, func_id);
@@ -506,7 +506,7 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u32 func_id, int npages,
 	if (func_id)
 		dev->priv.vfs_pages -= num_claimed;
 	else if (mlx5_core_is_ecpf(dev) && !ec_function)
-		dev->priv.peer_pf_pages -= num_claimed;
+		dev->priv.host_pf_pages -= num_claimed;
 
 out_free:
 	kvfree(out);
@@ -661,9 +661,9 @@ int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev)
 	WARN(dev->priv.vfs_pages,
 	     "VFs FW pages counter is %d after reclaiming all pages\n",
 	     dev->priv.vfs_pages);
-	WARN(dev->priv.peer_pf_pages,
-	     "Peer PF FW pages counter is %d after reclaiming all pages\n",
-	     dev->priv.peer_pf_pages);
+	WARN(dev->priv.host_pf_pages,
+	     "External host PF FW pages counter is %d after reclaiming all pages\n",
+	     dev->priv.host_pf_pages);
 
 	return 0;
 }
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d6ef3068d7d3..8e9bcb3bfd77 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -547,7 +547,7 @@ struct mlx5_priv {
 	atomic_t		reg_pages;
 	struct list_head	free_list;
 	int			vfs_pages;
-	int			peer_pf_pages;
+	int			host_pf_pages;
 
 	struct mlx5_core_health health;
 
-- 
2.26.2


* [PATCH mlx5-next 15/16] net/mlx5: Enable host PF HCA after eswitch is initialized
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, Parav Pandit, Bodong Wang

From: Parav Pandit <parav@nvidia.com>

Currently the ECPF enables the external host PF too early in the
initialization sequence for Ethernet links when the ECPF is the
eswitch manager.

Due to this, when the external host PF driver is loaded, the host PF's
HCA CAP has inner_ip_version supported by the NIC RX flow table.
This capability is later updated by firmware after the ECPF driver
enables ENCAP/DECAP as eswitch manager.

This results in a timing race condition, where the CREATE_TIR command
fails with the below syndrome on the host PF.

mlx5_cmd_check:775:(pid 510): CREATE_TIR(0x900) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x562b00)

Hence, enable the external host PF after the necessary eswitch and
per-vport initialization is completed.
Continue to enable the host PF when the eswitch manager capability is
off for an ECPF.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/ecpf.c    | 35 ++++++++++++++-----
 .../net/ethernet/mellanox/mlx5/core/ecpf.h    |  3 ++
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 29 ++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/main.c    | 18 +++++-----
 4 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index 68ca0e2b26cd..464eb3a18450 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -8,7 +8,16 @@ bool mlx5_read_embedded_cpu(struct mlx5_core_dev *dev)
 	return (ioread32be(&dev->iseg->initializing) >> MLX5_ECPU_BIT_NUM) & 1;
 }
 
-static int mlx5_cmd_host_pf_enable_hca(struct mlx5_core_dev *dev)
+static bool mlx5_ecpf_esw_admins_host_pf(const struct mlx5_core_dev *dev)
+{
+	/* In separate host mode, PF enables itself.
+	 * When ECPF is eswitch manager, eswitch enables host PF after
+	 * eswitch is setup.
+	 */
+	return mlx5_core_is_ecpf_esw_manager(dev);
+}
+
+int mlx5_cmd_host_pf_enable_hca(struct mlx5_core_dev *dev)
 {
 	u32 out[MLX5_ST_SZ_DW(enable_hca_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(enable_hca_in)]   = {};
@@ -19,7 +28,7 @@ static int mlx5_cmd_host_pf_enable_hca(struct mlx5_core_dev *dev)
 	return mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out));
 }
 
-static int mlx5_cmd_host_pf_disable_hca(struct mlx5_core_dev *dev)
+int mlx5_cmd_host_pf_disable_hca(struct mlx5_core_dev *dev)
 {
 	u32 out[MLX5_ST_SZ_DW(disable_hca_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(disable_hca_in)]   = {};
@@ -34,6 +43,12 @@ static int mlx5_host_pf_init(struct mlx5_core_dev *dev)
 {
 	int err;
 
+	if (mlx5_ecpf_esw_admins_host_pf(dev))
+		return 0;
+
+	/* ECPF shall enable HCA for host PF in the same way a PF
+	 * does this for its VFs when ECPF is not an eswitch manager.
+	 */
 	err = mlx5_cmd_host_pf_enable_hca(dev);
 	if (err)
 		mlx5_core_err(dev, "Failed to enable external host PF HCA err(%d)\n", err);
@@ -45,15 +60,14 @@ static void mlx5_host_pf_cleanup(struct mlx5_core_dev *dev)
 {
 	int err;
 
+	if (mlx5_ecpf_esw_admins_host_pf(dev))
+		return;
+
 	err = mlx5_cmd_host_pf_disable_hca(dev);
 	if (err) {
 		mlx5_core_err(dev, "Failed to disable external host PF HCA err(%d)\n", err);
 		return;
 	}
-
-	err = mlx5_wait_for_pages(dev, &dev->priv.host_pf_pages);
-	if (err)
-		mlx5_core_warn(dev, "Timeout reclaiming external host PF pages err(%d)\n", err);
 }
 
 int mlx5_ec_init(struct mlx5_core_dev *dev)
@@ -61,16 +75,19 @@ int mlx5_ec_init(struct mlx5_core_dev *dev)
 	if (!mlx5_core_is_ecpf(dev))
 		return 0;
 
-	/* ECPF shall enable HCA for host PF in the same way a PF
-	 * does this for its VFs.
-	 */
 	return mlx5_host_pf_init(dev);
 }
 
 void mlx5_ec_cleanup(struct mlx5_core_dev *dev)
 {
+	int err;
+
 	if (!mlx5_core_is_ecpf(dev))
 		return;
 
 	mlx5_host_pf_cleanup(dev);
+
+	err = mlx5_wait_for_pages(dev, &dev->priv.host_pf_pages);
+	if (err)
+		mlx5_core_warn(dev, "Timeout reclaiming external host PF pages err(%d)\n", err);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.h b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.h
index d3d7a00a02ac..40b6ad76dca6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.h
@@ -17,6 +17,9 @@ bool mlx5_read_embedded_cpu(struct mlx5_core_dev *dev);
 int mlx5_ec_init(struct mlx5_core_dev *dev);
 void mlx5_ec_cleanup(struct mlx5_core_dev *dev);
 
+int mlx5_cmd_host_pf_enable_hca(struct mlx5_core_dev *dev);
+int mlx5_cmd_host_pf_disable_hca(struct mlx5_core_dev *dev);
+
 #else  /* CONFIG_MLX5_ESWITCH */
 
 static inline bool
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 6e6a9a563992..dcd8946a843c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1469,6 +1469,26 @@ int mlx5_eswitch_load_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs,
 	return err;
 }
 
+static int host_pf_enable_hca(struct mlx5_core_dev *dev)
+{
+	if (!mlx5_core_is_ecpf(dev))
+		return 0;
+
+	/* Once vport and representor are ready, take the external host PF
+	 * out of initializing state. Enabling HCA clears the iseg->initializing
+	 * bit and host PF driver loading can progress.
+	 */
+	return mlx5_cmd_host_pf_enable_hca(dev);
+}
+
+static void host_pf_disable_hca(struct mlx5_core_dev *dev)
+{
+	if (!mlx5_core_is_ecpf(dev))
+		return;
+
+	mlx5_cmd_host_pf_disable_hca(dev);
+}
+
 /* mlx5_eswitch_enable_pf_vf_vports() enables vports of PF, ECPF and VFs
  * whichever are present on the eswitch.
  */
@@ -1483,6 +1503,11 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 	if (ret)
 		return ret;
 
+	/* Enable external host PF HCA */
+	ret = host_pf_enable_hca(esw->dev);
+	if (ret)
+		goto pf_hca_err;
+
 	/* Enable ECPF vport */
 	if (mlx5_ecpf_vport_exists(esw->dev)) {
 		ret = mlx5_eswitch_load_vport(esw, MLX5_VPORT_ECPF, enabled_events);
@@ -1500,8 +1525,9 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 vf_err:
 	if (mlx5_ecpf_vport_exists(esw->dev))
 		mlx5_eswitch_unload_vport(esw, MLX5_VPORT_ECPF);
-
 ecpf_err:
+	host_pf_disable_hca(esw->dev);
+pf_hca_err:
 	mlx5_eswitch_unload_vport(esw, MLX5_VPORT_PF);
 	return ret;
 }
@@ -1516,6 +1542,7 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw)
 	if (mlx5_ecpf_vport_exists(esw->dev))
 		mlx5_eswitch_unload_vport(esw, MLX5_VPORT_ECPF);
 
+	host_pf_disable_hca(esw->dev);
 	mlx5_eswitch_unload_vport(esw, MLX5_VPORT_PF);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index a9757ccb9d16..d86f06f14cd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1126,23 +1126,23 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 		goto err_sriov;
 	}
 
-	err = mlx5_sriov_attach(dev);
-	if (err) {
-		mlx5_core_err(dev, "sriov init failed %d\n", err);
-		goto err_sriov;
-	}
-
 	err = mlx5_ec_init(dev);
 	if (err) {
 		mlx5_core_err(dev, "Failed to init embedded CPU\n");
 		goto err_ec;
 	}
 
+	err = mlx5_sriov_attach(dev);
+	if (err) {
+		mlx5_core_err(dev, "sriov init failed %d\n", err);
+		goto err_sriov;
+	}
+
 	return 0;
 
-err_ec:
-	mlx5_sriov_detach(dev);
 err_sriov:
+	mlx5_ec_cleanup(dev);
+err_ec:
 	mlx5_cleanup_fs(dev);
 err_fs:
 	mlx5_accel_tls_cleanup(dev);
@@ -1168,8 +1168,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 
 static void mlx5_unload(struct mlx5_core_dev *dev)
 {
-	mlx5_ec_cleanup(dev);
 	mlx5_sriov_detach(dev);
+	mlx5_ec_cleanup(dev);
 	mlx5_cleanup_fs(dev);
 	mlx5_accel_ipsec_cleanup(dev);
 	mlx5_accel_tls_cleanup(dev);
-- 
2.26.2


* [PATCH mlx5-next 16/16] net/mlx5: Treat host PF vport as other (non eswitch manager) vport
From: Saeed Mahameed @ 2020-11-20 23:03 UTC
  To: Saeed Mahameed, Leon Romanovsky; +Cc: netdev, linux-rdma, Parav Pandit

From: Parav Pandit <parav@nvidia.com>

When the eswitch manager is running on the ECPF, the host PF should be
treated as a non eswitch manager port, similar to other VF vports.
Failing to do so results in firmware treating the PF's vport as the
ECPF vport for eswitch ACL tables.
A non-zero vport number check is not sufficient to decide whether a
given vport is an "other" vport, because the PF vport number is 0 on
the ECPF.
Hence, create the eswitch ACL tables with an explicit other vport
attribute.
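
To illustrate the ambiguity (a sketch only; the vport_is_other*()
helpers are hypothetical, ft is the flow table as used in fs_cmd.c):

	static bool vport_is_other_old(const struct mlx5_flow_table *ft)
	{
		/* old check: on the ECPF the host PF vport number is 0,
		 * so this can never mark the host PF as "other"
		 */
		return ft->vport != 0;
	}

	static bool vport_is_other_new(const struct mlx5_flow_table *ft)
	{
		/* explicit attribute: unambiguous even for vport 0 */
		return !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT);
	}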

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/esw/acl/helper.c       |  5 +-
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  | 54 +++++++++----------
 .../net/ethernet/mellanox/mlx5/core/fs_core.c | 14 ++---
 include/linux/mlx5/fs.h                       |  5 +-
 4 files changed, 34 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/helper.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/helper.c
index 22f4c1c28006..4a369669e51e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/helper.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/helper.c
@@ -8,6 +8,7 @@
 struct mlx5_flow_table *
 esw_acl_table_create(struct mlx5_eswitch *esw, u16 vport_num, int ns, int size)
 {
+	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *root_ns;
 	struct mlx5_flow_table *acl;
@@ -33,7 +34,9 @@ esw_acl_table_create(struct mlx5_eswitch *esw, u16 vport_num, int ns, int size)
 		return ERR_PTR(-EOPNOTSUPP);
 	}
 
-	acl = mlx5_create_vport_flow_table(root_ns, 0, size, 0, vport_num);
+	ft_attr.max_fte = size;
+	ft_attr.flags = MLX5_FLOW_TABLE_OTHER_VPORT;
+	acl = mlx5_create_vport_flow_table(root_ns, &ft_attr, vport_num);
 	if (IS_ERR(acl)) {
 		err = PTR_ERR(acl);
 		esw_warn(dev, "vport[%d] create %s ACL table, err(%d)\n", vport_num,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index c2fed9c3d75c..8e06731d3cb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -172,10 +172,9 @@ static int mlx5_cmd_update_root_ft(struct mlx5_flow_root_namespace *ns,
 		MLX5_SET(set_flow_table_root_in, in, table_id, ft->id);
 
 	MLX5_SET(set_flow_table_root_in, in, underlay_qpn, underlay_qpn);
-	if (ft->vport) {
-		MLX5_SET(set_flow_table_root_in, in, vport_number, ft->vport);
-		MLX5_SET(set_flow_table_root_in, in, other_vport, 1);
-	}
+	MLX5_SET(set_flow_table_root_in, in, vport_number, ft->vport);
+	MLX5_SET(set_flow_table_root_in, in, other_vport,
+		 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 
 	return mlx5_cmd_exec_in(dev, set_flow_table_root, in);
 }
@@ -199,10 +198,9 @@ static int mlx5_cmd_create_flow_table(struct mlx5_flow_root_namespace *ns,
 	MLX5_SET(create_flow_table_in, in, table_type, ft->type);
 	MLX5_SET(create_flow_table_in, in, flow_table_context.level, ft->level);
 	MLX5_SET(create_flow_table_in, in, flow_table_context.log_size, log_size);
-	if (ft->vport) {
-		MLX5_SET(create_flow_table_in, in, vport_number, ft->vport);
-		MLX5_SET(create_flow_table_in, in, other_vport, 1);
-	}
+	MLX5_SET(create_flow_table_in, in, vport_number, ft->vport);
+	MLX5_SET(create_flow_table_in, in, other_vport,
+		 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 
 	MLX5_SET(create_flow_table_in, in, flow_table_context.decap_en,
 		 en_decap);
@@ -252,10 +250,9 @@ static int mlx5_cmd_destroy_flow_table(struct mlx5_flow_root_namespace *ns,
 		 MLX5_CMD_OP_DESTROY_FLOW_TABLE);
 	MLX5_SET(destroy_flow_table_in, in, table_type, ft->type);
 	MLX5_SET(destroy_flow_table_in, in, table_id, ft->id);
-	if (ft->vport) {
-		MLX5_SET(destroy_flow_table_in, in, vport_number, ft->vport);
-		MLX5_SET(destroy_flow_table_in, in, other_vport, 1);
-	}
+	MLX5_SET(destroy_flow_table_in, in, vport_number, ft->vport);
+	MLX5_SET(destroy_flow_table_in, in, other_vport,
+		 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 
 	return mlx5_cmd_exec_in(dev, destroy_flow_table, in);
 }
@@ -283,11 +280,9 @@ static int mlx5_cmd_modify_flow_table(struct mlx5_flow_root_namespace *ns,
 				 flow_table_context.lag_master_next_table_id, 0);
 		}
 	} else {
-		if (ft->vport) {
-			MLX5_SET(modify_flow_table_in, in, vport_number,
-				 ft->vport);
-			MLX5_SET(modify_flow_table_in, in, other_vport, 1);
-		}
+		MLX5_SET(modify_flow_table_in, in, vport_number, ft->vport);
+		MLX5_SET(modify_flow_table_in, in, other_vport,
+			 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 		MLX5_SET(modify_flow_table_in, in, modify_field_select,
 			 MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID);
 		if (next_ft) {
@@ -325,8 +320,6 @@ static int mlx5_cmd_create_flow_group(struct mlx5_flow_root_namespace *ns,
-	if (ft->vport) {
-		MLX5_SET(create_flow_group_in, in, vport_number, ft->vport);
-		MLX5_SET(create_flow_group_in, in, other_vport, 1);
-	}
-
+	MLX5_SET(create_flow_group_in, in, vport_number, ft->vport);
+	MLX5_SET(create_flow_group_in, in, other_vport,
+		 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 	err = mlx5_cmd_exec_inout(dev, create_flow_group, in, out);
 	if (!err)
 		fg->id = MLX5_GET(create_flow_group_out, out,
@@ -344,11 +342,9 @@ static int mlx5_cmd_destroy_flow_group(struct mlx5_flow_root_namespace *ns,
 	MLX5_SET(destroy_flow_group_in, in, table_type, ft->type);
 	MLX5_SET(destroy_flow_group_in, in, table_id, ft->id);
 	MLX5_SET(destroy_flow_group_in, in, group_id, fg->id);
-	if (ft->vport) {
-		MLX5_SET(destroy_flow_group_in, in, vport_number, ft->vport);
-		MLX5_SET(destroy_flow_group_in, in, other_vport, 1);
-	}
-
+	MLX5_SET(destroy_flow_group_in, in, vport_number, ft->vport);
+	MLX5_SET(destroy_flow_group_in, in, other_vport,
+		 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 	return mlx5_cmd_exec_in(dev, destroy_flow_group, in);
 }
 
@@ -427,10 +423,9 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
 	MLX5_SET(set_fte_in, in, ignore_flow_level,
 		 !!(fte->action.flags & FLOW_ACT_IGNORE_FLOW_LEVEL));
 
-	if (ft->vport) {
-		MLX5_SET(set_fte_in, in, vport_number, ft->vport);
-		MLX5_SET(set_fte_in, in, other_vport, 1);
-	}
+	MLX5_SET(set_fte_in, in, vport_number, ft->vport);
+	MLX5_SET(set_fte_in, in, other_vport,
+		 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 
 	in_flow_context = MLX5_ADDR_OF(set_fte_in, in, flow_context);
 	MLX5_SET(flow_context, in_flow_context, group_id, group_id);
@@ -604,10 +599,9 @@ static int mlx5_cmd_delete_fte(struct mlx5_flow_root_namespace *ns,
 	MLX5_SET(delete_fte_in, in, table_type, ft->type);
 	MLX5_SET(delete_fte_in, in, table_id, ft->id);
 	MLX5_SET(delete_fte_in, in, flow_index, fte->index);
-	if (ft->vport) {
-		MLX5_SET(delete_fte_in, in, vport_number, ft->vport);
-		MLX5_SET(delete_fte_in, in, other_vport, 1);
-	}
+	MLX5_SET(delete_fte_in, in, vport_number, ft->vport);
+	MLX5_SET(delete_fte_in, in, other_vport,
+		 !!(ft->flags & MLX5_FLOW_TABLE_OTHER_VPORT));
 
 	return mlx5_cmd_exec_in(dev, delete_fte, in);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 9feab81ab919..761581232139 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1155,17 +1155,11 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
 }
 EXPORT_SYMBOL(mlx5_create_flow_table);
 
-struct mlx5_flow_table *mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
-						     int prio, int max_fte,
-						     u32 level, u16 vport)
+struct mlx5_flow_table *
+mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
+			     struct mlx5_flow_table_attr *ft_attr, u16 vport)
 {
-	struct mlx5_flow_table_attr ft_attr = {};
-
-	ft_attr.max_fte = max_fte;
-	ft_attr.level   = level;
-	ft_attr.prio    = prio;
-
-	return __mlx5_create_flow_table(ns, &ft_attr, FS_FT_OP_MOD_NORMAL, vport);
+	return __mlx5_create_flow_table(ns, ft_attr, FS_FT_OP_MOD_NORMAL, vport);
 }
 
 struct mlx5_flow_table*
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 97176d623d74..12d84e99ff63 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -50,6 +50,7 @@ enum {
 	MLX5_FLOW_TABLE_TUNNEL_EN_DECAP = BIT(1),
 	MLX5_FLOW_TABLE_TERMINATION = BIT(2),
 	MLX5_FLOW_TABLE_UNMANAGED = BIT(3),
+	MLX5_FLOW_TABLE_OTHER_VPORT = BIT(4),
 };
 
 #define LEFTOVERS_RULE_NUM	 2
@@ -175,9 +176,7 @@ mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
 
 struct mlx5_flow_table *
 mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
-			     int prio,
-			     int num_flow_table_entries,
-			     u32 level, u16 vport);
+			     struct mlx5_flow_table_attr *ft_attr, u16 vport);
 struct mlx5_flow_table *mlx5_create_lag_demux_flow_table(
 					       struct mlx5_flow_namespace *ns,
 					       int prio, u32 level);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits
  2020-11-20 23:03 ` [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits Saeed Mahameed
@ 2020-11-21 23:58   ` Jakub Kicinski
  2020-11-22 15:17     ` Aya Levin
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2020-11-21 23:58 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Leon Romanovsky, netdev, linux-rdma, Aya Levin, Moshe Shemesh

On Fri, 20 Nov 2020 15:03:32 -0800 Saeed Mahameed wrote:
> From: Aya Levin <ayal@nvidia.com>
> 
> Expose FW indication that it supports stateless offloads for IP over IP
> tunneled packets per direction. In some HW, like ConnectX-4, IP-in-IP
> support is not symmetric: it supports steering on the inner header but
> does not support TX checksum and TSO. Add an IP-in-IP capability per
> direction to cover this case as well.

What's the use for the rx capability in Linux? We don't have an API to
configure that AFAIK.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-20 23:03 ` [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace Saeed Mahameed
@ 2020-11-22  0:01   ` Jakub Kicinski
  2020-11-22  6:41     ` Eli Cohen
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2020-11-22  0:01 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Leon Romanovsky, netdev, linux-rdma, Eli Cohen, Mark Bloch,
	Maor Gottlieb

On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:
> From: Eli Cohen <eli@mellanox.com>
> 
> Add a new namespace type to the NIC RX root namespace to allow for
> inserting VDPA rules before regular NIC but after bypass, thus allowing
> DPDK to have precedence in packet processing.

How do DPDK and VDPA relate in this context?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-22  0:01   ` Jakub Kicinski
@ 2020-11-22  6:41     ` Eli Cohen
  2020-11-24 17:12       ` Jakub Kicinski
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Cohen @ 2020-11-22  6:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma, Eli Cohen,
	Mark Bloch, Maor Gottlieb

On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:
> On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:
> > From: Eli Cohen <eli@mellanox.com>
> > 
> > Add a new namespace type to the NIC RX root namespace to allow for
> > inserting VDPA rules before regular NIC but after bypass, thus allowing
> > DPDK to have precedence in packet processing.
> 
> How do DPDK and VDPA relate in this context?

mlx5 steering is hierarchical and defines precedence amongst namespaces.
Up till now, the VDPA implementation would insert a rule into the
MLX5_FLOW_NAMESPACE_BYPASS hierarchy, which is used by DPDK, thus taking
all the incoming traffic.

The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
MLX5_FLOW_NAMESPACE_BYPASS.
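
A minimal sketch of the intended consumer (illustrative only; the real
VDPA driver code comes later, and vdpa_create_steering() is a
hypothetical helper name):

	static int vdpa_create_steering(struct mlx5_core_dev *mdev)
	{
		struct mlx5_flow_namespace *ns;

		/* such rules used to land in MLX5_FLOW_NAMESPACE_BYPASS,
		 * ahead of everything; with this patch they get their own
		 * hierarchy, after BYPASS but before the regular NIC RX
		 * tables
		 */
		ns = mlx5_get_flow_namespace(mdev, MLX5_FLOW_NAMESPACE_VDPA);
		if (!ns)
			return -EOPNOTSUPP;

		/* create the steering rules for the VDPA virtqueues
		 * under ns
		 */
		return 0;
	}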

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits
  2020-11-21 23:58   ` Jakub Kicinski
@ 2020-11-22 15:17     ` Aya Levin
  2020-11-23 21:15       ` Saeed Mahameed
  0 siblings, 1 reply; 32+ messages in thread
From: Aya Levin @ 2020-11-22 15:17 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: Leon Romanovsky, netdev, linux-rdma, Moshe Shemesh



On 11/22/2020 1:58 AM, Jakub Kicinski wrote:
> On Fri, 20 Nov 2020 15:03:32 -0800 Saeed Mahameed wrote:
>> From: Aya Levin <ayal@nvidia.com>
>>
>> Expose FW indication that it supports stateless offloads for IP over IP
>> tunneled packets per direction. In some HW, like ConnectX-4, IP-in-IP
>> support is not symmetric: it supports steering on the inner header but
>> does not support TX checksum and TSO. Add an IP-in-IP capability per
>> direction to cover this case as well.
> 
> What's the use for the rx capability in Linux? We don't have an API to
> configure that AFAIK.
> 
Correct, the rx capability bit is used by the driver to allow flow 
steering on the inner header.
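
Something along these lines (a sketch, assuming the per-direction cap
field name added by this patch; the helper name is made up for
illustration):

	static bool ipinip_inner_steering_supported(struct mlx5_core_dev *mdev)
	{
		/* inner-header steering/RSS for IP-in-IP is keyed off the
		 * RX direction bit, independently of the TX offloads
		 * (checksum/TSO), which have their own bit
		 */
		return MLX5_CAP_ETH(mdev, tunnel_stateless_ip_over_ip_rx);
	}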

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits
  2020-11-22 15:17     ` Aya Levin
@ 2020-11-23 21:15       ` Saeed Mahameed
  0 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2020-11-23 21:15 UTC (permalink / raw)
  To: Aya Levin, Jakub Kicinski
  Cc: Leon Romanovsky, netdev, linux-rdma, Moshe Shemesh

On Sun, 2020-11-22 at 17:17 +0200, Aya Levin wrote:
> 
> On 11/22/2020 1:58 AM, Jakub Kicinski wrote:
> > On Fri, 20 Nov 2020 15:03:32 -0800 Saeed Mahameed wrote:
> > > From: Aya Levin <ayal@nvidia.com>
> > > 
> > > Expose FW indication that it supports stateless offloads for IP
> > > over IP tunneled packets per direction. In some HW, like
> > > ConnectX-4, IP-in-IP support is not symmetric: it supports
> > > steering on the inner header but does not support TX checksum and
> > > TSO. Add an IP-in-IP capability per direction to cover this case
> > > as well.
> > 
> > What's the use for the rx capability in Linux? We don't have an API
> > to
> > configure that AFAIK.
> > 
> Correct, the rx capability bit is used by the driver to allow flow 
> steering on the inner header.

Currently we use the global HW capability to enable flow steering on
the inner header for RSS. In an upcoming patch to net-next we will
relax the dependency on the global capability and use the dedicated RX
cap instead.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-22  6:41     ` Eli Cohen
@ 2020-11-24 17:12       ` Jakub Kicinski
  2020-11-24 18:02         ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2020-11-24 17:12 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Eli Cohen, Leon Romanovsky, netdev, linux-rdma, Eli Cohen,
	Mark Bloch, Maor Gottlieb

On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:
> On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:
> > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:  
> > > From: Eli Cohen <eli@mellanox.com>
> > > 
> > > Add a new namespace type to the NIC RX root namespace to allow for
> > > inserting VDPA rules before regular NIC but after bypass, thus allowing
> > > DPDK to have precedence in packet processing.  
> > 
> > How do DPDK and VDPA relate in this context?
> 
> mlx5 steering is hierarchical and defines precedence amongst namespaces.
> Up till now, the VDPA implementation would insert a rule into the
> MLX5_FLOW_NAMESPACE_BYPASS hierarchy which is used by DPDK thus taking
> all the incoming traffic.
> 
> The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> MLX5_FLOW_NAMESPACE_BYPASS.

Our policy was no DPDK driver bifurcation. There's no asterisk saying
"unless you pretend you need flow filters for RDMA, get them upstream
and then drop the act".

What do you expect me to do?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-24 17:12       ` Jakub Kicinski
@ 2020-11-24 18:02         ` Jason Gunthorpe
  2020-11-24 18:41           ` Jakub Kicinski
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2020-11-24 18:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Eli Cohen, Leon Romanovsky, netdev, linux-rdma,
	Eli Cohen, Mark Bloch, Maor Gottlieb

On Tue, Nov 24, 2020 at 09:12:19AM -0800, Jakub Kicinski wrote:
> On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:
> > On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:
> > > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:  
> > > > From: Eli Cohen <eli@mellanox.com>
> > > > 
> > > > Add a new namespace type to the NIC RX root namespace to allow for
> > > > inserting VDPA rules before regular NIC but after bypass, thus allowing
> > > > DPDK to have precedence in packet processing.  
> > > 
> > > How do DPDK and VDPA relate in this context?
> > 
> > mlx5 steering is hierarchical and defines precedence amongst namespaces.
> > Up till now, the VDPA implementation would insert a rule into the
> > MLX5_FLOW_NAMESPACE_BYPASS hierarchy which is used by DPDK thus taking
> > all the incoming traffic.
> > 
> > The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> > MLX5_FLOW_NAMESPACE_BYPASS.
> 
> Our policy was no DPDK driver bifurcation. There's no asterisk saying
> "unless you pretend you need flow filters for RDMA, get them upstream
> and then drop the act".

Huh?

mlx5 DPDK is an *RDMA* userspace application. It links to
libibverbs. It runs on the RDMA stack. It uses RDMA flow filtering and
RDMA raw ethernet QPs. It has been like this for years, it is not some
"act".

It is long standing uABI that accelerators like RDMA/etc get to take
the traffic before netdev. This cannot be reverted. I don't really
understand what you are expecting here?

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-24 18:02         ` Jason Gunthorpe
@ 2020-11-24 18:41           ` Jakub Kicinski
  2020-11-24 19:44             ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2020-11-24 18:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Saeed Mahameed, Eli Cohen, Leon Romanovsky, netdev, linux-rdma,
	Eli Cohen, Mark Bloch, Maor Gottlieb

On Tue, 24 Nov 2020 14:02:10 -0400 Jason Gunthorpe wrote:
> On Tue, Nov 24, 2020 at 09:12:19AM -0800, Jakub Kicinski wrote:
> > On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:  
> > > On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:  
> > > > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:    
> > > > > From: Eli Cohen <eli@mellanox.com>
> > > > > 
> > > > > Add a new namespace type to the NIC RX root namespace to allow for
> > > > > inserting VDPA rules before regular NIC but after bypass, thus allowing
> > > > > DPDK to have precedence in packet processing.    
> > > > 
> > > > How do DPDK and VDPA relate in this context?
> > > 
> > > mlx5 steering is hierarchical and defines precedence amongst namespaces.
> > > Up till now, the VDPA implementation would insert a rule into the
> > > MLX5_FLOW_NAMESPACE_BYPASS hierarchy which is used by DPDK thus taking
> > > all the incoming traffic.
> > > 
> > > The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> > > MLX5_FLOW_NAMESPACE_BYPASS.  
> > 
> > Our policy was no DPDK driver bifurcation. There's no asterisk saying
> > "unless you pretend you need flow filters for RDMA, get them upstream
> > and then drop the act".  
> 
> Huh?
> 
> mlx5 DPDK is an *RDMA* userspace application. 

Forgive me for my naiveté. 

Here I thought the RDMA subsystem is for doing RDMA.

I'm sure if you start doing crypto over ibverbs crypto people will want
to have a look.

> libibverbs. It runs on the RDMA stack. It uses RDMA flow filtering and
> RDMA raw ethernet QPs. 

I'm not saying that's not the case. I'm saying I don't think this was
something that netdev developers signed off on.
is pretty widely known.

Would you mind pointing us to the introduction of raw Ethernet QPs?

Is there any production use for that without DPDK?

> It has been like this for years, it is not some "act".
> 
> It is long standing uABI that accelerators like RDMA/etc get to take
> the traffic before netdev. This cannot be reverted. I don't really
> understand what you are expecting here?

Same. I don't really know what you expect me to do either. I don't
think I can sign off on kernel changes needed for DPDK.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-24 18:41           ` Jakub Kicinski
@ 2020-11-24 19:44             ` Jason Gunthorpe
  2020-11-25  6:19               ` Eli Cohen
  2020-11-25 18:54               ` Jakub Kicinski
  0 siblings, 2 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2020-11-24 19:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Eli Cohen, Leon Romanovsky, netdev, linux-rdma,
	Eli Cohen, Mark Bloch, Maor Gottlieb

On Tue, Nov 24, 2020 at 10:41:06AM -0800, Jakub Kicinski wrote:
> On Tue, 24 Nov 2020 14:02:10 -0400 Jason Gunthorpe wrote:
> > On Tue, Nov 24, 2020 at 09:12:19AM -0800, Jakub Kicinski wrote:
> > > On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:  
> > > > On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:  
> > > > > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:    
> > > > > > From: Eli Cohen <eli@mellanox.com>
> > > > > > 
> > > > > > Add a new namespace type to the NIC RX root namespace to allow for
> > > > > > inserting VDPA rules before regular NIC but after bypass, thus allowing
> > > > > > DPDK to have precedence in packet processing.    
> > > > > 
> > > > > How do DPDK and VDPA relate in this context?
> > > > 
> > > > mlx5 steering is hierarchical and defines precedence amongst namespaces.
> > > > Up till now, the VDPA implementation would insert a rule into the
> > > > MLX5_FLOW_NAMESPACE_BYPASS hierarchy which is used by DPDK thus taking
> > > > all the incoming traffic.
> > > > 
> > > > The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> > > > MLX5_FLOW_NAMESPACE_BYPASS.  
> > > 
> > > Our policy was no DPDK driver bifurcation. There's no asterisk saying
> > > "unless you pretend you need flow filters for RDMA, get them upstream
> > > and then drop the act".  
> > 
> > Huh?
> > 
> > mlx5 DPDK is an *RDMA* userspace application. 
> 
> Forgive me for my naiveté. 
> 
> Here I thought the RDMA subsystem is for doing RDMA.

RDMA covers a wide range of accelerated networking these days.. Where
else are you going to put this stuff in the kernel?

> I'm sure if you start doing crypto over ibverbs crypto people will want
> to have a look.

Well, RDMA has crypto transforms for a few years now too. Why would
crypto subsystem people be involved? It isn't using or duplicating
their APIs.

> > libibverbs. It runs on the RDMA stack. It uses RDMA flow filtering and
> > RDMA raw ethernet QPs. 
> 
> I'm not saying that's not the case. I'm saying I don't think this was
> something that netdev developers signed-off on.

Part of the point of the subsystem split was to end the fighting that
started all of it. It was very clear during the whole iWarp and TCP
Offload Engine business in the mid 2000's that netdev wanted nothing
to do with the accelerator world.

So why would netdev need sign off on any accelerator stuff?  Do you
want to start co-operating now? I'm willing to talk about how to do
that.

> And our policy on DPDK is pretty widely known.

I honestly have no idea on the netdev DPDK policy; I'm maintaining the
RDMA subsystem, not DPDK :)

> Would you mind pointing us to the introduction of raw Ethernet QPs?
> 
> Is there any production use for that without DPDK?

Hmm.. It is very old. RAW (InfiniBand) QPs were part of the original
IBA specification circa 2000. When RoCE was defined (around 2010) they
were naturally carried forward to Ethernet. The "flow steering"
concept to make raw ethernet QP useful was added to verbs around 2012
- 2013. It officially made it upstream in commit 436f2ad05a0b
("IB/core: Export ib_create/destroy_flow through uverbs")

If I recall properly the first real application was ultra low latency
ethernet processing for financial applications.

dpdk later adopted the first mlx4 PMD using this libibverbs API around
2015. Interestingly the mlx4 PMD was made through an open source
process with minimal involvement from Mellanox, based on the
pre-existing RDMA work.

Currently there are many projects, many of them open source, built on
top of the RDMA raw ethernet QP and RDMA flow steering model. It is now
long-established kernel ABI.

> > It has been like this for years, it is not some "act".
> > 
> > It is long standing uABI that accelerators like RDMA/etc get to take
> > the traffic before netdev. This cannot be reverted. I don't really
> > understand what you are expecting here?
> 
> Same. I don't really know what you expect me to do either. I don't
> think I can sign-off on kernel changes needed for DPDK.

This patch is fine-tuning the shared logic that splits the traffic to
accelerator subsystems; I don't think netdev should have a veto
here. This needs to be a consensus among the various communities and
subsystems that rely on this.

Eli did not explain this well in his commit message. When he said DPDK
he means RDMA, which is the owner of the FLOW_NAMESPACE. Each
accelerator subsystem gets hooked into this, so here VDPA is getting
its own hook, because re-using the same hook between two kernel
subsystems is buggy.
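
Roughly (a conceptual sketch of the precedence only; the real priority
values live in fs_core.c, and this enum is made up for illustration):

	/* order in which RX traffic is matched against the hooks */
	enum rx_hook_precedence {
		RX_HOOK_BYPASS,	/* MLX5_FLOW_NAMESPACE_BYPASS: RDMA flow steering */
		RX_HOOK_VDPA,	/* MLX5_FLOW_NAMESPACE_VDPA: new in this series */
		RX_HOOK_KERNEL,	/* MLX5_FLOW_NAMESPACE_KERNEL: regular netdev RX */
	};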

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-24 19:44             ` Jason Gunthorpe
@ 2020-11-25  6:19               ` Eli Cohen
  2020-11-25 19:04                 ` Saeed Mahameed
  2020-11-25 18:54               ` Jakub Kicinski
  1 sibling, 1 reply; 32+ messages in thread
From: Eli Cohen @ 2020-11-25  6:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jakub Kicinski, Saeed Mahameed, Leon Romanovsky, netdev,
	linux-rdma, Eli Cohen, Mark Bloch, Maor Gottlieb

On Tue, Nov 24, 2020 at 03:44:13PM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 24, 2020 at 10:41:06AM -0800, Jakub Kicinski wrote:
> > On Tue, 24 Nov 2020 14:02:10 -0400 Jason Gunthorpe wrote:
> > > On Tue, Nov 24, 2020 at 09:12:19AM -0800, Jakub Kicinski wrote:
> > > > On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:  
> > > > > On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:  
> > > > > > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:    
> > > > > > > From: Eli Cohen <eli@mellanox.com>
> > > > > > > 
> > > > > > > Add a new namespace type to the NIC RX root namespace to allow for
> > > > > > > inserting VDPA rules before regular NIC but after bypass, thus allowing
> > > > > > > DPDK to have precedence in packet processing.    
> > > > > > 
> > > > > > How do DPDK and VDPA relate in this context?
> > > > > 
> > > > > mlx5 steering is hierarchical and defines precedence amongst namespaces.
> > > > > Up till now, the VDPA implementation would insert a rule into the
> > > > > MLX5_FLOW_NAMESPACE_BYPASS hierarchy which is used by DPDK thus taking
> > > > > all the incoming traffic.
> > > > > 
> > > > > > The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> > > > > MLX5_FLOW_NAMESPACE_BYPASS.  
> > > > 
> > > > Our policy was no DPDK driver bifurcation. There's no asterisk saying
> > > > "unless you pretend you need flow filters for RDMA, get them upstream
> > > > and then drop the act".  
> > > 
> > > Huh?
> > > 
> > > mlx5 DPDK is an *RDMA* userspace application. 
> > 
> > Forgive me for my naiveté. 
> > 
> > Here I thought the RDMA subsystem is for doing RDMA.
> 
> RDMA covers a wide range of accelerated networking these days.. Where
> else are you going to put this stuff in the kernel?
> 
> > I'm sure if you start doing crypto over ibverbs crypto people will want
> > to have a look.
> 
> Well, RDMA has crypto transforms for a few years now too. Why would
> crypto subsystem people be involved? It isn't using or duplicating
> their APIs.
> 
> > > libibverbs. It runs on the RDMA stack. It uses RDMA flow filtering and
> > > RDMA raw ethernet QPs. 
> > 
> > I'm not saying that's not the case. I'm saying I don't think this was
> > something that netdev developers signed-off on.
> 
> Part of the point of the subsystem split was to end the fighting that
> started all of it. It was very clear during the whole iWarp and TCP
> Offload Engine business in the mid 2000's that netdev wanted nothing
> to do with the accelerator world.
> 
> So why would netdev need sign off on any accelerator stuff?  Do you
> want to start co-operating now? I'm willing to talk about how to do
> that.
> 
> > And our policy on DPDK is pretty widely known.
> 
> I honestly have no idea on the netdev DPDK policy, I'm maintaining the
> RDMA subsystem not DPDK :)
> 
> > Would you mind pointing us to the introduction of raw Ethernet QPs?
> > 
> > Is there any production use for that without DPDK?
> 
> Hmm.. It is very old. RAW (InfiniBand) QPs were part of the original
> IBA specification circa 2000. When RoCE was defined (around 2010) they
> were naturally carried forward to Ethernet. The "flow steering"
> concept to make raw ethernet QP useful was added to verbs around 2012
> - 2013. It officially made it upstream in commit 436f2ad05a0b
> ("IB/core: Export ib_create/destroy_flow through uverbs")
> 
> If I recall properly the first real application was ultra low latency
> ethernet processing for financial applications.
> 
> dpdk later adopted the first mlx4 PMD using this libibverbs API around
> 2015. Interestingly the mlx4 PMD was made through an open source
> process with minimal involvement from Mellanox, based on the
> pre-existing RDMA work.
> 
> Currently there are many projects, and many open source, built on top
> of the RDMA raw ethernet QP and RDMA flow steering model. It is now
> long established kernel ABI.
> 
> > > It has been like this for years, it is not some "act".
> > > 
> > > It is long standing uABI that accelerators like RDMA/etc get to take
> > > the traffic before netdev. This cannot be reverted. I don't really
> > > understand what you are expecting here?
> > 
> > Same. I don't really know what you expect me to do either. I don't
> > think I can sign-off on kernel changes needed for DPDK.
> 
> This patch is fine tuning the shared logic that splits the traffic to
> accelerator subsystems, I don't think netdev should have a veto
> here. This needs to be consensus among the various communities and
> subsystems that rely on this.
> 
> Eli did not explain this well in his commit message. When he said DPDK
> he means RDMA which is the owner of the FLOW_NAMESPACE. Each
> accelerator subsystem gets hooked into this, so here VDPA is getting
> its own hook because re-using the same hook between two kernel
> subsystems is buggy.

I agree, RDMA should have been used here. DPDK is just one, though
widely used, accelerator using the RDMA interfaces for flow steering.

I will submit another patch with a modified change log.

> 
> Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-24 19:44             ` Jason Gunthorpe
  2020-11-25  6:19               ` Eli Cohen
@ 2020-11-25 18:54               ` Jakub Kicinski
  2020-11-25 19:28                 ` Saeed Mahameed
  2020-11-25 21:22                 ` Jason Gunthorpe
  1 sibling, 2 replies; 32+ messages in thread
From: Jakub Kicinski @ 2020-11-25 18:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Saeed Mahameed, Eli Cohen, Leon Romanovsky, netdev, linux-rdma,
	Eli Cohen, Mark Bloch, Maor Gottlieb

On Tue, 24 Nov 2020 15:44:13 -0400 Jason Gunthorpe wrote:
> On Tue, Nov 24, 2020 at 10:41:06AM -0800, Jakub Kicinski wrote:
> > On Tue, 24 Nov 2020 14:02:10 -0400 Jason Gunthorpe wrote:  
> > > On Tue, Nov 24, 2020 at 09:12:19AM -0800, Jakub Kicinski wrote:  
> > > > On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:    
> > > > > On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:    
> > > > > > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:      
> > > > > > > From: Eli Cohen <eli@mellanox.com>
> > > > > > > 
> > > > > > > Add a new namespace type to the NIC RX root namespace to allow for
> > > > > > > inserting VDPA rules before regular NIC but after bypass, thus allowing
> > > > > > > DPDK to have precedence in packet processing.      
> > > > > > 
> > > > > > How do DPDK and VDPA relate in this context?
> > > > > 
> > > > > mlx5 steering is hierarchical and defines precedence amongst namespaces.
> > > > > Up till now, the VDPA implementation would insert a rule into the
> > > > > MLX5_FLOW_NAMESPACE_BYPASS hierarchy which is used by DPDK thus taking
> > > > > all the incoming traffic.
> > > > > 
> > > > > The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> > > > > MLX5_FLOW_NAMESPACE_BYPASS.    
> > > > 
> > > > Our policy was no DPDK driver bifurcation. There's no asterisk saying
> > > > "unless you pretend you need flow filters for RDMA, get them upstream
> > > > and then drop the act".    
> > > 
> > > Huh?
> > > 
> > > mlx5 DPDK is an *RDMA* userspace application.   
> > 
> > Forgive me for my naiveté. 
> > 
> > Here I thought the RDMA subsystem is for doing RDMA.  
> 
> RDMA covers a wide range of accelerated networking these days.. Where
> else are you going to put this stuff in the kernel?

IDK what else you got in there :) It's probably a case-by-case answer.

IMHO even using libibverbs is no strong reason for things to fall under
RDMA exclusively. Client drivers of virtio don't get silently funneled
through a separate tree just because they use a certain spec.

> > I'm sure if you start doing crypto over ibverbs crypto people will want
> > to have a look.  
> 
> Well, RDMA has crypto transforms for a few years now too. 

Are you talking about RDMA traffic being encrypted? That's a different
case.

My example was alluding to access to a generic crypto accelerator 
over ibverbs. I hope you'd let crypto people know when merging such 
a thing...

> Why would crypto subsystem people be involved? It isn't using or
> duplicating their APIs.
> 
> > > libibverbs. It runs on the RDMA stack. It uses RDMA flow
> > > filtering and RDMA raw ethernet QPs.   
> > 
> > I'm not saying that's not the case. I'm saying I don't think this
> > was something that netdev developers signed-off on.  
> 
> Part of the point of the subsystem split was to end the fighting that
> started all of it. It was very clear during the whole iWarp and TCP
> Offload Engine business in the mid 2000's that netdev wanted nothing
> to do with the accelerator world.

I was in middle school at the time, not sure what exactly went down :)
But I'm going by common sense here. Perhaps there was an agreement I'm
not aware of?

> So why would netdev need sign off on any accelerator stuff?

I'm not sure why you keep saying accelerators!

What is accelerated in raw Ethernet frame access??

> Do you want to start co-operating now? I'm willing to talk about how
> to do that.

IDK how that's even in question. I always try to bump all RDMA-looking
stuff to linux-rdma when it's not CCed there. That's the bare minimum
of cooperation I'd expect from anyone.

> > And our policy on DPDK is pretty widely known.  
> 
> I honestly have no idea on the netdev DPDK policy,
> 
> I'm maintaining the RDMA subsystem not DPDK :)

That's what I thought, but turns out DPDK is your important user.

> > Would you mind pointing us to the introduction of raw Ethernet QPs?
> > 
> > Is there any production use for that without DPDK?  
> 
> Hmm.. It is very old. RAW (InfiniBand) QPs were part of the original
> IBA specification circa 2000. When RoCE was defined (around 2010) they
> were naturally carried forward to Ethernet. The "flow steering"
> concept to make raw ethernet QP useful was added to verbs around 2012
> - 2013. It officially made it upstream in commit 436f2ad05a0b
> ("IB/core: Export ib_create/destroy_flow through uverbs")
> 
> If I recall properly the first real application was ultra low latency
> ethernet processing for financial applications.
> 
> dpdk later adopted the first mlx4 PMD using this libibverbs API around
> 2015. Interestingly the mlx4 PMD was made through an open source
> process with minimal involvement from Mellanox, based on the
> pre-existing RDMA work.
> 
> Currently there are many projects, and many open source, built on top
> of the RDMA raw ethernet QP and RDMA flow steering model. It is now
> long established kernel ABI.
> 
> > > It has been like this for years, it is not some "act".
> > > 
> > > It is long standing uABI that accelerators like RDMA/etc get to
> > > take the traffic before netdev. This cannot be reverted. I don't
> > > really understand what you are expecting here?  
> > 
> > Same. I don't really know what you expect me to do either. I don't
> > think I can sign-off on kernel changes needed for DPDK.  
> 
> This patch is fine tuning the shared logic that splits the traffic to
> accelerator subsystems, I don't think netdev should have a veto
> here. This needs to be consensus among the various communities and
> subsystems that rely on this.
> 
> Eli did not explain this well in his commit message. When he said DPDK
> he means RDMA which is the owner of the FLOW_NAMESPACE. Each
> accelerator subsystem gets hooked into this, so here VDPA is getting
> its own hook because re-using the same hook between two kernel
> subsystems is buggy.

I'm not so sure about this.

The switchdev modeling is supposed to give users control over the flow
of traffic in a sane, well-defined way, as opposed to the magic flow
filtering of the early SR-IOV implementations, which every vendor had
their own twist on. 

Now IIUC you're tapping traffic for DPDK/raw QPs _before_ all switching
happens in the NIC? That breaks the switchdev model. We're back to
per-vendor magic.

And why do you need a separate VDPA table in the first place?
Forwarding to a VDPA device has different semantics than forwarding to
any other VF/SF?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-25  6:19               ` Eli Cohen
@ 2020-11-25 19:04                 ` Saeed Mahameed
  0 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2020-11-25 19:04 UTC (permalink / raw)
  To: Eli Cohen, Jason Gunthorpe
  Cc: Jakub Kicinski, Leon Romanovsky, netdev, linux-rdma, Eli Cohen,
	Mark Bloch, Maor Gottlieb

On Wed, 2020-11-25 at 08:19 +0200, Eli Cohen wrote:
> On Tue, Nov 24, 2020 at 03:44:13PM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 24, 2020 at 10:41:06AM -0800, Jakub Kicinski wrote:
> > > On Tue, 24 Nov 2020 14:02:10 -0400 Jason Gunthorpe wrote:
> > > > On Tue, Nov 24, 2020 at 09:12:19AM -0800, Jakub Kicinski wrote:
> > > > > On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:  
> > > > > > On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski
> > > > > > wrote:  
> > > > > > > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed
> > > > > > > wrote:    
> > > > > > > > From: Eli Cohen <eli@mellanox.com>
> > > > > > > > 
> > > > > > > > Add a new namespace type to the NIC RX root namespace
> > > > > > > > to allow for
> > > > > > > > inserting VDPA rules before regular NIC but after
> > > > > > > > bypass, thus allowing
> > > > > > > > DPDK to have precedence in packet processing.    
> > > > > > > 
> > > > > > > How do DPDK and VDPA relate in this context?
> > > > > > 
> > > > > > mlx5 steering is hierarchical and defines precedence
> > > > > > amongst namespaces.
> > > > > > Up till now, the VDPA implementation would insert a rule
> > > > > > into the
> > > > > > MLX5_FLOW_NAMESPACE_BYPASS hierarchy which is used by DPDK
> > > > > > thus taking
> > > > > > all the incoming traffic.
> > > > > > 
> > > > > > The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> > > > > > MLX5_FLOW_NAMESPACE_BYPASS.  
> > > > > 
> > > > > Our policy was no DPDK driver bifurcation. There's no
> > > > > asterisk saying
> > > > > "unless you pretend you need flow filters for RDMA, get them
> > > > > upstream
> > > > > and then drop the act".  
> > > > 
> > > > Huh?
> > > > 
> > > > mlx5 DPDK is an *RDMA* userspace application. 
> > > 
> > > Forgive me for my naiveté. 
> > > 
> > > Here I thought the RDMA subsystem is for doing RDMA.
> > 
> > RDMA covers a wide range of accelerated networking these days..
> > Where
> > else are you going to put this stuff in the kernel?
> > 
> > > I'm sure if you start doing crypto over ibverbs crypto people
> > > will want
> > > to have a look.
> > 
> > Well, RDMA has crypto transforms for a few years now too. Why would
> > crypto subsystem people be involved? It isn't using or duplicating
> > their APIs.
> > 
> > > > libibverbs. It runs on the RDMA stack. It uses RDMA flow
> > > > filtering and
> > > > RDMA raw ethernet QPs. 
> > > 
> > > I'm not saying that's not the case. I'm saying I don't think this
> > > was
> > > something that netdev developers signed-off on.
> > 
> > Part of the point of the subsystem split was to end the fighting
> > that
> > started all of it. It was very clear during the whole iWarp and TCP
> > Offload Engine business in the mid 2000's that netdev wanted
> > nothing
> > to do with the accelerator world.
> > 
> > So why would netdev need sign off on any accelerator stuff?  Do you
> > want to start co-operating now? I'm willing to talk about how to do
> > that.
> > 
> > > And our policy on DPDK is pretty widely known.
> > 
> > I honestly have no idea on the netdev DPDK policy, I'm maintaining
> > the
> > RDMA subsystem not DPDK :)
> > 
> > > Would you mind pointing us to the introduction of raw Ethernet
> > > QPs?
> > > 
> > > Is there any production use for that without DPDK?
> > 
> > Hmm.. It is very old. RAW (InfiniBand) QPs were part of the
> > original
> > IBA specification circa 2000. When RoCE was defined (around 2010)
> > they
> > were naturally carried forward to Ethernet. The "flow steering"
> > concept to make raw ethernet QP useful was added to verbs around
> > 2012
> > - 2013. It officially made it upstream in commit 436f2ad05a0b
> > ("IB/core: Export ib_create/destroy_flow through uverbs")
> > 
> > If I recall properly the first real application was ultra low
> > latency
> > ethernet processing for financial applications.
> > 
> > dpdk later adopted the first mlx4 PMD using this libibverbs API
> > around
> > 2015. Interestingly the mlx4 PMD was made through an open source
> > process with minimal involvement from Mellanox, based on the
> > pre-existing RDMA work.
> > 
> > Currently there are many projects, and many open source, built on
> > top
> > of the RDMA raw ethernet QP and RDMA flow steering model. It is now
> > long established kernel ABI.
> > 
> > > > It has been like this for years, it is not some "act".
> > > > 
> > > > It is long standing uABI that accelerators like RDMA/etc get to
> > > > take
> > > > the traffic before netdev. This cannot be reverted. I don't
> > > > really
> > > > understand what you are expecting here?
> > > 
> > > Same. I don't really know what you expect me to do either. I
> > > don't
> > > think I can sign-off on kernel changes needed for DPDK.
> > 
> > This patch is fine tuning the shared logic that splits the traffic
> > to
> > accelerator subsystems, I don't think netdev should have a veto
> > here. This needs to be consensus among the various communities and
> > subsystems that rely on this.
> > 
> > Eli did not explain this well in his commit message. When he said
> > DPDK
> > he means RDMA which is the owner of the FLOW_NAMESPACE. Each
> > accelerator subsystem gets hooked into this, so here VDPA is
> > getting
> > its own hook because re-using the same hook between two kernel
> > subsystems is buggy.
> 
> I agree, RDMA should have been used here. DPDK is just one, though
> widely used, accelerator using RDMA interfaces to flow steering.
> 
> I will push submit another patch with a modified change log.

Hi Jakub, given that this patch just defines the HW domain of vDPA to
separate it from the RDMA domain, and has no actual functionality
beyond the low-level mlx5_core HW definition, can I take this patch to
mlx5-next and move on to my next series?

Thanks,
Saeed.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-25 18:54               ` Jakub Kicinski
@ 2020-11-25 19:28                 ` Saeed Mahameed
  2020-11-25 21:22                 ` Jason Gunthorpe
  1 sibling, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2020-11-25 19:28 UTC (permalink / raw)
  To: Jakub Kicinski, Jason Gunthorpe
  Cc: Eli Cohen, Leon Romanovsky, netdev, linux-rdma, Eli Cohen,
	Mark Bloch, Maor Gottlieb

On Wed, 2020-11-25 at 10:54 -0800, Jakub Kicinski wrote:
> On Tue, 24 Nov 2020 15:44:13 -0400 Jason Gunthorpe wrote:
> > On Tue, Nov 24, 2020 at 10:41:06AM -0800, Jakub Kicinski wrote:
> > > On Tue, 24 Nov 2020 14:02:10 -0400 Jason Gunthorpe wrote:  

[snip]

> > > > > > 
> > > > It has been like this for years, it is not some "act".
> > > > 
> > > > It is long standing uABI that accelerators like RDMA/etc get to
> > > > take the traffic before netdev. This cannot be reverted. I
> > > > don't
> > > > really understand what you are expecting here?  
> > > 
> > > Same. I don't really know what you expect me to do either. I
> > > don't
> > > think I can sign-off on kernel changes needed for DPDK.  
> > 
> > This patch is fine tuning the shared logic that splits the traffic
> > to
> > accelerator subsystems, I don't think netdev should have a veto
> > here. This needs to be consensus among the various communities and
> > subsystems that rely on this.
> > 
> > Eli did not explain this well in his commit message. When he said
> > DPDK
> > he means RDMA which is the owner of the FLOW_NAMESPACE. Each
> > accelerator subsystem gets hooked into this, so here VPDA is
> > getting
> > its own hook because re-using the the same hook between two kernel
> > subsystems is buggy.
> 
> I'm not so sure about this.
> 
> The switchdev modeling is supposed to give users control over flow of
> traffic in a sane, well defined way, as opposed to magic flow
> filtering
> of the early SR-IOV implementations which every vendor had their own
> twist on. 
> 
> Now IIUC you're tapping traffic for DPDK/raw QPs _before_ all
> switching
> happens in the NIC? That breaks the switchdev model. We're back to
> per-vendor magic.

No, this is after switching, nothing can precede switching!
After switching and forwarding to the correct function/vport,
the HW demuxes RDMA traffic to RDMA and the rest (eth) to netdev.

> 
> And why do you need a separate VDPA table in the first place?
> Forwarding to a VDPA device has different semantics than forwarding
> to
> any other VF/SF?

VDPA is yet another "RDMA" application, similar to a raw QP; it is
different from a VF/SF.

Switching can only forward to a PF/VF/SF; it doesn't know or care about
the end point app (netdev/rdma).

Jakub, this is how rdma works and has been working for the past 20
years :). Jason is well aware of the lack of visibility, and I am sure
the rdma folks will improve this; they have been improving a lot
lately, take rdma_tool for example.

Bottom line, the switching model is not the answer for rdma; another
model is required. rdma is by definition HW oriented from day one, you
can't think of it as an offloaded SW model (also, in a real switch you
can't define whether a port is rdma or eth :)).

Anyway, you have very valid points that Jason already raised in the
past, but the challenge is more complicated than the challenges we have
in netdev, simply because RDMA is RDMA, where the leading model is the
HW model and the rdma spec, not the SW.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
  2020-11-25 18:54               ` Jakub Kicinski
  2020-11-25 19:28                 ` Saeed Mahameed
@ 2020-11-25 21:22                 ` Jason Gunthorpe
  1 sibling, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2020-11-25 21:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Eli Cohen, Leon Romanovsky, netdev, linux-rdma,
	Eli Cohen, Mark Bloch, Maor Gottlieb

On Wed, Nov 25, 2020 at 10:54:22AM -0800, Jakub Kicinski wrote:

> > RDMA covers a wide range of accelerated networking these days.. Where
> > else are you going to put this stuff in the kernel?
> 
> IDK what else you got in there :) It's probably a case by case answer.

Hmm, yes, it seems endless sometimes :(
 
> IMHO even using libibverbs is no strong reason for things to fall under
> RDMA exclusively. Client drivers of virtio don't get silently funneled
> through a separate tree just because they use a certain spec.

I'm not sure I understand this, libibverbs is the user library to
interface with the kernel RDMA subsystem. I don't care what apps
people build on top of it, it doesn't matter to me that netdev and
DPDK have some kind of feud.

> > > I'm sure if you start doing crypto over ibverbs crypto people will want
> > > to have a look.  
> > 
> > Well, RDMA has crypto transforms for a few years now too. 
> 
> Are you talking about RDMA traffic being encrypted? That's a different
> case.

That too, but in general, anything netdev can do can be done via RDMA
in userspace. So all the kTLS and IPsec xfrm HW offloads mlx5 supports
are available in userspace too.

> > Part of the point of the subsystem split was to end the fighting that
> > started all of it. It was very clear during the whole iWarp and TCP
> > Offload Engine buisness in the mid 2000's that netdev wanted nothing
> > to do with the accelerator world.
> 
> I was in middle school at the time, not sure what exactly went down :)

Ah, it was quite the thing. Microsoft and Co were heavily pushing TOE
technology (Microsoft Chimney!) as the next most certain thing and I
recall DaveM&co was completely against it in Linux.

I will admit at the time I was doubtful, but in hindsight this was the
correct choice. netdev would not look like it does today if it had
been shackled by the HW implementations of the day. Instead all this
HW stuff ended up largely in RDMA and some in block with the iSCSI
mania of old. It is quite evident to me the mess that being tied to HW
has caused for a SW ecosystem. DRM and RDMA both have a very similar
kind of suffering due to this.

However - over the last 20 years it has been steadfast that there is
*always* a compelling reason for certain applications to use something
from the accelerator side. It is not for everyone, but the specialized
applications that need it, *really need it*.

For instance, it is the difference between being able to get a COVID
simulation result in a few weeks vs.. well.. never.

> But I'm going by common sense here. Perhaps there was an agreement
> I'm not aware of?

The resolution to the argument above was to split them in Linux.  Thus
what logically is networking was split up in the kernel between netdev
and the accelerator subsystems (iscsi, rdma, and so on).

The general notion is netdev doesn't have to accommodate anything an
accelerator does. If you choose to run them then you do not get to
complain that your ethtool counters are wrong, your routing tables
and tc don't work, firewalling doesn't work. Etc.

That is all broken by design.

In turn, the accelerators do their own thing, tap the traffic before
it hits netdev and so on. netdev does not care what goes on over there
and is not responsible.

I would say this is the basic unspoken agreement of the last 15 years.

Both have a right to exist in Linux. Both have a right to use the
physical ethernet port.

> > So why would netdev need sign off on any accelerator stuff?
> 
> I'm not sure why you keep saying accelerators!
> 
> What is accelerated in raw Ethernet frame access??

The nature of the traffic is not relevant.

It goes through RDMA, it is accelerator traffic (vs netdev traffic,
which goes to netdev). Even if you want to be pedantic, in the raw
ethernet area there is lots of special HW-accelerated stuff going
on. Mellanox has some really neat hard real-time networking technology
that works on raw ethernet packets, for instance.

And of course raw ethernet is a fraction of what RDMA covers. iWarp
and RoCE are much more like you might imagine when you hear the word
accelerator.

> > Do you want to start co-operating now? I'm willing to talk about how
> > to do that.
> 
> IDK how that's even in question. I always try to bump all RDMA-looking
> stuff to linux-rdma when it's not CCed there. That's the bare minimum
> of cooperation I'd expect from anyone.

I mean co-operate in the sense of defining a scheme where the two
worlds are not completely separated and isolated.

> > > And our policy on DPDK is pretty widely known.  
> > 
> > I honestly have no idea on the netdev DPDK policy,
> > 
> > I'm maintaining the RDMA subsystem not DPDK :)
> 
> That's what I thought, but turns out DPDK is your important user.

Nonsense.

I don't have stats but the majority of people I work with using RDMA
are not using DPDK. DPDK serves two somewhat niche markets, NFV and
certain hyperscalers - RDMA covers the entire scientific computing
community and a big swath of the classic "Big Iron" enterprise stuff,
like databases and storage.

> Now IIUC you're tapping traffic for DPDK/raw QPs _before_ all switching
> happens in the NIC? That breaks the switchdev model. We're back to
> per-vendor magic.

No, as I explained before, the switchdev completely contains the SF/VF
and all applications running on a mlx5_core are trapped by it. This
includes netdev, RDMA and VDPA.

> And why do you need a separate VDPA table in the first place?
> Forwarding to a VDPA device has different semantics than forwarding to
> any other VF/SF?

The VDPA table is not switchdev. Go back to my overly long email about
VDPA; here we are talking about the "selector" that chooses which
subsystem the traffic will go to. The selector is after switchdev but
before netdev, VDPA, and RDMA.

Each accelerator subsystem gets a table. RDMA, VDPA, and netdev all
get one. It is the part of the HW that makes the selection work.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH mlx5-next 00/16] mlx5 next updates 2020-11-20
  2020-11-20 23:03 [PATCH mlx5-next 00/16] mlx5 next updates 2020-11-20 Saeed Mahameed
                   ` (15 preceding siblings ...)
  2020-11-20 23:03 ` [PATCH mlx5-next 16/16] net/mlx5: Treat host PF vport as other (non eswitch manager) vport Saeed Mahameed
@ 2020-11-30 18:42 ` Saeed Mahameed
  16 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2020-11-30 18:42 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, linux-rdma, Jakub Kicinski, Jason Gunthorpe

On Fri, 2020-11-20 at 15:03 -0800, Saeed Mahameed wrote:
> Hi,
> 
> This series includes trivial updates to mlx5 next branch
> 1) HW definition for upcoming features
> 2) Include files and general Cleanups
> 3) Add the upcoming BlueField-3 device ID
> 4) Define flow steering priority for VDPA
> 5) Export missing steering API for ULPs,
>    will be used later in VDPA driver, to create flow steering domain
> for
>    VDPA queues.
> 6) ECPF (Embedded CPU Physical function) minor improvements for
> BlueField.
> 

Series applied to mlx5-next without the VDPA patch.

Jakub, please let me know if you still have concerns regarding that
patch. I will eventually need to apply it regardless of the outcome of
the RDMA vs ETH discussion; it is just how we configure the HW :).

Thanks,
Saeed.



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-11-30 18:43 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-20 23:03 [PATCH mlx5-next 00/16] mlx5 next updates 2020-11-20 Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 01/16] net/mlx5: Add sample offload hardware bits and structures Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 02/16] net/mlx5: Add sampler destination type Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 03/16] net/mlx5: Check dr mask size against mlx5_match_param size Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 04/16] net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 05/16] net/mlx5: Add ts_cqe_to_dest_cqn related bits Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 06/16] net/mlx5: Avoid exposing driver internal command helpers Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 07/16] net/mlx5: Update the list of the PCI supported devices Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 08/16] net/mlx5: Update the hardware interface definition for vhca state Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits Saeed Mahameed
2020-11-21 23:58   ` Jakub Kicinski
2020-11-22 15:17     ` Aya Levin
2020-11-23 21:15       ` Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 10/16] net/mlx5: Expose other function ifc bits Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace Saeed Mahameed
2020-11-22  0:01   ` Jakub Kicinski
2020-11-22  6:41     ` Eli Cohen
2020-11-24 17:12       ` Jakub Kicinski
2020-11-24 18:02         ` Jason Gunthorpe
2020-11-24 18:41           ` Jakub Kicinski
2020-11-24 19:44             ` Jason Gunthorpe
2020-11-25  6:19               ` Eli Cohen
2020-11-25 19:04                 ` Saeed Mahameed
2020-11-25 18:54               ` Jakub Kicinski
2020-11-25 19:28                 ` Saeed Mahameed
2020-11-25 21:22                 ` Jason Gunthorpe
2020-11-20 23:03 ` [PATCH mlx5-next 12/16] net/mlx5: Export steering related functions Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 13/16] net/mlx5: Make API mlx5_core_is_ecpf accept const pointer Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 14/16] net/mlx5: Rename peer_pf to host_pf Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 15/16] net/mlx5: Enable host PF HCA after eswitch is initialized Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 16/16] net/mlx5: Treat host PF vport as other (non eswitch manager) vport Saeed Mahameed
2020-11-30 18:42 ` [PATCH mlx5-next 00/16] mlx5 next updates 2020-11-20 Saeed Mahameed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).