* [PATCH rdma-next 0/6] Set flow_label and RoCEv2 UDP source port for datagram QP
@ 2020-03-18  9:52 Leon Romanovsky
  2020-03-18  9:52 ` [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port Leon Romanovsky
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-18  9:52 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, Maor Gottlieb, Mark Zhang, netdev,
	Saeed Mahameed

From: Leon Romanovsky <leonro@mellanox.com>

From Mark:

This series provides flow label and UDP source port definitions for RoCE v2.
Those fields are used to create entropy for network routes (ECMP), load
balancers and 802.3ad link aggregation switching that are not aware of
RoCE headers.

Thanks.

Mark Zhang (6):
  net/mlx5: Enable SW-defined RoCEv2 UDP source port
  RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and UDP
    source port
  RDMA/mlx5: Define RoCEv2 udp source port when set path
  RDMA/cma: Initialize the flow label of CM's route path record
  RDMA/cm: Set flow label of recv_wc based on primary flow label
  RDMA/mlx5: Set UDP source port based on the grh.flow_label

 drivers/infiniband/core/cm.c                  |  7 +++
 drivers/infiniband/core/cma.c                 | 23 ++++++++++
 drivers/infiniband/hw/mlx5/ah.c               | 21 ++++++++-
 drivers/infiniband/hw/mlx5/main.c             |  4 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  4 +-
 drivers/infiniband/hw/mlx5/qp.c               | 30 ++++++++++---
 .../net/ethernet/mellanox/mlx5/core/main.c    | 39 ++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                 |  5 ++-
 include/rdma/ib_verbs.h                       | 44 +++++++++++++++++++
 9 files changed, 164 insertions(+), 13 deletions(-)

--
2.24.1



* [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port
  2020-03-18  9:52 [PATCH rdma-next 0/6] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
@ 2020-03-18  9:52 ` Leon Romanovsky
  2020-03-18 23:33   ` Saeed Mahameed
  2020-03-18  9:52 ` [PATCH rdma-next 2/6] RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and " Leon Romanovsky
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-18  9:52 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Mark Zhang, linux-rdma, Maor Gottlieb, netdev, Saeed Mahameed

From: Mark Zhang <markz@mellanox.com>

When this is enabled, the UDP source port for RoCEv2 packets is defined
by software instead of firmware.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/main.c    | 39 +++++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                 |  5 ++-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 6b38ec72215a..bdc73370297b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -585,6 +585,39 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
 	return err;
 }
 
+static int handle_hca_cap_roce(struct mlx5_core_dev *dev)
+{
+	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
+	void *set_hca_cap;
+	void *set_ctx;
+	int err;
+
+	if (!MLX5_CAP_GEN(dev, roce))
+		return 0;
+
+	err = mlx5_core_get_caps(dev, MLX5_CAP_ROCE);
+	if (err)
+		return err;
+
+	if (MLX5_CAP_ROCE(dev, sw_r_roce_src_udp_port) ||
+	    !MLX5_CAP_ROCE_MAX(dev, sw_r_roce_src_udp_port))
+		return 0;
+
+	set_ctx = kzalloc(set_sz, GFP_KERNEL);
+	if (!set_ctx)
+		return -ENOMEM;
+
+	set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, capability);
+	memcpy(set_hca_cap, dev->caps.hca_cur[MLX5_CAP_ROCE],
+	       MLX5_ST_SZ_BYTES(roce_cap));
+	MLX5_SET(roce_cap, set_hca_cap, sw_r_roce_src_udp_port, 1);
+
+	err = set_caps(dev, set_ctx, set_sz, MLX5_SET_HCA_CAP_OP_MOD_ROCE);
+
+	kfree(set_ctx);
+	return err;
+}
+
 static int set_hca_cap(struct mlx5_core_dev *dev)
 {
 	int err;
@@ -607,6 +640,12 @@ static int set_hca_cap(struct mlx5_core_dev *dev)
 		goto out;
 	}
 
+	err = handle_hca_cap_roce(dev);
+	if (err) {
+		mlx5_core_err(dev, "handle_hca_cap_roce failed\n");
+		goto out;
+	}
+
 out:
 	return err;
 }
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 208bf1127be7..bb217c3f30da 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -74,6 +74,7 @@ enum {
 	MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE        = 0x0,
 	MLX5_SET_HCA_CAP_OP_MOD_ODP                   = 0x2,
 	MLX5_SET_HCA_CAP_OP_MOD_ATOMIC                = 0x3,
+	MLX5_SET_HCA_CAP_OP_MOD_ROCE                  = 0x4,
 };
 
 enum {
@@ -902,7 +903,9 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 
 struct mlx5_ifc_roce_cap_bits {
 	u8         roce_apm[0x1];
-	u8         reserved_at_1[0x1f];
+	u8         reserved_at_1[0x3];
+	u8         sw_r_roce_src_udp_port[0x1];
+	u8         reserved_at_5[0x1b];
 
 	u8         reserved_at_20[0x60];
 
-- 
2.24.1
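
The check at the top of handle_hca_cap_roce() can be read as "enable the
bit only if the device supports it and it is not already on". Below is a
minimal userspace sketch of that decision. It assumes the mlx5_ifc
convention that field offsets count from the most significant bit of a
32-bit dword; field_shift(), need_enable() and the plain uint32_t
capability words are hypothetical stand-ins, not driver API.

```c
#include <stdint.h>

/* Field position taken from the mlx5_ifc_roce_cap_bits hunk: 1 bit at offset 4. */
#define SW_R_ROCE_SRC_UDP_PORT_BIT_OFF 4u
#define SW_R_ROCE_SRC_UDP_PORT_BIT_SZ  1u

/*
 * Shift of a field inside its 32-bit dword, assuming offsets are counted
 * from the most significant bit (assumption about the ifc layout).
 */
static uint32_t field_shift(uint32_t bit_off, uint32_t bit_sz)
{
	return 32u - bit_sz - (bit_off & 0x1fu);
}

/*
 * Hypothetical helper: cur_dword and max_dword are the first dword of the
 * current and maximum RoCE capabilities, already converted to host order.
 * Returns 1 when a SET_HCA_CAP command would be needed, 0 otherwise.
 */
static int need_enable(uint32_t cur_dword, uint32_t max_dword)
{
	uint32_t mask = 1u << field_shift(SW_R_ROCE_SRC_UDP_PORT_BIT_OFF,
					  SW_R_ROCE_SRC_UDP_PORT_BIT_SZ);

	if (cur_dword & mask)
		return 0;	/* already enabled, nothing to do */
	if (!(max_dword & mask))
		return 0;	/* device cannot enable it */
	return 1;
}
```

Both early-return cases map to the single "return 0" in the patch, so a
device without the capability is left untouched.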



* [PATCH rdma-next 2/6] RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and UDP source port
  2020-03-18  9:52 [PATCH rdma-next 0/6] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
  2020-03-18  9:52 ` [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port Leon Romanovsky
@ 2020-03-18  9:52 ` Leon Romanovsky
  2020-03-18  9:52 ` [PATCH rdma-next 3/6] RDMA/mlx5: Define RoCEv2 udp source port when set path Leon Romanovsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-18  9:52 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb

From: Mark Zhang <markz@mellanox.com>

Add two hash functions to distribute the RoCE v2 UDP source port and flow
label symmetrically. These are user-visible API, and any change in the
implementation needs to be tested for interoperability between the old and
new variants.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/rdma/ib_verbs.h | 44 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 60f9969b6d83..8763d4a06eb7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4703,4 +4703,48 @@ static inline struct ib_device *rdma_device_to_ibdev(struct device *device)
 
 bool rdma_dev_access_netns(const struct ib_device *device,
 			   const struct net *net);
+
+#define IB_ROCE_UDP_ENCAP_VALID_PORT_MIN (0xC000)
+#define IB_GRH_FLOWLABEL_MASK (0x000FFFFF)
+
+/**
+ * rdma_flow_label_to_udp_sport - generate a RoCE v2 UDP src port value based
+ *                               on the flow_label
+ *
+ * This function will convert the 20 bit flow_label input to a valid RoCE v2
+ * UDP src port 14 bit value. All RoCE V2 drivers should use this same
+ * convention.
+ */
+static inline u16 rdma_flow_label_to_udp_sport(u32 fl)
+{
+	u32 fl_low = fl & 0x03fff, fl_high = fl & 0xFC000;
+
+	fl_low ^= fl_high >> 14;
+	return (u16)(fl_low | IB_ROCE_UDP_ENCAP_VALID_PORT_MIN);
+}
+
+/**
+ * rdma_calc_flow_label - generate a RDMA symmetric flow label value based on
+ *                        local and remote qpn values
+ *
+ * This function folds the product of two QPNs, 24 bits each, into a
+ * 20 bit result.
+ *
+ * This function creates a symmetric flow_label value based on the local
+ * and remote qpn values. This allows both the requester and responder
+ * to calculate the same flow_label for a given connection.
+ *
+ * This helper function should be used by drivers in case the upper layer
+ * provides a zero flow_label value. This is to improve entropy of RDMA
+ * traffic in the network.
+ */
+static inline u32 rdma_calc_flow_label(u32 lqpn, u32 rqpn)
+{
+	u64 v = (u64)lqpn * rqpn;
+
+	v ^= v >> 20;
+	v ^= v >> 40;
+
+	return (u32)(v & IB_GRH_FLOWLABEL_MASK);
+}
 #endif /* IB_VERBS_H */
-- 
2.24.1
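
To experiment with the two helpers outside the kernel, here is a
userspace transcription using <stdint.h> types. The arithmetic is copied
from the hunk above; the unprefixed function and macro names are
stand-ins chosen for this sketch, not kernel symbols.

```c
#include <stdint.h>

#define ROCE_UDP_SPORT_MIN 0xC000u	/* mirrors IB_ROCE_UDP_ENCAP_VALID_PORT_MIN */
#define GRH_FLOWLABEL_MASK 0x000FFFFFu	/* mirrors IB_GRH_FLOWLABEL_MASK */

/* Fold a 20-bit flow label into 14 bits and map it into 0xC000..0xFFFF. */
static uint16_t flow_label_to_udp_sport(uint32_t fl)
{
	uint32_t fl_low = fl & 0x03FFF;		/* low 14 bits */
	uint32_t fl_high = fl & 0xFC000;	/* high 6 bits */

	fl_low ^= fl_high >> 14;
	return (uint16_t)(fl_low | ROCE_UDP_SPORT_MIN);
}

/* Multiply the two QPNs and fold the 48-bit product down to 20 bits. */
static uint32_t calc_flow_label(uint32_t lqpn, uint32_t rqpn)
{
	uint64_t v = (uint64_t)lqpn * rqpn;

	v ^= v >> 20;
	v ^= v >> 40;
	return (uint32_t)(v & GRH_FLOWLABEL_MASK);
}
```

Because multiplication is commutative, calc_flow_label(a, b) equals
calc_flow_label(b, a), which is what lets requester and responder derive
the same label independently. The OR with 0xC000 keeps every result in
the 0xC000..0xFFFF range.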



* [PATCH rdma-next 3/6] RDMA/mlx5: Define RoCEv2 udp source port when set path
  2020-03-18  9:52 [PATCH rdma-next 0/6] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
  2020-03-18  9:52 ` [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port Leon Romanovsky
  2020-03-18  9:52 ` [PATCH rdma-next 2/6] RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and " Leon Romanovsky
@ 2020-03-18  9:52 ` Leon Romanovsky
  2020-03-18  9:52 ` [PATCH rdma-next 4/6] RDMA/cma: Initialize the flow label of CM's route path record Leon Romanovsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-18  9:52 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb

From: Mark Zhang <markz@mellanox.com>

Calculate and set the UDP source port based on the flow label. If the flow
label is not defined in the GRH, calculate it based on lqpn/rqpn.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 9c2f0cf63d1b..d3055f3eb0b6 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2954,6 +2954,21 @@ static int modify_raw_packet_tx_affinity(struct mlx5_core_dev *dev,
 	return err;
 }
 
+static void mlx5_set_path_udp_sport(struct mlx5_qp_path *path,
+				    const struct rdma_ah_attr *ah,
+				    u32 lqpn, u32 rqpn)
+
+{
+	u32 fl = ah->grh.flow_label;
+	u16 sport;
+
+	if (!fl)
+		fl = rdma_calc_flow_label(lqpn, rqpn);
+
+	sport = rdma_flow_label_to_udp_sport(fl);
+	path->udp_sport = cpu_to_be16(sport);
+}
+
 static int mlx5_set_path(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 			 const struct rdma_ah_attr *ah,
 			 struct mlx5_qp_path *path, u8 port, int attr_mask,
@@ -2985,12 +3000,15 @@ static int mlx5_set_path(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 			return -EINVAL;
 
 		memcpy(path->rmac, ah->roce.dmac, sizeof(ah->roce.dmac));
-		if (qp->ibqp.qp_type == IB_QPT_RC ||
-		    qp->ibqp.qp_type == IB_QPT_UC ||
-		    qp->ibqp.qp_type == IB_QPT_XRC_INI ||
-		    qp->ibqp.qp_type == IB_QPT_XRC_TGT)
-			path->udp_sport =
-				mlx5_get_roce_udp_sport(dev, ah->grh.sgid_attr);
+		if ((qp->ibqp.qp_type == IB_QPT_RC ||
+		     qp->ibqp.qp_type == IB_QPT_UC ||
+		     qp->ibqp.qp_type == IB_QPT_XRC_INI ||
+		     qp->ibqp.qp_type == IB_QPT_XRC_TGT) &&
+		    (grh->sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) &&
+		    (attr_mask & IB_QP_DEST_QPN))
+			mlx5_set_path_udp_sport(path, ah,
+						qp->ibqp.qp_num,
+						attr->dest_qp_num);
 		path->dci_cfi_prio_sl = (sl & 0x7) << 4;
 		gid_type = ah->grh.sgid_attr->gid_type;
 		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
-- 
2.24.1
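
The new condition in mlx5_set_path() combines three checks. A small
userspace sketch of that predicate follows. The enums and the
ATTR_DEST_QPN bit are hypothetical stand-ins with illustrative values,
not the kernel definitions. The destination QPN requirement presumably
exists because the fallback label is computed from both QPNs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel enums; values are illustrative only. */
enum qp_type { QPT_RC, QPT_UC, QPT_UD, QPT_XRC_INI, QPT_XRC_TGT };
enum gid_type { GID_TYPE_ROCE_V1, GID_TYPE_ROCE_UDP_ENCAP };
#define ATTR_DEST_QPN (1u << 0)	/* stand-in for IB_QP_DEST_QPN */

/* Connected transports are the ones the patch handles in mlx5_set_path(). */
static bool is_connected_qp(enum qp_type t)
{
	return t == QPT_RC || t == QPT_UC ||
	       t == QPT_XRC_INI || t == QPT_XRC_TGT;
}

/* True when the path's UDP source port should be derived from the flow label. */
static bool should_set_udp_sport(enum qp_type t, enum gid_type g,
				 uint32_t attr_mask)
{
	return is_connected_qp(t) &&
	       g == GID_TYPE_ROCE_UDP_ENCAP &&
	       (attr_mask & ATTR_DEST_QPN) != 0;
}
```

When the predicate is false, the patch leaves path->udp_sport untouched
instead of calling the old mlx5_get_roce_udp_sport() helper.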



* [PATCH rdma-next 4/6] RDMA/cma: Initialize the flow label of CM's route path record
  2020-03-18  9:52 [PATCH rdma-next 0/6] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-03-18  9:52 ` [PATCH rdma-next 3/6] RDMA/mlx5: Define RoCEv2 udp source port when set path Leon Romanovsky
@ 2020-03-18  9:52 ` Leon Romanovsky
  2020-03-18  9:52 ` [PATCH rdma-next 5/6] RDMA/cm: Set flow label of recv_wc based on primary flow label Leon Romanovsky
  2020-03-18  9:53 ` [PATCH rdma-next 6/6] RDMA/mlx5: Set UDP source port based on the grh.flow_label Leon Romanovsky
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-18  9:52 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb

From: Mark Zhang <markz@mellanox.com>

If the flow label is not set by the user or the address is not IPv6,
initialize it from the cma src/dst ports using a hash in the style of
"Kernighan and Ritchie's hash function".

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cma.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index a051cc169e9c..8924b2f8e299 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2910,6 +2910,24 @@ static int iboe_tos_to_sl(struct net_device *ndev, int tos)
 		return 0;
 }
 
+static __be32 cma_get_roce_udp_flow_label(struct rdma_id_private *id_priv)
+{
+	struct sockaddr_in6 *addr6;
+	u16 dport, sport;
+	u32 hash, fl;
+
+	addr6 = (struct sockaddr_in6 *)cma_src_addr(id_priv);
+	fl = be32_to_cpu(addr6->sin6_flowinfo) & IB_GRH_FLOWLABEL_MASK;
+	if ((cma_family(id_priv) != AF_INET6) || !fl) {
+		dport = be16_to_cpu(cma_port(cma_dst_addr(id_priv)));
+		sport = be16_to_cpu(cma_port(cma_src_addr(id_priv)));
+		hash = (u32)sport * 31 + dport;
+		fl = hash & IB_GRH_FLOWLABEL_MASK;
+	}
+
+	return cpu_to_be32(fl);
+}
+
 static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
 	struct rdma_route *route = &id_priv->id.route;
@@ -2976,6 +2994,11 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		goto err2;
 	}
 
+	if (rdma_protocol_roce_udp_encap(id_priv->id.device,
+					 id_priv->id.port_num))
+		route->path_rec->flow_label =
+			cma_get_roce_udp_flow_label(id_priv);
+
 	cma_init_resolve_route_work(work, id_priv);
 	queue_work(cma_wq, &work->work);
 
-- 
2.24.1
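
The selection logic in cma_get_roce_udp_flow_label() can be tried in
userspace with the sketch below. It takes the address family, the
user-supplied label and the two ports as plain parameters and returns
the label in host byte order, whereas the kernel function reads them
from id_priv and returns a __be32. The function name and signature are
assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRH_FLOWLABEL_MASK 0x000FFFFFu	/* mirrors IB_GRH_FLOWLABEL_MASK */

/*
 * Keep a non-zero user flow label on IPv6; otherwise derive a label from
 * the ports with the multiply-by-31 hash used in the patch.
 */
static uint32_t roce_udp_flow_label(bool is_ipv6, uint32_t user_fl,
				    uint16_t sport, uint16_t dport)
{
	uint32_t fl = user_fl & GRH_FLOWLABEL_MASK;

	if (!is_ipv6 || !fl) {
		uint32_t hash = (uint32_t)sport * 31 + dport;

		fl = hash & GRH_FLOWLABEL_MASK;
	}
	return fl;
}
```

For example, source port 1000 and destination port 4791 give
1000 * 31 + 4791 = 35791. Note that the expression is not symmetric in
the two ports, unlike rdma_calc_flow_label().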



* [PATCH rdma-next 5/6] RDMA/cm: Set flow label of recv_wc based on primary flow label
  2020-03-18  9:52 [PATCH rdma-next 0/6] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
                   ` (3 preceding siblings ...)
  2020-03-18  9:52 ` [PATCH rdma-next 4/6] RDMA/cma: Initialize the flow label of CM's route path record Leon Romanovsky
@ 2020-03-18  9:52 ` Leon Romanovsky
  2020-03-18  9:53 ` [PATCH rdma-next 6/6] RDMA/mlx5: Set UDP source port based on the grh.flow_label Leon Romanovsky
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-18  9:52 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb

From: Mark Zhang <markz@mellanox.com>

In the request handler on the responder side, set the flow label of the
recv_wc if it is not set. It will be used for all messages sent
by the responder.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index bbbfa77dbce7..4ab2f71da522 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2039,6 +2039,7 @@ static int cm_req_handler(struct cm_work *work)
 	struct cm_req_msg *req_msg;
 	const struct ib_global_route *grh;
 	const struct ib_gid_attr *gid_attr;
+	struct ib_grh *ibgrh;
 	int ret;
 
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
@@ -2048,6 +2049,12 @@ static int cm_req_handler(struct cm_work *work)
 	if (IS_ERR(cm_id_priv))
 		return PTR_ERR(cm_id_priv);
 
+	ibgrh = work->mad_recv_wc->recv_buf.grh;
+	if (!(be32_to_cpu(ibgrh->version_tclass_flow) & IB_GRH_FLOWLABEL_MASK))
+		ibgrh->version_tclass_flow |=
+			cpu_to_be32(IBA_GET(CM_REQ_PRIMARY_FLOW_LABEL,
+					    req_msg));
+
 	cm_id_priv->id.remote_id =
 		cpu_to_be32(IBA_GET(CM_REQ_LOCAL_COMM_ID, req_msg));
 	cm_id_priv->id.service_id =
-- 
2.24.1
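
The GRH word touched here packs the version (4 bits), traffic class
(8 bits) and flow label (20 bits) into one big-endian 32-bit value. The
userspace sketch below mimics the "fill only if zero" update with
htonl()/ntohl(). grh_fill_flow_label() is a hypothetical name, and the
extra masking of the incoming label is a defensive choice of this
sketch, not something the patch does.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define GRH_FLOWLABEL_MASK 0x000FFFFFu	/* mirrors IB_GRH_FLOWLABEL_MASK */

/*
 * vtf_be is version_tclass_flow in network byte order. If its flow label
 * bits are all zero, OR in the primary flow label from the CM REQ;
 * otherwise return the word unchanged.
 */
static uint32_t grh_fill_flow_label(uint32_t vtf_be, uint32_t primary_fl)
{
	if (ntohl(vtf_be) & GRH_FLOWLABEL_MASK)
		return vtf_be;	/* label already present, keep it */

	return vtf_be | htonl(primary_fl & GRH_FLOWLABEL_MASK);
}
```

Using OR is safe only because the label bits were just checked to be
zero; the version and traffic class bits are left as received.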



* [PATCH rdma-next 6/6] RDMA/mlx5: Set UDP source port based on the grh.flow_label
  2020-03-18  9:52 [PATCH rdma-next 0/6] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
                   ` (4 preceding siblings ...)
  2020-03-18  9:52 ` [PATCH rdma-next 5/6] RDMA/cm: Set flow label of recv_wc based on primary flow label Leon Romanovsky
@ 2020-03-18  9:53 ` Leon Romanovsky
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-18  9:53 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb

From: Mark Zhang <markz@mellanox.com>

Calculate the UDP source port based on the grh.flow_label. If grh.flow_label
is not valid, use the minimal supported UDP source port.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/ah.c      | 21 +++++++++++++++++++--
 drivers/infiniband/hw/mlx5/main.c    |  4 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 ++--
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index 14ad05e7c5bf..5acf1bfb73fe 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -32,6 +32,24 @@
 
 #include "mlx5_ib.h"
 
+static __be16 mlx5_ah_get_udp_sport(const struct mlx5_ib_dev *dev,
+				  const struct rdma_ah_attr *ah_attr)
+{
+	enum ib_gid_type gid_type = ah_attr->grh.sgid_attr->gid_type;
+	__be16 sport;
+
+	if ((gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) &&
+	    (rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH) &&
+	    (ah_attr->grh.flow_label & IB_GRH_FLOWLABEL_MASK))
+		sport = cpu_to_be16(
+			rdma_flow_label_to_udp_sport(ah_attr->grh.flow_label));
+	else
+		sport = mlx5_get_roce_udp_sport_min(dev,
+						    ah_attr->grh.sgid_attr);
+
+	return sport;
+}
+
 static void create_ib_ah(struct mlx5_ib_dev *dev, struct mlx5_ib_ah *ah,
 			 struct rdma_ah_attr *ah_attr)
 {
@@ -59,8 +77,7 @@ static void create_ib_ah(struct mlx5_ib_dev *dev, struct mlx5_ib_ah *ah,
 
 		memcpy(ah->av.rmac, ah_attr->roce.dmac,
 		       sizeof(ah_attr->roce.dmac));
-		ah->av.udp_sport =
-			mlx5_get_roce_udp_sport(dev, ah_attr->grh.sgid_attr);
+		ah->av.udp_sport = mlx5_ah_get_udp_sport(dev, ah_attr);
 		ah->av.stat_rate_sl |= (rdma_ah_get_sl(ah_attr) & 0x7) << 1;
 		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
 #define MLX5_ECN_ENABLED BIT(1)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index d57ebdba027e..66cd417f5d09 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -631,8 +631,8 @@ static int mlx5_ib_del_gid(const struct ib_gid_attr *attr,
 			     attr->index, NULL, NULL);
 }
 
-__be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev,
-			       const struct ib_gid_attr *attr)
+__be16 mlx5_get_roce_udp_sport_min(const struct mlx5_ib_dev *dev,
+				   const struct ib_gid_attr *attr)
 {
 	if (attr->gid_type != IB_GID_TYPE_ROCE_UDP_ENCAP)
 		return 0;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7d3e4e4942e9..85d4f3958e32 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1362,8 +1362,8 @@ int mlx5_ib_get_vf_guid(struct ib_device *device, int vf, u8 port,
 int mlx5_ib_set_vf_guid(struct ib_device *device, int vf, u8 port,
 			u64 guid, int type);
 
-__be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev,
-			       const struct ib_gid_attr *attr);
+__be16 mlx5_get_roce_udp_sport_min(const struct mlx5_ib_dev *dev,
+				   const struct ib_gid_attr *attr);
 
 void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
 void mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
-- 
2.24.1
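
For address handles the fallback differs from the connected QP case:
without a usable flow label the minimum port is used rather than a label
computed from QPNs. A userspace sketch of that selection follows.
ah_udp_sport_be() and its min_sport parameter (standing in for whatever
mlx5_get_roce_udp_sport_min() reads from the device) are assumptions
made for this sketch.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define ROCE_UDP_SPORT_MIN 0xC000u	/* mirrors IB_ROCE_UDP_ENCAP_VALID_PORT_MIN */
#define GRH_FLOWLABEL_MASK 0x000FFFFFu	/* mirrors IB_GRH_FLOWLABEL_MASK */

/* Same folding as rdma_flow_label_to_udp_sport() in patch 2/6. */
static uint16_t flow_label_to_udp_sport(uint32_t fl)
{
	uint32_t fl_low = fl & 0x03FFF, fl_high = fl & 0xFC000;

	fl_low ^= fl_high >> 14;
	return (uint16_t)(fl_low | ROCE_UDP_SPORT_MIN);
}

/*
 * Returns the source port in network byte order: derived from the flow
 * label when one is present on a RoCE v2 AH with a GRH, the device
 * minimum otherwise, and 0 for non RoCE v2 GIDs.
 */
static uint16_t ah_udp_sport_be(bool roce_v2, bool has_grh,
				uint32_t flow_label, uint16_t min_sport)
{
	if (!roce_v2)
		return 0;
	if (has_grh && (flow_label & GRH_FLOWLABEL_MASK))
		return htons(flow_label_to_udp_sport(flow_label));
	return htons(min_sport);
}
```

With a flow label of 0x12345 the derived port is 0xE341; with a zero
label the caller gets min_sport back unchanged apart from byte order.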



* Re: [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port
  2020-03-18  9:52 ` [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port Leon Romanovsky
@ 2020-03-18 23:33   ` Saeed Mahameed
  2020-03-19  6:05     ` Leon Romanovsky
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mahameed @ 2020-03-18 23:33 UTC (permalink / raw)
  To: Jason Gunthorpe, leon, dledford
  Cc: Mark Zhang, Maor Gottlieb, netdev, linux-rdma

On Wed, 2020-03-18 at 11:52 +0200, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> When this is enabled, UDP source port for RoCEv2 packets are defined
> by software instead of firmware.
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/main.c    | 39
> +++++++++++++++++++
>  include/linux/mlx5/mlx5_ifc.h                 |  5 ++-
>  2 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 6b38ec72215a..bdc73370297b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -585,6 +585,39 @@ static int handle_hca_cap(struct mlx5_core_dev
> *dev)
>  	return err;
>  }
>  
> +static int handle_hca_cap_roce(struct mlx5_core_dev *dev)
> +{
> +	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> +	void *set_hca_cap;
> +	void *set_ctx;
> +	int err;
> +
> +	if (!MLX5_CAP_GEN(dev, roce))
> +		return 0;
> +
> +	err = mlx5_core_get_caps(dev, MLX5_CAP_ROCE);
> +	if (err)
> +		return err;
> +
> +	if (MLX5_CAP_ROCE(dev, sw_r_roce_src_udp_port) ||
> +	    !MLX5_CAP_ROCE_MAX(dev, sw_r_roce_src_udp_port))
> +		return 0;
> +
> +	set_ctx = kzalloc(set_sz, GFP_KERNEL);
> +	if (!set_ctx)
> +		return -ENOMEM;
> +

All the sisters of this function allocate this and free it
consecutively. Why not allocate it once outside and pass it to all
handle_hca_cap_xyz functions? Each one will memset it and reuse it.
See below.

> +	set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx,
> capability);
> +	memcpy(set_hca_cap, dev->caps.hca_cur[MLX5_CAP_ROCE],
> +	       MLX5_ST_SZ_BYTES(roce_cap));
> +	MLX5_SET(roce_cap, set_hca_cap, sw_r_roce_src_udp_port, 1);
> +
> +	err = set_caps(dev, set_ctx, set_sz,
> MLX5_SET_HCA_CAP_OP_MOD_ROCE);
> +

Do we really need to fail the whole driver if we just try to set a
non-mandatory cap?

> +	kfree(set_ctx);
> +	return err;
> +}
> +
>  static int set_hca_cap(struct mlx5_core_dev *dev)
>  {
>  	int err;

Let's allocate the set_ctx in this parent function and pass it to all
hca cap handlers:

set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
set_ctx = kzalloc(set_sz, GFP_KERNEL);

> @@ -607,6 +640,12 @@ static int set_hca_cap(struct mlx5_core_dev
> *dev)
>  		goto out;
>  	}
>  
> +	err = handle_hca_cap_roce(dev);
> +	if (err) {
> +		mlx5_core_err(dev, "handle_hca_cap_roce failed\n");
> +		goto out;
> +	}
> +
>  out:
>  	return err;
>  }
> diff --git a/include/linux/mlx5/mlx5_ifc.h
> b/include/linux/mlx5/mlx5_ifc.h
> index 208bf1127be7..bb217c3f30da 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -74,6 +74,7 @@ enum {
>  	MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE        = 0x0,
>  	MLX5_SET_HCA_CAP_OP_MOD_ODP                   = 0x2,
>  	MLX5_SET_HCA_CAP_OP_MOD_ATOMIC                = 0x3,
> +	MLX5_SET_HCA_CAP_OP_MOD_ROCE                  = 0x4,
>  };
>  
>  enum {
> @@ -902,7 +903,9 @@ struct
> mlx5_ifc_per_protocol_networking_offload_caps_bits {
>  
>  struct mlx5_ifc_roce_cap_bits {
>  	u8         roce_apm[0x1];
> -	u8         reserved_at_1[0x1f];
> +	u8         reserved_at_1[0x3];
> +	u8         sw_r_roce_src_udp_port[0x1];
> +	u8         reserved_at_5[0x1b];
>  
>  	u8         reserved_at_20[0x60];
>  


* Re: [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port
  2020-03-18 23:33   ` Saeed Mahameed
@ 2020-03-19  6:05     ` Leon Romanovsky
  2020-03-20  1:16       ` Saeed Mahameed
  0 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2020-03-19  6:05 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Jason Gunthorpe, dledford, Mark Zhang, Maor Gottlieb, netdev, linux-rdma

On Wed, Mar 18, 2020 at 11:33:46PM +0000, Saeed Mahameed wrote:
> On Wed, 2020-03-18 at 11:52 +0200, Leon Romanovsky wrote:
> > From: Mark Zhang <markz@mellanox.com>
> >
> > When this is enabled, UDP source port for RoCEv2 packets are defined
> > by software instead of firmware.
> >
> > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > ---
> >  .../net/ethernet/mellanox/mlx5/core/main.c    | 39
> > +++++++++++++++++++
> >  include/linux/mlx5/mlx5_ifc.h                 |  5 ++-
> >  2 files changed, 43 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > index 6b38ec72215a..bdc73370297b 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > @@ -585,6 +585,39 @@ static int handle_hca_cap(struct mlx5_core_dev
> > *dev)
> >  	return err;
> >  }
> >
> > +static int handle_hca_cap_roce(struct mlx5_core_dev *dev)
> > +{
> > +	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > +	void *set_hca_cap;
> > +	void *set_ctx;
> > +	int err;
> > +
> > +	if (!MLX5_CAP_GEN(dev, roce))
> > +		return 0;
> > +
> > +	err = mlx5_core_get_caps(dev, MLX5_CAP_ROCE);
> > +	if (err)
> > +		return err;
> > +
> > +	if (MLX5_CAP_ROCE(dev, sw_r_roce_src_udp_port) ||
> > +	    !MLX5_CAP_ROCE_MAX(dev, sw_r_roce_src_udp_port))
> > +		return 0;
> > +
> > +	set_ctx = kzalloc(set_sz, GFP_KERNEL);
> > +	if (!set_ctx)
> > +		return -ENOMEM;
> > +
>
> all the sisters of this function allocate this and free it
> consecutively, why not allocate it from outside once, pass it to all
> handle_hca_cap_xyz functions, each one will memset it and reuse it.
> see below.

Agree, I'll do it.

>
> > +	set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx,
> > capability);
> > +	memcpy(set_hca_cap, dev->caps.hca_cur[MLX5_CAP_ROCE],
> > +	       MLX5_ST_SZ_BYTES(roce_cap));
> > +	MLX5_SET(roce_cap, set_hca_cap, sw_r_roce_src_udp_port, 1);
> > +
> > +	err = set_caps(dev, set_ctx, set_sz,
> > MLX5_SET_HCA_CAP_OP_MOD_ROCE);
> > +
>
> Do we really need to fail the whole driver if we just try to set a non
> mandatory cap ?

What caused the failure is less important than the fact that a basic
mlx5_cmd_exec() failed during the initialization flow. I think that is
bad enough to stop the driver, because its operation is going to
be unreliable.

Please share your final decision on that and I'll align with it.

>
> > +	kfree(set_ctx);
> > +	return err;
> > +}
> > +
> >  static int set_hca_cap(struct mlx5_core_dev *dev)
> >  {
> >  	int err;
>
> let's allocate the set_ctx in this parent function and pass it to all
> hca cap handlers;
>
> set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> set_ctx = kzalloc(set_sz, GFP_KERNEL);

I'm doing it now.

Thanks
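
The refactoring agreed on above (allocate set_ctx once in the parent and
let each handler clear and reuse it) could look roughly like the
userspace sketch below. SET_CTX_SZ, the two handlers and set_all_caps()
are hypothetical; calloc()/free() stand in for kzalloc()/kfree(). This
is one possible shape, not the code that was eventually submitted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical size standing in for MLX5_ST_SZ_BYTES(set_hca_cap_in). */
#define SET_CTX_SZ 64

typedef int (*cap_handler_t)(void *set_ctx, size_t set_sz);

/* Each handler clears the shared buffer before filling in its own fields. */
static int handle_cap_a(void *set_ctx, size_t set_sz)
{
	memset(set_ctx, 0, set_sz);
	((unsigned char *)set_ctx)[0] = 0xA;
	return 0;
}

static int handle_cap_b(void *set_ctx, size_t set_sz)
{
	memset(set_ctx, 0, set_sz);
	((unsigned char *)set_ctx)[1] = 0xB;
	return 0;
}

/* Parent: one allocation, one free, shared by all handlers. */
static int set_all_caps(void)
{
	static const cap_handler_t handlers[] = { handle_cap_a, handle_cap_b };
	void *set_ctx = calloc(1, SET_CTX_SZ);
	int err = 0;
	size_t i;

	if (!set_ctx)
		return -1;

	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
		err = handlers[i](set_ctx, SET_CTX_SZ);
		if (err)
			break;
	}

	free(set_ctx);
	return err;
}
```

The memset in every handler is what makes reuse safe: nothing written by
one handler leaks into the command built by the next.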


* Re: [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP source port
  2020-03-19  6:05     ` Leon Romanovsky
@ 2020-03-20  1:16       ` Saeed Mahameed
  0 siblings, 0 replies; 10+ messages in thread
From: Saeed Mahameed @ 2020-03-20  1:16 UTC (permalink / raw)
  To: leon
  Cc: Jason Gunthorpe, Mark Zhang, Maor Gottlieb, netdev, linux-rdma, dledford

On Thu, 2020-03-19 at 08:05 +0200, Leon Romanovsky wrote:
> On Wed, Mar 18, 2020 at 11:33:46PM +0000, Saeed Mahameed wrote:
> > On Wed, 2020-03-18 at 11:52 +0200, Leon Romanovsky wrote:
> > > From: Mark Zhang <markz@mellanox.com>
> > > 
> > > When this is enabled, UDP source port for RoCEv2 packets are
> > > defined
> > > by software instead of firmware.
> > > 
> > > Signed-off-by: Mark Zhang <markz@mellanox.com>
> > > Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > ---
> > >  .../net/ethernet/mellanox/mlx5/core/main.c    | 39
> > > +++++++++++++++++++
> > >  include/linux/mlx5/mlx5_ifc.h                 |  5 ++-
> > >  2 files changed, 43 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > index 6b38ec72215a..bdc73370297b 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > @@ -585,6 +585,39 @@ static int handle_hca_cap(struct
> > > mlx5_core_dev
> > > *dev)
> > >  	return err;
> > >  }
> > > 
> > > +static int handle_hca_cap_roce(struct mlx5_core_dev *dev)
> > > +{
> > > +	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > > +	void *set_hca_cap;
> > > +	void *set_ctx;
> > > +	int err;
> > > +
> > > +	if (!MLX5_CAP_GEN(dev, roce))
> > > +		return 0;
> > > +
> > > +	err = mlx5_core_get_caps(dev, MLX5_CAP_ROCE);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	if (MLX5_CAP_ROCE(dev, sw_r_roce_src_udp_port) ||
> > > +	    !MLX5_CAP_ROCE_MAX(dev, sw_r_roce_src_udp_port))
> > > +		return 0;
> > > +
> > > +	set_ctx = kzalloc(set_sz, GFP_KERNEL);
> > > +	if (!set_ctx)
> > > +		return -ENOMEM;
> > > +
> > 
> > all the sisters of this function allocate this and free it
> > consecutively, why not allocate it from outside once, pass it to
> > all
> > handle_hca_cap_xyz functions, each one will memset it and reuse it.
> > see below.
> 
> Agree, I'll do it.
> 
> > > +	set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx,
> > > capability);
> > > +	memcpy(set_hca_cap, dev->caps.hca_cur[MLX5_CAP_ROCE],
> > > +	       MLX5_ST_SZ_BYTES(roce_cap));
> > > +	MLX5_SET(roce_cap, set_hca_cap, sw_r_roce_src_udp_port, 1);
> > > +
> > > +	err = set_caps(dev, set_ctx, set_sz,
> > > MLX5_SET_HCA_CAP_OP_MOD_ROCE);
> > > +
> > 
> > Do we really need to fail the whole driver if we just try to set a
> > non
> > mandatory cap ?
> 
> It is less important what caused to failure, but the fact that basic
> mlx5_cmd_exec() failed during initialization flow. I think that it
> is bad enough to stop the driver, because its operation is going to
> be unreliable.
> 
> Please share your end-result decision on that and I'll align to it.
> 

Driver stability and reliability are not affected by this failing, since
design-wise we don't count on setting the caps at this stage; we query
them anyway in the next stages of the driver load.

There are many reasons this could fail: old FW that doesn't handle this
new CAP properly, or new FW which has a bug only in the new feature flow.
The driver should be resilient and provide basic functionality, or in
this case just drop this feature, since the next cap query of this feature
will return 0 and the driver will not try to enable this feature anyway.

If it is something really fundamental that caused the issue, then just
let it be. If we fail in a more advanced mandatory stage, then we will
fail at that stage; if we don't, then it is a win-win.


> > > +	kfree(set_ctx);
> > > +	return err;
> > > +}
> > > +
> > >  static int set_hca_cap(struct mlx5_core_dev *dev)
> > >  {
> > >  	int err;
> > 
> > let's allocate the set_ctx in this parent function and pass it to
> > all
> > hca cap handlers;
> > 
> > set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > set_ctx = kzalloc(set_sz, GFP_KERNEL);
> 
> I'm doing it now.
> 

Awesome, thanks!
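
The policy Saeed argues for (a failure to set an optional capability is
logged and skipped, while a mandatory one still aborts) can be sketched
in userspace as below. The thread does not show which policy was finally
adopted, so treat this only as an illustration of that position. All
names here (cap_step, run_cap_steps(), ok_step(), failing_step()) are
hypothetical.

```c
#include <stdio.h>

enum cap_kind { CAP_MANDATORY, CAP_OPTIONAL };

struct cap_step {
	const char *name;
	enum cap_kind kind;
	int (*apply)(void);
};

/* Sample steps used to exercise the loop below. */
static int ok_step(void)
{
	return 0;
}

static int failing_step(void)
{
	return -5;
}

/*
 * Run every step in order. An optional step that fails is reported and
 * skipped; a mandatory step that fails stops the sequence with its error.
 */
static int run_cap_steps(const struct cap_step *steps, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		int err = steps[i].apply();

		if (!err)
			continue;
		if (steps[i].kind == CAP_OPTIONAL) {
			fprintf(stderr, "%s failed (%d), continuing without it\n",
				steps[i].name, err);
			continue;
		}
		return err;
	}
	return 0;
}
```

Under Leon's position the RoCE step would simply be marked CAP_MANDATORY,
which turns the same failure into a load error.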
