Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH for-rc 0/4] Few more rc fixes
@ 2019-10-25 19:58 Dennis Dalessandro
  2019-10-25 19:58 ` [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3 Dennis Dalessandro
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2019-10-25 19:58 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma

Here we have 4 more bug fixing patches. These are all marked stable as well so
it would be good to get these in to 5.4 if possible.

The allow for speeds patch is not a "feature" it is a bug that causes problems
if the host comes up in gen4. Is very small. The other 3 from Kaike fix TID RDMA
bugs. A few more lines of code here but all relegated to TID RDMA.

---

James Erwin (1):
      IB/hfi1: Allow for all speeds higher than gen3

Kaike Wan (3):
      IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet
      IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA
      IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR


 drivers/infiniband/hw/hfi1/init.c     |    1 -
 drivers/infiniband/hw/hfi1/pcie.c     |    4 ++
 drivers/infiniband/hw/hfi1/rc.c       |   16 +++++----
 drivers/infiniband/hw/hfi1/tid_rdma.c |   57 +++++++++++++++++++--------------
 drivers/infiniband/hw/hfi1/tid_rdma.h |    3 +-
 5 files changed, 44 insertions(+), 37 deletions(-)

--
-Denny

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
  2019-10-25 19:58 [PATCH for-rc 0/4] Few more rc fixes Dennis Dalessandro
@ 2019-10-25 19:58 ` Dennis Dalessandro
  2019-10-29 19:52   ` Jason Gunthorpe
  2019-10-25 19:58 ` [PATCH for-rc 2/4] IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet Dennis Dalessandro
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Dennis Dalessandro @ 2019-10-25 19:58 UTC (permalink / raw)
  To: jgg, dledford
  Cc: linux-rdma, Mike Marciniszyn, Kaike Wan, stable, James Erwin

From: James Erwin <james.erwin@intel.com>

The driver avoids the gen3 speed bump when the parent
bus speed isn't identical to gen3, 8.0GT/s.  This is not
compatible with gen4 and newer speeds.

Fix by relaxing the test to explicitly look for the lower
capability speeds which inherently allows for all future speeds.

Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: James Erwin <james.erwin@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/pcie.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
index 61aa550..61362bd 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -319,7 +319,9 @@ int pcie_speeds(struct hfi1_devdata *dd)
 	/*
 	 * bus->max_bus_speed is set from the bridge's linkcap Max Link Speed
 	 */
-	if (parent && dd->pcidev->bus->max_bus_speed != PCIE_SPEED_8_0GT) {
+	if (parent &&
+	    (dd->pcidev->bus->max_bus_speed == PCIE_SPEED_2_5GT ||
+	     dd->pcidev->bus->max_bus_speed == PCIE_SPEED_5_0GT)) {
 		dd_dev_info(dd, "Parent PCIe bridge does not support Gen3\n");
 		dd->link_gen3_capable = 0;
 	}


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-rc 2/4] IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet
  2019-10-25 19:58 [PATCH for-rc 0/4] Few more rc fixes Dennis Dalessandro
  2019-10-25 19:58 ` [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3 Dennis Dalessandro
@ 2019-10-25 19:58 ` Dennis Dalessandro
  2019-10-25 19:58 ` [PATCH for-rc 3/4] IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA Dennis Dalessandro
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2019-10-25 19:58 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Mike Marciniszyn, stable, Kaike Wan

From: Kaike Wan <kaike.wan@intel.com>

The index r_tid_ack is used to indicate the next TID RDMA WRITE request
to acknowledge in the ring s_ack_queue[] on the responder side and
should be set to a valid index other than its initial value before
r_tid_tail is advanced to the next TID RDMA WRITE request and
particularly before a TID RDMA ACK is built. Otherwise, a NULL pointer
dereference may result:

[1372301.686240] BUG: unable to handle kernel paging request at ffff9a32d27abff8
[1372301.686270] IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
[1372301.686272] PGD 2749032067 PUD 0
[1372301.686273] Oops: 0000 1 SMP
[1372301.686306] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me
[1372301.686322] sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt]
[1372301.686326] CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
[1372301.686327] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
[1372301.686341] Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
[1372301.686342] task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000
[1372301.686355] RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
[1372301.686356] RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002
[1372301.686357] RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff
[1372301.686358] RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300
[1372301.686359] RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4
[1372301.686359] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00
[1372301.686360] R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8
[1372301.686361] FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000
[1372301.686362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1372301.686363] CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0
[1372301.686363] Call Trace:
[1372301.686378] [<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1]
[1372301.686383] [<ffffffffaf0b2dff>] process_one_work+0x17f/0x440
[1372301.686385] [<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0
[1372301.686386] [<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[1372301.686389] [<ffffffffaf0bae31>] kthread+0xd1/0xe0
[1372301.686391] [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
[1372301.686394] [<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21
[1372301.686396] [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
[1372301.686398] hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1
[1372301.686414] Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72
[1372301.686426] RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
[1372301.686427] RSP <ffff9a11776abd08>
[1372301.686427] CR2: ffff9a32d27abff8

This problem can happen if a RESYNC request is received before
r_tid_ack is modified.

This patch fixes the issue by making sure that r_tid_ack is set to a
valid value before a TID RDMA ACK is built. Functions are defined to
simplify the code.

Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Fixes: 7cf0ad679de4 ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/tid_rdma.c |   44 ++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c
index f21fca3..73bc78b 100644
--- a/drivers/infiniband/hw/hfi1/tid_rdma.c
+++ b/drivers/infiniband/hw/hfi1/tid_rdma.c
@@ -136,6 +136,26 @@ static void update_r_next_psn_fecn(struct hfi1_packet *packet,
 				   struct tid_rdma_flow *flow,
 				   bool fecn);
 
+static void validate_r_tid_ack(struct hfi1_qp_priv *priv)
+{
+	if (priv->r_tid_ack == HFI1_QP_WQE_INVALID)
+		priv->r_tid_ack = priv->r_tid_tail;
+}
+
+static void tid_rdma_schedule_ack(struct rvt_qp *qp)
+{
+	struct hfi1_qp_priv *priv = qp->priv;
+
+	priv->s_flags |= RVT_S_ACK_PENDING;
+	hfi1_schedule_tid_send(qp);
+}
+
+static void tid_rdma_trigger_ack(struct rvt_qp *qp)
+{
+	validate_r_tid_ack(qp->priv);
+	tid_rdma_schedule_ack(qp);
+}
+
 static u64 tid_rdma_opfn_encode(struct tid_rdma_params *p)
 {
 	return
@@ -3005,10 +3025,7 @@ bool hfi1_handle_kdeth_eflags(struct hfi1_ctxtdata *rcd,
 		qpriv->s_nak_state = IB_NAK_PSN_ERROR;
 		/* We are NAK'ing the next expected PSN */
 		qpriv->s_nak_psn = mask_psn(flow->flow_state.r_next_psn);
-		qpriv->s_flags |= RVT_S_ACK_PENDING;
-		if (qpriv->r_tid_ack == HFI1_QP_WQE_INVALID)
-			qpriv->r_tid_ack = qpriv->r_tid_tail;
-		hfi1_schedule_tid_send(qp);
+		tid_rdma_trigger_ack(qp);
 	}
 	goto unlock;
 }
@@ -3526,7 +3543,7 @@ static void hfi1_tid_write_alloc_resources(struct rvt_qp *qp, bool intr_ctx)
 		/*
 		 * If overtaking req->acked_tail, send an RNR NAK. Because the
 		 * QP is not queued in this case, and the issue can only be
-		 * caused due a delay in scheduling the second leg which we
+		 * caused by a delay in scheduling the second leg which we
 		 * cannot estimate, we use a rather arbitrary RNR timeout of
 		 * (MAX_FLOWS / 2) segments
 		 */
@@ -3534,8 +3551,7 @@ static void hfi1_tid_write_alloc_resources(struct rvt_qp *qp, bool intr_ctx)
 				MAX_FLOWS)) {
 			ret = -EAGAIN;
 			to_seg = MAX_FLOWS >> 1;
-			qpriv->s_flags |= RVT_S_ACK_PENDING;
-			hfi1_schedule_tid_send(qp);
+			tid_rdma_trigger_ack(qp);
 			break;
 		}
 
@@ -4335,8 +4351,7 @@ void hfi1_rc_rcv_tid_rdma_write_data(struct hfi1_packet *packet)
 	trace_hfi1_tid_req_rcv_write_data(qp, 0, e->opcode, e->psn, e->lpsn,
 					  req);
 	trace_hfi1_tid_write_rsp_rcv_data(qp);
-	if (priv->r_tid_ack == HFI1_QP_WQE_INVALID)
-		priv->r_tid_ack = priv->r_tid_tail;
+	validate_r_tid_ack(priv);
 
 	if (opcode == TID_OP(WRITE_DATA_LAST)) {
 		release_rdma_sge_mr(e);
@@ -4375,8 +4390,7 @@ void hfi1_rc_rcv_tid_rdma_write_data(struct hfi1_packet *packet)
 	}
 
 done:
-	priv->s_flags |= RVT_S_ACK_PENDING;
-	hfi1_schedule_tid_send(qp);
+	tid_rdma_schedule_ack(qp);
 exit:
 	priv->r_next_psn_kdeth = flow->flow_state.r_next_psn;
 	if (fecn)
@@ -4388,10 +4402,7 @@ void hfi1_rc_rcv_tid_rdma_write_data(struct hfi1_packet *packet)
 	if (!priv->s_nak_state) {
 		priv->s_nak_state = IB_NAK_PSN_ERROR;
 		priv->s_nak_psn = flow->flow_state.r_next_psn;
-		priv->s_flags |= RVT_S_ACK_PENDING;
-		if (priv->r_tid_ack == HFI1_QP_WQE_INVALID)
-			priv->r_tid_ack = priv->r_tid_tail;
-		hfi1_schedule_tid_send(qp);
+		tid_rdma_trigger_ack(qp);
 	}
 	goto done;
 }
@@ -4939,8 +4950,7 @@ void hfi1_rc_rcv_tid_rdma_resync(struct hfi1_packet *packet)
 	qpriv->resync = true;
 	/* RESYNC request always gets a TID RDMA ACK. */
 	qpriv->s_nak_state = 0;
-	qpriv->s_flags |= RVT_S_ACK_PENDING;
-	hfi1_schedule_tid_send(qp);
+	tid_rdma_trigger_ack(qp);
 bail:
 	if (fecn)
 		qp->s_flags |= RVT_S_ECN;


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-rc 3/4] IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA
  2019-10-25 19:58 [PATCH for-rc 0/4] Few more rc fixes Dennis Dalessandro
  2019-10-25 19:58 ` [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3 Dennis Dalessandro
  2019-10-25 19:58 ` [PATCH for-rc 2/4] IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet Dennis Dalessandro
@ 2019-10-25 19:58 ` Dennis Dalessandro
  2019-10-25 19:58 ` [PATCH for-rc 4/4] IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR Dennis Dalessandro
  2019-11-06 17:26 ` [PATCH for-rc 0/4] Few more rc fixes Jason Gunthorpe
  4 siblings, 0 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2019-10-25 19:58 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Mike Marciniszyn, stable, Kaike Wan

From: Kaike Wan <kaike.wan@intel.com>

For a TID RDMA WRITE request, a QP on the responder side could be put
into a queue when a hardware flow is not available. A RNR NAK will be
returned to the requester with a RNR timeout value based on the
position of the QP in the queue. The tid_rdma_flow_wt variable is used
to calculate the timeout value and is determined by using a MTU of
4096 at the module loading time. This could reduce the timeout value
by half from the desired value, leading to excessive RNR retries.

This patch fixes the issue by calculating the flow weight with the
real MTU assigned to the QP.

Fixes: 07b923701e38 ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/init.c     |    1 -
 drivers/infiniband/hw/hfi1/tid_rdma.c |   13 +++++--------
 drivers/infiniband/hw/hfi1/tid_rdma.h |    3 +--
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index 71cb952..26b792b 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -1489,7 +1489,6 @@ static int __init hfi1_mod_init(void)
 		goto bail_dev;
 	}
 
-	hfi1_compute_tid_rdma_flow_wt();
 	/*
 	 * These must be called before the driver is registered with
 	 * the PCI subsystem.
diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c
index 73bc78b..e53f542 100644
--- a/drivers/infiniband/hw/hfi1/tid_rdma.c
+++ b/drivers/infiniband/hw/hfi1/tid_rdma.c
@@ -107,8 +107,6 @@ static u32 mask_generation(u32 a)
  * C - Capcode
  */
 
-static u32 tid_rdma_flow_wt;
-
 static void tid_rdma_trigger_resume(struct work_struct *work);
 static void hfi1_kern_exp_rcv_free_flows(struct tid_rdma_request *req);
 static int hfi1_kern_exp_rcv_alloc_flows(struct tid_rdma_request *req,
@@ -3388,18 +3386,17 @@ u32 hfi1_build_tid_rdma_write_req(struct rvt_qp *qp, struct rvt_swqe *wqe,
 	return sizeof(ohdr->u.tid_rdma.w_req) / sizeof(u32);
 }
 
-void hfi1_compute_tid_rdma_flow_wt(void)
+static u32 hfi1_compute_tid_rdma_flow_wt(struct rvt_qp *qp)
 {
 	/*
 	 * Heuristic for computing the RNR timeout when waiting on the flow
 	 * queue. Rather than a computationaly expensive exact estimate of when
 	 * a flow will be available, we assume that if a QP is at position N in
 	 * the flow queue it has to wait approximately (N + 1) * (number of
-	 * segments between two sync points), assuming PMTU of 4K. The rationale
-	 * for this is that flows are released and recycled at each sync point.
+	 * segments between two sync points). The rationale for this is that
+	 * flows are released and recycled at each sync point.
 	 */
-	tid_rdma_flow_wt = MAX_TID_FLOW_PSN * enum_to_mtu(OPA_MTU_4096) /
-		TID_RDMA_MAX_SEGMENT_SIZE;
+	return (MAX_TID_FLOW_PSN * qp->pmtu) >> TID_RDMA_SEGMENT_SHIFT;
 }
 
 static u32 position_in_queue(struct hfi1_qp_priv *qpriv,
@@ -3522,7 +3519,7 @@ static void hfi1_tid_write_alloc_resources(struct rvt_qp *qp, bool intr_ctx)
 		if (qpriv->flow_state.index >= RXE_NUM_TID_FLOWS) {
 			ret = hfi1_kern_setup_hw_flow(qpriv->rcd, qp);
 			if (ret) {
-				to_seg = tid_rdma_flow_wt *
+				to_seg = hfi1_compute_tid_rdma_flow_wt(qp) *
 					position_in_queue(qpriv,
 							  &rcd->flow_queue);
 				break;
diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.h b/drivers/infiniband/hw/hfi1/tid_rdma.h
index 1c53618..6e82df2 100644
--- a/drivers/infiniband/hw/hfi1/tid_rdma.h
+++ b/drivers/infiniband/hw/hfi1/tid_rdma.h
@@ -17,6 +17,7 @@
 #define TID_RDMA_MIN_SEGMENT_SIZE       BIT(18)   /* 256 KiB (for now) */
 #define TID_RDMA_MAX_SEGMENT_SIZE       BIT(18)   /* 256 KiB (for now) */
 #define TID_RDMA_MAX_PAGES              (BIT(18) >> PAGE_SHIFT)
+#define TID_RDMA_SEGMENT_SHIFT		18
 
 /*
  * Bit definitions for priv->s_flags.
@@ -274,8 +275,6 @@ u32 hfi1_build_tid_rdma_write_req(struct rvt_qp *qp, struct rvt_swqe *wqe,
 				  struct ib_other_headers *ohdr,
 				  u32 *bth1, u32 *bth2, u32 *len);
 
-void hfi1_compute_tid_rdma_flow_wt(void);
-
 void hfi1_rc_rcv_tid_rdma_write_req(struct hfi1_packet *packet);
 
 u32 hfi1_build_tid_rdma_write_resp(struct rvt_qp *qp, struct rvt_ack_entry *e,


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-rc 4/4] IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR
  2019-10-25 19:58 [PATCH for-rc 0/4] Few more rc fixes Dennis Dalessandro
                   ` (2 preceding siblings ...)
  2019-10-25 19:58 ` [PATCH for-rc 3/4] IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA Dennis Dalessandro
@ 2019-10-25 19:58 ` Dennis Dalessandro
  2019-11-06 17:26 ` [PATCH for-rc 0/4] Few more rc fixes Jason Gunthorpe
  4 siblings, 0 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2019-10-25 19:58 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Mike Marciniszyn, stable, Kaike Wan

From: Kaike Wan <kaike.wan@intel.com>

Normal RDMA WRITE request never returns IB_WC_RNR_RETRY_EXC_ERR to ULPs
because it does not need post receive buffer on the responder side.
Consequently, as an enhancement to normal RDMA WRITE request inside
the hfi1 driver, TID RDMA WRITE request should not return such an
error status to ULPs, although it does receive RNR NAKs from the
responder when TID resources are not available. This behavior is
violated when qp->s_rnr_retry_cnt is set in current hfi1
implementation.

This patch enforces this semantics by avoiding any reaction to the
updates of the RNR QP attributes.

Fixes: 3c6cb20a0d17 ("IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/rc.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c
index 513a8aa..1a3c647 100644
--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -2209,15 +2209,15 @@ int do_rc_ack(struct rvt_qp *qp, u32 aeth, u32 psn, int opcode,
 		if (qp->s_flags & RVT_S_WAIT_RNR)
 			goto bail_stop;
 		rdi = ib_to_rvt(qp->ibqp.device);
-		if (qp->s_rnr_retry == 0 &&
-		    !((rdi->post_parms[wqe->wr.opcode].flags &
-		      RVT_OPERATION_IGN_RNR_CNT) &&
-		      qp->s_rnr_retry_cnt == 0)) {
-			status = IB_WC_RNR_RETRY_EXC_ERR;
-			goto class_b;
+		if (!(rdi->post_parms[wqe->wr.opcode].flags &
+		       RVT_OPERATION_IGN_RNR_CNT)) {
+			if (qp->s_rnr_retry == 0) {
+				status = IB_WC_RNR_RETRY_EXC_ERR;
+				goto class_b;
+			}
+			if (qp->s_rnr_retry_cnt < 7 && qp->s_rnr_retry_cnt > 0)
+				qp->s_rnr_retry--;
 		}
-		if (qp->s_rnr_retry_cnt < 7 && qp->s_rnr_retry_cnt > 0)
-			qp->s_rnr_retry--;
 
 		/*
 		 * The last valid PSN is the previous PSN. For TID RDMA WRITE


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
  2019-10-25 19:58 ` [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3 Dennis Dalessandro
@ 2019-10-29 19:52   ` Jason Gunthorpe
  2019-10-29 21:19     ` Marciniszyn, Mike
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2019-10-29 19:52 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: dledford, linux-rdma, Mike Marciniszyn, Kaike Wan, stable, James Erwin

On Fri, Oct 25, 2019 at 03:58:24PM -0400, Dennis Dalessandro wrote:
> From: James Erwin <james.erwin@intel.com>
> 
> The driver avoids the gen3 speed bump when the parent
> bus speed isn't identical to gen3, 8.0GT/s.  This is not
> compatible with gen4 and newer speeds.
> 
> Fix by relaxing the test to explicitly look for the lower
> capability speeds which inherently allows for all future speeds.

This description does not seem like stable material to me.

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
  2019-10-29 19:52   ` Jason Gunthorpe
@ 2019-10-29 21:19     ` Marciniszyn, Mike
  2019-10-30 18:07       ` Jason Gunthorpe
  0 siblings, 1 reply; 11+ messages in thread
From: Marciniszyn, Mike @ 2019-10-29 21:19 UTC (permalink / raw)
  To: Jason Gunthorpe, Dalessandro, Dennis
  Cc: dledford, linux-rdma, Wan, Kaike, stable, Erwin, James

> Subject: Re: [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
> 
> On Fri, Oct 25, 2019 at 03:58:24PM -0400, Dennis Dalessandro wrote:
> > From: James Erwin <james.erwin@intel.com>
> >
> > The driver avoids the gen3 speed bump when the parent
> > bus speed isn't identical to gen3, 8.0GT/s.  This is not
> > compatible with gen4 and newer speeds.
> >
> > Fix by relaxing the test to explicitly look for the lower
> > capability speeds which inherently allows for all future speeds.
> 
> This description does not seem like stable material to me.
> 

Having a card unknowingly operate at half speed would seem pretty serious to me.

Perhaps the description should say:

IB/hfi1: Insure full Gen3 speed in a Gen4 system

Mike

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
  2019-10-29 21:19     ` Marciniszyn, Mike
@ 2019-10-30 18:07       ` Jason Gunthorpe
  2019-10-30 20:14         ` Marciniszyn, Mike
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2019-10-30 18:07 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Dalessandro, Dennis, dledford, linux-rdma, Wan, Kaike, stable,
	Erwin, James

On Tue, Oct 29, 2019 at 09:19:34PM +0000, Marciniszyn, Mike wrote:
> > Subject: Re: [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
> > 
> > On Fri, Oct 25, 2019 at 03:58:24PM -0400, Dennis Dalessandro wrote:
> > > From: James Erwin <james.erwin@intel.com>
> > >
> > > The driver avoids the gen3 speed bump when the parent
> > > bus speed isn't identical to gen3, 8.0GT/s.  This is not
> > > compatible with gen4 and newer speeds.
> > >
> > > Fix by relaxing the test to explicitly look for the lower
> > > capability speeds which inherently allows for all future speeds.
> > 
> > This description does not seem like stable material to me.
> > 
> 
> Having a card unknowingly operate at half speed would seem pretty serious to me.

Since gen4 systems are really new this also sounds like a new feature
to me.. You need to be concerned that changing the pci setup doesn't
cause regressions on existing systems too.

> Perhaps the description should say:
> 
> IB/hfi1: Insure full Gen3 speed in a Gen4 system

And maybe explain what the actual user visible impact is here. Sounds
like plugging a card into a gen4 system will not run at gen3 speeds?

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
  2019-10-30 18:07       ` Jason Gunthorpe
@ 2019-10-30 20:14         ` Marciniszyn, Mike
  2019-11-01 19:39           ` Marciniszyn, Mike
  0 siblings, 1 reply; 11+ messages in thread
From: Marciniszyn, Mike @ 2019-10-30 20:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dalessandro, Dennis, dledford, linux-rdma, Wan, Kaike, stable,
	Erwin, James

> 
> Since gen4 systems are really new this also sounds like a new feature
> to me.. You need to be concerned that changing the pci setup doesn't
> cause regressions on existing systems too.
> 

We have covered this with all of our types of cards and servers (gen3 and gen4).

> > Perhaps the description should say:
> >
> > IB/hfi1: Insure full Gen3 speed in a Gen4 system
> 
> And maybe explain what the actual user visible impact is here. Sounds
> like plugging a card into a gen4 system will not run at gen3 speeds?
> 

Ok.   I will reissue the patch some changes in the commit message.

Mike

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3
  2019-10-30 20:14         ` Marciniszyn, Mike
@ 2019-11-01 19:39           ` Marciniszyn, Mike
  0 siblings, 0 replies; 11+ messages in thread
From: Marciniszyn, Mike @ 2019-11-01 19:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dalessandro, Dennis, dledford, linux-rdma, Wan, Kaike, stable,
	Erwin, James


> 
> Ok.   I will reissue the patch some changes in the commit message.
> 
> Mike

The new patch is https://lore.kernel.org/linux-rdma/20191101192059.106248.1699.stgit@awfm-01.aw.intel.com/T/#u.

Mike

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-rc 0/4] Few more rc fixes
  2019-10-25 19:58 [PATCH for-rc 0/4] Few more rc fixes Dennis Dalessandro
                   ` (3 preceding siblings ...)
  2019-10-25 19:58 ` [PATCH for-rc 4/4] IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR Dennis Dalessandro
@ 2019-11-06 17:26 ` Jason Gunthorpe
  4 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2019-11-06 17:26 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: dledford, linux-rdma

On Fri, Oct 25, 2019 at 03:58:17PM -0400, Dennis Dalessandro wrote:
> Here we have 4 more bug fixing patches. These are all marked stable as well so
> it would be good to get these in to 5.4 if possible.
> 
> The allow for speeds patch is not a "feature" it is a bug that causes problems
> if the host comes up in gen4. Is very small. The other 3 from Kaike fix TID RDMA
> bugs. A few more lines of code here but all relegated to TID RDMA.
> 
> 
> James Erwin (1):
>       IB/hfi1: Allow for all speeds higher than gen3

With v2 of this one

 
> Kaike Wan (3):
>       IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet
>       IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA
>       IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR

Applied to for-rc, thanks

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-25 19:58 [PATCH for-rc 0/4] Few more rc fixes Dennis Dalessandro
2019-10-25 19:58 ` [PATCH for-rc 1/4] IB/hfi1: Allow for all speeds higher than gen3 Dennis Dalessandro
2019-10-29 19:52   ` Jason Gunthorpe
2019-10-29 21:19     ` Marciniszyn, Mike
2019-10-30 18:07       ` Jason Gunthorpe
2019-10-30 20:14         ` Marciniszyn, Mike
2019-11-01 19:39           ` Marciniszyn, Mike
2019-10-25 19:58 ` [PATCH for-rc 2/4] IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet Dennis Dalessandro
2019-10-25 19:58 ` [PATCH for-rc 3/4] IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA Dennis Dalessandro
2019-10-25 19:58 ` [PATCH for-rc 4/4] IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR Dennis Dalessandro
2019-11-06 17:26 ` [PATCH for-rc 0/4] Few more rc fixes Jason Gunthorpe

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org
	public-inbox-index linux-rdma

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git