* [PATCH v2 00/20] IB/hfi1, qib, rdmavt: Another round of patches for 4.11
@ 2017-03-21  0:24 ` Dennis Dalessandro
  0 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:24 UTC (permalink / raw)
  To: dledford
  Cc: Mike Marciniszyn, Dean Luick, Jakub Byczkowski, Tadeusz Struk,
	linux-rdma, Ira Weiny, Brian Welty, Easwar Hariharan, stable,
	Leon Romanovsky, Michael J. Ruhl, Don Hiatt, Sebastian Sanchez

Doug,
Here is another round of patches for 4.11. Along with the usual bug fixes
and general improvements, of particular interest are new versions of the two
fault injection patches that you didn't take from the first set.
We decided to go ahead and use the already existing config variable for those.
The other interesting thing here is a patch to the IB core for MGID/MLID
checking.

Patches apply on top of Linus' master branch, which includes your most recent
pull request, so this should apply equally well to your tree. Patches can
also be found in my GitHub repo at:
https://github.com/ddalessa/kernel/tree/for-4.11

Changes since v1:
-----------------
Correct 0-day build errors in fault injection patches
Correct swqe completion trace message location

---

Dean Luick (1):
      IB/hfi1: Force logical link down

Don Hiatt (2):
      IB/hfi1: Add receive fault injection feature
      IB/hfi1: Add transmit fault injection feature

Easwar Hariharan (1):
      IB/hfi1: Check for QSFP presence before attempting reads

Michael J. Ruhl (5):
      IB/hfi1: Race hazard avoidance in user SDMA driver
      IB/hfi1: Cache registers during state change
      IB/hfi1: Add a patch value to the firmware version string
      IB/hfi1: Ensure VL index is within bounds
      IB/core: If the MGID/MLID pair is not on the list return an error

Mike Marciniszyn (7):
      IB/rdmavt,IB/hfi1,IB/qib: Make wc opcode translation driver dependent
      IB/rdmavt: Add additional fields to post send trace
      IB/rdmavt: Add tracing for cq entry and poll
      IB/rdmavt: Add swqe completion trace
      IB/rdmavt: Avoid reseting wqe send_flags in unreserve
      IB/hfi1: Eliminate synchronize_rcu() in mr delete
      IB/rdmavt,IB/qib,IB/hfi1: Make percpu refcount optional for user MRs

Sebastian Sanchez (2):
      IB/hfi1: NULL pointer dereference when freeing rhashtable
      IB/rdmavt,IB/hfi1: Fix timer migration regressions

Tadeusz Struk (2):
      IB/hfi1: Check device id early during init
      IB/hfi1: Protect the global dev_cntr_names and port_cntr_names


 drivers/infiniband/core/uverbs_cmd.c    |   13 +-
 drivers/infiniband/hw/hfi1/chip.c       |  178 ++++++++++++++++++++----
 drivers/infiniband/hw/hfi1/chip.h       |   18 +-
 drivers/infiniband/hw/hfi1/debugfs.c    |  230 +++++++++++++++++++++++++++++++
 drivers/infiniband/hw/hfi1/debugfs.h    |   62 ++++++++
 drivers/infiniband/hw/hfi1/driver.c     |   19 +++
 drivers/infiniband/hw/hfi1/firmware.c   |   14 +-
 drivers/infiniband/hw/hfi1/hfi.h        |   11 +
 drivers/infiniband/hw/hfi1/init.c       |   19 +--
 drivers/infiniband/hw/hfi1/rc.c         |   12 +-
 drivers/infiniband/hw/hfi1/ruc.c        |    7 +
 drivers/infiniband/hw/hfi1/sdma.c       |   43 ++++--
 drivers/infiniband/hw/hfi1/trace_misc.h |   48 ++++++
 drivers/infiniband/hw/hfi1/trace_rc.h   |    7 -
 drivers/infiniband/hw/hfi1/trace_tx.h   |   43 ++++++
 drivers/infiniband/hw/hfi1/user_sdma.c  |    3 
 drivers/infiniband/hw/hfi1/verbs.c      |  104 ++++++++++++--
 drivers/infiniband/hw/hfi1/verbs.h      |    5 +
 drivers/infiniband/hw/qib/qib_rc.c      |   10 +
 drivers/infiniband/hw/qib/qib_ruc.c     |    5 +
 drivers/infiniband/hw/qib/qib_verbs.c   |   20 +++
 drivers/infiniband/sw/rdmavt/cq.c       |    3 
 drivers/infiniband/sw/rdmavt/mr.c       |   55 +++++--
 drivers/infiniband/sw/rdmavt/qp.c       |   32 +---
 drivers/infiniband/sw/rdmavt/trace.h    |    4 -
 drivers/infiniband/sw/rdmavt/trace_cq.h |  127 +++++++++++++++++
 drivers/infiniband/sw/rdmavt/trace_rc.h |  109 +++++++++++++++
 drivers/infiniband/sw/rdmavt/trace_tx.h |   34 ++++-
 include/rdma/ib_pack.h                  |    2 
 include/rdma/rdma_vt.h                  |    1 
 include/rdma/rdmavt_qp.h                |    7 -
 31 files changed, 1096 insertions(+), 149 deletions(-)
 create mode 100644 drivers/infiniband/sw/rdmavt/trace_cq.h
 create mode 100644 drivers/infiniband/sw/rdmavt/trace_rc.h

--
-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 01/20] IB/hfi1: Force logical link down
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-03-21  0:24   ` Dennis Dalessandro
  2017-03-21  0:24   ` [PATCH v2 02/20] IB/hfi1: Race hazard avoidance in user SDMA driver Dennis Dalessandro
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:24 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Easwar Hariharan, Dean Luick,
	Jakub Byczkowski

From: Dean Luick <dean.luick-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

If the logical link state does not read as down when
the physical link state is offline, force it to down.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dean Luick <dean.luick-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Easwar Hariharan <easwar.hariharan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/chip.c |   86 +++++++++++++++++++++++++++++--------
 1 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 121a4c9..44322c6 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -1045,6 +1045,7 @@ static int wait_logical_linkstate(struct hfi1_pportdata *ppd, u32 state,
 static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
 			   unsigned int *np);
 static void clear_full_mgmt_pkey(struct hfi1_pportdata *ppd);
+static int wait_link_transfer_active(struct hfi1_devdata *dd, int wait_ms);
 
 /*
  * Error interrupt table entry.  This is used as input to the interrupt
@@ -8891,8 +8892,6 @@ int send_idle_sma(struct hfi1_devdata *dd, u64 message)
  */
 static int do_quick_linkup(struct hfi1_devdata *dd)
 {
-	u64 reg;
-	unsigned long timeout;
 	int ret;
 
 	lcb_shutdown(dd, 0);
@@ -8915,19 +8914,9 @@ static int do_quick_linkup(struct hfi1_devdata *dd)
 		write_csr(dd, DC_LCB_CFG_RUN,
 			  1ull << DC_LCB_CFG_RUN_EN_SHIFT);
 
-		/* watch LCB_STS_LINK_TRANSFER_ACTIVE */
-		timeout = jiffies + msecs_to_jiffies(10);
-		while (1) {
-			reg = read_csr(dd, DC_LCB_STS_LINK_TRANSFER_ACTIVE);
-			if (reg)
-				break;
-			if (time_after(jiffies, timeout)) {
-				dd_dev_err(dd,
-					   "timeout waiting for LINK_TRANSFER_ACTIVE\n");
-				return -ETIMEDOUT;
-			}
-			udelay(2);
-		}
+		ret = wait_link_transfer_active(dd, 10);
+		if (ret)
+			return ret;
 
 		write_csr(dd, DC_LCB_CFG_ALLOW_LINK_UP,
 			  1ull << DC_LCB_CFG_ALLOW_LINK_UP_VAL_SHIFT);
@@ -10082,6 +10071,64 @@ static void check_lni_states(struct hfi1_pportdata *ppd)
 	decode_state_complete(ppd, last_remote_state, "received");
 }
 
+/* wait for wait_ms for LINK_TRANSFER_ACTIVE to go to 1 */
+static int wait_link_transfer_active(struct hfi1_devdata *dd, int wait_ms)
+{
+	u64 reg;
+	unsigned long timeout;
+
+	/* watch LCB_STS_LINK_TRANSFER_ACTIVE */
+	timeout = jiffies + msecs_to_jiffies(wait_ms);
+	while (1) {
+		reg = read_csr(dd, DC_LCB_STS_LINK_TRANSFER_ACTIVE);
+		if (reg)
+			break;
+		if (time_after(jiffies, timeout)) {
+			dd_dev_err(dd,
+				   "timeout waiting for LINK_TRANSFER_ACTIVE\n");
+			return -ETIMEDOUT;
+		}
+		udelay(2);
+	}
+	return 0;
+}
+
+/* called when the logical link state is not down as it should be */
+static void force_logical_link_state_down(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+
+	/*
+	 * Bring link up in LCB loopback
+	 */
+	write_csr(dd, DC_LCB_CFG_TX_FIFOS_RESET, 1);
+	write_csr(dd, DC_LCB_CFG_IGNORE_LOST_RCLK,
+		  DC_LCB_CFG_IGNORE_LOST_RCLK_EN_SMASK);
+
+	write_csr(dd, DC_LCB_CFG_LANE_WIDTH, 0);
+	write_csr(dd, DC_LCB_CFG_REINIT_AS_SLAVE, 0);
+	write_csr(dd, DC_LCB_CFG_CNT_FOR_SKIP_STALL, 0x110);
+	write_csr(dd, DC_LCB_CFG_LOOPBACK, 0x2);
+
+	write_csr(dd, DC_LCB_CFG_TX_FIFOS_RESET, 0);
+	(void)read_csr(dd, DC_LCB_CFG_TX_FIFOS_RESET);
+	udelay(3);
+	write_csr(dd, DC_LCB_CFG_ALLOW_LINK_UP, 1);
+	write_csr(dd, DC_LCB_CFG_RUN, 1ull << DC_LCB_CFG_RUN_EN_SHIFT);
+
+	wait_link_transfer_active(dd, 100);
+
+	/*
+	 * Bring the link down again.
+	 */
+	write_csr(dd, DC_LCB_CFG_TX_FIFOS_RESET, 1);
+	write_csr(dd, DC_LCB_CFG_ALLOW_LINK_UP, 0);
+	write_csr(dd, DC_LCB_CFG_IGNORE_LOST_RCLK, 0);
+
+	/* call again to adjust ppd->statusp, if needed */
+	get_logical_state(ppd);
+}
+
 /*
  * Helper for set_link_state().  Do not call except from that routine.
  * Expects ppd->hls_mutex to be held.
@@ -10135,15 +10182,18 @@ static int goto_offline(struct hfi1_pportdata *ppd, u8 rem_reason)
 			return ret;
 	}
 
-	/* make sure the logical state is also down */
-	wait_logical_linkstate(ppd, IB_PORT_DOWN, 1000);
-
 	/*
 	 * Now in charge of LCB - must be after the physical state is
 	 * offline.quiet and before host_link_state is changed.
 	 */
 	set_host_lcb_access(dd);
 	write_csr(dd, DC_LCB_ERR_EN, ~0ull); /* watch LCB errors */
+
+	/* make sure the logical state is also down */
+	ret = wait_logical_linkstate(ppd, IB_PORT_DOWN, 1000);
+	if (ret)
+		force_logical_link_state_down(ppd);
+
 	ppd->host_link_state = HLS_LINK_COOLDOWN; /* LCB access allowed */
 
 	if (ppd->port_type == PORT_TYPE_QSFP &&

^ permalink raw reply related	[flat|nested] 42+ messages in thread
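
The wait_link_transfer_active() helper factored out in the patch above is a poll-until-deadline loop. As a rough userspace sketch of the same shape (not the driver code): `read_status_fn`, `wait_status_nonzero()`, `struct fake_reg` and `fake_read()` are hypothetical names invented for this illustration, and `clock_gettime()`/`nanosleep()` stand in for `jiffies` and `udelay()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a status-register read; not a driver API. */
typedef uint64_t (*read_status_fn)(void *ctx);

/* Monotonic clock in milliseconds. */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll read_status() until it returns nonzero or wait_ms has elapsed.
 * Returns 0 on success, -ETIMEDOUT on timeout.
 */
static int wait_status_nonzero(read_status_fn read_status, void *ctx,
			       int wait_ms)
{
	const struct timespec pause = { 0, 2000 };	/* 2 us between polls */
	int64_t deadline;

	assert(read_status != NULL);
	deadline = now_ms() + wait_ms;
	for (;;) {
		if (read_status(ctx))
			return 0;
		if (now_ms() > deadline)
			return -ETIMEDOUT;
		nanosleep(&pause, NULL);
	}
}

/* Example status source: reads as zero a fixed number of times. */
struct fake_reg {
	int reads_until_set;
};

static uint64_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->reads_until_set > 0) {
		r->reads_until_set--;
		return 0;
	}
	return 1;
}
```

Taking the timeout as a parameter is what lets one helper serve both call sites in the patch, which wait 10 ms and 100 ms respectively.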

* [PATCH v2 02/20] IB/hfi1: Race hazard avoidance in user SDMA driver
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  2017-03-21  0:24   ` [PATCH v2 01/20] IB/hfi1: Force logical link down Dennis Dalessandro
@ 2017-03-21  0:24   ` Dennis Dalessandro
  2017-03-21  0:24   ` [PATCH v2 03/20] IB/hfi1: Cache registers during state change Dennis Dalessandro
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:24 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Michael Ruhl

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Set the errcode before the state and add the smp_wmb() to avoid a
potential race condition with the user.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/user_sdma.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index e6811c4..060e374 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -1615,9 +1615,10 @@ static inline void set_comp_state(struct hfi1_user_sdma_pkt_q *pq,
 {
 	hfi1_cdbg(SDMA, "[%u:%u:%u:%u] Setting completion status %u %d",
 		  pq->dd->unit, pq->ctxt, pq->subctxt, idx, state, ret);
-	cq->comps[idx].status = state;
 	if (state == ERROR)
 		cq->comps[idx].errcode = -ret;
+	smp_wmb(); /* make sure errcode is visible first */
+	cq->comps[idx].status = state;
 	trace_hfi1_sdma_user_completion(pq->dd, pq->ctxt, pq->subctxt,
 					idx, state, ret);
 }

^ permalink raw reply related	[flat|nested] 42+ messages in thread
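
The ordering rule in the patch above (write the payload, then publish the status) can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the driver code: a release store is used in place of the kernel's smp_wmb(), and `struct comp_entry`, `publish_completion()` and `poll_error()` are hypothetical names. The enum values are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative completion states; the values are assumptions. */
enum comp_state { FREE = 0, QUEUED, COMPLETE, ERROR };

/* Hypothetical completion entry shared by a producer and a consumer. */
struct comp_entry {
	int errcode;		/* payload, written before status */
	_Atomic int status;	/* flag the consumer polls */
};

/* Producer: store the payload first, then publish status with release. */
static void publish_completion(struct comp_entry *c, enum comp_state state,
			       int ret)
{
	assert(c != NULL);
	if (state == ERROR)
		c->errcode = -ret;
	atomic_store_explicit(&c->status, state, memory_order_release);
}

/* Consumer: returns 1 and fills *errcode once ERROR is observed. */
static int poll_error(struct comp_entry *c, int *errcode)
{
	if (atomic_load_explicit(&c->status, memory_order_acquire) != ERROR)
		return 0;
	*errcode = c->errcode;	/* ordered after the acquire load */
	return 1;
}
```

With the original order (status first, errcode second), a consumer could observe ERROR and then read a stale errcode.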

* [PATCH v2 03/20] IB/hfi1: Cache registers during state change
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  2017-03-21  0:24   ` [PATCH v2 01/20] IB/hfi1: Force logical link down Dennis Dalessandro
  2017-03-21  0:24   ` [PATCH v2 02/20] IB/hfi1: Race hazard avoidance in user SDMA driver Dennis Dalessandro
@ 2017-03-21  0:24   ` Dennis Dalessandro
  2017-03-21  0:24   ` [PATCH v2 04/20] IB/hfi1: NULL pointer dereference when freeing rhashtable Dennis Dalessandro
                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:24 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Michael J. Ruhl

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

When the LCB is going offline, inopportune port queries can cause
benign error messages to be logged.  To deal with this, cache the
registers just before setting the LCB to offline, allowing queries to
return without eliciting the error.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/chip.c |   58 +++++++++++++++++++++++++++++++++++--
 1 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 44322c6..8b8840a 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8345,6 +8345,52 @@ static int read_lcb_via_8051(struct hfi1_devdata *dd, u32 addr, u64 *data)
 }
 
 /*
+ * Provide a cache for some of the LCB registers in case the LCB is
+ * unavailable.
+ * (The LCB is unavailable in certain link states, for example.)
+ */
+struct lcb_datum {
+	u32 off;
+	u64 val;
+};
+
+static struct lcb_datum lcb_cache[] = {
+	{ DC_LCB_ERR_INFO_RX_REPLAY_CNT, 0},
+	{ DC_LCB_ERR_INFO_SEQ_CRC_CNT, 0 },
+	{ DC_LCB_ERR_INFO_REINIT_FROM_PEER_CNT, 0 },
+};
+
+static void update_lcb_cache(struct hfi1_devdata *dd)
+{
+	int i;
+	int ret;
+	u64 val;
+
+	for (i = 0; i < ARRAY_SIZE(lcb_cache); i++) {
+		ret = read_lcb_csr(dd, lcb_cache[i].off, &val);
+
+		/* Update if we get good data */
+		if (likely(ret != -EBUSY))
+			lcb_cache[i].val = val;
+	}
+}
+
+static int read_lcb_cache(u32 off, u64 *val)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(lcb_cache); i++) {
+		if (lcb_cache[i].off == off) {
+			*val = lcb_cache[i].val;
+			return 0;
+		}
+	}
+
+	pr_warn("%s bad offset 0x%x\n", __func__, off);
+	return -1;
+}
+
+/*
  * Read an LCB CSR.  Access may not be in host control, so check.
  * Return 0 on success, -EBUSY on failure.
  */
@@ -8355,9 +8401,13 @@ int read_lcb_csr(struct hfi1_devdata *dd, u32 addr, u64 *data)
 	/* if up, go through the 8051 for the value */
 	if (ppd->host_link_state & HLS_UP)
 		return read_lcb_via_8051(dd, addr, data);
-	/* if going up or down, no access */
-	if (ppd->host_link_state & (HLS_GOING_UP | HLS_GOING_OFFLINE))
-		return -EBUSY;
+	/* if going up or down, check the cache, otherwise, no access */
+	if (ppd->host_link_state & (HLS_GOING_UP | HLS_GOING_OFFLINE)) {
+		if (read_lcb_cache(addr, data))
+			return -EBUSY;
+		return 0;
+	}
+
 	/* otherwise, host has access */
 	*data = read_csr(dd, addr);
 	return 0;
@@ -10145,6 +10195,8 @@ static int goto_offline(struct hfi1_pportdata *ppd, u8 rem_reason)
 	int do_transition;
 	int do_wait;
 
+	update_lcb_cache(dd);
+
 	previous_state = ppd->host_link_state;
 	ppd->host_link_state = HLS_GOING_OFFLINE;
 	pstate = read_physical_state(dd);

^ permalink raw reply related	[flat|nested] 42+ messages in thread
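
The cache added in the patch above is a small fixed table searched linearly by register offset. A standalone sketch of that lookup follows. The offsets `REG_A`, `REG_B`, `REG_C` and the names `reg_cache`, `cache_update()` and `cache_read()` are made up for this illustration and are not the LCB register definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct reg_datum {
	uint32_t off;	/* register offset used as the lookup key */
	uint64_t val;	/* last good value read from the register */
};

/* Hypothetical register offsets, for illustration only. */
enum { REG_A = 0x100, REG_B = 0x108, REG_C = 0x110 };

static struct reg_datum reg_cache[] = {
	{ REG_A, 0 },
	{ REG_B, 0 },
	{ REG_C, 0 },
};

/* Record val for off. Returns 0, or -1 if off is not a cached register. */
static int cache_update(uint32_t off, uint64_t val)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(reg_cache); i++) {
		if (reg_cache[i].off == off) {
			reg_cache[i].val = val;
			return 0;
		}
	}
	return -1;
}

/* Look up off. Returns 0 and fills *val, or -1 if off is not cached. */
static int cache_read(uint32_t off, uint64_t *val)
{
	size_t i;

	assert(val != NULL);
	for (i = 0; i < ARRAY_SIZE(reg_cache); i++) {
		if (reg_cache[i].off == off) {
			*val = reg_cache[i].val;
			return 0;
		}
	}
	return -1;
}
```

A linear scan is reasonable for a table of three entries. Unlike this sketch, the patch refreshes every entry in one pass just before the link state changes.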

* [PATCH v2 04/20] IB/hfi1: NULL pointer dereference when freeing rhashtable
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-03-21  0:24   ` [PATCH v2 03/20] IB/hfi1: Cache registers during state change Dennis Dalessandro
@ 2017-03-21  0:24   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 05/20] IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependent Dennis Dalessandro
                     ` (14 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:24 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn, Sebastian Sanchez

From: Sebastian Sanchez <sebastian.sanchez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

A NULL pointer dereference occurs when the driver is
unloaded and the SDMA rhashtable is freed without
rhashtable_init() having been called.
Prevent this by changing sdma_rht to be a pointer
to a dynamically allocated hash table. A non-NULL
pointer indicates that the hash table was
initialized and that it needs to be destroyed.

Fixes: 0cb2aa690c7e ("IB/hfi1: Add sysfs interface for affinity setup")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/hfi.h  |    2 +-
 drivers/infiniband/hw/hfi1/sdma.c |   38 ++++++++++++++++++++++++++-----------
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 0808e3c..b69ab47 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1167,7 +1167,7 @@ struct hfi1_devdata {
 	bool eprom_available;	/* true if EPROM is available for this device */
 	bool aspm_supported;	/* Does HW support ASPM */
 	bool aspm_enabled;	/* ASPM state: enabled/disabled */
-	struct rhashtable sdma_rht;
+	struct rhashtable *sdma_rht;
 
 	struct kobject kobj;
 };
diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c
index 1d81cac..9bee28d 100644
--- a/drivers/infiniband/hw/hfi1/sdma.c
+++ b/drivers/infiniband/hw/hfi1/sdma.c
@@ -868,7 +868,7 @@ struct sdma_engine *sdma_select_user_engine(struct hfi1_devdata *dd,
 
 	cpu_id = smp_processor_id();
 	rcu_read_lock();
-	rht_node = rhashtable_lookup_fast(&dd->sdma_rht, &cpu_id,
+	rht_node = rhashtable_lookup_fast(dd->sdma_rht, &cpu_id,
 					  sdma_rht_params);
 
 	if (rht_node && rht_node->map[vl]) {
@@ -962,7 +962,7 @@ ssize_t sdma_set_cpu_to_sde_map(struct sdma_engine *sde, const char *buf,
 			continue;
 		}
 
-		rht_node = rhashtable_lookup_fast(&dd->sdma_rht, &cpu,
+		rht_node = rhashtable_lookup_fast(dd->sdma_rht, &cpu,
 						  sdma_rht_params);
 		if (!rht_node) {
 			rht_node = kzalloc(sizeof(*rht_node), GFP_KERNEL);
@@ -982,7 +982,7 @@ ssize_t sdma_set_cpu_to_sde_map(struct sdma_engine *sde, const char *buf,
 			rht_node->map[vl]->ctr = 1;
 			rht_node->map[vl]->sde[0] = sde;
 
-			ret = rhashtable_insert_fast(&dd->sdma_rht,
+			ret = rhashtable_insert_fast(dd->sdma_rht,
 						     &rht_node->node,
 						     sdma_rht_params);
 			if (ret) {
@@ -1025,7 +1025,7 @@ ssize_t sdma_set_cpu_to_sde_map(struct sdma_engine *sde, const char *buf,
 		if (cpumask_test_cpu(cpu, mask))
 			continue;
 
-		rht_node = rhashtable_lookup_fast(&dd->sdma_rht, &cpu,
+		rht_node = rhashtable_lookup_fast(dd->sdma_rht, &cpu,
 						  sdma_rht_params);
 		if (rht_node) {
 			bool empty = true;
@@ -1049,7 +1049,7 @@ ssize_t sdma_set_cpu_to_sde_map(struct sdma_engine *sde, const char *buf,
 			}
 
 			if (empty) {
-				ret = rhashtable_remove_fast(&dd->sdma_rht,
+				ret = rhashtable_remove_fast(dd->sdma_rht,
 							     &rht_node->node,
 							     sdma_rht_params);
 				WARN_ON(ret);
@@ -1108,7 +1108,7 @@ void sdma_seqfile_dump_cpu_list(struct seq_file *s,
 	struct sdma_rht_node *rht_node;
 	int i, j;
 
-	rht_node = rhashtable_lookup_fast(&dd->sdma_rht, &cpuid,
+	rht_node = rhashtable_lookup_fast(dd->sdma_rht, &cpuid,
 					  sdma_rht_params);
 	if (!rht_node)
 		return;
@@ -1322,6 +1322,12 @@ static void sdma_clean(struct hfi1_devdata *dd, size_t num_engines)
 	synchronize_rcu();
 	kfree(dd->per_sdma);
 	dd->per_sdma = NULL;
+
+	if (dd->sdma_rht) {
+		rhashtable_free_and_destroy(dd->sdma_rht, sdma_rht_free, NULL);
+		kfree(dd->sdma_rht);
+		dd->sdma_rht = NULL;
+	}
 }
 
 /**
@@ -1341,12 +1347,14 @@ int sdma_init(struct hfi1_devdata *dd, u8 port)
 {
 	unsigned this_idx;
 	struct sdma_engine *sde;
+	struct rhashtable *tmp_sdma_rht;
 	u16 descq_cnt;
 	void *curr_head;
 	struct hfi1_pportdata *ppd = dd->pport + port;
 	u32 per_sdma_credits;
 	uint idle_cnt = sdma_idle_cnt;
 	size_t num_engines = dd->chip_sdma_engines;
+	int ret = -ENOMEM;
 
 	if (!HFI1_CAP_IS_KSET(SDMA)) {
 		HFI1_CAP_CLEAR(SDMA_AHG);
@@ -1378,7 +1386,7 @@ int sdma_init(struct hfi1_devdata *dd, u8 port)
 	/* alloc memory for array of send engines */
 	dd->per_sdma = kcalloc(num_engines, sizeof(*dd->per_sdma), GFP_KERNEL);
 	if (!dd->per_sdma)
-		return -ENOMEM;
+		return ret;
 
 	idle_cnt = ns_to_cclock(dd, idle_cnt);
 	if (!sdma_desct_intr)
@@ -1507,18 +1515,27 @@ int sdma_init(struct hfi1_devdata *dd, u8 port)
 	dd->flags |= HFI1_HAS_SEND_DMA;
 	dd->flags |= idle_cnt ? HFI1_HAS_SDMA_TIMEOUT : 0;
 	dd->num_sdma = num_engines;
-	if (sdma_map_init(dd, port, ppd->vls_operational, NULL))
+	ret = sdma_map_init(dd, port, ppd->vls_operational, NULL);
+	if (ret < 0)
+		goto bail;
+
+	tmp_sdma_rht = kzalloc(sizeof(*tmp_sdma_rht), GFP_KERNEL);
+	if (!tmp_sdma_rht) {
+		ret = -ENOMEM;
 		goto bail;
+	}
 
-	if (rhashtable_init(&dd->sdma_rht, &sdma_rht_params))
+	ret = rhashtable_init(tmp_sdma_rht, &sdma_rht_params);
+	if (ret < 0)
 		goto bail;
+	dd->sdma_rht = tmp_sdma_rht;
 
 	dd_dev_info(dd, "SDMA num_sdma: %u\n", dd->num_sdma);
 	return 0;
 
 bail:
 	sdma_clean(dd, num_engines);
-	return -ENOMEM;
+	return ret;
 }
 
 /**
@@ -1604,7 +1621,6 @@ void sdma_exit(struct hfi1_devdata *dd)
 		sdma_finalput(&sde->state);
 	}
 	sdma_clean(dd, dd->num_sdma);
-	rhashtable_free_and_destroy(&dd->sdma_rht, sdma_rht_free, NULL);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 42+ messages in thread
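
The fix above uses the pointer itself as the "was initialized" flag: initialize into a temporary, publish the pointer only on success, and let cleanup test it for NULL. A minimal sketch of that pattern follows. `struct table`, `struct device`, `table_init()`, `device_init()` and `device_clean()` are hypothetical names, with malloc-family calls standing in for kzalloc()/kfree(). Freeing the temporary when initialization fails is a choice made in this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resource that must be initialized before it is destroyed. */
struct table {
	int initialized;
};

struct device {
	struct table *tbl;	/* non-NULL only after a successful init */
};

static int table_init(struct table *t, int fail)
{
	if (fail)
		return -EINVAL;
	t->initialized = 1;
	return 0;
}

static void table_destroy(struct table *t)
{
	t->initialized = 0;
}

/* Safe on any path: only tears down a table that was published. */
static void device_clean(struct device *d)
{
	assert(d != NULL);
	if (d->tbl) {
		table_destroy(d->tbl);
		free(d->tbl);
		d->tbl = NULL;
	}
}

static int device_init(struct device *d, int fail_init)
{
	struct table *tmp;
	int ret;

	tmp = calloc(1, sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;

	ret = table_init(tmp, fail_init);
	if (ret < 0) {
		free(tmp);	/* never published, so cleanup cannot see it */
		return ret;
	}
	d->tbl = tmp;		/* publish only a fully initialized table */
	return 0;
}
```

Because device_clean() checks the pointer, it can be called after a failed init or more than once without touching an uninitialized table.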

* [PATCH v2 05/20] IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependent
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-03-21  0:24   ` [PATCH v2 04/20] IB/hfi1: NULL pointer dereference when freeing rhashtable Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 06/20] IB/rdmavt: Add additional fields to post send trace Dennis Dalessandro
                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

From: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The work to create a completion helper moved the translation of send
wqe operations to completion opcodes to rdmavt.

This precludes having driver dependent operations.  Make the translation
driver dependent by doing the translation in the driver prior to the
rvt_qp_swqe_complete() call using restored translation tables.

Fixes: f2dc9cdce83c ("IB/rdmavt: Add a send completion helper")
Fixes: 0771da5a6e9d ("IB/hfi1,IB/qib: Use new send completion helper")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/rc.c       |   10 ++++++++--
 drivers/infiniband/hw/hfi1/ruc.c      |    5 ++++-
 drivers/infiniband/hw/hfi1/verbs.c    |   16 ++++++++++++++++
 drivers/infiniband/hw/qib/qib_rc.c    |   10 ++++++++--
 drivers/infiniband/hw/qib/qib_ruc.c   |    5 ++++-
 drivers/infiniband/hw/qib/qib_verbs.c |   13 +++++++++++++
 drivers/infiniband/sw/rdmavt/qp.c     |   17 -----------------
 include/rdma/rdmavt_qp.h              |    3 ++-
 8 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c
index 7382be1..4649530 100644
--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -1034,7 +1034,10 @@ void hfi1_rc_send_complete(struct rvt_qp *qp, struct ib_header *hdr)
 		/* see post_send() */
 		barrier();
 		rvt_put_swqe(wqe);
-		rvt_qp_swqe_complete(qp, wqe, IB_WC_SUCCESS);
+		rvt_qp_swqe_complete(qp,
+				     wqe,
+				     ib_hfi1_wc_opcode[wqe->wr.opcode],
+				     IB_WC_SUCCESS);
 	}
 	/*
 	 * If we were waiting for sends to complete before re-sending,
@@ -1081,7 +1084,10 @@ static inline void update_last_psn(struct rvt_qp *qp, u32 psn)
 		qp->s_last = s_last;
 		/* see post_send() */
 		barrier();
-		rvt_qp_swqe_complete(qp, wqe, IB_WC_SUCCESS);
+		rvt_qp_swqe_complete(qp,
+				     wqe,
+				     ib_hfi1_wc_opcode[wqe->wr.opcode],
+				     IB_WC_SUCCESS);
 	} else {
 		struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
 
diff --git a/drivers/infiniband/hw/hfi1/ruc.c b/drivers/infiniband/hw/hfi1/ruc.c
index aa15bcb..d2eb793 100644
--- a/drivers/infiniband/hw/hfi1/ruc.c
+++ b/drivers/infiniband/hw/hfi1/ruc.c
@@ -920,7 +920,10 @@ void hfi1_send_complete(struct rvt_qp *qp, struct rvt_swqe *wqe,
 	    qp->ibqp.qp_type == IB_QPT_GSI)
 		atomic_dec(&ibah_to_rvtah(wqe->ud_wr.ah)->refcount);
 
-	rvt_qp_swqe_complete(qp, wqe, status);
+	rvt_qp_swqe_complete(qp,
+			     wqe,
+			     ib_hfi1_wc_opcode[wqe->wr.opcode],
+			     status);
 
 	if (qp->s_acked == old_last)
 		qp->s_acked = last;
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 222315f..815cb44 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -297,6 +297,22 @@ static inline bool wss_exceeds_threshold(void)
 }
 
 /*
+ * Translate ib_wr_opcode into ib_wc_opcode.
+ */
+const enum ib_wc_opcode ib_hfi1_wc_opcode[] = {
+	[IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE,
+	[IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE,
+	[IB_WR_SEND] = IB_WC_SEND,
+	[IB_WR_SEND_WITH_IMM] = IB_WC_SEND,
+	[IB_WR_RDMA_READ] = IB_WC_RDMA_READ,
+	[IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP,
+	[IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD,
+	[IB_WR_SEND_WITH_INV] = IB_WC_SEND,
+	[IB_WR_LOCAL_INV] = IB_WC_LOCAL_INV,
+	[IB_WR_REG_MR] = IB_WC_REG_MR
+};
+
+/*
  * Length of header by opcode, 0 --> not supported
  */
 const u8 hdr_len_by_opcode[256] = {
diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c
index 12658e3..0234987 100644
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -938,7 +938,10 @@ void qib_rc_send_complete(struct rvt_qp *qp, struct ib_header *hdr)
 		/* see post_send() */
 		barrier();
 		rvt_put_swqe(wqe);
-		rvt_qp_swqe_complete(qp, wqe, IB_WC_SUCCESS);
+		rvt_qp_swqe_complete(qp,
+				     wqe,
+				     ib_qib_wc_opcode[wqe->wr.opcode],
+				     IB_WC_SUCCESS);
 	}
 	/*
 	 * If we were waiting for sends to complete before resending,
@@ -983,7 +986,10 @@ static inline void update_last_psn(struct rvt_qp *qp, u32 psn)
 		qp->s_last = s_last;
 		/* see post_send() */
 		barrier();
-		rvt_qp_swqe_complete(qp, wqe, IB_WC_SUCCESS);
+		rvt_qp_swqe_complete(qp,
+				     wqe,
+				     ib_qib_wc_opcode[wqe->wr.opcode],
+				     IB_WC_SUCCESS);
 	} else
 		this_cpu_inc(*ibp->rvp.rc_delayed_comp);
 
diff --git a/drivers/infiniband/hw/qib/qib_ruc.c b/drivers/infiniband/hw/qib/qib_ruc.c
index 17655cc..6e1adf7 100644
--- a/drivers/infiniband/hw/qib/qib_ruc.c
+++ b/drivers/infiniband/hw/qib/qib_ruc.c
@@ -769,7 +769,10 @@ void qib_send_complete(struct rvt_qp *qp, struct rvt_swqe *wqe,
 	    qp->ibqp.qp_type == IB_QPT_GSI)
 		atomic_dec(&ibah_to_rvtah(wqe->ud_wr.ah)->refcount);
 
-	rvt_qp_swqe_complete(qp, wqe, status);
+	rvt_qp_swqe_complete(qp,
+			     wqe,
+			     ib_qib_wc_opcode[wqe->wr.opcode],
+			     status);
 
 	if (qp->s_acked == old_last)
 		qp->s_acked = last;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 83f8b5f..e120efe 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -114,6 +114,19 @@
 MODULE_PARM_DESC(disable_sma, "Disable the SMA");
 
 /*
+ * Translate ib_wr_opcode into ib_wc_opcode.
+ */
+const enum ib_wc_opcode ib_qib_wc_opcode[] = {
+	[IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE,
+	[IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE,
+	[IB_WR_SEND] = IB_WC_SEND,
+	[IB_WR_SEND_WITH_IMM] = IB_WC_SEND,
+	[IB_WR_RDMA_READ] = IB_WC_RDMA_READ,
+	[IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP,
+	[IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD
+};
+
+/*
  * System image GUID.
  */
 __be64 ib_qib_sys_image_guid;
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index f5ad8d4..28fb724 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -117,23 +117,6 @@
 };
 EXPORT_SYMBOL(ib_rvt_state_ops);
 
-/*
- * Translate ib_wr_opcode into ib_wc_opcode.
- */
-const enum ib_wc_opcode ib_rvt_wc_opcode[] = {
-	[IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE,
-	[IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE,
-	[IB_WR_SEND] = IB_WC_SEND,
-	[IB_WR_SEND_WITH_IMM] = IB_WC_SEND,
-	[IB_WR_RDMA_READ] = IB_WC_RDMA_READ,
-	[IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP,
-	[IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD,
-	[IB_WR_SEND_WITH_INV] = IB_WC_SEND,
-	[IB_WR_LOCAL_INV] = IB_WC_LOCAL_INV,
-	[IB_WR_REG_MR] = IB_WC_REG_MR
-};
-EXPORT_SYMBOL(ib_rvt_wc_opcode);
-
 static void get_map_page(struct rvt_qpn_table *qpt,
 			 struct rvt_qpn_map *map,
 			 gfp_t gfp)
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index f381639..3cdd9e2 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -574,6 +574,7 @@ static inline void rvt_qp_wqe_unreserve(
 static inline void rvt_qp_swqe_complete(
 	struct rvt_qp *qp,
 	struct rvt_swqe *wqe,
+	enum ib_wc_opcode opcode,
 	enum ib_wc_status status)
 {
 	if (unlikely(wqe->wr.send_flags & RVT_SEND_RESERVE_USED))
@@ -586,7 +587,7 @@ static inline void rvt_qp_swqe_complete(
 		memset(&wc, 0, sizeof(wc));
 		wc.wr_id = wqe->wr.wr_id;
 		wc.status = status;
-		wc.opcode = ib_rvt_wc_opcode[wqe->wr.opcode];
+		wc.opcode = opcode;
 		wc.qp = &qp->ibqp;
 		wc.byte_len = wqe->length;
 		rvt_cq_enter(ibcq_to_rvtcq(qp->ibqp.send_cq), &wc,
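
A note on the per-driver tables in this patch: they use designated initializers, so a table that stops at a lower work request opcode is simply shorter, and unlisted slots in the middle default to 0. The following is only a minimal user-space sketch of that pattern with a bounds check added. The enums, table names and wc_lookup() are hypothetical stand-ins, not the rdmavt or verbs definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the verbs enums; values are illustrative. */
enum wr_opcode {
	WR_RDMA_WRITE,
	WR_RDMA_WRITE_WITH_IMM,
	WR_SEND,
	WR_SEND_WITH_IMM,
	WR_RDMA_READ,
	WR_LOCAL_INV
};

enum wc_opcode {
	WC_SEND,
	WC_RDMA_WRITE,
	WC_RDMA_READ,
	WC_LOCAL_INV
};

/* Driver A translates every opcode in this toy set. */
static const enum wc_opcode drv_a_wc_opcode[] = {
	[WR_RDMA_WRITE] = WC_RDMA_WRITE,
	[WR_RDMA_WRITE_WITH_IMM] = WC_RDMA_WRITE,
	[WR_SEND] = WC_SEND,
	[WR_SEND_WITH_IMM] = WC_SEND,
	[WR_RDMA_READ] = WC_RDMA_READ,
	[WR_LOCAL_INV] = WC_LOCAL_INV
};

/* Driver B omits WR_LOCAL_INV, so its table is one entry shorter. */
static const enum wc_opcode drv_b_wc_opcode[] = {
	[WR_RDMA_WRITE] = WC_RDMA_WRITE,
	[WR_RDMA_WRITE_WITH_IMM] = WC_RDMA_WRITE,
	[WR_SEND] = WC_SEND,
	[WR_SEND_WITH_IMM] = WC_SEND,
	[WR_RDMA_READ] = WC_RDMA_READ
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Translate op through tbl; return 0 on success, -1 if op is past the end. */
static int wc_lookup(const enum wc_opcode *tbl, size_t len,
		     enum wr_opcode op, enum wc_opcode *out)
{
	if ((size_t)op >= len)
		return -1;
	*out = tbl[op];
	return 0;
}
```

In this sketch the caller picks the table, which mirrors how the patch moves the table choice out of the shared helper and into each driver.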

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 06/20] IB/rdmavt: Add additional fields to post send trace
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 05/20] IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependent Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 07/20] IB/rdmavt: Add tracing for cq entry and poll Dennis Dalessandro
                     ` (12 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

From: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Add fields to the post send trace to provide additional debugging
information.

The following fields are added:
- wqe
- qpt
- num_sge
- ssn
- pid
- send_flags

These additional fields provide for more focused filtering
and triggering.

The patch also moves the trace to just before the wqe is
posted, so that it records the most accurate information, and
future-proofs the code to trace all possible reserved opcodes.
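
To illustrate why the position of the trace call matters, here is a minimal user-space sketch. It is not the rdmavt code: struct sq, trace_post() and post_one() are hypothetical names, and the "trace" just records the value it saw.

```c
/* Hypothetical send queue state; not the rvt_qp layout. */
struct sq {
	unsigned int avail;
	unsigned int head;
};

/* Last value a trace call recorded, standing in for the trace buffer. */
static unsigned int traced_avail;

static void trace_post(const struct sq *q)
{
	traced_avail = q->avail;
}

static void post_one(struct sq *q)
{
	q->avail--;	/* consume a slot first ... */
	trace_post(q);	/* ... then snapshot, so the trace sees the new count */
	q->head++;	/* finally publish the entry */
}
```

Had trace_post() been called before the decrement, the recorded avail would be one higher than the state at the time of posting.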

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/sw/rdmavt/qp.c       |    2 +-
 drivers/infiniband/sw/rdmavt/trace_tx.h |   34 ++++++++++++++++++++++++++++---
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 28fb724..3c55a8b 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -1772,11 +1772,11 @@ static int rvt_post_one_wr(struct rvt_qp *qp,
 					0);
 		qp->s_next_psn = wqe->lpsn + 1;
 	}
-	trace_rvt_post_one_wr(qp, wqe);
 	if (unlikely(reserved_op))
 		rvt_qp_wqe_reserve(qp, wqe);
 	else
 		qp->s_avail--;
+	trace_rvt_post_one_wr(qp, wqe);
 	smp_wmb(); /* see request builders */
 	qp->s_head = next;
 
diff --git a/drivers/infiniband/sw/rdmavt/trace_tx.h b/drivers/infiniband/sw/rdmavt/trace_tx.h
index 0e03173..a613a22 100644
--- a/drivers/infiniband/sw/rdmavt/trace_tx.h
+++ b/drivers/infiniband/sw/rdmavt/trace_tx.h
@@ -71,10 +71,20 @@
 	wr_opcode_name(RDMA_READ_WITH_INV),                \
 	wr_opcode_name(LOCAL_INV),                         \
 	wr_opcode_name(MASKED_ATOMIC_CMP_AND_SWP),         \
-	wr_opcode_name(MASKED_ATOMIC_FETCH_AND_ADD))
+	wr_opcode_name(MASKED_ATOMIC_FETCH_AND_ADD),       \
+	wr_opcode_name(RESERVED1),                         \
+	wr_opcode_name(RESERVED2),                         \
+	wr_opcode_name(RESERVED3),                         \
+	wr_opcode_name(RESERVED4),                         \
+	wr_opcode_name(RESERVED5),                         \
+	wr_opcode_name(RESERVED6),                         \
+	wr_opcode_name(RESERVED7),                         \
+	wr_opcode_name(RESERVED8),                         \
+	wr_opcode_name(RESERVED9),                         \
+	wr_opcode_name(RESERVED10))
 
 #define POS_PRN \
-"[%s] wr_id %llx qpn %x psn 0x%x lpsn 0x%x length %u opcode 0x%.2x,%s size %u avail %u head %u last %u"
+"[%s] wqe %p wr_id %llx send_flags %x qpn %x qpt %u psn %x lpsn %x ssn %x length %u opcode 0x%.2x,%s size %u avail %u head %u last %u pid %u num_sge %u"
 
 TRACE_EVENT(
 	rvt_post_one_wr,
@@ -83,7 +93,9 @@
 	TP_STRUCT__entry(
 		RDI_DEV_ENTRY(ib_to_rvt(qp->ibqp.device))
 		__field(u64, wr_id)
+		__field(struct rvt_swqe *, wqe)
 		__field(u32, qpn)
+		__field(u32, qpt)
 		__field(u32, psn)
 		__field(u32, lpsn)
 		__field(u32, length)
@@ -92,11 +104,17 @@
 		__field(u32, avail)
 		__field(u32, head)
 		__field(u32, last)
+		__field(u32, ssn)
+		__field(int, send_flags)
+		__field(pid_t, pid)
+		__field(int, num_sge)
 	),
 	TP_fast_assign(
 		RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device))
+		__entry->wqe = wqe;
 		__entry->wr_id = wqe->wr.wr_id;
 		__entry->qpn = qp->ibqp.qp_num;
+		__entry->qpt = qp->ibqp.qp_type;
 		__entry->psn = wqe->psn;
 		__entry->lpsn = wqe->lpsn;
 		__entry->length = wqe->length;
@@ -105,20 +123,30 @@
 		__entry->avail = qp->s_avail;
 		__entry->head = qp->s_head;
 		__entry->last = qp->s_last;
+		__entry->pid = qp->pid;
+		__entry->ssn = wqe->ssn;
+		__entry->send_flags = wqe->wr.send_flags;
+		__entry->num_sge = wqe->wr.num_sge;
 	),
 	TP_printk(
 		POS_PRN,
 		__get_str(dev),
+		__entry->wqe,
 		__entry->wr_id,
+		__entry->send_flags,
 		__entry->qpn,
+		__entry->qpt,
 		__entry->psn,
 		__entry->lpsn,
+		__entry->ssn,
 		__entry->length,
 		__entry->opcode, show_wr_opcode(__entry->opcode),
 		__entry->size,
 		__entry->avail,
 		__entry->head,
-		__entry->last
+		__entry->last,
+		__entry->pid,
+		__entry->num_sge
 	)
 );
 


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 07/20] IB/rdmavt: Add tracing for cq entry and poll
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 06/20] IB/rdmavt: Add additional fields to post send trace Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 08/20] IB/rdmavt: Add swqe completion trace Dennis Dalessandro
                     ` (11 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

From: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The following fields are defined for filtering and triggering:
- wr_id
- status
- opcode
- qpn
- length
- idx

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/sw/rdmavt/cq.c       |    3 +
 drivers/infiniband/sw/rdmavt/trace.h    |    1 
 drivers/infiniband/sw/rdmavt/trace_cq.h |  127 +++++++++++++++++++++++++++++++
 3 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/sw/rdmavt/trace_cq.h

diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index 7aa7a4e..0ae2ff8 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -50,6 +50,7 @@
 #include <linux/kthread.h>
 #include "cq.h"
 #include "vt.h"
+#include "trace.h"
 
 /**
  * rvt_cq_enter - add a new entry to the completion queue
@@ -93,6 +94,7 @@ void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited)
 		}
 		return;
 	}
+	trace_rvt_cq_enter(cq, entry, head);
 	if (cq->ip) {
 		wc->uqueue[head].wr_id = entry->wr_id;
 		wc->uqueue[head].status = entry->status;
@@ -482,6 +484,7 @@ int rvt_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 		if (tail == wc->head)
 			break;
 		/* The kernel doesn't need a RMB since it has the lock. */
+		trace_rvt_cq_poll(cq, &wc->kqueue[tail], npolled);
 		*entry = wc->kqueue[tail];
 		if (tail >= cq->ibcq.cqe)
 			tail = 0;
diff --git a/drivers/infiniband/sw/rdmavt/trace.h b/drivers/infiniband/sw/rdmavt/trace.h
index e2d23ac..89554c0 100644
--- a/drivers/infiniband/sw/rdmavt/trace.h
+++ b/drivers/infiniband/sw/rdmavt/trace.h
@@ -52,3 +52,4 @@
 #include "trace_qp.h"
 #include "trace_tx.h"
 #include "trace_mr.h"
+#include "trace_cq.h"
diff --git a/drivers/infiniband/sw/rdmavt/trace_cq.h b/drivers/infiniband/sw/rdmavt/trace_cq.h
new file mode 100644
index 0000000..a315850
--- /dev/null
+++ b/drivers/infiniband/sw/rdmavt/trace_cq.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright(c) 2016 Intel Corporation.
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#if !defined(__RVT_TRACE_CQ_H) || defined(TRACE_HEADER_MULTI_READ)
+#define __RVT_TRACE_CQ_H
+
+#include <linux/tracepoint.h>
+#include <linux/trace_seq.h>
+
+#include <rdma/ib_verbs.h>
+#include <rdma/rdmavt_cq.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM rvt_cq
+
+#define wc_opcode_name(opcode) { IB_WC_##opcode, #opcode  }
+#define show_wc_opcode(opcode)                                \
+__print_symbolic(opcode,                                      \
+	wc_opcode_name(SEND),                                 \
+	wc_opcode_name(RDMA_WRITE),                           \
+	wc_opcode_name(RDMA_READ),                            \
+	wc_opcode_name(COMP_SWAP),                            \
+	wc_opcode_name(FETCH_ADD),                            \
+	wc_opcode_name(LSO),                                  \
+	wc_opcode_name(LOCAL_INV),                            \
+	wc_opcode_name(REG_MR),                               \
+	wc_opcode_name(MASKED_COMP_SWAP),                     \
+	wc_opcode_name(RECV),                                 \
+	wc_opcode_name(RECV_RDMA_WITH_IMM))
+
+#define CQ_PRN \
+"[%s] idx %u wr_id %llx status %u opcode %u,%s length %u qpn %x"
+
+DECLARE_EVENT_CLASS(
+	rvt_cq_entry_template,
+	TP_PROTO(struct rvt_cq *cq, struct ib_wc *wc, u32 idx),
+	TP_ARGS(cq, wc, idx),
+	TP_STRUCT__entry(
+		RDI_DEV_ENTRY(cq->rdi)
+		__field(u64, wr_id)
+		__field(u32, status)
+		__field(u32, opcode)
+		__field(u32, qpn)
+		__field(u32, length)
+		__field(u32, idx)
+	),
+	TP_fast_assign(
+		RDI_DEV_ASSIGN(cq->rdi)
+		__entry->wr_id = wc->wr_id;
+		__entry->status = wc->status;
+		__entry->opcode = wc->opcode;
+		__entry->length = wc->byte_len;
+		__entry->qpn = wc->qp->qp_num;
+		__entry->idx = idx;
+	),
+	TP_printk(
+		CQ_PRN,
+		__get_str(dev),
+		__entry->idx,
+		__entry->wr_id,
+		__entry->status,
+		__entry->opcode, show_wc_opcode(__entry->opcode),
+		__entry->length,
+		__entry->qpn
+	)
+);
+
+DEFINE_EVENT(
+	rvt_cq_entry_template, rvt_cq_enter,
+	TP_PROTO(struct rvt_cq *cq, struct ib_wc *wc, u32 idx),
+	TP_ARGS(cq, wc, idx));
+
+DEFINE_EVENT(
+	rvt_cq_entry_template, rvt_cq_poll,
+	TP_PROTO(struct rvt_cq *cq, struct ib_wc *wc, u32 idx),
+	TP_ARGS(cq, wc, idx));
+
+#endif /* __RVT_TRACE_CQ_H */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace_cq
+#include <trace/define_trace.h>


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 08/20] IB/rdmavt: Add swqe completion trace
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 07/20] IB/rdmavt: Add tracing for cq entry and poll Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 09/20] IB/hfi1: Check device id early during init Dennis Dalessandro
                     ` (10 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

From: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The following fields are available for filter/trace:
- wqe
- wr_id
- qpn
- qpt
- length
- idx
- ssn
- (wr)opcode
- (wr)send_flags

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/rc.c       |    2 ++
 drivers/infiniband/hw/hfi1/ruc.c      |    2 ++
 drivers/infiniband/hw/hfi1/trace_tx.h |   43 +++++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c
index 4649530..0e56578 100644
--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -1028,6 +1028,7 @@ void hfi1_rc_send_complete(struct rvt_qp *qp, struct ib_header *hdr)
 		    cmp_psn(qp->s_sending_psn, qp->s_sending_hpsn) <= 0)
 			break;
 		s_last = qp->s_last;
+		trace_hfi1_qp_send_completion(qp, wqe, s_last);
 		if (++s_last >= qp->s_size)
 			s_last = 0;
 		qp->s_last = s_last;
@@ -1079,6 +1080,7 @@ static inline void update_last_psn(struct rvt_qp *qp, u32 psn)
 
 		rvt_put_swqe(wqe);
 		s_last = qp->s_last;
+		trace_hfi1_qp_send_completion(qp, wqe, s_last);
 		if (++s_last >= qp->s_size)
 			s_last = 0;
 		qp->s_last = s_last;
diff --git a/drivers/infiniband/hw/hfi1/ruc.c b/drivers/infiniband/hw/hfi1/ruc.c
index d2eb793..eeb650d 100644
--- a/drivers/infiniband/hw/hfi1/ruc.c
+++ b/drivers/infiniband/hw/hfi1/ruc.c
@@ -909,8 +909,10 @@ void hfi1_send_complete(struct rvt_qp *qp, struct rvt_swqe *wqe,
 
 	last = qp->s_last;
 	old_last = last;
+	trace_hfi1_qp_send_completion(qp, wqe, last);
 	if (++last >= qp->s_size)
 		last = 0;
+	trace_hfi1_qp_send_completion(qp, wqe, last);
 	qp->s_last = last;
 	/* See post_send() */
 	barrier();
diff --git a/drivers/infiniband/hw/hfi1/trace_tx.h b/drivers/infiniband/hw/hfi1/trace_tx.h
index 415d6be..2c9ac57 100644
--- a/drivers/infiniband/hw/hfi1/trace_tx.h
+++ b/drivers/infiniband/hw/hfi1/trace_tx.h
@@ -633,6 +633,49 @@
 	     TP_PROTO(struct hfi1_devdata *dd, struct buffer_control *bc),
 	     TP_ARGS(dd, bc));
 
+TRACE_EVENT(
+	hfi1_qp_send_completion,
+	TP_PROTO(struct rvt_qp *qp, struct rvt_swqe *wqe, u32 idx),
+	TP_ARGS(qp, wqe, idx),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd_from_ibdev(qp->ibqp.device))
+		__field(struct rvt_swqe *, wqe)
+		__field(u64, wr_id)
+		__field(u32, qpn)
+		__field(u32, qpt)
+		__field(u32, length)
+		__field(u32, idx)
+		__field(u32, ssn)
+		__field(enum ib_wr_opcode, opcode)
+		__field(int, send_flags)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device))
+		__entry->wqe = wqe;
+		__entry->wr_id = wqe->wr.wr_id;
+		__entry->qpn = qp->ibqp.qp_num;
+		__entry->qpt = qp->ibqp.qp_type;
+		__entry->length = wqe->length;
+		__entry->idx = idx;
+		__entry->ssn = wqe->ssn;
+		__entry->opcode = wqe->wr.opcode;
+		__entry->send_flags = wqe->wr.send_flags;
+	),
+	TP_printk(
+		"[%s] qpn 0x%x qpt %u wqe %p idx %u wr_id %llx length %u ssn %u opcode %x send_flags %x",
+		__get_str(dev),
+		__entry->qpn,
+		__entry->qpt,
+		__entry->wqe,
+		__entry->idx,
+		__entry->wr_id,
+		__entry->length,
+		__entry->ssn,
+		__entry->opcode,
+		__entry->send_flags
+	)
+);
+
 #endif /* __HFI1_TRACE_TX_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 09/20] IB/hfi1: Check device id early during init
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 08/20] IB/rdmavt: Add swqe completion trace Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 10/20] IB/hfi1: Protect the global dev_cntr_names and port_cntr_names Dennis Dalessandro
                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny, Tadeusz Struk

From: Tadeusz Struk <tadeusz.struk-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

If the wrong device is passed to the driver, it should fail early
rather than start initializing the device only to discover later
during init that the device id is invalid.
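
As a rough illustration of the ordering, the sketch below rejects an unknown id before doing any setup, so the failure path has nothing to undo. The id values, the counter and probe() are hypothetical and are not taken from the driver.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device ids, for illustration only. */
#define DEV_ID_A 0x1001
#define DEV_ID_B 0x1002

/* Counts how much setup work has been done. */
static int resources_allocated;

static int probe(uint16_t device_id)
{
	/* Validate the id first: no setup has happened, so no cleanup. */
	if (device_id != DEV_ID_A && device_id != DEV_ID_B)
		return -ENODEV;

	resources_allocated++;	/* stand-in for the real initialization */
	return 0;
}
```

This matches the change of the error label in the diff: the early check can jump straight to the plain exit path instead of the one that cleans up.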

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Tadeusz Struk <tadeusz.struk-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/init.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index f40864e..9bfb8eb 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -1425,6 +1425,16 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* First, lock the non-writable module parameters */
 	HFI1_CAP_LOCK();
 
+	/* Validate dev ids */
+	if (!(ent->device == PCI_DEVICE_ID_INTEL0 ||
+	      ent->device == PCI_DEVICE_ID_INTEL1)) {
+		hfi1_early_err(&pdev->dev,
+			       "Failing on unknown Intel deviceid 0x%x\n",
+			       ent->device);
+		ret = -ENODEV;
+		goto bail;
+	}
+
 	/* Validate some global module parameters */
 	ret = init_validate_rcvhdrcnt(&pdev->dev, rcvhdrcnt);
 	if (ret)
@@ -1470,15 +1480,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (ret)
 		goto bail;
 
-	if (!(ent->device == PCI_DEVICE_ID_INTEL0 ||
-	      ent->device == PCI_DEVICE_ID_INTEL1)) {
-		hfi1_early_err(&pdev->dev,
-			       "Failing on unknown Intel deviceid 0x%x\n",
-			       ent->device);
-		ret = -ENODEV;
-		goto clean_bail;
-	}
-
 	/*
 	 * Do device-specific initialization, function table setup, dd
 	 * allocation, etc.


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 10/20] IB/hfi1: Protect the global dev_cntr_names and port_cntr_names
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 09/20] IB/hfi1: Check device id early during init Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 11/20] IB/hfi1: Check for QSFP presence before attempting reads Dennis Dalessandro
                     ` (8 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Easwar Hariharan, Tadeusz Struk

From: Tadeusz Struk <tadeusz.struk-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Protect the global dev_cntr_names and port_cntr_names with the global
mutex as they are allocated and freed in a function called per device.
Otherwise there is a danger of double free and memory leaks.
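
The locking pattern can be sketched in user space as follows. This is only an approximation with a pthread mutex. names_get(), names_put() and the globals are hypothetical stand-ins for the driver's per-device init and unregister paths.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the shared name tables. */
static pthread_mutex_t names_lock = PTHREAD_MUTEX_INITIALIZER;
static const char **names;	/* shared by every device */
static int names_initialized;

/* Called once per device; only the first caller allocates. */
static int names_get(size_t count)
{
	int ret = 0;

	pthread_mutex_lock(&names_lock);
	if (!names_initialized) {
		names = calloc(count, sizeof(*names));
		if (!names)
			ret = -1;
		else
			names_initialized = 1;
	}
	pthread_mutex_unlock(&names_lock);
	return ret;
}

/* Called once per device on teardown; safe to call more than once. */
static void names_put(void)
{
	pthread_mutex_lock(&names_lock);
	free(names);
	names = NULL;		/* a later call then frees NULL, a no-op */
	names_initialized = 0;
	pthread_mutex_unlock(&names_lock);
}
```

Resetting the pointer to NULL under the lock is what turns a second teardown into a harmless free(NULL) instead of a double free.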

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Easwar Hariharan <easwar.hariharan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Tadeusz Struk <tadeusz.struk-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/verbs.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 815cb44..8d71654 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1540,6 +1540,7 @@ static void hfi1_get_dev_fw_str(struct ib_device *ibdev, char *str,
 	"DRIVER_EgrHdrFull"
 };
 
+static DEFINE_MUTEX(cntr_names_lock); /* protects the *_cntr_names buffers */
 static const char **dev_cntr_names;
 static const char **port_cntr_names;
 static int num_driver_cntrs = ARRAY_SIZE(driver_cntr_names);
@@ -1594,6 +1595,7 @@ static int init_cntr_names(const char *names_in,
 {
 	int i, err;
 
+	mutex_lock(&cntr_names_lock);
 	if (!cntr_names_initialized) {
 		struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
 
@@ -1602,8 +1604,10 @@ static int init_cntr_names(const char *names_in,
 				      num_driver_cntrs,
 				      &num_dev_cntrs,
 				      &dev_cntr_names);
-		if (err)
+		if (err) {
+			mutex_unlock(&cntr_names_lock);
 			return NULL;
+		}
 
 		for (i = 0; i < num_driver_cntrs; i++)
 			dev_cntr_names[num_dev_cntrs + i] =
@@ -1617,10 +1621,12 @@ static int init_cntr_names(const char *names_in,
 		if (err) {
 			kfree(dev_cntr_names);
 			dev_cntr_names = NULL;
+			mutex_unlock(&cntr_names_lock);
 			return NULL;
 		}
 		cntr_names_initialized = 1;
 	}
+	mutex_unlock(&cntr_names_lock);
 
 	if (!port_num)
 		return rdma_alloc_hw_stats_struct(
@@ -1839,9 +1845,13 @@ void hfi1_unregister_ib_device(struct hfi1_devdata *dd)
 	del_timer_sync(&dev->mem_timer);
 	verbs_txreq_exit(dev);
 
+	mutex_lock(&cntr_names_lock);
 	kfree(dev_cntr_names);
 	kfree(port_cntr_names);
+	dev_cntr_names = NULL;
+	port_cntr_names = NULL;
 	cntr_names_initialized = 0;
+	mutex_unlock(&cntr_names_lock);
 }
 
 void hfi1_cnp_rcv(struct hfi1_packet *packet)


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 11/20] IB/hfi1: Check for QSFP presence before attempting reads
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 10/20] IB/hfi1: Protect the global dev_cntr_names and port_cntr_names Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 12/20] IB/hfi1: Add a patch value to the firmware version string Dennis Dalessandro
                     ` (7 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Easwar Hariharan

From: Easwar Hariharan <easwar.hariharan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

If the cable is not plugged in, attempting to read the status of a
QSFP cable creates noise in the logs and fails to set an appropriate
Offline/Disabled Reason. Check for cable presence before attempting
the read and its attendant retries.
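
A minimal sketch of the guard, assuming a hypothetical port structure and a read helper that always fails: the presence check comes before the retry loop, so an absent module costs no reads. None of these names come from the driver.

```c
#include <stdbool.h>

/* Hypothetical port state; not the hfi1_pportdata layout. */
struct port {
	bool is_qsfp;
	bool module_present;
	int read_attempts;
};

/* Stand-in for the status read; always fails and counts the attempt. */
static int read_status(struct port *p)
{
	p->read_attempts++;
	return -1;
}

static int test_read(struct port *p)
{
	/* Nothing to test if this is not a QSFP or no module is plugged in. */
	if (!p->is_qsfp || !p->module_present)
		return 0;

	for (int i = 0; i < 3; i++) {
		if (read_status(p) == 0)
			return 0;
	}
	return -1;
}
```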

Fixes: 673b975f1fba ("IB/hfi1: Add QSFP sanity pre-check")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Easwar Hariharan <easwar.hariharan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/chip.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 8b8840a..f9d0d8c 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -9533,8 +9533,11 @@ static int test_qsfp_read(struct hfi1_pportdata *ppd)
 	int ret;
 	u8 status;
 
-	/* report success if not a QSFP */
-	if (ppd->port_type != PORT_TYPE_QSFP)
+	/*
+	 * Report success if not a QSFP or, if it is a QSFP, but the cable is
+	 * not present
+	 */
+	if (ppd->port_type != PORT_TYPE_QSFP || !qsfp_mod_present(ppd))
 		return 0;
 
 	/* read byte 2, the status byte */


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 12/20] IB/hfi1: Add a patch value to the firmware version string
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (10 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 11/20] IB/hfi1: Check for QSFP presence before attempting reads Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:25   ` [PATCH v2 13/20] IB/rdmavt, IB/hfi1: Fix timer migration regressions Dennis Dalessandro
                     ` (6 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Michael J. Ruhl, Easwar Hariharan

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The HFI firmware now includes a patch level in its version.
Update the necessary code to include the patch level in the
firmware version string.
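
As a rough illustration of the version encoding this patch moves to, the
sketch below packs major, minor and patch into one 32-bit value, one byte
per field, so that a plain integer comparison orders versions correctly.
The fw_ver*() helpers are hypothetical stand-ins modeled on the
dc8051_ver*() macros in hfi.h; they are not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack major.minor.patch, one byte per field. */
static inline uint32_t fw_ver(uint8_t major, uint8_t minor, uint8_t patch)
{
	return ((uint32_t)major << 16) | ((uint32_t)minor << 8) | patch;
}

/* Hypothetical accessors, mirroring the shifts and masks of the pack step. */
static inline uint8_t fw_ver_major(uint32_t v) { return (v >> 16) & 0xff; }
static inline uint8_t fw_ver_minor(uint32_t v) { return (v >> 8) & 0xff; }
static inline uint8_t fw_ver_patch(uint32_t v) { return v & 0xff; }

/* Round-trip and ordering checks for the packed representation. */
static void fw_ver_example(void)
{
	uint32_t v = fw_ver(1, 27, 3);

	assert(fw_ver_major(v) == 1);
	assert(fw_ver_minor(v) == 27);
	assert(fw_ver_patch(v) == 3);

	/* Any 0.19.x sorts below 0.20.0, and 0.20.0 below 0.20.1. */
	assert(fw_ver(0, 19, 5) < fw_ver(0, 20, 0));
	assert(fw_ver(0, 20, 0) < fw_ver(0, 20, 1));
}
```

Three byte-wide fields need 24 bits, which is why the stored version is
widened from u16 to u32 in the diff, and why the existing threshold
checks gain a trailing 0 patch argument.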

Reviewed-by: Easwar Hariharan <easwar.hariharan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/chip.c     |   23 +++++++++++++++--------
 drivers/infiniband/hw/hfi1/chip.h     |   18 +++++++++++-------
 drivers/infiniband/hw/hfi1/firmware.c |   14 ++++++++------
 drivers/infiniband/hw/hfi1/hfi.h      |    9 +++++----
 drivers/infiniband/hw/hfi1/verbs.c    |   14 ++++++++------
 5 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index f9d0d8c..77f4b41 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -7166,7 +7166,7 @@ static void get_link_widths(struct hfi1_devdata *dd, u16 *tx_width,
 	 * set the max_rate field in handle_verify_cap until v0.19.
 	 */
 	if ((dd->icode == ICODE_RTL_SILICON) &&
-	    (dd->dc8051_ver < dc8051_ver(0, 19))) {
+	    (dd->dc8051_ver < dc8051_ver(0, 19, 0))) {
 		/* max_rate: 0 = 12.5G, 1 = 25G */
 		switch (max_rate) {
 		case 0:
@@ -7351,7 +7351,7 @@ void handle_verify_cap(struct work_struct *work)
 	}
 
 	ppd->link_speed_active = 0;	/* invalid value */
-	if (dd->dc8051_ver < dc8051_ver(0, 20)) {
+	if (dd->dc8051_ver < dc8051_ver(0, 20, 0)) {
 		/* remote_tx_rate: 0 = 12.5G, 1 = 25G */
 		switch (remote_tx_rate) {
 		case 0:
@@ -8422,7 +8422,7 @@ static int write_lcb_via_8051(struct hfi1_devdata *dd, u32 addr, u64 data)
 	int ret;
 
 	if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR ||
-	    (dd->dc8051_ver < dc8051_ver(0, 20))) {
+	    (dd->dc8051_ver < dc8051_ver(0, 20, 0))) {
 		if (acquire_lcb_access(dd, 0) == 0) {
 			write_csr(dd, addr, data);
 			release_lcb_access(dd, 0);
@@ -8728,13 +8728,20 @@ static void read_remote_device_id(struct hfi1_devdata *dd, u16 *device_id,
 			& REMOTE_DEVICE_REV_MASK;
 }
 
-void read_misc_status(struct hfi1_devdata *dd, u8 *ver_a, u8 *ver_b)
+void read_misc_status(struct hfi1_devdata *dd, u8 *ver_major, u8 *ver_minor,
+		      u8 *ver_patch)
 {
 	u32 frame;
 
 	read_8051_config(dd, MISC_STATUS, GENERAL_CONFIG, &frame);
-	*ver_a = (frame >> STS_FM_VERSION_A_SHIFT) & STS_FM_VERSION_A_MASK;
-	*ver_b = (frame >> STS_FM_VERSION_B_SHIFT) & STS_FM_VERSION_B_MASK;
+	*ver_major = (frame >> STS_FM_VERSION_MAJOR_SHIFT) &
+		STS_FM_VERSION_MAJOR_MASK;
+	*ver_minor = (frame >> STS_FM_VERSION_MINOR_SHIFT) &
+		STS_FM_VERSION_MINOR_MASK;
+
+	read_8051_config(dd, VERSION_PATCH, GENERAL_CONFIG, &frame);
+	*ver_patch = (frame >> STS_FM_VERSION_PATCH_SHIFT) &
+		STS_FM_VERSION_PATCH_MASK;
 }
 
 static void read_vc_remote_phy(struct hfi1_devdata *dd, u8 *power_management,
@@ -9130,7 +9137,7 @@ static int set_local_link_attributes(struct hfi1_pportdata *ppd)
 	if (ret)
 		goto set_local_link_attributes_fail;
 
-	if (dd->dc8051_ver < dc8051_ver(0, 20)) {
+	if (dd->dc8051_ver < dc8051_ver(0, 20, 0)) {
 		/* set the tx rate to the fastest enabled */
 		if (ppd->link_speed_enabled & OPA_LINK_SPEED_25G)
 			ppd->local_tx_rate = 1;
diff --git a/drivers/infiniband/hw/hfi1/chip.h b/drivers/infiniband/hw/hfi1/chip.h
index 043fd21..24df45f 100644
--- a/drivers/infiniband/hw/hfi1/chip.h
+++ b/drivers/infiniband/hw/hfi1/chip.h
@@ -1,7 +1,7 @@
 #ifndef _CHIP_H
 #define _CHIP_H
 /*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -394,7 +394,8 @@
 #define LAST_REMOTE_STATE_COMPLETE   0x13
 #define LINK_QUALITY_INFO            0x14
 #define REMOTE_DEVICE_ID	     0x15
-#define LINK_DOWN_REASON	     0x16
+#define LINK_DOWN_REASON	     0x16 /* first byte of offset 0x16 */
+#define VERSION_PATCH		     0x16 /* last byte of offset 0x16 */
 
 /* 8051 lane specific register field IDs */
 #define TX_EQ_SETTINGS		0x00
@@ -524,10 +525,12 @@ enum {
 #define SUPPORTED_CRCS (CAP_CRC_14B | CAP_CRC_48B)
 
 /* misc status version fields */
-#define STS_FM_VERSION_A_SHIFT 16
-#define STS_FM_VERSION_A_MASK  0xff
-#define STS_FM_VERSION_B_SHIFT 24
-#define STS_FM_VERSION_B_MASK  0xff
+#define STS_FM_VERSION_MINOR_SHIFT 16
+#define STS_FM_VERSION_MINOR_MASK  0xff
+#define STS_FM_VERSION_MAJOR_SHIFT 24
+#define STS_FM_VERSION_MAJOR_MASK  0xff
+#define STS_FM_VERSION_PATCH_SHIFT 24
+#define STS_FM_VERSION_PATCH_MASK  0xff
 
 /* LCB_CFG_CRC_MODE TX_VAL and RX_VAL CRC mode values */
 #define LCB_CRC_16B			0x0	/* 16b CRC */
@@ -698,7 +701,8 @@ bool check_chip_resource(struct hfi1_devdata *dd, u32 resource,
 int read_8051_data(struct hfi1_devdata *dd, u32 addr, u32 len, u64 *result);
 
 /* chip.c */
-void read_misc_status(struct hfi1_devdata *dd, u8 *ver_a, u8 *ver_b);
+void read_misc_status(struct hfi1_devdata *dd, u8 *ver_major, u8 *ver_minor,
+		      u8 *ver_patch);
 void read_guid(struct hfi1_devdata *dd);
 int wait_fm_ready(struct hfi1_devdata *dd, u32 mstimeout);
 void set_link_down_reason(struct hfi1_pportdata *ppd, u8 lcl_reason,
diff --git a/drivers/infiniband/hw/hfi1/firmware.c b/drivers/infiniband/hw/hfi1/firmware.c
index 0dd50cd..4042c11 100644
--- a/drivers/infiniband/hw/hfi1/firmware.c
+++ b/drivers/infiniband/hw/hfi1/firmware.c
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -1004,7 +1004,9 @@ static int load_8051_firmware(struct hfi1_devdata *dd,
 {
 	u64 reg;
 	int ret;
-	u8 ver_a, ver_b;
+	u8 ver_major;
+	u8 ver_minor;
+	u8 ver_patch;
 
 	/*
 	 * DC Reset sequence
@@ -1073,10 +1075,10 @@ static int load_8051_firmware(struct hfi1_devdata *dd,
 		return -ETIMEDOUT;
 	}
 
-	read_misc_status(dd, &ver_a, &ver_b);
-	dd_dev_info(dd, "8051 firmware version %d.%d\n",
-		    (int)ver_b, (int)ver_a);
-	dd->dc8051_ver = dc8051_ver(ver_b, ver_a);
+	read_misc_status(dd, &ver_major, &ver_minor, &ver_patch);
+	dd_dev_info(dd, "8051 firmware version %d.%d.%d\n",
+		    (int)ver_major, (int)ver_minor, (int)ver_patch);
+	dd->dc8051_ver = dc8051_ver(ver_major, ver_minor, ver_patch);
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index b69ab47..a31638c 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1020,7 +1020,7 @@ struct hfi1_devdata {
 	u8 qos_shift;
 
 	u16 irev;	/* implementation revision */
-	u16 dc8051_ver; /* 8051 firmware version */
+	u32 dc8051_ver; /* 8051 firmware version */
 
 	spinlock_t hfi1_diag_trans_lock; /* protect diag observer ops */
 	struct platform_config platform_config;
@@ -1173,9 +1173,10 @@ struct hfi1_devdata {
 };
 
 /* 8051 firmware version helper */
-#define dc8051_ver(a, b) ((a) << 8 | (b))
-#define dc8051_ver_maj(a) ((a & 0xff00) >> 8)
-#define dc8051_ver_min(a)  (a & 0x00ff)
+#define dc8051_ver(a, b, c) ((a) << 16 | (b) << 8 | (c))
+#define dc8051_ver_maj(a) (((a) & 0xff0000) >> 16)
+#define dc8051_ver_min(a) (((a) & 0x00ff00) >> 8)
+#define dc8051_ver_patch(a) ((a) & 0x0000ff)
 
 /* f_put_tid types */
 #define PT_EXPECTED 0
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 8d71654..928918c 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1236,12 +1236,14 @@ int hfi1_verbs_send(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
 static void hfi1_fill_device_attr(struct hfi1_devdata *dd)
 {
 	struct rvt_dev_info *rdi = &dd->verbs_dev.rdi;
-	u16 ver = dd->dc8051_ver;
+	u32 ver = dd->dc8051_ver;
 
 	memset(&rdi->dparms.props, 0, sizeof(rdi->dparms.props));
 
-	rdi->dparms.props.fw_ver = ((u64)(dc8051_ver_maj(ver)) << 16) |
-				    (u64)dc8051_ver_min(ver);
+	rdi->dparms.props.fw_ver = ((u64)(dc8051_ver_maj(ver)) << 32) |
+		((u64)(dc8051_ver_min(ver)) << 16) |
+		(u64)dc8051_ver_patch(ver);
+
 	rdi->dparms.props.device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
 			IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT |
 			IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_RC_RNR_NAK_GEN |
@@ -1520,10 +1522,10 @@ static void hfi1_get_dev_fw_str(struct ib_device *ibdev, char *str,
 {
 	struct rvt_dev_info *rdi = ib_to_rvt(ibdev);
 	struct hfi1_ibdev *dev = dev_from_rdi(rdi);
-	u16 ver = dd_from_dev(dev)->dc8051_ver;
+	u32 ver = dd_from_dev(dev)->dc8051_ver;
 
-	snprintf(str, str_len, "%u.%u", dc8051_ver_maj(ver),
-		 dc8051_ver_min(ver));
+	snprintf(str, str_len, "%u.%u.%u", dc8051_ver_maj(ver),
+		 dc8051_ver_min(ver), dc8051_ver_patch(ver));
 }
 
 static const char * const driver_cntr_names[] = {


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 13/20] IB/rdmavt, IB/hfi1: Fix timer migration regressions
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (11 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 12/20] IB/hfi1: Add a patch value to the firmware version string Dennis Dalessandro
@ 2017-03-21  0:25   ` Dennis Dalessandro
  2017-03-21  0:26   ` [PATCH v2 14/20] IB/rdmavt: Avoid reseting wqe send_flags in unreserve Dennis Dalessandro
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:25 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn, Brian Welty,
	Sebastian Sanchez

From: Sebastian Sanchez <sebastian.sanchez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The RC timeout counter isn't getting incremented.
Increment the counter and add a trace for it.

Fixes: 87c23b4ab018 ("IB/rdmavt: Adding timer logic to rdmavt")
Reviewed-by: Brian Welty <brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/trace_rc.h   |    7 --
 drivers/infiniband/sw/rdmavt/qp.c       |    6 +-
 drivers/infiniband/sw/rdmavt/trace.h    |    3 +
 drivers/infiniband/sw/rdmavt/trace_rc.h |  109 +++++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 8 deletions(-)
 create mode 100644 drivers/infiniband/sw/rdmavt/trace_rc.h

diff --git a/drivers/infiniband/hw/hfi1/trace_rc.h b/drivers/infiniband/hw/hfi1/trace_rc.h
index 5ea5005..8ce4765 100644
--- a/drivers/infiniband/hw/hfi1/trace_rc.h
+++ b/drivers/infiniband/hw/hfi1/trace_rc.h
@@ -1,5 +1,5 @@
 /*
-* Copyright(c) 2015, 2016 Intel Corporation.
+* Copyright(c) 2015, 2016, 2017 Intel Corporation.
 *
 * This file is provided under a dual BSD/GPLv2 license.  When using or
 * redistributing this file, you may do so under either license.
@@ -104,11 +104,6 @@
 	     TP_ARGS(qp, psn)
 );
 
-DEFINE_EVENT(hfi1_rc_template, hfi1_timeout,
-	     TP_PROTO(struct rvt_qp *qp, u32 psn),
-	     TP_ARGS(qp, psn)
-);
-
 DEFINE_EVENT(hfi1_rc_template, hfi1_rcv_error,
 	     TP_PROTO(struct rvt_qp *qp, u32 psn),
 	     TP_ARGS(qp, psn)
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 3c55a8b..d7dabdf 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016, 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -2052,8 +2052,12 @@ static void rvt_rc_timeout(unsigned long arg)
 	spin_lock_irqsave(&qp->r_lock, flags);
 	spin_lock(&qp->s_lock);
 	if (qp->s_flags & RVT_S_TIMER) {
+		struct rvt_ibport *rvp = rdi->ports[qp->port_num - 1];
+
 		qp->s_flags &= ~RVT_S_TIMER;
+		rvp->n_rc_timeouts++;
 		del_timer(&qp->s_timer);
+		trace_rvt_rc_timeout(qp, qp->s_last_psn + 1);
 		if (rdi->driver_f.notify_restart_rc)
 			rdi->driver_f.notify_restart_rc(qp,
 							qp->s_last_psn + 1,
diff --git a/drivers/infiniband/sw/rdmavt/trace.h b/drivers/infiniband/sw/rdmavt/trace.h
index 89554c0..bb4b1e7 100644
--- a/drivers/infiniband/sw/rdmavt/trace.h
+++ b/drivers/infiniband/sw/rdmavt/trace.h
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016, 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -53,3 +53,4 @@
 #include "trace_tx.h"
 #include "trace_mr.h"
 #include "trace_cq.h"
+#include "trace_rc.h"
diff --git a/drivers/infiniband/sw/rdmavt/trace_rc.h b/drivers/infiniband/sw/rdmavt/trace_rc.h
new file mode 100644
index 0000000..9952769
--- /dev/null
+++ b/drivers/infiniband/sw/rdmavt/trace_rc.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright(c) 2017 Intel Corporation.
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#if !defined(__RVT_TRACE_RC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define __RVT_TRACE_RC_H
+
+#include <linux/tracepoint.h>
+#include <linux/trace_seq.h>
+
+#include <rdma/ib_verbs.h>
+#include <rdma/rdma_vt.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM rvt_rc
+
+DECLARE_EVENT_CLASS(rvt_rc_template,
+		    TP_PROTO(struct rvt_qp *qp, u32 psn),
+		    TP_ARGS(qp, psn),
+		    TP_STRUCT__entry(
+			RDI_DEV_ENTRY(ib_to_rvt(qp->ibqp.device))
+			__field(u32, qpn)
+			__field(u32, s_flags)
+			__field(u32, psn)
+			__field(u32, s_psn)
+			__field(u32, s_next_psn)
+			__field(u32, s_sending_psn)
+			__field(u32, s_sending_hpsn)
+			__field(u32, r_psn)
+			),
+		    TP_fast_assign(
+			RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device))
+			__entry->qpn = qp->ibqp.qp_num;
+			__entry->s_flags = qp->s_flags;
+			__entry->psn = psn;
+			__entry->s_psn = qp->s_psn;
+			__entry->s_next_psn = qp->s_next_psn;
+			__entry->s_sending_psn = qp->s_sending_psn;
+			__entry->s_sending_hpsn = qp->s_sending_hpsn;
+			__entry->r_psn = qp->r_psn;
+			),
+		    TP_printk(
+			"[%s] qpn 0x%x s_flags 0x%x psn 0x%x s_psn 0x%x s_next_psn 0x%x s_sending_psn 0x%x sending_hpsn 0x%x r_psn 0x%x",
+			__get_str(dev),
+			__entry->qpn,
+			__entry->s_flags,
+			__entry->psn,
+			__entry->s_psn,
+			__entry->s_next_psn,
+			__entry->s_sending_psn,
+			__entry->s_sending_hpsn,
+			__entry->r_psn
+			)
+);
+
+DEFINE_EVENT(rvt_rc_template, rvt_rc_timeout,
+	     TP_PROTO(struct rvt_qp *qp, u32 psn),
+	     TP_ARGS(qp, psn)
+);
+
+#endif /* __RVT_TRACE_RC_H */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace_rc
+#include <trace/define_trace.h>


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 14/20] IB/rdmavt: Avoid reseting wqe send_flags in unreserve
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (12 preceding siblings ...)
  2017-03-21  0:25   ` [PATCH v2 13/20] IB/rdmavt, IB/hfi1: Fix timer migration regressions Dennis Dalessandro
@ 2017-03-21  0:26   ` Dennis Dalessandro
  2017-03-21  0:26   ` [PATCH v2 15/20] IB/hfi1: Ensure VL index is within bounds Dennis Dalessandro
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:26 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Don Hiatt, Mike Marciniszyn

From: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The wqe should be read-only, and in fact the superfluous reset of the
RVT_SEND_RESERVE_USED flag causes an issue where reserved operations
elicit a bad completion to the ULP.

The maintenance of the flag is now entirely within rvt_post_one_wr()
where a reserved operation will set the flag and a non-reserved operation
will ensure the operation that is about to be posted has the flag reset.
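
The ownership rule described above can be sketched in isolation as
follows. This is a simplified model, not the rdmavt code: the sketch_*
names and the flag value are hypothetical, and plain ints stand in for
the atomic counter the real driver uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SEND_RESERVE_USED (1u << 31)	/* hypothetical flag bit */

struct sketch_wqe { uint32_t send_flags; };
struct sketch_qp { int reserved_used; int avail; };

/* Post path: the only place that sets or clears the flag. */
static void sketch_post(struct sketch_qp *qp, struct sketch_wqe *wqe,
			bool reserved_op)
{
	if (reserved_op) {
		wqe->send_flags |= SKETCH_SEND_RESERVE_USED;
		qp->reserved_used++;
	} else {
		wqe->send_flags &= ~SKETCH_SEND_RESERVE_USED;
		qp->avail--;
	}
}

/* Completion path: reads the flag, never writes the wqe. */
static void sketch_unreserve(struct sketch_qp *qp,
			     const struct sketch_wqe *wqe)
{
	if (wqe->send_flags & SKETCH_SEND_RESERVE_USED)
		qp->reserved_used--;
}

static void sketch_reserve_example(void)
{
	struct sketch_qp qp = { .reserved_used = 0, .avail = 4 };
	struct sketch_wqe wqe = { .send_flags = 0 };

	sketch_post(&qp, &wqe, true);
	sketch_unreserve(&qp, &wqe);
	assert(qp.reserved_used == 0);
	/* The flag survives unreserve, so later readers still see it. */
	assert(wqe.send_flags & SKETCH_SEND_RESERVE_USED);

	/* Reusing the slot for a normal operation clears the stale flag. */
	sketch_post(&qp, &wqe, false);
	assert(!(wqe.send_flags & SKETCH_SEND_RESERVE_USED));
	assert(qp.avail == 3);
}
```

Because the post path always writes the flag one way or the other, a
stale value left in a reused slot cannot leak into the next operation.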

Fixes: 856cc4c237ad ("IB/hfi1: Add the capability for reserved operations")
Reviewed-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/sw/rdmavt/qp.c |    7 +++++--
 include/rdma/rdmavt_qp.h          |    4 +---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index d7dabdf..728f5f1 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -1772,10 +1772,13 @@ static int rvt_post_one_wr(struct rvt_qp *qp,
 					0);
 		qp->s_next_psn = wqe->lpsn + 1;
 	}
-	if (unlikely(reserved_op))
+	if (unlikely(reserved_op)) {
+		wqe->wr.send_flags |= RVT_SEND_RESERVE_USED;
 		rvt_qp_wqe_reserve(qp, wqe);
-	else
+	} else {
+		wqe->wr.send_flags &= ~RVT_SEND_RESERVE_USED;
 		qp->s_avail--;
+	}
 	trace_rvt_post_one_wr(qp, wqe);
 	smp_wmb(); /* see request builders */
 	qp->s_head = next;
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index 3cdd9e2..e3bb312 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -2,7 +2,7 @@
 #define DEF_RDMAVT_INCQP_H
 
 /*
- * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016, 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -526,7 +526,6 @@ static inline void rvt_qp_wqe_reserve(
 	struct rvt_qp *qp,
 	struct rvt_swqe *wqe)
 {
-	wqe->wr.send_flags |= RVT_SEND_RESERVE_USED;
 	atomic_inc(&qp->s_reserved_used);
 }
 
@@ -550,7 +549,6 @@ static inline void rvt_qp_wqe_unreserve(
 	struct rvt_swqe *wqe)
 {
 	if (unlikely(wqe->wr.send_flags & RVT_SEND_RESERVE_USED)) {
-		wqe->wr.send_flags &= ~RVT_SEND_RESERVE_USED;
 		atomic_dec(&qp->s_reserved_used);
 		/* insure no compiler re-order up to s_last change */
 		smp_mb__after_atomic();


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 15/20] IB/hfi1: Ensure VL index is within bounds
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (13 preceding siblings ...)
  2017-03-21  0:26   ` [PATCH v2 14/20] IB/rdmavt: Avoid reseting wqe send_flags in unreserve Dennis Dalessandro
@ 2017-03-21  0:26   ` Dennis Dalessandro
  2017-03-21  0:26   ` [PATCH v2 16/20] IB/hfi1: Add receive fault injection feature Dennis Dalessandro
                     ` (3 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:26 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Michael J. Ruhl

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Improve the safety of the code and ensure the array cannot be indexed
out of bounds when picking the CPU for a given SDMA engine.
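
A minimal sketch of the kind of check being added, assuming a
fixed-size per-CPU map indexed by VL. The struct, its size of 8 and the
sketch_* names are hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct sketch_node {
	int map[8];	/* hypothetical: one slot per VL */
};

/* Reject an out-of-range VL before it is ever used as an index. */
static int sketch_set_map(struct sketch_node *node, size_t vl, int engine)
{
	if (vl >= SKETCH_ARRAY_SIZE(node->map))
		return -EINVAL;

	node->map[vl] = engine;
	return 0;
}

static void sketch_bounds_example(void)
{
	struct sketch_node node = { { 0 } };

	assert(sketch_set_map(&node, 7, 3) == 0);
	assert(node.map[7] == 3);
	assert(sketch_set_map(&node, 8, 3) == -EINVAL);
}
```

Since sizeof does not evaluate its operand, the size check does not
dereference the node pointer, which is what allows the patch to place
the test ahead of the hash table lookup that assigns rht_node.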

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/sdma.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c
index 9bee28d..1f7bf30 100644
--- a/drivers/infiniband/hw/hfi1/sdma.c
+++ b/drivers/infiniband/hw/hfi1/sdma.c
@@ -962,6 +962,11 @@ ssize_t sdma_set_cpu_to_sde_map(struct sdma_engine *sde, const char *buf,
 			continue;
 		}
 
+		if (vl >= ARRAY_SIZE(rht_node->map)) {
+			ret = -EINVAL;
+			goto out;
+		}
+
 		rht_node = rhashtable_lookup_fast(dd->sdma_rht, &cpu,
 						  sdma_rht_params);
 		if (!rht_node) {


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 16/20] IB/hfi1: Add receive fault injection feature
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (14 preceding siblings ...)
  2017-03-21  0:26   ` [PATCH v2 15/20] IB/hfi1: Ensure VL index is within bounds Dennis Dalessandro
@ 2017-03-21  0:26   ` Dennis Dalessandro
  2017-03-21  0:26   ` [PATCH v2 17/20] IB/hfi1: Add transmit " Dennis Dalessandro
                     ` (2 subsequent siblings)
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:26 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Don Hiatt, Mike Marciniszyn

From: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Add fault injection capability:
  - Drop packets unconditionally (fault_by_packet)
  - Drop packets based on opcode (fault_by_opcode)

This feature reacts to the global FAULT_INJECTION
config flag.

The faulting traces have been added:
  - misc/fault_opcode
  - misc/fault_packet

See 'Documentation/fault-injection/fault-injection.txt'
for details.

Examples:
  - Dropping packets by opcode:
    /sys/kernel/debug/hfi1/hfi1_X/fault_opcode
	# Enable fault
	echo Y > fault_by_opcode
	# Set probability of dropping (0-100%)
	echo 25 > probability
	# Set opcode
	echo 0x64 > opcode
	# Number of times to fault
	echo 3 > times
	# An optional mask allows you to fault
	# a range of opcodes
	echo 0xf0 > mask
    /sys/kernel/debug/hfi1/hfi1_X/fault_opcode/fault_stats
    contains values in parentheses that indicate the
    number of packets dropped for each opcode.

  - Dropping packets unconditionally
    /sys/kernel/debug/hfi1/hfi1_X/fault_packet
	# Enable fault
	echo Y > fault_by_packet
    /sys/kernel/debug/hfi1/hfi1_X/fault_packet/fault_stats
    contains the number of packets dropped.
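
How the opcode and mask settings interact can be sketched with the
comparison used by hfi1_dbg_fault_opcode() in this patch: a packet is a
candidate for faulting when the configured opcode equals the packet
opcode ANDed with the mask. The helper name below is hypothetical, and
the sketch leaves out the probability and times handling done by
should_fail().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the packet opcode is selected for faulting. */
static bool sketch_opcode_selected(uint8_t cfg_opcode, uint8_t cfg_mask,
				   uint32_t pkt_opcode)
{
	return cfg_opcode == (pkt_opcode & cfg_mask);
}

static void sketch_opcode_example(void)
{
	/* Default mask 0xff: only the exact opcode is selected. */
	assert(sketch_opcode_selected(0x64, 0xff, 0x64));
	assert(!sketch_opcode_selected(0x64, 0xff, 0x65));

	/* Mask 0xf0 with opcode 0x60: the range 0x60-0x6f is selected. */
	assert(sketch_opcode_selected(0x60, 0xf0, 0x64));
	assert(sketch_opcode_selected(0x60, 0xf0, 0x6f));
	assert(!sketch_opcode_selected(0x60, 0xf0, 0x70));

	/* Opcode bits outside the mask can never match. */
	assert(!sketch_opcode_selected(0x64, 0xf0, 0x64));
}
```

One consequence of this comparison, as far as the sketch goes: when a
mask narrower than 0xff is written, the configured opcode should have no
bits set outside the mask (0x60 rather than 0x64 for mask 0xf0),
otherwise no packet is selected.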

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/debugfs.c    |  222 +++++++++++++++++++++++++++++++
 drivers/infiniband/hw/hfi1/debugfs.h    |   51 +++++++
 drivers/infiniband/hw/hfi1/driver.c     |    8 +
 drivers/infiniband/hw/hfi1/trace_misc.h |   48 +++++++
 drivers/infiniband/hw/hfi1/verbs.c      |    6 +
 drivers/infiniband/hw/hfi1/verbs.h      |    4 +
 6 files changed, 336 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/debugfs.c b/drivers/infiniband/hw/hfi1/debugfs.c
index 7fe9dd8..cac6d52 100644
--- a/drivers/infiniband/hw/hfi1/debugfs.c
+++ b/drivers/infiniband/hw/hfi1/debugfs.c
@@ -51,8 +51,12 @@
 #include <linux/export.h>
 #include <linux/module.h>
 #include <linux/string.h>
+#include <linux/types.h>
+#include <linux/ratelimit.h>
+#include <linux/fault-inject.h>
 
 #include "hfi.h"
+#include "trace.h"
 #include "debugfs.h"
 #include "device.h"
 #include "qp.h"
@@ -1063,6 +1067,217 @@ static int _sdma_cpu_list_seq_show(struct seq_file *s, void *v)
 DEBUGFS_SEQ_FILE_OPEN(sdma_cpu_list)
 DEBUGFS_FILE_OPS(sdma_cpu_list);
 
+#ifdef CONFIG_FAULT_INJECTION
+static void *_fault_stats_seq_start(struct seq_file *s, loff_t *pos)
+{
+	struct hfi1_opcode_stats_perctx *opstats;
+
+	if (*pos >= ARRAY_SIZE(opstats->stats))
+		return NULL;
+	return pos;
+}
+
+static void *_fault_stats_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct hfi1_opcode_stats_perctx *opstats;
+
+	++*pos;
+	if (*pos >= ARRAY_SIZE(opstats->stats))
+		return NULL;
+	return pos;
+}
+
+static void _fault_stats_seq_stop(struct seq_file *s, void *v)
+{
+}
+
+static int _fault_stats_seq_show(struct seq_file *s, void *v)
+{
+	loff_t *spos = v;
+	loff_t i = *spos, j;
+	u64 n_packets = 0, n_bytes = 0;
+	struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+
+	for (j = 0; j < dd->first_user_ctxt; j++) {
+		if (!dd->rcd[j])
+			continue;
+		n_packets += dd->rcd[j]->opstats->stats[i].n_packets;
+		n_bytes += dd->rcd[j]->opstats->stats[i].n_bytes;
+	}
+	if (!n_packets && !n_bytes)
+		return SEQ_SKIP;
+	if (!ibd->fault_opcode->n_rxfaults[i] &&
+	    !ibd->fault_opcode->n_txfaults[i])
+		return SEQ_SKIP;
+	seq_printf(s, "%02llx %llu/%llu (faults rx:%llu faults: tx:%llu)\n", i,
+		   (unsigned long long)n_packets,
+		   (unsigned long long)n_bytes,
+		   (unsigned long long)ibd->fault_opcode->n_rxfaults[i],
+		   (unsigned long long)ibd->fault_opcode->n_txfaults[i]);
+	return 0;
+}
+
+DEBUGFS_SEQ_FILE_OPS(fault_stats);
+DEBUGFS_SEQ_FILE_OPEN(fault_stats);
+DEBUGFS_FILE_OPS(fault_stats);
+
+static void fault_exit_opcode_debugfs(struct hfi1_ibdev *ibd)
+{
+	debugfs_remove_recursive(ibd->fault_opcode->dir);
+	kfree(ibd->fault_opcode);
+	ibd->fault_opcode = NULL;
+}
+
+static int fault_init_opcode_debugfs(struct hfi1_ibdev *ibd)
+{
+	struct dentry *parent = ibd->hfi1_ibdev_dbg;
+
+	ibd->fault_opcode = kzalloc(sizeof(*ibd->fault_opcode), GFP_KERNEL);
+	if (!ibd->fault_opcode)
+		return -ENOMEM;
+
+	ibd->fault_opcode->attr.interval = 1;
+	ibd->fault_opcode->attr.require_end = ULONG_MAX;
+	ibd->fault_opcode->attr.stacktrace_depth = 32;
+	ibd->fault_opcode->attr.dname = NULL;
+	ibd->fault_opcode->attr.verbose = 0;
+	ibd->fault_opcode->fault_by_opcode = false;
+	ibd->fault_opcode->opcode = 0;
+	ibd->fault_opcode->mask = 0xff;
+
+	ibd->fault_opcode->dir =
+		fault_create_debugfs_attr("fault_opcode",
+					  parent,
+					  &ibd->fault_opcode->attr);
+	if (IS_ERR(ibd->fault_opcode->dir)) {
+		kfree(ibd->fault_opcode);
+		return -ENOENT;
+	}
+
+	DEBUGFS_SEQ_FILE_CREATE(fault_stats, ibd->fault_opcode->dir, ibd);
+	if (!debugfs_create_bool("fault_by_opcode", 0600,
+				 ibd->fault_opcode->dir,
+				 &ibd->fault_opcode->fault_by_opcode))
+		goto fail;
+	if (!debugfs_create_x8("opcode", 0600, ibd->fault_opcode->dir,
+			       &ibd->fault_opcode->opcode))
+		goto fail;
+	if (!debugfs_create_x8("mask", 0600, ibd->fault_opcode->dir,
+			       &ibd->fault_opcode->mask))
+		goto fail;
+
+	return 0;
+fail:
+	fault_exit_opcode_debugfs(ibd);
+	return -ENOMEM;
+}
+
+static void fault_exit_packet_debugfs(struct hfi1_ibdev *ibd)
+{
+	debugfs_remove_recursive(ibd->fault_packet->dir);
+	kfree(ibd->fault_packet);
+	ibd->fault_packet = NULL;
+}
+
+static int fault_init_packet_debugfs(struct hfi1_ibdev *ibd)
+{
+	struct dentry *parent = ibd->hfi1_ibdev_dbg;
+
+	ibd->fault_packet = kzalloc(sizeof(*ibd->fault_packet), GFP_KERNEL);
+	if (!ibd->fault_packet)
+		return -ENOMEM;
+
+	ibd->fault_packet->attr.interval = 1;
+	ibd->fault_packet->attr.require_end = ULONG_MAX;
+	ibd->fault_packet->attr.stacktrace_depth = 32;
+	ibd->fault_packet->attr.dname = NULL;
+	ibd->fault_packet->attr.verbose = 0;
+	ibd->fault_packet->fault_by_packet = false;
+
+	ibd->fault_packet->dir =
+		fault_create_debugfs_attr("fault_packet",
+					  parent,
+					  &ibd->fault_packet->attr);
+	if (IS_ERR(ibd->fault_packet->dir)) {
+		kfree(ibd->fault_packet);
+		return -ENOENT;
+	}
+
+	if (!debugfs_create_bool("fault_by_packet", 0600,
+				 ibd->fault_packet->dir,
+				 &ibd->fault_packet->fault_by_packet))
+		goto fail;
+	if (!debugfs_create_u64("fault_stats", 0400,
+				ibd->fault_packet->dir,
+				&ibd->fault_packet->n_faults))
+		goto fail;
+
+	return 0;
+fail:
+	fault_exit_packet_debugfs(ibd);
+	return -ENOMEM;
+}
+
+static void fault_exit_debugfs(struct hfi1_ibdev *ibd)
+{
+	fault_exit_opcode_debugfs(ibd);
+	fault_exit_packet_debugfs(ibd);
+}
+
+static int fault_init_debugfs(struct hfi1_ibdev *ibd)
+{
+	int ret = 0;
+
+	ret = fault_init_opcode_debugfs(ibd);
+	if (ret)
+		return ret;
+
+	ret = fault_init_packet_debugfs(ibd);
+	if (ret)
+		fault_exit_opcode_debugfs(ibd);
+
+	return ret;
+}
+
+bool hfi1_dbg_fault_opcode(struct rvt_qp *qp, u32 opcode, bool rx)
+{
+	bool ret = false;
+	struct hfi1_ibdev *ibd = to_idev(qp->ibqp.device);
+
+	if (!ibd->fault_opcode || !ibd->fault_opcode->fault_by_opcode)
+		return false;
+	if (ibd->fault_opcode->opcode != (opcode & ibd->fault_opcode->mask))
+		return false;
+	ret = should_fail(&ibd->fault_opcode->attr, 1);
+	if (ret) {
+		trace_hfi1_fault_opcode(qp, opcode);
+		if (rx)
+			ibd->fault_opcode->n_rxfaults[opcode]++;
+		else
+			ibd->fault_opcode->n_txfaults[opcode]++;
+	}
+	return ret;
+}
+
+bool hfi1_dbg_fault_packet(struct hfi1_packet *packet)
+{
+	struct rvt_dev_info *rdi = &packet->rcd->ppd->dd->verbs_dev.rdi;
+	struct hfi1_ibdev *ibd = dev_from_rdi(rdi);
+	bool ret = false;
+
+	if (!ibd->fault_packet || !ibd->fault_packet->fault_by_packet)
+		return false;
+
+	ret = should_fail(&ibd->fault_packet->attr, 1);
+	if (ret) {
+		++ibd->fault_packet->n_faults;
+		trace_hfi1_fault_packet(packet);
+	}
+	return ret;
+}
+#endif
+
 void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd)
 {
 	char name[sizeof("port0counters") + 1];
@@ -1112,12 +1327,19 @@ void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd)
 					    !port_cntr_ops[i].ops.write ?
 					    S_IRUGO : S_IRUGO | S_IWUSR);
 		}
+
+#ifdef CONFIG_FAULT_INJECTION
+	fault_init_debugfs(ibd);
+#endif
 }
 
 void hfi1_dbg_ibdev_exit(struct hfi1_ibdev *ibd)
 {
 	if (!hfi1_dbg_root)
 		goto out;
+#ifdef CONFIG_FAULT_INJECTION
+	fault_exit_debugfs(ibd);
+#endif
 	debugfs_remove(ibd->hfi1_ibdev_link);
 	debugfs_remove_recursive(ibd->hfi1_ibdev_dbg);
 out:
diff --git a/drivers/infiniband/hw/hfi1/debugfs.h b/drivers/infiniband/hw/hfi1/debugfs.h
index b6fb681..70be5ca 100644
--- a/drivers/infiniband/hw/hfi1/debugfs.h
+++ b/drivers/infiniband/hw/hfi1/debugfs.h
@@ -53,23 +53,68 @@
 void hfi1_dbg_ibdev_exit(struct hfi1_ibdev *ibd);
 void hfi1_dbg_init(void);
 void hfi1_dbg_exit(void);
+
+#ifdef CONFIG_FAULT_INJECTION
+#include <linux/fault-inject.h>
+struct fault_opcode {
+	struct fault_attr attr;
+	struct dentry *dir;
+	bool fault_by_opcode;
+	u64 n_rxfaults[256];
+	u64 n_txfaults[256];
+	u8 opcode;
+	u8 mask;
+};
+
+struct fault_packet {
+	struct fault_attr attr;
+	struct dentry *dir;
+	bool fault_by_packet;
+	u64 n_faults;
+};
+
+bool hfi1_dbg_fault_opcode(struct rvt_qp *qp, u32 opcode, bool rx);
+bool hfi1_dbg_fault_packet(struct hfi1_packet *packet);
+#else
+static inline bool hfi1_dbg_fault_packet(struct hfi1_packet *packet)
+{
+	return false;
+}
+
+static inline bool hfi1_dbg_fault_opcode(struct rvt_qp *qp,
+					 u32 opcode, bool rx)
+{
+	return false;
+}
+#endif
+
 #else
 static inline void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd)
 {
 }
 
-void hfi1_dbg_ibdev_exit(struct hfi1_ibdev *ibd)
+static inline void hfi1_dbg_ibdev_exit(struct hfi1_ibdev *ibd)
+{
+}
+
+static inline void hfi1_dbg_init(void)
 {
 }
 
-void hfi1_dbg_init(void)
+static inline void hfi1_dbg_exit(void)
 {
 }
 
-void hfi1_dbg_exit(void)
+static inline bool hfi1_dbg_fault_packet(struct hfi1_packet *packet)
 {
+	return false;
 }
 
+static inline bool hfi1_dbg_fault_opcode(struct rvt_qp *qp,
+					 u32 opcode, bool rx)
+{
+	return false;
+}
 #endif
 
 #endif                          /* _HFI1_DEBUGFS_H */
diff --git a/drivers/infiniband/hw/hfi1/driver.c b/drivers/infiniband/hw/hfi1/driver.c
index 3881c95..c0b012f 100644
--- a/drivers/infiniband/hw/hfi1/driver.c
+++ b/drivers/infiniband/hw/hfi1/driver.c
@@ -59,6 +59,7 @@
 #include "trace.h"
 #include "qp.h"
 #include "sdma.h"
+#include "debugfs.h"
 
 #undef pr_fmt
 #define pr_fmt(fmt) DRIVER_NAME ": " fmt
@@ -1354,6 +1355,9 @@ void handle_eflags(struct hfi1_packet *packet)
  */
 int process_receive_ib(struct hfi1_packet *packet)
 {
+	if (unlikely(hfi1_dbg_fault_packet(packet)))
+		return RHF_RCV_CONTINUE;
+
 	trace_hfi1_rcvhdr(packet->rcd->ppd->dd,
 			  packet->rcd->ctxt,
 			  rhf_err_flags(packet->rhf),
@@ -1409,6 +1413,8 @@ int process_receive_error(struct hfi1_packet *packet)
 
 int kdeth_process_expected(struct hfi1_packet *packet)
 {
+	if (unlikely(hfi1_dbg_fault_packet(packet)))
+		return RHF_RCV_CONTINUE;
 	if (unlikely(rhf_err_flags(packet->rhf)))
 		handle_eflags(packet);
 
@@ -1421,6 +1427,8 @@ int kdeth_process_eager(struct hfi1_packet *packet)
 {
 	if (unlikely(rhf_err_flags(packet->rhf)))
 		handle_eflags(packet);
+	if (unlikely(hfi1_dbg_fault_packet(packet)))
+		return RHF_RCV_CONTINUE;
 
 	dd_dev_err(packet->rcd->dd,
 		   "Unhandled eager packet received. Dropping.\n");
diff --git a/drivers/infiniband/hw/hfi1/trace_misc.h b/drivers/infiniband/hw/hfi1/trace_misc.h
index d308454..deac77d 100644
--- a/drivers/infiniband/hw/hfi1/trace_misc.h
+++ b/drivers/infiniband/hw/hfi1/trace_misc.h
@@ -72,6 +72,54 @@
 		      __entry->src)
 );
 
+#ifdef CONFIG_FAULT_INJECTION
+TRACE_EVENT(hfi1_fault_opcode,
+	    TP_PROTO(struct rvt_qp *qp, u8 opcode),
+	    TP_ARGS(qp, opcode),
+	    TP_STRUCT__entry(DD_DEV_ENTRY(dd_from_ibdev(qp->ibqp.device))
+			     __field(u32, qpn)
+			     __field(u8, opcode)
+			     ),
+	    TP_fast_assign(DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device))
+			   __entry->qpn = qp->ibqp.qp_num;
+			   __entry->opcode = opcode;
+			   ),
+	    TP_printk("[%s] qpn 0x%x opcode 0x%x",
+		      __get_str(dev), __entry->qpn, __entry->opcode)
+);
+
+TRACE_EVENT(hfi1_fault_packet,
+	    TP_PROTO(struct hfi1_packet *packet),
+	    TP_ARGS(packet),
+	    TP_STRUCT__entry(DD_DEV_ENTRY(packet->rcd->ppd->dd)
+			     __field(u64, eflags)
+			     __field(u32, ctxt)
+			     __field(u32, hlen)
+			     __field(u32, tlen)
+			     __field(u32, updegr)
+			     __field(u32, etail)
+			     ),
+	     TP_fast_assign(DD_DEV_ASSIGN(packet->rcd->ppd->dd);
+			    __entry->eflags = rhf_err_flags(packet->rhf);
+			    __entry->ctxt = packet->rcd->ctxt;
+			    __entry->hlen = packet->hlen;
+			    __entry->tlen = packet->tlen;
+			    __entry->updegr = packet->updegr;
+			    __entry->etail = rhf_egr_index(packet->rhf);
+			    ),
+	     TP_printk(
+		"[%s] ctxt %d eflags 0x%llx hlen %d tlen %d updegr %d etail %d",
+		__get_str(dev),
+		__entry->ctxt,
+		__entry->eflags,
+		__entry->hlen,
+		__entry->tlen,
+		__entry->updegr,
+		__entry->etail
+		)
+);
+#endif
+
 #endif /* __HFI1_TRACE_MISC_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 928918c..9f016da 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -60,6 +60,7 @@
 #include "trace.h"
 #include "qp.h"
 #include "verbs_txreq.h"
+#include "debugfs.h"
 
 static unsigned int hfi1_lkey_table_size = 16;
 module_param_named(lkey_table_size, hfi1_lkey_table_size, uint,
@@ -599,6 +600,11 @@ void hfi1_ib_rcv(struct hfi1_packet *packet)
 			rcu_read_unlock();
 			goto drop;
 		}
+		if (unlikely(hfi1_dbg_fault_opcode(packet->qp, opcode,
+						   true))) {
+			rcu_read_unlock();
+			goto drop;
+		}
 		spin_lock_irqsave(&packet->qp->r_lock, flags);
 		packet_handler = qp_ok(opcode, packet);
 		if (likely(packet_handler))
diff --git a/drivers/infiniband/hw/hfi1/verbs.h b/drivers/infiniband/hw/hfi1/verbs.h
index 3a0b589..2756ec3 100644
--- a/drivers/infiniband/hw/hfi1/verbs.h
+++ b/drivers/infiniband/hw/hfi1/verbs.h
@@ -195,6 +195,10 @@ struct hfi1_ibdev {
 	struct dentry *hfi1_ibdev_dbg;
 	/* per HFI symlinks to above */
 	struct dentry *hfi1_ibdev_link;
+#ifdef CONFIG_FAULT_INJECTION
+	struct fault_opcode *fault_opcode;
+	struct fault_packet *fault_packet;
+#endif
 #endif
 };
 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 17/20] IB/hfi1: Add transmit fault injection feature
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (15 preceding siblings ...)
  2017-03-21  0:26   ` [PATCH v2 16/20] IB/hfi1: Add receive fault injection feature Dennis Dalessandro
@ 2017-03-21  0:26   ` Dennis Dalessandro
       [not found]     ` <20170321002619.28538.31428.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  2017-03-21  0:26   ` [PATCH v2 18/20] IB/hfi1: Eliminate synchronize_rcu() in mr delete Dennis Dalessandro
  2017-03-21  0:26   ` [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs Dennis Dalessandro
  18 siblings, 1 reply; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:26 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Don Hiatt, Mike Marciniszyn

From: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Add ability to fault packets on transmit by opcode.
Dropping by packet can be achieved by setting the mask to 0.

In order to drop non-verbs traffic we set PbcInsertHrc
to NONE (0x2). The packet will still be delivered to
the receiving node but a KHdrHCRCErr (KDETH packet
with a bad HCRC) will be triggered and the packet will
not be delivered to the correct context.

In order to drop regular verbs traffic we set the
PbcTestEbp flag. The packet will still be delivered
to the receiving node but a 'late ebp error' will
be triggered and the packet will be dropped.

A global toggle (/sys/kernel/debug/hfi1/hfi1_X/fault_suppress_err)
has been added to suppress the error messages on the receiving
node when a packet was faulted on the sending node.
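
As a rough illustration of the opcode matching described above, here is a
minimal userspace sketch. The struct and function names are hypothetical
and only mirror the opcode/mask comparison used by hfi1_dbg_fault_opcode()
and the IB_OPCODE_MSP test in hfi1_fault_tx(). It assumes the configured
opcode is left at 0 when the mask is 0, in which case every packet matches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as the IB_OPCODE_MSP constant added by this patch. */
#define SKETCH_OPCODE_MSP 0xe0u

/* Hypothetical filter mirroring the opcode/mask fields of struct fault_opcode. */
struct opcode_filter {
	uint8_t opcode;
	uint8_t mask;
};

/* True when the packet opcode, after masking, equals the configured opcode. */
static bool filter_matches(const struct opcode_filter *f, uint8_t opcode)
{
	return f->opcode == (opcode & f->mask);
}

/* True for manufacturer-specific opcodes, i.e. the top three bits are all set. */
static bool is_msp_opcode(uint8_t opcode)
{
	return (opcode & SKETCH_OPCODE_MSP) == SKETCH_OPCODE_MSP;
}
```

With opcode 0 and mask 0 the filter selects every packet, which is the
"drop by packet" case. A mask of 0xff selects exactly one opcode.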

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/chip.c    |    4 +++
 drivers/infiniband/hw/hfi1/debugfs.c |    8 ++++++
 drivers/infiniband/hw/hfi1/debugfs.h |   11 ++++++++
 drivers/infiniband/hw/hfi1/driver.c  |   11 ++++++++
 drivers/infiniband/hw/hfi1/verbs.c   |   49 +++++++++++++++++++++++++++++-----
 drivers/infiniband/hw/hfi1/verbs.h   |    1 +
 include/rdma/ib_pack.h               |    2 +
 7 files changed, 79 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 77f4b41..79a316a 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -64,6 +64,7 @@
 #include "platform.h"
 #include "aspm.h"
 #include "affinity.h"
+#include "debugfs.h"
 
 #define NUM_IB_PORTS 1
 
@@ -7898,6 +7899,9 @@ static void handle_dcc_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
 		reg &= ~DCC_ERR_FLG_EN_CSR_ACCESS_BLOCKED_HOST_SMASK;
 	}
 
+	if (unlikely(hfi1_dbg_fault_suppress_err(&dd->verbs_dev)))
+		reg &= ~DCC_ERR_FLG_LATE_EBP_ERR_SMASK;
+
 	/* report any remaining errors */
 	if (reg)
 		dd_dev_info_ratelimited(dd, "DCC Error: %s\n",
diff --git a/drivers/infiniband/hw/hfi1/debugfs.c b/drivers/infiniband/hw/hfi1/debugfs.c
index cac6d52..dc2c1c9 100644
--- a/drivers/infiniband/hw/hfi1/debugfs.c
+++ b/drivers/infiniband/hw/hfi1/debugfs.c
@@ -1240,6 +1240,11 @@ static int fault_init_debugfs(struct hfi1_ibdev *ibd)
 	return ret;
 }
 
+bool hfi1_dbg_fault_suppress_err(struct hfi1_ibdev *ibd)
+{
+	return ibd->fault_suppress_err;
+}
+
 bool hfi1_dbg_fault_opcode(struct rvt_qp *qp, u32 opcode, bool rx)
 {
 	bool ret = false;
@@ -1329,6 +1334,9 @@ void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd)
 		}
 
 #ifdef CONFIG_FAULT_INJECTION
+	debugfs_create_bool("fault_suppress_err", 0600,
+			    ibd->hfi1_ibdev_dbg,
+			    &ibd->fault_suppress_err);
 	fault_init_debugfs(ibd);
 #endif
 }
diff --git a/drivers/infiniband/hw/hfi1/debugfs.h b/drivers/infiniband/hw/hfi1/debugfs.h
index 70be5ca..38c38a9 100644
--- a/drivers/infiniband/hw/hfi1/debugfs.h
+++ b/drivers/infiniband/hw/hfi1/debugfs.h
@@ -75,6 +75,7 @@ struct fault_packet {
 
 bool hfi1_dbg_fault_opcode(struct rvt_qp *qp, u32 opcode, bool rx);
 bool hfi1_dbg_fault_packet(struct hfi1_packet *packet);
+bool hfi1_dbg_fault_suppress_err(struct hfi1_ibdev *ibd);
 #else
 static inline bool hfi1_dbg_fault_packet(struct hfi1_packet *packet)
 {
@@ -86,6 +87,11 @@ static inline bool hfi1_dbg_fault_opcode(struct rvt_qp *qp,
 {
 	return false;
 }
+
+static inline bool hfi1_dbg_fault_suppress_err(struct hfi1_ibdev *ibd)
+{
+	return false;
+}
 #endif
 
 #else
@@ -115,6 +121,11 @@ static inline bool hfi1_dbg_fault_opcode(struct rvt_qp *qp,
 {
 	return false;
 }
+
+static inline bool hfi1_dbg_fault_suppress_err(struct hfi1_ibdev *ibd)
+{
+	return false;
+}
 #endif
 
 #endif                          /* _HFI1_DEBUGFS_H */
diff --git a/drivers/infiniband/hw/hfi1/driver.c b/drivers/infiniband/hw/hfi1/driver.c
index c0b012f..64bdbce 100644
--- a/drivers/infiniband/hw/hfi1/driver.c
+++ b/drivers/infiniband/hw/hfi1/driver.c
@@ -1367,6 +1367,11 @@ int process_receive_ib(struct hfi1_packet *packet)
 			  packet->updegr,
 			  rhf_egr_index(packet->rhf));
 
+	if (unlikely(
+		 (hfi1_dbg_fault_suppress_err(&packet->rcd->dd->verbs_dev) &&
+		 (packet->rhf & RHF_DC_ERR))))
+		return RHF_RCV_CONTINUE;
+
 	if (unlikely(rhf_err_flags(packet->rhf))) {
 		handle_eflags(packet);
 		return RHF_RCV_CONTINUE;
@@ -1402,6 +1407,12 @@ int process_receive_bypass(struct hfi1_packet *packet)
 
 int process_receive_error(struct hfi1_packet *packet)
 {
+	/* KHdrHCRCErr -- KDETH packet with a bad HCRC */
+	if (unlikely(
+		 hfi1_dbg_fault_suppress_err(&packet->rcd->dd->verbs_dev) &&
+		 rhf_rcv_type_err(packet->rhf) == 3))
+		return RHF_RCV_CONTINUE;
+
 	handle_eflags(packet);
 
 	if (unlikely(rhf_err_flags(packet->rhf)))
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 9f016da..5e7e577 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -518,6 +518,35 @@ static inline opcode_handler qp_ok(int opcode, struct hfi1_packet *packet)
 	return NULL;
 }
 
+static u64 hfi1_fault_tx(struct rvt_qp *qp, u8 opcode, u64 pbc)
+{
+#ifdef CONFIG_HFI1_FAULT_INJECTION
+	if ((opcode & IB_OPCODE_MSP) == IB_OPCODE_MSP)
+		/*
+		 * In order to drop non-IB traffic we
+		 * set PbcInsertHrc to NONE (0x2).
+		 * The packet will still be delivered
+		 * to the receiving node but a
+		 * KHdrHCRCErr (KDETH packet with a bad
+		 * HCRC) will be triggered and the
+		 * packet will not be delivered to the
+		 * correct context.
+		 */
+		pbc |= (u64)PBC_IHCRC_NONE << PBC_INSERT_HCRC_SHIFT;
+	else
+		/*
+		 * In order to drop regular verbs
+		 * traffic we set the PbcTestEbp
+		 * flag. The packet will still be
+		 * delivered to the receiving node but
+		 * a 'late ebp error' will be
+		 * triggered and will be dropped.
+		 */
+		pbc |= PBC_TEST_EBP;
+#endif
+	return pbc;
+}
+
 /**
  * hfi1_ib_rcv - process an incoming packet
  * @packet: data packet information
@@ -803,7 +832,6 @@ static int build_verbs_tx_desc(
 		if (ret)
 			goto bail_txadd;
 	}
-
 	/* add the ulp payload - if any. tx->ss can be NULL for acks */
 	if (tx->ss)
 		ret = build_verbs_ulp_payload(sde, length, tx);
@@ -822,7 +850,6 @@ int hfi1_verbs_send_dma(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
 	struct hfi1_ibdev *dev = ps->dev;
 	struct hfi1_pportdata *ppd = ps->ppd;
 	struct verbs_txreq *tx;
-	u64 pbc_flags = 0;
 	u8 sc5 = priv->s_sc;
 
 	int ret;
@@ -831,12 +858,16 @@ int hfi1_verbs_send_dma(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
 	if (!sdma_txreq_built(&tx->txreq)) {
 		if (likely(pbc == 0)) {
 			u32 vl = sc_to_vlt(dd_from_ibdev(qp->ibqp.device), sc5);
+			u8 opcode = get_opcode(&tx->phdr.hdr);
+
 			/* No vl15 here */
 			/* set PBC_DC_INFO bit (aka SC[4]) in pbc_flags */
-			pbc_flags |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
+			pbc |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
 
+			if (unlikely(hfi1_dbg_fault_opcode(qp, opcode, false)))
+				pbc = hfi1_fault_tx(qp, opcode, pbc);
 			pbc = create_pbc(ppd,
-					 pbc_flags,
+					 pbc,
 					 qp->srate_mbps,
 					 vl,
 					 plen);
@@ -939,7 +970,6 @@ int hfi1_verbs_send_pio(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
 	u32 plen = hdrwords + dwords + 2; /* includes pbc */
 	struct hfi1_pportdata *ppd = ps->ppd;
 	u32 *hdr = (u32 *)&ps->s_txreq->phdr.hdr;
-	u64 pbc_flags = 0;
 	u8 sc5;
 	unsigned long flags = 0;
 	struct send_context *sc;
@@ -964,9 +994,14 @@ int hfi1_verbs_send_pio(struct rvt_qp *qp, struct hfi1_pkt_state *ps,
 
 	if (likely(pbc == 0)) {
 		u8 vl = sc_to_vlt(dd_from_ibdev(qp->ibqp.device), sc5);
+		struct verbs_txreq *tx = ps->s_txreq;
+		u8 opcode = get_opcode(&tx->phdr.hdr);
+
 		/* set PBC_DC_INFO bit (aka SC[4]) in pbc_flags */
-		pbc_flags |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
-		pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
+		pbc |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
+		if (unlikely(hfi1_dbg_fault_opcode(qp, opcode, false)))
+			pbc = hfi1_fault_tx(qp, opcode, pbc);
+		pbc = create_pbc(ppd, pbc, qp->srate_mbps, vl, plen);
 	}
 	if (cb)
 		iowait_pio_inc(&priv->s_iowait);
diff --git a/drivers/infiniband/hw/hfi1/verbs.h b/drivers/infiniband/hw/hfi1/verbs.h
index 2756ec3..6c549e7 100644
--- a/drivers/infiniband/hw/hfi1/verbs.h
+++ b/drivers/infiniband/hw/hfi1/verbs.h
@@ -198,6 +198,7 @@ struct hfi1_ibdev {
 #ifdef CONFIG_FAULT_INJECTION
 	struct fault_opcode *fault_opcode;
 	struct fault_packet *fault_packet;
+	bool fault_suppress_err;
 #endif
 #endif
 };
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index b13419c..3665589 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -80,6 +80,8 @@ enum {
 	IB_OPCODE_UD                                = 0x60,
 	/* per IBTA 1.3 vol 1 Table 38, A10.3.2 */
 	IB_OPCODE_CNP                               = 0x80,
+	/* Manufacturer specific */
+	IB_OPCODE_MSP                               = 0xe0,
 
 	/* operations -- just used to define real constants */
 	IB_OPCODE_SEND_FIRST                        = 0x00,

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 18/20] IB/hfi1: Eliminate synchronize_rcu() in mr delete
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (16 preceding siblings ...)
  2017-03-21  0:26   ` [PATCH v2 17/20] IB/hfi1: Add transmit " Dennis Dalessandro
@ 2017-03-21  0:26   ` Dennis Dalessandro
  2017-03-21  0:26   ` [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs Dennis Dalessandro
  18 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:26 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

From: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The synchronize_rcu() call can be eliminated to improve memory deregistration
performance.

There are two key fields involved:
- The rcu pointer itself
- the lkey_published field

To close the window between the rcu read of the mregion pointer and the
reference count, the code should make the following changes:

1. In lkey/rkey validation (reader)

Read the rcu pointer.  If the pointer is non-NULL, get a reference.

In addition to the current validation tests, use a READ_ONCE() on
lkey_published.

Upon any failure, release the reference.

2. In the remove logic (delete)

Ensure lkey_published is zeroed prior to setting the pointer to NULL.
This requires using rcu_assign_pointer() to ensure lkey_published
is written prior to the NULL.

3. In the insert logic (add)

Ensure lkey_published is set, then use rcu_assign_pointer() to ensure
the pointer is written after all MR fields.
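
The ordering in the three steps above can be sketched in userspace with C11
atomics. This is only an approximation under stated assumptions: a release
store stands in for rcu_assign_pointer(), an acquire load stands in for
rcu_dereference(), and a plain atomic counter stands in for the percpu
reference. The sketch does not model the RCU read-side critical section or
the lifetime guarantees of the real code, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rvt_mregion. */
struct mregion {
	atomic_int refcount;
	atomic_bool published;
	unsigned int lkey;
};

/* A single table slot; stands in for rkt->table[r]. */
static _Atomic(struct mregion *) slot;

/* Insert: set published first, then publish the pointer with release order. */
static void mr_insert(struct mregion *mr)
{
	atomic_store_explicit(&mr->published, true, memory_order_relaxed);
	atomic_store_explicit(&slot, mr, memory_order_release);
}

/* Remove: clear published first, then clear the pointer with release order. */
static void mr_remove(struct mregion *mr)
{
	atomic_store_explicit(&mr->published, false, memory_order_relaxed);
	atomic_store_explicit(&slot, NULL, memory_order_release);
}

/* Reader: take a reference before validating; drop it on any failure. */
static struct mregion *mr_lookup(unsigned int lkey)
{
	struct mregion *mr = atomic_load_explicit(&slot, memory_order_acquire);

	if (!mr)
		return NULL;
	atomic_fetch_add_explicit(&mr->refcount, 1, memory_order_relaxed);
	if (!atomic_load_explicit(&mr->published, memory_order_relaxed) ||
	    mr->lkey != lkey) {
		atomic_fetch_sub_explicit(&mr->refcount, 1, memory_order_release);
		return NULL;
	}
	return mr;	/* caller now holds one reference */
}
```

A successful lookup leaves the caller holding a reference. A reader that
races with removal either sees a NULL pointer or sees published cleared and
drops its reference again.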

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/sw/rdmavt/mr.c |   49 +++++++++++++++++++++++++------------
 1 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index ae30b68..7c86955 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -191,8 +191,9 @@ static int rvt_alloc_lkey(struct rvt_mregion *mr, int dma_region)
 
 		tmr = rcu_access_pointer(dev->dma_mr);
 		if (!tmr) {
-			rcu_assign_pointer(dev->dma_mr, mr);
 			mr->lkey_published = 1;
+			/* Insure published written first */
+			rcu_assign_pointer(dev->dma_mr, mr);
 			rvt_get_mr(mr);
 		}
 		goto success;
@@ -224,8 +225,9 @@ static int rvt_alloc_lkey(struct rvt_mregion *mr, int dma_region)
 		mr->lkey |= 1 << 8;
 		rkt->gen++;
 	}
-	rcu_assign_pointer(rkt->table[r], mr);
 	mr->lkey_published = 1;
+	/* Insure published written first */
+	rcu_assign_pointer(rkt->table[r], mr);
 success:
 	spin_unlock_irqrestore(&rkt->lock, flags);
 out:
@@ -253,23 +255,24 @@ static void rvt_free_lkey(struct rvt_mregion *mr)
 	spin_lock_irqsave(&rkt->lock, flags);
 	if (!lkey) {
 		if (mr->lkey_published) {
-			RCU_INIT_POINTER(dev->dma_mr, NULL);
+			mr->lkey_published = 0;
+			/* insure published is written before pointer */
+			rcu_assign_pointer(dev->dma_mr, NULL);
 			rvt_put_mr(mr);
 		}
 	} else {
 		if (!mr->lkey_published)
 			goto out;
 		r = lkey >> (32 - dev->dparms.lkey_table_size);
-		RCU_INIT_POINTER(rkt->table[r], NULL);
+		mr->lkey_published = 0;
+		/* insure published is written before pointer */
+		rcu_assign_pointer(rkt->table[r], NULL);
 	}
-	mr->lkey_published = 0;
 	freed++;
 out:
 	spin_unlock_irqrestore(&rkt->lock, flags);
-	if (freed) {
-		synchronize_rcu();
+	if (freed)
 		percpu_ref_kill(&mr->refcount);
-	}
 }
 
 static struct rvt_mr *__rvt_alloc_mr(int count, struct ib_pd *pd)
@@ -822,16 +825,21 @@ int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
 		goto ok;
 	}
 	mr = rcu_dereference(rkt->table[sge->lkey >> rkt->shift]);
-	if (unlikely(!mr || atomic_read(&mr->lkey_invalid) ||
-		     mr->lkey != sge->lkey || mr->pd != &pd->ibpd))
+	if (!mr)
 		goto bail;
+	rvt_get_mr(mr);
+	if (!READ_ONCE(mr->lkey_published))
+		goto bail_unref;
+
+	if (unlikely(atomic_read(&mr->lkey_invalid) ||
+		     mr->lkey != sge->lkey || mr->pd != &pd->ibpd))
+		goto bail_unref;
 
 	off = sge->addr - mr->user_base;
 	if (unlikely(sge->addr < mr->user_base ||
 		     off + sge->length > mr->length ||
 		     (mr->access_flags & acc) != acc))
-		goto bail;
-	rvt_get_mr(mr);
+		goto bail_unref;
 	rcu_read_unlock();
 
 	off += mr->offset;
@@ -867,6 +875,8 @@ int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
 	isge->n = n;
 ok:
 	return 1;
+bail_unref:
+	rvt_put_mr(mr);
 bail:
 	rcu_read_unlock();
 	return 0;
@@ -922,15 +932,20 @@ int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge,
 	}
 
 	mr = rcu_dereference(rkt->table[rkey >> rkt->shift]);
-	if (unlikely(!mr || atomic_read(&mr->lkey_invalid) ||
-		     mr->lkey != rkey || qp->ibqp.pd != mr->pd))
+	if (!mr)
 		goto bail;
+	rvt_get_mr(mr);
+	/* insure mr read is before test */
+	if (!READ_ONCE(mr->lkey_published))
+		goto bail_unref;
+	if (unlikely(atomic_read(&mr->lkey_invalid) ||
+		     mr->lkey != rkey || qp->ibqp.pd != mr->pd))
+		goto bail_unref;
 
 	off = vaddr - mr->iova;
 	if (unlikely(vaddr < mr->iova || off + len > mr->length ||
 		     (mr->access_flags & acc) == 0))
-		goto bail;
-	rvt_get_mr(mr);
+		goto bail_unref;
 	rcu_read_unlock();
 
 	off += mr->offset;
@@ -966,6 +981,8 @@ int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge,
 	sge->n = n;
 ok:
 	return 1;
+bail_unref:
+	rvt_put_mr(mr);
 bail:
 	rcu_read_unlock();
 	return 0;

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
                     ` (17 preceding siblings ...)
  2017-03-21  0:26   ` [PATCH v2 18/20] IB/hfi1: Eliminate synchronize_rcu() in mr delete Dennis Dalessandro
@ 2017-03-21  0:26   ` Dennis Dalessandro
       [not found]     ` <20170321002631.28538.2121.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  18 siblings, 1 reply; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:26 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

From: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

In some cases, the cost of user memory deregistration is more
important than the data path benefit of percpu reference counts.

Add a (default off) module parameter to disarm percpu for user memory
regions.
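
For illustration, the flag selection this patch passes to rvt_init_mregion()
can be reduced to the small helper below. The helper name and the flag value
are assumptions made for this sketch; the kernel's PERCPU_REF_INIT_ATOMIC is
defined in <linux/percpu-refcount.h>. Because && binds tighter than ?:, the
atomic mode is chosen only when the PD is a user PD and the module parameter
is non-zero.

```c
#include <stdbool.h>

/* Hypothetical flag value standing in for PERCPU_REF_INIT_ATOMIC. */
#define SKETCH_REF_INIT_ATOMIC 1u

/* Returns the refcount init flags for a new MR. */
static unsigned int mr_ref_flags(bool user_pd, unsigned int no_user_mr_percpu)
{
	return (user_pd && no_user_mr_percpu) ? SKETCH_REF_INIT_ATOMIC : 0u;
}
```

Kernel MRs always keep the percpu mode in this scheme, regardless of the
parameter.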

Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/hfi1/verbs.c    |    7 +++++++
 drivers/infiniband/hw/qib/qib_verbs.c |    7 +++++++
 drivers/infiniband/sw/rdmavt/mr.c     |    6 +++++-
 include/rdma/rdma_vt.h                |    1 +
 4 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 5e7e577..552b26d 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -68,6 +68,12 @@
 MODULE_PARM_DESC(lkey_table_size,
 		 "LKEY table size in bits (2^n, 1 <= n <= 23)");
 
+static unsigned int hfi1_no_user_mr_percpu;
+module_param_named(no_user_mr_percpu, hfi1_no_user_mr_percpu, uint,
+		   S_IRUGO);
+MODULE_PARM_DESC(no_user_mr_percpu,
+		 "Avoid percpu refcount for user MRs (default 0)");
+
 static unsigned int hfi1_max_pds = 0xFFFF;
 module_param_named(max_pds, hfi1_max_pds, uint, S_IRUGO);
 MODULE_PARM_DESC(max_pds,
@@ -1841,6 +1847,7 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd)
 	/* misc settings */
 	dd->verbs_dev.rdi.flags = 0; /* Let rdmavt handle it all */
 	dd->verbs_dev.rdi.dparms.lkey_table_size = hfi1_lkey_table_size;
+	dd->verbs_dev.rdi.dparms.no_user_mr_percpu = hfi1_no_user_mr_percpu;
 	dd->verbs_dev.rdi.dparms.nports = dd->num_pports;
 	dd->verbs_dev.rdi.dparms.npkeys = hfi1_get_npkeys(dd);
 
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index e120efe..6c718cd 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -56,6 +56,12 @@
 MODULE_PARM_DESC(lkey_table_size,
 		 "LKEY table size in bits (2^n, 1 <= n <= 23)");
 
+static unsigned int qib_no_user_mr_percpu;
+module_param_named(no_user_mr_percpu, qib_no_user_mr_percpu, uint,
+		   S_IRUGO);
+MODULE_PARM_DESC(no_user_mr_percpu,
+		 "Avoid percpu refcount for user MRs (default 0)");
+
 static unsigned int ib_qib_max_pds = 0xFFFF;
 module_param_named(max_pds, ib_qib_max_pds, uint, S_IRUGO);
 MODULE_PARM_DESC(max_pds,
@@ -1606,6 +1612,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
 	dd->verbs_dev.rdi.dparms.max_rdma_atomic = QIB_MAX_RDMA_ATOMIC;
 	dd->verbs_dev.rdi.driver_f.get_guid_be = qib_get_guid_be;
 	dd->verbs_dev.rdi.dparms.lkey_table_size = qib_lkey_table_size;
+	dd->verbs_dev.rdi.dparms.no_user_mr_percpu = qib_no_user_mr_percpu;
 	dd->verbs_dev.rdi.dparms.qp_table_size = ib_qib_qp_table_size;
 	dd->verbs_dev.rdi.dparms.qpn_start = 1;
 	dd->verbs_dev.rdi.dparms.qpn_res_start = QIB_KD_QP;
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 7c86955..bbcc31f 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -280,6 +280,7 @@ static void rvt_free_lkey(struct rvt_mregion *mr)
 	struct rvt_mr *mr;
 	int rval = -ENOMEM;
 	int m;
+	struct rvt_dev_info *dev = ib_to_rvt(pd->device);
 
 	/* Allocate struct plus pointers to first level page tables. */
 	m = (count + RVT_SEGSZ - 1) / RVT_SEGSZ;
@@ -287,7 +288,10 @@ static void rvt_free_lkey(struct rvt_mregion *mr)
 	if (!mr)
 		goto bail;
 
-	rval = rvt_init_mregion(&mr->mr, pd, count, 0);
+	rval = rvt_init_mregion(&mr->mr, pd, count,
+				ibpd_to_rvtpd(pd)->user &&
+				dev->dparms.no_user_mr_percpu ?
+					PERCPU_REF_INIT_ATOMIC : 0);
 	if (rval)
 		goto bail;
 	/*
diff --git a/include/rdma/rdma_vt.h b/include/rdma/rdma_vt.h
index 8fc1ca7..d60a41e 100644
--- a/include/rdma/rdma_vt.h
+++ b/include/rdma/rdma_vt.h
@@ -142,6 +142,7 @@ struct rvt_driver_params {
 	 * For instance special module parameters. Goes here.
 	 */
 	unsigned int lkey_table_size;
+	unsigned int no_user_mr_percpu;
 	unsigned int qp_table_size;
 	int qpn_start;
 	int qpn_inc;

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 20/20] IB/core: If the MGID/MLID pair is not on the list return an error
  2017-03-21  0:24 ` Dennis Dalessandro
  (?)
  (?)
@ 2017-03-21  0:26 ` Dennis Dalessandro
  -1 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-03-21  0:26 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, Leon Romanovsky, Ira Weiny, stable, Michael J. Ruhl

From: Michael J. Ruhl <michael.j.ruhl@intel.com>

A list of MGID/MLID pairs is built when doing a multicast attach.  When
the multicast detach is called, the list is searched, and regardless of
the search outcome, the driver detach is called.

If an MGID/MLID pair is not on the list, driver detach should not be
called, and an error should be returned.  Calling the driver without
removing an MGID/MLID pair from the list can leave the core and driver
out of sync.
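
The intended behaviour can be sketched in userspace as follows. This is a
simplified model, not the uverbs code: the fixed-size table, the stubbed
driver call and all names are hypothetical. The point it shows is that the
driver detach is reached only after the MGID/MLID pair has been found and
removed from the list.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_MCAST 8

/* Hypothetical stand-in for struct ib_uverbs_mcast_entry. */
struct mcast_entry {
	bool used;
	uint16_t mlid;
	uint8_t gid[16];
};

static struct mcast_entry mcast_list[MAX_MCAST];
static int driver_detach_calls;	/* counts calls into the stubbed driver */

/* Stub for the driver detach; always succeeds in this sketch. */
static int driver_detach(const uint8_t *gid, uint16_t mlid)
{
	(void)gid;
	(void)mlid;
	driver_detach_calls++;
	return 0;
}

/* Attach: record the pair in the first free slot. */
static int mcast_attach(const uint8_t *gid, uint16_t mlid)
{
	for (int i = 0; i < MAX_MCAST; i++) {
		if (!mcast_list[i].used) {
			mcast_list[i].used = true;
			mcast_list[i].mlid = mlid;
			memcpy(mcast_list[i].gid, gid, sizeof(mcast_list[i].gid));
			return 0;
		}
	}
	return -ENOMEM;
}

/* Detach: call the driver only when the pair was found and removed. */
static int mcast_detach(const uint8_t *gid, uint16_t mlid)
{
	for (int i = 0; i < MAX_MCAST; i++) {
		struct mcast_entry *e = &mcast_list[i];

		if (e->used && e->mlid == mlid &&
		    !memcmp(e->gid, gid, sizeof(e->gid))) {
			e->used = false;
			return driver_detach(gid, mlid);
		}
	}
	return -EINVAL;
}
```

Detaching a pair that was never attached, or detaching the same pair twice,
returns -EINVAL without touching the driver.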

Fixes: f4e401562c11 ("IB/uverbs: track multicast group membership for userspace QPs")
Cc: stable@vger.kernel.org
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/core/uverbs_cmd.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 7b7a76e..40cd335 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3186,6 +3186,7 @@ ssize_t ib_uverbs_detach_mcast(struct ib_uverbs_file *file,
 	struct ib_qp                 *qp;
 	struct ib_uverbs_mcast_entry *mcast;
 	int                           ret = -EINVAL;
+	bool                          found = false;
 
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
@@ -3194,10 +3195,6 @@ ssize_t ib_uverbs_detach_mcast(struct ib_uverbs_file *file,
 	if (!qp)
 		return -EINVAL;
 
-	ret = ib_detach_mcast(qp, (union ib_gid *) cmd.gid, cmd.mlid);
-	if (ret)
-		goto out_put;
-
 	obj = container_of(qp->uobject, struct ib_uqp_object, uevent.uobject);
 
 	list_for_each_entry(mcast, &obj->mcast_list, list)
@@ -3205,9 +3202,17 @@ ssize_t ib_uverbs_detach_mcast(struct ib_uverbs_file *file,
 		    !memcmp(cmd.gid, mcast->gid.raw, sizeof mcast->gid.raw)) {
 			list_del(&mcast->list);
 			kfree(mcast);
+			found = true;
 			break;
 		}
 
+	if (!found) {
+		ret = -EINVAL;
+		goto out_put;
+	}
+
+	ret = ib_detach_mcast(qp, (union ib_gid *)cmd.gid, cmd.mlid);
+
 out_put:
 	put_qp_write(qp);
 

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 17/20] IB/hfi1: Add transmit fault injection feature
       [not found]     ` <20170321002619.28538.31428.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-04-05 18:34       ` Doug Ledford
       [not found]         ` <1491417255.2923.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2017-04-05 18:34 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Don Hiatt, Mike Marciniszyn

On Mon, 2017-03-20 at 17:26 -0700, Dennis Dalessandro wrote:
> 
>  #ifdef CONFIG_FAULT_INJECTION
> +	debugfs_create_bool("fault_suppress_err", 0600,
> +			    ibd->hfi1_ibdev_dbg,
> +			    &ibd->fault_suppress_err);
>  	fault_init_debugfs(ibd);
>  #endif

...

> +static u64 hfi1_fault_tx(struct rvt_qp *qp, u8 opcode, u64 pbc)
> +{
> +#ifdef CONFIG_HFI1_FAULT_INJECTION
                 ^Looks like you failed to fix this spot up

> +	if ((opcode & IB_OPCODE_MSP) == IB_OPCODE_MSP)
> +		/*
> +		 * In order to drop non-IB traffic we
> +		 * set PbcInsertHrc to NONE (0x2).
> +		 * The packet will still be delivered
> +		 * to the receiving node but a
> +		 * KHdrHCRCErr (KDETH packet with a bad
> +		 * HCRC) will be triggered and the
> +		 * packet will not be delivered to the
> +		 * correct context.
> +		 */
> +		pbc |= (u64)PBC_IHCRC_NONE << PBC_INSERT_HCRC_SHIFT;
> +	else
> +		/*
> +		 * In order to drop regular verbs
> +		 * traffic we set the PbcTestEbp
> +		 * flag. The packet will still be
> +		 * delivered to the receiving node but
> +		 * a 'late ebp error' will be
> +		 * triggered and will be dropped.
> +		 */
> +		pbc |= PBC_TEST_EBP;
> +#endif
> +	return pbc;
> +}
> +
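
For illustration, the effect of the mismatched guard can be reproduced in a
small userspace sketch. The bit value and the function names below are
placeholders, not the driver's real definitions; the only point is that a
guard on a symbol that is never defined compiles the fault path out without
any warning.

```c
#include <stdint.h>

/* Stand-in for the Kconfig symbol that actually exists. */
#define CONFIG_FAULT_INJECTION 1

/* Placeholder bit, NOT the real PBC_TEST_EBP value from the driver. */
#define DEMO_PBC_TEST_EBP (1ULL << 0)

/* Guarded by a symbol nobody defines: the body silently disappears. */
static uint64_t fault_tx_wrong_guard(uint64_t pbc)
{
#ifdef CONFIG_HFI1_FAULT_INJECTION
	pbc |= DEMO_PBC_TEST_EBP;
#endif
	return pbc;
}

/* Guarded by the existing symbol: the fault bit is set. */
static uint64_t fault_tx_right_guard(uint64_t pbc)
{
#ifdef CONFIG_FAULT_INJECTION
	pbc |= DEMO_PBC_TEST_EBP;
#endif
	return pbc;
}
```

Called with a zero PBC, the first helper returns 0 and the second returns the
placeholder bit, so the feature would look enabled in the config yet never
inject a fault.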

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]     ` <20170321002631.28538.2121.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-04-05 18:38       ` Doug Ledford
       [not found]         ` <1491417489.2923.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2017-04-05 18:38 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

On Mon, 2017-03-20 at 17:26 -0700, Dennis Dalessandro wrote:
> +static unsigned int hfi1_no_user_mr_percpu;
> +module_param_named(no_user_mr_percpu, hfi1_no_user_mr_percpu, uint,
> +                  S_IRUGO);
> +MODULE_PARM_DESC(no_user_mr_percpu,
> +                "Avoid percpu refcount for user MRs (default 0)");
> +

Does this have to be a module parameter?  Those are frowned upon
nowadays...

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 17/20] IB/hfi1: Add transmit fault injection feature
       [not found]         ` <1491417255.2923.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-05 18:49           ` Dennis Dalessandro
  0 siblings, 0 replies; 42+ messages in thread
From: Dennis Dalessandro @ 2017-04-05 18:49 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Don Hiatt, Mike Marciniszyn

On 04/05/2017 02:34 PM, Doug Ledford wrote:
> On Mon, 2017-03-20 at 17:26 -0700, Dennis Dalessandro wrote:
>>
>>  #ifdef CONFIG_FAULT_INJECTION
>> +	debugfs_create_bool("fault_suppress_err", 0600,
>> +			    ibd->hfi1_ibdev_dbg,
>> +			    &ibd->fault_suppress_err);
>>  	fault_init_debugfs(ibd);
>>  #endif
>
> ...
>
>> +static u64 hfi1_fault_tx(struct rvt_qp *qp, u8 opcode, u64 pbc)
>> +{
>> +#ifdef CONFIG_HFI1_FAULT_INJECTION
>                  ^Looks like you failed to fix this spot up

Doh! Will fix that up and re-send.

-Denny



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 00/20] IB/hfi1, qib, rdmavt: Another round of patches for 4.11
  2017-03-21  0:24 ` Dennis Dalessandro
                   ` (2 preceding siblings ...)
  (?)
@ 2017-04-05 18:50 ` Doug Ledford
  -1 siblings, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2017-04-05 18:50 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: Mike Marciniszyn, Dean Luick, Jakub Byczkowski, Tadeusz Struk,
	linux-rdma, Ira Weiny, Brian Welty, Easwar Hariharan, stable,
	Leon Romanovsky, Michael J. Ruhl, Don Hiatt, Sebastian Sanchez

On Mon, 2017-03-20 at 17:24 -0700, Dennis Dalessandro wrote:
> Doug,
> Here is another round of patches for 4.11. Included with the usual
> bug fixes
> and general improvements of particular interest are new versions of
> the two
> patches that you didn't take for the first set. The fault injection
> stuff.
> We decided to go ahead and use the already existing config variable
> for those.
> The other interesting thing here is a patch to the IB core for
> MGID/MLID
> checking.
> 
> Patches apply on top of Linus' master branch which includes your most
> recent
> pull request so this should apply equally well to your tree. Patches
> can 
> also be found in my GitHub repo at:
> https://github.com/ddalessa/kernel/tree/for-4.11

Hi Denny,

I've done a partial pull of this set.  I took the first 18 patches (I
fixed up the config option issue I brought up).  The 19th patch I'm
waiting for you to tell me if it would be possible to do something
other than a kernel module option, and patch 20 didn't apply and I
didn't take the time to try and apply it today (I can look at it later,
but I wanted to get your first 18 pushed out before I left for an appt
this afternoon).

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]         ` <1491417489.2923.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-05 19:46           ` Dennis Dalessandro
       [not found]             ` <f008c532-340e-01f2-80e6-4bea74175e3e-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2017-04-05 19:47           ` Leon Romanovsky
  1 sibling, 1 reply; 42+ messages in thread
From: Dennis Dalessandro @ 2017-04-05 19:46 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Mike Marciniszyn

On 04/05/2017 02:38 PM, Doug Ledford wrote:
> On Mon, 2017-03-20 at 17:26 -0700, Dennis Dalessandro wrote:
>> +static unsigned int hfi1_no_user_mr_percpu;
>> +module_param_named(no_user_mr_percpu, hfi1_no_user_mr_percpu, uint,
>> +                  S_IRUGO);
>> +MODULE_PARM_DESC(no_user_mr_percpu,
>> +                "Avoid percpu refcount for user MRs (default 0)");
>> +
>
> Does this have to be a module parameter?  Those are frowned upon
> nowadays...
>

Yeah I don't like it either really, but there is a pretty big tradeoff 
between memory deregistration improvement and cost to the data path. 
Most use cases care about the data path but some uses register and 
deregister memory a lot and this helps them.

Is there another way we should be looking at for setting things like this?

-Denny

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]         ` <1491417489.2923.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-04-05 19:46           ` Dennis Dalessandro
@ 2017-04-05 19:47           ` Leon Romanovsky
  1 sibling, 0 replies; 42+ messages in thread
From: Leon Romanovsky @ 2017-04-05 19:47 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Dennis Dalessandro, linux-rdma, Mike Marciniszyn

On Wed, Apr 5, 2017 at 9:38 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, 2017-03-20 at 17:26 -0700, Dennis Dalessandro wrote:
>> +static unsigned int hfi1_no_user_mr_percpu;
>> +module_param_named(no_user_mr_percpu, hfi1_no_user_mr_percpu, uint,
>> +                  S_IRUGO);
>> +MODULE_PARM_DESC(no_user_mr_percpu,
>> +                "Avoid percpu refcount for user MRs (default 0)");
>> +
>
> Does this have to be a module parameter?  Those are frowned upon
> nowadays...

Good catch. A driver should not require a module parameter, which is
a headache to set when the driver is compiled into the kernel.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]             ` <f008c532-340e-01f2-80e6-4bea74175e3e-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-04-05 19:51               ` Leon Romanovsky
       [not found]                 ` <CALq1K=JsjSCiSBeZVe4kHQmjw7tznL36JcsamZTVGZ5RhBvZPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Leon Romanovsky @ 2017-04-05 19:51 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Doug Ledford, linux-rdma, Mike Marciniszyn

On Wed, Apr 5, 2017 at 10:46 PM, Dennis Dalessandro
<dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> On 04/05/2017 02:38 PM, Doug Ledford wrote:
>>
>> On Mon, 2017-03-20 at 17:26 -0700, Dennis Dalessandro wrote:
>>>
>>> +static unsigned int hfi1_no_user_mr_percpu;
>>> +module_param_named(no_user_mr_percpu, hfi1_no_user_mr_percpu, uint,
>>> +                  S_IRUGO);
>>> +MODULE_PARM_DESC(no_user_mr_percpu,
>>> +                "Avoid percpu refcount for user MRs (default 0)");
>>> +
>>
>>
>> Does this have to be a module parameter?  Those are frowned upon
>> nowadays...
>>
>
> Yeah I don't like it either really, but there is a pretty big tradeoff
> between memory deregistration improvement and cost to the data path. Most
> use cases care about the data path but some uses register and deregister
> memory a lot and this helps them.
>
> Is there another way we should be looking at for setting things like this?

Use vendor channel interface to configure your driver.

>
> -Denny

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                 ` <CALq1K=JsjSCiSBeZVe4kHQmjw7tznL36JcsamZTVGZ5RhBvZPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-04-05 20:09                   ` Marciniszyn, Mike
       [not found]                     ` <32E1700B9017364D9B60AED9960492BC342EA858-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Marciniszyn, Mike @ 2017-04-05 20:09 UTC (permalink / raw)
  To: Leon Romanovsky, Dalessandro, Dennis; +Cc: Doug Ledford, linux-rdma

 > Is there another way we should be looking at for setting things like this?
> 
> Use vendor channel interface to configure your driver.
> 

What is that? configfs or something else?

Mike

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                     ` <32E1700B9017364D9B60AED9960492BC342EA858-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-04-06  7:49                       ` Leon Romanovsky
       [not found]                         ` <20170406074955.GG2269-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Leon Romanovsky @ 2017-04-06  7:49 UTC (permalink / raw)
  To: Marciniszyn, Mike; +Cc: Dalessandro, Dennis, Doug Ledford, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

On Wed, Apr 05, 2017 at 08:09:16PM +0000, Marciniszyn, Mike wrote:
>  > Is there another way we should be looking at for setting things like this?
> >
> > Use vendor channel interface to configure your driver.
> >
>
> What is that? configfs or something else?

An immediate answer without digging into your code is Matan's KABI work.
https://github.com/matanb10/linux/tree/abi-devel-latest

However, I have a question: how do you ensure that user memory has no
users without refcounts? Will it be possible to dereg the memory despite
the fact that there are users?

Thanks

>
> Mike

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                         ` <20170406074955.GG2269-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-04-06 11:45                           ` Dennis Dalessandro
       [not found]                             ` <8cdf2fbb-f2a9-0b4b-b144-397ee73d1569-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Dennis Dalessandro @ 2017-04-06 11:45 UTC (permalink / raw)
  To: Leon Romanovsky, Marciniszyn, Mike; +Cc: Doug Ledford, linux-rdma

On 04/06/2017 03:49 AM, Leon Romanovsky wrote:
> On Wed, Apr 05, 2017 at 08:09:16PM +0000, Marciniszyn, Mike wrote:
>>  > Is there another way we should be looking at for setting things like this?
>>>
>>> Use vendor channel interface to configure your driver.
>>>
>>
>> What is that? configfs or something else?
>
> An immediate answer without digging into your code is Matan's KABI work.
> https://github.com/matanb10/linux/tree/abi-devel-latest

Until that code is formally accepted and actually in the kernel we can't 
base our changes that are ready to go now (for 4.12) on it.

> However, I have a question: how do you ensure that user memory has no
> users without refcounts? Will it be possible to dereg the memory despite
> the fact that there are users?

Mike can correct me if I'm wrong but it is still refcounted. Just not 
per CPU, global if you will.

-Denny


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                             ` <8cdf2fbb-f2a9-0b4b-b144-397ee73d1569-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-04-06 12:37                               ` Leon Romanovsky
       [not found]                                 ` <20170406123726.GH2269-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  2017-04-06 14:47                               ` Marciniszyn, Mike
  1 sibling, 1 reply; 42+ messages in thread
From: Leon Romanovsky @ 2017-04-06 12:37 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Marciniszyn, Mike, Doug Ledford, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 1865 bytes --]

On Thu, Apr 06, 2017 at 07:45:16AM -0400, Dennis Dalessandro wrote:
> On 04/06/2017 03:49 AM, Leon Romanovsky wrote:
> > On Wed, Apr 05, 2017 at 08:09:16PM +0000, Marciniszyn, Mike wrote:
> > >  > Is there another way we should be looking at for setting things like this?
> > > >
> > > > Use vendor channel interface to configure your driver.
> > > >
> > >
> > > What is that? configfs or something else?
> >
> > An immediate answer without digging into your code is Matan's KABI work.
> > https://github.com/matanb10/linux/tree/abi-devel-latest
>
> Until that code is formally accepted and actually in the kernel we can't
> base our changes that are ready to go now (for 4.12) on it.

And we can't accept module parameters. I already presented my setup,
which I know is in use by many people: a standalone kernel with everything
compiled in, where everything runs from a small read-only image without
distro bloat. It gives a very small footprint, very fast execution, and
system protection.

In such a case, we would have to rebuild the whole image to update the
command line for one module parameter, and we would need to do it just
for a specific application, which is insane.

I'm glad that you now realize the importance of Matan's work and how it
can help overcome your current problems. You (Intel) are invited to help
him get it done faster.

>
> > However, I have a question: how do you ensure that user memory has no
> > users without refcounts? Will it be possible to dereg the memory despite
> > the fact that there are users?
>
> Mike can correct me if I'm wrong but it is still refcounted. Just not per
> CPU, global if you will.

IMHO, global will always be more expensive than percpu, due to locality.
However, your patch presents a different picture. You are claiming that
removing percpu_refcnt and leaving a global one will make things faster.
How can that be?

Thanks

>
> -Denny
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                 ` <20170406123726.GH2269-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-04-06 13:00                                   ` Dennis Dalessandro
       [not found]                                     ` <f1703866-9c5c-a30a-0d95-9f6a33cc4f75-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Dennis Dalessandro @ 2017-04-06 13:00 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Marciniszyn, Mike, Doug Ledford, linux-rdma

On 04/06/2017 08:37 AM, Leon Romanovsky wrote:
> On Thu, Apr 06, 2017 at 07:45:16AM -0400, Dennis Dalessandro wrote:
>> On 04/06/2017 03:49 AM, Leon Romanovsky wrote:
>>> On Wed, Apr 05, 2017 at 08:09:16PM +0000, Marciniszyn, Mike wrote:
>>>>  > Is there another way we should be looking at for setting things like this?
>>>>>
>>>>> Use vendor channel interface to configure your driver.
>>>>>
>>>>
>>>> What is that? configfs or something else?
>>>
>>> An immediate answer without digging into your code is Matan's KABI work.
>>> https://github.com/matanb10/linux/tree/abi-devel-latest
>>
>> Until that code is formally accepted and actually in the kernel we can't
>> base our changes that are ready to go now (for 4.12) on it.
>
> And we can't accept module parameters. I already presented my setup,
> which I know is in use by many people: a standalone kernel with everything
> compiled in, where everything runs from a small read-only image without
> distro bloat. It gives a very small footprint, very fast execution, and
> system protection.
>
> In such a case, we would have to rebuild the whole image to update the
> command line for one module parameter, and we would need to do it just
> for a specific application, which is insane.

In the very rare case that you care more about making mem dereg 
faster at the expense of the data path, you would have to do just that. 
But for the vast majority of use cases the default is what you want: 
keep the performance benefits to the data path at the cost of memory dereg.

> I'm glad that you now realize the importance of Matan's work and how it
> can help overcome your current problems. You (Intel) are invited to help
> him get it done faster.

This is the same stuff that a year ago was claimed it would only take a 
couple weeks. In my opinion it's still more than one release out. When 
it's done and Linus has accepted it, it's a different story.

>>
>>> However, I have a question: how do you ensure that user memory has no
>>> users without refcounts? Will it be possible to dereg the memory despite
>>> the fact that there are users?
>>
>> Mike can correct me if I'm wrong but it is still refcounted. Just not per
>> CPU, global if you will.
>
> IMHO, global will always be more expensive than percpu, due to locality.
> However, your patch presents a different picture. You are claiming that
> removing percpu_refcnt and leaving a global one will make things faster.
> How can that be?

Mike can explain better I'm sure but the gist of it is there is an 
implicit RCU delay waiting for the async per CPU model to quiesce ref 
counts to zero in the de-reg.

So in the per CPU case other things can keep going on in the meantime, 
which helps packet reception and posting of sends: that is the benefit 
to the data path. However, this comes at a cost to the de-reg.

So in the rare event that you want the de-reg to be faster and are 
willing to take the data path hit, it's better to do it atomically 
across all CPUs.
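
The trade-off can be illustrated with a small userspace model. This is a
sketch under stated assumptions, not the kernel's percpu_ref implementation:
the slot count, the struct, and the demo_* names are invented for the
example, the model is single-threaded, and the switch to atomic mode here
only folds the slots together, whereas the kernel also has to wait for an
RCU grace period, which is where the de-reg latency comes from.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4	/* hypothetical CPU count for the model */

struct demo_ref {
	bool percpu_mode;		/* true: get/put touch only one slot */
	long slot[DEMO_NR_CPUS];	/* per-CPU deltas, may go negative */
	atomic_long shared;		/* single counter used in atomic mode */
};

static void demo_ref_init(struct demo_ref *r, bool percpu_mode)
{
	r->percpu_mode = percpu_mode;
	for (int i = 0; i < DEMO_NR_CPUS; i++)
		r->slot[i] = 0;
	atomic_init(&r->shared, 1);	/* initial reference held by the owner */
}

/* Data path: a plain increment in percpu mode, an atomic RMW otherwise. */
static void demo_ref_get(struct demo_ref *r, int cpu)
{
	if (r->percpu_mode)
		r->slot[cpu]++;
	else
		atomic_fetch_add(&r->shared, 1);
}

static void demo_ref_put(struct demo_ref *r, int cpu)
{
	if (r->percpu_mode)
		r->slot[cpu]--;
	else
		atomic_fetch_sub(&r->shared, 1);
}

/*
 * De-reg path: drop the owner's reference and report whether the object
 * is now unused. In percpu mode every slot must first be folded into the
 * shared counter; that extra step is the modeled cost of the percpu mode.
 */
static bool demo_ref_kill(struct demo_ref *r)
{
	if (r->percpu_mode) {
		for (int i = 0; i < DEMO_NR_CPUS; i++) {
			atomic_fetch_add(&r->shared, r->slot[i]);
			r->slot[i] = 0;
		}
		r->percpu_mode = false;
	}
	return atomic_fetch_sub(&r->shared, 1) == 1;
}
```

In this model a get on one CPU and a put on another leave offsetting values
in two slots, and only the fold in demo_ref_kill() can tell that the count
has really reached zero.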

-Denny

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                     ` <f1703866-9c5c-a30a-0d95-9f6a33cc4f75-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-04-06 13:33                                       ` Leon Romanovsky
  0 siblings, 0 replies; 42+ messages in thread
From: Leon Romanovsky @ 2017-04-06 13:33 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Marciniszyn, Mike, Doug Ledford, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 3620 bytes --]

On Thu, Apr 06, 2017 at 09:00:12AM -0400, Dennis Dalessandro wrote:
> On 04/06/2017 08:37 AM, Leon Romanovsky wrote:
> > On Thu, Apr 06, 2017 at 07:45:16AM -0400, Dennis Dalessandro wrote:
> > > On 04/06/2017 03:49 AM, Leon Romanovsky wrote:
> > > > On Wed, Apr 05, 2017 at 08:09:16PM +0000, Marciniszyn, Mike wrote:
> > > > >  > Is there another way we should be looking at for setting things like this?
> > > > > >
> > > > > > Use vendor channel interface to configure your driver.
> > > > > >
> > > > >
> > > > > What is that? configfs or something else?
> > > >
> > > > An immediate answer without digging into your code is Matan's KABI work.
> > > > https://github.com/matanb10/linux/tree/abi-devel-latest
> > >
> > > Until that code is formally accepted and actually in the kernel we can't
> > > base our changes that are ready to go now (for 4.12) on it.
> >
> > And we can't accept module parameters. I already presented my setup,
> > which I know is in use by many people: a standalone kernel with everything
> > compiled in, where everything runs from a small read-only image without
> > distro bloat. It gives a very small footprint, very fast execution, and
> > system protection.
> >
> > In such a case, we would have to rebuild the whole image to update the
> > command line for one module parameter, and we would need to do it just
> > for a specific application, which is insane.
>
> In the very rare case that you care more about making mem dereg faster at
> the expense of the data path, you would have to do just that. But for the
> vast majority of use cases the default is what you want: keep the
> performance benefits to the data path at the cost of memory dereg.

I strongly suspect that these module parameters won't work in a
secure boot environment either.

>
> > I'm glad that you now realize the importance of Matan's work and how it
> > can help overcome your current problems. You (Intel) are invited to help
> > him get it done faster.
>
> This is the same stuff that a year ago was claimed it would only take a
> couple weeks. In my opinion it's still more than one release out. When it's
> done and Linus has accepted it, it's a different story.

I would put money on it: if Doug hadn't accepted your ioctl patches at the
beginning, we would have converged much faster. I am confident that Doug
won't put the RDMA subsystem into a confrontation with kernel core
development just because it is the easiest/fastest track.

>
> > >
> > > > However, I have a question: how do you ensure that user memory has no
> > > > users without refcounts? Will it be possible to dereg the memory despite
> > > > the fact that there are users?
> > >
> > > Mike can correct me if I'm wrong but it is still refcounted. Just not per
> > > CPU, global if you will.
> >
> > IMHO, global will always be more expensive than percpu, due to locality.
> > However, your patch presents a different picture. You are claiming that
> > removing percpu_refcnt and leaving a global one will make things faster.
> > How can that be?
>
> Mike can explain better I'm sure but the gist of it is there is an implicit
> RCU delay waiting for the async per CPU model to quiesce ref counts to zero
> in the de-reg.
>
> So in the per CPU case other things can keep going on in the meantime,
> which helps packet reception and posting of sends: that is the benefit to
> the data path. However, this comes at a cost to the de-reg.
>
> So in the rare event that you want the de-reg to be faster and are willing
> to take the data path hit, it's better to do it atomically across all CPUs.

Again, it is an application hint, not a hint to the whole kernel, which
is what a module parameter is intended to be.

Thanks

>
> -Denny

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                             ` <8cdf2fbb-f2a9-0b4b-b144-397ee73d1569-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2017-04-06 12:37                               ` Leon Romanovsky
@ 2017-04-06 14:47                               ` Marciniszyn, Mike
       [not found]                                 ` <32E1700B9017364D9B60AED9960492BC342EABD0-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Marciniszyn, Mike @ 2017-04-06 14:47 UTC (permalink / raw)
  To: Dalessandro, Dennis, Leon Romanovsky; +Cc: Doug Ledford, linux-rdma

> > However, I have a question: how do you ensure that user memory has no
> > users without refcounts? Will it be possible to dereg the memory
> > despite the fact that there are users?
> 
> Mike can correct me if I'm wrong but it is still refcounted. Just not per CPU,
> global if you will.
> 

The details on appropriate usage are in include/linux/percpu-refcount.h.

The counters "can" operate in percpu or atomic mode based on an initialization choice.   Percpu mode is STILL reference counting.

At the point that a counted object is heading out the door, the counter is changed to atomic mode with an implicit RCU barrier in percpu_ref_kill().

frmr, dma, and user MRs transparently use the percpu mode with a significant benefit for data path operations.

There were two patches in the patch series, one reworks the rdmavt read side MR to allow for getting rid of an explicit RCU at deref time.   The second module parameter was intended to allow for a faster memory registration for user MRs as an option.

Ideally, the choice SHOULD be addressed in the MR registration code as a "fast dereg" option in ib_access_flags?
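
As a rough sketch of what such a per-registration choice could look like,
assuming a flag bit that does not exist today: DEMO_ACCESS_FAST_DEREG and the
demo_* names below are hypothetical and are not part of enum ib_access_flags
or rdmavt. In the kernel, the atomic case would presumably map to passing
PERCPU_REF_INIT_ATOMIC to percpu_ref_init().

```c
#include <stdbool.h>

/*
 * Hypothetical flag bit for illustration only; it is NOT part of the
 * real enum ib_access_flags.
 */
#define DEMO_ACCESS_FAST_DEREG (1u << 30)

enum demo_ref_mode {
	DEMO_REF_PERCPU,	/* cheap get/put, slower teardown */
	DEMO_REF_ATOMIC,	/* atomic get/put, faster teardown */
};

/*
 * Pick the refcount mode per MR instead of per module. Kernel-owned MRs
 * (frmr, dma) always keep the percpu mode in this sketch; only a user MR
 * that asked for fast dereg gets the atomic mode.
 */
static enum demo_ref_mode demo_pick_ref_mode(bool user_mr, unsigned int access)
{
	if (user_mr && (access & DEMO_ACCESS_FAST_DEREG))
		return DEMO_REF_ATOMIC;
	return DEMO_REF_PERCPU;
}
```

The design point is that the application that registers and deregisters
memory frequently opts in for its own MRs, while every other consumer keeps
the default data path behavior.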

Mike





^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                 ` <32E1700B9017364D9B60AED9960492BC342EABD0-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-04-06 17:13                                   ` Jason Gunthorpe
       [not found]                                     ` <20170406171354.GA19854-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2017-04-06 17:13 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Dalessandro, Dennis, Leon Romanovsky, Doug Ledford, linux-rdma

On Thu, Apr 06, 2017 at 02:47:56PM +0000, Marciniszyn, Mike wrote:

> There were two patches in the patch series, one reworks the rdmavt
> read side MR to allow for getting rid of an explicit RCU at deref
> time.  The second module parameter was intended to allow for a
> faster memory registration for user MRs as an option.

If it is faster why wouldn't you always just use that mode?

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                     ` <20170406171354.GA19854-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-04-06 17:16                                       ` Marciniszyn, Mike
       [not found]                                         ` <32E1700B9017364D9B60AED9960492BC342EADEE-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Marciniszyn, Mike @ 2017-04-06 17:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dalessandro, Dennis, Leon Romanovsky, Doug Ledford, linux-rdma

> > There were two patches in the patch series, one reworks the rdmavt
> > read side MR to allow for getting rid of an explicit RCU at deref
> > time.  The second module parameter was intended to allow for a faster
> > memory registration for user MRs as an option.
> 
> If it is faster why wouldn't you always just use that mode?
> 

It would be faster MR deregistration, but a slower data path, since atomic operations would then be done.

Mike



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                         ` <32E1700B9017364D9B60AED9960492BC342EADEE-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-04-06 17:44                                           ` Jason Gunthorpe
       [not found]                                             ` <20170406174438.GA20020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2017-04-06 17:44 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Dalessandro, Dennis, Leon Romanovsky, Doug Ledford, linux-rdma

On Thu, Apr 06, 2017 at 05:16:31PM +0000, Marciniszyn, Mike wrote:
> > > There were two patches in the patch series, one reworks the rdmavt
> > > read side MR to allow for getting rid of an explicit RCU at deref
> > > time.  The second module parameter was intended to allow for a faster
> > > memory registration for user MRs as an option.
> > 
> > If it is faster why wouldn't you always just use that mode?
> 
> It would be faster mr deregistration, but slower data path since
> atomic operations would then be done.

Umm.. This doesn't look like a refcount, it is a rwlock - why aren't
you using the optimized percpu_rwsem?

Seriously, making a rwlock out of a completion and a percpu_refcount
and then providing user options to micro-optimize it is fantastically
ugly/bad taste.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                             ` <20170406174438.GA20020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-04-07 21:12                                               ` Marciniszyn, Mike
       [not found]                                                 ` <32E1700B9017364D9B60AED9960492BC342EBA18-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Marciniszyn, Mike @ 2017-04-07 21:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dalessandro, Dennis, Leon Romanovsky, Doug Ledford, linux-rdma

> Umm.. This doesn't look like a refcount, it is a rwlock - why aren't you using
> the optimized percpu_rwsem?
> 

The refcount with a completion has been in qib and rdmavt for years without issue.

It is the best way to express having an MR's underlying pages held until all of the "users" have finished.

When the count gets to zero, the completion is triggered and allows a deregistration to complete.
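
As a rough illustration of the pattern described above, here is a minimal userspace sketch of a reference count paired with a completion. It is not the rdmavt code: the names (mr_sketch, mr_get, mr_put, mr_dereg) are hypothetical, a C11 atomic flag stands in for the kernel's struct completion, and the wait is a spin loop where a real implementation would sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an MR; not the rdmavt structure. */
struct mr_sketch {
    atomic_int  refcount;  /* one shared counter: every get/put touches it */
    atomic_bool done;      /* stands in for a kernel struct completion */
};

static void mr_init(struct mr_sketch *mr)
{
    atomic_init(&mr->refcount, 1);  /* initial reference held by the registration */
    atomic_init(&mr->done, false);
}

/* Taken by each "user" (e.g. a key validation) before touching the pages. */
static void mr_get(struct mr_sketch *mr)
{
    atomic_fetch_add(&mr->refcount, 1);
}

/* Dropped when the user is finished; the last put signals the completion. */
static void mr_put(struct mr_sketch *mr)
{
    int old = atomic_fetch_sub(&mr->refcount, 1);

    assert(old > 0);                /* a put without a matching get is a bug */
    if (old == 1)
        atomic_store(&mr->done, true);
}

/* Deregistration: drop the initial reference, then wait for all users. */
static void mr_dereg(struct mr_sketch *mr)
{
    mr_put(mr);
    while (!atomic_load(&mr->done))
        ;  /* sketch only: real code would block rather than spin */
}
```

Every mr_get() and mr_put() here modifies the same counter, which is the source of the cache bouncing discussed next.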

What we have found is that as lkey and rkey validation calls scale out to many cores/threads, the atomic operation causes excessive cache bouncing.

That is the motivation for using the percpu reference counting: to avoid the bouncing in scaled out data path operations.
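
The idea behind a per-CPU count can be sketched in userspace as a set of cache-line-padded shards, one per CPU, so that a get or put on one CPU does not dirty a line owned by another. This is only an approximation under stated assumptions: NSHARDS and CACHELINE are made-up constants, the caller passes a CPU index explicitly, and the kernel's percpu_ref additionally switches to a single atomic counter once it has been killed, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>

#define NSHARDS   8   /* assumption: stands in for the number of CPUs */
#define CACHELINE 64  /* assumption: typical cache-line size in bytes */

/* Each shard sits on its own cache line to avoid false sharing. */
struct shard {
    _Alignas(CACHELINE) atomic_long count;
};

struct sharded_ref {
    struct shard shards[NSHARDS];
};

static void sharded_init(struct sharded_ref *r)
{
    for (int i = 0; i < NSHARDS; i++)
        atomic_init(&r->shards[i].count, 0);
}

/* Fast path: touch only the shard belonging to the calling CPU. */
static void sharded_get(struct sharded_ref *r, unsigned int cpu)
{
    assert(r);
    atomic_fetch_add_explicit(&r->shards[cpu % NSHARDS].count, 1,
                              memory_order_relaxed);
}

/* A put may land on a different shard than its get, so a single shard
 * can go negative; only the sum over all shards is meaningful. */
static void sharded_put(struct sharded_ref *r, unsigned int cpu)
{
    assert(r);
    atomic_fetch_sub_explicit(&r->shards[cpu % NSHARDS].count, 1,
                              memory_order_relaxed);
}

/* Slow path: walk every shard. Only needed at teardown time. */
static long sharded_sum(struct sharded_ref *r)
{
    long total = 0;

    for (int i = 0; i < NSHARDS; i++)
        total += atomic_load_explicit(&r->shards[i].count,
                                      memory_order_relaxed);
    return total;
}
```

The trade-off is visible in the shape of the code: get and put are cheap and local, while learning whether the count has reached zero requires visiting every shard.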

The API documentation is in include/linux/percpu-refcount.h and lib/percpu-refcount.c; our use is consistent with other callers and was pretty much a drop-in replacement for our older atomic operations.

From the above header file:
 * This implements a refcount with similar semantics to atomic_t - atomic_inc(),
 * atomic_dec_and_test() - but percpu.

The percpu rwsem seems more a drop-in replacement for the older rwlock_t stuff.

> ... micro-optimize it is fantastically ugly/bad
> taste.
> 

All this being said, we have encountered a use case where the MR is short lived and supports just one transaction.

In that case, the RCU quiescence during deregistration IS the performance bottleneck. As cores scale out, the RCU grace period can cause large delays.

I have a prototype patch to pass a hint (no module parameter) to the user MR registration via the access flags.

Before (no hint):
    -- Alloc memory: 4 us
    -- Zero memory: 1933 us
    -- Register: 112 us
    -- Unregister: 6086 us <------
    -- Free memory: 89 us

After (with hint):
    -- Alloc memory: 7 us
    -- Zero memory: 1929 us
    -- Register: 111 us
    -- Unregister: 49 us   <------
    -- Free memory: 85 us

I don't think a two order of magnitude improvement is a micro optimization.
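
For reference, the arithmetic behind the two sets of timings can be checked with a few lines of C. The numbers are copied from the runs above; the helper names (total_us, ratio) are only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Per-phase timings in microseconds, in the order listed above:
 * alloc, zero, register, unregister, free. */
static const long before_us[] = { 4, 1933, 112, 6086, 89 };
static const long after_us[]  = { 7, 1929, 111,   49, 85 };

/* Sum of all phases for one run. */
static long total_us(const long *phase, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += phase[i];
    return total;
}

/* How many times larger "before" is than "after". */
static double ratio(long before, long after)
{
    assert(after > 0);
    return (double)before / (double)after;
}
```

ratio(6086, 49) is about 124, so the unregister step alone improves by two orders of magnitude. The totals are 8224 us before and 2181 us after, roughly a 3.8x gain for the whole cycle, since zeroing the memory is unaffected by the hint.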

Note that in percpu-rw-semaphore.txt:
   Locking for reading is very fast, it uses RCU and it avoids any atomic
   instruction in the lock and unlock path. On the other hand, locking for
   writing is very expensive, it calls synchronize_rcu() that can take
   hundreds of milliseconds.

So the RCU grace period is problematic in this context as well.

Mike

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                                 ` <32E1700B9017364D9B60AED9960492BC342EBA18-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-04-07 22:06                                                   ` Jason Gunthorpe
       [not found]                                                     ` <20170407220618.GA29138-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2017-04-07 22:06 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Dalessandro, Dennis, Leon Romanovsky, Doug Ledford, linux-rdma

On Fri, Apr 07, 2017 at 09:12:34PM +0000, Marciniszyn, Mike wrote:
> > Umm.. This doesn't look like a refcount, it is a rwlock - why aren't you using
> > the optimized percpu_rwsem?
> > 
> 
> The refcount with a completion has been in qib and rdmavt for years
> without issue.

Doesn't change the fact this isn't a refcount behavior, it is a rwsem
with write lock on destroy. A proper refcount would destroy the object,
not call a completion.

Doing things properly using the common primitives makes stuff work
better, eg percpu_rwsem has sane lockdep.

> All this being said, we have encountered a use case where the MR is
> short lived and supports just one transaction.

Well, yes, that is a pretty common idiom in kernel workloads too..

> I have a prototype patch to pass a hint (no module parameter) to the
> user MR registration via the access flags.

Okay, so you'd have an IBV_MR_MULTI_THREADED to enable the RCU
optimization?

That seems sort of consistent with some of the other flags we've had
in the past (eg single threaded CQ polling optimization)

> I don't think a two order of magnitude improvement is a micro optimization.

The micro optimization was trying to optimize rwlock with percpu and
RCU. The two order of magnitude penalty on the destroy and the new
need for tuning knobs is the penalty for that.

I doubt the percpu optimization was two orders of magnitude..

> So the RCU grace period is problematic in this context as well.

Of course, RCU is not designed to have these kinds of performance
characteristics. If you define destroy to be a hot path then you can't
use RCU here, the worst case RCU grace period times are potentially
quite big..

This is why you shouldn't have the RCU optimization on by default at
all.

Usually RCU grace period latency is solved by deferring the write side
to an async rcu grace period callback - why not do that instead of
adding a flag? It feels like destroy is a reasonable candidate to do
that kind of trick.
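
A userspace sketch of the deferral idea, for illustration only: the destroy path queues a callback and returns at once, and the teardown runs later when the queue is drained. The shape loosely mimics the kernel's call_rcu(head, func), but nothing here is RCU; defer_call(), run_deferred() and struct mr_obj are hypothetical names, and draining the queue stands in for the end of a grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Embedded in the object to be torn down, like a kernel rcu_head. */
struct defer_head {
    struct defer_head *next;
    void (*func)(struct defer_head *head);
};

static struct defer_head *pending;  /* callbacks not yet run */

/* Queue a callback and return immediately; the caller never waits. */
static void defer_call(struct defer_head *head,
                       void (*func)(struct defer_head *head))
{
    assert(head && func);
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Stand-in for "the grace period has ended": run everything queued.
 * Returns the number of callbacks that were invoked. */
static int run_deferred(void)
{
    int n = 0;

    while (pending) {
        struct defer_head *head = pending;

        pending = head->next;
        head->func(head);
        n++;
    }
    return n;
}

/* Hypothetical object with deferred teardown. */
struct mr_obj {
    int torn_down;
    struct defer_head defer;
};

/* Recover the enclosing object from the embedded head, then tear down. */
static void mr_teardown(struct defer_head *head)
{
    struct mr_obj *mr =
        (struct mr_obj *)((char *)head - offsetof(struct mr_obj, defer));

    mr->torn_down = 1;
}

/* Destroy path: cost is one list insertion, regardless of wait time. */
static void mr_dereg_async(struct mr_obj *mr)
{
    defer_call(&mr->defer, mr_teardown);
}
```

The object must stay valid until its callback has run, so the caller cannot free or reuse it right after mr_dereg_async() returns.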

Perhaps some kind of enhancement to percpu_rwsem such that it would
asynchronously call a function with the write side lock held? Looks
not too hard..

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs
       [not found]                                                     ` <20170407220618.GA29138-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-04-09  6:26                                                       ` Leon Romanovsky
  0 siblings, 0 replies; 42+ messages in thread
From: Leon Romanovsky @ 2017-04-09  6:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, linux-rdma

On Fri, Apr 07, 2017 at 04:06:18PM -0600, Jason Gunthorpe wrote:
> On Fri, Apr 07, 2017 at 09:12:34PM +0000, Marciniszyn, Mike wrote:
> > > Umm.. This doesn't look like a refcount, it is a rwlock - why aren't you using
> > > the optimized percpu_rwsem?
> > >
> >
> > The refcount with a completion has been in qib and rdmavt for years
> > without issue.
>
> Doesn't change the fact this isn't a refcount behavior, it is a rwsem
> with write lock on destroy. A proper refcount would destroy the object,
> not call a completion.
>
> Doing things properly using the common primitives makes stuff work
> better, eg percpu_rwsem has sane lockdep.
>
> > All this being said, we have encountered a use case where the MR is
> > short lived and supports just one transaction.
>
> Well, yes, that is a pretty common idiom in kernel workloads too..
>
> > I have a prototype patch to pass a hint (no module parameter) to the
> > user MR registration via the access flags.
>
> Okay, so you'd have an IBV_MR_MULTI_THREADED to enable the RCU
> optimization?

The RCU optimization is not needed for kernel paths.
There is a get_nr_threads(struct task_struct *tsk) call to get the number of threads.
However, I don't know if it is appropriate to use that function in driver code.

If the goal is to optimize the user space drivers, the flag will indeed be needed.

>
> That seems sort of consistent with some of the other flags we've had
> in the past (eg single threaded CQ polling optimization)
>
> > I don't think a two order of magnitude improvement is a micro optimization.
>
> The micro optimization was trying to optimize rwlock with percpu and
> RCU. The two order of magnitude penalty on the destroy and the new
> need for tuning knobs is the penalty for that.
>
> I doubt the percpu optimization was two orders of magnitude..
>
> > So the RCU grace period is problematic in this context as well.
>
> Of course, RCU is not designed to have these kinds of performance
> characteristics. If you define destroy to be a hot path then you can't
> use RCU here, the worst case RCU grace period times are potentially
> quite big..
>
> This is why you shouldn't have the RCU optimization on by default at
> all.
>
> Usually RCU grace period latency is solved by deferring the write side
> to an async rcu grace period callback - why not do that instead of
> adding a flag? It feels like destroy is a reasonable candidate to do
> that kind of trick.
>
> Perhaps some kind of enhancement to percpu_rwsem such that it would
> asynchronously call a function with the write side lock held? Looks
> not too hard..
>
> Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2017-04-09  6:26 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
2017-03-21  0:24 [PATCH v2 00/20] IB/hfi1, qib, rdmavt: Another round of patches for 4.11 Dennis Dalessandro
2017-03-21  0:24 ` Dennis Dalessandro
     [not found] ` <20170321001900.28538.38175.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
2017-03-21  0:24   ` [PATCH v2 01/20] IB/hfi1: Force logical link down Dennis Dalessandro
2017-03-21  0:24   ` [PATCH v2 02/20] IB/hfi1: Race hazard avoidance in user SDMA driver Dennis Dalessandro
2017-03-21  0:24   ` [PATCH v2 03/20] IB/hfi1: Cache registers during state change Dennis Dalessandro
2017-03-21  0:24   ` [PATCH v2 04/20] IB/hfi1: NULL pointer dereference when freeing rhashtable Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 05/20] IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependent Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 06/20] IB/rdmavt: Add additional fields to post send trace Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 07/20] IB/rdmavt: Add tracing for cq entry and poll Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 08/20] IB/rdmavt: Add swqe completion trace Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 09/20] IB/hfi1: Check device id early during init Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 10/20] IB/hfi1: Protect the global dev_cntr_names and port_cntr_names Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 11/20] IB/hfi1: Check for QSFP presence before attempting reads Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 12/20] IB/hfi1: Add a patch value to the firmware version string Dennis Dalessandro
2017-03-21  0:25   ` [PATCH v2 13/20] IB/rdmavt, IB/hfi1: Fix timer migration regressions Dennis Dalessandro
2017-03-21  0:26   ` [PATCH v2 14/20] IB/rdmavt: Avoid reseting wqe send_flags in unreserve Dennis Dalessandro
2017-03-21  0:26   ` [PATCH v2 15/20] IB/hfi1: Ensure VL index is within bounds Dennis Dalessandro
2017-03-21  0:26   ` [PATCH v2 16/20] IB/hfi1: Add receive fault injection feature Dennis Dalessandro
2017-03-21  0:26   ` [PATCH v2 17/20] IB/hfi1: Add transmit " Dennis Dalessandro
     [not found]     ` <20170321002619.28538.31428.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
2017-04-05 18:34       ` Doug Ledford
     [not found]         ` <1491417255.2923.5.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-05 18:49           ` Dennis Dalessandro
2017-03-21  0:26   ` [PATCH v2 18/20] IB/hfi1: Eliminate synchronize_rcu() in mr delete Dennis Dalessandro
2017-03-21  0:26   ` [PATCH v2 19/20] IB/rdmavt, IB/qib, IB/hfi1: Make percpu refcount optional for user MRs Dennis Dalessandro
     [not found]     ` <20170321002631.28538.2121.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
2017-04-05 18:38       ` Doug Ledford
     [not found]         ` <1491417489.2923.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-05 19:46           ` Dennis Dalessandro
     [not found]             ` <f008c532-340e-01f2-80e6-4bea74175e3e-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-04-05 19:51               ` Leon Romanovsky
     [not found]                 ` <CALq1K=JsjSCiSBeZVe4kHQmjw7tznL36JcsamZTVGZ5RhBvZPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-04-05 20:09                   ` Marciniszyn, Mike
     [not found]                     ` <32E1700B9017364D9B60AED9960492BC342EA858-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-04-06  7:49                       ` Leon Romanovsky
     [not found]                         ` <20170406074955.GG2269-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-04-06 11:45                           ` Dennis Dalessandro
     [not found]                             ` <8cdf2fbb-f2a9-0b4b-b144-397ee73d1569-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-04-06 12:37                               ` Leon Romanovsky
     [not found]                                 ` <20170406123726.GH2269-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-04-06 13:00                                   ` Dennis Dalessandro
     [not found]                                     ` <f1703866-9c5c-a30a-0d95-9f6a33cc4f75-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-04-06 13:33                                       ` Leon Romanovsky
2017-04-06 14:47                               ` Marciniszyn, Mike
     [not found]                                 ` <32E1700B9017364D9B60AED9960492BC342EABD0-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-04-06 17:13                                   ` Jason Gunthorpe
     [not found]                                     ` <20170406171354.GA19854-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-04-06 17:16                                       ` Marciniszyn, Mike
     [not found]                                         ` <32E1700B9017364D9B60AED9960492BC342EADEE-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-04-06 17:44                                           ` Jason Gunthorpe
     [not found]                                             ` <20170406174438.GA20020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-04-07 21:12                                               ` Marciniszyn, Mike
     [not found]                                                 ` <32E1700B9017364D9B60AED9960492BC342EBA18-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-04-07 22:06                                                   ` Jason Gunthorpe
     [not found]                                                     ` <20170407220618.GA29138-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-04-09  6:26                                                       ` Leon Romanovsky
2017-04-05 19:47           ` Leon Romanovsky
2017-03-21  0:26 ` [PATCH v2 20/20] IB/core: If the MGID/MLID pair is not on the list return an error Dennis Dalessandro
2017-04-05 18:50 ` [PATCH v2 00/20] IB/hfi1, qib, rdmavt: Another round of patches for 4.11 Doug Ledford
