linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	"Michael J. Ruhl" <michael.j.ruhl@intel.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Jason Gunthorpe <jgg@mellanox.com>
Subject: [PATCH 4.14 41/41] IB/hfi1: Fix destroy_qp hang after a link down
Date: Thu, 18 Oct 2018 19:54:56 +0200	[thread overview]
Message-ID: <20181018175423.636301145@linuxfoundation.org> (raw)
In-Reply-To: <20181018175416.718399607@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michael J. Ruhl <michael.j.ruhl@intel.com>

commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.

rvt_destroy_qp() cannot complete until all in process packets have
been released from the underlying hardware.  If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc/<app PID>/stack
 quiesce_qp+0x178/0x250 [hfi1]
 rvt_reset_qp+0x23d/0x400 [rdmavt]
 rvt_destroy_qp+0x69/0x210 [rdmavt]
 ib_destroy_qp+0xba/0x1c0 [ib_core]
 nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
 nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
 nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
 nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
 process_one_work+0x17a/0x440
 worker_thread+0x126/0x3c0
 kthread+0xcf/0xe0
 ret_from_fork+0x58/0x90
 0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary.  During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.

This is caused by the fact that the freeze path and the link down
event is handled the same.  This is not correct.  The freeze path
waits until the HFI is unfrozen and then restarts PIO.  A link down
is not a freeze event.  The link down path cannot restart the PIO
until link is restored.  If the PIO path is restarted before the link
comes up, the application (QP) using the PIO path will hang (until
link is restored).

Fix by separating the linkdown path from the freeze path and use the
link down path for link down events.

Close a race condition sc_disable() by acquiring both the progress
and release locks.

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/hw/hfi1/chip.c |    7 +++++-
 drivers/infiniband/hw/hfi1/pio.c  |   42 ++++++++++++++++++++++++++++++--------
 drivers/infiniband/hw/hfi1/pio.h  |    2 +
 3 files changed, 42 insertions(+), 9 deletions(-)

--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -6722,6 +6722,7 @@ void start_freeze_handling(struct hfi1_p
 	struct hfi1_devdata *dd = ppd->dd;
 	struct send_context *sc;
 	int i;
+	int sc_flags;
 
 	if (flags & FREEZE_SELF)
 		write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_FREEZE_SMASK);
@@ -6732,11 +6733,13 @@ void start_freeze_handling(struct hfi1_p
 	/* notify all SDMA engines that they are going into a freeze */
 	sdma_freeze_notify(dd, !!(flags & FREEZE_LINK_DOWN));
 
+	sc_flags = SCF_FROZEN | SCF_HALTED | (flags & FREEZE_LINK_DOWN ?
+					      SCF_LINK_DOWN : 0);
 	/* do halt pre-handling on all enabled send contexts */
 	for (i = 0; i < dd->num_send_contexts; i++) {
 		sc = dd->send_contexts[i].sc;
 		if (sc && (sc->flags & SCF_ENABLED))
-			sc_stop(sc, SCF_FROZEN | SCF_HALTED);
+			sc_stop(sc, sc_flags);
 	}
 
 	/* Send context are frozen. Notify user space */
@@ -10646,6 +10649,8 @@ int set_link_state(struct hfi1_pportdata
 		add_rcvctrl(dd, RCV_CTRL_RCV_PORT_ENABLE_SMASK);
 
 		handle_linkup_change(dd, 1);
+		pio_kernel_linkup(dd);
+
 		ppd->host_link_state = HLS_UP_INIT;
 		break;
 	case HLS_UP_ARMED:
--- a/drivers/infiniband/hw/hfi1/pio.c
+++ b/drivers/infiniband/hw/hfi1/pio.c
@@ -942,20 +942,18 @@ void sc_free(struct send_context *sc)
 void sc_disable(struct send_context *sc)
 {
 	u64 reg;
-	unsigned long flags;
 	struct pio_buf *pbuf;
 
 	if (!sc)
 		return;
 
 	/* do all steps, even if already disabled */
-	spin_lock_irqsave(&sc->alloc_lock, flags);
+	spin_lock_irq(&sc->alloc_lock);
 	reg = read_kctxt_csr(sc->dd, sc->hw_context, SC(CTRL));
 	reg &= ~SC(CTRL_CTXT_ENABLE_SMASK);
 	sc->flags &= ~SCF_ENABLED;
 	sc_wait_for_packet_egress(sc, 1);
 	write_kctxt_csr(sc->dd, sc->hw_context, SC(CTRL), reg);
-	spin_unlock_irqrestore(&sc->alloc_lock, flags);
 
 	/*
 	 * Flush any waiters.  Once the context is disabled,
@@ -965,7 +963,7 @@ void sc_disable(struct send_context *sc)
 	 * proceed with the flush.
 	 */
 	udelay(1);
-	spin_lock_irqsave(&sc->release_lock, flags);
+	spin_lock(&sc->release_lock);
 	if (sc->sr) {	/* this context has a shadow ring */
 		while (sc->sr_tail != sc->sr_head) {
 			pbuf = &sc->sr[sc->sr_tail].pbuf;
@@ -976,7 +974,8 @@ void sc_disable(struct send_context *sc)
 				sc->sr_tail = 0;
 		}
 	}
-	spin_unlock_irqrestore(&sc->release_lock, flags);
+	spin_unlock(&sc->release_lock);
+	spin_unlock_irq(&sc->alloc_lock);
 }
 
 /* return SendEgressCtxtStatus.PacketOccupancy */
@@ -1199,11 +1198,39 @@ void pio_kernel_unfreeze(struct hfi1_dev
 		sc = dd->send_contexts[i].sc;
 		if (!sc || !(sc->flags & SCF_FROZEN) || sc->type == SC_USER)
 			continue;
+		if (sc->flags & SCF_LINK_DOWN)
+			continue;
 
 		sc_enable(sc);	/* will clear the sc frozen flag */
 	}
 }
 
+/**
+ * pio_kernel_linkup() - Re-enable send contexts after linkup event
+ * @dd: valid devive data
+ *
+ * When the link goes down, the freeze path is taken.  However, a link down
+ * event is different from a freeze because if the send context is re-enabled
+ * whowever is sending data will start sending data again, which will hang
+ * any QP that is sending data.
+ *
+ * The freeze path now looks at the type of event that occurs and takes this
+ * path for link down event.
+ */
+void pio_kernel_linkup(struct hfi1_devdata *dd)
+{
+	struct send_context *sc;
+	int i;
+
+	for (i = 0; i < dd->num_send_contexts; i++) {
+		sc = dd->send_contexts[i].sc;
+		if (!sc || !(sc->flags & SCF_LINK_DOWN) || sc->type == SC_USER)
+			continue;
+
+		sc_enable(sc);	/* will clear the sc link down flag */
+	}
+}
+
 /*
  * Wait for the SendPioInitCtxt.PioInitInProgress bit to clear.
  * Returns:
@@ -1403,11 +1430,10 @@ void sc_stop(struct send_context *sc, in
 {
 	unsigned long flags;
 
-	/* mark the context */
-	sc->flags |= flag;
-
 	/* stop buffer allocations */
 	spin_lock_irqsave(&sc->alloc_lock, flags);
+	/* mark the context */
+	sc->flags |= flag;
 	sc->flags &= ~SCF_ENABLED;
 	spin_unlock_irqrestore(&sc->alloc_lock, flags);
 	wake_up(&sc->halt_wait);
--- a/drivers/infiniband/hw/hfi1/pio.h
+++ b/drivers/infiniband/hw/hfi1/pio.h
@@ -145,6 +145,7 @@ struct send_context {
 #define SCF_IN_FREE 0x02
 #define SCF_HALTED  0x04
 #define SCF_FROZEN  0x08
+#define SCF_LINK_DOWN 0x10
 
 struct send_context_info {
 	struct send_context *sc;	/* allocated working context */
@@ -312,6 +313,7 @@ void set_pio_integrity(struct send_conte
 void pio_reset_all(struct hfi1_devdata *dd);
 void pio_freeze(struct hfi1_devdata *dd);
 void pio_kernel_unfreeze(struct hfi1_devdata *dd);
+void pio_kernel_linkup(struct hfi1_devdata *dd);
 
 /* global PIO send control operations */
 #define PSC_GLOBAL_ENABLE 0



  parent reply	other threads:[~2018-10-18 18:02 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-18 17:54 [PATCH 4.14 00/41] 4.14.78-stable review Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 01/41] media: af9035: prevent buffer overflow on write Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 02/41] batman-adv: Avoid probe ELP information leak Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 03/41] batman-adv: Fix segfault when writing to throughput_override Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 04/41] batman-adv: Fix segfault when writing to sysfs elp_interval Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 05/41] batman-adv: Prevent duplicated gateway_node entry Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 06/41] batman-adv: Prevent duplicated nc_node entry Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 07/41] batman-adv: Prevent duplicated softif_vlan entry Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 08/41] batman-adv: Prevent duplicated global TT entry Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 09/41] batman-adv: Prevent duplicated tvlv handler Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 10/41] batman-adv: fix backbone_gw refcount on queue_work() failure Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 11/41] batman-adv: fix hardif_neigh " Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 12/41] clocksource/drivers/ti-32k: Add CLOCK_SOURCE_SUSPEND_NONSTOP flag for non-am43 SoCs Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 13/41] scsi: ibmvscsis: Fix a stringop-overflow warning Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 14/41] scsi: ibmvscsis: Ensure partition name is properly NUL terminated Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 15/41] intel_th: pci: Add Ice Lake PCH support Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 16/41] Input: atakbd - fix Atari keymap Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 17/41] Input: atakbd - fix Atari CapsLock behaviour Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 18/41] net: emac: fix fixed-link setup for the RTL8363SB switch Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 19/41] ravb: do not write 1 to reserved bits Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 20/41] PCI: dwc: Fix scheduling while atomic issues Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 21/41] drm: mali-dp: Call drm_crtc_vblank_reset on device init Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 22/41] scsi: ipr: System hung while dlpar adding primary ipr adapter back Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 23/41] scsi: sd: dont crash the host on invalid commands Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 24/41] net/mlx4: Use cpumask_available for eq->affinity_mask Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 25/41] clocksource/drivers/fttmr010: Fix set_next_event handler Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 26/41] RISC-V: include linux/ftrace.h in asm-prototypes.h Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 27/41] powerpc/tm: Fix userspace r13 corruption Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 28/41] powerpc/tm: Avoid possible userspace r1 corruption on reclaim Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 29/41] iommu/amd: Return devid as alias for ACPI HID devices Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 30/41] powerpc/lib/feature-fixups: use raw_patch_instruction() Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 31/41] Revert "vfs: fix freeze protection in mnt_want_write_file() for overlayfs" Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 32/41] mremap: properly flush TLB before releasing the page Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 33/41] ARC: build: Get rid of toolchain check Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 34/41] ARC: build: Dont set CROSS_COMPILE in archs Makefile Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 35/41] HID: quirks: fix support for Apple Magic Keyboards Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 36/41] drm/i915: Nuke the LVDS lid notifier Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 37/41] staging: ccree: check DMA pool buf !NULL before free Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 38/41] mm: disallow mappings that conflict for devm_memremap_pages() Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 39/41] drm/i915/glk: Add Quirk for GLK NUC HDMI port issues Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.14 40/41] i2c: rcar: handle RXDMA HW behaviour on Gen3 Greg Kroah-Hartman
2018-10-18 17:54 ` Greg Kroah-Hartman [this message]
2018-10-19  1:41 ` [PATCH 4.14 00/41] 4.14.78-stable review Nathan Chancellor
2018-10-19  8:15   ` Greg Kroah-Hartman
2018-10-19 12:10 ` Rafael David Tinoco
2018-10-19 15:49 ` Guenter Roeck
2018-10-19 20:43 ` Shuah Khan
2018-10-22 13:03 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181018175423.636301145@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dennis.dalessandro@intel.com \
    --cc=jgg@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.j.ruhl@intel.com \
    --cc=mike.marciniszyn@intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).