All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>
To: linux-scsi@vger.kernel.org,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
	Brian King <brking@linux.vnet.ibm.com>,
	Ian Munsie <imunsie@au1.ibm.com>,
	Daniel Axtens <dja@ozlabs.au.ibm.com>,
	Andrew Donnellan <andrew.donnellan@au1.ibm.com>,
	Tomas Henzl <thenzl@redhat.com>,
	David Laight <David.Laight@ACULAB.COM>
Cc: Michael Neuling <mikey@neuling.org>,
	"Manoj N. Kumar" <manoj@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v6 25/37] cxlflash: Fix to prevent EEH recovery failure
Date: Wed, 21 Oct 2015 15:14:56 -0500	[thread overview]
Message-ID: <1445458496-57625-1-git-send-email-mrochs@linux.vnet.ibm.com> (raw)
In-Reply-To: <1445458134-63197-1-git-send-email-mrochs@linux.vnet.ibm.com>

The process_sense() routine can perform a read capacity which
can take some time to complete. If an EEH occurs while waiting
on the read capacity, the EEH handler will wait to obtain the
context's mutex in order to put the context in an error state.
The EEH handler will sit and wait until the context is free,
but this wait can potentially last forever (deadlock) if the
scsi_execute() that performs the read capacity experiences a
timeout and calls into the reset callback. When that occurs,
the reset callback sees that the device is already being reset
and waits for the reset to complete. This leaves two threads
waiting on the other.

To address this issue, make the context unavailable to new,
non-system owned threads and release the context while calling
into process_sense(). After returning from process_sense() the
context mutex is reacquired and the context is made available
again. The context can be safely moved to the error state if
needed during the unavailable window as no other threads will
hold its reference.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
---
 drivers/scsi/cxlflash/superpipe.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index a6316f5..7283e83 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -1787,12 +1787,21 @@ static int cxlflash_disk_verify(struct scsi_device *sdev,
 	 * inquiry (i.e. the Unit attention is due to the WWN changing).
 	 */
 	if (verify->hint & DK_CXLFLASH_VERIFY_HINT_SENSE) {
+		/* Can't hold mutex across process_sense/read_cap16,
+		 * since we could have an intervening EEH event.
+		 */
+		ctxi->unavail = true;
+		mutex_unlock(&ctxi->mutex);
 		rc = process_sense(sdev, verify);
 		if (unlikely(rc)) {
 			dev_err(dev, "%s: Failed to validate sense data (%d)\n",
 				__func__, rc);
+			mutex_lock(&ctxi->mutex);
+			ctxi->unavail = false;
 			goto out;
 		}
+		mutex_lock(&ctxi->mutex);
+		ctxi->unavail = false;
 	}
 
 	switch (gli->mode) {
-- 
2.1.0


  parent reply	other threads:[~2015-10-21 20:16 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-21 20:08 [PATCH v6 00/37] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs
2015-10-21 20:10 ` [PATCH v6 01/37] cxlflash: Fix to avoid invalid port_sel value Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 02/37] cxlflash: Replace magic numbers with literals Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 03/37] cxlflash: Fix read capacity timeout Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 04/37] cxlflash: Fix potential oops following LUN removal Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 05/37] cxlflash: Fix data corruption when vLUN used over multiple cards Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 06/37] cxlflash: Fix to avoid sizeof(bool) Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 07/37] cxlflash: Fix context encode mask width Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 08/37] cxlflash: Fix to avoid CXL services during EEH Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 09/37] cxlflash: Correct naming of limbo state and waitq Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 10/37] cxlflash: Make functions static Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 11/37] cxlflash: Refine host/device attributes Matthew R. Ochs
2015-10-23 13:33   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 12/37] cxlflash: Fix to avoid spamming the kernel log Matthew R. Ochs
2015-10-23 13:33   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 13/37] cxlflash: Fix to avoid stall while waiting on TMF Matthew R. Ochs
2015-10-23 13:36   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 14/37] cxlflash: Fix location of setting resid Matthew R. Ochs
2015-10-23 13:37   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 15/37] cxlflash: Fix host link up event handling Matthew R. Ochs
2015-10-23 13:38   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 16/37] cxlflash: Fix async interrupt bypass logic Matthew R. Ochs
2015-10-23  3:40   ` Andrew Donnellan
2015-10-23 13:39   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 17/37] cxlflash: Remove dual port online dependency Matthew R. Ochs
2015-10-23 13:39   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 18/37] cxlflash: Fix AFU version access/storage and add check Matthew R. Ochs
2015-10-23 13:40   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 19/37] cxlflash: Correct usage of scsi_host_put() Matthew R. Ochs
2015-10-23 13:41   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 20/37] cxlflash: Fix to prevent workq from accessing freed memory Matthew R. Ochs
2015-10-23 13:41   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 21/37] cxlflash: Correct behavior in device reset handler following EEH Matthew R. Ochs
2015-10-23 13:42   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 22/37] cxlflash: Remove unnecessary scsi_block_requests Matthew R. Ochs
2015-10-23 13:42   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 23/37] cxlflash: Fix function prolog parameters and return codes Matthew R. Ochs
2015-10-23 13:45   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 24/37] cxlflash: Fix MMIO and endianness errors Matthew R. Ochs
2015-10-23 13:53   ` Tomas Henzl
2015-10-21 20:14 ` Matthew R. Ochs [this message]
2015-10-23 13:54   ` [PATCH v6 25/37] cxlflash: Fix to prevent EEH recovery failure Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 26/37] cxlflash: Correct spelling, grammar, and alignment mistakes Matthew R. Ochs
2015-10-23 13:54   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 27/37] cxlflash: Fix to prevent stale AFU RRQ Matthew R. Ochs
2015-10-23 13:55   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 28/37] MAINTAINERS: Add cxlflash driver Matthew R. Ochs
2015-10-21 20:15 ` [PATCH v6 29/37] cxlflash: Fix to double the delay each time Matthew R. Ochs
2015-10-23 13:57   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 30/37] cxlflash: Fix to avoid corrupting adapter fops Matthew R. Ochs
2015-10-23 14:00   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 31/37] cxlflash: Correct trace string Matthew R. Ochs
2015-10-23 14:00   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 32/37] cxlflash: Fix to avoid potential deadlock on EEH Matthew R. Ochs
2015-10-23 14:01   ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 33/37] cxlflash: Fix to avoid leaving dangling interrupt resources Matthew R. Ochs
2015-10-23 14:01   ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 34/37] cxlflash: Fix to escalate to LINK_RESET on login timeout Matthew R. Ochs
2015-10-23 14:01   ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 35/37] cxlflash: Fix to avoid corrupting port selection mask Matthew R. Ochs
2015-10-22 17:17   ` Manoj Kumar
2015-10-23  3:52   ` Andrew Donnellan
2015-10-21 20:16 ` [PATCH v6 36/37] cxlflash: Fix to avoid lock instrumentation rejection Matthew R. Ochs
2015-10-22 17:34   ` Manoj Kumar
2015-10-23  3:22   ` Andrew Donnellan
2015-10-21 20:16 ` [PATCH v6 37/37] cxlflash: Fix to avoid bypassing context cleanup Matthew R. Ochs
2015-10-22  2:01   ` Andrew Donnellan
2015-10-22 18:05   ` Manoj Kumar
2015-10-27 23:30 ` [PATCH v6 00/37] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs
2015-10-27 23:30   ` Matthew R. Ochs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1445458496-57625-1-git-send-email-mrochs@linux.vnet.ibm.com \
    --to=mrochs@linux.vnet.ibm.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=andrew.donnellan@au1.ibm.com \
    --cc=brking@linux.vnet.ibm.com \
    --cc=dja@ozlabs.au.ibm.com \
    --cc=imunsie@au1.ibm.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=manoj@linux.vnet.ibm.com \
    --cc=mikey@neuling.org \
    --cc=nab@linux-iscsi.org \
    --cc=thenzl@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.