All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bart Van Assche <bvanassche@acm.org>
To: "Martin K . Petersen" <martin.petersen@oracle.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-scsi@vger.kernel.org,
	Adrian Hunter <adrian.hunter@intel.com>,
	Bart Van Assche <bvanassche@acm.org>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Bean Huo <beanhuo@micron.com>, Avri Altman <avri.altman@wdc.com>,
	Jinyoung Choi <j-young.choi@samsung.com>
Subject: [PATCH 8/8] scsi: ufs: Fix deadlock between power management and error handler
Date: Fri, 23 Sep 2022 13:11:38 -0700	[thread overview]
Message-ID: <20220923201138.2113123-9-bvanassche@acm.org> (raw)
In-Reply-To: <20220923201138.2113123-1-bvanassche@acm.org>

The following deadlock has been observed on multiple test setups:
* ufshcd_wl_suspend() is waiting for blk_execute_rq() to complete while it
  holds host_sem.
* ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
  latter function tries to obtain host_sem.
This is a deadlock because blk_execute_rq() can't execute SCSI commands
while the host is in the SHOST_RECOVERY state and because the error
handler cannot make progress either.

Fix this deadlock by calling the UFS error handler directly during system
suspend instead of relying on the error handling mechanism in the SCSI
core.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 43 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index abeb120b12eb..996641fc1176 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6405,9 +6405,19 @@ static void ufshcd_err_handler(struct work_struct *work)
 {
 	struct ufs_hba *hba = container_of(work, struct ufs_hba, eh_work);
 
-	down(&hba->host_sem);
-	ufshcd_recover_link(hba);
-	up(&hba->host_sem);
+	/*
+	 * Serialize suspend/resume and error handling because a deadlock
+	 * occurs if the error handler runs concurrently with
+	 * ufshcd_set_dev_pwr_mode().
+	 */
+	if (mutex_trylock(&system_transition_mutex)) {
+		down(&hba->host_sem);
+		ufshcd_recover_link(hba);
+		up(&hba->host_sem);
+		mutex_unlock(&system_transition_mutex);
+	} else {
+		pr_info("%s: suspend or resume is ongoing\n", __func__);
+	}
 }
 
 /**
@@ -8298,6 +8308,25 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
 	}
 }
 
+static enum scsi_timeout_action ufshcd_eh_timed_out(struct scsi_cmnd *scmd)
+{
+	struct ufs_hba *hba = shost_priv(scmd->device->host);
+
+	if (!hba->system_suspending) {
+		/* Apply the SCSI core error handling strategy. */
+		return SCSI_EH_NOT_HANDLED;
+	}
+
+	/*
+	 * Fail START STOP UNIT commands without activating the SCSI error
+	 * handler to prevent a deadlock between ufshcd_set_dev_pwr_mode() and
+	 * ufshcd_err_handler().
+	 */
+	set_host_byte(scmd, DID_TIME_OUT);
+	scsi_done(scmd);
+	return SCSI_EH_DONE;
+}
+
 static const struct attribute_group *ufshcd_driver_groups[] = {
 	&ufs_sysfs_unit_descriptor_group,
 	&ufs_sysfs_lun_attributes_group,
@@ -8332,6 +8361,7 @@ static struct scsi_host_template ufshcd_driver_template = {
 	.eh_abort_handler	= ufshcd_abort,
 	.eh_device_reset_handler = ufshcd_eh_device_reset_handler,
 	.eh_host_reset_handler   = ufshcd_eh_host_reset_handler,
+	.eh_timed_out		= ufshcd_eh_timed_out,
 	.this_id		= -1,
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
@@ -8791,6 +8821,13 @@ static int ufshcd_set_dev_pwr_mode(struct ufs_hba *hba,
 			break;
 		if (host_byte(ret) == 0 && scsi_status_is_good(ret))
 			break;
+		/*
+		 * Calling the error handler directly when suspending or
+		 * resuming the system since the callers of this function hold
+		 * hba->host_sem in that case.
+		 */
+		if (host_byte(ret) != 0 && hba->system_suspending)
+			ufshcd_recover_link(hba);
 	}
 	if (ret) {
 		sdev_printk(KERN_WARNING, sdp,

  parent reply	other threads:[~2022-09-23 20:13 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-23 20:11 [PATCH 0/8] Fix a deadlock in the UFS driver Bart Van Assche
2022-09-23 20:11 ` [PATCH 1/8] scsi: core: Fix a race between scsi_done() and scsi_times_out() Bart Van Assche
2022-09-23 20:11 ` [PATCH 2/8] scsi: core: Change the return type of .eh_timed_out() Bart Van Assche
2022-09-24  2:10   ` kernel test robot
2022-09-23 20:11 ` [PATCH 3/8] scsi: ufs: Remove an outdated comment Bart Van Assche
2022-09-23 20:11 ` [PATCH 4/8] scsi: ufs: Use 'else' in ufshcd_set_dev_pwr_mode() Bart Van Assche
2022-09-23 20:11 ` [PATCH 5/8] scsi: ufs: Try harder to change the power mode Bart Van Assche
2022-09-27 10:41   ` Avri Altman
2022-09-27 16:44     ` Bart Van Assche
2022-09-23 20:11 ` [PATCH 6/8] scsi: ufs: Split ufshcd_err_handler() Bart Van Assche
2022-09-23 20:11 ` [PATCH 7/8] scsi: ufs: Add a PM notifier Bart Van Assche
2022-09-23 20:11 ` Bart Van Assche [this message]
2022-09-24 15:06   ` [PATCH 8/8] scsi: ufs: Fix deadlock between power management and error handler kernel test robot
2022-09-27 17:06   ` Adrian Hunter
2022-09-27 17:54     ` Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220923201138.2113123-9-bvanassche@acm.org \
    --to=bvanassche@acm.org \
    --cc=adrian.hunter@intel.com \
    --cc=avri.altman@wdc.com \
    --cc=beanhuo@micron.com \
    --cc=j-young.choi@samsung.com \
    --cc=jaegeuk@kernel.org \
    --cc=jejb@linux.ibm.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.