* [PATCH V3 0/3] scsi: ufs: Let devices remain runtime suspended during system suspend
@ 2021-09-05  9:51 Adrian Hunter
  2021-09-05  9:51 ` [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock Adrian Hunter
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Adrian Hunter @ 2021-09-05  9:51 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Bart Van Assche, Manivannan Sadhasivam,
	Wei Li, linux-scsi

Hi

UFS devices can remain runtime suspended at system suspend time,
if the conditions are right.  Add support for that, first fixing
the impediments.


Changes in V3:

      scsi: ufs: Fix error handler clear ua deadlock

	Correct commit message.
	Amend stable tags to add dependent cherry picks

Changes in V2:

    scsi: ufs: Let devices remain runtime suspended during system suspend

	The ufs-hisi driver uses different RPM and SPM, but it is made
	explicit by a new parameter to suspend prepare.


Adrian Hunter (3):
      scsi: ufs: Fix error handler clear ua deadlock
      scsi: ufs: Fix runtime PM dependencies getting broken
      scsi: ufs: Let devices remain runtime suspended during system suspend

 drivers/scsi/scsi_pm.c      | 16 ++++++---
 drivers/scsi/ufs/ufs-hisi.c |  8 ++++-
 drivers/scsi/ufs/ufshcd.c   | 87 +++++++++++++++++++++++++++++++--------------
 drivers/scsi/ufs/ufshcd.h   | 12 ++++++-
 include/scsi/scsi_device.h  |  1 +
 5 files changed, 90 insertions(+), 34 deletions(-)


Regards
Adrian


* [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-05  9:51 [PATCH V3 0/3] scsi: ufs: Let devices remain runtime suspended during system suspend Adrian Hunter
@ 2021-09-05  9:51 ` Adrian Hunter
  2021-09-07 14:42   ` Bart Van Assche
  2021-09-05  9:51 ` [PATCH V3 2/3] scsi: ufs: Fix runtime PM dependencies getting broken Adrian Hunter
  2021-09-05  9:51 ` [PATCH V3 3/3] scsi: ufs: Let devices remain runtime suspended during system suspend Adrian Hunter
  2 siblings, 1 reply; 19+ messages in thread
From: Adrian Hunter @ 2021-09-05  9:51 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Bart Van Assche, Manivannan Sadhasivam,
	Wei Li, linux-scsi

There is no guarantee of being able to enter the queue while requests are
blocked. That is because freezing the queue blocks entry to the queue, but
freezing also waits for outstanding requests, which can make no progress
while the queue is blocked.

That situation can happen when the error handler issues requests to
clear the unit attention condition. Requests can be blocked if
ufshcd_state is UFSHCD_STATE_EH_SCHEDULED_FATAL, which can happen as a
result of either error handler activity or, theoretically, a request
that is issued after the error handler unblocks the queue but before
the unit attention condition is cleared.

The deadlock is very unlikely, and the error handler can be expected
to clear the unit attention at some point anyway, so the simple solution
is not to wait to enter the queue.

Additionally, note that the RPMB queue might not be entered because it
is runtime suspended, but in that case the unit attention will be
cleared at RPMB runtime resume.

Cc: stable@vger.kernel.org # 5.14+ ac1bc2ba060f: scsi: ufs: Request sense data asynchronously
Cc: stable@vger.kernel.org # 5.14+ 9b5ac8ab4e8b: scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---


Changes in V3:

	Correct commit message.
	Amend stable tags to add dependent cherry picks


 drivers/scsi/ufs/ufshcd.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 67889d74761c..52fb059efa77 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -224,7 +224,7 @@ static int ufshcd_reset_and_restore(struct ufs_hba *hba);
 static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd);
 static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag);
 static void ufshcd_hba_exit(struct ufs_hba *hba);
-static int ufshcd_clear_ua_wluns(struct ufs_hba *hba);
+static int ufshcd_clear_ua_wluns(struct ufs_hba *hba, bool nowait);
 static int ufshcd_probe_hba(struct ufs_hba *hba, bool async);
 static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on);
 static int ufshcd_uic_hibern8_enter(struct ufs_hba *hba);
@@ -4110,7 +4110,7 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
 		dev_err(hba->dev, "%s: link recovery failed, err %d",
 			__func__, ret);
 	else
-		ufshcd_clear_ua_wluns(hba);
+		ufshcd_clear_ua_wluns(hba, false);
 
 	return ret;
 }
@@ -5974,7 +5974,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
 	ufshcd_release(hba);
 	if (ufshcd_is_clkscaling_supported(hba))
 		ufshcd_clk_scaling_suspend(hba, false);
-	ufshcd_clear_ua_wluns(hba);
+	ufshcd_clear_ua_wluns(hba, true);
 	ufshcd_rpm_put(hba);
 }
 
@@ -7907,7 +7907,7 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
 	if (ret)
 		goto out;
 
-	ufshcd_clear_ua_wluns(hba);
+	ufshcd_clear_ua_wluns(hba, false);
 
 	/* Initialize devfreq after UFS device is detected */
 	if (ufshcd_is_clkscaling_supported(hba)) {
@@ -7943,7 +7943,8 @@ static void ufshcd_request_sense_done(struct request *rq, blk_status_t error)
 }
 
 static int
-ufshcd_request_sense_async(struct ufs_hba *hba, struct scsi_device *sdev)
+ufshcd_request_sense_async(struct ufs_hba *hba, struct scsi_device *sdev,
+			   bool nowait)
 {
 	/*
 	 * Some UFS devices clear unit attention condition only if the sense
@@ -7951,6 +7952,7 @@ ufshcd_request_sense_async(struct ufs_hba *hba, struct scsi_device *sdev)
 	 */
 	static const u8 cmd[6] = {REQUEST_SENSE, 0, 0, 0, UFS_SENSE_SIZE, 0};
 	struct scsi_request *rq;
+	blk_mq_req_flags_t flags;
 	struct request *req;
 	char *buffer;
 	int ret;
@@ -7959,8 +7961,8 @@ ufshcd_request_sense_async(struct ufs_hba *hba, struct scsi_device *sdev)
 	if (!buffer)
 		return -ENOMEM;
 
-	req = blk_get_request(sdev->request_queue, REQ_OP_DRV_IN,
-			      /*flags=*/BLK_MQ_REQ_PM);
+	flags = BLK_MQ_REQ_PM | (nowait ? BLK_MQ_REQ_NOWAIT : 0);
+	req = blk_get_request(sdev->request_queue, REQ_OP_DRV_IN, flags);
 	if (IS_ERR(req)) {
 		ret = PTR_ERR(req);
 		goto out_free;
@@ -7990,7 +7992,7 @@ ufshcd_request_sense_async(struct ufs_hba *hba, struct scsi_device *sdev)
 	return ret;
 }
 
-static int ufshcd_clear_ua_wlun(struct ufs_hba *hba, u8 wlun)
+static int ufshcd_clear_ua_wlun(struct ufs_hba *hba, u8 wlun, bool nowait)
 {
 	struct scsi_device *sdp;
 	unsigned long flags;
@@ -8016,7 +8018,10 @@ static int ufshcd_clear_ua_wlun(struct ufs_hba *hba, u8 wlun)
 	if (ret)
 		goto out_err;
 
-	ret = ufshcd_request_sense_async(hba, sdp);
+	ret = ufshcd_request_sense_async(hba, sdp, nowait);
+	if (nowait && ret && wlun == UFS_UPIU_RPMB_WLUN &&
+	    pm_runtime_suspended(&sdp->sdev_gendev))
+		ret = 0; /* RPMB runtime resume will clear UAC */
 	scsi_device_put(sdp);
 out_err:
 	if (ret)
@@ -8025,16 +8030,16 @@ static int ufshcd_clear_ua_wlun(struct ufs_hba *hba, u8 wlun)
 	return ret;
 }
 
-static int ufshcd_clear_ua_wluns(struct ufs_hba *hba)
+static int ufshcd_clear_ua_wluns(struct ufs_hba *hba, bool nowait)
 {
 	int ret = 0;
 
 	if (!hba->wlun_dev_clr_ua)
 		goto out;
 
-	ret = ufshcd_clear_ua_wlun(hba, UFS_UPIU_UFS_DEVICE_WLUN);
+	ret = ufshcd_clear_ua_wlun(hba, UFS_UPIU_UFS_DEVICE_WLUN, nowait);
 	if (!ret)
-		ret = ufshcd_clear_ua_wlun(hba, UFS_UPIU_RPMB_WLUN);
+		ret = ufshcd_clear_ua_wlun(hba, UFS_UPIU_RPMB_WLUN, nowait);
 	if (!ret)
 		hba->wlun_dev_clr_ua = false;
 out:
@@ -8656,7 +8661,7 @@ static int ufshcd_set_dev_pwr_mode(struct ufs_hba *hba,
 	 */
 	hba->host->eh_noresume = 1;
 	if (hba->wlun_dev_clr_ua)
-		ufshcd_clear_ua_wlun(hba, UFS_UPIU_UFS_DEVICE_WLUN);
+		ufshcd_clear_ua_wlun(hba, UFS_UPIU_UFS_DEVICE_WLUN, false);
 
 	cmd[4] = pwr_mode << 4;
 
@@ -9825,7 +9830,7 @@ static inline int ufshcd_clear_rpmb_uac(struct ufs_hba *hba)
 
 	if (!hba->wlun_rpmb_clr_ua)
 		return 0;
-	ret = ufshcd_clear_ua_wlun(hba, UFS_UPIU_RPMB_WLUN);
+	ret = ufshcd_clear_ua_wlun(hba, UFS_UPIU_RPMB_WLUN, false);
 	if (!ret)
 		hba->wlun_rpmb_clr_ua = 0;
 	return ret;
-- 
2.17.1



* [PATCH V3 2/3] scsi: ufs: Fix runtime PM dependencies getting broken
  2021-09-05  9:51 [PATCH V3 0/3] scsi: ufs: Let devices remain runtime suspended during system suspend Adrian Hunter
  2021-09-05  9:51 ` [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock Adrian Hunter
@ 2021-09-05  9:51 ` Adrian Hunter
  2021-09-05  9:51 ` [PATCH V3 3/3] scsi: ufs: Let devices remain runtime suspended during system suspend Adrian Hunter
  2 siblings, 0 replies; 19+ messages in thread
From: Adrian Hunter @ 2021-09-05  9:51 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Bart Van Assche, Manivannan Sadhasivam,
	Wei Li, linux-scsi

UFS SCSI devices make use of device links to establish PM dependencies.
However, SCSI PM will force devices' runtime PM state to be active during
system resume. That can break runtime PM dependencies for UFS devices.
Fix by adding a flag 'preserve_rpm' to let UFS SCSI devices opt out of
the unwanted behaviour.

Fixes: b294ff3e34490f ("scsi: ufs: core: Enable power management for wlun")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/scsi/scsi_pm.c     | 16 +++++++++++-----
 drivers/scsi/ufs/ufshcd.c  |  1 +
 include/scsi/scsi_device.h |  1 +
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 3717eea37ecb..0557c1ad304d 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -73,13 +73,22 @@ static int scsi_dev_type_resume(struct device *dev,
 		int (*cb)(struct device *, const struct dev_pm_ops *))
 {
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	struct scsi_device *sdev = NULL;
+	bool preserve_rpm = false;
 	int err = 0;
 
+	if (scsi_is_sdev_device(dev)) {
+		sdev = to_scsi_device(dev);
+		preserve_rpm = sdev->preserve_rpm;
+		if (preserve_rpm && pm_runtime_suspended(dev))
+			return 0;
+	}
+
 	err = cb(dev, pm);
 	scsi_device_resume(to_scsi_device(dev));
 	dev_dbg(dev, "scsi resume: %d\n", err);
 
-	if (err == 0) {
+	if (err == 0 && !preserve_rpm) {
 		pm_runtime_disable(dev);
 		err = pm_runtime_set_active(dev);
 		pm_runtime_enable(dev);
@@ -91,11 +100,8 @@ static int scsi_dev_type_resume(struct device *dev,
 		 *
 		 * The resume hook will correct runtime PM status of the disk.
 		 */
-		if (!err && scsi_is_sdev_device(dev)) {
-			struct scsi_device *sdev = to_scsi_device(dev);
-
+		if (!err && sdev)
 			blk_set_runtime_active(sdev->request_queue);
-		}
 	}
 
 	return err;
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 52fb059efa77..57ed4b93b949 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5016,6 +5016,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
 		pm_runtime_get_noresume(&sdev->sdev_gendev);
 	else if (ufshcd_is_rpm_autosuspend_allowed(hba))
 		sdev->rpm_autosuspend = 1;
+	sdev->preserve_rpm = 1;
 
 	ufshcd_crypto_setup_rq_keyslot_manager(hba, q);
 
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 09a17f6e93a7..47eb30a6b7b2 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -197,6 +197,7 @@ struct scsi_device {
 	unsigned no_read_disc_info:1;	/* Avoid READ_DISC_INFO cmds */
 	unsigned no_read_capacity_16:1; /* Avoid READ_CAPACITY_16 cmds */
 	unsigned try_rc_10_first:1;	/* Try READ_CAPACACITY_10 first */
+	unsigned preserve_rpm:1;	/* Preserve runtime PM */
 	unsigned security_supported:1;	/* Supports Security Protocols */
 	unsigned is_visible:1;	/* is the device visible in sysfs */
 	unsigned wce_default_on:1;	/* Cache is ON by default */
-- 
2.17.1



* [PATCH V3 3/3] scsi: ufs: Let devices remain runtime suspended during system suspend
  2021-09-05  9:51 [PATCH V3 0/3] scsi: ufs: Let devices remain runtime suspended during system suspend Adrian Hunter
  2021-09-05  9:51 ` [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock Adrian Hunter
  2021-09-05  9:51 ` [PATCH V3 2/3] scsi: ufs: Fix runtime PM dependencies getting broken Adrian Hunter
@ 2021-09-05  9:51 ` Adrian Hunter
  2 siblings, 0 replies; 19+ messages in thread
From: Adrian Hunter @ 2021-09-05  9:51 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Bart Van Assche, Manivannan Sadhasivam,
	Wei Li, linux-scsi

If the UFS Device WLUN is runtime suspended and is in the same power
mode, link state and b_rpm_dev_flush_capable (BKOPs or WB buffer flush,
etc.) state that system suspend would use, then it can remain runtime
suspended instead of being runtime resumed and then system suspended.

The following patches have cleared the way for that to happen:
  scsi: ufs: Fix runtime PM dependencies getting broken
  scsi: ufs: Fix error handler clear ua deadlock

So amend the logic accordingly.

Note, the ufs-hisi driver uses different RPM and SPM, but it is made
explicit by a new parameter to suspend prepare.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---

Changes in V3:

	None.

Changes in V2:

	The ufs-hisi driver uses different RPM and SPM, but it is made
	explicit by a new parameter to suspend prepare.


 drivers/scsi/ufs/ufs-hisi.c |  8 +++++-
 drivers/scsi/ufs/ufshcd.c   | 53 ++++++++++++++++++++++++++++---------
 drivers/scsi/ufs/ufshcd.h   | 12 ++++++++-
 3 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-hisi.c b/drivers/scsi/ufs/ufs-hisi.c
index 6b706de8354b..4a08fb35642c 100644
--- a/drivers/scsi/ufs/ufs-hisi.c
+++ b/drivers/scsi/ufs/ufs-hisi.c
@@ -396,6 +396,12 @@ static int ufs_hisi_pwr_change_notify(struct ufs_hba *hba,
 	return ret;
 }
 
+static int ufs_hisi_suspend_prepare(struct device *dev)
+{
+	/* RPM and SPM are different. Refer ufs_hisi_suspend() */
+	return __ufshcd_suspend_prepare(dev, false);
+}
+
 static int ufs_hisi_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 {
 	struct ufs_hisi_host *host = ufshcd_get_variant(hba);
@@ -574,7 +580,7 @@ static int ufs_hisi_remove(struct platform_device *pdev)
 static const struct dev_pm_ops ufs_hisi_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(ufshcd_system_suspend, ufshcd_system_resume)
 	SET_RUNTIME_PM_OPS(ufshcd_runtime_suspend, ufshcd_runtime_resume, NULL)
-	.prepare	 = ufshcd_suspend_prepare,
+	.prepare	 = ufs_hisi_suspend_prepare,
 	.complete	 = ufshcd_resume_complete,
 };
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 57ed4b93b949..453fbb8753e2 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -9722,14 +9722,30 @@ void ufshcd_resume_complete(struct device *dev)
 		ufshcd_rpm_put(hba);
 		hba->complete_put = false;
 	}
-	if (hba->rpmb_complete_put) {
-		ufshcd_rpmb_rpm_put(hba);
-		hba->rpmb_complete_put = false;
-	}
 }
 EXPORT_SYMBOL_GPL(ufshcd_resume_complete);
 
-int ufshcd_suspend_prepare(struct device *dev)
+static bool ufshcd_rpm_ok_for_spm(struct ufs_hba *hba)
+{
+	struct device *dev = &hba->sdev_ufs_device->sdev_gendev;
+	enum ufs_dev_pwr_mode dev_pwr_mode;
+	enum uic_link_state link_state;
+	unsigned long flags;
+	bool res;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	dev_pwr_mode = ufs_get_pm_lvl_to_dev_pwr_mode(hba->spm_lvl);
+	link_state = ufs_get_pm_lvl_to_link_pwr_state(hba->spm_lvl);
+	res = pm_runtime_suspended(dev) &&
+	      hba->curr_dev_pwr_mode == dev_pwr_mode &&
+	      hba->uic_link_state == link_state &&
+	      !hba->dev_info.b_rpm_dev_flush_capable;
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return res;
+}
+
+int __ufshcd_suspend_prepare(struct device *dev, bool rpm_ok_for_spm)
 {
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 	int ret;
@@ -9741,19 +9757,30 @@ int ufshcd_suspend_prepare(struct device *dev)
 	 * Refer ufshcd_resume_complete()
 	 */
 	if (hba->sdev_ufs_device) {
-		ret = ufshcd_rpm_get_sync(hba);
-		if (ret < 0 && ret != -EACCES) {
-			ufshcd_rpm_put(hba);
-			return ret;
+		/* Prevent runtime suspend */
+		ufshcd_rpm_get_noresume(hba);
+		/*
+		 * Check if already runtime suspended in same state as system
+		 * suspend would be.
+		 */
+		if (!rpm_ok_for_spm || !ufshcd_rpm_ok_for_spm(hba)) {
+			/* RPM state is not ok for SPM, so runtime resume */
+			ret = ufshcd_rpm_resume(hba);
+			if (ret < 0 && ret != -EACCES) {
+				ufshcd_rpm_put(hba);
+				return ret;
+			}
 		}
 		hba->complete_put = true;
 	}
-	if (hba->sdev_rpmb) {
-		ufshcd_rpmb_rpm_get_sync(hba);
-		hba->rpmb_complete_put = true;
-	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__ufshcd_suspend_prepare);
+
+int ufshcd_suspend_prepare(struct device *dev)
+{
+	return __ufshcd_suspend_prepare(dev, true);
+}
 EXPORT_SYMBOL_GPL(ufshcd_suspend_prepare);
 
 #ifdef CONFIG_PM_SLEEP
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 4723f27a55d1..1dc8024d5211 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -915,7 +915,6 @@ struct ufs_hba {
 #endif
 	u32 luns_avail;
 	bool complete_put;
-	bool rpmb_complete_put;
 };
 
 /* Returns true if clocks can be gated. Otherwise false */
@@ -1175,6 +1174,7 @@ int ufshcd_exec_raw_upiu_cmd(struct ufs_hba *hba,
 
 int ufshcd_wb_toggle(struct ufs_hba *hba, bool enable);
 int ufshcd_suspend_prepare(struct device *dev);
+int __ufshcd_suspend_prepare(struct device *dev, bool rpm_ok_for_spm);
 void ufshcd_resume_complete(struct device *dev);
 
 /* Wrapper functions for safely calling variant operations */
@@ -1383,6 +1383,16 @@ static inline int ufshcd_rpm_put_sync(struct ufs_hba *hba)
 	return pm_runtime_put_sync(&hba->sdev_ufs_device->sdev_gendev);
 }
 
+static inline void ufshcd_rpm_get_noresume(struct ufs_hba *hba)
+{
+	pm_runtime_get_noresume(&hba->sdev_ufs_device->sdev_gendev);
+}
+
+static inline int ufshcd_rpm_resume(struct ufs_hba *hba)
+{
+	return pm_runtime_resume(&hba->sdev_ufs_device->sdev_gendev);
+}
+
 static inline int ufshcd_rpm_put(struct ufs_hba *hba)
 {
 	return pm_runtime_put(&hba->sdev_ufs_device->sdev_gendev);
-- 
2.17.1



* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-05  9:51 ` [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock Adrian Hunter
@ 2021-09-07 14:42   ` Bart Van Assche
  2021-09-07 15:43     ` Adrian Hunter
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-07 14:42 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/5/21 2:51 AM, Adrian Hunter wrote:
> There is no guarantee to be able to enter the queue if requests are
> blocked. That is because freezing the queue will block entry to the
> queue, but freezing also waits for outstanding requests which can make
> no progress while the queue is blocked.
> 
> That situation can happen when the error handler issues requests to
> clear unit attention condition. Requests can be blocked if the
> ufshcd_state is UFSHCD_STATE_EH_SCHEDULED_FATAL, which can happen
> as a result either of error handler activity, or theoretically a
> request that is issued after the error handler unblocks the queue
> but before clearing unit attention condition.
> 
> The deadlock is very unlikely, so the error handler can be expected
> to clear ua at some point anyway, so the simple solution is not to
> wait to enter the queue.

Do you agree that the interaction between ufshcd_scsi_block_requests() and
blk_mq_freeze_queue() can only lead to a deadlock if blk_queue_enter() is
called without using the BLK_MQ_REQ_NOWAIT flag and if unblocking SCSI
request processing can only happen by the same thread?

Do you agree that no ufshcd_clear_ua_wluns() caller blocks SCSI request
processing and hence that it is not necessary to add a "nowait" argument
to ufshcd_clear_ua_wluns()?

Thanks,

Bart.


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-07 14:42   ` Bart Van Assche
@ 2021-09-07 15:43     ` Adrian Hunter
  2021-09-07 16:56       ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Adrian Hunter @ 2021-09-07 15:43 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 7/09/21 5:42 pm, Bart Van Assche wrote:
> On 9/5/21 2:51 AM, Adrian Hunter wrote:
>> There is no guarantee to be able to enter the queue if requests are
>> blocked. That is because freezing the queue will block entry to the
>> queue, but freezing also waits for outstanding requests which can make
>> no progress while the queue is blocked.
>>
>> That situation can happen when the error handler issues requests to
>> clear unit attention condition. Requests can be blocked if the
>> ufshcd_state is UFSHCD_STATE_EH_SCHEDULED_FATAL, which can happen
>> as a result either of error handler activity, or theoretically a
>> request that is issued after the error handler unblocks the queue
>> but before clearing unit attention condition.
>>
>> The deadlock is very unlikely, so the error handler can be expected
>> to clear ua at some point anyway, so the simple solution is not to
>> wait to enter the queue.
> 
> Do you agree that the interaction between ufshcd_scsi_block_requests() and
> blk_mq_freeze_queue() can only lead to a deadlock if blk_queue_enter() is
> called without using the BLK_MQ_REQ_NOWAIT flag and if unblocking SCSI
> request processing can only happen by the same thread?

Sure

> Do you agree that no ufshcd_clear_ua_wluns() caller blocks SCSI request
> processing and hence that it is not necessary to add a "nowait" argument
> to ufshcd_clear_ua_wluns()?

No.  Requests cannot make progress when ufshcd_state is
UFSHCD_STATE_EH_SCHEDULED_FATAL, and only the error handler can change that,
so if the error handler is waiting to enter the queue and blk_mq_freeze_queue()
is waiting for outstanding requests, they will deadlock.
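
For reference, the freeze side is roughly the following (simplified sketch,
not the verbatim blk-mq code):

void blk_mq_freeze_queue(struct request_queue *q)
{
	blk_freeze_queue_start(q);	/* new blk_queue_enter() callers now wait */
	blk_mq_freeze_queue_wait(q);	/* wait for outstanding requests to drain */
}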


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-07 15:43     ` Adrian Hunter
@ 2021-09-07 16:56       ` Bart Van Assche
  2021-09-07 22:36         ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-07 16:56 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/7/21 8:43 AM, Adrian Hunter wrote:
> No.  Requests cannot make progress when ufshcd_state is
> UFSHCD_STATE_EH_SCHEDULED_FATAL, and only the error handler can change that,
> so if the error handler is waiting to enter the queue and blk_mq_freeze_queue()
> is waiting for outstanding requests, they will deadlock.

How about adding the above text as a comment above ufshcd_clear_ua_wluns() such
that this information becomes available to those who have not followed this
conversation?

Thanks,

Bart.


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-07 16:56       ` Bart Van Assche
@ 2021-09-07 22:36         ` Bart Van Assche
  2021-09-11 16:47           ` Adrian Hunter
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-07 22:36 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/7/21 9:56 AM, Bart Van Assche wrote:
> On 9/7/21 8:43 AM, Adrian Hunter wrote:
>> No.  Requests cannot make progress when ufshcd_state is
>> UFSHCD_STATE_EH_SCHEDULED_FATAL, and only the error handler can change 
>> that,
>> so if the error handler is waiting to enter the queue and 
>> blk_mq_freeze_queue()
>> is waiting for outstanding requests, they will deadlock.
> 
> How about adding the above text as a comment above 
> ufshcd_clear_ua_wluns() such
> that this information becomes available to those who have not followed this
> conversation?

After having given patch 1/3 some further thought: an unfortunate
effect of this patch is that unit attention clearing is skipped for
the states UFSHCD_STATE_EH_SCHEDULED_FATAL and UFSHCD_STATE_RESET.
How about replacing patch 1/3 with the untested patch below since that
patch does not have the disadvantage of sometimes skipping clearing UA?

Thanks,

Bart.

[PATCH] scsi: ufs: Fix a recently introduced deadlock

Completing pending commands with DID_IMM_RETRY triggers the following
code paths:

   scsi_complete()
   -> scsi_queue_insert()
     -> __scsi_queue_insert()
       -> scsi_device_unbusy()
         -> scsi_dec_host_busy()
	  -> scsi_eh_wakeup()
       -> blk_mq_requeue_request()

   scsi_queue_rq()
   -> scsi_host_queue_ready()
     -> scsi_host_in_recovery()

Fixes: a113eaaf8637 ("scsi: ufs: Synchronize SCSI and UFS error handling")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
  drivers/scsi/ufs/ufshcd.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c2c614da1fb8..9560f34f3d27 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2707,6 +2707,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
  		}
  		fallthrough;
  	case UFSHCD_STATE_RESET:
+		/*
+		 * The SCSI error handler only starts after all pending commands
+		 * have failed or timed out. Complete commands with
+		 * DID_IMM_RETRY to allow the error handler to start
+		 * if it has been scheduled.
+		 */
+		set_host_byte(cmd, DID_IMM_RETRY);
+		cmd->scsi_done(cmd);
  		err = SCSI_MLQUEUE_HOST_BUSY;
  		goto out;
  	case UFSHCD_STATE_ERROR:


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-07 22:36         ` Bart Van Assche
@ 2021-09-11 16:47           ` Adrian Hunter
  2021-09-13  3:17             ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Adrian Hunter @ 2021-09-11 16:47 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 8/09/21 1:36 am, Bart Van Assche wrote:
> On 9/7/21 9:56 AM, Bart Van Assche wrote:
>> On 9/7/21 8:43 AM, Adrian Hunter wrote:
>>> No.  Requests cannot make progress when ufshcd_state is
>>> UFSHCD_STATE_EH_SCHEDULED_FATAL, and only the error handler can change that,
>>> so if the error handler is waiting to enter the queue and blk_mq_freeze_queue()
>>> is waiting for outstanding requests, they will deadlock.
>>
>> How about adding the above text as a comment above ufshcd_clear_ua_wluns() such
>> that this information becomes available to those who have not followed this
>> conversation?
> 
> After having given patch 1/3 some further thought: an unfortunate
> effect of this patch is that unit attention clearing is skipped for
> the states UFSHCD_STATE_EH_SCHEDULED_FATAL and UFSHCD_STATE_RESET.

Only if the error handler is racing with blk_mq_freeze_queue(), but it
is not ideal.

> How about replacing patch 1/3 with the untested patch below since that
> patch does not have the disadvantage of sometimes skipping clearing UA?

I presume you mean without reverting "scsi: ufs: Synchronize SCSI
and UFS error handling" but in that case the deadlock happens because:

error handler is waiting on blk_queue_enter()
blk_queue_enter() is waiting on blk_mq_freeze_queue()
blk_mq_freeze_queue() is waiting on outstanding requests
outstanding requests are blocked by the SCSI error handler shost_state == SHOST_RECOVERY set by scsi_schedule_eh()

> 
> Thanks,
> 
> Bart.
> 
> [PATCH] scsi: ufs: Fix a recently introduced deadlock
> 
> Completing pending commands with DID_IMM_RETRY triggers the following
> code paths:
> 
>   scsi_complete()
>   -> scsi_queue_insert()
>     -> __scsi_queue_insert()
>       -> scsi_device_unbusy()
>         -> scsi_dec_host_busy()
>       -> scsi_eh_wakeup()
>       -> blk_mq_requeue_request()
> 
>   scsi_queue_rq()
>   -> scsi_host_queue_ready()
>     -> scsi_host_in_recovery()
> 
> Fixes: a113eaaf8637 ("scsi: ufs: Synchronize SCSI and UFS error handling")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/ufs/ufshcd.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index c2c614da1fb8..9560f34f3d27 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2707,6 +2707,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>          }
>          fallthrough;
>      case UFSHCD_STATE_RESET:
> +        /*
> +         * The SCSI error handler only starts after all pending commands
> +         * have failed or timed out. Complete commands with
> +         * DID_IMM_RETRY to allow the error handler to start
> +         * if it has been scheduled.
> +         */
> +        set_host_byte(cmd, DID_IMM_RETRY);
> +        cmd->scsi_done(cmd);

Setting a non-zero return value, in this case "err = SCSI_MLQUEUE_HOST_BUSY",
will anyway cause scsi_dec_host_busy(), so does this make any difference?


>          err = SCSI_MLQUEUE_HOST_BUSY;
>          goto out;
>      case UFSHCD_STATE_ERROR:



* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-11 16:47           ` Adrian Hunter
@ 2021-09-13  3:17             ` Bart Van Assche
  2021-09-13  8:53               ` Adrian Hunter
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-13  3:17 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/11/21 09:47, Adrian Hunter wrote:
> On 8/09/21 1:36 am, Bart Van Assche wrote:
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -2707,6 +2707,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>>           }
>>           fallthrough;
>>       case UFSHCD_STATE_RESET:
>> +        /*
>> +         * The SCSI error handler only starts after all pending commands
>> +         * have failed or timed out. Complete commands with
>> +         * DID_IMM_RETRY to allow the error handler to start
>> +         * if it has been scheduled.
>> +         */
>> +        set_host_byte(cmd, DID_IMM_RETRY);
>> +        cmd->scsi_done(cmd);
> 
> Setting non-zero return value, in this case "err = SCSI_MLQUEUE_HOST_BUSY"
> will anyway cause scsi_dec_host_busy(), so does this make any difference?

The return value should be changed into 0 since returning 
SCSI_MLQUEUE_HOST_BUSY is only allowed if cmd->scsi_done(cmd) has not 
yet been called.

I expect that setting the host byte to DID_IMM_RETRY and calling 
scsi_done will make a difference, otherwise I wouldn't have suggested 
this. As explained in my previous email doing that triggers the SCSI 
command completion and resubmission paths. Resubmission only happens if 
the SCSI error handler has not yet been scheduled. The SCSI error 
handler is scheduled only after scsi_done() has been called for all pending
commands or a timeout has occurred. In other words, setting the host byte to
DID_IMM_RETRY and calling scsi_done() makes it possible for the error 
handler to be scheduled, something that won't happen if 
ufshcd_queuecommand() systematically returns SCSI_MLQUEUE_HOST_BUSY. In 
the latter case the block layer timer is reset over and over again. See 
also the blk_mq_start_request() in scsi_queue_rq(). One could wonder 
whether this is really what the SCSI core should do if a SCSI LLD keeps 
returning the SCSI_MLQUEUE_HOST_BUSY status code ...

Bart.


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-13  3:17             ` Bart Van Assche
@ 2021-09-13  8:53               ` Adrian Hunter
  2021-09-13 16:33                 ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Adrian Hunter @ 2021-09-13  8:53 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 13/09/21 6:17 am, Bart Van Assche wrote:
> On 9/11/21 09:47, Adrian Hunter wrote:
>> On 8/09/21 1:36 am, Bart Van Assche wrote:
>>> --- a/drivers/scsi/ufs/ufshcd.c
>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>> @@ -2707,6 +2707,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>>>          }
>>>          fallthrough;
>>>      case UFSHCD_STATE_RESET:
>>> +        /*
>>> +         * The SCSI error handler only starts after all pending commands
>>> +         * have failed or timed out. Complete commands with
>>> +         * DID_IMM_RETRY to allow the error handler to start
>>> +         * if it has been scheduled.
>>> +         */
>>> +        set_host_byte(cmd, DID_IMM_RETRY);
>>> +        cmd->scsi_done(cmd);
>> 
>> Setting non-zero return value, in this case "err =
>> SCSI_MLQUEUE_HOST_BUSY" will anyway cause scsi_dec_host_busy(), so
>> does this make any difference?
> 
> The return value should be changed into 0 since returning
> SCSI_MLQUEUE_HOST_BUSY is only allowed if cmd->scsi_done(cmd) has not
> yet been called.
> 
> I expect that setting the host byte to DID_IMM_RETRY and calling
> scsi_done will make a difference, otherwise I wouldn't have suggested
> this. As explained in my previous email doing that triggers the SCSI
> command completion and resubmission paths. Resubmission only happens
> if the SCSI error handler has not yet been scheduled. The SCSI error
> handler is scheduled only after scsi_done() has been called for all pending
> commands or a timeout has occurred. In other words, setting the host
> byte to DID_IMM_RETRY and calling scsi_done() makes it possible for
> the error handler to be scheduled, something that won't happen if
> ufshcd_queuecommand() systematically returns SCSI_MLQUEUE_HOST_BUSY.

Not getting it, sorry. :-(

The error handler sets UFSHCD_STATE_RESET and never leaves the state
as UFSHCD_STATE_RESET, so that case does not need to start the error
handler because it is already running.

The error handler is always scheduled after setting 
UFSHCD_STATE_EH_SCHEDULED_FATAL.

scsi_dec_host_busy() is called for any non-zero return value like
SCSI_MLQUEUE_HOST_BUSY:

i.e.
	reason = scsi_dispatch_cmd(cmd);
	if (reason) {
		scsi_set_blocked(cmd, reason);
		ret = BLK_STS_RESOURCE;
		goto out_dec_host_busy;
	}

	return BLK_STS_OK;

out_dec_host_busy:
	scsi_dec_host_busy(shost, cmd);

And that will wake the error handler:

static void scsi_dec_host_busy(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
{
	unsigned long flags;

	rcu_read_lock();
	__clear_bit(SCMD_STATE_INFLIGHT, &cmd->state);
	if (unlikely(scsi_host_in_recovery(shost))) {
		spin_lock_irqsave(shost->host_lock, flags);
		if (shost->host_failed || shost->host_eh_scheduled)
			scsi_eh_wakeup(shost);
		spin_unlock_irqrestore(shost->host_lock, flags);
	}
	rcu_read_unlock();
}

Note that scsi_host_queue_ready() won't let any requests through
when scsi_host_in_recovery(), so the potential problem is with
requests that have already been successfully submitted to the
UFS driver but have not completed. The change you suggest
does not help with that.
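
Roughly (a simplified sketch of scsi_host_queue_ready() in scsi_lib.c, not
verbatim):

	/* Called from scsi_queue_rq(); while the host is in recovery no new
	 * command is dispatched, it is simply requeued. */
	if (scsi_host_in_recovery(shost))
		return 0;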

That seems like another problem with the patch 
"scsi: ufs: Synchronize SCSI and UFS error handling".


> In the latter case the block layer timer is reset over and over
> again. See also the blk_mq_start_request() in scsi_queue_rq(). One
> could wonder whether this is really what the SCSI core should do if a
> SCSI LLD keeps returning the SCSI_MLQUEUE_HOST_BUSY status code ...
> 
> Bart.



* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-13  8:53               ` Adrian Hunter
@ 2021-09-13 16:33                 ` Bart Van Assche
  2021-09-13 17:13                   ` Adrian Hunter
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-13 16:33 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/13/21 1:53 AM, Adrian Hunter wrote:
> scsi_dec_host_busy() is called for any non-zero return value like
> SCSI_MLQUEUE_HOST_BUSY:
> 
> i.e.
> 	reason = scsi_dispatch_cmd(cmd);
> 	if (reason) {
> 		scsi_set_blocked(cmd, reason);
> 		ret = BLK_STS_RESOURCE;
> 		goto out_dec_host_busy;
> 	}
> 
> 	return BLK_STS_OK;
> 
> out_dec_host_busy:
> 	scsi_dec_host_busy(shost, cmd);
> 
> And that will wake the error handler:
> 
> static void scsi_dec_host_busy(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
> {
> 	unsigned long flags;
> 
> 	rcu_read_lock();
> 	__clear_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> 	if (unlikely(scsi_host_in_recovery(shost))) {
> 		spin_lock_irqsave(shost->host_lock, flags);
> 		if (shost->host_failed || shost->host_eh_scheduled)
> 			scsi_eh_wakeup(shost);
> 		spin_unlock_irqrestore(shost->host_lock, flags);
> 	}
> 	rcu_read_unlock();
> }

Returning SCSI_MLQUEUE_HOST_BUSY is not sufficient to wake up the SCSI
error handler because of the following test in scsi_error_handler():

	shost->host_failed != scsi_host_busy(shost)
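
In context, that check sits in the error handler's wait loop, roughly as
follows (simplified from scsi_error_handler() in scsi_error.c, not verbatim):

	while (!kthread_should_stop()) {
		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
		    shost->host_failed != scsi_host_busy(shost)) {
			/* keep sleeping until every busy command has either
			 * failed or timed out */
			schedule();
			continue;
		}
		/* ... run the transport EH strategy handler or scsi_unjam_host() ... */
	}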

As I mentioned in a previous email, all pending commands must have failed
or timed out before the error handler is woken up. Returning
SCSI_MLQUEUE_HOST_BUSY from ufshcd_queuecommand() does not fail a command
and prevents it from timing out. Hence my suggestion to change
"return SCSI_MLQUEUE_HOST_BUSY" into set_host_byte(cmd, DID_IMM_RETRY)
followed by cmd->scsi_done(cmd). A possible alternative is to move the
blk_mq_start_request() call in the SCSI core such that the block layer
request timer is not reset if a SCSI LLD returns SCSI_MLQUEUE_HOST_BUSY.

Bart.


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-13 16:33                 ` Bart Van Assche
@ 2021-09-13 17:13                   ` Adrian Hunter
  2021-09-13 20:11                     ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Adrian Hunter @ 2021-09-13 17:13 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 13/09/21 7:33 pm, Bart Van Assche wrote:
> On 9/13/21 1:53 AM, Adrian Hunter wrote:
>> scsi_dec_host_busy() is called for any non-zero return value like
>> SCSI_MLQUEUE_HOST_BUSY:
>>
>> i.e.
>>     reason = scsi_dispatch_cmd(cmd);
>>     if (reason) {
>>         scsi_set_blocked(cmd, reason);
>>         ret = BLK_STS_RESOURCE;
>>         goto out_dec_host_busy;
>>     }
>>
>>     return BLK_STS_OK;
>>
>> out_dec_host_busy:
>>     scsi_dec_host_busy(shost, cmd);
>>
>> And that will wake the error handler:
>>
>> static void scsi_dec_host_busy(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
>> {
>>     unsigned long flags;
>>
>>     rcu_read_lock();
>>     __clear_bit(SCMD_STATE_INFLIGHT, &cmd->state);
>>     if (unlikely(scsi_host_in_recovery(shost))) {
>>         spin_lock_irqsave(shost->host_lock, flags);
>>         if (shost->host_failed || shost->host_eh_scheduled)
>>             scsi_eh_wakeup(shost);
>>         spin_unlock_irqrestore(shost->host_lock, flags);
>>     }
>>     rcu_read_unlock();
>> }
> 
> Returning SCSI_MLQUEUE_HOST_BUSY is not sufficient to wake up the SCSI
> error handler because of the following test in scsi_error_handler():
> 
>     shost->host_failed != scsi_host_busy(shost)

SCSI_MLQUEUE_HOST_BUSY causes scsi_host_busy() to decrement by calling
scsi_dec_host_busy() as described above, so the request is not being
counted in that condition anymore.

> 
> As I mentioned in a previous email, all pending commands must have failed
> or timed out before the error handler is woken up. Returning
> SCSI_MLQUEUE_HOST_BUSY from ufshcd_queuecommand() does not fail a command
> and prevents it from timing out. Hence my suggestion to change
> "return SCSI_MLQUEUE_HOST_BUSY" into set_host_byte(cmd, DID_IMM_RETRY)
> followed by cmd->scsi_done(cmd). A possible alternative is to move the
> blk_mq_start_request() call in the SCSI core such that the block layer
> request timer is not reset if a SCSI LLD returns SCSI_MLQUEUE_HOST_BUSY.
> 
> Bart.



* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-13 17:13                   ` Adrian Hunter
@ 2021-09-13 20:11                     ` Bart Van Assche
  2021-09-14  4:55                       ` Adrian Hunter
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-13 20:11 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/13/21 10:13 AM, Adrian Hunter wrote:
> SCSI_MLQUEUE_HOST_BUSY causes scsi_host_busy() to decrement by calling
> scsi_dec_host_busy() as described above, so the request is not being
> counted in that condition anymore.

Let's take a step back. My understanding is that the deadlock is caused by
the combination of:
* SCSI command processing being blocked because of the state
   UFSHCD_STATE_EH_SCHEDULED_FATAL.
* The sdev_ufs_device and/or sdev_rpmb request queues are frozen
   (blk_mq_freeze_queue() has started).
* A REQUEST SENSE command being scheduled from inside the error handler
   (ufshcd_clear_ua_wlun()).

Is this a theoretical concern or something that has been observed on a test
setup?

If this has been observed on a test setup, was the error handler scheduled
(ufshcd_err_handler())?

I don't see how SCSI command processing could get stuck indefinitely since
it is guaranteed that the UFS error handler will get scheduled and also that
the UFS error handler will change ufshcd_state from
UFSHCD_STATE_EH_SCHEDULED_FATAL into another state?

What am I missing?

Thanks,

Bart.


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-13 20:11                     ` Bart Van Assche
@ 2021-09-14  4:55                       ` Adrian Hunter
  2021-09-14 22:28                         ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Adrian Hunter @ 2021-09-14  4:55 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 13/09/21 11:11 pm, Bart Van Assche wrote:
> On 9/13/21 10:13 AM, Adrian Hunter wrote:
>> SCSI_MLQUEUE_HOST_BUSY causes scsi_host_busy() to decrement by calling
>> scsi_dec_host_busy() as described above, so the request is not being
>> counted in that condition anymore.
> 
> Let's take a step back. My understanding is that the deadlock is caused by
> the combination of:
> * SCSI command processing being blocked because of the state
>   UFSHCD_STATE_EH_SCHEDULED_FATAL.

That assumes "scsi: ufs: Synchronize SCSI and UFS error handling" is reverted.
With "scsi: ufs: Synchronize SCSI and UFS error handling" all requests are
blocked because scsi_host_in_recovery().

> * The sdev_ufs_device and/or sdev_rpmb request queues are frozen
>   (blk_mq_freeze_queue() has started).

Yes

> * A REQUEST SENSE command being scheduled from inside the error handler
>   (ufshcd_clear_ua_wlun()).

Not exactly.  It is not possible to enter the queue after freezing starts
so blk_queue_enter() is stuck waiting on any existing requests to exit
the queue, but existing requests are blocked as described above.
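
Roughly (a simplified sketch of blk_queue_enter() in block/blk-core.c, not
verbatim):

	while (true) {
		if (percpu_ref_tryget_live(&q->q_usage_counter))
			return 0;		/* queue entered */
		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;		/* what patch 1/3 relies on */
		/* otherwise sleep until the freeze has completed and been undone */
		wait_event(q->mq_freeze_wq,
			   !q->mq_freeze_depth || blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}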

> 
> Is this a theoretical concern or something that has been observed on a test
> setup?

It is observed on Samsung Galaxy Book S when suspending.

> 
> If this has been observed on a test setup, was the error handler scheduled
> (ufshcd_err_handler())?

Yes.

> 
> I don't see how SCSI command processing could get stuck indefinitely since
> it is guaranteed that the UFS error handler will get scheduled and also that
> the UFS error handler will change ufshcd_state from
> UFSHCD_STATE_EH_SCHEDULED_FATAL into another state?

The error handler is stuck waiting on the freeze, which is stuck waiting on
requests which are stuck waiting on the error handler.


> 
> What am I missing?

You have not responded to the issues raised by
"scsi: ufs: Synchronize SCSI and UFS error handling"



* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-14  4:55                       ` Adrian Hunter
@ 2021-09-14 22:28                         ` Bart Van Assche
  2021-09-15 15:35                           ` Adrian Hunter
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-14 22:28 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/13/21 9:55 PM, Adrian Hunter wrote:
> On 13/09/21 11:11 pm, Bart Van Assche wrote:
>> What am I missing?
> 
> You have not responded to the issues raised by
> "scsi: ufs: Synchronize SCSI and UFS error handling"

Because one of the follow-up messages to that patch was so cryptic that I
did not comprehend it. Anyway, based on the patch at the start of this email
thread I assume that the deadlock is caused by calling blk_get_request()
without the BLK_MQ_REQ_NOWAIT flag from inside a SCSI error handler. How
about fixing this by removing the code that submits a REQUEST SENSE command
and calling scsi_report_bus_reset() or scsi_report_device_reset() instead?
ufshcd_reset_and_restore() already uses that approach to make sure that the
unit attention condition triggered by a reset is not reported to the SCSI
command submitter. I think only if needs_restore == true and
needs_reset == false that ufshcd_err_handler() can trigger a UA condition
without calling scsi_report_bus_reset().

The following code from scsi_error.c makes sure that the UA after a reset
does not reach the upper-level driver:

	case NOT_READY:
	case UNIT_ATTENTION:
		/*
		 * if we are expecting a cc/ua because of a bus reset that we
		 * performed, treat this just as a retry.  otherwise this is
		 * information that we should pass up to the upper-level driver
		 * so that we can deal with it there.
		 */
		if (scmd->device->expecting_cc_ua) {
			/*
			 * Because some device does not queue unit
			 * attentions correctly, we carefully check
			 * additional sense code and qualifier so as
			 * not to squash media change unit attention.
			 */
			if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
				scmd->device->expecting_cc_ua = 0;
				return NEEDS_RETRY;
			}
		}

Bart.


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-14 22:28                         ` Bart Van Assche
@ 2021-09-15 15:35                           ` Adrian Hunter
  2021-09-15 22:41                             ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Adrian Hunter @ 2021-09-15 15:35 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 15/09/21 1:28 am, Bart Van Assche wrote:
> On 9/13/21 9:55 PM, Adrian Hunter wrote:
>> On 13/09/21 11:11 pm, Bart Van Assche wrote: >> What am I missing?
>>
>> You have not responded to the issues raised by
>> "scsi: ufs: Synchronize SCSI and UFS error handling"
> 
> Because one of the follow-up messages to that patch was so cryptic that I
> did not comprehend it. Anyway, based on the patch at the start of this email
> thread I assume that the deadlock is caused by calling blk_get_request()
> without the BLK_MQ_REQ_NOWAIT flag from inside a SCSI error handler. How
> about fixing this by removing the code that submits a REQUEST SENSE command
> and calling scsi_report_bus_reset() or scsi_report_device_reset() instead?
> ufshcd_reset_and_restore() already uses that approach to make sure that the
> unit attention condition triggered by a reset is not reported to the SCSI
> command submitter. I think only if needs_restore == true and
> needs_reset == false that ufshcd_err_handler() can trigger a UA condition
> without calling scsi_report_bus_reset().
> 
> The following code from scsi_error.c makes sure that the UA after a reset
> does not reach the upper-level driver:
> 
>     case NOT_READY:
>     case UNIT_ATTENTION:
>         /*
>          * if we are expecting a cc/ua because of a bus reset that we
>          * performed, treat this just as a retry.  otherwise this is
>          * information that we should pass up to the upper-level driver
>          * so that we can deal with it there.
>          */
>         if (scmd->device->expecting_cc_ua) {
>             /*
>              * Because some device does not queue unit
>              * attentions correctly, we carefully check
>              * additional sense code and qualifier so as
>              * not to squash media change unit attention.
>              */
>             if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
>                 scmd->device->expecting_cc_ua = 0;
>                 return NEEDS_RETRY;
>             }
>         }
> 
> Bart.


Thanks for the idea.  Unfortunately it does not work for pass-through
requests, refer scsi_noretry_cmd().  sdev_ufs_device and sdev_rpmb are
used with pass-through requests.
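
i.e. roughly (from scsi_noretry_cmd() in scsi_error.c, simplified, not
verbatim):

	/*
	 * assume caller has checked sense and determined
	 * the check condition was retryable.
	 */
	if (scmd->request->cmd_flags & REQ_FAILFAST_DEV ||
	    blk_rq_is_passthrough(scmd->request))
		return 1;

so a retryable unit attention on a pass-through request is not retried.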


* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-15 15:35                           ` Adrian Hunter
@ 2021-09-15 22:41                             ` Bart Van Assche
  2021-09-16 17:01                               ` Adrian Hunter
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2021-09-15 22:41 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 9/15/21 8:35 AM, Adrian Hunter wrote:
> Thanks for the idea.  Unfortunately it does not work for pass-through
> requests, refer scsi_noretry_cmd().  sdev_ufs_device and sdev_rpmb are
> used with pass-through requests.

How about allocating and submitting the REQUEST SENSE command from the context
of a workqueue, or in other words, switching back to scsi_execute()? Although
that approach doesn't guarantee that the unit attention condition is cleared
before the first SCSI command is received from outside the UFS driver, I don't
see any other solution since my understanding is that the deadlock between
blk_mq_freeze_queue() and blk_get_request() from inside ufshcd_err_handler()
can also happen without "ufs: Synchronize SCSI and UFS error handling".
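
Something along these lines (an illustrative sketch only; the ua_work field
and ufshcd_clear_ua_work() are hypothetical names, not from any posted patch):

/* Hypothetical: queued with schedule_work() instead of issuing REQUEST SENSE
 * from the error handler itself. */
static void ufshcd_clear_ua_work(struct work_struct *work)
{
	struct ufs_hba *hba = container_of(work, struct ufs_hba, ua_work);
	static const u8 cmd[6] = {REQUEST_SENSE, 0, 0, 0, UFS_SENSE_SIZE, 0};
	u8 *buffer;

	buffer = kzalloc(UFS_SENSE_SIZE, GFP_KERNEL);
	if (!buffer)
		return;
	/* scsi_execute() may block in blk_get_request(), but here that only
	 * stalls the work item, not the error handler. */
	scsi_execute(hba->sdev_ufs_device, cmd, DMA_FROM_DEVICE, buffer,
		     UFS_SENSE_SIZE, NULL, NULL, 30 * HZ, 3, 0, RQF_PM, NULL);
	kfree(buffer);
}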

The only code I know of that relies on the UFS driver clearing unit attentions
is this code:
https://android.googlesource.com/platform/system/core/+/master/trusty/storage/proxy/rpmb.c
The code that submits a REQUEST SENSE was added in the UFS driver as the result
of a request from the team that maintains the Trusty code. Earlier today I have
been promised that unit attention handling support will be added in Trusty but I
do not know when this will be realized.

Bart.




* Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock
  2021-09-15 22:41                             ` Bart Van Assche
@ 2021-09-16 17:01                               ` Adrian Hunter
  0 siblings, 0 replies; 19+ messages in thread
From: Adrian Hunter @ 2021-09-16 17:01 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Manivannan Sadhasivam, Wei Li, linux-scsi

On 16/09/21 1:41 am, Bart Van Assche wrote:
> On 9/15/21 8:35 AM, Adrian Hunter wrote:
>> Thanks for the idea.  Unfortunately it does not work for pass-through
>> requests, refer scsi_noretry_cmd().  sdev_ufs_device and sdev_rpmb are
>> used with pass-through requests.
> 
> How about allocating and submitting the REQUEST SENSE command from the context
> of a workqueue, or in other words, switching back to scsi_execute()? Although
> that approach doesn't guarantee that the unit attention condition is cleared
> before the first SCSI command is received from outside the UFS driver, I don't
> see any other solution since my understanding is that the deadlock between
> blk_mq_freeze_queue() and blk_get_request() from inside ufshcd_err_handler()
> can also happen without "ufs: Synchronize SCSI and UFS error handling".

The issue can also be fixed by sending REQUEST SENSE directly avoiding the
SCSI queues.  Please see V4.

> 
> The only code I know of that relies on the UFS driver clearing unit attentions
> is this code:
> https://android.googlesource.com/platform/system/core/+/master/trusty/storage/proxy/rpmb.c
> The code that submits a REQUEST SENSE was added in the UFS driver as the result
> of a request from the team that maintains the Trusty code. Earlier today I have
> been promised that unit attention handling support will be added in Trusty but I
> do not when this will be realized.
> 
> Bart.
> 
> 


