linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/2] Fix deadlock in ufs
@ 2021-01-27  4:00 Asutosh Das
  2021-01-27  4:00 ` [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing Asutosh Das
                   ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Asutosh Das @ 2021-01-27  4:00 UTC (permalink / raw)
  To: cang, martin.petersen, linux-scsi; +Cc: Asutosh Das, linux-arm-msm, stern

This patchset attempts to fix a deadlock in ufs.
This deadlock occurs because the ufs host driver tries to resume
its child (wlun scsi device) to send SSU to it during its suspend.

Asutosh Das (2):
  block: bsg: resume scsi device before accessing
  scsi: ufs: Fix deadlock while suspending ufs host

 block/bsg.c               |  8 ++++++++
 drivers/scsi/ufs/ufshcd.c | 18 ++----------------
 2 files changed, 10 insertions(+), 16 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing
  2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
@ 2021-01-27  4:00 ` Asutosh Das
  2021-01-27  7:09   ` Avri Altman
  2021-02-07  2:23   ` Bart Van Assche
  2021-01-27  4:00 ` [RFC PATCH v1 2/2] scsi: ufs: Fix deadlock while suspending ufs host Asutosh Das
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Asutosh Das @ 2021-01-27  4:00 UTC (permalink / raw)
  To: cang, martin.petersen, linux-scsi
  Cc: Asutosh Das, linux-arm-msm, stern, Bao D . Nguyen,
	FUJITA Tomonori, Jens Axboe, open list:BLOCK LAYER, open list

Resumes the scsi device before accessing it.

Change-Id: I2929af60f2a92c89704a582fcdb285d35b429fde
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
---
 block/bsg.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/bsg.c b/block/bsg.c
index d7bae94..f4c197f 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -306,12 +306,16 @@ static struct bsg_device *bsg_get_device(struct inode *inode, struct file *file)
 static int bsg_open(struct inode *inode, struct file *file)
 {
 	struct bsg_device *bd;
+	struct scsi_device *sd;
 
 	bd = bsg_get_device(inode, file);
 
 	if (IS_ERR(bd))
 		return PTR_ERR(bd);
 
+	sd = (struct scsi_device *) bd->queue->queuedata;
+	if (scsi_autopm_get_device(sd))
+		return -EIO;
 	file->private_data = bd;
 	return 0;
 }
@@ -319,8 +323,12 @@ static int bsg_open(struct inode *inode, struct file *file)
 static int bsg_release(struct inode *inode, struct file *file)
 {
 	struct bsg_device *bd = file->private_data;
+	struct scsi_device *sd;
 
 	file->private_data = NULL;
+	sd = (struct scsi_device *) bd->queue->queuedata;
+	scsi_autopm_put_device(sd);
+
 	return bsg_put_device(bd);
 }
 
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [RFC PATCH v1 2/2] scsi: ufs: Fix deadlock while suspending ufs host
  2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
  2021-01-27  4:00 ` [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing Asutosh Das
@ 2021-01-27  4:00 ` Asutosh Das
  2021-01-27 15:22 ` [RFC PATCH v1 0/2] Fix deadlock in ufs Bjorn Andersson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Asutosh Das @ 2021-01-27  4:00 UTC (permalink / raw)
  To: cang, martin.petersen, linux-scsi
  Cc: Asutosh Das, linux-arm-msm, stern, Bao D . Nguyen, Alim Akhtar,
	Avri Altman, James E.J. Bottomley, Stanley Chu, Bean Huo,
	open list

During runtime-suspend of ufs host, the scsi devices are
already suspended and so are the queues associated with them.
But the ufs host sends SSU to wlun during its runtime-suspend.
During the process blk_queue_enter checks if the queue is not in
suspended state. If so, it waits for the queue to resume, and never
comes out of it.
The commit
(d55d15a33: scsi: block: Do not accept any requests while suspended)
adds the check if the queue is in suspended state in blk_queue_enter().

Fix this, by decoupling wlun scsi devices from block layer pm.
The runtime-pm for these devices would be managed by bsg and sg drivers.

Call trace:
 __switch_to+0x174/0x2c4
 __schedule+0x478/0x764
 schedule+0x9c/0xe0
 blk_queue_enter+0x158/0x228
 blk_mq_alloc_request+0x40/0xa4
 blk_get_request+0x2c/0x70
 __scsi_execute+0x60/0x1c4
 ufshcd_set_dev_pwr_mode+0x124/0x1e4
 ufshcd_suspend+0x208/0x83c
 ufshcd_runtime_suspend+0x40/0x154
 ufshcd_pltfrm_runtime_suspend+0x14/0x20
 pm_generic_runtime_suspend+0x28/0x3c
 __rpm_callback+0x80/0x2a4
 rpm_suspend+0x308/0x614
 rpm_idle+0x158/0x228
 pm_runtime_work+0x84/0xac
 process_one_work+0x1f0/0x470
 worker_thread+0x26c/0x4c8
 kthread+0x13c/0x320
 ret_from_fork+0x10/0x18

Change-Id: Id777fd52493c8b5522d1ebcad73cd30dac33e8a4
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9c691e4..b7e7f81 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7217,16 +7217,6 @@ static void ufshcd_set_active_icc_lvl(struct ufs_hba *hba)
 	kfree(desc_buf);
 }
 
-static inline void ufshcd_blk_pm_runtime_init(struct scsi_device *sdev)
-{
-	scsi_autopm_get_device(sdev);
-	blk_pm_runtime_init(sdev->request_queue, &sdev->sdev_gendev);
-	if (sdev->rpm_autosuspend)
-		pm_runtime_set_autosuspend_delay(&sdev->sdev_gendev,
-						 RPM_AUTOSUSPEND_DELAY_MS);
-	scsi_autopm_put_device(sdev);
-}
-
 /**
  * ufshcd_scsi_add_wlus - Adds required W-LUs
  * @hba: per-adapter instance
@@ -7265,7 +7255,6 @@ static int ufshcd_scsi_add_wlus(struct ufs_hba *hba)
 		hba->sdev_ufs_device = NULL;
 		goto out;
 	}
-	ufshcd_blk_pm_runtime_init(hba->sdev_ufs_device);
 	scsi_device_put(hba->sdev_ufs_device);
 
 	hba->sdev_rpmb = __scsi_add_device(hba->host, 0, 0,
@@ -7274,17 +7263,14 @@ static int ufshcd_scsi_add_wlus(struct ufs_hba *hba)
 		ret = PTR_ERR(hba->sdev_rpmb);
 		goto remove_sdev_ufs_device;
 	}
-	ufshcd_blk_pm_runtime_init(hba->sdev_rpmb);
 	scsi_device_put(hba->sdev_rpmb);
 
 	sdev_boot = __scsi_add_device(hba->host, 0, 0,
 		ufshcd_upiu_wlun_to_scsi_wlun(UFS_UPIU_BOOT_WLUN), NULL);
-	if (IS_ERR(sdev_boot)) {
+	if (IS_ERR(sdev_boot))
 		dev_err(hba->dev, "%s: BOOT WLUN not found\n", __func__);
-	} else {
-		ufshcd_blk_pm_runtime_init(sdev_boot);
+	else
 		scsi_device_put(sdev_boot);
-	}
 	goto out;
 
 remove_sdev_ufs_device:
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing
  2021-01-27  4:00 ` [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing Asutosh Das
@ 2021-01-27  7:09   ` Avri Altman
  2021-01-27  7:59     ` Can Guo
  2021-02-07  2:23   ` Bart Van Assche
  1 sibling, 1 reply; 31+ messages in thread
From: Avri Altman @ 2021-01-27  7:09 UTC (permalink / raw)
  To: Asutosh Das, cang, martin.petersen, linux-scsi
  Cc: linux-arm-msm, stern, Bao D . Nguyen, FUJITA Tomonori,
	Jens Axboe, open list:BLOCK LAYER, open list

> 
> Resumes the scsi device before accessing it.
> 
> Change-Id: I2929af60f2a92c89704a582fcdb285d35b429fde
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Can Guo <cang@codeaurora.org>
> Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Following this patch, is it possible to revert commit 74e5e468b664d?

Thanks,
Avri


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing
  2021-01-27  7:09   ` Avri Altman
@ 2021-01-27  7:59     ` Can Guo
  2021-01-27  8:53       ` Can Guo
  0 siblings, 1 reply; 31+ messages in thread
From: Can Guo @ 2021-01-27  7:59 UTC (permalink / raw)
  To: Avri Altman
  Cc: Asutosh Das, martin.petersen, linux-scsi, linux-arm-msm, stern,
	Bao D . Nguyen, FUJITA Tomonori, Jens Axboe,
	open list:BLOCK LAYER, open list

On 2021-01-27 15:09, Avri Altman wrote:
>> 
>> Resumes the scsi device before accessing it.
>> 
>> Change-Id: I2929af60f2a92c89704a582fcdb285d35b429fde
>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>> Signed-off-by: Can Guo <cang@codeaurora.org>
>> Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
> Following this patch, is it possible to revert commit 74e5e468b664d?
> 

No, but this is a good finding... This change assumes
that the queue->queue_data is a scsi_device, which is
why we call scsi_auto_pm_get(). But for ufs_bsg's case,
queue->queue_data is a device...

Thanks,
Can Guo.

> Thanks,
> Avri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing
  2021-01-27  7:59     ` Can Guo
@ 2021-01-27  8:53       ` Can Guo
  0 siblings, 0 replies; 31+ messages in thread
From: Can Guo @ 2021-01-27  8:53 UTC (permalink / raw)
  To: Avri Altman
  Cc: Asutosh Das, martin.petersen, linux-scsi, linux-arm-msm, stern,
	Bao D . Nguyen, FUJITA Tomonori, Jens Axboe,
	open list:BLOCK LAYER, open list

On 2021-01-27 15:59, Can Guo wrote:
> On 2021-01-27 15:09, Avri Altman wrote:
>>> 
>>> Resumes the scsi device before accessing it.
>>> 
>>> Change-Id: I2929af60f2a92c89704a582fcdb285d35b429fde
>>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>>> Signed-off-by: Can Guo <cang@codeaurora.org>
>>> Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
>> Following this patch, is it possible to revert commit 74e5e468b664d?
>> 
> 
> No, but this is a good finding... This change assumes
> that the queue->queue_data is a scsi_device, which is
> why we call scsi_auto_pm_get(). But for ufs_bsg's case,
> queue->queue_data is a device...
> 

If we call pm_runtime_get/put_sync(bcd->class_dev->parent) in
bsg_get/put_device(), commit 74e5e468b664d can be reverted.
This is just a rough idea.

> Thanks,
> Can Guo.
> 
>> Thanks,
>> Avri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/2] Fix deadlock in ufs
  2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
  2021-01-27  4:00 ` [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing Asutosh Das
  2021-01-27  4:00 ` [RFC PATCH v1 2/2] scsi: ufs: Fix deadlock while suspending ufs host Asutosh Das
@ 2021-01-27 15:22 ` Bjorn Andersson
  2021-01-27 16:16   ` Asutosh Das
  2021-01-28  3:26 ` [RFC PATCH v2 " Asutosh Das
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Bjorn Andersson @ 2021-01-27 15:22 UTC (permalink / raw)
  To: Asutosh Das; +Cc: cang, martin.petersen, linux-scsi, linux-arm-msm, stern

On Tue 26 Jan 20:00 PST 2021, Asutosh Das wrote:

> This patchset attempts to fix a deadlock in ufs.
> This deadlock occurs because the ufs host driver tries to resume
> its child (wlun scsi device) to send SSU to it during its suspend.
> 

I've been trying to bisect a regression currently seen on all Qualcomm
boards I've tried running next-20210125 on, but have yet to figure out
the actual culprit. I was hoping this might be related, but no.

The following is the UFS related logs from a SDM845 board:

[    4.556147] ufshcd-qcom 1d84000.ufshc: Adding to iommu group 10
[   14.823635] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled
[   14.834483] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq-supply regulator, assuming enabled
[   14.845022] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq2-supply regulator, assuming enabled
[   14.859278] scsi host0: ufshcd
[   14.894076] ufshcd-qcom 1d84000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0
[   14.972730] ufshcd-qcom 1d84000.ufshc: ufshcd_find_max_sup_active_icc_level: Regulator capability was not set, actvIccLevel=0
[   14.988386] scsi 0:0:0:49488: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
[   15.002508] scsi 0:0:0:49476: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
[   15.015848] scsi 0:0:0:49456: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
[   15.031551] scsi 0:0:0:0: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
[   15.046885] scsi 0:0:0:1: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
[   15.047204] sd 0:0:0:0: [sda] 14145536 4096-byte logical blocks: (57.9 GB/54.0 GiB)
[   15.047274] sd 0:0:0:0: [sda] Write Protect is off
[   15.047281] sd 0:0:0:0: [sda] Mode Sense: 00 32 00 10
[   15.047418] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   15.047488] sd 0:0:0:0: [sda] Optimal transfer size 786432 bytes
[   15.062041] scsi 0:0:0:2: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
[   15.062347] sd 0:0:0:1: [sdb] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
[   15.062414] sd 0:0:0:1: [sdb] Write Protect is off
[   15.062418] sd 0:0:0:1: [sdb] Mode Sense: 00 32 00 10
[   15.062522] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   15.062582] sd 0:0:0:1: [sdb] Optimal transfer size 786432 bytes
[   15.074503] scsi 0:0:0:3: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
[   15.075092] sd 0:0:0:2: [sdc] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
[   15.075161] sd 0:0:0:2: [sdc] Write Protect is off
[   15.075166] sd 0:0:0:2: [sdc] Mode Sense: 00 32 00 10
[   15.075292] sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   15.075358] sd 0:0:0:2: [sdc] Optimal transfer size 786432 bytes
[   15.093820] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1583 val 0x0 failed 0 retries
[   15.199208] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1568 val 0x0 failed 0 retries
[   15.400514] ufshcd-qcom 1d84000.ufshc: __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
[   15.409439] ufshcd-qcom 1d84000.ufshc: ufshcd_try_to_abort_task: no response from device. tag = 10, err -110
[   15.479763] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.546285] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.612688] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.679057] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.685187] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
[   15.754125] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.820405] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.886519] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.952976] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   15.959119] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
[   16.027951] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.094499] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.161005] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.276723] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.282756] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
[   16.360239] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.427425] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.497737] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.566869] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.572939] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
[   16.642243] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.708590] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.774868] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.841364] ufshcd-qcom 1d84000.ufshc: Controller enable failed
[   16.847492] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
[   16.855938] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler: reset and restore failed with err -5
[   16.936724] sd 0:0:0:2: [sdc] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
[   16.946151] sd 0:0:0:2: [sdc] tag#12 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
[   16.953950] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   16.963987] Buffer I/O error on dev sdc, logical block 0, async page read
[   16.971263] sd 0:0:0:1: [sdb] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
[   16.980668] sd 0:0:0:1: [sdb] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
[   16.988457] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   16.998455] Buffer I/O error on dev sdb, logical block 0, async page read
[   17.005438] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
[   17.014887] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
[   17.022639] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   17.032581] Buffer I/O error on dev sda, logical block 0, async page read
[   17.060604] sd 0:0:0:2: [sdc] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
[   17.070024] sd 0:0:0:2: [sdc] tag#11 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
[   17.077811] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   17.087785] Buffer I/O error on dev sdc, logical block 0, async page read
[   17.094762]  sdc: unable to read partition table
[   17.121375] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
[   17.124811] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
[   17.130915] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
[   17.138281] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
[   17.138351] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
[   17.138361] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   17.140305] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
[   17.148070] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   17.157308] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   17.164973] Buffer I/O error on dev sda, logical block 0, async page read
[   17.175522] Buffer I/O error on dev sdb, logical block 0, async page read
[   17.183484]  sda: unable to read partition table
[   17.193379]  sdb: unable to read partition table
[   17.245890] sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
[   17.255444] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 00 d7 d7 f0 00 00 01 00
[   17.262309] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
[   17.263226] blk_update_request: I/O error, dev sda, sector 113164160 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   17.267751] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
[   17.267804] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
[   17.267820] blk_update_request: I/O error, dev sdb, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   17.272473] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
[   17.272480] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   17.328934] Buffer I/O error on dev sdc, logical block 1008, async page read
[   17.336400] Buffer I/O error on dev sdb, logical block 1008, async page read
[   17.344184] Buffer I/O error on dev sda, logical block 14145520, async page read
[   17.356908] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.365890] sd 0:0:0:3: [sdd] Sense not available.
[   17.504489] sd 0:0:0:1: [sdb] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.513199] sd 0:0:0:1: [sdb] Sense not available.
[   17.548648] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.548875] sd 0:0:0:2: [sdc] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.557365] sd 0:0:0:0: [sda] Sense not available.
[   17.570888] sd 0:0:0:2: [sdc] Sense not available.
[   17.612557] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.621302] sd 0:0:0:3: [sdd] Sense not available.
[   17.720470] sd 0:0:0:3: [sdd] 0 512-byte logical blocks: (0 B/0 B)
[   17.726895] sd 0:0:0:3: [sdd] 0-byte physical blocks
[   17.756442] sd 0:0:0:1: [sdb] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.765154] sd 0:0:0:1: [sdb] Sense not available.
[   17.796694] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.805550] sd 0:0:0:0: [sda] Sense not available.
[   17.812461] sd 0:0:0:2: [sdc] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[   17.812479] sd 0:0:0:3: [sdd] Write Protect is off
[   17.821124] sd 0:0:0:2: [sdc] Sense not available.
[   17.831183] sd 0:0:0:3: [sdd] Mode Sense: 00 00 00 00
[   17.916633] sd 0:0:0:3: [sdd] Asking for cache data failed
[   17.922246] sd 0:0:0:3: [sdd] Assuming drive cache: write through
[   17.928661] sd 0:0:0:1: [sdb] 0 4096-byte logical blocks: (0 B/0 B)
[   17.964461] sd 0:0:0:2: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
[   17.968389] sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
[   18.052598] sd 0:0:0:2: [sdc] Write Protect is on
[   18.057654] sd 0:0:0:2: [sdc] Mode Sense: 00 01 1c 88
[   18.140750] Buffer I/O error on dev sdc, logical block 1008, async page read
[   18.148120] sdc: detected capacity change from 0 to 8192
[   18.148436] sdb: detected capacity change from 0 to 8192
[   18.153573] sd 0:0:0:2: [sdc] Attached SCSI disk
[   18.161640] sd 0:0:0:1: [sdb] Attached SCSI disk
[   18.272458] sda: detected capacity change from 0 to 113164288
[   18.278440] sd 0:0:0:0: [sda] Attached SCSI disk
[   18.336456] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
[   18.345155] sd 0:0:0:3: [sdd] Sense not available.
[   18.437277] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
[   18.588427] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[   18.597131] sd 0:0:0:3: [sdd] Sense not available.
[   18.984425] sd 0:0:0:3: [sdd] Attached SCSI disk
[   19.973658] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
[   21.508706] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
[   21.519204] ufshcd-qcom 1d84000.ufshc: ufshcd_read_desc_param: Failed reading descriptor. desc_id 2, desc_index 4, param_offset 6, ret -11
[   21.577216] sd 0:0:0:0: Unexpected response from lun 4 while scanning, scan aborted
[   22.181387] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   23.007510] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   23.076607] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   23.524907] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   24.133026] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   24.421140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   24.549023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   25.156985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   25.253136] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   26.180950] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   26.597019] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   27.204984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   27.493135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   27.621022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   28.228982] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   28.325152] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   29.252956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   29.669021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   30.276984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   30.565140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   30.693023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   31.300983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   31.397134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   32.324954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   32.741021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   33.348985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   33.637132] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   33.765258] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   34.372980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   34.469142] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   35.396957] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   35.813018] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   36.420980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   36.709133] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   36.837021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   37.444979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   37.541138] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   38.468955] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   38.885021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   39.492983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   39.781134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   39.909017] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   40.516977] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   40.613135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   41.540956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   41.957016] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   42.564987] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   42.853130] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   42.981020] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   43.588980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   43.685135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   44.612954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   45.029014] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   45.636979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   45.925137] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   46.053022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   46.660983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
[   46.757132] devfreq 1d84000.ufshc: dvfs failed with (-16) error

Is this something that you've seen?

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/2] Fix deadlock in ufs
  2021-01-27 15:22 ` [RFC PATCH v1 0/2] Fix deadlock in ufs Bjorn Andersson
@ 2021-01-27 16:16   ` Asutosh Das
  2021-01-27 19:36     ` Bjorn Andersson
  0 siblings, 1 reply; 31+ messages in thread
From: Asutosh Das @ 2021-01-27 16:16 UTC (permalink / raw)
  To: Bjorn Andersson; +Cc: cang, martin.petersen, linux-scsi, linux-arm-msm, stern

On Wed, Jan 27 2021 at 07:25 -0800, Bjorn Andersson wrote:
>On Tue 26 Jan 20:00 PST 2021, Asutosh Das wrote:
>
>> This patchset attempts to fix a deadlock in ufs.
>> This deadlock occurs because the ufs host driver tries to resume
>> its child (wlun scsi device) to send SSU to it during its suspend.
>>
>
>I've been trying to bisect a regression currently seen on all Qualcomm
>boards I've tried running next-20210125 on, but have yet to figure out
>the actual culprit. I was hoping this might be related, but no.
>
>The following is the UFS related logs from a SDM845 board:
>
>[    4.556147] ufshcd-qcom 1d84000.ufshc: Adding to iommu group 10
>[   14.823635] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled
>[   14.834483] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq-supply regulator, assuming enabled
>[   14.845022] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq2-supply regulator, assuming enabled
>[   14.859278] scsi host0: ufshcd
>[   14.894076] ufshcd-qcom 1d84000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0
>[   14.972730] ufshcd-qcom 1d84000.ufshc: ufshcd_find_max_sup_active_icc_level: Regulator capability was not set, actvIccLevel=0
>[   14.988386] scsi 0:0:0:49488: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>[   15.002508] scsi 0:0:0:49476: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>[   15.015848] scsi 0:0:0:49456: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>[   15.031551] scsi 0:0:0:0: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>[   15.046885] scsi 0:0:0:1: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>[   15.047204] sd 0:0:0:0: [sda] 14145536 4096-byte logical blocks: (57.9 GB/54.0 GiB)
>[   15.047274] sd 0:0:0:0: [sda] Write Protect is off
>[   15.047281] sd 0:0:0:0: [sda] Mode Sense: 00 32 00 10
>[   15.047418] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
>[   15.047488] sd 0:0:0:0: [sda] Optimal transfer size 786432 bytes
>[   15.062041] scsi 0:0:0:2: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>[   15.062347] sd 0:0:0:1: [sdb] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
>[   15.062414] sd 0:0:0:1: [sdb] Write Protect is off
>[   15.062418] sd 0:0:0:1: [sdb] Mode Sense: 00 32 00 10
>[   15.062522] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
>[   15.062582] sd 0:0:0:1: [sdb] Optimal transfer size 786432 bytes
>[   15.074503] scsi 0:0:0:3: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>[   15.075092] sd 0:0:0:2: [sdc] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
>[   15.075161] sd 0:0:0:2: [sdc] Write Protect is off
>[   15.075166] sd 0:0:0:2: [sdc] Mode Sense: 00 32 00 10
>[   15.075292] sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
>[   15.075358] sd 0:0:0:2: [sdc] Optimal transfer size 786432 bytes
>[   15.093820] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1583 val 0x0 failed 0 retries
>[   15.199208] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1568 val 0x0 failed 0 retries
>[   15.400514] ufshcd-qcom 1d84000.ufshc: __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
>[   15.409439] ufshcd-qcom 1d84000.ufshc: ufshcd_try_to_abort_task: no response from device. tag = 10, err -110
>[   15.479763] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.546285] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.612688] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.679057] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.685187] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>[   15.754125] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.820405] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.886519] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.952976] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   15.959119] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>[   16.027951] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.094499] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.161005] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.276723] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.282756] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>[   16.360239] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.427425] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.497737] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.566869] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.572939] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>[   16.642243] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.708590] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.774868] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.841364] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>[   16.847492] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>[   16.855938] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler: reset and restore failed with err -5
>[   16.936724] sd 0:0:0:2: [sdc] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
>[   16.946151] sd 0:0:0:2: [sdc] tag#12 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>[   16.953950] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>[   16.963987] Buffer I/O error on dev sdc, logical block 0, async page read
>[   16.971263] sd 0:0:0:1: [sdb] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
>[   16.980668] sd 0:0:0:1: [sdb] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>[   16.988457] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>[   16.998455] Buffer I/O error on dev sdb, logical block 0, async page read
>[   17.005438] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
>[   17.014887] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>[   17.022639] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>[   17.032581] Buffer I/O error on dev sda, logical block 0, async page read
>[   17.060604] sd 0:0:0:2: [sdc] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>[   17.070024] sd 0:0:0:2: [sdc] tag#11 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>[   17.077811] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>[   17.087785] Buffer I/O error on dev sdc, logical block 0, async page read
>[   17.094762]  sdc: unable to read partition table
>[   17.121375] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>[   17.124811] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>[   17.130915] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>[   17.138281] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>[   17.138351] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
>[   17.138361] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>[   17.140305] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>[   17.148070] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>[   17.157308] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>[   17.164973] Buffer I/O error on dev sda, logical block 0, async page read
>[   17.175522] Buffer I/O error on dev sdb, logical block 0, async page read
>[   17.183484]  sda: unable to read partition table
>[   17.193379]  sdb: unable to read partition table
>[   17.245890] sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>[   17.255444] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 00 d7 d7 f0 00 00 01 00
>[   17.262309] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>[   17.263226] blk_update_request: I/O error, dev sda, sector 113164160 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>[   17.267751] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>[   17.267804] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
>[   17.267820] blk_update_request: I/O error, dev sdb, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>[   17.272473] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
>[   17.272480] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>[   17.328934] Buffer I/O error on dev sdc, logical block 1008, async page read
>[   17.336400] Buffer I/O error on dev sdb, logical block 1008, async page read
>[   17.344184] Buffer I/O error on dev sda, logical block 14145520, async page read
>[   17.356908] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.365890] sd 0:0:0:3: [sdd] Sense not available.
>[   17.504489] sd 0:0:0:1: [sdb] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.513199] sd 0:0:0:1: [sdb] Sense not available.
>[   17.548648] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.548875] sd 0:0:0:2: [sdc] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.557365] sd 0:0:0:0: [sda] Sense not available.
>[   17.570888] sd 0:0:0:2: [sdc] Sense not available.
>[   17.612557] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.621302] sd 0:0:0:3: [sdd] Sense not available.
>[   17.720470] sd 0:0:0:3: [sdd] 0 512-byte logical blocks: (0 B/0 B)
>[   17.726895] sd 0:0:0:3: [sdd] 0-byte physical blocks
>[   17.756442] sd 0:0:0:1: [sdb] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.765154] sd 0:0:0:1: [sdb] Sense not available.
>[   17.796694] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.805550] sd 0:0:0:0: [sda] Sense not available.
>[   17.812461] sd 0:0:0:2: [sdc] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   17.812479] sd 0:0:0:3: [sdd] Write Protect is off
>[   17.821124] sd 0:0:0:2: [sdc] Sense not available.
>[   17.831183] sd 0:0:0:3: [sdd] Mode Sense: 00 00 00 00
>[   17.916633] sd 0:0:0:3: [sdd] Asking for cache data failed
>[   17.922246] sd 0:0:0:3: [sdd] Assuming drive cache: write through
>[   17.928661] sd 0:0:0:1: [sdb] 0 4096-byte logical blocks: (0 B/0 B)
>[   17.964461] sd 0:0:0:2: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
>[   17.968389] sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>[   18.052598] sd 0:0:0:2: [sdc] Write Protect is on
>[   18.057654] sd 0:0:0:2: [sdc] Mode Sense: 00 01 1c 88
>[   18.140750] Buffer I/O error on dev sdc, logical block 1008, async page read
>[   18.148120] sdc: detected capacity change from 0 to 8192
>[   18.148436] sdb: detected capacity change from 0 to 8192
>[   18.153573] sd 0:0:0:2: [sdc] Attached SCSI disk
>[   18.161640] sd 0:0:0:1: [sdb] Attached SCSI disk
>[   18.272458] sda: detected capacity change from 0 to 113164288
>[   18.278440] sd 0:0:0:0: [sda] Attached SCSI disk
>[   18.336456] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   18.345155] sd 0:0:0:3: [sdd] Sense not available.
>[   18.437277] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
>[   18.588427] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>[   18.597131] sd 0:0:0:3: [sdd] Sense not available.
>[   18.984425] sd 0:0:0:3: [sdd] Attached SCSI disk
>[   19.973658] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
>[   21.508706] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
>[   21.519204] ufshcd-qcom 1d84000.ufshc: ufshcd_read_desc_param: Failed reading descriptor. desc_id 2, desc_index 4, param_offset 6, ret -11
>[   21.577216] sd 0:0:0:0: Unexpected response from lun 4 while scanning, scan aborted
>[   22.181387] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   23.007510] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   23.076607] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   23.524907] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   24.133026] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   24.421140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   24.549023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   25.156985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   25.253136] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   26.180950] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   26.597019] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   27.204984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   27.493135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   27.621022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   28.228982] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   28.325152] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   29.252956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   29.669021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   30.276984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   30.565140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   30.693023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   31.300983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   31.397134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   32.324954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   32.741021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   33.348985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   33.637132] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   33.765258] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   34.372980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   34.469142] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   35.396957] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   35.813018] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   36.420980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   36.709133] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   36.837021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   37.444979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   37.541138] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   38.468955] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   38.885021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   39.492983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   39.781134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   39.909017] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   40.516977] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   40.613135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   41.540956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   41.957016] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   42.564987] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   42.853130] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   42.981020] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   43.588980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   43.685135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   44.612954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   45.029014] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   45.636979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   45.925137] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   46.053022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   46.660983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>[   46.757132] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>
>Is this something that you've seen?
>
>Regards,
>Bjorn

Hi Bjorn
Nope, I've not seen this issue yet.
Looks like the commands are timing out, could be clocks/power, perhaps?

-asd

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/2] Fix deadlock in ufs
  2021-01-27 16:16   ` Asutosh Das
@ 2021-01-27 19:36     ` Bjorn Andersson
  2021-01-28  2:47       ` Can Guo
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Andersson @ 2021-01-27 19:36 UTC (permalink / raw)
  To: Asutosh Das; +Cc: cang, martin.petersen, linux-scsi, linux-arm-msm, stern

On Wed 27 Jan 10:16 CST 2021, Asutosh Das wrote:

> On Wed, Jan 27 2021 at 07:25 -0800, Bjorn Andersson wrote:
> > On Tue 26 Jan 20:00 PST 2021, Asutosh Das wrote:
> > 
> > > This patchset attempts to fix a deadlock in ufs.
> > > This deadlock occurs because the ufs host driver tries to resume
> > > its child (wlun scsi device) to send SSU to it during its suspend.
> > > 
> > 
> > I've been trying to bisect a regression currently seen on all Qualcomm
> > boards I've tried running next-20210125 on, but have yet to figure out
> > the actual culprit. I was hoping this might be related, but no.
> > 
> > The following is the UFS related logs from a SDM845 board:
> > 
> > [    4.556147] ufshcd-qcom 1d84000.ufshc: Adding to iommu group 10
> > [   14.823635] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled
> > [   14.834483] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq-supply regulator, assuming enabled
> > [   14.845022] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq2-supply regulator, assuming enabled
> > [   14.859278] scsi host0: ufshcd
> > [   14.894076] ufshcd-qcom 1d84000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0
> > [   14.972730] ufshcd-qcom 1d84000.ufshc: ufshcd_find_max_sup_active_icc_level: Regulator capability was not set, actvIccLevel=0
> > [   14.988386] scsi 0:0:0:49488: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
> > [   15.002508] scsi 0:0:0:49476: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
> > [   15.015848] scsi 0:0:0:49456: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
> > [   15.031551] scsi 0:0:0:0: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
> > [   15.046885] scsi 0:0:0:1: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
> > [   15.047204] sd 0:0:0:0: [sda] 14145536 4096-byte logical blocks: (57.9 GB/54.0 GiB)
> > [   15.047274] sd 0:0:0:0: [sda] Write Protect is off
> > [   15.047281] sd 0:0:0:0: [sda] Mode Sense: 00 32 00 10
> > [   15.047418] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
> > [   15.047488] sd 0:0:0:0: [sda] Optimal transfer size 786432 bytes
> > [   15.062041] scsi 0:0:0:2: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
> > [   15.062347] sd 0:0:0:1: [sdb] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
> > [   15.062414] sd 0:0:0:1: [sdb] Write Protect is off
> > [   15.062418] sd 0:0:0:1: [sdb] Mode Sense: 00 32 00 10
> > [   15.062522] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
> > [   15.062582] sd 0:0:0:1: [sdb] Optimal transfer size 786432 bytes
> > [   15.074503] scsi 0:0:0:3: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
> > [   15.075092] sd 0:0:0:2: [sdc] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
> > [   15.075161] sd 0:0:0:2: [sdc] Write Protect is off
> > [   15.075166] sd 0:0:0:2: [sdc] Mode Sense: 00 32 00 10
> > [   15.075292] sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
> > [   15.075358] sd 0:0:0:2: [sdc] Optimal transfer size 786432 bytes
> > [   15.093820] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1583 val 0x0 failed 0 retries
> > [   15.199208] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1568 val 0x0 failed 0 retries
> > [   15.400514] ufshcd-qcom 1d84000.ufshc: __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > [   15.409439] ufshcd-qcom 1d84000.ufshc: ufshcd_try_to_abort_task: no response from device. tag = 10, err -110
> > [   15.479763] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.546285] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.612688] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.679057] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.685187] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
> > [   15.754125] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.820405] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.886519] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.952976] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   15.959119] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
> > [   16.027951] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.094499] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.161005] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.276723] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.282756] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
> > [   16.360239] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.427425] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.497737] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.566869] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.572939] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
> > [   16.642243] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.708590] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.774868] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.841364] ufshcd-qcom 1d84000.ufshc: Controller enable failed
> > [   16.847492] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
> > [   16.855938] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler: reset and restore failed with err -5
> > [   16.936724] sd 0:0:0:2: [sdc] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
> > [   16.946151] sd 0:0:0:2: [sdc] tag#12 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
> > [   16.953950] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [   16.963987] Buffer I/O error on dev sdc, logical block 0, async page read
> > [   16.971263] sd 0:0:0:1: [sdb] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
> > [   16.980668] sd 0:0:0:1: [sdb] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
> > [   16.988457] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [   16.998455] Buffer I/O error on dev sdb, logical block 0, async page read
> > [   17.005438] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
> > [   17.014887] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
> > [   17.022639] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [   17.032581] Buffer I/O error on dev sda, logical block 0, async page read
> > [   17.060604] sd 0:0:0:2: [sdc] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
> > [   17.070024] sd 0:0:0:2: [sdc] tag#11 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
> > [   17.077811] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [   17.087785] Buffer I/O error on dev sdc, logical block 0, async page read
> > [   17.094762]  sdc: unable to read partition table
> > [   17.121375] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
> > [   17.124811] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
> > [   17.130915] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
> > [   17.138281] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
> > [   17.138351] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
> > [   17.138361] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [   17.140305] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
> > [   17.148070] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [   17.157308] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [   17.164973] Buffer I/O error on dev sda, logical block 0, async page read
> > [   17.175522] Buffer I/O error on dev sdb, logical block 0, async page read
> > [   17.183484]  sda: unable to read partition table
> > [   17.193379]  sdb: unable to read partition table
> > [   17.245890] sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
> > [   17.255444] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 00 d7 d7 f0 00 00 01 00
> > [   17.262309] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
> > [   17.263226] blk_update_request: I/O error, dev sda, sector 113164160 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [   17.267751] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
> > [   17.267804] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
> > [   17.267820] blk_update_request: I/O error, dev sdb, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> > [   17.272473] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
> > [   17.272480] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [   17.328934] Buffer I/O error on dev sdc, logical block 1008, async page read
> > [   17.336400] Buffer I/O error on dev sdb, logical block 1008, async page read
> > [   17.344184] Buffer I/O error on dev sda, logical block 14145520, async page read
> > [   17.356908] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.365890] sd 0:0:0:3: [sdd] Sense not available.
> > [   17.504489] sd 0:0:0:1: [sdb] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.513199] sd 0:0:0:1: [sdb] Sense not available.
> > [   17.548648] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.548875] sd 0:0:0:2: [sdc] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.557365] sd 0:0:0:0: [sda] Sense not available.
> > [   17.570888] sd 0:0:0:2: [sdc] Sense not available.
> > [   17.612557] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.621302] sd 0:0:0:3: [sdd] Sense not available.
> > [   17.720470] sd 0:0:0:3: [sdd] 0 512-byte logical blocks: (0 B/0 B)
> > [   17.726895] sd 0:0:0:3: [sdd] 0-byte physical blocks
> > [   17.756442] sd 0:0:0:1: [sdb] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.765154] sd 0:0:0:1: [sdb] Sense not available.
> > [   17.796694] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.805550] sd 0:0:0:0: [sda] Sense not available.
> > [   17.812461] sd 0:0:0:2: [sdc] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   17.812479] sd 0:0:0:3: [sdd] Write Protect is off
> > [   17.821124] sd 0:0:0:2: [sdc] Sense not available.
> > [   17.831183] sd 0:0:0:3: [sdd] Mode Sense: 00 00 00 00
> > [   17.916633] sd 0:0:0:3: [sdd] Asking for cache data failed
> > [   17.922246] sd 0:0:0:3: [sdd] Assuming drive cache: write through
> > [   17.928661] sd 0:0:0:1: [sdb] 0 4096-byte logical blocks: (0 B/0 B)
> > [   17.964461] sd 0:0:0:2: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
> > [   17.968389] sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
> > [   18.052598] sd 0:0:0:2: [sdc] Write Protect is on
> > [   18.057654] sd 0:0:0:2: [sdc] Mode Sense: 00 01 1c 88
> > [   18.140750] Buffer I/O error on dev sdc, logical block 1008, async page read
> > [   18.148120] sdc: detected capacity change from 0 to 8192
> > [   18.148436] sdb: detected capacity change from 0 to 8192
> > [   18.153573] sd 0:0:0:2: [sdc] Attached SCSI disk
> > [   18.161640] sd 0:0:0:1: [sdb] Attached SCSI disk
> > [   18.272458] sda: detected capacity change from 0 to 113164288
> > [   18.278440] sd 0:0:0:0: [sda] Attached SCSI disk
> > [   18.336456] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   18.345155] sd 0:0:0:3: [sdd] Sense not available.
> > [   18.437277] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
> > [   18.588427] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
> > [   18.597131] sd 0:0:0:3: [sdd] Sense not available.
> > [   18.984425] sd 0:0:0:3: [sdd] Attached SCSI disk
> > [   19.973658] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
> > [   21.508706] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
> > [   21.519204] ufshcd-qcom 1d84000.ufshc: ufshcd_read_desc_param: Failed reading descriptor. desc_id 2, desc_index 4, param_offset 6, ret -11
> > [   21.577216] sd 0:0:0:0: Unexpected response from lun 4 while scanning, scan aborted
> > [   22.181387] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   23.007510] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   23.076607] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   23.524907] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   24.133026] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   24.421140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   24.549023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   25.156985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   25.253136] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   26.180950] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   26.597019] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   27.204984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   27.493135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   27.621022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   28.228982] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   28.325152] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   29.252956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   29.669021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   30.276984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   30.565140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   30.693023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   31.300983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   31.397134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   32.324954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   32.741021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   33.348985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   33.637132] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   33.765258] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   34.372980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   34.469142] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   35.396957] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   35.813018] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   36.420980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   36.709133] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   36.837021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   37.444979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   37.541138] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   38.468955] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   38.885021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   39.492983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   39.781134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   39.909017] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   40.516977] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   40.613135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   41.540956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   41.957016] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   42.564987] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   42.853130] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   42.981020] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   43.588980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   43.685135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   44.612954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   45.029014] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   45.636979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   45.925137] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   46.053022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   46.660983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > [   46.757132] devfreq 1d84000.ufshc: dvfs failed with (-16) error
> > 
> > Is this something that you've seen?
> > 
> > Regards,
> > Bjorn
> 
> Hi Bjorn
> Nope, I've not seen this issue yet.
> Looks like the commands are timing out, could be clocks/power, perhaps?
> 

I've finally concluded that reverting 4543d9d78227 ("scsi: ufs: Refactor
ufshcd_init/exit_clk_scaling/gating()") fixes the problem and allow at
least SDM845 to boot again (haven't yet tested the others).

I've not yet figured out why it's causing the regression.

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/2] Fix deadlock in ufs
  2021-01-27 19:36     ` Bjorn Andersson
@ 2021-01-28  2:47       ` Can Guo
  0 siblings, 0 replies; 31+ messages in thread
From: Can Guo @ 2021-01-28  2:47 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Asutosh Das, martin.petersen, linux-scsi, linux-arm-msm, stern

On 2021-01-28 03:36, Bjorn Andersson wrote:
> On Wed 27 Jan 10:16 CST 2021, Asutosh Das wrote:
> 
>> On Wed, Jan 27 2021 at 07:25 -0800, Bjorn Andersson wrote:
>> > On Tue 26 Jan 20:00 PST 2021, Asutosh Das wrote:
>> >
>> > > This patchset attempts to fix a deadlock in ufs.
>> > > This deadlock occurs because the ufs host driver tries to resume
>> > > its child (wlun scsi device) to send SSU to it during its suspend.
>> > >
>> >
>> > I've been trying to bisect a regression currently seen on all Qualcomm
>> > boards I've tried running next-20210125 on, but have yet to figure out
>> > the actual culprit. I was hoping this might be related, but no.
>> >
>> > The following is the UFS related logs from a SDM845 board:
>> >
>> > [    4.556147] ufshcd-qcom 1d84000.ufshc: Adding to iommu group 10
>> > [   14.823635] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled
>> > [   14.834483] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq-supply regulator, assuming enabled
>> > [   14.845022] ufshcd-qcom 1d84000.ufshc: ufshcd_populate_vreg: Unable to find vccq2-supply regulator, assuming enabled
>> > [   14.859278] scsi host0: ufshcd
>> > [   14.894076] ufshcd-qcom 1d84000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0
>> > [   14.972730] ufshcd-qcom 1d84000.ufshc: ufshcd_find_max_sup_active_icc_level: Regulator capability was not set, actvIccLevel=0
>> > [   14.988386] scsi 0:0:0:49488: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>> > [   15.002508] scsi 0:0:0:49476: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>> > [   15.015848] scsi 0:0:0:49456: Well-known LUN    SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>> > [   15.031551] scsi 0:0:0:0: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>> > [   15.046885] scsi 0:0:0:1: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>> > [   15.047204] sd 0:0:0:0: [sda] 14145536 4096-byte logical blocks: (57.9 GB/54.0 GiB)
>> > [   15.047274] sd 0:0:0:0: [sda] Write Protect is off
>> > [   15.047281] sd 0:0:0:0: [sda] Mode Sense: 00 32 00 10
>> > [   15.047418] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
>> > [   15.047488] sd 0:0:0:0: [sda] Optimal transfer size 786432 bytes
>> > [   15.062041] scsi 0:0:0:2: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>> > [   15.062347] sd 0:0:0:1: [sdb] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
>> > [   15.062414] sd 0:0:0:1: [sdb] Write Protect is off
>> > [   15.062418] sd 0:0:0:1: [sdb] Mode Sense: 00 32 00 10
>> > [   15.062522] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
>> > [   15.062582] sd 0:0:0:1: [sdb] Optimal transfer size 786432 bytes
>> > [   15.074503] scsi 0:0:0:3: Direct-Access     SKhynix  H28S7Q302BMR     A002 PQ: 0 ANSI: 6
>> > [   15.075092] sd 0:0:0:2: [sdc] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
>> > [   15.075161] sd 0:0:0:2: [sdc] Write Protect is off
>> > [   15.075166] sd 0:0:0:2: [sdc] Mode Sense: 00 32 00 10
>> > [   15.075292] sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
>> > [   15.075358] sd 0:0:0:2: [sdc] Optimal transfer size 786432 bytes
>> > [   15.093820] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1583 val 0x0 failed 0 retries
>> > [   15.199208] ufshcd-qcom 1d84000.ufshc: dme-set: attr-id 0x1568 val 0x0 failed 0 retries
>> > [   15.400514] ufshcd-qcom 1d84000.ufshc: __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
>> > [   15.409439] ufshcd-qcom 1d84000.ufshc: ufshcd_try_to_abort_task: no response from device. tag = 10, err -110
>> > [   15.479763] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.546285] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.612688] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.679057] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.685187] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>> > [   15.754125] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.820405] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.886519] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.952976] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   15.959119] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>> > [   16.027951] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.094499] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.161005] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.276723] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.282756] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>> > [   16.360239] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.427425] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.497737] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.566869] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.572939] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>> > [   16.642243] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.708590] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.774868] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.841364] ufshcd-qcom 1d84000.ufshc: Controller enable failed
>> > [   16.847492] ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
>> > [   16.855938] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler: reset and restore failed with err -5
>> > [   16.936724] sd 0:0:0:2: [sdc] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
>> > [   16.946151] sd 0:0:0:2: [sdc] tag#12 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>> > [   16.953950] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> > [   16.963987] Buffer I/O error on dev sdc, logical block 0, async page read
>> > [   16.971263] sd 0:0:0:1: [sdb] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
>> > [   16.980668] sd 0:0:0:1: [sdb] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>> > [   16.988457] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> > [   16.998455] Buffer I/O error on dev sdb, logical block 0, async page read
>> > [   17.005438] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=1s
>> > [   17.014887] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>> > [   17.022639] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> > [   17.032581] Buffer I/O error on dev sda, logical block 0, async page read
>> > [   17.060604] sd 0:0:0:2: [sdc] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>> > [   17.070024] sd 0:0:0:2: [sdc] tag#11 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>> > [   17.077811] blk_update_request: I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> > [   17.087785] Buffer I/O error on dev sdc, logical block 0, async page read
>> > [   17.094762]  sdc: unable to read partition table
>> > [   17.121375] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>> > [   17.124811] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>> > [   17.130915] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>> > [   17.138281] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>> > [   17.138351] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
>> > [   17.138361] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> > [   17.140305] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 00 00 00 00 01 00
>> > [   17.148070] blk_update_request: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> > [   17.157308] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> > [   17.164973] Buffer I/O error on dev sda, logical block 0, async page read
>> > [   17.175522] Buffer I/O error on dev sdb, logical block 0, async page read
>> > [   17.183484]  sda: unable to read partition table
>> > [   17.193379]  sdb: unable to read partition table
>> > [   17.245890] sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>> > [   17.255444] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 00 d7 d7 f0 00 00 01 00
>> > [   17.262309] sd 0:0:0:2: [sdc] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>> > [   17.263226] blk_update_request: I/O error, dev sda, sector 113164160 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> > [   17.267751] sd 0:0:0:1: [sdb] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
>> > [   17.267804] sd 0:0:0:1: [sdb] tag#14 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
>> > [   17.267820] blk_update_request: I/O error, dev sdb, sector 8064 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> > [   17.272473] sd 0:0:0:2: [sdc] tag#9 CDB: opcode=0x28 28 00 00 00 03 f0 00 00 01 00
>> > [   17.272480] blk_update_request: I/O error, dev sdc, sector 8064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>> > [   17.328934] Buffer I/O error on dev sdc, logical block 1008, async page read
>> > [   17.336400] Buffer I/O error on dev sdb, logical block 1008, async page read
>> > [   17.344184] Buffer I/O error on dev sda, logical block 14145520, async page read
>> > [   17.356908] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.365890] sd 0:0:0:3: [sdd] Sense not available.
>> > [   17.504489] sd 0:0:0:1: [sdb] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.513199] sd 0:0:0:1: [sdb] Sense not available.
>> > [   17.548648] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.548875] sd 0:0:0:2: [sdc] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.557365] sd 0:0:0:0: [sda] Sense not available.
>> > [   17.570888] sd 0:0:0:2: [sdc] Sense not available.
>> > [   17.612557] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.621302] sd 0:0:0:3: [sdd] Sense not available.
>> > [   17.720470] sd 0:0:0:3: [sdd] 0 512-byte logical blocks: (0 B/0 B)
>> > [   17.726895] sd 0:0:0:3: [sdd] 0-byte physical blocks
>> > [   17.756442] sd 0:0:0:1: [sdb] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.765154] sd 0:0:0:1: [sdb] Sense not available.
>> > [   17.796694] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.805550] sd 0:0:0:0: [sda] Sense not available.
>> > [   17.812461] sd 0:0:0:2: [sdc] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   17.812479] sd 0:0:0:3: [sdd] Write Protect is off
>> > [   17.821124] sd 0:0:0:2: [sdc] Sense not available.
>> > [   17.831183] sd 0:0:0:3: [sdd] Mode Sense: 00 00 00 00
>> > [   17.916633] sd 0:0:0:3: [sdd] Asking for cache data failed
>> > [   17.922246] sd 0:0:0:3: [sdd] Assuming drive cache: write through
>> > [   17.928661] sd 0:0:0:1: [sdb] 0 4096-byte logical blocks: (0 B/0 B)
>> > [   17.964461] sd 0:0:0:2: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
>> > [   17.968389] sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>> > [   18.052598] sd 0:0:0:2: [sdc] Write Protect is on
>> > [   18.057654] sd 0:0:0:2: [sdc] Mode Sense: 00 01 1c 88
>> > [   18.140750] Buffer I/O error on dev sdc, logical block 1008, async page read
>> > [   18.148120] sdc: detected capacity change from 0 to 8192
>> > [   18.148436] sdb: detected capacity change from 0 to 8192
>> > [   18.153573] sd 0:0:0:2: [sdc] Attached SCSI disk
>> > [   18.161640] sd 0:0:0:1: [sdb] Attached SCSI disk
>> > [   18.272458] sda: detected capacity change from 0 to 113164288
>> > [   18.278440] sd 0:0:0:0: [sda] Attached SCSI disk
>> > [   18.336456] sd 0:0:0:3: [sdd] Read Capacity(16) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   18.345155] sd 0:0:0:3: [sdd] Sense not available.
>> > [   18.437277] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
>> > [   18.588427] sd 0:0:0:3: [sdd] Read Capacity(10) failed: Result: hostbyte=0x07 driverbyte=0x00
>> > [   18.597131] sd 0:0:0:3: [sdd] Sense not available.
>> > [   18.984425] sd 0:0:0:3: [sdd] Attached SCSI disk
>> > [   19.973658] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
>> > [   21.508706] ufshcd-qcom 1d84000.ufshc: __ufshcd_query_descriptor: opcode 0x01 for idn 2 failed, index 4, err = -11
>> > [   21.519204] ufshcd-qcom 1d84000.ufshc: ufshcd_read_desc_param: Failed reading descriptor. desc_id 2, desc_index 4, param_offset 6, ret -11
>> > [   21.577216] sd 0:0:0:0: Unexpected response from lun 4 while scanning, scan aborted
>> > [   22.181387] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   23.007510] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   23.076607] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   23.524907] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   24.133026] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   24.421140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   24.549023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   25.156985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   25.253136] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   26.180950] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   26.597019] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   27.204984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   27.493135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   27.621022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   28.228982] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   28.325152] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   29.252956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   29.669021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   30.276984] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   30.565140] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   30.693023] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   31.300983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   31.397134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   32.324954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   32.741021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   33.348985] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   33.637132] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   33.765258] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   34.372980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   34.469142] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   35.396957] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   35.813018] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   36.420980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   36.709133] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   36.837021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   37.444979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   37.541138] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   38.468955] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   38.885021] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   39.492983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   39.781134] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   39.909017] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   40.516977] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   40.613135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   41.540956] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   41.957016] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   42.564987] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   42.853130] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   42.981020] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   43.588980] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   43.685135] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   44.612954] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   45.029014] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   45.636979] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   45.925137] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   46.053022] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   46.660983] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> > [   46.757132] devfreq 1d84000.ufshc: dvfs failed with (-16) error
>> >
>> > Is this something that you've seen?
>> >
>> > Regards,
>> > Bjorn
>> 
>> Hi Bjorn
>> Nope, I've not seen this issue yet.
>> Looks like the commands are timing out, could be clocks/power, 
>> perhaps?
>> 
> 
> I've finally concluded that reverting 4543d9d78227 ("scsi: ufs: 
> Refactor
> ufshcd_init/exit_clk_scaling/gating()") fixes the problem and allow at
> least SDM845 to boot again (haven't yet tested the others).
> 
> I've not yet figured out why it's causing the regression.
> 

Hi Bjorn,

Thanks for pointing it out. I have sent out the fix - my mistake that I
removed two lines by mistake in commit 4543d9d78227. When I tested 
commit
4543d9d78227, I found that the two lines were missing, but I forgot to
add the two lines into 4543d9d78227 when I sent it out.

Regards,
Can Guo.

> Regards,
> Bjorn

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
                   ` (2 preceding siblings ...)
  2021-01-27 15:22 ` [RFC PATCH v1 0/2] Fix deadlock in ufs Bjorn Andersson
@ 2021-01-28  3:26 ` Asutosh Das
  2021-01-28  3:26 ` [RFC PATCH v2 1/2] block: bsg: resume platform device before accessing Asutosh Das
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Asutosh Das @ 2021-01-28  3:26 UTC (permalink / raw)
  To: cang, martin.petersen, linux-scsi; +Cc: Asutosh Das, linux-arm-msm, stern

v1 -> v2
Use pm_runtime_get/put APIs.
Assuming that all bsg devices are scsi devices may break.

This patchset attempts to fix a deadlock in ufs.
This deadlock occurs because the ufs host driver tries to resume
its child (wlun scsi device) to send SSU to it during its suspend.

Asutosh Das (2):
  block: bsg: resume scsi device before accessing
  scsi: ufs: Fix deadlock while suspending ufs host

 block/bsg.c               |  8 ++++++++
 drivers/scsi/ufs/ufshcd.c | 18 ++----------------
 2 files changed, 10 insertions(+), 16 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH v2 1/2] block: bsg: resume platform device before accessing
  2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
                   ` (3 preceding siblings ...)
  2021-01-28  3:26 ` [RFC PATCH v2 " Asutosh Das
@ 2021-01-28  3:26 ` Asutosh Das
  2021-01-28  3:26   ` [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host Asutosh Das
  2021-01-28  9:33 ` [RFC PATCH v2 0/2] Fix deadlock in ufs Avri Altman
  2021-02-01 20:11 ` Asutosh Das (asd)
  6 siblings, 1 reply; 31+ messages in thread
From: Asutosh Das @ 2021-01-28  3:26 UTC (permalink / raw)
  To: cang, martin.petersen, linux-scsi
  Cc: Asutosh Das, linux-arm-msm, stern, Bao D . Nguyen,
	FUJITA Tomonori, Jens Axboe, open list:BLOCK LAYER, open list

It may happen that the underlying device's runtime-pm is
not controlled by block-pm. So it's possible that when
commands are sent to the device, it's suspended and may not
be resumed by blk-pm. Hence explicitly resume the parent
which is the platform device.

Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
---
 block/bsg.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/bsg.c b/block/bsg.c
index d7bae94..e9fc896 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -12,6 +12,7 @@
 #include <linux/idr.h>
 #include <linux/bsg.h>
 #include <linux/slab.h>
+#include <linux/pm_runtime.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_ioctl.h>
@@ -306,12 +307,15 @@ static struct bsg_device *bsg_get_device(struct inode *inode, struct file *file)
 static int bsg_open(struct inode *inode, struct file *file)
 {
 	struct bsg_device *bd;
+	struct bsg_class_device *bcd;
 
 	bd = bsg_get_device(inode, file);
 
 	if (IS_ERR(bd))
 		return PTR_ERR(bd);
 
+	bcd = &bd->queue->bsg_dev;
+	pm_runtime_get_sync(bcd->class_dev->parent);
 	file->private_data = bd;
 	return 0;
 }
@@ -319,8 +323,12 @@ static int bsg_open(struct inode *inode, struct file *file)
 static int bsg_release(struct inode *inode, struct file *file)
 {
 	struct bsg_device *bd = file->private_data;
+	struct bsg_class_device *bcd;
 
 	file->private_data = NULL;
+
+	bcd = &bd->queue->bsg_dev;
+	pm_runtime_put_sync(bcd->class_dev->parent);
 	return bsg_put_device(bd);
 }
 
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host
  2021-01-28  3:26 ` [RFC PATCH v2 1/2] block: bsg: resume platform device before accessing Asutosh Das
@ 2021-01-28  3:26   ` Asutosh Das
  2021-01-28 12:21     ` Avri Altman
  0 siblings, 1 reply; 31+ messages in thread
From: Asutosh Das @ 2021-01-28  3:26 UTC (permalink / raw)
  To: cang, martin.petersen, linux-scsi
  Cc: Asutosh Das, linux-arm-msm, stern, Bao D . Nguyen, Alim Akhtar,
	Avri Altman, James E.J. Bottomley, Stanley Chu, Bean Huo,
	open list

During runtime-suspend of ufs host, the scsi devices are
already suspended and so are the queues associated with them.
But the ufs host sends SSU to wlun during its runtime-suspend.
During the process blk_queue_enter checks if the queue is not in
suspended state. If so, it waits for the queue to resume, and never
comes out of it.
The commit
(d55d15a33: scsi: block: Do not accept any requests while suspended)
adds the check if the queue is in suspended state in blk_queue_enter().

Fix this, by decoupling wlun scsi devices from block layer pm.
The runtime-pm for these devices would be managed by bsg and sg drivers.

Call trace:
 __switch_to+0x174/0x2c4
 __schedule+0x478/0x764
 schedule+0x9c/0xe0
 blk_queue_enter+0x158/0x228
 blk_mq_alloc_request+0x40/0xa4
 blk_get_request+0x2c/0x70
 __scsi_execute+0x60/0x1c4
 ufshcd_set_dev_pwr_mode+0x124/0x1e4
 ufshcd_suspend+0x208/0x83c
 ufshcd_runtime_suspend+0x40/0x154
 ufshcd_pltfrm_runtime_suspend+0x14/0x20
 pm_generic_runtime_suspend+0x28/0x3c
 __rpm_callback+0x80/0x2a4
 rpm_suspend+0x308/0x614
 rpm_idle+0x158/0x228
 pm_runtime_work+0x84/0xac
 process_one_work+0x1f0/0x470
 worker_thread+0x26c/0x4c8
 kthread+0x13c/0x320
 ret_from_fork+0x10/0x18

Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9c691e4..b7e7f81 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7217,16 +7217,6 @@ static void ufshcd_set_active_icc_lvl(struct ufs_hba *hba)
 	kfree(desc_buf);
 }
 
-static inline void ufshcd_blk_pm_runtime_init(struct scsi_device *sdev)
-{
-	scsi_autopm_get_device(sdev);
-	blk_pm_runtime_init(sdev->request_queue, &sdev->sdev_gendev);
-	if (sdev->rpm_autosuspend)
-		pm_runtime_set_autosuspend_delay(&sdev->sdev_gendev,
-						 RPM_AUTOSUSPEND_DELAY_MS);
-	scsi_autopm_put_device(sdev);
-}
-
 /**
  * ufshcd_scsi_add_wlus - Adds required W-LUs
  * @hba: per-adapter instance
@@ -7265,7 +7255,6 @@ static int ufshcd_scsi_add_wlus(struct ufs_hba *hba)
 		hba->sdev_ufs_device = NULL;
 		goto out;
 	}
-	ufshcd_blk_pm_runtime_init(hba->sdev_ufs_device);
 	scsi_device_put(hba->sdev_ufs_device);
 
 	hba->sdev_rpmb = __scsi_add_device(hba->host, 0, 0,
@@ -7274,17 +7263,14 @@ static int ufshcd_scsi_add_wlus(struct ufs_hba *hba)
 		ret = PTR_ERR(hba->sdev_rpmb);
 		goto remove_sdev_ufs_device;
 	}
-	ufshcd_blk_pm_runtime_init(hba->sdev_rpmb);
 	scsi_device_put(hba->sdev_rpmb);
 
 	sdev_boot = __scsi_add_device(hba->host, 0, 0,
 		ufshcd_upiu_wlun_to_scsi_wlun(UFS_UPIU_BOOT_WLUN), NULL);
-	if (IS_ERR(sdev_boot)) {
+	if (IS_ERR(sdev_boot))
 		dev_err(hba->dev, "%s: BOOT WLUN not found\n", __func__);
-	} else {
-		ufshcd_blk_pm_runtime_init(sdev_boot);
+	else
 		scsi_device_put(sdev_boot);
-	}
 	goto out;
 
 remove_sdev_ufs_device:
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
                   ` (4 preceding siblings ...)
  2021-01-28  3:26 ` [RFC PATCH v2 1/2] block: bsg: resume platform device before accessing Asutosh Das
@ 2021-01-28  9:33 ` Avri Altman
  2021-01-28 17:19   ` Asutosh Das
  2021-02-01 20:11 ` Asutosh Das (asd)
  6 siblings, 1 reply; 31+ messages in thread
From: Avri Altman @ 2021-01-28  9:33 UTC (permalink / raw)
  To: Asutosh Das, cang, martin.petersen, linux-scsi; +Cc: linux-arm-msm, stern

Hi,
Asking again:
Following your 1/2 patch, shouldn't this series revert commit 74e5e468b664d as well?

Thanks,
Avri

> v1 -> v2
> Use pm_runtime_get/put APIs.
> Assuming that all bsg devices are scsi devices may break.
> 
> This patchset attempts to fix a deadlock in ufs.
> This deadlock occurs because the ufs host driver tries to resume
> its child (wlun scsi device) to send SSU to it during its suspend.
> 
> Asutosh Das (2):
>   block: bsg: resume scsi device before accessing
>   scsi: ufs: Fix deadlock while suspending ufs host
> 
>  block/bsg.c               |  8 ++++++++
>  drivers/scsi/ufs/ufshcd.c | 18 ++----------------
>  2 files changed, 10 insertions(+), 16 deletions(-)
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host
  2021-01-28  3:26   ` [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host Asutosh Das
@ 2021-01-28 12:21     ` Avri Altman
  2021-01-28 16:39       ` Asutosh Das
  0 siblings, 1 reply; 31+ messages in thread
From: Avri Altman @ 2021-01-28 12:21 UTC (permalink / raw)
  To: Asutosh Das, cang, martin.petersen, linux-scsi
  Cc: linux-arm-msm, stern, Bao D . Nguyen, Alim Akhtar,
	James E.J. Bottomley, Stanley Chu, Bean Huo, open list

> 
> During runtime-suspend of ufs host, the scsi devices are
> already suspended and so are the queues associated with them.
> But the ufs host sends SSU to wlun during its runtime-suspend.
Do you possible meant: "sends request-sense while clearing UAC to...."

Thanks,
Avri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host
  2021-01-28 12:21     ` Avri Altman
@ 2021-01-28 16:39       ` Asutosh Das
  0 siblings, 0 replies; 31+ messages in thread
From: Asutosh Das @ 2021-01-28 16:39 UTC (permalink / raw)
  To: Avri Altman
  Cc: cang, martin.petersen, linux-scsi, linux-arm-msm, stern,
	Bao D . Nguyen, Alim Akhtar, James E.J. Bottomley, Stanley Chu,
	Bean Huo, open list

On Thu, Jan 28 2021 at 04:21 -0800, Avri Altman wrote:
>>
>> During runtime-suspend of ufs host, the scsi devices are
>> already suspended and so are the queues associated with them.
>> But the ufs host sends SSU to wlun during its runtime-suspend.
>Do you possible meant: "sends request-sense while clearing UAC to...."
>

The idea was to show that there's a scsi command that's sent during
suspend which may deadlock.

Yes, I agree to your comment and would change the message to reflect it in
the next version.

>
>Thanks,
>Avri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-01-28  9:33 ` [RFC PATCH v2 0/2] Fix deadlock in ufs Avri Altman
@ 2021-01-28 17:19   ` Asutosh Das
  0 siblings, 0 replies; 31+ messages in thread
From: Asutosh Das @ 2021-01-28 17:19 UTC (permalink / raw)
  To: Avri Altman; +Cc: cang, martin.petersen, linux-scsi, linux-arm-msm, stern

On Thu, Jan 28 2021 at 01:36 -0800, Avri Altman wrote:
>Hi,
>Asking again:
>Following your 1/2 patch, shouldn't this series revert commit 74e5e468b664d as well?
>
>Thanks,
>Avri
>

Yes I think its reasonable to do that. I'll modify and send the v3 series.

>
>> v1 -> v2
>> Use pm_runtime_get/put APIs.
>> Assuming that all bsg devices are scsi devices may break.
>>
>> This patchset attempts to fix a deadlock in ufs.
>> This deadlock occurs because the ufs host driver tries to resume
>> its child (wlun scsi device) to send SSU to it during its suspend.
>>
>> Asutosh Das (2):
>>   block: bsg: resume scsi device before accessing
>>   scsi: ufs: Fix deadlock while suspending ufs host
>>
>>  block/bsg.c               |  8 ++++++++
>>  drivers/scsi/ufs/ufshcd.c | 18 ++----------------
>>  2 files changed, 10 insertions(+), 16 deletions(-)
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
>> Foundation Collaborative Project.
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
                   ` (5 preceding siblings ...)
  2021-01-28  9:33 ` [RFC PATCH v2 0/2] Fix deadlock in ufs Avri Altman
@ 2021-02-01 20:11 ` Asutosh Das (asd)
  2021-02-01 20:27   ` Bart Van Assche
  2021-02-01 21:48   ` Alan Stern
  6 siblings, 2 replies; 31+ messages in thread
From: Asutosh Das (asd) @ 2021-02-01 20:11 UTC (permalink / raw)
  To: cang, martin.petersen, stern, Bart Van Assche; +Cc: linux-arm-msm, linux-scsi

On 1/27/2021 7:26 PM, Asutosh Das wrote:
> v1 -> v2
> Use pm_runtime_get/put APIs.
> Assuming that all bsg devices are scsi devices may break.
> 
> This patchset attempts to fix a deadlock in ufs.
> This deadlock occurs because the ufs host driver tries to resume
> its child (wlun scsi device) to send SSU to it during its suspend.
> 
> Asutosh Das (2):
>    block: bsg: resume scsi device before accessing
>    scsi: ufs: Fix deadlock while suspending ufs host
> 
>   block/bsg.c               |  8 ++++++++
>   drivers/scsi/ufs/ufshcd.c | 18 ++----------------
>   2 files changed, 10 insertions(+), 16 deletions(-)
> 

Hi Alan/Bart

Please can you take a look at this series.
Please let me know if you've any better suggestions for this.


Thanks
-asd


-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-01 20:11 ` Asutosh Das (asd)
@ 2021-02-01 20:27   ` Bart Van Assche
  2021-02-01 21:48   ` Alan Stern
  1 sibling, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2021-02-01 20:27 UTC (permalink / raw)
  To: Asutosh Das (asd), cang, martin.petersen, stern; +Cc: linux-arm-msm, linux-scsi

On 2/1/21 12:11 PM, Asutosh Das (asd) wrote:
> Please can you take a look at this series.
> Please let me know if you've any better suggestions for this.

Hi Asutosh,

Against which kernel version has this patch series been prepared and
tested? Have you noticed the following patch series that went into
v5.11-rc1:
https://lore.kernel.org/linux-scsi/20201209052951.16136-1-bvanassche@acm.org/
?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-01 20:11 ` Asutosh Das (asd)
  2021-02-01 20:27   ` Bart Van Assche
@ 2021-02-01 21:48   ` Alan Stern
  2021-02-02 20:52     ` Asutosh Das
  1 sibling, 1 reply; 31+ messages in thread
From: Alan Stern @ 2021-02-01 21:48 UTC (permalink / raw)
  To: Asutosh Das (asd)
  Cc: cang, martin.petersen, Bart Van Assche, linux-arm-msm, linux-scsi

On Mon, Feb 01, 2021 at 12:11:23PM -0800, Asutosh Das (asd) wrote:
> On 1/27/2021 7:26 PM, Asutosh Das wrote:
> > v1 -> v2
> > Use pm_runtime_get/put APIs.
> > Assuming that all bsg devices are scsi devices may break.
> > 
> > This patchset attempts to fix a deadlock in ufs.
> > This deadlock occurs because the ufs host driver tries to resume
> > its child (wlun scsi device) to send SSU to it during its suspend.
> > 
> > Asutosh Das (2):
> >    block: bsg: resume scsi device before accessing
> >    scsi: ufs: Fix deadlock while suspending ufs host
> > 
> >   block/bsg.c               |  8 ++++++++
> >   drivers/scsi/ufs/ufshcd.c | 18 ++----------------
> >   2 files changed, 10 insertions(+), 16 deletions(-)
> > 
> 
> Hi Alan/Bart
> 
> Please can you take a look at this series.
> Please let me know if you've any better suggestions for this.

I haven't commented on them so far because I don't understand them.

> [RFC PATCH v2 1/2] block: bsg: resume platform device before accessing:
> 
> It may happen that the underlying device's runtime-pm is
> not controlled by block-pm. So it's possible that when
> commands are sent to the device, it's suspended and may not
> be resumed by blk-pm. Hence explicitly resume the parent
> which is the platform device.

If you want to send a command to the underlying device, why do you 
resume the underlying device's _parent_?  Why not resume the device 
itself?

Why is bsg sending commands to the underlying device in a way that 
evades checks for whether the device is suspended?  Shouldn't the 
device's driver already be responsible for automatically resuming the 
device when a command is sent?

> [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host:
> 
> During runtime-suspend of ufs host, the scsi devices are
> already suspended and so are the queues associated with them.
> But the ufs host sends SSU to wlun during its runtime-suspend.
> During the process blk_queue_enter checks if the queue is not in
> suspended state. If so, it waits for the queue to resume, and never
> comes out of it.
> The commit
> (d55d15a33: scsi: block: Do not accept any requests while suspended)
> adds the check if the queue is in suspended state in blk_queue_enter().
> 
> Fix this, by decoupling wlun scsi devices from block layer pm.
> The runtime-pm for these devices would be managed by bsg and sg drivers.

Why do you need to send a command to the wlun when the host is being 
suspended?  Shouldn't that command already have been sent, at the time 
when the wlun was suspended?

Alan Stern

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-01 21:48   ` Alan Stern
@ 2021-02-02 20:52     ` Asutosh Das
  2021-02-02 22:05       ` Alan Stern
  0 siblings, 1 reply; 31+ messages in thread
From: Asutosh Das @ 2021-02-02 20:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: cang, martin.petersen, Bart Van Assche, linux-arm-msm, linux-scsi

On Mon, Feb 01 2021 at 13:48 -0800, Alan Stern wrote:
>On Mon, Feb 01, 2021 at 12:11:23PM -0800, Asutosh Das (asd) wrote:
>> On 1/27/2021 7:26 PM, Asutosh Das wrote:
>> > v1 -> v2
>> > Use pm_runtime_get/put APIs.
>> > Assuming that all bsg devices are scsi devices may break.
>> >
>> > This patchset attempts to fix a deadlock in ufs.
>> > This deadlock occurs because the ufs host driver tries to resume
>> > its child (wlun scsi device) to send SSU to it during its suspend.
>> >
>> > Asutosh Das (2):
>> >    block: bsg: resume scsi device before accessing
>> >    scsi: ufs: Fix deadlock while suspending ufs host
>> >
>> >   block/bsg.c               |  8 ++++++++
>> >   drivers/scsi/ufs/ufshcd.c | 18 ++----------------
>> >   2 files changed, 10 insertions(+), 16 deletions(-)
>> >
>>
>> Hi Alan/Bart
>>
>> Please can you take a look at this series.
>> Please let me know if you've any better suggestions for this.
>
>I haven't commented on them so far because I don't understand them.

Merging thread with Bart.

>Against which kernel version has this patch series been prepared and
>tested? Have you noticed the following patch series that went into
>v5.11-rc1
>https://lore.kernel.org/linux-scsi/20201209052951.16136-1-bvanassche@acm.org/
Hi Bart - Yes this was tested with this series pulled in.
I'm on 5.10.9.

Thanks Alan.
I've tried to summarize below the problem that I'm seeing.

Problem:
There's a deadlock seen in ufs's runtime-suspend path.
Currently, the wlun's are registered to request based blk-pm.
During ufs pltform-dev runtime-suspend cb, as per protocol needs,
it sends a few cmds (uac, ssu) to wlun.

In this path, it tries to resume the ufs platform device which is actually
suspending and deadlocks.

Yes, if the host doesn't send any commands during it's suspend there wouldn't be
this deadlock.
Setting manage_start_stop would send ssu only.
I can't seem to find a way to send cmds to wlun during it's suspend.
Would overriding sd_pm_ops for wlun be a good idea?
Do you've any other pointers on how to do this?
I'd appreciate any pointers.

>
>> [RFC PATCH v2 1/2] block: bsg: resume platform device before accessing:
>>
>> It may happen that the underlying device's runtime-pm is
>> not controlled by block-pm. So it's possible that when
>> commands are sent to the device, it's suspended and may not
>> be resumed by blk-pm. Hence explicitly resume the parent
>> which is the platform device.
>
>If you want to send a command to the underlying device, why do you
>resume the underlying device's _parent_?  Why not resume the device
>itself?
>
>Why is bsg sending commands to the underlying device in a way that
>evades checks for whether the device is suspended?  Shouldn't the
>device's driver already be responsible for automatically resuming the
>device when a command is sent?
>
>> [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host:
>>
>> During runtime-suspend of ufs host, the scsi devices are
>> already suspended and so are the queues associated with them.
>> But the ufs host sends SSU to wlun during its runtime-suspend.
>> During the process blk_queue_enter checks if the queue is not in
>> suspended state. If so, it waits for the queue to resume, and never
>> comes out of it.
>> The commit
>> (d55d15a33: scsi: block: Do not accept any requests while suspended)
>> adds the check if the queue is in suspended state in blk_queue_enter().
>>
>> Fix this, by decoupling wlun scsi devices from block layer pm.
>> The runtime-pm for these devices would be managed by bsg and sg drivers.
>
>Why do you need to send a command to the wlun when the host is being
>suspended?  Shouldn't that command already have been sent, at the time
>when the wlun was suspended?
>
>Alan Stern

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-02 20:52     ` Asutosh Das
@ 2021-02-02 22:05       ` Alan Stern
  2021-02-04  0:13         ` Asutosh Das
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Stern @ 2021-02-02 22:05 UTC (permalink / raw)
  To: Asutosh Das
  Cc: cang, martin.petersen, Bart Van Assche, linux-arm-msm, linux-scsi

On Tue, Feb 02, 2021 at 12:52:45PM -0800, Asutosh Das wrote:
> On Mon, Feb 01 2021 at 13:48 -0800, Alan Stern wrote:
> > On Mon, Feb 01, 2021 at 12:11:23PM -0800, Asutosh Das (asd) wrote:
> > > On 1/27/2021 7:26 PM, Asutosh Das wrote:
> > > > v1 -> v2
> > > > Use pm_runtime_get/put APIs.
> > > > Assuming that all bsg devices are scsi devices may break.
> > > >
> > > > This patchset attempts to fix a deadlock in ufs.
> > > > This deadlock occurs because the ufs host driver tries to resume
> > > > its child (wlun scsi device) to send SSU to it during its suspend.
> > > >
> > > > Asutosh Das (2):
> > > >    block: bsg: resume scsi device before accessing
> > > >    scsi: ufs: Fix deadlock while suspending ufs host
> > > >
> > > >   block/bsg.c               |  8 ++++++++
> > > >   drivers/scsi/ufs/ufshcd.c | 18 ++----------------
> > > >   2 files changed, 10 insertions(+), 16 deletions(-)
> > > >
> > > 
> > > Hi Alan/Bart
> > > 
> > > Please can you take a look at this series.
> > > Please let me know if you've any better suggestions for this.
> > 
> > I haven't commented on them so far because I don't understand them.
> 
> Merging thread with Bart.
> 
> > Against which kernel version has this patch series been prepared and
> > tested? Have you noticed the following patch series that went into
> > v5.11-rc1
> > https://lore.kernel.org/linux-scsi/20201209052951.16136-1-bvanassche@acm.org/
> Hi Bart - Yes this was tested with this series pulled in.
> I'm on 5.10.9.
> 
> Thanks Alan.
> I've tried to summarize below the problem that I'm seeing.
> 
> Problem:
> There's a deadlock seen in ufs's runtime-suspend path.
> Currently, the wlun's are registered to request based blk-pm.
> During ufs pltform-dev runtime-suspend cb, as per protocol needs,
> it sends a few cmds (uac, ssu) to wlun.

That doesn't make sense.  Why send commands to the wlun at a time when 
you know the wlun is already suspended?  If you wanted the wlun to 
execute those commands, you should have sent them while the wlun was 
still powered up.

> In this path, it tries to resume the ufs platform device which is actually
> suspending and deadlocks.

Because you have violated the power management layering.  The platform 
device's suspend routine is meant to assume that all of its child 
devices are already suspended and therefore it must not try to 
communicate with them.

> Yes, if the host doesn't send any commands during it's suspend there wouldn't be
> this deadlock.
> Setting manage_start_stop would send ssu only.
> I can't seem to find a way to send cmds to wlun during it's suspend.

You can't send commands to _any_ device while it is suspended!  That's 
kind of the whole point -- being suspended means the device is in a 
low-power state and therefore is unable to execute commands.

> Would overriding sd_pm_ops for wlun be a good idea?
> Do you've any other pointers on how to do this?
> I'd appreciate any pointers.

I am not a good person to answer these questions, mainly because I know 
so little about this.  What is the relation between the wlun and the sd 
driver?  For that matter, what does the "w" in "wlun" stand for?

(And for that matter, what do "ufs" and "bsg" stand for?)

You really need to direct these questions to the SCSI maintainers; I am 
not in charge of any of that code.  I can only suggest a couple of 
possibilities.  For instance, you could modify the sd_suspend_runtime 
routine: make it send the commands whenever they are needed.  Or you 
could add a callback pointer to a routine that would send the commands.

Alan Stern

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-02 22:05       ` Alan Stern
@ 2021-02-04  0:13         ` Asutosh Das
  2021-02-04 19:48           ` Alan Stern
  0 siblings, 1 reply; 31+ messages in thread
From: Asutosh Das @ 2021-02-04  0:13 UTC (permalink / raw)
  To: Alan Stern
  Cc: cang, martin.petersen, Bart Van Assche, linux-arm-msm, linux-scsi

On Tue, Feb 02 2021 at 14:05 -0800, Alan Stern wrote:
>On Tue, Feb 02, 2021 at 12:52:45PM -0800, Asutosh Das wrote:
>> On Mon, Feb 01 2021 at 13:48 -0800, Alan Stern wrote:
>> > On Mon, Feb 01, 2021 at 12:11:23PM -0800, Asutosh Das (asd) wrote:
>> > > On 1/27/2021 7:26 PM, Asutosh Das wrote:
>> > > > v1 -> v2
>> > > > Use pm_runtime_get/put APIs.
>> > > > Assuming that all bsg devices are scsi devices may break.
>> > > >
>> > > > This patchset attempts to fix a deadlock in ufs.
>> > > > This deadlock occurs because the ufs host driver tries to resume
>> > > > its child (wlun scsi device) to send SSU to it during its suspend.
>> > > >
>> > > > Asutosh Das (2):
>> > > >    block: bsg: resume scsi device before accessing
>> > > >    scsi: ufs: Fix deadlock while suspending ufs host
>> > > >
>> > > >   block/bsg.c               |  8 ++++++++
>> > > >   drivers/scsi/ufs/ufshcd.c | 18 ++----------------
>> > > >   2 files changed, 10 insertions(+), 16 deletions(-)
>> > > >
>> > >
>> > > Hi Alan/Bart
>> > >
>> > > Please can you take a look at this series.
>> > > Please let me know if you've any better suggestions for this.
>> >
>> > I haven't commented on them so far because I don't understand them.
>>
>> Merging thread with Bart.
>>
>> > Against which kernel version has this patch series been prepared and
>> > tested? Have you noticed the following patch series that went into
>> > v5.11-rc1
>> > https://lore.kernel.org/linux-scsi/20201209052951.16136-1-bvanassche@acm.org/
>> Hi Bart - Yes this was tested with this series pulled in.
>> I'm on 5.10.9.
>>
>> Thanks Alan.
>> I've tried to summarize below the problem that I'm seeing.
>>
>> Problem:
>> There's a deadlock seen in ufs's runtime-suspend path.
>> Currently, the wlun's are registered to request based blk-pm.
>> During ufs pltform-dev runtime-suspend cb, as per protocol needs,
>> it sends a few cmds (uac, ssu) to wlun.
>
>That doesn't make sense.  Why send commands to the wlun at a time when
>you know the wlun is already suspended?  If you wanted the wlun to
>execute those commands, you should have sent them while the wlun was
>still powered up.
>
>> In this path, it tries to resume the ufs platform device which is actually
>> suspending and deadlocks.
>
>Because you have violated the power management layering.  The platform
>device's suspend routine is meant to assume that all of its child
>devices are already suspended and therefore it must not try to
>communicate with them.
>
>> Yes, if the host doesn't send any commands during it's suspend there wouldn't be
>> this deadlock.
>> Setting manage_start_stop would send ssu only.
>> I can't seem to find a way to send cmds to wlun during it's suspend.
>
>You can't send commands to _any_ device while it is suspended!  That's
>kind of the whole point -- being suspended means the device is in a
>low-power state and therefore is unable to execute commands.
>
>> Would overriding sd_pm_ops for wlun be a good idea?
>> Do you've any other pointers on how to do this?
>> I'd appreciate any pointers.
>
>I am not a good person to answer these questions, mainly because I know
>so little about this.  What is the relation between the wlun and the sd
>driver?  For that matter, what does the "w" in "wlun" stand for?
>
>(And for that matter, what do "ufs" and "bsg" stand for?)
>
>You really need to direct these questions to the SCSI maintainers; I am
>not in charge of any of that code.  I can only suggest a couple of
>possibilities.  For instance, you could modify the sd_suspend_runtime
>routine: make it send the commands whenever they are needed.  Or you
>could add a callback pointer to a routine that would send the commands.
>
>Alan Stern
>

Thanks Alan.
I understand the issues with the current ufs design.

ufs has a wlun (well-known lun) that handles power management commands,
such as SSUs. Now this wlun (device wlun) is registered as a scsi_device.
It's queue is also set up for runtime-pm. Likewise there're 2
more wluns, BOOT and RPMB.

Currently, the scsi devices independently runtime suspend/resume - request driven.
So to send SSU while suspending wlun (scsi_device) this scsi device should
be the last to be runtime suspended amongst all other ufs luns (scsi devices). The
reason is syncronize_cache() is sent to luns during their suspend and if SSU has
been sent already, it mostly would fail.
Perhaps that's the reason to send SSU during platform-device suspend. I'm not
sure if that's the right thing to do, but that's what it is now and is causing
this deadlock.
Now this wlun is also registered to bsg and some applications interact with rpmb
wlun and the device-wlun using that interface. Registering the corresponding
queues to runtime-pm ensures that the whole path is resumed before the request
is issued.
Because, we see this deadlock, in the RFC patch, I skipped registering the
queues representing the wluns to runtime-pm, thus removing the restrictions to
issue the request until queue is resumed.
But when the requests come-in via bsg, the device has to be resumed. Hence the
get_sync()/put_sync() in bsg driver.
The reason for initiating get_sync()/put_sync() on the parent device was because
the corresponding queue of this wlun was not setup for runtime-pm anymore.
And for ufs resuming the scsi device essentially means sending a SSU to wlun
which the ufs platform device does in its runtime resume now. I'm not sure if
that was a good idea though, hence the RFC on the patches.

And now it looks to me that adding a cb to sd_suspend_runtime may not work.
Because the scsi devices suspend asynchronously and the wlun suspends earlier than the others.

[    7.846165]scsi 0:0:0:49488: scsi_runtime_idle
[    7.851547]scsi 0:0:0:49488: device wlun
[    7.851809]sd 0:0:0:49488: scsi_runtime_idle
[    7.861536]sd 0:0:0:49488: scsi_runtime_suspend < suspends prior to other luns
[...]
[   12.861984]sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   12.868894]sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   13.124331]sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   13.143961]sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   13.163876]sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   13.164024]sd 0:0:0:4: [sde] Synchronizing SCSI cache
[   13.167066]sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   17.101285]sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   73.889551]sd 0:0:0:4: [sde] Synchronizing SCSI cache

I'm not sure if there's a way to force the wlun to suspend only after all other luns are suspended.
Is there? I hope Bart/others help provide some inputs on this.

-asd



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-04  0:13         ` Asutosh Das
@ 2021-02-04 19:48           ` Alan Stern
  2021-02-04 21:14             ` Asutosh Das
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Stern @ 2021-02-04 19:48 UTC (permalink / raw)
  To: Asutosh Das
  Cc: cang, martin.petersen, Bart Van Assche, linux-arm-msm, linux-scsi

On Wed, Feb 03, 2021 at 04:13:54PM -0800, Asutosh Das wrote:
> Thanks Alan.
> I understand the issues with the current ufs design.
> 
> ufs has a wlun (well-known lun) that handles power management commands,
> such as SSUs. Now this wlun (device wlun) is registered as a scsi_device.
> It's queue is also set up for runtime-pm. Likewise there're 2
> more wluns, BOOT and RPMB.
> 
> Currently, the scsi devices independently runtime suspend/resume - request driven.
> So to send SSU while suspending wlun (scsi_device) this scsi device should
> be the last to be runtime suspended amongst all other ufs luns (scsi devices). The
> reason is syncronize_cache() is sent to luns during their suspend and if SSU has
> been sent already, it mostly would fail.

The SCSI subsystem assumes that different LUNs operate independently.
Evidently that isn't true here.

> Perhaps that's the reason to send SSU during platform-device suspend. I'm not
> sure if that's the right thing to do, but that's what it is now and is causing
> this deadlock.
> Now this wlun is also registered to bsg and some applications interact with rpmb
> wlun and the device-wlun using that interface. Registering the corresponding
> queues to runtime-pm ensures that the whole path is resumed before the request
> is issued.
> Because, we see this deadlock, in the RFC patch, I skipped registering the
> queues representing the wluns to runtime-pm, thus removing the restrictions to
> issue the request until queue is resumed.
> But when the requests come-in via bsg, the device has to be resumed. Hence the
> get_sync()/put_sync() in bsg driver.

Does the bsg interface send its I/O requests to the LUNs through the
block request queue?

> The reason for initiating get_sync()/put_sync() on the parent device was because
> the corresponding queue of this wlun was not setup for runtime-pm anymore.
> And for ufs resuming the scsi device essentially means sending a SSU to wlun
> which the ufs platform device does in its runtime resume now. I'm not sure if
> that was a good idea though, hence the RFC on the patches.
> 
> And now it looks to me that adding a cb to sd_suspend_runtime may not work.
> Because the scsi devices suspend asynchronously and the wlun suspends earlier than the others.
> 
> [    7.846165]scsi 0:0:0:49488: scsi_runtime_idle
> [    7.851547]scsi 0:0:0:49488: device wlun
> [    7.851809]sd 0:0:0:49488: scsi_runtime_idle
> [    7.861536]sd 0:0:0:49488: scsi_runtime_suspend < suspends prior to other luns
> [...]
> [   12.861984]sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> [   12.868894]sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> [   13.124331]sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   13.143961]sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> [   13.163876]sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> [   13.164024]sd 0:0:0:4: [sde] Synchronizing SCSI cache
> [   13.167066]sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> [   17.101285]sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> [   73.889551]sd 0:0:0:4: [sde] Synchronizing SCSI cache
> 
> I'm not sure if there's a way to force the wlun to suspend only after all other luns are suspended.
> Is there? I hope Bart/others help provide some inputs on this.

I don't know what would work best for you; it depends on how the LUNs
are used.  But one possibility is to make sure that whenever the boot
and rpmb wluns are resumed, the device wlun is also resumed.  So for
example, the runtime-resume callback routines for the rpmb and boot
wluns could call pm_runtime_get_sync() for the device wlun, and their
runtime-suspend callback routines could call pm_runtime_put() for the
device wlun.  And of course there would have to be appropriate 
operations when those LUNs are bound to and unbound from their drivers.

Alan Stern

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-04 19:48           ` Alan Stern
@ 2021-02-04 21:14             ` Asutosh Das
  2021-02-05  7:56               ` Avri Altman
  0 siblings, 1 reply; 31+ messages in thread
From: Asutosh Das @ 2021-02-04 21:14 UTC (permalink / raw)
  To: Alan Stern
  Cc: cang, martin.petersen, Bart Van Assche, linux-arm-msm, linux-scsi

On Thu, Feb 04 2021 at 11:48 -0800, Alan Stern wrote:
>On Wed, Feb 03, 2021 at 04:13:54PM -0800, Asutosh Das wrote:
>> Thanks Alan.
>> I understand the issues with the current ufs design.
>>
>> ufs has a wlun (well-known lun) that handles power management commands,
>> such as SSUs. Now this wlun (device wlun) is registered as a scsi_device.
>> It's queue is also set up for runtime-pm. Likewise there're 2
>> more wluns, BOOT and RPMB.
>>
>> Currently, the scsi devices independently runtime suspend/resume - request driven.
>> So to send SSU while suspending wlun (scsi_device) this scsi device should
>> be the last to be runtime suspended amongst all other ufs luns (scsi devices). The
>> reason is syncronize_cache() is sent to luns during their suspend and if SSU has
>> been sent already, it mostly would fail.
>
>The SCSI subsystem assumes that different LUNs operate independently.
>Evidently that isn't true here.
>
>> Perhaps that's the reason to send SSU during platform-device suspend. I'm not
>> sure if that's the right thing to do, but that's what it is now and is causing
>> this deadlock.
>> Now this wlun is also registered to bsg and some applications interact with rpmb
>> wlun and the device-wlun using that interface. Registering the corresponding
>> queues to runtime-pm ensures that the whole path is resumed before the request
>> is issued.
>> Because, we see this deadlock, in the RFC patch, I skipped registering the
>> queues representing the wluns to runtime-pm, thus removing the restrictions to
>> issue the request until queue is resumed.
>> But when the requests come-in via bsg, the device has to be resumed. Hence the
>> get_sync()/put_sync() in bsg driver.
>
>Does the bsg interface send its I/O requests to the LUNs through the
>block request queue?
>
>
>> The reason for initiating get_sync()/put_sync() on the parent device was because
>> the corresponding queue of this wlun was not setup for runtime-pm anymore.
>> And for ufs resuming the scsi device essentially means sending a SSU to wlun
>> which the ufs platform device does in its runtime resume now. I'm not sure if
>> that was a good idea though, hence the RFC on the patches.
>>
>> And now it looks to me that adding a cb to sd_suspend_runtime may not work.
>> Because the scsi devices suspend asynchronously and the wlun suspends earlier than the others.
>>
>> [    7.846165]scsi 0:0:0:49488: scsi_runtime_idle
>> [    7.851547]scsi 0:0:0:49488: device wlun
>> [    7.851809]sd 0:0:0:49488: scsi_runtime_idle
>> [    7.861536]sd 0:0:0:49488: scsi_runtime_suspend < suspends prior to other luns
>> [...]
>> [   12.861984]sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>> [   12.868894]sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>> [   13.124331]sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> [   13.143961]sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>> [   13.163876]sd 0:0:0:6: [sdg] Synchronizing SCSI cache
>> [   13.164024]sd 0:0:0:4: [sde] Synchronizing SCSI cache
>> [   13.167066]sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>> [   17.101285]sd 0:0:0:7: [sdh] Synchronizing SCSI cache
>> [   73.889551]sd 0:0:0:4: [sde] Synchronizing SCSI cache
>>
>> I'm not sure if there's a way to force the wlun to suspend only after all other luns are suspended.
>> Is there? I hope Bart/others help provide some inputs on this.
>
>I don't know what would work best for you; it depends on how the LUNs
>are used.  But one possibility is to make sure that whenever the boot
>and rpmb wluns are resumed, the device wlun is also resumed.  So for
>example, the runtime-resume callback routines for the rpmb and boot
>wluns could call pm_runtime_get_sync() for the device wlun, and their
>runtime-suspend callback routines could call pm_runtime_put() for the
>device wlun.  And of course there would have to be appropriate
>operations when those LUNs are bound to and unbound from their drivers.
>
>Alan Stern
>
Thanks Alan.
CanG & I had some discussions on it as well the other day.
I'm now looking into creating a device link between the siblings.
e.g. make the device wlun as a supplier for all the other luns & wluns.
So device wlun (supplier) wouldn't suspend (runtime/system) until all of the other
consumers are suspended. After this linking, I can move all the
pm commands that are being sent by host to the dedicated suspend routine of the device
wlun and the host needn't send any cmds during its suspend and layering
violation wouldn't take place.

-asd


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-04 21:14             ` Asutosh Das
@ 2021-02-05  7:56               ` Avri Altman
  2021-02-05 16:11                 ` Asutosh Das
  0 siblings, 1 reply; 31+ messages in thread
From: Avri Altman @ 2021-02-05  7:56 UTC (permalink / raw)
  To: Asutosh Das, Alan Stern
  Cc: cang, martin.petersen, Bart Van Assche, linux-arm-msm, linux-scsi

> 
> On Thu, Feb 04 2021 at 11:48 -0800, Alan Stern wrote:
> >On Wed, Feb 03, 2021 at 04:13:54PM -0800, Asutosh Das wrote:
> >> Thanks Alan.
> >> I understand the issues with the current ufs design.
> >>
> >> ufs has a wlun (well-known lun) that handles power management
> commands,
> >> such as SSUs. Now this wlun (device wlun) is registered as a scsi_device.
> >> It's queue is also set up for runtime-pm. Likewise there're 2
> >> more wluns, BOOT and RPMB.
> >>
> >> Currently, the scsi devices independently runtime suspend/resume - request
> driven.
> >> So to send SSU while suspending wlun (scsi_device) this scsi device should
> >> be the last to be runtime suspended amongst all other ufs luns (scsi devices).
> The
> >> reason is syncronize_cache() is sent to luns during their suspend and if SSU
> has
> >> been sent already, it mostly would fail.
> >
> >The SCSI subsystem assumes that different LUNs operate independently.
> >Evidently that isn't true here.
> >
> >> Perhaps that's the reason to send SSU during platform-device suspend. I'm
> not
> >> sure if that's the right thing to do, but that's what it is now and is causing
> >> this deadlock.
> >> Now this wlun is also registered to bsg and some applications interact with
> rpmb
> >> wlun and the device-wlun using that interface. Registering the
> corresponding
> >> queues to runtime-pm ensures that the whole path is resumed before the
> request
> >> is issued.
> >> Because, we see this deadlock, in the RFC patch, I skipped registering the
> >> queues representing the wluns to runtime-pm, thus removing the
> restrictions to
> >> issue the request until queue is resumed.
> >> But when the requests come-in via bsg, the device has to be resumed. Hence
> the
> >> get_sync()/put_sync() in bsg driver.
> >
> >Does the bsg interface send its I/O requests to the LUNs through the
> >block request queue?
> >
> >
> >> The reason for initiating get_sync()/put_sync() on the parent device was
> because
> >> the corresponding queue of this wlun was not setup for runtime-pm
> anymore.
> >> And for ufs resuming the scsi device essentially means sending a SSU to wlun
> >> which the ufs platform device does in its runtime resume now. I'm not sure
> if
> >> that was a good idea though, hence the RFC on the patches.
> >>
> >> And now it looks to me that adding a cb to sd_suspend_runtime may not
> work.
> >> Because the scsi devices suspend asynchronously and the wlun suspends
> earlier than the others.
> >>
> >> [    7.846165]scsi 0:0:0:49488: scsi_runtime_idle
> >> [    7.851547]scsi 0:0:0:49488: device wlun
> >> [    7.851809]sd 0:0:0:49488: scsi_runtime_idle
> >> [    7.861536]sd 0:0:0:49488: scsi_runtime_suspend < suspends prior to other
> luns
> >> [...]
> >> [   12.861984]sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> >> [   12.868894]sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> >> [   13.124331]sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >> [   13.143961]sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> >> [   13.163876]sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> >> [   13.164024]sd 0:0:0:4: [sde] Synchronizing SCSI cache
> >> [   13.167066]sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> >> [   17.101285]sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> >> [   73.889551]sd 0:0:0:4: [sde] Synchronizing SCSI cache
> >>
> >> I'm not sure if there's a way to force the wlun to suspend only after all other
> luns are suspended.
> >> Is there? I hope Bart/others help provide some inputs on this.
> >
> >I don't know what would work best for you; it depends on how the LUNs
> >are used.  But one possibility is to make sure that whenever the boot
> >and rpmb wluns are resumed, the device wlun is also resumed.  So for
> >example, the runtime-resume callback routines for the rpmb and boot
> >wluns could call pm_runtime_get_sync() for the device wlun, and their
> >runtime-suspend callback routines could call pm_runtime_put() for the
> >device wlun.  And of course there would have to be appropriate
> >operations when those LUNs are bound to and unbound from their drivers.
> >
> >Alan Stern
> >
> Thanks Alan.
> CanG & I had some discussions on it as well the other day.
> I'm now looking into creating a device link between the siblings.
> e.g. make the device wlun as a supplier for all the other luns & wluns.
> So device wlun (supplier) wouldn't suspend (runtime/system) until all of the
> other
> consumers are suspended. After this linking, I can move all the
> pm commands that are being sent by host to the dedicated suspend routine of
> the device
> wlun and the host needn't send any cmds during its suspend and layering
> violation wouldn't take place.
Regardless of your above proposal, as for the issues you were witnessing with rpmb,
That started this RFC in the first place, and the whole clearing uac series for that matter:
 "In order to conduct FFU or RPMB operations, UFS needs to clear UNIT ATTENTION condition...."

Functionally, This was already done for the device wlun, and only added the rpmb wlun.

Now you are trying to solve stuff because the rpmb is not provisioned.
a) There should be no relation between response to request-sense command,
 and if the key is programmed or not. And 
b) rpmb is accessed from user-space.  If it is not provisioned, it should processed the error (-7)
    and realize that by itself.  And also, It only makes sense that if needed,
    the access sequence will include  the request-sense command.

Therefore, IMHO, just reverting Randall commit (1918651f2d7e) and fixing the user-space code
Should suffice.

Thanks,
Avri


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-05  7:56               ` Avri Altman
@ 2021-02-05 16:11                 ` Asutosh Das
  2021-02-06  2:37                   ` Asutosh Das
  2021-02-06 19:24                   ` Avri Altman
  0 siblings, 2 replies; 31+ messages in thread
From: Asutosh Das @ 2021-02-05 16:11 UTC (permalink / raw)
  To: Avri Altman
  Cc: Alan Stern, cang, martin.petersen, Bart Van Assche,
	linux-arm-msm, linux-scsi

On Fri, Feb 05 2021 at 23:56 -0800, Avri Altman wrote:
>>
>> On Thu, Feb 04 2021 at 11:48 -0800, Alan Stern wrote:
>> >On Wed, Feb 03, 2021 at 04:13:54PM -0800, Asutosh Das wrote:
>> >> Thanks Alan.
>> >> I understand the issues with the current ufs design.
>> >>
>> >> ufs has a wlun (well-known lun) that handles power management
>> commands,
>> >> such as SSUs. Now this wlun (device wlun) is registered as a scsi_device.
>> >> It's queue is also set up for runtime-pm. Likewise there're 2
>> >> more wluns, BOOT and RPMB.
>> >>
>> >> Currently, the scsi devices independently runtime suspend/resume - request
>> driven.
>> >> So to send SSU while suspending wlun (scsi_device) this scsi device should
>> >> be the last to be runtime suspended amongst all other ufs luns (scsi devices).
>> The
>> >> reason is syncronize_cache() is sent to luns during their suspend and if SSU
>> has
>> >> been sent already, it mostly would fail.
>> >
>> >The SCSI subsystem assumes that different LUNs operate independently.
>> >Evidently that isn't true here.
>> >
>> >> Perhaps that's the reason to send SSU during platform-device suspend. I'm
>> not
>> >> sure if that's the right thing to do, but that's what it is now and is causing
>> >> this deadlock.
>> >> Now this wlun is also registered to bsg and some applications interact with
>> rpmb
>> >> wlun and the device-wlun using that interface. Registering the
>> corresponding
>> >> queues to runtime-pm ensures that the whole path is resumed before the
>> request
>> >> is issued.
>> >> Because, we see this deadlock, in the RFC patch, I skipped registering the
>> >> queues representing the wluns to runtime-pm, thus removing the
>> restrictions to
>> >> issue the request until queue is resumed.
>> >> But when the requests come-in via bsg, the device has to be resumed. Hence
>> the
>> >> get_sync()/put_sync() in bsg driver.
>> >
>> >Does the bsg interface send its I/O requests to the LUNs through the
>> >block request queue?
>> >
>> >
>> >> The reason for initiating get_sync()/put_sync() on the parent device was
>> because
>> >> the corresponding queue of this wlun was not setup for runtime-pm
>> anymore.
>> >> And for ufs resuming the scsi device essentially means sending a SSU to wlun
>> >> which the ufs platform device does in its runtime resume now. I'm not sure
>> if
>> >> that was a good idea though, hence the RFC on the patches.
>> >>
>> >> And now it looks to me that adding a cb to sd_suspend_runtime may not
>> work.
>> >> Because the scsi devices suspend asynchronously and the wlun suspends
>> earlier than the others.
>> >>
>> >> [    7.846165]scsi 0:0:0:49488: scsi_runtime_idle
>> >> [    7.851547]scsi 0:0:0:49488: device wlun
>> >> [    7.851809]sd 0:0:0:49488: scsi_runtime_idle
>> >> [    7.861536]sd 0:0:0:49488: scsi_runtime_suspend < suspends prior to other
>> luns
>> >> [...]
>> >> [   12.861984]sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>> >> [   12.868894]sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>> >> [   13.124331]sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> >> [   13.143961]sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>> >> [   13.163876]sd 0:0:0:6: [sdg] Synchronizing SCSI cache
>> >> [   13.164024]sd 0:0:0:4: [sde] Synchronizing SCSI cache
>> >> [   13.167066]sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>> >> [   17.101285]sd 0:0:0:7: [sdh] Synchronizing SCSI cache
>> >> [   73.889551]sd 0:0:0:4: [sde] Synchronizing SCSI cache
>> >>
>> >> I'm not sure if there's a way to force the wlun to suspend only after all other
>> luns are suspended.
>> >> Is there? I hope Bart/others help provide some inputs on this.
>> >
>> >I don't know what would work best for you; it depends on how the LUNs
>> >are used.  But one possibility is to make sure that whenever the boot
>> >and rpmb wluns are resumed, the device wlun is also resumed.  So for
>> >example, the runtime-resume callback routines for the rpmb and boot
>> >wluns could call pm_runtime_get_sync() for the device wlun, and their
>> >runtime-suspend callback routines could call pm_runtime_put() for the
>> >device wlun.  And of course there would have to be appropriate
>> >operations when those LUNs are bound to and unbound from their drivers.
>> >
>> >Alan Stern
>> >
>> Thanks Alan.
>> CanG & I had some discussions on it as well the other day.
>> I'm now looking into creating a device link between the siblings.
>> e.g. make the device wlun as a supplier for all the other luns & wluns.
>> So device wlun (supplier) wouldn't suspend (runtime/system) until all of the
>> other
>> consumers are suspended. After this linking, I can move all the
>> pm commands that are being sent by host to the dedicated suspend routine of
>> the device
>> wlun and the host needn't send any cmds during its suspend and layering
>> violation wouldn't take place.
>Regardless of your above proposal, as for the issues you were witnessing with rpmb,
>That started this RFC in the first place, and the whole clearing uac series for that matter:
> "In order to conduct FFU or RPMB operations, UFS needs to clear UNIT ATTENTION condition...."
>
>Functionally, This was already done for the device wlun, and only added the rpmb wlun.
>
>Now you are trying to solve stuff because the rpmb is not provisioned.
>a) There should be no relation between response to request-sense command,
> and if the key is programmed or not. And
>b) rpmb is accessed from user-space.  If it is not provisioned, it should processed the error (-7)
>    and realize that by itself.  And also, It only makes sense that if needed,
>    the access sequence will include  the request-sense command.
>
>Therefore, IMHO, just reverting Randall commit (1918651f2d7e) and fixing the user-space code
>Should suffice.
>
>Thanks,
>Avri
>
Hi Avri

Thanks.

I don't think reverting 1918651f2d7e would fix this.

[   12.182750] ufshcd-qcom 1d84000.ufshc: ufshcd_suspend: Setting power mode
[   12.190467] ufshcd-qcom 1d84000.ufshc: wlun_dev_clr_ua: 0 <-- uac wasn't sent
[   12.196735] ufshcd-qcom 1d84000.ufshc: Sending ssu
[   12.202412] scsi 0:0:0:49488: Queue rpm status b4 ssu: 2 <- sdev_ufs_device queue is suspended
[   12.208613] ufshcd-qcom 1d84000.ufshc: Wait for resume - <-- deadlock!

The issue is sending any command to any lun which is registered for blk
runtime-pm in ufs host's suspend path would deadlock; since, it'd try to resume
the ufs host in the same suspend calling sequence.

-asd

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-05 16:11                 ` Asutosh Das
@ 2021-02-06  2:37                   ` Asutosh Das
  2021-02-06 19:24                   ` Avri Altman
  1 sibling, 0 replies; 31+ messages in thread
From: Asutosh Das @ 2021-02-06  2:37 UTC (permalink / raw)
  To: Avri Altman
  Cc: Alan Stern, cang, martin.petersen, Bart Van Assche,
	linux-arm-msm, linux-scsi

On Fri, Feb 05 2021 at 14:19 -0800, Asutosh Das wrote:
>On Fri, Feb 05 2021 at 23:56 -0800, Avri Altman wrote:
>>>
>>>On Thu, Feb 04 2021 at 11:48 -0800, Alan Stern wrote:
>>>>On Wed, Feb 03, 2021 at 04:13:54PM -0800, Asutosh Das wrote:
>>>>> Thanks Alan.
>>>>> I understand the issues with the current ufs design.
>>>>>
>>>>> ufs has a wlun (well-known lun) that handles power management
>>>commands,
>>>>> such as SSUs. Now this wlun (device wlun) is registered as a scsi_device.
>>>>> It's queue is also set up for runtime-pm. Likewise there're 2
>>>>> more wluns, BOOT and RPMB.
>>>>>
>>>>> Currently, the scsi devices independently runtime suspend/resume - request
>>>driven.
>>>>> So to send SSU while suspending wlun (scsi_device) this scsi device should
>>>>> be the last to be runtime suspended amongst all other ufs luns (scsi devices).
>>>The
>>>>> reason is syncronize_cache() is sent to luns during their suspend and if SSU
>>>has
>>>>> been sent already, it mostly would fail.
>>>>
>>>>The SCSI subsystem assumes that different LUNs operate independently.
>>>>Evidently that isn't true here.
>>>>
>>>>> Perhaps that's the reason to send SSU during platform-device suspend. I'm
>>>not
>>>>> sure if that's the right thing to do, but that's what it is now and is causing
>>>>> this deadlock.
>>>>> Now this wlun is also registered to bsg and some applications interact with
>>>rpmb
>>>>> wlun and the device-wlun using that interface. Registering the
>>>corresponding
>>>>> queues to runtime-pm ensures that the whole path is resumed before the
>>>request
>>>>> is issued.
>>>>> Because, we see this deadlock, in the RFC patch, I skipped registering the
>>>>> queues representing the wluns to runtime-pm, thus removing the
>>>restrictions to
>>>>> issue the request until queue is resumed.
>>>>> But when the requests come-in via bsg, the device has to be resumed. Hence
>>>the
>>>>> get_sync()/put_sync() in bsg driver.
>>>>
>>>>Does the bsg interface send its I/O requests to the LUNs through the
>>>>block request queue?
>>>>
>>>>
>>>>> The reason for initiating get_sync()/put_sync() on the parent device was
>>>because
>>>>> the corresponding queue of this wlun was not setup for runtime-pm
>>>anymore.
>>>>> And for ufs resuming the scsi device essentially means sending a SSU to wlun
>>>>> which the ufs platform device does in its runtime resume now. I'm not sure
>>>if
>>>>> that was a good idea though, hence the RFC on the patches.
>>>>>
>>>>> And now it looks to me that adding a cb to sd_suspend_runtime may not
>>>work.
>>>>> Because the scsi devices suspend asynchronously and the wlun suspends
>>>earlier than the others.
>>>>>
>>>>> [    7.846165]scsi 0:0:0:49488: scsi_runtime_idle
>>>>> [    7.851547]scsi 0:0:0:49488: device wlun
>>>>> [    7.851809]sd 0:0:0:49488: scsi_runtime_idle
>>>>> [    7.861536]sd 0:0:0:49488: scsi_runtime_suspend < suspends prior to other
>>>luns
>>>>> [...]
>>>>> [   12.861984]sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>>>>> [   12.868894]sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>>>>> [   13.124331]sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>>>> [   13.143961]sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>>>>> [   13.163876]sd 0:0:0:6: [sdg] Synchronizing SCSI cache
>>>>> [   13.164024]sd 0:0:0:4: [sde] Synchronizing SCSI cache
>>>>> [   13.167066]sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>>>>> [   17.101285]sd 0:0:0:7: [sdh] Synchronizing SCSI cache
>>>>> [   73.889551]sd 0:0:0:4: [sde] Synchronizing SCSI cache
>>>>>
>>>>> I'm not sure if there's a way to force the wlun to suspend only after all other
>>>luns are suspended.
>>>>> Is there? I hope Bart/others help provide some inputs on this.
>>>>
>>>>I don't know what would work best for you; it depends on how the LUNs
>>>>are used.  But one possibility is to make sure that whenever the boot
>>>>and rpmb wluns are resumed, the device wlun is also resumed.  So for
>>>>example, the runtime-resume callback routines for the rpmb and boot
>>>>wluns could call pm_runtime_get_sync() for the device wlun, and their
>>>>runtime-suspend callback routines could call pm_runtime_put() for the
>>>>device wlun.  And of course there would have to be appropriate
>>>>operations when those LUNs are bound to and unbound from their drivers.
>>>>
>>>>Alan Stern
>>>>
>>>Thanks Alan.
>>>CanG & I had some discussions on it as well the other day.
>>>I'm now looking into creating a device link between the siblings.
>>>e.g. make the device wlun as a supplier for all the other luns & wluns.
>>>So device wlun (supplier) wouldn't suspend (runtime/system) until all of the
>>>other
>>>consumers are suspended. After this linking, I can move all the
>>>pm commands that are being sent by host to the dedicated suspend routine of
>>>the device
>>>wlun and the host needn't send any cmds during its suspend and layering
>>>violation wouldn't take place.
>>Regardless of your above proposal, as for the issues you were witnessing with rpmb,
>>That started this RFC in the first place, and the whole clearing uac series for that matter:
>>"In order to conduct FFU or RPMB operations, UFS needs to clear UNIT ATTENTION condition...."
>>
>>Functionally, This was already done for the device wlun, and only added the rpmb wlun.
>>
>>Now you are trying to solve stuff because the rpmb is not provisioned.
>>a) There should be no relation between response to request-sense command,
>>and if the key is programmed or not. And
>>b) rpmb is accessed from user-space.  If it is not provisioned, it should processed the error (-7)
>>   and realize that by itself.  And also, It only makes sense that if needed,
>>   the access sequence will include  the request-sense command.
>>
>>Therefore, IMHO, just reverting Randall commit (1918651f2d7e) and fixing the user-space code
>>Should suffice.
>>
>>Thanks,
>>Avri
>>
>Hi Avri
>
>Thanks.
>
>I don't think reverting 1918651f2d7e would fix this.
>
>[   12.182750] ufshcd-qcom 1d84000.ufshc: ufshcd_suspend: Setting power mode
>[   12.190467] ufshcd-qcom 1d84000.ufshc: wlun_dev_clr_ua: 0 <-- uac wasn't sent
>[   12.196735] ufshcd-qcom 1d84000.ufshc: Sending ssu
>[   12.202412] scsi 0:0:0:49488: Queue rpm status b4 ssu: 2 <- sdev_ufs_device queue is suspended
>[   12.208613] ufshcd-qcom 1d84000.ufshc: Wait for resume - <-- deadlock!
>
>The issue is sending any command to any lun which is registered for blk
>runtime-pm in ufs host's suspend path would deadlock; since, it'd try to resume
>the ufs host in the same suspend calling sequence.
>
>-asd
>

I coded it up as per the device linking proposal and it appears to be resolving
the issue. I'm testing it now and will clean it up & post a RFC sometime next week.

-asd


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-05 16:11                 ` Asutosh Das
  2021-02-06  2:37                   ` Asutosh Das
@ 2021-02-06 19:24                   ` Avri Altman
  2021-02-08 16:24                     ` Asutosh Das
  1 sibling, 1 reply; 31+ messages in thread
From: Avri Altman @ 2021-02-06 19:24 UTC (permalink / raw)
  To: Asutosh Das
  Cc: Alan Stern, cang, martin.petersen, Bart Van Assche,
	linux-arm-msm, linux-scsi

> >Regardless of your above proposal, as for the issues you were witnessing with
> rpmb,
> >That started this RFC in the first place, and the whole clearing uac series for
> that matter:
> > "In order to conduct FFU or RPMB operations, UFS needs to clear UNIT
> ATTENTION condition...."
> >
> >Functionally, This was already done for the device wlun, and only added the
> rpmb wlun.
> >
> >Now you are trying to solve stuff because the rpmb is not provisioned.
> >a) There should be no relation between response to request-sense command,
> > and if the key is programmed or not. And
> >b) rpmb is accessed from user-space.  If it is not provisioned, it should
> processed the error (-7)
> >    and realize that by itself.  And also, It only makes sense that if needed,
> >    the access sequence will include  the request-sense command.
> >
> >Therefore, IMHO, just reverting Randall commit (1918651f2d7e) and fixing
> the user-space code
> >Should suffice.
> >
> >Thanks,
> >Avri
> >
> Hi Avri
> 
> Thanks.
> 
> I don't think reverting 1918651f2d7e would fix this.
> 
> [   12.182750] ufshcd-qcom 1d84000.ufshc: ufshcd_suspend: Setting power
> mode
> [   12.190467] ufshcd-qcom 1d84000.ufshc: wlun_dev_clr_ua: 0 <-- uac wasn't
> sent
> [   12.196735] ufshcd-qcom 1d84000.ufshc: Sending ssu
> [   12.202412] scsi 0:0:0:49488: Queue rpm status b4 ssu: 2 <- sdev_ufs_device
> queue is suspended
> [   12.208613] ufshcd-qcom 1d84000.ufshc: Wait for resume - <-- deadlock!
> 
> The issue is sending any command to any lun which is registered for blk
> runtime-pm in ufs host's suspend path would deadlock; since, it'd try to resume
> the ufs host in the same suspend calling sequence.
Did you managed to bisect the commit that caused the regression?
Is it in the series that Bart referred to?

Thanks,
Avri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing
  2021-01-27  4:00 ` [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing Asutosh Das
  2021-01-27  7:09   ` Avri Altman
@ 2021-02-07  2:23   ` Bart Van Assche
  1 sibling, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2021-02-07  2:23 UTC (permalink / raw)
  To: Asutosh Das, cang, martin.petersen, linux-scsi
  Cc: linux-arm-msm, stern, Bao D . Nguyen, FUJITA Tomonori,
	Jens Axboe, open list:BLOCK LAYER, open list

On 1/26/21 8:00 PM, Asutosh Das wrote:
> Resumes the scsi device before accessing it.
> 
> Change-Id: I2929af60f2a92c89704a582fcdb285d35b429fde
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Can Guo <cang@codeaurora.org>
> Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>

No Change-Id tags in upstream patches please.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v2 0/2] Fix deadlock in ufs
  2021-02-06 19:24                   ` Avri Altman
@ 2021-02-08 16:24                     ` Asutosh Das
  0 siblings, 0 replies; 31+ messages in thread
From: Asutosh Das @ 2021-02-08 16:24 UTC (permalink / raw)
  To: Avri Altman
  Cc: Alan Stern, cang, martin.petersen, Bart Van Assche,
	linux-arm-msm, linux-scsi

On Sat, Feb 06 2021 at 11:24 -0800, Avri Altman wrote:
>> >Regardless of your above proposal, as for the issues you were witnessing with
>> rpmb,
>> >That started this RFC in the first place, and the whole clearing uac series for
>> that matter:
>> > "In order to conduct FFU or RPMB operations, UFS needs to clear UNIT
>> ATTENTION condition...."
>> >
>> >Functionally, This was already done for the device wlun, and only added the
>> rpmb wlun.
>> >
>> >Now you are trying to solve stuff because the rpmb is not provisioned.
>> >a) There should be no relation between response to request-sense command,
>> > and if the key is programmed or not. And
>> >b) rpmb is accessed from user-space.  If it is not provisioned, it should
>> processed the error (-7)
>> >    and realize that by itself.  And also, It only makes sense that if needed,
>> >    the access sequence will include  the request-sense command.
>> >
>> >Therefore, IMHO, just reverting Randall commit (1918651f2d7e) and fixing
>> the user-space code
>> >Should suffice.
>> >
>> >Thanks,
>> >Avri
>> >
>> Hi Avri
>>
>> Thanks.
>>
>> I don't think reverting 1918651f2d7e would fix this.
>>
>> [   12.182750] ufshcd-qcom 1d84000.ufshc: ufshcd_suspend: Setting power
>> mode
>> [   12.190467] ufshcd-qcom 1d84000.ufshc: wlun_dev_clr_ua: 0 <-- uac wasn't
>> sent
>> [   12.196735] ufshcd-qcom 1d84000.ufshc: Sending ssu
>> [   12.202412] scsi 0:0:0:49488: Queue rpm status b4 ssu: 2 <- sdev_ufs_device
>> queue is suspended
>> [   12.208613] ufshcd-qcom 1d84000.ufshc: Wait for resume - <-- deadlock!
>>
>> The issue is sending any command to any lun which is registered for blk
>> runtime-pm in ufs host's suspend path would deadlock; since, it'd try to resume
>> the ufs host in the same suspend calling sequence.
>Did you managed to bisect the commit that caused the regression?
No - I didn't bisect.

>Is it in the series that Bart referred to?
>
Yes - the debug points to that.

>Thanks,
>Avri

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-02-08 16:25 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-27  4:00 [RFC PATCH v1 0/2] Fix deadlock in ufs Asutosh Das
2021-01-27  4:00 ` [RFC PATCH v1 1/2] block: bsg: resume scsi device before accessing Asutosh Das
2021-01-27  7:09   ` Avri Altman
2021-01-27  7:59     ` Can Guo
2021-01-27  8:53       ` Can Guo
2021-02-07  2:23   ` Bart Van Assche
2021-01-27  4:00 ` [RFC PATCH v1 2/2] scsi: ufs: Fix deadlock while suspending ufs host Asutosh Das
2021-01-27 15:22 ` [RFC PATCH v1 0/2] Fix deadlock in ufs Bjorn Andersson
2021-01-27 16:16   ` Asutosh Das
2021-01-27 19:36     ` Bjorn Andersson
2021-01-28  2:47       ` Can Guo
2021-01-28  3:26 ` [RFC PATCH v2 " Asutosh Das
2021-01-28  3:26 ` [RFC PATCH v2 1/2] block: bsg: resume platform device before accessing Asutosh Das
2021-01-28  3:26   ` [RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host Asutosh Das
2021-01-28 12:21     ` Avri Altman
2021-01-28 16:39       ` Asutosh Das
2021-01-28  9:33 ` [RFC PATCH v2 0/2] Fix deadlock in ufs Avri Altman
2021-01-28 17:19   ` Asutosh Das
2021-02-01 20:11 ` Asutosh Das (asd)
2021-02-01 20:27   ` Bart Van Assche
2021-02-01 21:48   ` Alan Stern
2021-02-02 20:52     ` Asutosh Das
2021-02-02 22:05       ` Alan Stern
2021-02-04  0:13         ` Asutosh Das
2021-02-04 19:48           ` Alan Stern
2021-02-04 21:14             ` Asutosh Das
2021-02-05  7:56               ` Avri Altman
2021-02-05 16:11                 ` Asutosh Das
2021-02-06  2:37                   ` Asutosh Das
2021-02-06 19:24                   ` Avri Altman
2021-02-08 16:24                     ` Asutosh Das

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).