All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
@ 2022-09-15 11:58 peter.wang
  2022-09-16 21:39 ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: peter.wang @ 2022-09-15 11:58 UTC (permalink / raw)
  To: stanley.chu, linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui

From: Peter Wang <peter.wang@mediatek.com>

If error happened in rpm flow, get rpm will stuck because rpm is
suspending or resuming. And it cause IO hang.
This patch bypass get rpm when err handling with pm_op_in_progress.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a202d7d5240d..cc58fb585df2 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6086,9 +6086,13 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
 	}
 }
 
-static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
+static void ufshcd_err_handling_prepare(struct ufs_hba *hba, bool *rpm_put)
 {
-	ufshcd_rpm_get_sync(hba);
+	if (!hba->pm_op_in_progress) {
+		ufshcd_rpm_get_sync(hba);
+		*rpm_put = true;
+	}
+
 	if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) ||
 	    hba->is_sys_suspended) {
 		enum ufs_pm_op pm_op;
@@ -6122,13 +6126,14 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 	cancel_work_sync(&hba->eeh_work);
 }
 
-static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
+static void ufshcd_err_handling_unprepare(struct ufs_hba *hba, bool rpm_put)
 {
 	ufshcd_scsi_unblock_requests(hba);
 	ufshcd_release(hba);
 	if (ufshcd_is_clkscaling_supported(hba))
 		ufshcd_clk_scaling_suspend(hba, false);
-	ufshcd_rpm_put(hba);
+	if (rpm_put)
+		ufshcd_rpm_put(hba);
 }
 
 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
@@ -6210,6 +6215,7 @@ static void ufshcd_err_handler(struct work_struct *work)
 	bool err_tm;
 	int pmc_err;
 	int tag;
+	bool rpm_put = false;
 
 	hba = container_of(work, struct ufs_hba, eh_work);
 
@@ -6231,7 +6237,7 @@ static void ufshcd_err_handler(struct work_struct *work)
 	}
 	ufshcd_set_eh_in_progress(hba);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_err_handling_prepare(hba);
+	ufshcd_err_handling_prepare(hba, &rpm_put);
 	/* Complete requests that have door-bell cleared by h/w */
 	ufshcd_complete_requests(hba);
 	spin_lock_irqsave(hba->host->host_lock, flags);
@@ -6394,7 +6400,7 @@ static void ufshcd_err_handler(struct work_struct *work)
 	}
 	ufshcd_clear_eh_in_progress(hba);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_err_handling_unprepare(hba);
+	ufshcd_err_handling_unprepare(hba, rpm_put);
 	up(&hba->host_sem);
 
 	dev_info(hba->dev, "%s finished; HBA state %s\n", __func__,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
  2022-09-15 11:58 [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress peter.wang
@ 2022-09-16 21:39 ` Bart Van Assche
  2022-09-19 14:47   ` Peter Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2022-09-16 21:39 UTC (permalink / raw)
  To: peter.wang, stanley.chu, linux-scsi, martin.petersen,
	avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui

On 9/15/22 04:58, peter.wang@mediatek.com wrote:
> -static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
> +static void ufshcd_err_handling_prepare(struct ufs_hba *hba, bool *rpm_put)
>   {
> -	ufshcd_rpm_get_sync(hba);
> +	if (!hba->pm_op_in_progress) {
> +		ufshcd_rpm_get_sync(hba);
> +		*rpm_put = true;
> +	}
> +

Hi Peter,

I don't think that this patch is sufficient. If 
ufshcd_err_handling_prepare() is called by the host reset handler 
(ufshcd_eh_host_reset_handler()) then the host state will be 
SHOST_RECOVERY. In that state SCSI command submission will hang and 
hence any ufshcd_rpm_get_sync() call will hang.

How about removing the ufshcd_rpm_get_sync() call from 
ufshcd_err_handling_prepare() and the ufshcd_rpm_put() call from 
ufshcd_err_handling_unprepare()? It is guaranteed that no commands are 
in progress for a runtime suspended LUN so the code for aborting pending 
requests in the UFS error handler will be skipped anyway if it is 
invoked for a runtime suspended device.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
  2022-09-16 21:39 ` Bart Van Assche
@ 2022-09-19 14:47   ` Peter Wang
  2022-09-19 16:25     ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Wang @ 2022-09-19 14:47 UTC (permalink / raw)
  To: Bart Van Assche, stanley.chu, linux-scsi, martin.petersen,
	avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui


On 9/17/22 05:39, Bart Van Assche wrote:
> On 9/15/22 04:58, peter.wang@mediatek.com wrote:
>> -static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>> +static void ufshcd_err_handling_prepare(struct ufs_hba *hba, bool 
>> *rpm_put)
>>   {
>> -    ufshcd_rpm_get_sync(hba);
>> +    if (!hba->pm_op_in_progress) {
>> +        ufshcd_rpm_get_sync(hba);
>> +        *rpm_put = true;
>> +    }
>> +
>
> Hi Peter,
>
> I don't think that this patch is sufficient. If 
> ufshcd_err_handling_prepare() is called by the host reset handler 
> (ufshcd_eh_host_reset_handler()) then the host state will be 
> SHOST_RECOVERY. In that state SCSI command submission will hang and 
> hence any ufshcd_rpm_get_sync() call will hang.
>
> How about removing the ufshcd_rpm_get_sync() call from 
> ufshcd_err_handling_prepare() and the ufshcd_rpm_put() call from 
> ufshcd_err_handling_unprepare()? It is guaranteed that no commands are 
> in progress for a runtime suspended LUN so the code for aborting 
> pending requests in the UFS error handler will be skipped anyway if it 
> is invoked for a runtime suspended device.
>
> Thanks,
>
> Bart.

Hi Bart,

If the scsi error happened and need do ufshcd_eh_host_reset_handler, the 
rpm state should in RPM_ACTIVE.
Because scsi need wakeup suspended LUN, and send command to LUN then get 
error, right?
So, ufshcd_rpm_get_sync should not hang in this case.

If remove ufshcd_rpm_get_sync directly, think about this case.
ufshcd_err_handler is on going and try to abort some task (which may get 
stuck and timeout too).
Then rpm count down try to suspend. Finally runtime suspend callback may 
return IO error and IO hang.

Thanks.
Peter



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
  2022-09-19 14:47   ` Peter Wang
@ 2022-09-19 16:25     ` Bart Van Assche
  2022-09-20  2:00       ` Peter Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2022-09-19 16:25 UTC (permalink / raw)
  To: Peter Wang, stanley.chu, linux-scsi, martin.petersen,
	avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui

On 9/19/22 07:47, Peter Wang wrote:
> If the scsi error happened and need do ufshcd_eh_host_reset_handler, the 
> rpm state should in RPM_ACTIVE.
> Because scsi need wakeup suspended LUN, and send command to LUN then get 
> error, right?

The following sequence may activate the SCSI error handler while the RPM 
state is RPM_RESUMING:
* The RPM state is RPM_SUSPENDED.
* The RPM state is changed into RPM_RESUMING and ufshcd_wl_resume() is 
called.
* ufshcd_set_dev_pwr_mode() calls scsi_execute() and the START STOP UNIT 
command times out.
* Because of this timeout the SCSI error handler is activated.

> If remove ufshcd_rpm_get_sync directly, think about this case.
> ufshcd_err_handler is on going and try to abort some task (which may get 
> stuck and timeout too).
> Then rpm count down try to suspend. Finally runtime suspend callback may 
> return IO error and IO hang.

Hmm ... suspending a UFS device involves calling ufshcd_wl_shutdown(), 
ufshcd_set_dev_pwr_mode() and scsi_execute(). scsi_execute() is 
serialized against the UFS error handler because the latter calls 
ufshcd_scsi_block_requests().

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
  2022-09-19 16:25     ` Bart Van Assche
@ 2022-09-20  2:00       ` Peter Wang
  2022-09-20 18:25         ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Wang @ 2022-09-20  2:00 UTC (permalink / raw)
  To: Bart Van Assche, stanley.chu, linux-scsi, martin.petersen,
	avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui


On 9/20/22 00:25, Bart Van Assche wrote:
> On 9/19/22 07:47, Peter Wang wrote:
>> If the scsi error happened and need do ufshcd_eh_host_reset_handler, 
>> the rpm state should in RPM_ACTIVE.
>> Because scsi need wakeup suspended LUN, and send command to LUN then 
>> get error, right?
>
> The following sequence may activate the SCSI error handler while the 
> RPM state is RPM_RESUMING:
> * The RPM state is RPM_SUSPENDED.
> * The RPM state is changed into RPM_RESUMING and ufshcd_wl_resume() is 
> called.
> * ufshcd_set_dev_pwr_mode() calls scsi_execute() and the START STOP 
> UNIT command times out.
> * Because of this timeout the SCSI error handler is activated.

This case will not get rpm, because pm_op_in_progress is true.

So it won't hang with ufshcd_rpm_get_sync.


Thanks

Peter



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
  2022-09-20  2:00       ` Peter Wang
@ 2022-09-20 18:25         ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2022-09-20 18:25 UTC (permalink / raw)
  To: Peter Wang, stanley.chu, linux-scsi, martin.petersen,
	avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui

On 9/19/22 19:00, Peter Wang wrote:
> 
> On 9/20/22 00:25, Bart Van Assche wrote:
>> On 9/19/22 07:47, Peter Wang wrote:
>>> If the scsi error happened and need do ufshcd_eh_host_reset_handler, 
>>> the rpm state should in RPM_ACTIVE.
>>> Because scsi need wakeup suspended LUN, and send command to LUN then 
>>> get error, right?
>>
>> The following sequence may activate the SCSI error handler while the 
>> RPM state is RPM_RESUMING:
>> * The RPM state is RPM_SUSPENDED.
>> * The RPM state is changed into RPM_RESUMING and ufshcd_wl_resume() is 
>> called.
>> * ufshcd_set_dev_pwr_mode() calls scsi_execute() and the START STOP 
>> UNIT command times out.
>> * Because of this timeout the SCSI error handler is activated.
> 
> This case will not get rpm, because pm_op_in_progress is true.
> 
> So it won't hang with ufshcd_rpm_get_sync.

Right, but I think the following scenario will result in a hang:
* The RPM state is changed from RPM_SUSPENDED into RPM_RESUMING and
   ufshcd_wl_resume() has not yet been called.
* ufshcd_eh_host_reset_handler() queues ufshcd_err_handler() and the
   latter function calls ufshcd_rpm_get_sync().
* This results in a deadlock: the scsi_execute() call by
   ufshcd_wl_resume() cannot make progress because the SCSI host state is
   SHOST_RECOVERY and the error handler cannot make progress because it
   keeps waiting until ufshcd_rpm_get_sync() has finished.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-09-20 18:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 11:58 [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress peter.wang
2022-09-16 21:39 ` Bart Van Assche
2022-09-19 14:47   ` Peter Wang
2022-09-19 16:25     ` Bart Van Assche
2022-09-20  2:00       ` Peter Wang
2022-09-20 18:25         ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.