* [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
@ 2022-09-15 11:58 peter.wang
2022-09-16 21:39 ` Bart Van Assche
0 siblings, 1 reply; 6+ messages in thread
From: peter.wang @ 2022-09-15 11:58 UTC (permalink / raw)
To: stanley.chu, linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui
From: Peter Wang <peter.wang@mediatek.com>
If error happened in rpm flow, get rpm will stuck because rpm is
suspending or resuming. And it cause IO hang.
This patch bypass get rpm when err handling with pm_op_in_progress.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
drivers/ufs/core/ufshcd.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a202d7d5240d..cc58fb585df2 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6086,9 +6086,13 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
}
}
-static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
+static void ufshcd_err_handling_prepare(struct ufs_hba *hba, bool *rpm_put)
{
- ufshcd_rpm_get_sync(hba);
+ if (!hba->pm_op_in_progress) {
+ ufshcd_rpm_get_sync(hba);
+ *rpm_put = true;
+ }
+
if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) ||
hba->is_sys_suspended) {
enum ufs_pm_op pm_op;
@@ -6122,13 +6126,14 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
cancel_work_sync(&hba->eeh_work);
}
-static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
+static void ufshcd_err_handling_unprepare(struct ufs_hba *hba, bool rpm_put)
{
ufshcd_scsi_unblock_requests(hba);
ufshcd_release(hba);
if (ufshcd_is_clkscaling_supported(hba))
ufshcd_clk_scaling_suspend(hba, false);
- ufshcd_rpm_put(hba);
+ if (rpm_put)
+ ufshcd_rpm_put(hba);
}
static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
@@ -6210,6 +6215,7 @@ static void ufshcd_err_handler(struct work_struct *work)
bool err_tm;
int pmc_err;
int tag;
+ bool rpm_put = false;
hba = container_of(work, struct ufs_hba, eh_work);
@@ -6231,7 +6237,7 @@ static void ufshcd_err_handler(struct work_struct *work)
}
ufshcd_set_eh_in_progress(hba);
spin_unlock_irqrestore(hba->host->host_lock, flags);
- ufshcd_err_handling_prepare(hba);
+ ufshcd_err_handling_prepare(hba, &rpm_put);
/* Complete requests that have door-bell cleared by h/w */
ufshcd_complete_requests(hba);
spin_lock_irqsave(hba->host->host_lock, flags);
@@ -6394,7 +6400,7 @@ static void ufshcd_err_handler(struct work_struct *work)
}
ufshcd_clear_eh_in_progress(hba);
spin_unlock_irqrestore(hba->host->host_lock, flags);
- ufshcd_err_handling_unprepare(hba);
+ ufshcd_err_handling_unprepare(hba, rpm_put);
up(&hba->host_sem);
dev_info(hba->dev, "%s finished; HBA state %s\n", __func__,
--
2.18.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
2022-09-15 11:58 [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress peter.wang
@ 2022-09-16 21:39 ` Bart Van Assche
2022-09-19 14:47 ` Peter Wang
0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2022-09-16 21:39 UTC (permalink / raw)
To: peter.wang, stanley.chu, linux-scsi, martin.petersen,
avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui
On 9/15/22 04:58, peter.wang@mediatek.com wrote:
> -static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
> +static void ufshcd_err_handling_prepare(struct ufs_hba *hba, bool *rpm_put)
> {
> - ufshcd_rpm_get_sync(hba);
> + if (!hba->pm_op_in_progress) {
> + ufshcd_rpm_get_sync(hba);
> + *rpm_put = true;
> + }
> +
Hi Peter,
I don't think that this patch is sufficient. If
ufshcd_err_handling_prepare() is called by the host reset handler
(ufshcd_eh_host_reset_handler()) then the host state will be
SHOST_RECOVERY. In that state SCSI command submission will hang and
hence any ufshcd_rpm_get_sync() call will hang.
How about removing the ufshcd_rpm_get_sync() call from
ufshcd_err_handling_prepare() and the ufshcd_rpm_put() call from
ufshcd_err_handling_unprepare()? It is guaranteed that no commands are
in progress for a runtime suspended LUN so the code for aborting pending
requests in the UFS error handler will be skipped anyway if it is
invoked for a runtime suspended device.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
2022-09-16 21:39 ` Bart Van Assche
@ 2022-09-19 14:47 ` Peter Wang
2022-09-19 16:25 ` Bart Van Assche
0 siblings, 1 reply; 6+ messages in thread
From: Peter Wang @ 2022-09-19 14:47 UTC (permalink / raw)
To: Bart Van Assche, stanley.chu, linux-scsi, martin.petersen,
avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui
On 9/17/22 05:39, Bart Van Assche wrote:
> On 9/15/22 04:58, peter.wang@mediatek.com wrote:
>> -static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>> +static void ufshcd_err_handling_prepare(struct ufs_hba *hba, bool
>> *rpm_put)
>> {
>> - ufshcd_rpm_get_sync(hba);
>> + if (!hba->pm_op_in_progress) {
>> + ufshcd_rpm_get_sync(hba);
>> + *rpm_put = true;
>> + }
>> +
>
> Hi Peter,
>
> I don't think that this patch is sufficient. If
> ufshcd_err_handling_prepare() is called by the host reset handler
> (ufshcd_eh_host_reset_handler()) then the host state will be
> SHOST_RECOVERY. In that state SCSI command submission will hang and
> hence any ufshcd_rpm_get_sync() call will hang.
>
> How about removing the ufshcd_rpm_get_sync() call from
> ufshcd_err_handling_prepare() and the ufshcd_rpm_put() call from
> ufshcd_err_handling_unprepare()? It is guaranteed that no commands are
> in progress for a runtime suspended LUN so the code for aborting
> pending requests in the UFS error handler will be skipped anyway if it
> is invoked for a runtime suspended device.
>
> Thanks,
>
> Bart.
Hi Bart,
If the scsi error happened and need do ufshcd_eh_host_reset_handler, the
rpm state should in RPM_ACTIVE.
Because scsi need wakeup suspended LUN, and send command to LUN then get
error, right?
So, ufshcd_rpm_get_sync should not hang in this case.
If remove ufshcd_rpm_get_sync directly, think about this case.
ufshcd_err_handler is on going and try to abort some task (which may get
stuck and timeout too).
Then rpm count down try to suspend. Finally runtime suspend callback may
return IO error and IO hang.
Thanks.
Peter
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
2022-09-19 14:47 ` Peter Wang
@ 2022-09-19 16:25 ` Bart Van Assche
2022-09-20 2:00 ` Peter Wang
0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2022-09-19 16:25 UTC (permalink / raw)
To: Peter Wang, stanley.chu, linux-scsi, martin.petersen,
avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui
On 9/19/22 07:47, Peter Wang wrote:
> If the scsi error happened and need do ufshcd_eh_host_reset_handler, the
> rpm state should in RPM_ACTIVE.
> Because scsi need wakeup suspended LUN, and send command to LUN then get
> error, right?
The following sequence may activate the SCSI error handler while the RPM
state is RPM_RESUMING:
* The RPM state is RPM_SUSPENDED.
* The RPM state is changed into RPM_RESUMING and ufshcd_wl_resume() is
called.
* ufshcd_set_dev_pwr_mode() calls scsi_execute() and the START STOP UNIT
command times out.
* Because of this timeout the SCSI error handler is activated.
> If remove ufshcd_rpm_get_sync directly, think about this case.
> ufshcd_err_handler is on going and try to abort some task (which may get
> stuck and timeout too).
> Then rpm count down try to suspend. Finally runtime suspend callback may
> return IO error and IO hang.
Hmm ... suspending a UFS device involves calling ufshcd_wl_shutdown(),
ufshcd_set_dev_pwr_mode() and scsi_execute(). scsi_execute() is
serialized against the UFS error handler because the latter calls
ufshcd_scsi_block_requests().
Thanks,
Bart.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
2022-09-19 16:25 ` Bart Van Assche
@ 2022-09-20 2:00 ` Peter Wang
2022-09-20 18:25 ` Bart Van Assche
0 siblings, 1 reply; 6+ messages in thread
From: Peter Wang @ 2022-09-20 2:00 UTC (permalink / raw)
To: Bart Van Assche, stanley.chu, linux-scsi, martin.petersen,
avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui
On 9/20/22 00:25, Bart Van Assche wrote:
> On 9/19/22 07:47, Peter Wang wrote:
>> If the scsi error happened and need do ufshcd_eh_host_reset_handler,
>> the rpm state should in RPM_ACTIVE.
>> Because scsi need wakeup suspended LUN, and send command to LUN then
>> get error, right?
>
> The following sequence may activate the SCSI error handler while the
> RPM state is RPM_RESUMING:
> * The RPM state is RPM_SUSPENDED.
> * The RPM state is changed into RPM_RESUMING and ufshcd_wl_resume() is
> called.
> * ufshcd_set_dev_pwr_mode() calls scsi_execute() and the START STOP
> UNIT command times out.
> * Because of this timeout the SCSI error handler is activated.
This case will not get rpm, because pm_op_in_progress is true.
So it won't hang with ufshcd_rpm_get_sync.
Thanks
Peter
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress
2022-09-20 2:00 ` Peter Wang
@ 2022-09-20 18:25 ` Bart Van Assche
0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2022-09-20 18:25 UTC (permalink / raw)
To: Peter Wang, stanley.chu, linux-scsi, martin.petersen,
avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui
On 9/19/22 19:00, Peter Wang wrote:
>
> On 9/20/22 00:25, Bart Van Assche wrote:
>> On 9/19/22 07:47, Peter Wang wrote:
>>> If the scsi error happened and need do ufshcd_eh_host_reset_handler,
>>> the rpm state should in RPM_ACTIVE.
>>> Because scsi need wakeup suspended LUN, and send command to LUN then
>>> get error, right?
>>
>> The following sequence may activate the SCSI error handler while the
>> RPM state is RPM_RESUMING:
>> * The RPM state is RPM_SUSPENDED.
>> * The RPM state is changed into RPM_RESUMING and ufshcd_wl_resume() is
>> called.
>> * ufshcd_set_dev_pwr_mode() calls scsi_execute() and the START STOP
>> UNIT command times out.
>> * Because of this timeout the SCSI error handler is activated.
>
> This case will not get rpm, because pm_op_in_progress is true.
>
> So it won't hang with ufshcd_rpm_get_sync.
Right, but I think the following scenario will result in a hang:
* The RPM state is changed from RPM_SUSPENDED into RPM_RESUMING and
ufshcd_wl_resume() has not yet been called.
* ufshcd_eh_host_reset_handler() queues ufshcd_err_handler() and the
latter function calls ufshcd_rpm_get_sync().
* This results in a deadlock: the scsi_execute() call by
ufshcd_wl_resume() cannot make progress because the SCSI host state is
SHOST_RECOVERY and the error handler cannot make progress because it
keeps waiting until ufshcd_rpm_get_sync() has finished.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-09-20 18:25 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 11:58 [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress peter.wang
2022-09-16 21:39 ` Bart Van Assche
2022-09-19 14:47 ` Peter Wang
2022-09-19 16:25 ` Bart Van Assche
2022-09-20 2:00 ` Peter Wang
2022-09-20 18:25 ` Bart Van Assche
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.