linux-mediatek.lists.infradead.org archive mirror
* [PATCH v4] ufs: core: wlun send SSU timeout recovery
@ 2023-09-22  9:09 peter.wang
  2023-09-22 16:06 ` Bart Van Assche
  2023-09-26 18:29 ` Bart Van Assche
  0 siblings, 2 replies; 7+ messages in thread
From: peter.wang @ 2023-09-22  9:09 UTC (permalink / raw)
  To: stanley.chu, linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu

From: Peter Wang <peter.wang@mediatek.com>

When a runtime PM Start Stop Unit (SSU) command times out, the SCSI
core invokes eh_host_reset_handler, which is hooked to
ufshcd_eh_host_reset_handler. That handler schedules eh_work and then
blocks waiting in flush_work(&hba->eh_work). However, ufshcd_err_handler
hangs waiting for runtime PM resume, so neither can make progress.
Do link recovery only in this case.
Below is the I/O hang stack dump in kernel-6.1:

kworker/4:0     D
<ffffffd7d31f6fb4> __switch_to+0x180/0x344
<ffffffd7d31f779c> __schedule+0x5ec/0xa14
<ffffffd7d31f7c3c> schedule+0x78/0xe0
<ffffffd7d31fefbc> schedule_timeout+0xb0/0x15c
<ffffffd7d31f8120> io_schedule_timeout+0x48/0x70
<ffffffd7d31f8e40> do_wait_for_common+0x108/0x19c
<ffffffd7d31f837c> wait_for_completion_io_timeout+0x50/0x78
<ffffffd7d2876bc0> blk_execute_rq+0x1b8/0x218
<ffffffd7d2b4297c> scsi_execute_cmd+0x148/0x238
<ffffffd7d2da7358> ufshcd_set_dev_pwr_mode+0xe8/0x244
<ffffffd7d2da7e40> __ufshcd_wl_resume+0x1e0/0x45c
<ffffffd7d2da7b28> ufshcd_wl_runtime_resume+0x3c/0x174
<ffffffd7d2b4f290> scsi_runtime_resume+0x7c/0xc8
<ffffffd7d2ae1d48> __rpm_callback+0xa0/0x410
<ffffffd7d2ae0128> rpm_resume+0x43c/0x67c
<ffffffd7d2ae1e98> __rpm_callback+0x1f0/0x410
<ffffffd7d2ae014c> rpm_resume+0x460/0x67c
<ffffffd7d2ae1450> pm_runtime_work+0xa4/0xac
<ffffffd7d22e39ac> process_one_work+0x208/0x598
<ffffffd7d22e3fc0> worker_thread+0x228/0x438
<ffffffd7d22eb038> kthread+0x104/0x1d4
<ffffffd7d22171a0> ret_from_fork+0x10/0x20

scsi_eh_0       D
<ffffffd7d31f6fb4> __switch_to+0x180/0x344
<ffffffd7d31f779c> __schedule+0x5ec/0xa14
<ffffffd7d31f7c3c> schedule+0x78/0xe0
<ffffffd7d31fef50> schedule_timeout+0x44/0x15c
<ffffffd7d31f8e40> do_wait_for_common+0x108/0x19c
<ffffffd7d31f8234> wait_for_completion+0x48/0x64
<ffffffd7d22deb88> __flush_work+0x260/0x2d0
<ffffffd7d22de918> flush_work+0x10/0x20
<ffffffd7d2da4728> ufshcd_eh_host_reset_handler+0x88/0xcc
<ffffffd7d2b41da4> scsi_try_host_reset+0x48/0xe0
<ffffffd7d2b410fc> scsi_eh_ready_devs+0x934/0xa40
<ffffffd7d2b41618> scsi_error_handler+0x168/0x374
<ffffffd7d22eb038> kthread+0x104/0x1d4
<ffffffd7d22171a0> ret_from_fork+0x10/0x20

kworker/u16:5   D
<ffffffd7d31f6fb4> __switch_to+0x180/0x344
<ffffffd7d31f779c> __schedule+0x5ec/0xa14
<ffffffd7d31f7c3c> schedule+0x78/0xe0
<ffffffd7d2adfe00> rpm_resume+0x114/0x67c
<ffffffd7d2adfca8> __pm_runtime_resume+0x70/0xb4
<ffffffd7d2d9cf48> ufshcd_err_handler+0x1a0/0xe68
<ffffffd7d22e39ac> process_one_work+0x208/0x598
<ffffffd7d22e3fc0> worker_thread+0x228/0x438
<ffffffd7d22eb038> kthread+0x104/0x1d4
<ffffffd7d22171a0> ret_from_fork+0x10/0x20

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index c2df07545f96..7608d75bb4fe 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -7713,9 +7713,29 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
 	int err = SUCCESS;
 	unsigned long flags;
 	struct ufs_hba *hba;
+	struct device *dev;
 
 	hba = shost_priv(cmd->device->host);
 
+	/*
+	 * If runtime pm send SSU and got timeout, scsi_error_handler
+	 * stuck at this function to wait flush_work(&hba->eh_work).
+	 * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
+	 * Do ufshcd_link_recovery instead shedule eh_work can prevent
+	 * dead lock happen.
+	 */
+	dev = &hba->ufs_device_wlun->sdev_gendev;
+	if ((dev->power.runtime_status == RPM_RESUMING) ||
+		(dev->power.runtime_status == RPM_SUSPENDING)) {
+		err = ufshcd_link_recovery(hba);
+		if (err) {
+			dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
+				dev->power.runtime_status,
+				dev->power.runtime_error);
+		}
+		return err;
+	}
+
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->force_reset = true;
 	ufshcd_schedule_eh_work(hba);
-- 
2.18.0




* Re: [PATCH v4] ufs: core: wlun send SSU timeout recovery
  2023-09-22  9:09 [PATCH v4] ufs: core: wlun send SSU timeout recovery peter.wang
@ 2023-09-22 16:06 ` Bart Van Assche
       [not found]   ` <39e0a8b6d6f235dfa7dd36a4436c9eac5ab5210e.camel@mediatek.com>
  2023-09-26 18:29 ` Bart Van Assche
  1 sibling, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2023-09-22 16:06 UTC (permalink / raw)
  To: peter.wang, stanley.chu, linux-scsi, martin.petersen,
	avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu

On 9/22/23 02:09, peter.wang@mediatek.com wrote:
> When runtime pm send SSU times out, the SCSI core invokes
> eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler
> schedule eh_work and stuck at wait flush_work(&hba->eh_work).
> However, ufshcd_err_handler hangs in wait rpm resume.
> Do link recovery only in this case.
> Below is IO hang stack dump in kernel-6.1

What does kernel-6.1 mean? Has commit 7029e2151a7c ("scsi: ufs: Fix a
deadlock between PM and the SCSI error handler") been backported to
that kernel?

> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index c2df07545f96..7608d75bb4fe 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -7713,9 +7713,29 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
>   	int err = SUCCESS;
>   	unsigned long flags;
>   	struct ufs_hba *hba;
> +	struct device *dev;
>   
>   	hba = shost_priv(cmd->device->host);
>   
> +	/*
> +	 * If runtime pm send SSU and got timeout, scsi_error_handler
> +	 * stuck at this function to wait flush_work(&hba->eh_work).
> +	 * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
> +	 * Do ufshcd_link_recovery instead shedule eh_work can prevent
> +	 * dead lock happen.
> +	 */
> +	dev = &hba->ufs_device_wlun->sdev_gendev;
> +	if ((dev->power.runtime_status == RPM_RESUMING) ||
> +		(dev->power.runtime_status == RPM_SUSPENDING)) {
> +		err = ufshcd_link_recovery(hba);
> +		if (err) {
> +			dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
> +				dev->power.runtime_status,
> +				dev->power.runtime_error);
> +		}
> +		return err;
> +	}
> +
>   	spin_lock_irqsave(hba->host->host_lock, flags);
>   	hba->force_reset = true;
>   	ufshcd_schedule_eh_work(hba);

I think this change is racy because the runtime power management status
may change after the above checks have been performed and before
ufshcd_err_handling_prepare() is called. If commit 7029e2151a7c is
included in your kernel, does applying the untested patch below help?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d45a7dd80ab8..656dabea678e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -123,8 +123,7 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd, unsigned long msecs)
  	}

  	blk_mq_requeue_request(rq, false);
-	if (!scsi_host_in_recovery(cmd->device->host))
-		blk_mq_delay_kick_requeue_list(rq->q, msecs);
+	blk_mq_delay_kick_requeue_list(rq->q, msecs);
  }

  /**
@@ -163,8 +162,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy)
  	 */
  	cmd->result = 0;

-	blk_mq_requeue_request(scsi_cmd_to_rq(cmd),
-			       !scsi_host_in_recovery(cmd->device->host));
+	blk_mq_requeue_request(scsi_cmd_to_rq(cmd), true);
  }

  /**
@@ -495,9 +493,6 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)

  static void scsi_run_queue_async(struct scsi_device *sdev)
  {
-	if (scsi_host_in_recovery(sdev->host))
-		return;
-
  	if (scsi_target(sdev)->single_lun ||
  	    !list_empty(&sdev->host->starved_list)) {
  		kblockd_schedule_work(&sdev->requeue_work);

Thanks,

Bart.



* Re: [PATCH v4] ufs: core: wlun send SSU timeout recovery
       [not found]   ` <39e0a8b6d6f235dfa7dd36a4436c9eac5ab5210e.camel@mediatek.com>
@ 2023-09-25 22:07     ` Bart Van Assche
  2023-09-26 10:23       ` Peter Wang (王信友)
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2023-09-25 22:07 UTC (permalink / raw)
  To: Peter Wang (王信友),
	linux-scsi, avri.altman, Stanley Chu (朱原陞),
	alim.akhtar, martin.petersen, jejb
  Cc: linux-mediatek, Jiajie Hao (郝加节),
	CC Chou (周志杰),
	Eddie Huang (黃智傑),
	Alice Chao (趙珮均),
	wsd_upstream, Lin Gui (桂林),
	Chun-Hung Wu (巫駿宏),
	Tun-yu Yu (游敦聿),
	Chaotian Jing (井朝天),
	Powen Kao (高伯文),
	Naomi Chu (朱詠田),
	Qilin Tan (谭麒麟)

On 9/25/23 01:51, Peter Wang (王信友) wrote:
> On Fri, 2023-09-22 at 09:06 -0700, Bart Van Assche wrote:
>> On 9/22/23 02:09, peter.wang@mediatek.com wrote:
>> > When runtime pm send SSU times out, the SCSI core invokes
>> > eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler
>> > schedule eh_work and stuck at wait flush_work(&hba->eh_work).
>> > However, ufshcd_err_handler hangs in wait rpm resume.
>> > Do link recovery only in this case.
>> > Below is IO hang stack dump in kernel-6.1
>>
>> What does kernel-6.1 mean? Has commit 7029e2151a7c ("scsi: ufs: Fix a
>> deadlock between PM and the SCSI error handler") been backported to
>> that kernel?
> 
> Yes, our kernel-6.1 have backport 7029e2151a7c ("scsi: ufs: Fix a
> deadlock between PM and the SCSI error handler")

Hi Peter,

Thanks for having confirmed this.

Please use text mode instead of HTML mode when replying. As one can see
here, your reply did not make it to the linux-scsi mailing list:
https://lore.kernel.org/linux-scsi/20230922090925.16339-1-peter.wang@mediatek.com/

> And this IO hang issue still happen.
> 
> This issue is easy trigger if we drop the SSU response to let SSU timeout.
> 
> 
> 
>>
>> > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
>> > index c2df07545f96..7608d75bb4fe 100644
>> > --- a/drivers/ufs/core/ufshcd.c
>> > +++ b/drivers/ufs/core/ufshcd.c
>> > @@ -7713,9 +7713,29 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
>> >   int err = SUCCESS;
>> >   unsigned long flags;
>> >   struct ufs_hba *hba;
>> > +struct device *dev;
>> >   
>> >   hba = shost_priv(cmd->device->host);
>> >   
>> > +/*
>> > + * If runtime pm send SSU and got timeout, scsi_error_handler
>> > + * stuck at this function to wait flush_work(&hba->eh_work).
>> > + * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
>> > + * Do ufshcd_link_recovery instead shedule eh_work can prevent
>> > + * dead lock happen.
>> > + */
>> > +dev = &hba->ufs_device_wlun->sdev_gendev;
>> > +if ((dev->power.runtime_status == RPM_RESUMING) ||
>> > +(dev->power.runtime_status == RPM_SUSPENDING)) {
>> > +err = ufshcd_link_recovery(hba);
>> > +if (err) {
>> > +dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
>> > +dev->power.runtime_status,
>> > +dev->power.runtime_error);
>> > +}
>> > +return err;
>> > +}
>> > +
>> >   spin_lock_irqsave(hba->host->host_lock, flags);
>> >   hba->force_reset = true;
>> >   ufshcd_schedule_eh_work(hba);
>>
>> I think this change is racy because the runtime power management status
>> may change after the above checks have been performed and before
>> ufshcd_err_handling_prepare() is called.
> 
> If this function is invoked while the RPM state is RPM_RESUMING or RPM_SUSPENDING,
> it means an error happened in the UFS driver resume/suspend callback, and there is
> no chance to move to the next state while that callback has not finished, just like
> the backtrace stuck sending the SSU command.
> Or we can make it simpler by checking pm_op_in_progress; what do you think?

I'm concerned that this approach will fix some cases but not all cases. The
UFS error handler (ufshcd_err_handler()) and runtime suspend or resume code
may be called concurrently. No matter which checks are added at the start of
ufshcd_eh_host_reset_handler(), I think these checks won't protect against
concurrent execution of runtime power management code just after these checks
have finished.

> Since the hang backtrace does not involve the functions below, they do not help.

I think that conclusion is wrong. Can you please test the patch that I provided?

Thanks,

Bart.



* Re: [PATCH v4] ufs: core: wlun send SSU timeout recovery
  2023-09-25 22:07     ` Bart Van Assche
@ 2023-09-26 10:23       ` Peter Wang (王信友)
  2023-09-26 18:26         ` Bart Van Assche
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Wang (王信友) @ 2023-09-26 10:23 UTC (permalink / raw)
  To: linux-scsi, bvanassche, avri.altman,
	Stanley Chu (朱原陞),
	alim.akhtar, martin.petersen, jejb
  Cc: linux-mediatek, Jiajie Hao (郝加节),
	CC Chou (周志杰),
	Eddie Huang (黃智傑),
	Alice Chao (趙珮均),
	wsd_upstream, Lin Gui (桂林),
	Chun-Hung Wu (巫駿宏),
	Tun-yu Yu (游敦聿),
	Chaotian Jing (井朝天),
	Powen Kao (高伯文),
	Naomi Chu (朱詠田),
	Qilin Tan (谭麒麟)

On Mon, 2023-09-25 at 15:07 -0700, Bart Van Assche wrote:
>  On 9/25/23 01:51, Peter Wang (王信友) wrote:
> > On Fri, 2023-09-22 at 09:06 -0700, Bart Van Assche wrote:
> >> On 9/22/23 02:09, peter.wang@mediatek.com wrote:
> >> > When runtime pm send SSU times out, the SCSI core invokes
> >> > eh_host_reset_handler, which hooks function
> ufshcd_eh_host_reset_handler
> >> > schedule eh_work and stuck at wait flush_work(&hba->eh_work).
> >> > However, ufshcd_err_handler hangs in wait rpm resume.
> >> > Do link recovery only in this case.
> >> > Below is IO hang stack dump in kernel-6.1
> >>
> >> What does kernel-6.1 mean? Has commit 7029e2151a7c ("scsi: ufs:
> Fix a
> >> deadlock between PM and the SCSI error handler") been backported
> to
> >> that kernel?
> > 
> > Yes, our kernel-6.1 have backport 7029e2151a7c ("scsi: ufs: Fix a
> > deadlock between PM and the SCSI error handler")
> 
> Hi Peter,
> 
> Thanks for having confirmed this.
> 
> Please use text mode instead of HTML mode when replying. As one can
> see
> here, your reply did not make it to the linux-scsi mailing list:
> 
https://lore.kernel.org/linux-scsi/20230922090925.16339-1-peter.wang@mediatek.com/
> 
> > And this IO hang issue still happen.
> > 
> > This issue is easy trigger if we drop the SSU response to let SSU
> timeout.
> > 
> > 
> > 
> >>
> >> > diff --git a/drivers/ufs/core/ufshcd.c
> b/drivers/ufs/core/ufshcd.c
> >> > index c2df07545f96..7608d75bb4fe 100644
> >> > --- a/drivers/ufs/core/ufshcd.c
> >> > +++ b/drivers/ufs/core/ufshcd.c
> >> > @@ -7713,9 +7713,29 @@ static int
> ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
> >> >   int err = SUCCESS;
> >> >   unsigned long flags;
> >> >   struct ufs_hba *hba;
> >> > +struct device *dev;
> >> >   
> >> >   hba = shost_priv(cmd->device->host);
> >> >   
> >> > +/*
> >> > + * If runtime pm send SSU and got timeout, scsi_error_handler
> >> > + * stuck at this function to wait flush_work(&hba->eh_work).
> >> > + * And ufshcd_err_handler(eh_work) stuck at wait runtime pm
> active.
> >> > + * Do ufshcd_link_recovery instead shedule eh_work can prevent
> >> > + * dead lock happen.
> >> > + */
> >> > +dev = &hba->ufs_device_wlun->sdev_gendev;
> >> > +if ((dev->power.runtime_status == RPM_RESUMING) ||
> >> > +(dev->power.runtime_status == RPM_SUSPENDING)) {
> >> > +err = ufshcd_link_recovery(hba);
> >> > +if (err) {
> >> > +dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
> >> > +dev->power.runtime_status,
> >> > +dev->power.runtime_error);
> >> > +}
> >> > +return err;
> >> > +}
> >> > +
> >> >   spin_lock_irqsave(hba->host->host_lock, flags);
> >> >   hba->force_reset = true;
> >> >   ufshcd_schedule_eh_work(hba);
> >>
> >> I think this change is racy because the runtime power management
> status
> >> may change after the above checks have been performed and before
> >> ufshcd_err_handling_prepare() is called.
> > 
> > If this function invoke and RPM state is in the RPM_RESUMING or
> RPM_SUSPENDING state,
> > it means error happen in ufs driver resume/suspend callback
> function and no chance
> > move to next state if callback function is not finished, just like
> backtrace stuck in send SSU cmd.
> > Or we can make it simple by check pm_op_in_progress, what do you
> think?
> 
> I'm concerned that this approach will fix some cases but not all
> cases. The
> UFS error handler (ufshcd_err_handler()) and runtime suspend or
> resume code
> may be called concurrently. No matter which checks are added at the 
> start of
> ufshcd_eh_host_reset_handler(), I think these checks won't protect
> against
> concurrent execution of runtime power management code just after
> these checks
> have finished.
> 

Hi Bart,

Yes, but ufshcd_err_handler will wait for runtime PM resume to become
active if it runs concurrently with runtime suspend or resume
(ufshcd_err_handler -> ufshcd_err_handling_prepare ->
ufshcd_rpm_get_sync).
So, if runtime suspend or resume sends SSU and it times out, the SCSI
error handler gets stuck in this function and the deadlock happens.
The idea is: if the runtime PM flow hits an error, doing
ufshcd_link_recovery is enough.


> > Since the hang backtrace does not involve the functions below, they
> > do not help.
> 
> I think that conclusion is wrong. Can you please test the patch that
> I provided?
> 

Yes, I have checked that this patch is included in our code base, and
the I/O hang still happens. The stack dump of this deadlock shows how
it causes the I/O hang.

Thanks.

> Thanks,
> 
> Bart.


* Re: [PATCH v4] ufs: core: wlun send SSU timeout recovery
  2023-09-26 10:23       ` Peter Wang (王信友)
@ 2023-09-26 18:26         ` Bart Van Assche
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2023-09-26 18:26 UTC (permalink / raw)
  To: Peter Wang (王信友),
	linux-scsi, avri.altman, Stanley Chu (朱原陞),
	alim.akhtar, martin.petersen, jejb
  Cc: linux-mediatek, Jiajie Hao (郝加节),
	CC Chou (周志杰),
	Eddie Huang (黃智傑),
	Alice Chao (趙珮均),
	wsd_upstream, Lin Gui (桂林),
	Chun-Hung Wu (巫駿宏),
	Tun-yu Yu (游敦聿),
	Chaotian Jing (井朝天),
	Powen Kao (高伯文),
	Naomi Chu (朱詠田),
	Qilin Tan (谭麒麟)

On 9/26/23 03:23, Peter Wang (王信友) wrote:
> Yes, but ufshcd_err_handler will wait for runtime PM resume to become
> active if it runs concurrently with runtime suspend or resume
> (ufshcd_err_handler -> ufshcd_err_handling_prepare ->
> ufshcd_rpm_get_sync).
> So, if runtime suspend or resume sends SSU and it times out, the SCSI
> error handler gets stuck in this function and the deadlock happens.
> The idea is: if the runtime PM flow hits an error, doing
> ufshcd_link_recovery is enough.

Thank you for having provided this clarification.

Bart.




* Re: [PATCH v4] ufs: core: wlun send SSU timeout recovery
  2023-09-22  9:09 [PATCH v4] ufs: core: wlun send SSU timeout recovery peter.wang
  2023-09-22 16:06 ` Bart Van Assche
@ 2023-09-26 18:29 ` Bart Van Assche
  2023-09-27  3:32   ` Peter Wang (王信友)
  1 sibling, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2023-09-26 18:29 UTC (permalink / raw)
  To: peter.wang, stanley.chu, linux-scsi, martin.petersen,
	avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu

On 9/22/23 02:09, peter.wang@mediatek.com wrote:
> +	/*
> +	 * If runtime pm send SSU and got timeout, scsi_error_handler
> +	 * stuck at this function to wait flush_work(&hba->eh_work).
> +	 * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
> +	 * Do ufshcd_link_recovery instead shedule eh_work can prevent
> +	 * dead lock happen.
> +	 */

The above comment is hard to understand because of grammatical issues.
Please try to improve this comment. A few examples: I think that "wait"
should be changed into "wait for" and also that "happen" should be changed
into "to happen".

> +	dev = &hba->ufs_device_wlun->sdev_gendev;
> +	if ((dev->power.runtime_status == RPM_RESUMING) ||
> +		(dev->power.runtime_status == RPM_SUSPENDING)) {
> +		err = ufshcd_link_recovery(hba);
> +		if (err) {
> +			dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
> +				dev->power.runtime_status,
> +				dev->power.runtime_error);
> +		}
> +		return err;
> +	}

ufshcd_link_recovery() returns a Unix error code (e.g. -ETIMEDOUT) while
ufshcd_eh_host_reset_handler() should return one of the following values:
SUCCESS or FAILED. Please fix this.

Thanks,

Bart.



* Re: [PATCH v4] ufs: core: wlun send SSU timeout recovery
  2023-09-26 18:29 ` Bart Van Assche
@ 2023-09-27  3:32   ` Peter Wang (王信友)
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Wang (王信友) @ 2023-09-27  3:32 UTC (permalink / raw)
  To: linux-scsi, bvanassche, avri.altman,
	Stanley Chu (朱原陞),
	alim.akhtar, martin.petersen, jejb
  Cc: linux-mediatek, Jiajie Hao (郝加节),
	CC Chou (周志杰),
	Eddie Huang (黃智傑),
	Alice Chao (趙珮均),
	wsd_upstream, Lin Gui (桂林),
	Chun-Hung Wu (巫駿宏),
	Tun-yu Yu (游敦聿),
	Chaotian Jing (井朝天),
	Powen Kao (高伯文),
	Naomi Chu (朱詠田),
	Qilin Tan (谭麒麟)

On Tue, 2023-09-26 at 11:29 -0700, Bart Van Assche wrote:
>  On 9/22/23 02:09, peter.wang@mediatek.com wrote:
> > +/*
> > + * If runtime pm send SSU and got timeout, scsi_error_handler
> > + * stuck at this function to wait flush_work(&hba->eh_work).
> > + * And ufshcd_err_handler(eh_work) stuck at wait runtime pm
> active.
> > + * Do ufshcd_link_recovery instead shedule eh_work can prevent
> > + * dead lock happen.
> > + */
> 
> The above comment is hard to understand because of grammatical
> issues.
> Please try to improve this comment. A few examples: I think that
> "wait"
> should be changed into "wait for" and also that "happen" should be
> changed
> into "to happen".
> 
> > +dev = &hba->ufs_device_wlun->sdev_gendev;
> > +if ((dev->power.runtime_status == RPM_RESUMING) ||
> > +(dev->power.runtime_status == RPM_SUSPENDING)) {
> > +err = ufshcd_link_recovery(hba);
> > +if (err) {
> > +dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
> > +dev->power.runtime_status,
> > +dev->power.runtime_error);
> > +}
> > +return err;
> > +}
> 
> ufshcd_link_recovery() returns a Unix error code (e.g. -ETIMEDOUT)
> while
> ufshcd_eh_host_reset_handler() should return one of the following
> values:
> SUCCESS or FAILED. Please fix this.
> 
> Thanks,
> 
> Bart.

Hi Bart,

Thanks for the review.
I will correct the comment and change the return value, and also
simplify the check by using pm_op_in_progress.

Thanks.

Peter


