[PATCH v5 0/2] Two UFS fixes

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v5 0/2] Two UFS fixes
@ 2021-01-07 18:53 Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jaegeuk Kim @ 2021-01-07 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, kernel-team
  Cc: cang, alim.akhtar, avri.altman, bvanassche, martin.petersen, stanley.chu

Change log from v4:
 - remove RESERVE tag for tm command
 - remove waiting IOs and let full reset handle it
 - avoid verbose error log which causes cpu lock up



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns
  2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
@ 2021-01-07 18:53 ` Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
  2021-01-08  4:19 ` [PATCH v5 0/2] Two UFS fixes Martin K. Petersen
  2 siblings, 0 replies; 5+ messages in thread
From: Jaegeuk Kim @ 2021-01-07 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, kernel-team
  Cc: cang, alim.akhtar, avri.altman, bvanassche, martin.petersen,
	stanley.chu, Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
     ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery flows
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..e6e7bdf99cd7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
 	if (ret)
 		dev_err(hba->dev, "%s: link recovery failed, err %d",
 			__func__, ret);
+	else
+		ufshcd_clear_ua_wluns(hba);
 
 	return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
 	ufshcd_scsi_unblock_requests(hba);
 	ufshcd_err_handling_unprepare(hba);
 	up(&hba->eh_sem);
+
+	if (!err && needs_reset)
+		ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
 	ufshcd_set_clk_freq(hba, true);
 
 	err = ufshcd_hba_enable(hba);
-	if (err)
-		goto out;
 
 	/* Establish the link again and restore the device */
-	err = ufshcd_probe_hba(hba, false);
 	if (!err)
-		ufshcd_clear_ua_wluns(hba);
-out:
+		err = ufshcd_probe_hba(hba, false);
+
 	if (err)
 		dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
 	ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -7718,6 +7720,8 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
 	if (ret)
 		goto out;
 
+	ufshcd_clear_ua_wluns(hba);
+
 	/* Initialize devfreq after UFS device is detected */
 	if (ufshcd_is_clkscaling_supported(hba)) {
 		memcpy(&hba->clk_scaling.saved_pwr_info.info,
@@ -7919,8 +7923,6 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
 		pm_runtime_put_sync(hba->dev);
 		ufshcd_exit_clk_scaling(hba);
 		ufshcd_hba_exit(hba);
-	} else {
-		ufshcd_clear_ua_wluns(hba);
 	}
 }
 
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 		ufshcd_resume_clkscaling(hba);
 	hba->clk_gating.is_suspended = false;
 	hba->dev_info.b_rpm_dev_flush_capable = false;
+	ufshcd_clear_ua_wluns(hba);
 	ufshcd_release(hba);
 out:
 	if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
 	}
 
+	ufshcd_clear_ua_wluns(hba);
+
 	/* Schedule clock gating in case of no access to UFS device yet */
 	ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens
  2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
@ 2021-01-07 18:53 ` Jaegeuk Kim
  2021-01-08  0:53   ` Can Guo
  2021-01-08  4:19 ` [PATCH v5 0/2] Two UFS fixes Martin K. Petersen
  2 siblings, 1 reply; 5+ messages in thread
From: Jaegeuk Kim @ 2021-01-07 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, kernel-team
  Cc: cang, alim.akhtar, avri.altman, bvanassche, martin.petersen,
	stanley.chu, Jaegeuk Kim, Jaegeuk Kim

From: Jaegeuk Kim <jaegeuk@google.com>

When non-fatal error like line-reset happens, ufshcd_err_handler() starts to
abort tasks by ufshcd_try_to_abort_task(). When it tries to issue tm request,
we've hit two warnings.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

After fixing the above warnings, I've hit another tm_cmd timeout, which may be
caused by unstable controller state.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Then, ufshcd_err_handler() enters full reset, and I hit kernel stuck. It turned
out ufshcd_print_trs() printed too many messages in console which requires CPU
locks. Likewise hba->silence_err_logs, we need to avoid too verbose messages.
Actually it came from ufshcd_transfer_rsp_status() when requeuing commands back.
Indeed, this is actually not an error case, so let's fix it.

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e6e7bdf99cd7..2a715f13fe1d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 		break;
 	} /* end of switch */
 
-	if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
+	if ((host_byte(result) != DID_OK) &&
+	    (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
 		ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
 	return result;
 }
@@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 		intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
 	}
 
-	if (enabled_intr_status && retval == IRQ_NONE) {
-		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-					__func__, intr_status);
+	if (enabled_intr_status && retval == IRQ_NONE &&
+				!ufshcd_eh_in_progress(hba)) {
+		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+					__func__,
+					intr_status,
+					hba->ufs_stats.last_intr_status,
+					enabled_intr_status);
 		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
 	}
 
@@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	 * Even though we use wait_event() which sleeps indefinitely,
 	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 	 */
-	req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
 	req->end_io_data = &wait;
 	free_slot = req->tag;
 	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens
  2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
@ 2021-01-08  0:53   ` Can Guo
  0 siblings, 0 replies; 5+ messages in thread
From: Can Guo @ 2021-01-08  0:53 UTC (permalink / raw)
  To: Jaegeuk Kim
  Cc: linux-kernel, linux-scsi, kernel-team, alim.akhtar, avri.altman,
	bvanassche, martin.petersen, stanley.chu, Jaegeuk Kim

On 2021-01-08 02:53, Jaegeuk Kim wrote:
> From: Jaegeuk Kim <jaegeuk@google.com>
> 
> When non-fatal error like line-reset happens, ufshcd_err_handler() 
> starts to
> abort tasks by ufshcd_try_to_abort_task(). When it tries to issue tm 
> request,
> we've hit two warnings.
> 
> WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 
> blk_get_request+0x68/0x70
> WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 
> blk_mq_get_tag+0x438/0x46c
> 
> After fixing the above warnings, I've hit another tm_cmd timeout, which 
> may be
> caused by unstable controller state.
> 
> __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> 
> Then, ufshcd_err_handler() enters full reset, and I hit kernel stuck. 
> It turned
> out ufshcd_print_trs() printed too many messages in console which 
> requires CPU
> locks. Likewise hba->silence_err_logs, we need to avoid too verbose 
> messages.
> Actually it came from ufshcd_transfer_rsp_status() when requeuing 
> commands back.
> Indeed, this is actually not an error case, so let's fix it.
> 
> Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
> allocate and free TMFs")
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

It is really good to find out the root cause! Thanks for the fix.

Reviewed-by: Can Guo <cang@codeaurora.org>

> ---
>  drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index e6e7bdf99cd7..2a715f13fe1d 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba,
> struct ufshcd_lrb *lrbp)
>  		break;
>  	} /* end of switch */
> 
> -	if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
> +	if ((host_byte(result) != DID_OK) &&
> +	    (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
>  		ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
>  	return result;
>  }
> @@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void 
> *__hba)
>  		intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
>  	}
> 
> -	if (enabled_intr_status && retval == IRQ_NONE) {
> -		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> -					__func__, intr_status);
> +	if (enabled_intr_status && retval == IRQ_NONE &&
> +				!ufshcd_eh_in_progress(hba)) {
> +		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 
> 0x%08x)\n",
> +					__func__,
> +					intr_status,
> +					hba->ufs_stats.last_intr_status,
> +					enabled_intr_status);
>  		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
>  	}
> 
> @@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba 
> *hba,
>  	 * Even though we use wait_event() which sleeps indefinitely,
>  	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
>  	 */
> -	req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> +	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
>  	req->end_io_data = &wait;
>  	free_slot = req->tag;
>  	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v5 0/2] Two UFS fixes
  2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
@ 2021-01-08  4:19 ` Martin K. Petersen
  2 siblings, 0 replies; 5+ messages in thread
From: Martin K. Petersen @ 2021-01-08  4:19 UTC (permalink / raw)
  To: linux-kernel, kernel-team, linux-scsi, Jaegeuk Kim
  Cc: Martin K . Petersen, alim.akhtar, bvanassche, cang, avri.altman,
	stanley.chu

On Thu, 7 Jan 2021 10:53:14 -0800, Jaegeuk Kim wrote:

> Change log from v4:
>  - remove RESERVE tag for tm command
>  - remove waiting IOs and let full reset handle it
>  - avoid verbose error log which causes cpu lock up

Applied to 5.11/scsi-fixes, thanks!

[1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns
      https://git.kernel.org/mkp/scsi/c/4ee7ee530bc2
[2/2] scsi: ufs: fix tm request correctly when non-fatal error happens
      https://git.kernel.org/mkp/scsi/c/eeb1b55b6e25

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-08  4:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
2021-01-08  0:53   ` Can Guo
2021-01-08  4:19 ` [PATCH v5 0/2] Two UFS fixes Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.