[PATCH v5 0/2] Two UFS fixes

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5 0/2] Two UFS fixes
@ 2021-01-07 18:53 Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jaegeuk Kim @ 2021-01-07 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, kernel-team
  Cc: cang, alim.akhtar, avri.altman, bvanassche, martin.petersen, stanley.chu

Change log from v4:
 - remove RESERVE tag for tm command
 - remove waiting IOs and let full reset handle it
 - avoid verbose error log which causes cpu lock up



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns
  2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
@ 2021-01-07 18:53 ` Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
  2021-01-08  4:19 ` [PATCH v5 0/2] Two UFS fixes Martin K. Petersen
  2 siblings, 0 replies; 5+ messages in thread
From: Jaegeuk Kim @ 2021-01-07 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, kernel-team
  Cc: cang, alim.akhtar, avri.altman, bvanassche, martin.petersen,
	stanley.chu, Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
     ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery flows
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..e6e7bdf99cd7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
 	if (ret)
 		dev_err(hba->dev, "%s: link recovery failed, err %d",
 			__func__, ret);
+	else
+		ufshcd_clear_ua_wluns(hba);
 
 	return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
 	ufshcd_scsi_unblock_requests(hba);
 	ufshcd_err_handling_unprepare(hba);
 	up(&hba->eh_sem);
+
+	if (!err && needs_reset)
+		ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
 	ufshcd_set_clk_freq(hba, true);
 
 	err = ufshcd_hba_enable(hba);
-	if (err)
-		goto out;
 
 	/* Establish the link again and restore the device */
-	err = ufshcd_probe_hba(hba, false);
 	if (!err)
-		ufshcd_clear_ua_wluns(hba);
-out:
+		err = ufshcd_probe_hba(hba, false);
+
 	if (err)
 		dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
 	ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -7718,6 +7720,8 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
 	if (ret)
 		goto out;
 
+	ufshcd_clear_ua_wluns(hba);
+
 	/* Initialize devfreq after UFS device is detected */
 	if (ufshcd_is_clkscaling_supported(hba)) {
 		memcpy(&hba->clk_scaling.saved_pwr_info.info,
@@ -7919,8 +7923,6 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
 		pm_runtime_put_sync(hba->dev);
 		ufshcd_exit_clk_scaling(hba);
 		ufshcd_hba_exit(hba);
-	} else {
-		ufshcd_clear_ua_wluns(hba);
 	}
 }
 
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 		ufshcd_resume_clkscaling(hba);
 	hba->clk_gating.is_suspended = false;
 	hba->dev_info.b_rpm_dev_flush_capable = false;
+	ufshcd_clear_ua_wluns(hba);
 	ufshcd_release(hba);
 out:
 	if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
 	}
 
+	ufshcd_clear_ua_wluns(hba);
+
 	/* Schedule clock gating in case of no access to UFS device yet */
 	ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens
  2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
@ 2021-01-07 18:53 ` Jaegeuk Kim
  2021-01-08  0:53   ` Can Guo
  2021-01-08  4:19 ` [PATCH v5 0/2] Two UFS fixes Martin K. Petersen
  2 siblings, 1 reply; 5+ messages in thread
From: Jaegeuk Kim @ 2021-01-07 18:53 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, kernel-team
  Cc: cang, alim.akhtar, avri.altman, bvanassche, martin.petersen,
	stanley.chu, Jaegeuk Kim, Jaegeuk Kim

From: Jaegeuk Kim <jaegeuk@google.com>

When non-fatal error like line-reset happens, ufshcd_err_handler() starts to
abort tasks by ufshcd_try_to_abort_task(). When it tries to issue tm request,
we've hit two warnings.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

After fixing the above warnings, I've hit another tm_cmd timeout, which may be
caused by unstable controller state.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Then, ufshcd_err_handler() enters full reset, and I hit kernel stuck. It turned
out ufshcd_print_trs() printed too many messages in console which requires CPU
locks. Likewise hba->silence_err_logs, we need to avoid too verbose messages.
Actually it came from ufshcd_transfer_rsp_status() when requeuing commands back.
Indeed, this is actually not an error case, so let's fix it.

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e6e7bdf99cd7..2a715f13fe1d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 		break;
 	} /* end of switch */
 
-	if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
+	if ((host_byte(result) != DID_OK) &&
+	    (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
 		ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
 	return result;
 }
@@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 		intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
 	}
 
-	if (enabled_intr_status && retval == IRQ_NONE) {
-		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-					__func__, intr_status);
+	if (enabled_intr_status && retval == IRQ_NONE &&
+				!ufshcd_eh_in_progress(hba)) {
+		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+					__func__,
+					intr_status,
+					hba->ufs_stats.last_intr_status,
+					enabled_intr_status);
 		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
 	}
 
@@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	 * Even though we use wait_event() which sleeps indefinitely,
 	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 	 */
-	req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
 	req->end_io_data = &wait;
 	free_slot = req->tag;
 	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens
  2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
@ 2021-01-08  0:53   ` Can Guo
  0 siblings, 0 replies; 5+ messages in thread
From: Can Guo @ 2021-01-08  0:53 UTC (permalink / raw)
  To: Jaegeuk Kim
  Cc: linux-kernel, linux-scsi, kernel-team, alim.akhtar, avri.altman,
	bvanassche, martin.petersen, stanley.chu, Jaegeuk Kim

On 2021-01-08 02:53, Jaegeuk Kim wrote:
> From: Jaegeuk Kim <jaegeuk@google.com>
> 
> When non-fatal error like line-reset happens, ufshcd_err_handler() 
> starts to
> abort tasks by ufshcd_try_to_abort_task(). When it tries to issue tm 
> request,
> we've hit two warnings.
> 
> WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 
> blk_get_request+0x68/0x70
> WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 
> blk_mq_get_tag+0x438/0x46c
> 
> After fixing the above warnings, I've hit another tm_cmd timeout, which 
> may be
> caused by unstable controller state.
> 
> __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> 
> Then, ufshcd_err_handler() enters full reset, and I hit kernel stuck. 
> It turned
> out ufshcd_print_trs() printed too many messages in console which 
> requires CPU
> locks. Likewise hba->silence_err_logs, we need to avoid too verbose 
> messages.
> Actually it came from ufshcd_transfer_rsp_status() when requeuing 
> commands back.
> Indeed, this is actually not an error case, so let's fix it.
> 
> Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
> allocate and free TMFs")
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

It is really good to find out the root cause! Thanks for the fix.

Reviewed-by: Can Guo <cang@codeaurora.org>

> ---
>  drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index e6e7bdf99cd7..2a715f13fe1d 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba,
> struct ufshcd_lrb *lrbp)
>  		break;
>  	} /* end of switch */
> 
> -	if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
> +	if ((host_byte(result) != DID_OK) &&
> +	    (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
>  		ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
>  	return result;
>  }
> @@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void 
> *__hba)
>  		intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
>  	}
> 
> -	if (enabled_intr_status && retval == IRQ_NONE) {
> -		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> -					__func__, intr_status);
> +	if (enabled_intr_status && retval == IRQ_NONE &&
> +				!ufshcd_eh_in_progress(hba)) {
> +		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 
> 0x%08x)\n",
> +					__func__,
> +					intr_status,
> +					hba->ufs_stats.last_intr_status,
> +					enabled_intr_status);
>  		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
>  	}
> 
> @@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba 
> *hba,
>  	 * Even though we use wait_event() which sleeps indefinitely,
>  	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
>  	 */
> -	req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> +	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
>  	req->end_io_data = &wait;
>  	free_slot = req->tag;
>  	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v5 0/2] Two UFS fixes
  2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
  2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
@ 2021-01-08  4:19 ` Martin K. Petersen
  2 siblings, 0 replies; 5+ messages in thread
From: Martin K. Petersen @ 2021-01-08  4:19 UTC (permalink / raw)
  To: linux-kernel, kernel-team, linux-scsi, Jaegeuk Kim
  Cc: Martin K . Petersen, alim.akhtar, bvanassche, cang, avri.altman,
	stanley.chu

On Thu, 7 Jan 2021 10:53:14 -0800, Jaegeuk Kim wrote:

> Change log from v4:
>  - remove RESERVE tag for tm command
>  - remove waiting IOs and let full reset handle it
>  - avoid verbose error log which causes cpu lock up

Applied to 5.11/scsi-fixes, thanks!

[1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns
      https://git.kernel.org/mkp/scsi/c/4ee7ee530bc2
[2/2] scsi: ufs: fix tm request correctly when non-fatal error happens
      https://git.kernel.org/mkp/scsi/c/eeb1b55b6e25

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-08  4:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-07 18:53 [PATCH v5 0/2] Two UFS fixes Jaegeuk Kim
2021-01-07 18:53 ` [PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns Jaegeuk Kim
2021-01-07 18:53 ` [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens Jaegeuk Kim
2021-01-08  0:53   ` Can Guo
2021-01-08  4:19 ` [PATCH v5 0/2] Two UFS fixes Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).