* [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR
@ 2021-05-24  8:36 Can Guo
  2021-05-24  8:36 ` [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler Can Guo
                   ` (3 more replies)
  0 siblings, 4 replies; 68+ messages in thread
From: Can Guo @ 2021-05-24  8:36 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, cang

By optimizing host lock usage on the TR (transfer request) send/compl paths
and utilizing the UTRLCNR (UTP Transfer Request List Completion Notification
Register), we can get considerable gains in both random read and random write
performance.
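As background, a minimal sketch of the completion-path idea behind patch 3
(illustrative only, not taken from the series; the helper names, the
REG_UTRL_COMPLETION_NOTIFICATION macro and the write-one-to-clear
acknowledgement are assumptions inferred from the register's name, not from
the actual patch):

	/*
	 * Old scheme: completed slots are derived by comparing the doorbell
	 * register with the software-tracked outstanding bits, so both must
	 * be sampled consistently, which is why the host lock gets involved.
	 */
	static unsigned long ufshcd_completed_from_doorbell(struct ufs_hba *hba)
	{
		u32 tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);

		return tr_doorbell ^ hba->outstanding_reqs;
	}

	/*
	 * New scheme: the controller latches completed slots in UTRLCNR, so
	 * the handler can read and acknowledge them directly, without
	 * consulting outstanding_reqs under the host lock.
	 */
	static unsigned long ufshcd_completed_from_utrlcnr(struct ufs_hba *hba)
	{
		u32 cnr = ufshcd_readl(hba, REG_UTRL_COMPLETION_NOTIFICATION);

		if (cnr)
			ufshcd_writel(hba, cnr, REG_UTRL_COMPLETION_NOTIFICATION);

		return cnr;
	}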

Can Guo (3):
  scsi: ufs: Remove a redundant command completion logic in error
    handler
  scsi: ufs: Optimize host lock on transfer requests send/compl paths
  scsi: ufs: Utilize Transfer Request List Completion Notification
    Register

 drivers/scsi/ufs/ufshcd.c | 309 +++++++++++++++++++++++-----------------------
 drivers/scsi/ufs/ufshcd.h |   7 +-
 drivers/scsi/ufs/ufshci.h |   1 +
 3 files changed, 162 insertions(+), 155 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



* [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler
  2021-05-24  8:36 [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Can Guo
@ 2021-05-24  8:36 ` Can Guo
  2021-05-24 16:43   ` Bart Van Assche
                     ` (2 more replies)
  2021-05-24  8:36   ` Can Guo
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 68+ messages in thread
From: Can Guo @ 2021-05-24  8:36 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, cang
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Jaegeuk Kim,
	open list

ufshcd_host_reset_and_restore() already completes all pending requests
before it starts re-probing, so there is no need to complete the command on
the highest bit of tr_doorbell in advance.

Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index d543864..c4b37d2 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6123,19 +6123,6 @@ static void ufshcd_err_handler(struct work_struct *work)
 do_reset:
 	/* Fatal errors need reset */
 	if (needs_reset) {
-		unsigned long max_doorbells = (1UL << hba->nutrs) - 1;
-
-		/*
-		 * ufshcd_reset_and_restore() does the link reinitialization
-		 * which will need atleast one empty doorbell slot to send the
-		 * device management commands (NOP and query commands).
-		 * If there is no slot empty at this moment then free up last
-		 * slot forcefully.
-		 */
-		if (hba->outstanding_reqs == max_doorbells)
-			__ufshcd_transfer_req_compl(hba,
-						    (1UL << (hba->nutrs - 1)));
-
 		hba->force_reset = false;
 		spin_unlock_irqrestore(hba->host->host_lock, flags);
 		err = ufshcd_reset_and_restore(hba);
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



* [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24  8:36 [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Can Guo
  2021-05-24  8:36 ` [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler Can Guo
@ 2021-05-24  8:36   ` Can Guo
  2021-05-24  8:36   ` Can Guo
  2021-06-16  3:48 ` [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Martin K. Petersen
  3 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-05-24  8:36 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, cang
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

The current UFS IRQ handler is entirely wrapped by the host lock, and because
ufshcd_send_command() is also protected by the host lock, when the IRQ handler
fires, not only can the CPU running the IRQ handler not send new requests,
neither can the other CPUs. Move the host lock wrapping the IRQ handler into
the specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
reduce occupation of the host lock in ufshcd_transfer_req_compl(), the host
lock is no longer required to call __ufshcd_transfer_req_compl(). Per testing,
this optimization brings considerable gains to random read/write performance.
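Reduced to a minimal sketch, the shape of the change is roughly as follows
(illustrative only; the _before/_after helpers are not real functions, and the
retry loop and statistics handling of the real ufshcd_intr() are omitted --
see the diff below for the actual code):

	/* Before: one big critical section around the whole ISR. */
	static irqreturn_t ufshcd_intr_before(int irq, void *__hba)
	{
		struct ufs_hba *hba = __hba;
		irqreturn_t ret;

		spin_lock(hba->host->host_lock);
		ret = ufshcd_sl_intr(hba, ufshcd_readl(hba, REG_INTERRUPT_STATUS));
		spin_unlock(hba->host->host_lock);

		return ret;
	}

	/*
	 * After: the ISR itself runs unlocked; each branch in ufshcd_sl_intr()
	 * (UIC completion, error checking, TM and transfer completion) takes
	 * the host lock only around the shared state it touches, and
	 * __ufshcd_transfer_req_compl() uses test_and_clear_bit() on
	 * outstanding_reqs instead of relying on the lock.
	 */
	static irqreturn_t ufshcd_intr_after(int irq, void *__hba)
	{
		struct ufs_hba *hba = __hba;

		return ufshcd_sl_intr(hba, ufshcd_readl(hba, REG_INTERRUPT_STATUS));
	}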

Cc: Stanley Chu <stanley.chu@mediatek.com>
Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 258 ++++++++++++++++++++++------------------------
 drivers/scsi/ufs/ufshcd.h |   2 -
 2 files changed, 126 insertions(+), 134 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c4b37d2..b9b5e61 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -758,7 +758,7 @@ static inline void ufshcd_utmrl_clear(struct ufs_hba *hba, u32 pos)
  */
 static inline void ufshcd_outstanding_req_clear(struct ufs_hba *hba, int tag)
 {
-	__clear_bit(tag, &hba->outstanding_reqs);
+	clear_bit(tag, &hba->outstanding_reqs);
 }
 
 /**
@@ -1984,15 +1984,19 @@ static void ufshcd_clk_scaling_start_busy(struct ufs_hba *hba)
 {
 	bool queue_resume_work = false;
 	ktime_t curr_t = ktime_get();
+	unsigned long flags;
 
 	if (!ufshcd_is_clkscaling_supported(hba))
 		return;
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (!hba->clk_scaling.active_reqs++)
 		queue_resume_work = true;
 
-	if (!hba->clk_scaling.is_enabled || hba->pm_op_in_progress)
+	if (!hba->clk_scaling.is_enabled || hba->pm_op_in_progress) {
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
 		return;
+	}
 
 	if (queue_resume_work)
 		queue_work(hba->clk_scaling.workq,
@@ -2008,21 +2012,26 @@ static void ufshcd_clk_scaling_start_busy(struct ufs_hba *hba)
 		hba->clk_scaling.busy_start_t = curr_t;
 		hba->clk_scaling.is_busy_started = true;
 	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 }
 
 static void ufshcd_clk_scaling_update_busy(struct ufs_hba *hba)
 {
 	struct ufs_clk_scaling *scaling = &hba->clk_scaling;
+	unsigned long flags;
 
 	if (!ufshcd_is_clkscaling_supported(hba))
 		return;
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	hba->clk_scaling.active_reqs--;
 	if (!hba->outstanding_reqs && scaling->is_busy_started) {
 		scaling->tot_busy_t += ktime_to_us(ktime_sub(ktime_get(),
 					scaling->busy_start_t));
 		scaling->busy_start_t = 0;
 		scaling->is_busy_started = false;
 	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 }
 
 static inline int ufshcd_monitor_opcode2dir(u8 opcode)
@@ -2048,15 +2057,20 @@ static inline bool ufshcd_should_inform_monitor(struct ufs_hba *hba,
 static void ufshcd_start_monitor(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 {
 	int dir = ufshcd_monitor_opcode2dir(*lrbp->cmd->cmnd);
+	unsigned long flags;
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (dir >= 0 && hba->monitor.nr_queued[dir]++ == 0)
 		hba->monitor.busy_start_ts[dir] = ktime_get();
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 }
 
 static void ufshcd_update_monitor(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 {
 	int dir = ufshcd_monitor_opcode2dir(*lrbp->cmd->cmnd);
+	unsigned long flags;
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (dir >= 0 && hba->monitor.nr_queued[dir] > 0) {
 		struct request *req = lrbp->cmd->request;
 		struct ufs_hba_monitor *m = &hba->monitor;
@@ -2080,6 +2094,7 @@ static void ufshcd_update_monitor(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 		/* Push forward the busy start of monitor */
 		m->busy_start_ts[dir] = now;
 	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 }
 
 /**
@@ -2091,16 +2106,19 @@ static inline
 void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 {
 	struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
+	unsigned long flags;
 
 	lrbp->issue_time_stamp = ktime_get();
 	lrbp->compl_time_stamp = ktime_set(0, 0);
 	ufshcd_vops_setup_xfer_req(hba, task_tag, (lrbp->cmd ? true : false));
 	ufshcd_add_command_trace(hba, task_tag, UFS_CMD_SEND);
 	ufshcd_clk_scaling_start_busy(hba);
-	__set_bit(task_tag, &hba->outstanding_reqs);
 	if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
 		ufshcd_start_monitor(hba, lrbp);
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	set_bit(task_tag, &hba->outstanding_reqs);
 	ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	/* Make sure that doorbell is committed immediately */
 	wmb();
 }
@@ -2671,7 +2689,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 {
 	struct ufshcd_lrb *lrbp;
 	struct ufs_hba *hba;
-	unsigned long flags;
 	int tag;
 	int err = 0;
 
@@ -2688,6 +2705,43 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	if (!down_read_trylock(&hba->clk_scaling_lock))
 		return SCSI_MLQUEUE_HOST_BUSY;
 
+	switch (hba->ufshcd_state) {
+	case UFSHCD_STATE_OPERATIONAL:
+	case UFSHCD_STATE_EH_SCHEDULED_NON_FATAL:
+		break;
+	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
+		/*
+		 * pm_runtime_get_sync() is used at error handling preparation
+		 * stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
+		 * PM ops, it can never be finished if we let SCSI layer keep
+		 * retrying it, which gets err handler stuck forever. Neither
+		 * can we let the scsi cmd pass through, because UFS is in bad
+		 * state, the scsi cmd may eventually time out, which will get
+		 * err handler blocked for too long. So, just fail the scsi cmd
+		 * sent from PM ops, err handler can recover PM error anyways.
+		 */
+		if (hba->pm_op_in_progress) {
+			hba->force_reset = true;
+			set_host_byte(cmd, DID_BAD_TARGET);
+			cmd->scsi_done(cmd);
+			goto out;
+		}
+		fallthrough;
+	case UFSHCD_STATE_RESET:
+		err = SCSI_MLQUEUE_HOST_BUSY;
+		goto out;
+	case UFSHCD_STATE_ERROR:
+		set_host_byte(cmd, DID_ERROR);
+		cmd->scsi_done(cmd);
+		goto out;
+	default:
+		dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
+				__func__, hba->ufshcd_state);
+		set_host_byte(cmd, DID_BAD_TARGET);
+		cmd->scsi_done(cmd);
+		goto out;
+	}
+
 	hba->req_abort_count = 0;
 
 	err = ufshcd_hold(hba, true);
@@ -2698,8 +2752,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	WARN_ON(ufshcd_is_clkgating_allowed(hba) &&
 		(hba->clk_gating.state != CLKS_ON));
 
-	lrbp = &hba->lrb[tag];
-	if (unlikely(lrbp->in_use)) {
+	if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
 		if (hba->pm_op_in_progress)
 			set_host_byte(cmd, DID_BAD_TARGET);
 		else
@@ -2708,6 +2761,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 		goto out;
 	}
 
+	lrbp = &hba->lrb[tag];
 	WARN_ON(lrbp->cmd);
 	lrbp->cmd = cmd;
 	lrbp->sense_bufflen = UFS_SENSE_SIZE;
@@ -2731,51 +2785,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	/* Make sure descriptors are ready before ringing the doorbell */
 	wmb();
 
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	switch (hba->ufshcd_state) {
-	case UFSHCD_STATE_OPERATIONAL:
-	case UFSHCD_STATE_EH_SCHEDULED_NON_FATAL:
-		break;
-	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
-		/*
-		 * pm_runtime_get_sync() is used at error handling preparation
-		 * stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
-		 * PM ops, it can never be finished if we let SCSI layer keep
-		 * retrying it, which gets err handler stuck forever. Neither
-		 * can we let the scsi cmd pass through, because UFS is in bad
-		 * state, the scsi cmd may eventually time out, which will get
-		 * err handler blocked for too long. So, just fail the scsi cmd
-		 * sent from PM ops, err handler can recover PM error anyways.
-		 */
-		if (hba->pm_op_in_progress) {
-			hba->force_reset = true;
-			set_host_byte(cmd, DID_BAD_TARGET);
-			goto out_compl_cmd;
-		}
-		fallthrough;
-	case UFSHCD_STATE_RESET:
-		err = SCSI_MLQUEUE_HOST_BUSY;
-		goto out_compl_cmd;
-	case UFSHCD_STATE_ERROR:
-		set_host_byte(cmd, DID_ERROR);
-		goto out_compl_cmd;
-	default:
-		dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
-				__func__, hba->ufshcd_state);
-		set_host_byte(cmd, DID_BAD_TARGET);
-		goto out_compl_cmd;
-	}
 	ufshcd_send_command(hba, tag);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	goto out;
-
-out_compl_cmd:
-	scsi_dma_unmap(lrbp->cmd);
-	lrbp->cmd = NULL;
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_release(hba);
-	if (!err)
-		cmd->scsi_done(cmd);
 out:
 	up_read(&hba->clk_scaling_lock);
 	return err;
@@ -2930,7 +2940,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	int err;
 	int tag;
 	struct completion wait;
-	unsigned long flags;
 
 	down_read(&hba->clk_scaling_lock);
 
@@ -2947,13 +2956,13 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	tag = req->tag;
 	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
 
-	init_completion(&wait);
-	lrbp = &hba->lrb[tag];
-	if (unlikely(lrbp->in_use)) {
+	if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
 		err = -EBUSY;
 		goto out;
 	}
 
+	init_completion(&wait);
+	lrbp = &hba->lrb[tag];
 	WARN_ON(lrbp->cmd);
 	err = ufshcd_compose_dev_cmd(hba, lrbp, cmd_type, tag);
 	if (unlikely(err))
@@ -2964,12 +2973,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	ufshcd_add_query_upiu_trace(hba, UFS_QUERY_SEND, lrbp->ucd_req_ptr);
 	/* Make sure descriptors are ready before ringing the doorbell */
 	wmb();
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	ufshcd_send_command(hba, tag);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
+	ufshcd_send_command(hba, tag);
 	err = ufshcd_wait_for_dev_cmd(hba, lrbp, timeout);
-
 out:
 	ufshcd_add_query_upiu_trace(hba, err ? UFS_QUERY_ERR : UFS_QUERY_COMP,
 				    (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
@@ -5147,6 +5153,24 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 	return result;
 }
 
+static bool ufshcd_is_auto_hibern8_error(struct ufs_hba *hba,
+					 u32 intr_mask)
+{
+	if (!ufshcd_is_auto_hibern8_supported(hba) ||
+	    !ufshcd_is_auto_hibern8_enabled(hba))
+		return false;
+
+	if (!(intr_mask & UFSHCD_UIC_HIBERN8_MASK))
+		return false;
+
+	if (hba->active_uic_cmd &&
+	    (hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_ENTER ||
+	    hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_EXIT))
+		return false;
+
+	return true;
+}
+
 /**
  * ufshcd_uic_cmd_compl - handle completion of uic command
  * @hba: per adapter instance
@@ -5160,6 +5184,10 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 {
 	irqreturn_t retval = IRQ_NONE;
 
+	spin_lock(hba->host->host_lock);
+	if (ufshcd_is_auto_hibern8_error(hba, intr_status))
+		hba->errors |= (UFSHCD_UIC_HIBERN8_MASK & intr_status);
+
 	if ((intr_status & UIC_COMMAND_COMPL) && hba->active_uic_cmd) {
 		hba->active_uic_cmd->argument2 |=
 			ufshcd_get_uic_cmd_result(hba);
@@ -5180,6 +5208,7 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 	if (retval == IRQ_HANDLED)
 		ufshcd_add_uic_command_trace(hba, hba->active_uic_cmd,
 					     UFS_CMD_COMP);
+	spin_unlock(hba->host->host_lock);
 	return retval;
 }
 
@@ -5198,8 +5227,9 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 	bool update_scaling = false;
 
 	for_each_set_bit(index, &completed_reqs, hba->nutrs) {
+		if (!test_and_clear_bit(index, &hba->outstanding_reqs))
+			continue;
 		lrbp = &hba->lrb[index];
-		lrbp->in_use = false;
 		lrbp->compl_time_stamp = ktime_get();
 		cmd = lrbp->cmd;
 		if (cmd) {
@@ -5213,7 +5243,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 			lrbp->cmd = NULL;
 			/* Do not touch lrbp after scsi done */
 			cmd->scsi_done(cmd);
-			__ufshcd_release(hba);
+			ufshcd_release(hba);
 			update_scaling = true;
 		} else if (lrbp->command_type == UTP_CMD_TYPE_DEV_MANAGE ||
 			lrbp->command_type == UTP_CMD_TYPE_UFS_STORAGE) {
@@ -5224,14 +5254,9 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 				update_scaling = true;
 			}
 		}
-		if (ufshcd_is_clkscaling_supported(hba) && update_scaling)
-			hba->clk_scaling.active_reqs--;
+		if (update_scaling)
+			ufshcd_clk_scaling_update_busy(hba);
 	}
-
-	/* clear corresponding bits of completed commands */
-	hba->outstanding_reqs ^= completed_reqs;
-
-	ufshcd_clk_scaling_update_busy(hba);
 }
 
 /**
@@ -5244,7 +5269,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
  */
 static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
 {
-	unsigned long completed_reqs;
+	unsigned long completed_reqs, flags;
 	u32 tr_doorbell;
 
 	/* Resetting interrupt aggregation counters first and reading the
@@ -5258,8 +5283,10 @@ static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
 	    !(hba->quirks & UFSHCI_QUIRK_SKIP_RESET_INTR_AGGR))
 		ufshcd_reset_intr_aggr(hba);
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
 	tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
 	completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
 	if (completed_reqs) {
 		__ufshcd_transfer_req_compl(hba, completed_reqs);
@@ -5996,13 +6023,11 @@ static void ufshcd_err_handler(struct work_struct *work)
 	ufshcd_set_eh_in_progress(hba);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	ufshcd_err_handling_prepare(hba);
+	/* Complete requests that have door-bell cleared by h/w */
+	ufshcd_complete_requests(hba);
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (hba->ufshcd_state != UFSHCD_STATE_ERROR)
 		hba->ufshcd_state = UFSHCD_STATE_RESET;
-
-	/* Complete requests that have door-bell cleared by h/w */
-	ufshcd_complete_requests(hba);
-
 	/*
 	 * A full reset and restore might have happened after preparation
 	 * is finished, double check whether we should stop.
@@ -6085,12 +6110,11 @@ static void ufshcd_err_handler(struct work_struct *work)
 	}
 
 lock_skip_pending_xfer_clear:
-	spin_lock_irqsave(hba->host->host_lock, flags);
-
 	/* Complete the requests that are cleared by s/w */
 	ufshcd_complete_requests(hba);
-	hba->silence_err_logs = false;
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	hba->silence_err_logs = false;
 	if (err_xfer || err_tm) {
 		needs_reset = true;
 		goto do_reset;
@@ -6240,37 +6264,23 @@ static irqreturn_t ufshcd_update_uic_error(struct ufs_hba *hba)
 	return retval;
 }
 
-static bool ufshcd_is_auto_hibern8_error(struct ufs_hba *hba,
-					 u32 intr_mask)
-{
-	if (!ufshcd_is_auto_hibern8_supported(hba) ||
-	    !ufshcd_is_auto_hibern8_enabled(hba))
-		return false;
-
-	if (!(intr_mask & UFSHCD_UIC_HIBERN8_MASK))
-		return false;
-
-	if (hba->active_uic_cmd &&
-	    (hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_ENTER ||
-	    hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_EXIT))
-		return false;
-
-	return true;
-}
-
 /**
  * ufshcd_check_errors - Check for errors that need s/w attention
  * @hba: per-adapter instance
+ * @intr_status: interrupt status generated by the controller
  *
  * Returns
  *  IRQ_HANDLED - If interrupt is valid
  *  IRQ_NONE    - If invalid interrupt
  */
-static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba)
+static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 {
 	bool queue_eh_work = false;
 	irqreturn_t retval = IRQ_NONE;
 
+	spin_lock(hba->host->host_lock);
+	hba->errors |= UFSHCD_ERROR_MASK & intr_status;
+
 	if (hba->errors & INT_FATAL_ERRORS) {
 		ufshcd_update_evt_hist(hba, UFS_EVT_FATAL_ERR,
 				       hba->errors);
@@ -6325,6 +6335,9 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba)
 	 * itself without s/w intervention or errors that will be
 	 * handled by the SCSI core layer.
 	 */
+	hba->errors = 0;
+	hba->uic_error = 0;
+	spin_unlock(hba->host->host_lock);
 	return retval;
 }
 
@@ -6359,13 +6372,17 @@ static bool ufshcd_compl_tm(struct request *req, void *priv, bool reserved)
  */
 static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 {
+	unsigned long flags;
 	struct request_queue *q = hba->tmf_queue;
 	struct ctm_info ci = {
 		.hba	 = hba,
-		.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
 	};
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	ci.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
 	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
 	return ci.ncpl ? IRQ_HANDLED : IRQ_NONE;
 }
 
@@ -6382,17 +6399,12 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
 {
 	irqreturn_t retval = IRQ_NONE;
 
-	hba->errors = UFSHCD_ERROR_MASK & intr_status;
-
-	if (ufshcd_is_auto_hibern8_error(hba, intr_status))
-		hba->errors |= (UFSHCD_UIC_HIBERN8_MASK & intr_status);
-
-	if (hba->errors)
-		retval |= ufshcd_check_errors(hba);
-
 	if (intr_status & UFSHCD_UIC_MASK)
 		retval |= ufshcd_uic_cmd_compl(hba, intr_status);
 
+	if (intr_status & UFSHCD_ERROR_MASK || hba->errors)
+		retval |= ufshcd_check_errors(hba, intr_status);
+
 	if (intr_status & UTP_TASK_REQ_COMPL)
 		retval |= ufshcd_tmc_handler(hba);
 
@@ -6418,7 +6430,6 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 	struct ufs_hba *hba = __hba;
 	int retries = hba->nutrs;
 
-	spin_lock(hba->host->host_lock);
 	intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
 	hba->ufs_stats.last_intr_status = intr_status;
 	hba->ufs_stats.last_intr_ts = ktime_get();
@@ -6449,7 +6460,6 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
 	}
 
-	spin_unlock(hba->host->host_lock);
 	return retval;
 }
 
@@ -6626,7 +6636,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	int err = 0;
 	int tag;
 	struct completion wait;
-	unsigned long flags;
 	u8 upiu_flags;
 
 	down_read(&hba->clk_scaling_lock);
@@ -6639,13 +6648,13 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	tag = req->tag;
 	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
 
-	init_completion(&wait);
-	lrbp = &hba->lrb[tag];
-	if (unlikely(lrbp->in_use)) {
+	if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
 		err = -EBUSY;
 		goto out;
 	}
 
+	init_completion(&wait);
+	lrbp = &hba->lrb[tag];
 	WARN_ON(lrbp->cmd);
 	lrbp->cmd = NULL;
 	lrbp->sense_bufflen = 0;
@@ -6683,10 +6692,8 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 
 	/* Make sure descriptors are ready before ringing the doorbell */
 	wmb();
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	ufshcd_send_command(hba, tag);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
+	ufshcd_send_command(hba, tag);
 	/*
 	 * ignore the returning value here - ufshcd_check_query_response is
 	 * bound to fail since dev_cmd.query and dev_cmd.type were left empty.
@@ -6805,7 +6812,6 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
 	u32 pos;
 	int err;
 	u8 resp = 0xF, lun;
-	unsigned long flags;
 
 	host = cmd->device->host;
 	hba = shost_priv(host);
@@ -6824,11 +6830,9 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
 			err = ufshcd_clear_cmd(hba, pos);
 			if (err)
 				break;
+			__ufshcd_transfer_req_compl(hba, pos);
 		}
 	}
-	spin_lock_irqsave(host->host_lock, flags);
-	ufshcd_transfer_req_compl(hba);
-	spin_unlock_irqrestore(host->host_lock, flags);
 
 out:
 	hba->req_abort_count = 0;
@@ -7005,20 +7009,16 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	 * will fail, due to spec violation, scsi err handling next step
 	 * will be to send LU reset which, again, is a spec violation.
 	 * To avoid these unnecessary/illegal steps, first we clean up
-	 * the lrb taken by this cmd and mark the lrb as in_use, then
-	 * queue the eh_work and bail.
+	 * the lrb taken by this cmd and re-set it in outstanding_reqs,
+	 * then queue the eh_work and bail.
 	 */
 	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN) {
 		ufshcd_update_evt_hist(hba, UFS_EVT_ABORT, lrbp->lun);
+		__ufshcd_transfer_req_compl(hba, (1UL << tag));
+		set_bit(tag, &hba->outstanding_reqs);
 		spin_lock_irqsave(host->host_lock, flags);
-		if (lrbp->cmd) {
-			__ufshcd_transfer_req_compl(hba, (1UL << tag));
-			__set_bit(tag, &hba->outstanding_reqs);
-			lrbp->in_use = true;
-			hba->force_reset = true;
-			ufshcd_schedule_eh_work(hba);
-		}
-
+		hba->force_reset = true;
+		ufshcd_schedule_eh_work(hba);
 		spin_unlock_irqrestore(host->host_lock, flags);
 		goto out;
 	}
@@ -7031,9 +7031,7 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 
 	if (!err) {
 cleanup:
-		spin_lock_irqsave(host->host_lock, flags);
 		__ufshcd_transfer_req_compl(hba, (1UL << tag));
-		spin_unlock_irqrestore(host->host_lock, flags);
 out:
 		err = SUCCESS;
 	} else {
@@ -7063,19 +7061,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
 {
 	int err;
-	unsigned long flags;
 
 	/*
 	 * Stop the host controller and complete the requests
 	 * cleared by h/w
 	 */
 	ufshcd_hba_stop(hba);
-
-	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->silence_err_logs = true;
 	ufshcd_complete_requests(hba);
 	hba->silence_err_logs = false;
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
 	/* scale up clocks to max frequency before full reinitialization */
 	ufshcd_set_clk_freq(hba, true);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 0f0e62b..a70daf7 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -193,7 +193,6 @@ struct ufs_pm_lvl_states {
  * @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
  * @data_unit_num: the data unit number for the first block for inline crypto
  * @req_abort_skip: skip request abort task flag
- * @in_use: indicates that this lrb is still in use
  */
 struct ufshcd_lrb {
 	struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -223,7 +222,6 @@ struct ufshcd_lrb {
 #endif
 
 	bool req_abort_skip;
-	bool in_use;
 };
 
 /**
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


 				    (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
@@ -5147,6 +5153,24 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 	return result;
 }
 
+static bool ufshcd_is_auto_hibern8_error(struct ufs_hba *hba,
+					 u32 intr_mask)
+{
+	if (!ufshcd_is_auto_hibern8_supported(hba) ||
+	    !ufshcd_is_auto_hibern8_enabled(hba))
+		return false;
+
+	if (!(intr_mask & UFSHCD_UIC_HIBERN8_MASK))
+		return false;
+
+	if (hba->active_uic_cmd &&
+	    (hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_ENTER ||
+	    hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_EXIT))
+		return false;
+
+	return true;
+}
+
 /**
  * ufshcd_uic_cmd_compl - handle completion of uic command
  * @hba: per adapter instance
@@ -5160,6 +5184,10 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 {
 	irqreturn_t retval = IRQ_NONE;
 
+	spin_lock(hba->host->host_lock);
+	if (ufshcd_is_auto_hibern8_error(hba, intr_status))
+		hba->errors |= (UFSHCD_UIC_HIBERN8_MASK & intr_status);
+
 	if ((intr_status & UIC_COMMAND_COMPL) && hba->active_uic_cmd) {
 		hba->active_uic_cmd->argument2 |=
 			ufshcd_get_uic_cmd_result(hba);
@@ -5180,6 +5208,7 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 	if (retval == IRQ_HANDLED)
 		ufshcd_add_uic_command_trace(hba, hba->active_uic_cmd,
 					     UFS_CMD_COMP);
+	spin_unlock(hba->host->host_lock);
 	return retval;
 }
 
@@ -5198,8 +5227,9 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 	bool update_scaling = false;
 
 	for_each_set_bit(index, &completed_reqs, hba->nutrs) {
+		if (!test_and_clear_bit(index, &hba->outstanding_reqs))
+			continue;
 		lrbp = &hba->lrb[index];
-		lrbp->in_use = false;
 		lrbp->compl_time_stamp = ktime_get();
 		cmd = lrbp->cmd;
 		if (cmd) {
@@ -5213,7 +5243,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 			lrbp->cmd = NULL;
 			/* Do not touch lrbp after scsi done */
 			cmd->scsi_done(cmd);
-			__ufshcd_release(hba);
+			ufshcd_release(hba);
 			update_scaling = true;
 		} else if (lrbp->command_type == UTP_CMD_TYPE_DEV_MANAGE ||
 			lrbp->command_type == UTP_CMD_TYPE_UFS_STORAGE) {
@@ -5224,14 +5254,9 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 				update_scaling = true;
 			}
 		}
-		if (ufshcd_is_clkscaling_supported(hba) && update_scaling)
-			hba->clk_scaling.active_reqs--;
+		if (update_scaling)
+			ufshcd_clk_scaling_update_busy(hba);
 	}
-
-	/* clear corresponding bits of completed commands */
-	hba->outstanding_reqs ^= completed_reqs;
-
-	ufshcd_clk_scaling_update_busy(hba);
 }
 
 /**
@@ -5244,7 +5269,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
  */
 static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
 {
-	unsigned long completed_reqs;
+	unsigned long completed_reqs, flags;
 	u32 tr_doorbell;
 
 	/* Resetting interrupt aggregation counters first and reading the
@@ -5258,8 +5283,10 @@ static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
 	    !(hba->quirks & UFSHCI_QUIRK_SKIP_RESET_INTR_AGGR))
 		ufshcd_reset_intr_aggr(hba);
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
 	tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
 	completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
 	if (completed_reqs) {
 		__ufshcd_transfer_req_compl(hba, completed_reqs);
@@ -5996,13 +6023,11 @@ static void ufshcd_err_handler(struct work_struct *work)
 	ufshcd_set_eh_in_progress(hba);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	ufshcd_err_handling_prepare(hba);
+	/* Complete requests that have door-bell cleared by h/w */
+	ufshcd_complete_requests(hba);
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (hba->ufshcd_state != UFSHCD_STATE_ERROR)
 		hba->ufshcd_state = UFSHCD_STATE_RESET;
-
-	/* Complete requests that have door-bell cleared by h/w */
-	ufshcd_complete_requests(hba);
-
 	/*
 	 * A full reset and restore might have happened after preparation
 	 * is finished, double check whether we should stop.
@@ -6085,12 +6110,11 @@ static void ufshcd_err_handler(struct work_struct *work)
 	}
 
 lock_skip_pending_xfer_clear:
-	spin_lock_irqsave(hba->host->host_lock, flags);
-
 	/* Complete the requests that are cleared by s/w */
 	ufshcd_complete_requests(hba);
-	hba->silence_err_logs = false;
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	hba->silence_err_logs = false;
 	if (err_xfer || err_tm) {
 		needs_reset = true;
 		goto do_reset;
@@ -6240,37 +6264,23 @@ static irqreturn_t ufshcd_update_uic_error(struct ufs_hba *hba)
 	return retval;
 }
 
-static bool ufshcd_is_auto_hibern8_error(struct ufs_hba *hba,
-					 u32 intr_mask)
-{
-	if (!ufshcd_is_auto_hibern8_supported(hba) ||
-	    !ufshcd_is_auto_hibern8_enabled(hba))
-		return false;
-
-	if (!(intr_mask & UFSHCD_UIC_HIBERN8_MASK))
-		return false;
-
-	if (hba->active_uic_cmd &&
-	    (hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_ENTER ||
-	    hba->active_uic_cmd->command == UIC_CMD_DME_HIBER_EXIT))
-		return false;
-
-	return true;
-}
-
 /**
  * ufshcd_check_errors - Check for errors that need s/w attention
  * @hba: per-adapter instance
+ * @intr_status: interrupt status generated by the controller
  *
  * Returns
  *  IRQ_HANDLED - If interrupt is valid
  *  IRQ_NONE    - If invalid interrupt
  */
-static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba)
+static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 {
 	bool queue_eh_work = false;
 	irqreturn_t retval = IRQ_NONE;
 
+	spin_lock(hba->host->host_lock);
+	hba->errors |= UFSHCD_ERROR_MASK & intr_status;
+
 	if (hba->errors & INT_FATAL_ERRORS) {
 		ufshcd_update_evt_hist(hba, UFS_EVT_FATAL_ERR,
 				       hba->errors);
@@ -6325,6 +6335,9 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba)
 	 * itself without s/w intervention or errors that will be
 	 * handled by the SCSI core layer.
 	 */
+	hba->errors = 0;
+	hba->uic_error = 0;
+	spin_unlock(hba->host->host_lock);
 	return retval;
 }
 
@@ -6359,13 +6372,17 @@ static bool ufshcd_compl_tm(struct request *req, void *priv, bool reserved)
  */
 static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 {
+	unsigned long flags;
 	struct request_queue *q = hba->tmf_queue;
 	struct ctm_info ci = {
 		.hba	 = hba,
-		.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
 	};
 
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	ci.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
 	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
 	return ci.ncpl ? IRQ_HANDLED : IRQ_NONE;
 }
 
@@ -6382,17 +6399,12 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
 {
 	irqreturn_t retval = IRQ_NONE;
 
-	hba->errors = UFSHCD_ERROR_MASK & intr_status;
-
-	if (ufshcd_is_auto_hibern8_error(hba, intr_status))
-		hba->errors |= (UFSHCD_UIC_HIBERN8_MASK & intr_status);
-
-	if (hba->errors)
-		retval |= ufshcd_check_errors(hba);
-
 	if (intr_status & UFSHCD_UIC_MASK)
 		retval |= ufshcd_uic_cmd_compl(hba, intr_status);
 
+	if (intr_status & UFSHCD_ERROR_MASK || hba->errors)
+		retval |= ufshcd_check_errors(hba, intr_status);
+
 	if (intr_status & UTP_TASK_REQ_COMPL)
 		retval |= ufshcd_tmc_handler(hba);
 
@@ -6418,7 +6430,6 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 	struct ufs_hba *hba = __hba;
 	int retries = hba->nutrs;
 
-	spin_lock(hba->host->host_lock);
 	intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
 	hba->ufs_stats.last_intr_status = intr_status;
 	hba->ufs_stats.last_intr_ts = ktime_get();
@@ -6449,7 +6460,6 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
 	}
 
-	spin_unlock(hba->host->host_lock);
 	return retval;
 }
 
@@ -6626,7 +6636,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	int err = 0;
 	int tag;
 	struct completion wait;
-	unsigned long flags;
 	u8 upiu_flags;
 
 	down_read(&hba->clk_scaling_lock);
@@ -6639,13 +6648,13 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	tag = req->tag;
 	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
 
-	init_completion(&wait);
-	lrbp = &hba->lrb[tag];
-	if (unlikely(lrbp->in_use)) {
+	if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
 		err = -EBUSY;
 		goto out;
 	}
 
+	init_completion(&wait);
+	lrbp = &hba->lrb[tag];
 	WARN_ON(lrbp->cmd);
 	lrbp->cmd = NULL;
 	lrbp->sense_bufflen = 0;
@@ -6683,10 +6692,8 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 
 	/* Make sure descriptors are ready before ringing the doorbell */
 	wmb();
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	ufshcd_send_command(hba, tag);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
+	ufshcd_send_command(hba, tag);
 	/*
 	 * ignore the returning value here - ufshcd_check_query_response is
 	 * bound to fail since dev_cmd.query and dev_cmd.type were left empty.
@@ -6805,7 +6812,6 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
 	u32 pos;
 	int err;
 	u8 resp = 0xF, lun;
-	unsigned long flags;
 
 	host = cmd->device->host;
 	hba = shost_priv(host);
@@ -6824,11 +6830,9 @@ static int ufshcd_eh_device_reset_handler(struct scsi_cmnd *cmd)
 			err = ufshcd_clear_cmd(hba, pos);
 			if (err)
 				break;
+			__ufshcd_transfer_req_compl(hba, pos);
 		}
 	}
-	spin_lock_irqsave(host->host_lock, flags);
-	ufshcd_transfer_req_compl(hba);
-	spin_unlock_irqrestore(host->host_lock, flags);
 
 out:
 	hba->req_abort_count = 0;
@@ -7005,20 +7009,16 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	 * will fail, due to spec violation, scsi err handling next step
 	 * will be to send LU reset which, again, is a spec violation.
 	 * To avoid these unnecessary/illegal steps, first we clean up
-	 * the lrb taken by this cmd and mark the lrb as in_use, then
-	 * queue the eh_work and bail.
+	 * the lrb taken by this cmd and re-set it in outstanding_reqs,
+	 * then queue the eh_work and bail.
 	 */
 	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN) {
 		ufshcd_update_evt_hist(hba, UFS_EVT_ABORT, lrbp->lun);
+		__ufshcd_transfer_req_compl(hba, (1UL << tag));
+		set_bit(tag, &hba->outstanding_reqs);
 		spin_lock_irqsave(host->host_lock, flags);
-		if (lrbp->cmd) {
-			__ufshcd_transfer_req_compl(hba, (1UL << tag));
-			__set_bit(tag, &hba->outstanding_reqs);
-			lrbp->in_use = true;
-			hba->force_reset = true;
-			ufshcd_schedule_eh_work(hba);
-		}
-
+		hba->force_reset = true;
+		ufshcd_schedule_eh_work(hba);
 		spin_unlock_irqrestore(host->host_lock, flags);
 		goto out;
 	}
@@ -7031,9 +7031,7 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 
 	if (!err) {
 cleanup:
-		spin_lock_irqsave(host->host_lock, flags);
 		__ufshcd_transfer_req_compl(hba, (1UL << tag));
-		spin_unlock_irqrestore(host->host_lock, flags);
 out:
 		err = SUCCESS;
 	} else {
@@ -7063,19 +7061,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
 {
 	int err;
-	unsigned long flags;
 
 	/*
 	 * Stop the host controller and complete the requests
 	 * cleared by h/w
 	 */
 	ufshcd_hba_stop(hba);
-
-	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->silence_err_logs = true;
 	ufshcd_complete_requests(hba);
 	hba->silence_err_logs = false;
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
 	/* scale up clocks to max frequency before full reinitialization */
 	ufshcd_set_clk_freq(hba, true);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 0f0e62b..a70daf7 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -193,7 +193,6 @@ struct ufs_pm_lvl_states {
  * @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
  * @data_unit_num: the data unit number for the first block for inline crypto
  * @req_abort_skip: skip request abort task flag
- * @in_use: indicates that this lrb is still in use
  */
 struct ufshcd_lrb {
 	struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -223,7 +222,6 @@ struct ufshcd_lrb {
 #endif
 
 	bool req_abort_skip;
-	bool in_use;
 };
 
 /**
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v1 3/3] scsi: ufs: Utilize Transfer Request List Completion Notification Register
  2021-05-24  8:36 [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Can Guo
  2021-05-24  8:36 ` [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler Can Guo
@ 2021-05-24  8:36   ` Can Guo
  2021-05-24  8:36   ` Can Guo
  2021-06-16  3:48 ` [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Martin K. Petersen
  3 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-05-24  8:36 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, cang
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, Gustavo A. R. Silva,
	Caleb Connolly, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

By reading the UTP Transfer Request List Completion Notification Register,
which was added in UFSHCI version 3.0, SW can easily get the completed
transfer requests. Thus, SW can get rid of the host lock, which is used to
synchronize tr_doorbell and outstanding_reqs, on the transfer request
dispatch and completion paths. This can further benefit random read/write
performance.
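
For reference, a condensed sketch of the pattern the diff below adds,
assuming the write-to-clear behaviour UFSHCI 3.0 defines for UTRLCNR
(writing back the value just read acknowledges exactly the completions
about to be processed):

	u32 utrlcnr = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_LIST_COMPL);

	if (utrlcnr) {
		/* Clear only the bits just read, then complete those requests. */
		ufshcd_writel(hba, utrlcnr, REG_UTP_TRANSFER_REQ_LIST_COMPL);
		__ufshcd_transfer_req_compl(hba, utrlcnr);
	}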

Cc: Stanley Chu <stanley.chu@mediatek.com>
Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 52 +++++++++++++++++++++++++++++++++--------------
 drivers/scsi/ufs/ufshcd.h |  5 +++++
 drivers/scsi/ufs/ufshci.h |  1 +
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index b9b5e61..2b7ad26 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2106,7 +2106,6 @@ static inline
 void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 {
 	struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
-	unsigned long flags;
 
 	lrbp->issue_time_stamp = ktime_get();
 	lrbp->compl_time_stamp = ktime_set(0, 0);
@@ -2115,10 +2114,19 @@ void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 	ufshcd_clk_scaling_start_busy(hba);
 	if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
 		ufshcd_start_monitor(hba, lrbp);
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	set_bit(task_tag, &hba->outstanding_reqs);
-	ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	if (ufshcd_has_utrlcnr(hba)) {
+		set_bit(task_tag, &hba->outstanding_reqs);
+		ufshcd_writel(hba, 1 << task_tag,
+			      REG_UTP_TRANSFER_REQ_DOOR_BELL);
+	} else {
+		unsigned long flags;
+
+		spin_lock_irqsave(hba->host->host_lock, flags);
+		set_bit(task_tag, &hba->outstanding_reqs);
+		ufshcd_writel(hba, 1 << task_tag,
+			      REG_UTP_TRANSFER_REQ_DOOR_BELL);
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+	}
 	/* Make sure that doorbell is committed immediately */
 	wmb();
 }
@@ -5260,17 +5268,17 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 }
 
 /**
- * ufshcd_transfer_req_compl - handle SCSI and query command completion
+ * ufshcd_trc_handler - handle transfer requests completion
  * @hba: per adapter instance
+ * @use_utrlcnr: get completed requests from UTRLCNR
  *
  * Returns
  *  IRQ_HANDLED - If interrupt is valid
  *  IRQ_NONE    - If invalid interrupt
  */
-static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
+static irqreturn_t ufshcd_trc_handler(struct ufs_hba *hba, bool use_utrlcnr)
 {
-	unsigned long completed_reqs, flags;
-	u32 tr_doorbell;
+	unsigned long completed_reqs = 0;
 
 	/* Resetting interrupt aggregation counters first and reading the
 	 * DOOR_BELL afterward allows us to handle all the completed requests.
@@ -5283,10 +5291,24 @@ static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
 	    !(hba->quirks & UFSHCI_QUIRK_SKIP_RESET_INTR_AGGR))
 		ufshcd_reset_intr_aggr(hba);
 
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-	completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	if (use_utrlcnr) {
+		u32 utrlcnr;
+
+		utrlcnr = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_LIST_COMPL);
+		if (utrlcnr) {
+			ufshcd_writel(hba, utrlcnr,
+				      REG_UTP_TRANSFER_REQ_LIST_COMPL);
+			completed_reqs = utrlcnr;
+		}
+	} else {
+		unsigned long flags;
+		u32 tr_doorbell;
+
+		spin_lock_irqsave(hba->host->host_lock, flags);
+		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
+		completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+	}
 
 	if (completed_reqs) {
 		__ufshcd_transfer_req_compl(hba, completed_reqs);
@@ -5768,7 +5790,7 @@ static void ufshcd_exception_event_handler(struct work_struct *work)
 /* Complete requests that have door-bell cleared */
 static void ufshcd_complete_requests(struct ufs_hba *hba)
 {
-	ufshcd_transfer_req_compl(hba);
+	ufshcd_trc_handler(hba, false);
 	ufshcd_tmc_handler(hba);
 }
 
@@ -6409,7 +6431,7 @@ static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
 		retval |= ufshcd_tmc_handler(hba);
 
 	if (intr_status & UTP_TRANSFER_REQ_COMPL)
-		retval |= ufshcd_transfer_req_compl(hba);
+		retval |= ufshcd_trc_handler(hba, ufshcd_has_utrlcnr(hba));
 
 	return retval;
 }
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index a70daf7..d5325e8 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -1159,6 +1159,11 @@ static inline u32 ufshcd_vops_get_ufs_hci_version(struct ufs_hba *hba)
 	return ufshcd_readl(hba, REG_UFS_VERSION);
 }
 
+static inline bool ufshcd_has_utrlcnr(struct ufs_hba *hba)
+{
+	return (hba->ufs_version >= ufshci_version(3, 0));
+}
+
 static inline int ufshcd_vops_clk_scale_notify(struct ufs_hba *hba,
 			bool up, enum ufs_notify_change_status status)
 {
diff --git a/drivers/scsi/ufs/ufshci.h b/drivers/scsi/ufs/ufshci.h
index de95be5..5affb1f 100644
--- a/drivers/scsi/ufs/ufshci.h
+++ b/drivers/scsi/ufs/ufshci.h
@@ -39,6 +39,7 @@ enum {
 	REG_UTP_TRANSFER_REQ_DOOR_BELL		= 0x58,
 	REG_UTP_TRANSFER_REQ_LIST_CLEAR		= 0x5C,
 	REG_UTP_TRANSFER_REQ_LIST_RUN_STOP	= 0x60,
+	REG_UTP_TRANSFER_REQ_LIST_COMPL		= 0x64,
 	REG_UTP_TASK_REQ_LIST_BASE_L		= 0x70,
 	REG_UTP_TASK_REQ_LIST_BASE_H		= 0x74,
 	REG_UTP_TASK_REQ_DOOR_BELL		= 0x78,
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24  8:36   ` Can Guo
@ 2021-05-24 11:25     ` kernel test robot
  -1 siblings, 0 replies; 68+ messages in thread
From: kernel test robot @ 2021-05-24 11:25 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: kbuild-all, clang-built-linux, Stanley Chu, Alim Akhtar,
	Avri Altman, James E.J. Bottomley

[-- Attachment #1: Type: text/plain, Size: 5128 bytes --]

Hi Can,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mkp-scsi/for-next]
[also build test WARNING on next-20210524]
[cannot apply to scsi/for-next v5.13-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Can-Guo/Optimize-host-lock-on-TR-send-compl-paths-and-utilize-UTRLCNR/20210524-163847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
config: arm64-randconfig-r011-20210524 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 93d1e5822ed64abd777eb94ea9899e96c4c39fbe)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/efe94162bf7973be4ed6496871b9bc9ea54e2819
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Can-Guo/Optimize-host-lock-on-TR-send-compl-paths-and-utilize-UTRLCNR/20210524-163847
        git checkout efe94162bf7973be4ed6496871b9bc9ea54e2819
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/scsi/ufs/ufshcd.c:2959:6: warning: variable 'lrbp' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:78:22: note: expanded from macro 'unlikely'
   # define unlikely(x)    __builtin_expect(!!(x), 0)
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/scsi/ufs/ufshcd.c:2981:32: note: uninitialized use occurs here
                                       (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
                                                              ^~~~
   drivers/scsi/ufs/ufshcd.c:2959:2: note: remove the 'if' if its condition is always false
           if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/scsi/ufs/ufshcd.c:2939:25: note: initialize the variable 'lrbp' to silence this warning
           struct ufshcd_lrb *lrbp;
                                  ^
                                   = NULL
   1 warning generated.


vim +2959 drivers/scsi/ufs/ufshcd.c

  2924	
  2925	/**
  2926	 * ufshcd_exec_dev_cmd - API for sending device management requests
  2927	 * @hba: UFS hba
  2928	 * @cmd_type: specifies the type (NOP, Query...)
  2929	 * @timeout: time in seconds
  2930	 *
  2931	 * NOTE: Since there is only one available tag for device management commands,
  2932	 * it is expected you hold the hba->dev_cmd.lock mutex.
  2933	 */
  2934	static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
  2935			enum dev_cmd_type cmd_type, int timeout)
  2936	{
  2937		struct request_queue *q = hba->cmd_queue;
  2938		struct request *req;
  2939		struct ufshcd_lrb *lrbp;
  2940		int err;
  2941		int tag;
  2942		struct completion wait;
  2943	
  2944		down_read(&hba->clk_scaling_lock);
  2945	
  2946		/*
  2947		 * Get free slot, sleep if slots are unavailable.
  2948		 * Even though we use wait_event() which sleeps indefinitely,
  2949		 * the maximum wait time is bounded by SCSI request timeout.
  2950		 */
  2951		req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
  2952		if (IS_ERR(req)) {
  2953			err = PTR_ERR(req);
  2954			goto out_unlock;
  2955		}
  2956		tag = req->tag;
  2957		WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
  2958	
> 2959		if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
  2960			err = -EBUSY;
  2961			goto out;
  2962		}
  2963	
  2964		init_completion(&wait);
  2965		lrbp = &hba->lrb[tag];
  2966		WARN_ON(lrbp->cmd);
  2967		err = ufshcd_compose_dev_cmd(hba, lrbp, cmd_type, tag);
  2968		if (unlikely(err))
  2969			goto out_put_tag;
  2970	
  2971		hba->dev_cmd.complete = &wait;
  2972	
  2973		ufshcd_add_query_upiu_trace(hba, UFS_QUERY_SEND, lrbp->ucd_req_ptr);
  2974		/* Make sure descriptors are ready before ringing the doorbell */
  2975		wmb();
  2976	
  2977		ufshcd_send_command(hba, tag);
  2978		err = ufshcd_wait_for_dev_cmd(hba, lrbp, timeout);
  2979	out:
  2980		ufshcd_add_query_upiu_trace(hba, err ? UFS_QUERY_ERR : UFS_QUERY_COMP,
  2981					    (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
  2982	
  2983	out_put_tag:
  2984		blk_put_request(req);
  2985	out_unlock:
  2986		up_read(&hba->clk_scaling_lock);
  2987		return err;
  2988	}
  2989	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34142 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler
  2021-05-24  8:36 ` [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler Can Guo
@ 2021-05-24 16:43   ` Bart Van Assche
  2021-05-25  4:15   ` Stanley Chu
  2021-05-31  7:14   ` Bean Huo
  2 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2021-05-24 16:43 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Jaegeuk Kim,
	open list

On 5/24/21 1:36 AM, Can Guo wrote:
> ufshcd_host_reset_and_restore() anyways completes all pending requests
> before starts re-probing, so there is no need to complete the command on
> the highest bit in tr_doorbell in advance.
> 
> Signed-off-by: Can Guo <cang@codeaurora.org>
> ---
>  drivers/scsi/ufs/ufshcd.c | 13 -------------
>  1 file changed, 13 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index d543864..c4b37d2 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6123,19 +6123,6 @@ static void ufshcd_err_handler(struct work_struct *work)
>  do_reset:
>  	/* Fatal errors need reset */
>  	if (needs_reset) {
> -		unsigned long max_doorbells = (1UL << hba->nutrs) - 1;
> -
> -		/*
> -		 * ufshcd_reset_and_restore() does the link reinitialization
> -		 * which will need atleast one empty doorbell slot to send the
> -		 * device management commands (NOP and query commands).
> -		 * If there is no slot empty at this moment then free up last
> -		 * slot forcefully.
> -		 */
> -		if (hba->outstanding_reqs == max_doorbells)
> -			__ufshcd_transfer_req_compl(hba,
> -						    (1UL << (hba->nutrs - 1)));
> -
>  		hba->force_reset = false;
>  		spin_unlock_irqrestore(hba->host->host_lock, flags);
>  		err = ufshcd_reset_and_restore(hba);
> 

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24  8:36   ` Can Guo
  (?)
@ 2021-05-24 20:10     ` Bart Van Assche
  -1 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2021-05-24 20:10 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 5/24/21 1:36 AM, Can Guo wrote:
> Current UFS IRQ handler is completely wrapped by host lock, and because
> ufshcd_send_command() is also protected by host lock, when IRQ handler
> fires, not only the CPU running the IRQ handler cannot send new requests,
> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
> reduce occupation of host lock in ufshcd_transfer_req_compl(), host lock is
> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
> optimization can bring considerable gain to random read/write performance.

Hi Can,

Using the host lock to serialize the completion path against the
submission path was a common practice 11 years ago, before the host lock
push-down (see also
https://linux-scsi.vger.kernel.narkive.com/UEmGgwAc/rfc-patch-scsi-host-lock-push-down).
Modern SCSI LLDs should not use the SCSI host lock. Please consider
introducing one or more new synchronization objects in struct ufs_hba
and to use these instead of the SCSI host lock. That will save multiple
pointer dereferences in the hot path since hba->host->host_lock will
become hba->new_spin_lock.
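
A minimal sketch of that direction (the lock field and helper name below
are invented for illustration and are not in the driver):

	/* assumed new member of struct ufs_hba: spinlock_t outstanding_lock; */
	static void ufshcd_mark_outstanding(struct ufs_hba *hba, int task_tag)
	{
		unsigned long flags;

		/* one pointer dereference instead of hba->host->host_lock's two */
		spin_lock_irqsave(&hba->outstanding_lock, flags);
		__set_bit(task_tag, &hba->outstanding_reqs);
		spin_unlock_irqrestore(&hba->outstanding_lock, flags);
	}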

An additional question is whether it is necessary for v3.0 UFS devices
to serialize the submission path against the completion path? Multiple
high-performance SCSI LLDs support hardware with separate submission and
completion queues and hence do not need any serialization between the
submission and the completion path. I'm asking this because it is likely
that sooner or later multiqueue support will be added in the UFS
specification. Benefiting from multiqueue support will require to rework
locking in the UFS driver anyway.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24 20:10     ` Bart Van Assche
@ 2021-05-25  1:34       ` Asutosh Das (asd)
  -1 siblings, 0 replies; 68+ messages in thread
From: Asutosh Das (asd) @ 2021-05-25  1:34 UTC (permalink / raw)
  To: Bart Van Assche, Can Guo, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 5/24/2021 1:10 PM, Bart Van Assche wrote:
> On 5/24/21 1:36 AM, Can Guo wrote:
>> Current UFS IRQ handler is completely wrapped by host lock, and because
>> ufshcd_send_command() is also protected by host lock, when IRQ handler
>> fires, not only the CPU running the IRQ handler cannot send new requests,
>> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
>> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
>> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
>> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
>> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
>> optimization can bring considerable gain to random read/write performance.
> 

> An additional question is whether it is necessary for v3.0 UFS devices
> to serialize the submission path against the completion path? Multiple
> high-performance SCSI LLDs support hardware with separate submission and
> completion queues and hence do not need any serialization between the
> submission and the completion path. I'm asking this because it is likely
> that sooner or later multiqueue support will be added in the UFS
> specification. Benefiting from multiqueue support will require to rework
> locking in the UFS driver anyway.
> 
Hi Bart,
No, it's not necessary to serialize the two paths. I think this series
attempts to remove this serialization to a certain degree, which is
what gives the performance improvement.
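
To make that concrete, the interrupt path ends up taking the host lock only
inside the branches that need it, roughly like the sketch below (much
simplified, not the exact patch hunk; the branch handlers are assumed to
return irqreturn_t here):

        /* Simplified sketch: each branch takes the host lock internally
         * where needed, instead of the whole handler running under it.
         */
        static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
        {
                irqreturn_t retval = IRQ_NONE;

                if (intr_status & UFSHCD_UIC_MASK)
                        retval |= ufshcd_uic_cmd_compl(hba, intr_status);

                if (intr_status & UTP_TASK_REQ_COMPL)
                        retval |= ufshcd_tmc_handler(hba);

                if (intr_status & UTP_TRANSFER_REQ_COMPL)
                        retval |= ufshcd_transfer_req_compl(hba);

                /* error bits go through ufshcd_check_errors() the same way */
                return retval;
        }

This way a completion interrupt being serviced on one CPU no longer blocks
request submission on the other CPUs.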

Even if multiqueue support becomes available in the future, I think
this change is apt now for the currently available specification.

> Thanks,
> 
> Bart.
> 


Thanks,
-asd

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24 20:10     ` Bart Van Assche
@ 2021-05-25  1:40       ` Can Guo
  -1 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-05-25  1:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Stanley Chu,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support


On 2021-05-25 04:10, Bart Van Assche wrote:
> On 5/24/21 1:36 AM, Can Guo wrote:
>> Current UFS IRQ handler is completely wrapped by host lock, and because
>> ufshcd_send_command() is also protected by host lock, when IRQ handler
>> fires, not only the CPU running the IRQ handler cannot send new requests,
>> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
>> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
>> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
>> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
>> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
>> optimization can bring considerable gain to random read/write performance.
> 
> Hi Can,
> 
> Using the host lock to serialize the completion path against the
> submission path was a common practice 11 years ago, before the host lock
> push-down (see also
> https://linux-scsi.vger.kernel.narkive.com/UEmGgwAc/rfc-patch-scsi-host-lock-push-down).
> Modern SCSI LLDs should not use the SCSI host lock. Please consider
> introducing one or more new synchronization objects in struct ufs_hba
> and to use these instead of the SCSI host lock. That will save multiple
> pointer dereferences in the hot path since hba->host->host_lock will
> become hba->new_spin_lock.
>
> An additional question is whether it is necessary for v3.0 UFS devices
> to serialize the submission path against the completion path? Multiple
> high-performance SCSI LLDs support hardware with separate submission and
> completion queues and hence do not need any serialization between the
> submission and the completion path. I'm asking this because it is likely
> that sooner or later multiqueue support will be added in the UFS
> specification. Benefiting from multiqueue support will require to rework
> locking in the UFS driver anyway.
> 

Hi Bart,

Agree with all of the above, and what you ask is exactly what we are doing in the
3rd change - getting rid of the host lock on the dispatch and completion paths.
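
Roughly, the completion side of that change takes the shape below (a much
simplified sketch rather than the actual patch hunk; error handling is
omitted and the UTRLCNR register macro name should be read as illustrative):

        /* Simplified sketch: completed slots are read from UTRLCNR and
         * cleared by writing the same bits back, so the doorbell no longer
         * has to be sampled under the host lock.
         */
        static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
        {
                unsigned long completed_reqs;

                completed_reqs = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_LIST_COMPL);
                if (!completed_reqs)
                        return IRQ_NONE;

                ufshcd_writel(hba, completed_reqs, REG_UTP_TRANSFER_REQ_LIST_COMPL);
                __ufshcd_transfer_req_compl(hba, completed_reqs);
                return IRQ_HANDLED;
        }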

I agree with using dedicated spin locks for dedicated purposes in the UFS driver,
e.g., clk gating has its own gating_lock and clk scaling has its own scaling_lock.
But this specific series is only about improving performance. We will take your
comments into consideration and address them in the future.

Thanks,

Can Guo.

> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler
  2021-05-24  8:36 ` [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler Can Guo
  2021-05-24 16:43   ` Bart Van Assche
@ 2021-05-25  4:15   ` Stanley Chu
  2021-05-31  7:14   ` Bean Huo
  2 siblings, 0 replies; 68+ messages in thread
From: Stanley Chu @ 2021-05-25  4:15 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Alim Akhtar,
	Avri Altman, James E.J. Bottomley, Martin K. Petersen, Bean Huo,
	Jaegeuk Kim, open list

On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
> ufshcd_host_reset_and_restore() anyways completes all pending requests
> before starts re-probing, so there is no need to complete the command on
> the highest bit in tr_doorbell in advance.
> 
> Signed-off-by: Can Guo <cang@codeaurora.org>
> ---
>  drivers/scsi/ufs/ufshcd.c | 13 -------------
>  1 file changed, 13 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index d543864..c4b37d2 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6123,19 +6123,6 @@ static void ufshcd_err_handler(struct work_struct *work)
>  do_reset:
>  	/* Fatal errors need reset */
>  	if (needs_reset) {
> -		unsigned long max_doorbells = (1UL << hba->nutrs) - 1;
> -
> -		/*
> -		 * ufshcd_reset_and_restore() does the link reinitialization
> -		 * which will need atleast one empty doorbell slot to send the
> -		 * device management commands (NOP and query commands).
> -		 * If there is no slot empty at this moment then free up last
> -		 * slot forcefully.
> -		 */
> -		if (hba->outstanding_reqs == max_doorbells)
> -			__ufshcd_transfer_req_compl(hba,
> -						    (1UL << (hba->nutrs - 1)));
> -
>  		hba->force_reset = false;
>  		spin_unlock_irqrestore(hba->host->host_lock, flags);
>  		err = ufshcd_reset_and_restore(hba);

Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* RE: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-25  1:34       ` Asutosh Das (asd)
  (?)
@ 2021-05-25  8:24         ` Avri Altman
  -1 siblings, 0 replies; 68+ messages in thread
From: Avri Altman @ 2021-05-25  8:24 UTC (permalink / raw)
  To: Asutosh Das (asd),
	Bart Van Assche, Can Guo, nguyenb, hongwus, linux-scsi,
	kernel-team
  Cc: Stanley Chu, Alim Akhtar, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

> On 5/24/2021 1:10 PM, Bart Van Assche wrote:
> > On 5/24/21 1:36 AM, Can Guo wrote:
> >> Current UFS IRQ handler is completely wrapped by host lock, and because
> >> ufshcd_send_command() is also protected by host lock, when IRQ handler
> >> fires, not only the CPU running the IRQ handler cannot send new requests,
> >> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
> >> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
> >> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
> >> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
> >> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
> >> optimization can bring considerable gain to random read/write performance.
> >
> 
> > An additional question is whether it is necessary for v3.0 UFS devices
> > to serialize the submission path against the completion path? Multiple
> > high-performance SCSI LLDs support hardware with separate submission and
> > completion queues and hence do not need any serialization between the
> > submission and the completion path. I'm asking this because it is likely
> > that sooner or later multiqueue support will be added in the UFS
> > specification. Benefiting from multiqueue support will require to rework
> > locking in the UFS driver anyway.
> >
> Hi Bart,
> No it's not necessary to serialize both the paths. I think this series
> attempts to remove this serialization to a certain degree, which is
> what's giving the performance improvement.
> 
> Even if multiqueue support would be available in the future, I think
> this change is apt now for the current available specification.
I agree - this looks like the harbinger of a major change.
Going further with respect to hw queues will need spec support -
e.g. a doorbell per lane, etc.

Thanks,
Avri
 
> > Thanks,
> >
> > Bart.
> >
> 
> 
> Thanks,
> -asd
> 
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
> Forum,
> Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-25  1:40       ` Can Guo
  (?)
@ 2021-05-25 16:40         ` Bart Van Assche
  -1 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2021-05-25 16:40 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Stanley Chu,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 5/24/21 6:40 PM, Can Guo wrote:
> Agree with all above, and what you ask is right what we are doing in
> the 3rd change - get rid of host lock on dispatch and completion
> paths.
> 
> I agree with using dedicated spin locks for dedicated purposes in
> UFS driver, e.g., clk gating has its own gating_lock and clk scaling
> has its own scaling_lock. But this specific series is only for
> improving performance. We will take your comments into consideration
> and address it in future.
Thanks!

Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* RE: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-25  8:24         ` Avri Altman
  (?)
@ 2021-05-28  7:30           ` Avri Altman
  -1 siblings, 0 replies; 68+ messages in thread
From: Avri Altman @ 2021-05-28  7:30 UTC (permalink / raw)
  To: Asutosh Das (asd),
	Bart Van Assche, Can Guo, nguyenb, hongwus, linux-scsi,
	kernel-team
  Cc: Stanley Chu, Alim Akhtar, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

> > On 5/24/2021 1:10 PM, Bart Van Assche wrote:
> > > On 5/24/21 1:36 AM, Can Guo wrote:
> > >> Current UFS IRQ handler is completely wrapped by host lock, and because
> > >> ufshcd_send_command() is also protected by host lock, when IRQ handler
> > >> fires, not only the CPU running the IRQ handler cannot send new requests,
> > >> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
> > >> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
> > >> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
> > >> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
> > >> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
> > >> optimization can bring considerable gain to random read/write performance.
> > >
> >
> > > An additional question is whether it is necessary for v3.0 UFS devices
> > > to serialize the submission path against the completion path? Multiple
> > > high-performance SCSI LLDs support hardware with separate submission and
> > > completion queues and hence do not need any serialization between the
> > > submission and the completion path. I'm asking this because it is likely
> > > that sooner or later multiqueue support will be added in the UFS
> > > specification. Benefiting from multiqueue support will require to rework
> > > locking in the UFS driver anyway.
> > >
> > Hi Bart,
> > No it's not necessary to serialize both the paths. I think this series
> > attempts to remove this serialization to a certain degree, which is
> > what's giving the performance improvement.
Btw, is this performance improvement on top of rq_affinity 2 or 1?

Thanks,
Avri

> >
> > Even if multiqueue support would be available in the future, I think
> > this change is apt now for the current available specification.
> I agree - this looks like the harbinger of a major change,
> And going further with respect of hw queues,
> will need the spec support - e.g. doorbell per lane, etc.
> 
> Thanks,
> Avri
> 
> > > Thanks,
> > >
> > > Bart.
> > >
> >
> >
> > Thanks,
> > -asd
> >
> > --
> > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
> > Forum,
> > Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler
  2021-05-24  8:36 ` [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler Can Guo
  2021-05-24 16:43   ` Bart Van Assche
  2021-05-25  4:15   ` Stanley Chu
@ 2021-05-31  7:14   ` Bean Huo
  2 siblings, 0 replies; 68+ messages in thread
From: Bean Huo @ 2021-05-31  7:14 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Jaegeuk Kim,
	open list

On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
> ufshcd_host_reset_and_restore() anyways completes all pending requests
> before starts re-probing, so there is no need to complete the command on
> the highest bit in tr_doorbell in advance.
>
> Signed-off-by: Can Guo <cang@codeaurora.org>

Looks good to me.

Reviewed-by: Bean Huo <beanhuo@micron.com>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24  8:36   ` Can Guo
  (?)
@ 2021-05-31 16:04     ` Bean Huo
  -1 siblings, 0 replies; 68+ messages in thread
From: Bean Huo @ 2021-05-31 16:04 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
> Current UFS IRQ handler is completely wrapped by host lock, and because
> ufshcd_send_command() is also protected by host lock, when IRQ handler
> fires, not only the CPU running the IRQ handler cannot send new requests,
> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
> optimization can bring considerable gain to random read/write performance.
>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Can Guo <cang@codeaurora.org>

Can,
The patch looks good to me.
I did a UFS queue limitation test before and observed that once the queue
is full, the number of active tasks in the queue drops. For NVMe, the
scenario is the same. You can refer to slides 23 and 24 in the pdf:
https://elinux.org/images/6/6c/Linux_Storage_System_Bottleneck_Exploration_V0.3.pdf
I don't know if your patch can fix this issue.

Unfortunately, I cannot verify the UTRLCNR usage flow since my platform is
v2.1. But at least my test can prove that the patch doesn't impact the
legacy (UFSHCI below v3.0) doorbell usage flow.

Reviewed-by: Bean Huo <beanhuo@micron.com>


Bean


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 3/3] scsi: ufs: Utilize Transfer Request List Completion Notification Register
  2021-05-24  8:36   ` Can Guo
  (?)
@ 2021-05-31 16:05     ` Bean Huo
  -1 siblings, 0 replies; 68+ messages in thread
From: Bean Huo @ 2021-05-31 16:05 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, Gustavo A. R. Silva,
	Caleb Connolly, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
> By reading the UTP Transfer Request List Completion Notification Register,
> which is added in UFSHCI Ver 3.0, SW can easily get the compeleted transfer
> requests. Thus, SW can get rid of host lock, which is used to synchronize
> the tr_doorbell and outstanding_reqs, on transfer requests dispatch and
> completion paths. This can further benefit random read/write performance.
>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Can Guo <cang@codeaurora.org>

Reviewed-by: Bean Huo <beanhuo@micron.com>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-31 16:04     ` Bean Huo
@ 2021-06-02  2:14       ` Can Guo
  -1 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-06-02  2:14 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Stanley Chu,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

Hi Bean,

On 2021-06-01 00:04, Bean Huo wrote:
> On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
>> Current UFS IRQ handler is completely wrapped by host lock, and because
>> ufshcd_send_command() is also protected by host lock, when IRQ handler
>> fires, not only the CPU running the IRQ handler cannot send new requests,
>> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
>> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
>> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
>> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
>> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
>> optimization can bring considerable gain to random read/write performance.
>>
>> Cc: Stanley Chu <stanley.chu@mediatek.com>
>> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>> Signed-off-by: Can Guo <cang@codeaurora.org>
> 
> Can,
> The patch looks good to me.
> I did a UFS queue limitation test before, observed that once the queue
> is full, then the active task number in the queue will get down. For
> the Nvme, the scenario is the same. You can refer to the slide 23, and
> slide 24 in the pdf:
> https://elinux.org/images/6/6c/Linux_Storage_System_Bottleneck_Exploration_V0.3.pdf
> I don't know if your patch can fix this
> issue.

I've studied these slides of yours many times; they are really good.
I will do some more study on this later. Thanks for the slides.

> 
> Unfortunately, I cannot verify UTRLCNR usage flow since my platform is
> v2.1. But at least my test can prove that the patch doesn't impact the
> legacy(UFSHCI is less than v3.0) doorbell usage flow.
> 

Thanks for your time :).

Regards,
Can Guo.

> Reviewed-by: Bean Huo <beanhuo@micron.com>
> 
> 
> Bean

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-28  7:30           ` Avri Altman
@ 2021-06-02 21:18             ` Asutosh Das (asd)
  -1 siblings, 0 replies; 68+ messages in thread
From: Asutosh Das (asd) @ 2021-06-02 21:18 UTC (permalink / raw)
  To: Avri Altman, Bart Van Assche, Can Guo, nguyenb, hongwus,
	linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 5/28/2021 12:30 AM, Avri Altman wrote:
>>> Hi Bart,
>>> No it's not necessary to serialize both the paths. I think this series
>>> attempts to remove this serialization to a certain degree, which is
>>> what's giving the performance improvement.
> Btw, Is this performance improvement is on top of rq_affinity 2 or 1?
> 
It's on 1.

> Thanks,
> Avri
> 
>>>
>>> Even if multiqueue support would be available in the future, I think
>>> this change is apt now for the current available specification.
>> I agree - this looks like the harbinger of a major change,
>> And going further with respect of hw queues,
>> will need the spec support - e.g. doorbell per lane, etc.
>>
>> Thanks,
>> Avri
>>
>>>> Thanks,
>>>>
>>>> Bart.
>>>>
>>>
>>>
>>> Thanks,
>>> -asd
>>>
>>> --
>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
>>> Forum,
>>> Linux Foundation Collaborative Project


-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-31 16:04     ` Bean Huo
  (?)
@ 2021-06-03  0:18       ` Bart Van Assche
  -1 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2021-06-03  0:18 UTC (permalink / raw)
  To: Bean Huo, Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 5/31/21 9:04 AM, Bean Huo wrote:
> I did a UFS queue limitation test before and observed that once the
> queue is full, the number of active tasks in the queue drops. For
> NVMe, the scenario is the same. You can refer to slide 23 and slide 24
> in the pdf:
> https://elinux.org/images/6/6c/Linux_Storage_System_Bottleneck_Exploration_V0.3.pdf
> I don't know if your patch can fix this issue.

Hi Bean,

That's a very interesting presentation. Unfortunately the overhead for
SCSI in that presentation includes the SCSI core + UFS driver. Given the
use of the host lock in the version of the UFS driver that was used to
prepare that presentation, the overhead introduced by the UFS driver may
be significant. Maybe someone should measure the overhead of the
scsi_debug driver and compare it with the NVMe loopback driver to get a
better idea of how the overhead of these two subsystems compares to each
other?

Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24  8:36   ` Can Guo
  (?)
@ 2021-06-03  2:54     ` Stanley Chu
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanley Chu @ 2021-06-03  2:54 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Alim Akhtar,
	Avri Altman, James E.J. Bottomley, Martin K. Petersen,
	Matthias Brugger, Bean Huo, Jaegeuk Kim, Adrian Hunter,
	Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

Hi Can,

On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
> The current UFS IRQ handler is completely wrapped by the host lock, and
> because ufshcd_send_command() is also protected by the host lock, when the
> IRQ handler fires, not only can the CPU running the IRQ handler not send new
> requests, the rest of the CPUs cannot either. Move the host lock wrapping of
> the IRQ handler into the specific branches, i.e., ufshcd_uic_cmd_compl(),
> ufshcd_check_errors(), ufshcd_tmc_handler() and ufshcd_transfer_req_compl().
> Meanwhile, to further reduce occupation of the host lock in
> ufshcd_transfer_req_compl(), the host lock is no longer required to call
> __ufshcd_transfer_req_compl(). As per tests, this optimization brings a
> considerable gain to random read/write performance.
> 
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Can Guo <cang@codeaurora.org>

According to my test, the performance indeed shows an impressive improvement
with this series!

Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>




>  #endif
>  
>  	bool req_abort_skip;
> -	bool in_use;
>  };
>  
>  /**


^ permalink raw reply	[flat|nested] 68+ messages in thread
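
The locking structure described in the commit message can be pictured roughly
as follows. The function and mask names follow the upstream driver, but the
signatures and bodies are simplified approximations of the idea (taking the
host lock only in the branches that need it), not the merged patch itself:

static irqreturn_t ufshcd_sl_intr_sketch(struct ufs_hba *hba, u32 intr_status)
{
	irqreturn_t retval = IRQ_NONE;

	if (intr_status & UFSHCD_UIC_MASK) {
		spin_lock(hba->host->host_lock);
		retval |= ufshcd_uic_cmd_compl(hba, intr_status);
		spin_unlock(hba->host->host_lock);
	}

	if (intr_status & UFSHCD_ERROR_MASK) {
		spin_lock(hba->host->host_lock);
		retval |= ufshcd_check_errors(hba);
		spin_unlock(hba->host->host_lock);
	}

	if (intr_status & UTP_TASK_REQ_COMPL) {
		spin_lock(hba->host->host_lock);
		retval |= ufshcd_tmc_handler(hba);
		spin_unlock(hba->host->host_lock);
	}

	/* The transfer request completion path runs without the host lock. */
	if (intr_status & UTP_TRANSFER_REQ_COMPL)
		retval |= ufshcd_transfer_req_compl(hba);

	return retval;
}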

* Re: [PATCH v1 3/3] scsi: ufs: Utilize Transfer Request List Completion Notification Register
  2021-05-24  8:36   ` Can Guo
  (?)
@ 2021-06-03  2:54     ` Stanley Chu
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanley Chu @ 2021-06-03  2:54 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Alim Akhtar,
	Avri Altman, James E.J. Bottomley, Martin K. Petersen,
	Matthias Brugger, Bean Huo, Jaegeuk Kim, Adrian Hunter,
	Kiwoong Kim, Satya Tangirala, Gustavo A. R. Silva,
	Caleb Connolly, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

Hi Can,

On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
> By reading the UTP Transfer Request List Completion Notification Register,
> which was added in UFSHCI ver 3.0, SW can easily get the completed transfer
> requests. Thus, SW can get rid of the host lock, which is used to synchronize
> tr_doorbell and outstanding_reqs, on the transfer request dispatch and
> completion paths. This can further benefit random read/write performance.
> 
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Can Guo <cang@codeaurora.org>

Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>


> +++ b/drivers/scsi/ufs/ufshci.h
> @@ -39,6 +39,7 @@ enum {
>  	REG_UTP_TRANSFER_REQ_DOOR_BELL		= 0x58,
>  	REG_UTP_TRANSFER_REQ_LIST_CLEAR		= 0x5C,
>  	REG_UTP_TRANSFER_REQ_LIST_RUN_STOP	= 0x60,
> +	REG_UTP_TRANSFER_REQ_LIST_COMPL		= 0x64,
>  	REG_UTP_TASK_REQ_LIST_BASE_L		= 0x70,
>  	REG_UTP_TASK_REQ_LIST_BASE_H		= 0x74,
>  	REG_UTP_TASK_REQ_DOOR_BELL		= 0x78,


^ permalink raw reply	[flat|nested] 68+ messages in thread
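
As an illustration of what the commit message describes, a completion path
built on the new register could look roughly like the sketch below. It assumes
the write-1-to-clear semantics UFSHCI 3.0 defines for UTRLCNR and reuses
upstream names; it is a sketch of the idea, not the merged code:

/*
 * Sketch: derive the set of completed requests from UTRLCNR instead of
 * diffing the doorbell against outstanding_reqs under the host lock.
 * UTRLCNR bits are cleared by writing 1 back to them.
 */
static irqreturn_t ufshcd_trc_handler_sketch(struct ufs_hba *hba)
{
	unsigned long completed_reqs;

	completed_reqs = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_LIST_COMPL);
	if (!completed_reqs)
		return IRQ_NONE;

	/* Acknowledge the notifications before completing the requests. */
	ufshcd_writel(hba, completed_reqs, REG_UTP_TRANSFER_REQ_LIST_COMPL);

	__ufshcd_transfer_req_compl(hba, completed_reqs);

	return IRQ_HANDLED;
}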

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-06-03  2:54     ` Stanley Chu
@ 2021-06-04  1:49       ` Can Guo
  -1 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-06-04  1:49 UTC (permalink / raw)
  To: Stanley Chu
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Alim Akhtar,
	Avri Altman, James E.J. Bottomley, Martin K. Petersen,
	Matthias Brugger, Bean Huo, Jaegeuk Kim, Adrian Hunter,
	Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

Hi Stanley,

On 2021-06-03 10:54, Stanley Chu wrote:
> Hi Can,
> 
> On Mon, 2021-05-24 at 01:36 -0700, Can Guo wrote:
>> The current UFS IRQ handler is completely wrapped by the host lock, and
>> because ufshcd_send_command() is also protected by the host lock, when the
>> IRQ handler fires, not only can the CPU running the IRQ handler not send
>> new requests, the rest of the CPUs cannot either. Move the host lock
>> wrapping of the IRQ handler into the specific branches, i.e.,
>> ufshcd_uic_cmd_compl(), ufshcd_check_errors(), ufshcd_tmc_handler() and
>> ufshcd_transfer_req_compl(). Meanwhile, to further reduce occupation of
>> the host lock in ufshcd_transfer_req_compl(), the host lock is no longer
>> required to call __ufshcd_transfer_req_compl(). As per tests, this
>> optimization brings a considerable gain to random read/write performance.
>> 
>> Cc: Stanley Chu <stanley.chu@mediatek.com>
>> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>> Signed-off-by: Can Guo <cang@codeaurora.org>
> 
> According to my test, the performance indeed shows an impressive improvement
> with this series!
> 

Thanks a lot for your time and review. :)

Regards,
Can Guo.

> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
> 
> 
> 
> 
>>  #endif
>> 
>>  	bool req_abort_skip;
>> -	bool in_use;
>>  };
>> 
>>  /**

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24 11:25     ` kernel test robot
@ 2021-06-08 17:53       ` Nathan Chancellor
  -1 siblings, 0 replies; 68+ messages in thread
From: Nathan Chancellor @ 2021-06-08 17:53 UTC (permalink / raw)
  To: kernel test robot
  Cc: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	kbuild-all, clang-built-linux, Stanley Chu, Alim Akhtar,
	Avri Altman, James E.J. Bottomley

On Mon, May 24, 2021 at 07:25:57PM +0800, kernel test robot wrote:
> Hi Can,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on mkp-scsi/for-next]
> [also build test WARNING on next-20210524]
> [cannot apply to scsi/for-next v5.13-rc3]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Can-Guo/Optimize-host-lock-on-TR-send-compl-paths-and-utilize-UTRLCNR/20210524-163847
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
> config: arm64-randconfig-r011-20210524 (attached as .config)
> compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 93d1e5822ed64abd777eb94ea9899e96c4c39fbe)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install arm64 cross compiling tool for clang build
>         # apt-get install binutils-aarch64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/efe94162bf7973be4ed6496871b9bc9ea54e2819
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Can-Guo/Optimize-host-lock-on-TR-send-compl-paths-and-utilize-UTRLCNR/20210524-163847
>         git checkout efe94162bf7973be4ed6496871b9bc9ea54e2819
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):

Looks like this build warning never got taken care of before the patch
was accepted because I see it on next-20210608.

> >> drivers/scsi/ufs/ufshcd.c:2959:6: warning: variable 'lrbp' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
>            if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
>                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    include/linux/compiler.h:78:22: note: expanded from macro 'unlikely'
>    # define unlikely(x)    __builtin_expect(!!(x), 0)
>                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
>    drivers/scsi/ufs/ufshcd.c:2981:32: note: uninitialized use occurs here
>                                        (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
>                                                               ^~~~
>    drivers/scsi/ufs/ufshcd.c:2959:2: note: remove the 'if' if its condition is always false
>            if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
>            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    drivers/scsi/ufs/ufshcd.c:2939:25: note: initialize the variable 'lrbp' to silence this warning
>            struct ufshcd_lrb *lrbp;
>                                   ^
>                                    = NULL
>    1 warning generated.
> 
> 
> vim +2959 drivers/scsi/ufs/ufshcd.c
> 
>   2924	
>   2925	/**
>   2926	 * ufshcd_exec_dev_cmd - API for sending device management requests
>   2927	 * @hba: UFS hba
>   2928	 * @cmd_type: specifies the type (NOP, Query...)
>   2929	 * @timeout: time in seconds
>   2930	 *
>   2931	 * NOTE: Since there is only one available tag for device management commands,
>   2932	 * it is expected you hold the hba->dev_cmd.lock mutex.
>   2933	 */
>   2934	static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>   2935			enum dev_cmd_type cmd_type, int timeout)
>   2936	{
>   2937		struct request_queue *q = hba->cmd_queue;
>   2938		struct request *req;
>   2939		struct ufshcd_lrb *lrbp;
>   2940		int err;
>   2941		int tag;
>   2942		struct completion wait;
>   2943	
>   2944		down_read(&hba->clk_scaling_lock);
>   2945	
>   2946		/*
>   2947		 * Get free slot, sleep if slots are unavailable.
>   2948		 * Even though we use wait_event() which sleeps indefinitely,
>   2949		 * the maximum wait time is bounded by SCSI request timeout.
>   2950		 */
>   2951		req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
>   2952		if (IS_ERR(req)) {
>   2953			err = PTR_ERR(req);
>   2954			goto out_unlock;
>   2955		}
>   2956		tag = req->tag;
>   2957		WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
>   2958	
> > 2959		if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
>   2960			err = -EBUSY;

Should this goto be adjusted to out_put_tag then drop the out label?

>   2961			goto out;
>   2962		}
>   2963	
>   2964		init_completion(&wait);
>   2965		lrbp = &hba->lrb[tag];
>   2966		WARN_ON(lrbp->cmd);
>   2967		err = ufshcd_compose_dev_cmd(hba, lrbp, cmd_type, tag);
>   2968		if (unlikely(err))
>   2969			goto out_put_tag;
>   2970	
>   2971		hba->dev_cmd.complete = &wait;
>   2972	
>   2973		ufshcd_add_query_upiu_trace(hba, UFS_QUERY_SEND, lrbp->ucd_req_ptr);
>   2974		/* Make sure descriptors are ready before ringing the doorbell */
>   2975		wmb();
>   2976	
>   2977		ufshcd_send_command(hba, tag);
>   2978		err = ufshcd_wait_for_dev_cmd(hba, lrbp, timeout);
>   2979	out:
>   2980		ufshcd_add_query_upiu_trace(hba, err ? UFS_QUERY_ERR : UFS_QUERY_COMP,
>   2981					    (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
>   2982	
>   2983	out_put_tag:
>   2984		blk_put_request(req);
>   2985	out_unlock:
>   2986		up_read(&hba->clk_scaling_lock);
>   2987		return err;
>   2988	}
>   2989	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 68+ messages in thread
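
A minimal sketch of the adjustment suggested above: when the tag is
unexpectedly busy, bail out through out_put_tag so the trace call that
dereferences lrbp is never reached while the request is still released. This
only illustrates the idea; the follow-up fix that eventually landed may differ.

	tag = req->tag;
	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));

	if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
		err = -EBUSY;
		goto out_put_tag;	/* lrbp is not set up yet, skip the trace */
	}

	/* ... compose, send and wait for the device command as before ... */

	ufshcd_add_query_upiu_trace(hba, err ? UFS_QUERY_ERR : UFS_QUERY_COMP,
				    (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
out_put_tag:
	blk_put_request(req);
out_unlock:
	up_read(&hba->clk_scaling_lock);
	return err;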

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-06-08 17:53       ` Nathan Chancellor
@ 2021-06-09  1:01         ` Can Guo
  -1 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-06-09  1:01 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: kernel test robot, asutoshd, nguyenb, hongwus, linux-scsi,
	kernel-team, kbuild-all, clang-built-linux, Stanley Chu,
	Alim Akhtar, Avri Altman, James E.J. Bottomley

Hi Nathan,

On 2021-06-09 01:53, Nathan Chancellor wrote:
> On Mon, May 24, 2021 at 07:25:57PM +0800, kernel test robot wrote:
>> Hi Can,
>> 
>> Thank you for the patch! Perhaps something to improve:
>> 
>> [auto build test WARNING on mkp-scsi/for-next]
>> [also build test WARNING on next-20210524]
>> [cannot apply to scsi/for-next v5.13-rc3]
>> [If your patch is applied to the wrong git tree, kindly drop us a 
>> note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch]
>> 
>> url:    
>> https://github.com/0day-ci/linux/commits/Can-Guo/Optimize-host-lock-on-TR-send-compl-paths-and-utilize-UTRLCNR/20210524-163847
>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git 
>> for-next
>> config: arm64-randconfig-r011-20210524 (attached as .config)
>> compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
>> 93d1e5822ed64abd777eb94ea9899e96c4c39fbe)
>> reproduce (this is a W=1 build):
>>         wget 
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
>> -O ~/bin/make.cross
>>         chmod +x ~/bin/make.cross
>>         # install arm64 cross compiling tool for clang build
>>         # apt-get install binutils-aarch64-linux-gnu
>>         # 
>> https://github.com/0day-ci/linux/commit/efe94162bf7973be4ed6496871b9bc9ea54e2819
>>         git remote add linux-review https://github.com/0day-ci/linux
>>         git fetch --no-tags linux-review 
>> Can-Guo/Optimize-host-lock-on-TR-send-compl-paths-and-utilize-UTRLCNR/20210524-163847
>>         git checkout efe94162bf7973be4ed6496871b9bc9ea54e2819
>>         # save the attached .config to linux build tree
>>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
>> ARCH=arm64
>> 
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot <lkp@intel.com>
>> 
>> All warnings (new ones prefixed by >>):
> 
> Looks like this build warning never got taken care of before the patch
> was accepted because I see it on next-20210608.

I was not aware that it had already been accepted to 5.14/scsi-staging.
I will fix it with a new patch.

> 
>> >> drivers/scsi/ufs/ufshcd.c:2959:6: warning: variable 'lrbp' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
>>            if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
>>                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>    include/linux/compiler.h:78:22: note: expanded from macro 
>> 'unlikely'
>>    # define unlikely(x)    __builtin_expect(!!(x), 0)
>>                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
>>    drivers/scsi/ufs/ufshcd.c:2981:32: note: uninitialized use occurs 
>> here
>>                                        (struct utp_upiu_req 
>> *)lrbp->ucd_rsp_ptr);
>>                                                               ^~~~
>>    drivers/scsi/ufs/ufshcd.c:2959:2: note: remove the 'if' if its 
>> condition is always false
>>            if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
>>            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>    drivers/scsi/ufs/ufshcd.c:2939:25: note: initialize the variable 
>> 'lrbp' to silence this warning
>>            struct ufshcd_lrb *lrbp;
>>                                   ^
>>                                    = NULL
>>    1 warning generated.
>> 
>> 
>> vim +2959 drivers/scsi/ufs/ufshcd.c
>> 
>>   2924
>>   2925	/**
>>   2926	 * ufshcd_exec_dev_cmd - API for sending device management 
>> requests
>>   2927	 * @hba: UFS hba
>>   2928	 * @cmd_type: specifies the type (NOP, Query...)
>>   2929	 * @timeout: time in seconds
>>   2930	 *
>>   2931	 * NOTE: Since there is only one available tag for device 
>> management commands,
>>   2932	 * it is expected you hold the hba->dev_cmd.lock mutex.
>>   2933	 */
>>   2934	static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>>   2935			enum dev_cmd_type cmd_type, int timeout)
>>   2936	{
>>   2937		struct request_queue *q = hba->cmd_queue;
>>   2938		struct request *req;
>>   2939		struct ufshcd_lrb *lrbp;
>>   2940		int err;
>>   2941		int tag;
>>   2942		struct completion wait;
>>   2943
>>   2944		down_read(&hba->clk_scaling_lock);
>>   2945
>>   2946		/*
>>   2947		 * Get free slot, sleep if slots are unavailable.
>>   2948		 * Even though we use wait_event() which sleeps indefinitely,
>>   2949		 * the maximum wait time is bounded by SCSI request timeout.
>>   2950		 */
>>   2951		req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
>>   2952		if (IS_ERR(req)) {
>>   2953			err = PTR_ERR(req);
>>   2954			goto out_unlock;
>>   2955		}
>>   2956		tag = req->tag;
>>   2957		WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
>>   2958
>> > 2959		if (unlikely(test_bit(tag, &hba->outstanding_reqs))) {
>>   2960			err = -EBUSY;
> 
> Should this goto be adjusted to out_put_tag then drop the out label?

Right, will fix it with a new change.

Thanks,
Can Guo.

> 
>>   2961			goto out;
>>   2962		}
>>   2963
>>   2964		init_completion(&wait);
>>   2965		lrbp = &hba->lrb[tag];
>>   2966		WARN_ON(lrbp->cmd);
>>   2967		err = ufshcd_compose_dev_cmd(hba, lrbp, cmd_type, tag);
>>   2968		if (unlikely(err))
>>   2969			goto out_put_tag;
>>   2970
>>   2971		hba->dev_cmd.complete = &wait;
>>   2972
>>   2973		ufshcd_add_query_upiu_trace(hba, UFS_QUERY_SEND, 
>> lrbp->ucd_req_ptr);
>>   2974		/* Make sure descriptors are ready before ringing the doorbell 
>> */
>>   2975		wmb();
>>   2976
>>   2977		ufshcd_send_command(hba, tag);
>>   2978		err = ufshcd_wait_for_dev_cmd(hba, lrbp, timeout);
>>   2979	out:
>>   2980		ufshcd_add_query_upiu_trace(hba, err ? UFS_QUERY_ERR : 
>> UFS_QUERY_COMP,
>>   2981					    (struct utp_upiu_req *)lrbp->ucd_rsp_ptr);
>>   2982
>>   2983	out_put_tag:
>>   2984		blk_put_request(req);
>>   2985	out_unlock:
>>   2986		up_read(&hba->clk_scaling_lock);
>>   2987		return err;
>>   2988	}
>>   2989
>> 
>> ---
>> 0-DAY CI Kernel Test Service, Intel Corporation
>> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> 
> Cheers,
> Nathan

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR
  2021-05-24  8:36 [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Can Guo
                   ` (2 preceding siblings ...)
  2021-05-24  8:36   ` Can Guo
@ 2021-06-16  3:48 ` Martin K. Petersen
  3 siblings, 0 replies; 68+ messages in thread
From: Martin K. Petersen @ 2021-06-16  3:48 UTC (permalink / raw)
  To: nguyenb, kernel-team, asutoshd, Can Guo, hongwus, linux-scsi
  Cc: Martin K . Petersen

On Mon, 24 May 2021 01:36:55 -0700, Can Guo wrote:

> By optimizing host lock usage on TR send/compl paths and utilizing UTRLCNR,
> we can get considerable gain in both random read and random write performance.
> 
> Can Guo (3):
>   scsi: ufs: Remove a redundant command completion logic in error
>     handler
>   scsi: ufs: Optimize host lock on transfer requests send/compl paths
>   scsi: ufs: Utilize Transfer Request List Completion Notification
>     Register
> 
> [...]

Applied to 5.14/scsi-queue, thanks!

[1/3] scsi: ufs: Remove a redundant command completion logic in error handler
      https://git.kernel.org/mkp/scsi/c/1cca0c3fdc91
[2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
      https://git.kernel.org/mkp/scsi/c/a45f937110fa
[3/3] scsi: ufs: Utilize Transfer Request List Completion Notification Register
      https://git.kernel.org/mkp/scsi/c/6f7151729647

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24  8:36   ` Can Guo
  (?)
@ 2021-06-17  2:49     ` Bart Van Assche
  -1 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2021-06-17  2:49 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 5/24/21 1:36 AM, Can Guo wrote:
> @@ -2688,6 +2705,43 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
> +	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
> +		/*
> +		 * pm_runtime_get_sync() is used at error handling preparation
> +		 * stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
> +		 * PM ops, it can never be finished if we let SCSI layer keep
> +		 * retrying it, which gets err handler stuck forever. Neither
> +		 * can we let the scsi cmd pass through, because UFS is in bad
> +		 * state, the scsi cmd may eventually time out, which will get
> +		 * err handler blocked for too long. So, just fail the scsi cmd
> +		 * sent from PM ops, err handler can recover PM error anyways.
> +		 */
> +		if (hba->pm_op_in_progress) {
> +			hba->force_reset = true;
> +			set_host_byte(cmd, DID_BAD_TARGET);
> +			cmd->scsi_done(cmd);
> +			goto out;
> +		}
> +		fallthrough;

Hi Can,

I know that this patch only moves the above code and that the above code
has not been introduced by this patch. Anyway, is my understanding
correct that ufshcd_err_handler() can change the host controller state
from UFSHCD_STATE_EH_SCHEDULED_FATAL into UFSHCD_STATE_RESET and next
into UFSHCD_STATE_OPERATIONAL? If so, if the above code completes a READ
with status DID_BAD_TARGET and if recovery by the error handler
succeeds, will that cause the filesystem above the UFS driver to change
into read-only mode? If the above code completes a WRITE with status
DID_BAD_TARGET, will that cause data corruption? Is there any other
solution to prevent data corruption than merging the
UFSHCD_STATE_EH_SCHEDULED_FATAL and UFSHCD_STATE_EH_SCHEDULED_NON_FATAL
back into a single state and changing the ufshcd_rpm_get_sync(hba) call
in ufshcd_err_handling_prepare() into a pm_runtime_get_noresume() call?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-06-17  2:49     ` Bart Van Assche
@ 2021-06-23  2:04       ` Can Guo
  -1 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-06-23  2:04 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Stanley Chu,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

Hi Bart,

On 2021-06-17 10:49, Bart Van Assche wrote:
> On 5/24/21 1:36 AM, Can Guo wrote:
>> @@ -2688,6 +2705,43 @@ static int ufshcd_queuecommand(struct Scsi_Host 
>> *host, struct scsi_cmnd *cmd)
>> +	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>> +		/*
>> +		 * pm_runtime_get_sync() is used at error handling preparation
>> +		 * stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
>> +		 * PM ops, it can never be finished if we let SCSI layer keep
>> +		 * retrying it, which gets err handler stuck forever. Neither
>> +		 * can we let the scsi cmd pass through, because UFS is in bad
>> +		 * state, the scsi cmd may eventually time out, which will get
>> +		 * err handler blocked for too long. So, just fail the scsi cmd
>> +		 * sent from PM ops, err handler can recover PM error anyways.
>> +		 */
>> +		if (hba->pm_op_in_progress) {
>> +			hba->force_reset = true;
>> +			set_host_byte(cmd, DID_BAD_TARGET);
>> +			cmd->scsi_done(cmd);
>> +			goto out;
>> +		}
>> +		fallthrough;
> 
> Hi Can,
> 
> I know that this patch only moves the above code and that the above 
> code
> has not been introduced by this patch. Anyway, is my understanding
> correct that ufshcd_err_handler() can change the host controller state
> from UFSHCD_STATE_EH_SCHEDULED_FATAL into UFSHCD_STATE_RESET and next
> into UFSHCD_STATE_OPERATIONAL? If so, if the above code completes a 
> READ
> with status DID_BAD_TARGET and if recovery by the error handler
> succeeds, will that cause the filesystem above the UFS driver to change
> into read-only mode? If the above code completes a WRITE with status
> DID_BAD_TARGET, will that cause data corruption? Is there any other
> solution to prevent data corruption than merging the
> UFSHCD_STATE_EH_SCHEDULED_FATAL and UFSHCD_STATE_EH_SCHEDULED_NON_FATAL
> back into a single state and changing the ufshcd_rpm_get_sync(hba) call
> in ufshcd_err_handling_prepare() into a pm_runtime_get_noresume() call?
> 

Here, when hba->pm_op_in_progress is true, there cannot be a READ or WRITE
command, since the hba is resuming or suspending. When a fatal error happens,
the DID_BAD_TARGET above is intended to let the SSU (or whatever PM request
is blocking suspend/resume) fail fast (neither returning HOST_BUSY nor
letting the cmd pass through can achieve that), so that error handling
preparation won't get stuck [1] when it calls

lock_system_sleep()
pm_runtime_get_sync()

The reason why I split UFSHCD_STATE_EH_SCHEDULED into
UFSHCD_STATE_EH_SCHEDULED_FATAL and UFSHCD_STATE_EH_SCHEDULED_NON_FATAL is:

1. For non-fatal errors, HW can recover by itself, so when the host state is
UFSHCD_STATE_EH_SCHEDULED_NON_FATAL, cmds can still pass through.

2. When a non-fatal error (LINE-RESET, for example) happens, the error handler
only needs to do a power mode transition without a full reset. If we only had
one state, returning HOST_BUSY would get error handling preparation stuck [1],
while fast-failing SSU cmds would make the error handler do a full reset
(which goes too far for non-fatal errors).
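
A rough, self-contained sketch of the dispatch rules described above (this
only models the decision; it is not the actual ufshcd_queuecommand() code,
and the dispatch() helper and enum names are invented for illustration; the
HOST_BUSY outcome for the FATAL/non-PM case is assumed from the fallthrough
in the quoted hunk):

/*
 * Decision model for the discussion above: which action each host state
 * and caller context maps to.
 */
#include <stdbool.h>
#include <stdio.h>

enum eh_state {
	STATE_OPERATIONAL,
	STATE_EH_SCHEDULED_NON_FATAL,	/* HW can recover by itself */
	STATE_EH_SCHEDULED_FATAL,	/* needs the error handler / reset */
};

static const char *dispatch(enum eh_state state, bool pm_op_in_progress)
{
	switch (state) {
	case STATE_OPERATIONAL:
	case STATE_EH_SCHEDULED_NON_FATAL:
		return "send to device";		/* cmds still pass through */
	case STATE_EH_SCHEDULED_FATAL:
		if (pm_op_in_progress)
			return "fail fast: DID_BAD_TARGET";
		return "return SCSI_MLQUEUE_HOST_BUSY";	/* retried later */
	}
	return "unreachable";
}

int main(void)
{
	printf("FATAL, PM op in progress: %s\n",
	       dispatch(STATE_EH_SCHEDULED_FATAL, true));
	printf("FATAL, regular I/O:       %s\n",
	       dispatch(STATE_EH_SCHEDULED_FATAL, false));
	printf("NON_FATAL, any cmd:       %s\n",
	       dispatch(STATE_EH_SCHEDULED_NON_FATAL, false));
	return 0;
}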

Thanks,

Can Guo.

> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-05-24  8:36   ` Can Guo
  (?)
@ 2021-06-28 22:58     ` Bart Van Assche
  -1 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2021-06-28 22:58 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Stanley Chu, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 5/24/21 1:36 AM, Can Guo wrote:
> Current UFS IRQ handler is completely wrapped by host lock, and because
> ufshcd_send_command() is also protected by host lock, when IRQ handler
> fires, not only the CPU running the IRQ handler cannot send new requests,
> the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
> specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
> no longer required to call __ufshcd_transfer_req_compl(). As per test, the
> optimization can bring considerable gain to random read/write performance.
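
The shape of the change that commit message describes is roughly the
following; an illustrative, self-contained sketch with invented names
(irq_handler_old/new, handle_*), not the actual ufshcd interrupt code:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;

/* Completion bookkeeping that, after the change, runs without host_lock. */
static void handle_completions(void)
{
	printf("complete finished transfer requests\n");
}

/* Error bookkeeping; only this branch still takes host_lock. */
static void handle_errors(void)
{
	pthread_mutex_lock(&host_lock);
	printf("record error state\n");
	pthread_mutex_unlock(&host_lock);
}

/* Before: the whole handler held host_lock, blocking submission paths. */
static void irq_handler_old(unsigned int status)
{
	pthread_mutex_lock(&host_lock);
	if (status & 1)
		printf("complete finished transfer requests\n");
	if (status & 2)
		printf("record error state\n");
	pthread_mutex_unlock(&host_lock);
}

/* After: each branch takes the lock only if it actually needs it. */
static void irq_handler_new(unsigned int status)
{
	if (status & 1)
		handle_completions();
	if (status & 2)
		handle_errors();
}

int main(void)
{
	irq_handler_old(3);
	irq_handler_new(3);
	return 0;
}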

Hi Can,

Since this patch was applied to the AOSP kernel, we have been seeing 100%
reproducible lockups on multiple test setups. Examples of call traces:

blk_execute_rq()
__scsi_execute()
sd_sync_cache()
sd_suspend_common()
sd_suspend_system()
scsi_bus_suspend()
__device_suspend()

blk_execute_rq()
__scsi_execute()
ufshcd_clear_ua_wlun()
ufshcd_err_handling_unprepare()
ufshcd_err_handler()
process_one_work()

Reverting this patch and the next patch from this series solved the
lockups. Do you prefer to revert this patch or do you perhaps want us to
test a potential fix?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-06-28 22:58     ` Bart Van Assche
@ 2021-06-29  5:41       ` Can Guo
  -1 siblings, 0 replies; 68+ messages in thread
From: Can Guo @ 2021-06-29  5:41 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Stanley Chu,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 2021-06-29 06:58, Bart Van Assche wrote:
> On 5/24/21 1:36 AM, Can Guo wrote:
>> Current UFS IRQ handler is completely wrapped by host lock, and 
>> because
>> ufshcd_send_command() is also protected by host lock, when IRQ handler
>> fires, not only the CPU running the IRQ handler cannot send new 
>> requests,
>> the rest CPUs can neither. Move the host lock wrapping the IRQ handler 
>> into
>> specific branches, i.e., ufshcd_uic_cmd_compl(), 
>> ufshcd_check_errors(),
>> ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to 
>> further
>> reduce occpuation of host lock in ufshcd_transfer_req_compl(), host 
>> lock is
>> no longer required to call __ufshcd_transfer_req_compl(). As per test, 
>> the
>> optimization can bring considerable gain to random read/write 
>> performance.
> 
> Hi Can,
> 
> Since this patch has been applied on the AOSP kernel we see 100%
> reproducible lockups appearing on multiple test setups. Examples of 
> call
> traces:
> 
> blk_execute_rq()
> __scsi_execute()
> sd_sync_cache()
> sd_suspend_common()
> sd_suspend_system()
> scsi_bus_suspend()
> __device_suspend()
> 
> blk_execute_rq()
> __scsi_execute()
> ufshcd_clear_ua_wlun()
> ufshcd_err_handling_unprepare()
> ufshcd_err_handler()
> process_one_work()
> 
> Reverting this patch and the next patch from this series solved the
> lockups. Do you prefer to revert this patch or do you perhaps want us 
> to
> test a potential fix?
> 

Hi Bart,

I am waiting for more info/logs/dumps on Buganizer to look into it.
With the above call trace snippets alone, it is hard to figure out what is
happening. Besides, we tested this series before it went upstream and we
didn't see such a problem.

Thanks,

Can Guo.

> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths
  2021-06-29  5:41       ` Can Guo
  (?)
@ 2021-07-01 15:57         ` Bart Van Assche
  -1 siblings, 0 replies; 68+ messages in thread
From: Bart Van Assche @ 2021-07-01 15:57 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, linux-scsi, kernel-team, Stanley Chu,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, Bean Huo, Jaegeuk Kim,
	Adrian Hunter, Kiwoong Kim, Satya Tangirala, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support

On 6/28/21 10:41 PM, Can Guo wrote:
> I am waiting for more infos/logs/dumps on Buganizor to look into it.
> With above calltrace snippet, it is hard to figure out what is happening.
> Besides, we've tested this series before go upstream and we didn't
> see such problem.

Hi Can,

Jaegeuk has posted a fix as you may have noticed.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2021-07-01 15:59 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-24  8:36 [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Can Guo
2021-05-24  8:36 ` [PATCH v1 1/3] scsi: ufs: Remove a redundant command completion logic in error handler Can Guo
2021-05-24 16:43   ` Bart Van Assche
2021-05-25  4:15   ` Stanley Chu
2021-05-31  7:14   ` Bean Huo
2021-05-24  8:36 ` [PATCH v1 2/3] scsi: ufs: Optimize host lock on transfer requests send/compl paths Can Guo
2021-05-24  8:36   ` Can Guo
2021-05-24  8:36   ` Can Guo
2021-05-24 11:25   ` kernel test robot
2021-05-24 11:25     ` kernel test robot
2021-06-08 17:53     ` Nathan Chancellor
2021-06-08 17:53       ` Nathan Chancellor
2021-06-09  1:01       ` Can Guo
2021-06-09  1:01         ` Can Guo
2021-05-24 20:10   ` Bart Van Assche
2021-05-24 20:10     ` Bart Van Assche
2021-05-24 20:10     ` Bart Van Assche
2021-05-25  1:34     ` Asutosh Das (asd)
2021-05-25  1:34       ` Asutosh Das (asd)
2021-05-25  8:24       ` Avri Altman
2021-05-25  8:24         ` Avri Altman
2021-05-25  8:24         ` Avri Altman
2021-05-28  7:30         ` Avri Altman
2021-05-28  7:30           ` Avri Altman
2021-05-28  7:30           ` Avri Altman
2021-06-02 21:18           ` Asutosh Das (asd)
2021-06-02 21:18             ` Asutosh Das (asd)
2021-05-25  1:40     ` Can Guo
2021-05-25  1:40       ` Can Guo
2021-05-25 16:40       ` Bart Van Assche
2021-05-25 16:40         ` Bart Van Assche
2021-05-25 16:40         ` Bart Van Assche
2021-05-31 16:04   ` Bean Huo
2021-05-31 16:04     ` Bean Huo
2021-05-31 16:04     ` Bean Huo
2021-06-02  2:14     ` Can Guo
2021-06-02  2:14       ` Can Guo
2021-06-03  0:18     ` Bart Van Assche
2021-06-03  0:18       ` Bart Van Assche
2021-06-03  0:18       ` Bart Van Assche
2021-06-03  2:54   ` Stanley Chu
2021-06-03  2:54     ` Stanley Chu
2021-06-03  2:54     ` Stanley Chu
2021-06-04  1:49     ` Can Guo
2021-06-04  1:49       ` Can Guo
2021-06-17  2:49   ` Bart Van Assche
2021-06-17  2:49     ` Bart Van Assche
2021-06-17  2:49     ` Bart Van Assche
2021-06-23  2:04     ` Can Guo
2021-06-23  2:04       ` Can Guo
2021-06-28 22:58   ` Bart Van Assche
2021-06-28 22:58     ` Bart Van Assche
2021-06-28 22:58     ` Bart Van Assche
2021-06-29  5:41     ` Can Guo
2021-06-29  5:41       ` Can Guo
2021-07-01 15:57       ` Bart Van Assche
2021-07-01 15:57         ` Bart Van Assche
2021-07-01 15:57         ` Bart Van Assche
2021-05-24  8:36 ` [PATCH v1 3/3] scsi: ufs: Utilize Transfer Request List Completion Notification Register Can Guo
2021-05-24  8:36   ` Can Guo
2021-05-24  8:36   ` Can Guo
2021-05-31 16:05   ` Bean Huo
2021-05-31 16:05     ` Bean Huo
2021-05-31 16:05     ` Bean Huo
2021-06-03  2:54   ` Stanley Chu
2021-06-03  2:54     ` Stanley Chu
2021-06-03  2:54     ` Stanley Chu
2021-06-16  3:48 ` [PATCH v1 0/3] Optimize host lock on TR send/compl paths and utilize UTRLCNR Martin K. Petersen
