* [PATCH v5 0/4] Simplify and optimize the UFS driver
@ 2019-11-12 17:37 Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 1/4] ufs: Serialize error handling and command submission Bart Van Assche
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-12 17:37 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra, Can Guo,
	linux-scsi, Bart Van Assche

Hello Martin,

This patch series simplifies and optimizes the UFS driver. Please consider
it for inclusion in kernel v5.5.

Thanks,

Bart.

Changes compared to v4:
- Reverted to scsi_block_requests() / scsi_unblock_requests() for the
  UFS error handler.
- Added a new patch that serializes error handling and command submission.
- Fixed a blk_mq_init_queue() return value check.

Changes compared to v3:
- Left out "scsi" from the name of the functions that suspend and resume
  command processing.

Changes compared to v2:
- Use a separate tag set for TMF tags.

Changes compared to v1:
- Use the block layer tag infrastructure for managing TMF tags.

Bart Van Assche (4):
  ufs: Serialize error handling and command submission
  ufs: Avoid busy-waiting by eliminating tag conflicts
  ufs: Use blk_{get,put}_request() to allocate and free TMFs
  ufs: Simplify the clock scaling mechanism implementation

 drivers/scsi/ufs/ufshcd.c | 396 +++++++++++++++++---------------------
 drivers/scsi/ufs/ufshcd.h |  21 +-
 2 files changed, 186 insertions(+), 231 deletions(-)



* [PATCH v5 1/4] ufs: Serialize error handling and command submission
  2019-11-12 17:37 [PATCH v5 0/4] Simplify and optimize the UFS driver Bart Van Assche
@ 2019-11-12 17:37 ` Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 2/4] ufs: Avoid busy-waiting by eliminating tag conflicts Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-12 17:37 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra, Can Guo,
	linux-scsi, Bart Van Assche, Stanley Chu, Tomas Winkler

With the legacy SCSI core, holding the host lock was sufficient to serialize
queuecommand() calls and error handling. With scsi-mq, a call to
blk_mq_quiesce_queue() is required to wait until ongoing queuecommand()
calls have finished. scsi_target_block() calls blk_mq_quiesce_queue().
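
In sketch form, the serialization added by this patch brackets the error
handler with target block/unblock (names as used in the hunk below):

	/*
	 * Wait for ongoing ufshcd_queuecommand() calls to finish; do not
	 * wait for command completion. scsi_target_block() quiesces the
	 * request queues of the target's devices.
	 */
	if (starget)
		scsi_target_block(&starget->dev);
	/* ... error handling runs with command submission serialized ... */
	if (starget)
		scsi_target_unblock(&starget->dev, SDEV_RUNNING);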

Cc: Bean Huo <beanhuo@micron.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Fixes: d5038a13eca7 ("scsi: core: switch to scsi-mq by default")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9cbe3b45cf1c..a03538165a12 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5275,6 +5275,8 @@ static bool ufshcd_quirk_dl_nac_errors(struct ufs_hba *hba)
 static void ufshcd_err_handler(struct work_struct *work)
 {
 	struct ufs_hba *hba;
+	struct scsi_device *sdev;
+	struct scsi_target *starget;
 	unsigned long flags;
 	u32 err_xfer = 0;
 	u32 err_tm = 0;
@@ -5283,10 +5285,19 @@ static void ufshcd_err_handler(struct work_struct *work)
 	bool needs_reset = false;
 
 	hba = container_of(work, struct ufs_hba, eh_work);
+	sdev = hba->sdev_ufs_device;
+	starget = sdev ? sdev->sdev_target : NULL;
 
 	pm_runtime_get_sync(hba->dev);
 	ufshcd_hold(hba, false);
 
+	/*
+	 * Wait until ongoing ufshcd_queuecommand() calls have finished
+	 * without waiting for command completion.
+	 */
+	if (starget)
+		scsi_target_block(&starget->dev);
+
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (hba->ufshcd_state == UFSHCD_STATE_RESET)
 		goto out;
@@ -5395,6 +5406,8 @@ static void ufshcd_err_handler(struct work_struct *work)
 
 out:
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	if (starget)
+		scsi_target_unblock(&starget->dev, SDEV_RUNNING);
 	ufshcd_scsi_unblock_requests(hba);
 	ufshcd_release(hba);
 	pm_runtime_put_sync(hba->dev);


* [PATCH v5 2/4] ufs: Avoid busy-waiting by eliminating tag conflicts
  2019-11-12 17:37 [PATCH v5 0/4] Simplify and optimize the UFS driver Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 1/4] ufs: Serialize error handling and command submission Bart Van Assche
@ 2019-11-12 17:37 ` Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 3/4] ufs: Use blk_{get,put}_request() to allocate and free TMFs Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation Bart Van Assche
  3 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-12 17:37 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra, Can Guo,
	linux-scsi, Bart Van Assche, Stanley Chu, Tomas Winkler

Instead of tracking which tags are in use in the ufs_hba.lrb_in_use
bitmask, rely on the block layer tag allocation mechanism. This patch
removes the following busy-waiting loop, which was triggered whenever
ufshcd_issue_devman_upiu_cmd() and the block layer accidentally allocated
the same tag for a SCSI request:
* ufshcd_queuecommand() returns SCSI_MLQUEUE_HOST_BUSY.
* The SCSI core requeues the SCSI command.
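
A minimal sketch of the replacement pattern (mirroring the hunks below):
allocating a request from the host's tag set reserves a tag, so device
management commands and SCSI commands can never collide on one.

	struct request *req;
	int tag;

	/* Sleeps until a tag is free; no busy-waiting, no requeueing. */
	req = blk_get_request(hba->cmd_queue, REQ_OP_DRV_OUT, 0);
	if (IS_ERR(req))
		return PTR_ERR(req);
	tag = req->tag;		/* indexes hba->lrb[] for the dev command */
	/* ... issue the device management command and wait for it ... */
	blk_put_request(req);	/* releases the tag */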

Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 122 +++++++++++++++-----------------------
 drivers/scsi/ufs/ufshcd.h |   6 +-
 2 files changed, 51 insertions(+), 77 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a03538165a12..d746afd0a2e2 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -497,8 +497,8 @@ static void ufshcd_print_tmrs(struct ufs_hba *hba, unsigned long bitmap)
 static void ufshcd_print_host_state(struct ufs_hba *hba)
 {
 	dev_err(hba->dev, "UFS Host state=%d\n", hba->ufshcd_state);
-	dev_err(hba->dev, "lrb in use=0x%lx, outstanding reqs=0x%lx tasks=0x%lx\n",
-		hba->lrb_in_use, hba->outstanding_reqs, hba->outstanding_tasks);
+	dev_err(hba->dev, "outstanding reqs=0x%lx tasks=0x%lx\n",
+		hba->outstanding_reqs, hba->outstanding_tasks);
 	dev_err(hba->dev, "saved_err=0x%x, saved_uic_err=0x%x\n",
 		hba->saved_err, hba->saved_uic_err);
 	dev_err(hba->dev, "Device power mode=%d, UIC link state=%d\n",
@@ -1596,6 +1596,25 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
 }
 EXPORT_SYMBOL_GPL(ufshcd_hold);
 
+static bool ufshcd_is_busy(struct request *req, void *priv, bool reserved)
+{
+	int *busy = priv;
+
+	WARN_ON_ONCE(reserved);
+	(*busy)++;
+	return false;
+}
+
+/* Whether or not any tag is in use by a request that is in progress. */
+static bool ufshcd_any_tag_in_use(struct ufs_hba *hba)
+{
+	struct request_queue *q = hba->cmd_queue;
+	int busy = 0;
+
+	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_is_busy, &busy);
+	return busy;
+}
+
 static void ufshcd_gate_work(struct work_struct *work)
 {
 	struct ufs_hba *hba = container_of(work, struct ufs_hba,
@@ -1619,7 +1638,7 @@ static void ufshcd_gate_work(struct work_struct *work)
 
 	if (hba->clk_gating.active_reqs
 		|| hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL
-		|| hba->lrb_in_use || hba->outstanding_tasks
+		|| ufshcd_any_tag_in_use(hba) || hba->outstanding_tasks
 		|| hba->active_uic_cmd || hba->uic_async_done)
 		goto rel_lock;
 
@@ -1673,7 +1692,7 @@ static void __ufshcd_release(struct ufs_hba *hba)
 
 	if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended
 		|| hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL
-		|| hba->lrb_in_use || hba->outstanding_tasks
+		|| ufshcd_any_tag_in_use(hba) || hba->outstanding_tasks
 		|| hba->active_uic_cmd || hba->uic_async_done
 		|| ufshcd_eh_in_progress(hba))
 		return;
@@ -2443,22 +2462,9 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 
 	hba->req_abort_count = 0;
 
-	/* acquire the tag to make sure device cmds don't use it */
-	if (test_and_set_bit_lock(tag, &hba->lrb_in_use)) {
-		/*
-		 * Dev manage command in progress, requeue the command.
-		 * Requeuing the command helps in cases where the request *may*
-		 * find different tag instead of waiting for dev manage command
-		 * completion.
-		 */
-		err = SCSI_MLQUEUE_HOST_BUSY;
-		goto out;
-	}
-
 	err = ufshcd_hold(hba, true);
 	if (err) {
 		err = SCSI_MLQUEUE_HOST_BUSY;
-		clear_bit_unlock(tag, &hba->lrb_in_use);
 		goto out;
 	}
 	WARN_ON(hba->clk_gating.state != CLKS_ON);
@@ -2479,7 +2485,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	err = ufshcd_map_sg(hba, lrbp);
 	if (err) {
 		lrbp->cmd = NULL;
-		clear_bit_unlock(tag, &hba->lrb_in_use);
 		goto out;
 	}
 	/* Make sure descriptors are ready before ringing the doorbell */
@@ -2626,44 +2631,6 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
 	return err;
 }
 
-/**
- * ufshcd_get_dev_cmd_tag - Get device management command tag
- * @hba: per-adapter instance
- * @tag_out: pointer to variable with available slot value
- *
- * Get a free slot and lock it until device management command
- * completes.
- *
- * Returns false if free slot is unavailable for locking, else
- * return true with tag value in @tag.
- */
-static bool ufshcd_get_dev_cmd_tag(struct ufs_hba *hba, int *tag_out)
-{
-	int tag;
-	bool ret = false;
-	unsigned long tmp;
-
-	if (!tag_out)
-		goto out;
-
-	do {
-		tmp = ~hba->lrb_in_use;
-		tag = find_last_bit(&tmp, hba->nutrs);
-		if (tag >= hba->nutrs)
-			goto out;
-	} while (test_and_set_bit_lock(tag, &hba->lrb_in_use));
-
-	*tag_out = tag;
-	ret = true;
-out:
-	return ret;
-}
-
-static inline void ufshcd_put_dev_cmd_tag(struct ufs_hba *hba, int tag)
-{
-	clear_bit_unlock(tag, &hba->lrb_in_use);
-}
-
 /**
  * ufshcd_exec_dev_cmd - API for sending device management requests
  * @hba: UFS hba
@@ -2676,6 +2643,8 @@ static inline void ufshcd_put_dev_cmd_tag(struct ufs_hba *hba, int tag)
 static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 		enum dev_cmd_type cmd_type, int timeout)
 {
+	struct request_queue *q = hba->cmd_queue;
+	struct request *req;
 	struct ufshcd_lrb *lrbp;
 	int err;
 	int tag;
@@ -2689,7 +2658,11 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	 * Even though we use wait_event() which sleeps indefinitely,
 	 * the maximum wait time is bounded by SCSI request timeout.
 	 */
-	wait_event(hba->dev_cmd.tag_wq, ufshcd_get_dev_cmd_tag(hba, &tag));
+	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+	tag = req->tag;
+	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
 
 	init_completion(&wait);
 	lrbp = &hba->lrb[tag];
@@ -2714,8 +2687,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 			err ? "query_complete_err" : "query_complete");
 
 out_put_tag:
-	ufshcd_put_dev_cmd_tag(hba, tag);
-	wake_up(&hba->dev_cmd.tag_wq);
+	blk_put_request(req);
 	up_read(&hba->clk_scaling_lock);
 	return err;
 }
@@ -4834,7 +4806,6 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 			cmd->result = result;
 			/* Mark completed command as NULL in LRB */
 			lrbp->cmd = NULL;
-			clear_bit_unlock(index, &hba->lrb_in_use);
 			/* Do not touch lrbp after scsi done */
 			cmd->scsi_done(cmd);
 			__ufshcd_release(hba);
@@ -4856,9 +4827,6 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 	hba->outstanding_reqs ^= completed_reqs;
 
 	ufshcd_clk_scaling_update_busy(hba);
-
-	/* we might have free'd some tags above */
-	wake_up(&hba->dev_cmd.tag_wq);
 }
 
 /**
@@ -5800,6 +5768,8 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 					enum dev_cmd_type cmd_type,
 					enum query_opcode desc_op)
 {
+	struct request_queue *q = hba->cmd_queue;
+	struct request *req;
 	struct ufshcd_lrb *lrbp;
 	int err = 0;
 	int tag;
@@ -5809,7 +5779,11 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 
 	down_read(&hba->clk_scaling_lock);
 
-	wait_event(hba->dev_cmd.tag_wq, ufshcd_get_dev_cmd_tag(hba, &tag));
+	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+	tag = req->tag;
+	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
 
 	init_completion(&wait);
 	lrbp = &hba->lrb[tag];
@@ -5883,8 +5857,7 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 		}
 	}
 
-	ufshcd_put_dev_cmd_tag(hba, tag);
-	wake_up(&hba->dev_cmd.tag_wq);
+	blk_put_request(req);
 	up_read(&hba->clk_scaling_lock);
 	return err;
 }
@@ -6179,9 +6152,6 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
 	hba->lrb[tag].cmd = NULL;
 	spin_unlock_irqrestore(host->host_lock, flags);
 
-	clear_bit_unlock(tag, &hba->lrb_in_use);
-	wake_up(&hba->dev_cmd.tag_wq);
-
 out:
 	if (!err) {
 		err = SUCCESS;
@@ -8185,6 +8155,7 @@ void ufshcd_remove(struct ufs_hba *hba)
 {
 	ufs_bsg_remove(hba);
 	ufs_sysfs_remove_nodes(hba->dev);
+	blk_cleanup_queue(hba->cmd_queue);
 	scsi_remove_host(hba->host);
 	/* disable interrupts */
 	ufshcd_disable_intr(hba, hba->intr_mask);
@@ -8348,9 +8319,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
 	init_rwsem(&hba->clk_scaling_lock);
 
-	/* Initialize device management tag acquire wait queue */
-	init_waitqueue_head(&hba->dev_cmd.tag_wq);
-
 	ufshcd_init_clk_gating(hba);
 
 	ufshcd_init_clk_scaling(hba);
@@ -8384,6 +8352,12 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 		goto exit_gating;
 	}
 
+	hba->cmd_queue = blk_mq_init_queue(&hba->host->tag_set);
+	if (IS_ERR(hba->cmd_queue)) {
+		err = PTR_ERR(hba->cmd_queue);
+		goto out_remove_scsi_host;
+	}
+
 	/* Reset the attached device */
 	ufshcd_vops_device_reset(hba);
 
@@ -8393,7 +8367,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 		dev_err(hba->dev, "Host controller enable failed\n");
 		ufshcd_print_host_regs(hba);
 		ufshcd_print_host_state(hba);
-		goto out_remove_scsi_host;
+		goto free_cmd_queue;
 	}
 
 	/*
@@ -8430,6 +8404,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
 	return 0;
 
+free_cmd_queue:
+	blk_cleanup_queue(hba->cmd_queue);
 out_remove_scsi_host:
 	scsi_remove_host(hba->host);
 exit_gating:
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index e0fe247c719e..6d9521892a84 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -212,13 +212,11 @@ struct ufs_query {
  * @type: device management command type - Query, NOP OUT
  * @lock: lock to allow one command at a time
  * @complete: internal commands completion
- * @tag_wq: wait queue until free command slot is available
  */
 struct ufs_dev_cmd {
 	enum dev_cmd_type type;
 	struct mutex lock;
 	struct completion *complete;
-	wait_queue_head_t tag_wq;
 	struct ufs_query query;
 };
 
@@ -483,7 +481,7 @@ struct ufs_stats {
  * @host: Scsi_Host instance of the driver
  * @dev: device handle
  * @lrb: local reference block
- * @lrb_in_use: lrb in use
+ * @cmd_queue: Used to allocate command tags from hba->host->tag_set.
  * @outstanding_tasks: Bits representing outstanding task requests
  * @outstanding_reqs: Bits representing outstanding transfer requests
  * @capabilities: UFS Controller Capabilities
@@ -541,6 +539,7 @@ struct ufs_hba {
 
 	struct Scsi_Host *host;
 	struct device *dev;
+	struct request_queue *cmd_queue;
 	/*
 	 * This field is to keep a reference to "scsi_device" corresponding to
 	 * "UFS device" W-LU.
@@ -561,7 +560,6 @@ struct ufs_hba {
 	u32 ahit;
 
 	struct ufshcd_lrb *lrb;
-	unsigned long lrb_in_use;
 
 	unsigned long outstanding_tasks;
 	unsigned long outstanding_reqs;


* [PATCH v5 3/4] ufs: Use blk_{get,put}_request() to allocate and free TMFs
  2019-11-12 17:37 [PATCH v5 0/4] Simplify and optimize the UFS driver Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 1/4] ufs: Serialize error handling and command submission Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 2/4] ufs: Avoid busy-waiting by eliminating tag conflicts Bart Van Assche
@ 2019-11-12 17:37 ` Bart Van Assche
  2019-11-12 17:37 ` [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation Bart Van Assche
  3 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-12 17:37 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra, Can Guo,
	linux-scsi, Bart Van Assche, Stanley Chu, Tomas Winkler

Manage TMF tags with blk_{get,put}_request() instead of
ufshcd_get_tm_free_slot() / ufshcd_put_tm_slot(). Store a per-request
completion pointer in request.end_io_data instead of using a waitqueue
to report TMF completion.
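
A sketch of the resulting TMF flow (matching the hunks below): the issuer
stores a stack-allocated completion in request.end_io_data, and the
interrupt handler, iterating over the busy tags, completes it.

	DECLARE_COMPLETION_ONSTACK(wait);
	struct request *req;

	req = blk_get_request(hba->tmf_queue, REQ_OP_DRV_OUT,
			      BLK_MQ_REQ_RESERVED);
	req->end_io_data = &wait;	/* found by ufshcd_compl_tm() */
	/* ... ring the task management doorbell for req->tag ... */
	if (!wait_for_completion_io_timeout(&wait,
				msecs_to_jiffies(TM_CMD_TIMEOUT)))
		req->end_io_data = NULL;	/* prevent use-after-free */
	blk_put_request(req);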

Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 122 +++++++++++++++++++++++---------------
 drivers/scsi/ufs/ufshcd.h |  12 ++--
 2 files changed, 77 insertions(+), 57 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index d746afd0a2e2..810b997f55fc 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -645,40 +645,6 @@ static inline int ufshcd_get_tr_ocs(struct ufshcd_lrb *lrbp)
 	return le32_to_cpu(lrbp->utr_descriptor_ptr->header.dword_2) & MASK_OCS;
 }
 
-/**
- * ufshcd_get_tm_free_slot - get a free slot for task management request
- * @hba: per adapter instance
- * @free_slot: pointer to variable with available slot value
- *
- * Get a free tag and lock it until ufshcd_put_tm_slot() is called.
- * Returns 0 if free slot is not available, else return 1 with tag value
- * in @free_slot.
- */
-static bool ufshcd_get_tm_free_slot(struct ufs_hba *hba, int *free_slot)
-{
-	int tag;
-	bool ret = false;
-
-	if (!free_slot)
-		goto out;
-
-	do {
-		tag = find_first_zero_bit(&hba->tm_slots_in_use, hba->nutmrs);
-		if (tag >= hba->nutmrs)
-			goto out;
-	} while (test_and_set_bit_lock(tag, &hba->tm_slots_in_use));
-
-	*free_slot = tag;
-	ret = true;
-out:
-	return ret;
-}
-
-static inline void ufshcd_put_tm_slot(struct ufs_hba *hba, int slot)
-{
-	clear_bit_unlock(slot, &hba->tm_slots_in_use);
-}
-
 /**
  * ufshcd_utrl_clear - Clear a bit in UTRLCLR register
  * @hba: per adapter instance
@@ -5530,17 +5496,38 @@ static void ufshcd_check_errors(struct ufs_hba *hba)
 	 */
 }
 
+struct ctm_info {
+	struct ufs_hba *hba;
+	unsigned long pending;
+};
+
+static bool ufshcd_compl_tm(struct request *req, void *priv, bool reserved)
+{
+	const struct ctm_info *const ci = priv;
+	struct completion *c;
+
+	WARN_ON_ONCE(reserved);
+	if (test_bit(req->tag, &ci->pending))
+		return true;
+	c = req->end_io_data;
+	if (c)
+		complete(c);
+	return true;
+}
+
 /**
  * ufshcd_tmc_handler - handle task management function completion
  * @hba: per adapter instance
  */
 static void ufshcd_tmc_handler(struct ufs_hba *hba)
 {
-	u32 tm_doorbell;
+	struct request_queue *q = hba->tmf_queue;
+	struct ctm_info ci = {
+		.hba	 = hba,
+		.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
+	};
 
-	tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
-	hba->tm_condition = tm_doorbell ^ hba->outstanding_tasks;
-	wake_up(&hba->tm_wq);
+	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
 }
 
 /**
@@ -5633,7 +5620,10 @@ static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag)
 static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 		struct utp_task_req_desc *treq, u8 tm_function)
 {
+	struct request_queue *q = hba->tmf_queue;
 	struct Scsi_Host *host = hba->host;
+	DECLARE_COMPLETION_ONSTACK(wait);
+	struct request *req;
 	unsigned long flags;
 	int free_slot, task_tag, err;
 
@@ -5642,7 +5632,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	 * Even though we use wait_event() which sleeps indefinitely,
 	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 	 */
-	wait_event(hba->tm_tag_wq, ufshcd_get_tm_free_slot(hba, &free_slot));
+	req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+	req->end_io_data = &wait;
+	free_slot = req->tag;
+	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
 	ufshcd_hold(hba, false);
 
 	spin_lock_irqsave(host->host_lock, flags);
@@ -5668,10 +5661,14 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	ufshcd_add_tm_upiu_trace(hba, task_tag, "tm_send");
 
 	/* wait until the task management command is completed */
-	err = wait_event_timeout(hba->tm_wq,
-			test_bit(free_slot, &hba->tm_condition),
+	err = wait_for_completion_io_timeout(&wait,
 			msecs_to_jiffies(TM_CMD_TIMEOUT));
 	if (!err) {
+		/*
+		 * Make sure that ufshcd_compl_tm() does not trigger a
+		 * use-after-free.
+		 */
+		req->end_io_data = NULL;
 		ufshcd_add_tm_upiu_trace(hba, task_tag, "tm_complete_err");
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);
@@ -5690,9 +5687,7 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	__clear_bit(free_slot, &hba->outstanding_tasks);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
-	clear_bit(free_slot, &hba->tm_condition);
-	ufshcd_put_tm_slot(hba, free_slot);
-	wake_up(&hba->tm_tag_wq);
+	blk_put_request(req);
 
 	ufshcd_release(hba);
 	return err;
@@ -8155,6 +8150,8 @@ void ufshcd_remove(struct ufs_hba *hba)
 {
 	ufs_bsg_remove(hba);
 	ufs_sysfs_remove_nodes(hba->dev);
+	blk_cleanup_queue(hba->tmf_queue);
+	blk_mq_free_tag_set(&hba->tmf_tag_set);
 	blk_cleanup_queue(hba->cmd_queue);
 	scsi_remove_host(hba->host);
 	/* disable interrupts */
@@ -8234,6 +8231,18 @@ int ufshcd_alloc_host(struct device *dev, struct ufs_hba **hba_handle)
 }
 EXPORT_SYMBOL(ufshcd_alloc_host);
 
+/* This function exists because blk_mq_alloc_tag_set() requires this. */
+static blk_status_t ufshcd_queue_tmf(struct blk_mq_hw_ctx *hctx,
+				     const struct blk_mq_queue_data *qd)
+{
+	WARN_ON_ONCE(true);
+	return BLK_STS_NOTSUPP;
+}
+
+static const struct blk_mq_ops ufshcd_tmf_ops = {
+	.queue_rq = ufshcd_queue_tmf,
+};
+
 /**
  * ufshcd_init - Driver initialization routine
  * @hba: per-adapter instance
@@ -8303,10 +8312,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
 	hba->max_pwr_info.is_valid = false;
 
-	/* Initailize wait queue for task management */
-	init_waitqueue_head(&hba->tm_wq);
-	init_waitqueue_head(&hba->tm_tag_wq);
-
 	/* Initialize work queues */
 	INIT_WORK(&hba->eh_work, ufshcd_err_handler);
 	INIT_WORK(&hba->eeh_work, ufshcd_exception_event_handler);
@@ -8358,6 +8363,21 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 		goto out_remove_scsi_host;
 	}
 
+	hba->tmf_tag_set = (struct blk_mq_tag_set) {
+		.nr_hw_queues	= 1,
+		.queue_depth	= hba->nutmrs,
+		.ops		= &ufshcd_tmf_ops,
+		.flags		= BLK_MQ_F_NO_SCHED,
+	};
+	err = blk_mq_alloc_tag_set(&hba->tmf_tag_set);
+	if (err < 0)
+		goto free_cmd_queue;
+	hba->tmf_queue = blk_mq_init_queue(&hba->tmf_tag_set);
+	if (IS_ERR(hba->tmf_queue)) {
+		err = PTR_ERR(hba->tmf_queue);
+		goto free_tmf_tag_set;
+	}
+
 	/* Reset the attached device */
 	ufshcd_vops_device_reset(hba);
 
@@ -8367,7 +8387,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 		dev_err(hba->dev, "Host controller enable failed\n");
 		ufshcd_print_host_regs(hba);
 		ufshcd_print_host_state(hba);
-		goto free_cmd_queue;
+		goto free_tmf_queue;
 	}
 
 	/*
@@ -8404,6 +8424,10 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
 	return 0;
 
+free_tmf_queue:
+	blk_cleanup_queue(hba->tmf_queue);
+free_tmf_tag_set:
+	blk_mq_free_tag_set(&hba->tmf_tag_set);
 free_cmd_queue:
 	blk_cleanup_queue(hba->cmd_queue);
 out_remove_scsi_host:
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 6d9521892a84..5865e16f53a6 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -493,11 +493,9 @@ struct ufs_stats {
  * @irq: Irq number of the controller
  * @active_uic_cmd: handle of active UIC command
  * @uic_cmd_mutex: mutex for uic command
- * @tm_wq: wait queue for task management
- * @tm_tag_wq: wait queue for free task management slots
- * @tm_slots_in_use: bit map of task management request slots in use
+ * @tmf_tag_set: TMF tag set.
+ * @tmf_queue: Used to allocate TMF tags.
  * @pwr_done: completion for power mode change
- * @tm_condition: condition variable for task management
  * @ufshcd_state: UFSHCD states
  * @eh_flags: Error handling flags
  * @intr_mask: Interrupt Mask Bits
@@ -641,10 +639,8 @@ struct ufs_hba {
 	/* Device deviations from standard UFS device spec. */
 	unsigned int dev_quirks;
 
-	wait_queue_head_t tm_wq;
-	wait_queue_head_t tm_tag_wq;
-	unsigned long tm_condition;
-	unsigned long tm_slots_in_use;
+	struct blk_mq_tag_set tmf_tag_set;
+	struct request_queue *tmf_queue;
 
 	struct uic_command *active_uic_cmd;
 	struct mutex uic_cmd_mutex;


* [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-12 17:37 [PATCH v5 0/4] Simplify and optimize the UFS driver Bart Van Assche
                   ` (2 preceding siblings ...)
  2019-11-12 17:37 ` [PATCH v5 3/4] ufs: Use blk_{get,put}_request() to allocate and free TMFs Bart Van Assche
@ 2019-11-12 17:37 ` Bart Van Assche
  2019-11-13  0:11   ` cang
  3 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2019-11-12 17:37 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra, Can Guo,
	linux-scsi, Bart Van Assche, Stanley Chu, Tomas Winkler

Scaling the clock is only safe while no commands are in progress. Use
blk_mq_{un,}freeze_queue() to block submission of new commands and to
wait for ongoing commands to complete. Remove "_scsi" from the name of
the functions that block and unblock requests since these functions
now block both SCSI commands and non-SCSI commands.
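
The prepare/unprepare pair then reduces to freezing and unfreezing every
queue that can submit work to the device. A sketch of the untimed variant
used by ufshcd_hold() (the clock scaling path uses a timed version):

	/* Block new submissions and wait for in-flight requests. */
	shost_for_each_device(sdev, hba->host)
		blk_mq_freeze_queue(sdev->request_queue);
	blk_mq_freeze_queue(hba->cmd_queue);
	blk_mq_freeze_queue(hba->tmf_queue);
	/* ... scale clocks and gear with no commands outstanding ... */
	blk_mq_unfreeze_queue(hba->tmf_queue);
	blk_mq_unfreeze_queue(hba->cmd_queue);
	shost_for_each_device(sdev, hba->host)
		blk_mq_unfreeze_queue(sdev->request_queue);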

Cc: Bean Huo <beanhuo@micron.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 141 +++++++++++++-------------------------
 drivers/scsi/ufs/ufshcd.h |   3 -
 2 files changed, 46 insertions(+), 98 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 810b997f55fc..717ba24a2d40 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -290,16 +290,50 @@ static inline void ufshcd_disable_irq(struct ufs_hba *hba)
 	}
 }
 
-static void ufshcd_scsi_unblock_requests(struct ufs_hba *hba)
+static void ufshcd_unblock_requests(struct ufs_hba *hba)
 {
-	if (atomic_dec_and_test(&hba->scsi_block_reqs_cnt))
-		scsi_unblock_requests(hba->host);
+	struct scsi_device *sdev;
+
+	blk_mq_unfreeze_queue(hba->tmf_queue);
+	blk_mq_unfreeze_queue(hba->cmd_queue);
+	shost_for_each_device(sdev, hba->host)
+		blk_mq_unfreeze_queue(sdev->request_queue);
 }
 
-static void ufshcd_scsi_block_requests(struct ufs_hba *hba)
+static int ufshcd_block_requests(struct ufs_hba *hba, unsigned long timeout)
 {
-	if (atomic_inc_return(&hba->scsi_block_reqs_cnt) == 1)
-		scsi_block_requests(hba->host);
+	struct scsi_device *sdev;
+	unsigned long deadline = jiffies + timeout;
+
+	if (timeout == ULONG_MAX) {
+		shost_for_each_device(sdev, hba->host)
+			blk_mq_freeze_queue(sdev->request_queue);
+		blk_mq_freeze_queue(hba->cmd_queue);
+		blk_mq_freeze_queue(hba->tmf_queue);
+		return 0;
+	}
+
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	blk_freeze_queue_start(hba->cmd_queue);
+	blk_freeze_queue_start(hba->tmf_queue);
+	shost_for_each_device(sdev, hba->host) {
+		if (blk_mq_freeze_queue_wait_timeout(sdev->request_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0) {
+			goto err;
+		}
+	}
+	if (blk_mq_freeze_queue_wait_timeout(hba->cmd_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;
+	if (blk_mq_freeze_queue_wait_timeout(hba->tmf_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;
+	return 0;
+
+err:
+	ufshcd_unblock_requests(hba);
+	return -ETIMEDOUT;
 }
 
 static void ufshcd_add_cmd_upiu_trace(struct ufs_hba *hba, unsigned int tag,
@@ -971,65 +1005,6 @@ static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
 	return false;
 }
 
-static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
-					u64 wait_timeout_us)
-{
-	unsigned long flags;
-	int ret = 0;
-	u32 tm_doorbell;
-	u32 tr_doorbell;
-	bool timeout = false, do_last_check = false;
-	ktime_t start;
-
-	ufshcd_hold(hba, false);
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	/*
-	 * Wait for all the outstanding tasks/transfer requests.
-	 * Verify by checking the doorbell registers are clear.
-	 */
-	start = ktime_get();
-	do {
-		if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
-			ret = -EBUSY;
-			goto out;
-		}
-
-		tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
-		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-		if (!tm_doorbell && !tr_doorbell) {
-			timeout = false;
-			break;
-		} else if (do_last_check) {
-			break;
-		}
-
-		spin_unlock_irqrestore(hba->host->host_lock, flags);
-		schedule();
-		if (ktime_to_us(ktime_sub(ktime_get(), start)) >
-		    wait_timeout_us) {
-			timeout = true;
-			/*
-			 * We might have scheduled out for long time so make
-			 * sure to check if doorbells are cleared by this time
-			 * or not.
-			 */
-			do_last_check = true;
-		}
-		spin_lock_irqsave(hba->host->host_lock, flags);
-	} while (tm_doorbell || tr_doorbell);
-
-	if (timeout) {
-		dev_err(hba->dev,
-			"%s: timedout waiting for doorbell to clear (tm=0x%x, tr=0x%x)\n",
-			__func__, tm_doorbell, tr_doorbell);
-		ret = -EBUSY;
-	}
-out:
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_release(hba);
-	return ret;
-}
-
 /**
  * ufshcd_scale_gear - scale up/down UFS gear
  * @hba: per adapter instance
@@ -1079,27 +1054,16 @@ static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)
 
 static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
 {
-	#define DOORBELL_CLR_TOUT_US		(1000 * 1000) /* 1 sec */
-	int ret = 0;
 	/*
 	 * make sure that there are no outstanding requests when
 	 * clock scaling is in progress
 	 */
-	ufshcd_scsi_block_requests(hba);
-	down_write(&hba->clk_scaling_lock);
-	if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) {
-		ret = -EBUSY;
-		up_write(&hba->clk_scaling_lock);
-		ufshcd_scsi_unblock_requests(hba);
-	}
-
-	return ret;
+	return ufshcd_block_requests(hba, HZ);
 }
 
 static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
 {
-	up_write(&hba->clk_scaling_lock);
-	ufshcd_scsi_unblock_requests(hba);
+	ufshcd_unblock_requests(hba);
 }
 
 /**
@@ -1471,7 +1435,7 @@ static void ufshcd_ungate_work(struct work_struct *work)
 		hba->clk_gating.is_suspended = false;
 	}
 unblock_reqs:
-	ufshcd_scsi_unblock_requests(hba);
+	ufshcd_unblock_requests(hba);
 }
 
 /**
@@ -1528,7 +1492,7 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
 		 */
 		/* fallthrough */
 	case CLKS_OFF:
-		ufshcd_scsi_block_requests(hba);
+		ufshcd_block_requests(hba, ULONG_MAX);
 		hba->clk_gating.state = REQ_CLKS_ON;
 		trace_ufshcd_clk_gating(dev_name(hba->dev),
 					hba->clk_gating.state);
@@ -2395,9 +2359,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 		BUG();
 	}
 
-	if (!down_read_trylock(&hba->clk_scaling_lock))
-		return SCSI_MLQUEUE_HOST_BUSY;
-
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	switch (hba->ufshcd_state) {
 	case UFSHCD_STATE_OPERATIONAL:
@@ -2463,7 +2424,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 out_unlock:
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 out:
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -2617,8 +2577,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	struct completion wait;
 	unsigned long flags;
 
-	down_read(&hba->clk_scaling_lock);
-
 	/*
 	 * Get free slot, sleep if slots are unavailable.
 	 * Even though we use wait_event() which sleeps indefinitely,
@@ -2654,7 +2612,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 
 out_put_tag:
 	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -5342,7 +5299,7 @@ static void ufshcd_err_handler(struct work_struct *work)
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	if (starget)
 		scsi_target_unblock(&starget->dev, SDEV_RUNNING);
-	ufshcd_scsi_unblock_requests(hba);
+	scsi_unblock_requests(hba->host);
 	ufshcd_release(hba);
 	pm_runtime_put_sync(hba->dev);
 }
@@ -5466,7 +5423,7 @@ static void ufshcd_check_errors(struct ufs_hba *hba)
 		/* handle fatal errors only when link is functional */
 		if (hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) {
 			/* block commands from scsi mid-layer */
-			ufshcd_scsi_block_requests(hba);
+			scsi_block_requests(hba->host);
 
 			hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED;
 
@@ -5772,8 +5729,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	unsigned long flags;
 	u32 upiu_flags;
 
-	down_read(&hba->clk_scaling_lock);
-
 	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -5853,7 +5808,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	}
 
 	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -8322,8 +8276,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	/* Initialize mutex for device management commands */
 	mutex_init(&hba->dev_cmd.lock);
 
-	init_rwsem(&hba->clk_scaling_lock);
-
 	ufshcd_init_clk_gating(hba);
 
 	ufshcd_init_clk_scaling(hba);
@@ -8410,7 +8362,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
 	/* Hold auto suspend until async scan completes */
 	pm_runtime_get_sync(dev);
-	atomic_set(&hba->scsi_block_reqs_cnt, 0);
 	/*
 	 * We are assuming that device wasn't put in sleep/power-down
 	 * state exclusively during the boot stage before kernel.
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 5865e16f53a6..c44bfcdb26a3 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -520,7 +520,6 @@ struct ufs_stats {
  * @urgent_bkops_lvl: keeps track of urgent bkops level for device
  * @is_urgent_bkops_lvl_checked: keeps track if the urgent bkops level for
  *  device is known or not.
- * @scsi_block_reqs_cnt: reference counting for scsi block requests
  */
 struct ufs_hba {
 	void __iomem *mmio_base;
@@ -724,9 +723,7 @@ struct ufs_hba {
 	enum bkops_status urgent_bkops_lvl;
 	bool is_urgent_bkops_lvl_checked;
 
-	struct rw_semaphore clk_scaling_lock;
 	struct ufs_desc_size desc_size;
-	atomic_t scsi_block_reqs_cnt;
 
 	struct device		bsg_dev;
 	struct request_queue	*bsg_queue;


* Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-12 17:37 ` [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation Bart Van Assche
@ 2019-11-13  0:11   ` cang
  2019-11-13  0:55     ` Bart Van Assche
  0 siblings, 1 reply; 18+ messages in thread
From: cang @ 2019-11-13  0:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, Bean Huo,
	Avri Altman, Asutosh Das, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 2019-11-13 01:37, Bart Van Assche wrote:
> Scaling the clock is only safe while no commands are in progress. Use
> blk_mq_{un,}freeze_queue() to block submission of new commands and to
> wait for ongoing commands to complete. Remove "_scsi" from the name of
> the functions that block and unblock requests since these functions
> now block both SCSI commands and non-SCSI commands.
> 
> [...]
> 
> @@ -1528,7 +1492,7 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
>  		 */
>  		/* fallthrough */
>  	case CLKS_OFF:
> -		ufshcd_scsi_block_requests(hba);
> +		ufshcd_block_requests(hba, ULONG_MAX);
> 
Hi Bart,

ufshcd_hold(async == true) is used in the ufshcd_queuecommand() path because
ufshcd_queuecommand() can be entered in atomic context. Thus
ufshcd_block_requests() here carries the same risk of causing a "scheduling
while atomic" bug. FYI, it is not easy to hit the above scenario in
small-scale testing.

Best Regards,
Can Guo.


* Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-13  0:11   ` cang
@ 2019-11-13  0:55     ` Bart Van Assche
  2019-11-13 16:03       ` [EXT] " Bean Huo (beanhuo)
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2019-11-13  0:55 UTC (permalink / raw)
  To: cang
  Cc: Martin K . Petersen, James E . J . Bottomley, Bean Huo,
	Avri Altman, Asutosh Das, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

[-- Attachment #1: Type: text/plain, Size: 749 bytes --]

On 11/12/19 4:11 PM, cang@codeaurora.org wrote:
> On 2019-11-13 01:37, Bart Van Assche wrote:
>> @@ -1528,7 +1492,7 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
>>           */
>>          /* fallthrough */
>>      case CLKS_OFF:
>> -        ufshcd_scsi_block_requests(hba);
>> +        ufshcd_block_requests(hba, ULONG_MAX);
> 
> ufshcd_hold(async == true) is used in the ufshcd_queuecommand() path because
> ufshcd_queuecommand() can be entered in atomic context. Thus
> ufshcd_block_requests() here carries the same risk of causing a "scheduling
> while atomic" bug. FYI, it is not easy to hit the above scenario in
> small-scale testing.

Hi Bean,

How about replacing patch 4/4 with the attached patch?

Thanks,

Bart.


[-- Attachment #2: 0001-ufs-Simplify-the-clock-scaling-mechanism-implementat.patch --]
[-- Type: text/x-patch, Size: 7194 bytes --]

From 112fd52ef68927ab9b19fd84765ea31aacd2d0de Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bvanassche@acm.org>
Date: Thu, 10 Oct 2019 15:56:35 -0700
Subject: [PATCH] ufs: Simplify the clock scaling mechanism implementation

Scaling the clock is only safe while no commands are in progress. Use
blk_mq_{un,}freeze_queue() to block submission of new commands and to
wait for ongoing commands to complete. This patch removes a semaphore
down and up operation pair from the hot path.
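
For context, the hot-path cost being removed is this rwsem round trip,
paid on every SCSI command in ufshcd_queuecommand() (pre-patch shape,
see the hunk below):

	if (!down_read_trylock(&hba->clk_scaling_lock))
		return SCSI_MLQUEUE_HOST_BUSY;
	/* ... build and issue the command ... */
	up_read(&hba->clk_scaling_lock);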

Cc: Bean Huo <beanhuo@micron.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 132 ++++++++++++++------------------------
 drivers/scsi/ufs/ufshcd.h |   1 -
 2 files changed, 48 insertions(+), 85 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 99ce1d651f03..f00d665715d1 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -302,6 +302,52 @@ static void ufshcd_scsi_block_requests(struct ufs_hba *hba)
 		scsi_block_requests(hba->host);
 }
 
+static void ufshcd_unblock_requests(struct ufs_hba *hba)
+{
+	struct scsi_device *sdev;
+
+	blk_mq_unfreeze_queue(hba->tmf_queue);
+	blk_mq_unfreeze_queue(hba->cmd_queue);
+	shost_for_each_device(sdev, hba->host)
+		blk_mq_unfreeze_queue(sdev->request_queue);
+}
+
+static int ufshcd_block_requests(struct ufs_hba *hba, unsigned long timeout)
+{
+	struct scsi_device *sdev;
+	unsigned long deadline = jiffies + timeout;
+
+	if (timeout == ULONG_MAX) {
+		shost_for_each_device(sdev, hba->host)
+			blk_mq_freeze_queue(sdev->request_queue);
+		blk_mq_freeze_queue(hba->cmd_queue);
+		blk_mq_freeze_queue(hba->tmf_queue);
+		return 0;
+	}
+
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	blk_freeze_queue_start(hba->cmd_queue);
+	blk_freeze_queue_start(hba->tmf_queue);
+	shost_for_each_device(sdev, hba->host) {
+		if (blk_mq_freeze_queue_wait_timeout(sdev->request_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0) {
+			goto err;
+		}
+	}
+	if (blk_mq_freeze_queue_wait_timeout(hba->cmd_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;
+	if (blk_mq_freeze_queue_wait_timeout(hba->tmf_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;
+	return 0;
+
+err:
+	ufshcd_unblock_requests(hba);
+	return -ETIMEDOUT;
+}
+
 static void ufshcd_add_cmd_upiu_trace(struct ufs_hba *hba, unsigned int tag,
 		const char *str)
 {
@@ -971,65 +1017,6 @@ static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
 	return false;
 }
 
-static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
-					u64 wait_timeout_us)
-{
-	unsigned long flags;
-	int ret = 0;
-	u32 tm_doorbell;
-	u32 tr_doorbell;
-	bool timeout = false, do_last_check = false;
-	ktime_t start;
-
-	ufshcd_hold(hba, false);
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	/*
-	 * Wait for all the outstanding tasks/transfer requests.
-	 * Verify by checking the doorbell registers are clear.
-	 */
-	start = ktime_get();
-	do {
-		if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
-			ret = -EBUSY;
-			goto out;
-		}
-
-		tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
-		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-		if (!tm_doorbell && !tr_doorbell) {
-			timeout = false;
-			break;
-		} else if (do_last_check) {
-			break;
-		}
-
-		spin_unlock_irqrestore(hba->host->host_lock, flags);
-		schedule();
-		if (ktime_to_us(ktime_sub(ktime_get(), start)) >
-		    wait_timeout_us) {
-			timeout = true;
-			/*
-			 * We might have scheduled out for long time so make
-			 * sure to check if doorbells are cleared by this time
-			 * or not.
-			 */
-			do_last_check = true;
-		}
-		spin_lock_irqsave(hba->host->host_lock, flags);
-	} while (tm_doorbell || tr_doorbell);
-
-	if (timeout) {
-		dev_err(hba->dev,
-			"%s: timedout waiting for doorbell to clear (tm=0x%x, tr=0x%x)\n",
-			__func__, tm_doorbell, tr_doorbell);
-		ret = -EBUSY;
-	}
-out:
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_release(hba);
-	return ret;
-}
-
 /**
  * ufshcd_scale_gear - scale up/down UFS gear
  * @hba: per adapter instance
@@ -1079,27 +1066,16 @@ static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)
 
 static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
 {
-	#define DOORBELL_CLR_TOUT_US		(1000 * 1000) /* 1 sec */
-	int ret = 0;
 	/*
 	 * make sure that there are no outstanding requests when
 	 * clock scaling is in progress
 	 */
-	ufshcd_scsi_block_requests(hba);
-	down_write(&hba->clk_scaling_lock);
-	if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) {
-		ret = -EBUSY;
-		up_write(&hba->clk_scaling_lock);
-		ufshcd_scsi_unblock_requests(hba);
-	}
-
-	return ret;
+	return ufshcd_block_requests(hba, HZ);
 }
 
 static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
 {
-	up_write(&hba->clk_scaling_lock);
-	ufshcd_scsi_unblock_requests(hba);
+	ufshcd_unblock_requests(hba);
 }
 
 /**
@@ -2394,9 +2370,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 		BUG();
 	}
 
-	if (!down_read_trylock(&hba->clk_scaling_lock))
-		return SCSI_MLQUEUE_HOST_BUSY;
-
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	switch (hba->ufshcd_state) {
 	case UFSHCD_STATE_OPERATIONAL:
@@ -2462,7 +2435,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 out_unlock:
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 out:
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -2616,8 +2588,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	struct completion wait;
 	unsigned long flags;
 
-	down_read(&hba->clk_scaling_lock);
-
 	/*
 	 * Get free slot, sleep if slots are unavailable.
 	 * Even though we use wait_event() which sleeps indefinitely,
@@ -2653,7 +2623,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 
 out_put_tag:
 	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -5771,8 +5740,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	unsigned long flags;
 	u32 upiu_flags;
 
-	down_read(&hba->clk_scaling_lock);
-
 	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -5852,7 +5819,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	}
 
 	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -8321,8 +8287,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	/* Initialize mutex for device management commands */
 	mutex_init(&hba->dev_cmd.lock);
 
-	init_rwsem(&hba->clk_scaling_lock);
-
 	ufshcd_init_clk_gating(hba);
 
 	ufshcd_init_clk_scaling(hba);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 5865e16f53a6..5ebb920ae874 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -724,7 +724,6 @@ struct ufs_hba {
 	enum bkops_status urgent_bkops_lvl;
 	bool is_urgent_bkops_lvl_checked;
 
-	struct rw_semaphore clk_scaling_lock;
 	struct ufs_desc_size desc_size;
 	atomic_t scsi_block_reqs_cnt;
 


* RE: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-13  0:55     ` Bart Van Assche
@ 2019-11-13 16:03       ` Bean Huo (beanhuo)
  2019-11-14 16:11         ` Bart Van Assche
  0 siblings, 1 reply; 18+ messages in thread
From: Bean Huo (beanhuo) @ 2019-11-13 16:03 UTC (permalink / raw)
  To: Bart Van Assche, cang
  Cc: Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, Vignesh Raghavendra, linux-scsi, Stanley Chu,
	Tomas Winkler

Hi, Bart

I think you are asking for a comment from Can. As for me, the attached patch
is better: removing ufshcd_wait_for_doorbell_clr() and using the block
layer's blk_mq_{un,}freeze_queue() instead of polling the doorbell register
looks good. I tested your v5 patches and have not found a problem yet.

Since my available platform does not support dynamic clock scaling, and at
the moment it seems that only Qualcomm UFS controllers support this feature,
we need Can Guo to give the final confirmation.

Thanks,

//Bean

> -----Original Message-----
> From: Bart Van Assche <bvanassche@acm.org>
> Sent: Wednesday, November 13, 2019 1:56 AM
> To: cang@codeaurora.org
> Cc: Martin K . Petersen <martin.petersen@oracle.com>; James E . J . Bottomley
> <jejb@linux.vnet.ibm.com>; Bean Huo (beanhuo) <beanhuo@micron.com>; Avri
> Altman <avri.altman@wdc.com>; Asutosh Das <asutoshd@codeaurora.org>;
> Vignesh Raghavendra <vigneshr@ti.com>; linux-scsi@vger.kernel.org; Stanley
> Chu <stanley.chu@mediatek.com>; Tomas Winkler <tomas.winkler@intel.com>
> Subject: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism
> implementation
> 
> On 11/12/19 4:11 PM, cang@codeaurora.org wrote:
> > On 2019-11-13 01:37, Bart Van Assche wrote:
> >> @@ -1528,7 +1492,7 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
> >>           */
> >>          /* fallthrough */
> >>      case CLKS_OFF:
> >> -        ufshcd_scsi_block_requests(hba);
> >> +        ufshcd_block_requests(hba, ULONG_MAX);
> >
> > ufshcd_hold(async == true) is used in the ufshcd_queuecommand() path
> > because ufshcd_queuecommand() can be entered in atomic contexts.
> > Thus ufshcd_block_requests() here has the same risk of causing
> > scheduling while atomic.
> > FYI, it is not easy to hit this scenario in small-scale testing.
> 
> Hi Bean,
> 
> How about replacing patch 4/4 with the attached patch?
> 
> Thanks,
> 
> Bart.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-13 16:03       ` [EXT] " Bean Huo (beanhuo)
@ 2019-11-14 16:11         ` Bart Van Assche
  2019-11-15  6:01           ` Can Guo
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2019-11-14 16:11 UTC (permalink / raw)
  To: cang, Bean Huo (beanhuo)
  Cc: Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, Vignesh Raghavendra, linux-scsi, Stanley Chu,
	Tomas Winkler

On 11/13/19 8:03 AM, Bean Huo (beanhuo) wrote:
> I think, you are asking for comment from Can.  As for me, attached patch is better.
> Removing ufshcd_wait_for_doorbell_clr(), instead of reading doorbell register, Now
> using block layer blk_mq_{un,}freeze_queue(), looks good. I tested your V5 patches,
> didn't find problem yet.
> 
> Since my available platform doesn't support dynamic clk scaling, I think, now seems only
> Qcom UFS controllers support this feature. So we need Can Guo to finally confirm it.

Hi Can,

Do you agree with this patch series if patch 4 is replaced by the patch 
attached to my previous e-mail? The entire (revised) series is available 
at https://github.com/bvanassche/linux/tree/ufs-for-next.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-14 16:11         ` Bart Van Assche
@ 2019-11-15  6:01           ` Can Guo
  2019-11-15 16:23             ` Bart Van Assche
  2019-11-18 18:13             ` Bart Van Assche
  0 siblings, 2 replies; 18+ messages in thread
From: Can Guo @ 2019-11-15  6:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 2019-11-15 00:11, Bart Van Assche wrote:
> On 11/13/19 8:03 AM, Bean Huo (beanhuo) wrote:
>> I think, you are asking for comment from Can.  As for me, attached 
>> patch is better.
>> Removing ufshcd_wait_for_doorbell_clr(), instead of reading doorbell 
>> register, Now
>> using block layer blk_mq_{un,}freeze_queue(), looks good. I tested 
>> your V5 patches,
>> didn't find problem yet.
>> 
>> Since my available platform doesn't support dynamic clk scaling, I 
>> think, now seems only
>> Qcom UFS controllers support this feature. So we need Can Guo to 
>> finally confirm it.
> 
> Hi Can,
> 
> Do you agree with this patch series if patch 4 is replaced by the
> patch attached to my previous e-mail? The entire (revised) series is
> available at https://github.com/bvanassche/linux/tree/ufs-for-next.
> 
> Thanks,
> 
> Bart.

Hi Bart,

After ufshcd_clock_scaling_prepare() returns without error, all request 
queues are frozen. If a failure happens after this point (say, the power 
mode change command fails) and the error handler kicks off, we need to 
send dev commands (in ufshcd_probe_hba()) to bring UFS back to a 
functional state. However, since hba->cmd_queue is frozen, the dev 
commands cannot be sent and the error handler will fail.
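
Roughly, the failing sequence I have in mind is (illustrative call chain,
not an exact trace):

ufshcd_devfreq_scale()
-> ufshcd_clock_scaling_prepare()     (hba->cmd_queue is now frozen)
-> power mode change fails
   -> error handler kicks off
      -> ufshcd_probe_hba()
         -> ufshcd_exec_dev_cmd()
            -> blk_get_request() on hba->cmd_queue blocks forever,
               because that queue is frozen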

Thanks,

Can Guo.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-15  6:01           ` Can Guo
@ 2019-11-15 16:23             ` Bart Van Assche
  2019-11-18  3:49               ` cang
       [not found]               ` <0101016e7ca0e791-30050d63-c260-4cc3-a12b-658b7aa70031-000000@us-west-2.amazonses.com>
  2019-11-18 18:13             ` Bart Van Assche
  1 sibling, 2 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-15 16:23 UTC (permalink / raw)
  To: Can Guo
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 11/14/19 10:01 PM, Can Guo wrote:
> After ufshcd_clock_scaling_prepare() returns without error, all request queues are frozen.
> If a failure happens after this point (say, the power mode change command fails) and the
> error handler kicks off, we need to send dev commands (in ufshcd_probe_hba()) to bring
> UFS back to a functional state. However, since hba->cmd_queue is frozen, the dev commands
> cannot be sent and the error handler will fail.

Hi Can,

I think that's easy to address. How about replacing patch 4/4 with the patch below?

Thanks,

Bart.


Subject: [PATCH] ufs: Simplify the clock scaling mechanism implementation

Scaling the clock is only safe while no commands are in progress. Use
blk_mq_{un,}freeze_queue() to block submission of new commands and to
wait for ongoing commands to complete. This patch removes a semaphore
down and up operation pair from the hot path.

Cc: Bean Huo <beanhuo@micron.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
  drivers/scsi/ufs/ufshcd.c | 115 +++++++++++---------------------------
  drivers/scsi/ufs/ufshcd.h |   1 -
  2 files changed, 33 insertions(+), 83 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 5314e8bfeeb6..343f9b746b70 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -971,65 +971,6 @@ static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
  	return false;
  }

-static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
-					u64 wait_timeout_us)
-{
-	unsigned long flags;
-	int ret = 0;
-	u32 tm_doorbell;
-	u32 tr_doorbell;
-	bool timeout = false, do_last_check = false;
-	ktime_t start;
-
-	ufshcd_hold(hba, false);
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	/*
-	 * Wait for all the outstanding tasks/transfer requests.
-	 * Verify by checking the doorbell registers are clear.
-	 */
-	start = ktime_get();
-	do {
-		if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
-			ret = -EBUSY;
-			goto out;
-		}
-
-		tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
-		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-		if (!tm_doorbell && !tr_doorbell) {
-			timeout = false;
-			break;
-		} else if (do_last_check) {
-			break;
-		}
-
-		spin_unlock_irqrestore(hba->host->host_lock, flags);
-		schedule();
-		if (ktime_to_us(ktime_sub(ktime_get(), start)) >
-		    wait_timeout_us) {
-			timeout = true;
-			/*
-			 * We might have scheduled out for long time so make
-			 * sure to check if doorbells are cleared by this time
-			 * or not.
-			 */
-			do_last_check = true;
-		}
-		spin_lock_irqsave(hba->host->host_lock, flags);
-	} while (tm_doorbell || tr_doorbell);
-
-	if (timeout) {
-		dev_err(hba->dev,
-			"%s: timedout waiting for doorbell to clear (tm=0x%x, tr=0x%x)\n",
-			__func__, tm_doorbell, tr_doorbell);
-		ret = -EBUSY;
-	}
-out:
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_release(hba);
-	return ret;
-}
-
  /**
   * ufshcd_scale_gear - scale up/down UFS gear
   * @hba: per adapter instance
@@ -1079,27 +1020,49 @@ static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)

  static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
  {
-	#define DOORBELL_CLR_TOUT_US		(1000 * 1000) /* 1 sec */
-	int ret = 0;
+	unsigned long deadline = jiffies + HZ;
+	struct scsi_device *sdev;
+	int res = 0;
+
  	/*
  	 * make sure that there are no outstanding requests when
  	 * clock scaling is in progress
  	 */
-	ufshcd_scsi_block_requests(hba);
-	down_write(&hba->clk_scaling_lock);
-	if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) {
-		ret = -EBUSY;
-		up_write(&hba->clk_scaling_lock);
-		ufshcd_scsi_unblock_requests(hba);
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	blk_freeze_queue_start(hba->cmd_queue);
+	blk_freeze_queue_start(hba->tmf_queue);
+	shost_for_each_device(sdev, hba->host) {
+		if (blk_mq_freeze_queue_wait_timeout(sdev->request_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0) {
+			goto err;
+		}
  	}
+	if (blk_mq_freeze_queue_wait_timeout(hba->cmd_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;
+	if (blk_mq_freeze_queue_wait_timeout(hba->tmf_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;

-	return ret;
+out:
+	blk_mq_unfreeze_queue(hba->tmf_queue);
+	blk_mq_unfreeze_queue(hba->cmd_queue);
+	return res;
+
+err:
+	shost_for_each_device(sdev, hba->host)
+		blk_mq_unfreeze_queue(sdev->request_queue);
+	res = -ETIMEDOUT;
+	goto out;
  }

  static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
  {
-	up_write(&hba->clk_scaling_lock);
-	ufshcd_scsi_unblock_requests(hba);
+	struct scsi_device *sdev;
+
+	shost_for_each_device(sdev, hba->host)
+		blk_mq_unfreeze_queue(sdev->request_queue);
  }

  /**
@@ -2394,9 +2357,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
  		BUG();
  	}

-	if (!down_read_trylock(&hba->clk_scaling_lock))
-		return SCSI_MLQUEUE_HOST_BUSY;
-
  	spin_lock_irqsave(hba->host->host_lock, flags);
  	switch (hba->ufshcd_state) {
  	case UFSHCD_STATE_OPERATIONAL:
@@ -2462,7 +2422,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
  out_unlock:
  	spin_unlock_irqrestore(hba->host->host_lock, flags);
  out:
-	up_read(&hba->clk_scaling_lock);
  	return err;
  }

@@ -2616,8 +2575,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
  	struct completion wait;
  	unsigned long flags;

-	down_read(&hba->clk_scaling_lock);
-
  	/*
  	 * Get free slot, sleep if slots are unavailable.
  	 * Even though we use wait_event() which sleeps indefinitely,
@@ -2653,7 +2610,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,

  out_put_tag:
  	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
  	return err;
  }

@@ -5771,8 +5727,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
  	unsigned long flags;
  	u32 upiu_flags;

-	down_read(&hba->clk_scaling_lock);
-
  	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
  	if (IS_ERR(req))
  		return PTR_ERR(req);
@@ -5854,7 +5808,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
  	}

  	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
  	return err;
  }

@@ -8323,8 +8276,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
  	/* Initialize mutex for device management commands */
  	mutex_init(&hba->dev_cmd.lock);

-	init_rwsem(&hba->clk_scaling_lock);
-
  	ufshcd_init_clk_gating(hba);

  	ufshcd_init_clk_scaling(hba);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 5865e16f53a6..5ebb920ae874 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -724,7 +724,6 @@ struct ufs_hba {
  	enum bkops_status urgent_bkops_lvl;
  	bool is_urgent_bkops_lvl_checked;

-	struct rw_semaphore clk_scaling_lock;
  	struct ufs_desc_size desc_size;
  	atomic_t scsi_block_reqs_cnt;


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-15 16:23             ` Bart Van Assche
@ 2019-11-18  3:49               ` cang
       [not found]               ` <0101016e7ca0e791-30050d63-c260-4cc3-a12b-658b7aa70031-000000@us-west-2.amazonses.com>
  1 sibling, 0 replies; 18+ messages in thread
From: cang @ 2019-11-18  3:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 2019-11-16 00:23, Bart Van Assche wrote:
> On 11/14/19 10:01 PM, Can Guo wrote:
>> After ufshcd_clock_scaling_prepare() returns(no error), all request 
>> queues are frozen. If failure
>> happens(say power mode change command fails) after this point and 
>> error handler kicks off,
>> we need to send dev commands(in ufshcd_probe_hba()) to bring UFS back 
>> to functionality.
>> However, as the hba->cmd_queue is frozen, dev commands cannot be sent, 
>> the error handler shall fail.
> 
> Hi Can,
> 
> I think that's easy to address. How about replacing patch 4/4 with the
> patch below?
> 
> Thanks,
> 
> Bart.
> 
> 
> Subject: [PATCH] ufs: Simplify the clock scaling mechanism 
> implementation
> 
> Scaling the clock is only safe while no commands are in progress. Use
> blk_mq_{un,}freeze_queue() to block submission of new commands and to
> wait for ongoing commands to complete. This patch removes a semaphore
> down and up operation pair from the hot path.
> 
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Cc: Tomas Winkler <tomas.winkler@intel.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/ufs/ufshcd.c | 115 +++++++++++---------------------------
>  drivers/scsi/ufs/ufshcd.h |   1 -
>  2 files changed, 33 insertions(+), 83 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 5314e8bfeeb6..343f9b746b70 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -971,65 +971,6 @@ static bool
> ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
>  	return false;
>  }
> 
> -static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
> -					u64 wait_timeout_us)
> -{
> -	unsigned long flags;
> -	int ret = 0;
> -	u32 tm_doorbell;
> -	u32 tr_doorbell;
> -	bool timeout = false, do_last_check = false;
> -	ktime_t start;
> -
> -	ufshcd_hold(hba, false);
> -	spin_lock_irqsave(hba->host->host_lock, flags);
> -	/*
> -	 * Wait for all the outstanding tasks/transfer requests.
> -	 * Verify by checking the doorbell registers are clear.
> -	 */
> -	start = ktime_get();
> -	do {
> -		if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
> -			ret = -EBUSY;
> -			goto out;
> -		}
> -
> -		tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
> -		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
> -		if (!tm_doorbell && !tr_doorbell) {
> -			timeout = false;
> -			break;
> -		} else if (do_last_check) {
> -			break;
> -		}
> -
> -		spin_unlock_irqrestore(hba->host->host_lock, flags);
> -		schedule();
> -		if (ktime_to_us(ktime_sub(ktime_get(), start)) >
> -		    wait_timeout_us) {
> -			timeout = true;
> -			/*
> -			 * We might have scheduled out for long time so make
> -			 * sure to check if doorbells are cleared by this time
> -			 * or not.
> -			 */
> -			do_last_check = true;
> -		}
> -		spin_lock_irqsave(hba->host->host_lock, flags);
> -	} while (tm_doorbell || tr_doorbell);
> -
> -	if (timeout) {
> -		dev_err(hba->dev,
> -			"%s: timedout waiting for doorbell to clear (tm=0x%x, tr=0x%x)\n",
> -			__func__, tm_doorbell, tr_doorbell);
> -		ret = -EBUSY;
> -	}
> -out:
> -	spin_unlock_irqrestore(hba->host->host_lock, flags);
> -	ufshcd_release(hba);
> -	return ret;
> -}
> -
>  /**
>   * ufshcd_scale_gear - scale up/down UFS gear
>   * @hba: per adapter instance
> @@ -1079,27 +1020,49 @@ static int ufshcd_scale_gear(struct ufs_hba
> *hba, bool scale_up)
> 
>  static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
>  {
> -	#define DOORBELL_CLR_TOUT_US		(1000 * 1000) /* 1 sec */
> -	int ret = 0;
> +	unsigned long deadline = jiffies + HZ;
> +	struct scsi_device *sdev;
> +	int res = 0;
> +
>  	/*
>  	 * make sure that there are no outstanding requests when
>  	 * clock scaling is in progress
>  	 */
> -	ufshcd_scsi_block_requests(hba);
> -	down_write(&hba->clk_scaling_lock);
> -	if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) {
> -		ret = -EBUSY;
> -		up_write(&hba->clk_scaling_lock);
> -		ufshcd_scsi_unblock_requests(hba);
> +	shost_for_each_device(sdev, hba->host)
> +		blk_freeze_queue_start(sdev->request_queue);
> +	blk_freeze_queue_start(hba->cmd_queue);
> +	blk_freeze_queue_start(hba->tmf_queue);
> +	shost_for_each_device(sdev, hba->host) {
> +		if (blk_mq_freeze_queue_wait_timeout(sdev->request_queue,
> +				max_t(long, 0, deadline - jiffies)) <= 0) {
> +			goto err;
> +		}
>  	}
> +	if (blk_mq_freeze_queue_wait_timeout(hba->cmd_queue,
> +				max_t(long, 0, deadline - jiffies)) <= 0)
> +		goto err;
> +	if (blk_mq_freeze_queue_wait_timeout(hba->tmf_queue,
> +				max_t(long, 0, deadline - jiffies)) <= 0)
> +		goto err;
> 
> -	return ret;
> +out:
> +	blk_mq_unfreeze_queue(hba->tmf_queue);
> +	blk_mq_unfreeze_queue(hba->cmd_queue);

Hi Bart,

After this point, dev commands can be sent on hba->cmd_queue.
But there are multiple entries which users can send dev commands through 
in ufs-sysfs.c.
So the clock scaling sequence lost its protection from being disturbed.

Best Regards,
Can Guo.

> +	return res;
> +
> +err:
> +	shost_for_each_device(sdev, hba->host)
> +		blk_mq_unfreeze_queue(sdev->request_queue);
> +	res = -ETIMEDOUT;
> +	goto out;
>  }
> 
>  static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
>  {
> -	up_write(&hba->clk_scaling_lock);
> -	ufshcd_scsi_unblock_requests(hba);
> +	struct scsi_device *sdev;
> +
> +	shost_for_each_device(sdev, hba->host)
> +		blk_mq_unfreeze_queue(sdev->request_queue);
>  }
> 
>  /**
> @@ -2394,9 +2357,6 @@ static int ufshcd_queuecommand(struct Scsi_Host
> *host, struct scsi_cmnd *cmd)
>  		BUG();
>  	}
> 
> -	if (!down_read_trylock(&hba->clk_scaling_lock))
> -		return SCSI_MLQUEUE_HOST_BUSY;
> -
>  	spin_lock_irqsave(hba->host->host_lock, flags);
>  	switch (hba->ufshcd_state) {
>  	case UFSHCD_STATE_OPERATIONAL:
> @@ -2462,7 +2422,6 @@ static int ufshcd_queuecommand(struct Scsi_Host
> *host, struct scsi_cmnd *cmd)
>  out_unlock:
>  	spin_unlock_irqrestore(hba->host->host_lock, flags);
>  out:
> -	up_read(&hba->clk_scaling_lock);
>  	return err;
>  }
> 
> @@ -2616,8 +2575,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba 
> *hba,
>  	struct completion wait;
>  	unsigned long flags;
> 
> -	down_read(&hba->clk_scaling_lock);
> -
>  	/*
>  	 * Get free slot, sleep if slots are unavailable.
>  	 * Even though we use wait_event() which sleeps indefinitely,
> @@ -2653,7 +2610,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba 
> *hba,
> 
>  out_put_tag:
>  	blk_put_request(req);
> -	up_read(&hba->clk_scaling_lock);
>  	return err;
>  }
> 
> @@ -5771,8 +5727,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct
> ufs_hba *hba,
>  	unsigned long flags;
>  	u32 upiu_flags;
> 
> -	down_read(&hba->clk_scaling_lock);
> -
>  	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
> @@ -5854,7 +5808,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct
> ufs_hba *hba,
>  	}
> 
>  	blk_put_request(req);
> -	up_read(&hba->clk_scaling_lock);
>  	return err;
>  }
> 
> @@ -8323,8 +8276,6 @@ int ufshcd_init(struct ufs_hba *hba, void
> __iomem *mmio_base, unsigned int irq)
>  	/* Initialize mutex for device management commands */
>  	mutex_init(&hba->dev_cmd.lock);
> 
> -	init_rwsem(&hba->clk_scaling_lock);
> -
>  	ufshcd_init_clk_gating(hba);
> 
>  	ufshcd_init_clk_scaling(hba);
> diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
> index 5865e16f53a6..5ebb920ae874 100644
> --- a/drivers/scsi/ufs/ufshcd.h
> +++ b/drivers/scsi/ufs/ufshcd.h
> @@ -724,7 +724,6 @@ struct ufs_hba {
>  	enum bkops_status urgent_bkops_lvl;
>  	bool is_urgent_bkops_lvl_checked;
> 
> -	struct rw_semaphore clk_scaling_lock;
>  	struct ufs_desc_size desc_size;
>  	atomic_t scsi_block_reqs_cnt;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
       [not found]               ` <0101016e7ca0e791-30050d63-c260-4cc3-a12b-658b7aa70031-000000@us-west-2.amazonses.com>
@ 2019-11-18 17:49                 ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-18 17:49 UTC (permalink / raw)
  To: cang
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 11/17/19 7:49 PM, cang@codeaurora.org wrote:
> After this point, dev commands can be sent on hba->cmd_queue again.
> But ufs-sysfs.c has multiple entries through which users can send dev commands,
> so the clock scaling sequence loses its protection against being disturbed.

Hi Can,

My understanding of the current implementation (Martin's 5.5/scsi-queue branch)
is that hba->clk_scaling_lock serializes clock scaling and submission of SCSI
and device commands but not the submission of TMFs. I think that the following
call chain allows user space to submit TMFs while clock scaling is in progress:

submit UPIU_TRANSACTION_TASK_REQ on an UFS BSG queue
-> ufs_bsg_request()
   -> ufshcd_exec_raw_upiu_cmd(msgcode = UPIU_TRANSACTION_TASK_REQ)
     -> ufshcd_exec_raw_upiu_cmd()
       -> __ufshcd_issue_tm_cmd()

Is that a feature or is that rather unintended behavior?

Anyway, can you have a look at the patch below and verify whether it preserves
existing behavior?

Thank you,

Bart.


Subject: [PATCH] ufs: Simplify the clock scaling mechanism implementation

Scaling the clock is only safe while no commands are in progress. The
current clock scaling implementation uses hba->clk_scaling_lock to
serialize clock scaling against the following three functions:
* ufshcd_queuecommand()          (uses sdev->request_queue)
* ufshcd_exec_dev_cmd()          (uses hba->cmd_queue)
* ufshcd_issue_devman_upiu_cmd() (uses hba->cmd_queue)

__ufshcd_issue_tm_cmd(), which uses hba->tmf_queue, is not serialized
against clock scaling.

Use blk_mq_{un,}freeze_queue() to block submission of new commands and to
wait for ongoing commands to complete. This patch removes a semaphore
down and up operation pair from the hot path.

Cc: Bean Huo <beanhuo@micron.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
  drivers/scsi/ufs/ufshcd.c | 116 +++++++++++---------------------------
  drivers/scsi/ufs/ufshcd.h |   1 -
  2 files changed, 34 insertions(+), 83 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 5314e8bfeeb6..46950cc87dc5 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -971,65 +971,6 @@ static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
  	return false;
  }

-static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
-					u64 wait_timeout_us)
-{
-	unsigned long flags;
-	int ret = 0;
-	u32 tm_doorbell;
-	u32 tr_doorbell;
-	bool timeout = false, do_last_check = false;
-	ktime_t start;
-
-	ufshcd_hold(hba, false);
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	/*
-	 * Wait for all the outstanding tasks/transfer requests.
-	 * Verify by checking the doorbell registers are clear.
-	 */
-	start = ktime_get();
-	do {
-		if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
-			ret = -EBUSY;
-			goto out;
-		}
-
-		tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
-		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-		if (!tm_doorbell && !tr_doorbell) {
-			timeout = false;
-			break;
-		} else if (do_last_check) {
-			break;
-		}
-
-		spin_unlock_irqrestore(hba->host->host_lock, flags);
-		schedule();
-		if (ktime_to_us(ktime_sub(ktime_get(), start)) >
-		    wait_timeout_us) {
-			timeout = true;
-			/*
-			 * We might have scheduled out for long time so make
-			 * sure to check if doorbells are cleared by this time
-			 * or not.
-			 */
-			do_last_check = true;
-		}
-		spin_lock_irqsave(hba->host->host_lock, flags);
-	} while (tm_doorbell || tr_doorbell);
-
-	if (timeout) {
-		dev_err(hba->dev,
-			"%s: timedout waiting for doorbell to clear (tm=0x%x, tr=0x%x)\n",
-			__func__, tm_doorbell, tr_doorbell);
-		ret = -EBUSY;
-	}
-out:
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_release(hba);
-	return ret;
-}
-
  /**
   * ufshcd_scale_gear - scale up/down UFS gear
   * @hba: per adapter instance
@@ -1079,27 +1020,50 @@ static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)

  static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
  {
-	#define DOORBELL_CLR_TOUT_US		(1000 * 1000) /* 1 sec */
-	int ret = 0;
+	unsigned long deadline = jiffies + HZ;
+	struct scsi_device *sdev;
+	int res = 0;
+
  	/*
  	 * make sure that there are no outstanding requests when
  	 * clock scaling is in progress
  	 */
-	ufshcd_scsi_block_requests(hba);
-	down_write(&hba->clk_scaling_lock);
-	if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) {
-		ret = -EBUSY;
-		up_write(&hba->clk_scaling_lock);
-		ufshcd_scsi_unblock_requests(hba);
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	blk_freeze_queue_start(hba->cmd_queue);
+	blk_freeze_queue_start(hba->tmf_queue);
+	shost_for_each_device(sdev, hba->host) {
+		if (blk_mq_freeze_queue_wait_timeout(sdev->request_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0) {
+			goto err;
+		}
  	}
+	if (blk_mq_freeze_queue_wait_timeout(hba->cmd_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;
+	if (blk_mq_freeze_queue_wait_timeout(hba->tmf_queue,
+				max_t(long, 0, deadline - jiffies)) <= 0)
+		goto err;

-	return ret;
+out:
+	blk_mq_unfreeze_queue(hba->tmf_queue);
+	return res;
+
+err:
+	blk_mq_unfreeze_queue(hba->cmd_queue);
+	shost_for_each_device(sdev, hba->host)
+		blk_mq_unfreeze_queue(sdev->request_queue);
+	res = -ETIMEDOUT;
+	goto out;
  }

  static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
  {
-	up_write(&hba->clk_scaling_lock);
-	ufshcd_scsi_unblock_requests(hba);
+	struct scsi_device *sdev;
+
+	blk_mq_unfreeze_queue(hba->cmd_queue);
+	shost_for_each_device(sdev, hba->host)
+		blk_mq_unfreeze_queue(sdev->request_queue);
  }

  /**
@@ -2394,9 +2358,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
  		BUG();
  	}

-	if (!down_read_trylock(&hba->clk_scaling_lock))
-		return SCSI_MLQUEUE_HOST_BUSY;
-
  	spin_lock_irqsave(hba->host->host_lock, flags);
  	switch (hba->ufshcd_state) {
  	case UFSHCD_STATE_OPERATIONAL:
@@ -2462,7 +2423,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
  out_unlock:
  	spin_unlock_irqrestore(hba->host->host_lock, flags);
  out:
-	up_read(&hba->clk_scaling_lock);
  	return err;
  }

@@ -2616,8 +2576,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
  	struct completion wait;
  	unsigned long flags;

-	down_read(&hba->clk_scaling_lock);
-
  	/*
  	 * Get free slot, sleep if slots are unavailable.
  	 * Even though we use wait_event() which sleeps indefinitely,
@@ -2653,7 +2611,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,

  out_put_tag:
  	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
  	return err;
  }

@@ -5771,8 +5728,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
  	unsigned long flags;
  	u32 upiu_flags;

-	down_read(&hba->clk_scaling_lock);
-
  	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
  	if (IS_ERR(req))
  		return PTR_ERR(req);
@@ -5854,7 +5809,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
  	}

  	blk_put_request(req);
-	up_read(&hba->clk_scaling_lock);
  	return err;
  }

@@ -8323,8 +8277,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
  	/* Initialize mutex for device management commands */
  	mutex_init(&hba->dev_cmd.lock);

-	init_rwsem(&hba->clk_scaling_lock);
-
  	ufshcd_init_clk_gating(hba);

  	ufshcd_init_clk_scaling(hba);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 5865e16f53a6..5ebb920ae874 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -724,7 +724,6 @@ struct ufs_hba {
  	enum bkops_status urgent_bkops_lvl;
  	bool is_urgent_bkops_lvl_checked;

-	struct rw_semaphore clk_scaling_lock;
  	struct ufs_desc_size desc_size;
  	atomic_t scsi_block_reqs_cnt;


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-15  6:01           ` Can Guo
  2019-11-15 16:23             ` Bart Van Assche
@ 2019-11-18 18:13             ` Bart Van Assche
  2019-11-19  5:33               ` cang
       [not found]               ` <0101016e822696b5-d1c358be-a0a2-4ef6-b04d-627c1fb361f8-000000@us-west-2.amazonses.com>
  1 sibling, 2 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-18 18:13 UTC (permalink / raw)
  To: Can Guo
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 11/14/19 10:01 PM, Can Guo wrote:
> On 2019-11-15 00:11, Bart Van Assche wrote:
>> On 11/13/19 8:03 AM, Bean Huo (beanhuo) wrote:
>>> I think, you are asking for comment from Can.  As for me, attached 
>>> patch is better.
>>> Removing ufshcd_wait_for_doorbell_clr(), instead of reading doorbell 
>>> register, Now
>>> using block layer blk_mq_{un,}freeze_queue(), looks good. I tested 
>>> your V5 patches,
>>> didn't find problem yet.
>>>
>>> Since my available platform doesn't support dynamic clk scaling, I 
>>> think, now seems only
>>> Qcom UFS controllers support this feature. So we need Can Guo to 
>>> finally confirm it.
>>
>> Do you agree with this patch series if patch 4 is replaced by the
>> patch attached to my previous e-mail? The entire (revised) series is
>> available at https://github.com/bvanassche/linux/tree/ufs-for-next.
>
> After ufshcd_clock_scaling_prepare() returns(no error), all request 
> queues are frozen. If failure
> happens(say power mode change command fails) after this point and error 
> handler kicks off,
> we need to send dev commands(in ufshcd_probe_hba()) to bring UFS back to 
> functionality.
> However, as the hba->cmd_queue is frozen, dev commands cannot be sent, 
> the error handler shall fail.

Hi Can,

My understanding of the current UFS driver code is that
ufshcd_clock_scaling_prepare() waits for ongoing commands to finish but 
not for SCSI error handling to finish. Would you agree with changing 
that behavior such that ufshcd_clock_scaling_prepare() returns an error
code if SCSI error handling is in progress? Do you agree that, once that 
change has been made, it is fine to invoke blk_freeze_queue_start() 
for all three types of block layer request queues (SCSI commands, device 
management commands and TMFs)? Do you agree that this would fix the 
issue that it is currently possible to submit TMFs from user space 
through the BSG queue while clock scaling is in progress?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-18 18:13             ` Bart Van Assche
@ 2019-11-19  5:33               ` cang
       [not found]               ` <0101016e822696b5-d1c358be-a0a2-4ef6-b04d-627c1fb361f8-000000@us-west-2.amazonses.com>
  1 sibling, 0 replies; 18+ messages in thread
From: cang @ 2019-11-19  5:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 2019-11-19 02:13, Bart Van Assche wrote:
> On 11/14/19 10:01 PM, Can Guo wrote:
>> On 2019-11-15 00:11, Bart Van Assche wrote:
>>> On 11/13/19 8:03 AM, Bean Huo (beanhuo) wrote:
>>>> I think, you are asking for comment from Can.  As for me, attached 
>>>> patch is better.
>>>> Removing ufshcd_wait_for_doorbell_clr(), instead of reading doorbell 
>>>> register, Now
>>>> using block layer blk_mq_{un,}freeze_queue(), looks good. I tested 
>>>> your V5 patches,
>>>> didn't find problem yet.
>>>> 
>>>> Since my available platform doesn't support dynamic clk scaling, I 
>>>> think, now seems only
>>>> Qcom UFS controllers support this feature. So we need Can Guo to 
>>>> finally confirm it.
>>> 
>>> Do you agree with this patch series if patch 4 is replaced by the
>>> patch attached to my previous e-mail? The entire (revised) series is
>>> available at https://github.com/bvanassche/linux/tree/ufs-for-next.
>> 
>> After ufshcd_clock_scaling_prepare() returns(no error), all request 
>> queues are frozen. If failure
>> happens(say power mode change command fails) after this point and 
>> error handler kicks off,
>> we need to send dev commands(in ufshcd_probe_hba()) to bring UFS back 
>> to functionality.
>> However, as the hba->cmd_queue is frozen, dev commands cannot be sent, 
>> the error handler shall fail.
> 
> Hi Can,
> 
> My understanding of the current UFS driver code is that
> ufshcd_clock_scaling_prepare() waits for ongoing commands to finish
> but not for SCSI error handling to finish. Would you agree with
> changing that behavior such that ufshcd_clock_scaling_prepare()
> returns an error
> code if SCSI error handling is in progress? Do you agree that once
> that change has been made that it is fine to invoke
> blk_freeze_queue_start() for all three types of block layer request
> queues (SCSI commands, device management commands and TMFs)? Do you
> agree that this would fix the issue that it is currently possible to
> submit TMFs from user space through the BSG queue while clock
> scaling is in progress?
> 
> Thanks,
> 
> Bart.

Hi Bart,

Your understanding is correct.

> Would you agree with changing that behavior such that 
> ufshcd_clock_scaling_prepare()
> returns an error code if SCSI error handling is in progress?

I believe we already have the ufshcd_eh_in_progress() check in
ufshcd_devfreq_target() before calling ufshcd_devfreq_scale().

> Do you agree that once that change has been made that it is fine to 
> invoke
> blk_freeze_queue_start() for all three types of block layer request
> queues (SCSI commands, device management commands and TMFs)?

I agree. As the error handling work always runs asynchronously in the 
current code base, it is fine to freeze all the queues. Any error 
handling work scheduled during scaling would eventually be unblocked 
when hba->cmd_queue is unfrozen, after scaling returns or fails.

Sorry for the confusion. My previous comment was based on the case where 
we need to do hibern8 enter/exit in vendor ops (see 
https://lore.kernel.org/patchwork/patch/1143579/): if hibern8 enter/exit 
fails during scaling, we need to call ufshcd_link_recovery() to recover 
everything in a synchronous way. And as hba->cmd_queue is frozen, 
ufshcd_link_recovery() would hang when it needs to send device 
management commands. In that case we need to make sure hba->cmd_queue is 
NOT frozen, so the two requirements seem contradictory.

FYI, even with the patch 
https://lore.kernel.org/patchwork/patch/1143579/, the current 
implementation has this problem. We have another change that addresses 
it by modifying ufshcd_exec_dev_cmd() as in the excerpt below.

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c2a2ff2..81a000e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4068,6 +4068,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	int tag;
 	struct completion wait;
 	unsigned long flags;
+	bool has_read_lock = false;

@@ -4075,8 +4076,10 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,

-	down_read(&hba->lock);
+	if (!ufshcd_eh_in_progress(hba)) {
+		down_read(&hba->lock);
+		has_read_lock = true;
+	}

 	/*
 	 * Get free slot, sleep if slots are unavailable.
@@ -4110,7 +4113,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 out_put_tag:
 	ufshcd_put_dev_cmd_tag(hba, tag);
 	wake_up(&hba->dev_cmd.tag_wq);
-	up_read(&hba->lock);
+	if (has_read_lock)
+		up_read(&hba->lock);
 	return err;
 }


Functionality aside, we need to consider this change carefully, as it 
may affect performance. With the old implementation, when devfreq 
decides to scale up, we can make sure that all requests sent to the UFS 
device after that point go through the highest UFS gear. Scaling up 
happens right after the doorbell is cleared; there is no need to wait 
until the requests are freed from the block layer. And 1 sec is long 
enough for the HW to clear the doorbell.

In the new implementation of ufshcd_clock_scaling_prepare(), after 
blk_freeze_queue_start() we call blk_mq_freeze_queue_wait_timeout() to 
wait for up to 1 sec. In addition to the requests that have already been 
queued to the HW doorbell, more requests can be dispatched within that 
second, through the lowest gear. My first impression is that this might 
be a bit lazy, and I am not sure how much it would affect the benchmarks.

And if the request queues are heavily loaded (many bios waiting for a 
free tag to become a request), is 1 sec long enough to freeze all these 
request queues? If not, performance would be affected because scaling up 
would fail frequently. As per my understanding, a request queue only 
becomes frozen after all the existing requests, and the bios that are 
already waiting for a free tag on the queue, have been completed and 
freed from the block layer. Please correct me if I am wrong.

Whereas with the old implementation, when devfreq decides to scale up, 
scaling can almost always happen, except for unexpected HW issues.

Best Regards,
Can Guo

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
       [not found]               ` <0101016e822696b5-d1c358be-a0a2-4ef6-b04d-627c1fb361f8-000000@us-west-2.amazonses.com>
@ 2019-11-19 23:16                 ` Bart Van Assche
  2019-11-20  3:38                   ` cang
       [not found]                   ` <0101016e86e3a961-1840fd1c-d71b-434a-8392-f62d7ece8b0f-000000@us-west-2.amazonses.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-19 23:16 UTC (permalink / raw)
  To: cang
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 11/18/19 9:33 PM, cang@codeaurora.org wrote:
> Aside the functionality, we need to take this change carefully as it may 
> relate with performance.
> With the old implementation, when devfreq decides to scale up, we can 
> make sure that all requests
> sent to UFS device after this point will go through the highest UFS 
> Gear. Scaling up will happen
> right after doorbell is cleared, not even need to wait till the requests 
> are freed from block layer.
> And 1 sec is long enough to clean out the doorbell by HW.
> 
> In the new implementation of ufshcd_clock_scaling_prepare(), after 
> blk_freeze_queue_start(), we call
> blk_mq_freeze_queue_wait_timeout() to wait for 1 sec. In addition to 
> those requests which have already
> been queued to HW doorbell, more requests will be dispatched within 1 
> sec, through the lowest Gear.
> My first impression is that it might be a bit lazy and I am not sure how 
> much it may affect the benchmarks.
> 
> And if the request queues are heavily loaded(many bios are waiting for a 
> free tag to become a request),
> is 1 sec long enough to freeze all these request queues? If no, 
> performance would be affected if scaling up
> fails frequently.
> As per my understanding, the request queue will become frozen only after 
> all the existing requests and
> bios(which are already waiting for a free tag on the queue) to be 
> completed and freed from block layer.
> Please correct me if I am wrong.
> 
> While, in the old implementation, when devfreq decides to scale up, 
> scaling can always
> happen for majority chances, except for those unexpected HW issues.

Hi Can,

I agree with the above description of the current behavior of the UFS 
driver, but I do not agree with the interpretation of the behavior of 
the changes I proposed. The block layer implements request queue 
freezing as follows:

void blk_freeze_queue_start(struct request_queue *q)
{
	mutex_lock(&q->mq_freeze_lock);
	if (++q->mq_freeze_depth == 1) {
		percpu_ref_kill(&q->q_usage_counter);
		mutex_unlock(&q->mq_freeze_lock);
		if (queue_is_mq(q))
			blk_mq_run_hw_queues(q, false);
	} else {
		mutex_unlock(&q->mq_freeze_lock);
	}
}

As long as calls from other threads to 
blk_mq_freeze_queue()/blk_mq_unfreeze_queue() are balanced, after 
blk_freeze_queue_start() returns it is guaranteed that 
q->q_usage_counter is in __PERCPU_REF_DEAD mode. Calls to 
blk_get_request() etc. block in that mode until the request queue is 
unfrozen. See also blk_queue_enter().

In other words, calling blk_freeze_queue() from inside 
ufshcd_clock_scaling_prepare() should preserve the current behavior, 
namely that ufshcd_clock_scaling_prepare() waits for ongoing requests to 
finish and also that submission of new requests is postponed until clock 
scaling has finished. Do you agree with this analysis?
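
For reference, here is a simplified sketch of what blk_queue_enter() does
(the BLK_MQ_REQ_NOWAIT and runtime PM details have been left out):

int blk_queue_enter_sketch(struct request_queue *q)
{
	while (true) {
		/*
		 * Fast path: take a q_usage_counter reference, unless the
		 * percpu ref has been killed by blk_freeze_queue_start().
		 */
		if (percpu_ref_tryget_live(&q->q_usage_counter))
			return 0;

		/*
		 * Slow path: sleep until the freeze depth drops back to
		 * zero, then retry the reference acquisition.
		 */
		wait_event(q->mq_freeze_wq,
			   !q->mq_freeze_depth || blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}
}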

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
  2019-11-19 23:16                 ` Bart Van Assche
@ 2019-11-20  3:38                   ` cang
       [not found]                   ` <0101016e86e3a961-1840fd1c-d71b-434a-8392-f62d7ece8b0f-000000@us-west-2.amazonses.com>
  1 sibling, 0 replies; 18+ messages in thread
From: cang @ 2019-11-20  3:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 2019-11-20 07:16, Bart Van Assche wrote:
> On 11/18/19 9:33 PM, cang@codeaurora.org wrote:
>> Aside the functionality, we need to take this change carefully as it 
>> may relate with performance.
>> With the old implementation, when devfreq decides to scale up, we can 
>> make sure that all requests
>> sent to UFS device after this point will go through the highest UFS 
>> Gear. Scaling up will happen
>> right after doorbell is cleared, not even need to wait till the 
>> requests are freed from block layer.
>> And 1 sec is long enough to clean out the doorbell by HW.
>> 
>> In the new implementation of ufshcd_clock_scaling_prepare(), after 
>> blk_freeze_queue_start(), we call
>> blk_mq_freeze_queue_wait_timeout() to wait for 1 sec. In addition to 
>> those requests which have already
>> been queued to HW doorbell, more requests will be dispatched within 1 
>> sec, through the lowest Gear.
>> My first impression is that it might be a bit lazy and I am not sure 
>> how much it may affect the benchmarks.
>> 
>> And if the request queues are heavily loaded(many bios are waiting for 
>> a free tag to become a request),
>> is 1 sec long enough to freeze all these request queues? If no, 
>> performance would be affected if scaling up
>> fails frequently.
>> As per my understanding, the request queue will become frozen only 
>> after all the existing requests and
>> bios(which are already waiting for a free tag on the queue) to be 
>> completed and freed from block layer.
>> Please correct me if I am wrong.
>> 
>> While, in the old implementation, when devfreq decides to scale up, 
>> scaling can always
>> happen for majority chances, except for those unexpected HW issues.
> 
> Hi Can,
> 
> I agree with the description above of the current behavior of the UFS
> driver. But I do not agree with the interpretation of the behavior of
> the changes I proposed. The implementation in the block layer of
> request queue freezing is as follows:
> 
> void blk_freeze_queue_start(struct request_queue *q)
> {
> 	mutex_lock(&q->mq_freeze_lock);
> 	if (++q->mq_freeze_depth == 1) {
> 		percpu_ref_kill(&q->q_usage_counter);
> 		mutex_unlock(&q->mq_freeze_lock);
> 		if (queue_is_mq(q))
> 			blk_mq_run_hw_queues(q, false);
> 	} else {
> 		mutex_unlock(&q->mq_freeze_lock);
> 	}
> }
> 
> As long as calls from other threads to
> blk_mq_freeze_queue()/blk_mq_unfreeze_queue() are balanced, after
> blk_freeze_queue_start() returns it is guaranteed that
> q->q_usage_counter is in __PERCPU_REF_DEAD mode. Calls to
> blk_get_request() etc. block in that mode until the request queue is
> unfrozen. See also blk_queue_enter().
> 
> In other words, calling blk_freeze_queue() from inside
> ufshcd_clock_scaling_prepare() should preserve the current behavior,
> namely that ufshcd_clock_scaling_prepare() waits for ongoing requests
> to finish and also that submission of new requests is postponed until
> clock scaling has finished. Do you agree with this analysis?
> 
> Thanks,
> 
> Bart.

Hi Bart,

I am still not quite clear about the blk_freeze_queue() part.

>> In the new implementation of ufshcd_clock_scaling_prepare(), after 
>> blk_freeze_queue_start(), we call
>> blk_mq_freeze_queue_wait_timeout() to wait for 1 sec. In addition to 
>> those requests which have already
>> been queued to HW doorbell, more requests will be dispatched within 1 
>> sec, through the lowest Gear.
>> My first impression is that it might be a bit lazy and I am not sure 
>> how much it may affect the benchmarks.

First of all, regarding the lines above, do you agree that there can be 
extra latency in scaling up/down compared with the old implementation?

I can understand that after the queue is frozen, all calls to 
blk_get_request() are blocked and no submission can be made past this 
point, because blk_queue_enter() waits on q->mq_freeze_wq:

<--code snippet-->
wait_event(q->mq_freeze_wq,
            (atomic_read(&q->mq_freeze_depth) == 0 &&
<--code snippet-->

>> And if the request queues are heavily loaded(many bios are waiting for 
>> a free tag to become a request),
>> is 1 sec long enough to freeze all these request queue?

But here I meant the bios that were sent through submit_bio() before we 
call blk_freeze_queue_start(): they have already entered 
blk_mq_get_request() and already returned from blk_queue_enter_live(), 
and are now waiting for a free tag (sleeping in blk_mq_get_tag() because 
the queue is full). Can the request queue become frozen before these 
bios are handled?
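
The path I am thinking of is the following (illustrative; based on my
reading of the block layer code):

submit_bio()
-> generic_make_request()
  -> blk_queue_enter()            (takes a q_usage_counter reference)
  -> blk_mq_make_request()
    -> blk_mq_get_request()
      -> blk_queue_enter_live()   (takes another reference for the request)
      -> blk_mq_get_tag()         (sleeps here while the queue is full)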

void blk_mq_freeze_queue_wait(struct request_queue *q)
{
  	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}
EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);

Per my understanding, these bios have already incremented 
q->q_usage_counter, and q->q_usage_counter will only drop to 0 once all 
the requests in the queue, and these bios (after they get free tags and 
are turned into requests), have been completed and freed from the block 
layer, i.e. after blk_queue_exit() has been called in 
__blk_mq_free_request() for all of them. Please correct me if I am 
wrong.

Thanks,

Can Guo.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation
       [not found]                   ` <0101016e86e3a961-1840fd1c-d71b-434a-8392-f62d7ece8b0f-000000@us-west-2.amazonses.com>
@ 2019-11-20 17:58                     ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2019-11-20 17:58 UTC (permalink / raw)
  To: cang
  Cc: Bean Huo (beanhuo),
	Martin K . Petersen, James E . J . Bottomley, Avri Altman,
	Asutosh Das, stummala, Vignesh Raghavendra, linux-scsi,
	Stanley Chu, Tomas Winkler

On 11/19/19 7:38 PM, cang@codeaurora.org wrote:
> On 2019-11-20 07:16, Bart Van Assche wrote:
>> On 11/18/19 9:33 PM, cang@codeaurora.org wrote:
[ ... ]
> 
> I am still not quite clear about the blk_freeze_queue() part.
> 
>>> In the new implementation of ufshcd_clock_scaling_prepare(), after 
>>> blk_freeze_queue_start(), we call blk_mq_freeze_queue_wait_timeout()
>>> to wait for 1 sec. In addition to those requests which have already
>>> been queued to HW doorbell, more requests will be dispatched within 1 
>>> sec, through the lowest Gear. My first impression is that it might be
>>> a bit lazy and I am not sure how much it may affect the benchmarks.
> 
> First of all, reg above lines, do you agree that there can be latency in 
> scaling up/down
> comparing with the old implementation?
> 
> I can understand that after queue is frozen, all calls to blk_get_request()
> are blocked, no submission can be made after this point, due to
> blk_queue_enter() shall wait on q->mq_freeze_wq.
> 
> <--code snippet-->
> wait_event(q->mq_freeze_wq,
>             (atomic_read(&q->mq_freeze_depth) == 0 &&
> <--code snippet-->
> 
>>> And if the request queues are heavily loaded(many bios are waiting 
>>> for a free tag to become a request),
>>> is 1 sec long enough to freeze all these request queue?
> 
> But here I meant those bios, before we call blk_freeze_queue_start(), 
> sent through
> submit_bio() calls which have already entered blk_mq_get_request() and 
> already
> returned from the blk_queue_enter_live(). These bios are waiting for a 
> free tag
> (waiting on func blk_mq_get_tag() when queue is full).
> Shall the request queue be frozen before these bios are handled?
> 
> void blk_mq_freeze_queue_wait(struct request_queue *q)
> {
>       wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
> }
> EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
> 
> Per my understanding, these bios have already increased 
> &q->q_usage_counter.
> And the &q->q_usage_counter will only become 0 when all of the requests 
> in the
> queue and these bios(after they get free tags and turned into requests) are
> completed from block layer, meaning after blk_queue_exit() is called in
> __blk_mq_free_request() for all of them. Please correct me if I am wrong.

Hi Can,

Please have another look at the request queue freezing mechanism in the 
block layer. If blk_queue_enter() sleeps in wait_event(), that implies 
either that the percpu_ref_tryget_live() call failed, or that it 
succeeded and was followed by a percpu_ref_put() call. Ignoring 
concurrent q_usage_counter changes, in both cases q->q_usage_counter has 
the same value as before blk_queue_enter() started.

In other words, there shouldn't be a latency difference between the 
current and the proposed approach since neither approach waits for 
completion of bios or SCSI commands that are queued after clock scaling 
started.
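
Schematically (illustrative timeline):

T0: thread A calls blk_queue_enter(); percpu_ref_tryget_live() fails
    because the queue is frozen, so q_usage_counter is unchanged and A
    sleeps in wait_event().
T1: clock scaling finishes; blk_mq_unfreeze_queue() drops mq_freeze_depth
    back to zero and wakes up q->mq_freeze_wq.
T2: thread A retries percpu_ref_tryget_live(), succeeds and submits its
    request, exactly as it would have done with the rwsem-based approach.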

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-11-20 17:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 17:37 [PATCH v5 0/4] Simplify and optimize the UFS driver Bart Van Assche
2019-11-12 17:37 ` [PATCH v5 1/4] ufs: Serialize error handling and command submission Bart Van Assche
2019-11-12 17:37 ` [PATCH v5 2/4] ufs: Avoid busy-waiting by eliminating tag conflicts Bart Van Assche
2019-11-12 17:37 ` [PATCH v5 3/4] ufs: Use blk_{get,put}_request() to allocate and free TMFs Bart Van Assche
2019-11-12 17:37 ` [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation Bart Van Assche
2019-11-13  0:11   ` cang
2019-11-13  0:55     ` Bart Van Assche
2019-11-13 16:03       ` [EXT] " Bean Huo (beanhuo)
2019-11-14 16:11         ` Bart Van Assche
2019-11-15  6:01           ` Can Guo
2019-11-15 16:23             ` Bart Van Assche
2019-11-18  3:49               ` cang
     [not found]               ` <0101016e7ca0e791-30050d63-c260-4cc3-a12b-658b7aa70031-000000@us-west-2.amazonses.com>
2019-11-18 17:49                 ` Bart Van Assche
2019-11-18 18:13             ` Bart Van Assche
2019-11-19  5:33               ` cang
     [not found]               ` <0101016e822696b5-d1c358be-a0a2-4ef6-b04d-627c1fb361f8-000000@us-west-2.amazonses.com>
2019-11-19 23:16                 ` Bart Van Assche
2019-11-20  3:38                   ` cang
     [not found]                   ` <0101016e86e3a961-1840fd1c-d71b-434a-8392-f62d7ece8b0f-000000@us-west-2.amazonses.com>
2019-11-20 17:58                     ` Bart Van Assche
