linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/3] scsi: ufs: Fix task management request completion timeout
       [not found] <1611807365-35513-1-git-send-email-cang@codeaurora.org>
@ 2021-01-28  4:16 ` Can Guo
  2021-01-29  3:22   ` Bart Van Assche
  2021-01-28  4:16 ` [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl Can Guo
  2021-01-28  4:16 ` [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs Can Guo
  2 siblings, 1 reply; 14+ messages in thread
From: Can Guo @ 2021-01-28  4:16 UTC (permalink / raw)
  To: jaegeuk, bvanassche, asutoshd, nguyenb, hongwus, linux-scsi,
	kernel-team, cang
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

ufshcd_tmc_handler() calls blk_mq_tagset_busy_iter(fn = ufshcd_compl_tm()),
but since blk_mq_tagset_busy_iter() only iterates over all reserved tags
and requests which are not in IDLE state, ufshcd_compl_tm() never gets a
chance to run. Thus, TMR always ends up with completion timeout. Fix it by
calling blk_mq_start_request() in  __ufshcd_issue_tm_cmd().

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")

Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 8da75e6..c0c5925 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6395,6 +6395,7 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 
 	spin_lock_irqsave(host->host_lock, flags);
 	task_tag = hba->nutrs + free_slot;
+	blk_mq_start_request(req);
 
 	treq->req_header.dword_0 |= cpu_to_be32(task_tag);
 
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl
       [not found] <1611807365-35513-1-git-send-email-cang@codeaurora.org>
  2021-01-28  4:16 ` [PATCH v3 1/3] scsi: ufs: Fix task management request completion timeout Can Guo
@ 2021-01-28  4:16 ` Can Guo
  2021-01-29  3:20   ` Bart Van Assche
  2021-01-28  4:16 ` [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs Can Guo
  2 siblings, 1 reply; 14+ messages in thread
From: Can Guo @ 2021-01-28  4:16 UTC (permalink / raw)
  To: jaegeuk, bvanassche, asutoshd, nguyenb, hongwus, linux-scsi,
	kernel-team, cang
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

ufshcd_compl_tm() looks for all 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL
and call complete() for each req who has the req->end_io_data set. There
can be a race condition btw tmc send/compl, because the req->end_io_data is
set, in __ufshcd_issue_tm_cmd(), without host lock protection, so it is
possible that when ufshcd_compl_tm() checks the req->end_io_data, it is set
but the corresponding tag has not been set in REG_UTP_TASK_REQ_DOOR_BELL.
Thus, ufshcd_tmc_handler() may wrongly complete TMRs which have not been
sent out. Fix it by protecting req->end_io_data with host lock, and let
ufshcd_compl_tm() only handle those tm cmds which have been completed
instead of looking for 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL.

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")

Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c0c5925..43894a3 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6225,7 +6225,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba)
 
 struct ctm_info {
 	struct ufs_hba	*hba;
-	unsigned long	pending;
+	unsigned long	completed;
 	unsigned int	ncpl;
 };
 
@@ -6234,13 +6234,13 @@ static bool ufshcd_compl_tm(struct request *req, void *priv, bool reserved)
 	struct ctm_info *const ci = priv;
 	struct completion *c;
 
-	WARN_ON_ONCE(reserved);
-	if (test_bit(req->tag, &ci->pending))
-		return true;
-	ci->ncpl++;
-	c = req->end_io_data;
-	if (c)
-		complete(c);
+	if (test_bit(req->tag, &ci->completed)) {
+		__clear_bit(req->tag, &ci->hba->outstanding_tasks);
+		ci->ncpl++;
+		c = req->end_io_data;
+		if (c)
+			complete(c);
+	}
 	return true;
 }
 
@@ -6255,12 +6255,19 @@ static bool ufshcd_compl_tm(struct request *req, void *priv, bool reserved)
 static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 {
 	struct request_queue *q = hba->tmf_queue;
+	u32 tm_doorbell;
+	unsigned long completed;
 	struct ctm_info ci = {
-		.hba	 = hba,
-		.pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
+		.hba = hba,
+		.ncpl = 0,
 	};
 
-	blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
+	completed = tm_doorbell ^ hba->outstanding_tasks;
+	if (completed) {
+		ci.completed = completed;
+		blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
+	}
 	return ci.ncpl ? IRQ_HANDLED : IRQ_NONE;
 }
 
@@ -6388,12 +6395,12 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	req->end_io_data = &wait;
 	free_slot = req->tag;
 	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
 	ufshcd_hold(hba, false);
 
 	spin_lock_irqsave(host->host_lock, flags);
+	req->end_io_data = &wait;
 	task_tag = hba->nutrs + free_slot;
 	blk_mq_start_request(req);
 
@@ -6420,11 +6427,13 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	err = wait_for_completion_io_timeout(&wait,
 			msecs_to_jiffies(TM_CMD_TIMEOUT));
 	if (!err) {
+		spin_lock_irqsave(hba->host->host_lock, flags);
 		/*
 		 * Make sure that ufshcd_compl_tm() does not trigger a
 		 * use-after-free.
 		 */
 		req->end_io_data = NULL;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs
       [not found] <1611807365-35513-1-git-send-email-cang@codeaurora.org>
  2021-01-28  4:16 ` [PATCH v3 1/3] scsi: ufs: Fix task management request completion timeout Can Guo
  2021-01-28  4:16 ` [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl Can Guo
@ 2021-01-28  4:16 ` Can Guo
  2021-01-29  3:15   ` Bart Van Assche
  2 siblings, 1 reply; 14+ messages in thread
From: Can Guo @ 2021-01-28  4:16 UTC (permalink / raw)
  To: jaegeuk, bvanassche, asutoshd, nguyenb, hongwus, linux-scsi,
	kernel-team, cang
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Sujit Reddy Thumma,
	Vinayak Holikatti, Yaniv Gardi, open list

In __ufshcd_issue_tm_cmd(), it is not right to use hba->nutrs + req->tag as
the Task Tag in one TMR UPIU. Directly use req->tag as the Task Tag.

Fixes: e293313262d3 ("scsi: ufs: Fix broken task management command implementation")

Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 43894a3..72fa14b 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6384,38 +6384,33 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request *req;
 	unsigned long flags;
-	int free_slot, task_tag, err;
+	int task_tag, err;
 
 	/*
-	 * Get free slot, sleep if slots are unavailable.
-	 * Even though we use wait_event() which sleeps indefinitely,
-	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
+	 * blk_get_request() used here is only to get a free tag.
 	 */
 	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	free_slot = req->tag;
-	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
 	ufshcd_hold(hba, false);
-
 	spin_lock_irqsave(host->host_lock, flags);
 	req->end_io_data = &wait;
-	task_tag = hba->nutrs + free_slot;
 	blk_mq_start_request(req);
 
+	task_tag = req->tag;
 	treq->req_header.dword_0 |= cpu_to_be32(task_tag);
 
-	memcpy(hba->utmrdl_base_addr + free_slot, treq, sizeof(*treq));
-	ufshcd_vops_setup_task_mgmt(hba, free_slot, tm_function);
+	memcpy(hba->utmrdl_base_addr + task_tag, treq, sizeof(*treq));
+	ufshcd_vops_setup_task_mgmt(hba, task_tag, tm_function);
 
 	/* send command to the controller */
-	__set_bit(free_slot, &hba->outstanding_tasks);
+	__set_bit(task_tag, &hba->outstanding_tasks);
 
 	/* Make sure descriptors are ready before ringing the task doorbell */
 	wmb();
 
-	ufshcd_writel(hba, 1 << free_slot, REG_UTP_TASK_REQ_DOOR_BELL);
+	ufshcd_writel(hba, 1 << task_tag, REG_UTP_TASK_REQ_DOOR_BELL);
 	/* Make sure that doorbell is committed immediately */
 	wmb();
 
@@ -6437,24 +6432,23 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);
-		if (ufshcd_clear_tm_cmd(hba, free_slot))
-			dev_WARN(hba->dev, "%s: unable clear tm cmd (slot %d) after timeout\n",
-					__func__, free_slot);
+		if (ufshcd_clear_tm_cmd(hba, task_tag)) {
+			dev_WARN(hba->dev, "%s: unable to clear tm cmd (slot %d) after timeout\n",
+					__func__, task_tag);
+		} else {
+			__clear_bit(task_tag, &hba->outstanding_tasks);
+		}
 		err = -ETIMEDOUT;
 	} else {
 		err = 0;
-		memcpy(treq, hba->utmrdl_base_addr + free_slot, sizeof(*treq));
+		memcpy(treq, hba->utmrdl_base_addr + task_tag, sizeof(*treq));
 
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_COMP);
 	}
 
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	__clear_bit(free_slot, &hba->outstanding_tasks);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-
+	ufshcd_release(hba);
 	blk_put_request(req);
 
-	ufshcd_release(hba);
 	return err;
 }
 
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs
  2021-01-28  4:16 ` [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs Can Guo
@ 2021-01-29  3:15   ` Bart Van Assche
  2021-01-29  5:57     ` Can Guo
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2021-01-29  3:15 UTC (permalink / raw)
  To: Can Guo, jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Sujit Reddy Thumma,
	Vinayak Holikatti, Yaniv Gardi, open list

On 1/27/21 8:16 PM, Can Guo wrote:
> In __ufshcd_issue_tm_cmd(), it is not right to use hba->nutrs + req->tag as
> the Task Tag in one TMR UPIU. Directly use req->tag as the Task Tag.

Why is the current code wrong and why is this patch the proper fix?
Please explain this in the patch description.

> +	 * blk_get_request() used here is only to get a free tag.

Please fix the word order in this comment ("blk_get_request() is used
here only to get a free tag").

> +	ufshcd_release(hba);
>  	blk_put_request(req);
>  
> -	ufshcd_release(hba);

An explanation for this change is missing from the patch description.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl
  2021-01-28  4:16 ` [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl Can Guo
@ 2021-01-29  3:20   ` Bart Van Assche
  2021-01-29  6:06     ` Can Guo
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2021-01-29  3:20 UTC (permalink / raw)
  To: Can Guo, jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

On 1/27/21 8:16 PM, Can Guo wrote:
> ufshcd_compl_tm() looks for all 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL
> and call complete() for each req who has the req->end_io_data set. There
> can be a race condition btw tmc send/compl, because the req->end_io_data is
> set, in __ufshcd_issue_tm_cmd(), without host lock protection, so it is
> possible that when ufshcd_compl_tm() checks the req->end_io_data, it is set
> but the corresponding tag has not been set in REG_UTP_TASK_REQ_DOOR_BELL.
> Thus, ufshcd_tmc_handler() may wrongly complete TMRs which have not been
> sent out. Fix it by protecting req->end_io_data with host lock, and let
> ufshcd_compl_tm() only handle those tm cmds which have been completed
> instead of looking for 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL.

I don't know any other block driver that needs locking to protect races
between submission and completion context. Can the block layer timeout
mechanism be used instead of the mechanism introduced by this patch,
e.g. by using blk_execute_rq_nowait() to submit requests? That would
allow to reuse the existing mechanism in the block layer core to handle
races between request completion and timeout handling.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] scsi: ufs: Fix task management request completion timeout
  2021-01-28  4:16 ` [PATCH v3 1/3] scsi: ufs: Fix task management request completion timeout Can Guo
@ 2021-01-29  3:22   ` Bart Van Assche
  2021-01-29  5:46     ` Can Guo
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2021-01-29  3:22 UTC (permalink / raw)
  To: Can Guo, jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

On 1/27/21 8:16 PM, Can Guo wrote:
> ufshcd_tmc_handler() calls blk_mq_tagset_busy_iter(fn = ufshcd_compl_tm()),
> but since blk_mq_tagset_busy_iter() only iterates over all reserved tags
> and requests which are not in IDLE state, ufshcd_compl_tm() never gets a
> chance to run. Thus, TMR always ends up with completion timeout. Fix it by
> calling blk_mq_start_request() in  __ufshcd_issue_tm_cmd().
> 
> Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
> 
> Signed-off-by: Can Guo <cang@codeaurora.org>
> ---
>  drivers/scsi/ufs/ufshcd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 8da75e6..c0c5925 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6395,6 +6395,7 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>  
>  	spin_lock_irqsave(host->host_lock, flags);
>  	task_tag = hba->nutrs + free_slot;
> +	blk_mq_start_request(req);
>  
>  	treq->req_header.dword_0 |= cpu_to_be32(task_tag);

blk_mq_start_request() not only marks a request as in-flight but also
starts a timer. However, no timeout handler has been defined in
ufshcd_tmf_ops. Should a timeout handler be defined in that data structure?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/3] scsi: ufs: Fix task management request completion timeout
  2021-01-29  3:22   ` Bart Van Assche
@ 2021-01-29  5:46     ` Can Guo
  0 siblings, 0 replies; 14+ messages in thread
From: Can Guo @ 2021-01-29  5:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

On 2021-01-29 11:22, Bart Van Assche wrote:
> On 1/27/21 8:16 PM, Can Guo wrote:
>> ufshcd_tmc_handler() calls blk_mq_tagset_busy_iter(fn = 
>> ufshcd_compl_tm()),
>> but since blk_mq_tagset_busy_iter() only iterates over all reserved 
>> tags
>> and requests which are not in IDLE state, ufshcd_compl_tm() never gets 
>> a
>> chance to run. Thus, TMR always ends up with completion timeout. Fix 
>> it by
>> calling blk_mq_start_request() in  __ufshcd_issue_tm_cmd().
>> 
>> Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to 
>> allocate and free TMFs")
>> 
>> Signed-off-by: Can Guo <cang@codeaurora.org>
>> ---
>>  drivers/scsi/ufs/ufshcd.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 8da75e6..c0c5925 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6395,6 +6395,7 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba 
>> *hba,
>> 
>>  	spin_lock_irqsave(host->host_lock, flags);
>>  	task_tag = hba->nutrs + free_slot;
>> +	blk_mq_start_request(req);
>> 
>>  	treq->req_header.dword_0 |= cpu_to_be32(task_tag);
> 
> blk_mq_start_request() not only marks a request as in-flight but also
> starts a timer. However, no timeout handler has been defined in
> ufshcd_tmf_ops. Should a timeout handler be defined in that data 
> structure?
> 

Block mq driver gives 30s as default timeout,
TMR timeout is 100ms in UFS driver. So we don't
need a timeout handler as of now.

Thanks,
Can Guo.

> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs
  2021-01-29  3:15   ` Bart Van Assche
@ 2021-01-29  5:57     ` Can Guo
  2021-02-01  2:39       ` Bart Van Assche
  0 siblings, 1 reply; 14+ messages in thread
From: Can Guo @ 2021-01-29  5:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Sujit Reddy Thumma,
	Vinayak Holikatti, Yaniv Gardi, open list

On 2021-01-29 11:15, Bart Van Assche wrote:
> On 1/27/21 8:16 PM, Can Guo wrote:
>> In __ufshcd_issue_tm_cmd(), it is not right to use hba->nutrs + 
>> req->tag as
>> the Task Tag in one TMR UPIU. Directly use req->tag as the Task Tag.
> 
> Why is the current code wrong and why is this patch the proper fix?
> Please explain this in the patch description.
> 

req->tag is the tag allocated for one TMR, no?

>> +	 * blk_get_request() used here is only to get a free tag.
> 
> Please fix the word order in this comment ("blk_get_request() is used
> here only to get a free tag").

Sure.

> 
>> +	ufshcd_release(hba);
>>  	blk_put_request(req);
>> 
>> -	ufshcd_release(hba);
> 
> An explanation for this change is missing from the patch description.
> 

This is just for symmetric coding since this change is almost
re-writing the whole func - at the entrence it calls blk_get_request()
and ufshcd_hold(), so before exit it'd be good to call ufshcd_release()
before blk_put_request(). If you think this single line change worths
a separate patch, I can split it out in next version.

Thanks,
Can Guo.

> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl
  2021-01-29  3:20   ` Bart Van Assche
@ 2021-01-29  6:06     ` Can Guo
  2021-01-29  6:29       ` Can Guo
  0 siblings, 1 reply; 14+ messages in thread
From: Can Guo @ 2021-01-29  6:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

On 2021-01-29 11:20, Bart Van Assche wrote:
> On 1/27/21 8:16 PM, Can Guo wrote:
>> ufshcd_compl_tm() looks for all 0 bits in the 
>> REG_UTP_TASK_REQ_DOOR_BELL
>> and call complete() for each req who has the req->end_io_data set. 
>> There
>> can be a race condition btw tmc send/compl, because the 
>> req->end_io_data is
>> set, in __ufshcd_issue_tm_cmd(), without host lock protection, so it 
>> is
>> possible that when ufshcd_compl_tm() checks the req->end_io_data, it 
>> is set
>> but the corresponding tag has not been set in 
>> REG_UTP_TASK_REQ_DOOR_BELL.
>> Thus, ufshcd_tmc_handler() may wrongly complete TMRs which have not 
>> been
>> sent out. Fix it by protecting req->end_io_data with host lock, and 
>> let
>> ufshcd_compl_tm() only handle those tm cmds which have been completed
>> instead of looking for 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL.
> 
> I don't know any other block driver that needs locking to protect races
> between submission and completion context. Can the block layer timeout
> mechanism be used instead of the mechanism introduced by this patch,
> e.g. by using blk_execute_rq_nowait() to submit requests? That would
> allow to reuse the existing mechanism in the block layer core to handle
> races between request completion and timeout handling.

This patch is not introducing any new mechanism, it is fixing the
usage of completion (req->end_io_data = c) introduced by commit
69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate
and free TMFs"). If you have better idea to get it fixed once for
all, we are glad to take your change to get it fixed asap.

Regards,

Can Guo.

> 
> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl
  2021-01-29  6:06     ` Can Guo
@ 2021-01-29  6:29       ` Can Guo
  2021-02-01  2:27         ` Bart Van Assche
  0 siblings, 1 reply; 14+ messages in thread
From: Can Guo @ 2021-01-29  6:29 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

On 2021-01-29 14:06, Can Guo wrote:
> On 2021-01-29 11:20, Bart Van Assche wrote:
>> On 1/27/21 8:16 PM, Can Guo wrote:
>>> ufshcd_compl_tm() looks for all 0 bits in the 
>>> REG_UTP_TASK_REQ_DOOR_BELL
>>> and call complete() for each req who has the req->end_io_data set. 
>>> There
>>> can be a race condition btw tmc send/compl, because the 
>>> req->end_io_data is
>>> set, in __ufshcd_issue_tm_cmd(), without host lock protection, so it 
>>> is
>>> possible that when ufshcd_compl_tm() checks the req->end_io_data, it 
>>> is set
>>> but the corresponding tag has not been set in 
>>> REG_UTP_TASK_REQ_DOOR_BELL.
>>> Thus, ufshcd_tmc_handler() may wrongly complete TMRs which have not 
>>> been
>>> sent out. Fix it by protecting req->end_io_data with host lock, and 
>>> let
>>> ufshcd_compl_tm() only handle those tm cmds which have been completed
>>> instead of looking for 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL.
>> 
>> I don't know any other block driver that needs locking to protect 
>> races
>> between submission and completion context. Can the block layer timeout
>> mechanism be used instead of the mechanism introduced by this patch,
>> e.g. by using blk_execute_rq_nowait() to submit requests? That would
>> allow to reuse the existing mechanism in the block layer core to 
>> handle
>> races between request completion and timeout handling.
> 
> This patch is not introducing any new mechanism, it is fixing the
> usage of completion (req->end_io_data = c) introduced by commit
> 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate
> and free TMFs"). If you have better idea to get it fixed once for
> all, we are glad to take your change to get it fixed asap.
> 
> Regards,
> 
> Can Guo.
> 

On second thought, actually the 1st fix alone is enough to eliminate the
race condition. Because blk_mq_tagset_busy_iter() only iterates over all
requests which are not in IDLE state, if blk_mq_start_request() is 
called
within the protection of host spin lock, ufshcd_compl_tm() shall not run
into the scenario where req->end_io_data is set but 
REG_UTP_TASK_REQ_DOOR_BELL
has not been set. What do you think?

Thanks,

Can Guo.

>> 
>> Thanks,
>> 
>> Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl
  2021-01-29  6:29       ` Can Guo
@ 2021-02-01  2:27         ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2021-02-01  2:27 UTC (permalink / raw)
  To: Can Guo
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, open list

On 1/28/21 10:29 PM, Can Guo wrote:
> On second thought, actually the 1st fix alone is enough to eliminate the
> race condition. Because blk_mq_tagset_busy_iter() only iterates over all
> requests which are not in IDLE state, if blk_mq_start_request() is called
> within the protection of host spin lock, ufshcd_compl_tm() shall not run
> into the scenario where req->end_io_data is set but
> REG_UTP_TASK_REQ_DOOR_BELL
> has not been set. What do you think?

That sounds reasonable to me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs
  2021-01-29  5:57     ` Can Guo
@ 2021-02-01  2:39       ` Bart Van Assche
  2021-02-05  6:09         ` Can Guo
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2021-02-01  2:39 UTC (permalink / raw)
  To: Can Guo
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Sujit Reddy Thumma,
	Vinayak Holikatti, Yaniv Gardi, open list

On 1/28/21 9:57 PM, Can Guo wrote:
> On 2021-01-29 11:15, Bart Van Assche wrote:
>> On 1/27/21 8:16 PM, Can Guo wrote:
>>> In __ufshcd_issue_tm_cmd(), it is not right to use hba->nutrs +
>>> req->tag as
>>> the Task Tag in one TMR UPIU. Directly use req->tag as the Task Tag.
>>
>> Why is the current code wrong and why is this patch the proper fix?
>> Please explain this in the patch description.
> 
> req->tag is the tag allocated for one TMR, no?

Hi Can,
 Commit e293313262d3 ("scsi: ufs: Fix broken task management command
implementation") includes the following changes:

+       task_tag = hba->nutrs + free_slot;
        task_req_upiup->header.dword_0 =
                UPIU_HEADER_DWORD(UPIU_TRANSACTION_TASK_REQ, 0,
-                                            lrbp->lun, lrbp->task_tag);
+                                            lun_id, task_tag);
        task_req_upiup->header.dword_1 =
                UPIU_HEADER_DWORD(0, tm_function, 0, 0);

As one can see the value written in dword_0 starts at hba->nutrs. Was
that code correct? If that code was correct, does your patch perhaps
break task management support?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs
  2021-02-01  2:39       ` Bart Van Assche
@ 2021-02-05  6:09         ` Can Guo
  2021-02-07  2:50           ` Bart Van Assche
  0 siblings, 1 reply; 14+ messages in thread
From: Can Guo @ 2021-02-05  6:09 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Sujit Reddy Thumma,
	Vinayak Holikatti, Yaniv Gardi, open list

On 2021-02-01 10:39, Bart Van Assche wrote:
> On 1/28/21 9:57 PM, Can Guo wrote:
>> On 2021-01-29 11:15, Bart Van Assche wrote:
>>> On 1/27/21 8:16 PM, Can Guo wrote:
>>>> In __ufshcd_issue_tm_cmd(), it is not right to use hba->nutrs +
>>>> req->tag as
>>>> the Task Tag in one TMR UPIU. Directly use req->tag as the Task Tag.
>>> 
>>> Why is the current code wrong and why is this patch the proper fix?
>>> Please explain this in the patch description.
>> 
>> req->tag is the tag allocated for one TMR, no?
> 
> Hi Can,
>  Commit e293313262d3 ("scsi: ufs: Fix broken task management command
> implementation") includes the following changes:
> 
> +       task_tag = hba->nutrs + free_slot;
>         task_req_upiup->header.dword_0 =
>                 UPIU_HEADER_DWORD(UPIU_TRANSACTION_TASK_REQ, 0,
> -                                            lrbp->lun, 
> lrbp->task_tag);
> +                                            lun_id, task_tag);
>         task_req_upiup->header.dword_1 =
>                 UPIU_HEADER_DWORD(0, tm_function, 0, 0);
> 
> As one can see the value written in dword_0 starts at hba->nutrs. Was
> that code correct? If that code was correct, does your patch perhaps
> break task management support?

That code is wrong. The Task Tag in Dword_0 should be the real tag we
allocated for TMR. The transfer request Task Tag which we are trying to
abort is given in Dword_5, which is the Input Parameter 3 of the TMR 
UPIU.
I am not sure why the author gave hba->nutrs + req->tag as the Task Tag
of one TMR, the commit msg abot this part is not quite informative....

Table 10.22 — Task Management Request UPIU
TASK MANAGEMENT REQUEST UPIU
----------------------------------
|0         |1      |2   |3       |
----------------------------------
|xx00 0100b| Flags |LUN |Task Tag|
----------------------------------
...
16 (MSB)   |17     |18  |19 (LSB)|
----------------------------------
Input Parameter 2
----------------------------------

Table 10.24 — Task Management Input Parameters
Field Description
Input Parameter 2 LSB: Task Tag of the task/command operated by the task 
management function.

Thanks,

Can Guo.

> 
> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs
  2021-02-05  6:09         ` Can Guo
@ 2021-02-07  2:50           ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2021-02-07  2:50 UTC (permalink / raw)
  To: Can Guo
  Cc: jaegeuk, asutoshd, nguyenb, hongwus, linux-scsi, kernel-team,
	Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Sujit Reddy Thumma,
	Vinayak Holikatti, Yaniv Gardi, open list

On 2/4/21 10:09 PM, Can Guo wrote:
> That code is wrong. The Task Tag in Dword_0 should be the real tag we
> allocated for TMR. The transfer request Task Tag which we are trying to
> abort is given in Dword_5, which is the Input Parameter 3 of the TMR UPIU.
> I am not sure why the author gave hba->nutrs + req->tag as the Task Tag
> of one TMR, the commit msg abot this part is not quite informative....
> 
> Table 10.22 — Task Management Request UPIU
> TASK MANAGEMENT REQUEST UPIU
> ----------------------------------
> |0         |1      |2   |3       |
> ----------------------------------
> |xx00 0100b| Flags |LUN |Task Tag|
> ----------------------------------
> ...
> 16 (MSB)   |17     |18  |19 (LSB)|
> ----------------------------------
> Input Parameter 2
> ----------------------------------
> 
> Table 10.24 — Task Management Input Parameters
> Field Description
> Input Parameter 2 LSB: Task Tag of the task/command operated by the task
> management function.

Thanks for the clarification. Feel free to add the following to this patch:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-02-07  2:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1611807365-35513-1-git-send-email-cang@codeaurora.org>
2021-01-28  4:16 ` [PATCH v3 1/3] scsi: ufs: Fix task management request completion timeout Can Guo
2021-01-29  3:22   ` Bart Van Assche
2021-01-29  5:46     ` Can Guo
2021-01-28  4:16 ` [PATCH v3 2/3] scsi: ufs: Fix a race condition btw task management request send and compl Can Guo
2021-01-29  3:20   ` Bart Van Assche
2021-01-29  6:06     ` Can Guo
2021-01-29  6:29       ` Can Guo
2021-02-01  2:27         ` Bart Van Assche
2021-01-28  4:16 ` [PATCH v3 3/3] scsi: ufs: Fix wrong Task Tag used in task management request UPIUs Can Guo
2021-01-29  3:15   ` Bart Van Assche
2021-01-29  5:57     ` Can Guo
2021-02-01  2:39       ` Bart Van Assche
2021-02-05  6:09         ` Can Guo
2021-02-07  2:50           ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).