All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
       [not found] <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p6>
@ 2021-10-27 22:36 ` Daejun Park
  2021-10-28  4:16   ` Bart Van Assche
                     ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Daejun Park @ 2021-10-27 22:36 UTC (permalink / raw)
  To: ALIM AKHTAR, avri.altman, jejb, martin.petersen, Daejun Park,
	huobean, bvanassche, Keoseong Park
  Cc: linux-scsi, linux-kernel

This patch addresses the issue of using the wrong API to create a
pre_request for HPB READ.
HPB READ candidate that require a pre-request will try to allocate a
pre-request only during request_timeout_ms (default: 0). Otherwise, it is
passed as normal READ, so deadlock problem can be resolved.

Signed-off-by: Daejun Park <daejun7.park@samsung.com>
---
 drivers/scsi/ufs/ufshpb.c | 11 +++++------
 drivers/scsi/ufs/ufshpb.h |  1 +
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index 02fb51ae8b25..3117bd47d762 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
 				 read_id);
 	rq->cmd_len = scsi_command_size(rq->cmd);
 
-	if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
-		return -EAGAIN;
+	blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);
 
 	hpb->stats.pre_req_cnt++;
 
@@ -2315,19 +2314,19 @@ struct attribute_group ufs_sysfs_hpb_param_group = {
 static int ufshpb_pre_req_mempool_init(struct ufshpb_lu *hpb)
 {
 	struct ufshpb_req *pre_req = NULL, *t;
-	int qd = hpb->sdev_ufs_lu->queue_depth / 2;
 	int i;
 
 	INIT_LIST_HEAD(&hpb->lh_pre_req_free);
 
-	hpb->pre_req = kcalloc(qd, sizeof(struct ufshpb_req), GFP_KERNEL);
-	hpb->throttle_pre_req = qd;
+	hpb->pre_req = kcalloc(HPB_INFLIGHT_PRE_REQ, sizeof(struct ufshpb_req),
+			       GFP_KERNEL);
+	hpb->throttle_pre_req = HPB_INFLIGHT_PRE_REQ;
 	hpb->num_inflight_pre_req = 0;
 
 	if (!hpb->pre_req)
 		goto release_mem;
 
-	for (i = 0; i < qd; i++) {
+	for (i = 0; i < HPB_INFLIGHT_PRE_REQ; i++) {
 		pre_req = hpb->pre_req + i;
 		INIT_LIST_HEAD(&pre_req->list_req);
 		pre_req->req = NULL;
diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
index a79e07398970..411a6d625f53 100644
--- a/drivers/scsi/ufs/ufshpb.h
+++ b/drivers/scsi/ufs/ufshpb.h
@@ -50,6 +50,7 @@
 #define HPB_RESET_REQ_RETRIES			10
 #define HPB_MAP_REQ_RETRIES			5
 #define HPB_REQUEUE_TIME_MS			0
+#define HPB_INFLIGHT_PRE_REQ			4
 
 #define HPB_SUPPORT_VERSION			0x200
 #define HPB_SUPPORT_LEGACY_VERSION		0x100
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-27 22:36 ` [PATCH] scsi: ufs: Fix proper API to send HPB pre-request Daejun Park
@ 2021-10-28  4:16   ` Bart Van Assche
  2021-10-28 14:28   ` James Bottomley
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2021-10-28  4:16 UTC (permalink / raw)
  To: daejun7.park, ALIM AKHTAR, avri.altman, jejb, martin.petersen,
	huobean, Keoseong Park
  Cc: linux-scsi, linux-kernel

On 10/27/21 15:36, Daejun Park wrote:
> This patch addresses the issue of using the wrong API to create a
> pre_request for HPB READ.
> HPB READ candidate that require a pre-request will try to allocate a
> pre-request only during request_timeout_ms (default: 0). Otherwise, it is
> passed as normal READ, so deadlock problem can be resolved.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-27 22:36 ` [PATCH] scsi: ufs: Fix proper API to send HPB pre-request Daejun Park
  2021-10-28  4:16   ` Bart Van Assche
@ 2021-10-28 14:28   ` James Bottomley
  2021-10-28 15:27     ` Christoph Hellwig
  2021-10-28 15:59     ` Bart Van Assche
  2021-10-28 15:22   ` James Bottomley
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: James Bottomley @ 2021-10-28 14:28 UTC (permalink / raw)
  To: daejun7.park, ALIM AKHTAR, avri.altman, martin.petersen, huobean,
	bvanassche, Keoseong Park
  Cc: linux-scsi, linux-kernel

On Thu, 2021-10-28 at 07:36 +0900, Daejun Park wrote:
> This patch addresses the issue of using the wrong API to create a
> pre_request for HPB READ.
> HPB READ candidate that require a pre-request will try to allocate a
> pre-request only during request_timeout_ms (default: 0). Otherwise,
> it is
> passed as normal READ, so deadlock problem can be resolved.
> 
> Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> ---
>  drivers/scsi/ufs/ufshpb.c | 11 +++++------
>  drivers/scsi/ufs/ufshpb.h |  1 +
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> index 02fb51ae8b25..3117bd47d762 100644
> --- a/drivers/scsi/ufs/ufshpb.c
> +++ b/drivers/scsi/ufs/ufshpb.c
> @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct
> ufshpb_lu *hpb, struct scsi_cmnd *cmd,
>  				 read_id);
>  	rq->cmd_len = scsi_command_size(rq->cmd);
>  
> -	if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> -		return -EAGAIN;
> +	blk_execute_rq_nowait(NULL, req, true,
> ufshpb_pre_req_compl_fn);
>  
>  	hpb->stats.pre_req_cnt++;
>  
> @@ -2315,19 +2314,19 @@ struct attribute_group
> ufs_sysfs_hpb_param_group = {
>  static int ufshpb_pre_req_mempool_init(struct ufshpb_lu *hpb)
>  {
>  	struct ufshpb_req *pre_req = NULL, *t;
> -	int qd = hpb->sdev_ufs_lu->queue_depth / 2;
>  	int i;
>  
>  	INIT_LIST_HEAD(&hpb->lh_pre_req_free);
>  
> -	hpb->pre_req = kcalloc(qd, sizeof(struct ufshpb_req),
> GFP_KERNEL);
> -	hpb->throttle_pre_req = qd;
> +	hpb->pre_req = kcalloc(HPB_INFLIGHT_PRE_REQ, sizeof(struct
> ufshpb_req),
> +			       GFP_KERNEL);
> +	hpb->throttle_pre_req = HPB_INFLIGHT_PRE_REQ;
>  	hpb->num_inflight_pre_req = 0;
>  
>  	if (!hpb->pre_req)
>  		goto release_mem;
>  
> -	for (i = 0; i < qd; i++) {
> +	for (i = 0; i < HPB_INFLIGHT_PRE_REQ; i++) {
>  		pre_req = hpb->pre_req + i;
>  		INIT_LIST_HEAD(&pre_req->list_req);
>  		pre_req->req = NULL;
> diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
> index a79e07398970..411a6d625f53 100644
> --- a/drivers/scsi/ufs/ufshpb.h
> +++ b/drivers/scsi/ufs/ufshpb.h
> @@ -50,6 +50,7 @@
>  #define HPB_RESET_REQ_RETRIES			10
>  #define HPB_MAP_REQ_RETRIES			5
>  #define HPB_REQUEUE_TIME_MS			0
> +#define HPB_INFLIGHT_PRE_REQ			4

If the block people are happy with this, then I'm OK with it, but it
doesn't look like you've solved the fanout deadlock problem because
this new mechanism is still going to allocate a new tag.

James



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-27 22:36 ` [PATCH] scsi: ufs: Fix proper API to send HPB pre-request Daejun Park
  2021-10-28  4:16   ` Bart Van Assche
  2021-10-28 14:28   ` James Bottomley
@ 2021-10-28 15:22   ` James Bottomley
  2021-10-29  1:32   ` Ming Lei
       [not found]   ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p3>
  4 siblings, 0 replies; 18+ messages in thread
From: James Bottomley @ 2021-10-28 15:22 UTC (permalink / raw)
  To: daejun7.park, ALIM AKHTAR, avri.altman, martin.petersen, huobean,
	bvanassche, Keoseong Park
  Cc: linux-scsi, linux-kernel, linux-block, Jens Axboe, Christoph Hellwig

+Jens, Christoph and linux-block

On Thu, 2021-10-28 at 07:36 +0900, Daejun Park wrote:
> This patch addresses the issue of using the wrong API to create a
> pre_request for HPB READ.
> HPB READ candidate that require a pre-request will try to allocate a
> pre-request only during request_timeout_ms (default: 0). Otherwise,
> it is
> passed as normal READ, so deadlock problem can be resolved.
> 
> Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> ---
>  drivers/scsi/ufs/ufshpb.c | 11 +++++------
>  drivers/scsi/ufs/ufshpb.h |  1 +
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> index 02fb51ae8b25..3117bd47d762 100644
> --- a/drivers/scsi/ufs/ufshpb.c
> +++ b/drivers/scsi/ufs/ufshpb.c
> @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct
> ufshpb_lu *hpb, struct scsi_cmnd *cmd,
>  				 read_id);
>  	rq->cmd_len = scsi_command_size(rq->cmd);
>  
> -	if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> -		return -EAGAIN;
> +	blk_execute_rq_nowait(NULL, req, true,
> ufshpb_pre_req_compl_fn);
>  
>  	hpb->stats.pre_req_cnt++;
>  
> @@ -2315,19 +2314,19 @@ struct attribute_group
> ufs_sysfs_hpb_param_group = {
>  static int ufshpb_pre_req_mempool_init(struct ufshpb_lu *hpb)
>  {
>  	struct ufshpb_req *pre_req = NULL, *t;
> -	int qd = hpb->sdev_ufs_lu->queue_depth / 2;
>  	int i;
>  
>  	INIT_LIST_HEAD(&hpb->lh_pre_req_free);
>  
> -	hpb->pre_req = kcalloc(qd, sizeof(struct ufshpb_req),
> GFP_KERNEL);
> -	hpb->throttle_pre_req = qd;
> +	hpb->pre_req = kcalloc(HPB_INFLIGHT_PRE_REQ, sizeof(struct
> ufshpb_req),
> +			       GFP_KERNEL);
> +	hpb->throttle_pre_req = HPB_INFLIGHT_PRE_REQ;
>  	hpb->num_inflight_pre_req = 0;
>  
>  	if (!hpb->pre_req)
>  		goto release_mem;
>  
> -	for (i = 0; i < qd; i++) {
> +	for (i = 0; i < HPB_INFLIGHT_PRE_REQ; i++) {
>  		pre_req = hpb->pre_req + i;
>  		INIT_LIST_HEAD(&pre_req->list_req);
>  		pre_req->req = NULL;
> diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
> index a79e07398970..411a6d625f53 100644
> --- a/drivers/scsi/ufs/ufshpb.h
> +++ b/drivers/scsi/ufs/ufshpb.h
> @@ -50,6 +50,7 @@
>  #define HPB_RESET_REQ_RETRIES			10
>  #define HPB_MAP_REQ_RETRIES			5
>  #define HPB_REQUEUE_TIME_MS			0
> +#define HPB_INFLIGHT_PRE_REQ			4
>  
>  #define HPB_SUPPORT_VERSION			0x200
>  #define HPB_SUPPORT_LEGACY_VERSION		0x100



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-28 14:28   ` James Bottomley
@ 2021-10-28 15:27     ` Christoph Hellwig
  2021-10-28 17:07       ` Bart Van Assche
  2021-10-28 15:59     ` Bart Van Assche
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2021-10-28 15:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: daejun7.park, ALIM AKHTAR, avri.altman, martin.petersen, huobean,
	bvanassche, Keoseong Park, linux-scsi, linux-kernel

On Thu, Oct 28, 2021 at 10:28:01AM -0400, James Bottomley wrote:
> If the block people are happy with this, then I'm OK with it, but it
> doesn't look like you've solved the fanout deadlock problem because
> this new mechanism is still going to allocate a new tag.

Yes, same problem as before.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-28 14:28   ` James Bottomley
  2021-10-28 15:27     ` Christoph Hellwig
@ 2021-10-28 15:59     ` Bart Van Assche
  2021-10-28 19:12       ` James Bottomley
       [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p8>
  1 sibling, 2 replies; 18+ messages in thread
From: Bart Van Assche @ 2021-10-28 15:59 UTC (permalink / raw)
  To: jejb, daejun7.park, ALIM AKHTAR, avri.altman, martin.petersen,
	huobean, Keoseong Park
  Cc: linux-scsi, linux-kernel, linux-block, Jens Axboe, Christoph Hellwig

On 10/28/21 7:28 AM, James Bottomley wrote:
> If the block people are happy with this, then I'm OK with it, but it
> doesn't look like you've solved the fanout deadlock problem because
> this new mechanism is still going to allocate a new tag.

(+Jens, Christoph and linux-block)

Hi James,

My understanding is that the UFS HPB code makes ufshcd_queuecommand()
return SCSI_MLQUEUE_HOST_BUSY if the pool with pre-allocated requests is
exhausted. This will make the SCSI core reissue a SCSI command until
completion of another command has freed up one of the pre-allocated
requests. This is not the most efficient approach but should not trigger
a deadlock.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-28 15:27     ` Christoph Hellwig
@ 2021-10-28 17:07       ` Bart Van Assche
  2021-10-28 17:10         ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2021-10-28 17:07 UTC (permalink / raw)
  To: Christoph Hellwig, James Bottomley
  Cc: daejun7.park, ALIM AKHTAR, avri.altman, martin.petersen, huobean,
	Keoseong Park, linux-scsi, linux-kernel

On 10/28/21 8:27 AM, Christoph Hellwig wrote:
> On Thu, Oct 28, 2021 at 10:28:01AM -0400, James Bottomley wrote:
>> If the block people are happy with this, then I'm OK with it, but it
>> doesn't look like you've solved the fanout deadlock problem because
>> this new mechanism is still going to allocate a new tag.
> 
> Yes, same problem as before.

Hi Christoph,

I spent some time looking around for other examples of allocating and
inserting a request from inside block layer callbacks. I only found one
such example, namely in the NVMe core. nvme_timeout() calls
nvme_alloc_request() and blk_execute_rq_nowait(). The difference between
what the UFS HPB code is doing and what nvme_timeout() does doesn't seem
that big to me. For clarity, I don't like the UFS HPB protocol nor how
support for that protocol has been implemented. However, I don't see how
the UFS HPB implementation would complicate maintenance of the block
layer core. Am I perhaps missing something?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-28 17:07       ` Bart Van Assche
@ 2021-10-28 17:10         ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2021-10-28 17:10 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, James Bottomley, daejun7.park, ALIM AKHTAR,
	avri.altman, martin.petersen, huobean, Keoseong Park, linux-scsi,
	linux-kernel

On Thu, Oct 28, 2021 at 10:07:52AM -0700, Bart Van Assche wrote:
> I spent some time looking around for other examples of allocating and
> inserting a request from inside block layer callbacks. I only found one
> such example, namely in the NVMe core. nvme_timeout() calls
> nvme_alloc_request() and blk_execute_rq_nowait(). The difference between
> what the UFS HPB code is doing and what nvme_timeout() does doesn't seem
> that big to me.

The difference is that nvme_timeout allocates a request on the
admin queue, and only does so for commands on the I/O queues.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-28 15:59     ` Bart Van Assche
@ 2021-10-28 19:12       ` James Bottomley
  2021-10-28 20:04         ` Bart Van Assche
       [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p8>
  1 sibling, 1 reply; 18+ messages in thread
From: James Bottomley @ 2021-10-28 19:12 UTC (permalink / raw)
  To: Bart Van Assche, daejun7.park, ALIM AKHTAR, avri.altman,
	martin.petersen, huobean, Keoseong Park
  Cc: linux-scsi, linux-kernel, linux-block, Jens Axboe, Christoph Hellwig

On Thu, 2021-10-28 at 08:59 -0700, Bart Van Assche wrote:
> On 10/28/21 7:28 AM, James Bottomley wrote:
> > If the block people are happy with this, then I'm OK with it, but
> > it
> > doesn't look like you've solved the fanout deadlock problem because
> > this new mechanism is still going to allocate a new tag.
> 
> (+Jens, Christoph and linux-block)
> 
> Hi James,
> 
> My understanding is that the UFS HPB code makes ufshcd_queuecommand()
> return SCSI_MLQUEUE_HOST_BUSY if the pool with pre-allocated requests
> is exhausted. This will make the SCSI core reissue a SCSI command
> until completion of another command has freed up one of the pre-
> allocated requests. This is not the most efficient approach but
> should not trigger a deadlock.

I think the deadlock is triggered if the system is down to its last
reserved request on the memory clearing device and the next entry in
the queue for this device is one which does a fanout so we can't
service it with the single reserved request we have left for the
purposes of making forward progress.  Sending it back doesn't help,
assuming this is the only memory clearing path, because retrying it
won't help ... we have to succeed with a request on this path to move
forward with clearing memory.

I think this problem could be solved by processing the WRITE BUFFER and
the request serially by hijacking the request sent down, but we can't
solve it if we try to allocate a new request.

James



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-28 19:12       ` James Bottomley
@ 2021-10-28 20:04         ` Bart Van Assche
  2021-10-28 20:12           ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2021-10-28 20:04 UTC (permalink / raw)
  To: jejb, daejun7.park, ALIM AKHTAR, avri.altman, martin.petersen,
	huobean, Keoseong Park
  Cc: linux-scsi, linux-kernel, linux-block, Jens Axboe,
	Christoph Hellwig, Jaegeuk Kim

On 10/28/21 12:12 PM, James Bottomley wrote:
> I think the deadlock is triggered if the system is down to its last
> reserved request on the memory clearing device and the next entry in
> the queue for this device is one which does a fanout so we can't
> service it with the single reserved request we have left for the
> purposes of making forward progress.  Sending it back doesn't help,
> assuming this is the only memory clearing path, because retrying it
> won't help ... we have to succeed with a request on this path to move
> forward with clearing memory.
> 
> I think this problem could be solved by processing the WRITE BUFFER and
> the request serially by hijacking the request sent down, but we can't
> solve it if we try to allocate a new request.

Hi James,

How about fixing the abuse of blk_insert_cloned_request() in the UFS HPB
before the v5.16 SCSI pull request is sent to Linus and implementing the
proposal from your email at a later time? I'm proposing to defer further
UFS HPB rework since the issue described above only affects UFS HPB users
and does not obstruct maintenance or refactoring of the block layer core.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-28 20:04         ` Bart Van Assche
@ 2021-10-28 20:12           ` James Bottomley
  0 siblings, 0 replies; 18+ messages in thread
From: James Bottomley @ 2021-10-28 20:12 UTC (permalink / raw)
  To: Bart Van Assche, daejun7.park, ALIM AKHTAR, avri.altman,
	martin.petersen, huobean, Keoseong Park
  Cc: linux-scsi, linux-kernel, linux-block, Jens Axboe,
	Christoph Hellwig, Jaegeuk Kim

On Thu, 2021-10-28 at 13:04 -0700, Bart Van Assche wrote:
> On 10/28/21 12:12 PM, James Bottomley wrote:
> > I think the deadlock is triggered if the system is down to its last
> > reserved request on the memory clearing device and the next entry
> > in the queue for this device is one which does a fanout so we can't
> > service it with the single reserved request we have left for the
> > purposes of making forward progress.  Sending it back doesn't help,
> > assuming this is the only memory clearing path, because retrying it
> > won't help ... we have to succeed with a request on this path to
> > move forward with clearing memory.
> > 
> > I think this problem could be solved by processing the WRITE BUFFER
> > and the request serially by hijacking the request sent down, but we
> > can't solve it if we try to allocate a new request.
> 
> Hi James,
> 
> How about fixing the abuse of blk_insert_cloned_request() in the UFS
> HPB before the v5.16 SCSI pull request is sent to Linus and
> implementing the proposal from your email at a later time? I'm
> proposing to defer further UFS HPB rework since the issue described
> above only affects UFS HPB users and does not obstruct maintenance or
> refactoring of the block layer core.

Well, yes, I'm already on record as saying we need to do that and add
the functionality back compatibly in a later release.  I think excising
the WRITE BUFFER path, which is simply an optimization and will affect
performance but not function, solves the above issue (and the clone API
problem as well) completely but I haven't heard the patch I proposed
has actually been tested yet.

James



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
       [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p8>
@ 2021-10-28 21:20         ` Daejun Park
  0 siblings, 0 replies; 18+ messages in thread
From: Daejun Park @ 2021-10-28 21:20 UTC (permalink / raw)
  To: jejb, Bart Van Assche, Daejun Park, ALIM AKHTAR, avri.altman,
	martin.petersen, huobean, Keoseong Park
  Cc: linux-scsi, linux-kernel, linux-block, Jens Axboe, Christoph Hellwig

Hi James,

> On Thu, 2021-10-28 at 08:59 -0700, Bart Van Assche wrote:
> > On 10/28/21 7:28 AM, James Bottomley wrote:
> > > If the block people are happy with this, then I'm OK with it, but
> > > it
> > > doesn't look like you've solved the fanout deadlock problem because
> > > this new mechanism is still going to allocate a new tag.
> > 
> > (+Jens, Christoph and linux-block)
> > 
> > Hi James,
> > 
> > My understanding is that the UFS HPB code makes ufshcd_queuecommand()
> > return SCSI_MLQUEUE_HOST_BUSY if the pool with pre-allocated requests
> > is exhausted. This will make the SCSI core reissue a SCSI command
> > until completion of another command has freed up one of the pre-
> > allocated requests. This is not the most efficient approach but
> > should not trigger a deadlock.
>  
> I think the deadlock is triggered if the system is down to its last
> reserved request on the memory clearing device and the next entry in
> the queue for this device is one which does a fanout so we can't
> service it with the single reserved request we have left for the
> purposes of making forward progress.  Sending it back doesn't help,
> assuming this is the only memory clearing path, because retrying it
> won't help ... we have to succeed with a request on this path to move
> forward with clearing memory.
The above approach can retry several times (before the HPB timeout) but, it
gives up to allocate pre-request and it sends as just READ. So deadlock can
be avoided.

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-27 22:36 ` [PATCH] scsi: ufs: Fix proper API to send HPB pre-request Daejun Park
                     ` (2 preceding siblings ...)
  2021-10-28 15:22   ` James Bottomley
@ 2021-10-29  1:32   ` Ming Lei
       [not found]   ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p3>
  4 siblings, 0 replies; 18+ messages in thread
From: Ming Lei @ 2021-10-29  1:32 UTC (permalink / raw)
  To: Daejun Park
  Cc: ALIM AKHTAR, avri.altman, jejb, martin.petersen, huobean,
	bvanassche, Keoseong Park, linux-scsi, linux-kernel, ming.lei

On Thu, Oct 28, 2021 at 07:36:19AM +0900, Daejun Park wrote:
> This patch addresses the issue of using the wrong API to create a
> pre_request for HPB READ.
> HPB READ candidate that require a pre-request will try to allocate a
> pre-request only during request_timeout_ms (default: 0). Otherwise, it is

Can you explain about 'only during request_timeout_ms'?

From the following code in ufshpb_prep(), the pre-request is allocated
for each READ IO in case of (!ufshpb_is_legacy(hba) && ufshpb_is_required_wb(hpb,
transfer_len)).

   if (!ufshpb_is_legacy(hba) &&
            ufshpb_is_required_wb(hpb, transfer_len)) {
                err = ufshpb_issue_pre_req(hpb, cmd, &read_id);

> passed as normal READ, so deadlock problem can be resolved.
> 
> Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> ---
>  drivers/scsi/ufs/ufshpb.c | 11 +++++------
>  drivers/scsi/ufs/ufshpb.h |  1 +
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> index 02fb51ae8b25..3117bd47d762 100644
> --- a/drivers/scsi/ufs/ufshpb.c
> +++ b/drivers/scsi/ufs/ufshpb.c
> @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
>  				 read_id);
>  	rq->cmd_len = scsi_command_size(rq->cmd);
>  
> -	if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> -		return -EAGAIN;
> +	blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);

Be care with above change, blk_insert_cloned_request() allocates
driver tag and issues the request to LLD directly, then returns the
result. If anything fails in the code path, -EAGAIN is returned.

But blk_execute_rq_nowait() simply queued the request in block layer,
and run hw queue. It doesn't allocate driver tag, and doesn't issue it
to LLD.

So ufshpb_execute_pre_req() may think the pre-request is issued to LLD
successfully, but actually not, maybe never. What will happen after the
READ IO is issued to device, but the pre-request(write buffer) isn't
sent to device?

>  
>  	hpb->stats.pre_req_cnt++;
>  
> @@ -2315,19 +2314,19 @@ struct attribute_group ufs_sysfs_hpb_param_group = {
>  static int ufshpb_pre_req_mempool_init(struct ufshpb_lu *hpb)
>  {
>  	struct ufshpb_req *pre_req = NULL, *t;
> -	int qd = hpb->sdev_ufs_lu->queue_depth / 2;
>  	int i;
>  
>  	INIT_LIST_HEAD(&hpb->lh_pre_req_free);
>  
> -	hpb->pre_req = kcalloc(qd, sizeof(struct ufshpb_req), GFP_KERNEL);
> -	hpb->throttle_pre_req = qd;
> +	hpb->pre_req = kcalloc(HPB_INFLIGHT_PRE_REQ, sizeof(struct ufshpb_req),
> +			       GFP_KERNEL);
> +	hpb->throttle_pre_req = HPB_INFLIGHT_PRE_REQ;
>  	hpb->num_inflight_pre_req = 0;
>  
>  	if (!hpb->pre_req)
>  		goto release_mem;
>  
> -	for (i = 0; i < qd; i++) {
> +	for (i = 0; i < HPB_INFLIGHT_PRE_REQ; i++) {
>  		pre_req = hpb->pre_req + i;
>  		INIT_LIST_HEAD(&pre_req->list_req);
>  		pre_req->req = NULL;
> diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
> index a79e07398970..411a6d625f53 100644
> --- a/drivers/scsi/ufs/ufshpb.h
> +++ b/drivers/scsi/ufs/ufshpb.h
> @@ -50,6 +50,7 @@
>  #define HPB_RESET_REQ_RETRIES			10
>  #define HPB_MAP_REQ_RETRIES			5
>  #define HPB_REQUEUE_TIME_MS			0
> +#define HPB_INFLIGHT_PRE_REQ			4

Can you explain how this change solves the deadlock?

Thanks,
Ming


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
       [not found]   ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p3>
@ 2021-10-29  1:50     ` Daejun Park
  2021-10-29  2:10       ` Ming Lei
       [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p4>
  0 siblings, 2 replies; 18+ messages in thread
From: Daejun Park @ 2021-10-29  1:50 UTC (permalink / raw)
  To: Ming Lei, Daejun Park
  Cc: ALIM AKHTAR, avri.altman, jejb, martin.petersen, huobean,
	bvanassche, Keoseong Park, linux-scsi, linux-kernel

> On Thu, Oct 28, 2021 at 07:36:19AM +0900, Daejun Park wrote:
> > This patch addresses the issue of using the wrong API to create a
> > pre_request for HPB READ.
> > HPB READ candidate that require a pre-request will try to allocate a
> > pre-request only during request_timeout_ms (default: 0). Otherwise, it is
>  
> Can you explain about 'only during request_timeout_ms'?
>  
> From the following code in ufshpb_prep(), the pre-request is allocated
> for each READ IO in case of (!ufshpb_is_legacy(hba) && ufshpb_is_required_wb(hpb,
> transfer_len)).
>  
>    if (!ufshpb_is_legacy(hba) &&
>             ufshpb_is_required_wb(hpb, transfer_len)) {
>                 err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
>  
> > passed as normal READ, so deadlock problem can be resolved.
> > 
> > Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> > ---
> >  drivers/scsi/ufs/ufshpb.c | 11 +++++------
> >  drivers/scsi/ufs/ufshpb.h |  1 +
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> > index 02fb51ae8b25..3117bd47d762 100644
> > --- a/drivers/scsi/ufs/ufshpb.c
> > +++ b/drivers/scsi/ufs/ufshpb.c
> > @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
> >                                   read_id);
> >          rq->cmd_len = scsi_command_size(rq->cmd);
> >  
> > -        if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> > -                return -EAGAIN;
> > +        blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);
>  
> Be care with above change, blk_insert_cloned_request() allocates
> driver tag and issues the request to LLD directly, then returns the
> result. If anything fails in the code path, -EAGAIN is returned.
>  
> But blk_execute_rq_nowait() simply queued the request in block layer,
> and run hw queue. It doesn't allocate driver tag, and doesn't issue it
> to LLD.
>  
> So ufshpb_execute_pre_req() may think the pre-request is issued to LLD
> successfully, but actually not, maybe never. What will happen after the
> READ IO is issued to device, but the pre-request(write buffer) isn't
> sent to device?

In that case, the HPB READ cannot get benefit from pre-request. But it is not
common case.

> Can you explain how this change solves the deadlock?

The deadlock is happen when the READ waiting allocation of pre-request. But
the timeout code makes to stop waiting after given time later.

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-29  1:50     ` Daejun Park
@ 2021-10-29  2:10       ` Ming Lei
       [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p4>
  1 sibling, 0 replies; 18+ messages in thread
From: Ming Lei @ 2021-10-29  2:10 UTC (permalink / raw)
  To: Daejun Park
  Cc: ALIM AKHTAR, avri.altman, jejb, martin.petersen, huobean,
	bvanassche, Keoseong Park, linux-scsi, linux-kernel, ming.lei

On Fri, Oct 29, 2021 at 10:50:15AM +0900, Daejun Park wrote:
> > On Thu, Oct 28, 2021 at 07:36:19AM +0900, Daejun Park wrote:
> > > This patch addresses the issue of using the wrong API to create a
> > > pre_request for HPB READ.
> > > HPB READ candidate that require a pre-request will try to allocate a
> > > pre-request only during request_timeout_ms (default: 0). Otherwise, it is
> >  
> > Can you explain about 'only during request_timeout_ms'?
> >  
> > From the following code in ufshpb_prep(), the pre-request is allocated
> > for each READ IO in case of (!ufshpb_is_legacy(hba) && ufshpb_is_required_wb(hpb,
> > transfer_len)).
> >  
> >    if (!ufshpb_is_legacy(hba) &&
> >             ufshpb_is_required_wb(hpb, transfer_len)) {
> >                 err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> >  
> > > passed as normal READ, so deadlock problem can be resolved.
> > > 
> > > Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> > > ---
> > >  drivers/scsi/ufs/ufshpb.c | 11 +++++------
> > >  drivers/scsi/ufs/ufshpb.h |  1 +
> > >  2 files changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> > > index 02fb51ae8b25..3117bd47d762 100644
> > > --- a/drivers/scsi/ufs/ufshpb.c
> > > +++ b/drivers/scsi/ufs/ufshpb.c
> > > @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
> > >                                   read_id);
> > >          rq->cmd_len = scsi_command_size(rq->cmd);
> > >  
> > > -        if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> > > -                return -EAGAIN;
> > > +        blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);
> >  
> > Be care with above change, blk_insert_cloned_request() allocates
> > driver tag and issues the request to LLD directly, then returns the
> > result. If anything fails in the code path, -EAGAIN is returned.
> >  
> > But blk_execute_rq_nowait() simply queued the request in block layer,
> > and run hw queue. It doesn't allocate driver tag, and doesn't issue it
> > to LLD.
> >  
> > So ufshpb_execute_pre_req() may think the pre-request is issued to LLD
> > successfully, but actually not, maybe never. What will happen after the
> > READ IO is issued to device, but the pre-request(write buffer) isn't
> > sent to device?
> 
> In that case, the HPB READ cannot get benefit from pre-request. But it is not
> common case.

OK, so the device will ignore the pre-request if it isn't received in
time, not sure it is common or not, since blk_execute_rq_nowait()
doesn't provide any feedback. Here looks blk_insert_cloned_request()
is better.

> 
> > Can you explain how this change solves the deadlock?
> 
> The deadlock is happen when the READ waiting allocation of pre-request. But
> the timeout code makes to stop waiting after given time later.

If you mean blk-mq timeout code will be triggered, I think it won't.
Meantime, LLD may see nothing to timeout too.

For example, in case of none io sched, the queue depth is 128, and 128 READ
IOs are sent to ufshcd_queuecommand() at the same time, and all these 128 IOs
require to allocate pre-request, but no one can move on because 128 tags are
used up.

So no request can be sent to device and BLK_STS_RESOURCE is always returned
from ufshcd_queuecommand(), also when blk-mq timeout is triggered, all in-flight
request's state may just be updated as IDLE, so blk-mq may find nothing to expire.

In case of real io scheduler, request->tag is released when BLK_STS_RESOURCE
is returned from ufshcd_queuecommand(), but it doesn't mean that
pre-request can get tag always.

The approach[1] of reserving one slot for pre-request should provide
forward progress guarantee.


[1] https://lore.kernel.org/linux-block/YXoF59XeZ5KS0jZj@T590/

Thanks,
Ming


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
       [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p4>
@ 2021-10-29  2:50         ` Daejun Park
  2021-10-29  3:14           ` Ming Lei
       [not found]           ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p1>
  0 siblings, 2 replies; 18+ messages in thread
From: Daejun Park @ 2021-10-29  2:50 UTC (permalink / raw)
  To: Ming Lei, Daejun Park
  Cc: ALIM AKHTAR, avri.altman, jejb, martin.petersen, huobean,
	bvanassche, Keoseong Park, linux-scsi, linux-kernel

> On Fri, Oct 29, 2021 at 10:50:15AM +0900, Daejun Park wrote:
> > > On Thu, Oct 28, 2021 at 07:36:19AM +0900, Daejun Park wrote:
> > > > This patch addresses the issue of using the wrong API to create a
> > > > pre_request for HPB READ.
> > > > HPB READ candidate that require a pre-request will try to allocate a
> > > > pre-request only during request_timeout_ms (default: 0). Otherwise, it is
> > >  
> > > Can you explain about 'only during request_timeout_ms'?
> > >  
> > > From the following code in ufshpb_prep(), the pre-request is allocated
> > > for each READ IO in case of (!ufshpb_is_legacy(hba) && ufshpb_is_required_wb(hpb,
> > > transfer_len)).
> > >  
> > >    if (!ufshpb_is_legacy(hba) &&
> > >             ufshpb_is_required_wb(hpb, transfer_len)) {
> > >                 err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> > >  
> > > > passed as normal READ, so deadlock problem can be resolved.
> > > > 
> > > > Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> > > > ---
> > > >  drivers/scsi/ufs/ufshpb.c | 11 +++++------
> > > >  drivers/scsi/ufs/ufshpb.h |  1 +
> > > >  2 files changed, 6 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> > > > index 02fb51ae8b25..3117bd47d762 100644
> > > > --- a/drivers/scsi/ufs/ufshpb.c
> > > > +++ b/drivers/scsi/ufs/ufshpb.c
> > > > @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
> > > >                                   read_id);
> > > >          rq->cmd_len = scsi_command_size(rq->cmd);
> > > >  
> > > > -        if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> > > > -                return -EAGAIN;
> > > > +        blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);
> > >  
> > > Be care with above change, blk_insert_cloned_request() allocates
> > > driver tag and issues the request to LLD directly, then returns the
> > > result. If anything fails in the code path, -EAGAIN is returned.
> > >  
> > > But blk_execute_rq_nowait() simply queued the request in block layer,
> > > and run hw queue. It doesn't allocate driver tag, and doesn't issue it
> > > to LLD.
> > >  
> > > So ufshpb_execute_pre_req() may think the pre-request is issued to LLD
> > > successfully, but actually not, maybe never. What will happen after the
> > > READ IO is issued to device, but the pre-request(write buffer) isn't
> > > sent to device?
> > 
> > In that case, the HPB READ cannot get benefit from pre-request. But it is not
> > common case.
>  
> OK, so the device will ignore the pre-request if it isn't received in
> time, not sure it is common or not, since blk_execute_rq_nowait()
> doesn't provide any feedback. Here looks blk_insert_cloned_request()
> is better.

Yor're right.

> > 
> > > Can you explain how this change solves the deadlock?
> > 
> > The deadlock is happen when the READ waiting allocation of pre-request. But
> > the timeout code makes to stop waiting after given time later.
>  
> If you mean blk-mq timeout code will be triggered, I think it won't.
> Meantime, LLD may see nothing to timeout too.

I mean timeout of the HPB code. Please refer following code:

if (!ufshpb_is_legacy(hba) &&
	ufshpb_is_required_wb(hpb, transfer_len)) {
	err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
	if (err) {
		unsigned long timeout;

		timeout = cmd->jiffies_at_alloc + msecs_to_jiffies(
			  hpb->params.requeue_timeout_ms);

		if (time_before(jiffies, timeout))
			return -EAGAIN;

		hpb->stats.miss_cnt++;
		return 0;
	}
}

Although the return value of ufshpb_issue_pre_req() is -EAGAIN, the code
ignores the return value and issues READ not HPB READ.

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
  2021-10-29  2:50         ` Daejun Park
@ 2021-10-29  3:14           ` Ming Lei
       [not found]           ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p1>
  1 sibling, 0 replies; 18+ messages in thread
From: Ming Lei @ 2021-10-29  3:14 UTC (permalink / raw)
  To: Daejun Park
  Cc: ALIM AKHTAR, avri.altman, jejb, martin.petersen, huobean,
	bvanassche, Keoseong Park, linux-scsi, linux-kernel

On Fri, Oct 29, 2021 at 11:50:12AM +0900, Daejun Park wrote:
> > On Fri, Oct 29, 2021 at 10:50:15AM +0900, Daejun Park wrote:
> > > > On Thu, Oct 28, 2021 at 07:36:19AM +0900, Daejun Park wrote:
> > > > > This patch addresses the issue of using the wrong API to create a
> > > > > pre_request for HPB READ.
> > > > > HPB READ candidate that require a pre-request will try to allocate a
> > > > > pre-request only during request_timeout_ms (default: 0). Otherwise, it is
> > > >  
> > > > Can you explain about 'only during request_timeout_ms'?
> > > >  
> > > > From the following code in ufshpb_prep(), the pre-request is allocated
> > > > for each READ IO in case of (!ufshpb_is_legacy(hba) && ufshpb_is_required_wb(hpb,
> > > > transfer_len)).
> > > >  
> > > >    if (!ufshpb_is_legacy(hba) &&
> > > >             ufshpb_is_required_wb(hpb, transfer_len)) {
> > > >                 err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> > > >  
> > > > > passed as normal READ, so deadlock problem can be resolved.
> > > > > 
> > > > > Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> > > > > ---
> > > > >  drivers/scsi/ufs/ufshpb.c | 11 +++++------
> > > > >  drivers/scsi/ufs/ufshpb.h |  1 +
> > > > >  2 files changed, 6 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> > > > > index 02fb51ae8b25..3117bd47d762 100644
> > > > > --- a/drivers/scsi/ufs/ufshpb.c
> > > > > +++ b/drivers/scsi/ufs/ufshpb.c
> > > > > @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
> > > > >                                   read_id);
> > > > >          rq->cmd_len = scsi_command_size(rq->cmd);
> > > > >  
> > > > > -        if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> > > > > -                return -EAGAIN;
> > > > > +        blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);
> > > >  
> > > > Be care with above change, blk_insert_cloned_request() allocates
> > > > driver tag and issues the request to LLD directly, then returns the
> > > > result. If anything fails in the code path, -EAGAIN is returned.
> > > >  
> > > > But blk_execute_rq_nowait() simply queued the request in block layer,
> > > > and run hw queue. It doesn't allocate driver tag, and doesn't issue it
> > > > to LLD.
> > > >  
> > > > So ufshpb_execute_pre_req() may think the pre-request is issued to LLD
> > > > successfully, but actually not, maybe never. What will happen after the
> > > > READ IO is issued to device, but the pre-request(write buffer) isn't
> > > > sent to device?
> > > 
> > > In that case, the HPB READ cannot get benefit from pre-request. But it is not
> > > common case.
> >  
> > OK, so the device will ignore the pre-request if it isn't received in
> > time, not sure it is common or not, since blk_execute_rq_nowait()
> > doesn't provide any feedback. Here looks blk_insert_cloned_request()
> > is better.
> 
> Yor're right.
> 
> > > 
> > > > Can you explain how this change solves the deadlock?
> > > 
> > > The deadlock is happen when the READ waiting allocation of pre-request. But
> > > the timeout code makes to stop waiting after given time later.
> >  
> > If you mean blk-mq timeout code will be triggered, I think it won't.
> > Meantime, LLD may see nothing to timeout too.
> 
> I mean timeout of the HPB code. Please refer following code:
> 
> if (!ufshpb_is_legacy(hba) &&
> 	ufshpb_is_required_wb(hpb, transfer_len)) {
> 	err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> 	if (err) {
> 		unsigned long timeout;
> 
> 		timeout = cmd->jiffies_at_alloc + msecs_to_jiffies(
> 			  hpb->params.requeue_timeout_ms);
> 
> 		if (time_before(jiffies, timeout))
> 			return -EAGAIN;
> 
> 		hpb->stats.miss_cnt++;
> 		return 0;
> 	}
> }
> 
> Although the return value of ufshpb_issue_pre_req() is -EAGAIN, the code
> ignores the return value and issues READ not HPB READ.

OK, got it, this way should avoid the deadlock. But just be curious why
you change hpb->throttle_pre_req to 4, seems it isn't necessary for
avoiding the deadlock?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] scsi: ufs: Fix proper API to send HPB pre-request
       [not found]           ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p1>
@ 2021-10-29  4:39             ` Daejun Park
  0 siblings, 0 replies; 18+ messages in thread
From: Daejun Park @ 2021-10-29  4:39 UTC (permalink / raw)
  To: Ming Lei, Daejun Park
  Cc: ALIM AKHTAR, avri.altman, jejb, martin.petersen, huobean,
	bvanassche, Keoseong Park, linux-scsi, linux-kernel

> On Fri, Oct 29, 2021 at 11:50:12AM +0900, Daejun Park wrote:
> > > On Fri, Oct 29, 2021 at 10:50:15AM +0900, Daejun Park wrote:
> > > > > On Thu, Oct 28, 2021 at 07:36:19AM +0900, Daejun Park wrote:
> > > > > > This patch addresses the issue of using the wrong API to create a
> > > > > > pre_request for HPB READ.
> > > > > > HPB READ candidate that require a pre-request will try to allocate a
> > > > > > pre-request only during request_timeout_ms (default: 0). Otherwise, it is
> > > > >  
> > > > > Can you explain about 'only during request_timeout_ms'?
> > > > >  
> > > > > From the following code in ufshpb_prep(), the pre-request is allocated
> > > > > for each READ IO in case of (!ufshpb_is_legacy(hba) && ufshpb_is_required_wb(hpb,
> > > > > transfer_len)).
> > > > >  
> > > > >    if (!ufshpb_is_legacy(hba) &&
> > > > >             ufshpb_is_required_wb(hpb, transfer_len)) {
> > > > >                 err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> > > > >  
> > > > > > passed as normal READ, so deadlock problem can be resolved.
> > > > > > 
> > > > > > Signed-off-by: Daejun Park <daejun7.park@samsung.com>
> > > > > > ---
> > > > > >  drivers/scsi/ufs/ufshpb.c | 11 +++++------
> > > > > >  drivers/scsi/ufs/ufshpb.h |  1 +
> > > > > >  2 files changed, 6 insertions(+), 6 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> > > > > > index 02fb51ae8b25..3117bd47d762 100644
> > > > > > --- a/drivers/scsi/ufs/ufshpb.c
> > > > > > +++ b/drivers/scsi/ufs/ufshpb.c
> > > > > > @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
> > > > > >                                   read_id);
> > > > > >          rq->cmd_len = scsi_command_size(rq->cmd);
> > > > > >  
> > > > > > -        if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> > > > > > -                return -EAGAIN;
> > > > > > +        blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);
> > > > >  
> > > > > Be care with above change, blk_insert_cloned_request() allocates
> > > > > driver tag and issues the request to LLD directly, then returns the
> > > > > result. If anything fails in the code path, -EAGAIN is returned.
> > > > >  
> > > > > But blk_execute_rq_nowait() simply queued the request in block layer,
> > > > > and run hw queue. It doesn't allocate driver tag, and doesn't issue it
> > > > > to LLD.
> > > > >  
> > > > > So ufshpb_execute_pre_req() may think the pre-request is issued to LLD
> > > > > successfully, but actually not, maybe never. What will happen after the
> > > > > READ IO is issued to device, but the pre-request(write buffer) isn't
> > > > > sent to device?
> > > > 
> > > > In that case, the HPB READ cannot get benefit from pre-request. But it is not
> > > > common case.
> > >  
> > > OK, so the device will ignore the pre-request if it isn't received in
> > > time, not sure it is common or not, since blk_execute_rq_nowait()
> > > doesn't provide any feedback. Here looks blk_insert_cloned_request()
> > > is better.
> > 
> > Yor're right.
> > 
> > > > 
> > > > > Can you explain how this change solves the deadlock?
> > > > 
> > > > The deadlock is happen when the READ waiting allocation of pre-request. But
> > > > the timeout code makes to stop waiting after given time later.
> > >  
> > > If you mean blk-mq timeout code will be triggered, I think it won't.
> > > Meantime, LLD may see nothing to timeout too.
> > 
> > I mean timeout of the HPB code. Please refer following code:
> > 
> > if (!ufshpb_is_legacy(hba) &&
> >         ufshpb_is_required_wb(hpb, transfer_len)) {
> >         err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> >         if (err) {
> >                 unsigned long timeout;
> > 
> >                 timeout = cmd->jiffies_at_alloc + msecs_to_jiffies(
> >                           hpb->params.requeue_timeout_ms);
> > 
> >                 if (time_before(jiffies, timeout))
> >                         return -EAGAIN;
> > 
> >                 hpb->stats.miss_cnt++;
> >                 return 0;
> >         }
> > }
> > 
> > Although the return value of ufshpb_issue_pre_req() is -EAGAIN, the code
> > ignores the return value and issues READ not HPB READ.
>  
> OK, got it, this way should avoid the deadlock. But just be curious why
> you change hpb->throttle_pre_req to 4, seems it isn't necessary for
> avoiding the deadlock?

Because blk_execute_rq_nowait calls blk_mq_run_hw_queue, not dispatchs WRITE_BUFFER directly.
So, if the next request requires pre-request, it makes the latency of first read longer.
Therefore, it prevents this extreme case by limiting number of pre-request.

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-10-29  4:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p6>
2021-10-27 22:36 ` [PATCH] scsi: ufs: Fix proper API to send HPB pre-request Daejun Park
2021-10-28  4:16   ` Bart Van Assche
2021-10-28 14:28   ` James Bottomley
2021-10-28 15:27     ` Christoph Hellwig
2021-10-28 17:07       ` Bart Van Assche
2021-10-28 17:10         ` Christoph Hellwig
2021-10-28 15:59     ` Bart Van Assche
2021-10-28 19:12       ` James Bottomley
2021-10-28 20:04         ` Bart Van Assche
2021-10-28 20:12           ` James Bottomley
     [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p8>
2021-10-28 21:20         ` Daejun Park
2021-10-28 15:22   ` James Bottomley
2021-10-29  1:32   ` Ming Lei
     [not found]   ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p3>
2021-10-29  1:50     ` Daejun Park
2021-10-29  2:10       ` Ming Lei
     [not found]       ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p4>
2021-10-29  2:50         ` Daejun Park
2021-10-29  3:14           ` Ming Lei
     [not found]           ` <CGME20211027223619epcms2p60bbc74c9ba9757c58709a99acd0892ff@epcms2p1>
2021-10-29  4:39             ` Daejun Park

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.