All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: cancel requests for real
@ 2020-05-27 19:09 Keith Busch
  2020-05-27 19:36 ` Dongli Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Keith Busch @ 2020-05-27 19:09 UTC (permalink / raw)
  To: linux-nvme, hch, sagi; +Cc: axboe, Keith Busch, alan.adamson

Once the driver decides to cancel requests, the concept of those
requests timing out should no longer exist. Since we can't stop fake
timeouts from preventing a forced reclaim, continue completing the same
request until the block layer isn't told to pretend that didn't happen.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ba860efd250d..72e5973dda3a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 		return true;
 
 	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
-	blk_mq_complete_request(req);
+	while (!blk_mq_complete_request(req));
 	return true;
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_request);
-- 
2.24.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme: cancel requests for real
  2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
@ 2020-05-27 19:36 ` Dongli Zhang
  2020-05-27 19:57   ` Keith Busch
  2020-05-27 20:18 ` Sagi Grimberg
  2020-05-28  4:58 ` Christoph Hellwig
  2 siblings, 1 reply; 8+ messages in thread
From: Dongli Zhang @ 2020-05-27 19:36 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch, sagi; +Cc: axboe, alan.adamson

Hi Keith,

On 5/27/20 12:09 PM, Keith Busch wrote:
> Once the driver decides to cancel requests, the concept of those
> requests timing out should no longer exist. Since we can't stop fake
> timeouts from preventing a forced reclaim, continue completing the same
> request until the block layer isn't told to pretend that didn't happen.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ba860efd250d..72e5973dda3a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>  		return true;
>  
>  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> -	blk_mq_complete_request(req);
> +	while (!blk_mq_complete_request(req));
If the probability is configured to fail every time with 100% probability, we
would run into endless loop?

Dongli Zhang

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme: cancel requests for real
  2020-05-27 19:36 ` Dongli Zhang
@ 2020-05-27 19:57   ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2020-05-27 19:57 UTC (permalink / raw)
  To: Dongli Zhang; +Cc: axboe, alan.adamson, hch, linux-nvme, sagi

On Wed, May 27, 2020 at 12:36:07PM -0700, Dongli Zhang wrote:
> On 5/27/20 12:09 PM, Keith Busch wrote:
> > Once the driver decides to cancel requests, the concept of those
> > requests timing out should no longer exist. Since we can't stop fake
> > timeouts from preventing a forced reclaim, continue completing the same
> > request until the block layer isn't told to pretend that didn't happen.
> > 
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  drivers/nvme/host/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index ba860efd250d..72e5973dda3a 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> >  		return true;
> >  
> >  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> > -	blk_mq_complete_request(req);
> > +	while (!blk_mq_complete_request(req));
> If the probability is configured to fail every time with 100% probability, we
> would run into endless loop?

If completing requests is not possible, it's already endless. This is just a
tighter loop in that case.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme: cancel requests for real
  2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
  2020-05-27 19:36 ` Dongli Zhang
@ 2020-05-27 20:18 ` Sagi Grimberg
  2020-05-27 20:39   ` Alan Adamson
  2020-05-28  4:58 ` Christoph Hellwig
  2 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2020-05-27 20:18 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch; +Cc: axboe, alan.adamson


> Once the driver decides to cancel requests, the concept of those
> requests timing out should no longer exist. Since we can't stop fake
> timeouts from preventing a forced reclaim, continue completing the same
> request until the block layer isn't told to pretend that didn't happen.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ba860efd250d..72e5973dda3a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>   		return true;
>   
>   	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> -	blk_mq_complete_request(req);
> +	while (!blk_mq_complete_request(req));

Maybe add cond_resched in the loop? just to play nice in case something
is wrong?

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme: cancel requests for real
  2020-05-27 20:18 ` Sagi Grimberg
@ 2020-05-27 20:39   ` Alan Adamson
  2020-05-28  4:59     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Adamson @ 2020-05-27 20:39 UTC (permalink / raw)
  To: Sagi Grimberg, Keith Busch, linux-nvme, hch; +Cc: axboe


On 5/27/20 1:18 PM, Sagi Grimberg wrote:
>
>> Once the driver decides to cancel requests, the concept of those
>> requests timing out should no longer exist. Since we can't stop fake
>> timeouts from preventing a forced reclaim, continue completing the same
>> request until the block layer isn't told to pretend that didn't happen.
>>
>> Signed-off-by: Keith Busch <kbusch@kernel.org>
>> ---
>>   drivers/nvme/host/core.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index ba860efd250d..72e5973dda3a 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, 
>> void *data, bool reserved)
>>           return true;
>>         nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>> -    blk_mq_complete_request(req);
>> +    while (!blk_mq_complete_request(req));
>
> Maybe add cond_resched in the loop? just to play nice in case something
> is wrong?


Is this a nvme issue or a block issue?  My original fix was in the block 
layer.

--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -22,10 +22,10 @@ static int __init setup_fail_io_timeout(char *str)

int blk_should_fake_timeout(struct request_queue *q)
{
-       if (!test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags))
-               return 0;
-
-       return should_fail(&fail_io_timeout, 1);
+       if (test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags) &&
+           !blk_queue_quiesced(q))
+               return should_fail(&fail_io_timeout, 1);
+       return 0;
}

static int __init fail_io_timeout_debugfs(void)
-- 

Alan





_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme: cancel requests for real
  2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
  2020-05-27 19:36 ` Dongli Zhang
  2020-05-27 20:18 ` Sagi Grimberg
@ 2020-05-28  4:58 ` Christoph Hellwig
  2020-05-28 19:37   ` Keith Busch
  2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2020-05-28  4:58 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, alan.adamson, hch, linux-nvme, sagi

On Wed, May 27, 2020 at 12:09:13PM -0700, Keith Busch wrote:
> Once the driver decides to cancel requests, the concept of those
> requests timing out should no longer exist. Since we can't stop fake
> timeouts from preventing a forced reclaim, continue completing the same
> request until the block layer isn't told to pretend that didn't happen.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ba860efd250d..72e5973dda3a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>  		return true;
>  
>  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> -	blk_mq_complete_request(req);
> +	while (!blk_mq_complete_request(req));
>  	return true;

This is just disgusting.  Take a look at how blk_mq_complete_request
is implemented, and what might be a better function to export and call
here, or any other error handling context. Bonus points for finding a
saner name for it.

I'm also thinking that whole idea of randomly not doing anything in
blk_mq_complete_request is just a bad idea vs doing targeted hooks
in the actual drivers.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme: cancel requests for real
  2020-05-27 20:39   ` Alan Adamson
@ 2020-05-28  4:59     ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2020-05-28  4:59 UTC (permalink / raw)
  To: Alan Adamson; +Cc: Keith Busch, axboe, Sagi Grimberg, linux-nvme, hch

On Wed, May 27, 2020 at 01:39:21PM -0700, Alan Adamson wrote:
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -22,10 +22,10 @@ static int __init setup_fail_io_timeout(char *str)
>
> int blk_should_fake_timeout(struct request_queue *q)
> {
> -       if (!test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags))
> -               return 0;
> -
> -       return should_fail(&fail_io_timeout, 1);
> +       if (test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags) &&
> +           !blk_queue_quiesced(q))
> +               return should_fail(&fail_io_timeout, 1);
> +       return 0;
> }

While that is way better than the loop I think this encodes too way too
many assumptions.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme: cancel requests for real
  2020-05-28  4:58 ` Christoph Hellwig
@ 2020-05-28 19:37   ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2020-05-28 19:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, sagi, linux-nvme, alan.adamson

On Thu, May 28, 2020 at 06:58:20AM +0200, Christoph Hellwig wrote:
> On Wed, May 27, 2020 at 12:09:13PM -0700, Keith Busch wrote:
> > Once the driver decides to cancel requests, the concept of those
> > requests timing out should no longer exist. Since we can't stop fake
> > timeouts from preventing a forced reclaim, continue completing the same
> > request until the block layer isn't told to pretend that didn't happen.
> > 
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > ---
> >  drivers/nvme/host/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index ba860efd250d..72e5973dda3a 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> >  		return true;
> >  
> >  	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> > -	blk_mq_complete_request(req);
> > +	while (!blk_mq_complete_request(req));
> >  	return true;
> 
> This is just disgusting.  

Thanks for noticing. :)

> I'm also thinking that whole idea of randomly not doing anything in
> blk_mq_complete_request is just a bad idea vs doing targeted hooks
> in the actual drivers.

Yeah, I don't like fake test code in the middle of production code.
Testing code shouldn't modify the code being testing. IMO, leave real
stuff alone and create a thunk for error injection, preferably one
specific to the low-level driver. I know we can do that on x86, but not
so sure about other archs.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-05-28 19:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
2020-05-27 19:36 ` Dongli Zhang
2020-05-27 19:57   ` Keith Busch
2020-05-27 20:18 ` Sagi Grimberg
2020-05-27 20:39   ` Alan Adamson
2020-05-28  4:59     ` Christoph Hellwig
2020-05-28  4:58 ` Christoph Hellwig
2020-05-28 19:37   ` Keith Busch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.