* [PATCH] nvme: cancel requests for real
@ 2020-05-27 19:09 Keith Busch
2020-05-27 19:36 ` Dongli Zhang
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Keith Busch @ 2020-05-27 19:09 UTC (permalink / raw)
To: linux-nvme, hch, sagi; +Cc: axboe, Keith Busch, alan.adamson
Once the driver decides to cancel requests, the concept of those
requests timing out should no longer exist. Since we can't stop fake
timeouts from preventing a forced reclaim, continue completing the same
request until the block layer isn't told to pretend that didn't happen.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/nvme/host/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ba860efd250d..72e5973dda3a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
return true;
nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
- blk_mq_complete_request(req);
+ while (!blk_mq_complete_request(req));
return true;
}
EXPORT_SYMBOL_GPL(nvme_cancel_request);
--
2.24.1
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] nvme: cancel requests for real
2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
@ 2020-05-27 19:36 ` Dongli Zhang
2020-05-27 19:57 ` Keith Busch
2020-05-27 20:18 ` Sagi Grimberg
2020-05-28 4:58 ` Christoph Hellwig
2 siblings, 1 reply; 8+ messages in thread
From: Dongli Zhang @ 2020-05-27 19:36 UTC (permalink / raw)
To: Keith Busch, linux-nvme, hch, sagi; +Cc: axboe, alan.adamson
Hi Keith,
On 5/27/20 12:09 PM, Keith Busch wrote:
> Once the driver decides to cancel requests, the concept of those
> requests timing out should no longer exist. Since we can't stop fake
> timeouts from preventing a forced reclaim, continue completing the same
> request until the block layer isn't told to pretend that didn't happen.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ba860efd250d..72e5973dda3a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> return true;
>
> nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> - blk_mq_complete_request(req);
> + while (!blk_mq_complete_request(req));
If the probability is configured to fail every time with 100% probability, we
would run into endless loop?
Dongli Zhang
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] nvme: cancel requests for real
2020-05-27 19:36 ` Dongli Zhang
@ 2020-05-27 19:57 ` Keith Busch
0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2020-05-27 19:57 UTC (permalink / raw)
To: Dongli Zhang; +Cc: axboe, alan.adamson, hch, linux-nvme, sagi
On Wed, May 27, 2020 at 12:36:07PM -0700, Dongli Zhang wrote:
> On 5/27/20 12:09 PM, Keith Busch wrote:
> > Once the driver decides to cancel requests, the concept of those
> > requests timing out should no longer exist. Since we can't stop fake
> > timeouts from preventing a forced reclaim, continue completing the same
> > request until the block layer isn't told to pretend that didn't happen.
> >
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > ---
> > drivers/nvme/host/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index ba860efd250d..72e5973dda3a 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> > return true;
> >
> > nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> > - blk_mq_complete_request(req);
> > + while (!blk_mq_complete_request(req));
> If the probability is configured to fail every time with 100% probability, we
> would run into endless loop?
If completing requests is not possible, it's already endless. This is just a
tighter loop in that case.
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] nvme: cancel requests for real
2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
2020-05-27 19:36 ` Dongli Zhang
@ 2020-05-27 20:18 ` Sagi Grimberg
2020-05-27 20:39 ` Alan Adamson
2020-05-28 4:58 ` Christoph Hellwig
2 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2020-05-27 20:18 UTC (permalink / raw)
To: Keith Busch, linux-nvme, hch; +Cc: axboe, alan.adamson
> Once the driver decides to cancel requests, the concept of those
> requests timing out should no longer exist. Since we can't stop fake
> timeouts from preventing a forced reclaim, continue completing the same
> request until the block layer isn't told to pretend that didn't happen.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ba860efd250d..72e5973dda3a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> return true;
>
> nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> - blk_mq_complete_request(req);
> + while (!blk_mq_complete_request(req));
Maybe add cond_resched in the loop? just to play nice in case something
is wrong?
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] nvme: cancel requests for real
2020-05-27 20:18 ` Sagi Grimberg
@ 2020-05-27 20:39 ` Alan Adamson
2020-05-28 4:59 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: Alan Adamson @ 2020-05-27 20:39 UTC (permalink / raw)
To: Sagi Grimberg, Keith Busch, linux-nvme, hch; +Cc: axboe
On 5/27/20 1:18 PM, Sagi Grimberg wrote:
>
>> Once the driver decides to cancel requests, the concept of those
>> requests timing out should no longer exist. Since we can't stop fake
>> timeouts from preventing a forced reclaim, continue completing the same
>> request until the block layer isn't told to pretend that didn't happen.
>>
>> Signed-off-by: Keith Busch <kbusch@kernel.org>
>> ---
>> drivers/nvme/host/core.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index ba860efd250d..72e5973dda3a 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req,
>> void *data, bool reserved)
>> return true;
>> nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
>> - blk_mq_complete_request(req);
>> + while (!blk_mq_complete_request(req));
>
> Maybe add cond_resched in the loop? just to play nice in case something
> is wrong?
Is this a nvme issue or a block issue? My original fix was in the block
layer.
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -22,10 +22,10 @@ static int __init setup_fail_io_timeout(char *str)
int blk_should_fake_timeout(struct request_queue *q)
{
- if (!test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags))
- return 0;
-
- return should_fail(&fail_io_timeout, 1);
+ if (test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags) &&
+ !blk_queue_quiesced(q))
+ return should_fail(&fail_io_timeout, 1);
+ return 0;
}
static int __init fail_io_timeout_debugfs(void)
--
Alan
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] nvme: cancel requests for real
2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
2020-05-27 19:36 ` Dongli Zhang
2020-05-27 20:18 ` Sagi Grimberg
@ 2020-05-28 4:58 ` Christoph Hellwig
2020-05-28 19:37 ` Keith Busch
2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2020-05-28 4:58 UTC (permalink / raw)
To: Keith Busch; +Cc: axboe, alan.adamson, hch, linux-nvme, sagi
On Wed, May 27, 2020 at 12:09:13PM -0700, Keith Busch wrote:
> Once the driver decides to cancel requests, the concept of those
> requests timing out should no longer exist. Since we can't stop fake
> timeouts from preventing a forced reclaim, continue completing the same
> request until the block layer isn't told to pretend that didn't happen.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ba860efd250d..72e5973dda3a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> return true;
>
> nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> - blk_mq_complete_request(req);
> + while (!blk_mq_complete_request(req));
> return true;
This is just disgusting. Take a look at how blk_mq_complete_request
is implemented, and what might be a better function to export and call
here, or any other error handling context. Bonus points for finding a
saner name for it.
I'm also thinking that whole idea of randomly not doing anything in
blk_mq_complete_request is just a bad idea vs doing targeted hooks
in the actual drivers.
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] nvme: cancel requests for real
2020-05-27 20:39 ` Alan Adamson
@ 2020-05-28 4:59 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2020-05-28 4:59 UTC (permalink / raw)
To: Alan Adamson; +Cc: Keith Busch, axboe, Sagi Grimberg, linux-nvme, hch
On Wed, May 27, 2020 at 01:39:21PM -0700, Alan Adamson wrote:
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -22,10 +22,10 @@ static int __init setup_fail_io_timeout(char *str)
>
> int blk_should_fake_timeout(struct request_queue *q)
> {
> - if (!test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags))
> - return 0;
> -
> - return should_fail(&fail_io_timeout, 1);
> + if (test_bit(QUEUE_FLAG_FAIL_IO, &q->queue_flags) &&
> + !blk_queue_quiesced(q))
> + return should_fail(&fail_io_timeout, 1);
> + return 0;
> }
While that is way better than the loop I think this encodes too way too
many assumptions.
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] nvme: cancel requests for real
2020-05-28 4:58 ` Christoph Hellwig
@ 2020-05-28 19:37 ` Keith Busch
0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2020-05-28 19:37 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, sagi, linux-nvme, alan.adamson
On Thu, May 28, 2020 at 06:58:20AM +0200, Christoph Hellwig wrote:
> On Wed, May 27, 2020 at 12:09:13PM -0700, Keith Busch wrote:
> > Once the driver decides to cancel requests, the concept of those
> > requests timing out should no longer exist. Since we can't stop fake
> > timeouts from preventing a forced reclaim, continue completing the same
> > request until the block layer isn't told to pretend that didn't happen.
> >
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > ---
> > drivers/nvme/host/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index ba860efd250d..72e5973dda3a 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -310,7 +310,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> > return true;
> >
> > nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
> > - blk_mq_complete_request(req);
> > + while (!blk_mq_complete_request(req));
> > return true;
>
> This is just disgusting.
Thanks for noticing. :)
> I'm also thinking that whole idea of randomly not doing anything in
> blk_mq_complete_request is just a bad idea vs doing targeted hooks
> in the actual drivers.
Yeah, I don't like fake test code in the middle of production code.
Testing code shouldn't modify the code being testing. IMO, leave real
stuff alone and create a thunk for error injection, preferably one
specific to the low-level driver. I know we can do that on x86, but not
so sure about other archs.
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-05-28 19:37 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27 19:09 [PATCH] nvme: cancel requests for real Keith Busch
2020-05-27 19:36 ` Dongli Zhang
2020-05-27 19:57 ` Keith Busch
2020-05-27 20:18 ` Sagi Grimberg
2020-05-27 20:39 ` Alan Adamson
2020-05-28 4:59 ` Christoph Hellwig
2020-05-28 4:58 ` Christoph Hellwig
2020-05-28 19:37 ` Keith Busch
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.