* [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
@ 2022-06-21 16:07 Jens Axboe
2022-06-21 16:58 ` Dylan Yudaken
2022-06-23 8:56 ` Shinichiro Kawasaki
0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2022-06-21 16:07 UTC (permalink / raw)
To: linux-block; +Cc: Dylan Yudaken, Shin'ichiro Kawasaki
If rq_qos_throttle() ends up blocking, then we will have invalidated and
flushed our current plug. Since blk_mq_get_cached_request() hasn't
popped the cached request off the plug list just yet, we end holding a
pointer to a request that is no longer valid. This insta-crashes with
rq->mq_hctx being NULL in the validity checks just after.
Pop the request off the cached list before doing rq_qos_throttle() to
avoid using a potentially stale request.
Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
Reported-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 33145ba52c96..93d9d60980fb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2765,15 +2765,20 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
return NULL;
}
- rq_qos_throttle(q, *bio);
-
if (blk_mq_get_hctx_type((*bio)->bi_opf) != rq->mq_hctx->type)
return NULL;
if (op_is_flush(rq->cmd_flags) != op_is_flush((*bio)->bi_opf))
return NULL;
- rq->cmd_flags = (*bio)->bi_opf;
+ /*
+ * If any qos ->throttle() end up blocking, we will have flushed the
+ * plug and hence killed the cached_rq list as well. Pop this entry
+ * before we throttle.
+ */
plug->cached_rq = rq_list_next(rq);
+ rq_qos_throttle(q, *bio);
+
+ rq->cmd_flags = (*bio)->bi_opf;
INIT_LIST_HEAD(&rq->queuelist);
return rq;
}
--
Jens Axboe
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
2022-06-21 16:07 [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle() Jens Axboe
@ 2022-06-21 16:58 ` Dylan Yudaken
2022-06-23 8:56 ` Shinichiro Kawasaki
1 sibling, 0 replies; 5+ messages in thread
From: Dylan Yudaken @ 2022-06-21 16:58 UTC (permalink / raw)
To: axboe, linux-block; +Cc: shinichiro.kawasaki
On Tue, 2022-06-21 at 10:07 -0600, Jens Axboe wrote:
> If rq_qos_throttle() ends up blocking, then we will have invalidated
> and
> flushed our current plug. Since blk_mq_get_cached_request() hasn't
> popped the cached request off the plug list just yet, we end holding
> a
> pointer to a request that is no longer valid. This insta-crashes with
> rq->mq_hctx being NULL in the validity checks just after.
>
> Pop the request off the cached list before doing rq_qos_throttle() to
> avoid using a potentially stale request.
>
> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and
> rq_qos_throttle protection")
> Reported-by: Dylan Yudaken <dylany@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 33145ba52c96..93d9d60980fb 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2765,15 +2765,20 @@ static inline struct request
> *blk_mq_get_cached_request(struct request_queue *q,
> return NULL;
> }
>
> - rq_qos_throttle(q, *bio);
> -
> if (blk_mq_get_hctx_type((*bio)->bi_opf) != rq->mq_hctx-
> >type)
> return NULL;
> if (op_is_flush(rq->cmd_flags) != op_is_flush((*bio)-
> >bi_opf))
> return NULL;
>
> - rq->cmd_flags = (*bio)->bi_opf;
> + /*
> + * If any qos ->throttle() end up blocking, we will have
> flushed the
> + * plug and hence killed the cached_rq list as well. Pop this
> entry
> + * before we throttle.
> + */
> plug->cached_rq = rq_list_next(rq);
> + rq_qos_throttle(q, *bio);
> +
> + rq->cmd_flags = (*bio)->bi_opf;
> INIT_LIST_HEAD(&rq->queuelist);
> return rq;
> }
This fixes the problems I was seeing
Tested-by: Dylan Yudaken <dylany@fb.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
2022-06-21 16:07 [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle() Jens Axboe
2022-06-21 16:58 ` Dylan Yudaken
@ 2022-06-23 8:56 ` Shinichiro Kawasaki
2022-06-23 12:05 ` Jens Axboe
1 sibling, 1 reply; 5+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-23 8:56 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Dylan Yudaken
On Jun 21, 2022 / 10:07, Jens Axboe wrote:
> If rq_qos_throttle() ends up blocking, then we will have invalidated and
> flushed our current plug. Since blk_mq_get_cached_request() hasn't
> popped the cached request off the plug list just yet, we end holding a
> pointer to a request that is no longer valid. This insta-crashes with
> rq->mq_hctx being NULL in the validity checks just after.
>
> Pop the request off the cached list before doing rq_qos_throttle() to
> avoid using a potentially stale request.
>
> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
> Reported-by: Dylan Yudaken <dylany@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thank you for the fix and sorry for the trouble. The patch passed my test set:
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
BTW, may I know the workload or script which triggers the failure? I'm
interested in if we can add a blktests test case to exercises qos throttle
and prevent similar bugs in the future.
--
Shin'ichiro Kawasaki
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
2022-06-23 8:56 ` Shinichiro Kawasaki
@ 2022-06-23 12:05 ` Jens Axboe
2022-06-23 12:33 ` Shinichiro Kawasaki
0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2022-06-23 12:05 UTC (permalink / raw)
To: Shinichiro Kawasaki; +Cc: linux-block, Dylan Yudaken
On 6/23/22 2:56 AM, Shinichiro Kawasaki wrote:
> On Jun 21, 2022 / 10:07, Jens Axboe wrote:
>> If rq_qos_throttle() ends up blocking, then we will have invalidated and
>> flushed our current plug. Since blk_mq_get_cached_request() hasn't
>> popped the cached request off the plug list just yet, we end holding a
>> pointer to a request that is no longer valid. This insta-crashes with
>> rq->mq_hctx being NULL in the validity checks just after.
>>
>> Pop the request off the cached list before doing rq_qos_throttle() to
>> avoid using a potentially stale request.
>>
>> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
>> Reported-by: Dylan Yudaken <dylany@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> Thank you for the fix and sorry for the trouble. The patch passed my test set:
>
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>
>
> BTW, may I know the workload or script which triggers the failure? I'm
> interested in if we can add a blktests test case to exercises qos throttle
> and prevent similar bugs in the future.
Not sure if there are others, but specifically blk-iocost will do an
explicit schedule() for some conditions. See the bottom of
ioc_rqos_throttle().
--
Jens Axboe
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
2022-06-23 12:05 ` Jens Axboe
@ 2022-06-23 12:33 ` Shinichiro Kawasaki
0 siblings, 0 replies; 5+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-23 12:33 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Dylan Yudaken
On Jun 23, 2022 / 06:05, Jens Axboe wrote:
> On 6/23/22 2:56 AM, Shinichiro Kawasaki wrote:
> > On Jun 21, 2022 / 10:07, Jens Axboe wrote:
> >> If rq_qos_throttle() ends up blocking, then we will have invalidated and
> >> flushed our current plug. Since blk_mq_get_cached_request() hasn't
> >> popped the cached request off the plug list just yet, we end holding a
> >> pointer to a request that is no longer valid. This insta-crashes with
> >> rq->mq_hctx being NULL in the validity checks just after.
> >>
> >> Pop the request off the cached list before doing rq_qos_throttle() to
> >> avoid using a potentially stale request.
> >>
> >> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
> >> Reported-by: Dylan Yudaken <dylany@fb.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > Thank you for the fix and sorry for the trouble. The patch passed my test set:
> >
> > Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >
> >
> > BTW, may I know the workload or script which triggers the failure? I'm
> > interested in if we can add a blktests test case to exercises qos throttle
> > and prevent similar bugs in the future.
>
> Not sure if there are others, but specifically blk-iocost will do an
> explicit schedule() for some conditions. See the bottom of
> ioc_rqos_throttle().
Thanks. Will try to create a script which triggers the schedule() in
ioc_rqos_throttle().
--
Shin'ichiro Kawasaki
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-06-23 12:33 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-21 16:07 [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle() Jens Axboe
2022-06-21 16:58 ` Dylan Yudaken
2022-06-23 8:56 ` Shinichiro Kawasaki
2022-06-23 12:05 ` Jens Axboe
2022-06-23 12:33 ` Shinichiro Kawasaki
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.