All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
@ 2022-06-21 16:07 Jens Axboe
  2022-06-21 16:58 ` Dylan Yudaken
  2022-06-23  8:56 ` Shinichiro Kawasaki
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2022-06-21 16:07 UTC (permalink / raw)
  To: linux-block; +Cc: Dylan Yudaken, Shin'ichiro Kawasaki

If rq_qos_throttle() ends up blocking, then we will have invalidated and
flushed our current plug. Since blk_mq_get_cached_request() hasn't
popped the cached request off the plug list just yet, we end holding a
pointer to a request that is no longer valid. This insta-crashes with
rq->mq_hctx being NULL in the validity checks just after.

Pop the request off the cached list before doing rq_qos_throttle() to
avoid using a potentially stale request.

Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
Reported-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 33145ba52c96..93d9d60980fb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2765,15 +2765,20 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
 		return NULL;
 	}
 
-	rq_qos_throttle(q, *bio);
-
 	if (blk_mq_get_hctx_type((*bio)->bi_opf) != rq->mq_hctx->type)
 		return NULL;
 	if (op_is_flush(rq->cmd_flags) != op_is_flush((*bio)->bi_opf))
 		return NULL;
 
-	rq->cmd_flags = (*bio)->bi_opf;
+	/*
+	 * If any qos ->throttle() end up blocking, we will have flushed the
+	 * plug and hence killed the cached_rq list as well. Pop this entry
+	 * before we throttle.
+	 */
 	plug->cached_rq = rq_list_next(rq);
+	rq_qos_throttle(q, *bio);
+
+	rq->cmd_flags = (*bio)->bi_opf;
 	INIT_LIST_HEAD(&rq->queuelist);
 	return rq;
 }
-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
  2022-06-21 16:07 [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle() Jens Axboe
@ 2022-06-21 16:58 ` Dylan Yudaken
  2022-06-23  8:56 ` Shinichiro Kawasaki
  1 sibling, 0 replies; 5+ messages in thread
From: Dylan Yudaken @ 2022-06-21 16:58 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: shinichiro.kawasaki

On Tue, 2022-06-21 at 10:07 -0600, Jens Axboe wrote:
> If rq_qos_throttle() ends up blocking, then we will have invalidated
> and
> flushed our current plug. Since blk_mq_get_cached_request() hasn't
> popped the cached request off the plug list just yet, we end holding
> a
> pointer to a request that is no longer valid. This insta-crashes with
> rq->mq_hctx being NULL in the validity checks just after.
> 
> Pop the request off the cached list before doing rq_qos_throttle() to
> avoid using a potentially stale request.
> 
> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and
> rq_qos_throttle protection")
> Reported-by: Dylan Yudaken <dylany@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 33145ba52c96..93d9d60980fb 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2765,15 +2765,20 @@ static inline struct request
> *blk_mq_get_cached_request(struct request_queue *q,
>                 return NULL;
>         }
>  
> -       rq_qos_throttle(q, *bio);
> -
>         if (blk_mq_get_hctx_type((*bio)->bi_opf) != rq->mq_hctx-
> >type)
>                 return NULL;
>         if (op_is_flush(rq->cmd_flags) != op_is_flush((*bio)-
> >bi_opf))
>                 return NULL;
>  
> -       rq->cmd_flags = (*bio)->bi_opf;
> +       /*
> +        * If any qos ->throttle() end up blocking, we will have
> flushed the
> +        * plug and hence killed the cached_rq list as well. Pop this
> entry
> +        * before we throttle.
> +        */
>         plug->cached_rq = rq_list_next(rq);
> +       rq_qos_throttle(q, *bio);
> +
> +       rq->cmd_flags = (*bio)->bi_opf;
>         INIT_LIST_HEAD(&rq->queuelist);
>         return rq;
>  }

This fixes the problems I was seeing

Tested-by: Dylan Yudaken <dylany@fb.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
  2022-06-21 16:07 [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle() Jens Axboe
  2022-06-21 16:58 ` Dylan Yudaken
@ 2022-06-23  8:56 ` Shinichiro Kawasaki
  2022-06-23 12:05   ` Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-23  8:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Dylan Yudaken

On Jun 21, 2022 / 10:07, Jens Axboe wrote:
> If rq_qos_throttle() ends up blocking, then we will have invalidated and
> flushed our current plug. Since blk_mq_get_cached_request() hasn't
> popped the cached request off the plug list just yet, we end holding a
> pointer to a request that is no longer valid. This insta-crashes with
> rq->mq_hctx being NULL in the validity checks just after.
> 
> Pop the request off the cached list before doing rq_qos_throttle() to
> avoid using a potentially stale request.
> 
> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
> Reported-by: Dylan Yudaken <dylany@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Thank you for the fix and sorry for the trouble. The patch passed my test set:

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>


BTW, may I know the workload or script which triggers the failure? I'm
interested in if we can add a blktests test case to exercises qos throttle
and prevent similar bugs in the future.

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
  2022-06-23  8:56 ` Shinichiro Kawasaki
@ 2022-06-23 12:05   ` Jens Axboe
  2022-06-23 12:33     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2022-06-23 12:05 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-block, Dylan Yudaken

On 6/23/22 2:56 AM, Shinichiro Kawasaki wrote:
> On Jun 21, 2022 / 10:07, Jens Axboe wrote:
>> If rq_qos_throttle() ends up blocking, then we will have invalidated and
>> flushed our current plug. Since blk_mq_get_cached_request() hasn't
>> popped the cached request off the plug list just yet, we end holding a
>> pointer to a request that is no longer valid. This insta-crashes with
>> rq->mq_hctx being NULL in the validity checks just after.
>>
>> Pop the request off the cached list before doing rq_qos_throttle() to
>> avoid using a potentially stale request.
>>
>> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
>> Reported-by: Dylan Yudaken <dylany@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> Thank you for the fix and sorry for the trouble. The patch passed my test set:
> 
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> 
> 
> BTW, may I know the workload or script which triggers the failure? I'm
> interested in if we can add a blktests test case to exercises qos throttle
> and prevent similar bugs in the future.

Not sure if there are others, but specifically blk-iocost will do an
explicit schedule() for some conditions. See the bottom of
ioc_rqos_throttle().

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle()
  2022-06-23 12:05   ` Jens Axboe
@ 2022-06-23 12:33     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 5+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-23 12:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Dylan Yudaken

On Jun 23, 2022 / 06:05, Jens Axboe wrote:
> On 6/23/22 2:56 AM, Shinichiro Kawasaki wrote:
> > On Jun 21, 2022 / 10:07, Jens Axboe wrote:
> >> If rq_qos_throttle() ends up blocking, then we will have invalidated and
> >> flushed our current plug. Since blk_mq_get_cached_request() hasn't
> >> popped the cached request off the plug list just yet, we end holding a
> >> pointer to a request that is no longer valid. This insta-crashes with
> >> rq->mq_hctx being NULL in the validity checks just after.
> >>
> >> Pop the request off the cached list before doing rq_qos_throttle() to
> >> avoid using a potentially stale request.
> >>
> >> Fixes: 0a5aa8d161d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
> >> Reported-by: Dylan Yudaken <dylany@fb.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > Thank you for the fix and sorry for the trouble. The patch passed my test set:
> > 
> > Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > 
> > 
> > BTW, may I know the workload or script which triggers the failure? I'm
> > interested in if we can add a blktests test case to exercises qos throttle
> > and prevent similar bugs in the future.
> 
> Not sure if there are others, but specifically blk-iocost will do an
> explicit schedule() for some conditions. See the bottom of
> ioc_rqos_throttle().

Thanks. Will try to create a script which triggers the schedule() in
ioc_rqos_throttle().

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-06-23 12:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-21 16:07 [PATCH] block: pop cached rq before potentially blocking rq_qos_throttle() Jens Axboe
2022-06-21 16:58 ` Dylan Yudaken
2022-06-23  8:56 ` Shinichiro Kawasaki
2022-06-23 12:05   ` Jens Axboe
2022-06-23 12:33     ` Shinichiro Kawasaki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.