* [PATCH] blk-mq: fix is_flush_rq
@ 2021-08-18  1:09 Ming Lei
  2021-08-18  2:18 ` Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ming Lei @ 2021-08-18  1:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, John Garry,
	Ming Lei, Blank-Burian, Markus, Dr.,
	Yufen Yu

is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the
following check:

	hctx->fq->flush_rq == req

but the passed hctx from bt_iter()/bt_tags_iter() may be NULL because:

1) memory re-order in blk_mq_rq_ctx_init():

	rq->mq_hctx = data->hctx;
	...
	refcount_set(&rq->ref, 1);

OR

2) tag re-use and ->rqs[] isn't updated with new request.

Fix the issue by re-writing is_flush_rq() as:

	return rq->end_io == flush_end_io;

which is simpler to follow and immune to the data race, since the WRITE
to rq->end_io is ordered before refcount_set(&rq->ref, 1).

Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in
    blk_mq_tagset_busy_iter")
Cc: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de>
Cc: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 5 +++++
 block/blk-mq.c    | 2 +-
 block/blk.h       | 6 +-----
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4912c8dbb1d8..4201728bf3a5 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -262,6 +262,11 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
 
+bool is_flush_rq(struct request *rq)
+{
+	return rq->end_io == flush_end_io;
+}
+
 /**
  * blk_kick_flush - consider issuing flush request
  * @q: request_queue being kicked
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b5237211ccb7..6d7395ad6d94 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -911,7 +911,7 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 
 void blk_mq_put_rq_ref(struct request *rq)
 {
-	if (is_flush_rq(rq, rq->mq_hctx))
+	if (is_flush_rq(rq))
 		rq->end_io(rq, 0);
 	else if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
diff --git a/block/blk.h b/block/blk.h
index 56f33fbcde59..d4fb5b72d12b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -44,11 +44,7 @@ static inline void __blk_get_queue(struct request_queue *q)
 	kobject_get(&q->kobj);
 }
 
-static inline bool
-is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx)
-{
-	return hctx->fq->flush_rq == req;
-}
+bool is_flush_rq(struct request *req);
 
 struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 					      gfp_t flags);
-- 
2.31.1
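
To make the race concrete, here is a minimal sketch in plain C of the two
sides described in the commit message; the reader side is loosely modelled
on the tag iterators and is illustrative rather than the exact kernel code:

	/* writer, as in blk_mq_rq_ctx_init() (simplified): */
	rq->mq_hctx = data->hctx;
	...
	refcount_set(&rq->ref, 1);	/* no barrier, so the ->mq_hctx store may
					 * become visible only after the refcount
					 * on a weakly ordered CPU */

	/* reader, a tag iterator (sketched): */
	if (rq && refcount_inc_not_zero(&rq->ref)) {
		/* old check: dereferences rq->mq_hctx, which may still be
		 * observed as NULL (or belong to a stale request after tag
		 * re-use), hence the crash */
		/* new check: rq->end_io == flush_end_io touches only rq
		 * itself, and the WRITE to rq->end_io is ordered before the
		 * request becomes visible through a non-zero refcount */
	}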



* Re: [PATCH] blk-mq: fix is_flush_rq
  2021-08-18  1:09 [PATCH] blk-mq: fix is_flush_rq Ming Lei
@ 2021-08-18  2:18 ` Jens Axboe
  2021-08-18  3:02 ` Bart Van Assche
  2021-08-23 12:42 ` Joseph Qi
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2021-08-18  2:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, John Garry,
	Blank-Burian, Markus, Dr.,
	Yufen Yu

On 8/17/21 7:09 PM, Ming Lei wrote:
> is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the
> following check:
> 
> 	hctx->fq->flush_rq == req
> 
> but the passed hctx from bt_iter()/bt_tags_iter() may be NULL because:
> 
> 1) memory re-order in blk_mq_rq_ctx_init():
> 
> 	rq->mq_hctx = data->hctx;
> 	...
> 	refcount_set(&rq->ref, 1);
> 
> OR
> 
> 2) tag re-use and ->rqs[] isn't updated with new request.
> 
> Fix the issue by re-writing is_flush_rq() as:
> 
> 	return rq->end_io == flush_end_io;
> 
> which is simpler to follow and immune to the data race, since the WRITE
> to rq->end_io is ordered before refcount_set(&rq->ref, 1).

That is way better, applied thanks.

> Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in
>     blk_mq_tagset_busy_iter")

I think your mailer was a bit too eager to split lines here. I fixed it up.

-- 
Jens Axboe



* Re: [PATCH] blk-mq: fix is_flush_rq
  2021-08-18  1:09 [PATCH] blk-mq: fix is_flush_rq Ming Lei
  2021-08-18  2:18 ` Jens Axboe
@ 2021-08-18  3:02 ` Bart Van Assche
  2021-08-18  3:31   ` Ming Lei
  2021-08-23 12:42 ` Joseph Qi
  2 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2021-08-18  3:02 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Christoph Hellwig, John Garry, Blank-Burian, Markus,
	Dr.,
	Yufen Yu

On 8/17/21 6:09 PM, Ming Lei wrote:
> +bool is_flush_rq(struct request *rq)
> +{
> +	return rq->end_io == flush_end_io;
> +}

My understanding is that calling is_flush_rq() is only safe if the
caller guarantees that the request refcount >= 1. How about adding a
comment that explains this?

Thanks,

Bart.


* Re: [PATCH] blk-mq: fix is_flush_rq
  2021-08-18  3:02 ` Bart Van Assche
@ 2021-08-18  3:31   ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2021-08-18  3:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, John Garry,
	Blank-Burian, Markus, Dr.,
	Yufen Yu

On Tue, Aug 17, 2021 at 08:02:58PM -0700, Bart Van Assche wrote:
> On 8/17/21 6:09 PM, Ming Lei wrote:
> > +bool is_flush_rq(struct request *rq)
> > +{
> > +	return rq->end_io == flush_end_io;
> > +}
> 
> My understanding is that calling is_flush_rq() is only safe if the
> caller guarantees that the request refcount >= 1. How about adding a
> comment that explains this?

Yeah, we can add the following comment, but it isn't urgent since
is_flush_rq() is a block-layer internal helper.

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4201728bf3a5..babf6262120e 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -262,6 +262,11 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
 
+/*
+ * The caller has to hold a reference to this request; otherwise the flush
+ * request may be recycled, rq->end_io cleared, and a kernel panic triggered.
+ * See blk_mq_put_rq_ref().
+ */
 bool is_flush_rq(struct request *rq)
 {
 	return rq->end_io == flush_end_io;




Thanks,
Ming
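
As a simplified usage sketch of that contract (loosely modelled on the
tag iterators; iter_fn is a placeholder and the code is not quoted from
the kernel source):

	if (rq && refcount_inc_not_zero(&rq->ref)) {
		/* holding a reference, so a flush request cannot be
		 * recycled and rq->end_io cannot be cleared under us */
		iter_fn(rq);			/* e.g. timeout/busy checking */
		blk_mq_put_rq_ref(rq);		/* flush rq: invokes rq->end_io();
						 * otherwise: drops the reference */
	}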



* Re: [PATCH] blk-mq: fix is_flush_rq
  2021-08-18  1:09 [PATCH] blk-mq: fix is_flush_rq Ming Lei
  2021-08-18  2:18 ` Jens Axboe
  2021-08-18  3:02 ` Bart Van Assche
@ 2021-08-23 12:42 ` Joseph Qi
  2021-08-23 12:50   ` Ming Lei
  2 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2021-08-23 12:42 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, John Garry,
	Blank-Burian, Markus, Dr.,
	Yufen Yu

Hi Ming,

On 8/18/21 9:09 AM, Ming Lei wrote:
> is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the
> following check:
> 
> 	hctx->fq->flush_rq == req
> 
> but the passed hctx from bt_iter()/bt_tags_iter() may be NULL because:
> 
> 1) memory re-order in blk_mq_rq_ctx_init():
> 
> 	rq->mq_hctx = data->hctx;
> 	...
> 	refcount_set(&rq->ref, 1);
> 
> OR
> 
> 2) tag re-use and ->rqs[] isn't updated with new request.
> 
> Fix the issue by re-writing is_flush_rq() as:
> 
> 	return rq->end_io == flush_end_io;
> 
> which is simpler to follow and immune to the data race, since the WRITE
> to rq->end_io is ordered before refcount_set(&rq->ref, 1).
> 
Recently we ran into a similar crash on ARM due to a NULL rq->mq_hctx in
blk_mq_put_rq_ref(), and it was an ordinary write request.
Since memory reordering really does happen, we may also risk accessing
other uninitialized members after this commit; at the least we have to be
more careful in busy_iter_fn...
So is the reason you don't use a memory barrier before refcount_set()
here a performance consideration?

Thanks,
Joseph


* Re: [PATCH] blk-mq: fix is_flush_rq
  2021-08-23 12:42 ` Joseph Qi
@ 2021-08-23 12:50   ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2021-08-23 12:50 UTC (permalink / raw)
  To: Joseph Qi
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	John Garry, Blank-Burian, Markus, Dr.,
	Yufen Yu

On Mon, Aug 23, 2021 at 08:42:45PM +0800, Joseph Qi wrote:
> Hi Ming,
> 
> On 8/18/21 9:09 AM, Ming Lei wrote:
> > is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the
> > following check:
> > 
> > 	hctx->fq->flush_rq == req
> > 
> > but the passed hctx from bt_iter()/bt_tags_iter() may be NULL because:
> > 
> > 1) memory re-order in blk_mq_rq_ctx_init():
> > 
> > 	rq->mq_hctx = data->hctx;
> > 	...
> > 	refcount_set(&rq->ref, 1);
> > 
> > OR
> > 
> > 2) tag re-use and ->rqs[] isn't updated with new request.
> > 
> > Fix the issue by re-writing is_flush_rq() as:
> > 
> > 	return rq->end_io == flush_end_io;
> > 
> > which is simpler to follow and immune to the data race, since the WRITE
> > to rq->end_io is ordered before refcount_set(&rq->ref, 1).
> > 
> Recently we ran into a similar crash on ARM due to a NULL rq->mq_hctx in
> blk_mq_put_rq_ref(), and it was an ordinary write request.
> Since memory reordering really does happen, we may also risk accessing
> other uninitialized members after this commit; at the least we have to be
> more careful in busy_iter_fn...
> So is the reason you don't use a memory barrier before refcount_set()
> here a performance consideration?

Yes, and conceptually it is also much simpler to check ->end_io.


Thanks,
Ming
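
For completeness, the barrier-based alternative being asked about would
look roughly like the following; this is a hypothetical sketch, not
something this patch adds, and it would cost a barrier on every request
allocation:

	rq->mq_hctx = data->hctx;
	...
	/* hypothetical: make all prior stores to rq visible before the
	 * request can be found via a non-zero refcount, assuming matching
	 * reader-side ordering when the iterator takes its reference */
	smp_wmb();
	refcount_set(&rq->ref, 1);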



