From: Jens Axboe <axboe@kernel.dk>
To: Khazhy Kumykov <khazhy@google.com>,
	Bart Van Assche <bvanassche@acm.org>,
	John Garry <john.garry@huawei.com>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
	Ming Lei <ming.lei@redhat.com>, Hannes Reinecke <hare@suse.de>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: Re: [PATCH v5 3/3] blk-mq: Fix a race between iterating over requests and freeing requests
Date: Mon, 5 Apr 2021 15:34:40 -0600
Message-ID: <54474b65-ffa4-9335-f7a2-5b49ccf169d4@kernel.dk>
In-Reply-To: <CACGdZYJh6ZvVekC8eBvz3SmN-TH8hTAmMQrvHtLJsKyL3R_fLw@mail.gmail.com>

On 4/5/21 12:08 PM, Khazhy Kumykov wrote:
> On Sun, Apr 4, 2021 at 5:28 PM Bart Van Assche <bvanassche@acm.org> wrote:
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 116c3691b104..e7a6a114d216 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -209,7 +209,11 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>>
>>         if (!reserved)
>>                 bitnr += tags->nr_reserved_tags;
>> -       rq = tags->rqs[bitnr];
>> +       /*
>> +        * Protected by rq->q->q_usage_counter. See also
>> +        * blk_mq_queue_tag_busy_iter().
>> +        */
>> +       rq = rcu_dereference_check(tags->rqs[bitnr], true);
> 
> maybe I'm missing something, but if this tags struct has a shared
> sbitmap, what guarantees do we have that while iterating we won't
> touch requests from a queue that's tearing down? The check a few lines
> below suggests that at the least we may touch requests from a
> different queue.
> 
> Say we enter blk_mq_queue_tag_busy_iter(): we're iterating with
> hctx->q->q_usage_counter raised, and we get to bt_iter().
> 
> Say the tagset has two shared queues: hctx->q is q1 and rq->q is q2.
> (thread 1)
> rq = rcu_deref_check
> (rq->q != hctx->q, but we don't know yet)
> 
> (thread 2)
> elsewhere, blk_cleanup_queue(q2) (or elevator_exit()) runs. Since we
> have only raised q_usage_counter on q1, that proceeds to completion
> and frees rq. On a preemptible kernel, thread 1 may be preempted
> before it reads rq->q, so synchronize_rcu() passes happily by.
> 
> (thread 1)
> we check rq && rq->q == hctx->q, use-after-free since rq was freed above
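
To make that window concrete: with the hunk quoted above applied, the
tail of bt_iter() is effectively the below, and the use-after-free needs
thread 1 to be preempted between the dereference and the queue check
(annotated sketch, not a patch):

	rq = rcu_dereference_check(tags->rqs[bitnr], true);
	/* <-- thread 1 can be preempted here; q2 teardown frees rq */
	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
		return iter_data->fn(hctx, rq, iter_data->data, reserved);
	return true;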
> 
> I think John Garry mentioned observing a similar race in patch 2 of
> his series, perhaps his test case can verify this?
> 
> "Indeed, blk_mq_queue_tag_busy_iter() already does take a reference to its
> queue usage counter when called, and the queue cannot be frozen to switch
> IO scheduler until all refs are dropped. This ensures no stale references
> to IO scheduler requests will be seen by blk_mq_queue_tag_busy_iter().
> 
> However, there is nothing to stop blk_mq_queue_tag_busy_iter() being
> run for another queue associated with the same tagset, and it seeing
> a stale IO scheduler request from the other queue after they are freed."
> 
> So, to my understanding, we have a race between reading
> tags->rqs[bitnr] and verifying that rq->q == hctx->q, where if we get
> scheduled off in between we may hit a use-after-free? If that's the
> case, perhaps we should hold rcu_read_lock() until we have verified
> that the queues match, then release it and run fn(), since by then we
> no longer need it?
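
If I read that right, the idea is to scope an RCU read-side section
around just the dereference and the queue check, and to drop it before
calling fn(), since once rq is known to belong to hctx->queue it is
pinned by the q_usage_counter reference we already hold. As an untested
sketch of that idea (not a proposed patch), bt_iter() would become
something like:

static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
	struct bt_iter_data *iter_data = data;
	struct blk_mq_hw_ctx *hctx = iter_data->hctx;
	struct blk_mq_tags *tags = hctx->tags;
	bool reserved = iter_data->reserved;
	struct request *rq;
	bool ours;

	if (!reserved)
		bitnr += tags->nr_reserved_tags;

	/*
	 * Hold the RCU read lock across the dereference and the queue
	 * check, so a concurrent teardown of another queue sharing this
	 * tagset cannot free rq underneath us.
	 */
	rcu_read_lock();
	rq = rcu_dereference(tags->rqs[bitnr]);
	/*
	 * rq can be NULL here, because the tagging functions test and
	 * set the bit before assigning ->rqs[].
	 */
	ours = rq && rq->q == hctx->queue && rq->mq_hctx == hctx;
	rcu_read_unlock();

	/*
	 * Once rq is known to belong to hctx->queue, the q_usage_counter
	 * reference taken in blk_mq_queue_tag_busy_iter() keeps it alive,
	 * so fn() can run without the RCU read lock held.
	 */
	if (ours)
		return iter_data->fn(hctx, rq, iter_data->data, reserved);
	return true;
}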

For something out of left field, we can check if the page that the rq
belongs to is still part of the tag set. If it isn't, then don't
deref it.

Totally untested.

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index e5bfecf2940d..6209c465e884 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -196,9 +196,35 @@ struct bt_iter_data {
 	struct blk_mq_hw_ctx *hctx;
 	busy_iter_fn *fn;
 	void *data;
+	struct page *last_lookup;
 	bool reserved;
 };
 
+static bool rq_from_queue(struct bt_iter_data *iter_data, struct request *rq)
+{
+	struct blk_mq_hw_ctx *hctx = iter_data->hctx;
+	struct page *rq_page, *page;
+
+	/*
+	 * We can hit rq == NULL here, because the tagging functions
+	 * test and set the bit before assigning ->rqs[].
+	 */
+	if (!rq)
+		return false;
+	rq_page = virt_to_page(rq);
+	if (rq_page == iter_data->last_lookup)
+		goto check_queue;
+	list_for_each_entry(page, &hctx->tags->page_list, lru) {
+		if (page == rq_page) {
+			iter_data->last_lookup = page;
+			goto check_queue;
+		}
+	}
+	return false;
+check_queue:
+	return rq->q == hctx->queue && rq->mq_hctx == hctx;
+}
+
 static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 {
 	struct bt_iter_data *iter_data = data;
@@ -211,11 +237,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 		bitnr += tags->nr_reserved_tags;
 	rq = tags->rqs[bitnr];
 
-	/*
-	 * We can hit rq == NULL here, because the tagging functions
-	 * test and set the bit before assigning ->rqs[].
-	 */
-	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
+	if (rq_from_queue(iter_data, rq))
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }

-- 
Jens Axboe


