* [PATCH V3 0/3] blk-mq: fix request UAF related to iterating over tagset requests
@ 2021-04-27 15:10 Ming Lei
  2021-04-27 15:10 ` [PATCH V3 1/3] block: avoid double io accounting for flush request Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ming Lei @ 2021-04-27 15:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

Hi Jens,

This patchset fixes the request UAF issue with one simple approach,
without clearing ->rqs[] in the fast path.

1) grab the request's reference before calling ->fn in
blk_mq_tagset_busy_iter(), and release it after ->fn returns, so ->fn
won't be called on a request whose queue is frozen; done in the 2nd
patch

2) clear any stale request referenced in ->rqs[] before freeing the
request pool; one per-tags spinlock is added to protect grabbing a
request reference vs. clearing ->rqs[tag], so the UAF via
refcount_inc_not_zero() in bt_tags_iter() is avoided; done in the 3rd
patch
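
In short, the busy-tag iteration path becomes roughly the following
pattern (a sketch condensed from the 2nd and 3rd patches below, not
the full code):

	/* bt_tags_iter(): the non-static_rqs case */
	spin_lock_irqsave(&tags->lock, flags);
	rq = tags->rqs[bitnr];
	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
		spin_unlock_irqrestore(&tags->lock, flags);
		return true;	/* skip freed or being-freed request */
	}
	spin_unlock_irqrestore(&tags->lock, flags);
	...
	ret = iter_data->fn(rq, iter_data->data, reserved);
	blk_mq_put_rq_ref(rq);	/* may free the request */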

V3:
	- drop patches for completing requests started in iterator ->fn,
	  because blk-mq guarantees that a valid request is passed to ->fn,
	  and it is the driver's responsibility to avoid double completion;
	  existing drivers already avoid completing a request twice
	- add one patch for avoiding double accounting of the flush rq

V2:
	- take Bart's suggestion to not add a blk-mq helper for completing
	  requests while they are being iterated
	- don't grab rq->ref if the iterator is over static_rqs, because
	  that use case does require iterating over all requests, whether
	  or not the request is initialized


Ming Lei (3):
  block: avoid double io accounting for flush request
  blk-mq: grab rq->refcount before calling ->fn in
    blk_mq_tagset_busy_iter
  blk-mq: clear stale request in tags->rqs[] before freeing one request
    pool

 block/blk-flush.c  |  3 +--
 block/blk-mq-tag.c | 29 +++++++++++++++++++------
 block/blk-mq-tag.h |  3 +++
 block/blk-mq.c     | 53 +++++++++++++++++++++++++++++++++++++---------
 block/blk-mq.h     |  1 +
 5 files changed, 71 insertions(+), 18 deletions(-)

-- 
2.29.2



* [PATCH V3 1/3] block: avoid double io accounting for flush request
  2021-04-27 15:10 [PATCH V3 0/3] blk-mq: fix request UAF related to iterating over tagset requests Ming Lei
@ 2021-04-27 15:10 ` Ming Lei
  2021-04-27 15:10 ` [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
  2021-04-27 15:10 ` [PATCH V3 3/3] blk-mq: clear stale request in tags->rqs[] before freeing one request pool Ming Lei
  2 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-04-27 15:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

For a flush request, rq->end_io() may be called twice: once from
timeout handling (blk_mq_check_expired()), and once from normal
completion (__blk_mq_end_request()).

Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
IO accounting is done just once for a flush request.
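
Roughly, the two paths that can reach the accounting are:

	timeout:    blk_mq_check_expired() -> rq->end_io() == flush_end_io()
	completion: __blk_mq_end_request() -> rq->end_io() == flush_end_io()

With the accounting moved after refcount_dec_and_test(&flush_rq->ref),
only the call that drops the last reference performs it.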

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 7942ca6ed321..1002f6c58181 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 	unsigned long flags = 0;
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
 
-	blk_account_io_flush(flush_rq);
-
 	/* release the tag's ownership to the req cloned from */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
 
@@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		return;
 	}
 
+	blk_account_io_flush(flush_rq);
 	/*
 	 * Flush request has to be marked as IDLE when it is really ended
 	 * because its .end_io() is called from timeout code path too for
-- 
2.29.2



* [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-27 15:10 [PATCH V3 0/3] blk-mq: fix request UAF related to iterating over tagset requests Ming Lei
  2021-04-27 15:10 ` [PATCH V3 1/3] block: avoid double io accounting for flush request Ming Lei
@ 2021-04-27 15:10 ` Ming Lei
  2021-04-27 20:17   ` Bart Van Assche
  2021-04-27 15:10 ` [PATCH V3 3/3] blk-mq: clear stale request in tags->rqs[] before freeing one request pool Ming Lei
  2 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-04-27 15:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(),
which prevents the request from being re-used while ->fn is running.
The approach is the same as what we do when handling timeouts.

This fixes request UAF related to the completion race or queue release:

- If a request's reference is grabbed before rq->q is frozen, the queue
freeze won't complete before the reference is released during iteration.

- If a request is referenced after rq->q is frozen,
refcount_inc_not_zero() will return false, and we won't iterate over
this request.

However, one request UAF case is still not covered:
refcount_inc_not_zero() itself may read an already-freed request; that
is handled in the next patch.
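
The remaining window looks roughly like this:

	CPU0 (bt_tags_iter)			CPU1 (request pool teardown)
	rq = tags->rqs[bitnr];
						blk_mq_free_rqs()
						  frees the request pool
	refcount_inc_not_zero(&rq->ref);	/* reads freed memory: UAF */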

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 22 ++++++++++++++++------
 block/blk-mq.c     | 14 +++++++++-----
 block/blk-mq.h     |  1 +
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..9329b94a9743 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -264,6 +264,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	struct blk_mq_tags *tags = iter_data->tags;
 	bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
 	struct request *rq;
+	bool ret;
+	bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
@@ -272,16 +274,21 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
+	if (iter_static_rqs)
 		rq = tags->static_rqs[bitnr];
-	else
+	else {
 		rq = tags->rqs[bitnr];
-	if (!rq)
-		return true;
+		if (!rq || !refcount_inc_not_zero(&rq->ref))
+			return true;
+	}
 	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
 	    !blk_mq_request_started(rq))
-		return true;
-	return iter_data->fn(rq, iter_data->data, reserved);
+		ret = true;
+	else
+		ret = iter_data->fn(rq, iter_data->data, reserved);
+	if (!iter_static_rqs)
+		blk_mq_put_rq_ref(rq);
+	return ret;
 }
 
 /**
@@ -348,6 +355,9 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
  *		indicates whether or not @rq is a reserved request. Return
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns.
  */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 927189a55575..4bd6c11bd8bc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -909,6 +909,14 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 	return false;
 }
 
+void blk_mq_put_rq_ref(struct request *rq)
+{
+	if (is_flush_rq(rq, rq->mq_hctx))
+		rq->end_io(rq, 0);
+	else if (refcount_dec_and_test(&rq->ref))
+		__blk_mq_free_request(rq);
+}
+
 static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
@@ -942,11 +950,7 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
 
-	if (is_flush_rq(rq, hctx))
-		rq->end_io(rq, 0);
-	else if (refcount_dec_and_test(&rq->ref))
-		__blk_mq_free_request(rq);
-
+	blk_mq_put_rq_ref(rq);
 	return true;
 }
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3616453ca28c..143afe42c63a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,6 +47,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_ctx *start);
+void blk_mq_put_rq_ref(struct request *rq);
 
 /*
  * Internal helpers for allocating/freeing the request map
-- 
2.29.2



* [PATCH V3 3/3] blk-mq: clear stale request in tags->rqs[] before freeing one request pool
  2021-04-27 15:10 [PATCH V3 0/3] blk-mq: fix request UAF related to iterating over tagset requests Ming Lei
  2021-04-27 15:10 ` [PATCH V3 1/3] block: avoid double io accounting for flush request Ming Lei
  2021-04-27 15:10 ` [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-04-27 15:10 ` Ming Lei
  2021-04-28 14:30   ` David Jeffery
  2 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-04-27 15:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

refcount_inc_not_zero() in bt_tags_iter() may still read a freed
request.

Fix the issue with the following approach:

1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero() in bt_tags_iter()

2) clear any stale request referenced via ->rqs[tag] before freeing the
request pool; the per-tags spinlock is held while clearing stale
->rqs[tag]

So after stale requests are cleared, bt_tags_iter() won't observe a
freed request any more, and the clearing also waits for any pending
request reference to be dropped.

The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and a recent patch from David Jeffery.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c |  9 ++++++++-
 block/blk-mq-tag.h |  3 +++
 block/blk-mq.c     | 39 ++++++++++++++++++++++++++++++++++-----
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 9329b94a9743..a3be267212b9 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -277,9 +277,15 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	if (iter_static_rqs)
 		rq = tags->static_rqs[bitnr];
 	else {
+		unsigned long flags;
+
+		spin_lock_irqsave(&tags->lock, flags);
 		rq = tags->rqs[bitnr];
-		if (!rq || !refcount_inc_not_zero(&rq->ref))
+		if (!rq || !refcount_inc_not_zero(&rq->ref)) {
+			spin_unlock_irqrestore(&tags->lock, flags);
 			return true;
+		}
+		spin_unlock_irqrestore(&tags->lock, flags);
 	}
 	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
 	    !blk_mq_request_started(rq))
@@ -526,6 +532,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	spin_lock_init(&tags->lock);
 
 	if (blk_mq_is_sbitmap_shared(flags))
 		return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f942a601b5ef 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,9 @@ struct blk_mq_tags {
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+	/* used to clear rqs[] before one request pool is freed */
+	spinlock_t lock;
 };
 
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4bd6c11bd8bc..abd0f7a9d052 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2291,6 +2291,38 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
+static size_t order_to_size(unsigned int order)
+{
+	return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+		struct blk_mq_tags *tags, unsigned int hctx_idx)
+{
+	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+	struct page *page;
+	unsigned long flags;
+
+	spin_lock_irqsave(&drv_tags->lock, flags);
+	list_for_each_entry(page, &tags->page_list, lru) {
+		unsigned long start = (unsigned long)page_address(page);
+		unsigned long end = start + order_to_size(page->private);
+		int i;
+
+		for (i = 0; i < set->queue_depth; i++) {
+			struct request *rq = drv_tags->rqs[i];
+			unsigned long rq_addr = (unsigned long)rq;
+
+			if (rq_addr >= start && rq_addr < end) {
+				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+				cmpxchg(&drv_tags->rqs[i], rq, NULL);
+			}
+		}
+	}
+	spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx)
 {
@@ -2309,6 +2341,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		}
 	}
 
+	blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+
 	while (!list_empty(&tags->page_list)) {
 		page = list_first_entry(&tags->page_list, struct page, lru);
 		list_del_init(&page->lru);
@@ -2368,11 +2402,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	return tags;
 }
 
-static size_t order_to_size(unsigned int order)
-{
-	return (size_t)PAGE_SIZE << order;
-}
-
 static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 			       unsigned int hctx_idx, int node)
 {
-- 
2.29.2



* Re: [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-27 15:10 ` [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-04-27 20:17   ` Bart Van Assche
  2021-04-28  0:07     ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2021-04-27 20:17 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Khazhy Kumykov, Shin'ichiro Kawasaki,
	Hannes Reinecke, John Garry, David Jeffery

On 4/27/21 8:10 AM, Ming Lei wrote:
> +void blk_mq_put_rq_ref(struct request *rq)
> +{
> +	if (is_flush_rq(rq, rq->mq_hctx))
> +		rq->end_io(rq, 0);
> +	else if (refcount_dec_and_test(&rq->ref))
> +		__blk_mq_free_request(rq);
> +}

The above function needs more work. blk_mq_put_rq_ref() may be called 
from multiple CPUs concurrently and hence must handle concurrent calls 
safely. The flush .end_io callbacks have not been designed to handle 
concurrent calls.

Bart.


* Re: [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-27 20:17   ` Bart Van Assche
@ 2021-04-28  0:07     ` Ming Lei
  2021-04-28  1:37       ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-04-28  0:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery

On Tue, Apr 27, 2021 at 01:17:06PM -0700, Bart Van Assche wrote:
> On 4/27/21 8:10 AM, Ming Lei wrote:
> > +void blk_mq_put_rq_ref(struct request *rq)
> > +{
> > +	if (is_flush_rq(rq, rq->mq_hctx))
> > +		rq->end_io(rq, 0);
> > +	else if (refcount_dec_and_test(&rq->ref))
> > +		__blk_mq_free_request(rq);
> > +}
> 
> The above function needs more work. blk_mq_put_rq_ref() may be called from
> multiple CPUs concurrently and hence must handle concurrent calls safely.
> The flush .end_io callbacks have not been designed to handle concurrent
> calls.

static void flush_end_io(struct request *flush_rq, blk_status_t error)
{
        struct request_queue *q = flush_rq->q;
        struct list_head *running;
        struct request *rq, *n;
        unsigned long flags = 0;
        struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);

        /* release the tag's ownership to the req cloned from */
        spin_lock_irqsave(&fq->mq_flush_lock, flags);

        if (!refcount_dec_and_test(&flush_rq->ref)) {
                fq->rq_status = error;
                spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
                return;
        }
		...
		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}

Both the spin lock acquisition and refcount_dec_and_test() happen at
the beginning of flush_end_io(), so it is safe against concurrent
calls.

Otherwise, it would simply be an issue between normal completion and
timeout, since the pattern in this patch is the same as the timeout
path.

Or am I missing something?


Thanks,
Ming



* Re: [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-28  0:07     ` Ming Lei
@ 2021-04-28  1:37       ` Bart Van Assche
  2021-04-28  2:22         ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2021-04-28  1:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery

On 4/27/21 5:07 PM, Ming Lei wrote:
> On Tue, Apr 27, 2021 at 01:17:06PM -0700, Bart Van Assche wrote:
>> On 4/27/21 8:10 AM, Ming Lei wrote:
>>> +void blk_mq_put_rq_ref(struct request *rq)
>>> +{
>>> +	if (is_flush_rq(rq, rq->mq_hctx))
>>> +		rq->end_io(rq, 0);
>>> +	else if (refcount_dec_and_test(&rq->ref))
>>> +		__blk_mq_free_request(rq);
>>> +}
>>
>> The above function needs more work. blk_mq_put_rq_ref() may be called from
>> multiple CPUs concurrently and hence must handle concurrent calls safely.
>> The flush .end_io callbacks have not been designed to handle concurrent
>> calls.
> 
> static void flush_end_io(struct request *flush_rq, blk_status_t error)
> {
>         struct request_queue *q = flush_rq->q;
>         struct list_head *running;
>         struct request *rq, *n;
>         unsigned long flags = 0;
>         struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
> 
>         /* release the tag's ownership to the req cloned from */
>         spin_lock_irqsave(&fq->mq_flush_lock, flags);
> 
>         if (!refcount_dec_and_test(&flush_rq->ref)) {
>                 fq->rq_status = error;
>                 spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
>                 return;
>         }
> 		...
> 		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
> }
> 
> Both the spin lock acquisition and refcount_dec_and_test() happen at
> the beginning of flush_end_io(), so it is safe against concurrent
> calls.
> 
> Otherwise, it would simply be an issue between normal completion and
> timeout, since the pattern in this patch is the same as the timeout
> path.
> 
> Or am I missing something?

The following code from blk_flush_restore_request() modifies the end_io
pointer:

	rq->end_io = rq->flush.saved_end_io;

If blk_mq_put_rq_ref() is called from two different contexts, then one of
the two rq->end_io(rq, 0) calls in blk_mq_put_rq_ref() races with the
end_io assignment in blk_flush_restore_request().

Bart.


* Re: [PATCH V3 2/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-28  1:37       ` Bart Van Assche
@ 2021-04-28  2:22         ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-04-28  2:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery

On Wed, Apr 28, 2021 at 9:37 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 4/27/21 5:07 PM, Ming Lei wrote:
> > On Tue, Apr 27, 2021 at 01:17:06PM -0700, Bart Van Assche wrote:
> >> On 4/27/21 8:10 AM, Ming Lei wrote:
> >>> +void blk_mq_put_rq_ref(struct request *rq)
> >>> +{
> >>> +   if (is_flush_rq(rq, rq->mq_hctx))
> >>> +           rq->end_io(rq, 0);
> >>> +   else if (refcount_dec_and_test(&rq->ref))
> >>> +           __blk_mq_free_request(rq);
> >>> +}
> >>
> >> The above function needs more work. blk_mq_put_rq_ref() may be called from
> >> multiple CPUs concurrently and hence must handle concurrent calls safely.
> >> The flush .end_io callbacks have not been designed to handle concurrent
> >> calls.
> >
> > static void flush_end_io(struct request *flush_rq, blk_status_t error)
> > {
> >         struct request_queue *q = flush_rq->q;
> >         struct list_head *running;
> >         struct request *rq, *n;
> >         unsigned long flags = 0;
> >         struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
> >
> >         /* release the tag's ownership to the req cloned from */
> >         spin_lock_irqsave(&fq->mq_flush_lock, flags);
> >
> >         if (!refcount_dec_and_test(&flush_rq->ref)) {
> >                 fq->rq_status = error;
> >                 spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
> >                 return;
> >         }
> >               ...
> >               spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
> > }
> >
> > Both the spin lock acquisition and refcount_dec_and_test() happen at
> > the beginning of flush_end_io(), so it is safe against concurrent
> > calls.
> >
> > Otherwise, it would simply be an issue between normal completion and
> > timeout, since the pattern in this patch is the same as the timeout
> > path.
> >
> > Or am I missing something?
>
> The following code from blk_flush_restore_request() modifies the end_io
> pointer:
>
>         rq->end_io = rq->flush.saved_end_io;

blk_flush_restore_request() is only done for requests passed to
blk_insert_flush(); here we only call ->end_io() for the flush_rq,
which is the internal flush request instance, please see the
is_flush_rq() definition. Also, flush_rq->end_io always points to
flush_end_io().

So the issue you mentioned doesn't exist.
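
For reference, is_flush_rq() at this point simply checks whether the
request is the hctx's internal flush request (paraphrasing
block/blk-core.c):

	static bool is_flush_rq(struct request *rq, struct blk_mq_hw_ctx *hctx)
	{
		return hctx->fq->flush_rq == rq;
	}

so blk_mq_put_rq_ref() only takes the ->end_io() path for that single
per-hctx flush request.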

Thanks,
Ming



* Re: [PATCH V3 3/3] blk-mq: clear stale request in tags->rqs[] before freeing one request pool
  2021-04-27 15:10 ` [PATCH V3 3/3] blk-mq: clear stale request in tags->rqs[] before freeing one request pool Ming Lei
@ 2021-04-28 14:30   ` David Jeffery
  2021-04-28 15:24     ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: David Jeffery @ 2021-04-28 14:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry

On Tue, Apr 27, 2021 at 11:10:58PM +0800, Ming Lei wrote:
> 
> refcount_inc_not_zero() in bt_tags_iter() may still read a freed
> request.
> 
> Fix the issue with the following approach:
> 
> 1) hold a per-tags spinlock when reading ->rqs[tag] and calling
> refcount_inc_not_zero() in bt_tags_iter()
> 

This method of closing the race that still existed in my original patch
is very nice.  It's a great improvement.

> 2) clear any stale request referenced via ->rqs[tag] before freeing the
> request pool; the per-tags spinlock is held while clearing stale
> ->rqs[tag]
> 
> So after stale requests are cleared, bt_tags_iter() won't observe a
> freed request any more, and the clearing also waits for any pending
> request reference to be dropped.
> 
> The idea of clearing ->rqs[] is borrowed from John Garry's previous
> patch and a recent patch from David Jeffery.
>

However, when you took my original cmpxchg patch and merged my separate
function to do the cmpxchg cleaning into blk_mq_clear_rq_mapping, you
missed why it was a separate function.  Your patch will clean out the
static_rqs requests which are being freed, but it doesn't clean out the
special flush request that gets allocated individually by a
request_queue.  The flush request can be put directly into the rqs[]
array so it also needs to be cleaned when a request_queue is being
torn down.  This was the second caller of my separated cleaning function.

With that portion of my original patch removed, a stale pointer to a
freed flush request can still remain after a request_queue is released.

David Jeffery



* Re: [PATCH V3 3/3] blk-mq: clear stale request in tags->rqs[] before freeing one request pool
  2021-04-28 14:30   ` David Jeffery
@ 2021-04-28 15:24     ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-04-28 15:24 UTC (permalink / raw)
  To: David Jeffery
  Cc: Jens Axboe, linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry

On Wed, Apr 28, 2021 at 10:30:13AM -0400, David Jeffery wrote:
> On Tue, Apr 27, 2021 at 11:10:58PM +0800, Ming Lei wrote:
> > 
> > refcount_inc_not_zero() in bt_tags_iter() may still read a freed
> > request.
> > 
> > Fix the issue with the following approach:
> > 
> > 1) hold a per-tags spinlock when reading ->rqs[tag] and calling
> > refcount_inc_not_zero() in bt_tags_iter()
> > 
> 
> This method of closing the race that still existed in my original patch
> is very nice.  It's a great improvement.
> 
> > 2) clear any stale request referenced via ->rqs[tag] before freeing the
> > request pool; the per-tags spinlock is held while clearing stale
> > ->rqs[tag]
> > 
> > So after stale requests are cleared, bt_tags_iter() won't observe a
> > freed request any more, and the clearing also waits for any pending
> > request reference to be dropped.
> > 
> > The idea of clearing ->rqs[] is borrowed from John Garry's previous
> > patch and a recent patch from David Jeffery.
> >
> 
> However, when you took my original cmpxchg patch and merged my separate
> function to do the cmpxchg cleaning into blk_mq_clear_rq_mapping, you
> missed why it was a separate function.  Your patch will clean out the
> static_rqs requests which are being freed, but it doesn't clean out the
> special flush request that gets allocated individually by a
> request_queue.  The flush request can be put directly into the rqs[]
> array, so it also needs to be cleaned when a request_queue is being
> torn down.  This was the second caller of my separate cleaning function.

It can be covered with just one call of the cleaning function; see
the following patch:

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e1e997af89a0..fb9eeb03d6a0 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -500,7 +500,8 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
 	unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
 
 	if (hctx->sched_tags) {
-		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
+		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx,
+				hctx->fq->flush_rq);
 		blk_mq_free_rq_map(hctx->sched_tags, flags);
 		hctx->sched_tags = NULL;
 	}
@@ -611,7 +612,8 @@ void blk_mq_sched_free_requests(struct request_queue *q)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->sched_tags)
-			blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+			blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i,
+					hctx->fq->flush_rq);
 	}
 }
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 9329b94a9743..757e0e652ce7 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -277,9 +277,15 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	if (iter_static_rqs)
 		rq = tags->static_rqs[bitnr];
 	else {
+		unsigned long flags;
+
+		spin_lock_irqsave(&tags->lock, flags);
 		rq = tags->rqs[bitnr];
-		if (!rq || !refcount_inc_not_zero(&rq->ref))
+		if (!rq || !refcount_inc_not_zero(&rq->ref)) {
+			spin_unlock_irqrestore(&tags->lock, flags);
 			return true;
+		}
+		spin_unlock_irqrestore(&tags->lock, flags);
 	}
 	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
 	    !blk_mq_request_started(rq))
@@ -526,6 +532,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	spin_lock_init(&tags->lock);
 
 	if (blk_mq_is_sbitmap_shared(flags))
 		return tags;
@@ -586,7 +593,8 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 			return -ENOMEM;
 		}
 
-		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
+		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num,
+				hctx->fq->flush_rq);
 		blk_mq_free_rq_map(*tagsptr, flags);
 		*tagsptr = new;
 	} else {
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f942a601b5ef 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,9 @@ struct blk_mq_tags {
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+	/* used to clear rqs[] before one request pool is freed */
+	spinlock_t lock;
 };
 
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4bd6c11bd8bc..686f6b210114 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2291,8 +2291,42 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
+static size_t order_to_size(unsigned int order)
+{
+	return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+		struct blk_mq_tags *tags, unsigned int hctx_idx,
+		struct request *flush_rq)
+{
+	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+	struct page *page;
+	unsigned long flags;
+
+	spin_lock_irqsave(&drv_tags->lock, flags);
+	list_for_each_entry(page, &tags->page_list, lru) {
+		unsigned long start = (unsigned long)page_address(page);
+		unsigned long end = start + order_to_size(page->private);
+		int i;
+
+		for (i = 0; i < set->queue_depth; i++) {
+			struct request *rq = drv_tags->rqs[i];
+			unsigned long rq_addr = (unsigned long)rq;
+
+			if ((rq_addr >= start && rq_addr < end) ||
+					rq == flush_rq) {
+				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+				cmpxchg(&drv_tags->rqs[i], rq, NULL);
+			}
+		}
+	}
+	spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
-		     unsigned int hctx_idx)
+		     unsigned int hctx_idx, struct request *flush_rq)
 {
 	struct page *page;
 
@@ -2309,6 +2343,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		}
 	}
 
+	blk_mq_clear_rq_mapping(set, tags, hctx_idx, flush_rq);
+
 	while (!list_empty(&tags->page_list)) {
 		page = list_first_entry(&tags->page_list, struct page, lru);
 		list_del_init(&page->lru);
@@ -2368,11 +2404,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	return tags;
 }
 
-static size_t order_to_size(unsigned int order)
-{
-	return (size_t)PAGE_SIZE << order;
-}
-
 static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 			       unsigned int hctx_idx, int node)
 {
@@ -2461,7 +2492,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return 0;
 
 fail:
-	blk_mq_free_rqs(set, tags, hctx_idx);
+	blk_mq_free_rqs(set, tags, hctx_idx, NULL);
 	return -ENOMEM;
 }
 
@@ -2794,7 +2825,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	unsigned int flags = set->flags;
 
 	if (set->tags && set->tags[hctx_idx]) {
-		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
+		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx, NULL);
 		blk_mq_free_rq_map(set->tags[hctx_idx], flags);
 		set->tags[hctx_idx] = NULL;
 	}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 143afe42c63a..08ef96c181f6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -53,7 +53,7 @@ void blk_mq_put_rq_ref(struct request *rq);
  * Internal helpers for allocating/freeing the request map
  */
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
-		     unsigned int hctx_idx);
+		     unsigned int hctx_idx, struct request *flush_rq);
 void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,

-- 
Ming


