linux-block.vger.kernel.org archive mirror
* [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests
@ 2021-05-07 14:42 Ming Lei
  2021-05-07 14:42 ` [PATCH V6 1/4] block: avoid double io accounting for flush request Ming Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	Christoph Hellwig, David Jeffery, Ming Lei

Hi Jens,

This patchset fixes the request UAF issue with one simple approach,
without clearing ->rqs[] in the fast path; please consider it for 5.13.

1) grab a request's ref before calling ->fn in blk_mq_tagset_busy_iter,
and release it after calling ->fn, so ->fn won't be called for a
request if its queue is frozen; done in the 2nd patch

2) clear any stale request referred to in ->rqs[] before freeing the
request pool; one per-tags spinlock is added to protect grabbing a
request ref vs. clearing ->rqs[tag], so the UAF by refcount_inc_not_zero()
in bt_tags_iter() is avoided; done in the 3rd patch.

V6:
	- hold the spin lock when reading rq via ->rqs[tag_bit]; this
	  issue was introduced in V5
	- make blk_mq_find_and_get_req() cleaner, as suggested by Bart
	- do not hold tags->lock while clearing ->rqs[], to avoid disabling
	  interrupts for too long, as suggested by Bart
	- code style changes, as suggested by Christoph

V5:
	- cover bt_iter() by same approach taken in bt_tags_iter(), and pass
	John's invasive test
	- add tested-by/reviewed-by tag

V4:
	- remove hctx->fq->flush_rq from tags->rqs[] before freeing the hw
	  queue; patch 4/4 is added, which is based on David's patch.

V3:
	- drop the patches for completing requests started in iterator ->fn,
	  because blk-mq guarantees that a valid request is passed to ->fn,
	  and it is the driver's responsibility to avoid double completion;
	  existing drivers already avoid completing a request twice
	- add one patch to avoid double accounting of the flush rq

V2:
	- take Bart's suggestion to not add a blk-mq helper for completing
	  requests while they are being iterated
	- don't grab rq->ref if the iterator is over static rqs, because
	  that use case does require iterating over all requests, no matter
	  whether the request is initialized or not

Ming Lei (4):
  block: avoid double io accounting for flush request
  blk-mq: grab rq->refcount before calling ->fn in
    blk_mq_tagset_busy_iter
  blk-mq: clear stale request in tags->rq[] before freeing one request
    pool
  blk-mq: clearing flush request reference in tags->rqs[]

 block/blk-flush.c  |  3 +-
 block/blk-mq-tag.c | 49 +++++++++++++++++++------
 block/blk-mq-tag.h |  6 +++
 block/blk-mq.c     | 91 ++++++++++++++++++++++++++++++++++++++++------
 block/blk-mq.h     |  1 +
 5 files changed, 126 insertions(+), 24 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH V6 1/4] block: avoid double io accounting for flush request
  2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
  2021-05-07 14:42 ` [PATCH V6 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	Christoph Hellwig, David Jeffery, Ming Lei

For a flush request, rq->end_io() may be called twice: once from
timeout handling (blk_mq_check_expired()) and once from normal
completion (__blk_mq_end_request()).

Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
io accounting is done just once per flush request.
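
The fix can be sketched in plain user-space C: a toy completion handler
that may run twice, but only performs the accounting when the final
reference is dropped. All names below are invented for illustration;
only the refcount-gating idea mirrors the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy flush request: end_io() can be invoked from both the timeout
 * path and the normal completion path, i.e. up to twice. */
struct toy_flush_rq {
	atomic_int ref;      /* one reference per potential caller   */
	int flush_accounted; /* stands in for blk_account_io_flush() */
};

/* Only the caller that drops the last reference does the accounting,
 * mirroring the move of blk_account_io_flush() to after flush_rq->ref
 * reaches zero. */
static void toy_flush_end_io(struct toy_flush_rq *rq)
{
	if (atomic_fetch_sub(&rq->ref, 1) != 1)
		return; /* another completion path still holds a ref */
	rq->flush_accounted++;
}
```

Calling toy_flush_end_io() twice on a request initialized with two
references accounts exactly once, which is the property the patch
restores for the real flush_rq.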

Fixes: b68663186577 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 7942ca6ed321..1002f6c58181 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 	unsigned long flags = 0;
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
 
-	blk_account_io_flush(flush_rq);
-
 	/* release the tag's ownership to the req cloned from */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
 
@@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		return;
 	}
 
+	blk_account_io_flush(flush_rq);
 	/*
 	 * Flush request has to be marked as IDLE when it is really ended
 	 * because its .end_io() is called from timeout code path too for
-- 
2.29.2



* [PATCH V6 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
  2021-05-07 14:42 ` [PATCH V6 1/4] block: avoid double io accounting for flush request Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
  2021-05-07 14:42 ` [PATCH V6 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	Christoph Hellwig, David Jeffery, Ming Lei

Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(); this
prevents the request from being re-used while ->fn is running. The
approach is the same as the one taken during timeout handling.

This fixes request use-after-free (UAF) related to completion races or
queue releasing:

- If a rq's ref is grabbed before rq->q is frozen, then the queue freeze
won't complete before the request reference is released during iteration.

- If a rq is examined after rq->q has been frozen, refcount_inc_not_zero()
returns false, and we won't iterate over this request.

However, one request UAF is still not covered: refcount_inc_not_zero()
may read an already freed request; that is handled in the next patch.
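
As a rough user-space sketch of this pattern (the kernel's refcount_t is
modeled with C11 atomics, and every name below is invented): the
iterator only invokes the callback on requests whose reference it
managed to grab, and drops that reference once the callback returns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq {
	atomic_int ref; /* 0 means the request is gone / queue frozen */
	int tag;
};

/* Model of refcount_inc_not_zero(): succeed only while ref > 0. */
static bool toy_get_not_zero(struct toy_rq *rq)
{
	int old = atomic_load(&rq->ref);

	while (old != 0)
		if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
			return true;
	return false;
}

static void toy_put(struct toy_rq *rq)
{
	atomic_fetch_sub(&rq->ref, 1);
}

/* Iterate like blk_mq_tagset_busy_iter(): skip empty slots and requests
 * whose reference cannot be grabbed; hold a reference across fn(). */
static int toy_busy_iter(struct toy_rq **rqs, size_t depth,
			 bool (*fn)(struct toy_rq *))
{
	int called = 0;

	for (size_t i = 0; i < depth; i++) {
		struct toy_rq *rq = rqs[i];

		if (!rq || !toy_get_not_zero(rq))
			continue;
		called++;
		bool keep_going = fn(rq);

		toy_put(rq);
		if (!keep_going)
			break;
	}
	return called;
}
```

A request whose ref already dropped to zero (e.g. its queue was frozen)
is simply skipped, which is exactly why ->fn is never called for it.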

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 44 +++++++++++++++++++++++++++++++++-----------
 block/blk-mq.c     | 14 +++++++++-----
 block/blk-mq.h     |  1 +
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..544edf2c56a5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -199,6 +199,16 @@ struct bt_iter_data {
 	bool reserved;
 };
 
+static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
+		unsigned int bitnr)
+{
+	struct request *rq = tags->rqs[bitnr];
+
+	if (!rq || !refcount_inc_not_zero(&rq->ref))
+		return NULL;
+	return rq;
+}
+
 static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 {
 	struct bt_iter_data *iter_data = data;
@@ -206,18 +216,22 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	struct blk_mq_tags *tags = hctx->tags;
 	bool reserved = iter_data->reserved;
 	struct request *rq;
+	bool ret = true;
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
-	rq = tags->rqs[bitnr];
-
 	/*
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
-		return iter_data->fn(hctx, rq, iter_data->data, reserved);
-	return true;
+	rq = blk_mq_find_and_get_req(tags, bitnr);
+	if (!rq)
+		return true;
+
+	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+	blk_mq_put_rq_ref(rq);
+	return ret;
 }
 
 /**
@@ -264,6 +278,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	struct blk_mq_tags *tags = iter_data->tags;
 	bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
 	struct request *rq;
+	bool ret = true;
+	bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
@@ -272,16 +288,19 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
+	if (iter_static_rqs)
 		rq = tags->static_rqs[bitnr];
 	else
-		rq = tags->rqs[bitnr];
+		rq = blk_mq_find_and_get_req(tags, bitnr);
 	if (!rq)
 		return true;
-	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
-	    !blk_mq_request_started(rq))
-		return true;
-	return iter_data->fn(rq, iter_data->data, reserved);
+
+	if (!(iter_data->flags & BT_TAG_ITER_STARTED) ||
+	    blk_mq_request_started(rq))
+		ret = iter_data->fn(rq, iter_data->data, reserved);
+	if (!iter_static_rqs)
+		blk_mq_put_rq_ref(rq);
+	return ret;
 }
 
 /**
@@ -348,6 +367,9 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
  *		indicates whether or not @rq is a reserved request. Return
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns.
  */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 466676bc2f0b..d053e80fa6d7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -909,6 +909,14 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 	return false;
 }
 
+void blk_mq_put_rq_ref(struct request *rq)
+{
+	if (is_flush_rq(rq, rq->mq_hctx))
+		rq->end_io(rq, 0);
+	else if (refcount_dec_and_test(&rq->ref))
+		__blk_mq_free_request(rq);
+}
+
 static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
@@ -942,11 +950,7 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
 
-	if (is_flush_rq(rq, hctx))
-		rq->end_io(rq, 0);
-	else if (refcount_dec_and_test(&rq->ref))
-		__blk_mq_free_request(rq);
-
+	blk_mq_put_rq_ref(rq);
 	return true;
 }
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9ce64bc4a6c8..556368d2c5b6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,6 +47,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_ctx *start);
+void blk_mq_put_rq_ref(struct request *rq);
 
 /*
  * Internal helpers for allocating/freeing the request map
-- 
2.29.2



* [PATCH V6 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
  2021-05-07 14:42 ` [PATCH V6 1/4] block: avoid double io accounting for flush request Ming Lei
  2021-05-07 14:42 ` [PATCH V6 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
  2021-05-07 14:42 ` [PATCH V6 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
  2021-05-11  5:05 ` [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Shinichiro Kawasaki
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	Christoph Hellwig, David Jeffery, Ming Lei

refcount_inc_not_zero() in bt_tags_iter() may still read an already
freed request.

Fix the issue with the following approach:

1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero() in bt_tags_iter()

2) clear any stale request referenced via ->rqs[tag] before freeing the
request pool; the per-tags spinlock synchronizes clearing the stale
->rqs[tag] entries against concurrent readers

After the stale requests are cleared, bt_tags_iter() can no longer
observe a freed request, and the clearing also waits for any pending
request reference to be dropped.

The idea of clearing ->rqs[] is borrowed from John Garry's earlier
patch and a recent patch from David.
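
The locking protocol can be sketched in user space with a pthread mutex
standing in for the per-tags spinlock (all names are invented, and this
single-threaded sketch only demonstrates the sequence, not the race
itself): lookups read ->rqs[tag] and grab the reference under the lock,
and the freeing side clears the slots and then cycles the same lock so
that any lookup which already read an old pointer has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq { atomic_int ref; };

struct toy_tags {
	pthread_mutex_t lock; /* stands in for the per-tags spinlock */
	struct toy_rq **rqs;
	size_t depth;
};

/* Model of refcount_inc_not_zero(): succeed only while ref > 0. */
static bool toy_get_not_zero(struct toy_rq *rq)
{
	int old = atomic_load(&rq->ref);

	while (old != 0)
		if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
			return true;
	return false;
}

/* Like blk_mq_find_and_get_req(): read ->rqs[tag] and grab the ref
 * under the lock, so the slot cannot be cleared and the request freed
 * between the read and the ref grab. */
static struct toy_rq *toy_find_and_get(struct toy_tags *tags, size_t tag)
{
	struct toy_rq *rq;

	pthread_mutex_lock(&tags->lock);
	rq = tags->rqs[tag];
	if (rq && !toy_get_not_zero(rq))
		rq = NULL;
	pthread_mutex_unlock(&tags->lock);
	return rq;
}

/* Like blk_mq_clear_rq_mapping(): NULL the stale slots, then take and
 * release the lock once; after that empty critical section, no lookup
 * can still be working on a pointer read from the old ->rqs[]. */
static void toy_clear_mapping(struct toy_tags *tags)
{
	for (size_t i = 0; i < tags->depth; i++)
		tags->rqs[i] = NULL;

	pthread_mutex_lock(&tags->lock);   /* wait for pending lookups */
	pthread_mutex_unlock(&tags->lock);
}
```

The empty lock/unlock pair is the cheap "drain" step: it does no work of
its own, it only orders the clearing against any reader still inside
the locked lookup.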

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c |  9 +++++++--
 block/blk-mq-tag.h |  6 ++++++
 block/blk-mq.c     | 46 +++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 544edf2c56a5..1671dae43030 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -202,10 +202,14 @@ struct bt_iter_data {
 static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
 		unsigned int bitnr)
 {
-	struct request *rq = tags->rqs[bitnr];
+	struct request *rq;
+	unsigned long flags;
 
+	spin_lock_irqsave(&tags->lock, flags);
+	rq = tags->rqs[bitnr];
 	if (!rq || !refcount_inc_not_zero(&rq->ref))
-		return NULL;
+		rq = NULL;
+	spin_unlock_irqrestore(&tags->lock, flags);
 	return rq;
 }
 
@@ -538,6 +542,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	spin_lock_init(&tags->lock);
 
 	if (blk_mq_is_sbitmap_shared(flags))
 		return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f887988e5ef6 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,12 @@ struct blk_mq_tags {
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+	/*
+	 * used to clear request reference in rqs[] before freeing one
+	 * request pool
+	 */
+	spinlock_t lock;
 };
 
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d053e80fa6d7..626b4483e006 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2306,6 +2306,45 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
+static size_t order_to_size(unsigned int order)
+{
+	return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+		struct blk_mq_tags *tags, unsigned int hctx_idx)
+{
+	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+	struct page *page;
+	unsigned long flags;
+
+	list_for_each_entry(page, &tags->page_list, lru) {
+		unsigned long start = (unsigned long)page_address(page);
+		unsigned long end = start + order_to_size(page->private);
+		int i;
+
+		for (i = 0; i < set->queue_depth; i++) {
+			struct request *rq = drv_tags->rqs[i];
+			unsigned long rq_addr = (unsigned long)rq;
+
+			if (rq_addr >= start && rq_addr < end) {
+				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+				cmpxchg(&drv_tags->rqs[i], rq, NULL);
+			}
+		}
+	}
+
+	/*
+	 * Wait until all pending iteration is done.
+	 *
+	 * Request reference is cleared and it is guaranteed to be observed
+	 * after the ->lock is released.
+	 */
+	spin_lock_irqsave(&drv_tags->lock, flags);
+	spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx)
 {
@@ -2324,6 +2363,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		}
 	}
 
+	blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+
 	while (!list_empty(&tags->page_list)) {
 		page = list_first_entry(&tags->page_list, struct page, lru);
 		list_del_init(&page->lru);
@@ -2383,11 +2424,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	return tags;
 }
 
-static size_t order_to_size(unsigned int order)
-{
-	return (size_t)PAGE_SIZE << order;
-}
-
 static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 			       unsigned int hctx_idx, int node)
 {
-- 
2.29.2



* [PATCH V6 4/4] blk-mq: clearing flush request reference in tags->rqs[]
  2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
                   ` (2 preceding siblings ...)
  2021-05-07 14:42 ` [PATCH V6 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
  2021-05-11  5:05 ` [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Shinichiro Kawasaki
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	Christoph Hellwig, David Jeffery, Ming Lei

Before freeing the hw queue, clear the flush request reference in
tags->rqs[] so that a potential UAF is avoided.

Based on a patch written by David Jeffery.
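
The clearing loop can be sketched with C11 atomics in place of the
kernel's cmpxchg() (names below are invented): only slots that still
point at the flush request are replaced with NULL, so unrelated entries
are left untouched.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_rq { int tag; };

/* Mirror of the cmpxchg(&tags->rqs[i], flush_rq, NULL) loop: a strong
 * compare-and-swap clears a slot only if it still holds flush_rq. */
static void toy_clear_flush_mapping(struct toy_rq *_Atomic *rqs,
				    size_t depth, struct toy_rq *flush_rq)
{
	for (size_t i = 0; i < depth; i++) {
		struct toy_rq *expected = flush_rq;

		atomic_compare_exchange_strong(&rqs[i], &expected, NULL);
	}
}
```

A compare-and-swap rather than a plain store means a slot that has
already been reassigned to a different request is never clobbered.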

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 626b4483e006..fcd5ed79011f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2642,16 +2642,45 @@ static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 					    &hctx->cpuhp_dead);
 }
 
+/*
+ * Before freeing hw queue, clearing the flush request reference in
+ * tags->rqs[] for avoiding potential UAF.
+ */
+static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
+		unsigned int queue_depth, struct request *flush_rq)
+{
+	int i;
+	unsigned long flags;
+
+	WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
+
+	for (i = 0; i < queue_depth; i++)
+		cmpxchg(&tags->rqs[i], flush_rq, NULL);
+
+	/*
+	 * Wait until all pending iteration is done.
+	 *
+	 * Request reference is cleared and it is guaranteed to be observed
+	 * after the ->lock is released.
+	 */
+	spin_lock_irqsave(&tags->lock, flags);
+	spin_unlock_irqrestore(&tags->lock, flags);
+}
+
 /* hctx->ctxs will be freed in queue's release handler */
 static void blk_mq_exit_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 {
+	struct request *flush_rq = hctx->fq->flush_rq;
+
 	if (blk_mq_hw_queue_mapped(hctx))
 		blk_mq_tag_idle(hctx);
 
+	blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+			set->queue_depth, flush_rq);
 	if (set->ops->exit_request)
-		set->ops->exit_request(set, hctx->fq->flush_rq, hctx_idx);
+		set->ops->exit_request(set, flush_rq, hctx_idx);
 
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
-- 
2.29.2



* Re: [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests
  2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
                   ` (3 preceding siblings ...)
  2021-05-07 14:42 ` [PATCH V6 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
@ 2021-05-11  5:05 ` Shinichiro Kawasaki
  2021-05-11  6:36   ` Ming Lei
  4 siblings, 1 reply; 8+ messages in thread
From: Shinichiro Kawasaki @ 2021-05-11  5:05 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, Christoph Hellwig, David Jeffery

On May 07, 2021 / 22:42, Ming Lei wrote:
> Hi Jens,
> 
> This patchset fixes the request UAF issue with one simple approach,
> without clearing ->rqs[] in the fast path; please consider it for 5.13.
> 
> 1) grab a request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> and release it after calling ->fn, so ->fn won't be called for a
> request if its queue is frozen; done in the 2nd patch
> 
> 2) clear any stale request referred to in ->rqs[] before freeing the
> request pool; one per-tags spinlock is added to protect grabbing a
> request ref vs. clearing ->rqs[tag], so the UAF by refcount_inc_not_zero()
> in bt_tags_iter() is avoided; done in the 3rd patch.

Ming, thank you for your effort to fix the UAF issue. I applied the V6 series
to the kernel v5.13-rc1 and confirmed that the series avoids the UAF I had been
observing with blktests block/005 and an HDD behind an HBA. This is good.
However, I found that the series triggered a hang in block/029. Let me share
the kernel message printed at the hang below; KASAN reported a null-ptr-deref.

[ 2124.489023] run blktests block/029 at 2021-05-11 13:42:22
[ 2124.561386] null_blk: module loaded
[ 2125.201166] general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] SMP KASAN PTI
[ 2125.212387] KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
[ 2125.220656] CPU: 2 PID: 26514 Comm: check Not tainted 5.13.0-rc1+ #3
[ 2125.227710] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
[ 2125.235793] RIP: 0010:blk_mq_exit_hctx+0x21b/0x580
[ 2125.241298] Code: 00 00 00 31 db 48 89 44 24 20 85 d2 74 54 49 89 c5 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 4c 01 e8 48 89 04 24 48 8b 04 24 <80> 38 00 0f 85 8a 02 00 00 49 8b 87 90 00 00 00 48 63 d3 be 08 00
[ 2125.260747] RSP: 0018:ffff888110677c08 EFLAGS: 00010286
[ 2125.266674] RAX: dffffc0000000012 RBX: 0000000000000000 RCX: ffffffffa4f3710f
[ 2125.274500] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff88811f80c4e8
[ 2125.282326] RBP: ffff8881483c0000 R08: 0000000000000000 R09: ffff88811f80c4eb
[ 2125.290153] R10: ffffed1023f0189d R11: 0000000000000001 R12: ffff88811f80c400
[ 2125.297978] R13: 0000000000000012 R14: ffff88810c506038 R15: 0000000000000000
[ 2125.305807] FS:  00007f6dc4d4a740(0000) GS:ffff8883e1480000(0000) knlGS:0000000000000000
[ 2125.314592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2125.321032] CR2: 00005566b34e55f0 CR3: 000000012c33e005 CR4: 00000000001706e0
[ 2125.328860] Call Trace:
[ 2125.332013]  blk_mq_realloc_hw_ctxs+0x71a/0x15f0
[ 2125.337338]  ? blk_mq_map_queues+0x20c/0x650
[ 2125.342317]  blk_mq_update_nr_hw_queues+0x4cc/0xb70
[ 2125.347905]  ? blk_mq_init_queue+0xb0/0xb0
[ 2125.352709]  nullb_device_submit_queues_store+0x10f/0x1f0 [null_blk]
[ 2125.359781]  ? __null_lookup_page.isra.0+0xd0/0xd0 [null_blk]
[ 2125.366243]  ? __null_lookup_page.isra.0+0xd0/0xd0 [null_blk]
[ 2125.372697]  configfs_write_file+0x2bb/0x450
[ 2125.377676]  vfs_write+0x1cb/0x840
[ 2125.381783]  ksys_write+0xe9/0x1b0
[ 2125.385890]  ? __ia32_sys_read+0xb0/0xb0
[ 2125.390517]  ? syscall_enter_from_user_mode+0x27/0x80
[ 2125.396276]  do_syscall_64+0x40/0x80
[ 2125.400553]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2125.406308] RIP: 0033:0x7f6dc4e3f4e7
[ 2125.410590] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 2125.430036] RSP: 002b:00007ffc7ee29928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2125.438306] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f6dc4e3f4e7
[ 2125.446132] RDX: 0000000000000002 RSI: 00005566b34e55f0 RDI: 0000000000000001
[ 2125.453957] RBP: 00005566b34e55f0 R08: 000000000000000a R09: 00007f6dc4ed60c0
[ 2125.461783] R10: 00007f6dc4ed5fc0 R11: 0000000000000246 R12: 0000000000000002
[ 2125.469610] R13: 00007f6dc4f12520 R14: 0000000000000002 R15: 00007f6dc4f12720
[ 2125.477446] Modules linked in: null_blk xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp tun nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security bridge stp llc ip_set rfkill nfnetlink target_core_user ebtable_filter ebtables target_core_mod ip6table_filter ip6_tables iptable_filter sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm irqbypass rapl intel_cstate intel_uncore joydev i2c_i801 pcspkr i2c_smbus ses enclosure lpc_ich mei_me mei ioatdma wmi ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter
[ 2125.477718]  acpi_pad zram ip_tables drm_vram_helper drm_kms_helper cec drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel igb mpt3sas nvme nvme_core dca i2c_algo_bit raid_class scsi_transport_sas fuse [last unloaded: null_blk]
[ 2125.588156] ---[ end trace 94b8e87c4a29c520 ]---
[ 2125.615124] RIP: 0010:blk_mq_exit_hctx+0x21b/0x580
[ 2125.620628] Code: 00 00 00 31 db 48 89 44 24 20 85 d2 74 54 49 89 c5 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 4c 01 e8 48 89 04 24 48 8b 04 24 <80> 38 00 0f 85 8a 02 00 00 49 8b 87 90 00 00 00 48 63 d3 be 08 00
[ 2125.640077] RSP: 0018:ffff888110677c08 EFLAGS: 00010286
[ 2125.646007] RAX: dffffc0000000012 RBX: 0000000000000000 RCX: ffffffffa4f3710f
[ 2125.653844] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff88811f80c4e8
[ 2125.661675] RBP: ffff8881483c0000 R08: 0000000000000000 R09: ffff88811f80c4eb
[ 2125.669502] R10: ffffed1023f0189d R11: 0000000000000001 R12: ffff88811f80c400
[ 2125.677327] R13: 0000000000000012 R14: ffff88810c506038 R15: 0000000000000000
[ 2125.685158] FS:  00007f6dc4d4a740(0000) GS:ffff8883e1480000(0000) knlGS:0000000000000000
[ 2125.693941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2125.700382] CR2: 00005566b34e55f0 CR3: 000000012c33e005 CR4: 00000000001706e0

-- 
Best Regards,
Shin'ichiro Kawasaki


* Re: [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests
  2021-05-11  5:05 ` [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Shinichiro Kawasaki
@ 2021-05-11  6:36   ` Ming Lei
  2021-05-11 12:23     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2021-05-11  6:36 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, Christoph Hellwig, David Jeffery

Hello Shinichiro,

Thanks for your test!

On Tue, May 11, 2021 at 05:05:52AM +0000, Shinichiro Kawasaki wrote:
> On May 07, 2021 / 22:42, Ming Lei wrote:
> > Hi Jens,
> > 
> > This patchset fixes the request UAF issue with one simple approach,
> > without clearing ->rqs[] in the fast path; please consider it for 5.13.
> > 
> > 1) grab a request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> > and release it after calling ->fn, so ->fn won't be called for a
> > request if its queue is frozen; done in the 2nd patch
> > 
> > 2) clear any stale request referred to in ->rqs[] before freeing the
> > request pool; one per-tags spinlock is added to protect grabbing a
> > request ref vs. clearing ->rqs[tag], so the UAF by refcount_inc_not_zero()
> > in bt_tags_iter() is avoided; done in the 3rd patch.
> 
> Ming, thank you for your effort to fix the UAF issue. I applied the V6 series
> to the kernel v5.13-rc1 and confirmed that the series avoids the UAF I had been
> observing with blktests block/005 and an HDD behind an HBA. This is good.
> However, I found that the series triggered a hang in block/029. Let me share
> the kernel message printed at the hang below; KASAN reported a null-ptr-deref.
> 
> [ 2124.489023] run blktests block/029 at 2021-05-11 13:42:22
> [ 2124.561386] null_blk: module loaded
> [ 2125.201166] general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] SMP KASAN PTI
> [ 2125.212387] KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]

It is because this hw queue isn't mapped yet: a newly added hw queue is
only mapped in blk_mq_map_swqueue(). The following change can fix it,
and I will post V7 after careful testing.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index fcd5ed79011f..691b555c26fa 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2652,6 +2652,10 @@ static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
 	int i;
 	unsigned long flags;
 
+	/* return if hw queue isn't mapped */
+	if (!tags)
+		return;
+
 	WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
 
 	for (i = 0; i < queue_depth; i++)


Thanks,
Ming



* Re: [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests
  2021-05-11  6:36   ` Ming Lei
@ 2021-05-11 12:23     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 8+ messages in thread
From: Shinichiro Kawasaki @ 2021-05-11 12:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, Christoph Hellwig, David Jeffery

On May 11, 2021 / 14:36, Ming Lei wrote:
> Hello Shinichiro,
> 
> Thanks for your test!
> 
> On Tue, May 11, 2021 at 05:05:52AM +0000, Shinichiro Kawasaki wrote:
> > On May 07, 2021 / 22:42, Ming Lei wrote:
> > > Hi Jens,
> > > 
> > > This patchset fixes the request UAF issue with one simple approach,
> > > without clearing ->rqs[] in the fast path; please consider it for 5.13.
> > > 
> > > 1) grab a request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> > > and release it after calling ->fn, so ->fn won't be called for a
> > > request if its queue is frozen; done in the 2nd patch
> > > 
> > > 2) clear any stale request referred to in ->rqs[] before freeing the
> > > request pool; one per-tags spinlock is added to protect grabbing a
> > > request ref vs. clearing ->rqs[tag], so the UAF by refcount_inc_not_zero()
> > > in bt_tags_iter() is avoided; done in the 3rd patch.
> > 
> > Ming, thank you for your effort to fix the UAF issue. I applied the V6 series
> > to the kernel v5.13-rc1 and confirmed that the series avoids the UAF I had been
> > observing with blktests block/005 and an HDD behind an HBA. This is good.
> > However, I found that the series triggered a hang in block/029. Let me share
> > the kernel message printed at the hang below; KASAN reported a null-ptr-deref.
> > 
> > [ 2124.489023] run blktests block/029 at 2021-05-11 13:42:22
> > [ 2124.561386] null_blk: module loaded
> > [ 2125.201166] general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] SMP KASAN PTI
> > [ 2125.212387] KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
> 
> It is because this hw queue isn't mapped yet and new added hw queue is
> mapped in blk_mq_map_swqueue(), and the following change can fix it, and
> I will post V7 after careful test.
> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index fcd5ed79011f..691b555c26fa 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2652,6 +2652,10 @@ static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
>  	int i;
>  	unsigned long flags;
>  
> +	/* return if hw queue isn't mapped */
> +	if (!tags)
> +		return;
> +
>  	WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
>  
>  	for (i = 0; i < queue_depth; i++)

Thank you Ming. I confirmed that this change avoids the hang at block/029.

-- 
Best Regards,
Shin'ichiro Kawasaki


