linux-block.vger.kernel.org archive mirror
* [PATCH V5 0/4] blk-mq: fix request UAF related with iterating over tagset requests
@ 2021-05-05 14:58 Ming Lei
  2021-05-05 14:58 ` [PATCH V5 1/4] block: avoid double io accounting for flush request Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Ming Lei @ 2021-05-05 14:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

Hi Jens,

This patchset fixes the request use-after-free (UAF) issue with one
simple approach, without clearing ->rqs[] in the fast path. Please
consider it for 5.13.

1) grab the request's ref before calling ->fn in blk_mq_tagset_busy_iter(),
and release it after calling ->fn, so ->fn won't be called for a
request whose queue is frozen; done in the 2nd patch

2) clear any stale request referenced in ->rqs[] before freeing the
request pool; one per-tags spinlock is added for protecting grabbing a
request ref vs. clearing ->rqs[tag], so the UAF hit by
refcount_inc_not_zero() in bt_tags_iter() is avoided; done in the 3rd
patch
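
The grab/release pattern in 1) can be sketched as standalone C11
userspace code. This is a hedged illustration only: all `demo_*` names
are invented stand-ins for the kernel's `refcount_t` and `struct
request`, not the real blk-mq API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for struct request with its reference count. */
struct demo_rq {
	atomic_int ref;		/* 0 means the request is being freed */
	int tag;
};

/* Userspace analogue of refcount_inc_not_zero(): take a reference only
 * while the request is still live, retrying on concurrent updates. */
static bool demo_ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0)
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	return false;
}

/* The iteration step: grab a reference before calling fn() and release
 * it afterwards, so the request cannot be freed or re-used under fn(). */
static bool demo_iter_one(struct demo_rq *rq,
			  bool (*fn)(struct demo_rq *, void *), void *data)
{
	bool ret = true;

	if (!rq || !demo_ref_inc_not_zero(&rq->ref))
		return true;	/* freed or not yet assigned: skip it */

	ret = fn(rq, data);
	atomic_fetch_sub(&rq->ref, 1);	/* the real put may free the rq */
	return ret;
}

/* Tiny callback used to observe which requests fn() actually sees. */
static bool demo_count_fn(struct demo_rq *rq, void *data)
{
	(void)rq;
	(*(int *)data)++;
	return true;
}
```

A request whose refcount has already dropped to zero is skipped
entirely, which is why a frozen queue's requests never reach ->fn.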

V5:
	- cover bt_iter() with the same approach taken in bt_tags_iter(), and
	pass John's invasive test
	- add tested-by/reviewed-by tags

V4:
	- remove hctx->fq->flush_rq from tags->rqs[] before freeing the hw
	queue; patch 4/4 is added, which is based on David's patch.

V3:
	- drop the patches for completing requests started in the iterator
	  ->fn, because blk-mq guarantees that a valid request is passed to
	  ->fn, and it is the driver's responsibility to avoid double
	  completion; drivers already avoid completing a rq twice
	- add one patch for avoiding double accounting of the flush rq

V2:
	- take Bart's suggestion to not add a blk-mq helper for completing
	  requests while they are being iterated
	- don't grab rq->ref if the iterator is over static rqs, because that
	use case requires iterating over all requests no matter whether the
	request is initialized or not

Ming Lei (4):
  block: avoid double io accounting for flush request
  blk-mq: grab rq->refcount before calling ->fn in
    blk_mq_tagset_busy_iter
  blk-mq: clear stale request in tags->rq[] before freeing one request
    pool
  blk-mq: clearing flush request reference in tags->rqs[]

 block/blk-flush.c  |  3 +-
 block/blk-mq-tag.c | 50 +++++++++++++++++++++++-------
 block/blk-mq-tag.h |  3 ++
 block/blk-mq.c     | 77 +++++++++++++++++++++++++++++++++++++++-------
 block/blk-mq.h     |  1 +
 5 files changed, 110 insertions(+), 24 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH V5 1/4] block: avoid double io accounting for flush request
  2021-05-05 14:58 [PATCH V5 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-05-05 14:58 ` Ming Lei
  2021-05-06  6:44   ` Christoph Hellwig
  2021-05-05 14:58 ` [PATCH V5 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2021-05-05 14:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

For a flush request, rq->end_io() may be called twice: once from
timeout handling (blk_mq_check_expired()) and once from normal
completion (__blk_mq_end_request()).

Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
I/O accounting is done just once per flush request.
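The effect of the move can be sketched in userspace C (a hedged
illustration with invented `demo_*` names; an atomic counter stands in
for flush_rq->ref): both end_io callers run the same code, but only the
final reference drop performs the accounting.

```c
#include <assert.h>
#include <stdatomic.h>

/* Invented stand-ins: ref counts outstanding users of the flush rq,
 * and accounted records how many times accounting actually ran. */
struct demo_flush_rq {
	atomic_int ref;
	int accounted;
};

/* Sketch of flush_end_io() after the patch: the accounting sits after
 * the "are we the last reference?" check instead of before it, so a
 * second call (e.g. from the timeout path) cannot account twice. */
static void demo_flush_end_io(struct demo_flush_rq *rq)
{
	/* refcount_dec_and_test() analogue */
	if (atomic_fetch_sub(&rq->ref, 1) != 1)
		return;		/* not the last reference yet */

	rq->accounted++;	/* blk_account_io_flush() runs once */
}
```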

Fixes: b68663186577 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 7942ca6ed321..1002f6c58181 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 	unsigned long flags = 0;
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
 
-	blk_account_io_flush(flush_rq);
-
 	/* release the tag's ownership to the req cloned from */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
 
@@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		return;
 	}
 
+	blk_account_io_flush(flush_rq);
 	/*
 	 * Flush request has to be marked as IDLE when it is really ended
 	 * because its .end_io() is called from timeout code path too for
-- 
2.29.2



* [PATCH V5 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-05-05 14:58 [PATCH V5 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
  2021-05-05 14:58 ` [PATCH V5 1/4] block: avoid double io accounting for flush request Ming Lei
@ 2021-05-05 14:58 ` Ming Lei
  2021-05-06  6:54   ` Christoph Hellwig
  2021-05-05 14:58 ` [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
  2021-05-05 14:58 ` [PATCH V5 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
  3 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2021-05-05 14:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter();
this prevents the request from being re-used while ->fn is running.
The approach is the same as the one taken in timeout handling.

This fixes request use-after-free (UAF) issues related to completion
races or queue releasing:

- If a rq is referenced before rq->q is frozen, then the queue won't be
frozen before the request is released during iteration.

- If a rq is referenced after rq->q is frozen, refcount_inc_not_zero()
will return false, and we won't iterate over this request.

However, one request UAF is still not covered: refcount_inc_not_zero()
may read a freed request; that case is handled in the next patch.

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 44 +++++++++++++++++++++++++++++++++-----------
 block/blk-mq.c     | 14 +++++++++-----
 block/blk-mq.h     |  1 +
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..4a40d409f5dd 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -199,6 +199,16 @@ struct bt_iter_data {
 	bool reserved;
 };
 
+static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
+		unsigned int bitnr)
+{
+	struct request *rq = tags->rqs[bitnr];
+
+	if (!rq || !refcount_inc_not_zero(&rq->ref))
+		return NULL;
+	return rq;
+}
+
 static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 {
 	struct bt_iter_data *iter_data = data;
@@ -206,18 +216,22 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	struct blk_mq_tags *tags = hctx->tags;
 	bool reserved = iter_data->reserved;
 	struct request *rq;
+	bool ret = true;
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
-	rq = tags->rqs[bitnr];
-
+	rq = blk_mq_find_and_get_req(tags, bitnr);
 	/*
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
-		return iter_data->fn(hctx, rq, iter_data->data, reserved);
-	return true;
+	if (!rq)
+		return true;
+
+	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+	blk_mq_put_rq_ref(rq);
+	return ret;
 }
 
 /**
@@ -264,6 +278,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	struct blk_mq_tags *tags = iter_data->tags;
 	bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
 	struct request *rq;
+	bool ret = true;
+	bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
@@ -272,16 +288,19 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
+	if (iter_static_rqs)
 		rq = tags->static_rqs[bitnr];
 	else
-		rq = tags->rqs[bitnr];
+		rq = blk_mq_find_and_get_req(tags, bitnr);
 	if (!rq)
 		return true;
-	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
-	    !blk_mq_request_started(rq))
-		return true;
-	return iter_data->fn(rq, iter_data->data, reserved);
+
+	if (!(iter_data->flags & BT_TAG_ITER_STARTED) ||
+	    blk_mq_request_started(rq))
+		ret = iter_data->fn(rq, iter_data->data, reserved);
+	if (!iter_static_rqs)
+		blk_mq_put_rq_ref(rq);
+	return ret;
 }
 
 /**
@@ -348,6 +367,9 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
  *		indicates whether or not @rq is a reserved request. Return
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns.
  */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 466676bc2f0b..d053e80fa6d7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -909,6 +909,14 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 	return false;
 }
 
+void blk_mq_put_rq_ref(struct request *rq)
+{
+	if (is_flush_rq(rq, rq->mq_hctx))
+		rq->end_io(rq, 0);
+	else if (refcount_dec_and_test(&rq->ref))
+		__blk_mq_free_request(rq);
+}
+
 static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
@@ -942,11 +950,7 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
 
-	if (is_flush_rq(rq, hctx))
-		rq->end_io(rq, 0);
-	else if (refcount_dec_and_test(&rq->ref))
-		__blk_mq_free_request(rq);
-
+	blk_mq_put_rq_ref(rq);
 	return true;
 }
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9ce64bc4a6c8..556368d2c5b6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,6 +47,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_ctx *start);
+void blk_mq_put_rq_ref(struct request *rq);
 
 /*
  * Internal helpers for allocating/freeing the request map
-- 
2.29.2



* [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-05 14:58 [PATCH V5 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
  2021-05-05 14:58 ` [PATCH V5 1/4] block: avoid double io accounting for flush request Ming Lei
  2021-05-05 14:58 ` [PATCH V5 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-05-05 14:58 ` Ming Lei
  2021-05-06  7:12   ` Christoph Hellwig
                     ` (2 more replies)
  2021-05-05 14:58 ` [PATCH V5 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
  3 siblings, 3 replies; 26+ messages in thread
From: Ming Lei @ 2021-05-05 14:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

refcount_inc_not_zero() in bt_tags_iter() may still read a freed
request.

Fix the issue with the following approach:

1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero() in bt_tags_iter()

2) clear any stale request referenced via ->rqs[tag] before freeing the
request pool; the per-tags spinlock is held while clearing the stale
->rqs[tag]

So after the stale requests are cleared, bt_tags_iter() won't observe a
freed request any more, and the clearing also waits for any pending
request reference to be dropped.

The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and a recent patch from David.
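
The locking protocol in 1) and 2) can be modelled in standalone C11
userspace code. This is a sketch only, with invented `demo_*` names; an
atomic_flag spinlock stands in for the per-tags spinlock. The ->rqs[]
load plus the reference grab, and the clearing of stale entries, are
serialized by the same lock, so the try-get can never run on a pointer
whose backing memory is being freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_DEPTH 4

struct demo_rq {
	atomic_int ref;
};

struct demo_tags {
	atomic_flag lock;			/* per-tags spinlock stand-in */
	struct demo_rq *rqs[DEMO_DEPTH];	/* tag -> request mapping */
};

static void demo_lock(struct demo_tags *t)
{
	while (atomic_flag_test_and_set(&t->lock))
		;	/* spin */
}

static void demo_unlock(struct demo_tags *t)
{
	atomic_flag_clear(&t->lock);
}

/* refcount_inc_not_zero() analogue. */
static bool demo_ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0)
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	return false;
}

/* blk_mq_find_and_get_req() analogue: load and try-get under the lock. */
static struct demo_rq *demo_find_and_get(struct demo_tags *t, unsigned int tag)
{
	struct demo_rq *rq;

	demo_lock(t);
	rq = t->rqs[tag];
	if (!rq || !demo_ref_inc_not_zero(&rq->ref))
		rq = NULL;
	demo_unlock(t);
	return rq;
}

/* blk_mq_clear_rq_mapping() analogue: NULL out entries pointing at a
 * request that is about to be freed, under the same lock. */
static void demo_clear_mapping(struct demo_tags *t, struct demo_rq *victim)
{
	demo_lock(t);
	for (unsigned int i = 0; i < DEMO_DEPTH; i++)
		if (t->rqs[i] == victim)
			t->rqs[i] = NULL;
	demo_unlock(t);
}
```

After demo_clear_mapping() returns, no new reference to the victim can
be taken, which mirrors why the clearing "waits for pending request
references" in the patch description.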

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c |  8 +++++++-
 block/blk-mq-tag.h |  3 +++
 block/blk-mq.c     | 39 ++++++++++++++++++++++++++++++++++-----
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 4a40d409f5dd..8b239dcce85f 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -203,9 +203,14 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
 		unsigned int bitnr)
 {
 	struct request *rq = tags->rqs[bitnr];
+	unsigned long flags;
 
-	if (!rq || !refcount_inc_not_zero(&rq->ref))
+	spin_lock_irqsave(&tags->lock, flags);
+	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
+		spin_unlock_irqrestore(&tags->lock, flags);
 		return NULL;
+	}
+	spin_unlock_irqrestore(&tags->lock, flags);
 	return rq;
 }
 
@@ -538,6 +543,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	spin_lock_init(&tags->lock);
 
 	if (blk_mq_is_sbitmap_shared(flags))
 		return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f942a601b5ef 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,9 @@ struct blk_mq_tags {
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+	/* used to clear rqs[] before one request pool is freed */
+	spinlock_t lock;
 };
 
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d053e80fa6d7..c1b28e09a27e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2306,6 +2306,38 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
+static size_t order_to_size(unsigned int order)
+{
+	return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+		struct blk_mq_tags *tags, unsigned int hctx_idx)
+{
+	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+	struct page *page;
+	unsigned long flags;
+
+	spin_lock_irqsave(&drv_tags->lock, flags);
+	list_for_each_entry(page, &tags->page_list, lru) {
+		unsigned long start = (unsigned long)page_address(page);
+		unsigned long end = start + order_to_size(page->private);
+		int i;
+
+		for (i = 0; i < set->queue_depth; i++) {
+			struct request *rq = drv_tags->rqs[i];
+			unsigned long rq_addr = (unsigned long)rq;
+
+			if (rq_addr >= start && rq_addr < end) {
+				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+				cmpxchg(&drv_tags->rqs[i], rq, NULL);
+			}
+		}
+	}
+	spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx)
 {
@@ -2324,6 +2356,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		}
 	}
 
+	blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+
 	while (!list_empty(&tags->page_list)) {
 		page = list_first_entry(&tags->page_list, struct page, lru);
 		list_del_init(&page->lru);
@@ -2383,11 +2417,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	return tags;
 }
 
-static size_t order_to_size(unsigned int order)
-{
-	return (size_t)PAGE_SIZE << order;
-}
-
 static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 			       unsigned int hctx_idx, int node)
 {
-- 
2.29.2



* [PATCH V5 4/4] blk-mq: clearing flush request reference in tags->rqs[]
  2021-05-05 14:58 [PATCH V5 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
                   ` (2 preceding siblings ...)
  2021-05-05 14:58 ` [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-05-05 14:58 ` Ming Lei
  3 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2021-05-05 14:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

Before freeing the hw queue, clear the flush request reference in
tags->rqs[] so that a potential use-after-free is avoided.

Based on a patch written by David Jeffery.

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c1b28e09a27e..55f6fa95482a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2635,16 +2635,38 @@ static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 					    &hctx->cpuhp_dead);
 }
 
+/*
+ * Before freeing hw queue, clearing the flush request reference in
+ * tags->rqs[] for avoiding potential UAF.
+ */
+static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
+		unsigned int queue_depth, struct request *flush_rq)
+{
+	int i;
+	unsigned long flags;
+
+	WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
+
+	spin_lock_irqsave(&tags->lock, flags);
+	for (i = 0; i < queue_depth; i++)
+		cmpxchg(&tags->rqs[i], flush_rq, NULL);
+	spin_unlock_irqrestore(&tags->lock, flags);
+}
+
 /* hctx->ctxs will be freed in queue's release handler */
 static void blk_mq_exit_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 {
+	struct request *flush_rq = hctx->fq->flush_rq;
+
 	if (blk_mq_hw_queue_mapped(hctx))
 		blk_mq_tag_idle(hctx);
 
+	blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+			set->queue_depth, flush_rq);
 	if (set->ops->exit_request)
-		set->ops->exit_request(set, hctx->fq->flush_rq, hctx_idx);
+		set->ops->exit_request(set, flush_rq, hctx_idx);
 
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
-- 
2.29.2



* Re: [PATCH V5 1/4] block: avoid double io accounting for flush request
  2021-05-05 14:58 ` [PATCH V5 1/4] block: avoid double io accounting for flush request Ming Lei
@ 2021-05-06  6:44   ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2021-05-06  6:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, David Jeffery

On Wed, May 05, 2021 at 10:58:52PM +0800, Ming Lei wrote:
> For flush request, rq->end_io() may be called two times, one is from
> timeout handling(blk_mq_check_expired()), another is from normal
> completion(__blk_mq_end_request()).
> 
> Move blk_account_io_flush() after flush_rq->ref drops to zero, so
> io accounting can be done just once for flush request.
> 
> Fixes: b68663186577 ("block: add iostat counters for flush requests")
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Tested-by: John Garry <john.garry@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

Although while reviewing this I have to say that the flush code is
one hard-to-follow mess, and the extra reference doesn't help...


* Re: [PATCH V5 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-05-05 14:58 ` [PATCH V5 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-05-06  6:54   ` Christoph Hellwig
  2021-05-06  7:30     ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2021-05-06  6:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, David Jeffery

On Wed, May 05, 2021 at 10:58:53PM +0800, Ming Lei wrote:
> Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(), and
> this way will prevent the request from being re-used when ->fn is
> running. The approach is same as what we do during handling timeout.
> 
> Fix request UAF related with completion race or queue releasing:

I guess UAF is supposed to mean use-after-free?  Please just spell that
out.

> +	rq = blk_mq_find_and_get_req(tags, bitnr);
>  	/*
>  	 * We can hit rq == NULL here, because the tagging functions
>  	 * test and set the bit before assigning ->rqs[].
>  	 */
> +	if (!rq)
> +		return true;

Nit: I'd find this much easier to follow if the blk_mq_find_and_get_req
call, and thus the assignment to rq, were after the block comment.

>  	struct blk_mq_tags *tags = iter_data->tags;
>  	bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
>  	struct request *rq;
> +	bool ret = true;
> +	bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
>  
>  	if (!reserved)
>  		bitnr += tags->nr_reserved_tags;
> @@ -272,16 +288,19 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>  	 * We can hit rq == NULL here, because the tagging functions
>  	 * test and set the bit before assigning ->rqs[].
>  	 */
> +	if (iter_static_rqs)

I think this local variable just obfuscates what is going on here.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-05 14:58 ` [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-05-06  7:12   ` Christoph Hellwig
  2021-05-06  7:34     ` Ming Lei
  2021-05-06 15:02   ` Bart Van Assche
  2021-05-07  1:11   ` Bart Van Assche
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2021-05-06  7:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, David Jeffery

On Wed, May 05, 2021 at 10:58:54PM +0800, Ming Lei wrote:
> refcount_inc_not_zero() in bt_tags_iter() still may read one freed
> request.
> 
> Fix the issue by the following approach:
> 
> 1) hold a per-tags spinlock when reading ->rqs[tag] and calling
> refcount_inc_not_zero in bt_tags_iter()
> 
> 2) clearing stale request referred via ->rqs[tag] before freeing
> request pool, the per-tags spinlock is held for clearing stale
> ->rq[tag]
> 
> So after we cleared stale requests, bt_tags_iter() won't observe
> freed request any more, also the clearing will wait for pending
> request reference.
> 
> The idea of clearing ->rqs[] is borrowed from John Garry's previous
> patch and one recent David's patch.
> 
> Tested-by: John Garry <john.garry@huawei.com>
> Reviewed-by: David Jeffery <djeffery@redhat.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-tag.c |  8 +++++++-
>  block/blk-mq-tag.h |  3 +++
>  block/blk-mq.c     | 39 ++++++++++++++++++++++++++++++++++-----
>  3 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 4a40d409f5dd..8b239dcce85f 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -203,9 +203,14 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>  		unsigned int bitnr)
>  {
>  	struct request *rq = tags->rqs[bitnr];
> +	unsigned long flags;
>  
> -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> +	spin_lock_irqsave(&tags->lock, flags);
> +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> +		spin_unlock_irqrestore(&tags->lock, flags);
>  		return NULL;
> +	}
> +	spin_unlock_irqrestore(&tags->lock, flags);
>  	return rq;
>  }
>  
> @@ -538,6 +543,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  
>  	tags->nr_tags = total_tags;
>  	tags->nr_reserved_tags = reserved_tags;
> +	spin_lock_init(&tags->lock);
>  
>  	if (blk_mq_is_sbitmap_shared(flags))
>  		return tags;
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 7d3e6b333a4a..f942a601b5ef 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -20,6 +20,9 @@ struct blk_mq_tags {
>  	struct request **rqs;
>  	struct request **static_rqs;
>  	struct list_head page_list;
> +
> +	/* used to clear rqs[] before one request pool is freed */
> +	spinlock_t lock;
>  };
>  
>  extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d053e80fa6d7..c1b28e09a27e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2306,6 +2306,38 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
>  	return BLK_QC_T_NONE;
>  }
>  
> +static size_t order_to_size(unsigned int order)
> +{
> +	return (size_t)PAGE_SIZE << order;
> +}
> +
> +/* called before freeing request pool in @tags */
> +static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
> +		struct blk_mq_tags *tags, unsigned int hctx_idx)
> +{
> +	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
> +	struct page *page;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&drv_tags->lock, flags);
> +	list_for_each_entry(page, &tags->page_list, lru) {
> +		unsigned long start = (unsigned long)page_address(page);
> +		unsigned long end = start + order_to_size(page->private);
> +		int i;
> +
> +		for (i = 0; i < set->queue_depth; i++) {
> +			struct request *rq = drv_tags->rqs[i];
> +			unsigned long rq_addr = (unsigned long)rq;
> +
> +			if (rq_addr >= start && rq_addr < end) {
> +				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
> +				cmpxchg(&drv_tags->rqs[i], rq, NULL);
> +			}
> +		}
> +	}
> +	spin_unlock_irqrestore(&drv_tags->lock, flags);

This looks really strange.  Why do we need the cmpxchg while under
the lock anyway?  Why iterate the allocated pages and not just clear
the drv_tags values directly?

static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
		struct blk_mq_tags *tags, unsigned int hctx_idx)
{
	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
	unsigned int i = 0;
	unsigned long flags;

	spin_lock_irqsave(&drv_tags->lock, flags);
	for (i = 0; i < set->queue_depth; i++) {
		WARN_ON_ONCE(refcount_read(&drv_tags->rqs[i]->ref) != 0);
		drv_tags->rqs[i] = NULL;
	}
	spin_unlock_irqrestore(&drv_tags->lock, flags);
}


* Re: [PATCH V5 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-05-06  6:54   ` Christoph Hellwig
@ 2021-05-06  7:30     ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2021-05-06  7:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, David Jeffery

On Thu, May 06, 2021 at 07:54:40AM +0100, Christoph Hellwig wrote:
> On Wed, May 05, 2021 at 10:58:53PM +0800, Ming Lei wrote:
> > Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(), and
> > this way will prevent the request from being re-used when ->fn is
> > running. The approach is same as what we do during handling timeout.
> > 
> > Fix request UAF related with completion race or queue releasing:
> 
> I guess UAF is supposed to mean use-after-free?  Please just spell that
> out.

OK.

> 
> > +	rq = blk_mq_find_and_get_req(tags, bitnr);
> >  	/*
> >  	 * We can hit rq == NULL here, because the tagging functions
> >  	 * test and set the bit before assigning ->rqs[].
> >  	 */
> > +	if (!rq)
> > +		return true;
> 
> Nit: I'd find this much earier to follow if the blk_mq_find_and_get_req
> and thus assignment to rq was after the block comment.

OK.

> 
> >  	struct blk_mq_tags *tags = iter_data->tags;
> >  	bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
> >  	struct request *rq;
> > +	bool ret = true;
> > +	bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
> >  
> >  	if (!reserved)
> >  		bitnr += tags->nr_reserved_tags;
> > @@ -272,16 +288,19 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> >  	 * We can hit rq == NULL here, because the tagging functions
> >  	 * test and set the bit before assigning ->rqs[].
> >  	 */
> > +	if (iter_static_rqs)
> 
> I think this local variable just obsfucates what is going on here.

It has to be a local variable, otherwise sparse may complain since
iter_data->flags may be thought of as being changed during that period.


Thanks,
Ming



* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-06  7:12   ` Christoph Hellwig
@ 2021-05-06  7:34     ` Ming Lei
  2021-05-06 12:18       ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2021-05-06  7:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, David Jeffery

On Thu, May 06, 2021 at 08:12:56AM +0100, Christoph Hellwig wrote:
> On Wed, May 05, 2021 at 10:58:54PM +0800, Ming Lei wrote:
> > refcount_inc_not_zero() in bt_tags_iter() still may read one freed
> > request.
> > 
> > Fix the issue by the following approach:
> > 
> > 1) hold a per-tags spinlock when reading ->rqs[tag] and calling
> > refcount_inc_not_zero in bt_tags_iter()
> > 
> > 2) clearing stale request referred via ->rqs[tag] before freeing
> > request pool, the per-tags spinlock is held for clearing stale
> > ->rq[tag]
> > 
> > So after we cleared stale requests, bt_tags_iter() won't observe
> > freed request any more, also the clearing will wait for pending
> > request reference.
> > 
> > The idea of clearing ->rqs[] is borrowed from John Garry's previous
> > patch and one recent David's patch.
> > 
> > Tested-by: John Garry <john.garry@huawei.com>
> > Reviewed-by: David Jeffery <djeffery@redhat.com>
> > Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq-tag.c |  8 +++++++-
> >  block/blk-mq-tag.h |  3 +++
> >  block/blk-mq.c     | 39 ++++++++++++++++++++++++++++++++++-----
> >  3 files changed, 44 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 4a40d409f5dd..8b239dcce85f 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -203,9 +203,14 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
> >  		unsigned int bitnr)
> >  {
> >  	struct request *rq = tags->rqs[bitnr];
> > +	unsigned long flags;
> >  
> > -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> > +	spin_lock_irqsave(&tags->lock, flags);
> > +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> > +		spin_unlock_irqrestore(&tags->lock, flags);
> >  		return NULL;
> > +	}
> > +	spin_unlock_irqrestore(&tags->lock, flags);
> >  	return rq;
> >  }
> >  
> > @@ -538,6 +543,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
> >  
> >  	tags->nr_tags = total_tags;
> >  	tags->nr_reserved_tags = reserved_tags;
> > +	spin_lock_init(&tags->lock);
> >  
> >  	if (blk_mq_is_sbitmap_shared(flags))
> >  		return tags;
> > diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> > index 7d3e6b333a4a..f942a601b5ef 100644
> > --- a/block/blk-mq-tag.h
> > +++ b/block/blk-mq-tag.h
> > @@ -20,6 +20,9 @@ struct blk_mq_tags {
> >  	struct request **rqs;
> >  	struct request **static_rqs;
> >  	struct list_head page_list;
> > +
> > +	/* used to clear rqs[] before one request pool is freed */
> > +	spinlock_t lock;
> >  };
> >  
> >  extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index d053e80fa6d7..c1b28e09a27e 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2306,6 +2306,38 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
> >  	return BLK_QC_T_NONE;
> >  }
> >  
> > +static size_t order_to_size(unsigned int order)
> > +{
> > +	return (size_t)PAGE_SIZE << order;
> > +}
> > +
> > +/* called before freeing request pool in @tags */
> > +static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
> > +		struct blk_mq_tags *tags, unsigned int hctx_idx)
> > +{
> > +	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
> > +	struct page *page;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&drv_tags->lock, flags);
> > +	list_for_each_entry(page, &tags->page_list, lru) {
> > +		unsigned long start = (unsigned long)page_address(page);
> > +		unsigned long end = start + order_to_size(page->private);
> > +		int i;
> > +
> > +		for (i = 0; i < set->queue_depth; i++) {
> > +			struct request *rq = drv_tags->rqs[i];
> > +			unsigned long rq_addr = (unsigned long)rq;
> > +
> > +			if (rq_addr >= start && rq_addr < end) {
> > +				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
> > +				cmpxchg(&drv_tags->rqs[i], rq, NULL);
> > +			}
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&drv_tags->lock, flags);
> 
> This looks really strange.  Why do we need the cmpxchg while under
> the lock anyway?  Why iterate the allocated pages and not just clear

The lock is for protecting freeing vs. iterating.

> the drv_tags valus directly?
> 
> static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
> 		struct blk_mq_tags *tags, unsigned int hctx_idx)
> {
> 	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
> 	unsigned int i = 0;
> 	unsigned long flags;
> 
> 	spin_lock_irqsave(&drv_tags->lock, flags);
> 	for (i = 0; i < set->queue_depth; i++)
> 		WARN_ON_ONCE(refcount_read(&drv_tags->rqs[i]->ref) != 0);
> 		drv_tags->rqs[i] = NULL;

drv_tags->rqs[] may be assigned from another LUN at the same time, so a
simple clearing would overwrite an active assignment. We only need to
clear the mapping iff the stored rq is about to be freed.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-06  7:34     ` Ming Lei
@ 2021-05-06 12:18       ` Christoph Hellwig
  2021-05-06 13:38         ` Ming Lei
  2021-05-06 14:51         ` Bart Van Assche
  0 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2021-05-06 12:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Bart Van Assche,
	Hannes Reinecke, John Garry, David Jeffery

On Thu, May 06, 2021 at 03:34:17PM +0800, Ming Lei wrote:
> > {
> > 	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
> > 	unsigned int i = 0;
> > 	unsigned long flags;
> > 
> > 	spin_lock_irqsave(&drv_tags->lock, flags);
> > 	for (i = 0; i < set->queue_depth; i++)
> > 		WARN_ON_ONCE(refcount_read(&drv_tags->rqs[i]->ref) != 0);
> > 		drv_tags->rqs[i] = NULL;
> 
> drv_tags->rqs[] may be assigned from another LUN at the same time, then
> the simple clearing will overwrite the active assignment. We just need
> to clear mapping iff the stored rq will be freed.

So.  Even a different LUN shares the same tagset.  So I can see the
need for the cmpxchg (please document it!), but I don't see the need
for the complex iteration.  All the rqs are freed in one single loop,
so we can just iterate through them sequentially.

Btw, I think ->lock might better be named ->clear_lock to document its
purpose better.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-06 12:18       ` Christoph Hellwig
@ 2021-05-06 13:38         ` Ming Lei
  2021-05-07  6:57           ` Christoph Hellwig
  2021-05-06 14:51         ` Bart Van Assche
  1 sibling, 1 reply; 26+ messages in thread
From: Ming Lei @ 2021-05-06 13:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, David Jeffery

On Thu, May 06, 2021 at 01:18:49PM +0100, Christoph Hellwig wrote:
> On Thu, May 06, 2021 at 03:34:17PM +0800, Ming Lei wrote:
> > > {
> > > 	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
> > > 	unsigned int i = 0;
> > > 	unsigned long flags;
> > > 
> > > 	spin_lock_irqsave(&drv_tags->lock, flags);
> > > 	for (i = 0; i < set->queue_depth; i++)
> > > 		WARN_ON_ONCE(refcount_read(&drv_tags->rqs[i]->ref) != 0);
> > > 		drv_tags->rqs[i] = NULL;
> > 
> > drv_tags->rqs[] may be assigned from another LUN at the same time, then
> > the simple clearing will overwrite the active assignment. We just need
> > to clear mapping iff the stored rq will be freed.
> 
> So.  Even a different LUN shares the same tagset.  So I can see the
> need for the cmpxchg (please document it!), but I don't see the need
> for the complex iteration.  All the rqs are freed in one single loop,
> so we can just iterate through them sequentially.

That is exactly what the patch does: requests are stored in the page
list, so we check whether a reference to one of those requests exists
in drv_tags->rqs[i]; if yes, we clear that reference.

The code is actually sort of self-documenting: before we free the
requests, clear their references in the hostwide drv_tags->rqs[].

> 
> Btw, I think ->lock might better be named ->clear_lock to document its
> purpose better.

OK.


thanks, 
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-06 12:18       ` Christoph Hellwig
  2021-05-06 13:38         ` Ming Lei
@ 2021-05-06 14:51         ` Bart Van Assche
  2021-05-07  0:11           ` Ming Lei
  1 sibling, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2021-05-06 14:51 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: Jens Axboe, linux-block, Hannes Reinecke, John Garry, David Jeffery

On 5/6/21 5:18 AM, Christoph Hellwig wrote:
> On Thu, May 06, 2021 at 03:34:17PM +0800, Ming Lei wrote:
>>> {
>>> 	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
>>> 	unsigned int i = 0;
>>> 	unsigned long flags;
>>>
>>> 	spin_lock_irqsave(&drv_tags->lock, flags);
>>> 	for (i = 0; i < set->queue_depth; i++)
>>> 		WARN_ON_ONCE(refcount_read(&drv_tags->rqs[i]->ref) != 0);
>>> 		drv_tags->rqs[i] = NULL;
>>
>> drv_tags->rqs[] may be assigned from another LUN at the same time, then
>> the simple clearing will overwrite the active assignment. We just need
>> to clear mapping iff the stored rq will be freed.
> 
> So.  Even a different LUN shares the same tagset.  So I can see the
> need for the cmpxchg (please document it!), but I don't see the need
> for the complex iteration.  All the rqs are freed in one single loop,
> so we can just iterate through them sequentially.
> 
> Btw, I think ->lock might better be named ->clear_lock to document its
> purpose better.

I'm not sure that would be a better name since I don't think that it is
necessary to hold that lock around the cmpxchg() calls. How about using
something like the following code in blk_mq_clear_rq_mapping() instead
of the code in v5 of patch 3/4?

	spin_lock_irqsave(&drv_tags->lock, flags);
	spin_unlock_irqrestore(&drv_tags->lock, flags);

	list_for_each_entry(page, &tags->page_list, lru) {
		/* use cmpxchg() to clear request pointer selectively */
	}

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-05 14:58 ` [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
  2021-05-06  7:12   ` Christoph Hellwig
@ 2021-05-06 15:02   ` Bart Van Assche
  2021-05-07  0:13     ` Ming Lei
  2021-05-07  1:11   ` Bart Van Assche
  2 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2021-05-06 15:02 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Hannes Reinecke, John Garry, David Jeffery

On 5/5/21 7:58 AM, Ming Lei wrote:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 4a40d409f5dd..8b239dcce85f 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -203,9 +203,14 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>  		unsigned int bitnr)
>  {
>  	struct request *rq = tags->rqs[bitnr];
> +	unsigned long flags;
>  
> -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> +	spin_lock_irqsave(&tags->lock, flags);
> +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> +		spin_unlock_irqrestore(&tags->lock, flags);
>  		return NULL;
> +	}
> +	spin_unlock_irqrestore(&tags->lock, flags);
>  	return rq;
>  }
Has it been considered to change the body of the if-statement into the
following?

		rq = NULL;

That change would reduce the number of return statements from two to one.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-06 14:51         ` Bart Van Assche
@ 2021-05-07  0:11           ` Ming Lei
  2021-05-07  1:10             ` Bart Van Assche
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2021-05-07  0:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Hannes Reinecke,
	John Garry, David Jeffery

On Thu, May 06, 2021 at 07:51:53AM -0700, Bart Van Assche wrote:
> On 5/6/21 5:18 AM, Christoph Hellwig wrote:
> > On Thu, May 06, 2021 at 03:34:17PM +0800, Ming Lei wrote:
> >>> {
> >>> 	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
> >>> 	unsigned int i = 0;
> >>> 	unsigned long flags;
> >>>
> >>> 	spin_lock_irqsave(&drv_tags->lock, flags);
> >>> 	for (i = 0; i < set->queue_depth; i++)
> >>> 		WARN_ON_ONCE(refcount_read(&drv_tags->rqs[i]->ref) != 0);
> >>> 		drv_tags->rqs[i] = NULL;
> >>
> >> drv_tags->rqs[] may be assigned from another LUN at the same time, then
> >> the simple clearing will overwrite the active assignment. We just need
> >> to clear mapping iff the stored rq will be freed.
> > 
> > So.  Even a different LUN shares the same tagset.  So I can see the
> > need for the cmpxchg (please document it!), but I don't see the need
> > for the complex iteration.  All the rqs are freed in one single loop,
> > so we can just iterate through them sequentially.
> > 
> > Btw, I think ->lock might better be named ->clear_lock to document its
> > purpose better.
> 
> I'm not sure that would be a better name since I don't think that it is
> necessary to hold that lock around the cmpxchg() calls. How about using
> something like the following code in blk_mq_clear_rq_mapping() instead
> of the code in v5 of patch 3/4?
> 
> 	spin_lock_irqsave(&drv_tags->lock, flags);
> 	spin_unlock_irqrestore(&drv_tags->lock, flags);
> 
> 	list_for_each_entry(page, &tags->page_list, lru) {
> 		/* use cmpxchg() to clear request pointer selectively */
> 	}

This way won't work as expected, because iterating may happen between
releasing drv_tags->lock and the cmpxchg(->rqs[]), so freed requests may
still be touched during iteration after they are freed in blk_mq_free_rqs().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-06 15:02   ` Bart Van Assche
@ 2021-05-07  0:13     ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2021-05-07  0:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Hannes Reinecke, John Garry, David Jeffery

On Thu, May 06, 2021 at 08:02:53AM -0700, Bart Van Assche wrote:
> On 5/5/21 7:58 AM, Ming Lei wrote:
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 4a40d409f5dd..8b239dcce85f 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -203,9 +203,14 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
> >  		unsigned int bitnr)
> >  {
> >  	struct request *rq = tags->rqs[bitnr];
> > +	unsigned long flags;
> >  
> > -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> > +	spin_lock_irqsave(&tags->lock, flags);
> > +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> > +		spin_unlock_irqrestore(&tags->lock, flags);
> >  		return NULL;
> > +	}
> > +	spin_unlock_irqrestore(&tags->lock, flags);
> >  	return rq;
> >  }
> Has it been considered to change the body of the if-statement into the
> following?
> 
> 		rq = NULL;
> 
> That change would reduce the number of return statements from two to one.

Looks cleaner, will take it in V6.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  0:11           ` Ming Lei
@ 2021-05-07  1:10             ` Bart Van Assche
  2021-05-07  2:05               ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2021-05-07  1:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Hannes Reinecke,
	John Garry, David Jeffery

On 5/6/21 5:11 PM, Ming Lei wrote:
> On Thu, May 06, 2021 at 07:51:53AM -0700, Bart Van Assche wrote:
>> I'm not sure that would be a better name since I don't think that it is
>> necessary to hold that lock around the cmpxchg() calls. How about using
>> something like the following code in blk_mq_clear_rq_mapping() instead
>> of the code in v5 of patch 3/4?
>>
>> 	spin_lock_irqsave(&drv_tags->lock, flags);
>> 	spin_unlock_irqrestore(&drv_tags->lock, flags);
>>
>> 	list_for_each_entry(page, &tags->page_list, lru) {
>> 		/* use cmpxchg() to clear request pointer selectively */
>> 	}
> 
> This way won't work as expected because iterating may happen between
> releasing drv_tags->lock and cmpxchg(->rqs[]), then freed requests
> may still be touched during iteration after they are freed in blk_mq_free_rqs().

Right, the unlock should happen after the pointers have been cleared. 
But I think it is safe to move the spin_lock call down such that both 
the spin_lock and spin_unlock call happen after the pointers have been 
cleared. That is sufficient to guarantee that all 
blk_mq_find_and_get_req() calls either happen before or after the spin 
lock / unlock pair. blk_mq_find_and_get_req() calls that happen before 
the pair happen before the request pointers are freed. Calls that happen 
after the spin lock / unlock pair will either read NULL or a pointer to 
a request that is associated with another queue and hence won't trigger 
a use-after-free.

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-05 14:58 ` [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
  2021-05-06  7:12   ` Christoph Hellwig
  2021-05-06 15:02   ` Bart Van Assche
@ 2021-05-07  1:11   ` Bart Van Assche
  2021-05-07  2:06     ` Ming Lei
  2 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2021-05-07  1:11 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Hannes Reinecke, John Garry, David Jeffery

On 5/5/21 7:58 AM, Ming Lei wrote:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 4a40d409f5dd..8b239dcce85f 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -203,9 +203,14 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>   		unsigned int bitnr)
>   {
>   	struct request *rq = tags->rqs[bitnr];
> +	unsigned long flags;
>   
> -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> +	spin_lock_irqsave(&tags->lock, flags);
> +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> +		spin_unlock_irqrestore(&tags->lock, flags);
>   		return NULL;
> +	}
> +	spin_unlock_irqrestore(&tags->lock, flags);
>   	return rq;
>   }

Shouldn't the 'rq = tags->rqs[bitnr]' assignment be protected by 
tags->lock too? Otherwise a request pointer could be read before the 
request pointer clearing happens and the refcount_inc_not_zero() call 
could happen after the request clearing.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  1:10             ` Bart Van Assche
@ 2021-05-07  2:05               ` Ming Lei
  2021-05-07  3:16                 ` Bart Van Assche
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2021-05-07  2:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Hannes Reinecke,
	John Garry, David Jeffery

On Thu, May 06, 2021 at 06:10:09PM -0700, Bart Van Assche wrote:
> On 5/6/21 5:11 PM, Ming Lei wrote:
> > On Thu, May 06, 2021 at 07:51:53AM -0700, Bart Van Assche wrote:
> > > I'm not sure that would be a better name since I don't think that it is
> > > necessary to hold that lock around the cmpxchg() calls. How about using
> > > something like the following code in blk_mq_clear_rq_mapping() instead
> > > of the code in v5 of patch 3/4?
> > > 
> > > 	spin_lock_irqsave(&drv_tags->lock, flags);
> > > 	spin_unlock_irqrestore(&drv_tags->lock, flags);
> > > 
> > > 	list_for_each_entry(page, &tags->page_list, lru) {
> > > 		/* use cmpxchg() to clear request pointer selectively */
> > > 	}
> > 
> > This way won't work as expected because iterating may happen between
> > releasing drv_tags->lock and cmpxchg(->rqs[]), then freed requests
> > may still be touched during iteration after they are freed in blk_mq_free_rqs().
> 
> Right, the unlock should happen after the pointers have been cleared. But I
> think it is safe to move the spin_lock call down such that both the
> spin_lock and spin_unlock call happen after the pointers have been cleared.
> That is sufficient to guarantee that all blk_mq_find_and_get_req() calls
> either happen before or after the spin lock / unlock pair.
> blk_mq_find_and_get_req() calls that happen before the pair happen before
> the request pointers are freed. Calls that happen after the spin lock /
> unlock pair will either read NULL or a pointer to a request that is
> associated with another queue and hence won't trigger a use-after-free.

Putting the lock pair after clearing the rq mapping should work, but I
don't see any benefit: it is not very readable, and memory-barrier
knowledge is required to understand its correctness (the cmpxchg has to
complete before the unlock), ..., so is it really a better idea to move
the lock pair after clearing the rq mapping?



Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  1:11   ` Bart Van Assche
@ 2021-05-07  2:06     ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2021-05-07  2:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Hannes Reinecke, John Garry, David Jeffery

On Thu, May 06, 2021 at 06:11:59PM -0700, Bart Van Assche wrote:
> On 5/5/21 7:58 AM, Ming Lei wrote:
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 4a40d409f5dd..8b239dcce85f 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -203,9 +203,14 @@ static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
> >   		unsigned int bitnr)
> >   {
> >   	struct request *rq = tags->rqs[bitnr];
> > +	unsigned long flags;
> > -	if (!rq || !refcount_inc_not_zero(&rq->ref))
> > +	spin_lock_irqsave(&tags->lock, flags);
> > +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> > +		spin_unlock_irqrestore(&tags->lock, flags);
> >   		return NULL;
> > +	}
> > +	spin_unlock_irqrestore(&tags->lock, flags);
> >   	return rq;
> >   }
> 
> Shouldn't the 'rq = tags->rqs[bitnr]' assignment be protected by tags->lock
> too? Otherwise a request pointer could be read before the request pointer
> clearing happens and the refcount_inc_not_zero() call could happen after the
> request clearing.

Right, will fix it.

-- 
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  2:05               ` Ming Lei
@ 2021-05-07  3:16                 ` Bart Van Assche
  2021-05-07  6:31                   ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2021-05-07  3:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Hannes Reinecke,
	John Garry, David Jeffery

On 5/6/21 7:05 PM, Ming Lei wrote:
> Putting the lock pair after clearing rq mapping should work, but not see
> any benefit: not very readable, and memory barrier knowledge is required for
> understanding its correctness(cmpxchg has to be completed before unlock), ...,
> so is it better idea to move lock pair after clearing rq mapping?

It depends on how much time will be spent inside
blk_mq_clear_rq_mapping(). If the time spent in the nested loop in
blk_mq_clear_rq_mapping() would be significant then the proposed change
will help to reduce interrupt latency in blk_mq_find_and_get_req().

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  3:16                 ` Bart Van Assche
@ 2021-05-07  6:31                   ` Ming Lei
  2021-05-07  6:52                     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2021-05-07  6:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Hannes Reinecke,
	John Garry, David Jeffery

On Thu, May 06, 2021 at 08:16:25PM -0700, Bart Van Assche wrote:
> On 5/6/21 7:05 PM, Ming Lei wrote:
> > Putting the lock pair after clearing rq mapping should work, but not see
> > any benefit: not very readable, and memory barrier knowledge is required for
> > understanding its correctness(cmpxchg has to be completed before unlock), ...,
> > so is it better idea to move lock pair after clearing rq mapping?
> 
> It depends on how much time will be spent inside
> blk_mq_clear_rq_mapping(). If the time spent in the nested loop in
> blk_mq_clear_rq_mapping() would be significant then the proposed change
> will help to reduce interrupt latency in blk_mq_find_and_get_req().

Interrupt latency in blk_mq_find_and_get_req() shouldn't be increased,
because interrupts won't be disabled while spinning on the lock. But
interrupts may be disabled for a while in blk_mq_clear_rq_mapping() in
case of a big nr_requests and hw queue depth.

Fair enough, I will take this approach so the lock isn't held for too long.

Thanks, 
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  6:31                   ` Ming Lei
@ 2021-05-07  6:52                     ` Christoph Hellwig
  2021-05-08  2:02                       ` Bart Van Assche
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2021-05-07  6:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, Christoph Hellwig, Jens Axboe, linux-block,
	Hannes Reinecke, John Garry, David Jeffery

On Fri, May 07, 2021 at 02:31:02PM +0800, Ming Lei wrote:
> > It depends on how much time will be spent inside
> > blk_mq_clear_rq_mapping(). If the time spent in the nested loop in
> > blk_mq_clear_rq_mapping() would be significant then the proposed change
> > will help to reduce interrupt latency in blk_mq_find_and_get_req().
> 
> interrupt latency in blk_mq_find_and_get_req() shouldn't be increased
> because interrupt won't be disabled when spinning on the lock. But interrupt
> may be disabled for a while in blk_mq_clear_rq_mapping() in case of big
> nr_requests and hw queue depth.
> 
> Fair enough, will take this way for not holding lock for too long.

Can we take a step back here?  Once blk_mq_clear_rq_mapping hits we are
deep into tearing the device down and freeing the tag_set.  So if
blk_mq_find_and_get_req is waiting any time on the lock, something is
wrong.  We might as well just trylock in blk_mq_find_and_get_req and
not find a request if the lock is contended, as there is no point in
waiting for the lock.  In fact we might not even need a lock, but just
an atomic bitop in the tagset that marks it as being freed, which
needs to be tested in blk_mq_find_and_get_req with the right memory
barriers.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-06 13:38         ` Ming Lei
@ 2021-05-07  6:57           ` Christoph Hellwig
  2021-05-07  7:30             ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2021-05-07  6:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Bart Van Assche,
	Hannes Reinecke, John Garry, David Jeffery

On Thu, May 06, 2021 at 09:38:41PM +0800, Ming Lei wrote:
> > So.  Even a different LUN shares the same tagset.  So I can see the
> > need for the cmpxchg (please document it!), but I don't see the need
> > for the complex iteration.  All the rqs are freed in one single loop,
> > so we can just iterate through them sequentially.
> 
> That is exactly what the patch is doing, requests are stored in page
> list, so check if one request(covered in page list) reference in
> drv_tags->rq[i] exists, if yes, we clear the request reference.
> 
> The code is actually sort of self-document: before we free requests,
> clear the reference in hostwide drv->rqs[].

What the patch does is a completely pointless nested loop.
Instead of just looping through all requests, which is simple and fast,
it loops through each page and then does another loop inside that,
just increasing complexity and runtime.  We should at least do something
like the incremental patch below instead, which is simpler, faster and
easier to understand:


diff --git a/block/blk-mq.c b/block/blk-mq.c
index c1b28e09a27e..598fe82cfbcf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2311,29 +2311,20 @@ static size_t order_to_size(unsigned int order)
 	return (size_t)PAGE_SIZE << order;
 }
 
-/* called before freeing request pool in @tags */
+/* ensure that blk_mq_find_and_get_req can't find the tags any more */
 static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
 		struct blk_mq_tags *tags, unsigned int hctx_idx)
 {
 	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
-	struct page *page;
 	unsigned long flags;
+	int i;
 
 	spin_lock_irqsave(&drv_tags->lock, flags);
-	list_for_each_entry(page, &tags->page_list, lru) {
-		unsigned long start = (unsigned long)page_address(page);
-		unsigned long end = start + order_to_size(page->private);
-		int i;
+	for (i = 0; i < set->queue_depth; i++) {
+		struct request *rq = drv_tags->rqs[i];
 
-		for (i = 0; i < set->queue_depth; i++) {
-			struct request *rq = drv_tags->rqs[i];
-			unsigned long rq_addr = (unsigned long)rq;
-
-			if (rq_addr >= start && rq_addr < end) {
-				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
-				cmpxchg(&drv_tags->rqs[i], rq, NULL);
-			}
-		}
+		WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+		cmpxchg(&drv_tags->rqs[i], rq, NULL);
 	}
 	spin_unlock_irqrestore(&drv_tags->lock, flags);
 }

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  6:57           ` Christoph Hellwig
@ 2021-05-07  7:30             ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2021-05-07  7:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	John Garry, David Jeffery

On Fri, May 07, 2021 at 07:57:23AM +0100, Christoph Hellwig wrote:
> On Thu, May 06, 2021 at 09:38:41PM +0800, Ming Lei wrote:
> > > So.  Even a different LUN shares the same tagset.  So I can see the
> > > need for the cmpxchg (please document it!), but I don't see the need
> > > for the complex iteration.  All the rqs are freed in one single loop,
> > > so we can just iterate through them sequentially.
> > 
> > That is exactly what the patch is doing, requests are stored in page
> > list, so check if one request(covered in page list) reference in
> > drv_tags->rq[i] exists, if yes, we clear the request reference.
> > 
> > The code is actually sort of self-document: before we free requests,
> > clear the reference in hostwide drv->rqs[].
> 
> What the patch does it to do a completely pointless nested loop.
> Instead of just looping through all requests which is simple and fast
> it loops through each page, and then does another loop inside that,
> just increasing complexity and runtime.  We should at least do something
> like the incremental patch below instead which is simpler, faster and
> easier to understand:

The pages to be freed may be from the scheduler tags (set->sched_tags),
which belong to one request queue being shut down, but set->tags->rqs[]
is shared by all request queues in the host, and it can be actively
assigned from another LUN/request queue.

> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c1b28e09a27e..598fe82cfbcf 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2311,29 +2311,20 @@ static size_t order_to_size(unsigned int order)
>  	return (size_t)PAGE_SIZE << order;
>  }
>  
> -/* called before freeing request pool in @tags */
> +/* ensure that blk_mq_find_and_get_req can't find the tags any more */
>  static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
>  		struct blk_mq_tags *tags, unsigned int hctx_idx)
>  {
>  	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
> -	struct page *page;
>  	unsigned long flags;
> +	int i;
>  
>  	spin_lock_irqsave(&drv_tags->lock, flags);
> -	list_for_each_entry(page, &tags->page_list, lru) {
> -		unsigned long start = (unsigned long)page_address(page);
> -		unsigned long end = start + order_to_size(page->private);
> -		int i;
> +	for (i = 0; i < set->queue_depth; i++) {
> +		struct request *rq = drv_tags->rqs[i];
>  
> -		for (i = 0; i < set->queue_depth; i++) {
> -			struct request *rq = drv_tags->rqs[i];
> -			unsigned long rq_addr = (unsigned long)rq;
> -
> -			if (rq_addr >= start && rq_addr < end) {
> -				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
> -				cmpxchg(&drv_tags->rqs[i], rq, NULL);
> -			}
> -		}
> +		WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
> +		cmpxchg(&drv_tags->rqs[i], rq, NULL);

set->tags->rqs[] is just a dynamic mapping between host-wide driver tags
and requests, which may be allocated from the sched tags, which are
per-request-queue, while set->tags->rqs[] is host-wide.

What if the request pointed to by 'rq' has just been assigned from
another active LUN's sched tags?

What we need to do is make sure every reference to a request being freed
is cleared, that is all.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-05-07  6:52                     ` Christoph Hellwig
@ 2021-05-08  2:02                       ` Bart Van Assche
  0 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2021-05-08  2:02 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: Jens Axboe, linux-block, Hannes Reinecke, John Garry, David Jeffery

On 5/6/21 11:52 PM, Christoph Hellwig wrote:
> Can we take a step back here?  Once blk_mq_clear_rq_mapping hits we are
> deep into tearing the device down and freeing the tag_set.  So if
> blk_mq_find_and_get_req is waiting any time on the lock something is
> wrong.  We might as well just trylock in blk_mq_find_and_get_req and
> not find a request if the lock is contended, as there is no point in
> waiting for the lock.  In fact we might not even need a lock, but just
> an atomic bitop in the tagset that marks it as being freed, which
> needs to be tested in blk_mq_find_and_get_req with the right memory
> barriers.

We need to make sure that blk_mq_find_and_get_req() in its entirety
either happens before the scheduler tags are freed or after the cmpxchg()
loop has finished. Maybe I'm overlooking something, but I don't see how
to do that without a wait loop, the kind of loop that is embedded in
spin_lock()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-05-08  2:02 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05 14:58 [PATCH V5 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-05-05 14:58 ` [PATCH V5 1/4] block: avoid double io accounting for flush request Ming Lei
2021-05-06  6:44   ` Christoph Hellwig
2021-05-05 14:58 ` [PATCH V5 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
2021-05-06  6:54   ` Christoph Hellwig
2021-05-06  7:30     ` Ming Lei
2021-05-05 14:58 ` [PATCH V5 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
2021-05-06  7:12   ` Christoph Hellwig
2021-05-06  7:34     ` Ming Lei
2021-05-06 12:18       ` Christoph Hellwig
2021-05-06 13:38         ` Ming Lei
2021-05-07  6:57           ` Christoph Hellwig
2021-05-07  7:30             ` Ming Lei
2021-05-06 14:51         ` Bart Van Assche
2021-05-07  0:11           ` Ming Lei
2021-05-07  1:10             ` Bart Van Assche
2021-05-07  2:05               ` Ming Lei
2021-05-07  3:16                 ` Bart Van Assche
2021-05-07  6:31                   ` Ming Lei
2021-05-07  6:52                     ` Christoph Hellwig
2021-05-08  2:02                       ` Bart Van Assche
2021-05-06 15:02   ` Bart Van Assche
2021-05-07  0:13     ` Ming Lei
2021-05-07  1:11   ` Bart Van Assche
2021-05-07  2:06     ` Ming Lei
2021-05-05 14:58 ` [PATCH V5 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).