linux-block.vger.kernel.org archive mirror
* [PATCH V2 0/3] blk-mq: fix request UAF related with iterating over tagset requests
@ 2021-04-27  1:45 Ming Lei
  2021-04-27  1:45 ` [PATCH V2 1/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Ming Lei @ 2021-04-27  1:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

Hi Guys,

This patchset fixes the request UAF issue with one simple approach
that avoids clearing ->rqs[] in the fast path.

1) grab a reference to the request before calling ->fn in
blk_mq_tagset_busy_iter(), and release it after ->fn returns, so ->fn
won't be called for a request whose queue is frozen; done in the 1st
patch

2) always complete a request synchronously when the completion runs
via blk_mq_tagset_busy_iter(); done in the 2nd patch

3) clear any stale request referenced in ->rqs[] before freeing the
request pool; a per-tags spinlock is added to protect grabbing a
request reference against clearing ->rqs[tag], so the UAF hit by
refcount_inc_not_zero() in bt_tags_iter() is avoided; done in the 3rd
patch.
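The ref-grab pattern in 1) can be sketched as a userspace model. This
is a hypothetical illustration, not kernel code: `struct req`,
`iterate()` and friends are stand-ins for `rq->ref`,
`refcount_inc_not_zero()`, `blk_mq_put_rq_ref()` and `bt_tags_iter()`:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request with rq->ref. */
struct req {
	atomic_int ref;
	bool in_use;
};

/* Models refcount_inc_not_zero(): only take a reference while the
 * request is still alive (ref > 0). */
static bool ref_inc_not_zero(struct req *rq)
{
	int old = atomic_load(&rq->ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Models blk_mq_put_rq_ref() for the non-flush case: the last
 * reference frees the request. */
static void put_ref(struct req *rq)
{
	if (atomic_fetch_sub(&rq->ref, 1) == 1)
		rq->in_use = false;	/* models __blk_mq_free_request() */
}

/* Models bt_tags_iter(): skip requests whose refcount already hit
 * zero, otherwise pin them across the callback so they cannot be
 * re-used while fn runs. Returns the number of requests visited. */
static int iterate(struct req **rqs, size_t n, bool (*fn)(struct req *))
{
	int visited = 0;

	for (size_t i = 0; i < n; i++) {
		struct req *rq = rqs[i];

		if (!rq || !ref_inc_not_zero(rq))
			continue;	/* freed or being freed: skip */
		fn(rq);
		visited++;
		put_ref(rq);
	}
	return visited;
}

static bool count_fn(struct req *rq)
{
	(void)rq;
	return true;
}

/* Demo: one live request (ref == 1) is visited, one already-released
 * request (ref == 0) is skipped. */
static int demo(void)
{
	struct req live, dead;
	struct req *rqs[3];

	atomic_init(&live.ref, 1);
	live.in_use = true;
	atomic_init(&dead.ref, 0);
	dead.in_use = false;

	rqs[0] = &live;
	rqs[1] = &dead;
	rqs[2] = NULL;
	return iterate(rqs, 3, count_fn);
}
```

The same compare-and-swap loop is what makes the kernel's
refcount_inc_not_zero() safe against a request whose last reference is
being dropped concurrently.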


V2:
	- take Bart's suggestion not to add a blk-mq helper for completing
	  requests while they are being iterated
	- don't grab rq->ref if the iterator is over static rqs, because
	  those use cases do require iterating over all requests no matter
	  whether the request is initialized or not

Ming Lei (3):
  blk-mq: grab rq->refcount before calling ->fn in
    blk_mq_tagset_busy_iter
  blk-mq: complete request locally if the completion is from tagset
    iterator
  blk-mq: clear stale request in tags->rq[] before freeing one request
    pool

 block/blk-mq-tag.c     | 33 ++++++++++++++++++-----
 block/blk-mq-tag.h     |  3 +++
 block/blk-mq.c         | 61 +++++++++++++++++++++++++++++++++++-------
 block/blk-mq.h         |  1 +
 include/linux/blkdev.h |  2 ++
 5 files changed, 84 insertions(+), 16 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V2 1/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-27  1:45 [PATCH V2 0/3] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-04-27  1:45 ` Ming Lei
  2021-04-27  2:28   ` Bart Van Assche
  2021-04-27  1:45 ` [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator Ming Lei
  2021-04-27  1:45 ` [PATCH V2 3/3] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
  2 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2021-04-27  1:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(),
which prevents the request from being re-used while ->fn is running.
The approach is the same as what we do during timeout handling.

This fixes request UAFs related to completion races or queue releasing:

- If a request is referenced before rq->q is frozen, the queue won't
finish freezing before the reference is released during iteration.

- If a request is referenced after rq->q is frozen,
refcount_inc_not_zero() will return false and we won't iterate over
that request.

However, one request UAF is still not covered: refcount_inc_not_zero()
may read an already-freed request; that case is handled in the next
patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 23 +++++++++++++++++------
 block/blk-mq.c     | 14 +++++++++-----
 block/blk-mq.h     |  1 +
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..100fa44d52a6 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -264,6 +264,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	struct blk_mq_tags *tags = iter_data->tags;
 	bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
 	struct request *rq;
+	bool ret;
+	bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
@@ -272,16 +274,21 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
+	if (iter_static_rqs)
 		rq = tags->static_rqs[bitnr];
-	else
+	else {
 		rq = tags->rqs[bitnr];
-	if (!rq)
-		return true;
+		if (!rq || !refcount_inc_not_zero(&rq->ref))
+			return true;
+	}
 	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
 	    !blk_mq_request_started(rq))
-		return true;
-	return iter_data->fn(rq, iter_data->data, reserved);
+		ret = true;
+	else
+		ret = iter_data->fn(rq, iter_data->data, reserved);
+	if (!iter_static_rqs)
+		blk_mq_put_rq_ref(rq);
+	return ret;
 }
 
 /**
@@ -348,6 +355,10 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
  *		indicates whether or not @rq is a reserved request. Return
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns. So far we don't support to pass the request reference to
+ * one new conetxt in @fn.
  */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 927189a55575..4bd6c11bd8bc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -909,6 +909,14 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 	return false;
 }
 
+void blk_mq_put_rq_ref(struct request *rq)
+{
+	if (is_flush_rq(rq, rq->mq_hctx))
+		rq->end_io(rq, 0);
+	else if (refcount_dec_and_test(&rq->ref))
+		__blk_mq_free_request(rq);
+}
+
 static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
@@ -942,11 +950,7 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
 
-	if (is_flush_rq(rq, hctx))
-		rq->end_io(rq, 0);
-	else if (refcount_dec_and_test(&rq->ref))
-		__blk_mq_free_request(rq);
-
+	blk_mq_put_rq_ref(rq);
 	return true;
 }
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3616453ca28c..143afe42c63a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,6 +47,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_ctx *start);
+void blk_mq_put_rq_ref(struct request *rq);
 
 /*
  * Internal helpers for allocating/freeing the request map
-- 
2.29.2



* [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator
  2021-04-27  1:45 [PATCH V2 0/3] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
  2021-04-27  1:45 ` [PATCH V2 1/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-04-27  1:45 ` Ming Lei
  2021-04-27  2:30   ` Bart Van Assche
  2021-04-27  1:45 ` [PATCH V2 3/3] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
  2 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2021-04-27  1:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

rq->ref is not held when running a remote completion, and iterating
over the tagset request pool is possible while another remote
completion is pending, so there is a potential request UAF if a
request is completed remotely from our tagset iterator helper.

Fix it by completing the request locally if the completion comes from
the tagset iterator.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c     | 5 ++++-
 block/blk-mq.c         | 8 ++++++++
 include/linux/blkdev.h | 2 ++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 100fa44d52a6..773aea4db90c 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -284,8 +284,11 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
 	    !blk_mq_request_started(rq))
 		ret = true;
-	else
+	else {
+		rq->rq_flags |= RQF_ITERATING;
 		ret = iter_data->fn(rq, iter_data->data, reserved);
+		rq->rq_flags &= ~RQF_ITERATING;
+	}
 	if (!iter_static_rqs)
 		blk_mq_put_rq_ref(rq);
 	return ret;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4bd6c11bd8bc..ae06e5b3f215 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -654,6 +654,14 @@ bool blk_mq_complete_request_remote(struct request *rq)
 	if (rq->cmd_flags & REQ_HIPRI)
 		return false;
 
+	/*
+	 * Complete the request locally if it is being completed via tagset
+	 * iterator helper for avoiding UAF because rq->ref isn't held when
+	 * running remote completion via IPI or softirq
+	 */
+	if (rq->rq_flags & RQF_ITERATING)
+		return false;
+
 	if (blk_mq_complete_need_ipi(rq)) {
 		blk_mq_complete_send_ipi(rq);
 		return true;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f2e77ba97550..3b9bc4381dab 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -102,6 +102,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 20))
 /* ->timeout has been called, don't expire again */
 #define RQF_TIMED_OUT		((__force req_flags_t)(1 << 21))
+/* The request is being iterated by blk-mq iterator API */
+#define RQF_ITERATING		((__force req_flags_t)(1 << 22))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-- 
2.29.2



* [PATCH V2 3/3] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-04-27  1:45 [PATCH V2 0/3] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
  2021-04-27  1:45 ` [PATCH V2 1/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
  2021-04-27  1:45 ` [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator Ming Lei
@ 2021-04-27  1:45 ` Ming Lei
  2021-04-27  2:34   ` Bart Van Assche
  2 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2021-04-27  1:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery, Ming Lei

refcount_inc_not_zero() in bt_tags_iter() may still read an
already-freed request.

Fix the issue with the following approach:

1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero() in bt_tags_iter()

2) clear any stale request referenced via ->rqs[tag] before freeing
the request pool; the per-tags spinlock is held while clearing a
stale ->rqs[tag]

So after the stale requests are cleared, bt_tags_iter() won't observe
a freed request any more, and the clearing also waits for any pending
request reference to be dropped.

The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and a recent patch from David Jeffery.
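The lock-vs-clear interaction can be modeled in userspace roughly as
follows. This is a hedged sketch with hypothetical names:
`lookup_and_pin()` and `clear_stale()` stand in for the locked lookup
in `bt_tags_iter()` and for `blk_mq_clear_rq_mapping()`, a pthread
mutex stands in for the per-tags spinlock, and the sketch ignores the
WARN_ON_ONCE() on a nonzero refcount:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request with rq->ref. */
struct req {
	atomic_int ref;
};

struct tagmap {
	pthread_mutex_t lock;	/* models the per-tags spinlock */
	struct req *rqs[4];	/* models tags->rqs[] */
};

/* Models refcount_inc_not_zero(). */
static bool ref_inc_not_zero(struct req *rq)
{
	int old = atomic_load(&rq->ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Iterator side: look up and pin a request under the lock, so a
 * concurrent clear_stale() cannot leave us dereferencing a pointer
 * into a pool that is being freed. */
static struct req *lookup_and_pin(struct tagmap *t, size_t tag)
{
	struct req *rq;

	pthread_mutex_lock(&t->lock);
	rq = t->rqs[tag];
	if (rq && !ref_inc_not_zero(rq))
		rq = NULL;
	pthread_mutex_unlock(&t->lock);
	return rq;
}

/* Free side: models blk_mq_clear_rq_mapping(); NULL out any ->rqs[]
 * entry pointing into the pool about to be freed, under the same
 * lock as the iterator's lookup. */
static void clear_stale(struct tagmap *t, struct req *pool, size_t n)
{
	pthread_mutex_lock(&t->lock);
	for (size_t i = 0; i < sizeof(t->rqs) / sizeof(t->rqs[0]); i++) {
		struct req *rq = t->rqs[i];

		if (rq >= pool && rq < pool + n)
			t->rqs[i] = NULL;
	}
	pthread_mutex_unlock(&t->lock);
}

/* Demo: a request is visible and pinnable before the clear, and no
 * longer observable afterwards. */
static int demo(void)
{
	static struct req pool[2];
	struct tagmap t = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct req *rq;
	int pinned, gone;

	atomic_init(&pool[0].ref, 1);
	t.rqs[0] = &pool[0];

	rq = lookup_and_pin(&t, 0);		/* pinned: ref 1 -> 2 */
	pinned = (rq != NULL);
	atomic_fetch_sub(&rq->ref, 1);		/* drop our reference */

	clear_stale(&t, pool, 2);		/* pool is going away */
	gone = (lookup_and_pin(&t, 0) == NULL);
	return pinned && gone;
}
```

Because both sides take the same lock, the iterator either sees the
entry before it is cleared (and pins it) or sees NULL; it can never
read a pointer after the pool backing it has been returned.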

Cc: John Garry <john.garry@huawei.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c |  9 ++++++++-
 block/blk-mq-tag.h |  3 +++
 block/blk-mq.c     | 39 ++++++++++++++++++++++++++++++++++-----
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 773aea4db90c..219bf2b10b14 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -277,9 +277,15 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	if (iter_static_rqs)
 		rq = tags->static_rqs[bitnr];
 	else {
+		unsigned long flags;
+
+		spin_lock_irqsave(&tags->lock, flags);
 		rq = tags->rqs[bitnr];
-		if (!rq || !refcount_inc_not_zero(&rq->ref))
+		if (!rq || !refcount_inc_not_zero(&rq->ref)) {
+			spin_unlock_irqrestore(&tags->lock, flags);
 			return true;
+		}
+		spin_unlock_irqrestore(&tags->lock, flags);
 	}
 	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
 	    !blk_mq_request_started(rq))
@@ -530,6 +536,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	spin_lock_init(&tags->lock);
 
 	if (blk_mq_is_sbitmap_shared(flags))
 		return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f942a601b5ef 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,9 @@ struct blk_mq_tags {
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+	/* used to clear rqs[] before one request pool is freed */
+	spinlock_t lock;
 };
 
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ae06e5b3f215..418145da1740 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2299,6 +2299,38 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
+static size_t order_to_size(unsigned int order)
+{
+	return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+		struct blk_mq_tags *tags, unsigned int hctx_idx)
+{
+	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+	struct page *page;
+	unsigned long flags;
+
+	spin_lock_irqsave(&drv_tags->lock, flags);
+	list_for_each_entry(page, &tags->page_list, lru) {
+		unsigned long start = (unsigned long)page_address(page);
+		unsigned long end = start + order_to_size(page->private);
+		int i;
+
+		for (i = 0; i < set->queue_depth; i++) {
+			struct request *rq = drv_tags->rqs[i];
+			unsigned long rq_addr = (unsigned long)rq;
+
+			if (rq_addr >= start && rq_addr < end) {
+				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+				cmpxchg(&drv_tags->rqs[i], rq, NULL);
+			}
+		}
+	}
+	spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx)
 {
@@ -2317,6 +2349,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		}
 	}
 
+	blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+
 	while (!list_empty(&tags->page_list)) {
 		page = list_first_entry(&tags->page_list, struct page, lru);
 		list_del_init(&page->lru);
@@ -2376,11 +2410,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	return tags;
 }
 
-static size_t order_to_size(unsigned int order)
-{
-	return (size_t)PAGE_SIZE << order;
-}
-
 static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 			       unsigned int hctx_idx, int node)
 {
-- 
2.29.2



* Re: [PATCH V2 1/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-27  1:45 ` [PATCH V2 1/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-04-27  2:28   ` Bart Van Assche
  2021-04-27  2:45     ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2021-04-27  2:28 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Khazhy Kumykov, Shin'ichiro Kawasaki,
	Hannes Reinecke, John Garry, David Jeffery

On 4/26/21 6:45 PM, Ming Lei wrote:
> + * We grab one request reference before calling @fn and release it after
> + * @fn returns. So far we don't support to pass the request reference to
> + * one new conetxt in @fn.
              ^^^^^^^
              context?

> +void blk_mq_put_rq_ref(struct request *rq)
> +{
> +	if (is_flush_rq(rq, rq->mq_hctx))
> +		rq->end_io(rq, 0);
> +	else if (refcount_dec_and_test(&rq->ref))
> +		__blk_mq_free_request(rq);
> +}

Will rq->end_io() be called twice when a tag iteration function
encounters a flush request?

Thanks,

Bart.


* Re: [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator
  2021-04-27  1:45 ` [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator Ming Lei
@ 2021-04-27  2:30   ` Bart Van Assche
  2021-04-27  7:06     ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2021-04-27  2:30 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Khazhy Kumykov, Shin'ichiro Kawasaki,
	Hannes Reinecke, John Garry, David Jeffery

On 4/26/21 6:45 PM, Ming Lei wrote:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 100fa44d52a6..773aea4db90c 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -284,8 +284,11 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>  	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
>  	    !blk_mq_request_started(rq))
>  		ret = true;
> -	else
> +	else {
> +		rq->rq_flags |= RQF_ITERATING;
>  		ret = iter_data->fn(rq, iter_data->data, reserved);
> +		rq->rq_flags &= ~RQF_ITERATING;
> +	}
>  	if (!iter_static_rqs)
>  		blk_mq_put_rq_ref(rq);
>  	return ret;

All existing rq->rq_flags modifications are serialized. The above change
adds code that may change rq_flags concurrently with regular request
processing. I think that counts as a race condition. Additionally, the
RQF_ITERATING flag won't be set correctly in the (unlikely) case that
two concurrent bt_tags_iter() calls examine the same request at the same
time.

Thanks,

Bart.


* Re: [PATCH V2 3/3] blk-mq: clear stale request in tags->rq[] before freeing one request pool
  2021-04-27  1:45 ` [PATCH V2 3/3] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-04-27  2:34   ` Bart Van Assche
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2021-04-27  2:34 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Khazhy Kumykov, Shin'ichiro Kawasaki,
	Hannes Reinecke, John Garry, David Jeffery

On 4/26/21 6:45 PM, Ming Lei wrote:
> refcount_inc_not_zero() in bt_tags_iter() may still read an
> already-freed request.
> 
> Fix the issue with the following approach:
> 
> 1) hold a per-tags spinlock when reading ->rqs[tag] and calling
> refcount_inc_not_zero() in bt_tags_iter()
> 
> 2) clear any stale request referenced via ->rqs[tag] before freeing
> the request pool; the per-tags spinlock is held while clearing a
> stale ->rqs[tag]
> 
> So after the stale requests are cleared, bt_tags_iter() won't observe
> a freed request any more, and the clearing also waits for any pending
> request reference to be dropped.
> 
> The idea of clearing ->rqs[] is borrowed from John Garry's previous
> patch and a recent patch from David Jeffery.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>


* Re: [PATCH V2 1/3] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
  2021-04-27  2:28   ` Bart Van Assche
@ 2021-04-27  2:45     ` Ming Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2021-04-27  2:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery

On Mon, Apr 26, 2021 at 07:28:18PM -0700, Bart Van Assche wrote:
> On 4/26/21 6:45 PM, Ming Lei wrote:
> > + * We grab one request reference before calling @fn and release it after
> > + * @fn returns. So far we don't support to pass the request reference to
> > + * one new conetxt in @fn.
>               ^^^^^^^
>               context?
> 
> > +void blk_mq_put_rq_ref(struct request *rq)
> > +{
> > +	if (is_flush_rq(rq, rq->mq_hctx))
> > +		rq->end_io(rq, 0);
> > +	else if (refcount_dec_and_test(&rq->ref))
> > +		__blk_mq_free_request(rq);
> > +}
> 
> Will rq->end_io() be called twice when a tag iteration function
> encounters a flush request?

It could be, see flush_end_io():

static void flush_end_io(struct request *flush_rq, blk_status_t error)
{
	...
	spin_lock_irqsave(&fq->mq_flush_lock, flags);

	if (!refcount_dec_and_test(&flush_rq->ref)) {
		fq->rq_status = error;
		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
		return;
	}
	...


Thanks,
Ming



* Re: [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator
  2021-04-27  2:30   ` Bart Van Assche
@ 2021-04-27  7:06     ` Ming Lei
  2021-04-27  9:00       ` Ming Lei
  2021-04-27 14:53       ` Bart Van Assche
  0 siblings, 2 replies; 11+ messages in thread
From: Ming Lei @ 2021-04-27  7:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery

On Mon, Apr 26, 2021 at 07:30:51PM -0700, Bart Van Assche wrote:
> On 4/26/21 6:45 PM, Ming Lei wrote:
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 100fa44d52a6..773aea4db90c 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -284,8 +284,11 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> >  	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
> >  	    !blk_mq_request_started(rq))
> >  		ret = true;
> > -	else
> > +	else {
> > +		rq->rq_flags |= RQF_ITERATING;
> >  		ret = iter_data->fn(rq, iter_data->data, reserved);
> > +		rq->rq_flags &= ~RQF_ITERATING;
> > +	}
> >  	if (!iter_static_rqs)
> >  		blk_mq_put_rq_ref(rq);
> >  	return ret;
> 
> All existing rq->rq_flags modifications are serialized. The above change
> adds code that may change rq_flags concurrently with regular request
> processing. I think that counts as a race condition.

Good catch, but we still can change .rq_flags via atomic op, such as:

	do {
		old = rq->rq_flags;
		new = old | RQF_ITERATING;
	} while (cmpxchg(&rq->rq_flags, old, new) != old);

> Additionally, the
> RQF_ITERATING flag won't be set correctly in the (unlikely) case that
> two concurrent bt_tags_iter() calls examine the same request at the same
> time.

If the driver completes requests from two concurrent bt_tags_iter()
calls, there is already the bigger problem of double completion, so
I'd rather not consider this case.


Thanks,
Ming



* Re: [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator
  2021-04-27  7:06     ` Ming Lei
@ 2021-04-27  9:00       ` Ming Lei
  2021-04-27 14:53       ` Bart Van Assche
  1 sibling, 0 replies; 11+ messages in thread
From: Ming Lei @ 2021-04-27  9:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery

On Tue, Apr 27, 2021 at 03:06:51PM +0800, Ming Lei wrote:
> On Mon, Apr 26, 2021 at 07:30:51PM -0700, Bart Van Assche wrote:
> > On 4/26/21 6:45 PM, Ming Lei wrote:
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 100fa44d52a6..773aea4db90c 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -284,8 +284,11 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> > >  	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
> > >  	    !blk_mq_request_started(rq))
> > >  		ret = true;
> > > -	else
> > > +	else {
> > > +		rq->rq_flags |= RQF_ITERATING;
> > >  		ret = iter_data->fn(rq, iter_data->data, reserved);
> > > +		rq->rq_flags &= ~RQF_ITERATING;
> > > +	}
> > >  	if (!iter_static_rqs)
> > >  		blk_mq_put_rq_ref(rq);
> > >  	return ret;
> > 
> > All existing rq->rq_flags modifications are serialized. The above change
> > adds code that may change rq_flags concurrently with regular request
> > processing. I think that counts as a race condition.
> 
> Good catch, but we still can change .rq_flags via atomic op, such as:
> 
> 	do {
> 		old = rq->rq_flags;
> 		new = old | RQF_ITERATING;
> 	} while (cmpxchg(&rq->rq_flags, old, new) != old);

Oops, this way can't work, because other .rq_flags modifications may
still clear RQF_ITERATING.

As I mentioned in another thread, blk-mq needn't consider double
completion[1] any more, since that is covered by the driver. blk-mq
just needs to make sure that a valid request is passed to ->fn(), and
it is the driver's responsibility to avoid double completion.

[1] https://lore.kernel.org/linux-block/YIdWz8C5eklPvEiV@T590/T/#u


Thanks,
Ming



* Re: [PATCH V2 2/3] blk-mq: complete request locally if the completion is from tagset iterator
  2021-04-27  7:06     ` Ming Lei
  2021-04-27  9:00       ` Ming Lei
@ 2021-04-27 14:53       ` Bart Van Assche
  1 sibling, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2021-04-27 14:53 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Khazhy Kumykov,
	Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
	David Jeffery

On 4/27/21 12:06 AM, Ming Lei wrote:
> On Mon, Apr 26, 2021 at 07:30:51PM -0700, Bart Van Assche wrote:
>> On 4/26/21 6:45 PM, Ming Lei wrote:
>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>> index 100fa44d52a6..773aea4db90c 100644
>>> --- a/block/blk-mq-tag.c
>>> +++ b/block/blk-mq-tag.c
>>> @@ -284,8 +284,11 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>>>  	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
>>>  	    !blk_mq_request_started(rq))
>>>  		ret = true;
>>> -	else
>>> +	else {
>>> +		rq->rq_flags |= RQF_ITERATING;
>>>  		ret = iter_data->fn(rq, iter_data->data, reserved);
>>> +		rq->rq_flags &= ~RQF_ITERATING;
>>> +	}
>>>  	if (!iter_static_rqs)
>>>  		blk_mq_put_rq_ref(rq);
>>>  	return ret;
>>
>> All existing rq->rq_flags modifications are serialized. The above change
>> adds code that may change rq_flags concurrently with regular request
>> processing. I think that counts as a race condition.
> 
> Good catch, but we still can change .rq_flags via atomic op, such as:
> 
> 	do {
> 		old = rq->rq_flags;
> 		new = old | RQF_ITERATING;
> 	} while (cmpxchg(&rq->rq_flags, old, new) != old);

That's not sufficient because the above would not work correctly in
combination with statements like the following:

	rq->rq_flags &= ~RQF_MQ_INFLIGHT;
	req->rq_flags |= RQF_TIMED_OUT;

How about setting a flag in 'current', just like the memalloc_noio_*()
functions set or clear PF_MEMALLOC_NOIO to indicate whether or not
GFP_NOIO should be used? That should work fine in thread context and
also in interrupt context.
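Bart's per-task-flag idea could be modeled in userspace roughly like
this hedged sketch: a thread-local variable stands in for a bit in
`current->flags`, and the function names (`iter_flag_save()` etc.) are
hypothetical, chosen to mirror the memalloc_noio_save()/restore()
pairing rather than any existing API:

```c
#include <stdbool.h>

/* Stand-in for a PF_* bit in current->flags; a real implementation
 * would set and clear a task_struct flag instead. */
static _Thread_local bool task_in_tagset_iter;

/* Set the flag around the iterator callback, save/restore style so
 * nested use keeps the outer state. */
static bool iter_flag_save(void)
{
	bool old = task_in_tagset_iter;

	task_in_tagset_iter = true;
	return old;
}

static void iter_flag_restore(bool old)
{
	task_in_tagset_iter = old;
}

/* Models the check in blk_mq_complete_request_remote(): remote (IPI
 * or softirq) completion is only allowed when the current task is
 * not inside the tagset iterator. */
static bool complete_remote_allowed(void)
{
	return !task_in_tagset_iter;
}
```

Unlike a per-request RQF_ flag, this state is private to the task
running the iterator, so it cannot race with concurrent rq_flags
updates or with another iterator examining the same request.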

>> Additionally, the
>> RQF_ITERATING flag won't be set correctly in the (unlikely) case that
>> two concurrent bt_tags_iter() calls examine the same request at the same
>> time.
> 
> If the driver completes requests from two concurrent bt_tags_iter()
> calls, there is already the bigger problem of double completion, so
> I'd rather not consider this case.

bt_tags_iter() may be used for purposes other than completing
requests. Here is an example of a blk_mq_tagset_busy_iter() call
(from debugfs) that may run concurrently with other calls of that
function:

static int hctx_busy_show(void *data, struct seq_file *m)
{
	struct blk_mq_hw_ctx *hctx = data;
	struct show_busy_params params = { .m = m, .hctx = hctx };

	blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy_rq,
				&params);

	return 0;
}

Bart.

