* [PATCH V7 1/4] block: avoid double io accounting for flush request
2021-05-11 15:22 [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-05-11 15:22 ` Ming Lei
2021-05-11 15:22 ` [PATCH V7 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-05-11 15:22 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Shinichiro Kawasaki, Ming Lei
For a flush request, rq->end_io() may be called twice: once from the
timeout path (blk_mq_check_expired()) and once from normal completion
(__blk_mq_end_request()).
Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
that I/O accounting is done exactly once per flush request.
Fixes: b68663186577 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-flush.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 7942ca6ed321..1002f6c58181 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
unsigned long flags = 0;
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
- blk_account_io_flush(flush_rq);
-
/* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags);
@@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
return;
}
+ blk_account_io_flush(flush_rq);
/*
* Flush request has to be marked as IDLE when it is really ended
* because its .end_io() is called from timeout code path too for
--
2.29.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH V7 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
2021-05-11 15:22 [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-05-11 15:22 ` [PATCH V7 1/4] block: avoid double io accounting for flush request Ming Lei
@ 2021-05-11 15:22 ` Ming Lei
2021-05-11 15:22 ` [PATCH V7 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-05-11 15:22 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Shinichiro Kawasaki, Ming Lei
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(),
which prevents the request from being re-used while ->fn is running.
The approach is the same as the one taken in timeout handling.
This fixes request use-after-free (UAF) caused by a completion race or
by queue releasing:
- If a request is referenced before rq->q is frozen, the queue won't be
frozen before the request is released during iteration.
- If a request is referenced after rq->q is frozen,
refcount_inc_not_zero() returns false and we won't iterate over this
request.
However, one request UAF is still not covered: refcount_inc_not_zero()
may read an already-freed request; that case is handled in the next
patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-tag.c | 44 +++++++++++++++++++++++++++++++++-----------
block/blk-mq.c | 14 +++++++++-----
block/blk-mq.h | 1 +
3 files changed, 43 insertions(+), 16 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..544edf2c56a5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -199,6 +199,16 @@ struct bt_iter_data {
bool reserved;
};
+static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
+ unsigned int bitnr)
+{
+ struct request *rq = tags->rqs[bitnr];
+
+ if (!rq || !refcount_inc_not_zero(&rq->ref))
+ return NULL;
+ return rq;
+}
+
static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
struct bt_iter_data *iter_data = data;
@@ -206,18 +216,22 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
struct blk_mq_tags *tags = hctx->tags;
bool reserved = iter_data->reserved;
struct request *rq;
+ bool ret = true;
if (!reserved)
bitnr += tags->nr_reserved_tags;
- rq = tags->rqs[bitnr];
-
/*
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
- if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
- return iter_data->fn(hctx, rq, iter_data->data, reserved);
- return true;
+ rq = blk_mq_find_and_get_req(tags, bitnr);
+ if (!rq)
+ return true;
+
+ if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+ ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+ blk_mq_put_rq_ref(rq);
+ return ret;
}
/**
@@ -264,6 +278,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
struct blk_mq_tags *tags = iter_data->tags;
bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
struct request *rq;
+ bool ret = true;
+ bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
if (!reserved)
bitnr += tags->nr_reserved_tags;
@@ -272,16 +288,19 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
- if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
+ if (iter_static_rqs)
rq = tags->static_rqs[bitnr];
else
- rq = tags->rqs[bitnr];
+ rq = blk_mq_find_and_get_req(tags, bitnr);
if (!rq)
return true;
- if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
- !blk_mq_request_started(rq))
- return true;
- return iter_data->fn(rq, iter_data->data, reserved);
+
+ if (!(iter_data->flags & BT_TAG_ITER_STARTED) ||
+ blk_mq_request_started(rq))
+ ret = iter_data->fn(rq, iter_data->data, reserved);
+ if (!iter_static_rqs)
+ blk_mq_put_rq_ref(rq);
+ return ret;
}
/**
@@ -348,6 +367,9 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
* indicates whether or not @rq is a reserved request. Return
* true to continue iterating tags, false to stop.
* @priv: Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns.
*/
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 466676bc2f0b..d053e80fa6d7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -909,6 +909,14 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
return false;
}
+void blk_mq_put_rq_ref(struct request *rq)
+{
+ if (is_flush_rq(rq, rq->mq_hctx))
+ rq->end_io(rq, 0);
+ else if (refcount_dec_and_test(&rq->ref))
+ __blk_mq_free_request(rq);
+}
+
static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved)
{
@@ -942,11 +950,7 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
if (blk_mq_req_expired(rq, next))
blk_mq_rq_timed_out(rq, reserved);
- if (is_flush_rq(rq, hctx))
- rq->end_io(rq, 0);
- else if (refcount_dec_and_test(&rq->ref))
- __blk_mq_free_request(rq);
-
+ blk_mq_put_rq_ref(rq);
return true;
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9ce64bc4a6c8..556368d2c5b6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,6 +47,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start);
+void blk_mq_put_rq_ref(struct request *rq);
/*
* Internal helpers for allocating/freeing the request map
--
2.29.2
* [PATCH V7 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
2021-05-11 15:22 [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-05-11 15:22 ` [PATCH V7 1/4] block: avoid double io accounting for flush request Ming Lei
2021-05-11 15:22 ` [PATCH V7 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-05-11 15:22 ` Ming Lei
2021-05-11 15:22 ` [PATCH V7 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-05-11 15:22 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Shinichiro Kawasaki, Ming Lei
refcount_inc_not_zero() in bt_tags_iter() may still read an
already-freed request.
Fix the issue with the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero() in bt_tags_iter()
2) clear stale requests referenced via ->rqs[tag] before freeing the
request pool; the per-tags spinlock is held while clearing the stale
->rqs[tag] entries
After the stale requests are cleared, bt_tags_iter() can no longer
observe a freed request, and the clearing also waits for any pending
request references to be dropped.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and a recent patch from David Jeffery.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-tag.c | 9 +++++++--
block/blk-mq-tag.h | 6 ++++++
block/blk-mq.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
3 files changed, 54 insertions(+), 7 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 544edf2c56a5..1671dae43030 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -202,10 +202,14 @@ struct bt_iter_data {
static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
unsigned int bitnr)
{
- struct request *rq = tags->rqs[bitnr];
+ struct request *rq;
+ unsigned long flags;
+ spin_lock_irqsave(&tags->lock, flags);
+ rq = tags->rqs[bitnr];
if (!rq || !refcount_inc_not_zero(&rq->ref))
- return NULL;
+ rq = NULL;
+ spin_unlock_irqrestore(&tags->lock, flags);
return rq;
}
@@ -538,6 +542,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_tags = total_tags;
tags->nr_reserved_tags = reserved_tags;
+ spin_lock_init(&tags->lock);
if (blk_mq_is_sbitmap_shared(flags))
return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f887988e5ef6 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,12 @@ struct blk_mq_tags {
struct request **rqs;
struct request **static_rqs;
struct list_head page_list;
+
+ /*
+ * used to clear request reference in rqs[] before freeing one
+ * request pool
+ */
+ spinlock_t lock;
};
extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d053e80fa6d7..626b4483e006 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2306,6 +2306,45 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
return BLK_QC_T_NONE;
}
+static size_t order_to_size(unsigned int order)
+{
+ return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+ struct blk_mq_tags *tags, unsigned int hctx_idx)
+{
+ struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+ struct page *page;
+ unsigned long flags;
+
+ list_for_each_entry(page, &tags->page_list, lru) {
+ unsigned long start = (unsigned long)page_address(page);
+ unsigned long end = start + order_to_size(page->private);
+ int i;
+
+ for (i = 0; i < set->queue_depth; i++) {
+ struct request *rq = drv_tags->rqs[i];
+ unsigned long rq_addr = (unsigned long)rq;
+
+ if (rq_addr >= start && rq_addr < end) {
+ WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+ cmpxchg(&drv_tags->rqs[i], rq, NULL);
+ }
+ }
+ }
+
+ /*
+ * Wait until all pending iteration is done.
+ *
+ * Request reference is cleared and it is guaranteed to be observed
+ * after the ->lock is released.
+ */
+ spin_lock_irqsave(&drv_tags->lock, flags);
+ spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx)
{
@@ -2324,6 +2363,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
}
}
+ blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+
while (!list_empty(&tags->page_list)) {
page = list_first_entry(&tags->page_list, struct page, lru);
list_del_init(&page->lru);
@@ -2383,11 +2424,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
return tags;
}
-static size_t order_to_size(unsigned int order)
-{
- return (size_t)PAGE_SIZE << order;
-}
-
static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
unsigned int hctx_idx, int node)
{
--
2.29.2
* [PATCH V7 4/4] blk-mq: clearing flush request reference in tags->rqs[]
2021-05-11 15:22 [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (2 preceding siblings ...)
2021-05-11 15:22 ` [PATCH V7 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-05-11 15:22 ` Ming Lei
2021-05-14 0:43 ` [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-08-06 3:40 ` yukuai (C)
5 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-05-11 15:22 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Shinichiro Kawasaki, Ming Lei
Before freeing the request queue, clear the flush request reference in
tags->rqs[] so that a potential UAF can be avoided.
Based on a patch written by David Jeffery.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 626b4483e006..41b64bbf2154 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2642,16 +2642,49 @@ static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
&hctx->cpuhp_dead);
}
+/*
+ * Before freeing hw queue, clearing the flush request reference in
+ * tags->rqs[] for avoiding potential UAF.
+ */
+static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
+ unsigned int queue_depth, struct request *flush_rq)
+{
+ int i;
+ unsigned long flags;
+
+ /* The hw queue may not be mapped yet */
+ if (!tags)
+ return;
+
+ WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
+
+ for (i = 0; i < queue_depth; i++)
+ cmpxchg(&tags->rqs[i], flush_rq, NULL);
+
+ /*
+ * Wait until all pending iteration is done.
+ *
+ * Request reference is cleared and it is guaranteed to be observed
+ * after the ->lock is released.
+ */
+ spin_lock_irqsave(&tags->lock, flags);
+ spin_unlock_irqrestore(&tags->lock, flags);
+}
+
/* hctx->ctxs will be freed in queue's release handler */
static void blk_mq_exit_hctx(struct request_queue *q,
struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
+ struct request *flush_rq = hctx->fq->flush_rq;
+
if (blk_mq_hw_queue_mapped(hctx))
blk_mq_tag_idle(hctx);
+ blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+ set->queue_depth, flush_rq);
if (set->ops->exit_request)
- set->ops->exit_request(set, hctx->fq->flush_rq, hctx_idx);
+ set->ops->exit_request(set, flush_rq, hctx_idx);
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
--
2.29.2
* Re: [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-11 15:22 [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (3 preceding siblings ...)
2021-05-11 15:22 ` [PATCH V7 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
@ 2021-05-14 0:43 ` Ming Lei
2021-05-14 15:34 ` Jens Axboe
2021-08-06 3:40 ` yukuai (C)
5 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-05-14 0:43 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Shinichiro Kawasaki
On Tue, May 11, 2021 at 11:22:32PM +0800, Ming Lei wrote:
> Hi Jens,
>
> This patchset fixes the request UAF issue by one simple approach,
> without clearing ->rqs[] in fast path, please consider it for 5.13.
>
> 1) grab request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> and release it after calling ->fn, so ->fn won't be called for one
> request if its queue is frozen, done in the 2nd patch
>
> 2) clearing any stale request referred in ->rqs[] before freeing the
> request pool, one per-tags spinlock is added for protecting
> grabbing request ref vs. clearing ->rqs[tag], so UAF by refcount_inc_not_zero
> in bt_tags_iter() is avoided, done in 3rd patch.
>
> V7:
> - fix one null-ptr-deref during updating nr_hw_queues, because
> blk_mq_clear_flush_rq_mapping() may touch non-mapped hw queue's
> tags, only patch 4/4 is modified, reported and verified by
> Shinichiro Kawasaki
> - run blktests and not see regression
Hi Jens,
We have been working on this issue for quite a while, so is there any
chance to get the fixes merged? Either 5.13 or 5.14 is fine.
Thanks,
Ming
* Re: [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-14 0:43 ` [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-05-14 15:34 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2021-05-14 15:34 UTC (permalink / raw)
To: Ming Lei
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Shinichiro Kawasaki
On 5/13/21 6:43 PM, Ming Lei wrote:
> On Tue, May 11, 2021 at 11:22:32PM +0800, Ming Lei wrote:
>> [...]
>
> Hi Jens,
>
> We have been working on this issue for a bit long, so any chance to get
> the fixes merged? Either 5.13 or 5.14 is fine.
Let's get it queued up for 5.14 - we can backport to stable as needed.
I'll do that now.
--
Jens Axboe
* Re: [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-11 15:22 [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (4 preceding siblings ...)
2021-05-14 0:43 ` [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-08-06 3:40 ` yukuai (C)
2021-08-06 4:12 ` Ming Lei
5 siblings, 1 reply; 10+ messages in thread
From: yukuai (C) @ 2021-08-06 3:40 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Shinichiro Kawasaki
On 2021/05/11 23:22, Ming Lei wrote:
> Hi Jens,
>
> This patchset fixes the request UAF issue by one simple approach,
> without clearing ->rqs[] in fast path, please consider it for 5.13.
>
> 1) grab request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> and release it after calling ->fn, so ->fn won't be called for one
> request if its queue is frozen, done in the 2nd patch
>
> 2) clearing any stale request referred in ->rqs[] before freeing the
> request pool, one per-tags spinlock is added for protecting
> grabbing request ref vs. clearing ->rqs[tag], so UAF by refcount_inc_not_zero
> in bt_tags_iter() is avoided, done in 3rd patch.
>
> V7:
> - fix one null-ptr-deref during updating nr_hw_queues, because
> blk_mq_clear_flush_rq_mapping() may touch non-mapped hw queue's
> tags, only patch 4/4 is modified, reported and verified by
> Shinichiro Kawasaki
> - run blktests and not see regression
>
> V6:
> - hold spin lock when reading rq via ->rq[tag_bit], the issue is
> added in V5
> - make blk_mq_find_and_get_req() more clean, as suggested by Bart
> - not hold tags->lock when clearing ->rqs[] for avoiding to disable
> interrupt a bit long, as suggested by Bart
> - code style change, as suggested by Christoph
>
> V5:
> - cover bt_iter() by same approach taken in bt_tags_iter(), and pass
> John's invasive test
> - add tested-by/reviewed-by tag
>
> V4:
> - remove hctx->fq-flush_rq from tags->rqs[] before freeing hw queue,
> patch 4/4 is added, which is based on David's patch.
>
> V3:
> - drop patches for completing requests started in iterator ->fn,
> because blk-mq guarantees that valid request is passed to ->fn,
> and it is driver's responsibility for avoiding double completion.
> And drivers works well for not completing rq twice.
> - add one patch for avoiding double accounting of flush rq
>
> V2:
> - take Bart's suggestion to not add blk-mq helper for completing
> requests when it is being iterated
> - don't grab rq->ref if the iterator is over static rqs because
> the use case do require to iterate over all requests no matter if
> the request is initialized or not
>
>
> Ming Lei (4):
> block: avoid double io accounting for flush request
> blk-mq: grab rq->refcount before calling ->fn in
> blk_mq_tagset_busy_iter
> blk-mq: clear stale request in tags->rq[] before freeing one request
> pool
> blk-mq: clearing flush request reference in tags->rqs[]
>
> block/blk-flush.c | 3 +-
> block/blk-mq-tag.c | 49 ++++++++++++++++++------
> block/blk-mq-tag.h | 6 +++
> block/blk-mq.c | 95 ++++++++++++++++++++++++++++++++++++++++------
> block/blk-mq.h | 1 +
> 5 files changed, 130 insertions(+), 24 deletions(-)
>
Hi Ming,
I see this patchset was applied to fix the problem of iterating over a
tagset; however, blk_mq_tag_to_rq() is still untouched, and that
interface might still return a freed request.
Is there any reason why this interface didn't use the same solution?
(hold tags->lock and return NULL if the ref is 0)
Thanks
Kuai
* Re: [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-08-06 3:40 ` yukuai (C)
@ 2021-08-06 4:12 ` Ming Lei
2021-08-06 7:50 ` yukuai (C)
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-08-06 4:12 UTC (permalink / raw)
To: yukuai (C)
Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
John Garry, Christoph Hellwig, David Jeffery,
Shinichiro Kawasaki
On Fri, Aug 06, 2021 at 11:40:25AM +0800, yukuai (C) wrote:
> On 2021/05/11 23:22, Ming Lei wrote:
> > [...]
>
> Hi, ming
>
> I see this patchset was applied to fix the problem while iterating over
> tagset, however blk_mq_tag_to_rq() is still untouched, and this
> interface might still return freed request.
>
> Any reason why this interface didn't use the same solution ?
> (hold tags->lock and return null if ref is 0)
It is the driver's responsibility to cover the race between normal
completion and timeout/error handling, so the driver knows whether a
tag is valid or not. In short, it is the driver's responsibility to
guarantee that a valid 'tag' is passed to blk_mq_tag_to_rq().
Thanks,
Ming
* Re: [PATCH V7 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-08-06 4:12 ` Ming Lei
@ 2021-08-06 7:50 ` yukuai (C)
0 siblings, 0 replies; 10+ messages in thread
From: yukuai (C) @ 2021-08-06 7:50 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
John Garry, Christoph Hellwig, David Jeffery,
Shinichiro Kawasaki
On 2021/08/06 12:12, Ming Lei wrote:
> On Fri, Aug 06, 2021 at 11:40:25AM +0800, yukuai (C) wrote:
>> On 2021/05/11 23:22, Ming Lei wrote:
>>> [...]
>>
>> Hi, ming
>>
>> I see this patchset was applied to fix the problem while iterating over
>> tagset, however blk_mq_tag_to_rq() is still untouched, and this
>> interface might still return freed request.
>>
>> Any reason why this interface didn't use the same solution ?
>> (hold tags->lock and return null if ref is 0)
>
> It is driver's responsibility to cover race between normal completion
> and timeout/error handing, so driver has the knowledge if one tag is valid
> or not. In short, it is driver's responsibility to guarantee that valid 'tag'
> is passed to blk_mq_tag_to_rq().
>
Thanks for the explanation.
Best regards,
Kuai