* [PATCH V4 1/4] block: avoid double io accounting for flush request
2021-04-29 2:34 [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-04-29 2:34 ` Ming Lei
2021-04-30 2:51 ` Bart Van Assche
2021-04-29 2:34 ` [PATCH V4 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
` (4 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2021-04-29 2:34 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
David Jeffery, Ming Lei
For a flush request, rq->end_io() may be called twice: once from
timeout handling (blk_mq_check_expired()) and once from normal
completion (__blk_mq_end_request()).
Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
that I/O accounting is done exactly once per flush request.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-flush.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 7942ca6ed321..1002f6c58181 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
unsigned long flags = 0;
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
- blk_account_io_flush(flush_rq);
-
/* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags);
@@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
return;
}
+ blk_account_io_flush(flush_rq);
/*
* Flush request has to be marked as IDLE when it is really ended
* because its .end_io() is called from timeout code path too for
--
2.29.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH V4 1/4] block: avoid double io accounting for flush request
2021-04-29 2:34 ` [PATCH V4 1/4] block: avoid double io accounting for flush request Ming Lei
@ 2021-04-30 2:51 ` Bart Van Assche
2021-04-30 3:12 ` Ming Lei
0 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2021-04-30 2:51 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: linux-block, Khazhy Kumykov, Shin'ichiro Kawasaki,
Hannes Reinecke, John Garry, David Jeffery
On 4/28/21 7:34 PM, Ming Lei wrote:
> For a flush request, rq->end_io() may be called twice: once from
> timeout handling (blk_mq_check_expired()) and once from normal
> completion (__blk_mq_end_request()).
>
> Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
> that I/O accounting is done exactly once per flush request.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> block/blk-flush.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 7942ca6ed321..1002f6c58181 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
> unsigned long flags = 0;
> struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
>
> - blk_account_io_flush(flush_rq);
> -
> /* release the tag's ownership to the req cloned from */
> spin_lock_irqsave(&fq->mq_flush_lock, flags);
>
> @@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
> return;
> }
>
> + blk_account_io_flush(flush_rq);
> /*
> * Flush request has to be marked as IDLE when it is really ended
> * because its .end_io() is called from timeout code path too for
How about adding the following:
Fixes: b68663186577 ("block: add iostat counters for flush requests")
Anyway:
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* Re: [PATCH V4 1/4] block: avoid double io accounting for flush request
2021-04-30 2:51 ` Bart Van Assche
@ 2021-04-30 3:12 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-04-30 3:12 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
David Jeffery
On Thu, Apr 29, 2021 at 07:51:37PM -0700, Bart Van Assche wrote:
> On 4/28/21 7:34 PM, Ming Lei wrote:
> > For a flush request, rq->end_io() may be called twice: once from
> > timeout handling (blk_mq_check_expired()) and once from normal
> > completion (__blk_mq_end_request()).
> >
> > Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
> > that I/O accounting is done exactly once per flush request.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > block/blk-flush.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/block/blk-flush.c b/block/blk-flush.c
> > index 7942ca6ed321..1002f6c58181 100644
> > --- a/block/blk-flush.c
> > +++ b/block/blk-flush.c
> > @@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
> > unsigned long flags = 0;
> > struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
> >
> > - blk_account_io_flush(flush_rq);
> > -
> > /* release the tag's ownership to the req cloned from */
> > spin_lock_irqsave(&fq->mq_flush_lock, flags);
> >
> > @@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
> > return;
> > }
> >
> > + blk_account_io_flush(flush_rq);
> > /*
> > * Flush request has to be marked as IDLE when it is really ended
> > * because its .end_io() is called from timeout code path too for
>
> How about adding the following:
>
> Fixes: b68663186577 ("block: add iostat counters for flush requests")
Yeah, good point, thanks!
Jens, I guess you can pick that up, or please let me know if a V5 is needed.
Thanks,
Ming
* [PATCH V4 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
2021-04-29 2:34 [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-04-29 2:34 ` [PATCH V4 1/4] block: avoid double io accounting for flush request Ming Lei
@ 2021-04-29 2:34 ` Ming Lei
2021-04-30 3:06 ` Bart Van Assche
2021-04-29 2:34 ` [PATCH V4 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
` (3 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2021-04-29 2:34 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
David Jeffery, Ming Lei
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(),
which prevents the request from being re-used while ->fn is running.
This is the same approach we take in timeout handling.
Fix request use-after-free (UAF) caused by a completion race or by
queue releasing:
- If a request's reference is grabbed before rq->q is frozen, the queue
won't finish freezing before the reference is released during iteration.
- If a request is referred to after rq->q has been frozen,
refcount_inc_not_zero() returns false and we skip that request.
However, one request UAF is still not covered: refcount_inc_not_zero()
may read an already-freed request; that is handled in the next patch.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-tag.c | 22 ++++++++++++++++------
block/blk-mq.c | 14 +++++++++-----
block/blk-mq.h | 1 +
3 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..9329b94a9743 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -264,6 +264,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
struct blk_mq_tags *tags = iter_data->tags;
bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
struct request *rq;
+ bool ret;
+ bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
if (!reserved)
bitnr += tags->nr_reserved_tags;
@@ -272,16 +274,21 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
- if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
+ if (iter_static_rqs)
rq = tags->static_rqs[bitnr];
- else
+ else {
rq = tags->rqs[bitnr];
- if (!rq)
- return true;
+ if (!rq || !refcount_inc_not_zero(&rq->ref))
+ return true;
+ }
if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
!blk_mq_request_started(rq))
- return true;
- return iter_data->fn(rq, iter_data->data, reserved);
+ ret = true;
+ else
+ ret = iter_data->fn(rq, iter_data->data, reserved);
+ if (!iter_static_rqs)
+ blk_mq_put_rq_ref(rq);
+ return ret;
}
/**
@@ -348,6 +355,9 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
* indicates whether or not @rq is a reserved request. Return
* true to continue iterating tags, false to stop.
* @priv: Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns.
*/
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 927189a55575..4bd6c11bd8bc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -909,6 +909,14 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
return false;
}
+void blk_mq_put_rq_ref(struct request *rq)
+{
+ if (is_flush_rq(rq, rq->mq_hctx))
+ rq->end_io(rq, 0);
+ else if (refcount_dec_and_test(&rq->ref))
+ __blk_mq_free_request(rq);
+}
+
static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved)
{
@@ -942,11 +950,7 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
if (blk_mq_req_expired(rq, next))
blk_mq_rq_timed_out(rq, reserved);
- if (is_flush_rq(rq, hctx))
- rq->end_io(rq, 0);
- else if (refcount_dec_and_test(&rq->ref))
- __blk_mq_free_request(rq);
-
+ blk_mq_put_rq_ref(rq);
return true;
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3616453ca28c..143afe42c63a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,6 +47,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start);
+void blk_mq_put_rq_ref(struct request *rq);
/*
* Internal helpers for allocating/freeing the request map
--
2.29.2
* Re: [PATCH V4 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
2021-04-29 2:34 ` [PATCH V4 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-04-30 3:06 ` Bart Van Assche
0 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2021-04-30 3:06 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: linux-block, Khazhy Kumykov, Shin'ichiro Kawasaki,
Hannes Reinecke, John Garry, David Jeffery
On 4/28/21 7:34 PM, Ming Lei wrote:
> Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(),
> which prevents the request from being re-used while ->fn is running.
> This is the same approach we take in timeout handling.
>
> Fix request use-after-free (UAF) caused by a completion race or by
> queue releasing:
>
> - If a request's reference is grabbed before rq->q is frozen, the queue
> won't finish freezing before the reference is released during iteration.
>
> - If a request is referred to after rq->q has been frozen,
> refcount_inc_not_zero() returns false and we skip that request.
>
> However, one request UAF is still not covered: refcount_inc_not_zero()
> may read an already-freed request; that is handled in the next patch.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH V4 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
2021-04-29 2:34 [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-04-29 2:34 ` [PATCH V4 1/4] block: avoid double io accounting for flush request Ming Lei
2021-04-29 2:34 ` [PATCH V4 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-04-29 2:34 ` Ming Lei
2021-04-29 14:02 ` David Jeffery
2021-04-29 2:34 ` [PATCH V4 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
` (2 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2021-04-29 2:34 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
David Jeffery, Ming Lei
refcount_inc_not_zero() in bt_tags_iter() may still read an
already-freed request.
Fix the issue with the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero() in bt_tags_iter()
2) clear stale requests referred to via ->rqs[tag] before freeing the
request pool, holding the same per-tags spinlock while clearing
->rqs[tag]
After stale requests are cleared, bt_tags_iter() can no longer observe
a freed request, and the clearing also waits for pending request
references to be dropped.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and a recent patch from David Jeffery.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-tag.c | 9 ++++++++-
block/blk-mq-tag.h | 3 +++
block/blk-mq.c | 39 ++++++++++++++++++++++++++++++++++-----
3 files changed, 45 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 9329b94a9743..a3be267212b9 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -277,9 +277,15 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
if (iter_static_rqs)
rq = tags->static_rqs[bitnr];
else {
+ unsigned long flags;
+
+ spin_lock_irqsave(&tags->lock, flags);
rq = tags->rqs[bitnr];
- if (!rq || !refcount_inc_not_zero(&rq->ref))
+ if (!rq || !refcount_inc_not_zero(&rq->ref)) {
+ spin_unlock_irqrestore(&tags->lock, flags);
return true;
+ }
+ spin_unlock_irqrestore(&tags->lock, flags);
}
if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
!blk_mq_request_started(rq))
@@ -526,6 +532,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_tags = total_tags;
tags->nr_reserved_tags = reserved_tags;
+ spin_lock_init(&tags->lock);
if (blk_mq_is_sbitmap_shared(flags))
return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f942a601b5ef 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,9 @@ struct blk_mq_tags {
struct request **rqs;
struct request **static_rqs;
struct list_head page_list;
+
+ /* used to clear rqs[] before one request pool is freed */
+ spinlock_t lock;
};
extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4bd6c11bd8bc..abd0f7a9d052 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2291,6 +2291,38 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
return BLK_QC_T_NONE;
}
+static size_t order_to_size(unsigned int order)
+{
+ return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+ struct blk_mq_tags *tags, unsigned int hctx_idx)
+{
+ struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+ struct page *page;
+ unsigned long flags;
+
+ spin_lock_irqsave(&drv_tags->lock, flags);
+ list_for_each_entry(page, &tags->page_list, lru) {
+ unsigned long start = (unsigned long)page_address(page);
+ unsigned long end = start + order_to_size(page->private);
+ int i;
+
+ for (i = 0; i < set->queue_depth; i++) {
+ struct request *rq = drv_tags->rqs[i];
+ unsigned long rq_addr = (unsigned long)rq;
+
+ if (rq_addr >= start && rq_addr < end) {
+ WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+ cmpxchg(&drv_tags->rqs[i], rq, NULL);
+ }
+ }
+ }
+ spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx)
{
@@ -2309,6 +2341,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
}
}
+ blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+
while (!list_empty(&tags->page_list)) {
page = list_first_entry(&tags->page_list, struct page, lru);
list_del_init(&page->lru);
@@ -2368,11 +2402,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
return tags;
}
-static size_t order_to_size(unsigned int order)
-{
- return (size_t)PAGE_SIZE << order;
-}
-
static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
unsigned int hctx_idx, int node)
{
--
2.29.2
* Re: [PATCH V4 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
2021-04-29 2:34 ` [PATCH V4 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-04-29 14:02 ` David Jeffery
0 siblings, 0 replies; 16+ messages in thread
From: David Jeffery @ 2021-04-29 14:02 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry
On Thu, Apr 29, 2021 at 10:34:57AM +0800, Ming Lei wrote:
>
> refcount_inc_not_zero() in bt_tags_iter() may still read an
> already-freed request.
>
> Fix the issue with the following approach:
>
> 1) hold a per-tags spinlock when reading ->rqs[tag] and calling
> refcount_inc_not_zero() in bt_tags_iter()
>
> 2) clear stale requests referred to via ->rqs[tag] before freeing the
> request pool, holding the same per-tags spinlock while clearing
> ->rqs[tag]
>
> After stale requests are cleared, bt_tags_iter() can no longer observe
> a freed request, and the clearing also waits for pending request
> references to be dropped.
>
> The idea of clearing ->rqs[] is borrowed from John Garry's previous
> patch and a recent patch from David Jeffery.
With the flush request handled in a separate patch, this looks good to me.
Reviewed-by: David Jeffery <djeffery@redhat.com>
* [PATCH V4 4/4] blk-mq: clearing flush request reference in tags->rqs[]
2021-04-29 2:34 [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (2 preceding siblings ...)
2021-04-29 2:34 ` [PATCH V4 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-04-29 2:34 ` Ming Lei
2021-04-29 14:13 ` David Jeffery
2021-04-30 3:05 ` Bart Van Assche
2021-05-04 7:29 ` [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-05-04 10:15 ` John Garry
5 siblings, 2 replies; 16+ messages in thread
From: Ming Lei @ 2021-04-29 2:34 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
David Jeffery, Ming Lei
Before freeing the request queue, clear the flush request reference in
tags->rqs[] so that a potential UAF is avoided.
Based on a patch written by David Jeffery.
Cc: John Garry <john.garry@huawei.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index abd0f7a9d052..e320d9798715 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2620,16 +2620,38 @@ static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
&hctx->cpuhp_dead);
}
+/*
+ * Before freeing hw queue, clearing the flush request reference in
+ * tags->rqs[] for avoiding potential UAF.
+ */
+static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
+ unsigned int queue_depth, struct request *flush_rq)
+{
+ int i;
+ unsigned long flags;
+
+ WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
+
+ spin_lock_irqsave(&tags->lock, flags);
+ for (i = 0; i < queue_depth; i++)
+ cmpxchg(&tags->rqs[i], flush_rq, NULL);
+ spin_unlock_irqrestore(&tags->lock, flags);
+}
+
/* hctx->ctxs will be freed in queue's release handler */
static void blk_mq_exit_hctx(struct request_queue *q,
struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
+ struct request *flush_rq = hctx->fq->flush_rq;
+
if (blk_mq_hw_queue_mapped(hctx))
blk_mq_tag_idle(hctx);
+ blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+ set->queue_depth, flush_rq);
if (set->ops->exit_request)
- set->ops->exit_request(set, hctx->fq->flush_rq, hctx_idx);
+ set->ops->exit_request(set, flush_rq, hctx_idx);
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
--
2.29.2
* Re: [PATCH V4 4/4] blk-mq: clearing flush request reference in tags->rqs[]
2021-04-29 2:34 ` [PATCH V4 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
@ 2021-04-29 14:13 ` David Jeffery
2021-04-30 3:05 ` Bart Van Assche
1 sibling, 0 replies; 16+ messages in thread
From: David Jeffery @ 2021-04-29 14:13 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry
On Thu, Apr 29, 2021 at 10:34:58AM +0800, Ming Lei wrote:
>
> Before freeing the request queue, clear the flush request reference in
> tags->rqs[] so that a potential UAF is avoided.
>
> Based on a patch written by David Jeffery.
>
With this fixing the issue I saw with the previous patchset, it looks
good to me.
Reviewed-by: David Jeffery <djeffery@redhat.com>
* Re: [PATCH V4 4/4] blk-mq: clearing flush request reference in tags->rqs[]
2021-04-29 2:34 ` [PATCH V4 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
2021-04-29 14:13 ` David Jeffery
@ 2021-04-30 3:05 ` Bart Van Assche
1 sibling, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2021-04-30 3:05 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: linux-block, Khazhy Kumykov, Shin'ichiro Kawasaki,
Hannes Reinecke, John Garry, David Jeffery
On 4/28/21 7:34 PM, Ming Lei wrote:
> Before freeing the request queue, clear the flush request reference in
> tags->rqs[] so that a potential UAF is avoided.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* Re: [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-04-29 2:34 [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (3 preceding siblings ...)
2021-04-29 2:34 ` [PATCH V4 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
@ 2021-05-04 7:29 ` Ming Lei
2021-05-04 10:15 ` John Garry
5 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-05-04 7:29 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, John Garry,
David Jeffery
On Thu, Apr 29, 2021 at 10:34:54AM +0800, Ming Lei wrote:
> Hi Jens,
>
> This patchset fixes the request UAF issue with a simple approach,
> without clearing ->rqs[] in the fast path.
>
> 1) grab a request reference before calling ->fn in
> blk_mq_tagset_busy_iter(), and release it after calling ->fn, so ->fn
> won't be called for a request whose queue is frozen; done in the 2nd
> patch
>
> 2) clear any stale request referred to in ->rqs[] before freeing the
> request pool; a per-tags spinlock is added to protect grabbing a
> request reference vs. clearing ->rqs[tag], so the UAF by
> refcount_inc_not_zero() in bt_tags_iter() is avoided; done in the 3rd
> patch.
>
>
> V4:
> - remove hctx->fq->flush_rq from tags->rqs[] before freeing the hw queue;
> patch 4/4 is added, which is based on David's patch.
Hello Jens,
Any chance of merging the patchset into 5.13 if you are fine with it?
Thanks,
Ming
* Re: [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-04-29 2:34 [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (4 preceding siblings ...)
2021-05-04 7:29 ` [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-05-04 10:15 ` John Garry
2021-05-04 11:43 ` Ming Lei
5 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2021-05-04 10:15 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, David Jeffery
On 29/04/2021 03:34, Ming Lei wrote:
> Hi Jens,
>
> This patchset fixes the request UAF issue with a simple approach,
> without clearing ->rqs[] in the fast path.
>
> 1) grab a request reference before calling ->fn in
> blk_mq_tagset_busy_iter(), and release it after calling ->fn, so ->fn
> won't be called for a request whose queue is frozen; done in the 2nd
> patch
>
> 2) clear any stale request referred to in ->rqs[] before freeing the
> request pool; a per-tags spinlock is added to protect grabbing a
> request reference vs. clearing ->rqs[tag], so the UAF by
> refcount_inc_not_zero() in bt_tags_iter() is avoided; done in the 3rd
> patch.
>
I had a go at testing this. Without any modifications for testing, it
looks ok.
However, I also tested by adding an artificial delay in bt_iter();
otherwise it may not be realistic to trigger UAF issues in a sane
timeframe.
So I made this change:
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -215,8 +215,11 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
- if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
- return iter_data->fn(hctx, rq, iter_data->data, reserved);
+ if (rq) {
+ mdelay(50);
+ if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+ return iter_data->fn(hctx, rq, iter_data->data, reserved);
+ }
return true;
}
And I could trigger this pretty quickly:
[ 129.318623] ==================================================================
[ 129.325870] BUG: KASAN: use-after-free in bt_iter+0xa0/0x120
[ 129.331548] Read of size 8 at addr ffff0010cc62a200 by task fio/1610
[ 129.339384] CPU: 33 PID: 1610 Comm: fio Not tainted 5.12.0-rc4-47861-g2edb78df2141 #1140
[ 129.339395] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[ 129.339403] Call trace:
[ 129.339408] dump_backtrace+0x0/0x2d0
[ 129.339430] show_stack+0x18/0x68
[ 129.339441] dump_stack+0x100/0x16c
[ 129.339455] print_address_description.constprop.13+0x68/0x30c
[ 129.339468] kasan_report+0x1d8/0x240
[ 129.339479] __asan_load8+0x9c/0xd8
[ 129.339488] bt_iter+0xa0/0x120
[ 129.339496] blk_mq_queue_tag_busy_iter+0x2d8/0x540
[ 129.339506] blk_mq_in_flight+0x80/0xb8
[ 129.339516] part_stat_show+0xcc/0x210
[ 129.339528] dev_attr_show+0x44/0x90
[ 129.339542] sysfs_kf_seq_show+0x128/0x1c8
[ 129.339554] kernfs_seq_show+0xa0/0xb8
[ 129.339563] seq_read_iter+0x210/0x660
[ 129.339575] kernfs_fop_read_iter+0x208/0x2b0
[ 129.339585] new_sync_read+0x1ec/0x2d0
[ 129.339596] vfs_read+0x188/0x248
[ 129.339605] ksys_read+0xc8/0x178
[ 129.339615] __arm64_sys_read+0x44/0x58
[ 129.339625] el0_svc_common.constprop.1+0xc4/0x190
[ 129.339639] do_el0_svc+0x90/0xa0
[ 129.339648] el0_svc+0x24/0x38
[ 129.339659] el0_sync_handler+0x90/0xb8
[ 129.339670] el0_sync+0x154/0x180
[ 129.341162] Allocated by task 1613:
[ 129.344644] stack_trace_save+0x94/0xc8
[ 129.344662] kasan_save_stack+0x28/0x58
[ 129.344670] __kasan_slab_alloc+0x88/0xa8
[ 129.344679] slab_post_alloc_hook+0x58/0x2a0
[ 129.344691] kmem_cache_alloc+0x19c/0x2c0
[ 129.344699] io_submit_one+0x138/0x1580
[ 129.344710] __arm64_sys_io_submit+0x120/0x310
[ 129.344721] el0_svc_common.constprop.1+0xc4/0x190
[ 129.344731] do_el0_svc+0x90/0xa0
[ 129.344741] el0_svc+0x24/0x38
[ 129.344750] el0_sync_handler+0x90/0xb8
[ 129.344760] el0_sync+0x154/0x180
[ 129.346250] Freed by task 496:
[ 129.349297] stack_trace_save+0x94/0xc8
[ 129.349307] kasan_save_stack+0x28/0x58
[ 129.349315] kasan_set_track+0x28/0x40
[ 129.349323] kasan_set_free_info+0x28/0x50
[ 129.349332] __kasan_slab_free+0xd0/0x130
[ 129.349340] slab_free_freelist_hook+0x70/0x178
[ 129.349352] kmem_cache_free+0x94/0x400
[ 129.349359] aio_complete_rw+0x4f8/0x7f8
[ 129.349369] blkdev_bio_end_io+0xe0/0x1e8
[ 129.349380] bio_endio+0x1d4/0x1f0
[ 129.349392] blk_update_request+0x388/0x6b8
[ 129.349401] scsi_end_request+0x54/0x320
[ 129.349411] scsi_io_completion+0xec/0x700
[ 129.349420] scsi_finish_command+0x168/0x1e8
[ 129.349432] scsi_softirq_done+0x140/0x1b8
[ 129.349441] blk_mq_complete_request+0x4c/0x60
[ 129.349449] scsi_mq_done+0x68/0x88
[ 129.349457] sas_scsi_task_done+0xe0/0x118
[ 129.349470] slot_complete_v2_hw+0x23c/0x620
[ 129.349483] cq_thread_v2_hw+0x10c/0x388
[ 129.349493] irq_thread_fn+0x48/0xd8
[ 129.349503] irq_thread+0x1d4/0x2f8
[ 129.349511] kthread+0x224/0x230
[ 129.349521] ret_from_fork+0x10/0x30
[ 129.351012] The buggy address belongs to the object at ffff0010cc62a200 which belongs to the cache aio_kiocb of size 176
[ 129.363351] The buggy address is located 0 bytes inside of 176-byte region [ffff0010cc62a200, ffff0010cc62a2b0)
[ 129.374909] The buggy address belongs to the page:
[ 129.379693] page:0000000026f8d4c3 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10cc62a
[ 129.379704] head:0000000026f8d4c3 order:1 compound_mapcount:0
[ 129.379710] flags: 0xbfffc0000010200(slab|head)
[ 129.379727] raw: 0bfffc0000010200 dead000000000100 dead000000000122 ffff04100630f780
[ 129.379736] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 129.379742] page dumped because: kasan: bad access detected
[ 129.381229] Memory state around the buggy address:
[ 129.386012] ffff0010cc62a100: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 129.393228] ffff0010cc62a180: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
[ 129.400443] >ffff0010cc62a200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 129.407657] ^
[ 129.410878] ffff0010cc62a280: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
[ 129.418092] ffff0010cc62a300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 129.425306] ==================================================================
[ 129.432520] Disabling lock debugging due to kernel taint
root@ubuntu:/home/john#
My script simply kicks off fio doing I/O to the disk on which / is
mounted, and then continually changes the I/O scheduler from
none->mq-deadline->none for that same disk.
I have not had a chance to identify the root cause yet.
I would also have added a delay in bt_tags_iter() for testing, but
since I hit the UAF above, I have not tried that yet.
Thanks,
John
* Re: [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-04 10:15 ` John Garry
@ 2021-05-04 11:43 ` Ming Lei
2021-05-05 11:19 ` John Garry
0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2021-05-04 11:43 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, David Jeffery
On Tue, May 04, 2021 at 11:15:37AM +0100, John Garry wrote:
> On 29/04/2021 03:34, Ming Lei wrote:
> > Hi Jens,
> >
> > This patchset fixes the request UAF issue with a simple approach,
> > without clearing ->rqs[] in the fast path.
> >
> > 1) grab a request reference before calling ->fn in
> > blk_mq_tagset_busy_iter(), and release it after calling ->fn, so ->fn
> > won't be called for a request whose queue is frozen; done in the 2nd
> > patch
> >
> > 2) clear any stale request referred to in ->rqs[] before freeing the
> > request pool; a per-tags spinlock is added to protect grabbing a
> > request reference vs. clearing ->rqs[tag], so the UAF by
> > refcount_inc_not_zero() in bt_tags_iter() is avoided; done in the 3rd
> > patch.
> >
>
> I had a go at testing this. Without any modifications for testing, it looks
> ok.
>
> However I also tested by adding an artificial delay in bt_iter() - otherwise
> it may not be realistic to trigger some UAF issues in sane timeframes.
>
> So I made this change:
>
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -215,8 +215,11 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>  	 * We can hit rq == NULL here, because the tagging functions
>  	 * test and set the bit before assigning ->rqs[].
>  	 */
> -	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
> -		return iter_data->fn(hctx, rq, iter_data->data, reserved);
> +	if (rq) {
> +		mdelay(50);
> +		if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> +			return iter_data->fn(hctx, rq, iter_data->data, reserved);
> +	}
>  	return true;
>  }
Hmm, I forgot to cover bt_iter(); please test the following delta
patch:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a3be267212b9..27815114ee3f 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -206,18 +206,28 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	struct blk_mq_tags *tags = hctx->tags;
 	bool reserved = iter_data->reserved;
 	struct request *rq;
+	unsigned long flags;
+	bool ret = true;
 
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
-	rq = tags->rqs[bitnr];
 
+	spin_lock_irqsave(&tags->lock, flags);
+	rq = tags->rqs[bitnr];
 	/*
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
-		return iter_data->fn(hctx, rq, iter_data->data, reserved);
-	return true;
+	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
+		spin_unlock_irqrestore(&tags->lock, flags);
+		return true;
+	}
+	spin_unlock_irqrestore(&tags->lock, flags);
+
+	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+	blk_mq_put_rq_ref(rq);
+	return ret;
 }
 
 /**
Thanks,
Ming
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-04 11:43 ` Ming Lei
@ 2021-05-05 11:19 ` John Garry
2021-05-05 14:28 ` Ming Lei
0 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2021-05-05 11:19 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, David Jeffery
On 04/05/2021 12:43, Ming Lei wrote:
>>  	 */
>> -	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
>> -		return iter_data->fn(hctx, rq, iter_data->data, reserved);
>> +	if (rq) {
>> +		mdelay(50);
>> +		if (rq->q == hctx->queue && rq->mq_hctx == hctx)
>> +			return iter_data->fn(hctx, rq, iter_data->data, reserved);
>> +	}
>>  	return true;
>>  }
> Hmm, I forgot to cover bt_iter(); please test the following delta
> patch:
>
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a3be267212b9..27815114ee3f 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -206,18 +206,28 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>  	struct blk_mq_tags *tags = hctx->tags;
>  	bool reserved = iter_data->reserved;
>  	struct request *rq;
> +	unsigned long flags;
> +	bool ret = true;
> 
>  	if (!reserved)
>  		bitnr += tags->nr_reserved_tags;
> -	rq = tags->rqs[bitnr];
> 
> +	spin_lock_irqsave(&tags->lock, flags);
> +	rq = tags->rqs[bitnr];
>  	/*
>  	 * We can hit rq == NULL here, because the tagging functions
>  	 * test and set the bit before assigning ->rqs[].
>  	 */
> -	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
> -		return iter_data->fn(hctx, rq, iter_data->data, reserved);
> -	return true;
> +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> +		spin_unlock_irqrestore(&tags->lock, flags);
> +		return true;
> +	}
> +	spin_unlock_irqrestore(&tags->lock, flags);
> +
> +	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> +		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> +	blk_mq_put_rq_ref(rq);
> +	return ret;
>  }
This looks to work ok from my testing.
Thanks,
John
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH V4 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-05 11:19 ` John Garry
@ 2021-05-05 14:28 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-05-05 14:28 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, linux-block, Bart Van Assche, Khazhy Kumykov,
Shin'ichiro Kawasaki, Hannes Reinecke, David Jeffery
On Wed, May 05, 2021 at 12:19:21PM +0100, John Garry wrote:
> On 04/05/2021 12:43, Ming Lei wrote:
> > >  	 */
> > > -	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
> > > -		return iter_data->fn(hctx, rq, iter_data->data, reserved);
> > > +	if (rq) {
> > > +		mdelay(50);
> > > +		if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> > > +			return iter_data->fn(hctx, rq, iter_data->data, reserved);
> > > +	}
> > >  	return true;
> > >  }
> > Hmm, I forgot to cover bt_iter(); please test the following delta
> > patch:
> >
> >
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index a3be267212b9..27815114ee3f 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -206,18 +206,28 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> >  	struct blk_mq_tags *tags = hctx->tags;
> >  	bool reserved = iter_data->reserved;
> >  	struct request *rq;
> > +	unsigned long flags;
> > +	bool ret = true;
> > 
> >  	if (!reserved)
> >  		bitnr += tags->nr_reserved_tags;
> > -	rq = tags->rqs[bitnr];
> > 
> > +	spin_lock_irqsave(&tags->lock, flags);
> > +	rq = tags->rqs[bitnr];
> >  	/*
> >  	 * We can hit rq == NULL here, because the tagging functions
> >  	 * test and set the bit before assigning ->rqs[].
> >  	 */
> > -	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
> > -		return iter_data->fn(hctx, rq, iter_data->data, reserved);
> > -	return true;
> > +	if (!rq || !refcount_inc_not_zero(&rq->ref)) {
> > +		spin_unlock_irqrestore(&tags->lock, flags);
> > +		return true;
> > +	}
> > +	spin_unlock_irqrestore(&tags->lock, flags);
> > +
> > +	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> > +		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> > +	blk_mq_put_rq_ref(rq);
> > +	return ret;
> >  }
>
>
> This looks to work ok from my testing.
OK, thanks for your testing; I will add your Tested-by in V4.
Thanks,
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread