* [PATCH V6 1/4] block: avoid double io accounting for flush request
2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
2021-05-07 14:42 ` [PATCH V6 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Ming Lei
For a flush request, rq->end_io() may be called twice: once from timeout
handling (blk_mq_check_expired()) and once from normal
completion (__blk_mq_end_request()).
Move blk_account_io_flush() to after flush_rq->ref drops to zero, so
I/O accounting is done exactly once per flush request.
Fixes: b68663186577 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-flush.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 7942ca6ed321..1002f6c58181 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,8 +219,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
unsigned long flags = 0;
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
- blk_account_io_flush(flush_rq);
-
/* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags);
@@ -230,6 +228,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
return;
}
+ blk_account_io_flush(flush_rq);
/*
* Flush request has to be marked as IDLE when it is really ended
* because its .end_io() is called from timeout code path too for
--
2.29.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V6 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-05-07 14:42 ` [PATCH V6 1/4] block: avoid double io accounting for flush request Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
2021-05-07 14:42 ` [PATCH V6 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Ming Lei
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(); this
prevents the request from being re-used while ->fn is running. The
approach is the same as what we do during timeout handling.
This fixes request use-after-free (UAF) issues related to completion
races or queue releasing:
- If a reference to an rq is grabbed before rq->q is frozen, the queue
freeze cannot complete before the reference is released during iteration.
- If an rq is looked up after rq->q is frozen, refcount_inc_not_zero()
returns false and we skip that request.
One request UAF is still not covered, however: refcount_inc_not_zero()
itself may read a freed request; that is handled in the next patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-tag.c | 44 +++++++++++++++++++++++++++++++++-----------
block/blk-mq.c | 14 +++++++++-----
block/blk-mq.h | 1 +
3 files changed, 43 insertions(+), 16 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2a37731e8244..544edf2c56a5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -199,6 +199,16 @@ struct bt_iter_data {
bool reserved;
};
+static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
+ unsigned int bitnr)
+{
+ struct request *rq = tags->rqs[bitnr];
+
+ if (!rq || !refcount_inc_not_zero(&rq->ref))
+ return NULL;
+ return rq;
+}
+
static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
struct bt_iter_data *iter_data = data;
@@ -206,18 +216,22 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
struct blk_mq_tags *tags = hctx->tags;
bool reserved = iter_data->reserved;
struct request *rq;
+ bool ret = true;
if (!reserved)
bitnr += tags->nr_reserved_tags;
- rq = tags->rqs[bitnr];
-
/*
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
- if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
- return iter_data->fn(hctx, rq, iter_data->data, reserved);
- return true;
+ rq = blk_mq_find_and_get_req(tags, bitnr);
+ if (!rq)
+ return true;
+
+ if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+ ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+ blk_mq_put_rq_ref(rq);
+ return ret;
}
/**
@@ -264,6 +278,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
struct blk_mq_tags *tags = iter_data->tags;
bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED;
struct request *rq;
+ bool ret = true;
+ bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS);
if (!reserved)
bitnr += tags->nr_reserved_tags;
@@ -272,16 +288,19 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
- if (iter_data->flags & BT_TAG_ITER_STATIC_RQS)
+ if (iter_static_rqs)
rq = tags->static_rqs[bitnr];
else
- rq = tags->rqs[bitnr];
+ rq = blk_mq_find_and_get_req(tags, bitnr);
if (!rq)
return true;
- if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
- !blk_mq_request_started(rq))
- return true;
- return iter_data->fn(rq, iter_data->data, reserved);
+
+ if (!(iter_data->flags & BT_TAG_ITER_STARTED) ||
+ blk_mq_request_started(rq))
+ ret = iter_data->fn(rq, iter_data->data, reserved);
+ if (!iter_static_rqs)
+ blk_mq_put_rq_ref(rq);
+ return ret;
}
/**
@@ -348,6 +367,9 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
* indicates whether or not @rq is a reserved request. Return
* true to continue iterating tags, false to stop.
* @priv: Will be passed as second argument to @fn.
+ *
+ * We grab one request reference before calling @fn and release it after
+ * @fn returns.
*/
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 466676bc2f0b..d053e80fa6d7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -909,6 +909,14 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
return false;
}
+void blk_mq_put_rq_ref(struct request *rq)
+{
+ if (is_flush_rq(rq, rq->mq_hctx))
+ rq->end_io(rq, 0);
+ else if (refcount_dec_and_test(&rq->ref))
+ __blk_mq_free_request(rq);
+}
+
static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved)
{
@@ -942,11 +950,7 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
if (blk_mq_req_expired(rq, next))
blk_mq_rq_timed_out(rq, reserved);
- if (is_flush_rq(rq, hctx))
- rq->end_io(rq, 0);
- else if (refcount_dec_and_test(&rq->ref))
- __blk_mq_free_request(rq);
-
+ blk_mq_put_rq_ref(rq);
return true;
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9ce64bc4a6c8..556368d2c5b6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,6 +47,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start);
+void blk_mq_put_rq_ref(struct request *rq);
/*
* Internal helpers for allocating/freeing the request map
--
2.29.2
* [PATCH V6 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool
2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
2021-05-07 14:42 ` [PATCH V6 1/4] block: avoid double io accounting for flush request Ming Lei
2021-05-07 14:42 ` [PATCH V6 2/4] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
2021-05-07 14:42 ` [PATCH V6 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
2021-05-11 5:05 ` [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Shinichiro Kawasaki
4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Ming Lei
refcount_inc_not_zero() in bt_tags_iter() may still read a freed
request.
Fix the issue with the following approach:
1) hold a per-tags spinlock while reading ->rqs[tag] and calling
refcount_inc_not_zero() in bt_tags_iter();
2) clear stale requests referenced via ->rqs[tag] before freeing the
request pool, holding the same per-tags spinlock while clearing
stale ->rqs[tag].
So after the stale requests are cleared, bt_tags_iter() can no longer
observe a freed request, and the clearing also waits for pending
request references to be dropped.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and a recent patch from David Jeffery.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-tag.c | 9 +++++++--
block/blk-mq-tag.h | 6 ++++++
block/blk-mq.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
3 files changed, 54 insertions(+), 7 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 544edf2c56a5..1671dae43030 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -202,10 +202,14 @@ struct bt_iter_data {
static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
unsigned int bitnr)
{
- struct request *rq = tags->rqs[bitnr];
+ struct request *rq;
+ unsigned long flags;
+ spin_lock_irqsave(&tags->lock, flags);
+ rq = tags->rqs[bitnr];
if (!rq || !refcount_inc_not_zero(&rq->ref))
- return NULL;
+ rq = NULL;
+ spin_unlock_irqrestore(&tags->lock, flags);
return rq;
}
@@ -538,6 +542,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_tags = total_tags;
tags->nr_reserved_tags = reserved_tags;
+ spin_lock_init(&tags->lock);
if (blk_mq_is_sbitmap_shared(flags))
return tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333a4a..f887988e5ef6 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,6 +20,12 @@ struct blk_mq_tags {
struct request **rqs;
struct request **static_rqs;
struct list_head page_list;
+
+ /*
+ * used to clear request reference in rqs[] before freeing one
+ * request pool
+ */
+ spinlock_t lock;
};
extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d053e80fa6d7..626b4483e006 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2306,6 +2306,45 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
return BLK_QC_T_NONE;
}
+static size_t order_to_size(unsigned int order)
+{
+ return (size_t)PAGE_SIZE << order;
+}
+
+/* called before freeing request pool in @tags */
+static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+ struct blk_mq_tags *tags, unsigned int hctx_idx)
+{
+ struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
+ struct page *page;
+ unsigned long flags;
+
+ list_for_each_entry(page, &tags->page_list, lru) {
+ unsigned long start = (unsigned long)page_address(page);
+ unsigned long end = start + order_to_size(page->private);
+ int i;
+
+ for (i = 0; i < set->queue_depth; i++) {
+ struct request *rq = drv_tags->rqs[i];
+ unsigned long rq_addr = (unsigned long)rq;
+
+ if (rq_addr >= start && rq_addr < end) {
+ WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
+ cmpxchg(&drv_tags->rqs[i], rq, NULL);
+ }
+ }
+ }
+
+ /*
+ * Wait until all pending iteration is done.
+ *
+ * Request reference is cleared and it is guaranteed to be observed
+ * after the ->lock is released.
+ */
+ spin_lock_irqsave(&drv_tags->lock, flags);
+ spin_unlock_irqrestore(&drv_tags->lock, flags);
+}
+
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx)
{
@@ -2324,6 +2363,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
}
}
+ blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+
while (!list_empty(&tags->page_list)) {
page = list_first_entry(&tags->page_list, struct page, lru);
list_del_init(&page->lru);
@@ -2383,11 +2424,6 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
return tags;
}
-static size_t order_to_size(unsigned int order)
-{
- return (size_t)PAGE_SIZE << order;
-}
-
static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
unsigned int hctx_idx, int node)
{
--
2.29.2
* [PATCH V6 4/4] blk-mq: clearing flush request reference in tags->rqs[]
2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (2 preceding siblings ...)
2021-05-07 14:42 ` [PATCH V6 3/4] blk-mq: clear stale request in tags->rq[] before freeing one request pool Ming Lei
@ 2021-05-07 14:42 ` Ming Lei
2021-05-11 5:05 ` [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Shinichiro Kawasaki
4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2021-05-07 14:42 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Bart Van Assche, Hannes Reinecke, John Garry,
Christoph Hellwig, David Jeffery, Ming Lei
Before freeing the request queue, clear the flush request reference in
tags->rqs[] so that a potential UAF is avoided.
Based on a patch written by David Jeffery.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 626b4483e006..fcd5ed79011f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2642,16 +2642,45 @@ static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
&hctx->cpuhp_dead);
}
+/*
+ * Before freeing hw queue, clearing the flush request reference in
+ * tags->rqs[] for avoiding potential UAF.
+ */
+static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
+ unsigned int queue_depth, struct request *flush_rq)
+{
+ int i;
+ unsigned long flags;
+
+ WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
+
+ for (i = 0; i < queue_depth; i++)
+ cmpxchg(&tags->rqs[i], flush_rq, NULL);
+
+ /*
+ * Wait until all pending iteration is done.
+ *
+ * Request reference is cleared and it is guaranteed to be observed
+ * after the ->lock is released.
+ */
+ spin_lock_irqsave(&tags->lock, flags);
+ spin_unlock_irqrestore(&tags->lock, flags);
+}
+
/* hctx->ctxs will be freed in queue's release handler */
static void blk_mq_exit_hctx(struct request_queue *q,
struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
+ struct request *flush_rq = hctx->fq->flush_rq;
+
if (blk_mq_hw_queue_mapped(hctx))
blk_mq_tag_idle(hctx);
+ blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+ set->queue_depth, flush_rq);
if (set->ops->exit_request)
- set->ops->exit_request(set, hctx->fq->flush_rq, hctx_idx);
+ set->ops->exit_request(set, flush_rq, hctx_idx);
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
--
2.29.2
* Re: [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-07 14:42 [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Ming Lei
` (3 preceding siblings ...)
2021-05-07 14:42 ` [PATCH V6 4/4] blk-mq: clearing flush request reference in tags->rqs[] Ming Lei
@ 2021-05-11 5:05 ` Shinichiro Kawasaki
2021-05-11 6:36 ` Ming Lei
4 siblings, 1 reply; 8+ messages in thread
From: Shinichiro Kawasaki @ 2021-05-11 5:05 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
John Garry, Christoph Hellwig, David Jeffery
On May 07, 2021 / 22:42, Ming Lei wrote:
> Hi Jens,
>
> This patchset fixes the request UAF issue by one simple approach,
> without clearing ->rqs[] in fast path, please consider it for 5.13.
>
> 1) grab request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> and release it after calling ->fn, so ->fn won't be called for one
> request if its queue is frozen, done in 2nd patch
>
> 2) clearing any stale request referred in ->rqs[] before freeing the
> request pool, one per-tags spinlock is added for protecting
> grabbing request ref vs. clearing ->rqs[tag], so UAF by refcount_inc_not_zero
> in bt_tags_iter() is avoided, done in 3rd patch.
Ming, thank you for your effort to fix the UAF issue. I applied the V6 series to
kernel v5.13-rc1 and confirmed that it avoids the UAF I had been observing with
blktests block/005 and an HDD behind an HBA. This is good. However, I found that
the series triggered a block/029 hang. Let me share the kernel message printed
at the hang; KASAN reported a null-ptr-deref.
[ 2124.489023] run blktests block/029 at 2021-05-11 13:42:22
[ 2124.561386] null_blk: module loaded
[ 2125.201166] general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] SMP KASAN PTI
[ 2125.212387] KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
[ 2125.220656] CPU: 2 PID: 26514 Comm: check Not tainted 5.13.0-rc1+ #3
[ 2125.227710] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
[ 2125.235793] RIP: 0010:blk_mq_exit_hctx+0x21b/0x580
[ 2125.241298] Code: 00 00 00 31 db 48 89 44 24 20 85 d2 74 54 49 89 c5 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 4c 01 e8 48 89 04 24 48 8b 04 24 <80> 38 00 0f 85 8a 02 00 00 49 8b 87 90 00 00 00 48 63 d3 be 08 00
[ 2125.260747] RSP: 0018:ffff888110677c08 EFLAGS: 00010286
[ 2125.266674] RAX: dffffc0000000012 RBX: 0000000000000000 RCX: ffffffffa4f3710f
[ 2125.274500] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff88811f80c4e8
[ 2125.282326] RBP: ffff8881483c0000 R08: 0000000000000000 R09: ffff88811f80c4eb
[ 2125.290153] R10: ffffed1023f0189d R11: 0000000000000001 R12: ffff88811f80c400
[ 2125.297978] R13: 0000000000000012 R14: ffff88810c506038 R15: 0000000000000000
[ 2125.305807] FS: 00007f6dc4d4a740(0000) GS:ffff8883e1480000(0000) knlGS:0000000000000000
[ 2125.314592] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2125.321032] CR2: 00005566b34e55f0 CR3: 000000012c33e005 CR4: 00000000001706e0
[ 2125.328860] Call Trace:
[ 2125.332013] blk_mq_realloc_hw_ctxs+0x71a/0x15f0
[ 2125.337338] ? blk_mq_map_queues+0x20c/0x650
[ 2125.342317] blk_mq_update_nr_hw_queues+0x4cc/0xb70
[ 2125.347905] ? blk_mq_init_queue+0xb0/0xb0
[ 2125.352709] nullb_device_submit_queues_store+0x10f/0x1f0 [null_blk]
[ 2125.359781] ? __null_lookup_page.isra.0+0xd0/0xd0 [null_blk]
[ 2125.366243] ? __null_lookup_page.isra.0+0xd0/0xd0 [null_blk]
[ 2125.372697] configfs_write_file+0x2bb/0x450
[ 2125.377676] vfs_write+0x1cb/0x840
[ 2125.381783] ksys_write+0xe9/0x1b0
[ 2125.385890] ? __ia32_sys_read+0xb0/0xb0
[ 2125.390517] ? syscall_enter_from_user_mode+0x27/0x80
[ 2125.396276] do_syscall_64+0x40/0x80
[ 2125.400553] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2125.406308] RIP: 0033:0x7f6dc4e3f4e7
[ 2125.410590] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 2125.430036] RSP: 002b:00007ffc7ee29928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2125.438306] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f6dc4e3f4e7
[ 2125.446132] RDX: 0000000000000002 RSI: 00005566b34e55f0 RDI: 0000000000000001
[ 2125.453957] RBP: 00005566b34e55f0 R08: 000000000000000a R09: 00007f6dc4ed60c0
[ 2125.461783] R10: 00007f6dc4ed5fc0 R11: 0000000000000246 R12: 0000000000000002
[ 2125.469610] R13: 00007f6dc4f12520 R14: 0000000000000002 R15: 00007f6dc4f12720
[ 2125.477446] Modules linked in: null_blk xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp tun nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security bridge stp llc ip_set rfkill nfnetlink target_core_user ebtable_filter ebtables target_core_mod ip6table_filter ip6_tables iptable_filter sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm irqbypass rapl intel_cstate intel_uncore joydev i2c_i801 pcspkr i2c_smbus ses enclosure lpc_ich mei_me mei ioatdma wmi ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter
[ 2125.477718] acpi_pad zram ip_tables drm_vram_helper drm_kms_helper cec drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel igb mpt3sas nvme nvme_core dca i2c_algo_bit raid_class scsi_transport_sas fuse [last unloaded: null_blk]
[ 2125.588156] ---[ end trace 94b8e87c4a29c520 ]---
[ 2125.615124] RIP: 0010:blk_mq_exit_hctx+0x21b/0x580
[ 2125.620628] Code: 00 00 00 31 db 48 89 44 24 20 85 d2 74 54 49 89 c5 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 4c 01 e8 48 89 04 24 48 8b 04 24 <80> 38 00 0f 85 8a 02 00 00 49 8b 87 90 00 00 00 48 63 d3 be 08 00
[ 2125.640077] RSP: 0018:ffff888110677c08 EFLAGS: 00010286
[ 2125.646007] RAX: dffffc0000000012 RBX: 0000000000000000 RCX: ffffffffa4f3710f
[ 2125.653844] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff88811f80c4e8
[ 2125.661675] RBP: ffff8881483c0000 R08: 0000000000000000 R09: ffff88811f80c4eb
[ 2125.669502] R10: ffffed1023f0189d R11: 0000000000000001 R12: ffff88811f80c400
[ 2125.677327] R13: 0000000000000012 R14: ffff88810c506038 R15: 0000000000000000
[ 2125.685158] FS: 00007f6dc4d4a740(0000) GS:ffff8883e1480000(0000) knlGS:0000000000000000
[ 2125.693941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2125.700382] CR2: 00005566b34e55f0 CR3: 000000012c33e005 CR4: 00000000001706e0
--
Best Regards,
Shin'ichiro Kawasaki
* Re: [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-11 5:05 ` [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests Shinichiro Kawasaki
@ 2021-05-11 6:36 ` Ming Lei
2021-05-11 12:23 ` Shinichiro Kawasaki
0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2021-05-11 6:36 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
John Garry, Christoph Hellwig, David Jeffery
Hello Shinichiro,
Thanks for your test!
On Tue, May 11, 2021 at 05:05:52AM +0000, Shinichiro Kawasaki wrote:
> On May 07, 2021 / 22:42, Ming Lei wrote:
> > Hi Jens,
> >
> > This patchset fixes the request UAF issue by one simple approach,
> > without clearing ->rqs[] in fast path, please consider it for 5.13.
> >
> > 1) grab request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> > and release it after calling ->fn, so ->fn won't be called for one
> > request if its queue is frozen, done in 2nd patch
> >
> > 2) clearing any stale request referred in ->rqs[] before freeing the
> > request pool, one per-tags spinlock is added for protecting
> > grabbing request ref vs. clearing ->rqs[tag], so UAF by refcount_inc_not_zero
> > in bt_tags_iter() is avoided, done in 3rd patch.
>
> Ming, thank you for your effort to fix the UAF issue. I applied the V6 series to
> the kernel v5.13-rc1, and confirmed that the series avoids the UAF I had been
> observing with blktests block/005 and HDD behind HBA. This is good. However, I
> found that the series triggered block/029 hang. Let me share the kernel message
> below, which was printed at the hang. KASAN reported null-ptr-deref.
>
> [ 2124.489023] run blktests block/029 at 2021-05-11 13:42:22
> [ 2124.561386] null_blk: module loaded
> [ 2125.201166] general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] SMP KASAN PTI
> [ 2125.212387] KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
This is because the hw queue isn't mapped yet; newly added hw queues are
mapped in blk_mq_map_swqueue(). The following change fixes it, and I will
post V7 after careful testing.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fcd5ed79011f..691b555c26fa 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2652,6 +2652,10 @@ static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
int i;
unsigned long flags;
+ /* return if hw queue isn't mapped */
+ if (!tags)
+ return;
+
WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
for (i = 0; i < queue_depth; i++)
Thanks,
Ming
* Re: [PATCH V6 0/4] blk-mq: fix request UAF related with iterating over tagset requests
2021-05-11 6:36 ` Ming Lei
@ 2021-05-11 12:23 ` Shinichiro Kawasaki
0 siblings, 0 replies; 8+ messages in thread
From: Shinichiro Kawasaki @ 2021-05-11 12:23 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
John Garry, Christoph Hellwig, David Jeffery
On May 11, 2021 / 14:36, Ming Lei wrote:
> Hello Shinichiro,
>
> Thanks for your test!
>
> On Tue, May 11, 2021 at 05:05:52AM +0000, Shinichiro Kawasaki wrote:
> > On May 07, 2021 / 22:42, Ming Lei wrote:
> > > Hi Jens,
> > >
> > > This patchset fixes the request UAF issue by one simple approach,
> > > without clearing ->rqs[] in fast path, please consider it for 5.13.
> > >
> > > 1) grab request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> > > and release it after calling ->fn, so ->fn won't be called for one
> > > request if its queue is frozen, done in 2nd patch
> > >
> > > 2) clearing any stale request referred in ->rqs[] before freeing the
> > > request pool, one per-tags spinlock is added for protecting
> > > grabbing request ref vs. clearing ->rqs[tag], so UAF by refcount_inc_not_zero
> > > in bt_tags_iter() is avoided, done in 3rd patch.
> >
> > Ming, thank you for your effort to fix the UAF issue. I applied the V6 series to
> > the kernel v5.13-rc1, and confirmed that the series avoids the UAF I had been
> > observing with blktests block/005 and HDD behind HBA. This is good. However, I
> > found that the series triggered block/029 hang. Let me share the kernel message
> > below, which was printed at the hang. KASAN reported null-ptr-deref.
> >
> > [ 2124.489023] run blktests block/029 at 2021-05-11 13:42:22
> > [ 2124.561386] null_blk: module loaded
> > [ 2125.201166] general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] SMP KASAN PTI
> > [ 2125.212387] KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
>
> It is because this hw queue isn't mapped yet and new added hw queue is
> mapped in blk_mq_map_swqueue(), and the following change can fix it, and
> I will post V7 after careful test.
>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index fcd5ed79011f..691b555c26fa 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2652,6 +2652,10 @@ static void blk_mq_clear_flush_rq_mapping(struct blk_mq_tags *tags,
> int i;
> unsigned long flags;
>
> + /* return if hw queue isn't mapped */
> + if (!tags)
> + return;
> +
> WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
>
> for (i = 0; i < queue_depth; i++)
Thank you Ming. I confirmed that this change avoids the hang in block/029.
--
Best Regards,
Shin'ichiro Kawasaki