linux-block.vger.kernel.org archive mirror
* [RFC PATCH 0/2] blk-mq: improve blk_mq_tag_to_rq()
@ 2021-03-01  2:14 Yufen Yu
  2021-03-01  2:14 ` [RFC PATCH 1/2] blk-mq: test tags bitmap before get request Yufen Yu
  2021-03-01  2:14 ` [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags Yufen Yu
  0 siblings, 2 replies; 10+ messages in thread
From: Yufen Yu @ 2021-03-01  2:14 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: josef, ming.lei, hch, bvanassche, yuyufen

Hi,
   The first patch tries to improve blk_mq_tag_to_rq().
   The second patch cleans up the code.

Yufen Yu (2):
  blk-mq: test tags bitmap before get request
  blk-mq: blk_mq_tag_to_rq() always return null for sched_tags

 block/blk-mq.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

-- 
2.25.4



* [RFC PATCH 1/2] blk-mq: test tags bitmap before get request
  2021-03-01  2:14 [RFC PATCH 0/2] blk-mq: improve blk_mq_tag_to_rq() Yufen Yu
@ 2021-03-01  2:14 ` Yufen Yu
  2021-03-01  2:49   ` Damien Le Moal
  2021-03-01  3:49   ` Bart Van Assche
  2021-03-01  2:14 ` [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags Yufen Yu
  1 sibling, 2 replies; 10+ messages in thread
From: Yufen Yu @ 2021-03-01  2:14 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: josef, ming.lei, hch, bvanassche, yuyufen

Currently, hctx->tags->rqs[i] is set when a driver tag is acquired
successfully. The request comes either from sched_tags->static_rqs[]
when a scheduler is in use, or from tags->static_rqs[] when there is
none. However, the entry is not cleared when the driver tag is released,
so tags->rqs[i] keeps pointing at the old request.

The sched_tags->static_rqs[] requests can be freed when switching the
elevator, updating nr_requests, or updating nr_hw_queues. After that, an
unexpected access through tags->rqs[i] may cause a use-after-free crash.

For example, syzkaller reported a use-after-free of a request in the
nbd device:

BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
Call trace:
 dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x144/0x1b4 lib/dump_stack.c:118
 print_address_description+0x68/0x2d0 mm/kasan/report.c:253
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x134/0x2f0 mm/kasan/report.c:409
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
 __read_once_size include/linux/compiler.h:193 [inline]
 blk_mq_rq_state block/blk-mq.h:106 [inline]
 blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
 nbd_read_stat drivers/block/nbd.c:670 [inline]
 recv_work+0x1bc/0x890 drivers/block/nbd.c:749
 process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
 worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
 kthread+0x1d8/0x1e0 kernel/kthread.c:255
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174

The syzkaller test program sent a reply packet to the client without
the client having sent a request. After receiving the packet,
recv_work() tries to look up the stale request left in tags->rqs[]
by tag, and that request has already been freed.

To avoid this type of problem, we may need to ensure that the request
is valid when looking it up by tag.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/blk-mq.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d4d7c1caa439..5362a7958b74 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
+static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
+{
+	if (!blk_mq_tag_is_reserved(tags, tag))
+		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
+	else
+		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
+}
+
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
-	if (tag < tags->nr_tags) {
+	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
 		prefetch(tags->rqs[tag]);
 		return tags->rqs[tag];
 	}
-- 
2.25.4



* [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags
  2021-03-01  2:14 [RFC PATCH 0/2] blk-mq: improve blk_mq_tag_to_rq() Yufen Yu
  2021-03-01  2:14 ` [RFC PATCH 1/2] blk-mq: test tags bitmap before get request Yufen Yu
@ 2021-03-01  2:14 ` Yufen Yu
  2021-03-01  2:48   ` Damien Le Moal
  2021-03-01  6:50   ` Ming Lei
  1 sibling, 2 replies; 10+ messages in thread
From: Yufen Yu @ 2021-03-01  2:14 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: josef, ming.lei, hch, bvanassche, yuyufen

hctx->tags->rqs[tag] is only set when a driver tag is acquired;
hctx->sched_tags->rqs[tag] is never set when a sched tag is acquired.
So blk_mq_tag_to_rq() always returns NULL for sched_tags.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/blk-mq.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5362a7958b74..424afe112b58 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -846,6 +846,7 @@ static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
 
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
+	/* if tags is hctx->sched_tags, this always returns NULL */
 	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
 		prefetch(tags->rqs[tag]);
 		return tags->rqs[tag];
@@ -3845,17 +3846,8 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 
 	if (!blk_qc_t_is_internal(cookie))
 		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
-	else {
-		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
-		/*
-		 * With scheduling, if the request has completed, we'll
-		 * get a NULL return here, as we clear the sched tag when
-		 * that happens. The request still remains valid, like always,
-		 * so we should be safe with just the NULL check.
-		 */
-		if (!rq)
-			return false;
-	}
+	else
+		return false;
 
 	return blk_mq_poll_hybrid_sleep(q, rq);
 }
-- 
2.25.4



* Re: [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags
  2021-03-01  2:14 ` [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags Yufen Yu
@ 2021-03-01  2:48   ` Damien Le Moal
  2021-03-01  6:50   ` Ming Lei
  1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2021-03-01  2:48 UTC (permalink / raw)
  To: Yufen Yu, axboe, linux-block; +Cc: josef, ming.lei, hch, bvanassche

On 2021/03/01 11:12, Yufen Yu wrote:
> We just set hctx->tags->rqs[tag] when get driver tag, but will
> not set hctx->sched_tags->rqs[tag] when get sched tag.
> So, blk_mq_tag_to_rq() always return NULL for sched_tags.
> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
>  block/blk-mq.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5362a7958b74..424afe112b58 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -846,6 +846,7 @@ static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
>  
>  struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
> +	/* if tags is hctx->sched_tags, it always return NULL */
>  	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>  		prefetch(tags->rqs[tag]);
>  		return tags->rqs[tag];
> @@ -3845,17 +3846,8 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
>  
>  	if (!blk_qc_t_is_internal(cookie))
>  		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
> -	else {
> -		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
> -		/*
> -		 * With scheduling, if the request has completed, we'll
> -		 * get a NULL return here, as we clear the sched tag when
> -		 * that happens. The request still remains valid, like always,
> -		 * so we should be safe with just the NULL check.
> -		 */
> -		if (!rq)
> -			return false;
> -	}
> +	else
> +		return false;

Reverse the if condition to avoid the "else". That will nicely clean up the code.

>  
>  	return blk_mq_poll_hybrid_sleep(q, rq);
>  }
> 
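
The "reverse the if condition" comment above can be sketched as follows.
This is only a minimal userspace sketch: every demo_* type and helper is a
hypothetical stand-in for the kernel counterpart (the cookie layout and
the sleep placeholder are assumptions), and it only illustrates the early
return that makes the "else" unnecessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct demo_request { int started; };
struct demo_tags {
	unsigned int nr_tags;
	struct demo_request **rqs;
};
struct demo_hctx { struct demo_tags *tags; };

/* Assumed cookie layout for the sketch: top bit marks a scheduler cookie. */
#define DEMO_QC_T_SHIFT    16
#define DEMO_QC_T_INTERNAL (1U << 31)

static bool demo_qc_t_is_internal(unsigned int cookie)
{
	return (cookie & DEMO_QC_T_INTERNAL) != 0;
}

static unsigned int demo_qc_t_to_tag(unsigned int cookie)
{
	return cookie & ((1U << DEMO_QC_T_SHIFT) - 1);
}

static struct demo_request *demo_tag_to_rq(struct demo_tags *tags,
					   unsigned int tag)
{
	assert(tags != NULL);
	return tag < tags->nr_tags ? tags->rqs[tag] : NULL;
}

/* Placeholder for blk_mq_poll_hybrid_sleep(): only reports if rq is usable. */
static bool demo_poll_hybrid_sleep(struct demo_request *rq)
{
	return rq != NULL;
}

static bool demo_poll_hybrid(struct demo_hctx *hctx, unsigned int cookie)
{
	struct demo_request *rq;

	/* Early return for scheduler cookies, so no else branch is needed. */
	if (demo_qc_t_is_internal(cookie))
		return false;

	rq = demo_tag_to_rq(hctx->tags, demo_qc_t_to_tag(cookie));
	return demo_poll_hybrid_sleep(rq);
}
```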


-- 
Damien Le Moal
Western Digital Research


* Re: [RFC PATCH 1/2] blk-mq: test tags bitmap before get request
  2021-03-01  2:14 ` [RFC PATCH 1/2] blk-mq: test tags bitmap before get request Yufen Yu
@ 2021-03-01  2:49   ` Damien Le Moal
  2021-03-01  3:49   ` Bart Van Assche
  1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2021-03-01  2:49 UTC (permalink / raw)
  To: Yufen Yu, axboe, linux-block; +Cc: josef, ming.lei, hch, bvanassche

On 2021/03/01 11:13, Yufen Yu wrote:
> For now, we set hctx->tags->rqs[i] when get driver tag successfully.
> The request either comes from sched_tags->static_rqs[] with scheduler,
> or comes from tags->static_rqs[] with no scheduler. But, the value won't
> be clear when put driver tag. Thus, tags->rqs[i] still remain old request.
> 
> We can free these sched_tags->static_rqs[] requests when switch elevator,
> update nr_requests or update nr_hw_queues. After that, unexpected access
> of tags->rqs[i] may cause use-after-free crash.
> 
> For example, we reported use-after-free of request in nbd device
> by syzkaller:
> 
> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
> Call trace:
>  dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>  show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>  print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>  __read_once_size include/linux/compiler.h:193 [inline]
>  blk_mq_rq_state block/blk-mq.h:106 [inline]
>  blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>  nbd_read_stat drivers/block/nbd.c:670 [inline]
>  recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>  process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>  worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>  kthread+0x1d8/0x1e0 kernel/kthread.c:255
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
> 
> The syzkaller test program sended a reply package to client
> without client sending request. After receiving the package,
> recv_work() try to get the remained request in tags->rqs[]
> by tag, which have been free.
> 
> To avoid this type of problem, we may need to ensure the request
> valid when get it by tag.
> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
>  block/blk-mq.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d4d7c1caa439..5362a7958b74 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>  
> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
> +{
> +	if (!blk_mq_tag_is_reserved(tags, tag))
> +		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
> +	else

No need for else after a return.

> +		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
> +}
> +
>  struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
> -	if (tag < tags->nr_tags) {
> +	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>  		prefetch(tags->rqs[tag]);
>  		return tags->rqs[tag];
>  	}
> 
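
For the "no else after a return" comment above, a minimal sketch of the
helper without the else. The demo_* names are hypothetical and the
bitmaps are modelled as plain 64-bit words indexed by the raw tag, which
is a simplification for illustration and not how sbitmap works
internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: one 64-bit word per bitmap. */
struct demo_tags {
	unsigned int nr_tags;
	unsigned int nr_reserved_tags;
	unsigned long long bitmap_tags;    /* in-use bits for normal tags   */
	unsigned long long breserved_tags; /* in-use bits for reserved tags */
};

static bool demo_test_bit(unsigned long long map, unsigned int bit)
{
	return (map >> bit) & 1ULL;
}

static bool demo_tag_is_reserved(const struct demo_tags *tags,
				 unsigned int tag)
{
	return tag < tags->nr_reserved_tags;
}

static bool demo_test_tag_bit(const struct demo_tags *tags, unsigned int tag)
{
	assert(tag < 64); /* limit of this one-word model */

	if (demo_tag_is_reserved(tags, tag))
		return demo_test_bit(tags->breserved_tags, tag);

	/* The branch above returned, so this is the non-reserved case. */
	return demo_test_bit(tags->bitmap_tags, tag);
}
```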



-- 
Damien Le Moal
Western Digital Research


* Re: [RFC PATCH 1/2] blk-mq: test tags bitmap before get request
  2021-03-01  2:14 ` [RFC PATCH 1/2] blk-mq: test tags bitmap before get request Yufen Yu
  2021-03-01  2:49   ` Damien Le Moal
@ 2021-03-01  3:49   ` Bart Van Assche
  2021-03-01  7:54     ` Hannes Reinecke
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2021-03-01  3:49 UTC (permalink / raw)
  To: Yufen Yu, axboe, linux-block; +Cc: josef, ming.lei, hch

On 2/28/21 6:14 PM, Yufen Yu wrote:
> For now, we set hctx->tags->rqs[i] when get driver tag successfully.
> The request either comes from sched_tags->static_rqs[] with scheduler,
> or comes from tags->static_rqs[] with no scheduler. But, the value won't
> be clear when put driver tag. Thus, tags->rqs[i] still remain old request.
> 
> We can free these sched_tags->static_rqs[] requests when switch elevator,
> update nr_requests or update nr_hw_queues. After that, unexpected access
> of tags->rqs[i] may cause use-after-free crash.
> 
> For example, we reported use-after-free of request in nbd device
> by syzkaller:
> 
> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
> Call trace:
>  dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>  show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>  print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>  __read_once_size include/linux/compiler.h:193 [inline]
>  blk_mq_rq_state block/blk-mq.h:106 [inline]
>  blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>  nbd_read_stat drivers/block/nbd.c:670 [inline]
>  recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>  process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>  worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>  kthread+0x1d8/0x1e0 kernel/kthread.c:255
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
> 
> The syzkaller test program sended a reply package to client
> without client sending request. After receiving the package,
> recv_work() try to get the remained request in tags->rqs[]
> by tag, which have been free.
> 
> To avoid this type of problem, we may need to ensure the request
> valid when get it by tag.
> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
>  block/blk-mq.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d4d7c1caa439..5362a7958b74 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>  
> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
> +{
> +	if (!blk_mq_tag_is_reserved(tags, tag))
> +		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
> +	else
> +		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
> +}
> +
>  struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
> -	if (tag < tags->nr_tags) {
> +	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>  		prefetch(tags->rqs[tag]);
>  		return tags->rqs[tag];
>  	}

Please do not slow down the hot path by inserting additional code into
it. I am convinced that the race described in the patch description can
be fixed without changing the hot path. See also the conversation I had
recently with John Garry on linux-block.

Thanks,

Bart.



* Re: [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags
  2021-03-01  2:14 ` [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags Yufen Yu
  2021-03-01  2:48   ` Damien Le Moal
@ 2021-03-01  6:50   ` Ming Lei
  2021-03-01  7:33     ` Yufen Yu
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-03-01  6:50 UTC (permalink / raw)
  To: Yufen Yu; +Cc: axboe, linux-block, josef, hch, bvanassche

On Sun, Feb 28, 2021 at 09:14:44PM -0500, Yufen Yu wrote:
> We just set hctx->tags->rqs[tag] when get driver tag, but will
> not set hctx->sched_tags->rqs[tag] when get sched tag.
> So, blk_mq_tag_to_rq() always return NULL for sched_tags.

True. blk_mq_tag_to_rq() also seems an awkward API: it needs a
'struct blk_mq_tags *', which is a block layer internal definition.

> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
>  block/blk-mq.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5362a7958b74..424afe112b58 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -846,6 +846,7 @@ static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
>  
>  struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
> +	/* if tags is hctx->sched_tags, it always return NULL */
>  	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>  		prefetch(tags->rqs[tag]);
>  		return tags->rqs[tag];
> @@ -3845,17 +3846,8 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
>  
>  	if (!blk_qc_t_is_internal(cookie))
>  		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
> -	else {
> -		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
> -		/*
> -		 * With scheduling, if the request has completed, we'll
> -		 * get a NULL return here, as we clear the sched tag when
> -		 * that happens. The request still remains valid, like always,
> -		 * so we should be safe with just the NULL check.
> -		 */
> -		if (!rq)
> -			return false;
> -	}
> +	else
> +		return false;
>  

I think the correct fix is to retrieve the request via:

	hctx->sched_tags->static_rqs[blk_qc_t_to_tag(cookie)]

since it is worthwhile to run blk_mq_poll_hybrid_sleep() for a request
that has not started yet when a real scheduler is in use.
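
A minimal userspace sketch of that lookup, under stated assumptions: the
demo_* types, the cookie layout and the bounds check on the scheduler
branch are all hypothetical and only illustrate taking the request from
sched_tags->static_rqs[] for an internal cookie instead of returning
early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct demo_request { int id; };
struct demo_tags {
	unsigned int nr_tags;
	struct demo_request **rqs;        /* set only when a driver tag is taken */
	struct demo_request **static_rqs; /* preallocated, one entry per tag */
};
struct demo_hctx {
	struct demo_tags *tags;
	struct demo_tags *sched_tags;
};

/* Assumed cookie layout for the sketch: top bit marks a scheduler cookie. */
#define DEMO_QC_T_SHIFT    16
#define DEMO_QC_T_INTERNAL (1U << 31)

static bool demo_qc_t_is_internal(unsigned int cookie)
{
	return (cookie & DEMO_QC_T_INTERNAL) != 0;
}

static unsigned int demo_qc_t_to_tag(unsigned int cookie)
{
	return cookie & ((1U << DEMO_QC_T_SHIFT) - 1);
}

static struct demo_request *demo_cookie_to_rq(struct demo_hctx *hctx,
					      unsigned int cookie)
{
	unsigned int tag = demo_qc_t_to_tag(cookie);

	/* Driver tag: rqs[] was filled in when the tag was taken. */
	if (!demo_qc_t_is_internal(cookie))
		return tag < hctx->tags->nr_tags ? hctx->tags->rqs[tag] : NULL;

	/* Scheduler tag: rqs[] is never set, so use the static array. */
	assert(hctx->sched_tags != NULL);
	if (tag >= hctx->sched_tags->nr_tags)
		return NULL;
	return hctx->sched_tags->static_rqs[tag];
}
```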

-- 
Ming



* Re: [RFC PATCH 2/2] blk-mq: blk_mq_tag_to_rq() always return null for sched_tags
  2021-03-01  6:50   ` Ming Lei
@ 2021-03-01  7:33     ` Yufen Yu
  0 siblings, 0 replies; 10+ messages in thread
From: Yufen Yu @ 2021-03-01  7:33 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, josef, hch, bvanassche



On 2021/3/1 14:50, Ming Lei wrote:
> On Sun, Feb 28, 2021 at 09:14:44PM -0500, Yufen Yu wrote:
>> We just set hctx->tags->rqs[tag] when get driver tag, but will
>> not set hctx->sched_tags->rqs[tag] when get sched tag.
>> So, blk_mq_tag_to_rq() always return NULL for sched_tags.
> 
> True, also blk_mq_tag_to_rq() seems an awkward API, and it needs
> 'struct blk_mq_tags *', but which is a block layer internal definition.
> 
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>> ---
>>   block/blk-mq.c | 14 +++-----------
>>   1 file changed, 3 insertions(+), 11 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 5362a7958b74..424afe112b58 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -846,6 +846,7 @@ static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
>>   
>>   struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>>   {
>> +	/* if tags is hctx->sched_tags, it always return NULL */
>>   	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>>   		prefetch(tags->rqs[tag]);
>>   		return tags->rqs[tag];
>> @@ -3845,17 +3846,8 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
>>   
>>   	if (!blk_qc_t_is_internal(cookie))
>>   		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
>> -	else {
>> -		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
>> -		/*
>> -		 * With scheduling, if the request has completed, we'll
>> -		 * get a NULL return here, as we clear the sched tag when
>> -		 * that happens. The request still remains valid, like always,
>> -		 * so we should be safe with just the NULL check.
>> -		 */
>> -		if (!rq)
>> -			return false;
>> -	}
>> +	else
>> +		return false;
>>   
> 
> I think the correct fix is to retrieve the request via:
> 
> 	hctx->sched_tags->static_rqs[blk_qc_t_to_tag(cookie)]
> 
> since it is nice to run blk_mq_poll_hybrid_sleep() for one
> non-started request in case of real scheduler.
> 

Yes, calling blk_mq_poll_hybrid_sleep() is more reasonable here.
I will change it in the next version.

Thanks,
Yufen


* Re: [RFC PATCH 1/2] blk-mq: test tags bitmap before get request
  2021-03-01  3:49   ` Bart Van Assche
@ 2021-03-01  7:54     ` Hannes Reinecke
  2021-03-01 12:20       ` John Garry
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2021-03-01  7:54 UTC (permalink / raw)
  To: Bart Van Assche, Yufen Yu, axboe, linux-block; +Cc: josef, ming.lei, hch

On 3/1/21 4:49 AM, Bart Van Assche wrote:
> On 2/28/21 6:14 PM, Yufen Yu wrote:
>> For now, we set hctx->tags->rqs[i] when get driver tag successfully.
>> The request either comes from sched_tags->static_rqs[] with scheduler,
>> or comes from tags->static_rqs[] with no scheduler. But, the value won't
>> be clear when put driver tag. Thus, tags->rqs[i] still remain old request.
>>
>> We can free these sched_tags->static_rqs[] requests when switch elevator,
>> update nr_requests or update nr_hw_queues. After that, unexpected access
>> of tags->rqs[i] may cause use-after-free crash.
>>
>> For example, we reported use-after-free of request in nbd device
>> by syzkaller:
>>
>> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
>> Call trace:
>>   dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>>   show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>>   __dump_stack lib/dump_stack.c:77 [inline]
>>   dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>>   print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>   kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>>   check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>>   __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>>   __read_once_size include/linux/compiler.h:193 [inline]
>>   blk_mq_rq_state block/blk-mq.h:106 [inline]
>>   blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>>   nbd_read_stat drivers/block/nbd.c:670 [inline]
>>   recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>>   process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>>   worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>>   kthread+0x1d8/0x1e0 kernel/kthread.c:255
>>   ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
>>
>> The syzkaller test program sended a reply package to client
>> without client sending request. After receiving the package,
>> recv_work() try to get the remained request in tags->rqs[]
>> by tag, which have been free.
>>
>> To avoid this type of problem, we may need to ensure the request
>> valid when get it by tag.
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>> ---
>>   block/blk-mq.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index d4d7c1caa439..5362a7958b74 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>>   }
>>   EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>>   
>> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
>> +{
>> +	if (!blk_mq_tag_is_reserved(tags, tag))
>> +		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
>> +	else
>> +		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
>> +}
>> +
>>   struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>>   {
>> -	if (tag < tags->nr_tags) {
>> +	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>>   		prefetch(tags->rqs[tag]);
>>   		return tags->rqs[tag];
>>   	}
> 
> Please do not slow down the hot path by inserting additional code in the
> hot path. I am convinced that the race described in the patch
> description can be fixed without changing the hot path. See also the
> conversation I had recently with John Garry on linux-block.
> 
This seems to be cropping up everywhere now; anyway, I do agree with Bart here.
For the hot path (typically when looking up the associated command from
within the interrupt routine) we really should not add any further code,
so as not to slow down processing.
Additionally, this is typically a firmware response, so we can be
reasonably certain that it is a response to a valid command, and in
nearly all cases the bit will be set.
(Pathological cases like spoofed response frames aside.)

However, there is another use case for blk_mq_tag_to_rq(), namely
traversing outstanding commands, e.g. during a device reset.
There we _have_ to ensure that the request is valid lest we run into
uninitialized values.

So I would advocate having a slow path variant here which would
validate the bitmap before trying to access the request.

Or, really, converting those drivers to use blk_mq_tagset_busy_iter().
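
To make the fast path / slow path split concrete, here is a minimal
userspace sketch. All demo_* names are hypothetical, the allocation
bitmap is modelled as a single 64-bit word, and concurrency is ignored
entirely, so this only shows the shape of such a variant and not a fix
for the race itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct demo_request { int id; };
struct demo_tags {
	unsigned int nr_tags;      /* at most 64 in this model */
	unsigned long long in_use; /* bit i is set while tag i is allocated */
	struct demo_request **rqs; /* may keep stale pointers for free tags */
};

/* Fast path: bounds check only, no extra work per lookup. */
static struct demo_request *demo_tag_to_rq(struct demo_tags *tags,
					   unsigned int tag)
{
	if (tag < tags->nr_tags)
		return tags->rqs[tag];
	return NULL;
}

/* Slow path variant: additionally require that the tag is allocated. */
static struct demo_request *demo_tag_to_rq_checked(struct demo_tags *tags,
						   unsigned int tag)
{
	assert(tags->nr_tags <= 64); /* limit of this one-word model */

	if (tag >= tags->nr_tags)
		return NULL;
	if (!((tags->in_use >> tag) & 1ULL))
		return NULL; /* tag is free, rqs[tag] may be stale */
	return tags->rqs[tag];
}
```

A reset handler walking all tags would call the checked variant, while
the completion path keeps using the unchecked one.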

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [RFC PATCH 1/2] blk-mq: test tags bitmap before get request
  2021-03-01  7:54     ` Hannes Reinecke
@ 2021-03-01 12:20       ` John Garry
  0 siblings, 0 replies; 10+ messages in thread
From: John Garry @ 2021-03-01 12:20 UTC (permalink / raw)
  To: Hannes Reinecke, Bart Van Assche, Yufen Yu, axboe, linux-block
  Cc: josef, ming.lei, hch

On 01/03/2021 07:54, Hannes Reinecke wrote:
>>>
>>> For example, we reported use-after-free of request in nbd device
>>> by syzkaller:
>>>
>>> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 
>>> block/blk-mq.c:644
>>> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
>>> Call trace:
>>>   dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>>>   show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>>>   __dump_stack lib/dump_stack.c:77 [inline]
>>>   dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>>>   print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>>   kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>>>   check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>>>   __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>>>   __read_once_size include/linux/compiler.h:193 [inline]
>>>   blk_mq_rq_state block/blk-mq.h:106 [inline]
>>>   blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>>>   nbd_read_stat drivers/block/nbd.c:670 [inline]
>>>   recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>>>   process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>>>   worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>>>   kthread+0x1d8/0x1e0 kernel/kthread.c:255
>>>   ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
>>>
>>> The syzkaller test program sended a reply package to client
>>> without client sending request. After receiving the package,
>>> recv_work() try to get the remained request in tags->rqs[]
>>> by tag, which have been free.
>>>
>>> To avoid this type of problem, we may need to ensure the request
>>> valid when get it by tag.
>>>
>>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>>> ---
>>>   block/blk-mq.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index d4d7c1caa439..5362a7958b74 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct 
>>> request_queue *q,
>>>   }
>>>   EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>>> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned 
>>> int tag)
>>> +{
>>> +    if (!blk_mq_tag_is_reserved(tags, tag))
>>> +        return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
>>> +    else
>>> +        return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
>>> +}
>>> +
>>>   struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned 
>>> int tag)
>>>   {
>>> -    if (tag < tags->nr_tags) {
>>> +    if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>>>           prefetch(tags->rqs[tag]);
>>>           return tags->rqs[tag];
>>>       }
>>
>> Please do not slow down the hot path by inserting additional code in the
>> hot path. I am convinced that the race described in the patch
>> description can be fixed without changing the hot path. See also the
>> conversation I had recently with John Garry on linux-block.
>>

I plan to send an updated series later this week:
https://lore.kernel.org/linux-block/28be6446-7e06-b03c-a373-39c5eef89c8a@huawei.com/

I'm inclined to say that the first patch there may solve all reasonable 
scenarios, but we should see how to fix all scenarios and decide on the 
path forward.

Thanks,
john

> Seems to be cropping up everywhere now; anyway, I do agree with Bart here.
> For the hot path (typically when looking up the associated command from 
> within the interrupt routine) we really should not add any further code 
> to not slow down processing.
> Additionally, this is typically a firmware response so we can be 
> reasonably certain that this is a response to valid command, so in 
> nearly all cases the bit will be set.
> (Pathological cases like spoofed response frames aside).
> 
> However, there another use case where blk_mq_tag_to_rq() is used, and 
> that is for traversing outstanding commands eg during a device reset.
> There we _have_ to ensure that the request is valid lest we run into
> uninitialized values.
> 
> So I would advocate to have a slow path variant here which would 
> validate the bitmap before trying to access the request.
> 
> Or, really, converting those drivers to use blk_mq_tagset_busy_iter().

