linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg()
@ 2022-11-08 10:34 Yu Kuai
  2022-11-08 12:48 ` Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Yu Kuai @ 2022-11-08 10:34 UTC (permalink / raw)
  To: jack, tj, josef, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Out test found a following problem in kernel 5.10, and the same problem
should exist in mainline:

BUG: kernel NULL pointer dereference, address: 0000000000000094
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 7 PID: 155 Comm: kworker/7:1 Not tainted 5.10.0-01932-g19e0ace2ca1d-dirty 4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4
Workqueue: kthrotld blk_throtl_dispatch_work_fn
RIP: 0010:bfq_bio_bfqg+0x52/0xc0
Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 bfq_bic_update_cgroup+0x3c/0x350
 ? ioc_create_icq+0x42/0x270
 bfq_init_rq+0xfd/0x1060
 bfq_insert_requests+0x20f/0x1cc0
 ? ioc_create_icq+0x122/0x270
 blk_mq_sched_insert_requests+0x86/0x1d0
 blk_mq_flush_plug_list+0x193/0x2a0
 blk_flush_plug_list+0x127/0x170
 blk_finish_plug+0x31/0x50
 blk_throtl_dispatch_work_fn+0x151/0x190
 process_one_work+0x27c/0x5f0
 worker_thread+0x28b/0x6b0
 ? rescuer_thread+0x590/0x590
 kthread+0x153/0x1b0
 ? kthread_flush_work+0x170/0x170
 ret_from_fork+0x1f/0x30
Modules linked in:
CR2: 0000000000000094
---[ end trace e2e59ac014314547 ]---
RIP: 0010:bfq_bio_bfqg+0x52/0xc0
Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Root cause is quite complex:

1) use bfq elevator for the test device.
2) create a cgroup CG
3) config blk throtl in CG

   blkg_conf_prep
    blkg_create

4) create a thread T1 and issue async io in CG:

   bio_init
    bio_associate_blkg
   ...
   submit_bio
    submit_bio_noacct
     blk_throtl_bio -> io is throttled
     // io submit is done

5) switch elevator:

   bfq_exit_queue
    blkcg_deactivate_policy
     list_for_each_entry(blkg, &q->blkg_list, q_node)
      blkg->pd[] = NULL
      // bfq policy is removed

5) thread t1 exist, then remove the cgroup CG:

   blkcg_unpin_online
    blkcg_destroy_blkgs
     blkg_destroy
      list_del_init(&blkg->q_node)
      // blkg is removed from queue list

6) switch elevator back to bfq

 bfq_init_queue
  bfq_create_group_hierarchy
   blkcg_activate_policy
    list_for_each_entry_reverse(blkg, &q->blkg_list)
     // blkg is removed from list, hence bfq policy is still NULL

7) throttled io is dispatched to bfq:

 bfq_insert_requests
  bfq_init_rq
   bfq_bic_update_cgroup
    bfq_bio_bfqg
     bfqg = blkg_to_bfqg(blkg)
     // bfqg is NULL because bfq policy is NULL

The problem is only possible in bfq because only bfq can be deactivated and
activated while queue is online, while others can only be deactivated while
the device is removed.

Fix the problem in bfq by checking if blkg is online before calling
blkg_to_bfqg().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-cgroup.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 144bca006463..7d624a3a3f0f 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -610,6 +610,10 @@ struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
 	struct bfq_group *bfqg;
 
 	while (blkg) {
+		if (!blkg->online) {
+			blkg = blkg->parent;
+			continue;
+		}
 		bfqg = blkg_to_bfqg(blkg);
 		if (bfqg->online) {
 			bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg()
  2022-11-08 10:34 [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg() Yu Kuai
@ 2022-11-08 12:48 ` Jan Kara
  2022-11-08 13:15   ` Yu Kuai
  2022-11-08 14:13 ` Jens Axboe
  2022-11-08 22:50 ` Chaitanya Kulkarni
  2 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2022-11-08 12:48 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, tj, josef, axboe, paolo.valente, cgroups, linux-block,
	linux-kernel, yukuai3, yi.zhang

On Tue 08-11-22 18:34:34, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Out test found a following problem in kernel 5.10, and the same problem
> should exist in mainline:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000094
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> CPU: 7 PID: 155 Comm: kworker/7:1 Not tainted 5.10.0-01932-g19e0ace2ca1d-dirty 4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4
> Workqueue: kthrotld blk_throtl_dispatch_work_fn
> RIP: 0010:bfq_bio_bfqg+0x52/0xc0
> Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
> RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
> RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
> RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
> RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
> R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
> R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  bfq_bic_update_cgroup+0x3c/0x350
>  ? ioc_create_icq+0x42/0x270
>  bfq_init_rq+0xfd/0x1060
>  bfq_insert_requests+0x20f/0x1cc0
>  ? ioc_create_icq+0x122/0x270
>  blk_mq_sched_insert_requests+0x86/0x1d0
>  blk_mq_flush_plug_list+0x193/0x2a0
>  blk_flush_plug_list+0x127/0x170
>  blk_finish_plug+0x31/0x50
>  blk_throtl_dispatch_work_fn+0x151/0x190
>  process_one_work+0x27c/0x5f0
>  worker_thread+0x28b/0x6b0
>  ? rescuer_thread+0x590/0x590
>  kthread+0x153/0x1b0
>  ? kthread_flush_work+0x170/0x170
>  ret_from_fork+0x1f/0x30
> Modules linked in:
> CR2: 0000000000000094
> ---[ end trace e2e59ac014314547 ]---
> RIP: 0010:bfq_bio_bfqg+0x52/0xc0
> Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
> RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
> RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
> RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
> RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
> R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
> R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> Root cause is quite complex:
> 
> 1) use bfq elevator for the test device.
> 2) create a cgroup CG
> 3) config blk throtl in CG
> 
>    blkg_conf_prep
>     blkg_create
> 
> 4) create a thread T1 and issue async io in CG:
> 
>    bio_init
>     bio_associate_blkg
>    ...
>    submit_bio
>     submit_bio_noacct
>      blk_throtl_bio -> io is throttled
>      // io submit is done
> 
> 5) switch elevator:
> 
>    bfq_exit_queue
>     blkcg_deactivate_policy
>      list_for_each_entry(blkg, &q->blkg_list, q_node)
>       blkg->pd[] = NULL
>       // bfq policy is removed
> 
> 5) thread t1 exist, then remove the cgroup CG:
> 
>    blkcg_unpin_online
>     blkcg_destroy_blkgs
>      blkg_destroy
>       list_del_init(&blkg->q_node)
>       // blkg is removed from queue list
> 
> 6) switch elevator back to bfq
> 
>  bfq_init_queue
>   bfq_create_group_hierarchy
>    blkcg_activate_policy
>     list_for_each_entry_reverse(blkg, &q->blkg_list)
>      // blkg is removed from list, hence bfq policy is still NULL
> 
> 7) throttled io is dispatched to bfq:
> 
>  bfq_insert_requests
>   bfq_init_rq
>    bfq_bic_update_cgroup
>     bfq_bio_bfqg
>      bfqg = blkg_to_bfqg(blkg)
>      // bfqg is NULL because bfq policy is NULL
> 
> The problem is only possible in bfq because only bfq can be deactivated and
> activated while queue is online, while others can only be deactivated while
> the device is removed.
> 
> Fix the problem in bfq by checking if blkg is online before calling
> blkg_to_bfqg().
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/bfq-cgroup.c | 4 ++++
>  1 file changed, 4 insertions(+)

Hum, that is indeed contrieved ;). Your fixup makes sense so feel free to
add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 144bca006463..7d624a3a3f0f 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -610,6 +610,10 @@ struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
>  	struct bfq_group *bfqg;
>  
>  	while (blkg) {
> +		if (!blkg->online) {
> +			blkg = blkg->parent;
> +			continue;
> +		}
>  		bfqg = blkg_to_bfqg(blkg);
>  		if (bfqg->online) {
>  			bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg()
  2022-11-08 12:48 ` Jan Kara
@ 2022-11-08 13:15   ` Yu Kuai
  0 siblings, 0 replies; 6+ messages in thread
From: Yu Kuai @ 2022-11-08 13:15 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: tj, josef, axboe, paolo.valente, cgroups, linux-block,
	linux-kernel, yi.zhang, yukuai (C)

Hi!

在 2022/11/08 20:48, Jan Kara 写道:
> 
> Hum, that is indeed contrieved ;). Your fixup makes sense so feel free to
> add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks for the review, our test do many unusual operations together...

Thanks,
Kuai
> 
> 								Honza
> 
>>
>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>> index 144bca006463..7d624a3a3f0f 100644
>> --- a/block/bfq-cgroup.c
>> +++ b/block/bfq-cgroup.c
>> @@ -610,6 +610,10 @@ struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
>>   	struct bfq_group *bfqg;
>>   
>>   	while (blkg) {
>> +		if (!blkg->online) {
>> +			blkg = blkg->parent;
>> +			continue;
>> +		}
>>   		bfqg = blkg_to_bfqg(blkg);
>>   		if (bfqg->online) {
>>   			bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
>> -- 
>> 2.31.1
>>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg()
  2022-11-08 10:34 [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg() Yu Kuai
  2022-11-08 12:48 ` Jan Kara
@ 2022-11-08 14:13 ` Jens Axboe
  2022-11-08 22:50 ` Chaitanya Kulkarni
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2022-11-08 14:13 UTC (permalink / raw)
  To: tj, Yu Kuai, josef, paolo.valente, jack
  Cc: linux-block, yukuai3, yi.zhang, linux-kernel, cgroups

On Tue, 8 Nov 2022 18:34:34 +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Out test found a following problem in kernel 5.10, and the same problem
> should exist in mainline:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000094
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> CPU: 7 PID: 155 Comm: kworker/7:1 Not tainted 5.10.0-01932-g19e0ace2ca1d-dirty 4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4
> Workqueue: kthrotld blk_throtl_dispatch_work_fn
> RIP: 0010:bfq_bio_bfqg+0x52/0xc0
> Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
> RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
> RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
> RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
> RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
> R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
> R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  bfq_bic_update_cgroup+0x3c/0x350
>  ? ioc_create_icq+0x42/0x270
>  bfq_init_rq+0xfd/0x1060
>  bfq_insert_requests+0x20f/0x1cc0
>  ? ioc_create_icq+0x122/0x270
>  blk_mq_sched_insert_requests+0x86/0x1d0
>  blk_mq_flush_plug_list+0x193/0x2a0
>  blk_flush_plug_list+0x127/0x170
>  blk_finish_plug+0x31/0x50
>  blk_throtl_dispatch_work_fn+0x151/0x190
>  process_one_work+0x27c/0x5f0
>  worker_thread+0x28b/0x6b0
>  ? rescuer_thread+0x590/0x590
>  kthread+0x153/0x1b0
>  ? kthread_flush_work+0x170/0x170
>  ret_from_fork+0x1f/0x30
> Modules linked in:
> CR2: 0000000000000094
> ---[ end trace e2e59ac014314547 ]---
> RIP: 0010:bfq_bio_bfqg+0x52/0xc0
> Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
> RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
> RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
> RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
> RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
> R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
> R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> [...]

Applied, thanks!

[1/1] block, bfq: fix null pointer dereference in bfq_bio_bfqg()
      commit: f02be9002c480cd3ec0fcf184ad27cf531bd6ece

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg()
  2022-11-08 10:34 [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg() Yu Kuai
  2022-11-08 12:48 ` Jan Kara
  2022-11-08 14:13 ` Jens Axboe
@ 2022-11-08 22:50 ` Chaitanya Kulkarni
  2022-11-10  3:30   ` Yu Kuai
  2 siblings, 1 reply; 6+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-08 22:50 UTC (permalink / raw)
  To: Yu Kuai, jack, tj, josef, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

On 11/8/22 02:34, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Out test found a following problem in kernel 5.10, and the same problem
> should exist in mainline:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000094
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> CPU: 7 PID: 155 Comm: kworker/7:1 Not tainted 5.10.0-01932-g19e0ace2ca1d-dirty 4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4
> Workqueue: kthrotld blk_throtl_dispatch_work_fn
> RIP: 0010:bfq_bio_bfqg+0x52/0xc0
> Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
> RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
> RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
> RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
> RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
> R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
> R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   bfq_bic_update_cgroup+0x3c/0x350
>   ? ioc_create_icq+0x42/0x270
>   bfq_init_rq+0xfd/0x1060
>   bfq_insert_requests+0x20f/0x1cc0
>   ? ioc_create_icq+0x122/0x270
>   blk_mq_sched_insert_requests+0x86/0x1d0
>   blk_mq_flush_plug_list+0x193/0x2a0
>   blk_flush_plug_list+0x127/0x170
>   blk_finish_plug+0x31/0x50
>   blk_throtl_dispatch_work_fn+0x151/0x190
>   process_one_work+0x27c/0x5f0
>   worker_thread+0x28b/0x6b0
>   ? rescuer_thread+0x590/0x590
>   kthread+0x153/0x1b0
>   ? kthread_flush_work+0x170/0x170
>   ret_from_fork+0x1f/0x30
> Modules linked in:
> CR2: 0000000000000094
> ---[ end trace e2e59ac014314547 ]---
> RIP: 0010:bfq_bio_bfqg+0x52/0xc0
> Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b
> RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002
> RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000
> RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18
> RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10
> R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000
> R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> Root cause is quite complex:
> 
> 1) use bfq elevator for the test device.
> 2) create a cgroup CG
> 3) config blk throtl in CG
> 
>     blkg_conf_prep
>      blkg_create
> 
> 4) create a thread T1 and issue async io in CG:
> 
>     bio_init
>      bio_associate_blkg
>     ...
>     submit_bio
>      submit_bio_noacct
>       blk_throtl_bio -> io is throttled
>       // io submit is done
> 
> 5) switch elevator:
> 
>     bfq_exit_queue
>      blkcg_deactivate_policy
>       list_for_each_entry(blkg, &q->blkg_list, q_node)
>        blkg->pd[] = NULL
>        // bfq policy is removed
> 
> 5) thread t1 exist, then remove the cgroup CG:
> 
>     blkcg_unpin_online
>      blkcg_destroy_blkgs
>       blkg_destroy
>        list_del_init(&blkg->q_node)
>        // blkg is removed from queue list
> 
> 6) switch elevator back to bfq
> 
>   bfq_init_queue
>    bfq_create_group_hierarchy
>     blkcg_activate_policy
>      list_for_each_entry_reverse(blkg, &q->blkg_list)
>       // blkg is removed from list, hence bfq policy is still NULL
> 
> 7) throttled io is dispatched to bfq:
> 
>   bfq_insert_requests
>    bfq_init_rq
>     bfq_bic_update_cgroup
>      bfq_bio_bfqg
>       bfqg = blkg_to_bfqg(blkg)
>       // bfqg is NULL because bfq policy is NULL
> 
> The problem is only possible in bfq because only bfq can be deactivated and
> activated while queue is online, while others can only be deactivated while
> the device is removed.
> 
> Fix the problem in bfq by checking if blkg is online before calling
> blkg_to_bfqg().
> 


Please submit the block tests for this as this is clearly non-trivial
issue that needs to be tested every release ..

-ck



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg()
  2022-11-08 22:50 ` Chaitanya Kulkarni
@ 2022-11-10  3:30   ` Yu Kuai
  0 siblings, 0 replies; 6+ messages in thread
From: Yu Kuai @ 2022-11-10  3:30 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Yu Kuai, jack, tj, josef, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yukuai (C)

Hi,

在 2022/11/09 6:50, Chaitanya Kulkarni 写道:
> 
> 
> Please submit the block tests for this as this is clearly non-trivial
> issue that needs to be tested every release ..

I wrote a c reporducer that is 100% reproducible in v5.10, I'll try to
add a block test. However, it might take sometime because I'm not
familar with blktests yet...

Thanks,
Kuai
> 
> -ck
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-10  3:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-08 10:34 [PATCH] block, bfq: fix null pointer dereference in bfq_bio_bfqg() Yu Kuai
2022-11-08 12:48 ` Jan Kara
2022-11-08 13:15   ` Yu Kuai
2022-11-08 14:13 ` Jens Axboe
2022-11-08 22:50 ` Chaitanya Kulkarni
2022-11-10  3:30   ` Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).