All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	<linux-block@vger.kernel.org>,
	<syzbot+77ba3d171a25c56756ea@syzkaller.appspotmail.com>
Subject: Re: [PATCH] blk-mq: fix use-after-free in blk_mq_exit_sched
Date: Wed, 9 Jun 2021 12:42:52 +0100	[thread overview]
Message-ID: <81c38feb-9d3e-7adc-57e6-54bccf0d3142@huawei.com> (raw)
In-Reply-To: <YMCU+iuHs4ULN0lb@T590>

On 09/06/2021 11:16, Ming Lei wrote:
> On Wed, Jun 09, 2021 at 09:59:43AM +0100, John Garry wrote:
>> On 09/06/2021 07:30, Ming Lei wrote:
>>
>> Thanks for the fix
>>
>>> tagset can't be used after blk_cleanup_queue() is returned because
>>> freeing tagset usually follows blk_clenup_queue(). Commit d97e594c5166
>>> ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") adds
>>> check on q->tag_set->flags in blk_mq_exit_sched(), and causes
>>> use-after-free.
>>>
>>> Fixes it by using hctx->flags.
>>>
>>
>> The tagset is a member of the Scsi_Host structure. So it is true that this
>> memory may be freed before the request_queue is exited?
> 
> Yeah, please see commit c3e2219216c9 ("block: free sched's request pool in
> blk_cleanup_queue")

JFYI, I could recreate with the following simple steps:

root@(none)$ mount /dev/sda1 mnt
[   27.252887] FAT-fs (sda1): Volume was not properly unmounted. Some 
data may be corrupt. Please run fsck.
_hw/unbind)$ echo HISI0162:01 > ./sys/bus/platform/drivers/hisi_sas_v2
[   31.262274] sas: ex 500e004aaaaaaa1f phys DID NOT change
[   31.270314] sas: ex 500e004aaaaaaa1f phys DID NOT change
[   31.278262] sas: ex 500e004aaaaaaa1f phys DID NOT change
[   31.286245] sas: ex 500e004aaaaaaa1f phys DID NOT change
[   31.294164] sas: ex 500e004aaaaaaa1f phys DID NOT change
[   31.302143] sas: ex 500e004aaaaaaa1f phys DID NOT change
[   31.310097] sas: ex 500e004aaaaaaa1f phys DID NOT change
[   31.321599] hisi_sas_v2_hw HISI0162:01: dev[9:1] is gone
[   31.429245] hisi_sas_v2_hw HISI0162:01: dev[8:1] is gone
[   31.533461] hisi_sas_v2_hw HISI0162:01: dev[7:1] is gone
[   31.637338] hisi_sas_v2_hw HISI0162:01: dev[6:1] is gone
[   31.740840] hisi_sas_v2_hw HISI0162:01: dev[5:1] is gone
[   31.750659] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
[   31.833500] hisi_sas_v2_hw HISI0162:01: dev[4:1] is gone
[   31.937351] hisi_sas_v2_hw HISI0162:01: dev[3:1] is gone
[   31.947749] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[   31.953195] sd 0:0:1:0: [sdb] Stopping disk

[   32.690815] hisi_sas_v2_hw HISI0162:01: dev[2:5] is gone
[   32.771526] hisi_sas_v2_hw HISI0162:01: dev[1:1] is gone
[   32.790406] hisi_sas_v2_hw HISI0162:01: dev[0:2] is gone

root@(none)$
root@(none)$
root@(none)$ umount mnt
[   37.323039] 
==================================================================
[   37.330262] BUG: KASAN: use-after-free in blk_mq_exit_sched+0x110/0x1c8
[   37.336880] Read of size 4 at addr ffff001051e80100 by task umount/547
[   37.343401]
[   37.344884] CPU: 4 PID: 547 Comm: umount Not tainted 
5.13.0-rc5-next-20210608 #80
[   37.352362] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018
[   37.361486] Call trace:
[   37.363924]  dump_backtrace+0x0/0x2d0
[   37.367586]  show_stack+0x18/0x28
[   37.370898]  dump_stack_lvl+0xfc/0x138
[   37.374643]  print_address_description.constprop.13+0x78/0x314
[   37.380472]  kasan_report+0x1e0/0x248
[   37.384131]  __asan_load4+0x9c/0xd8
[   37.387615]  blk_mq_exit_sched+0x110/0x1c8
[   37.391706]  __elevator_exit+0x34/0x58
[   37.395451]  blk_release_queue+0x108/0x1d8
[   37.399545]  kobject_put+0xa8/0x180
[   37.403029]  blk_put_queue+0x14/0x20
[   37.406601]  disk_release+0xcc/0x100
[   37.410171]  device_release+0x94/0x110
[   37.413918]  kobject_put+0xa8/0x180
[   37.417401]  put_device+0x14/0x28
[   37.420712]  put_disk+0x2c/0x40
[   37.423848]  blkdev_put_no_open+0x54/0x78
[   37.427853]  blkdev_put+0x108/0x258
[   37.431335]  kill_block_super+0x5c/0x78
[   37.435166]  deactivate_locked_super+0x6c/0xd0
[   37.439605]  deactivate_super+0x8c/0xa8
[   37.443435]  cleanup_mnt+0x110/0x1c0
[   37.447007]  __cleanup_mnt+0x14/0x20
[   37.450578]  task_work_run+0xbc/0x1a8
[   37.454236]  do_notify_resume+0x2cc/0x590
[   37.458242]  work_pending+0xc/0x3c8
[   37.461725]
[   37.463207] The buggy address belongs to the page:
[   37.467990] page:(____ptrval____) refcount:0 mapcount:-128 
mapping:0000000000000000 index:0x0 pfn:0x1051e80
[   37.477724] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff)
[   37.484164] raw: 0bfffc0000000000 fffffc00415a9008 ffff0017fbffebb0 
0000000000000000
[   37.491900] raw: 0000000000000000 0000000000000006 00000000ffffff7f 
0000000000000000
[   37.499635] page dumped because: kasan: bad access detected
[   37.505198]
[   37.506680] Memory state around the buggy address:
[   37.511463]  ffff001051e80000: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
[   37.518677]  ffff001051e80080: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
[   37.525891] >ffff001051e80100: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
[   37.533104]^
[   37.536324]  ffff001051e80180: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
[   37.543538]  ffff001051e80200: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
[   37.550751] 
==================================================================
[   37.557963] Disabling lock debugging due to kernel taint
root@(none)$
root@(none)$

And this patch fixes it:
Tested-by: John Garry <john.garry@huawei.com>

> 
>>
>>> Reported-by: syzbot+77ba3d171a25c56756ea@syzkaller.appspotmail.com
>>> Fixes: d97e594c5166 ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap")
>>> Cc: John Garry <john.garry@huawei.com>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>    block/blk-mq-sched.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> index a9182d2f8ad3..80273245d11a 100644
>>> --- a/block/blk-mq-sched.c
>>> +++ b/block/blk-mq-sched.c
>>> @@ -680,6 +680,7 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
>>>    {
>>>    	struct blk_mq_hw_ctx *hctx;
>>>    	unsigned int i;
>>> +	unsigned int flags = 0;
>>>    	queue_for_each_hw_ctx(q, hctx, i) {
>>>    		blk_mq_debugfs_unregister_sched_hctx(hctx);
>>> @@ -687,12 +688,13 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
>>>    			e->type->ops.exit_hctx(hctx, i);
>>>    			hctx->sched_data = NULL;
>>>    		}
>>> +		flags = hctx->flags;
>>
>> I know the choice is limited, but it is unfortunate that we must set flags
>> in a loop
> 
> Does it matter?

It's just a nit on the coding style: it's not an especially good 
practice to set the same value in a loop.

But, as I said, choice is limited.

> 
>>
>>>    	}
>>>    	blk_mq_debugfs_unregister_sched(q);
>>>    	if (e->type->ops.exit_sched)
>>>    		e->type->ops.exit_sched(e);
>>>    	blk_mq_sched_tags_teardown(q);
>>> -	if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
>>> +	if (blk_mq_is_sbitmap_shared(flags))
>>>    		blk_mq_exit_sched_shared_sbitmap(q);
>>
>> this is
>>
>> blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
>> {
>> 	sbitmap_queue_free(&queue->sched_bitmap_tags);
>> 	..
>> }
>>
>> And isn't it safe to call sbitmap_queue_free() when
>> sbitmap_queue_init_node() has not been called?
>>
>> I'm just wondering if we can always call blk_mq_exit_sched_shared_sbitmap()?
>> I know it's not an ideal choice either.
> 
> So far it may work, not sure if it can in future, I suggest to follow
> the traditional alloc & free pattern.
> 
> 

Fine

Thanks,
John

  reply	other threads:[~2021-06-09 11:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09  6:30 [PATCH] blk-mq: fix use-after-free in blk_mq_exit_sched Ming Lei
2021-06-09  8:59 ` John Garry
2021-06-09 10:16   ` Ming Lei
2021-06-09 11:42     ` John Garry [this message]
2021-06-14 22:07 ` Ming Lei
2021-06-15 10:02   ` John Garry
2021-06-18  6:03     ` Ming Lei
2021-06-18 14:50 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=81c38feb-9d3e-7adc-57e6-54bccf0d3142@huawei.com \
    --to=john.garry@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=syzbot+77ba3d171a25c56756ea@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.