From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEF99C48BCD for ; Wed, 9 Jun 2021 11:49:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8ECEA61108 for ; Wed, 9 Jun 2021 11:49:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239560AbhFILvE (ORCPT ); Wed, 9 Jun 2021 07:51:04 -0400 Received: from frasgout.his.huawei.com ([185.176.79.56]:3185 "EHLO frasgout.his.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239639AbhFILum (ORCPT ); Wed, 9 Jun 2021 07:50:42 -0400 Received: from fraeml743-chm.china.huawei.com (unknown [172.18.147.226]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4G0Q676PQNz6J9VY; Wed, 9 Jun 2021 19:35:59 +0800 (CST) Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by fraeml743-chm.china.huawei.com (10.206.15.224) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.2; Wed, 9 Jun 2021 13:48:46 +0200 Received: from [10.47.80.201] (10.47.80.201) by lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.2; Wed, 9 Jun 2021 12:48:45 +0100 Subject: Re: [PATCH] blk-mq: fix use-after-free in blk_mq_exit_sched To: Ming Lei CC: Jens Axboe , Christoph Hellwig , , References: <20210609063046.122843-1-ming.lei@redhat.com> From: John Garry Message-ID: <81c38feb-9d3e-7adc-57e6-54bccf0d3142@huawei.com> Date: Wed, 9 Jun 2021 12:42:52 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.1.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.47.80.201] X-ClientProxiedBy: lhreml749-chm.china.huawei.com (10.201.108.199) To lhreml724-chm.china.huawei.com (10.201.108.75) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On 09/06/2021 11:16, Ming Lei wrote: > On Wed, Jun 09, 2021 at 09:59:43AM +0100, John Garry wrote: >> On 09/06/2021 07:30, Ming Lei wrote: >> >> Thanks for the fix >> >>> tagset can't be used after blk_cleanup_queue() is returned because >>> freeing tagset usually follows blk_clenup_queue(). Commit d97e594c5166 >>> ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") adds >>> check on q->tag_set->flags in blk_mq_exit_sched(), and causes >>> use-after-free. >>> >>> Fixes it by using hctx->flags. >>> >> >> The tagset is a member of the Scsi_Host structure. So it is true that this >> memory may be freed before the request_queue is exited? > > Yeah, please see commit c3e2219216c9 ("block: free sched's request pool in > blk_cleanup_queue") JFYI, I could recreate with the following simple steps: root@(none)$ mount /dev/sda1 mnt [ 27.252887] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck. _hw/unbind)$ echo HISI0162:01 > ./sys/bus/platform/drivers/hisi_sas_v2 [ 31.262274] sas: ex 500e004aaaaaaa1f phys DID NOT change [ 31.270314] sas: ex 500e004aaaaaaa1f phys DID NOT change [ 31.278262] sas: ex 500e004aaaaaaa1f phys DID NOT change [ 31.286245] sas: ex 500e004aaaaaaa1f phys DID NOT change [ 31.294164] sas: ex 500e004aaaaaaa1f phys DID NOT change [ 31.302143] sas: ex 500e004aaaaaaa1f phys DID NOT change [ 31.310097] sas: ex 500e004aaaaaaa1f phys DID NOT change [ 31.321599] hisi_sas_v2_hw HISI0162:01: dev[9:1] is gone [ 31.429245] hisi_sas_v2_hw HISI0162:01: dev[8:1] is gone [ 31.533461] hisi_sas_v2_hw HISI0162:01: dev[7:1] is gone [ 31.637338] hisi_sas_v2_hw HISI0162:01: dev[6:1] is gone [ 31.740840] hisi_sas_v2_hw HISI0162:01: dev[5:1] is gone [ 31.750659] sd 0:0:3:0: [sdd] Synchronizing SCSI cache [ 31.833500] hisi_sas_v2_hw HISI0162:01: dev[4:1] is gone [ 31.937351] hisi_sas_v2_hw HISI0162:01: dev[3:1] is gone [ 31.947749] sd 0:0:1:0: [sdb] Synchronizing SCSI cache [ 31.953195] sd 0:0:1:0: [sdb] Stopping disk [ 32.690815] hisi_sas_v2_hw HISI0162:01: dev[2:5] is gone [ 32.771526] hisi_sas_v2_hw HISI0162:01: dev[1:1] is gone [ 32.790406] hisi_sas_v2_hw HISI0162:01: dev[0:2] is gone root@(none)$ root@(none)$ root@(none)$ umount mnt [ 37.323039] ================================================================== [ 37.330262] BUG: KASAN: use-after-free in blk_mq_exit_sched+0x110/0x1c8 [ 37.336880] Read of size 4 at addr ffff001051e80100 by task umount/547 [ 37.343401] [ 37.344884] CPU: 4 PID: 547 Comm: umount Not tainted 5.13.0-rc5-next-20210608 #80 [ 37.352362] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018 [ 37.361486] Call trace: [ 37.363924] dump_backtrace+0x0/0x2d0 [ 37.367586] show_stack+0x18/0x28 [ 37.370898] dump_stack_lvl+0xfc/0x138 [ 37.374643] print_address_description.constprop.13+0x78/0x314 [ 37.380472] kasan_report+0x1e0/0x248 [ 37.384131] __asan_load4+0x9c/0xd8 [ 37.387615] blk_mq_exit_sched+0x110/0x1c8 [ 37.391706] __elevator_exit+0x34/0x58 [ 37.395451] blk_release_queue+0x108/0x1d8 [ 37.399545] kobject_put+0xa8/0x180 [ 37.403029] blk_put_queue+0x14/0x20 [ 37.406601] disk_release+0xcc/0x100 [ 37.410171] device_release+0x94/0x110 [ 37.413918] kobject_put+0xa8/0x180 [ 37.417401] put_device+0x14/0x28 [ 37.420712] put_disk+0x2c/0x40 [ 37.423848] blkdev_put_no_open+0x54/0x78 [ 37.427853] blkdev_put+0x108/0x258 [ 37.431335] kill_block_super+0x5c/0x78 [ 37.435166] deactivate_locked_super+0x6c/0xd0 [ 37.439605] deactivate_super+0x8c/0xa8 [ 37.443435] cleanup_mnt+0x110/0x1c0 [ 37.447007] __cleanup_mnt+0x14/0x20 [ 37.450578] task_work_run+0xbc/0x1a8 [ 37.454236] do_notify_resume+0x2cc/0x590 [ 37.458242] work_pending+0xc/0x3c8 [ 37.461725] [ 37.463207] The buggy address belongs to the page: [ 37.467990] page:(____ptrval____) refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1051e80 [ 37.477724] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff) [ 37.484164] raw: 0bfffc0000000000 fffffc00415a9008 ffff0017fbffebb0 0000000000000000 [ 37.491900] raw: 0000000000000000 0000000000000006 00000000ffffff7f 0000000000000000 [ 37.499635] page dumped because: kasan: bad access detected [ 37.505198] [ 37.506680] Memory state around the buggy address: [ 37.511463] ffff001051e80000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 37.518677] ffff001051e80080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 37.525891] >ffff001051e80100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 37.533104]^ [ 37.536324] ffff001051e80180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 37.543538] ffff001051e80200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 37.550751] ================================================================== [ 37.557963] Disabling lock debugging due to kernel taint root@(none)$ root@(none)$ And this patch fixes it: Tested-by: John Garry > >> >>> Reported-by: syzbot+77ba3d171a25c56756ea@syzkaller.appspotmail.com >>> Fixes: d97e594c5166 ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") >>> Cc: John Garry >>> Signed-off-by: Ming Lei >>> --- >>> block/blk-mq-sched.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c >>> index a9182d2f8ad3..80273245d11a 100644 >>> --- a/block/blk-mq-sched.c >>> +++ b/block/blk-mq-sched.c >>> @@ -680,6 +680,7 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e) >>> { >>> struct blk_mq_hw_ctx *hctx; >>> unsigned int i; >>> + unsigned int flags = 0; >>> queue_for_each_hw_ctx(q, hctx, i) { >>> blk_mq_debugfs_unregister_sched_hctx(hctx); >>> @@ -687,12 +688,13 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e) >>> e->type->ops.exit_hctx(hctx, i); >>> hctx->sched_data = NULL; >>> } >>> + flags = hctx->flags; >> >> I know the choice is limited, but it is unfortunate that we must set flags >> in a loop > > Does it matter? It's just a nit on the coding style: it's not an especially good practice to set the same value in a loop. But, as I said, choice is limited. > >> >>> } >>> blk_mq_debugfs_unregister_sched(q); >>> if (e->type->ops.exit_sched) >>> e->type->ops.exit_sched(e); >>> blk_mq_sched_tags_teardown(q); >>> - if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) >>> + if (blk_mq_is_sbitmap_shared(flags)) >>> blk_mq_exit_sched_shared_sbitmap(q); >> >> this is >> >> blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue) >> { >> sbitmap_queue_free(&queue->sched_bitmap_tags); >> .. >> } >> >> And isn't it safe to call sbitmap_queue_free() when >> sbitmap_queue_init_node() has not been called? >> >> I'm just wondering if we can always call blk_mq_exit_sched_shared_sbitmap()? >> I know it's not an ideal choice either. > > So far it may work, not sure if it can in future, I suggest to follow > the traditional alloc & free pattern. > > Fine Thanks, John