From: John Garry <john.garry@huawei.com>
To: Bart Van Assche <bvanassche@acm.org>, <axboe@kernel.dk>,
<ming.lei@redhat.com>
Cc: <linux-block@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<hch@lst.de>, <hare@suse.de>, <kashyap.desai@broadcom.com>,
<linuxarm@huawei.com>
Subject: Re: [RFC PATCH v2 2/2] blk-mq: Lockout tagset iter when freeing rqs
Date: Mon, 4 Jan 2021 15:33:41 +0000
Message-ID: <d22efcd3-274a-15c5-9e4a-248037789c4d@huawei.com>
In-Reply-To: <0ab85ab8-c5c7-01aa-6b39-da731b3db829@acm.org>
On 23/12/2020 15:47, Bart Van Assche wrote:
> On 12/23/20 3:40 AM, John Garry wrote:
>> Sorry, I got the 2x iter functions mixed up.
>>
>> So if we use a mutex to solve the blk_mq_queue_tag_busy_iter() problem, then
>> we still have this issue in blk_mq_tagset_busy_iter(), which I reported
>> previously [0]:
>>
>> [ 319.771745] BUG: KASAN: use-after-free in bt_tags_iter+0xe0/0x128
>> [ 319.777832] Read of size 4 at addr ffff0010b6bd27cc by task more/1866
>> [ 319.784262]
>> [ 319.785753] CPU: 61 PID: 1866 Comm: more Tainted: G W
>> 5.10.0-rc4-18118-gaa7b9c30d8ff #1070
>> [ 319.795312] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [ 319.804437] Call trace:
>> [ 319.806892] dump_backtrace+0x0/0x2d0
>> [ 319.810552] show_stack+0x18/0x68
>> [ 319.813865] dump_stack+0x100/0x16c
>> [ 319.817348] print_address_description.constprop.12+0x6c/0x4e8
>> [ 319.823176] kasan_report+0x130/0x200
>> [ 319.826831] __asan_load4+0x9c/0xd8
>> [ 319.830315] bt_tags_iter+0xe0/0x128
>> [ 319.833884] __blk_mq_all_tag_iter+0x320/0x3a8
>> [ 319.838320] blk_mq_tagset_busy_iter+0x8c/0xd8
>> [ 319.842760] scsi_host_busy+0x88/0xb8
>> [ 319.846418] show_host_busy+0x1c/0x48
>> [ 319.850079] dev_attr_show+0x44/0x90
>> [ 319.853655] sysfs_kf_seq_show+0x128/0x1c8
>> [ 319.857744] kernfs_seq_show+0xa0/0xb8
>> [ 319.861489] seq_read_iter+0x1ec/0x6a0
>> [ 319.865230] seq_read+0x1d0/0x250
>> [ 319.868539] kernfs_fop_read+0x70/0x330
>> [ 319.872369] vfs_read+0xe4/0x250
>> [ 319.875590] ksys_read+0xc8/0x178
>> [ 319.878898] __arm64_sys_read+0x44/0x58
>> [ 319.882730] el0_svc_common.constprop.2+0xc4/0x1e8
>> [ 319.887515] do_el0_svc+0x90/0xa0
>> [ 319.890824] el0_sync_handler+0x128/0x178
>> [ 319.894825] el0_sync+0x158/0x180
>> [ 319.898131]
>> [ 319.899614] The buggy address belongs to the page:
>> [ 319.904403] page:000000004e9e6864 refcount:0 mapcount:0
>> mapping:0000000000000000 index:0x0 pfn:0x10b6bd2
>> [ 319.913876] flags: 0xbfffc0000000000()
>> [ 319.917626] raw: 0bfffc0000000000 0000000000000000 fffffe0000000000
>> 0000000000000000
>> [ 319.925363] raw: 0000000000000000 0000000000000000 00000000ffffffff
>> 0000000000000000
>> [ 319.933096] page dumped because: kasan: bad access detected
>> [ 319.938658]
>> [ 319.940141] Memory state around the buggy address:
>> [ 319.944925] ffff0010b6bd2680: ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff ff
>> [ 319.952139] ffff0010b6bd2700: ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff ff
>> [ 319.959354] >ffff0010b6bd2780: ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff ff
>> [ 319.966566] ^
>> [ 319.972131] ffff0010b6bd2800: ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff ff
>> [ 319.979344] ffff0010b6bd2880: ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff ff
>> [ 319.986557]
>> ==================================================================
>> [ 319.993770] Disabling lock debugging due to kernel taint
>>
>> So to trigger this, I start fio on a disk, and then have one script
>> which constantly enables and disables an IO scheduler for that disk, and
>> another script which constantly reads /sys/class/scsi_host/host0/host_busy .
>>
>> And in this problem, the driver tag we iterate may point to a stale IO sched
>> request.
>
> Hi John,
Hi Bart,
Sorry for the slow reply; I have been on vacation since before you sent
this mail.
>
> I propose to change the order in which blk_mq_sched_free_requests(q) and
> blk_mq_debugfs_unregister(q) are called. Today blk_mq_sched_free_requests(q)
> is called by blk_cleanup_queue() before blk_put_queue() is called.
> blk_put_queue() calls blk_release_queue() if the last reference is dropped.
> blk_release_queue() calls blk_mq_debugfs_unregister(). I prefer removing the
> debugfs attributes earlier over modifying the tag iteration functions
> because I think removing the debugfs attributes earlier is less risky.
But don't we already have the following path, which removes the per-hctx
debugfs dir earlier than blk_mq_sched_free_requests() or
blk_release_queue():
blk_cleanup_queue() -> blk_mq_exit_queue() -> blk_mq_exit_hw_queues() ->
blk_mq_debugfs_unregister_hctx() ->
debugfs_remove_recursive(hctx->debugfs_dir)
Having said that, I am not sure how this relates directly to the
problem I mentioned. In that problem, above, we trigger
blk_mq_tagset_busy_iter() from the SCSI host sysfs file, and the
use-after-free comes about from disabling the elevator (and freeing the
sched requests) in parallel.
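For illustration only, the stale-reference problem can be modelled in
userspace roughly as below. This is a sketch, not kernel code: struct
tag_map, clear_stale_refs(), count_busy() and busy_after_sched_free() are
hypothetical names, and the model is single-threaded, so it shows only why
the references must be cleared before the sched requests are freed, not how
to serialise that against a concurrent iterator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_TAGS 4

struct request { int in_flight; };

/* Hypothetical stand-in for a tag map: the iterator walks rqs[]. */
struct tag_map {
	struct request *rqs[NR_TAGS];
};

/* Drop every rqs[] entry that points into [pool, pool + n). */
static int clear_stale_refs(struct tag_map *map,
			    const struct request *pool, size_t n)
{
	uintptr_t start = (uintptr_t)pool;
	uintptr_t end = (uintptr_t)(pool + n);
	int cleared = 0;

	for (size_t i = 0; i < NR_TAGS; i++) {
		uintptr_t p = (uintptr_t)map->rqs[i];

		if (p >= start && p < end) {
			map->rqs[i] = NULL;
			cleared++;
		}
	}
	return cleared;
}

/* Model of the busy iterator: dereferences whatever rqs[] holds. */
static int count_busy(const struct tag_map *map)
{
	int busy = 0;

	for (size_t i = 0; i < NR_TAGS; i++)
		if (map->rqs[i] && map->rqs[i]->in_flight)
			busy++;
	return busy;
}

/* Model of disabling the elevator: free its requests, then iterate. */
int busy_after_sched_free(void)
{
	struct tag_map drv = { { NULL } };
	struct request *sched_rqs = calloc(NR_TAGS, sizeof(*sched_rqs));

	assert(sched_rqs);
	sched_rqs[1].in_flight = 1;
	drv.rqs[2] = &sched_rqs[1];	/* driver tag 2 -> sched request 1 */

	/* Without this call, count_busy() below would read freed memory. */
	clear_stale_refs(&drv, sched_rqs, NR_TAGS);
	free(sched_rqs);

	return count_busy(&drv);
}
```

Calling busy_after_sched_free() from a test harness returns the busy count
seen after the free; dropping the clear_stale_refs() call reproduces the
same class of use-after-free when built with -fsanitize=address.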
> Although this will make it harder to debug lockups that happen while
> removing a request queue, kernel developers who are analyzing such an issue
> can undo this change in their development kernel tree.
>
Thanks,
John