From: Jan Kara <jack@suse.cz>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: linux-block@vger.kernel.org, fvogdt@suse.de
Subject: Use after free with BFQ and cgroups
Date: Thu, 25 Nov 2021 18:28:09 +0100 [thread overview]
Message-ID: <20211125172809.GC19572@quack2.suse.cz> (raw)
Hello!
Our test VMs started crashing recently (seems to be starting with 5.15
kernel). When we enabled KASAN, we were getting reports of bfq_group being
used after being freed like following (the reports differ a bit in where
exactly did BFQ hit the UAF):
[ 235.949241] ==================================================================
[ 235.950326] BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
[ 235.951369] Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544
[ 235.953476] CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G E 5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
[ 235.955726] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[ 235.958007] Call Trace:
[ 235.959157] <IRQ>
[ 235.960287] dump_stack_lvl+0x46/0x5a
[ 235.961412] print_address_description.constprop.0+0x1f/0x140
[ 235.962556] ? __bfq_deactivate_entity+0x9cb/0xa50
[ 235.963707] kasan_report.cold+0x7f/0x11b
[ 235.964841] ? __bfq_deactivate_entity+0x9cb/0xa50
[ 235.965970] __bfq_deactivate_entity+0x9cb/0xa50
[ 235.967092] ? update_curr+0x32f/0x5d0
[ 235.968227] bfq_deactivate_entity+0xa0/0x1d0
[ 235.969365] bfq_del_bfqq_busy+0x28a/0x420
[ 235.970481] ? resched_curr+0x116/0x1d0
[ 235.971573] ? bfq_requeue_bfqq+0x70/0x70
[ 235.972657] ? check_preempt_wakeup+0x52b/0xbc0
[ 235.973748] __bfq_bfqq_expire+0x1a2/0x270
[ 235.974822] bfq_bfqq_expire+0xd16/0x2160
[ 235.975893] ? try_to_wake_up+0x4ee/0x1260
[ 235.976965] ? bfq_end_wr_async_queues+0xe0/0xe0
[ 235.978039] ? _raw_write_unlock_bh+0x60/0x60
[ 235.979105] ? _raw_spin_lock_irq+0x81/0xe0
[ 235.980162] bfq_idle_slice_timer+0x109/0x280
[ 235.981199] ? bfq_dispatch_request+0x4870/0x4870
[ 235.982220] __hrtimer_run_queues+0x37d/0x700
[ 235.983242] ? enqueue_hrtimer+0x1b0/0x1b0
[ 235.984278] ? kvm_clock_get_cycles+0xd/0x10
[ 235.985301] ? ktime_get_update_offsets_now+0x6f/0x280
[ 235.986317] hrtimer_interrupt+0x2c8/0x740
[ 235.987321] __sysvec_apic_timer_interrupt+0xcd/0x260
[ 235.988357] sysvec_apic_timer_interrupt+0x6a/0x90
[ 235.989373] </IRQ>
[ 235.990355] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 235.991366] RIP: 0010:do_seccomp+0x4f5/0x1f40
[ 235.992376] Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 cb 14 00 00 48 8b bd d8 0b 00 00 c6 07 00 0f 1f 40 00 fb 66 0f 1f 44 00 00 <8b> 4c 24 30 85 c9 0f 85 06 07 00 00 8b 54 24 04 85 d2 74 19 4d 85
[ 235.994481] RSP: 0018:ffffc900020cfd48 EFLAGS: 00000246
[ 235.995546] RAX: dffffc0000000000 RBX: 1ffff92000419fb1 RCX: ffffffffb9a8d89d
[ 235.996638] RDX: 1ffff1100080f17b RSI: 0000000000000008 RDI: ffff888008c56040
[ 235.997717] RBP: ffff888004078000 R08: 0000000000000001 R09: ffff88800407800f
[ 235.998784] R10: ffffed100080f001 R11: 0000000000000001 R12: 00000000ffffffff
[ 235.999852] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 236.000906] ? do_seccomp+0xfed/0x1f40
[ 236.001937] ? do_seccomp+0xfed/0x1f40
[ 236.002938] ? get_nth_filter+0x2e0/0x2e0
[ 236.003932] ? security_task_prctl+0x66/0xd0
[ 236.004910] __do_sys_prctl+0x420/0xd60
[ 236.005842] ? handle_mm_fault+0x196/0x610
[ 236.006739] ? __ia32_compat_sys_getrusage+0x90/0x90
[ 236.007611] ? up_read+0x15/0x90
[ 236.008477] do_syscall_64+0x5c/0x80
[ 236.009349] ? exc_page_fault+0x60/0xc0
[ 236.010219] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 236.011094] RIP: 0033:0x561fa9ceec6a
[ 236.011976] Code: e8 db 46 f8 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
[ 236.013823] RSP: 002b:000000c000116e38 EFLAGS: 00000216 ORIG_RAX: 000000000000009d
[ 236.014778] RAX: ffffffffffffffda RBX: 000000c000028000 RCX: 0000561fa9ceec6a
[ 236.015748] RDX: 000000c000116ee0 RSI: 0000000000000002 RDI: 0000000000000016
[ 236.016716] RBP: 000000c000116e90 R08: 0000000000000000 R09: 0000000000000000
[ 236.017685] R10: 0000000000000000 R11: 0000000000000216 R12: 00000000000000b8
[ 236.018645] R13: 00000000000000b7 R14: 0000000000000200 R15: 0000000000000004
[ 236.020558] Allocated by task 485:
[ 236.021511] kasan_save_stack+0x1b/0x40
[ 236.022460] __kasan_kmalloc+0xa4/0xd0
[ 236.023410] bfq_pd_alloc+0xa8/0x170
[ 236.024351] blkg_alloc+0x397/0x540
[ 236.025287] blkg_create+0x66b/0xcd0
[ 236.026219] bio_associate_blkg_from_css+0x43c/0xb20
[ 236.027161] bio_associate_blkg+0x66/0x100
[ 236.028098] submit_extent_page+0x744/0x1380 [btrfs]
[ 236.029126] __extent_writepage_io+0x605/0xaa0 [btrfs]
[ 236.030113] __extent_writepage+0x360/0x740 [btrfs]
[ 236.031093] extent_write_cache_pages+0x5a7/0xa50 [btrfs]
[ 236.032084] extent_writepages+0xcb/0x1a0 [btrfs]
[ 236.033063] do_writepages+0x188/0x720
[ 236.033997] filemap_fdatawrite_wbc+0x19f/0x2b0
[ 236.034929] filemap_fdatawrite_range+0x99/0xd0
[ 236.035855] btrfs_fdatawrite_range+0x46/0xf0 [btrfs]
[ 236.036833] start_ordered_ops.constprop.0+0xb6/0x110 [btrfs]
[ 236.037803] btrfs_sync_file+0x1bf/0xe70 [btrfs]
[ 236.038747] __x64_sys_fsync+0x51/0x80
[ 236.039622] do_syscall_64+0x5c/0x80
[ 236.040468] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 236.042137] Freed by task 10561:
[ 236.042966] kasan_save_stack+0x1b/0x40
[ 236.043802] kasan_set_track+0x1c/0x30
[ 236.044628] kasan_set_free_info+0x20/0x30
[ 236.045437] __kasan_slab_free+0x10b/0x140
[ 236.046256] slab_free_freelist_hook+0x8e/0x180
[ 236.047081] kfree+0xc7/0x400
[ 236.047907] blkg_free.part.0+0x78/0xf0
[ 236.048736] rcu_do_batch+0x365/0x1280
[ 236.049558] rcu_core+0x493/0x8d0
[ 236.050376] __do_softirq+0x18e/0x544
After some poking, looking into crashdumps, and applying some debug patches
the following seems to be happening: We have a process P in blkcg G. Now
G is taken offline so bfq_group is cleaned up in bfq_pd_offline() but P
still holds reference to G from its bfq_queue. Then P submits IO, G gets
inserted into service tree despite being already offline. IO completes, P
exits, bfq_queue pointing to G gets destroyed, the last reference to G is
dropped, G gets freed although is it still inserted in the service tree.
Eventually someone trips over the freed memory.
Now I was looking into how to best fix this. There are several
possibilities and I'm not sure which one to pick so that's why I'm writing
to you. bfq_pd_offline() is walking all entities in service trees and
trying to get rid of references to bfq_group (by reparenting entities).
Is this guaranteed to see all entities that point to G? From the scenario
I'm observing it seems this can miss entities pointing to G - e.g. if they
are in idle tree, we will just remove them from the idle tree but we won't
change entity->parent so they still point to G. This can be seen as one
culprit of the bug.
Or alternatively, should we e.g. add __bfq_deactivate_entity() to
bfq_put_queue() when that function is dropping last queue in a bfq_group?
Or should we just reparent bfq queues that have already dead parent on
activation?
What's your opinion?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next reply other threads:[~2021-11-25 17:30 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-25 17:28 Jan Kara [this message]
2021-11-26 14:47 ` Use after free with BFQ and cgroups Michal Koutný
2021-11-29 17:11 ` Jan Kara
2021-12-09 2:23 ` yukuai (C)
2021-12-09 15:33 ` Paolo Valente
2021-12-13 17:33 ` Jan Kara
2021-12-14 1:24 ` yukuai (C)
2021-12-20 18:38 ` Jan Kara
2021-12-22 15:21 ` Jan Kara
2021-12-23 1:02 ` yukuai (C)
2021-12-23 17:13 ` Jan Kara
2021-11-29 17:12 ` Tejun Heo
2021-11-30 11:50 ` Jan Kara
2021-11-30 16:22 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211125172809.GC19572@quack2.suse.cz \
--to=jack@suse.cz \
--cc=fvogdt@suse.de \
--cc=linux-block@vger.kernel.org \
--cc=paolo.valente@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).