* BUG: KASAN: use-after-free in dec_rlimit_ucounts
@ 2021-11-17 22:00 Qian Cai
  2021-11-18 19:46 ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Qian Cai @ 2021-11-17 22:00 UTC (permalink / raw)
  To: Eric W. Biederman, Alexey Gladkov; +Cc: Yu Zhao, linux-kernel

Hi there, I can still reproduce this quickly on today's linux-next, and all
the way back to 5.15-rc6, by running a syscall fuzzer for a while. The trace
points to this line:

	for (iter = ucounts; iter; iter = iter->ns->ucounts) {

It looks like KASAN indicated that "ns" had already been freed. Is that
possible, or is this more of a refcount issue?

 BUG: KASAN: use-after-free in dec_rlimit_ucounts
 Read of size 8 at addr ffff0008c0739860 by task trinity-c27/10924

 CPU: 27 PID: 10924 Comm: trinity-c27 Not tainted 5.15.0-next-20211115-dirty #192
 Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
 Call trace:
  dump_backtrace
  show_stack
  dump_stack_lvl
  print_address_description.constprop.0
  kasan_report
  __asan_report_load8_noabort
  dec_rlimit_ucounts
  dec_rlimit_ucounts at kernel/ucount.c:284
  mqueue_evict_inode
  mqueue_evict_inode at ipc/mqueue.c:544
  evict
  iput.part.0
  iput
  __arm64_sys_mq_unlink
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

 Allocated by task 10615:
  kasan_save_stack
  __kasan_slab_alloc
  slab_post_alloc_hook
  kmem_cache_alloc
  create_user_ns
  unshare_userns
  ksys_unshare
  __arm64_sys_unshare
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

 Freed by task 8660:
  kasan_save_stack
  kasan_set_track
  kasan_set_free_info
  __kasan_slab_free
  slab_free_freelist_hook
  kmem_cache_free
  free_user_ns
  process_one_work
  worker_thread
  kthread
  ret_from_fork

 Last potentially related work creation:
  kasan_save_stack
  __kasan_record_aux_stack
  kasan_record_aux_stack_noalloc
  insert_work
  __queue_work
  queue_work_on
  __put_user_ns
  put_cred_rcu
  rcu_do_batch
  rcu_core
  rcu_core_si
  __do_softirq

 The buggy address belongs to the object at ffff0008c07395e8
  which belongs to the cache user_namespace of size 768
 The buggy address is located 632 bytes inside of
  768-byte region [ffff0008c07395e8, ffff0008c07398e8)
 The buggy address belongs to the page:
 page:fffffc002301ce00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0008c073cec8 pfn:0x940738
 head:fffffc002301ce00 order:3 compound_mapcount:0 compound_pincount:0
 memcg:ffff0008b9b5f101
 flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff)
 raw: 0bfffc0000010200 ffff000800f3e9c8 ffff000800f3e9c8 ffff000802e69b80
 raw: ffff0008c073cec8 00000000001d0012 00000001ffffffff ffff0008b9b5f101
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff0008c0739700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff0008c0739780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff0008c0739800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
  ffff0008c0739880: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
  ffff0008c0739900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-11-17 22:00 BUG: KASAN: use-after-free in dec_rlimit_ucounts Qian Cai
@ 2021-11-18 19:46 ` Eric W. Biederman
  2021-11-18 20:32   ` Qian Cai
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2021-11-18 19:46 UTC (permalink / raw)
  To: Qian Cai; +Cc: Alexey Gladkov, Yu Zhao, linux-kernel

Qian Cai <quic_qiancai@quicinc.com> writes:

> Hi there, I can still reproduce this quickly on today's linux-next, and all
> the way back to 5.15-rc6, by running a syscall fuzzer for a while. The trace
> points to this line:
>
> 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>
> It looks like KASAN indicated that "ns" had already been freed. Is that
> possible, or is this more of a refcount issue?

Is it possible?  Yes, it is possible.  That is one place where
a use-after-free has shown up, and where I expect one would show up in
the future.

That said, it is hard to believe there is still a use-after-free in the
code.  We spent the last kernel development cycle poring through it and
correcting everything we saw, until we ultimately found one very subtle
use-after-free.

If you have a reliable reproducer that you can share, we can look into
this and see if we can track down where the reference count is going
bad.

It tends to take instrumenting the entire life cycle, every increment and
every decrement, and then poring through the logs to track down a
use-after-free.  Which is not something we can really do without a
reproducer.

Eric
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-11-18 19:46 ` Eric W. Biederman
@ 2021-11-18 20:32   ` Qian Cai
  2021-11-18 20:57     ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Qian Cai @ 2021-11-18 20:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alexey Gladkov, Yu Zhao, linux-kernel

On Thu, Nov 18, 2021 at 01:46:05PM -0600, Eric W. Biederman wrote:
> Is it possible?  Yes, it is possible.  That is one place where
> a use-after-free has shown up, and where I expect one would show up in
> the future.
>
> That said, it is hard to believe there is still a use-after-free in the
> code.  We spent the last kernel development cycle poring through it and
> correcting everything we saw, until we ultimately found one very subtle
> use-after-free.
>
> If you have a reliable reproducer that you can share, we can look into
> this and see if we can track down where the reference count is going
> bad.
>
> It tends to take instrumenting the entire life cycle, every increment and
> every decrement, and then poring through the logs to track down a
> use-after-free.  Which is not something we can really do without a
> reproducer.

The reproducer is just to run trinity as an unprivileged user on defconfig
with KASAN enabled. (On linux-next, you can do "make defconfig debug.config"
[1], but I don't think the other debugging options are relevant here.)

    $ trinity -C 31 -N 10000000

It is always reproduced on an arm64 server here within 5 minutes so far.
Some debugging progress so far. BTW, this could happen on the
user_shm_unlock() path as well.

 Call trace:
  dec_rlimit_ucounts
  user_shm_unlock
  (inlined by) user_shm_unlock at mm/mlock.c:854
  shmem_lock
  shmctl_do_lock
  ksys_shmctl.constprop.0
  __arm64_sys_shmctl
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

I noticed that in dec_rlimit_ucounts(), dec == 0 and type ==
UCOUNT_RLIMIT_MEMLOCK.

[1] https://lore.kernel.org/lkml/20211115134754.7334-1-quic_qiancai@quicinc.com/
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-11-18 20:32   ` Qian Cai
@ 2021-11-18 20:57     ` Eric W. Biederman
  2021-11-19 13:32       ` Qian Cai
  2021-11-24 21:49       ` Qian Cai
  0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2021-11-18 20:57 UTC (permalink / raw)
  To: Qian Cai; +Cc: Alexey Gladkov, Yu Zhao, linux-kernel

Qian Cai <quic_qiancai@quicinc.com> writes:

> On Thu, Nov 18, 2021 at 01:46:05PM -0600, Eric W. Biederman wrote:
>> Is it possible?  Yes, it is possible.  That is one place where
>> a use-after-free has shown up, and where I expect one would show up in
>> the future.
>>
>> That said, it is hard to believe there is still a use-after-free in the
>> code.  We spent the last kernel development cycle poring through it and
>> correcting everything we saw, until we ultimately found one very subtle
>> use-after-free.
>>
>> If you have a reliable reproducer that you can share, we can look into
>> this and see if we can track down where the reference count is going
>> bad.
>>
>> It tends to take instrumenting the entire life cycle, every increment
>> and every decrement, and then poring through the logs to track down a
>> use-after-free.  Which is not something we can really do without a
>> reproducer.
>
> The reproducer is just to run trinity as an unprivileged user on defconfig
> with KASAN enabled. (On linux-next, you can do "make defconfig debug.config"
> [1], but I don't think the other debugging options are relevant here.)
>
>     $ trinity -C 31 -N 10000000
>
> It is always reproduced on an arm64 server here within 5 minutes so far.
> Some debugging progress so far. BTW, this could happen on the
> user_shm_unlock() path as well.

Does this only happen on a single architecture?  If so, I wonder if
perhaps some of the architecture's atomic primitives are implemented
improperly.

Unfortunately I don't have any arm64 machines where I can easily test
this.

The call path you posted from user_shm_unlock is another path where
a use-after-free has shown up in the past.

My blind guess would be that I made an implementation mistake in
inc_rlimit_get_ucounts or dec_rlimit_put_ucounts, but I can't see it
right now.

Eric

> Call trace:
>  dec_rlimit_ucounts
>  user_shm_unlock
>  (inlined by) user_shm_unlock at mm/mlock.c:854
>  shmem_lock
>  shmctl_do_lock
>  ksys_shmctl.constprop.0
>  __arm64_sys_shmctl
>  invoke_syscall
>  el0_svc_common.constprop.0
>  do_el0_svc
>  el0_svc
>  el0t_64_sync_handler
>  el0t_64_sync
>
> I noticed that in dec_rlimit_ucounts(), dec == 0 and type ==
> UCOUNT_RLIMIT_MEMLOCK.
>
> [1] https://lore.kernel.org/lkml/20211115134754.7334-1-quic_qiancai@quicinc.com/
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-11-18 20:57     ` Eric W. Biederman
@ 2021-11-19 13:32       ` Qian Cai
  0 siblings, 0 replies; 15+ messages in thread
From: Qian Cai @ 2021-11-19 13:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alexey Gladkov, Yu Zhao, linux-kernel

On Thu, Nov 18, 2021 at 02:57:17PM -0600, Eric W. Biederman wrote:
> Does this only happen on a single architecture?  If so, I wonder if
> perhaps some of the architecture's atomic primitives are implemented
> improperly.

No, I just don't have another arch to test this on, and I see no reason
that it won't be reproduced on x86. If arm64's atomic primitives were
problematic, they would likely blow up elsewhere, which is not the case
in our daily CI regression testing that has been running for many years.
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-11-18 20:57     ` Eric W. Biederman
@ 2021-11-24 21:49       ` Qian Cai
  0 siblings, 0 replies; 15+ messages in thread
From: Qian Cai @ 2021-11-24 21:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexey Gladkov, Yu Zhao, linux-kernel, Catalin Marinas,
	Will Deacon, Mark Rutland, linux-arm-kernel

On Thu, Nov 18, 2021 at 02:57:17PM -0600, Eric W. Biederman wrote:
> Qian Cai <quic_qiancai@quicinc.com> writes:
>
> > On Thu, Nov 18, 2021 at 01:46:05PM -0600, Eric W. Biederman wrote:
> >> Is it possible?  Yes, it is possible.  That is one place where
> >> a use-after-free has shown up, and where I expect one would show up
> >> in the future.
> >>
> >> That said, it is hard to believe there is still a use-after-free in
> >> the code.  We spent the last kernel development cycle poring through
> >> it and correcting everything we saw, until we ultimately found one
> >> very subtle use-after-free.
> >>
> >> If you have a reliable reproducer that you can share, we can look into
> >> this and see if we can track down where the reference count is going
> >> bad.
> >>
> >> It tends to take instrumenting the entire life cycle, every increment
> >> and every decrement, and then poring through the logs to track down a
> >> use-after-free.  Which is not something we can really do without a
> >> reproducer.
> >
> > The reproducer is just to run trinity as an unprivileged user on
> > defconfig with KASAN enabled. (On linux-next, you can do "make
> > defconfig debug.config" [1], but I don't think the other debugging
> > options are relevant here.)
> >
> >     $ trinity -C 31 -N 10000000
> >
> > It is always reproduced on an arm64 server here within 5 minutes so
> > far. Some debugging progress so far. BTW, this could happen on the
> > user_shm_unlock() path as well.
>
> Does this only happen on a single architecture?  If so, I wonder if
> perhaps some of the architecture's atomic primitives are implemented
> improperly.
Hmm, I don't know if it is that, or if this platform is just lucky enough
to trigger the race condition quickly, but I can't reproduce it on x86 so
far. I am Cc'ing a few arm64 people to see if they have spotted anything I
might be missing. The original bug report is here:

https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@fixkernel.com/

I narrowed it down: the same traces were first introduced by these commits:

  d7c9e99aee48 Reimplement RLIMIT_MEMLOCK on top of ucounts
  d64696905554 Reimplement RLIMIT_SIGPENDING on top of ucounts
  6e52a9f0532f Reimplement RLIMIT_MSGQUEUE on top of ucounts
  21d1c5e386bc Reimplement RLIMIT_NPROC on top of ucounts
  b6c336528926 Use atomic_t for ucounts reference counting
  905ae01c4ae2 Add a reference to ucounts for each cred
  f9c82a4ea89c Increase size of ucounts to atomic_long_t

Also, I added a debugging patch here:

--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -847,8 +847,14 @@ int user_shm_lock(size_t size, struct ucounts *ucounts)
 
 void user_shm_unlock(size_t size, struct ucounts *ucounts)
 {
+	int i;
+
 	spin_lock(&shmlock_user_lock);
+	printk("KK user_shm_unlock ucounts = %d\n", atomic_read(&ucounts->count));
+	for (i = 0; i < UCOUNT_COUNTS; i++)
+		printk("KK type = %d, count = %ld\n", i, atomic_long_read(&ucounts->ucount[i]));
 	dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
+	printk("size = %zu, count = %ld\n", size, atomic_long_read(&ucounts->ucount[UCOUNT_RLIMIT_MEMLOCK]));
 	spin_unlock(&shmlock_user_lock);
 	put_ucounts(ucounts);
 }

Then I noticed that ucounts->count is off by one. Since the later
put_ucounts() would free the "ucounts", I am wondering: if "ucounts->count
== 1" is actually correct when entering user_shm_unlock(), then ucounts->ns
may already be gone. Thus, should dec_rlimit_ucounts() not blindly traverse
ucounts->ns?
[  214.541754] KK user_shm_unlock ucounts = 1
[  214.545871] KK type = 0, count = 0
[  214.549288] KK type = 1, count = 0
[  214.552697] KK type = 2, count = 0
[  214.556104] KK type = 3, count = 0
[  214.559511] KK type = 4, count = 0
[  214.562920] KK type = 5, count = 0
[  214.566314] KK type = 6, count = 0
[  214.569718] KK type = 7, count = 0
[  214.573132] KK type = 8, count = 0
[  214.576537] KK type = 9, count = 0
[  214.579945] KK type = 10, count = 0
[  214.583441] KK type = 11, count = 0
[  214.586940] KK type = 12, count = 0
[  214.590420] KK type = 13, count = 1
[  214.593917] ==================================================================
[  214.601130] BUG: KASAN: use-after-free in dec_rlimit_ucounts+0xe8/0xf0
[  214.607657] Read of size 8 at addr ffff000905ee12f0 by task trinity-c2/9708
[  214.614611]
[  214.616093] CPU: 13 PID: 9708 Comm: trinity-c2 Not tainted 5.12.0-00007-gd7c9e99aee48-dirty #221
[  214.624870] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
[  214.632689] Call trace:
[  214.635124]  dump_backtrace+0x0/0x350
[  214.638781]  show_stack+0x18/0x28
[  214.642088]  dump_stack+0x120/0x18c
[  214.645570]  print_address_description.constprop.0+0x6c/0x30c
[  214.651309]  kasan_report+0x1d8/0x1f0
[  214.654964]  __asan_report_load8_noabort+0x34/0x60
[  214.659747]  dec_rlimit_ucounts+0xe8/0xf0
[  214.663748]  user_shm_unlock+0xdc/0x338
[  214.667577]  shmem_lock+0x154/0x250
[  214.671057]  shmctl_do_lock+0x310/0x5d8
[  214.674886]  ksys_shmctl.constprop.0+0x200/0x588
[  214.679496]  __arm64_sys_shmctl+0x6c/0xa0
[  214.683497]  el0_svc_common.constprop.0+0xe4/0x300
[  214.688281]  do_el0_svc+0x48/0xd0
[  214.691587]  el0_svc+0x24/0x38
[  214.694633]  el0_sync_handler+0xb0/0xb8
[  214.698460]  el0_sync+0x174/0x180
[  214.701766]
[  214.703247] Allocated by task 9392:
[  214.706726]  kasan_save_stack+0x28/0x58
[  214.710555]  __kasan_slab_alloc+0x88/0xa8
[  214.714555]  kmem_cache_alloc+0x190/0x5b0
[  214.718555]  create_user_ns+0x158/0xa60
[  214.722384]  unshare_userns+0x44/0xe0
[  214.726038]  ksys_unshare+0x23c/0x580
[  214.729693]  __arm64_sys_unshare+0x30/0x50
[  214.733781]  el0_svc_common.constprop.0+0xe4/0x300
[  214.738564]  do_el0_svc+0x48/0xd0
[  214.741871] e
[  214.752048]  asan_set_track+0x28/0x40
[  214.764227]  kasan_set_free_info+0x28/0x50
[  214.768314]  __kasan_slab_free+0xd0/0x130
[  214.772316]  kmem_cache_free+0xb4/0x390
[  214.776146]  free_user_ns+0x108/0x2a8
[  214.779802]  process_one_work+0x684/0xfd0
[  214.783804]  worker_thread+0x314/0xc78
[  214.787543]  kthread+0x3a4/0x460
[  214.790763]  ret_from_fork+0x10/0x30
[  214.794330]
[  214.795811] Last potentially related work creation:
[  214.800678]  kasan_save_stack+0x28/0x58
[  214.804505]  kasan_record_aux_stack+0xc0/0xd8
[  214.808853]  insert_work+0x50/0x2f0
[  214.812334]  __queue_work+0x314/0xac8
[  214.815988]  queue_work_on+0x94/0xc8
[  214.819555]  __put_user_ns+0x3c/0x60
[  214.823122]  put_cred_rcu+0x208/0x2f8
[  214.826775]  rcu_core+0x734/0xf68
[  214.830083]  rcu_core_si+0x10/0x20
[  214.833477]  __do_softirq+0x28c/0x774
[  214.837130]
[  214.838610] The buggy address belongs to the object at ffff000905ee1110
[  214.838610]  which belongs to the cache user_namespace of size 600
[  214.851378] The buggy address is located 480 bytes inside of
[  214.851378]  600-byte region [ffff000905ee1110, ffff000905ee1368)
[  214.863105] The buggy address belongs to the page:
[  214.867886] page:000000000a048a0d refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x985ee0
[  214.877271] head:000000000a048a0d order:3 compound_mapcount:0 compound_pincount:0
[  214.884744] flags: 0xbfffc0000010200(slab|head)
[  214.889270] raw: 0bfffc0000010200 dead000000000100 dead000000000122 ffff0008002a3180
[  214.897003] raw: 0000000000000000 00000000802d002d 00000001ffffffff 0000000000000000
[  214.904734] page dumped because: kasan: bad access detected
[  214.910296]
[  214.911776] Memory state around the buggy address:
[  214.916557]  ffff000905ee1180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  214.923769]  ffff000905ee1200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  214.930981] >ffff000905ee1280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  214.938191]                           ^
[  214.945056]  ffff000905ee1300: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[  214.952267]  ffff000905ee1380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  214.959477] ==================================================================
[  214.967070] Disabling lock debugging due to kernel taint
[  214.972398] size = 4096, count = 0

> Unfortunately I don't have any arm64 machines where I can easily test
> this.
>
> The call path you posted from user_shm_unlock is another path where
> a use-after-free has shown up in the past.
>
> My blind guess would be that I made an implementation mistake in
> inc_rlimit_get_ucounts or dec_rlimit_put_ucounts, but I can't see it
> right now.
>
> Eric
>
> > Call trace:
> >  dec_rlimit_ucounts
> >  user_shm_unlock
> >  (inlined by) user_shm_unlock at mm/mlock.c:854
> >  shmem_lock
> >  shmctl_do_lock
> >  ksys_shmctl.constprop.0
> >  __arm64_sys_shmctl
> >  invoke_syscall
> >  el0_svc_common.constprop.0
> >  do_el0_svc
> >  el0_svc
> >  el0t_64_sync_handler
> >  el0t_64_sync
> >
> > I noticed that in dec_rlimit_ucounts(), dec == 0 and type ==
> > UCOUNT_RLIMIT_MEMLOCK.
> >
> > [1] https://lore.kernel.org/lkml/20211115134754.7334-1-quic_qiancai@quicinc.com/
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-11-24 21:49       ` Qian Cai
@ 2021-11-26  5:34         ` Qian Cai
  0 siblings, 0 replies; 15+ messages in thread
From: Qian Cai @ 2021-11-26  5:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexey Gladkov, Yu Zhao, linux-kernel, Catalin Marinas,
	Will Deacon, Mark Rutland, linux-arm-kernel

On Wed, Nov 24, 2021 at 04:49:19PM -0500, Qian Cai wrote:
> Hmm, I don't know if it is that, or if this platform is just lucky enough
> to trigger the race condition quickly, but I can't reproduce it on x86 so
> far. I am Cc'ing a few arm64 people to see if they have spotted anything
> I might be missing. The original bug report is here:
>
> https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@fixkernel.com/

Okay, I am finally able to reproduce this on x86_64 with the latest
mainline as well, by enabling CONFIG_USER_NS and KASAN on top of defconfig
(I had not realized that defconfig does not select CONFIG_USER_NS in the
first place). Anyway, it still took less than 5 minutes by running:

    $ trinity -C 48
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-11-26  5:34         ` Qian Cai
@ 2021-12-20  5:58           ` Eric W. Biederman
  0 siblings, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2021-12-20  5:58 UTC (permalink / raw)
  To: Qian Cai
  Cc: Alexey Gladkov, Yu Zhao, linux-kernel, Catalin Marinas,
	Will Deacon, Mark Rutland, linux-arm-kernel

Qian Cai <quic_qiancai@quicinc.com> writes:

> On Wed, Nov 24, 2021 at 04:49:19PM -0500, Qian Cai wrote:
>> Hmm, I don't know if it is that, or if this platform is just lucky
>> enough to trigger the race condition quickly, but I can't reproduce it
>> on x86 so far. I am Cc'ing a few arm64 people to see if they have
>> spotted anything I might be missing. The original bug report is here:
>>
>> https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@fixkernel.com/
>
> Okay, I am finally able to reproduce this on x86_64 with the latest
> mainline as well, by enabling CONFIG_USER_NS and KASAN on top of
> defconfig (I had not realized that defconfig does not select
> CONFIG_USER_NS in the first place). Anyway, it still took less than
> 5 minutes by running:
>
> $ trinity -C 48

It took me a while to get to the point of reproducing this, but I can
confirm I see this with a 2-core VM running 5.16.0-rc4, running trinity
2019.06 as packaged in Debian 11. I didn't watch, so I don't know if it
was 5 minutes, but I do know it took less than an hour.

Now I am puzzled why there are no other reports of this problem.

Now to start drilling down to figure out why the user namespace was
freed early.

----

The failure I got looked like:

> BUG: KASAN: use-after-free in dec_rlimit_ucounts+0x7b/0xb0
> Read of size 8 at addr ffff88800b7dd018 by task trinity-c3/67982
>
> CPU: 1 PID: 67982 Comm: trinity-c3 Tainted: G O 5.16.0-rc4 #1
> Hardware name: Xen HVM domU, BIOS 4.8.5-35.fc25 08/25/2021
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x48/0x5e
>  print_address_description.constprop.0+0x1f/0x140
>  ? dec_rlimit_ucounts+0x7b/0xb0
>  ? dec_rlimit_ucounts+0x7b/0xb0
>  kasan_report.cold+0x7f/0xe0
>  ? _raw_spin_lock+0x7f/0x11b
>  ? dec_rlimit_ucounts+0x7b/0xb0
>  dec_rlimit_ucounts+0x7b/0xb0
>  mqueue_evict_inode+0x417/0x590
>  ? perf_trace_global_dirty_state+0x350/0x350
>  ? __x64_sys_mq_unlink+0x250/0x250
>  ? _raw_spin_lock_bh+0xe0/0xe0
>  ? _raw_spin_lock_bh+0xe0/0xe0
>  evict+0x155/0x2a0
>  __x64_sys_mq_unlink+0x1a7/0x250
>  do_syscall_64+0x3b/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f0505ebc9b9
> Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 00 0f 1f 44 00 00 48 89 ....
>
> Allocated by task 67717
> Freed by task 6027
>
> The buggy address belongs to the object at ffff88800b7dce38
>  which belongs to the cache user_namespace of size 600
> The buggy address is located 480 bytes inside of
>  600-byte region [ffff88800b7dce38, ffff88800b7dd090]
> The buggy address belongs to the page:
>
> trinity: Detected kernel tainting. Last seed was 1891442794

Eric
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-12-20  5:58 ` Eric W. Biederman
@ 2021-12-21 13:09   ` Alexey Gladkov
  1 sibling, 0 replies; 15+ messages in thread
From: Alexey Gladkov @ 2021-12-21 13:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Qian Cai, Yu Zhao, linux-kernel, Catalin Marinas, Will Deacon,
	Mark Rutland, linux-arm-kernel

On Sun, Dec 19, 2021 at 11:58:41PM -0600, Eric W. Biederman wrote:
> Qian Cai <quic_qiancai@quicinc.com> writes:
>
> > On Wed, Nov 24, 2021 at 04:49:19PM -0500, Qian Cai wrote:
> >> Hmm, I don't know if that is it, or if this platform is just lucky
> >> enough to trigger the race condition quickly, but I can't reproduce
> >> it on x86 so far. I am Cc'ing a few arm64 people to see if they have
> >> spotted anything I might be missing. The original bug report is here:
> >>
> >> https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@fixkernel.com/
> >
> > Okay, I am finally able to reproduce this on x86_64 with the latest
> > mainline as well, by enabling CONFIG_USER_NS and KASAN on top of
> > defconfig (I had not realized that defconfig does not select
> > CONFIG_USER_NS in the first place). Anyway, it still took less than
> > 5 minutes of running:
> >
> > $ trinity -C 48
>
> It took me a while to get to the point of reproducing this, but I can
> confirm I see it with a 2-core VM running 5.16.0-rc4, using trinity
> 2019.06 as packaged in Debian 11.

I still can't reproduce :(

> I didn't watch, so I don't know if it was 5 minutes, but I do know it
> took less than an hour.

--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -209,6 +209,7 @@ void put_ucounts(struct ucounts *ucounts)
 
 	if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
 		hlist_del_init(&ucounts->node);
+		ucounts->ns = NULL;
 		spin_unlock_irqrestore(&ucounts_lock, flags);
 		kfree(ucounts);
 	}

Does the hack above increase the likelihood of triggering the error?

> Now I am puzzled why there are no other reports of this problem.
>
> Now to start drilling down to figure out why the user namespace was
> freed early.
>
> ----
>
> The failure I got looked like:
>
> > BUG: KASAN: use-after-free in dec_rlimit_ucounts+0x7b/0xb0
> > Read of size 8 at addr ffff88800b7dd018 by task trinity-c3/67982
> >
> > CPU: 1 PID: 67982 Comm: trinity-c3 Tainted: G O 5.16.0-rc4 #1
> > Hardware name: Xen HVM domU, BIOS 4.8.5-35.fc25 08/25/2021
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x48/0x5e
> >  print_address_description.constprop.0+0x1f/0x140
> >  ? dec_rlimit_ucounts+0x7b/0xb0
> >  ? dec_rlimit_ucounts+0x7b/0xb0
> >  kasan_report.cold+0x7f/0xe0
> >  ? _raw_spin_lock+0x7f/0x11b
> >  ? dec_rlimit_ucounts+0x7b/0xb0
> >  dec_rlimit_ucounts+0x7b/0xb0
> >  mqueue_evict_inode+0x417/0x590
> >  ? perf_trace_global_dirty_state+0x350/0x350
> >  ? __x64_sys_mq_unlink+0x250/0x250
> >  ? _raw_spin_lock_bh+0xe0/0xe0
> >  ? _raw_spin_lock_bh+0xe0/0xe0
> >  evict+0x155/0x2a0
> >  __x64_sys_mq_unlink+0x1a7/0x250
> >  do_syscall_64+0x3b/0x90
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f0505ebc9b9
> > Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 00 0f 1f 44 00 00 48 89 ....
> >
> > Allocated by task 67717
> > Freed by task 6027
> >
> > The buggy address belongs to the object at ffff88800b7dce38
> >  which belongs to the cache user_namespace of size 600
> > The buggy address is located 480 bytes inside of
> >  600-byte region [ffff88800b7dce38, ffff88800b7dd090)
> > The buggy address belongs to the page:
> >
> > trinity: Detected kernel tainting. Last seed was 1891442794
>
> Eric

-- 
Rgrds, legion
* Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts
  2021-12-21 13:09 ` Alexey Gladkov
@ 2021-12-27 15:22   ` Eric W. Biederman
  1 sibling, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2021-12-27 15:22 UTC (permalink / raw)
  To: Alexey Gladkov
  Cc: Qian Cai, Yu Zhao, linux-kernel, Catalin Marinas, Will Deacon,
	Mark Rutland, linux-arm-kernel

Alexey Gladkov <legion@kernel.org> writes:

> On Sun, Dec 19, 2021 at 11:58:41PM -0600, Eric W. Biederman wrote:
>> Qian Cai <quic_qiancai@quicinc.com> writes:
>>
>> > On Wed, Nov 24, 2021 at 04:49:19PM -0500, Qian Cai wrote:
>> >> Hmm, I don't know if that is it, or if this platform is just lucky
>> >> enough to trigger the race condition quickly, but I can't reproduce
>> >> it on x86 so far. I am Cc'ing a few arm64 people to see if they have
>> >> spotted anything I might be missing. The original bug report is here:
>> >>
>> >> https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@fixkernel.com/
>> >
>> > Okay, I am finally able to reproduce this on x86_64 with the latest
>> > mainline as well, by enabling CONFIG_USER_NS and KASAN on top of
>> > defconfig (I had not realized that defconfig does not select
>> > CONFIG_USER_NS in the first place). Anyway, it still took less than
>> > 5 minutes of running:
>> >
>> > $ trinity -C 48
>>
>> It took me a while to get to the point of reproducing this, but I can
>> confirm I see it with a 2-core VM running 5.16.0-rc4, using trinity
>> 2019.06 as packaged in Debian 11.
>
> I still can't reproduce :(
>
>> I didn't watch, so I don't know if it was 5 minutes, but I do know it
>> took less than an hour.
>
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -209,6 +209,7 @@ void put_ucounts(struct ucounts *ucounts)
>
>  	if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
>  		hlist_del_init(&ucounts->node);
> +		ucounts->ns = NULL;
>  		spin_unlock_irqrestore(&ucounts_lock, flags);
>  		kfree(ucounts);
>  	}
>
> Does the hack above increase the likelihood of triggering the error?

It doesn't seem to make a difference. That makes sense, as the kernel
address sanitizer is part of the kernel configuration required to
reproduce the issue.

Eric
end of thread, other threads:[~2021-12-27 15:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17 22:00 BUG: KASAN: use-after-free in dec_rlimit_ucounts Qian Cai
2021-11-18 19:46 ` Eric W. Biederman
2021-11-18 20:32 ` Qian Cai
2021-11-18 20:57 ` Eric W. Biederman
2021-11-19 13:32 ` Qian Cai
2021-11-24 21:49 ` Qian Cai
2021-11-24 21:49 ` Qian Cai
2021-11-26  5:34 ` Qian Cai
2021-11-26  5:34 ` Qian Cai
2021-12-20  5:58 ` Eric W. Biederman
2021-12-20  5:58 ` Eric W. Biederman
2021-12-21 13:09 ` Alexey Gladkov
2021-12-21 13:09 ` Alexey Gladkov
2021-12-27 15:22 ` Eric W. Biederman
2021-12-27 15:22 ` Eric W. Biederman