* KASAN: use-after-free Read in get_mem_cgroup_from_mm
@ 2018-11-07 1:52 syzbot
2018-12-04 15:43 ` syzbot
2019-03-22 9:36 ` syzbot
0 siblings, 2 replies; 26+ messages in thread
From: syzbot @ 2018-11-07 1:52 UTC
To: cgroups, hannes, linux-kernel, linux-mm, mhocko, syzkaller-bugs,
vdavydov.dev
Hello,
syzbot found the following crash on:
HEAD commit: 83650fd58a93 Merge tag 'arm64-upstream' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12ce682b400000
kernel config: https://syzkaller.appspot.com/x/.config?x=9384ecb1c973baed
dashboard link: https://syzkaller.appspot.com/bug?extid=cbb52e396df3e565ab02
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
FAT-fs (loop1): Unrecognized mount option "\a" or missing value
F2FS-fs (loop5): Magic Mismatch, valid(0xf2f52010) - read(0x0)
==================================================================
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:182 [inline]
BUG: KASAN: use-after-free in task_css include/linux/cgroup.h:477 [inline]
BUG: KASAN: use-after-free in mem_cgroup_from_task mm/memcontrol.c:815 [inline]
BUG: KASAN: use-after-free in get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
Read of size 8 at addr ffff8801c635d210 by task syz-executor0/14887
CPU: 0 PID: 14887 Comm: syz-executor0 Not tainted 4.19.0+ #318
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x244/0x39d lib/dump_stack.c:113
print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
__read_once_size include/linux/compiler.h:182 [inline]
task_css include/linux/cgroup.h:477 [inline]
mem_cgroup_from_task mm/memcontrol.c:815 [inline]
get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
get_mem_cgroup_from_mm mm/memcontrol.c:834 [inline]
mem_cgroup_try_charge+0x608/0xe20 mm/memcontrol.c:5888
mcopy_atomic_pte mm/userfaultfd.c:69 [inline]
mfill_atomic_pte mm/userfaultfd.c:385 [inline]
__mcopy_atomic mm/userfaultfd.c:529 [inline]
mcopy_atomic+0xae9/0x2aa0 mm/userfaultfd.c:579
userfaultfd_copy fs/userfaultfd.c:1690 [inline]
userfaultfd_ioctl+0x213d/0x54a0 fs/userfaultfd.c:1836
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f6dd22acc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
RDX: 0000000020000100 RSI: 00000000c028aa03 RDI: 0000000000000004
RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6dd22ad6d4
R13: 00000000004c142b R14: 00000000004d22a8 R15: 00000000ffffffff
Allocated by task 14881:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
alloc_task_struct_node kernel/fork.c:158 [inline]
dup_task_struct kernel/fork.c:843 [inline]
copy_process+0x2026/0x87a0 kernel/fork.c:1751
_do_fork+0x1cb/0x11d0 kernel/fork.c:2216
__do_sys_clone kernel/fork.c:2323 [inline]
__se_sys_clone kernel/fork.c:2317 [inline]
__x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 14881:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x83/0x290 mm/slab.c:3760
free_task_struct kernel/fork.c:163 [inline]
free_task+0x16e/0x1f0 kernel/fork.c:457
copy_process+0x1dcc/0x87a0 kernel/fork.c:2148
_do_fork+0x1cb/0x11d0 kernel/fork.c:2216
__do_sys_clone kernel/fork.c:2323 [inline]
__se_sys_clone kernel/fork.c:2317 [inline]
__x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8801c635c140
which belongs to the cache task_struct(17:syz0) of size 6080
The buggy address is located 4304 bytes inside of
6080-byte region [ffff8801c635c140, ffff8801c635d900)
The buggy address belongs to the page:
page:ffffea000718d700 count:1 mapcount:0 mapping:ffff8801ccef9800 index:0x0 compound_mapcount: 0
flags: 0x2fffc0000010200(slab|head)
raw: 02fffc0000010200 ffffea000573d508 ffffea0006fc6088 ffff8801ccef9800
raw: 0000000000000000 ffff8801c635c140 0000000100000001 ffff880188008ec0
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff880188008ec0
Memory state around the buggy address:
ffff8801c635d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801c635d180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801c635d200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801c635d280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801c635d300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
FAT-fs (loop2): bogus number of FAT structure
FAT-fs (loop2): Can't find a valid FAT filesystem
F2FS-fs (loop5): Can't find valid F2FS filesystem in 1th superblock
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
From: syzbot @ 2018-12-04 15:43 UTC
To: cgroups, hannes, linux-kernel, linux-mm, mhocko, syzkaller-bugs,
vdavydov.dev
syzbot has found a reproducer for the following crash on:
HEAD commit: 0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11c885a3400000
kernel config: https://syzkaller.appspot.com/x/.config?x=b9cc5a440391cbfd
dashboard link: https://syzkaller.appspot.com/bug?extid=cbb52e396df3e565ab02
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12835e25400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
cgroup: fork rejected by pids controller in /syz2
==================================================================
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:182 [inline]
BUG: KASAN: use-after-free in task_css include/linux/cgroup.h:477 [inline]
BUG: KASAN: use-after-free in mem_cgroup_from_task mm/memcontrol.c:815 [inline]
BUG: KASAN: use-after-free in get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
Read of size 8 at addr ffff8881b72af310 by task syz-executor198/9332
CPU: 0 PID: 9332 Comm: syz-executor198 Not tainted 4.20.0-rc5+ #142
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x244/0x39d lib/dump_stack.c:113
print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
__read_once_size include/linux/compiler.h:182 [inline]
task_css include/linux/cgroup.h:477 [inline]
mem_cgroup_from_task mm/memcontrol.c:815 [inline]
get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
get_mem_cgroup_from_mm mm/memcontrol.c:834 [inline]
mem_cgroup_try_charge+0x608/0xe20 mm/memcontrol.c:5888
mcopy_atomic_pte mm/userfaultfd.c:71 [inline]
mfill_atomic_pte mm/userfaultfd.c:418 [inline]
__mcopy_atomic mm/userfaultfd.c:559 [inline]
mcopy_atomic+0xb08/0x2c70 mm/userfaultfd.c:609
userfaultfd_copy fs/userfaultfd.c:1705 [inline]
userfaultfd_ioctl+0x29fb/0x5610 fs/userfaultfd.c:1851
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44c7e9
Code: 5d c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f906b69fdb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000006e4a08 RCX: 000000000044c7e9
RDX: 0000000020000100 RSI: 00000000c028aa03 RDI: 0000000000000004
RBP: 00000000006e4a00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e4a0c
R13: 00007ffdfd47813f R14: 00007f906b6a09c0 R15: 000000000000002d
Allocated by task 9325:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
alloc_task_struct_node kernel/fork.c:158 [inline]
dup_task_struct kernel/fork.c:843 [inline]
copy_process+0x2026/0x87a0 kernel/fork.c:1751
_do_fork+0x1cb/0x11d0 kernel/fork.c:2216
__do_sys_clone kernel/fork.c:2323 [inline]
__se_sys_clone kernel/fork.c:2317 [inline]
__x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 9325:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x83/0x290 mm/slab.c:3760
free_task_struct kernel/fork.c:163 [inline]
free_task+0x16e/0x1f0 kernel/fork.c:457
copy_process+0x1dcc/0x87a0 kernel/fork.c:2148
_do_fork+0x1cb/0x11d0 kernel/fork.c:2216
__do_sys_clone kernel/fork.c:2323 [inline]
__se_sys_clone kernel/fork.c:2317 [inline]
__x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8881b72ae240
which belongs to the cache task_struct(81:syz2) of size 6080
The buggy address is located 4304 bytes inside of
6080-byte region [ffff8881b72ae240, ffff8881b72afa00)
The buggy address belongs to the page:
page:ffffea0006dcab80 count:1 mapcount:0 mapping:ffff8881d2dce0c0 index:0x0 compound_mapcount: 0
flags: 0x2fffc0000010200(slab|head)
raw: 02fffc0000010200 ffffea00074a1f88 ffffea0006ebbb88 ffff8881d2dce0c0
raw: 0000000000000000 ffff8881b72ae240 0000000100000001 ffff8881d87fe580
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff8881d87fe580
Memory state around the buggy address:
ffff8881b72af200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8881b72af280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8881b72af300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8881b72af380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8881b72af400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
From: zhong jiang @ 2019-03-03 16:19 UTC
To: syzbot, mhocko, Andrea Arcangeli
Cc: cgroups, hannes, linux-kernel, linux-mm, syzkaller-bugs, vdavydov.dev,
David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka

Hi, guys

I also hit the following issue, but I failed to reproduce it from the log.

It seems that we access mm->owner, and dereferencing it results in the UAF.
But it should not be possible for an incomplete process to be set as mm->owner.

Any thoughts?

Thanks,
zhong jiang

On 2018/12/4 23:43, syzbot wrote:
> syzbot has found a reproducer for the following crash on:
> [...]
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
From: Dmitry Vyukov @ 2019-03-04 7:40 UTC
To: zhong jiang
Cc: syzbot, Michal Hocko, Andrea Arcangeli, cgroups, Johannes Weiner, LKML,
Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins,
Matthew Wilcox, Mel Gorman, Vlastimil Babka

On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote:
>
> Hi, guys
>
> I also hit the following issue, but I failed to reproduce it from the log.
>
> It seems that we access mm->owner, and dereferencing it results in the UAF.
> But it should not be possible for an incomplete process to be set as mm->owner.
>
> Any thoughts?

FWIW syzbot was able to reproduce this with this reproducer.
This looks like a very subtle race (threaded reproducer that runs
repeatedly in multiple processes), so most likely we are looking for
something like a few-instruction inconsistency window.
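The inference above, that a reproducer which must run many threaded iterations points to a narrow window, can be made quantitative under a simple assumption. Suppose each iteration independently lands in the window with some small probability p. The real value is unknown, and the numbers below are purely illustrative. The chance of at least one crash in n iterations is then:

```latex
P(\text{at least one hit in } n \text{ runs}) = 1 - (1 - p)^{n} \approx 1 - e^{-np},
\qquad p = 10^{-4},\; n = 10^{4} \;\Rightarrow\; 1 - e^{-1} \approx 0.63
```

So with p = 10^-4, a crash becomes more likely than not only after about n = ln 2 / p ≈ 6,900 iterations. This is why a narrow window typically needs a reproducer that loops in several processes.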
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
From: zhong jiang @ 2019-03-04 14:00 UTC
To: Dmitry Vyukov
Cc: syzbot, Michal Hocko, Andrea Arcangeli, cgroups, Johannes Weiner, LKML,
Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins,
Matthew Wilcox, Mel Gorman, Vlastimil Babka

On 2019/3/4 15:40, Dmitry Vyukov wrote:
> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote:
>> Hi, guys
>>
>> I also hit the following issue, but I failed to reproduce it from the log.
>>
>> It seems that we access mm->owner, and dereferencing it results in the UAF.
>> But it should not be possible for an incomplete process to be set as mm->owner.
>>
>> Any thoughts?
> FWIW syzbot was able to reproduce this with this reproducer.
> This looks like a very subtle race (threaded reproducer that runs
> repeatedly in multiple processes), so most likely we are looking for
> something like a few-instruction inconsistency window.
>
I have some doubt about the instruction inconsistency window.

I guess you mean that some smp barriers should be taken into account. :-)
Because IMO, the issue should not be caused by a locking problem.

Thanks,
zhong jiang
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-04 14:00 ` zhong jiang
@ 2019-03-04 14:11 ` Dmitry Vyukov
2019-03-04 15:32 ` zhong jiang
0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Vyukov @ 2019-03-04 14:11 UTC (permalink / raw)
To: zhong jiang
Cc: syzbot, Michal Hocko, Andrea Arcangeli, cgroups, Johannes Weiner,
LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes,
Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka
On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote:
>
> On 2019/3/4 15:40, Dmitry Vyukov wrote:
> > On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote:
> >> Hi, guys
> >>
> >> I also hit the following issue, but I fail to reproduce it from the log.
> >>
> >> It seems that we access mm->owner, and dereferencing it results in the UAF.
> >> But it should not be possible for an incompletely created process to be set as mm->owner.
> >>
> >> Any thoughts?
> > FWIW syzbot was able to reproduce this with this reproducer.
> > This looks like a very subtle race (threaded reproducer that runs
> > repeatedly in multiple processes), so most likely we are looking for
> > something like a few-instruction inconsistency window.
> >
>
> I am a little doubtful about the instruction inconsistency window.
>
> I guess you mean some smp barriers should be taken into account. :-)
>
> Because IMO, the issue should not be caused by a locking problem.

Since the crash was triggered on x86, _most likely_ this is not a
missed barrier. What I meant is that one thread needs to execute some
code while another thread is stopped within a few instructions.

> Thanks,
> zhong jiang
> >> Thanks,
> >> zhong jiang
> >>
> >> On 2018/12/4 23:43, syzbot wrote:
> >>> syzbot has found a reproducer for the following crash on:
> >>>
> >>> HEAD commit: 0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel..
> >>> [...]
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-04 14:11 ` Dmitry Vyukov
@ 2019-03-04 15:32 ` zhong jiang
2019-03-05 6:26 ` Dmitry Vyukov
0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2019-03-04 15:32 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzbot, Michal Hocko, Andrea Arcangeli, cgroups, Johannes Weiner,
LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes,
Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka
On 2019/3/4 22:11, Dmitry Vyukov wrote:
> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote:
>> On 2019/3/4 15:40, Dmitry Vyukov wrote:
>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote:
>>>> Hi, guys
>>>>
>>>> I also hit the following issue, but I fail to reproduce it from the log.
>>>>
>>>> It seems that we access mm->owner, and dereferencing it results in the UAF.
>>>> But it should not be possible for an incompletely created process to be set as mm->owner.
>>>>
>>>> Any thoughts?
>>> FWIW syzbot was able to reproduce this with this reproducer.
>>> This looks like a very subtle race (threaded reproducer that runs
>>> repeatedly in multiple processes), so most likely we are looking for
>>> something like a few-instruction inconsistency window.
>>>
>> I am a little doubtful about the instruction inconsistency window.
>>
>> I guess you mean some smp barriers should be taken into account. :-)
>>
>> Because IMO, the issue should not be caused by a locking problem.
>
> Since the crash was triggered on x86, _most likely_ this is not a
> missed barrier. What I meant is that one thread needs to execute some
> code while another thread is stopped within a few instructions.
>
>
It is weird, and I cannot find any relation between what you said and the issue. :-(

The cause is that mm->owner has been freed, yet we still dereference it.

From the call trace of the freed task, the process creation failed.
Am I missing something, or have I misunderstood your meaning? Please correct me.

Thanks,
zhong jiang
>> Thanks,
>> zhong jiang
>>>> Thanks,
>>>> zhong jiang
>>>>
>>>> On 2018/12/4 23:43, syzbot wrote:
>>>>> syzbot has found a reproducer for the following crash on:
>>>>>
>>>>> HEAD commit: 0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel..
>>>>> [...]
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-04 15:32 ` zhong jiang
@ 2019-03-05 6:26 ` Dmitry Vyukov
2019-03-05 6:42 ` zhong jiang
0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Vyukov @ 2019-03-05 6:26 UTC (permalink / raw)
To: zhong jiang
Cc: syzbot, Michal Hocko, Andrea Arcangeli, cgroups, Johannes Weiner,
LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes,
Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka
On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote:
>
> On 2019/3/4 22:11, Dmitry Vyukov wrote:
> > On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote:
> >> On 2019/3/4 15:40, Dmitry Vyukov wrote:
> >>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote:
> >>>> Hi, guys
> >>>>
> >>>> I also hit the following issue, but I fail to reproduce it from the log.
> >>>>
> >>>> It seems that we access mm->owner, and dereferencing it results in the UAF.
> >>>> But it should not be possible for an incompletely created process to be set as mm->owner.
> >>>>
> >>>> Any thoughts?
> >>> FWIW syzbot was able to reproduce this with this reproducer.
> >>> This looks like a very subtle race (threaded reproducer that runs
> >>> repeatedly in multiple processes), so most likely we are looking for
> >>> something like a few-instruction inconsistency window.
> >>>
> >> I am a little doubtful about the instruction inconsistency window.
> >>
> >> I guess you mean some smp barriers should be taken into account. :-)
> >>
> >> Because IMO, the issue should not be caused by a locking problem.
> >
> > Since the crash was triggered on x86, _most likely_ this is not a
> > missed barrier. What I meant is that one thread needs to execute some
> > code while another thread is stopped within a few instructions.
> >
> >
> It is weird, and I cannot find any relation between what you said and the issue. :-(
>
> The cause is that mm->owner has been freed, yet we still dereference it.
>
> From the call trace of the freed task, the process creation failed.
>
> Am I missing something, or have I misunderstood your meaning? Please correct me.

Your analysis looks correct. I am just saying that the root cause of
this use-after-free seems to be a race condition.

> >>>> On 2018/12/4 23:43, syzbot wrote:
> >>>>> syzbot has found a reproducer for the following crash on:
> >>>>>
> >>>>> HEAD commit: 0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel..
> >>>>> [...]
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-05 6:26 ` Dmitry Vyukov
@ 2019-03-05 6:42 ` zhong jiang
2019-03-06 2:05 ` Andrea Arcangeli
0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2019-03-05 6:42 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzbot, Michal Hocko, Andrea Arcangeli, cgroups, Johannes Weiner,
LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes,
Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka
On 2019/3/5 14:26, Dmitry Vyukov wrote:
> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote:
>> On 2019/3/4 22:11, Dmitry Vyukov wrote:
>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote:
>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote:
>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote:
>>>>>> Hi, guys
>>>>>>
>>>>>> I also hit the following issue, but I fail to reproduce it from the log.
>>>>>>
>>>>>> It seems that we access mm->owner, and dereferencing it results in the UAF.
>>>>>> But it should not be possible for an incompletely created process to be set as mm->owner.
>>>>>>
>>>>>> Any thoughts?
>>>>> FWIW syzbot was able to reproduce this with this reproducer.
>>>>> This looks like a very subtle race (threaded reproducer that runs
>>>>> repeatedly in multiple processes), so most likely we are looking for
>>>>> something like a few-instruction inconsistency window.
>>>>>
>>>> I am a little doubtful about the instruction inconsistency window.
>>>>
>>>> I guess you mean some smp barriers should be taken into account. :-)
>>>>
>>>> Because IMO, the issue should not be caused by a locking problem.
>>> Since the crash was triggered on x86, _most likely_ this is not a
>>> missed barrier. What I meant is that one thread needs to execute some
>>> code while another thread is stopped within a few instructions.
>>>
>>>
>> It is weird, and I cannot find any relation between what you said and the issue. :-(
>>
>> The cause is that mm->owner has been freed, yet we still dereference it.
>>
>> From the call trace of the freed task, the process creation failed.
>>
>> Am I missing something, or have I misunderstood your meaning? Please correct me.
> Your analysis looks correct. I am just saying that the root cause of
> this use-after-free seems to be a race condition.
>
>
>
Yep, indeed. I cannot figure out how the race works yet; I will dig further.

Thanks,
zhong jiang
>
>>>>>> On 2018/12/4 23:43, syzbot wrote:
>>>>>>> syzbot has found a reproducer for the following crash on:
>>>>>>>
>>>>>>> HEAD commit: 0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel..
>>>>>>> git tree: upstream
>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=11c885a3400000
>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=b9cc5a440391cbfd
>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=cbb52e396df3e565ab02
>>>>>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12835e25400000
>>>>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000
>>>>>>>
>>>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>>>>> Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
>>>>>>>
>>>>>>> cgroup: fork rejected by pids controller in /syz2
>>>>>>> ==================================================================
>>>>>>> BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:182 [inline]
>>>>>>> BUG: KASAN: use-after-free in task_css include/linux/cgroup.h:477 [inline]
>>>>>>> BUG: KASAN: use-after-free in mem_cgroup_from_task mm/memcontrol.c:815 [inline]
>>>>>>> BUG: KASAN: use-after-free in get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
>>>>>>> Read of size 8 at addr ffff8881b72af310 by task syz-executor198/9332
>>>>>>> >>>>>>> CPU: 0 PID: 9332 Comm: syz-executor198 Not tainted 4.20.0-rc5+ #142 >>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 >>>>>>> Call Trace: >>>>>>> __dump_stack lib/dump_stack.c:77 [inline] >>>>>>> dump_stack+0x244/0x39d lib/dump_stack.c:113 >>>>>>> print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 >>>>>>> kasan_report_error mm/kasan/report.c:354 [inline] >>>>>>> kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 >>>>>>> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 >>>>>>> __read_once_size include/linux/compiler.h:182 [inline] >>>>>>> task_css include/linux/cgroup.h:477 [inline] >>>>>>> mem_cgroup_from_task mm/memcontrol.c:815 [inline] >>>>>>> get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844 >>>>>>> get_mem_cgroup_from_mm mm/memcontrol.c:834 [inline] >>>>>>> mem_cgroup_try_charge+0x608/0xe20 mm/memcontrol.c:5888 >>>>>>> mcopy_atomic_pte mm/userfaultfd.c:71 [inline] >>>>>>> mfill_atomic_pte mm/userfaultfd.c:418 [inline] >>>>>>> __mcopy_atomic mm/userfaultfd.c:559 [inline] >>>>>>> mcopy_atomic+0xb08/0x2c70 mm/userfaultfd.c:609 >>>>>>> userfaultfd_copy fs/userfaultfd.c:1705 [inline] >>>>>>> userfaultfd_ioctl+0x29fb/0x5610 fs/userfaultfd.c:1851 >>>>>>> vfs_ioctl fs/ioctl.c:46 [inline] >>>>>>> file_ioctl fs/ioctl.c:509 [inline] >>>>>>> do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 >>>>>>> ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713 >>>>>>> __do_sys_ioctl fs/ioctl.c:720 [inline] >>>>>>> __se_sys_ioctl fs/ioctl.c:718 [inline] >>>>>>> __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 >>>>>>> do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 >>>>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>>>>> RIP: 0033:0x44c7e9 >>>>>>> Code: 5d c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00 >>>>>>> RSP: 002b:00007f906b69fdb8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010 >>>>>>> RAX: ffffffffffffffda RBX: 00000000006e4a08 RCX: 000000000044c7e9 >>>>>>> RDX: 0000000020000100 RSI: 00000000c028aa03 RDI: 0000000000000004 >>>>>>> RBP: 00000000006e4a00 R08: 0000000000000000 R09: 0000000000000000 >>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e4a0c >>>>>>> R13: 00007ffdfd47813f R14: 00007f906b6a09c0 R15: 000000000000002d >>>>>>> >>>>>>> Allocated by task 9325: >>>>>>> save_stack+0x43/0xd0 mm/kasan/kasan.c:448 >>>>>>> set_track mm/kasan/kasan.c:460 [inline] >>>>>>> kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 >>>>>>> kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 >>>>>>> kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644 >>>>>>> alloc_task_struct_node kernel/fork.c:158 [inline] >>>>>>> dup_task_struct kernel/fork.c:843 [inline] >>>>>>> copy_process+0x2026/0x87a0 kernel/fork.c:1751 >>>>>>> _do_fork+0x1cb/0x11d0 kernel/fork.c:2216 >>>>>>> __do_sys_clone kernel/fork.c:2323 [inline] >>>>>>> __se_sys_clone kernel/fork.c:2317 [inline] >>>>>>> __x64_sys_clone+0xbf/0x150 kernel/fork.c:2317 >>>>>>> do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 >>>>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>>>>> >>>>>>> Freed by task 9325: >>>>>>> save_stack+0x43/0xd0 mm/kasan/kasan.c:448 >>>>>>> set_track mm/kasan/kasan.c:460 [inline] >>>>>>> __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 >>>>>>> kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 >>>>>>> __cache_free mm/slab.c:3498 [inline] >>>>>>> kmem_cache_free+0x83/0x290 mm/slab.c:3760 >>>>>>> free_task_struct kernel/fork.c:163 [inline] >>>>>>> free_task+0x16e/0x1f0 kernel/fork.c:457 >>>>>>> copy_process+0x1dcc/0x87a0 kernel/fork.c:2148 >>>>>>> _do_fork+0x1cb/0x11d0 kernel/fork.c:2216 >>>>>>> __do_sys_clone kernel/fork.c:2323 [inline] >>>>>>> __se_sys_clone kernel/fork.c:2317 [inline] >>>>>>> __x64_sys_clone+0xbf/0x150 kernel/fork.c:2317 >>>>>>> do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 >>>>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>>>>> >>>>>>> 
The buggy address belongs to the object at ffff8881b72ae240 >>>>>>> which belongs to the cache task_struct(81:syz2) of size 6080 >>>>>>> The buggy address is located 4304 bytes inside of >>>>>>> 6080-byte region [ffff8881b72ae240, ffff8881b72afa00) >>>>>>> The buggy address belongs to the page: >>>>>>> page:ffffea0006dcab80 count:1 mapcount:0 mapping:ffff8881d2dce0c0 index:0x0 compound_mapcount: 0 >>>>>>> flags: 0x2fffc0000010200(slab|head) >>>>>>> raw: 02fffc0000010200 ffffea00074a1f88 ffffea0006ebbb88 ffff8881d2dce0c0 >>>>>>> raw: 0000000000000000 ffff8881b72ae240 0000000100000001 ffff8881d87fe580 >>>>>>> page dumped because: kasan: bad access detected >>>>>>> page->mem_cgroup:ffff8881d87fe580 >>>>>>> >>>>>>> Memory state around the buggy address: >>>>>>> ffff8881b72af200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>>>>> ffff8881b72af280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>>>>>> ffff8881b72af300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>>>>> ^ >>>>>>> ffff8881b72af380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>>>>> ffff8881b72af400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >>>>>>> ================================================================== >>>>>>> >>>>>>> >>>>>>> . >>>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. >>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/5C7BFE94.6070500%40huawei.com. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>> . >>>>> >>> . >>> >> > . >
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-05 6:42 ` zhong jiang @ 2019-03-06 2:05 ` Andrea Arcangeli 2019-03-06 5:53 ` zhong jiang 2019-03-08 7:10 ` zhong jiang 0 siblings, 2 replies; 26+ messages in thread From: Andrea Arcangeli @ 2019-03-06 2:05 UTC (permalink / raw) To: zhong jiang Cc: Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka, Mike Rapoport, Peter Xu Hello everyone, [ CC'ed Mike and Peter ] On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: > On 2019/3/5 14:26, Dmitry Vyukov wrote: > > On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: > >> On 2019/3/4 22:11, Dmitry Vyukov wrote: > >>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: > >>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>> Hi, guys > >>>>>> > >>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. > >>>>>> > >>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. > >>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. > >>>>>> > >>>>>> Any thoughts? > >>>>> FWIW syzbot was able to reproduce this with this reproducer. > >>>>> This looks like a very subtle race (threaded reproducer that runs > >>>>> repeatedly in multiple processes), so most likely we are looking for > >>>>> something like few instructions inconsistency window. > >>>>> > >>>> I has a little doubtful about the instrustions inconsistency window. > >>>> > >>>> I guess that you mean some smb barriers should be taken into account.:-) > >>>> > >>>> Because IMO, It should not be the lock case to result in the issue. > >>> Since the crash was triggered on x86 _most likley_ this is not a > >>> missed barrier. 
What I meant is that one thread needs to executed some
> >>> code, while another thread is stopped within few instructions.
> >>>
> >>>
> >> It is weird and I can not find any relationship you had said with the issue.:-(
> >>
> >> Because It is the cause that mm->owner has been freed, whereas we still deference it.
> >>
> >> From the lastest freed task call trace, It fails to create process.
> >>
> >> Am I miss something or I misunderstand your meaning. Please correct me.
> > Your analysis looks correct. I am just saying that the root cause of
> > this use-after-free seems to be a race condition.
> >
> >
> >
> Yep, Indeed, I can not figure out how the race works. I will dig up further.

Yes it's a race condition.

We were aware that the non-cooperative fork userfaultfd feature
creates userfaultfd file descriptors that get reported to the parent
uffd even though they belong to mms created by failed forks.

https://www.spinics.net/lists/linux-mm/msg136357.html

The fork failure in my testcase happened because a pending signal
interrupted fork after the failed-fork uffd context had already been
pushed to the userfaultfd reader/monitor. CRIU then takes care of
filtering out the failed fork cases, so we didn't want to make the fork
code more complicated just for userfaultfd.

In reality, if MEMCG is enabled at build time, the mm->owner
maintenance code now creates a race condition in the above case with
any fork failure.

I pinged Mike yesterday to ask whether my theory could explain this
bug, and one solution he suggested is to do the userfaultfd_dup at a
point where fork cannot fail anymore. That's precisely what we were
considering back then to avoid reporting failed forks to the
non-cooperative uffd monitor.

That would also solve the false-positive deliveries that the CRIU
manager currently filters out.
From a theoretical standpoint it's also quite strange to allow any uffd
ioctl to run on an otherwise long-gone mm created for a process that in
the end was never even created (the mm got temporarily fully created,
but no child task ever used it). However, that mm is on its way to
exit_mmap as soon as the ioctl returns, and this only ever happens
during race conditions, so given the way the CRIU monitor works there
wasn't anything fundamentally concerning about this detail, remarkably
"strange" as it is. Our priority was to keep the fork code as simple as
possible and keep userfaultfd as non-intrusive as possible.

One alternative solution I'm wondering about for this memcg issue is to
free the task struct with RCU also when fork has failed, and to add the
mm_update_next_owner before mmput. That will still report failed forks
to the uffd monitor, so it's not the ideal fix, but since it's probably
simpler I'm posting it below. Also, I couldn't reproduce the problem
with the testcase here yet.

From 6cbf9d377b705476e5226704422357176f79e32c Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Tue, 5 Mar 2019 19:21:37 -0500
Subject: [PATCH 1/1] userfaultfd: use RCU to free the task struct when fork
 fails if MEMCG

MEMCG depends on the task structure not being freed under
rcu_read_lock() in get_mem_cgroup_from_mm() after it dereferences
mm->owner.

A better fix would be to avoid registering forked vmas in userfaultfd
contexts reported to the monitor, in case fork ends up failing.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 kernel/fork.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index eb9953c82104..3bcbb361ffbc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -953,6 +953,15 @@ static void mm_init_aio(struct mm_struct *mm)
 #endif
 }
 
+static __always_inline void mm_clear_owner(struct mm_struct *mm,
+					   struct task_struct *p)
+{
+#ifdef CONFIG_MEMCG
+	if (mm->owner == p)
+		mm->owner = NULL;
+#endif
+}
+
 static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 {
 #ifdef CONFIG_MEMCG
@@ -1345,6 +1354,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)
 free_pt:
 	/* don't put binfmt in mmput, we haven't got module yet */
 	mm->binfmt = NULL;
+	mm_init_owner(mm, NULL);
 	mmput(mm);
 
 fail_nomem:
@@ -1676,6 +1686,24 @@ static inline void rcu_copy_process(struct task_struct *p)
 #endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
+#ifdef CONFIG_MEMCG
+static void __delayed_free_task(struct rcu_head *rhp)
+{
+	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
+
+	free_task(tsk);
+}
+#endif /* CONFIG_MEMCG */
+
+static __always_inline void delayed_free_task(struct task_struct *tsk)
+{
+#ifdef CONFIG_MEMCG
+	call_rcu(&tsk->rcu, __delayed_free_task);
+#else /* CONFIG_MEMCG */
+	free_task(tsk);
+#endif /* CONFIG_MEMCG */
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -2137,8 +2165,10 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
-	if (p->mm)
+	if (p->mm) {
+		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
+	}
 bad_fork_cleanup_signal:
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
@@ -2169,7 +2199,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_free:
 	p->state = TASK_DEAD;
 	put_task_stack(p);
-	free_task(p);
+	delayed_free_task(p);
 fork_out:
 	spin_lock_irq(&current->sighand->siglock);
 	hlist_del_init(&delayed.node);
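For readers less familiar with the call_rcu() pattern that delayed_free_task() relies on, here is a minimal, single-threaded toy model of the ordering guarantee: an object queued for deferred freeing while a reader is inside a read-side section is only freed after that reader leaves. All toy_* names are hypothetical and this is not the kernel's RCU; a real grace period waits for pre-existing readers on every CPU rather than for a global counter to drop to zero, and free_task() is modeled by setting a flag so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy, single-threaded model of RCU-deferred freeing (NOT the kernel's RCU):
 * a global reader count stands in for rcu_read_lock()/rcu_read_unlock(),
 * and queued callbacks run only once no reader is inside a read-side section.
 */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

static int toy_readers;                  /* readers inside a read-side section */
static struct toy_rcu_head *toy_pending; /* callbacks waiting for a "grace period" */

static void toy_run_callbacks_if_quiescent(void)
{
	if (toy_readers != 0)
		return;
	while (toy_pending) {
		struct toy_rcu_head *h = toy_pending;

		toy_pending = h->next;
		h->func(h);
	}
}

static void toy_read_lock(void)
{
	toy_readers++;
}

static void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	toy_readers--;
	toy_run_callbacks_if_quiescent(); /* last reader out ends the grace period */
}

/* Analogue of call_rcu(): defer func until no reader can still hold a pointer. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
	toy_run_callbacks_if_quiescent();
}

/* Hypothetical stand-in for task_struct with an embedded rcu head. */
struct toy_task {
	bool freed; /* set instead of calling free() so the sketch can observe it */
	struct toy_rcu_head rcu;
};

/* Analogue of __delayed_free_task(): recover the task from its rcu head. */
static void toy_delayed_free_task(struct toy_rcu_head *rhp)
{
	struct toy_task *tsk =
		(struct toy_task *)((char *)rhp - offsetof(struct toy_task, rcu));

	tsk->freed = true;
}
```

In the toy, a reader that entered toy_read_lock() before the failed fork queued the task keeps that task alive until toy_read_unlock(). That mirrors why get_mem_cgroup_from_mm(), which dereferences mm->owner under rcu_read_lock(), should no longer touch freed memory once the failed-fork path uses delayed_free_task() instead of free_task().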
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-06 2:05 ` Andrea Arcangeli @ 2019-03-06 5:53 ` zhong jiang 2019-03-06 6:26 ` Mike Rapoport 2019-03-08 7:10 ` zhong jiang 1 sibling, 1 reply; 26+ messages in thread From: zhong jiang @ 2019-03-06 5:53 UTC (permalink / raw) To: Andrea Arcangeli Cc: Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka, Mike Rapoport, Peter Xu On 2019/3/6 10:05, Andrea Arcangeli wrote: > Hello everyone, > > [ CC'ed Mike and Peter ] > > On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: >> On 2019/3/5 14:26, Dmitry Vyukov wrote: >>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>> On 2019/3/4 22:11, Dmitry Vyukov wrote: >>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: >>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>>>> Hi, guys >>>>>>>> >>>>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. >>>>>>>> >>>>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. >>>>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. >>>>>>>> >>>>>>>> Any thoughts? >>>>>>> FWIW syzbot was able to reproduce this with this reproducer. >>>>>>> This looks like a very subtle race (threaded reproducer that runs >>>>>>> repeatedly in multiple processes), so most likely we are looking for >>>>>>> something like few instructions inconsistency window. >>>>>>> >>>>>> I has a little doubtful about the instrustions inconsistency window. >>>>>> >>>>>> I guess that you mean some smb barriers should be taken into account.:-) >>>>>> >>>>>> Because IMO, It should not be the lock case to result in the issue. 
>>>>> Since the crash was triggered on x86 _most likley_ this is not a >>>>> missed barrier. What I meant is that one thread needs to executed some >>>>> code, while another thread is stopped within few instructions. >>>>> >>>>> >>>> It is weird and I can not find any relationship you had said with the issue.:-( >>>> >>>> Because It is the cause that mm->owner has been freed, whereas we still deference it. >>>> >>>> From the lastest freed task call trace, It fails to create process. >>>> >>>> Am I miss something or I misunderstand your meaning. Please correct me. >>> Your analysis looks correct. I am just saying that the root cause of >>> this use-after-free seems to be a race condition. >>> >>> >>> >> Yep, Indeed, I can not figure out how the race works. I will dig up further. > Yes it's a race condition. > > We were aware about the non-cooperative fork userfaultfd feature > creating userfaultfd file descriptor that gets reported to the parent > uffd, despite they belong to mm created by failed forks. > > https://www.spinics.net/lists/linux-mm/msg136357.html > > The fork failure in my testcase happened because of signal pending > that interrupted fork after the failed-fork uffd context, was already > pushed to the userfaultfd reader/monitor. CRIU then takes care of > filtering the failed fork cases so we didn't want to make the fork > code more complicated just for userfaultfd. > > In reality if MEMCG is enabled at build time, mm->owner maintainance > code now creates a race condition in the above case, with any fork > failure. > > I pinged Mike yesterday to ask if my theory could be true for this bug > and one solution he suggested is to do the userfaultfd_dup at a point > where fork cannot fail anymore. That's precisely what we were > wondering to do back then to avoid the failed fork reports to the > non cooperative uffd monitor. > > That will solve the false positive deliveries that CRIU manager > currently filters out too. 
From a theoretical standpoint it's also
> quite strange to even allow any uffd ioctl to run on a otherwise long
> gone mm created for a process that in the end wasn't even created (the
> mm got temporarily fully created, but no child task really ever used
> such mm). However that mm is on its way to exit_mmap as soon as the
> ioclt returns and this only ever happens during race conditions, so
> the way CRIU monitor works there wasn't anything fundamentally
> concerning about this detail, despite it's remarkably "strange". Our
> priority was to keep the fork code as simple as possible and keep
> userfaultfd as non intrusive as possible.

Hi, Andrea

I am still not clear why the uffd ioctl can use the incomplete process as
the mm->owner, or how to produce the race.

From your explanation above, my understanding is that the process handling
do_execve will have a temporary mm, which will be used by the UFFD ioctl.

Thanks,
zhong jiang

> One alternative solution I'm wondering about for this memcg issue is
> to free the task struct with RCU also when fork has failed and to add
> the mm_update_next_owner before mmput. That will still report failed
> forks to the uffd monitor, so it's not the ideal fix, but since it's
> probably simpler I'm posting it below. Also I couldn't reproduce the
> problem with the testcase here yet.
>
> >From 6cbf9d377b705476e5226704422357176f79e32c Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Tue, 5 Mar 2019 19:21:37 -0500
> Subject: [PATCH 1/1] userfaultfd: use RCU to free the task struct when fork
> fails if MEMCG
>
> MEMCG depends on the task structure not to be freed under
> rcu_read_lock() in get_mem_cgroup_from_mm() after it dereferences
> mm->owner.
>
> A better fix would be to avoid registering forked vmas in userfaultfd
> contexts reported to the monitor, if case fork ends up failing.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  kernel/fork.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index eb9953c82104..3bcbb361ffbc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -953,6 +953,15 @@ static void mm_init_aio(struct mm_struct *mm)
>  #endif
>  }
>
> +static __always_inline void mm_clear_owner(struct mm_struct *mm,
> +					   struct task_struct *p)
> +{
> +#ifdef CONFIG_MEMCG
> +	if (mm->owner == p)
> +		mm->owner = NULL;
> +#endif
> +}
> +
>  static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
>  {
>  #ifdef CONFIG_MEMCG
> @@ -1345,6 +1354,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)
>  free_pt:
>  	/* don't put binfmt in mmput, we haven't got module yet */
>  	mm->binfmt = NULL;
> +	mm_init_owner(mm, NULL);
>  	mmput(mm);
>
>  fail_nomem:
> @@ -1676,6 +1686,24 @@ static inline void rcu_copy_process(struct task_struct *p)
>  #endif /* #ifdef CONFIG_TASKS_RCU */
>  }
>
> +#ifdef CONFIG_MEMCG
> +static void __delayed_free_task(struct rcu_head *rhp)
> +{
> +	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
> +
> +	free_task(tsk);
> +}
> +#endif /* CONFIG_MEMCG */
> +
> +static __always_inline void delayed_free_task(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_MEMCG
> +	call_rcu(&tsk->rcu, __delayed_free_task);
> +#else /* CONFIG_MEMCG */
> +	free_task(tsk);
> +#endif /* CONFIG_MEMCG */
> +}
> +
>  /*
>   * This creates a new process as a copy of the old one,
>   * but does not actually start it yet.
> @@ -2137,8 +2165,10 @@ static __latent_entropy struct task_struct *copy_process(
>  bad_fork_cleanup_namespaces:
>  	exit_task_namespaces(p);
>  bad_fork_cleanup_mm:
> -	if (p->mm)
> +	if (p->mm) {
> +		mm_clear_owner(p->mm, p);
>  		mmput(p->mm);
> +	}
>  bad_fork_cleanup_signal:
>  	if (!(clone_flags & CLONE_THREAD))
>  		free_signal_struct(p->signal);
> @@ -2169,7 +2199,7 @@ static __latent_entropy struct task_struct *copy_process(
>  bad_fork_free:
>  	p->state = TASK_DEAD;
>  	put_task_stack(p);
> -	free_task(p);
> +	delayed_free_task(p);
>  fork_out:
>  	spin_lock_irq(&current->sighand->siglock);
>  	hlist_del_init(&delayed.node);
>
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-06 5:53 ` zhong jiang @ 2019-03-06 6:26 ` Mike Rapoport 2019-03-06 7:41 ` zhong jiang 0 siblings, 1 reply; 26+ messages in thread From: Mike Rapoport @ 2019-03-06 6:26 UTC (permalink / raw) To: zhong jiang Cc: Andrea Arcangeli, Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka, Mike Rapoport, Peter Xu Hi, On Wed, Mar 06, 2019 at 01:53:12PM +0800, zhong jiang wrote: > On 2019/3/6 10:05, Andrea Arcangeli wrote: > > Hello everyone, > > > > [ CC'ed Mike and Peter ] > > > > On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: > >> On 2019/3/5 14:26, Dmitry Vyukov wrote: > >>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>> On 2019/3/4 22:11, Dmitry Vyukov wrote: > >>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: > >>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>>>> Hi, guys > >>>>>>>> > >>>>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. > >>>>>>>> > >>>>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. > >>>>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. > >>>>>>>> > >>>>>>>> Any thoughts? > >>>>>>> FWIW syzbot was able to reproduce this with this reproducer. > >>>>>>> This looks like a very subtle race (threaded reproducer that runs > >>>>>>> repeatedly in multiple processes), so most likely we are looking for > >>>>>>> something like few instructions inconsistency window. > >>>>>>> > >>>>>> I has a little doubtful about the instrustions inconsistency window. 
> >>>>>> > >>>>>> I guess that you mean some smb barriers should be taken into account.:-) > >>>>>> > >>>>>> Because IMO, It should not be the lock case to result in the issue. > >>>>> Since the crash was triggered on x86 _most likley_ this is not a > >>>>> missed barrier. What I meant is that one thread needs to executed some > >>>>> code, while another thread is stopped within few instructions. > >>>>> > >>>>> > >>>> It is weird and I can not find any relationship you had said with the issue.:-( > >>>> > >>>> Because It is the cause that mm->owner has been freed, whereas we still deference it. > >>>> > >>>> From the lastest freed task call trace, It fails to create process. > >>>> > >>>> Am I miss something or I misunderstand your meaning. Please correct me. > >>> Your analysis looks correct. I am just saying that the root cause of > >>> this use-after-free seems to be a race condition. > >>> > >>> > >>> > >> Yep, Indeed, I can not figure out how the race works. I will dig up further. > > Yes it's a race condition. > > > > We were aware about the non-cooperative fork userfaultfd feature > > creating userfaultfd file descriptor that gets reported to the parent > > uffd, despite they belong to mm created by failed forks. > > > > https://www.spinics.net/lists/linux-mm/msg136357.html > > > > Hi, Andrea > > I still not clear why uffd ioctl can use the incomplete process as the mm->owner. > and how to produce the race.

There is a C reproducer in the syzkaller report:

https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000

> From your above explainations, My underdtanding is that the process handling do_exexve > will have a temporary mm, which will be used by the UUFD ioctl.
The race is between userfaultfd operation and fork() failure:

forking thread                  | userfaultfd monitor thread
--------------------------------+-------------------------------
fork()                          |
  dup_mmap()                    |
    dup_userfaultfd()           |
    dup_userfaultfd_complete()  |
                                |  read(UFFD_EVENT_FORK)
                                |  uffdio_copy()
                                |    mmget_not_zero()
goto bad_fork_something         |
...                             |
bad_fork_free:                  |
  free_task()                   |
                                |  mem_cgroup_from_task()
                                |    /* access stale mm->owner */

> Thanks,
> zhong jiang

--
Sincerely yours,
Mike.
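The interleaving in the diagram above can be replayed deterministically in a few lines. The sketch below is a hypothetical, heavily simplified model (the toy_* types are not kernel structures): the forking side's error path marks the child task freed, and the monitor side then looks up the memcg through mm->owner. It assumes, as we read v4.20-era mm/memcontrol.c, that a NULL owner makes get_mem_cgroup_from_mm() fall back to the root memcg. Passing with_fix applies the mm_clear_owner() step from Andrea's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for task_struct/mm_struct. */
struct toy_task {
	bool freed;   /* set instead of kfree() so a stale access is observable */
	int memcg_id; /* what task_css() would ultimately yield */
};

struct toy_mm {
	struct toy_task *owner; /* mm->owner under CONFIG_MEMCG */
};

#define TOY_ROOT_MEMCG 0

/* Result of the monitor's lookup; use_after_free records a stale dereference. */
struct toy_lookup {
	int memcg_id;
	bool use_after_free;
};

/*
 * Monitor side, loosely modeled on get_mem_cgroup_from_mm() ->
 * mem_cgroup_from_task(mm->owner), with a root-memcg fallback for NULL.
 */
static struct toy_lookup toy_get_memcg_from_mm(const struct toy_mm *mm)
{
	struct toy_lookup r = { TOY_ROOT_MEMCG, false };
	struct toy_task *owner;

	assert(mm != NULL);
	owner = mm->owner;
	if (owner == NULL)
		return r;                /* falls back to the root memcg */
	r.use_after_free = owner->freed; /* KASAN would report this read */
	r.memcg_id = owner->memcg_id;
	return r;
}

/*
 * Forking side: the error path, reached after dup_userfaultfd_complete()
 * has already handed the child's mm to the monitor.
 */
static void toy_fork_fails(struct toy_mm *child_mm, struct toy_task *child,
			   bool with_fix)
{
	if (with_fix && child_mm->owner == child)
		child_mm->owner = NULL; /* mm_clear_owner() */
	child->freed = true;            /* free_task() */
}
```

This replay only covers the ordering drawn in the diagram, where the monitor reads mm->owner after the failed fork has cleaned up. The RCU-delayed free in the patch is what covers the remaining window, where the monitor loads mm->owner just before it is cleared and dereferences it afterwards.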
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-06 6:26 ` Mike Rapoport @ 2019-03-06 7:41 ` zhong jiang 2019-03-06 8:12 ` Peter Xu 2019-03-06 8:20 ` Mike Rapoport 0 siblings, 2 replies; 26+ messages in thread From: zhong jiang @ 2019-03-06 7:41 UTC (permalink / raw) To: Mike Rapoport Cc: Andrea Arcangeli, Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka, Mike Rapoport, Peter Xu On 2019/3/6 14:26, Mike Rapoport wrote: > Hi, > > On Wed, Mar 06, 2019 at 01:53:12PM +0800, zhong jiang wrote: >> On 2019/3/6 10:05, Andrea Arcangeli wrote: >>> Hello everyone, >>> >>> [ CC'ed Mike and Peter ] >>> >>> On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: >>>> On 2019/3/5 14:26, Dmitry Vyukov wrote: >>>>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>> On 2019/3/4 22:11, Dmitry Vyukov wrote: >>>>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: >>>>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>>>>>> Hi, guys >>>>>>>>>> >>>>>>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. >>>>>>>>>> >>>>>>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. >>>>>>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. >>>>>>>>>> >>>>>>>>>> Any thoughts? >>>>>>>>> FWIW syzbot was able to reproduce this with this reproducer. >>>>>>>>> This looks like a very subtle race (threaded reproducer that runs >>>>>>>>> repeatedly in multiple processes), so most likely we are looking for >>>>>>>>> something like few instructions inconsistency window. >>>>>>>>> >>>>>>>> I has a little doubtful about the instrustions inconsistency window. 
>>>>>>>> >>>>>>>> I guess that you mean some smb barriers should be taken into account.:-) >>>>>>>> >>>>>>>> Because IMO, It should not be the lock case to result in the issue. >>>>>>> Since the crash was triggered on x86 _most likley_ this is not a >>>>>>> missed barrier. What I meant is that one thread needs to executed some >>>>>>> code, while another thread is stopped within few instructions. >>>>>>> >>>>>>> >>>>>> It is weird and I can not find any relationship you had said with the issue.:-( >>>>>> >>>>>> Because It is the cause that mm->owner has been freed, whereas we still deference it. >>>>>> >>>>>> From the lastest freed task call trace, It fails to create process. >>>>>> >>>>>> Am I miss something or I misunderstand your meaning. Please correct me. >>>>> Your analysis looks correct. I am just saying that the root cause of >>>>> this use-after-free seems to be a race condition. >>>>> >>>>> >>>>> >>>> Yep, Indeed, I can not figure out how the race works. I will dig up further. >>> Yes it's a race condition. >>> >>> We were aware about the non-cooperative fork userfaultfd feature >>> creating userfaultfd file descriptor that gets reported to the parent >>> uffd, despite they belong to mm created by failed forks. >>> >>> https://www.spinics.net/lists/linux-mm/msg136357.html >>> >> Hi, Andrea >> >> I still not clear why uffd ioctl can use the incomplete process as the mm->owner. >> and how to produce the race. > There is a C reproducer in the syzcaller report: > > https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000 > >> From your above explainations, My underdtanding is that the process handling do_exexve >> will have a temporary mm, which will be used by the UUFD ioctl. 
> The race is between userfaultfd operation and fork() failure:
>
> forking thread                  | userfaultfd monitor thread
> --------------------------------+-------------------------------
> fork()                          |
>   dup_mmap()                    |
>     dup_userfaultfd()           |
>     dup_userfaultfd_complete()  |
>                                 |  read(UFFD_EVENT_FORK)
>                                 |  uffdio_copy()
>                                 |    mmget_not_zero()
> goto bad_fork_something         |
> ...                             |
> bad_fork_free:                  |
>   free_task()                   |
>                                 |  mem_cgroup_from_task()
>                                 |    /* access stale mm->owner */
>
Hi, Mike

The forking thread fails to create the process, and then frees the allocated task struct.
The userfaultfd monitor thread should not access the stale mm->owner.

The parent process and the child process do not share the mm struct, so the userfaultfd
monitor thread's mm->owner should not point to the freed child task_struct.

And because of tasklist_lock, mm->owner cannot be set to a freed task_struct.

Am I missing something? =-O

Thanks,
zhong jiang
>> Thanks,
>> zhong jiang
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-06 7:41 ` zhong jiang @ 2019-03-06 8:12 ` Peter Xu 2019-03-06 13:07 ` zhong jiang 2019-03-06 8:20 ` Mike Rapoport 1 sibling, 1 reply; 26+ messages in thread From: Peter Xu @ 2019-03-06 8:12 UTC (permalink / raw) To: zhong jiang Cc: Mike Rapoport, Andrea Arcangeli, Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka, Mike Rapoport On Wed, Mar 06, 2019 at 03:41:06PM +0800, zhong jiang wrote: > On 2019/3/6 14:26, Mike Rapoport wrote: > > Hi, > > > > On Wed, Mar 06, 2019 at 01:53:12PM +0800, zhong jiang wrote: > >> On 2019/3/6 10:05, Andrea Arcangeli wrote: > >>> Hello everyone, > >>> > >>> [ CC'ed Mike and Peter ] > >>> > >>> On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: > >>>> On 2019/3/5 14:26, Dmitry Vyukov wrote: > >>>>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>> On 2019/3/4 22:11, Dmitry Vyukov wrote: > >>>>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: > >>>>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>>>>>> Hi, guys > >>>>>>>>>> > >>>>>>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. > >>>>>>>>>> > >>>>>>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. > >>>>>>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. > >>>>>>>>>> > >>>>>>>>>> Any thoughts? > >>>>>>>>> FWIW syzbot was able to reproduce this with this reproducer. > >>>>>>>>> This looks like a very subtle race (threaded reproducer that runs > >>>>>>>>> repeatedly in multiple processes), so most likely we are looking for > >>>>>>>>> something like few instructions inconsistency window. 
> >>>>>>>>> > >>>>>>>> I has a little doubtful about the instrustions inconsistency window. > >>>>>>>> > >>>>>>>> I guess that you mean some smb barriers should be taken into account.:-) > >>>>>>>> > >>>>>>>> Because IMO, It should not be the lock case to result in the issue. > >>>>>>> Since the crash was triggered on x86 _most likley_ this is not a > >>>>>>> missed barrier. What I meant is that one thread needs to executed some > >>>>>>> code, while another thread is stopped within few instructions. > >>>>>>> > >>>>>>> > >>>>>> It is weird and I can not find any relationship you had said with the issue.:-( > >>>>>> > >>>>>> Because It is the cause that mm->owner has been freed, whereas we still deference it. > >>>>>> > >>>>>> From the lastest freed task call trace, It fails to create process. > >>>>>> > >>>>>> Am I miss something or I misunderstand your meaning. Please correct me. > >>>>> Your analysis looks correct. I am just saying that the root cause of > >>>>> this use-after-free seems to be a race condition. > >>>>> > >>>>> > >>>>> > >>>> Yep, Indeed, I can not figure out how the race works. I will dig up further. > >>> Yes it's a race condition. > >>> > >>> We were aware about the non-cooperative fork userfaultfd feature > >>> creating userfaultfd file descriptor that gets reported to the parent > >>> uffd, despite they belong to mm created by failed forks. > >>> > >>> https://www.spinics.net/lists/linux-mm/msg136357.html > >>> > >> Hi, Andrea > >> > >> I still not clear why uffd ioctl can use the incomplete process as the mm->owner. > >> and how to produce the race. > > There is a C reproducer in the syzcaller report: > > > > https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000 > > > >> From your above explainations, My underdtanding is that the process handling do_exexve > >> will have a temporary mm, which will be used by the UUFD ioctl. 
> > The race is between userfaultfd operation and fork() failure:
> >
> > forking thread                  | userfaultfd monitor thread
> > --------------------------------+-------------------------------
> > fork()                          |
> > dup_mmap()                      |
> > dup_userfaultfd()               |
> > dup_userfaultfd_complete()      |
> >                                 | read(UFFD_EVENT_FORK)
> >                                 | uffdio_copy()
> >                                 | mmget_not_zero()
> > goto bad_fork_something         |
> > ...                             |
> > bad_fork_free:                  |
> > free_task()                     |
> >                                 | mem_cgroup_from_task()
> >                                 | /* access stale mm->owner */
> >
> Hi, Mike

Hi, Zhong,

>
> forking thread fails to create the process ,and then free the allocated task struct.
> Other userfaultfd monitor thread should not access the stale mm->owner.
>
> The parent process and child process do not share the mm struct. Userfaultfd monitor thread's
> mm->owner should not point to the freed child task_struct.

IIUC the problem is that the above mm (of the mm->owner) is the child
process's mm rather than the uffd monitor's. When
dup_userfaultfd_complete() is called, a new userfaultfd context linked
to the child process's mm is sent to the uffd monitor thread, and if
the monitor thread does UFFDIO_COPY on the newly received userfaultfd,
it will operate on that new mm too.

>
> and due to the existence of tasklist_lock, we can not specify the mm->owner to freed task_struct.
>
> I miss something,=-O
>
> Thanks,
> zhong jiang
> >> Thanks,
> >> zhong jiang
>

Regards,

-- 
Peter Xu
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-06 8:12 ` Peter Xu @ 2019-03-06 13:07 ` zhong jiang 2019-03-06 18:29 ` Andrea Arcangeli 0 siblings, 1 reply; 26+ messages in thread From: zhong jiang @ 2019-03-06 13:07 UTC (permalink / raw) To: Peter Xu, Mike Rapoport, Andrea Arcangeli Cc: Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka, Mike Rapoport On 2019/3/6 16:12, Peter Xu wrote: > On Wed, Mar 06, 2019 at 03:41:06PM +0800, zhong jiang wrote: >> On 2019/3/6 14:26, Mike Rapoport wrote: >>> Hi, >>> >>> On Wed, Mar 06, 2019 at 01:53:12PM +0800, zhong jiang wrote: >>>> On 2019/3/6 10:05, Andrea Arcangeli wrote: >>>>> Hello everyone, >>>>> >>>>> [ CC'ed Mike and Peter ] >>>>> >>>>> On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: >>>>>> On 2019/3/5 14:26, Dmitry Vyukov wrote: >>>>>>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>>>> On 2019/3/4 22:11, Dmitry Vyukov wrote: >>>>>>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: >>>>>>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>>>>>>>> Hi, guys >>>>>>>>>>>> >>>>>>>>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. >>>>>>>>>>>> >>>>>>>>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. >>>>>>>>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. >>>>>>>>>>>> >>>>>>>>>>>> Any thoughts? >>>>>>>>>>> FWIW syzbot was able to reproduce this with this reproducer. >>>>>>>>>>> This looks like a very subtle race (threaded reproducer that runs >>>>>>>>>>> repeatedly in multiple processes), so most likely we are looking for >>>>>>>>>>> something like few instructions inconsistency window. 
>>>>>>>>>>> >>>>>>>>>> I has a little doubtful about the instrustions inconsistency window. >>>>>>>>>> >>>>>>>>>> I guess that you mean some smb barriers should be taken into account.:-) >>>>>>>>>> >>>>>>>>>> Because IMO, It should not be the lock case to result in the issue. >>>>>>>>> Since the crash was triggered on x86 _most likley_ this is not a >>>>>>>>> missed barrier. What I meant is that one thread needs to executed some >>>>>>>>> code, while another thread is stopped within few instructions. >>>>>>>>> >>>>>>>>> >>>>>>>> It is weird and I can not find any relationship you had said with the issue.:-( >>>>>>>> >>>>>>>> Because It is the cause that mm->owner has been freed, whereas we still deference it. >>>>>>>> >>>>>>>> From the lastest freed task call trace, It fails to create process. >>>>>>>> >>>>>>>> Am I miss something or I misunderstand your meaning. Please correct me. >>>>>>> Your analysis looks correct. I am just saying that the root cause of >>>>>>> this use-after-free seems to be a race condition. >>>>>>> >>>>>>> >>>>>>> >>>>>> Yep, Indeed, I can not figure out how the race works. I will dig up further. >>>>> Yes it's a race condition. >>>>> >>>>> We were aware about the non-cooperative fork userfaultfd feature >>>>> creating userfaultfd file descriptor that gets reported to the parent >>>>> uffd, despite they belong to mm created by failed forks. >>>>> >>>>> https://www.spinics.net/lists/linux-mm/msg136357.html >>>>> >>>> Hi, Andrea >>>> >>>> I still not clear why uffd ioctl can use the incomplete process as the mm->owner. >>>> and how to produce the race. >>> There is a C reproducer in the syzcaller report: >>> >>> https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000 >>> >>>> From your above explainations, My underdtanding is that the process handling do_exexve >>>> will have a temporary mm, which will be used by the UUFD ioctl. 
>>> The race is between userfaultfd operation and fork() failure:
>>>
>>> forking thread                  | userfaultfd monitor thread
>>> --------------------------------+-------------------------------
>>> fork()                          |
>>> dup_mmap()                      |
>>> dup_userfaultfd()               |
>>> dup_userfaultfd_complete()      |
>>>                                 | read(UFFD_EVENT_FORK)
>>>                                 | uffdio_copy()
>>>                                 | mmget_not_zero()
>>> goto bad_fork_something         |
>>> ...                             |
>>> bad_fork_free:                  |
>>> free_task()                     |
>>>                                 | mem_cgroup_from_task()
>>>                                 | /* access stale mm->owner */
>>>
>> Hi, Mike
> Hi, Zhong,
>
>> forking thread fails to create the process ,and then free the allocated task struct.
>> Other userfaultfd monitor thread should not access the stale mm->owner.
>>
>> The parent process and child process do not share the mm struct. Userfaultfd monitor thread's
>> mm->owner should not point to the freed child task_struct.
> IIUC the problem is that above mm (of the mm->owner) is the child
> process's mm rather than the uffd monitor's. When
> dup_userfaultfd_complete() is called there will be a new userfaultfd
> context sent to the uffd monitor thread which linked to the chlid
> process's mm, and if the monitor thread do UFFDIO_COPY upon the newly
> received userfaultfd it'll operate on that new mm too.

Thanks Mike and Peter for the further explanation. I get it. Yes, the race indeed results in the issue.

But as for the patch Andrea has posted, I still have a little worry.
The patch uses call_rcu to delay freeing the task_struct, but it is possible to free the task_struct
ahead of get_mem_cgroup_from_mm. Is that right?

Thanks,
zhong jiang
>> and due to the existence of tasklist_lock, we can not specify the mm->owner to freed task_struct.
>>
>> I miss something,=-O
>>
>> Thanks,
>> zhong jiang
>>>> Thanks,
>>>> zhong jiang
>>
> Regards,
>
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-06 13:07 ` zhong jiang
@ 2019-03-06 18:29 ` Andrea Arcangeli
2019-03-07 7:58 ` zhong jiang
0 siblings, 1 reply; 26+ messages in thread
From: Andrea Arcangeli @ 2019-03-06 18:29 UTC (permalink / raw)
To: zhong jiang
Cc: Peter Xu, Mike Rapoport, Dmitry Vyukov, syzbot, Michal Hocko,
cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs,
Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox,
Mel Gorman, Vlastimil Babka, Mike Rapoport

Hello Zhong,

On Wed, Mar 06, 2019 at 09:07:00PM +0800, zhong jiang wrote:
> The patch use call_rcu to delay free the task_struct, but It is possible to free the task_struct
> ahead of get_mem_cgroup_from_mm. is it right?

Yes it is possible to free before get_mem_cgroup_from_mm, but if it's
freed before get_mem_cgroup_from_mm rcu_read_lock,
rcu_dereference(mm->owner) will return NULL in such case and there
will be no problem.

The simple fix also clears the mm->owner of the failed-fork-mm before
doing the call_rcu. The call_rcu delays the freeing until no other CPU
runs in between rcu_read_lock/unlock anymore. That guarantees that
those critical sections will see mm->owner == NULL if the freeing of
the task struct already happened.

The solution Mike suggested for this, and that we were wondering about
as ideal in the past for the signal issue too, is to move the uffd
delivery to a point where fork is guaranteed to succeed. We should
probably try that too to see what it looks like and if it can be done
in a non-intrusive way, but the simple fix that uses RCU should work
too.

Rolling back in case of errors inside fork itself isn't easily doable:
the moment we push the uffd ctx to the other side of the uffd pipe
there's no coming back as that information can reach the userland of
the uffd monitor/reader thread immediately after. The rolling back is
really the other thread failing at mmget_not_zero eventually. It's the
userland that has to roll back in such case when it gets a -ESRCH
retval.

Note that this fork feature is only ever needed in the non-cooperative
case; these things never need to happen when userfaultfd is used by an
app (or a lib) that is aware that it is using userfaultfd.

Thanks,
Andrea
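Andrea's point that the rollback happens in userland, when UFFDIO_COPY on a forked child's uffd fails with ESRCH, could look roughly like the following on the monitor side. This is a hypothetical sketch: struct toy_child_ctx, enum toy_verdict and toy_after_copy() are made-up names, and the ioctl() call itself is left to the caller; the helper only decides what to do with its return value and errno.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical monitor-side record for a uffd received via UFFD_EVENT_FORK. */
struct toy_child_ctx {
	int uffd;	/* descriptor delivered with the fork event */
	bool live;	/* does the monitor still track this child? */
};

enum toy_verdict {
	TOY_KEEP,		/* copy succeeded, keep serving this child */
	TOY_FORGET_CHILD,	/* target mm is gone: roll back userland state */
	TOY_ERROR,		/* anything else: caller's own error policy */
};

/* ret and err are the return value and errno of the UFFDIO_COPY ioctl(). */
static enum toy_verdict toy_after_copy(struct toy_child_ctx *ctx, int ret, int err)
{
	if (ret == 0)
		return TOY_KEEP;
	if (err == ESRCH) {
		/*
		 * Per the discussion above, ESRCH means the kernel could not
		 * pin the child's mm (mmget_not_zero() failed), e.g. because
		 * fork() failed after the event was already delivered.
		 * Forget the child; a real monitor would also close(ctx->uffd).
		 */
		ctx->live = false;
		return TOY_FORGET_CHILD;
	}
	return TOY_ERROR;
}
```

Keeping the decision separate from the ioctl() makes the rollback path easy to exercise without a live userfaultfd.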
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-06 18:29 ` Andrea Arcangeli
@ 2019-03-07 7:58 ` zhong jiang
0 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2019-03-07 7:58 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Peter Xu, Mike Rapoport, Dmitry Vyukov, syzbot, Michal Hocko,
cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs,
Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox,
Mel Gorman, Vlastimil Babka, Mike Rapoport

On 2019/3/7 2:29, Andrea Arcangeli wrote:
> Hello Zhong,
>
> On Wed, Mar 06, 2019 at 09:07:00PM +0800, zhong jiang wrote:
>> The patch use call_rcu to delay free the task_struct, but It is possible to free the task_struct
>> ahead of get_mem_cgroup_from_mm. is it right?
> Yes it is possible to free before get_mem_cgroup_from_mm, but if it's
> freed before get_mem_cgroup_from_mm rcu_read_lock,
> rcu_dereference(mm->owner) will return NULL in such case and there
> will be no problem.
Yes
> The simple fix also clears the mm->owner of the failed-fork-mm before
> doing the call_rcu. The call_rcu delays the freeing after no other CPU
> runs in between rcu_read_lock/unlock anymore. That guarantees that
> those critical section will see mm->owner == NULL if the freeing of
> the task strut already happened.
We have set mm->owner to NULL when the child process fails to fork, ahead of freeing the task struct.
Is there any chance for those critical sections to see an mm->owner that is not NULL?

I have tested the patch. No Oops or panic has appeared so far.

Thanks,
zhong jiang
> The solution Mike suggested for this and that we were wondering as
> ideal in the past for the signal issue too, is to move the uffd
> delivery at a point where fork is guaranteed to succeed. We should
> probably try that too to see how it looks like and if it can be done
> in a not intrusive way, but the simple fix that uses RCU should work
> too.
>
> Rolling back in case of errors inside fork itself isn't easily doable:
> the moment we push the uffd ctx to the other side of the uffd pipe
> there's no coming back as that information can reach the userland of
> the uffd monitor/reader thread immediately after. The rolling back is
> really the other thread failing at mmget_not_zero eventually. It's the
> userland that has to rollback in such case when it gets a -ESRCH
> retval.
>
> Note that this fork feature is only ever needed in the non-cooperative
> case, these things never need to happen when userfaultfd is used by an
> app (or a lib) that is aware that it is using userfaultfd.
>
> Thanks,
> Andrea
>
> .
>
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-06 7:41 ` zhong jiang 2019-03-06 8:12 ` Peter Xu @ 2019-03-06 8:20 ` Mike Rapoport 1 sibling, 0 replies; 26+ messages in thread From: Mike Rapoport @ 2019-03-06 8:20 UTC (permalink / raw) To: zhong jiang Cc: Andrea Arcangeli, Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka, Mike Rapoport, Peter Xu On Wed, Mar 06, 2019 at 03:41:06PM +0800, zhong jiang wrote: > On 2019/3/6 14:26, Mike Rapoport wrote: > > Hi, > > > > On Wed, Mar 06, 2019 at 01:53:12PM +0800, zhong jiang wrote: > >> On 2019/3/6 10:05, Andrea Arcangeli wrote: > >>> Hello everyone, > >>> > >>> [ CC'ed Mike and Peter ] > >>> > >>> On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: > >>>> On 2019/3/5 14:26, Dmitry Vyukov wrote: > >>>>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>> On 2019/3/4 22:11, Dmitry Vyukov wrote: > >>>>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: > >>>>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: > >>>>>>>>>> Hi, guys > >>>>>>>>>> > >>>>>>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. > >>>>>>>>>> > >>>>>>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. > >>>>>>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. > >>>>>>>>>> > >>>>>>>>>> Any thoughts? > >>>>>>>>> FWIW syzbot was able to reproduce this with this reproducer. > >>>>>>>>> This looks like a very subtle race (threaded reproducer that runs > >>>>>>>>> repeatedly in multiple processes), so most likely we are looking for > >>>>>>>>> something like few instructions inconsistency window. 
> >>>>>>>>> > >>>>>>>> I has a little doubtful about the instrustions inconsistency window. > >>>>>>>> > >>>>>>>> I guess that you mean some smb barriers should be taken into account.:-) > >>>>>>>> > >>>>>>>> Because IMO, It should not be the lock case to result in the issue. > >>>>>>> Since the crash was triggered on x86 _most likley_ this is not a > >>>>>>> missed barrier. What I meant is that one thread needs to executed some > >>>>>>> code, while another thread is stopped within few instructions. > >>>>>>> > >>>>>>> > >>>>>> It is weird and I can not find any relationship you had said with the issue.:-( > >>>>>> > >>>>>> Because It is the cause that mm->owner has been freed, whereas we still deference it. > >>>>>> > >>>>>> From the lastest freed task call trace, It fails to create process. > >>>>>> > >>>>>> Am I miss something or I misunderstand your meaning. Please correct me. > >>>>> Your analysis looks correct. I am just saying that the root cause of > >>>>> this use-after-free seems to be a race condition. > >>>>> > >>>>> > >>>>> > >>>> Yep, Indeed, I can not figure out how the race works. I will dig up further. > >>> Yes it's a race condition. > >>> > >>> We were aware about the non-cooperative fork userfaultfd feature > >>> creating userfaultfd file descriptor that gets reported to the parent > >>> uffd, despite they belong to mm created by failed forks. > >>> > >>> https://www.spinics.net/lists/linux-mm/msg136357.html > >>> > >> Hi, Andrea > >> > >> I still not clear why uffd ioctl can use the incomplete process as the mm->owner. > >> and how to produce the race. > > There is a C reproducer in the syzcaller report: > > > > https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000 > > > >> From your above explainations, My underdtanding is that the process handling do_exexve > >> will have a temporary mm, which will be used by the UUFD ioctl. 
> > The race is between userfaultfd operation and fork() failure:
> >
> > forking thread                  | userfaultfd monitor thread
> > --------------------------------+-------------------------------
> > fork()                          |
> > dup_mmap()                      |
> > dup_userfaultfd()               |
> > dup_userfaultfd_complete()      |
> >                                 | read(UFFD_EVENT_FORK)
> >                                 | uffdio_copy()
> >                                 | mmget_not_zero()
> > goto bad_fork_something         |
> > ...                             |
> > bad_fork_free:                  |
> > free_task()                     |
> >                                 | mem_cgroup_from_task()
> >                                 | /* access stale mm->owner */
> >
> Hi, Mike
>
> forking thread fails to create the process ,and then free the allocated task struct.
> Other userfaultfd monitor thread should not access the stale mm->owner.
>
> The parent process and child process do not share the mm struct. Userfaultfd monitor thread's
> mm->owner should not point to the freed child task_struct.

Userfaultfd can monitor remote mm's [1]. In this case, dup_userfaultfd()
and dup_userfaultfd_complete() create a uffd context for the new process
and notify the userspace uffd monitor about this new context. The uffd
monitor can then perform uffd operations on the new context.

On the right side, mmget_not_zero() will take the reference for the mm
of the newly created process.

[1] https://www.kernel.org/doc/html/latest/admin-guide/mm/userfaultfd.html#non-cooperative-userfaultfd

> and due to the existence of tasklist_lock, we can not specify the mm->owner to freed task_struct.
>
> I miss something,=-O
>
> Thanks,
> zhong jiang
> >> Thanks,
> >> zhong jiang
> >

-- 
Sincerely yours,
Mike.
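As a rough illustration of Mike's last point, the sketch below models "take a reference only if the count is not already zero" in userspace with C11 atomics. It is a stand-in built on assumptions, not kernel code: struct toy_mm, toy_mmget_not_zero() and toy_mmput() are hypothetical names that only mimic the semantics of mmget_not_zero() and mmput() on mm_users.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mm_users count of struct mm_struct. */
struct toy_mm {
	atomic_int users;	/* 0 means the mm is already being torn down */
};

/*
 * Take a reference only if the count is still non-zero, mimicking the
 * semantics of mmget_not_zero() ("increment unless zero").
 * This is a userspace model, not the kernel's implementation.
 */
static bool toy_mmget_not_zero(struct toy_mm *mm)
{
	int old = atomic_load(&mm->users);

	while (old != 0) {
		/* On failure, old is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool toy_mmput(struct toy_mm *mm)
{
	int old = atomic_fetch_sub(&mm->users, 1);

	assert(old > 0);	/* putting a reference nobody holds is a bug */
	return old == 1;
}
```

Note what such a reference does and does not guarantee: succeeding in toy_mmget_not_zero() keeps the mm itself alive, but says nothing about the task that mm->owner points to. That is why the monitor thread can hold a valid mm while mm->owner already refers to a freed task_struct.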
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm 2019-03-06 2:05 ` Andrea Arcangeli 2019-03-06 5:53 ` zhong jiang @ 2019-03-08 7:10 ` zhong jiang 2019-03-15 21:39 ` Andrea Arcangeli 1 sibling, 1 reply; 26+ messages in thread From: zhong jiang @ 2019-03-08 7:10 UTC (permalink / raw) To: Andrea Arcangeli, Mike Rapoport, Peter Xu, Andrew Morton Cc: Dmitry Vyukov, syzbot, Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman, Vlastimil Babka On 2019/3/6 10:05, Andrea Arcangeli wrote: > Hello everyone, > > [ CC'ed Mike and Peter ] > > On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote: >> On 2019/3/5 14:26, Dmitry Vyukov wrote: >>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>> On 2019/3/4 22:11, Dmitry Vyukov wrote: >>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote: >>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang <zhongjiang@huawei.com> wrote: >>>>>>>> Hi, guys >>>>>>>> >>>>>>>> I also hit the following issue. but it fails to reproduce the issue by the log. >>>>>>>> >>>>>>>> it seems to the case that we access the mm->owner and deference it will result in the UAF. >>>>>>>> But it should not be possible that we specify the incomplete process to be the mm->owner. >>>>>>>> >>>>>>>> Any thoughts? >>>>>>> FWIW syzbot was able to reproduce this with this reproducer. >>>>>>> This looks like a very subtle race (threaded reproducer that runs >>>>>>> repeatedly in multiple processes), so most likely we are looking for >>>>>>> something like few instructions inconsistency window. >>>>>>> >>>>>> I has a little doubtful about the instrustions inconsistency window. >>>>>> >>>>>> I guess that you mean some smb barriers should be taken into account.:-) >>>>>> >>>>>> Because IMO, It should not be the lock case to result in the issue. 
>>>>> Since the crash was triggered on x86 _most likley_ this is not a >>>>> missed barrier. What I meant is that one thread needs to executed some >>>>> code, while another thread is stopped within few instructions. >>>>> >>>>> >>>> It is weird and I can not find any relationship you had said with the issue.:-( >>>> >>>> Because It is the cause that mm->owner has been freed, whereas we still deference it. >>>> >>>> From the lastest freed task call trace, It fails to create process. >>>> >>>> Am I miss something or I misunderstand your meaning. Please correct me. >>> Your analysis looks correct. I am just saying that the root cause of >>> this use-after-free seems to be a race condition. >>> >>> >>> >> Yep, Indeed, I can not figure out how the race works. I will dig up further. > Yes it's a race condition. > > We were aware about the non-cooperative fork userfaultfd feature > creating userfaultfd file descriptor that gets reported to the parent > uffd, despite they belong to mm created by failed forks. > > https://www.spinics.net/lists/linux-mm/msg136357.html > > The fork failure in my testcase happened because of signal pending > that interrupted fork after the failed-fork uffd context, was already > pushed to the userfaultfd reader/monitor. CRIU then takes care of > filtering the failed fork cases so we didn't want to make the fork > code more complicated just for userfaultfd. > > In reality if MEMCG is enabled at build time, mm->owner maintainance > code now creates a race condition in the above case, with any fork > failure. > > I pinged Mike yesterday to ask if my theory could be true for this bug > and one solution he suggested is to do the userfaultfd_dup at a point > where fork cannot fail anymore. That's precisely what we were > wondering to do back then to avoid the failed fork reports to the > non cooperative uffd monitor. > > That will solve the false positive deliveries that CRIU manager > currently filters out too. 
> From a theoretical standpoint it's also
> quite strange to even allow any uffd ioctl to run on a otherwise long
> gone mm created for a process that in the end wasn't even created (the
> mm got temporarily fully created, but no child task really ever used
> such mm). However that mm is on its way to exit_mmap as soon as the
> ioclt returns and this only ever happens during race conditions, so
> the way CRIU monitor works there wasn't anything fundamentally
> concerning about this detail, despite it's remarkably "strange". Our
> priority was to keep the fork code as simple as possible and keep
> userfaultfd as non intrusive as possible.
>
> One alternative solution I'm wondering about for this memcg issue is
> to free the task struct with RCU also when fork has failed and to add
> the mm_update_next_owner before mmput. That will still report failed
> forks to the uffd monitor, so it's not the ideal fix, but since it's
> probably simpler I'm posting it below. Also I couldn't reproduce the
> problem with the testcase here yet.
>
> >From 6cbf9d377b705476e5226704422357176f79e32c Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Tue, 5 Mar 2019 19:21:37 -0500
> Subject: [PATCH 1/1] userfaultfd: use RCU to free the task struct when fork
>  fails if MEMCG
>
> MEMCG depends on the task structure not to be freed under
> rcu_read_lock() in get_mem_cgroup_from_mm() after it dereferences
> mm->owner.
>
> A better fix would be to avoid registering forked vmas in userfaultfd
> contexts reported to the monitor, if case fork ends up failing.
Hi, Andrea

I can reproduce the issue on an arm64 qemu machine. The issue goes away after applying the
patch.

Tested-by: zhong jiang <zhongjiang@huawei.com>

Meanwhile, I still have a little doubt about whether it is necessary to use RCU to free the task struct.
I think that mm->owner will always be NULL after process creation fails, because we call mm_clear_owner.
Thanks,
zhong jiang
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  kernel/fork.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index eb9953c82104..3bcbb361ffbc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -953,6 +953,15 @@ static void mm_init_aio(struct mm_struct *mm)
>  #endif
>  }
>
> +static __always_inline void mm_clear_owner(struct mm_struct *mm,
> +					   struct task_struct *p)
> +{
> +#ifdef CONFIG_MEMCG
> +	if (mm->owner == p)
> +		mm->owner = NULL;
> +#endif
> +}
> +
>  static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
>  {
>  #ifdef CONFIG_MEMCG
> @@ -1345,6 +1354,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk)
>  free_pt:
>  	/* don't put binfmt in mmput, we haven't got module yet */
>  	mm->binfmt = NULL;
> +	mm_init_owner(mm, NULL);
>  	mmput(mm);
>
>  fail_nomem:
> @@ -1676,6 +1686,24 @@ static inline void rcu_copy_process(struct task_struct *p)
>  #endif /* #ifdef CONFIG_TASKS_RCU */
>  }
>
> +#ifdef CONFIG_MEMCG
> +static void __delayed_free_task(struct rcu_head *rhp)
> +{
> +	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
> +
> +	free_task(tsk);
> +}
> +#endif /* CONFIG_MEMCG */
> +
> +static __always_inline void delayed_free_task(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_MEMCG
> +	call_rcu(&tsk->rcu, __delayed_free_task);
> +#else /* CONFIG_MEMCG */
> +	free_task(tsk);
> +#endif /* CONFIG_MEMCG */
> +}
> +
>  /*
>   * This creates a new process as a copy of the old one,
>   * but does not actually start it yet.
> @@ -2137,8 +2165,10 @@ static __latent_entropy struct task_struct *copy_process(
>  bad_fork_cleanup_namespaces:
>  	exit_task_namespaces(p);
>  bad_fork_cleanup_mm:
> -	if (p->mm)
> +	if (p->mm) {
> +		mm_clear_owner(p->mm, p);
>  		mmput(p->mm);
> +	}
>  bad_fork_cleanup_signal:
>  	if (!(clone_flags & CLONE_THREAD))
>  		free_signal_struct(p->signal);
> @@ -2169,7 +2199,7 @@ static __latent_entropy struct task_struct *copy_process(
>  bad_fork_free:
>  	p->state = TASK_DEAD;
>  	put_task_stack(p);
> -	free_task(p);
> +	delayed_free_task(p);
>  fork_out:
>  	spin_lock_irq(&current->sighand->siglock);
>  	hlist_del_init(&delayed.node);
>
> .
>
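In the patch, __delayed_free_task() receives only a pointer to the rcu_head embedded in task_struct and recovers the task with container_of(). The userspace sketch below shows that pointer arithmetic with offsetof(); toy_container_of, struct toy_rcu_head and struct toy_task are hypothetical names, and the kernel's real container_of() additionally performs a compile-time type check that this simplified macro omits.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct rcu_head and struct task_struct. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_task {
	int pid;
	struct toy_rcu_head rcu;	/* embedded, like task_struct::rcu */
};

static int toy_last_freed_pid = -1;

/*
 * Shaped like __delayed_free_task(): the callback is handed only the
 * embedded head and must recover the enclosing task from it.
 */
static void toy_delayed_free_task(struct toy_rcu_head *rhp)
{
	struct toy_task *tsk = toy_container_of(rhp, struct toy_task, rcu);

	/* A real callback would call free_task(tsk); here we only record it. */
	toy_last_freed_pid = tsk->pid;
}
```

Because the head is embedded in the object being freed, deferring the free needs no extra allocation, which matters on an error path such as a failed fork().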
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-08 7:10 ` zhong jiang
@ 2019-03-15 21:39 ` Andrea Arcangeli
2019-03-16 9:38 ` zhong jiang
0 siblings, 1 reply; 26+ messages in thread
From: Andrea Arcangeli @ 2019-03-15 21:39 UTC (permalink / raw)
To: zhong jiang
Cc: Mike Rapoport, Peter Xu, Andrew Morton, Dmitry Vyukov, syzbot,
Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM,
syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins,
Matthew Wilcox, Mel Gorman, Vlastimil Babka

On Fri, Mar 08, 2019 at 03:10:08PM +0800, zhong jiang wrote:
> I can reproduce the issue in arm64 qemu machine. The issue will leave after applying the
> patch.
>
> Tested-by: zhong jiang <zhongjiang@huawei.com>

Thanks a lot for the quick testing!

> Meanwhile, I just has a little doubt whether it is necessary to use RCU to free the task struct or not.
> I think that mm->owner alway be NULL after failing to create to process. Because we call mm_clear_owner.

I wish it was enough, but the problem is that the other CPU may be in
the middle of get_mem_cgroup_from_mm() while this runs, and it would
dereference mm->owner while it is being freed without the call_rcu
after we clear mm->owner. What prevents this race is the
rcu_read_lock() in get_mem_cgroup_from_mm() and the corresponding
call_rcu to free the task struct in the fork failure path (again only
if CONFIG_MEMCG=y is defined). Considering you can reproduce this tiny
race on arm64 qemu (perhaps tcg JIT timing variations help?), you
might also in theory be able to still reproduce the race condition if
you remove the call_rcu from delayed_free_task and you replace it with
free_task.
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-15 21:39 ` Andrea Arcangeli
@ 2019-03-16 9:38 ` zhong jiang
2019-03-16 19:42 ` Andrea Arcangeli
0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2019-03-16 9:38 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Mike Rapoport, Peter Xu, Andrew Morton, Dmitry Vyukov, syzbot,
Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM,
syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins,
Matthew Wilcox, Mel Gorman, Vlastimil Babka

On 2019/3/16 5:39, Andrea Arcangeli wrote:
> On Fri, Mar 08, 2019 at 03:10:08PM +0800, zhong jiang wrote:
>> I can reproduce the issue in arm64 qemu machine. The issue will leave after applying the
>> patch.
>>
>> Tested-by: zhong jiang <zhongjiang@huawei.com>
> Thanks a lot for the quick testing!
>
>> Meanwhile, I just has a little doubt whether it is necessary to use RCU to free the task struct or not.
>> I think that mm->owner alway be NULL after failing to create to process. Because we call mm_clear_owner.
> I wish it was enough, but the problem is that the other CPU may be in
> the middle of get_mem_cgroup_from_mm() while this runs, and it would
> dereference mm->owner while it is been freed without the call_rcu
> affter we clear mm->owner. What prevents this race is the
As you said, it would dereference mm->owner after we clear mm->owner.

But after we clear mm->owner, mm->owner should be NULL. Is that right?

And mem_cgroup_from_task() will check its parameter.
Do you mean that the owner may be cleared after the parameter has been checked,
and then the NULL pointer would trigger? :-(

Thanks,
zhong jiang
> rcu_read_lock() in get_mem_cgroup_from_mm() and the corresponding
> call_rcu to free the task struct in the fork failure path (again only
> if CONFIG_MEMCG=y is defined). Considering you can reproduce this tiny
> race on arm64 qemu (perhaps tcg JIT timing variantions helps?), you
> might also in theory be able to still reproduce the race condition if
> you remove the call_rcu from delayed_free_task and you replace it with
> free_task.
>
> .
>
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-16 9:38 ` zhong jiang
@ 2019-03-16 19:42 ` Andrea Arcangeli
2019-03-18 6:23 ` zhong jiang
0 siblings, 1 reply; 26+ messages in thread
From: Andrea Arcangeli @ 2019-03-16 19:42 UTC
To: zhong jiang
Cc: Mike Rapoport, Peter Xu, Andrew Morton, Dmitry Vyukov, syzbot,
Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM,
syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins,
Matthew Wilcox, Mel Gorman, Vlastimil Babka

On Sat, Mar 16, 2019 at 05:38:54PM +0800, zhong jiang wrote:
> On 2019/3/16 5:39, Andrea Arcangeli wrote:
> > On Fri, Mar 08, 2019 at 03:10:08PM +0800, zhong jiang wrote:
> >> I can reproduce the issue in an arm64 qemu machine. The issue goes away after applying the
> >> patch.
> >>
> >> Tested-by: zhong jiang <zhongjiang@huawei.com>
> > Thanks a lot for the quick testing!
> >
> >> Meanwhile, I just have a little doubt whether it is necessary to use RCU to free the task struct or not.
> >> I think that mm->owner will always be NULL after failing to create the process, because we call mm_clear_owner.
> > I wish it was enough, but the problem is that the other CPU may be in
> > the middle of get_mem_cgroup_from_mm() while this runs, and it would
> > dereference mm->owner while it is being freed without the call_rcu
> > after we clear mm->owner. What prevents this race is the
> As you said, it would dereference mm->owner after we clear mm->owner.
>
> But after we clear mm->owner, mm->owner should be NULL. Is that right?
>
> And mem_cgroup_from_task will check the parameter.
> You mean that the owner may be cleared after the parameter is checked,
> and a NULL pointer dereference will trigger. :-(

Dereferencing mm->owner doesn't mean reading the value of the mm->owner
pointer; it really means dereferencing the value of the pointer.
It's like below:

get_mem_cgroup_from_mm()        failing fork()
----                            ---
task = mm->owner
                                mm->owner = NULL;
                                free(old mm->owner)
*task /* use after free */

We didn't set mm->owner to NULL before, so the window for the race was
larger, but setting mm->owner to NULL only hides the problem and it
can still happen (albeit with a smaller window).

If get_mem_cgroup_from_mm() can see at any time mm->owner not NULL,
then the free of the task struct must be delayed until after
rcu_read_unlock has returned in get_mem_cgroup_from_mm(). This is
the standard RCU model: the freeing must be delayed until after the
next quiescent point.

BTW, both mm_update_next_owner() and mm_clear_owner() should have used
WRITE_ONCE when they write to mm->owner. I can update that too, but
it's just to avoid assuming that gcc does the right thing (and we
still rely on gcc to do the right thing in other places), so that is
just an orthogonal cleanup.

Thanks,
Andrea
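Andrea's interleaving can be replayed deterministically in a single thread. The sketch below is a userspace toy, not kernel code. The names toy_rcu, toy_read_lock(), toy_failing_fork_cleanup() and the other toy_* identifiers are hypothetical. A `freed` flag stands in for free() so the stale access is observable without undefined behaviour. The toy "grace period" ends when the reader count drops to zero, which is far cruder than real RCU. The sketch only shows why clearing mm->owner alone does not help once the reader has already loaded the pointer, while deferring the free past rcu_read_unlock() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; "freed" marks a freed object instead of calling free(). */
struct toy_task { bool freed; };
struct toy_mm { struct toy_task *owner; };

/* Toy RCU state: a reader counter and at most one deferred free. */
struct toy_rcu {
    int readers;
    struct toy_task *deferred;
};

static void toy_read_lock(struct toy_rcu *rcu) { rcu->readers++; }

static void toy_read_unlock(struct toy_rcu *rcu)
{
    assert(rcu->readers > 0);
    /* Last reader out ends the toy grace period and runs the deferred free. */
    if (--rcu->readers == 0 && rcu->deferred != NULL) {
        rcu->deferred->freed = true;
        rcu->deferred = NULL;
    }
}

/* Writer side of the failing fork: clear the owner, then free the old task. */
static void toy_failing_fork_cleanup(struct toy_rcu *rcu, struct toy_mm *mm,
                                     bool defer_free)
{
    struct toy_task *old = mm->owner;

    mm->owner = NULL;
    if (defer_free && rcu->readers > 0) {
        assert(rcu->deferred == NULL);
        rcu->deferred = old;   /* call_rcu()-like: wait for current readers */
    } else {
        old->freed = true;     /* direct free, or no reader left to wait for */
    }
}

/*
 * Replays the diagram:
 *   reader: task = mm->owner
 *   writer: mm->owner = NULL; free the old owner
 *   reader: *task
 * Returns true if the reader touched an already-freed task.
 */
static bool toy_replay_race(bool defer_free)
{
    struct toy_task child = { .freed = false };
    struct toy_mm mm = { .owner = &child };
    struct toy_rcu rcu = { 0, NULL };
    bool uaf;

    toy_read_lock(&rcu);
    struct toy_task *task = mm.owner;          /* loaded before NULL is stored */
    toy_failing_fork_cleanup(&rcu, &mm, defer_free);
    uaf = (task != NULL && task->freed);       /* the "*task" dereference */
    toy_read_unlock(&rcu);

    assert(mm.owner == NULL && child.freed);   /* freed eventually either way */
    return uaf;
}
```

toy_replay_race(false) models the direct free and reports a use-after-free. toy_replay_race(true) models the deferred free and does not. The NULL check on `task`, like the check zhong mentions in mem_cgroup_from_task, passes in both runs, because the pointer was loaded before it was cleared.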
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-16 19:42 ` Andrea Arcangeli
@ 2019-03-18 6:23 ` zhong jiang
0 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2019-03-18 6:23 UTC
To: Andrea Arcangeli
Cc: Mike Rapoport, Peter Xu, Andrew Morton, Dmitry Vyukov, syzbot,
Michal Hocko, cgroups, Johannes Weiner, LKML, Linux-MM,
syzkaller-bugs, Vladimir Davydov, David Rientjes, Hugh Dickins,
Matthew Wilcox, Mel Gorman, Vlastimil Babka

On 2019/3/17 3:42, Andrea Arcangeli wrote:
> On Sat, Mar 16, 2019 at 05:38:54PM +0800, zhong jiang wrote:
>> On 2019/3/16 5:39, Andrea Arcangeli wrote:
>>> On Fri, Mar 08, 2019 at 03:10:08PM +0800, zhong jiang wrote:
>>>> I can reproduce the issue in an arm64 qemu machine. The issue goes away after applying the
>>>> patch.
>>>>
>>>> Tested-by: zhong jiang <zhongjiang@huawei.com>
>>> Thanks a lot for the quick testing!
>>>
>>>> Meanwhile, I just have a little doubt whether it is necessary to use RCU to free the task struct or not.
>>>> I think that mm->owner will always be NULL after failing to create the process, because we call mm_clear_owner.
>>> I wish it was enough, but the problem is that the other CPU may be in
>>> the middle of get_mem_cgroup_from_mm() while this runs, and it would
>>> dereference mm->owner while it is being freed without the call_rcu
>>> after we clear mm->owner. What prevents this race is the
>> As you said, it would dereference mm->owner after we clear mm->owner.
>>
>> But after we clear mm->owner, mm->owner should be NULL. Is that right?
>>
>> And mem_cgroup_from_task will check the parameter.
>> You mean that the owner may be cleared after the parameter is checked,
>> and a NULL pointer dereference will trigger. :-(
> Dereferencing mm->owner doesn't mean reading the value of the mm->owner
> pointer; it really means dereferencing the value of the pointer.
> It's like below:
>
> get_mem_cgroup_from_mm()        failing fork()
> ----                            ---
> task = mm->owner
>                                 mm->owner = NULL;
>                                 free(old mm->owner)
> *task /* use after free */
>
> We didn't set mm->owner to NULL before, so the window for the race was
> larger, but setting mm->owner to NULL only hides the problem and it
> can still happen (albeit with a smaller window).
>
> If get_mem_cgroup_from_mm() can see at any time mm->owner not NULL,
> then the free of the task struct must be delayed until after
> rcu_read_unlock has returned in get_mem_cgroup_from_mm(). This is
> the standard RCU model: the freeing must be delayed until after the
> next quiescent point.

Thank you for explaining this so patiently. The patch should go upstream
too. I think you should send a formal patch to mainline; other people may
be suffering from the issue. :-)

Thanks,
zhong jiang

> BTW, both mm_update_next_owner() and mm_clear_owner() should have used
> WRITE_ONCE when they write to mm->owner. I can update that too, but
> it's just to avoid assuming that gcc does the right thing (and we
> still rely on gcc to do the right thing in other places), so that is
> just an orthogonal cleanup.
>
> Thanks,
> Andrea
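Andrea's side note concerns the compiler. In the kernel, READ_ONCE() and WRITE_ONCE() force a single access that the compiler may not tear, merge or cache. Outside the kernel, the closest portable analogue is arguably a C11 relaxed atomic load or store. The sketch below shows only that analogue. The toy_* names are hypothetical, and the sketch does not replace the RCU-deferred free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical task type; only its address matters here. */
struct toy_task {
    int id;
};

/* Hypothetical mm whose owner pointer is only touched through atomics. */
struct toy_mm {
    _Atomic(struct toy_task *) owner;
};

/* Rough analogue of WRITE_ONCE(mm->owner, p): a single untorn store. */
static void toy_set_owner(struct toy_mm *mm, struct toy_task *p)
{
    atomic_store_explicit(&mm->owner, p, memory_order_relaxed);
}

/* Rough analogue of READ_ONCE(mm->owner): a single untorn load. */
static struct toy_task *toy_get_owner(struct toy_mm *mm)
{
    return atomic_load_explicit(&mm->owner, memory_order_relaxed);
}

/* The two writers named in the thread: hand ownership on, then clear it. */
static void toy_update_then_clear(struct toy_mm *mm, struct toy_task *next)
{
    toy_set_owner(mm, next);     /* in the spirit of mm_update_next_owner() */
    assert(toy_get_owner(mm) == next);
    toy_set_owner(mm, NULL);     /* in the spirit of mm_clear_owner() */
}
```

Relaxed accesses only stop the compiler from tearing or caching the pointer. They say nothing about the lifetime of the task the pointer refers to, which is consistent with Andrea calling this an orthogonal cleanup.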
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-03 16:19 ` zhong jiang
2019-03-04 7:40 ` Dmitry Vyukov
@ 2019-03-04 21:51 ` Matthew Wilcox
2019-03-05 3:09 ` zhong jiang
1 sibling, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2019-03-04 21:51 UTC
To: zhong jiang
Cc: syzbot, mhocko, Andrea Arcangeli, cgroups, hannes, linux-kernel,
linux-mm, syzkaller-bugs, vdavydov.dev, David Rientjes, Hugh Dickins,
Mel Gorman, Vlastimil Babka

On Mon, Mar 04, 2019 at 12:19:32AM +0800, zhong jiang wrote:
> I also hit the following issue, but I failed to reproduce it from the log.
>
> It seems to be the case that we access mm->owner, and dereferencing it results in the UAF.
> But it should not be possible for an incomplete process to be set as mm->owner.

OK, so we've got thread 9325 calling fork() and failing due to the PID
controller saying "no". 9325 calls free_task(), but somehow thread 9332
has a reference to the struct task_struct. There are two possibilities
here: one is that 9332 really did manage to get a reference to the larval
child of 9325, and the other is that 9332 has a stale reference to some
memory which was reallocated to 9325's child.

Andrea, is there any way for a UFFD thread to get access to the child's
task_struct during the copy_process() call? If so, I think copy_process()
needs to call mm_update_next_owner().

If there's no way for that to happen, then we have quite a bug-hunt ahead
of us looking for who is missing a call to mm_update_next_owner().

> On 2018/12/4 23:43, syzbot wrote:
> > syzbot has found a reproducer for the following crash on:
> >
> > HEAD commit: 0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11c885a3400000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=b9cc5a440391cbfd
> > dashboard link: https://syzkaller.appspot.com/bug?extid=cbb52e396df3e565ab02
> > compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12835e25400000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
> >
> > cgroup: fork rejected by pids controller in /syz2
> > ==================================================================
> > BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:182 [inline]
> > BUG: KASAN: use-after-free in task_css include/linux/cgroup.h:477 [inline]
> > BUG: KASAN: use-after-free in mem_cgroup_from_task mm/memcontrol.c:815 [inline]
> > BUG: KASAN: use-after-free in get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
> > Read of size 8 at addr ffff8881b72af310 by task syz-executor198/9332
> >
> > CPU: 0 PID: 9332 Comm: syz-executor198 Not tainted 4.20.0-rc5+ #142
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x244/0x39d lib/dump_stack.c:113
> >  print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
> >  kasan_report_error mm/kasan/report.c:354 [inline]
> >  kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
> >  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> >  __read_once_size include/linux/compiler.h:182 [inline]
> >  task_css include/linux/cgroup.h:477 [inline]
> >  mem_cgroup_from_task mm/memcontrol.c:815 [inline]
> >  get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
> >  get_mem_cgroup_from_mm mm/memcontrol.c:834 [inline]
> >  mem_cgroup_try_charge+0x608/0xe20 mm/memcontrol.c:5888
> >  mcopy_atomic_pte mm/userfaultfd.c:71 [inline]
> >  mfill_atomic_pte mm/userfaultfd.c:418 [inline]
> >  __mcopy_atomic mm/userfaultfd.c:559 [inline]
> >  mcopy_atomic+0xb08/0x2c70 mm/userfaultfd.c:609
> >  userfaultfd_copy fs/userfaultfd.c:1705 [inline]
> >  userfaultfd_ioctl+0x29fb/0x5610 fs/userfaultfd.c:1851
> >  vfs_ioctl fs/ioctl.c:46 [inline]
> >  file_ioctl fs/ioctl.c:509 [inline]
> >  do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
> >  ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
> >  __do_sys_ioctl fs/ioctl.c:720 [inline]
> >  __se_sys_ioctl fs/ioctl.c:718 [inline]
> >  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
> >  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x44c7e9
> > Code: 5d c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007f906b69fdb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > RAX: ffffffffffffffda RBX: 00000000006e4a08 RCX: 000000000044c7e9
> > RDX: 0000000020000100 RSI: 00000000c028aa03 RDI: 0000000000000004
> > RBP: 00000000006e4a00 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e4a0c
> > R13: 00007ffdfd47813f R14: 00007f906b6a09c0 R15: 000000000000002d
> >
> > Allocated by task 9325:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
> >  set_track mm/kasan/kasan.c:460 [inline]
> >  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
> >  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
> >  kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
> >  alloc_task_struct_node kernel/fork.c:158 [inline]
> >  dup_task_struct kernel/fork.c:843 [inline]
> >  copy_process+0x2026/0x87a0 kernel/fork.c:1751
> >  _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
> >  __do_sys_clone kernel/fork.c:2323 [inline]
> >  __se_sys_clone kernel/fork.c:2317 [inline]
> >  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
> >  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Freed by task 9325:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
> >  set_track mm/kasan/kasan.c:460 [inline]
> >  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
> >  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
> >  __cache_free mm/slab.c:3498 [inline]
> >  kmem_cache_free+0x83/0x290 mm/slab.c:3760
> >  free_task_struct kernel/fork.c:163 [inline]
> >  free_task+0x16e/0x1f0 kernel/fork.c:457
> >  copy_process+0x1dcc/0x87a0 kernel/fork.c:2148
> >  _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
> >  __do_sys_clone kernel/fork.c:2323 [inline]
> >  __se_sys_clone kernel/fork.c:2317 [inline]
> >  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
> >  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > The buggy address belongs to the object at ffff8881b72ae240
> >  which belongs to the cache task_struct(81:syz2) of size 6080
> > The buggy address is located 4304 bytes inside of
> >  6080-byte region [ffff8881b72ae240, ffff8881b72afa00)
> > The buggy address belongs to the page:
> > page:ffffea0006dcab80 count:1 mapcount:0 mapping:ffff8881d2dce0c0 index:0x0 compound_mapcount: 0
> > flags: 0x2fffc0000010200(slab|head)
> > raw: 02fffc0000010200 ffffea00074a1f88 ffffea0006ebbb88 ffff8881d2dce0c0
> > raw: 0000000000000000 ffff8881b72ae240 0000000100000001 ffff8881d87fe580
> > page dumped because: kasan: bad access detected
> > page->mem_cgroup:ffff8881d87fe580
> >
> > Memory state around the buggy address:
> >  ffff8881b72af200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >  ffff8881b72af280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >ffff8881b72af300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >                          ^
> >  ffff8881b72af380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >  ffff8881b72af400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ==================================================================
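The offsets in the quoted KASAN report can be cross-checked with plain arithmetic. Subtracting the object start from the faulting address should reproduce the "4304 bytes inside" figure. The memory-state rows step by 0x80 bytes, one shadow byte per 8-byte granule in generic KASAN, so the marked row and the shadow byte under the `^` follow from the address too. The sketch below uses only values copied from the report, and the helper names are hypothetical. It deliberately does not claim which task_struct field sits at that offset, because the layout depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted KASAN report. */
static const uint64_t fault_addr   = 0xffff8881b72af310ULL; /* "Read of size 8 at addr" */
static const uint64_t object_start = 0xffff8881b72ae240ULL; /* "belongs to the object at" */
static const uint64_t object_size  = 6080;                  /* "of size 6080" */

/* Generic KASAN: 1 shadow byte per 8 bytes; each dump row shows 16 shadow bytes. */
enum { TOY_GRANULE = 8, TOY_ROW_BYTES = 16 * TOY_GRANULE };

/* Offset of addr from the start of the object ("N bytes inside of"). */
static uint64_t offset_in_object(uint64_t addr, uint64_t start)
{
    assert(addr >= start);
    return addr - start;
}

/* Whether addr falls in the half-open region [start, start + size). */
static int addr_in_object(uint64_t addr, uint64_t start, uint64_t size)
{
    return addr >= start && addr < start + size;
}

/* Start address of the dump row that contains addr (the marked row). */
static uint64_t shadow_row_start(uint64_t addr)
{
    return addr & ~(uint64_t)(TOY_ROW_BYTES - 1);
}

/* Index of addr's shadow byte within its row (where '^' should point). */
static unsigned shadow_column(uint64_t addr)
{
    return (unsigned)((addr - shadow_row_start(addr)) / TOY_GRANULE);
}
```

Here 0xffff8881b72af310 - 0xffff8881b72ae240 = 0x10d0 = 4304, and 0xffff8881b72ae240 + 6080 = 0xffff8881b72afa00, which matches the region the report prints. The faulting address falls in the row starting at 0xffff8881b72af300, at shadow index 2 within that row.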
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2019-03-04 21:51 ` Matthew Wilcox
@ 2019-03-05 3:09 ` zhong jiang
0 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2019-03-05 3:09 UTC
To: Matthew Wilcox, Andrea Arcangeli
Cc: syzbot, mhocko, cgroups, hannes, linux-kernel, linux-mm,
syzkaller-bugs, vdavydov.dev, David Rientjes, Hugh Dickins,
Mel Gorman, Vlastimil Babka

On 2019/3/5 5:51, Matthew Wilcox wrote:
> On Mon, Mar 04, 2019 at 12:19:32AM +0800, zhong jiang wrote:
>> I also hit the following issue, but I failed to reproduce it from the log.
>>
>> It seems to be the case that we access mm->owner, and dereferencing it results in the UAF.
>> But it should not be possible for an incomplete process to be set as mm->owner.
> OK, so we've got thread 9325 calling fork() and failing due to the PID
> controller saying "no". 9325 calls free_task(), but somehow thread 9332
> has a reference to the struct task_struct. There are two possibilities
> here: one is that 9332 really did manage to get a reference to the larval
> child of 9325, and the other is that 9332 has a stale reference to some
> memory which was reallocated to 9325's child.

Good guess and analysis. IMO, 9332 cannot get hold of the task_struct
directly in that code flow, but it can get a reference to the mm_struct.
Maybe I am missing something important.

> Andrea, is there any way for a UFFD thread to get access to the child's
> task_struct during the copy_process() call? If so, I think copy_process()
> needs to call mm_update_next_owner().

Yep, I hope Andrea has time to look at this.

Thanks,
zhong jiang

> If there's no way for that to happen, then we have quite a bug-hunt ahead
> of us looking for who is missing a call to mm_update_next_owner().
>> On 2018/12/4 23:43, syzbot wrote:
>>> syzbot has found a reproducer for the following crash on:
>>>
>>> HEAD commit: 0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel..
>>> git tree: upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=11c885a3400000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=b9cc5a440391cbfd
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=cbb52e396df3e565ab02
>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12835e25400000
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
>>>
>>> cgroup: fork rejected by pids controller in /syz2
>>> ==================================================================
>>> BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:182 [inline]
>>> BUG: KASAN: use-after-free in task_css include/linux/cgroup.h:477 [inline]
>>> BUG: KASAN: use-after-free in mem_cgroup_from_task mm/memcontrol.c:815 [inline]
>>> BUG: KASAN: use-after-free in get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
>>> Read of size 8 at addr ffff8881b72af310 by task syz-executor198/9332
>>>
>>> CPU: 0 PID: 9332 Comm: syz-executor198 Not tainted 4.20.0-rc5+ #142
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>> Call Trace:
>>>  __dump_stack lib/dump_stack.c:77 [inline]
>>>  dump_stack+0x244/0x39d lib/dump_stack.c:113
>>>  print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
>>>  kasan_report_error mm/kasan/report.c:354 [inline]
>>>  kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
>>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>>>  __read_once_size include/linux/compiler.h:182 [inline]
>>>  task_css include/linux/cgroup.h:477 [inline]
>>>  mem_cgroup_from_task mm/memcontrol.c:815 [inline]
>>>  get_mem_cgroup_from_mm.part.62+0x6d7/0x880 mm/memcontrol.c:844
>>>  get_mem_cgroup_from_mm mm/memcontrol.c:834 [inline]
>>>  mem_cgroup_try_charge+0x608/0xe20 mm/memcontrol.c:5888
>>>  mcopy_atomic_pte mm/userfaultfd.c:71 [inline]
>>>  mfill_atomic_pte mm/userfaultfd.c:418 [inline]
>>>  __mcopy_atomic mm/userfaultfd.c:559 [inline]
>>>  mcopy_atomic+0xb08/0x2c70 mm/userfaultfd.c:609
>>>  userfaultfd_copy fs/userfaultfd.c:1705 [inline]
>>>  userfaultfd_ioctl+0x29fb/0x5610 fs/userfaultfd.c:1851
>>>  vfs_ioctl fs/ioctl.c:46 [inline]
>>>  file_ioctl fs/ioctl.c:509 [inline]
>>>  do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
>>>  ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
>>>  __do_sys_ioctl fs/ioctl.c:720 [inline]
>>>  __se_sys_ioctl fs/ioctl.c:718 [inline]
>>>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
>>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> RIP: 0033:0x44c7e9
>>> Code: 5d c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b c5 fb ff c3 66 2e 0f 1f 84 00 00 00 00
>>> RSP: 002b:00007f906b69fdb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>>> RAX: ffffffffffffffda RBX: 00000000006e4a08 RCX: 000000000044c7e9
>>> RDX: 0000000020000100 RSI: 00000000c028aa03 RDI: 0000000000000004
>>> RBP: 00000000006e4a00 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e4a0c
>>> R13: 00007ffdfd47813f R14: 00007f906b6a09c0 R15: 000000000000002d
>>>
>>> Allocated by task 9325:
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>>>  set_track mm/kasan/kasan.c:460 [inline]
>>>  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
>>>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
>>>  kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
>>>  alloc_task_struct_node kernel/fork.c:158 [inline]
>>>  dup_task_struct kernel/fork.c:843 [inline]
>>>  copy_process+0x2026/0x87a0 kernel/fork.c:1751
>>>  _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
>>>  __do_sys_clone kernel/fork.c:2323 [inline]
>>>  __se_sys_clone kernel/fork.c:2317 [inline]
>>>  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
>>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>> Freed by task 9325:
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>>>  set_track mm/kasan/kasan.c:460 [inline]
>>>  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
>>>  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
>>>  __cache_free mm/slab.c:3498 [inline]
>>>  kmem_cache_free+0x83/0x290 mm/slab.c:3760
>>>  free_task_struct kernel/fork.c:163 [inline]
>>>  free_task+0x16e/0x1f0 kernel/fork.c:457
>>>  copy_process+0x1dcc/0x87a0 kernel/fork.c:2148
>>>  _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
>>>  __do_sys_clone kernel/fork.c:2323 [inline]
>>>  __se_sys_clone kernel/fork.c:2317 [inline]
>>>  __x64_sys_clone+0xbf/0x150 kernel/fork.c:2317
>>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>> The buggy address belongs to the object at ffff8881b72ae240
>>>  which belongs to the cache task_struct(81:syz2) of size 6080
>>> The buggy address is located 4304 bytes inside of
>>>  6080-byte region [ffff8881b72ae240, ffff8881b72afa00)
>>> The buggy address belongs to the page:
>>> page:ffffea0006dcab80 count:1 mapcount:0 mapping:ffff8881d2dce0c0 index:0x0 compound_mapcount: 0
>>> flags: 0x2fffc0000010200(slab|head)
>>> raw: 02fffc0000010200 ffffea00074a1f88 ffffea0006ebbb88 ffff8881d2dce0c0
>>> raw: 0000000000000000 ffff8881b72ae240 0000000100000001 ffff8881d87fe580
>>> page dumped because: kasan: bad access detected
>>> page->mem_cgroup:ffff8881d87fe580
>>>
>>> Memory state around the buggy address:
>>>  ffff8881b72af200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>  ffff8881b72af280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> >ffff8881b72af300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>                          ^
>>>  ffff8881b72af380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>  ffff8881b72af400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ==================================================================
* Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
2018-11-07 1:52 KASAN: use-after-free Read in get_mem_cgroup_from_mm syzbot
2018-12-04 15:43 ` syzbot
@ 2019-03-22 9:36 ` syzbot
1 sibling, 0 replies; 26+ messages in thread
From: syzbot @ 2019-03-22 9:36 UTC
To: aarcange, akpm, cgroups, dvyukov, hannes, hughd, linux-kernel,
linux-mm, mgorman, mhocko, peterx, rientjes, rppt, rppt,
syzkaller-bugs, vbabka, vdavydov.dev, willy, zhongjiang

Bisection is inconclusive: the first bad commit could be any of:

2c43838c sched/isolation: Enable CONFIG_CPU_ISOLATION=y by default
bf29cb23 sched/isolation: Make CONFIG_NO_HZ_FULL select CONFIG_CPU_ISOLATION
d94d1053 sched/isolation: Document boot parameters dependency on CONFIG_CPU_ISOLATION=y
4c470317 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1592b037200000
start commit: 0072a0c1
git tree: upstream
dashboard link: https://syzkaller.appspot.com/bug?extid=cbb52e396df3e565ab02
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12835e25400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Thread overview: 26+ messages
2018-11-07 1:52 KASAN: use-after-free Read in get_mem_cgroup_from_mm syzbot
2018-12-04 15:43 ` syzbot
2019-03-03 16:19 ` zhong jiang
2019-03-04 7:40 ` Dmitry Vyukov
2019-03-04 14:00 ` zhong jiang
2019-03-04 14:11 ` Dmitry Vyukov
2019-03-04 15:32 ` zhong jiang
2019-03-05 6:26 ` Dmitry Vyukov
2019-03-05 6:42 ` zhong jiang
2019-03-06 2:05 ` Andrea Arcangeli
2019-03-06 5:53 ` zhong jiang
2019-03-06 6:26 ` Mike Rapoport
2019-03-06 7:41 ` zhong jiang
2019-03-06 8:12 ` Peter Xu
2019-03-06 13:07 ` zhong jiang
2019-03-06 18:29 ` Andrea Arcangeli
2019-03-07 7:58 ` zhong jiang
2019-03-06 8:20 ` Mike Rapoport
2019-03-08 7:10 ` zhong jiang
2019-03-15 21:39 ` Andrea Arcangeli
2019-03-16 9:38 ` zhong jiang
2019-03-16 19:42 ` Andrea Arcangeli
2019-03-18 6:23 ` zhong jiang
2019-03-04 21:51 ` Matthew Wilcox
2019-03-05 3:09 ` zhong jiang
2019-03-22 9:36 ` syzbot