linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* KASAN: use-after-free Read in handle_userfault (2)
@ 2018-12-12  9:45 syzbot
  2018-12-12  9:58 ` Dmitry Vyukov
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2018-12-12  9:45 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot found the following crash on:

HEAD commit:    14cf8c1d5b90 Add linux-next specific files for 20181210
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=133296db400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c9133d0a4284c012
dashboard link: https://syzkaller.appspot.com/bug?extid=cbc64b24b2b2d54c07a9
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com

RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007f5dbe489850
RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5dbe48a6d4
R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xf1/0x100  
lib/list_debug.c:51
CPU: 1 PID: 20306 Comm: syz-executor2 Not tainted 4.20.0-rc6-next-20181210+  
#164
Read of size 8 at addr ffff8881c5e72bb0 by task kworker/0:1/12
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  handle_userfault.cold.30+0x47/0x62 fs/userfaultfd.c:431
  do_anonymous_page mm/memory.c:2938 [inline]
  handle_pte_fault mm/memory.c:3780 [inline]
  __handle_mm_fault+0x4d26/0x5b70 mm/memory.c:3906
  handle_mm_fault+0x54f/0xc70 mm/memory.c:3943
  do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
  __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
  do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
  page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
RIP: 0033:0x4510a0
Code: 0f 84 c4 0f 00 00 48 89 f1 48 89 f8 48 83 e1 3f 48 83 f9 20 0f 86 7b  
02 00 00 48 83 e6 f0 48 83 e1 0f 66 0f ef c0 66 0f ef c9 <66> 0f 74 0e 66  
0f d7 d1 48 d3 ea 49 c7 c2 11 00 00 00 49 29 ca 4d
RSP: 002b:00007fab1fbba7a8 EFLAGS: 00010202
RAX: 00007fab1fbba850 RBX: 0000000000000003 RCX: 000000000000000e
RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007fab1fbba850
RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fab1fbbb6d4
R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 4.20.0-rc6-next-20181210+ #164
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events_power_efficient neigh_periodic_work
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  print_address_description.cold.4+0x9/0x1ff mm/kasan/report.c:187
  kasan_report.cold.5+0x1b/0x39 mm/kasan/report.c:317
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
  __list_del_entry_valid+0xf1/0x100 lib/list_debug.c:51
  __list_del_entry include/linux/list.h:117 [inline]
  list_del_init include/linux/list.h:159 [inline]
  neigh_mark_dead+0x13b/0x410 net/core/neighbour.c:125
  neigh_periodic_work+0x89a/0xc30 net/core/neighbour.c:905
  process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
  worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
  kthread+0x35a/0x440 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Allocated by task 8166:
  save_stack+0x43/0xd0 mm/kasan/common.c:73
  set_track mm/kasan/common.c:85 [inline]
  kasan_kmalloc+0xcb/0xd0 mm/kasan/common.c:482
  __do_kmalloc_node mm/slab.c:3671 [inline]
  __kmalloc_node_track_caller+0x4d/0x70 mm/slab.c:3685
  __kmalloc_reserve.isra.38+0x41/0xe0 net/core/skbuff.c:137
  __alloc_skb+0x155/0x770 net/core/skbuff.c:205
  alloc_skb include/linux/skbuff.h:1008 [inline]
  netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
  netlink_sendmsg+0xb29/0xfc0 net/netlink/af_netlink.c:1892
  sock_sendmsg_nosec net/socket.c:622 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:632
  ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
  __sys_sendmsg+0x11d/0x280 net/socket.c:2155
  __do_sys_sendmsg net/socket.c:2164 [inline]
  __se_sys_sendmsg net/socket.c:2162 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 8166:
  save_stack+0x43/0xd0 mm/kasan/common.c:73
  set_track mm/kasan/common.c:85 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:444
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:452
  __cache_free mm/slab.c:3485 [inline]
  kfree+0xcf/0x230 mm/slab.c:3804
  skb_free_head+0x99/0xc0 net/core/skbuff.c:550
  skb_release_data+0x70c/0x9a0 net/core/skbuff.c:570
  skb_release_all+0x4a/0x60 net/core/skbuff.c:627
  __kfree_skb net/core/skbuff.c:641 [inline]
  consume_skb+0x1ae/0x570 net/core/skbuff.c:701
  netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
  netlink_unicast+0x5ad/0x760 net/netlink/af_netlink.c:1336
  netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1917
  sock_sendmsg_nosec net/socket.c:622 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:632
  ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
  __sys_sendmsg+0x11d/0x280 net/socket.c:2155
  __do_sys_sendmsg net/socket.c:2164 [inline]
  __se_sys_sendmsg net/socket.c:2162 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8881c5e72940
  which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 624 bytes inside of
  1024-byte region [ffff8881c5e72940, ffff8881c5e72d40)
The buggy address belongs to the page:
page:ffffea0007179c80 count:1 mapcount:0 mapping:ffff8881da800ac0  
index:0xffff8881c5e72dc0 compound_mapcount: 0
flags: 0x2fffc0000010200(slab|head)
raw: 02fffc0000010200 ffffea00072cf188 ffffea00075d0c08 ffff8881da800ac0
raw: ffff8881c5e72dc0 ffff8881c5e72040 0000000100000005 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8881c5e72a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8881c5e72b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8881c5e72b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                      ^
  ffff8881c5e72c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8881c5e72c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
kobject: 'loop5' (00000000a3f9d928): kobject_uevent_env


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KASAN: use-after-free Read in handle_userfault (2)
  2018-12-12  9:45 KASAN: use-after-free Read in handle_userfault (2) syzbot
@ 2018-12-12  9:58 ` Dmitry Vyukov
  2018-12-30  7:48   ` Dmitry Vyukov
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Vyukov @ 2018-12-12  9:58 UTC (permalink / raw)
  To: syzbot+cbc64b24b2b2d54c07a9, Andrea Arcangeli, Sasha Levin
  Cc: linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Wed, Dec 12, 2018 at 10:45 AM syzbot
<syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    14cf8c1d5b90 Add linux-next specific files for 20181210
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=133296db400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c9133d0a4284c012
> dashboard link: https://syzkaller.appspot.com/bug?extid=cbc64b24b2b2d54c07a9
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

This is a corrupted/intermixed kernel output, the bug actually
happened somewhere in net/core/neighbour.c.
syzkaller has a bunch of code to try to deal to corrupted kernel
output, but it's not always possible as a parallel thread printing
something can inject an unrelated frame just in the right place, and
then it's indistinguishable from a crash happened there.

The question is: what's the significance of that
"FAULT_FLAG_ALLOW_RETRY missing 70"?
Is it something to fix in kernel? Can the check be removed? Should it
be moved somewhere earlier when flags are passed to kernel and cause
EINVAL? Can it be at least converted to a single-line pr_warn?
+Sasha go as far as suggesting that any "Call Trace:" in kernel output
means a kernel bugs. This is one of few places in kernel that dumps
stacks left and right without a kernel bug (?).



> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com
>
> RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007f5dbe489850
> RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5dbe48a6d4
> R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> ==================================================================
> BUG: KASAN: use-after-free in __list_del_entry_valid+0xf1/0x100
> lib/list_debug.c:51
> CPU: 1 PID: 20306 Comm: syz-executor2 Not tainted 4.20.0-rc6-next-20181210+
> #164
> Read of size 8 at addr ffff8881c5e72bb0 by task kworker/0:1/12
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
>
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x244/0x39d lib/dump_stack.c:113
>   handle_userfault.cold.30+0x47/0x62 fs/userfaultfd.c:431
>   do_anonymous_page mm/memory.c:2938 [inline]
>   handle_pte_fault mm/memory.c:3780 [inline]
>   __handle_mm_fault+0x4d26/0x5b70 mm/memory.c:3906
>   handle_mm_fault+0x54f/0xc70 mm/memory.c:3943
>   do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
>   __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
>   do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
>   page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
> RIP: 0033:0x4510a0
> Code: 0f 84 c4 0f 00 00 48 89 f1 48 89 f8 48 83 e1 3f 48 83 f9 20 0f 86 7b
> 02 00 00 48 83 e6 f0 48 83 e1 0f 66 0f ef c0 66 0f ef c9 <66> 0f 74 0e 66
> 0f d7 d1 48 d3 ea 49 c7 c2 11 00 00 00 49 29 ca 4d
> RSP: 002b:00007fab1fbba7a8 EFLAGS: 00010202
> RAX: 00007fab1fbba850 RBX: 0000000000000003 RCX: 000000000000000e
> RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007fab1fbba850
> RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007fab1fbbb6d4
> R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 4.20.0-rc6-next-20181210+ #164
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: events_power_efficient neigh_periodic_work
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x244/0x39d lib/dump_stack.c:113
>   print_address_description.cold.4+0x9/0x1ff mm/kasan/report.c:187
>   kasan_report.cold.5+0x1b/0x39 mm/kasan/report.c:317
>   __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
>   __list_del_entry_valid+0xf1/0x100 lib/list_debug.c:51
>   __list_del_entry include/linux/list.h:117 [inline]
>   list_del_init include/linux/list.h:159 [inline]
>   neigh_mark_dead+0x13b/0x410 net/core/neighbour.c:125
>   neigh_periodic_work+0x89a/0xc30 net/core/neighbour.c:905
>   process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
>   worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
>   kthread+0x35a/0x440 kernel/kthread.c:246
>   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
>
> Allocated by task 8166:
>   save_stack+0x43/0xd0 mm/kasan/common.c:73
>   set_track mm/kasan/common.c:85 [inline]
>   kasan_kmalloc+0xcb/0xd0 mm/kasan/common.c:482
>   __do_kmalloc_node mm/slab.c:3671 [inline]
>   __kmalloc_node_track_caller+0x4d/0x70 mm/slab.c:3685
>   __kmalloc_reserve.isra.38+0x41/0xe0 net/core/skbuff.c:137
>   __alloc_skb+0x155/0x770 net/core/skbuff.c:205
>   alloc_skb include/linux/skbuff.h:1008 [inline]
>   netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
>   netlink_sendmsg+0xb29/0xfc0 net/netlink/af_netlink.c:1892
>   sock_sendmsg_nosec net/socket.c:622 [inline]
>   sock_sendmsg+0xd5/0x120 net/socket.c:632
>   ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
>   __sys_sendmsg+0x11d/0x280 net/socket.c:2155
>   __do_sys_sendmsg net/socket.c:2164 [inline]
>   __se_sys_sendmsg net/socket.c:2162 [inline]
>   __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 8166:
>   save_stack+0x43/0xd0 mm/kasan/common.c:73
>   set_track mm/kasan/common.c:85 [inline]
>   __kasan_slab_free+0x102/0x150 mm/kasan/common.c:444
>   kasan_slab_free+0xe/0x10 mm/kasan/common.c:452
>   __cache_free mm/slab.c:3485 [inline]
>   kfree+0xcf/0x230 mm/slab.c:3804
>   skb_free_head+0x99/0xc0 net/core/skbuff.c:550
>   skb_release_data+0x70c/0x9a0 net/core/skbuff.c:570
>   skb_release_all+0x4a/0x60 net/core/skbuff.c:627
>   __kfree_skb net/core/skbuff.c:641 [inline]
>   consume_skb+0x1ae/0x570 net/core/skbuff.c:701
>   netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
>   netlink_unicast+0x5ad/0x760 net/netlink/af_netlink.c:1336
>   netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1917
>   sock_sendmsg_nosec net/socket.c:622 [inline]
>   sock_sendmsg+0xd5/0x120 net/socket.c:632
>   ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
>   __sys_sendmsg+0x11d/0x280 net/socket.c:2155
>   __do_sys_sendmsg net/socket.c:2164 [inline]
>   __se_sys_sendmsg net/socket.c:2162 [inline]
>   __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The buggy address belongs to the object at ffff8881c5e72940
>   which belongs to the cache kmalloc-1k of size 1024
> The buggy address is located 624 bytes inside of
>   1024-byte region [ffff8881c5e72940, ffff8881c5e72d40)
> The buggy address belongs to the page:
> page:ffffea0007179c80 count:1 mapcount:0 mapping:ffff8881da800ac0
> index:0xffff8881c5e72dc0 compound_mapcount: 0
> flags: 0x2fffc0000010200(slab|head)
> raw: 02fffc0000010200 ffffea00072cf188 ffffea00075d0c08 ffff8881da800ac0
> raw: ffff8881c5e72dc0 ffff8881c5e72040 0000000100000005 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>   ffff8881c5e72a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8881c5e72b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff8881c5e72b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                       ^
>   ffff8881c5e72c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8881c5e72c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
> kobject: 'loop5' (00000000a3f9d928): kobject_uevent_env
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000665f2e057cd00db6%40google.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KASAN: use-after-free Read in handle_userfault (2)
  2018-12-12  9:58 ` Dmitry Vyukov
@ 2018-12-30  7:48   ` Dmitry Vyukov
  2018-12-30 15:46     ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Vyukov @ 2018-12-30  7:48 UTC (permalink / raw)
  To: syzbot+cbc64b24b2b2d54c07a9, Andrea Arcangeli, Sasha Levin
  Cc: linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Wed, Dec 12, 2018 at 10:58 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Dec 12, 2018 at 10:45 AM syzbot
> <syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    14cf8c1d5b90 Add linux-next specific files for 20181210
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=133296db400000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=c9133d0a4284c012
> > dashboard link: https://syzkaller.appspot.com/bug?extid=cbc64b24b2b2d54c07a9
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> This is a corrupted/intermixed kernel output, the bug actually
> happened somewhere in net/core/neighbour.c.
> syzkaller has a bunch of code to try to deal to corrupted kernel
> output, but it's not always possible as a parallel thread printing
> something can inject an unrelated frame just in the right place, and
> then it's indistinguishable from a crash happened there.
>
> The question is: what's the significance of that
> "FAULT_FLAG_ALLOW_RETRY missing 70"?
> Is it something to fix in kernel? Can the check be removed? Should it
> be moved somewhere earlier when flags are passed to kernel and cause
> EINVAL? Can it be at least converted to a single-line pr_warn?
> +Sasha go as far as suggesting that any "Call Trace:" in kernel output
> means a kernel bugs. This is one of few places in kernel that dumps
> stacks left and right without a kernel bug (?).

up

anybody maintaining uffd?

> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com
> >
> > RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007f5dbe489850
> > RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5dbe48a6d4
> > R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> > ==================================================================
> > BUG: KASAN: use-after-free in __list_del_entry_valid+0xf1/0x100
> > lib/list_debug.c:51
> > CPU: 1 PID: 20306 Comm: syz-executor2 Not tainted 4.20.0-rc6-next-20181210+
> > #164
> > Read of size 8 at addr ffff8881c5e72bb0 by task kworker/0:1/12
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> >
> > Call Trace:
> >   __dump_stack lib/dump_stack.c:77 [inline]
> >   dump_stack+0x244/0x39d lib/dump_stack.c:113
> >   handle_userfault.cold.30+0x47/0x62 fs/userfaultfd.c:431
> >   do_anonymous_page mm/memory.c:2938 [inline]
> >   handle_pte_fault mm/memory.c:3780 [inline]
> >   __handle_mm_fault+0x4d26/0x5b70 mm/memory.c:3906
> >   handle_mm_fault+0x54f/0xc70 mm/memory.c:3943
> >   do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
> >   __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
> >   do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
> >   page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
> > RIP: 0033:0x4510a0
> > Code: 0f 84 c4 0f 00 00 48 89 f1 48 89 f8 48 83 e1 3f 48 83 f9 20 0f 86 7b
> > 02 00 00 48 83 e6 f0 48 83 e1 0f 66 0f ef c0 66 0f ef c9 <66> 0f 74 0e 66
> > 0f d7 d1 48 d3 ea 49 c7 c2 11 00 00 00 49 29 ca 4d
> > RSP: 002b:00007fab1fbba7a8 EFLAGS: 00010202
> > RAX: 00007fab1fbba850 RBX: 0000000000000003 RCX: 000000000000000e
> > RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007fab1fbba850
> > RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007fab1fbbb6d4
> > R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> > CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 4.20.0-rc6-next-20181210+ #164
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Workqueue: events_power_efficient neigh_periodic_work
> > Call Trace:
> >   __dump_stack lib/dump_stack.c:77 [inline]
> >   dump_stack+0x244/0x39d lib/dump_stack.c:113
> >   print_address_description.cold.4+0x9/0x1ff mm/kasan/report.c:187
> >   kasan_report.cold.5+0x1b/0x39 mm/kasan/report.c:317
> >   __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
> >   __list_del_entry_valid+0xf1/0x100 lib/list_debug.c:51
> >   __list_del_entry include/linux/list.h:117 [inline]
> >   list_del_init include/linux/list.h:159 [inline]
> >   neigh_mark_dead+0x13b/0x410 net/core/neighbour.c:125
> >   neigh_periodic_work+0x89a/0xc30 net/core/neighbour.c:905
> >   process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
> >   worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
> >   kthread+0x35a/0x440 kernel/kthread.c:246
> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
> >
> > Allocated by task 8166:
> >   save_stack+0x43/0xd0 mm/kasan/common.c:73
> >   set_track mm/kasan/common.c:85 [inline]
> >   kasan_kmalloc+0xcb/0xd0 mm/kasan/common.c:482
> >   __do_kmalloc_node mm/slab.c:3671 [inline]
> >   __kmalloc_node_track_caller+0x4d/0x70 mm/slab.c:3685
> >   __kmalloc_reserve.isra.38+0x41/0xe0 net/core/skbuff.c:137
> >   __alloc_skb+0x155/0x770 net/core/skbuff.c:205
> >   alloc_skb include/linux/skbuff.h:1008 [inline]
> >   netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
> >   netlink_sendmsg+0xb29/0xfc0 net/netlink/af_netlink.c:1892
> >   sock_sendmsg_nosec net/socket.c:622 [inline]
> >   sock_sendmsg+0xd5/0x120 net/socket.c:632
> >   ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
> >   __sys_sendmsg+0x11d/0x280 net/socket.c:2155
> >   __do_sys_sendmsg net/socket.c:2164 [inline]
> >   __se_sys_sendmsg net/socket.c:2162 [inline]
> >   __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
> >   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Freed by task 8166:
> >   save_stack+0x43/0xd0 mm/kasan/common.c:73
> >   set_track mm/kasan/common.c:85 [inline]
> >   __kasan_slab_free+0x102/0x150 mm/kasan/common.c:444
> >   kasan_slab_free+0xe/0x10 mm/kasan/common.c:452
> >   __cache_free mm/slab.c:3485 [inline]
> >   kfree+0xcf/0x230 mm/slab.c:3804
> >   skb_free_head+0x99/0xc0 net/core/skbuff.c:550
> >   skb_release_data+0x70c/0x9a0 net/core/skbuff.c:570
> >   skb_release_all+0x4a/0x60 net/core/skbuff.c:627
> >   __kfree_skb net/core/skbuff.c:641 [inline]
> >   consume_skb+0x1ae/0x570 net/core/skbuff.c:701
> >   netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
> >   netlink_unicast+0x5ad/0x760 net/netlink/af_netlink.c:1336
> >   netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1917
> >   sock_sendmsg_nosec net/socket.c:622 [inline]
> >   sock_sendmsg+0xd5/0x120 net/socket.c:632
> >   ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
> >   __sys_sendmsg+0x11d/0x280 net/socket.c:2155
> >   __do_sys_sendmsg net/socket.c:2164 [inline]
> >   __se_sys_sendmsg net/socket.c:2162 [inline]
> >   __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
> >   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > The buggy address belongs to the object at ffff8881c5e72940
> >   which belongs to the cache kmalloc-1k of size 1024
> > The buggy address is located 624 bytes inside of
> >   1024-byte region [ffff8881c5e72940, ffff8881c5e72d40)
> > The buggy address belongs to the page:
> > page:ffffea0007179c80 count:1 mapcount:0 mapping:ffff8881da800ac0
> > index:0xffff8881c5e72dc0 compound_mapcount: 0
> > flags: 0x2fffc0000010200(slab|head)
> > raw: 02fffc0000010200 ffffea00072cf188 ffffea00075d0c08 ffff8881da800ac0
> > raw: ffff8881c5e72dc0 ffff8881c5e72040 0000000100000005 0000000000000000
> > page dumped because: kasan: bad access detected
> >
> > Memory state around the buggy address:
> >   ffff8881c5e72a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >   ffff8881c5e72b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > ffff8881c5e72b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >                                       ^
> >   ffff8881c5e72c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >   ffff8881c5e72c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ==================================================================
> > kobject: 'loop5' (00000000a3f9d928): kobject_uevent_env
> >
> >
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> > syzbot.
> >
> > --
> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000665f2e057cd00db6%40google.com.
> > For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KASAN: use-after-free Read in handle_userfault (2)
  2018-12-30  7:48   ` Dmitry Vyukov
@ 2018-12-30 15:46     ` Andrea Arcangeli
  2019-01-02 13:37       ` Dmitry Vyukov
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2018-12-30 15:46 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot+cbc64b24b2b2d54c07a9, Sasha Levin, linux-fsdevel, LKML,
	syzkaller-bugs, Al Viro, Peter Xu, Mike Rapoport, Mike Kravetz

Hello,

On Sun, Dec 30, 2018 at 08:48:05AM +0100, Dmitry Vyukov wrote:
> On Wed, Dec 12, 2018 at 10:58 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Wed, Dec 12, 2018 at 10:45 AM syzbot
> > <syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    14cf8c1d5b90 Add linux-next specific files for 20181210
> > > git tree:       linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=133296db400000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c9133d0a4284c012
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=cbc64b24b2b2d54c07a9
> > > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >
> > This is a corrupted/intermixed kernel output, the bug actually
> > happened somewhere in net/core/neighbour.c.
> > syzkaller has a bunch of code to try to deal to corrupted kernel
> > output, but it's not always possible as a parallel thread printing
> > something can inject an unrelated frame just in the right place, and
> > then it's indistinguishable from a crash happened there.
> >
> > The question is: what's the significance of that
> > "FAULT_FLAG_ALLOW_RETRY missing 70"?
> > Is it something to fix in kernel? Can the check be removed? Should it

This is a rate limited warning not compiled in production kernels.

#ifdef CONFIG_DEBUG_VM
		if (printk_ratelimit()) {
			printk(KERN_WARNING
			       "FAULT_FLAG_ALLOW_RETRY missing %x\n",
			       vmf->flags);
			dump_stack();
		}
#endif

It helps to tell if it's an userland bug or a kernel missing feature
like some get_user_pages that can't drop mmap_sem in a path where we
may need userfaultfd to be functional.

Those get_user_pages locations are deterministic, so for example when
qemu started explicitly calling get_mempolicy(MPOL_F_NODE|MPOL_F_ADDR)
we had to change the get_user_pages there to make uffd functional for
such syscall. An example of where this debug aid paid off relatively
recently is commit 3b9aadf7278d16d7bed4d5d808501065f70898d8.

> > be moved somewhere earlier when flags are passed to kernel and cause
> > EINVAL? Can it be at least converted to a single-line pr_warn?

I don't see the problem with the VM_FAULT_SIGBUS retval.

The reason of the full dump_stack is because pr_warn would tell us
nothing about the location of the get_user_pages that needs
conversion. It's pretty much like when you get an -EINVAL in userland
and there's a zillon places that returns -EINVAL and you've no clue
which one you hit. The debug_stack is about pinpointing the exact
location after the fact, to evaluate what to do about it.

> > +Sasha go as far as suggesting that any "Call Trace:" in kernel output
> > means a kernel bugs. This is one of few places in kernel that dumps
> > stacks left and right without a kernel bug (?).

Yes this is a case where the Call Trace in the logs doesn't guarantee
there was a kernel bug, this is why it's rate limited and only
compiled in non production builds.

I haven't looked at the reproducer source, but looking only at the
trace I can tell this happened because a region has been registered in
uffd while a page fault was already undergoing a VM_FAULT_RETRY
cycle.

We could remove the dump_stack if FAULT_FLAG_TRIED is set, but
removing it for all cases would make it harder to solve other kernel
issues like described above.

However with Peter Xu recent work FAULT_FLAG_TRIED will go away as
part of the userfaultfd WP support (FAULT_FLAG_TRIED is the reason of
this specific false positive) so you won't be able to trigger this
path through undefined state userland race conditions anymore after
that.

So I'd propose Peter's FAULT_FLAG_TRIED removal as the best path
forward because it's will solve 3 problems: 1) this rate limited false
positive FAULT_FLAG_TRIED dump_stack with malicious app, 2) signal
case not returning VM_FAULT_RETRY despite we dropped mmap_sem to sleep
until we got waken up by a signal, 3) it'll allow the uffd write
protection to work.

https://lkml.kernel.org/r/20181121062103.18835-1-peterx@redhat.com

Until FAULT_FLAG_TRIED is dropped, it's up to the manager in the non
cooperative case and up to the app itself in the normal more common
cooperative case, to ensure there aren't page faults already happening
in the region while the region is registered in uffd. It's totally
trivial for the app to do so until we relax this requirement in a
fully backwards compatible way.

The reason FAULT_FLAG_TRIED must be dropped for the write-protection
feature is that the uffd per-page granular write protection ioctl must
be allowed to run at any time on any region, potentially while there
are already VM_FAULT_RETRY faults being retried. So the kernel common
code must be allowed to return VM_FAULT_RETRY twice or more in a row
from the point of view of the arch page fault code.

To remove the dump_stack completely (not only for the case of a
malicious userland app) we could add a debug facility in debugfs where
we could dump stack traces in a circular buffer, so we could ask the
root user to cat the file and send us the extra call trace debug data
associated to the single pr_warn in the log. It'd actually be nice if
all kind of -EINVAL could dump their stack there too in fact
(supposedly they're not fast paths).

> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com
> > >
> > > RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007f5dbe489850
> > > RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5dbe48a6d4
> > > R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> > > ==================================================================
> > > BUG: KASAN: use-after-free in __list_del_entry_valid+0xf1/0x100
> > > lib/list_debug.c:51
> > > CPU: 1 PID: 20306 Comm: syz-executor2 Not tainted 4.20.0-rc6-next-20181210+
> > > #164
> > > Read of size 8 at addr ffff8881c5e72bb0 by task kworker/0:1/12
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > >
> > > Call Trace:
> > >   __dump_stack lib/dump_stack.c:77 [inline]
> > >   dump_stack+0x244/0x39d lib/dump_stack.c:113
> > >   handle_userfault.cold.30+0x47/0x62 fs/userfaultfd.c:431
> > >   do_anonymous_page mm/memory.c:2938 [inline]
> > >   handle_pte_fault mm/memory.c:3780 [inline]
> > >   __handle_mm_fault+0x4d26/0x5b70 mm/memory.c:3906
> > >   handle_mm_fault+0x54f/0xc70 mm/memory.c:3943
> > >   do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
> > >   __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
> > >   do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
> > >   page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
> > > RIP: 0033:0x4510a0
> > > Code: 0f 84 c4 0f 00 00 48 89 f1 48 89 f8 48 83 e1 3f 48 83 f9 20 0f 86 7b
> > > 02 00 00 48 83 e6 f0 48 83 e1 0f 66 0f ef c0 66 0f ef c9 <66> 0f 74 0e 66
> > > 0f d7 d1 48 d3 ea 49 c7 c2 11 00 00 00 49 29 ca 4d
> > > RSP: 002b:00007fab1fbba7a8 EFLAGS: 00010202
> > > RAX: 00007fab1fbba850 RBX: 0000000000000003 RCX: 000000000000000e
> > > RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007fab1fbba850
> > > RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007fab1fbbb6d4
> > > R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> > > CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 4.20.0-rc6-next-20181210+ #164
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Workqueue: events_power_efficient neigh_periodic_work
> > > Call Trace:
> > >   __dump_stack lib/dump_stack.c:77 [inline]
> > >   dump_stack+0x244/0x39d lib/dump_stack.c:113
> > >   print_address_description.cold.4+0x9/0x1ff mm/kasan/report.c:187
> > >   kasan_report.cold.5+0x1b/0x39 mm/kasan/report.c:317
> > >   __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
> > >   __list_del_entry_valid+0xf1/0x100 lib/list_debug.c:51
> > >   __list_del_entry include/linux/list.h:117 [inline]
> > >   list_del_init include/linux/list.h:159 [inline]
> > >   neigh_mark_dead+0x13b/0x410 net/core/neighbour.c:125

The real crash seems to be completely unrelated to userfaultfd, the
list_del_init is in neigh_mark_dead.

	if (!list_empty(&n->gc_list)) {
		list_del_init(&n->gc_list);

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KASAN: use-after-free Read in handle_userfault (2)
  2018-12-30 15:46     ` Andrea Arcangeli
@ 2019-01-02 13:37       ` Dmitry Vyukov
  2019-01-04 23:10         ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Vyukov @ 2019-01-02 13:37 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: syzbot+cbc64b24b2b2d54c07a9, Sasha Levin, linux-fsdevel, LKML,
	syzkaller-bugs, Al Viro, Peter Xu, Mike Rapoport, Mike Kravetz

On Sun, Dec 30, 2018 at 4:46 PM Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> Hello,
>
> On Sun, Dec 30, 2018 at 08:48:05AM +0100, Dmitry Vyukov wrote:
> > On Wed, Dec 12, 2018 at 10:58 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > On Wed, Dec 12, 2018 at 10:45 AM syzbot
> > > <syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:    14cf8c1d5b90 Add linux-next specific files for 20181210
> > > > git tree:       linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=133296db400000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c9133d0a4284c012
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=cbc64b24b2b2d54c07a9
> > > > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > >
> > > This is a corrupted/intermixed kernel output, the bug actually
> > > happened somewhere in net/core/neighbour.c.
> > > syzkaller has a bunch of code to try to deal to corrupted kernel
> > > output, but it's not always possible as a parallel thread printing
> > > something can inject an unrelated frame just in the right place, and
> > > then it's indistinguishable from a crash happened there.
> > >
> > > The question is: what's the significance of that
> > > "FAULT_FLAG_ALLOW_RETRY missing 70"?
> > > Is it something to fix in kernel? Can the check be removed? Should it
>
> This is a rate limited warning not compiled in production kernels.
>
> #ifdef CONFIG_DEBUG_VM
>                 if (printk_ratelimit()) {
>                         printk(KERN_WARNING
>                                "FAULT_FLAG_ALLOW_RETRY missing %x\n",
>                                vmf->flags);
>                         dump_stack();
>                 }
> #endif
>
> It helps to tell if it's an userland bug or a kernel missing feature
> like some get_user_pages that can't drop mmap_sem in a path where we
> may need userfaultfd to be functional.
>
> Those get_user_pages locations are deterministic, so for example when
> qemu started explicitly calling get_mempolicy(MPOL_F_NODE|MPOL_F_ADDR)
> we had to change the get_user_pages there to make uffd functional for
> such syscall. An example of where this debug aid paid off relatively
> recently is commit 3b9aadf7278d16d7bed4d5d808501065f70898d8.
>
> > > be moved somewhere earlier when flags are passed to kernel and cause
> > > EINVAL? Can it be at least converted to a single-line pr_warn?
>
> I don't see the problem with the VM_FAULT_SIGBUS retval.
>
> The reason of the full dump_stack is because pr_warn would tell us
> nothing about the location of the get_user_pages that needs
> conversion. It's pretty much like when you get an -EINVAL in userland
> and there's a zillon places that returns -EINVAL and you've no clue
> which one you hit. The debug_stack is about pinpointing the exact
> location after the fact, to evaluate what to do about it.
>
> > > +Sasha go as far as suggesting that any "Call Trace:" in kernel output
> > > means a kernel bugs. This is one of few places in kernel that dumps
> > > stacks left and right without a kernel bug (?).
>
> Yes this is a case where the Call Trace in the logs doesn't guarantee
> there was a kernel bug, this is why it's rate limited and only
> compiled in non production builds.
>
> I haven't looked at the reproducer source, but looking only at the
> trace I can tell this happened because a region has been registered in
> uffd while a page fault was already undergoing a VM_FAULT_RETRY
> cycle.
>
> We could remove the dump_stack if FAULT_FLAG_TRIED is set, but
> removing it for all cases would make it harder to solve other kernel
> issues like described above.
>
> However with Peter Xu recent work FAULT_FLAG_TRIED will go away as
> part of the userfaultfd WP support (FAULT_FLAG_TRIED is the reason of
> this specific false positive) so you won't be able to trigger this
> path through undefined state userland race conditions anymore after
> that.
>
> So I'd propose Peter's FAULT_FLAG_TRIED removal as the best path
> forward because it's will solve 3 problems: 1) this rate limited false
> positive FAULT_FLAG_TRIED dump_stack with malicious app, 2) signal
> case not returning VM_FAULT_RETRY despite we dropped mmap_sem to sleep
> until we got waken up by a signal, 3) it'll allow the uffd write
> protection to work.
>
> https://lkml.kernel.org/r/20181121062103.18835-1-peterx@redhat.com
>
> Until FAULT_FLAG_TRIED is dropped, it's up to the manager in the non
> cooperative case and up to the app itself in the normal more common
> cooperative case, to ensure there aren't page faults already happening
> in the region while the region is registered in uffd. It's totally
> trivial for the app to do so until we relax this requirement in a
> fully backwards compatible way.
>
> The reason FAULT_FLAG_TRIED must be dropped for the write-protection
> feature is that the uffd per-page granular write protection ioctl must
> be allowed to run at any time on any region, potentially while there
> are already VM_FAULT_RETRY faults being retried. So the kernel common
> code must be allowed to return VM_FAULT_RETRY twice or more in a row
> from the point of view of the arch page fault code.
>
> To remove the dump_stack completely (not only for the case of a
> malicious userland app) we could add a debug facility in debugfs where
> we could dump stack traces in a circular buffer, so we could ask the
> root user to cat the file and send us the extra call trace debug data
> associated to the single pr_warn in the log. It'd actually be nice if
> all kind of -EINVAL could dump their stack there too in fact
> (supposedly they're not fast paths).
>
> > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > Reported-by: syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com
> > > >
> > > > RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007f5dbe489850
> > > > RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> > > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5dbe48a6d4
> > > > R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> > > > ==================================================================
> > > > BUG: KASAN: use-after-free in __list_del_entry_valid+0xf1/0x100
> > > > lib/list_debug.c:51
> > > > CPU: 1 PID: 20306 Comm: syz-executor2 Not tainted 4.20.0-rc6-next-20181210+
> > > > #164
> > > > Read of size 8 at addr ffff8881c5e72bb0 by task kworker/0:1/12
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > Google 01/01/2011
> > > >
> > > > Call Trace:
> > > >   __dump_stack lib/dump_stack.c:77 [inline]
> > > >   dump_stack+0x244/0x39d lib/dump_stack.c:113
> > > >   handle_userfault.cold.30+0x47/0x62 fs/userfaultfd.c:431
> > > >   do_anonymous_page mm/memory.c:2938 [inline]
> > > >   handle_pte_fault mm/memory.c:3780 [inline]
> > > >   __handle_mm_fault+0x4d26/0x5b70 mm/memory.c:3906
> > > >   handle_mm_fault+0x54f/0xc70 mm/memory.c:3943
> > > >   do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
> > > >   __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
> > > >   do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
> > > >   page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
> > > > RIP: 0033:0x4510a0
> > > > Code: 0f 84 c4 0f 00 00 48 89 f1 48 89 f8 48 83 e1 3f 48 83 f9 20 0f 86 7b
> > > > 02 00 00 48 83 e6 f0 48 83 e1 0f 66 0f ef c0 66 0f ef c9 <66> 0f 74 0e 66
> > > > 0f d7 d1 48 d3 ea 49 c7 c2 11 00 00 00 49 29 ca 4d
> > > > RSP: 002b:00007fab1fbba7a8 EFLAGS: 00010202
> > > > RAX: 00007fab1fbba850 RBX: 0000000000000003 RCX: 000000000000000e
> > > > RDX: 00000000000003ff RSI: 0000000020012fe0 RDI: 00007fab1fbba850
> > > > RBP: 000000000072bf00 R08: 00000000000003ff R09: 0000000000000000
> > > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007fab1fbbb6d4
> > > > R13: 00000000004c578a R14: 00000000004d9d90 R15: 00000000ffffffff
> > > > CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 4.20.0-rc6-next-20181210+ #164
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > Google 01/01/2011
> > > > Workqueue: events_power_efficient neigh_periodic_work
> > > > Call Trace:
> > > >   __dump_stack lib/dump_stack.c:77 [inline]
> > > >   dump_stack+0x244/0x39d lib/dump_stack.c:113
> > > >   print_address_description.cold.4+0x9/0x1ff mm/kasan/report.c:187
> > > >   kasan_report.cold.5+0x1b/0x39 mm/kasan/report.c:317
> > > >   __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
> > > >   __list_del_entry_valid+0xf1/0x100 lib/list_debug.c:51
> > > >   __list_del_entry include/linux/list.h:117 [inline]
> > > >   list_del_init include/linux/list.h:159 [inline]
> > > >   neigh_mark_dead+0x13b/0x410 net/core/neighbour.c:125
>
> The real crash seems to be completely unrelated to userfaultfd, the
> list_del_init is in neigh_mark_dead.
>
>         if (!list_empty(&n->gc_list)) {
>                 list_del_init(&n->gc_list);


Hi Andrea,

Thanks for the detailed answer.

If we are proceeding with "mm: some enhancements to the page fault
mechanism", that's good as it will eliminate at least part of this
output.

There are 2 types of debug configs: ones add additional checks for
machines and another add verbose output for humans. CONFIG_DEBUG_VM
seems to be more of the first type of debug config -- additional
checks for machines. I've seen some of the second type debug configs
are prefixed with DEBUG_VERBOSE or something along these lines. Maybe
it makes sense to split this out of CONFIG_DEBUG_VM. Since it's a
"gray" check (rather then white/black check), it can't be used in
CI/fuzzing setups anyways -- not possible to analyse thousands of
cases manually (though maybe we actually hitting some that can be
classified as kernel bugs).

Yes, the problem is way more general. As you noted it applies to 100K
of EINVAL|EFAULT|ENOMEM, it's super hard to figure out what/where
exactly goes wrong in the kernel getting only -22. But at the same
time we can't have all of these 100K places dump stacks. I don't know
what's a good solution for this. Manually annotating 100K places does
not look like the right way to go. Maybe kprobes can do this? In some
cases I used CONFIG_KCOV  and kcovtrace
(https://github.com/google/syzkaller/blob/master/tools/kcovtrace/kcovtrace.c)
to collect kernel trace from a failing syscall.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KASAN: use-after-free Read in handle_userfault (2)
  2019-01-02 13:37       ` Dmitry Vyukov
@ 2019-01-04 23:10         ` Andrea Arcangeli
  2019-01-07  9:44           ` Dmitry Vyukov
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2019-01-04 23:10 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot+cbc64b24b2b2d54c07a9, Sasha Levin, linux-fsdevel, LKML,
	syzkaller-bugs, Al Viro, Peter Xu, Mike Rapoport, Mike Kravetz

On Wed, Jan 02, 2019 at 02:37:58PM +0100, Dmitry Vyukov wrote:
> If we are proceeding with "mm: some enhancements to the page fault
> mechanism", that's good as it will eliminate at least part of this
> output.

Agreed.

> There are 2 types of debug configs: ones add additional checks for
> machines and another add verbose output for humans. CONFIG_DEBUG_VM
> seems to be more of the first type of debug config -- additional
> checks for machines. I've seen some of the second type debug configs
> are prefixed with DEBUG_VERBOSE or something along these lines. Maybe
> it makes sense to split this out of CONFIG_DEBUG_VM. Since it's a
> "gray" check (rather then white/black check), it can't be used in
> CI/fuzzing setups anyways -- not possible to analyse thousands of
> cases manually (though maybe we actually hitting some that can be
> classified as kernel bugs).

The problem with the DEBUG_VERBOSE is that if the kernel needs a
rebuild it's only marginally better than having to ask the developer
to add a one liner dump_stack before rebuilding the kernel.

Next time we need it, it may be simpler to figure a dynamic tracing
trick than to ask a rebuild.

In addition of pursuing the VM_FAULT_RETRY improvements from Peter,
it'd be fine to drop it for now to avoid confusing the machines.

> Yes, the problem is way more general. As you noted it applies to 100K
> of EINVAL|EFAULT|ENOMEM, it's super hard to figure out what/where
> exactly goes wrong in the kernel getting only -22. But at the same

What stands out for this location is that it's the bailout point
of all non uffd compatible syscalls.

The chances such annotation turns out to be useful is much higher than
a random -EINVAL location, but in principle it's the same kind of
issue and I agree it'd be unpractical to annotate them manually.

> time we can't have all of these 100K places dump stacks. I don't know
> what's a good solution for this. Manually annotating 100K places does
> not look like the right way to go. Maybe kprobes can do this? In some
> cases I used CONFIG_KCOV  and kcovtrace
> (https://github.com/google/syzkaller/blob/master/tools/kcovtrace/kcovtrace.c)
> to collect kernel trace from a failing syscall.

If there is a de facto standard to search for a "syscall bailout call
trace" that can provide us that very same dump_stack, that should work
for this uffd issue too indeed.

KCOV may or may not be enabled in enterprise -debug kernels, but the
main limitation of the kcovtrace.c seems to be the lack of threading
support and the privilege inheritance through fork. While it's not
mandatory, the uffd manager is practically always implemented as a
thread of some process so whatever alternative to dump_stack() should
work with threads.

I think kprobes/ebpf should work provided debug info is available
(which is not always guaranteed), otherwise to obtain it without debug
info, it'd require a ftrace static trace point to use with the
function graph tracer I guess.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KASAN: use-after-free Read in handle_userfault (2)
  2019-01-04 23:10         ` Andrea Arcangeli
@ 2019-01-07  9:44           ` Dmitry Vyukov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Vyukov @ 2019-01-07  9:44 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: syzbot+cbc64b24b2b2d54c07a9, Sasha Levin, linux-fsdevel, LKML,
	syzkaller-bugs, Al Viro, Peter Xu, Mike Rapoport, Mike Kravetz

On Sat, Jan 5, 2019 at 12:10 AM Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> On Wed, Jan 02, 2019 at 02:37:58PM +0100, Dmitry Vyukov wrote:
> > If we are proceeding with "mm: some enhancements to the page fault
> > mechanism", that's good as it will eliminate at least part of this
> > output.
>
> Agreed.
>
> > There are 2 types of debug configs: ones add additional checks for
> > machines and another add verbose output for humans. CONFIG_DEBUG_VM
> > seems to be more of the first type of debug config -- additional
> > checks for machines. I've seen some of the second type debug configs
> > are prefixed with DEBUG_VERBOSE or something along these lines. Maybe
> > it makes sense to split this out of CONFIG_DEBUG_VM. Since it's a
> > "gray" check (rather then white/black check), it can't be used in
> > CI/fuzzing setups anyways -- not possible to analyse thousands of
> > cases manually (though maybe we actually hitting some that can be
> > classified as kernel bugs).
>
> The problem with the DEBUG_VERBOSE is that if the kernel needs a
> rebuild it's only marginally better than having to ask the developer
> to add a one liner dump_stack before rebuilding the kernel.
>
> Next time we need it, it may be simpler to figure a dynamic tracing
> trick than to ask a rebuild.
>
> In addition of pursuing the VM_FAULT_RETRY improvements from Peter,
> it'd be fine to drop it for now to avoid confusing the machines.
>
> > Yes, the problem is way more general. As you noted it applies to 100K
> > of EINVAL|EFAULT|ENOMEM, it's super hard to figure out what/where
> > exactly goes wrong in the kernel getting only -22. But at the same
>
> What stands out for this location is that it's the bailout point
> of all non uffd compatible syscalls.
>
> The chances such annotation turns out to be useful is much higher than
> a random -EINVAL location, but in principle it's the same kind of
> issue and I agree it'd be unpractical to annotate them manually.
>
> > time we can't have all of these 100K places dump stacks. I don't know
> > what's a good solution for this. Manually annotating 100K places does
> > not look like the right way to go. Maybe kprobes can do this? In some
> > cases I used CONFIG_KCOV  and kcovtrace
> > (https://github.com/google/syzkaller/blob/master/tools/kcovtrace/kcovtrace.c)
> > to collect kernel trace from a failing syscall.
>
> If there is a de facto standard to search for a "syscall bailout call
> trace" that can provide us that very same dump_stack, that should work
> for this uffd issue too indeed.
>
> KCOV may or may not be enabled in enterprise -debug kernels, but the
> main limitation of the kcovtrace.c seems to be the lack of threading
> support and the privilege inheritance through fork. While it's not
> mandatory, the uffd manager is practically always implemented as a
> thread of some process so whatever alternative to dump_stack() should
> work with threads.
>
> I think kprobes/ebpf should work provided debug info is available
> (which is not always guaranteed), otherwise to obtain it without debug
> info, it'd require a ftrace static trace point to use with the
> function graph tracer I guess.

OK, not sure what are the action points here now besides marking this
bug report as dup:

#syz dup: KASAN: use-after-free Read in neigh_mark_dead

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-01-07  9:45 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12  9:45 KASAN: use-after-free Read in handle_userfault (2) syzbot
2018-12-12  9:58 ` Dmitry Vyukov
2018-12-30  7:48   ` Dmitry Vyukov
2018-12-30 15:46     ` Andrea Arcangeli
2019-01-02 13:37       ` Dmitry Vyukov
2019-01-04 23:10         ` Andrea Arcangeli
2019-01-07  9:44           ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).