linux-rdma.vger.kernel.org archive mirror
* KASAN: use-after-free Read in netdevice_event_work_handler
@ 2020-07-09 23:54 syzbot
  2020-07-22 20:29 ` syzbot
  2020-07-31 21:11 ` Rustam Kovhaev
  0 siblings, 2 replies; 8+ messages in thread
From: syzbot @ 2020-07-09 23:54 UTC
  To: dledford, jgg, linux-kernel, linux-rdma, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    0bddd227 Documentation: update for gcc 4.9 requirement
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1418afb7100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b
dashboard link: https://syzkaller.appspot.com/bug?extid=20b90969babe05609947
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12a8edff100000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=167d3bb7100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+20b90969babe05609947@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in dev_put include/linux/netdevice.h:3853 [inline]
BUG: KASAN: use-after-free in netdevice_event_work_handler+0x15b/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:627
Read of size 8 at addr ffff88807b13e568 by task kworker/u4:0/7

CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: gid-cache-wq netdevice_event_work_handler
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xae/0x436 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 dev_put include/linux/netdevice.h:3853 [inline]
 netdevice_event_work_handler+0x15b/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:627
 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
 kthread+0x3b5/0x4a0 kernel/kthread.c:291
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

Allocated by task 13061:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:494
 kmalloc_node include/linux/slab.h:578 [inline]
 kvmalloc_node+0x61/0xf0 mm/util.c:574
 kvmalloc include/linux/mm.h:753 [inline]
 kvzalloc include/linux/mm.h:761 [inline]
 alloc_netdev_mqs+0x97/0xdc0 net/core/dev.c:9938
 __ip_tunnel_create+0x201/0x580 net/ipv4/ip_tunnel.c:254
 ip_tunnel_init_net+0x32b/0x980 net/ipv4/ip_tunnel.c:1072
 ops_init+0xaf/0x470 net/core/net_namespace.c:151
 setup_net+0x2d8/0x850 net/core/net_namespace.c:341
 copy_net_ns+0x2cf/0x5e0 net/core/net_namespace.c:482
 create_new_namespaces+0x3f6/0xb10 kernel/nsproxy.c:110
 unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
 ksys_unshare+0x36c/0x9a0 kernel/fork.c:2983
 __do_sys_unshare kernel/fork.c:3051 [inline]
 __se_sys_unshare kernel/fork.c:3049 [inline]
 __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 13061:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 kasan_set_free_info mm/kasan/common.c:316 [inline]
 __kasan_slab_free+0xf5/0x140 mm/kasan/common.c:455
 __cache_free mm/slab.c:3426 [inline]
 kfree+0x103/0x2c0 mm/slab.c:3757
 kvfree+0x42/0x50 mm/util.c:603
 device_release+0x71/0x200 drivers/base/core.c:1559
 kobject_cleanup lib/kobject.c:693 [inline]
 kobject_release lib/kobject.c:722 [inline]
 kref_put include/linux/kref.h:65 [inline]
 kobject_put+0x1c0/0x270 lib/kobject.c:739
 put_device+0x1b/0x30 drivers/base/core.c:2779
 free_netdev+0x35d/0x480 net/core/dev.c:10054
 __ip_tunnel_create+0x48f/0x580 net/ipv4/ip_tunnel.c:274
 ip_tunnel_init_net+0x32b/0x980 net/ipv4/ip_tunnel.c:1072
 ops_init+0xaf/0x470 net/core/net_namespace.c:151
 setup_net+0x2d8/0x850 net/core/net_namespace.c:341
 copy_net_ns+0x2cf/0x5e0 net/core/net_namespace.c:482
 create_new_namespaces+0x3f6/0xb10 kernel/nsproxy.c:110
 unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
 ksys_unshare+0x36c/0x9a0 kernel/fork.c:2983
 __do_sys_unshare kernel/fork.c:3051 [inline]
 __se_sys_unshare kernel/fork.c:3049 [inline]
 __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88807b13e000
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 1384 bytes inside of
 4096-byte region [ffff88807b13e000, ffff88807b13f000)
The buggy address belongs to the page:
page:ffffea0001ec4f80 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea0001ec4f80 order:1 compound_mapcount:0
flags: 0xfffe0000010200(slab|head)
raw: 00fffe0000010200 ffffea0001ecce88 ffffea0001987988 ffff8880aa002000
raw: 0000000000000000 ffff88807b13e000 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88807b13e400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807b13e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88807b13e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                          ^
 ffff88807b13e580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807b13e600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

* Re: KASAN: use-after-free Read in netdevice_event_work_handler
  2020-07-09 23:54 KASAN: use-after-free Read in netdevice_event_work_handler syzbot
@ 2020-07-22 20:29 ` syzbot
  2020-07-31 21:11 ` Rustam Kovhaev
  1 sibling, 0 replies; 8+ messages in thread
From: syzbot @ 2020-07-22 20:29 UTC
  To: andrew, davem, dledford, f.fainelli, hkallweit1, jgg, kuba,
	linux-kernel, linux-rdma, linux, netdev, rkovhaev,
	syzkaller-bugs

syzbot has bisected this issue to:

commit d70c47c8dc6902db19555b7ff7e6eeb264d4ac06
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Thu Apr 23 19:34:33 2020 +0000

    net: phy: make phy_suspend a no-op if PHY is suspended already

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16b2aad8900000
start commit:   0bddd227 Documentation: update for gcc 4.9 requirement
git tree:       upstream
final oops:     https://syzkaller.appspot.com/x/report.txt?x=15b2aad8900000
console output: https://syzkaller.appspot.com/x/log.txt?x=11b2aad8900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b
dashboard link: https://syzkaller.appspot.com/bug?extid=20b90969babe05609947
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12a8edff100000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=167d3bb7100000

Reported-by: syzbot+20b90969babe05609947@syzkaller.appspotmail.com
Fixes: d70c47c8dc69 ("net: phy: make phy_suspend a no-op if PHY is suspended already")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

* Re: KASAN: use-after-free Read in netdevice_event_work_handler
  2020-07-09 23:54 KASAN: use-after-free Read in netdevice_event_work_handler syzbot
  2020-07-22 20:29 ` syzbot
@ 2020-07-31 21:11 ` Rustam Kovhaev
  2020-08-01  2:23   ` Rustam Kovhaev
  2020-08-02 22:22   ` Jason Gunthorpe
  1 sibling, 2 replies; 8+ messages in thread
From: Rustam Kovhaev @ 2020-07-31 21:11 UTC
  To: dledford, jgg, linux-kernel, linux-rdma, syzkaller-bugs

IB roce driver receives NETDEV_UNREGISTER event, calls dev_hold() and
schedules work item to execute, and before wq gets a chance to complete
it, we return to ip_tunnel.c:274 and call free_netdev(), and then later
we get UAF when scheduled function references already freed net_device
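
The ordering described above can be replayed in a small userspace model. This is only a sketch under stated assumptions: `struct model_netdev`, `struct model_work` and every `model_*` function are hypothetical stand-ins invented for illustration, not kernel APIs, and the "free" is modelled by setting a flag so the example itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: only what the model needs. */
struct model_netdev {
    int  refcnt;  /* models the device reference count */
    bool freed;   /* set instead of really freeing, so the model has no UB */
};

/* One deferred work item, as queued by the notifier. */
struct model_work {
    struct model_netdev *ndev;
    bool pending;
};

static void model_dev_hold(struct model_netdev *d) { d->refcnt++; }
static void model_dev_put(struct model_netdev *d)  { d->refcnt--; }

/* Notifier side: pin the device and defer the real work. */
static void model_notifier(struct model_work *w, struct model_netdev *d)
{
    model_dev_hold(d);
    w->ndev = d;
    w->pending = true;
}

/* Error-unwind side: frees without looking at refcnt. */
static void model_free_netdev(struct model_netdev *d)
{
    d->freed = true;
}

/* Worker side: returns true if it touched an already freed device. */
static bool model_run_work(struct model_work *w)
{
    bool uaf;

    if (!w->pending)
        return false;
    assert(w->ndev != NULL);
    uaf = w->ndev->freed;      /* the access KASAN would flag */
    model_dev_put(w->ndev);
    w->pending = false;
    return uaf;
}

/* Replays the reported order of events; returns true on use-after-free. */
bool model_failed_register_race(void)
{
    struct model_netdev dev = { 0, false };
    struct model_work work = { NULL, false };

    model_notifier(&work, &dev);   /* NETDEV_UNREGISTER during the unwind */
    model_free_netdev(&dev);       /* free while refcnt is still 1 */
    return model_run_work(&work);  /* work item runs afterwards */
}
```

Calling `model_failed_register_race()` reports that the worker saw a freed device, which is the same shape as the KASAN report: the hold was taken, but nothing on the free path waited for it.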

i added verbose logging to ip_tunnel.c to see pcpu_refcnt:
+       pr_info("about to free_netdev(dev) dev->pcpu_refcnt %d", netdev_refcnt_read(dev));

and got the following:
[  410.220127][ T2944] ip_tunnel: about to free_netdev(dev) dev->pcpu_refcnt 8
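
For readers unfamiliar with the counter in that log line, here is a minimal sketch of how such a count can be read, assuming (as the field name `pcpu_refcnt` suggests) that the total is the sum of per-CPU counters. `MODEL_NR_CPUS`, `struct model_netdev` and the `model_*` helpers are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4  /* hypothetical CPU count for the model */

/* Hypothetical stand-in: one counter slot per CPU. */
struct model_netdev {
    int pcpu_refcnt[MODEL_NR_CPUS];
};

static void model_dev_hold(struct model_netdev *d, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    d->pcpu_refcnt[cpu]++;
}

static void model_dev_put(struct model_netdev *d, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    d->pcpu_refcnt[cpu]--;
}

/* The total is the sum over all CPUs; a single slot may be negative. */
int model_refcnt_read(const struct model_netdev *d)
{
    int sum = 0;

    for (int i = 0; i < MODEL_NR_CPUS; i++)
        sum += d->pcpu_refcnt[i];
    return sum;
}

/* A free is only safe in this model once the summed count is zero. */
bool model_safe_to_free(const struct model_netdev *d)
{
    return model_refcnt_read(d) == 0;
}

/* Three holds on CPU 0 and one put on CPU 2 leave two references. */
int model_refcnt_demo(void)
{
    struct model_netdev dev = { { 0 } };

    model_dev_hold(&dev, 0);
    model_dev_hold(&dev, 0);
    model_dev_hold(&dev, 0);
    model_dev_put(&dev, 2);
    return model_refcnt_read(&dev);
}
```

In this model a non-zero sum at free time, like the 8 in the log above, means `model_safe_to_free()` is false: some holder still expects the device to exist.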

i tried to make IB roce driver flush wq and work item, but i ran into
lockdep issues
also tried to modify dev core and call netdev_wait_allrefs() but ran
into rtnl deadlocks

any hints or help in fixing this would be appreciated, thank you!


* Re: KASAN: use-after-free Read in netdevice_event_work_handler
  2020-07-31 21:11 ` Rustam Kovhaev
@ 2020-08-01  2:23   ` Rustam Kovhaev
  2020-08-02 22:22   ` Jason Gunthorpe
  1 sibling, 0 replies; 8+ messages in thread
From: Rustam Kovhaev @ 2020-08-01  2:23 UTC
  To: dledford, jgg, linux-kernel, linux-rdma, syzkaller-bugs

does the patch below look sane to you? or is it complete nonsense?
syzbot test came out OK, but it might not mean anything

diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 2860def84f4d..b31c8969c8b2 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -626,6 +626,7 @@ static void netdevice_event_work_handler(struct work_struct *_work)
 					 work->cmds[i].ndev);
 		dev_put(work->cmds[i].ndev);
 		dev_put(work->cmds[i].filter_ndev);
+		put_device(&work->cmds[i].ndev->dev);
 	}
 
 	kfree(work);
@@ -649,6 +650,7 @@ static int netdevice_queue_work(struct netdev_event_work_cmd *cmds,
 			ndev_work->cmds[i].filter_ndev = ndev;
 		dev_hold(ndev_work->cmds[i].ndev);
 		dev_hold(ndev_work->cmds[i].filter_ndev);
+		get_device(&ndev_work->cmds[i].ndev->dev);
 	}
 	INIT_WORK(&ndev_work->work, netdevice_event_work_handler);
 

* Re: KASAN: use-after-free Read in netdevice_event_work_handler
  2020-07-31 21:11 ` Rustam Kovhaev
  2020-08-01  2:23   ` Rustam Kovhaev
@ 2020-08-02 22:22   ` Jason Gunthorpe
  2020-08-04 20:00     ` Rustam Kovhaev
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2020-08-02 22:22 UTC
  To: Rustam Kovhaev; +Cc: dledford, linux-kernel, linux-rdma, syzkaller-bugs

On Fri, Jul 31, 2020 at 02:11:22PM -0700, Rustam Kovhaev wrote:

> IB roce driver receives NETDEV_UNREGISTER event, calls dev_hold() and
> schedules work item to execute, and before wq gets a chance to complete
> it, we return to ip_tunnel.c:274 and call free_netdev(), and then later
> we get UAF when scheduled function references already freed net_device
> 
> i added verbose logging to ip_tunnel.c to see pcpu_refcnt:
> +       pr_info("about to free_netdev(dev) dev->pcpu_refcnt %d", netdev_refcnt_read(dev));
> 
> and got the following:
> [  410.220127][ T2944] ip_tunnel: about to free_netdev(dev) dev->pcpu_refcnt 8

I think there is a missing call to netdev_wait_allrefs() in the
rollback_registered_many().

The normal success flow has this wait after delivering
NETDEV_UNREGISTER, the error unwind for register_netdevice should as
well.

If the netdevice can progress to free while a dev_hold is active I
think it means dev_hold is functionally useless.

Jason
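
The suggestion above, waiting for outstanding references before the error unwind frees the device, can be sketched in a single-threaded userspace model. Everything here is a hypothetical stand-in: `model_wait_allrefs()` only borrows the idea of netdev_wait_allrefs() and simply lets the queued work run, where real code would block until the holders drop their references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct model_netdev {
    int  refcnt;
    bool freed;   /* flag instead of a real free, so the model has no UB */
};

/* One queued work item that pins the device. */
struct model_work {
    struct model_netdev *ndev;
    bool pending;
    bool saw_freed;   /* did the handler run on a freed device? */
};

static void model_queue(struct model_work *w, struct model_netdev *d)
{
    d->refcnt++;              /* models dev_hold() in the notifier */
    w->ndev = d;
    w->pending = true;
    w->saw_freed = false;
}

static void model_run(struct model_work *w)
{
    if (!w->pending)
        return;
    w->saw_freed = w->ndev->freed;
    w->ndev->refcnt--;        /* models dev_put() in the handler */
    w->pending = false;
}

/* Stand-in for the wait: let the pending holder finish and drop its ref. */
static void model_wait_allrefs(struct model_netdev *d, struct model_work *w)
{
    while (d->refcnt > 0 && w->pending)
        model_run(w);
}

/* Unwind that waits before freeing; returns true on use-after-free. */
bool model_unwind_waits_for_refs(void)
{
    struct model_netdev dev = { 0, false };
    struct model_work work = { NULL, false, false };

    model_queue(&work, &dev);         /* notifier took a reference */
    model_wait_allrefs(&dev, &work);  /* the step missing from the unwind */
    assert(dev.refcnt == 0);
    dev.freed = true;                 /* free only after the count drained */
    model_run(&work);                 /* no-op: nothing is pending any more */
    return work.saw_freed;
}
```

With the wait in place the handler always runs before the free, so `model_unwind_waits_for_refs()` returns false. The model ignores locking entirely, which is where the real code gets harder.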

* Re: KASAN: use-after-free Read in netdevice_event_work_handler
  2020-08-02 22:22   ` Jason Gunthorpe
@ 2020-08-04 20:00     ` Rustam Kovhaev
  2020-08-05 15:20       ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Rustam Kovhaev @ 2020-08-04 20:00 UTC
  To: Jason Gunthorpe; +Cc: dledford, linux-kernel, linux-rdma, syzkaller-bugs

On Sun, Aug 02, 2020 at 07:22:26PM -0300, Jason Gunthorpe wrote:
> I think there is a missing call to netdev_wait_allrefs() in the
> rollback_registered_many().
calling it there leads to rtnl deadlock, i think we should call
net_set_todo(), so that later when we call rtnl_unlock() it will
execute netdev_run_todo() and there it will proceed to calling
netdev_wait_allrefs(), but in ip tunnel i will need to get
free_netdev() to be called after we unlock rtnl mutex
i'll try to send a new patch for review
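
A rough userspace sketch of the deferral idea above: the device is put on a todo list while the lock is held, the wait for references happens only once the lock is dropped, and the free comes after that. All names (`struct model_state`, `model_net_set_todo()`, `model_rtnl_unlock()` and so on) are hypothetical and only mirror the shape of the proposal; the assumption that waiting is not allowed while the lock is held is taken from the deadlock reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct model_netdev {
    int  refcnt;
    bool freed;
};

/* Model of the global state: lock, one-slot todo list, one work item. */
struct model_state {
    bool rtnl_held;
    struct model_netdev *todo;       /* device waiting for deferred cleanup */
    struct model_netdev *work_ndev;  /* device pinned by queued work, or NULL */
    bool work_saw_freed;
};

static void model_queue_work(struct model_state *s, struct model_netdev *d)
{
    d->refcnt++;                     /* models dev_hold() in the notifier */
    s->work_ndev = d;
}

static void model_run_work(struct model_state *s)
{
    if (!s->work_ndev)
        return;
    s->work_saw_freed = s->work_ndev->freed;
    s->work_ndev->refcnt--;          /* models dev_put() in the handler */
    s->work_ndev = NULL;
}

/* Only record the device; no waiting is done while the lock is held. */
static void model_net_set_todo(struct model_state *s, struct model_netdev *d)
{
    assert(s->rtnl_held);
    s->todo = d;
}

/* Drop the lock first, then drain references for the todo entry. */
static void model_rtnl_unlock(struct model_state *s)
{
    s->rtnl_held = false;
    if (s->todo) {
        while (s->todo->refcnt > 0 && s->work_ndev)
            model_run_work(s);
        s->todo = NULL;
    }
}

/* Failed register with deferred cleanup; returns true on use-after-free. */
bool model_deferred_unwind(void)
{
    struct model_netdev dev = { 0, false };
    struct model_state s = { .rtnl_held = true };  /* register runs locked */

    model_queue_work(&s, &dev);    /* notifier pins the device */
    model_net_set_todo(&s, &dev);  /* unwind defers instead of waiting */
    model_rtnl_unlock(&s);         /* wait happens here, lock not held */
    assert(dev.refcnt == 0);
    dev.freed = true;              /* caller frees only after the unlock */
    model_run_work(&s);            /* no-op: work already ran */
    return s.work_saw_freed;
}
```

The point of the ordering is that the caller's free has to move after the unlock; freeing before it would bring back the original race.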

> The normal success flow has this wait after delivering
> NETDEV_UNREGISTER, the error unwind for register_netdevice should as
> well.
> 
> If the netdevice can progress to free while a dev_hold is active I
> think it means dev_hold is functionally useless.
good point


* Re: KASAN: use-after-free Read in netdevice_event_work_handler
  2020-08-04 20:00     ` Rustam Kovhaev
@ 2020-08-05 15:20       ` Jason Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2020-08-05 15:20 UTC
  To: Rustam Kovhaev; +Cc: dledford, linux-kernel, linux-rdma, syzkaller-bugs

On Tue, Aug 04, 2020 at 01:00:13PM -0700, Rustam Kovhaev wrote:
> On Sun, Aug 02, 2020 at 07:22:26PM -0300, Jason Gunthorpe wrote:
> > I think there is a missing call to netdev_wait_allrefs() in the
> > rollback_registered_many().
> calling it there leads to rtnl deadlock, i think we should call
> net_set_todo(), so that later when we call rtnl_unlock() it will
> execute netdev_run_todo() and there it will proceed to calling
> netdev_wait_allrefs(), but in ip tunnel i will need to get
> free_netdev() to be called after we unlock rtnl mutex
> i'll try to send a new patch for review

Oh the whole register is called under rtnl? Yikes..

This is probably a systemic problem with register_netdevice error
unwind, not just ip tunnel

The other way to handle it would be to organize things so that
register cannot fail once it starts calling notifiers?

Jason
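
The alternative raised above, making sure registration cannot fail once notifiers have been called, amounts to running every fallible step first and announcing the device last. A minimal sketch of that ordering follows; `model_register()`, the step functions and the notifier counter are hypothetical and do not correspond to the real register_netdevice() internals.

```c
#include <assert.h>
#include <stddef.h>

/* A fallible setup step: returns 0 on success, negative on error. */
typedef int (*model_step_fn)(void);

static int model_notifier_calls;  /* how often observers were told */

static void model_call_notifiers(void)
{
    model_notifier_calls++;
}

static int model_step_ok(void)   { return 0; }
static int model_step_fail(void) { return -1; }

/*
 * Phase 1: run every step that can fail. Nobody has seen the device
 * yet, so an error here needs no notifier-driven unwind.
 * Phase 2: announce the device. Nothing after this point may fail.
 */
int model_register(const model_step_fn *steps, size_t nsteps)
{
    for (size_t i = 0; i < nsteps; i++) {
        int err = steps[i]();

        if (err)
            return err;
    }
    model_call_notifiers();
    return 0;
}

int model_notifier_count(void)
{
    return model_notifier_calls;
}

/* A failing step must leave the notifier count untouched. */
int model_demo_failed_register(void)
{
    const model_step_fn steps[] = { model_step_ok, model_step_fail };
    int err;

    model_notifier_calls = 0;
    err = model_register(steps, 2);
    assert(err != 0);
    return model_notifier_calls;
}
```

Because no observer is notified before the last fallible step has succeeded, no observer can hold a reference to a device that is about to be torn down by an error path.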

* Re: KASAN: use-after-free Read in netdevice_event_work_handler
       [not found] <20200731211122.GA1728751 () thinkpad>
@ 2020-08-01  2:06 ` Coiby Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Coiby Xu @ 2020-08-01  2:06 UTC
  To: Rustam Kovhaev
  Cc: dledford, jgg, linux-kernel, linux-rdma, syzkaller-bugs, David S. Miller

On Fri, Jul 31, 2020 at 09:11:22PM +0000, Rustam Kovhaev wrote:
>> page:ffffea0001ec4f80 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea0001ec4f80 order:1 compound_mapcount:0
>> flags: 0xfffe0000010200(slab|head)
>> raw: 00fffe0000010200 ffffea0001ecce88 ffffea0001987988 ffff8880aa002000
>> raw: 0000000000000000 ffff88807b13e000 0000000100000001 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>  ffff88807b13e400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>  ffff88807b13e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> >ffff88807b13e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                                           ^
>>  ffff88807b13e580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>  ffff88807b13e600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ==================================================================
>>
>>
>> ---
>> This bug is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>
>> syzbot will keep track of this bug report. See:
>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>> syzbot can test patches for this bug, for details see:
>> https://goo.gl/tpsmEJ#testing-patches
>
>IB roce driver receives NETDEV_UNREGISTER event, calls dev_hold() and
>schedules work item to execute, and before wq gets a chance to complete
>it, we return to ip_tunnel.c:274 and call free_netdev(), and then later
>we get UAF when scheduled function references already freed net_device
>
>i added verbose logging to ip_tunnel.c to see pcpu_refcnt:
>+       pr_info("about to free_netdev(dev) dev->pcpu_refcnt %d", netdev_refcnt_read(dev));
>
>and got the following:
>[  410.220127][ T2944] ip_tunnel: about to free_netdev(dev) dev->pcpu_refcnt 8
>
>i tried to make IB roce driver flush wq and work item, but i ran into
>lockdep issues
>also tried to modify dev core and call netdev_wait_allrefs() but ran
>into rntl deadlocks
>
>any hints or help in fixing this would be appreciated, thank you!

Congratulations on your latest patch [1], which has fixed this issue! I've
also been trying to fix this bug. Your solution is the right one and simpler
than mine :) I've also realized that struct device shouldn't be freed at all
since dev_hold is called, but I haven't figured out why:
     // drivers/infiniband/core/roce_gid_mgmt.c
     static int netdevice_queue_work(struct netdev_event_work_cmd *cmds,
     				struct net_device *ndev)
     {
     	...
     	for (i = 0; i < ARRAY_SIZE(ndev_work->cmds) && ndev_work->cmds[i].cb; i++) {
     		dev_hold(ndev_work->cmds[i].ndev);
     		...
     	}
     	...
     	return NOTIFY_DONE;
     }

But maybe we could make an improvement, which is to fix what makes the
notified callback functions mpls_dev_notify and addrconf_notify fail in
the first place. Currently I simply let mpls_dev_notify and addrconf_notify
return NOTIFY_DONE [2] [3] so that call_netdevice_notifiers and
register_netdevice succeed.
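
To illustrate why returning NOTIFY_DONE hides a failure from the caller,
here is a small userspace sketch of the notifier return-value encoding. The
constants and the two helpers are retyped from include/linux/notifier.h as
I remember them, and roundtrip() is a hypothetical helper added only for
this example, so please check everything against the tree you work on.

```c
#include <assert.h>
#include <errno.h>

/* Values as defined in include/linux/notifier.h (retyped, not included). */
#define NOTIFY_DONE      0x0000  /* don't care */
#define NOTIFY_OK        0x0001  /* suits me */
#define NOTIFY_STOP_MASK 0x8000  /* don't call further */

/* Encode 0 or a negative errno as a notifier return value. */
static int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical helper: what the caller of the chain would see for err. */
static int roundtrip(int err)
{
	assert(err <= 0);  /* kernel errors are 0 or negative */
	return notifier_to_errno(notifier_from_errno(err));
}
```

A callback that returns notifier_from_errno(-ENOMEM) makes the caller see
-ENOMEM, while one that returns NOTIFY_DONE decodes to 0, so the caller
treats the event as handled successfully.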

My observation is that mpls_dev_notify and addrconf_notify are global
netdevice notifier callbacks of the subscribers, not per-netns ones, and
they can fail.

When the per-netns netdevice callback functions succeed and a global one
like mpls_dev_notify fails, register_netdevice then fails.

     // net/core/dev.c
     static int call_netdevice_notifiers_info(unsigned long val,
     					 struct netdev_notifier_info *info)
     {
     	struct net *net = dev_net(info->dev);
     	int ret;

     	ASSERT_RTNL();

     	/* Run per-netns notifier block chain first, then run the global one.
     	 * Hopefully, one day, the global one is going to be removed after
     	 * all notifier block registrators get converted to be per-netns.
     	 */
     	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
     	if (ret & NOTIFY_STOP_MASK)
     		return ret;
     	return raw_notifier_call_chain(&netdev_chain, val, info);
     }
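
The ordering above can be modelled in userspace. This is only a sketch of
the scenario as described in this mail: call_chain() is a hypothetical
stand-in for raw_notifier_call_chain(), and early_ok() and late_fail() are
made-up handlers, not the real RoCE, MPLS or IPv6 notifiers.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

typedef int (*notifier_fn)(unsigned long event);

/* Call each handler in order; stop at the first one that sets
 * NOTIFY_STOP_MASK and return the last handler's return value. */
static int call_chain(const notifier_fn *chain, size_t n, unsigned long event)
{
	int ret = NOTIFY_DONE;

	assert(chain != NULL);
	for (size_t i = 0; i < n; i++) {
		ret = chain[i](event);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Counts handlers that already acted on the event (e.g. took a reference). */
static int side_effects;

static int early_ok(unsigned long event)
{
	(void)event;
	side_effects++;
	return NOTIFY_OK;
}

static int late_fail(unsigned long event)
{
	(void)event;
	return NOTIFY_BAD;
}

/* Same shape as call_netdevice_notifiers_info(): first chain, then second. */
static int call_two_level(unsigned long event)
{
	const notifier_fn per_netns[] = { early_ok };
	const notifier_fn global[] = { late_fail };
	int ret = call_chain(per_netns, 1, event);

	if (ret & NOTIFY_STOP_MASK)
		return ret;
	return call_chain(global, 1, event);
}
```

The point of the model is that the overall call reports failure even though
an earlier handler has already acted on the event, and nothing in the chain
undoes that earlier handler's work.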


This causes the newly created struct net_device to be destroyed. Since the
per-netns netdevice callback functions have succeeded, the netdevice is
still used by netdevice_event_work_handler, which causes this
"KASAN: use-after-free" bug.
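
A toy userspace model of that lifetime problem, for illustration only. All
fake_* names are hypothetical and merely mimic dev_hold(), dev_put() and the
queued work item; the model does not free anything, it only records how many
references are outstanding at the point where the error path would call
free_netdev().

```c
#include <assert.h>
#include <stdbool.h>

struct fake_netdev {
	int refcnt;
	bool freed;
};

struct fake_work {
	struct fake_netdev *ndev;
	bool pending;
};

static void fake_dev_hold(struct fake_netdev *d)
{
	d->refcnt++;
}

static void fake_dev_put(struct fake_netdev *d)
{
	assert(!d->freed);  /* in the real bug this is the use-after-free read */
	d->refcnt--;
}

/* Notifier side: take a reference and defer the real work. */
static void fake_queue_work(struct fake_work *w, struct fake_netdev *d)
{
	fake_dev_hold(d);
	w->ndev = d;
	w->pending = true;
}

/* Worker side: runs later and drops the reference. */
static void fake_work_handler(struct fake_work *w)
{
	fake_dev_put(w->ndev);
	w->pending = false;
}

/* Freeing is only safe once nobody holds a reference any more. */
static bool fake_safe_to_free(const struct fake_netdev *d)
{
	return d->refcnt == 0;
}

/* Returns the reference count seen where the error path would free. */
static int refs_outstanding_on_error_path(void)
{
	struct fake_netdev dev = { 0, false };
	struct fake_work work = { 0 };

	fake_queue_work(&work, &dev);
	int outstanding = dev.refcnt;  /* registration fails at this point */

	fake_work_handler(&work);      /* the worker only runs afterwards */
	assert(fake_safe_to_free(&dev));
	return outstanding;
}
```

In this model the count is still non-zero when the error path runs, which
is the same kind of observation as the non-zero pcpu_refcnt printed just
before free_netdev() in the quoted log.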


[1] https://syzkaller.appspot.com/text?tag=Patch&x=1799f8a2900000
[2] https://github.com/coiby/linux/commit/859f817317c8cd4a17af8879094db8697a2c7754
[3] https://github.com/coiby/linux/commit/260ddb9ec9b124ca7ad93f368eadb90b16edf2ef


end of thread, other threads:[~2020-08-05 19:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-09 23:54 KASAN: use-after-free Read in netdevice_event_work_handler syzbot
2020-07-22 20:29 ` syzbot
2020-07-31 21:11 ` Rustam Kovhaev
2020-08-01  2:23   ` Rustam Kovhaev
2020-08-02 22:22   ` Jason Gunthorpe
2020-08-04 20:00     ` Rustam Kovhaev
2020-08-05 15:20       ` Jason Gunthorpe
     [not found] <20200731211122.GA1728751 () thinkpad>
2020-08-01  2:06 ` Coiby Xu
