linux-kernel.vger.kernel.org archive mirror
* WARNING in ib_umad_kill_port
@ 2020-04-06  6:37 syzbot
  2020-04-06 17:21 ` Leon Romanovsky
  0 siblings, 1 reply; 9+ messages in thread
From: syzbot @ 2020-04-06  6:37 UTC (permalink / raw)
  To: gregkh, linux-kernel, netdev, rafael, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    304e0242 net_sched: add a temporary refcnt for struct tcin..
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
dashboard link: https://syzkaller.appspot.com/bug?extid=9627a92b1f9262d5d30c
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9627a92b1f9262d5d30c@syzkaller.appspotmail.com

------------[ cut here ]------------
sysfs group 'power' not found for kobject 'umad1'
WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline]
WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 31308 Comm: kworker/u4:10 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound ib_unregister_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 panic+0x2e3/0x75c kernel/panic.c:221
 __warn.cold+0x2f/0x35 kernel/panic.c:582
 report_bug+0x27b/0x2f0 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:175 [inline]
 fixup_bug arch/x86/kernel/traps.c:170 [inline]
 do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:sysfs_remove_group fs/sysfs/group.c:279 [inline]
RIP: 0010:sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
Code: 48 89 d9 49 8b 14 24 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 80 3c 01 00 75 41 48 8b 33 48 c7 c7 60 c3 39 88 e8 93 c3 5f ff <0f> 0b eb 95 e8 22 62 cb ff e9 d2 fe ff ff 48 89 df e8 15 62 cb ff
RSP: 0018:ffffc90001d97a60 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff88915620 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815ca861 RDI: fffff520003b2f3e
RBP: 0000000000000000 R08: ffff8880a78fc2c0 R09: ffffed1015ce66a1
R10: ffffed1015ce66a0 R11: ffff8880ae733507 R12: ffff88808e5ba070
R13: ffffffff88915bc0 R14: ffff88808e5ba008 R15: dffffc0000000000
 dpm_sysfs_remove+0x97/0xb0 drivers/base/power/sysfs.c:794
 device_del+0x18b/0xd30 drivers/base/core.c:2687
 cdev_device_del+0x15/0x80 fs/char_dev.c:570
 ib_umad_kill_port+0x45/0x250 drivers/infiniband/core/user_mad.c:1327
 ib_umad_remove_one+0x18a/0x220 drivers/infiniband/core/user_mad.c:1409
 remove_client_context+0xbe/0x110 drivers/infiniband/core/device.c:724
 disable_device+0x13b/0x230 drivers/infiniband/core/device.c:1270
 __ib_unregister_device+0x91/0x180 drivers/infiniband/core/device.c:1437
 ib_unregister_work+0x15/0x30 drivers/infiniband/core/device.c:1547
 process_one_work+0x965/0x16a0 kernel/workqueue.c:2266
 worker_thread+0x96/0xe20 kernel/workqueue.c:2412
 kthread+0x388/0x470 kernel/kthread.c:268
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

* Re: WARNING in ib_umad_kill_port
  2020-04-06  6:37 WARNING in ib_umad_kill_port syzbot
@ 2020-04-06 17:21 ` Leon Romanovsky
  2020-04-06 17:44   ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2020-04-06 17:21 UTC (permalink / raw)
  To: syzbot, RDMA mailing list
  Cc: gregkh, linux-kernel, netdev, rafael, syzkaller-bugs

+ RDMA

On Sun, Apr 05, 2020 at 11:37:15PM -0700, syzbot wrote:
> [...]

* Re: WARNING in ib_umad_kill_port
  2020-04-06 17:21 ` Leon Romanovsky
@ 2020-04-06 17:44   ` Jason Gunthorpe
  2020-04-07  9:56     ` Dmitry Vyukov
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2020-04-06 17:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: syzbot, RDMA mailing list, gregkh, linux-kernel, netdev, rafael,
	syzkaller-bugs

On Mon, Apr 06, 2020 at 08:21:51PM +0300, Leon Romanovsky wrote:
> + RDMA
> 
> On Sun, Apr 05, 2020 at 11:37:15PM -0700, syzbot wrote:
> > [...]

I'm not sure what could be done wrong here to elicit this:

 sysfs group 'power' not found for kobject 'umad1'

??

I've seen another similar sysfs related trigger that we couldn't
figure out.

Hard to investigate without a reproducer.

Jason

* Re: WARNING in ib_umad_kill_port
  2020-04-06 17:44   ` Jason Gunthorpe
@ 2020-04-07  9:56     ` Dmitry Vyukov
  2020-04-07 11:55       ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2020-04-07  9:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman,
	LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Mon, Apr 6, 2020 at 7:44 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Apr 06, 2020 at 08:21:51PM +0300, Leon Romanovsky wrote:
> > + RDMA
> >
> > On Sun, Apr 05, 2020 at 11:37:15PM -0700, syzbot wrote:
> > > [...]
>
> I'm not sure what could be done wrong here to elicit this:
>
>  sysfs group 'power' not found for kobject 'umad1'
>
> ??
>
> I've seen another similar sysfs related trigger that we couldn't
> figure out.
>
> Hard to investigate without a reproducer.
>
> Jason


Based on all of the sysfs-related bugs I've seen, my bet would be on
a race, e.g. one thread registering devices while another
unregisters them.

* Re: WARNING in ib_umad_kill_port
  2020-04-07  9:56     ` Dmitry Vyukov
@ 2020-04-07 11:55       ` Jason Gunthorpe
  2020-04-07 12:39         ` Dmitry Vyukov
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2020-04-07 11:55 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman,
	LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > I'm not sure what could be done wrong here to elicit this:
> >
> >  sysfs group 'power' not found for kobject 'umad1'
> >
> > ??
> >
> > I've seen another similar sysfs related trigger that we couldn't
> > figure out.
> >
> > Hard to investigate without a reproducer.
> 
> Based on all of the sysfs-related bugs I've seen, my bet would be on
> some races. E.g. one thread registers devices, while another
> unregisters these.

I did check that the naming is ordered correctly; at least we won't be
concurrently creating and destroying umadX sysfs entries with the same name.

I'm also fairly sure we can't be destroying the parent at the same
time as this child.

Do you see the above commonly? Could it be some driver core thing? Or
is it more likely something wrong in umad?

Jason

* Re: WARNING in ib_umad_kill_port
  2020-04-07 11:55       ` Jason Gunthorpe
@ 2020-04-07 12:39         ` Dmitry Vyukov
  2020-04-07 14:33           ` Greg Kroah-Hartman
  2020-04-07 14:35           ` Jason Gunthorpe
  0 siblings, 2 replies; 9+ messages in thread
From: Dmitry Vyukov @ 2020-04-07 12:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman,
	LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > I'm not sure what could be done wrong here to elicit this:
> > >
> > >  sysfs group 'power' not found for kobject 'umad1'
> > >
> > > ??
> > >
> > > I've seen another similar sysfs related trigger that we couldn't
> > > figure out.
> > >
> > > Hard to investigate without a reproducer.
> >
> > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > some races. E.g. one thread registers devices, while another
> > unregisters these.
>
> I did check that the naming is ordered right, at least we won't be
> concurrently creating and destroying umadX sysfs of the same names.
>
> I'm also fairly sure we can't be destroying the parent at the same
> time as this child.
>
> Do you see the above commonly? Could it be some driver core thing? Or
> is it more likely something wrong in umad?

Mmmm... I can't say, I only look at these bugs very briefly. I've
noticed that sysfs comes up periodically (or was it some other similar
fs?). The general observation is that code frequently assumes only the
happy scenario and only, say, a single administrator doing one thing
at a time, slowly and carefully, and it is not really hardened against
armies of monkeys.
But I did not look at code abstractions, bug patterns, contracts, etc.

Greg KH may know better. Greg, as far as I remember you commented on
some of these reports along the lines of, for example, "the warning is
in sysfs code, but the bug is in the callers".

* Re: WARNING in ib_umad_kill_port
  2020-04-07 12:39         ` Dmitry Vyukov
@ 2020-04-07 14:33           ` Greg Kroah-Hartman
  2020-04-07 14:35           ` Jason Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2020-04-07 14:33 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jason Gunthorpe, Leon Romanovsky, syzbot, RDMA mailing list,
	LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > I'm not sure what could be done wrong here to elicit this:
> > > >
> > > >  sysfs group 'power' not found for kobject 'umad1'
> > > >
> > > > ??
> > > >
> > > > I've seen another similar sysfs related trigger that we couldn't
> > > > figure out.
> > > >
> > > > Hard to investigate without a reproducer.
> > >
> > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > some races. E.g. one thread registers devices, while another
> > > unregisters these.
> >
> > I did check that the naming is ordered right, at least we won't be
> > concurrently creating and destroying umadX sysfs of the same names.
> >
> > I'm also fairly sure we can't be destroying the parent at the same
> > time as this child.
> >
> > Do you see the above commonly? Could it be some driver core thing? Or
> > is it more likely something wrong in umad?
> 
> Mmmm... I can't say, I am looking at some bugs very briefly. I've
> noticed that sysfs comes up periodically (or was it some other similar
> fs?). General observation is that code frequently assumes only the
> happy scenario and only, say, a single administrator doing one thing
> at a time, slowly and carefully, and it is not really hardened against
> armies of monkeys.
> But I did not look at code abstractions, bug patterns, contracts, etc.
> 
> Greg KH may know better. Greg, as far as I remember you commented on
> some of these reports along the lines of, for example, "the warning is
> in sysfs code, but the bug is in the callers".

Yes, that is correct.


* Re: WARNING in ib_umad_kill_port
  2020-04-07 12:39         ` Dmitry Vyukov
  2020-04-07 14:33           ` Greg Kroah-Hartman
@ 2020-04-07 14:35           ` Jason Gunthorpe
  2020-04-09 13:35             ` Dmitry Vyukov
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2020-04-07 14:35 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman,
	LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > I'm not sure what could be done wrong here to elicit this:
> > > >
> > > >  sysfs group 'power' not found for kobject 'umad1'
> > > >
> > > > ??
> > > >
> > > > I've seen another similar sysfs related trigger that we couldn't
> > > > figure out.
> > > >
> > > > Hard to investigate without a reproducer.
> > >
> > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > some races. E.g. one thread registers devices, while another
> > > unregisters these.
> >
> > I did check that the naming is ordered right, at least we won't be
> > concurrently creating and destroying umadX sysfs of the same names.
> >
> > I'm also fairly sure we can't be destroying the parent at the same
> > time as this child.
> >
> > Do you see the above commonly? Could it be some driver core thing? Or
> > is it more likely something wrong in umad?
> 
> Mmmm... I can't say, I am looking at some bugs very briefly. I've
> noticed that sysfs comes up periodically (or was it some other similar
> fs?). 

Hmm..

Looking at the git history I see several cases where there are
ordering problems. I wonder if the rdma parent device is being
destroyed before the rdma devices complete destruction?

I see the syzkaller is creating a bunch of virtual net devices, and I
assume it has created a software rdma device on one of these virtual
devices.

So I'm guessing that it is also destroying a parent? But I can't guess
which.. Some simple tests with veth suggest it is OK because the
parent is virtual. But maybe bond or bridge or something?

The issue in rdma is that unregistering a netdev triggers an async
destruction of the RDMA devices. This has to be async because the
netdev notification is delivered with RTNL held, and a rdma device
cannot be destroyed while holding RTNL.

So there is a race, I suppose, where the netdev can complete
destruction while rdma continues, and if someone deletes the sysfs
directory holding the netdev before rdma completes, my guess is that
we hit this warning?

Could it be? I would love to know what netdev the rdma device was
created on, but it doesn't seem to show in the trace :\ 

This race could be made more likely by adding a sleep to
ib_unregister_work() to widen the race window - is there some way
to get syzkaller to search for a reproducer with that patch?
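
As a userspace illustration of why an injected sleep helps, here is a
minimal sketch with two Python threads. It is not the kernel patch:
run_teardown and worker_delay are hypothetical names, and the worker thread
only stands in for the deferred unregister work. The point is that a large
enough delay makes the "parent removed first" ordering the expected one.

```python
import threading
import time


def run_teardown(worker_delay):
    """Return the teardown events in the order they happened.

    worker_delay is a hypothetical knob standing in for a sleep added to
    the deferred unregister work.
    """
    events = []
    lock = threading.Lock()
    parent_alive = [True]

    def deferred_unregister():
        time.sleep(worker_delay)  # injected delay that widens the window
        with lock:
            if parent_alive[0]:
                events.append("child removed cleanly")
            else:
                events.append("child removal found parent gone")

    worker = threading.Thread(target=deferred_unregister)
    worker.start()  # teardown is queued, like a workqueue item
    with lock:
        # Meanwhile the parent finishes its own destruction.
        parent_alive[0] = False
        events.append("parent removed")
    worker.join()
    return events
```

With worker_delay=0 either ordering can occur; with a delay of a few hundred
milliseconds the worker should see the parent already gone.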

Jason

* Re: WARNING in ib_umad_kill_port
  2020-04-07 14:35           ` Jason Gunthorpe
@ 2020-04-09 13:35             ` Dmitry Vyukov
  0 siblings, 0 replies; 9+ messages in thread
From: Dmitry Vyukov @ 2020-04-09 13:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman,
	LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 7, 2020 at 4:35 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > > I'm not sure what could be done wrong here to elicit this:
> > > > >
> > > > >  sysfs group 'power' not found for kobject 'umad1'
> > > > >
> > > > > ??
> > > > >
> > > > > I've seen another similar sysfs related trigger that we couldn't
> > > > > figure out.
> > > > >
> > > > > Hard to investigate without a reproducer.
> > > >
> > > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > > some races. E.g. one thread registers devices, while another
> > > > unregisters these.
> > >
> > > I did check that the naming is ordered right, at least we won't be
> > > concurrently creating and destroying umadX sysfs of the same names.
> > >
> > > I'm also fairly sure we can't be destroying the parent at the same
> > > time as this child.
> > >
> > > Do you see the above commonly? Could it be some driver core thing? Or
> > > is it more likely something wrong in umad?
> >
> > Mmmm... I can't say, I am looking at some bugs very briefly. I've
> > noticed that sysfs comes up periodically (or was it some other similar
> > fs?).
>
> Hmm..
>
> Looking at the git history I see several cases where there are
> ordering problems. I wonder if the rdma parent device is being
> destroyed before the rdma devices complete destruction?
>
> I see the syzkaller is creating a bunch of virtual net devices, and I
> assume it has created a software rdma device on one of these virtual
> devices.
>
> So I'm guessing that it is also destroying a parent? But I can't guess
> which.. Some simple tests with veth suggest it is OK because the
> parent is virtual. But maybe bond or bridge or something?
>
> The issue in rdma is that unregistering a netdev triggers an async
> destruction of the RDMA devices. This has to be async because the
> netdev notification is delivered with RTNL held, and a rdma device
> cannot be destroyed while holding RTNL.
>
> So there is a race, I suppose, where the netdev can complete
> destruction while rdma continues, and if someone deletes the sysfs
> holding the netdev before rdma completes, I'm going to guess, that we
> hit this warning?
>
> Could it be? I would love to know what netdev the rdma device was
> created on, but it doesn't seem to show in the trace :\
>
> This theory could be made more likely by adding a sleep to
> ib_unregister_work() to increase the race window - is there some way
> to get syzkaller to search for a reproducer with that patch?


Too bad it happened in kthread context. Otherwise it's usually possible to
pinpoint the test based on the process name.

The syz-repro utility will run the reproduction process with any kernel you give it:
https://github.com/google/syzkaller/blob/master/docs/reproducing_crashes.md

Or it's possible to run individual programs, or the whole log, with the
syz-execprog utility:
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md

Or maybe you could pinpoint the guilty test program by hand in the log
(it's probably somewhere closer to the end):
https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000
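
For the by-hand approach, a small helper along these lines might narrow
the search. It is a rough sketch, assuming each program in the log starts
with a line containing "executing program N:" and ends at the next blank
line; the function name last_programs and the keyword are hypothetical, and
kernel messages interleaved in a real log may need extra handling.

```python
import re

# Assumed header format for each program in the console log.
HEADER = re.compile(r"executing program \d+:")


def last_programs(log_text, keyword, limit=3):
    """Return up to `limit` programs mentioning `keyword`, newest first."""
    programs = []
    current = None
    for line in log_text.splitlines():
        if HEADER.search(line):
            current = []  # start collecting a new program body
            programs.append(current)
        elif current is not None:
            if line.strip():
                current.append(line)
            else:
                current = None  # a blank line ends the program body
    matches = [
        "\n".join(body)
        for body in programs
        if any(keyword in body_line for body_line in body)
    ]
    return matches[-limit:][::-1]


# Usage sketch: fetch the log by any means, then
#   for prog in last_programs(log_text, "rdma"):
#       print(prog, end="\n\n")
```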

Thread overview: 9+ messages
2020-04-06  6:37 WARNING in ib_umad_kill_port syzbot
2020-04-06 17:21 ` Leon Romanovsky
2020-04-06 17:44   ` Jason Gunthorpe
2020-04-07  9:56     ` Dmitry Vyukov
2020-04-07 11:55       ` Jason Gunthorpe
2020-04-07 12:39         ` Dmitry Vyukov
2020-04-07 14:33           ` Greg Kroah-Hartman
2020-04-07 14:35           ` Jason Gunthorpe
2020-04-09 13:35             ` Dmitry Vyukov
