* [syzbot] general protection fault in __device_attach
@ 2022-03-14 8:46 syzbot
  2022-06-02 19:49 ` syzbot
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: syzbot @ 2022-03-14 8:46 UTC
To: gregkh, linux-kernel, rafael, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    e7e19defa575 Merge tag 'arm64-fixes' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13ea76f6700000
kernel config:  https://syzkaller.appspot.com/x/.config?x=442f8ac61e60a75e
dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
CPU: 1 PID: 14569 Comm: syz-executor.4 Not tainted 5.17.0-rc7-syzkaller-00068-ge7e19defa575 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__device_attach+0xad/0x4a0 drivers/base/dd.c:949
Code: e8 03 42 80 3c 20 00 0f 85 a3 03 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 65 48 49 8d bc 24 08 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 6e 03 00 00 45 0f b6 b4 24 08 01 00
RSP: 0018:ffffc90010a87b98 EFLAGS: 00010216
RAX: dffffc0000000000 RBX: 1ffff92002150f74 RCX: 0000000000000000
RDX: 0000000000000021 RSI: 0000000000000008 RDI: 0000000000000108
RBP: ffff88807829d030 R08: 0000000000000000 R09: ffffc90010a87ad7
R10: fffff52002150f5a R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000fffffff0 R15: ffff88807829d140
FS: 00007f7048b3e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f704a2dd090 CR3: 0000000074ae3000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 proc_ioctl.part.0+0x48e/0x560 drivers/usb/core/devio.c:2340
 proc_ioctl drivers/usb/core/devio.c:170 [inline]
 proc_ioctl_compat drivers/usb/core/devio.c:2389 [inline]
 usbdev_do_ioctl drivers/usb/core/devio.c:2705 [inline]
 usbdev_ioctl+0xc01/0x36c0 drivers/usb/core/devio.c:2791
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:874 [inline]
 __se_sys_ioctl fs/ioctl.c:860 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f704a1c9049
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7048b3e168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f704a2dbf60 RCX: 00007f704a1c9049
RDX: 0000000020000000 RSI: 00000000c00c5512 RDI: 0000000000000003
RBP: 00007f704a22308d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc683ba24f R14: 00007f7048b3e300 R15: 0000000000022000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__device_attach+0xad/0x4a0 drivers/base/dd.c:949
Code: e8 03 42 80 3c 20 00 0f 85 a3 03 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 65 48 49 8d bc 24 08 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 6e 03 00 00 45 0f b6 b4 24 08 01 00
RSP: 0018:ffffc90010a87b98 EFLAGS: 00010216
RAX: dffffc0000000000 RBX: 1ffff92002150f74 RCX: 0000000000000000
RDX: 0000000000000021 RSI: 0000000000000008 RDI: 0000000000000108
RBP: ffff88807829d030 R08: 0000000000000000 R09: ffffc90010a87ad7
R10: fffff52002150f5a R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000fffffff0 R15: ffff88807829d140
FS: 00007f7048b3e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0b074ee1b8 CR3: 0000000074ae3000 CR4: 0000000000350ef0
----------------
Code disassembly (best guess):
   0: e8 03 42 80 3c          callq  0x3c804208
   5: 20 00                   and    %al,(%rax)
   7: 0f 85 a3 03 00 00       jne    0x3b0
   d: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  14: fc ff df
  17: 4c 8b 65 48             mov    0x48(%rbp),%r12
  1b: 49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
  22: 00
  23: 48 89 fa                mov    %rdi,%rdx
  26: 48 c1 ea 03             shr    $0x3,%rdx
* 2a: 0f b6 04 02             movzbl (%rdx,%rax,1),%eax <-- trapping instruction
  2e: 84 c0                   test   %al,%al
  30: 74 06                   je     0x38
  32: 0f 8e 6e 03 00 00       jle    0x3a6
  38: 45                      rex.RB
  39: 0f                      .byte 0xf
  3a: b6 b4                   mov    $0xb4,%dh
  3c: 24 08                   and    $0x8,%al
  3e: 01 00                   add    %eax,(%rax)

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

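The non-canonical address in the fault message can be reproduced from the disassembly in the report: the field address in %rdi is shifted right by 3 and added to the constant held in %rax. A minimal sketch of that arithmetic in userspace C follows. The macro and function names are made up for illustration, and the constants are simply the values visible in the register dump, not taken from kernel headers.

```c
#include <stdint.h>

/* Values read off the report: RAX holds the shadow offset, and the
 * "shr $0x3,%rdx" instruction gives the scale shift. */
#define SHADOW_OFFSET      UINT64_C(0xdffffc0000000000)
#define SHADOW_SCALE_SHIFT 3

/* Address of a field located `offset` bytes into an object at `base`. */
static inline uint64_t field_addr(uint64_t base, uint64_t offset)
{
	return base + offset;
}

/* Shadow byte that the instrumented check reads before touching `addr`. */
static inline uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

With R12 = 0 (the NULL pointer loaded from 0x48(%rbp)) and the 0x108 offset from the `lea`, `field_addr(0, 0x108)` is 0x108 and `shadow_addr()` of that is 0xdffffc0000000021, which matches both RDX = 0x21 and the address in the fault line.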
* Re: [syzbot] general protection fault in __device_attach
From: syzbot @ 2022-06-02 19:49 UTC
To: gregkh, linux-kernel, rafael, syzkaller-bugs

syzbot has found a reproducer for the following issue on:

HEAD commit:    d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=17d2e7f5f00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665
dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com

usb usb9: device_add((null)) --> -22
general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
CPU: 0 PID: 4190 Comm: syz-executor322 Not tainted 5.18.0-syzkaller-11972-gd1dc87763f40 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__device_attach+0xad/0x4a0 drivers/base/dd.c:948
Code: e8 03 42 80 3c 20 00 0f 85 a3 03 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 65 48 49 8d bc 24 08 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 6e 03 00 00 45 0f b6 b4 24 08 01 00
RSP: 0018:ffffc90003447b98 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff92000688f74 RCX: 0000000000000000
RDX: 0000000000000021 RSI: 0000000000000002 RDI: 0000000000000108
RBP: ffff88807a22f030 R08: 0000000000000000 R09: ffffffff8dbb1097
R10: fffffbfff1b76212 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000fffffff0 R15: ffff88807a22f0b0
FS: 0000555557335300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2b779a90b0 CR3: 000000007a1a7000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 proc_ioctl.part.0+0x48e/0x560 drivers/usb/core/devio.c:2356
 proc_ioctl drivers/usb/core/devio.c:182 [inline]
 proc_ioctl_default drivers/usb/core/devio.c:2391 [inline]
 usbdev_do_ioctl drivers/usb/core/devio.c:2747 [inline]
 usbdev_ioctl+0x2c08/0x36f0 drivers/usb/core/devio.c:2807
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl fs/ioctl.c:856 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f2b77979779
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe17c6ed98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f2b779bd184 RCX: 00007f2b77979779
RDX: 0000000020000040 RSI: 00000000c0105512 RDI: 0000000000000006
RBP: 00007ffe17c6edb0 R08: 0000000000000001 R09: 0000000000000001
R10: 000000000000ffff R11: 0000000000000246 R12: 0000000000000001
R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__device_attach+0xad/0x4a0 drivers/base/dd.c:948
Code: e8 03 42 80 3c 20 00 0f 85 a3 03 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 65 48 49 8d bc 24 08 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 6e 03 00 00 45 0f b6 b4 24 08 01 00
RSP: 0018:ffffc90003447b98 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff92000688f74 RCX: 0000000000000000
RDX: 0000000000000021 RSI: 0000000000000002 RDI: 0000000000000108
RBP: ffff88807a22f030 R08: 0000000000000000 R09: ffffffff8dbb1097
R10: fffffbfff1b76212 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000fffffff0 R15: ffff88807a22f0b0
FS: 0000555557335300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2b779a90b0 CR3: 000000007a1a7000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
   0: e8 03 42 80 3c          callq  0x3c804208
   5: 20 00                   and    %al,(%rax)
   7: 0f 85 a3 03 00 00       jne    0x3b0
   d: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  14: fc ff df
  17: 4c 8b 65 48             mov    0x48(%rbp),%r12
  1b: 49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
  22: 00
  23: 48 89 fa                mov    %rdi,%rdx
  26: 48 c1 ea 03             shr    $0x3,%rdx
* 2a: 0f b6 04 02             movzbl (%rdx,%rax,1),%eax <-- trapping instruction
  2e: 84 c0                   test   %al,%al
  30: 74 06                   je     0x38
  32: 0f 8e 6e 03 00 00       jle    0x3a6
  38: 45                      rex.RB
  39: 0f                      .byte 0xf
  3a: b6 b4                   mov    $0xb4,%dh
  3c: 24 08                   and    $0x8,%al
  3e: 01 00                   add    %eax,(%rax)

* Re: [syzbot] general protection fault in __device_attach
From: syzbot @ 2022-06-03 10:02 UTC
To: andriy.shevchenko, gregkh, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs

syzbot has bisected this issue to:

commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Fri Jun 18 13:41:27 2021 +0000

    ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1040b80df00000
start commit:   d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect
git tree:       upstream
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1240b80df00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1440b80df00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665
dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000

Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com
Fixes: a9c4cf299f5f ("ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

* Re: [syzbot] general protection fault in __device_attach
From: Andy Shevchenko @ 2022-06-03 11:04 UTC
To: syzbot
Cc: gregkh, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Alan Stern

On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Date:   Fri Jun 18 13:41:27 2021 +0000
>
>     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros

Hmm... It's not obvious at all how this change can alter the behaviour so
drastically. device_add() is called from USB core with intf->dev.name == NULL
by some reason. A-ha, seems like fault injector, which looks like

    dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
                 dev->devpath, configuration, ifnum);

missed the return code check.

But I'm not familiar with that code at all, adding Linux USB ML and Alan.

> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1040b80df00000
> start commit:   d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect
> git tree:       upstream
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=1240b80df00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=1440b80df00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665
> dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000
>
> Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com
> Fixes: a9c4cf299f5f ("ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

--
With Best Regards,
Andy Shevchenko

* Re: [syzbot] general protection fault in __device_attach
From: Alan Stern @ 2022-06-03 15:42 UTC
To: Andy Shevchenko
Cc: syzbot, gregkh, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > syzbot has bisected this issue to:
> >
> > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Date:   Fri Jun 18 13:41:27 2021 +0000
> >
> >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
>
> Hmm... It's not obvious at all how this change can alter the behaviour so
> drastically. device_add() is called from USB core with intf->dev.name == NULL
> by some reason. A-ha, seems like fault injector, which looks like
>
>     dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
>                  dev->devpath, configuration, ifnum);
>
> missed the return code check.
>
> But I'm not familiar with that code at all, adding Linux USB ML and Alan.

I can't see any connection between this bug and acpi/sysfs.c. Is it a
bad bisection?

It looks like you're right about dev_set_name() failing. In fact, the
kernel appears to be littered with calls to that routine which do not
check the return code (the entire subtree below drivers/usb/ contains
only _one_ call that does check the return code!). The function doesn't
have any __must_check annotation, and its kerneldoc doesn't mention the
return code or the possibility of a failure.

Apparently the assumption is that if dev_set_name() fails then
device_add() later on will also fail, and the problem will be detected
then.

So now what should happen when device_add() for an interface fails in
usb_set_configuration()? I guess the interface should be deleted;
otherwise we have the possibility that people might still try to access
it via usbfs, as in the syzbot test run. Same goes for the
of_device_is_available() check.

Fixing that will be a little painful. Right now there are plenty of
places in the USB core that aren't prepared to cope with a non-existent
interface.

Alan Stern

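The failure chain described in the message above (an unchecked dev_set_name() failure leaves the name NULL, and device_add() then rejects the device with -22, as in the "device_add((null)) --> -22" log line) can be sketched in userspace C. Everything here is a hypothetical model: `struct model_device`, the `model_*` functions, and the `model_fail_alloc` switch are invented for illustration and are not kernel APIs. The sketch shows only the "check the return code at the call site" pattern; it does not model the USB core cleanup of a failed interface that the message says is still missing.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct device; only the name matters here. */
struct model_device {
	char *name;
	int registered;
};

/* Set to nonzero to simulate an injected allocation failure. */
int model_fail_alloc;

/* Models dev_set_name(): returns -ENOMEM when the allocation "fails". */
int model_dev_set_name(struct model_device *dev, const char *fmt, ...)
{
	char buf[64];
	char *copy;
	va_list ap;
	int len;

	if (model_fail_alloc)
		return -ENOMEM;	/* simulated allocation failure */

	va_start(ap, fmt);
	len = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -EINVAL;

	copy = malloc((size_t)len + 1);
	if (!copy)
		return -ENOMEM;
	memcpy(copy, buf, (size_t)len + 1);
	free(dev->name);
	dev->name = copy;
	return 0;
}

/* Models device_add(): a device without a name is rejected with -EINVAL. */
int model_device_add(struct model_device *dev)
{
	if (!dev->name)
		return -EINVAL;
	dev->registered = 1;
	return 0;
}

/* Caller pattern: stop at the first failure instead of relying on
 * device_add() to notice that the name was never set. */
int model_register_interface(struct model_device *dev, int busnum,
			     const char *devpath, int config, int ifnum)
{
	int ret;

	ret = model_dev_set_name(dev, "%d-%s:%d.%d", busnum, devpath,
				 config, ifnum);
	if (ret)
		return ret;
	return model_device_add(dev);
}
```

In this model the caller sees -ENOMEM directly when naming fails, rather than the less informative -EINVAL from the later add step; either way the caller still has to decide what to do with a device that was never registered.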
* Re: [syzbot] general protection fault in __device_attach
From: Greg KH @ 2022-06-03 15:52 UTC
To: Alan Stern
Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > > syzbot has bisected this issue to:
> > >
> > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > >
> > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> >
> > Hmm... It's not obvious at all how this change can alter the behaviour so
> > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > by some reason. A-ha, seems like fault injector, which looks like
> >
> >     dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> >                  dev->devpath, configuration, ifnum);
> >
> > missed the return code check.
> >
> > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
>
> I can't see any connection between this bug and acpi/sysfs.c. Is it a
> bad bisection?
>
> It looks like you're right about dev_set_name() failing. In fact, the
> kernel appears to be littered with calls to that routine which do not
> check the return code (the entire subtree below drivers/usb/ contains
> only _one_ call that does check the return code!). The function doesn't
> have any __must_check annotation, and its kerneldoc doesn't mention the
> return code or the possibility of a failure.
>
> Apparently the assumption is that if dev_set_name() fails then
> device_add() later on will also fail, and the problem will be detected
> then.
>
> So now what should happen when device_add() for an interface fails in
> usb_set_configuration()?

But how can that really fail on a real system?

Is this just due to error-injection stuff? If so, I'm really loath to
rework the world for something that can never happen in real life.

Or is this a real syzbot-found-with-reproducer issue?

thanks,

greg k-h

* Re: [syzbot] general protection fault in __device_attach
From: Alan Stern @ 2022-06-03 16:03 UTC
To: Greg KH
Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote:
> On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> > On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> > > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > > > syzbot has bisected this issue to:
> > > >
> > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > > >
> > > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> > >
> > > Hmm... It's not obvious at all how this change can alter the behaviour so
> > > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > > by some reason. A-ha, seems like fault injector, which looks like
> > >
> > >     dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> > >                  dev->devpath, configuration, ifnum);
> > >
> > > missed the return code check.
> > >
> > > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
> >
> > I can't see any connection between this bug and acpi/sysfs.c. Is it a
> > bad bisection?
> >
> > It looks like you're right about dev_set_name() failing. In fact, the
> > kernel appears to be littered with calls to that routine which do not
> > check the return code (the entire subtree below drivers/usb/ contains
> > only _one_ call that does check the return code!). The function doesn't
> > have any __must_check annotation, and its kerneldoc doesn't mention the
> > return code or the possibility of a failure.
> >
> > Apparently the assumption is that if dev_set_name() fails then
> > device_add() later on will also fail, and the problem will be detected
> > then.
> >
> > So now what should happen when device_add() for an interface fails in
> > usb_set_configuration()?
>
> But how can that really fail on a real system?
>
> Is this just due to error-injection stuff? If so, I'm really loath to
> rework the world for something that can never happen in real life.
>
> Or is this a real syzbot-found-with-reproducer issue?

Aren't there quite a few reasons why device_add() might fail? (Although
most of them probably are memory allocation errors...)

Basically, you have to make up your mind. If a function can fail, you
should be prepared to handle the failure. If it can't fail, there's no
point in even checking the return code.

Alan Stern

* Re: [syzbot] general protection fault in __device_attach
From: Greg KH @ 2022-06-03 16:11 UTC
To: Alan Stern
Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 12:03:32PM -0400, Alan Stern wrote:
> On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote:
> > On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> > > On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> > > > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > > > > syzbot has bisected this issue to:
> > > > >
> > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > > > >
> > > > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> > > >
> > > > Hmm... It's not obvious at all how this change can alter the behaviour so
> > > > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > > > by some reason. A-ha, seems like fault injector, which looks like
> > > >
> > > >     dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> > > >                  dev->devpath, configuration, ifnum);
> > > >
> > > > missed the return code check.
> > > >
> > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
> > >
> > > I can't see any connection between this bug and acpi/sysfs.c. Is it a
> > > bad bisection?
> > >
> > > It looks like you're right about dev_set_name() failing. In fact, the
> > > kernel appears to be littered with calls to that routine which do not
> > > check the return code (the entire subtree below drivers/usb/ contains
> > > only _one_ call that does check the return code!). The function doesn't
> > > have any __must_check annotation, and its kerneldoc doesn't mention the
> > > return code or the possibility of a failure.
> > >
> > > Apparently the assumption is that if dev_set_name() fails then
> > > device_add() later on will also fail, and the problem will be detected
> > > then.
> > >
> > > So now what should happen when device_add() for an interface fails in
> > > usb_set_configuration()?
> >
> > But how can that really fail on a real system?
> >
> > Is this just due to error-injection stuff? If so, I'm really loath to
> > rework the world for something that can never happen in real life.
> >
> > Or is this a real syzbot-found-with-reproducer issue?
>
> Aren't there quite a few reasons why device_add() might fail? (Although
> most of them probably are memory allocation errors...)

I was thinking of the dev_set_name() issue further back in the call
chain.

> Basically, you have to make up your mind. If a function can fail, you
> should be prepared to handle the failure. If it can't fail, there's no
> point in even checking the return code.

True, ok, we should unwind the mess. I'll try to look at it after the
merge window...

But again, is this a "real and able to be triggered from userspace"
problem, or just fault-injection-induced?

thanks,

greg k-h

* Re: [syzbot] general protection fault in __device_attach
From: Alan Stern @ 2022-06-03 16:27 UTC
To: Greg KH
Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 06:11:55PM +0200, Greg KH wrote:
> On Fri, Jun 03, 2022 at 12:03:32PM -0400, Alan Stern wrote:
> > On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote:
> > > On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> > > > So now what should happen when device_add() for an interface fails in
> > > > usb_set_configuration()?
> > >
> > > But how can that really fail on a real system?
> > >
> > > Is this just due to error-injection stuff? If so, I'm really loath to
> > > rework the world for something that can never happen in real life.
> > >
> > > Or is this a real syzbot-found-with-reproducer issue?
> >
> > Aren't there quite a few reasons why device_add() might fail? (Although
> > most of them probably are memory allocation errors...)
>
> I was thinking of the dev_set_name() issue further back in the call
> chain.

As far as I know, the only reason for dev_set_name() to fail is -ENOMEM.
That's not something the user can control directly.

> > Basically, you have to make up your mind. If a function can fail, you
> > should be prepared to handle the failure. If it can't fail, there's no
> > point in even checking the return code.
>
> True, ok, we should unwind the mess. I'll try to look at it after the
> merge window...
>
> But again, is this a "real and able to be triggered from userspace"
> problem, or just fault-injection-induced?

I don't think any of the failure paths here are controlled by the user.
They all seem to involve something going wrong internally in the kernel
(i.e., corruption or memory allocation failure for a small buffer).
Once that happens, the game is pretty much over anyway.

Is it worth handling this sort of thing, or should we ignore the
possibility and allow it to escalate to the point where the user can
potentially trigger a kernel panic?

Another way of putting it is: How gracefully do you want the kernel to
collapse when this sort of corruption happens?

Alan Stern

* Re: [syzbot] general protection fault in __device_attach
From: Dmitry Vyukov @ 2022-06-04 8:32 UTC
To: Greg KH
Cc: Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > > syzbot has bisected this issue to:
> > > > > >
> > > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > > > > >
> > > > > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> > > > >
> > > > > Hmm... It's not obvious at all how this change can alter the behaviour so
> > > > > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > > > > by some reason. A-ha, seems like fault injector, which looks like
> > > > >
> > > > >     dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> > > > >                  dev->devpath, configuration, ifnum);
> > > > >
> > > > > missed the return code check.
> > > > >
> > > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
> > > >
> > > > I can't see any connection between this bug and acpi/sysfs.c. Is it a
> > > > bad bisection?
> > > >
> > > > It looks like you're right about dev_set_name() failing. In fact, the
> > > > kernel appears to be littered with calls to that routine which do not
> > > > check the return code (the entire subtree below drivers/usb/ contains
> > > > only _one_ call that does check the return code!). The function doesn't
> > > > have any __must_check annotation, and its kerneldoc doesn't mention the
> > > > return code or the possibility of a failure.
> > > >
> > > > Apparently the assumption is that if dev_set_name() fails then
> > > > device_add() later on will also fail, and the problem will be detected
> > > > then.
> > > >
> > > > So now what should happen when device_add() for an interface fails in
> > > > usb_set_configuration()?
> > >
> > > But how can that really fail on a real system?
> > >
> > > Is this just due to error-injection stuff? If so, I'm really loath to
> > > rework the world for something that can never happen in real life.
> > >
> > > Or is this a real syzbot-found-with-reproducer issue?
> >
> > Aren't there quite a few reasons why device_add() might fail? (Although
> > most of them probably are memory allocation errors...)
>
> I was thinking of the dev_set_name() issue further back in the call
> chain.
>
> > Basically, you have to make up your mind. If a function can fail, you
> > should be prepared to handle the failure. If it can't fail, there's no
> > point in even checking the return code.
>
> True, ok, we should unwind the mess. I'll try to look at it after the
> merge window...
>
> But again, is this a "real and able to be triggered from userspace"
> problem, or just fault-injection-induced?

Then this is something to fix in the fault injection subsystem.
Testing systems shouldn't be reporting false positives.
What allocations cannot fail in real life? Is it <=page_size?

* Re: [syzbot] general protection fault in __device_attach
From: Dan Carpenter @ 2022-06-06 12:38 UTC
To: Dmitry Vyukov
Cc: Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > But again, is this a "real and able to be triggered from userspace"
> > problem, or just fault-injection-induced?
>
> Then this is something to fix in the fault injection subsystem.
> Testing systems shouldn't be reporting false positives.
> What allocations cannot fail in real life? Is it <=page_size?
>

Apparently in 2014, anything less than *EIGHT?!!* pages succeeded!

https://lwn.net/Articles/627419/

I have been on the look out since that article and never seen anyone
mention it changing. I think we should ignore that and say that
anything over PAGE_SIZE can fail. Possibly we could go smaller than
PAGE_SIZE...

regards,
dan carpenter

* Re: [syzbot] general protection fault in __device_attach 2022-06-06 12:38 ` Dan Carpenter @ 2022-06-07 7:15 ` Dmitry Vyukov 2022-06-08 3:25 ` Matthew Wilcox 0 siblings, 1 reply; 18+ messages in thread From: Dmitry Vyukov @ 2022-06-07 7:15 UTC (permalink / raw) To: Dan Carpenter Cc: Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > But again, is this a "real and able to be triggered from userspace" > > > problem, or just fault-injection-induced? > > > > Then this is something to fix in the fault injection subsystem. > > Testing systems shouldn't be reporting false positives. > > What allocations cannot fail in real life? Is it <=page_size? > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > https://lwn.net/Articles/627419/ > > I have been on the look out since that article and never seen anyone > mention it changing. I think we should ignore that and say that > anything over PAGE_SIZE can fail. Possibly we could go smaller than > PAGE_SIZE... +linux-mm for GFP expertise re what allocations cannot possibly fail and should be excluded from fault injection. Interesting, thanks for the link. PAGE_SIZE looks like a good start. Once we have the predicate in place, we can refine it later when/if we have more inputs. But I wonder about GFP flags. They definitely have some impact on allocations. If GFP_ACCOUNT is set, all allocations can fail, right? If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? If GFP_NORETRY is set, allocations can fail? What about GFP_NOMEMALLOC and GFP_ATOMIC? What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? 
At least some of these need to be set for allocations to not fail? Which ones? Any other flags are required to be set/unset for allocations to not fail? FTR here is quick link to flags list: https://elixir.bootlin.com/linux/v5.19-rc1/source/include/linux/gfp.h#L32 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-07 7:15 ` Dmitry Vyukov @ 2022-06-08 3:25 ` Matthew Wilcox 2022-06-08 8:20 ` Dmitry Vyukov 0 siblings, 1 reply; 18+ messages in thread From: Matthew Wilcox @ 2022-06-08 3:25 UTC (permalink / raw) To: Dmitry Vyukov Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote: > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > But again, is this a "real and able to be triggered from userspace" > > > > problem, or just fault-injection-induced? > > > > > > Then this is something to fix in the fault injection subsystem. > > > Testing systems shouldn't be reporting false positives. > > > What allocations cannot fail in real life? Is it <=page_size? > > > > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > > > https://lwn.net/Articles/627419/ > > > > I have been on the look out since that article and never seen anyone > > mention it changing. I think we should ignore that and say that > > anything over PAGE_SIZE can fail. Possibly we could go smaller than > > PAGE_SIZE... > > +linux-mm for GFP expertise re what allocations cannot possibly fail > and should be excluded from fault injection. > > Interesting, thanks for the link. > > PAGE_SIZE looks like a good start. Once we have the predicate in > place, we can refine it later when/if we have more inputs. > > But I wonder about GFP flags. They definitely have some impact on allocations. > If GFP_ACCOUNT is set, all allocations can fail, right? > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? 
> If GFP_NORETRY is set, allocations can fail? > What about GFP_NOMEMALLOC and GFP_ATOMIC? > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At > least some of these need to be set for allocations to not fail? Which > ones? > Any other flags are required to be set/unset for allocations to not fail? I'm not the expert on page allocation, but ... I don't think GFP_ACCOUNT makes allocations fail. It might make reclaim happen from within that cgroup, and it might cause an OOM kill for something in that cgroup. But I don't think it makes a (low order) allocation more likely to fail. There's usually less memory available in DMA/DMA32 zones, but we have so few allocations from those zones, I question the utility of focusing testing on those allocations. GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely to fail. KSWAPD_RECLAIM has no effect on whether _this_ allocation succeeds or fails; it kicks kswapd to do reclaim, rather than doing reclaim directly. DIRECT_RECLAIM definitely makes allocations more likely to succeed. GFP_FS allows (direct) reclaim to happen from filesystems. GFP_IO allows IO to start (ie writeback can start) in order to clean dirty memory. Anyway, I hope somebody who knows the page allocator better than I do can say smarter things than this. Even better if they can put it into Documentation/ somewhere ;-) https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html exists but isn't quite enough to answer this question. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-08 3:25 ` Matthew Wilcox @ 2022-06-08 8:20 ` Dmitry Vyukov 2022-06-08 8:24 ` Dmitry Vyukov 0 siblings, 1 reply; 18+ messages in thread From: Dmitry Vyukov @ 2022-06-08 8:20 UTC (permalink / raw) To: Matthew Wilcox Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Wed, 8 Jun 2022 at 05:25, Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote: > > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > > > > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > But again, is this a "real and able to be triggered from userspace" > > > > > problem, or just fault-injection-induced? > > > > > > > > Then this is something to fix in the fault injection subsystem. > > > > Testing systems shouldn't be reporting false positives. > > > > What allocations cannot fail in real life? Is it <=page_size? > > > > > > > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > > > > > https://lwn.net/Articles/627419/ > > > > > > I have been on the look out since that article and never seen anyone > > > mention it changing. I think we should ignore that and say that > > > anything over PAGE_SIZE can fail. Possibly we could go smaller than > > > PAGE_SIZE... > > > > +linux-mm for GFP expertise re what allocations cannot possibly fail > > and should be excluded from fault injection. > > > > Interesting, thanks for the link. > > > > PAGE_SIZE looks like a good start. Once we have the predicate in > > place, we can refine it later when/if we have more inputs. > > > > But I wonder about GFP flags. They definitely have some impact on allocations. 
> > If GFP_ACCOUNT is set, all allocations can fail, right? > > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? > > If GFP_NORETRY is set, allocations can fail? > > What about GFP_NOMEMALLOC and GFP_ATOMIC? > > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At > > least some of these need to be set for allocations to not fail? Which > > ones? > > Any other flags are required to be set/unset for allocations to not fail? > > I'm not the expert on page allocation, but ... > > I don't think GFP_ACCOUNT makes allocations fail. It might make reclaim > happen from within that cgroup, and it might cause an OOM kill for > something in that cgroup. But I don't think it makes a (low order) > allocation more likely to fail. Interesting. I was thinking of some malicious specifically crafted configurations with very low limit and particular pattern of allocations. Also what if there is just 1 process (current)? Is it possible to kill and reclaim the current process when a thread is stuck in the middle of the kernel on a kmalloc? Also I see e.g.: Tasks with the OOM protection (oom_score_adj set to -1000) are treated as an exception and are never killed. I am not an expert on this either, but I think it may be hard to fight with a specifically crafted attack. > There's usually less memory available in DMA/DMA32 zones, but we have > so few allocations from those zones, I question the utility of focusing > testing on those allocations. > > GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely > to fail. KSWAPD_RECLAIM has no effect on whether _this_ allocation > succeeds or fails; it kicks kswapd to do reclaim, rather than doing > reclaim directly. DIRECT_RECLAIM definitely makes allocations more likely > to succeed. GFP_FS allows (direct) reclaim to happen from filesystems. > GFP_IO allows IO to start (ie writeback can start) in order to clean > dirty memory. 
> > Anyway, I hope somebody who knows the page allocator better than I do > can say smarter things than this. Even better if they can put it into > Documentation/ somewhere ;-) Even better to put this into code as a predicate function that fault injection will use. It will also serve as precise up-to-date documentation. > https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html > exists but isn't quite enough to answer this question. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-08 8:20 ` Dmitry Vyukov @ 2022-06-08 8:24 ` Dmitry Vyukov 0 siblings, 0 replies; 18+ messages in thread From: Dmitry Vyukov @ 2022-06-08 8:24 UTC (permalink / raw) To: Matthew Wilcox Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Wed, 8 Jun 2022 at 10:20, Dmitry Vyukov <dvyukov@google.com> wrote: > > On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote: > > > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > > > > > > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > But again, is this a "real and able to be triggered from userspace" > > > > > > problem, or just fault-injection-induced? > > > > > > > > > > Then this is something to fix in the fault injection subsystem. > > > > > Testing systems shouldn't be reporting false positives. > > > > > What allocations cannot fail in real life? Is it <=page_size? > > > > > > > > > > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > > > > > > > https://lwn.net/Articles/627419/ > > > > > > > > I have been on the look out since that article and never seen anyone > > > > mention it changing. I think we should ignore that and say that > > > > anything over PAGE_SIZE can fail. Possibly we could go smaller than > > > > PAGE_SIZE... > > > > > > +linux-mm for GFP expertise re what allocations cannot possibly fail > > > and should be excluded from fault injection. > > > > > > Interesting, thanks for the link. > > > > > > PAGE_SIZE looks like a good start. Once we have the predicate in > > > place, we can refine it later when/if we have more inputs. > > > > > > But I wonder about GFP flags. 
They definitely have some impact on allocations. > > > If GFP_ACCOUNT is set, all allocations can fail, right? > > > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? > > > If GFP_NORETRY is set, allocations can fail? > > > What about GFP_NOMEMALLOC and GFP_ATOMIC? > > > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At > > > least some of these need to be set for allocations to not fail? Which > > > ones? > > > Any other flags are required to be set/unset for allocations to not fail? > > > > I'm not the expert on page allocation, but ... > > > > I don't think GFP_ACCOUNT makes allocations fail. It might make reclaim > > happen from within that cgroup, and it might cause an OOM kill for > > something in that cgroup. But I don't think it makes a (low order) > > allocation more likely to fail. > > Interesting. > I was thinking of some malicious specifically crafted configurations > with very low limit and particular pattern of allocations. Also what > if there is just 1 process (current)? Is it possible to kill and > reclaim the current process when a thread is stuck in the middle of > the kernel on a kmalloc? > Also I see e.g.: > Tasks with the OOM protection (oom_score_adj set to -1000) > are treated as an exception and are never killed. > > I am not an expert on this either, but I think it may be hard to fight > with a specifically crafted attack. > > > > There's usually less memory available in DMA/DMA32 zones, but we have > > so few allocations from those zones, I question the utility of focusing > > testing on those allocations. > > > > GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely > > to fail. KSWAPD_RECLAIM has no effect on whether _this_ allocation > > succeeds or fails; it kicks kswapd to do reclaim, rather than doing > > reclaim directly. DIRECT_RECLAIM definitely makes allocations more likely > > to succeed. GFP_FS allows (direct) reclaim to happen from filesystems. 
> > GFP_IO allows IO to start (ie writeback can start) in order to clean > > dirty memory. > > > > Anyway, I hope somebody who knows the page allocator better than I do > > can say smarter things than this. Even better if they can put it into > > Documentation/ somewhere ;-) > > Even better to put this into code as a predicate function that fault > injection will use. It will also serve as precise up-to-date > documentation. Also at the end of kmalloc as: WARN_ON(!ret && !cant_fail(size, gfp)); ! > > https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html > > exists but isn't quite enough to answer this question. ^ permalink raw reply [flat|nested] 18+ messages in thread
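[Editorial sketch] The predicate idea discussed in this message could look roughly like the following. This is a userspace illustration only, not kernel code and not anything posted in the thread. The name alloc_may_fail() and the SKETCH_* constants are hypothetical stand-ins and do not use the kernel's real GFP values. The size rule follows the "anything over PAGE_SIZE can fail" suggestion quoted above; the two flag rules are assumptions, since the thread leaves the flag questions open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's real definitions or values. */
#define SKETCH_PAGE_SIZE       4096u
#define SKETCH_DIRECT_RECLAIM  (1u << 0)
#define SKETCH_NORETRY         (1u << 1)

/*
 * Returns true if fault injection would be allowed to fail this request,
 * i.e. if the request is NOT in the "cannot fail in real life" class.
 */
static bool alloc_may_fail(size_t size, unsigned int flags)
{
	/* From the thread: treat anything over PAGE_SIZE as able to fail. */
	if (size > SKETCH_PAGE_SIZE)
		return true;

	/* Assumption: without direct reclaim the allocator cannot make room. */
	if (!(flags & SKETCH_DIRECT_RECLAIM))
		return true;

	/* Assumption: a "do not retry" request gives up early under pressure. */
	if (flags & SKETCH_NORETRY)
		return true;

	return false;
}

/*
 * Mirror of the consistency check idea: report a problem when an
 * allocation failed even though the predicate said it could not.
 */
static bool unexpected_failure(const void *ret, size_t size, unsigned int flags)
{
	return ret == NULL && !alloc_may_fail(size, flags);
}
```

With this shape, a fault injector would consult alloc_may_fail() before injecting a failure, and the allocator could use unexpected_failure() as a self-check so that the predicate and real behaviour cannot silently drift apart.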
* Re: [syzbot] [kernel?] general protection fault in __device_attach 2022-03-14 8:46 [syzbot] general protection fault in __device_attach syzbot 2022-06-02 19:49 ` syzbot 2022-06-03 10:02 ` syzbot @ 2024-01-10 13:12 ` syzbot 2 siblings, 0 replies; 18+ messages in thread From: syzbot @ 2024-01-10 13:12 UTC (permalink / raw) To: 42.hyeyoo, andriy.shevchenko, dan.carpenter, dvyukov, gregkh, hdanton, keescook, lenb, linux-acpi, linux-kernel, linux-mm, linux-usb, rafael.j.wysocki, rafael, rientjes, rjw, stern, syzkaller-bugs, vbabka, willy syzbot suspects this issue was fixed by commit: commit 49378a05ce7f01a203550eb7c2ef772f6d24565c Author: Vlastimil Babka <vbabka@suse.cz> Date: Thu Oct 26 15:45:42 2023 +0000 mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15b179cde80000 start commit: d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect git tree: upstream kernel config: https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665 dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000 If the result looks correct, please mark the issue as fixed by replying with: #syz fix: mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20220603033532.5154-1-hdanton@sina.com>]
* Re: [syzbot] general protection fault in __device_attach [not found] <20220603033532.5154-1-hdanton@sina.com> @ 2022-06-03 3:55 ` syzbot 0 siblings, 0 replies; 18+ messages in thread From: syzbot @ 2022-06-03 3:55 UTC (permalink / raw) To: hdanton, linux-kernel, syzkaller-bugs Hello, syzbot has tested the proposed patch but the reproducer is still triggering an issue: general protection fault in __device_attach usb usb9: device_add((null)) --> -22 general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f] CPU: 1 PID: 4084 Comm: syz-executor.0 Not tainted 5.18.0-syzkaller-11972-gd1dc87763f40-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__device_attach+0xad/0x4a0 drivers/base/dd.c:948 Code: e8 03 42 80 3c 20 00 0f 85 a3 03 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 65 48 49 8d bc 24 08 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 6e 03 00 00 45 0f b6 b4 24 08 01 00 RSP: 0018:ffffc90003347b98 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 1ffff92000668f74 RCX: 0000000000000000 RDX: 0000000000000021 RSI: 0000000000000002 RDI: 0000000000000108 RBP: ffff888021878030 R08: 0000000000000000 R09: ffffffff8dbb1097 R10: fffffbfff1b76212 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 00000000fffffff0 R15: ffff8880218780b0 FS: 00007f8da1571700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8da059d090 CR3: 000000006b626000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> proc_ioctl.part.0+0x48e/0x560 drivers/usb/core/devio.c:2356 proc_ioctl drivers/usb/core/devio.c:182 [inline] proc_ioctl_default drivers/usb/core/devio.c:2391 [inline] usbdev_do_ioctl 
drivers/usb/core/devio.c:2747 [inline] usbdev_ioctl+0x2c08/0x36f0 drivers/usb/core/devio.c:2807 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f8da0489109 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f8da1571168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f8da059bf60 RCX: 00007f8da0489109 RDX: 0000000020000040 RSI: 00000000c0105512 RDI: 0000000000000005 RBP: 00007f8da04e308d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff3b94a64f R14: 00007f8da1571300 R15: 0000000000022000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:__device_attach+0xad/0x4a0 drivers/base/dd.c:948 Code: e8 03 42 80 3c 20 00 0f 85 a3 03 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 65 48 49 8d bc 24 08 01 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 6e 03 00 00 45 0f b6 b4 24 08 01 00 RSP: 0018:ffffc90003347b98 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 1ffff92000668f74 RCX: 0000000000000000 RDX: 0000000000000021 RSI: 0000000000000002 RDI: 0000000000000108 RBP: ffff888021878030 R08: 0000000000000000 R09: ffffffff8dbb1097 R10: fffffbfff1b76212 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 00000000fffffff0 R15: ffff8880218780b0 FS: 00007f8da1571700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8da059d090 CR3: 000000006b626000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400 ---------------- Code disassembly (best guess): 0: e8 03 42 80 3c callq 0x3c804208 5: 20 00 and %al,(%rax) 7: 0f 85 a3 03 00 00 jne 0x3b0 d: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 14: fc ff df 17: 4c 8b 65 48 mov 0x48(%rbp),%r12 1b: 49 8d bc 24 08 01 00 lea 0x108(%r12),%rdi 22: 00 23: 48 89 fa mov %rdi,%rdx 26: 48 c1 ea 03 shr $0x3,%rdx * 2a: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax <-- trapping instruction 2e: 84 c0 test %al,%al 30: 74 06 je 0x38 32: 0f 8e 6e 03 00 00 jle 0x3a6 38: 45 rex.RB 39: 0f .byte 0xf 3a: b6 b4 mov $0xb4,%dh 3c: 24 08 and $0x8,%al 3e: 01 00 add %eax,(%rax) Tested on: commit: d1dc8776 assoc_array: Fix BUG_ON during garbage collect git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git console output: https://syzkaller.appspot.com/x/log.txt?x=113f50ddf00000 kernel config: https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665 dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381 compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 patch: https://syzkaller.appspot.com/x/patch.diff?x=12390667f00000 ^ permalink raw reply [flat|nested] 18+ messages in thread
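[Editorial sketch] The faulting address in the report above can be reproduced from the register dump and the disassembly. The code loads a pointer into R12 (0 here), computes R12 + 0x108 into RDI, shifts that right by 3 into RDX, and then reads a byte at RDX + RAX, where RAX holds 0xdffffc0000000000. The small C fragment below redoes that arithmetic. The function and macro names are illustrative, and treating the RAX constant as the KASAN shadow offset is a reading of the disassembly rather than something the report states.

```c
#include <stdint.h>

/* Constant loaded into RAX by the "movabs" in the disassembly. */
#define SKETCH_SHADOW_OFFSET  0xdffffc0000000000ULL
/* Matches "shr $0x3,%rdx": one shadow byte covers 8 bytes of memory. */
#define SKETCH_SHADOW_SHIFT   3

/* Address of a field at byte offset `off` inside the object at `base`. */
static uint64_t field_addr(uint64_t base, uint64_t off)
{
	return base + off;
}

/* Shadow byte address that the instrumented code reads before the access. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SKETCH_SHADOW_SHIFT) + SKETCH_SHADOW_OFFSET;
}
```

With base = 0 (R12) and off = 0x108, field_addr() gives 0x108 (RDI), and shadow_addr(0x108) gives 0xdffffc0000000021, the non-canonical address named in the fault message. The 8 bytes covered by that shadow byte are 0x108 through 0x10f, which is the range KASAN prints for the null-ptr-deref.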
[parent not found: <20220603074439.5255-1-hdanton@sina.com>]
* Re: [syzbot] general protection fault in __device_attach [not found] <20220603074439.5255-1-hdanton@sina.com> @ 2022-06-03 10:41 ` syzbot 0 siblings, 0 replies; 18+ messages in thread From: syzbot @ 2022-06-03 10:41 UTC (permalink / raw) To: hdanton, linux-kernel, syzkaller-bugs Hello, syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com Tested on: commit: d1dc8776 assoc_array: Fix BUG_ON during garbage collect git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git kernel config: https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665 dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381 compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 patch: https://syzkaller.appspot.com/x/patch.diff?x=1244e933f00000 Note: testing is done by a robot and is best-effort only. ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-01-10 13:12 UTC | newest]

Thread overview: 18+ messages
2022-03-14 8:46 [syzbot] general protection fault in __device_attach syzbot
2022-06-02 19:49 ` syzbot
2022-06-03 10:02 ` syzbot
2022-06-03 11:04 ` Andy Shevchenko
2022-06-03 15:42 ` Alan Stern
2022-06-03 15:52 ` Greg KH
2022-06-03 16:03 ` Alan Stern
2022-06-03 16:11 ` Greg KH
2022-06-03 16:27 ` Alan Stern
2022-06-04 8:32 ` Dmitry Vyukov
2022-06-06 12:38 ` Dan Carpenter
2022-06-07 7:15 ` Dmitry Vyukov
2022-06-08 3:25 ` Matthew Wilcox
2022-06-08 8:20 ` Dmitry Vyukov
2022-06-08 8:24 ` Dmitry Vyukov
2024-01-10 13:12 ` [syzbot] [kernel?] " syzbot
[not found] <20220603033532.5154-1-hdanton@sina.com>
2022-06-03 3:55 ` [syzbot] " syzbot
[not found] <20220603074439.5255-1-hdanton@sina.com>
2022-06-03 10:41 ` syzbot