* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Jens Axboe @ 2020-04-14 4:31 UTC
To: Hillf Danton; +Cc: Qian Cai, LKML, Linux-MM

> On Apr 13, 2020, at 10:07 PM, Hillf Danton <hdanton@sina.com> wrote:
>
>
>> On Mon, 13 Apr 2020 18:06:21 -0400 Qian Cai wrote:
>>
>> BTW, I'll be adding fuzzers to my daily linux-next routines where it triggers this
>> io_uring/scheduler bug almost immediately, so hopefully would buy syzbot some
>> time to resume on linux-next.
>>
>> [67493.516737][T211750] BUG: unable to handle page fault for address: ffffffffffffffe8
>> [67493.557315][T211750] #PF: supervisor read access in kernel mode
>> [67493.586726][T211750] #PF: error_code(0x0000) - not-present page
>> [67493.614434][T211750] PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0
>> [67493.644846][T211750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
>> [67493.674127][T211750] CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G B L 5.7.0-rc1-next-20200413 #4
>> [67493.722516][T211750] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
>> [67493.764925][T211750] RIP: 0010:__wake_up_common+0x98/0x290
>> __wake_up_common at kernel/sched/wait.c:87
>> [67493.790675][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
>> [67493.881650][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
>> [67493.909854][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
>> [67493.947131][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
>> [67493.983829][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
>> [67494.020861][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
>> [67494.059249][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
>> [67494.099699][T211750] FS: 00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
>> [67494.141858][T211750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [67494.172660][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
>> [67494.209760][T211750] Call Trace:
>> [67494.224720][T211750] __wake_up_common_lock+0xea/0x150
>> (inlined by) __wake_up_common_lock at kernel/sched/wait.c:124
>> [67494.248753][T211750] ? __wake_up_common+0x290/0x290
>> [67494.272014][T211750] ? lockdep_hardirqs_on+0x16/0x2c0
>> [67494.296139][T211750] __wake_up+0x13/0x20
>> [67494.314946][T211750] io_cqring_ev_posted+0x75/0xe0
>> (inlined by) io_cqring_ev_posted at fs/io_uring.c:1160
>> [67494.337726][T211750] io_ring_ctx_wait_and_kill+0x1c0/0x2f0
>> io_ring_ctx_wait_and_kill at fs/io_uring.c:7305
>> [67494.363840][T211750] io_uring_create+0xa8d/0x13b0
>> [67494.386526][T211750] ? io_req_defer_prep+0x990/0x990
>> [67494.410119][T211750] ? __kasan_check_write+0x14/0x20
>> [67494.433646][T211750] io_uring_setup+0xb8/0x130
>> [67494.454870][T211750] ? io_uring_create+0x13b0/0x13b0
>> [67494.478342][T211750] ? check_flags.part.28+0x220/0x220
>> [67494.502947][T211750] ? lockdep_hardirqs_on+0x16/0x2c0
>> [67494.526965][T211750] __x64_sys_io_uring_setup+0x31/0x40
>> [67494.551820][T211750] do_syscall_64+0xcc/0xaf0
>> [67494.574829][T211750] ? syscall_return_slowpath+0x580/0x580
>> [67494.604591][T211750] ? lockdep_hardirqs_off+0x1f/0x140
>> [67494.628901][T211750] ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
>> [67494.657616][T211750] ? trace_hardirqs_off_caller+0x3a/0x150
>> [67494.683999][T211750] ? trace_hardirqs_off_thunk+0x1a/0x1c
>> [67494.709982][T211750] entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
>
> See if it makes your fuzzers happy.

It will, and we should probably do that separately too. I already posted
another fix that avoids the posted call that triggers it.

-- 
Jens Axboe
* [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Peter Xu @ 2020-04-08 1:40 UTC
To: linux-kernel, Linus Torvalds, linux-mm; +Cc: Andrew Morton, peterx

The two patches should fix below syzbot reports:

BUG: unable to handle kernel paging request in kernel_get_mempolicy
https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/

WARNING: bad unlock balance in __get_user_pages_remote
https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/

Note that the 1st patch also applied two extra small changes comparing
to when posted on the list in that: (1) it squashed an "interupt"
spelling error that Andrew has pointed out when picked up, and (2) it
also initializes the "page" pointer to NULL.  But I'm fairly confident
it shouldn't affect the correctness of the patch.  The 2nd patch is
exactly the patch posted previously.

Thanks,

Peter Xu (2):
  mm/mempolicy: Allow lookup_node() to handle fatal signal
  mm/gup: Mark lock taken only after a successful retake

 mm/gup.c       | 2 +-
 mm/mempolicy.c | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

-- 
2.24.1
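The two patch titles describe two defensive patterns: start the "page" pointer at NULL and only dereference it when the GUP-style call actually returned a page (for example, not when it was interrupted by a fatal signal), and mark a lock as taken only after the retake really succeeded, so a later unlock cannot be unbalanced. The following is a minimal userspace sketch of those two ideas only, not the actual diff; `fake_gup()`, `fake_lock_killable()`, `retake_lock()` and `lookup_node_sketch()` are hypothetical stand-ins and are not the kernel's `get_user_pages_locked()` or mmap_sem API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins; none of these are kernel APIs.
 * They only model the two ideas named in the patch titles.
 */
struct fake_page { int nid; };
struct fake_lock { int held; };           /* number of current holders */

static struct fake_page the_page = { .nid = 1 };

/* Models a killable lock: gives up (lock NOT taken) if a fatal signal is pending. */
static int fake_lock_killable(struct fake_lock *l, bool fatal_signal)
{
	if (fatal_signal)
		return -EINTR;
	l->held++;
	return 0;
}

/* Unlocking a lock nobody holds is the "bad unlock balance" case. */
static void fake_unlock(struct fake_lock *l)
{
	assert(l->held > 0);
	l->held--;
}

/* Idea of patch 2: set *locked only after the retake actually succeeded. */
static int retake_lock(struct fake_lock *l, int *locked, bool fatal_signal)
{
	int ret = fake_lock_killable(l, fatal_signal);

	if (ret)
		return ret;   /* *locked stays 0, so the caller skips the unlock */
	*locked = 1;
	return 0;
}

/* Models a GUP-like call that may fail without filling in *pagep. */
static long fake_gup(struct fake_page **pagep, bool fatal_signal)
{
	if (fatal_signal)
		return -EINTR;
	*pagep = &the_page;
	return 1;             /* number of pages pinned */
}

/* Idea of patch 1: start from NULL and only dereference after success. */
static int lookup_node_sketch(bool fatal_signal)
{
	struct fake_page *p = NULL;
	long err = fake_gup(&p, fatal_signal);

	if (err <= 0)
		return err ? (int)err : -EFAULT;   /* no page: never touch p */
	return p->nid;
}
```

A caller of this sketch calls `fake_unlock()` only when `locked` is 1, so a retake that was abandoned because of a fatal signal leaves nothing to unlock; mapping a zero return to `-EFAULT` is an assumption made for the sketch.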
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Andrew Morton @ 2020-04-09 0:47 UTC
To: Peter Xu; +Cc: linux-kernel, Linus Torvalds, linux-mm

On Tue, 7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:

> The two patches should fix below syzbot reports:
>
> BUG: unable to handle kernel paging request in kernel_get_mempolicy
> https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
>
> WARNING: bad unlock balance in __get_user_pages_remote
> https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/

(Is there an email address for the syzbot operators?)

syzbot does test linux-next, yet these patches sat in linux-next for a
month without a peep, but all hell broke loose when they hit Linus's
tree.  How could this have happened?

Possibly I've been carrying a later patch which fixed all this up, but
I'm not seeing anything like that.  Nothing at all against mm/gup.c.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Matthew Wilcox @ 2020-04-09 11:49 UTC
To: Andrew Morton
Cc: Peter Xu, linux-kernel, Linus Torvalds, linux-mm, syzkaller-bugs

On Wed, Apr 08, 2020 at 05:47:32PM -0700, Andrew Morton wrote:
> On Tue, 7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
>
> > The two patches should fix below syzbot reports:
> >
> > BUG: unable to handle kernel paging request in kernel_get_mempolicy
> > https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> >
> > WARNING: bad unlock balance in __get_user_pages_remote
> > https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
>
> (Is there an email address for the syzbot operators?)

I'd suggest syzkaller-bugs@googlegroups.com (added to the Cc).

But there's a deeper problem in that we don't have anywhere to stash
that kind of information in the kernel tree right now.  Perhaps a special
entry in the MAINTAINERS file for bot operators?  Or one entry per bot?

> syzbot does test linux-next, yet these patches sat in linux-next for a
> month without a peep, but all hell broke loose when they hit Linus's
> tree.  How could this have happened?
>
> Possibly I've been carrying a later patch which fixed all this up, but
> I'm not seeing anything like that.  Nothing at all against mm/gup.c.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Dmitry Vyukov @ 2020-04-09 13:00 UTC
To: Matthew Wilcox
Cc: Andrew Morton, Peter Xu, LKML, Linus Torvalds, Linux-MM, syzkaller-bugs, syzkaller

On Thu, Apr 9, 2020 at 1:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 08, 2020 at 05:47:32PM -0700, Andrew Morton wrote:
> > On Tue, 7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
> >
> > > The two patches should fix below syzbot reports:
> > >
> > > BUG: unable to handle kernel paging request in kernel_get_mempolicy
> > > https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> > >
> > > WARNING: bad unlock balance in __get_user_pages_remote
> > > https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
> >
> > (Is there an email address for the syzbot operators?)
>
> I'd suggest syzkaller-bugs@googlegroups.com (added to the Cc).

syzkaller@googlegroups.com is a better one.
syzkaller-bugs@googlegroups.com plays more of an LKML role.

> But there's a deeper problem in that we don't have anywhere to stash
> that kind of information in the kernel tree right now.  Perhaps a special
> entry in the MAINTAINERS file for bot operators?  Or one entry per bot?

I don't mind adding syzkaller. Some time ago I wanted to contact
KernelCI, CKI, LKFT, 0-day owners, finding relevant lists wasn't
impossible, but for some it was hard.

For syzkaller it would be:

https://github.com/google/syzkaller/issues for bugs/feature requests.
syzkaller@googlegroups.com for discussions.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Andrew Morton @ 2020-04-09 18:16 UTC
To: Dmitry Vyukov
Cc: Matthew Wilcox, Peter Xu, LKML, Linus Torvalds, Linux-MM, syzkaller-bugs, syzkaller

On Thu, 9 Apr 2020 15:00:20 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:

> On Thu, Apr 9, 2020 at 1:49 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Apr 08, 2020 at 05:47:32PM -0700, Andrew Morton wrote:
> > > On Tue, 7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
> > >
> > > > The two patches should fix below syzbot reports:
> > > >
> > > > BUG: unable to handle kernel paging request in kernel_get_mempolicy
> > > > https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> > > >
> > > > WARNING: bad unlock balance in __get_user_pages_remote
> > > > https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
> > >
> > > (Is there an email address for the syzbot operators?)
> >
> > I'd suggest syzkaller-bugs@googlegroups.com (added to the Cc).
>
> syzkaller@googlegroups.com is a better one.
> syzkaller-bugs@googlegroups.com plays more of an LKML role.
>
> > But there's a deeper problem in that we don't have anywhere to stash
> > that kind of information in the kernel tree right now.  Perhaps a special
> > entry in the MAINTAINERS file for bot operators?  Or one entry per bot?
>
> I don't mind adding syzkaller. Some time ago I wanted to contact
> KernelCI, CKI, LKFT, 0-day owners, finding relevant lists wasn't
> impossible, but for some it was hard.
>
> For syzkaller it would be:
>
> https://github.com/google/syzkaller/issues for bugs/feature requests.
> syzkaller@googlegroups.com for discussions.

OK, thanks.  A MAINTAINERS entry would be great.

Could I please direct attention back to my original question regarding
the problems we've recently discovered in 4426e945df58 ("mm/gup: allow
VM_FAULT_RETRY for multiple times") and 71335f37c5e8 ("mm/gup: allow to
react to fatal signals")?

> syzbot does test linux-next, yet these patches sat in linux-next for a
> month without a peep, but all hell broke loose when they hit Linus's
> tree.  How could this have happened?
>
> Possibly I've been carrying a later patch which fixed all this up, but
> I'm not seeing anything like that.  Nothing at all against mm/gup.c.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Linus Torvalds @ 2020-04-09 18:53 UTC
To: Andrew Morton
Cc: Dmitry Vyukov, Matthew Wilcox, Peter Xu, LKML, Linux-MM, syzkaller-bugs, syzkaller

On Thu, Apr 9, 2020 at 11:16 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Could I please direct attention back to my original question regarding
> the problems we've recently discovered in 4426e945df58 ("mm/gup: allow
> VM_FAULT_RETRY for multiple times") and 71335f37c5e8 ("mm/gup: allow to
> react to fatal signals")?

What earlier question? The "how could this happen" one?

Dmitry already answered that one - are you perhaps missing the emails?

linux-next has apparently not worked at all for over a month. So it
got no testing at all, and thus also all the gup patches got no
testing in linux-next.

Only when they hit my tree, did they start getting testing. Not
because my tree is the only thing getting tested, but because my tree
is the only tree that _works_.

              Linus
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Andrew Morton @ 2020-04-09 19:12 UTC
To: Linus Torvalds
Cc: Dmitry Vyukov, Matthew Wilcox, Peter Xu, LKML, Linux-MM, syzkaller-bugs, syzkaller

On Thu, 9 Apr 2020 11:53:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Apr 9, 2020 at 11:16 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Could I please direct attention back to my original question regarding
> > the problems we've recently discovered in 4426e945df58 ("mm/gup: allow
> > VM_FAULT_RETRY for multiple times") and 71335f37c5e8 ("mm/gup: allow to
> > react to fatal signals")?
>
> What earlier question? The "how could this happen" one?
>
> Dmitry already answered that one - are you perhaps missing the emails?

Yup, email threading got broken.

> linux-next has apparently not worked at all for over a month. So it
> got no testing at all, and thus also all the gup patches got no
> testing in linux-next.
>
> Only when they hit my tree, did they start getting testing. Not
> because my tree is the only thing getting tested, but because my tree
> is the only tree that _works_.
>

And now the challenge is to protect your tree from the bad patches.

https://groups.google.com/forum/#!msg/syzkaller-bugs/phowYdNXHck/qU1P0TsjBAAJ
points at

net/openvswitch/conntrack.c
net/bluetooth/l2cap_sock.c
sound/core/oss/pcm_plugin.c

and other things, but it's 2+ weeks old.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Linus Torvalds @ 2020-04-09 19:46 UTC
To: Andrew Morton
Cc: Dmitry Vyukov, Matthew Wilcox, Peter Xu, LKML, Linux-MM, syzkaller-bugs, syzkaller

On Thu, Apr 9, 2020 at 12:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> And now the challenge is to protect your tree from the bad patches.

Well, right now, yes.

But in the longer term, I think we want to protect linux-next from the
bad patches so that they don't poison the testing that the bots can
do.

So that's why I suggested that linux-next and syzbot have some
protocol to have things that cause syzbot pain to be removed from
linux-next more aggressively.

              Linus
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Matthew Wilcox @ 2020-04-09 19:56 UTC
To: Linus Torvalds
Cc: Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 09, 2020 at 12:46:08PM -0700, Linus Torvalds wrote:
> On Thu, Apr 9, 2020 at 12:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > And now the challenge is to protect your tree from the bad patches.
>
> Well, right now, yes.
>
> But in the longer term, I think we want to protect linux-next from the
> bad patches so that they don't poison the testing that the bots can
> do.
>
> So that's why I suggested that linux-next and syzbot have some
> protocol to have things that cause syzbot pain to be removed from
> linux-next more aggressively.

We should probably give Stephen a cc here ...
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Linus Torvalds @ 2020-04-09 19:58 UTC
To: Matthew Wilcox
Cc: Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 9, 2020 at 12:56 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> We should probably give Stephen a cc here ...

Heh. I already did, but then that got broken because Andrew had lost
that part of the thread and the discussion re-started.

So Stephen was already cc'd for my original request to have linux-next
kick things out aggressively.

              Linus
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Eric Biggers @ 2020-04-09 20:27 UTC
To: Linus Torvalds
Cc: Matthew Wilcox, Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 09, 2020 at 12:58:48PM -0700, Linus Torvalds wrote:
> On Thu, Apr 9, 2020 at 12:56 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > We should probably give Stephen a cc here ...
>
> Heh. I already did, but then that got broken because Andrew had lost
> that part of the thread and the discussion re-started.
>
> So Stephen was already cc'd for my original request to have linux-next
> kick things out aggressively.
>

Well, if (for example) we look at "linux-next test error: WARNING: suspicious RCU
usage in ovs_ct_exit"
(https://lkml.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com/),
it was sent to the maintainers of net/openvswitch/ where the warning occurred.
It was then ignored.

Would it help if bugs blocking testing on linux-next were Cc'ed to
linux-next@vger.kernel.org, so that Stephen could investigate?

FWIW, the issue of "syzbot report sent and ignored for months/years" is actually
a much broader one which applies to all bugs, not just build / test breakages.
There are tons of open bugs on https://syzkaller.appspot.com/upstream which are
definitely still valid (sort by "Last" occurred).  Long-term, to fix this we
really need syzbot to start sending reminders.  But first there's work needed
to make the noise level low enough so that people don't just tune them out.

- Eric
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Linus Torvalds @ 2020-04-09 20:34 UTC
To: Eric Biggers
Cc: Matthew Wilcox, Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 9, 2020 at 1:27 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Would it help if bugs blocking testing on linux-next were Cc'ed to
> linux-next@vger.kernel.org, so that Stephen could investigate?

Maybe. I'll let Stephen say.

But I think the big issue is the "blocking testing" part. If it's
"just" regular bugs, then:

> FWIW, the issue of "syzbot report sent and ignored for months/years" is actually
> a much broader one which applies to all bugs, not just build / test breakages.

I don't know what to do about that, but it may be that people just
don't judge the bugs interesting or assume that they are old.

That's what made bugzilla so useless - being flooded with stale bugs
that might not be worth worrying about, and no way to really tell.

So old bugs generally should be aged out, and then if they still
happen, prioritized. With "this keeps us from even finding new bugs"
being a fairly high priority..

One de-motivational issue with syzbot reported bugs may be that they
sometimes get sent to the wrong set of people - but still wide enough
that everybody feels it's somebody else's issue. A kind of bystander
effect for bugs.

              Linus
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Stephen Rothwell @ 2020-04-09 23:34 UTC
To: Linus Torvalds
Cc: Eric Biggers, Matthew Wilcox, Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM, syzkaller-bugs

Hi all,

On Thu, 9 Apr 2020 13:34:18 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 9, 2020 at 1:27 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Would it help if bugs blocking testing on linux-next were Cc'ed to
> > linux-next@vger.kernel.org, so that Stephen could investigate?
>
> Maybe. I'll let Stephen say.

It would certainly help so I could at least chase people and maybe
revert commits.  Dropping trees can be problematic once Andrew has
built his quilt series on top of them, so I try not to drop whole
trees.  I can also use a previous version of a tree (which is usually
what I do if I discover a build problem).

-- 
Cheers,
Stephen Rothwell
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Theodore Y. Ts'o @ 2020-04-10 1:11 UTC
To: Linus Torvalds
Cc: Eric Biggers, Matthew Wilcox, Andrew Morton, Dmitry Vyukov, Peter Xu, LKML, Linux-MM, syzkaller-bugs, Stephen Rothwell

On Thu, Apr 09, 2020 at 01:34:18PM -0700, Linus Torvalds wrote:
> > FWIW, the issue of "syzbot report sent and ignored for months/years" is actually
> > a much broader one which applies to all bugs, not just build / test breakages.
>
> I don't know what to do about that, but it may be that people just
> don't judge the bugs interesting or assume that they are old.

Syzkaller bugs which require (a) root privileges to trigger, or (b)
require a deliberately corrupted file system are things which I don't
consider super interesting.  (For the latter, I'll usually wait for
some other file system fuzzer to find it, such as Hydra, because
Syzkaller makes it painful to extract out the file system image,
whereas other file system fuzzers are *much* more file system
developer friendly.)

This shouldn't be a surprise to Dmitry, because I've given these
feedbacks to him before.  It would be nice if there was some way we
could triage Syzkaller bugs into different buckets (requires root,
lower to P2; requires a corrupted file system image, lower to P2).
Unfortunately, that would require Syzkaller to have some kind of login
system and way to track state, and Dmitry doesn't want to replicate
the functionality of a bug tracker.

> That's what made bugzilla so useless - being flooded with stale bugs
> that might not be worth worrying about, and no way to really tell.

At least with Bugzilla, it becomes possible to attach priorities and
flags to them, instead of trying to assume that developers should
treat all Syzkaller bugs as the same priority.  Because when you do
insist that all bugs be treated as high priority, many people will
just treat them *all* as a P2 bug, especially when there are so many.

					- Ted
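Ted's proposed buckets amount to a small triage rule. The following is a hypothetical sketch of that rule, not anything syzbot or Bugzilla implements: the `struct report` fields and the `P1`/`P2` values are invented for illustration. The `blocks_testing` override reflects Linus's point in this thread that breakage which "keeps us from even finding new bugs" should be a fairly high priority.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-report facts; not real syzbot or Bugzilla fields. */
struct report {
	bool needs_root;          /* reproducer requires root privileges */
	bool needs_corrupted_fs;  /* reproducer relies on a deliberately corrupted fs image */
	bool blocks_testing;      /* e.g. linux-next no longer boots for the bots */
};

enum priority { P1 = 1, P2 = 2 };

static enum priority triage(const struct report *r)
{
	/* Breakage that stops bots from finding new bugs stays high priority. */
	if (r->blocks_testing)
		return P1;
	/* Ted's buckets: requires root or a corrupted image -> lower to P2. */
	if (r->needs_root || r->needs_corrupted_fs)
		return P2;
	/* Default for everything else (an assumption of this sketch). */
	return P1;
}
```

In this sketch the `blocks_testing` check comes first on purpose, so a boot-breaking report is never demoted just because its reproducer also happens to need root.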
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Dmitry Vyukov @ 2020-04-09 12:55 UTC
To: Andrew Morton; +Cc: Peter Xu, LKML, Linus Torvalds, Linux-MM

On Thu, Apr 9, 2020 at 2:49 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 7 Apr 2020 21:40:08 -0400 Peter Xu <peterx@redhat.com> wrote:
>
> > The two patches should fix below syzbot reports:
> >
> > BUG: unable to handle kernel paging request in kernel_get_mempolicy
> > https://lore.kernel.org/lkml/0000000000002b25f105a2a3434d@google.com/
> >
> > WARNING: bad unlock balance in __get_user_pages_remote
> > https://lore.kernel.org/lkml/00000000000005c65d05a2b90e70@google.com/
>
> (Is there an email address for the syzbot operators?)
>
> syzbot does test linux-next, yet these patches sat in linux-next for a
> month without a peep, but all hell broke loose when they hit Linus's
> tree.  How could this have happened?

The same thing:
https://groups.google.com/d/msg/syzkaller-bugs/phowYdNXHck/qU1P0TsjBAAJ

linux-next is boot-broken for more than a month and bugs are piling
onto bugs, I've seen at least 3 different ones.
syzbot can't get any working linux-next build for testing for a very
long time now.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Linus Torvalds @ 2020-04-09 16:32 UTC
To: Dmitry Vyukov, Stephen Rothwell; +Cc: Andrew Morton, Peter Xu, LKML, Linux-MM

On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> linux-next is boot-broken for more than a month and bugs are piling
> onto bugs, I've seen at least 3 different ones.
> syzbot can't get any working linux-next build for testing for a very
> long time now.

Ouch. Ok, that's not good. It means that linux-next has basically only
done build-testing this whole cycle.

Stephen, Dmitry - is there some way linux-next could possibly kick out
trees more aggressively if syzbot can't even boot?

This merge window has seemed otherwise fairly smooth to me (famous
last words), and it's really sad how the nice page fault cleanups
ended up being such an ongoing pain when the problems _could_ have
been caught earlier. We started to get syzbot reports very quickly
after they got merged into my tree, so this is clearly something that
gets exercised well - but it would have been oh-so-much better if it
had gotten noticed in linux-next.

Kicking trees out of linux-next and making noise if they cause syzbot
failures might also make some maintainers react more..

              Linus
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Qian Cai @ 2020-04-09 16:58 UTC
To: Linus Torvalds
Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

> On Apr 9, 2020, at 12:32 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Kicking trees out of linux-next and making noise if they cause syzbot
> failures might also make some maintainers react more..

On the other hand, this makes me worry who is testing on linux-next every day.

The worst nightmare I am having right now is some maintainers pick up commits that only have been in -next for a few days and then push to the mainline but then it is becoming my burden to fix those commits in case they introduced regressions because it is much harder to revert patches once in mainline. Kicking out of trees in linux-next on the other hand could make the situation worse unless we have a counter solution that makes sure commits must be in -next for a certain time (a month?) before merged in mainline.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Linus Torvalds @ 2020-04-09 17:05 UTC
To: Qian Cai
Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

On Thu, Apr 9, 2020 at 9:58 AM Qian Cai <cai@lca.pw> wrote:
>
> On the other hand, this makes me worry who is testing on linux-next every day.

Well, probably not very many people outside of robots.

Which is fine, but is also why I'd like robot failures to then be a big deal.

              Linus
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Qian Cai @ 2020-04-09 17:58 UTC
To: Linus Torvalds
Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

> On Apr 9, 2020, at 1:05 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Well, probably not very many people outside of robots.
>
> Which is fine, but is also why I'd like robot failures to then be a big deal.

Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.

Thus, the same mistakes could happen again because maintainers could simply push those little or none linux-next exposure patches to mainline with no restrictions. There is a balance to strike.
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Linus Torvalds @ 2020-04-09 18:06 UTC
To: Qian Cai
Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>
> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.

Sure. But I'd want to be notified when something gets kicked out, so
that I then know not to pull it.

So it would reduce the exposure of patches, but it would also make
sure those patches then don't make it upstream.

Untested patches is fine - as long as nobody else has to suffer through them.

              Linus
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Qian Cai @ 2020-04-09 21:14 UTC
  To: Linus Torvalds
  Cc: Dmitry Vyukov, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

> On Apr 9, 2020, at 2:06 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>>
>> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.
>
> Sure. But I'd want to be notified when something gets kicked out, so
> that I then know not to pull it.
>
> So it would reduce the exposure of patches, but it would also make
> sure those patches then don't make it upstream.
>
> Untested patches is fine - as long as nobody else has to suffer through them.

Excellent. It now very much depends on how Stephen will notify you when
a tree, a patchset or even a developer should be blacklisted for some time
to make this a success.

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Tetsuo Handa @ 2020-04-10 13:12 UTC
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

On 2020/04/10 6:14, Qian Cai wrote:
>
>
>> On Apr 9, 2020, at 2:06 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>>>
>>> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.
>>
>> Sure. But I'd want to be notified when something gets kicked out, so
>> that I then know not to pull it.
>>
>> So it would reduce the exposure of patches, but it would also make
>> sure those patches then don't make it upstream.
>>
>> Untested patches is fine - as long as nobody else has to suffer through them.
>
> Excellent. It now very much depends on how Stephen will notify you when
> a tree, a patchset or even a developer should be blacklisted for some time
> to make this a success.
>

Since patch flow forms tree structure, I don't know whether maintainers can
afford remembering which tree, patchset or developer should be blacklisted
when problems come from leaf git trees.



By the way...

Removing problematic trees might confuse "#syz test:" request, for
developers might ask syzbot to test proposed patches on a kernel which
does not contain problematic trees. In lucky case, test request fails
as patch failure or build failure. But in unlucky case, syzbot fails to
detect that proposed patch was tested on a kernel without problematic
trees. A bit related to https://github.com/google/syzkaller/issues/1609 .

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Qian Cai @ 2020-04-10 14:26 UTC
  To: Tetsuo Handa
  Cc: Dmitry Vyukov, Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM

> On Apr 10, 2020, at 9:12 AM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
> On 2020/04/10 6:14, Qian Cai wrote:
>>
>>
>>> On Apr 9, 2020, at 2:06 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>
>>> On Thu, Apr 9, 2020 at 10:58 AM Qian Cai <cai@lca.pw> wrote:
>>>>
>>>> Agree to make a big deal part. My point is that when kicking trees of linux-next, it also could reduce the exposure of many patches (which could be bad) to linux-next and miss valuable early testing either from robots or human.
>>>
>>> Sure. But I'd want to be notified when something gets kicked out, so
>>> that I then know not to pull it.
>>>
>>> So it would reduce the exposure of patches, but it would also make
>>> sure those patches then don't make it upstream.
>>>
>>> Untested patches is fine - as long as nobody else has to suffer through them.
>>
>> Excellent. It now very much depends on how Stephen will notify you when
>> a tree, a patchset or even a developer should be blacklisted for some time
>> to make this a success.
>>
>
> Since patch flow forms tree structure, I don't know whether maintainers can
> afford remembering which tree, patchset or developer should be blacklisted
> when problems come from leaf git trees.
>
>
>
> By the way...
>
> Removing problematic trees might confuse "#syz test:" request, for
> developers might ask syzbot to test proposed patches on a kernel which
> does not contain problematic trees. In lucky case, test request fails
> as patch failure or build failure. But in unlucky case, syzbot fails to
> detect that proposed patch was tested on a kernel without problematic
> trees. A bit related to https://github.com/google/syzkaller/issues/1609 .
>

I looked at those blocking bug list sent by Dmitry. I wonder “boys, why they didn’t send those out earlier to linux-next or somewhere more visible?” because I had dealt with most of those before, and I knew the solutions to unblock them! Even though my testing setup is somewhat different from syzbot. I don’t do fuzzers, and my config only focuses on mm, iommu and a few core kernel pieces with more debugging options on, but it does bare-metal and multi-arch, so there are still lots of opportunities to help each other with dealing with blocking issues.

A few things I am doing differently from syzbot on linux-next that help it run continuously without blocking most of the time are: I don’t set panic_on_warn. I’ll deal with warnings afterwards.

Occasionally, there are hard failures that I have to deal with right now. I’ll get to the end of it, and figure out the exact commit that caused it. In syzbot mode, the bisection (by robot) is the hard part, because if you don’t figure out the exact commit, most of the time people (CC’d by the bug reports) would have no clue and the reports will be ignored. (Even if the bad commit was figured out, it is not 100% guaranteed developers would know what’s going on, but it helps dramatically, and at least we can revert it without blocking if everything else fails.)

Thus, it would really help if syzbot (or human operators) could help bisect, even if it could only figure out which merge commit in linux-next is bad (with high accuracy), so those reports may get ignored less.

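Qian's point that a bisection is still useful even if it only narrows a failure down to one linux-next merge can be made concrete with a small sketch. It treats the first-parent history of a linux-next tag as a straight line of per-tree merges (an assumption; `git bisect` itself works on the full commit graph) and binary-searches for the first position where a hypothetical `is_bad()` callback, standing in for "build and boot this kernel", reports a failure. It also assumes the failure is deterministic and persists once introduced, which flaky boot failures do not always satisfy. `first_bad()` and `bad_from()` are illustrative names, not part of any existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: true if the kernel built at merge position
 * `pos` reproduces the failure (for example, it does not boot). */
typedef bool (*is_bad_fn)(size_t pos, void *ctx);

/*
 * Binary search for the first bad position in [0, n).
 * Assumes everything before the culprit is good, the culprit and
 * everything after it are bad, and position n - 1 is known bad.
 * If `steps` is non-NULL it receives the number of is_bad() calls.
 */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx,
			unsigned *steps)
{
	assert(n > 0);

	size_t lo = 0, hi = n - 1;	/* invariant: hi is bad, all < lo are good */
	unsigned count = 0;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		count++;
		if (is_bad(mid, ctx))
			hi = mid;	/* culprit is at mid or earlier */
		else
			lo = mid + 1;	/* culprit is after mid */
	}
	if (steps)
		*steps = count;
	return lo;
}

/* Simulated history for trying the search: bad from *(size_t *)ctx on. */
static bool bad_from(size_t pos, void *ctx)
{
	return pos >= *(const size_t *)ctx;
}
```

For n merges this needs about log2(n) build-and-boot cycles; on a simulated 10-merge history with the culprit at index 6, `first_bad()` returns 6 after four calls to `is_bad()`.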
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Andrew Morton @ 2020-04-10 17:26 UTC
  To: Qian Cai
  Cc: Tetsuo Handa, Dmitry Vyukov, Linus Torvalds, Stephen Rothwell, Peter Xu, LKML, Linux-MM

On Fri, 10 Apr 2020 10:26:23 -0400 Qian Cai <cai@lca.pw> wrote:

> I don't set panic_on_warn. I'll deal with warnings afterwards.

I'm not understanding why syzbot sets panic_on_warn. This decision
will needlessly turn many kernel errors into won't-boot situations and
will block further testing?

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Qian Cai @ 2020-04-10 19:46 UTC
  To: Andrew Morton
  Cc: Tetsuo Handa, Dmitry Vyukov, Linus Torvalds, Stephen Rothwell, Peter Xu, LKML, Linux-MM

> On Apr 10, 2020, at 1:26 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 10 Apr 2020 10:26:23 -0400 Qian Cai <cai@lca.pw> wrote:
>
>> I don't set panic_on_warn. I'll deal with warnings afterwards.
>
> I'm not understanding why syzbot sets panic_on_warn. This decision
> will needlessly turn many kernel errors into won't-boot situations and
> will block further testing?

I can feel that it is very reasonable to set panic_on_warn for the fully automatic systems, because once those warnings happen, the rest of things can no longer be trusted, so the goal is to kill the first enemy on sight, and then deal with the next one. It could be a good idea for more stable trees like mainline, but for linux-next, I could only dream of setting it one day.

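Both policies discussed here are selected by the `kernel.panic_on_warn` sysctl, which is also exposed as `/proc/sys/kernel/panic_on_warn` and can be set at boot with `panic_on_warn=1`. A test harness that wants to record which mode a run used could read the setting back. This is a minimal sketch assuming a Linux host with procfs mounted; `parse_sysctl_int()` and `read_panic_on_warn()` are hypothetical helper names, not existing APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text of an integer sysctl such as
 * "0\n" or "1\n". Returns 0 for zero, 1 for any nonzero value, and
 * -1 if the text is not a single integer. */
static int parse_sysctl_int(const char *text)
{
	char *end;
	long v;

	assert(text != NULL);
	v = strtol(text, &end, 10);
	if (end == text)
		return -1;		/* no digits at all */
	while (*end == ' ' || *end == '\n')
		end++;
	if (*end != '\0')
		return -1;		/* trailing garbage */
	return v != 0;
}

/* Hypothetical helper: read kernel.panic_on_warn. Returns 0 (a WARN is
 * logged and the run continues), 1 (the first WARN panics the machine),
 * or -1 if the setting cannot be read (e.g. not Linux, no procfs). */
static int read_panic_on_warn(void)
{
	char buf[32];
	FILE *f = fopen("/proc/sys/kernel/panic_on_warn", "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(buf, sizeof(buf), f) != NULL;
	fclose(f);
	return ok ? parse_sysctl_int(buf) : -1;
}
```

Logging this value next to each report makes it clear whether a later failure was reached only because an earlier WARN was allowed to pass.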
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Stephen Rothwell @ 2020-04-09 23:29 UTC
  To: Linus Torvalds
  Cc: Dmitry Vyukov, Andrew Morton, Peter Xu, LKML, Linux-MM

Hi Linus,

On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > linux-next is boot-broken for more than a month and bugs are piling
> > onto bugs, I've seen at least 3 different ones.
> > syzbot can't get any working linux-next build for testing for a very
> > long time now.
>
> Ouch.
>
> Ok, that's not good. It means that linux-next has basically only done
> build-testing this whole cycle.

Well, there are other CI's beyond syzbot .. Does syzbot only build/test
a single kernel arch/config?

> Stephen, Dmitry - is there some way linux-next could possibly kick out
> trees more aggressively if syzbot can't even boot?

Of course that could be done if I knew that there were problems. From
memory and my mail archives, I was only cc'd on 3 problems by syzbot
since last November and they were all responded to by the appropriate
maintainers/developers.

Currently, when I am cc'd on reports, if they are also sent to who
seem like the appropriate people, I just file the report assuming it
will be dealt with.

> Kicking trees out of linux-next and making noise if they cause syzbot
> failures might also make some maintainers react more..

That may be true, but in some cases I have carried fixups/reverts/older
versions of trees for quite some time before things get fixed. But at
least if that happens, I do tend to remind people.

--
Cheers,
Stephen Rothwell

* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
From: Qian Cai @ 2020-04-13 22:06 UTC
  To: Linus Torvalds, Stephen Rothwell, Andrew Morton
  Cc: Dmitry Vyukov, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner

> On Apr 9, 2020, at 7:29 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi Linus,
>
> On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>
>>> linux-next is boot-broken for more than a month and bugs are piling
>>> onto bugs, I've seen at least 3 different ones.
>>> syzbot can't get any working linux-next build for testing for a very
>>> long time now.
>>
>> Ouch.
>>
>> Ok, that's not good. It means that linux-next has basically only done
>> build-testing this whole cycle.
>
> Well, there are other CI's beyond syzbot .. Does syzbot only build/test
> a single kernel arch/config?
>
>> Stephen, Dmitry - is there some way linux-next could possibly kick out
>> trees more aggressively if syzbot can't even boot?
>
> Of course that could be done if I knew that there were problems. From
> memory and my mail archives, I was only cc'd on 3 problems by syzbot
> since last November and they were all responded to by the appropriate
> maintainers/developers.
>
> Currently, when I am cc'd on reports, if they are also sent to who
> seem like the appropriate people, I just file the report assuming it
> will be dealt with.
>
>> Kicking trees out of linux-next and making noise if they cause syzbot
>> failures might also make some maintainers react more..
>
> That may be true, but in some cases I have carried fixups/reverts/older
> versions of trees for quite some time before things get fixed. But at
> least if that happens, I do tend to remind people.

BTW, I’ll be adding fuzzers to my daily linux-next routines where it triggers this io_uring/scheduler bug almost immediately, so hopefully would buy syzbot some time to resume on linux-next.

[67493.516737][T211750] BUG: unable to handle page fault for address: ffffffffffffffe8
[67493.557315][T211750] #PF: supervisor read access in kernel mode
[67493.586726][T211750] #PF: error_code(0x0000) - not-present page
[67493.614434][T211750] PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0
[67493.644846][T211750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[67493.674127][T211750] CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G B L 5.7.0-rc1-next-20200413 #4
[67493.722516][T211750] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
[67493.764925][T211750] RIP: 0010:__wake_up_common+0x98/0x290
__wake_up_common at kernel/sched/wait.c:87
[67493.790675][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
[67493.881650][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
[67493.909854][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
[67493.947131][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
[67493.983829][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
[67494.020861][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
[67494.059249][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
[67494.099699][T211750] FS: 00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
[67494.141858][T211750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[67494.172660][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
[67494.209760][T211750] Call Trace:
[67494.224720][T211750] __wake_up_common_lock+0xea/0x150
(inlined by) __wake_up_common_lock at kernel/sched/wait.c:124
[67494.248753][T211750] ? __wake_up_common+0x290/0x290
[67494.272014][T211750] ? lockdep_hardirqs_on+0x16/0x2c0
[67494.296139][T211750] __wake_up+0x13/0x20
[67494.314946][T211750] io_cqring_ev_posted+0x75/0xe0
(inlined by) io_cqring_ev_posted at fs/io_uring.c:1160
[67494.337726][T211750] io_ring_ctx_wait_and_kill+0x1c0/0x2f0
io_ring_ctx_wait_and_kill at fs/io_uring.c:7305
[67494.363840][T211750] io_uring_create+0xa8d/0x13b0
[67494.386526][T211750] ? io_req_defer_prep+0x990/0x990
[67494.410119][T211750] ? __kasan_check_write+0x14/0x20
[67494.433646][T211750] io_uring_setup+0xb8/0x130
[67494.454870][T211750] ? io_uring_create+0x13b0/0x13b0
[67494.478342][T211750] ? check_flags.part.28+0x220/0x220
[67494.502947][T211750] ? lockdep_hardirqs_on+0x16/0x2c0
[67494.526965][T211750] __x64_sys_io_uring_setup+0x31/0x40
[67494.551820][T211750] do_syscall_64+0xcc/0xaf0
[67494.574829][T211750] ? syscall_return_slowpath+0x580/0x580
[67494.604591][T211750] ? lockdep_hardirqs_off+0x1f/0x140
[67494.628901][T211750] ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
[67494.657616][T211750] ? trace_hardirqs_off_caller+0x3a/0x150
[67494.683999][T211750] ? trace_hardirqs_off_thunk+0x1a/0x1c
[67494.709982][T211750] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[67494.737167][T211750] RIP: 0033:0x7fdcb9dd76ed
[67494.757698][T211750] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[67494.849485][T211750] RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
[67494.887906][T211750] RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed
[67494.924754][T211750] RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54
[67494.961516][T211750] RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000
[67494.998485][T211750] R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002
[67495.035510][T211750] R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000
[67495.072679][T211750] Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc]
[67495.221120][T211750] CR2: ffffffffffffffe8
[67495.240237][T211750] ---[ end trace f9502383d57e0e22 ]---
[67495.265301][T211750] RIP: 0010:__wake_up_common+0x98/0x290
[67495.290903][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
[67495.382302][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
[67495.410551][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
[67495.447570][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
[67495.484252][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
[67495.521068][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
[67495.557461][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
[67495.594607][T211750] FS: 00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
[67495.639332][T211750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[67495.669033][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
[67495.704569][T211750] Kernel panic - not syncing: Fatal exception
[67495.731758][T211750] Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[67495.784988][T211750] ---[ end Kernel panic - not syncing: Fatal exception ]---

Also, I’ll need to deal with this slub sysfs and memcg lockdep splat first, so lockdep would still be functioning during the fuzzing.

[ 8137.254287][T53013] WARNING: possible circular locking dependency detected
[ 8137.261231][T53013] 5.7.0-rc1-next-20200413+ #2 Not tainted
[ 8137.266981][T53013] ------------------------------------------------------
[ 8137.274016][T53013] trinity-c10/53013 is trying to acquire lock:
[ 8137.280127][T53013] ffffffff89ad2968 (slab_mutex){+.+.}-{3:3}, at: slab_attr_store+0x79/0xf0
[ 8137.288660][T53013]
[ 8137.288660][T53013] but task is already holding lock:
[ 8137.295944][T53013] ffff9ea421940dd8 (kn->count#88){++++}-{0:0}, at: kernfs_fop_write+0x10e/0x280
[ 8137.305004][T53013]
[ 8137.305004][T53013] which lock already depends on the new lock.
[ 8137.305004][T53013]
[ 8137.315347][T53013]
[ 8137.315347][T53013] the existing dependency chain (in reverse order) is:
[ 8137.324470][T53013]
[ 8137.324470][T53013] -> #1 (kn->count#88){++++}-{0:0}:
[ 8137.331774][T53013] __kernfs_remove+0x3bb/0x420
[ 8137.336977][T53013] kernfs_remove+0x2c/0x40
[ 8137.341829][T53013] sysfs_remove_dir+0x7e/0x90
[ 8137.346988][T53013] kobject_del+0x60/0xb0
[ 8137.351663][T53013] sysfs_slab_unlink+0x1c/0x20
[ 8137.356853][T53013] shutdown_cache+0x155/0x1c0
[ 8137.361958][T53013] kmemcg_cache_shutdown_fn+0xe/0x20
[ 8137.367756][T53013] kmemcg_workfn+0x35/0x50
[ 8137.372705][T53013] process_one_work+0x560/0xba0
[ 8137.377985][T53013] worker_thread+0x80/0x5f0
[ 8137.382920][T53013] kthread+0x1db/0x200
[ 8137.387467][T53013] ret_from_fork+0x27/0x50
[ 8137.392296][T53013]
[ 8137.392296][T53013] -> #0 (slab_mutex){+.+.}-{3:3}:
[ 8137.399587][T53013] __lock_acquire+0x1673/0x23a0
[ 8137.404865][T53013] lock_acquire+0xcd/0x410
[ 8137.409710][T53013] __mutex_lock+0xc9/0xbf0
[ 8137.414561][T53013] mutex_lock_nested+0x31/0x40
[ 8137.419823][T53013] slab_attr_store+0x79/0xf0
[ 8137.424991][T53013] sysfs_kf_write+0x9a/0xb0
[ 8137.429919][T53013] kernfs_fop_write+0x15e/0x280
[ 8137.435272][T53013] do_iter_write+0x261/0x2c0
[ 8137.440291][T53013] vfs_writev+0xe6/0x170
[ 8137.444995][T53013] do_pwritev+0xab/0xd0
[ 8137.449731][T53013] __x64_sys_pwritev+0x61/0x80
[ 8137.455079][T53013] do_syscall_64+0x91/0xb10
[ 8137.460010][T53013] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 8137.466330][T53013]
[ 8137.466330][T53013] other info that might help us debug this:
[ 8137.466330][T53013]
[ 8137.476695][T53013]  Possible unsafe locking scenario:
[ 8137.476695][T53013]
[ 8137.484067][T53013]        CPU0                    CPU1
[ 8137.489337][T53013]        ----                    ----
[ 8137.494609][T53013]   lock(kn->count#88);
[ 8137.498718][T53013]                                lock(slab_mutex);
[ 8137.505130][T53013]                                lock(kn->count#88);
[ 8137.511718][T53013]   lock(slab_mutex);
[ 8137.515594][T53013]
[ 8137.515594][T53013]  *** DEADLOCK ***
[ 8137.515594][T53013]
[ 8137.523780][T53013] 3 locks held by trinity-c10/53013:
[ 8137.528967][T53013] #0: ffff9ea42c841430 (sb_writers#3){.+.+}-{0:0}, at: vfs_writev+0x13f/0x170
[ 8137.537847][T53013] #1: ffff9ea3c2c9c288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write+0xfe/0x280
[ 8137.547149][T53013] #2: ffff9ea421940dd8 (kn->count#88){++++}-{0:0}, at: kernfs_fop_write+0x10e/0x280
[ 8137.556554][T53013]
[ 8137.556554][T53013] stack backtrace:
[ 8137.562360][T53013] CPU: 22 PID: 53013 Comm: trinity-c10 Not tainted 5.7.0-rc1-next-20200413+ #2
[ 8137.571216][T53013] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[ 8137.580670][T53013] Call Trace:
[ 8137.583865][T53013] dump_stack+0xa4/0x100
[ 8137.588105][T53013] print_circular_bug.cold.49+0x13c/0x141
[ 8137.593746][T53013] check_noncircular+0x183/0x1a0
[ 8137.598818][T53013] ? do_raw_spin_unlock+0x10b/0x1c0
[ 8137.603971][T53013] __lock_acquire+0x1673/0x23a0
[ 8137.608732][T53013] ? print_unreferenced+0x224/0x230
[ 8137.613846][T53013] lock_acquire+0xcd/0x410
[ 8137.618169][T53013] ? slab_attr_store+0x79/0xf0
[ 8137.619804][T52801] VFS: "mand" mount option not supported
[ 8137.623085][T53013] __mutex_lock+0xc9/0xbf0
[ 8137.632939][T53013] ? slab_attr_store+0x79/0xf0
[ 8137.637679][T53013] ? _find_next_bit.constprop.1+0xd0/0x100
[ 8137.643477][T53013] ? slab_attr_store+0x79/0xf0
[ 8137.648252][T53013] ? find_next_bit+0x36/0x40
[ 8137.652748][T53013] ? cpumask_next+0x46/0x60
[ 8137.657155][T53013] mutex_lock_nested+0x31/0x40
[ 8137.661897][T53013] ? slab_attr_show+0x30/0x30
[ 8137.666476][T53013] ? mutex_lock_nested+0x31/0x40
[ 8137.671373][T53013] slab_attr_store+0x79/0xf0
[ 8137.676035][T53013] ? slab_attr_show+0x30/0x30
[ 8137.680672][T53013] sysfs_kf_write+0x9a/0xb0
[ 8137.685085][T53013] ? sysfs_file_ops+0xb0/0xb0
[ 8137.689673][T53013] kernfs_fop_write+0x15e/0x280
[ 8137.694433][T53013] do_iter_write+0x261/0x2c0
[ 8137.698997][T53013] vfs_writev+0xe6/0x170
[ 8137.703142][T53013] ? lock_acquire+0xcd/0x410
[ 8137.707696][T53013] ? find_held_lock+0x35/0xa0
[ 8137.712279][T53013] ? __fget_light+0xa3/0x170
[ 8137.716897][T53013] do_pwritev+0xab/0xd0
[ 8137.720960][T53013] __x64_sys_pwritev+0x61/0x80
[ 8137.725808][T53013] do_syscall_64+0x91/0xb10
[ 8137.730274][T53013] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 8137.735733][T53013] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 8137.741588][T53013] RIP: 0033:0x7f54d0a386ed

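The "Possible unsafe locking scenario" in the lockdep splat is a classic AB-BA inversion. Per the dependency chain, the kmemcg shutdown path (the #1 stack) waits on `kn->count` while `slab_mutex` is held, and the sysfs write path (the current task) takes `slab_mutex` while holding `kn->count`. The sketch below is a toy model of the idea behind such a report, not lockdep's implementation, which tracks lock classes, read/write and IRQ state, and much more. Every "acquire B while holding A" adds an edge A -> B to a graph of lock classes, and an acquisition that would close a cycle is flagged. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes, named after the two in the splat. */
enum lock_class { KN_COUNT, SLAB_MUTEX, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	[KN_COUNT] = "kn->count#88",
	[SLAB_MUTEX] = "slab_mutex",
};

/* order[a][b] becomes true once some path acquires b while holding a. */
static bool order[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can `to` be reached from `from` via recorded edges? */
static bool reachable(int from, int to, bool seen[NR_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (order[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire `next` while holding `held`". If a recorded path
 * next -> ... -> held already exists, the new edge would close a cycle
 * (a possible deadlock): report it and return false without recording.
 */
static bool note_acquire(enum lock_class held, enum lock_class next)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && next < NR_CLASSES);
	if (reachable(next, held, seen)) {
		fprintf(stderr, "possible circular locking: %s -> %s\n",
			class_name[held], class_name[next]);
		return false;
	}
	order[held][next] = true;
	return true;
}
```

Calling `note_acquire(SLAB_MUTEX, KN_COUNT)` for the shutdown path and then `note_acquire(KN_COUNT, SLAB_MUTEX)` for the sysfs write path makes the second call return false, mirroring the CPU0/CPU1 table. Like lockdep, the model flags the inversion from the observed orderings alone, even if the two paths never actually ran at the same time.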
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports 2020-04-13 22:06 ` Qian Cai @ 2020-04-13 23:05 ` Jens Axboe 2020-04-14 11:12 ` Dmitry Vyukov 1 sibling, 0 replies; 37+ messages in thread From: Jens Axboe @ 2020-04-13 23:05 UTC (permalink / raw) To: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton Cc: Dmitry Vyukov, Peter Xu, LKML, Linux-MM, Christoph Lameter, Johannes Weiner On 4/13/20 4:06 PM, Qian Cai wrote: > > >> On Apr 9, 2020, at 7:29 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote: >> >> Hi Linus, >> >> On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: >>> >>> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote: >>>> >>>> linux-next is boot-broken for more than a month and bugs are piling >>>> onto bugs, I've seen at least 3 different ones. >>>> syzbot can't get any working linux-next build for testing for a very >>>> long time now. >>> >>> Ouch. >>> >>> Ok, that's not good. It means that linux-next has basically only done >>> build-testing this whole cycle. >> >> Well, there are other CI's beyond syzbot .. Does syzbot only build/test >> a single kernel arch/config? >> >>> Stephen, Dmitry - is there some way linux-next could possibly kick out >>> trees more aggressively if syzbot can't even boot? >> >> Of course that could be done if I knew that there were problems. From >> memory and my mail archives, I was only cc'd on 3 problems by sysbot >> since last November and they were all responded to by the appropriate >> maintainers/developers. >> >> Currently, when I am cc'd on reports, if they are also sent to who >> seem like the appropriate people, I just file the report assuming it >> will be dealt with. >> >>> Kicking trees out of linux-next and making noise if they cause syzbot >>> failures might also make some maintainers react more.. 
>> >> That may be true, but in some cases I have carried fixups/reverts/older >> versions of trees for quite some time before things get fixed. But at >> least if that happens, I do tend to remind people. > > BTW, I’ll be adding fuzzers to my daily linux-next routines where it > triggers this io_uring/scheduler bug almost immediately, so hopefully > would buy syzbot some time to resume on linux-next. > > [67493.516737][T211750] BUG: unable to handle page fault for address: ffffffffffffffe8 > [67493.557315][T211750] #PF: supervisor read access in kernel mode > [67493.586726][T211750] #PF: error_code(0x0000) - not-present page > [67493.614434][T211750] PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0 > [67493.644846][T211750] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI > [67493.674127][T211750] CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G B L 5.7.0-rc1-next-20200413 #4 > [67493.722516][T211750] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017 > [67493.764925][T211750] RIP: 0010:__wake_up_common+0x98/0x290 > __wake_up_common at kernel/sched/wait.c:87 > [67493.790675][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10 > [67493.881650][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046 > [67493.909854][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8 > [67493.947131][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8 > [67493.983829][T211750] RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd > [67494.020861][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000 > [67494.059249][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8 > [67494.099699][T211750] FS: 00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000 > [67494.141858][T211750] CS: 0010 DS: 
0000 ES: 0000 CR0: 0000000080050033 > [67494.172660][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0 > [67494.209760][T211750] Call Trace: > [67494.224720][T211750] __wake_up_common_lock+0xea/0x150 > (inlined by) __wake_up_common_lock at kernel/sched/wait.c:124 > [67494.248753][T211750] ? __wake_up_common+0x290/0x290 > [67494.272014][T211750] ? lockdep_hardirqs_on+0x16/0x2c0 > [67494.296139][T211750] __wake_up+0x13/0x20 > [67494.314946][T211750] io_cqring_ev_posted+0x75/0xe0 > (inlined by) io_cqring_ev_posted at fs/io_uring.c:1160 > [67494.337726][T211750] io_ring_ctx_wait_and_kill+0x1c0/0x2f0 > io_ring_ctx_wait_and_kill at fs/io_uring.c:7305 > [67494.363840][T211750] io_uring_create+0xa8d/0x13b0 > [67494.386526][T211750] ? io_req_defer_prep+0x990/0x990 > [67494.410119][T211750] ? __kasan_check_write+0x14/0x20 > [67494.433646][T211750] io_uring_setup+0xb8/0x130 > [67494.454870][T211750] ? io_uring_create+0x13b0/0x13b0 > [67494.478342][T211750] ? check_flags.part.28+0x220/0x220 > [67494.502947][T211750] ? lockdep_hardirqs_on+0x16/0x2c0 > [67494.526965][T211750] __x64_sys_io_uring_setup+0x31/0x40 > [67494.551820][T211750] do_syscall_64+0xcc/0xaf0 > [67494.574829][T211750] ? syscall_return_slowpath+0x580/0x580 > [67494.604591][T211750] ? lockdep_hardirqs_off+0x1f/0x140 > [67494.628901][T211750] ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3 > [67494.657616][T211750] ? trace_hardirqs_off_caller+0x3a/0x150 > [67494.683999][T211750] ? 
trace_hardirqs_off_thunk+0x1a/0x1c > [67494.709982][T211750] entry_SYSCALL_64_after_hwframe+0x49/0xb3 > [67494.737167][T211750] RIP: 0033:0x7fdcb9dd76ed > [67494.757698][T211750] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48 > [67494.849485][T211750] RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 > [67494.887906][T211750] RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed > [67494.924754][T211750] RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54 > [67494.961516][T211750] RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000 > [67494.998485][T211750] R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002 > [67495.035510][T211750] R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000 > [67495.072679][T211750] Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc] > [67495.221120][T211750] CR2: ffffffffffffffe8 > [67495.240237][T211750] ---[ end trace f9502383d57e0e22 ]--- > [67495.265301][T211750] RIP: 0010:__wake_up_common+0x98/0x290 > [67495.290903][T211750] Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10 > [67495.382302][T211750] RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046 > [67495.410551][T211750] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8 > [67495.447570][T211750] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8 > [67495.484252][T211750] RBP: ffffc9000adbfb40 R08: 
fffffbfff582c5fd R09: fffffbfff582c5fd > [67495.521068][T211750] R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000 > [67495.557461][T211750] R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8 > [67495.594607][T211750] FS: 00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000 > [67495.639332][T211750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [67495.669033][T211750] CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0 > [67495.704569][T211750] Kernel panic - not syncing: Fatal exception > [67495.731758][T211750] Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [67495.784988][T211750] ---[ end Kernel panic - not syncing: Fatal exception ]— This looks like it can happen if you fail setting up the ring (you probably have some error injection enabled), and then io_poll_remove_all() does an unconditional io_cqring_ev_posted() even if we didn't post any events. I think the best fix here is to ensure io_cqring_ev_posted() is only called if we actually post events. If we did, we know for a fact that rings have been setup. I'll queue this up for 5.7. 
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5190bfb6a665..c0aa72e738b4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4342,7 +4342,7 @@ static void io_poll_remove_all(struct io_ring_ctx *ctx)
 {
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
-	int i;
+	int posted = 0, i;
 
 	spin_lock_irq(&ctx->completion_lock);
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
@@ -4350,11 +4350,12 @@ static void io_poll_remove_all(struct io_ring_ctx *ctx)
 
 		list = &ctx->cancel_hash[i];
 		hlist_for_each_entry_safe(req, tmp, list, hash_node)
-			io_poll_remove_one(req);
+			posted += io_poll_remove_one(req);
 	}
 	spin_unlock_irq(&ctx->completion_lock);
 
-	io_cqring_ev_posted(ctx);
+	if (posted)
+		io_cqring_ev_posted(ctx);
 }
 
 static int io_poll_cancel(struct io_ring_ctx *ctx, __u64 sqe_addr)

-- 
Jens Axboe

^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-13 22:06 ` Qian Cai
  2020-04-13 23:05 ` Jens Axboe
@ 2020-04-14 11:12 ` Dmitry Vyukov
  2020-04-14 11:59 ` Qian Cai
  ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Dmitry Vyukov @ 2020-04-14 11:12 UTC (permalink / raw)
  To: Qian Cai
  Cc: Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller, Dan Rue

On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > On Apr 9, 2020, at 7:29 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > Hi Linus,
> >
> > On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>
> >>> linux-next is boot-broken for more than a month and bugs are piling
> >>> onto bugs, I've seen at least 3 different ones.
> >>> syzbot can't get any working linux-next build for testing for a very
> >>> long time now.
> >>
> >> Ouch.
> >>
> >> Ok, that's not good. It means that linux-next has basically only done
> >> build-testing this whole cycle.
> >
> > Well, there are other CI's beyond syzbot .. Does syzbot only build/test
> > a single kernel arch/config?
> >
> >> Stephen, Dmitry - is there some way linux-next could possibly kick out
> >> trees more aggressively if syzbot can't even boot?

Hello all,

Sorry for the corona/holiday delays. I will try to answer/comment on
everything in this thread in this email.

AI: we need to CC linux-next@ on linux-next build/boot failures. I
will work on this.
We have functionality to CC given emails on _all_ bugs on the given
tree, but we don't have this for build/boot bugs only. I will try to
add this soon.
Stephen, do you want to be CCed as well? Or just linux-next@?

> So old bugs generally should be aged out

This actually happens now.
Bugs without reproducers are auto-closed 60-120 days after their last occurrence (depending on past frequency); for linux-next the range is 40-60 days. Bugs with reproducers are not auto-closed, but they are fix-bisected and cause-bisected. Both bisections are only ~66% correct, but they still frequently provide a useful signal. Bugs with reproducers are also just generally easier to handle.

Another important distinction from Bugzilla is that the syzbot dashboard has up-to-date "Last crash time" information. Click on the "Last" column here:
https://syzkaller.appspot.com/upstream
For starters, it's very easy to ignore everything that happened months ago, if that's the concern.

So it's not as perfect as it would be with a dedicated human team attached, but I would say it's now in reasonable shape, with ~400 open bugs that happened within the last month.

And now we have data to confirm that "old" does not mean "irrelevant". Our leader:
BUG: please report to dccp@vger.kernel.org => prev = 0, last = 0 at net/dccp/ccids/lib/packet_history.c:LINE/tfrc_rx_hist_sample_rtt()
https://syzkaller.appspot.com/bug?id=0881c535c265ca965edc49c0ac3d0a9850d26eb1
was first triggered 964 days ago and has kept triggering pretty much the whole time since.

> It would be nice if there was some way we could triage Syzkaller
> bugs into different buckets.

Although I am wary of stepping onto the slippery slope of implementing a full-fledged bug tracking system, I think syzbot will gain more bug-tracker features, and tags will happen. We still have https://github.com/google/syzkaller/issues/608 open; it's mainly a question of allocating resources for the implementation and figuring out the actual tag hierarchy. For login and credentials, I guess we will go with just "whoever can send emails is root", because we are already doing this anyway (closing a bug is more critical than changing a tag) :)

Re panic_on_warn.
We don't have a dedicated engineer to sheriff and give manual consideration and judgement to each case. And as Qian noted, in such circumstances it's reasonable not to trust anything after a warning. Some notorious examples: LOCKDEP warnings disable LOCKDEP, so if we boot in such a state with our eyes closed and then try to do fuzzing, or, "better", test a patch for a LOCKDEP error or bisect a LOCKDEP error, we will immediately produce bogus testing results or a bogus bisection culprit. Or a warning about a hung task may re-appear later during testing and confuse the results again. Or if we ignore a KASAN warning, we potentially boot with corrupted memory, with who-knows-what consequences. A "normal" WARNING may be benign (a misuse of WARNING), or maybe not; that is impossible to figure out automatically. And in the end, if we ignore it, who will notice and fix it, and when?

We got this far with these black-and-white criteria for kernel bugs, and I think they had positive effects in a number of areas. Going forward, I think it's better to extend panic_on_warn to more testing systems. Then non-fatal bugs will be no different from fatal bugs during boot, which we need to handle in a reasonable timeframe anyway.

Which gets me to the next "interesting" point.

> Well, there are other CI's beyond syzbot.
> On the other hand, this makes me worry who is testing on linux-next every day.

How do these use-after-frees and locking bugs get past the unit-testing systems (which syzbot is not) and remain unnoticed for so long?... syzbot uses the dumbest VMs (GCE), so everything it triggers during boot should be triggerable pretty much everywhere. It seems to be an action point for the testing systems. "Boot to ssh" is not the best criterion. Again, if there is a LOCKDEP error, we are not catching any more LOCKDEP errors during subsequent testing. If there is a use-after-free, that's a serious error on its own, and KASAN produces only 1 error by default as well.
And as far as I understand, lots of kernel testing systems don't even enable KASAN, which is very wrong.
I talked to +Dan Rue about this a few days ago. Hopefully LKFT will start catching these as part of unit testing, which should help with syzbot testing as well.

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:12 ` Dmitry Vyukov
@ 2020-04-14 11:59 ` Qian Cai
  2020-04-14 12:05 ` Dmitry Vyukov
  2020-04-14 19:28 ` Dan Rue
  2020-04-16 0:34 ` Stephen Rothwell
  2 siblings, 1 reply; 37+ messages in thread
From: Qian Cai @ 2020-04-14 11:59 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller, Dan Rue

> On Apr 14, 2020, at 7:13 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> How do these use-after-free's and locking bugs get past the
> unit-testing systems (which syzbot is not) and remain unnoticed for so
> long?...
> syzbot uses the dumbest VMs (GCE), so everything it triggers during
> boot should be triggerable pretty much everywhere.

There are many reasons why early testing would not be able to catch ALL the syzbot blockers.

The Kconfigs are different. For example, I don’t have openvswitch enabled, so I would miss that ovs rcu-list lockdep warning. The same goes for that use-after-free in net/bluetooth and a warning in the sound subsystem.

But notifying the linux-next ML is a good start, so at least we could ask Paul or Steve to pull out the commit that enabled rcu-list debugging by default with PROVE_RCU.

I have learned that restricting the kconfig to a fairly minimal set can save a lot of trouble later on, especially for options that I currently have no way to exercise, like net/bluetooth and sound. It is extra work, though, because the default options in linux-next, or even the defconfigs, are not always pleasant and, without human intervention, would enable things I don’t need.

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:59 ` Qian Cai
@ 2020-04-14 12:05 ` Dmitry Vyukov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry Vyukov @ 2020-04-14 12:05 UTC (permalink / raw)
  To: Qian Cai
  Cc: Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller, Dan Rue

On Tue, Apr 14, 2020 at 1:59 PM Qian Cai <cai@lca.pw> wrote:
> > On Apr 14, 2020, at 7:13 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > How do these use-after-free's and locking bugs get past the
> > unit-testing systems (which syzbot is not) and remain unnoticed for so
> > long?...
> > syzbot uses the dumbest VMs (GCE), so everything it triggers during
> > boot should be triggerable pretty much everywhere.
>
> There are many reasons that any early testing would not be able to catch ALL the syzbot blockers.
>
> The Kconfigs are different. For example, I don’t have openvswitch enabled, so would miss that ovs rcu-list lockdep warning. Same for that use-after-free in net/bluetooth and a warning in sound subsystem.
>
> But, notifying Linux-next ML is a good start, so at least we could ask Paul or Steve to pull out the commit which enabling rcu-list debugging by default with PROVE_RCU.
>
> I learned through that restricted kconfig to some degree of minimal could save a lot of troubles late on especially those options that I have no way to exercise like net/bluetooth and sound currently. It is going to be extra works though because those default options in Linux-next or even defconfigs are not always pleasant and would want to enable something I don’t need if not given human intervention.

We only try to enable what we can reach. There is significant reach into sound and net/bluetooth even without any hardware. So I would assume generic testing systems like KernelCI, LKFT, and CKI should enable these as well.
Hopefully we don't have all of the sound and net/bluetooth code completely untested in linux-next.

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:12 ` Dmitry Vyukov
  2020-04-14 11:59 ` Qian Cai
@ 2020-04-14 19:28 ` Dan Rue
  2020-04-15 11:09 ` Dmitry Vyukov
  2020-04-16 0:34 ` Stephen Rothwell
  2 siblings, 1 reply; 37+ messages in thread
From: Dan Rue @ 2020-04-14 19:28 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller

On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > Well, there are other CI's beyond syzbot.
> > On the other hand, this makes me worry who is testing on linux-next every day.
>
> How do these use-after-free's and locking bugs get past the
> unit-testing systems (which syzbot is not) and remain unnoticed for so
> long?...
> syzbot uses the dumbest VMs (GCE), so everything it triggers during
> boot should be triggerable pretty much everywhere.
> It seems to be an action point for the testing systems. "Boot to ssh"
> is not the best criteria. Again if there is a LOCKDEP error, we are
> not catching any more LOCKDEP errors during subsequent testing. If
> there is a use-after-free, that's a serious error on its own and KASAN
> produces only 1 error by default as well. And as far as I understand,
> lots of kernel testing systems don't even enable KASAN, which is very
> wrong.
> I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
> start catching these as part of unit testing. Which should help with
> syzbot testing as well.

LKFT has recently added testing with KASAN enabled and improved the kernel log parsing to catch more of this class of errors while performing our regular functional testing.

Incidentally, -next was also broken for us from March 25 through April 5 due to a perf build failure[0], which eventually made its way all the way down into the v5.6 release and, I believe, the first two 5.6.x stable releases.

For -next, LKFT's gap is primarily reporting. We do build and run over 30k tests on every -next daily release, but we send out issues manually when we see them, because triaging is still a manual effort. We're working to build better automated reporting. If anyone is interested in watching LKFT's -next results more closely (warning: it's a bit noisy), please let me know. Watching the results at https://lkft.linaro.org provides some overall health indications, but again, it gets pretty difficult to separate signal from noise once you start drilling down without sufficient context of the system.

Dan

[0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@mail.gmail.com/

-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 19:28 ` Dan Rue
@ 2020-04-15 11:09 ` Dmitry Vyukov
  2020-04-15 16:23 ` Dan Rue
  0 siblings, 1 reply; 37+ messages in thread
From: Dmitry Vyukov @ 2020-04-15 11:09 UTC (permalink / raw)
  To: Dan Rue
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller

On Tue, Apr 14, 2020 at 9:28 PM Dan Rue <dan.rue@linaro.org> wrote:
>
> On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > > Well, there are other CI's beyond syzbot.
> > > On the other hand, this makes me worry who is testing on linux-next every day.
> >
> > How do these use-after-free's and locking bugs get past the
> > unit-testing systems (which syzbot is not) and remain unnoticed for so
> > long?...
> > syzbot uses the dumbest VMs (GCE), so everything it triggers during
> > boot should be triggerable pretty much everywhere.
> > It seems to be an action point for the testing systems. "Boot to ssh"
> > is not the best criteria. Again if there is a LOCKDEP error, we are
> > not catching any more LOCKDEP errors during subsequent testing. If
> > there is a use-after-free, that's a serious error on its own and KASAN
> > produces only 1 error by default as well. And as far as I understand,
> > lots of kernel testing systems don't even enable KASAN, which is very
> > wrong.
> > I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
> > start catching these as part of unit testing. Which should help with
> > syzbot testing as well.
>
> LKFT has recently added testing with KASAN enabled and improved the
> kernel log parsing to catch more of this class of errors while
> performing our regular functional testing.
>
> Incidentally, -next was also broken for us from March 25 through April 5
> due to a perf build failure[0], which eventually made itself all the way
> down into v5.6 release and I believe the first two 5.6.x stable
> releases.
>
> For -next, LKFT's gap is primarily reporting. We do build and run over
> 30k tests on every -next daily release, but we send out issues manually
> when we see them because triaging is still a manual effort. We're
> working to build better automated reporting. If anyone is interested in
> watching LKFT's -next results more closely (warning, it's a bit noisy),
> please let me know. Watching the results at https://lkft.linaro.org
> provides some overall health indications, but again, it gets pretty
> difficult to figure out signal from noise once you start drilling down
> without sufficient context of the system.

What kind of failures and noise do you get? Are these flaky tests?
I would assume build failures are ~0% flaky/noisy, and boot failures are maybe ~1% flaky/noisy due to some infra issues.

I can't find any actual test failure logs in the UI. I got to this page:
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/suite/kselftest/tests/
which seems to contain failed tests on mainline. But I still can't find the actual test failure logs.

> Dan
>
> [0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@mail.gmail.com/
>
> --
> Linaro LKFT
> https://lkft.linaro.org

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-15 11:09 ` Dmitry Vyukov
@ 2020-04-15 16:23 ` Dan Rue
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Rue @ 2020-04-15 16:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Stephen Rothwell, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller

On Wed, Apr 15, 2020 at 01:09:32PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 14, 2020 at 9:28 PM Dan Rue <dan.rue@linaro.org> wrote:
> >
> > On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> > > On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > > > Well, there are other CI's beyond syzbot.
> > > > On the other hand, this makes me worry who is testing on linux-next every day.
> > >
> > > How do these use-after-free's and locking bugs get past the
> > > unit-testing systems (which syzbot is not) and remain unnoticed for so
> > > long?...
> > > syzbot uses the dumbest VMs (GCE), so everything it triggers during
> > > boot should be triggerable pretty much everywhere.
> > > It seems to be an action point for the testing systems. "Boot to ssh"
> > > is not the best criteria. Again if there is a LOCKDEP error, we are
> > > not catching any more LOCKDEP errors during subsequent testing. If
> > > there is a use-after-free, that's a serious error on its own and KASAN
> > > produces only 1 error by default as well. And as far as I understand,
> > > lots of kernel testing systems don't even enable KASAN, which is very
> > > wrong.
> > > I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
> > > start catching these as part of unit testing. Which should help with
> > > syzbot testing as well.
> >
> > LKFT has recently added testing with KASAN enabled and improved the
> > kernel log parsing to catch more of this class of errors while
> > performing our regular functional testing.
> >
> > Incidentally, -next was also broken for us from March 25 through April 5
> > due to a perf build failure[0], which eventually made itself all the way
> > down into v5.6 release and I believe the first two 5.6.x stable
> > releases.
> >
> > For -next, LKFT's gap is primarily reporting. We do build and run over
> > 30k tests on every -next daily release, but we send out issues manually
> > when we see them because triaging is still a manual effort. We're
> > working to build better automated reporting. If anyone is interested in
> > watching LKFT's -next results more closely (warning, it's a bit noisy),
> > please let me know. Watching the results at https://lkft.linaro.org
> > provides some overall health indications, but again, it gets pretty
> > difficult to figure out signal from noise once you start drilling down
> > without sufficient context of the system.
>
> What kind of failures and noise do you get? Is it flaky tests?
> I would assume build failures are ~0% flaky/noisy. And boot failures
> are maybe ~1% flaky/noisy due to some infra issues.

Right - infrastructure problems aside (which are the easy part), tests are quite flaky/noisy.

I guess we're getting quite off topic now, but in LKFT's case we run tests that are available from the likes of LTP, kselftest, and a variety of other test suites. Every test was written by a developer with certain assumptions in place, many of which we violate when we run them on a small arm board, for example. And many may just be low quality to begin with, but they often work well enough for the original author's use case. In such cases, we mark them (manually, at this point) as known issues. For example, here are our kselftest known issues:
https://github.com/Linaro/qa-reports-known-issues/blob/master/kselftests-production.yaml

These lists are quite a chore to keep up to date, so they tend to lag reality. What's needed (and what we're working toward) is more sophisticated analytics on top of our results to determine actual regressions.

I'll give just one example, randomly selected but typical. Here's a timer test that sometimes passes and sometimes fails; it compares how much time something takes against a hard-coded value of what the author expects. Running on small arm hosts or under qemu, the following check sometimes fails:
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/timers/rtcpie.c#L104-L111

There are _many_ such tests (hundreds or thousands) that rely on hard-coded expectations and are quite hard to "fix". But we run them all, because most of them haven't failed yet, and if they do we'll find out why. In general, we ignore tests that either always fail or sometimes fail. I'm sure there are some legitimate bugs in that set of failures, but they're probably not "regressions", so just as syzkaller lets old bugs close automatically, we ignore tests that have a history of failing.

>
> I can't find any actual test failure logs in the UI. I've got to this page:
> https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/suite/kselftest/tests/
> which seem to contain failed tests on mainline. But I still can't find
> the actual test failure logs.

From the link you gave, if you go up one level to https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/, you will see links to the "Log File", which takes you to https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/log.

In some test suite cases (perhaps just LTP), we have logs per test. In most, we just have one large log of the entire run. Even when we have a per-test log, I expect it may miss some asynchronous dmesg output, causing an investigator to look at the whole log anyway.

Dan

> >
> > Dan
> >
> > [0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@mail.gmail.com/
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org

-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-14 11:12 ` Dmitry Vyukov
  2020-04-14 11:59 ` Qian Cai
  2020-04-14 19:28 ` Dan Rue
@ 2020-04-16 0:34 ` Stephen Rothwell
  2020-05-11 15:29 ` Dmitry Vyukov
  2 siblings, 1 reply; 37+ messages in thread
From: Stephen Rothwell @ 2020-04-16 0:34 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Qian Cai, Linus Torvalds, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller, Dan Rue

[-- Attachment #1: Type: text/plain, Size: 480 bytes --]

Hi Dmitry,

On Tue, 14 Apr 2020 13:12:50 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:
>
> AI: we need to CC linux-next@ on linux-next build/boot failures. I
> will work on this.
> We have functionality to CC given emails on _all_ bugs on the given
> tree, but we don't have this for build/boot bugs only. I will try to
> add this soon.
> Stephen, do you want to be CCed as well? Or just linux-next@?

Please cc me as well, thanks.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
  2020-04-16 0:34 ` Stephen Rothwell
@ 2020-05-11 15:29 ` Dmitry Vyukov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry Vyukov @ 2020-05-11 15:29 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Qian Cai, Linus Torvalds, Andrew Morton, Peter Xu, LKML, Linux-MM, Jens Axboe, Christoph Lameter, Johannes Weiner, syzkaller, Dan Rue

On Thu, Apr 16, 2020 at 2:34 AM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi Dmitry,
>
> On Tue, 14 Apr 2020 13:12:50 +0200 Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > AI: we need to CC linux-next@ on linux-next build/boot failures. I
> > will work on this.
> > We have functionality to CC given emails on _all_ bugs on the given
> > tree, but we don't have this for build/boot bugs only. I will try to
> > add this soon.
> > Stephen, do you want to be CCed as well? Or just linux-next@?
>
> Please cc me as well, thanks.

To close the loop: I implemented and deployed 2 improvements:

1. A notion of per-repo "build maintainers":
https://github.com/google/syzkaller/commit/88cb3e92ba25303ab67aaceb083fe7304fccd32f
Now linux-next@ and Stephen should be CCed on all linux-next breakages automatically.

2. Finding maintainers for build errors:
https://github.com/google/syzkaller/commit/65a44e22ba217ef7272b9d3735e9d12cfaa204f6
syzbot will attempt to extract the broken file name from the make output and then run get_maintainer.pl on it to find the relevant emails.

^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2020-05-11 15:29 UTC | newest]

Thread overview: 37+ messages
[not found] <20200414040717.22040-1-hdanton@sina.com>
2020-04-14  4:31 ` [PATCH 0/2] mm: Two small fixes for recent syzbot reports Jens Axboe
2020-04-08  1:40 Peter Xu
2020-04-09  0:47 ` Andrew Morton
2020-04-09 11:49 ` Matthew Wilcox
2020-04-09 13:00 ` Dmitry Vyukov
2020-04-09 18:16 ` Andrew Morton
2020-04-09 18:53 ` Linus Torvalds
2020-04-09 19:12 ` Andrew Morton
2020-04-09 19:46 ` Linus Torvalds
2020-04-09 19:56 ` Matthew Wilcox
2020-04-09 19:58 ` Linus Torvalds
2020-04-09 20:27 ` Eric Biggers
2020-04-09 20:34 ` Linus Torvalds
2020-04-09 23:34 ` Stephen Rothwell
2020-04-10  1:11 ` Theodore Y. Ts'o
2020-04-09 12:55 ` Dmitry Vyukov
2020-04-09 16:32 ` Linus Torvalds
2020-04-09 16:58 ` Qian Cai
2020-04-09 17:05 ` Linus Torvalds
2020-04-09 17:58 ` Qian Cai
2020-04-09 18:06 ` Linus Torvalds
2020-04-09 21:14 ` Qian Cai
2020-04-10 13:12 ` Tetsuo Handa
2020-04-10 14:26 ` Qian Cai
2020-04-10 17:26 ` Andrew Morton
2020-04-10 19:46 ` Qian Cai
2020-04-09 23:29 ` Stephen Rothwell
2020-04-13 22:06 ` Qian Cai
2020-04-13 23:05 ` Jens Axboe
2020-04-14 11:12 ` Dmitry Vyukov
2020-04-14 11:59 ` Qian Cai
2020-04-14 12:05 ` Dmitry Vyukov
2020-04-14 19:28 ` Dan Rue
2020-04-15 11:09 ` Dmitry Vyukov
2020-04-15 16:23 ` Dan Rue
2020-04-16  0:34 ` Stephen Rothwell
2020-05-11 15:29 ` Dmitry Vyukov